Friday, April 14, 2017

Apple II - Double Hi-Res From The Ground Up
Part 6: General Purpose Plot Routine 3


The story so far....

In the previous post we developed a routine which could write a green pixel anywhere across the top line of the DHR screen.  This was achieved with SIX table lookups: two to find the MAIN and AUX bytes used by this pixel, two more to MASK out the appropriate bits using the AND operation, and finally another two to OR the bit pattern of a green pixel into memory.

Colour me impressed...


Adding the ability to select our pixel's colour may seem like something of a challenge, since the pixel colour comes from the MAINGR and AUXGR tables.  Obviously we can create more tables.  For example, here are the MAINMB and AUXMB tables for drawing a medium blue pixel.  (From now on I'll refer to these as "colour tables".)
MAINMB
 LUP 20
 DB %00000000,%00000000,%00000110,%01100000
 DB %00000000,%00000001,%00011000
 --^
AUXMB
 LUP 20
 DB %00000011,%00110000,%00000000,%00000000
 DB %00001100,%01000000,%00000000
 --^
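
If you'd rather not type out every colour table by hand, the seven-entry pattern can be generated off the Apple II. The Python sketch below is only an illustration: the bit ordering is an assumption inferred from the tables in this post (colour numbers follow the order used later on, 0 = black to 15 = white), and colour_tables is a hypothetical helper name.

```python
MEDIUM_BLUE = 12   # colour number, assuming 0 = black ... 15 = white

def colour_tables(colour):
    """Return (main, aux) bit patterns for the 7 pixels of one repeating group."""
    main, aux = [0] * 7, [0] * 7
    for pixel in range(7):
        for bit in range(4):
            if not colour & (1 << bit):
                continue
            # Assumption inferred from the tables: colour bit 0 is the LAST
            # of the pixel's four screen bits, colour bit 3 the first.
            stream = pixel * 4 + (3 - bit)
            # Each byte holds 7 screen bits; bytes run AUX, MAIN, AUX, MAIN.
            byte, pos = divmod(stream, 7)
            if byte % 2 == 0:
                aux[pixel] |= 1 << pos
            else:
                main[pixel] |= 1 << pos
    return main, aux

main, aux = colour_tables(MEDIUM_BLUE)
print("MAINMB", [f"%{b:08b}" for b in main])
print("AUXMB ", [f"%{b:08b}" for b in aux])
```

The two printed rows should line up with the DB lines of MAINMB and AUXMB, which is a handy way to proofread a table you typed in yourself.
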
But the question remains: How do we select these tables instead of MAINGR and AUXGR?  Well if you remember there are only two places where the colour table is referenced in our code and they are

ORA MAINGR,X

and

ORA AUXGR,X

Wouldn't it be great if we could just change these instructions to point to different locations when we feel like it?  Well brace yourself because that's exactly what we are going to do!

Self-modifying code


A machine language program is just a stream of bytes, and machine language programs are great at manipulating bytes, so it should come as no surprise that we can write programs that modify themselves.  This technique is known as self-modifying code.

The ORA instruction is the one we are interested in.  In the absolute,X form we use here it is represented by the byte 1D, which is then followed by two bytes forming an address.  The 6502 stores this address in what is known as little-endian format, meaning that the low byte of the address is stored first.  So, for example, if AUXGR was located at $6E12 then the CPU would expect to read the 12 before the 6E.  When the assembler sees ORA AUXGR,X it therefore turns it into: 1D 12 6E.
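
To see the byte order without an assembler to hand, here is a small Python sketch of that encoding. The function name assemble_ora_abs_x is made up for illustration, and $6E12 is just the example address used above.

```python
ORA_ABS_X = 0x1D  # opcode for ORA absolute,X

def assemble_ora_abs_x(address):
    """Return the three bytes of ORA address,X: opcode, low byte, high byte."""
    low = address & 0xFF
    high = (address >> 8) & 0xFF
    return bytes([ORA_ABS_X, low, high])

code = assemble_ora_abs_x(0x6E12)   # the example location used for AUXGR
print(code.hex(" ").upper())        # opcode first, then low byte, then high byte
```
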

From here, it should be easy to see that we could write a routine to change the address that both our ORA instructions are pointing at. To do so we're going to have to track exactly where these instructions are in memory.  This can be done simply by giving the assembler a label for each instruction like so...
DPLOT LDY MBOFFSET,X
 BMI AUX
 STA PAGE1
 LDA $2000,Y
 AND MAINAND,X
ORMAIN ORA MAINGR,X
 STA $2000,Y
AUX LDY ABOFFSET,X
 BMI END
 STA PAGE2
 LDA $2000,Y
 AND AUXAND,X
ORAUX ORA AUXGR,X
 STA $2000,Y
END RTS
Now we have the labels ORMAIN and ORAUX pointing to where those instructions are!

Next, let's assume we have colour tables for each of the 16 DHR colours with the following names

Colour        Table name(s)
Black         MAINBL/AUXBL
Magenta       MAINMG/AUXMG
Brown         MAINBR/AUXBR
Orange        MAINOR/AUXOR
Dark Green    MAINDG/AUXDG
Grey 1        MAING1/AUXG1
Green         MAINGR/AUXGR
Yellow        MAINYE/AUXYE
Dark Blue     MAINDB/AUXDB
Violet        MAINVI/AUXVI
Grey 2        MAING2/AUXG2
Pink          MAINPI/AUXPI
Medium Blue   MAINMB/AUXMB
Light Blue    MAINLB/AUXLB
Aqua          MAINAQ/AUXAQ
White         MAINWI/AUXWI

We will now create four tables containing the following information:

Table name    Description
CLOM          Low byte of the address of each MAIN memory colour table
CHIM          High byte of the address of each MAIN memory colour table
CLOA          Low byte of the address of each AUX memory colour table
CHIA          High byte of the address of each AUX memory colour table

Most assemblers have a way of accessing the high byte and low byte of any label defined in your code.  In the case of Merlin Pro the operators are < and >.  So <MAINGR and >MAINGR refer to the low and high bytes of the MAINGR table respectively.  So writing the following:
CLOM DB <MAINBL,<MAINMG,<MAINBR,<MAINOR,<MAINDG
 DB <MAING1,<MAINGR,<MAINYE,<MAINDB,<MAINVI
 DB <MAING2,<MAINPI,<MAINMB,<MAINLB,<MAINAQ,<MAINWI
CHIM DB >MAINBL,>MAINMG,>MAINBR,>MAINOR,>MAINDG,>MAING1
 DB >MAINGR,>MAINYE,>MAINDB,>MAINVI,>MAING2,>MAINPI
 DB >MAINMB,>MAINLB,>MAINAQ,>MAINWI
CLOA DB <AUXBL,<AUXMG,<AUXBR,<AUXOR,<AUXDG
 DB <AUXG1,<AUXGR,<AUXYE,<AUXDB,<AUXVI
 DB <AUXG2,<AUXPI,<AUXMB,<AUXLB,<AUXAQ,<AUXWI
CHIA DB >AUXBL,>AUXMG,>AUXBR,>AUXOR,>AUXDG,>AUXG1
 DB >AUXGR,>AUXYE,>AUXDB,>AUXVI,>AUXG2,>AUXPI
 DB >AUXMB,>AUXLB,>AUXAQ,>AUXWI
gives us four tables holding the low and high address bytes of all our colour tables.  Now all we need is a routine to set the colour table location.  Let's call this routine SETDCOLOR and expect the programmer to choose the colour by passing its number (0 to 15, in the order listed above) in the accumulator:
SETDCOLOR TAY
 LDA CLOM,Y
 STA ORMAIN+1
 LDA CHIM,Y
 STA ORMAIN+2
 LDA CLOA,Y
 STA ORAUX+1
 LDA CHIA,Y
 STA ORAUX+2
 RTS
Done.  Now each time we call SETDCOLOR it updates our DPLOT routine to point to the appropriate table.
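
If self-modifying code still feels like magic, it can help to model it with a plain byte array. The Python sketch below is not Apple II code; the addresses for ORMAIN, MAINGR and MAINMB are hypothetical values picked purely for illustration, and set_colour_table simply mimics the two STA instructions in SETDCOLOR.

```python
# Hypothetical addresses, chosen only for illustration.
ORMAIN = 0x6010                      # where the ORA instruction starts
MAINGR, MAINMB = 0x6800, 0x6B48      # two colour tables

memory = bytearray(0x10000)          # 64K of simulated RAM
# The assembled instruction ORA MAINGR,X: opcode, low byte, high byte.
memory[ORMAIN:ORMAIN + 3] = bytes([0x1D, MAINGR & 0xFF, MAINGR >> 8])

def set_colour_table(instruction, table):
    """Mimic STA ORMAIN+1 / STA ORMAIN+2: rewrite only the operand bytes."""
    memory[instruction + 1] = table & 0xFF   # low byte goes first
    memory[instruction + 2] = table >> 8     # then the high byte

def operand(instruction):
    """Read back the 16-bit address the instruction currently points at."""
    return memory[instruction + 1] | (memory[instruction + 2] << 8)

set_colour_table(ORMAIN, MAINMB)
print(hex(operand(ORMAIN)))          # the ORA now points at the MAINMB table
```

Note that the opcode byte itself is never touched. Only the two address bytes that follow it change.
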

Vertical take off:


So what's left?  Oh right! Our routine is still "imprisoned" on the first line of the hi-res screen at $2000.  So how can we change this?  Well...can't we use self-modifying code like we just did with the colour table information?  Well you could....but....like anything we code, we need to ask the question: What are we assuming about our execution environment?

When we wrote SETDCOLOR we knew we had to change something about the way our program executed to get it to look at the right colour table.  No matter what we did it was going to cost us time.  Also it's not unreasonable to assume that a plotting program is going to plot a number of points in a single colour before it changes to a different colour.

Can we make the same assumption here?  Maybe not.  We will have to change these values every time we plot a point.   Each time we do we're going to do four table lookups and rewrite four bytes.  Is that going to be too much?  We can get a sense of this by adding up the time it takes to execute the main part of our SETDCOLOR program:

Instruction      Cycles
LDA CLOM,Y       4
STA ORMAIN+1     4
LDA CHIM,Y       4
STA ORMAIN+2     4
LDA CLOA,Y       4
STA ORAUX+1      4
LDA CHIA,Y       4
STA ORAUX+2      4
TOTAL            32

(Each indexed LDA takes one extra cycle if the lookup crosses a page boundary, so the worst case is 36.)

So every plot is going to cost us between 32 and 36 machine cycles.  Let's compare this to a different method: indirect addressing using the zero page.

Indirect addressing:


Anyone who has written 6502 assembly should know how to use indirect addressing so this will be a quick refresher.  Examine the following code:
 LDA #$00 ;Load the low byte of the address $2000
 STA $1D ;Store it in $001D
 LDA #$20 ;Load the high byte of the address $2000
 STA $1E ;Store it in $001E
 LDA #$FF ;Load the accumulator with 255
 LDY #$00 ;Load Y with 0
 STA ($1D),Y ;Store the contents of the accumulator at location $2000
As you can see, STA ($1D),Y peeks into the two adjacent memory locations $1D and $1E and sees that they contain the bytes 00 and 20 respectively.  It then puts them together to form the address $2000, adds the contents of the Y register (0 here) and stores the contents of the accumulator in that memory location.
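
The same lookup can be modelled in a few lines of Python, which may help if the notation is new to you. This is only a sketch of the addressing mode: sta_indirect_y is a made-up helper name, and it ignores the 6502's zero-page wrap-around when the pointer sits at $FF.

```python
memory = bytearray(0x10000)   # 64K of simulated RAM

def sta_indirect_y(zp, y, a):
    """Model STA (zp),Y: build a pointer from zp and zp+1, add Y, store A."""
    pointer = memory[zp] | (memory[zp + 1] << 8)   # low byte first
    address = (pointer + y) & 0xFFFF
    memory[address] = a
    return address

memory[0x1D] = 0x00    # low byte of $2000
memory[0x1E] = 0x20    # high byte of $2000
address = sta_indirect_y(0x1D, y=0, a=0xFF)
print(hex(address), hex(memory[address]))
```

Changing only the two bytes at $1D and $1E moves every later access to a different row, which is exactly what we want for plotting.
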

But wait!  Is that really any faster?  Actually yes! First, we only have to read and write two bytes instead of four. Second, the writes to the zero page only take three cycles instead of four.  The only drawback is that LDA ($1D),Y and STA ($1D),Y each take one cycle longer than LDA $2000,Y and STA $2000,Y.  However even with that we still come out ahead.
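
For the sceptical, here is the arithmetic as a rough Python tally. It uses base 6502 timings and assumes the worst-case pixel, one that touches both MAIN and AUX and therefore makes four screen accesses. If every indexed load also crossed a page boundary, the first total would rise to 36 and the second to 20.

```python
# Base timings; indexed loads take one more cycle when they cross a page.
LDA_ABS_Y = 4
STA_ABS = 4
STA_ZP = 3
EXTRA_PER_INDIRECT_ACCESS = 1   # (zp),Y versus absolute,Y

# Self-modifying approach, as tallied for SETDCOLOR: four loads, four stores.
self_modifying = 4 * LDA_ABS_Y + 4 * STA_ABS

# Zero-page pointer: two loads, two zero-page stores, plus the extra cycle on
# each screen access (worst case: LDA and STA in both MAIN and AUX).
zero_page = 2 * LDA_ABS_Y + 2 * STA_ZP + 4 * EXTRA_PER_INDIRECT_ACCESS

print(self_modifying, zero_page)
```
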

The question remains: How do we implement this?  You know the answer! That's right! Another table!  This time it represents the high and low bytes of the beginning of each screen row.  There are 192 screen lines, which is a lot of data, so for now I'll just give you the table for the first eight lines, which if you recall are $400 bytes apart.
HTAB_LO DB $00,$00,$00,$00,$00,$00,$00,$00
HTAB_HI DB $20,$24,$28,$2C,$30,$34,$38,$3C
I've called them HTAB_LO and HTAB_HI for the horizontal low-byte table and horizontal high-byte table respectively.
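
If you want all 192 entries without typing them in, the row addresses follow the usual hi-res interleave and can be computed. The Python sketch below is one way to do that off the Apple II; row_address is a hypothetical helper name, and the formula assumes hi-res page 1 starting at $2000.

```python
def row_address(y, base=0x2000):
    """Start address of hi-res screen row y (0..191)."""
    return (base
            + (y & 7) * 0x400          # line within a group of eight
            + ((y >> 3) & 7) * 0x80    # group within a third of the screen
            + (y >> 6) * 0x28)         # which third of the screen

HTAB_LO = [row_address(y) & 0xFF for y in range(192)]
HTAB_HI = [row_address(y) >> 8 for y in range(192)]

print([f"${b:02X}" for b in HTAB_LO[:8]])
print([f"${b:02X}" for b in HTAB_HI[:8]])
```

The first eight entries of each list should match the two DB lines above.
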

Now just as our program expects the horizontal co-ordinate in the X register we'll modify our code to expect the vertical co-ordinate in the Y register.  To accomplish that we add the following to the beginning of our program:
 LDA HTAB_LO,Y
 STA $1D
 LDA HTAB_HI,Y
 STA $1E
Then we make one final change to the main part of our program that does the plotting.  We substitute the places where we wrote:

LDA $2000,Y

with

LDA ($1D),Y

and make the same substitution for each STA $2000,Y.

And believe it or not we're done!  The following is a fully commented routine to draw a pixel of any colour anywhere on the DHR page. The listing also includes our SETDCOLOR routine:
 XC ;Required for Merlin Pro to use 65C02 instructions
 ORG $6000 ;Start assembling at memory location $6000
GRAPHON EQU $C050 
HIRESON EQU $C057
FULLON EQU $C052
DHRON EQU $C05E
ON80STOR EQU $C001
ON80COL EQU $C00D
PAGE1 EQU $C054
PAGE2 EQU $C055
SCRN_LO EQU $1D ;Zero page location for low byte of our screen row
SCRN_HI EQU $1E ;Zero page location for high byte of our screen row
INIT STA GRAPHON ;Turn on graphics
 STA HIRESON ;Turn on hi-res mode
 STA FULLON ;Turn on fullscreen mode
 STA DHRON ;Turn on Double hi-res
 STA ON80COL ;Turn on 80 Column mode
 STA ON80STOR ;Use PAGE1/PAGE2 to switch between MAIN and AUX memory
 LDA #$0E ;Set colour to 14 = Aqua
 JSR SETDCOLOR
 LDX #00 ;Set column to 0
 LDY #00 ;Set row to 0
DPLOT LDA HTAB_LO,Y ;Find the low byte of the row address
 STA SCRN_LO 
 LDA HTAB_HI,Y ;Find the high byte of the row address
 STA SCRN_HI
 LDY MBOFFSET,X ;Find what byte if any in MAIN we are working in
 BMI AUX ;If pixel has no bits in MAIN memory - go to aux routine
 STA PAGE1 ;Map $2000 to MAIN memory
 LDA (SCRN_LO),Y ;Load screen data
 AND MAINAND,X ;Erase pixel bits
ORMAIN ORA MAINGR,X ;Draw coloured bits
 STA (SCRN_LO),Y ;Write back to screen
AUX LDY ABOFFSET,X ;Find what byte if any in AUX we are working in
 BMI END ;If no part of the pixel is in AUX - end the program
 STA PAGE2 ;Map $2000 to AUX memory
 LDA (SCRN_LO),Y ;Load screen data
 AND AUXAND,X ;Erase pixel bits
ORAUX ORA AUXGR,X ;Draw coloured bits
 STA (SCRN_LO),Y ;Write back to screen
END RTS 
SETDCOLOR TAY ;Assume the desired colour is in the accumulator
 LDA CLOM,Y ;Lookup low byte of MAIN memory colour table
 STA ORMAIN+1 ;Update the ORA instruction
 LDA CHIM,Y ;Lookup high byte of MAIN memory colour table
 STA ORMAIN+2 ;Update the ORA instruction
 LDA CLOA,Y ;Lookup low byte of AUX memory colour table
 STA ORAUX+1 ;Update the ORA instruction
 LDA CHIA,Y ;Lookup high byte of AUX memory colour table
 STA ORAUX+2 ;Update the ORA instruction
 RTS 
MBOFFSET DB 255,0,0,0,255,1,1,255,2,2,2,255,3,3
 DB 255,4,4,4,255,5,5,255,6,6,6,255,7,7
 DB 255,8,8,8,255,9,9,255,10,10,10,255,11,11
 DB 255,12,12,12,255,13,13,255,14,14,14,255,15,15
 DB 255,16,16,16,255,17,17,255,18,18,18,255,19,19
 DB 255,20,20,20,255,21,21,255,22,22,22,255,23,23
 DB 255,24,24,24,255,25,25,255,26,26,26,255,27,27
 DB 255,28,28,28,255,29,29,255,30,30,30,255,31,31
 DB 255,32,32,32,255,33,33,255,34,34,34,255,35,35
 DB 255,36,36,36,255,37,37,255,38,38,38,255,39,39
ABOFFSET DB 0,0,255,1,1,1,255,2,2,255,3,3,3,255
 DB 4,4,255,5,5,5,255,6,6,255,7,7,7,255
 DB 8,8,255,9,9,9,255,10,10,255,11,11,11,255
 DB 12,12,255,13,13,13,255,14,14,255,15,15,15,255
 DB 16,16,255,17,17,17,255,18,18,255,19,19,19,255
 DB 20,20,255,21,21,21,255,22,22,255,23,23,23,255
 DB 24,24,255,25,25,25,255,26,26,255,27,27,27,255
 DB 28,28,255,29,29,29,255,30,30,255,31,31,31,255
 DB 32,32,255,33,33,33,255,34,34,255,35,35,35,255
 DB 36,36,255,37,37,37,255,38,38,255,39,39,39,255
MAINAND
 LUP 20
 DB %01111111,%01111110,%01100001,%00011111
 DB %01111111,%01111000,%00000111
 --^
AUXAND
 LUP 20
 DB %01110000,%00001111,%01111111,%01111100
 DB %01000011,%00111111,%01111111
 --^
MAINBL
 LUP 20
 DB %00000000,%00000000,%00000000,%00000000
 DB %00000000,%00000000,%00000000
 --^
AUXBL
 LUP 20
 DB %00000000,%00000000,%00000000,%00000000
 DB %00000000,%00000000,%00000000
 --^
MAINMG
 LUP 20
 DB %00000000,%00000001,%00010000,%00000000
 DB %00000000,%00000100,%01000000
 --^
AUXMG
 LUP 20
 DB %00001000,%00000000,%00000000,%00000010
 DB %00100000,%00000000,%00000000
 --^
MAINBR
 LUP 20
 DB %00000000,%00000000,%00001000,%00000000
 DB %00000000,%00000010,%00100000
 --^
AUXBR
 LUP 20
 DB %00000100,%01000000,%00000000,%00000001
 DB %00010000,%00000000,%00000000
 --^
MAINOR
 LUP 20
 DB %00000000,%00000001,%00011000,%00000000
 DB %00000000,%00000110,%01100000
 --^
AUXOR
 LUP 20
 DB %00001100,%01000000,%00000000,%00000011
 DB %00110000,%00000000,%00000000
 --^
MAINDG
 LUP 20
 DB %00000000,%00000000,%00000100,%01000000
 DB %00000000,%00000001,%00010000
 --^
AUXDG
 LUP 20
 DB %00000010,%00100000,%00000000,%00000000
 DB %00001000,%00000000,%00000000
 --^
MAING1
 LUP 20
 DB %00000000,%00000001,%00010100,%01000000
 DB %00000000,%00000101,%01010000
 --^
AUXG1
 LUP 20
 DB %00001010,%00100000,%00000000,%00000010
 DB %00101000,%00000000,%00000000
 --^
MAINGR
 LUP 20
 DB %00000000,%00000000,%00001100,%01000000
 DB %00000000,%00000011,%00110000
 --^
AUXGR
 LUP 20
 DB %00000110,%01100000,%00000000,%00000001
 DB %00011000,%00000000,%00000000
 --^
MAINYE
 LUP 20
 DB %00000000,%00000001,%00011100,%01000000
 DB %00000000,%00000111,%01110000
 --^
AUXYE
 LUP 20
 DB %00001110,%01100000,%00000000,%00000011
 DB %00111000,%00000000,%00000000
 --^
MAINDB
 LUP 20
 DB %00000000,%00000000,%00000010,%00100000
 DB %00000000,%00000000,%00001000
 --^
AUXDB
 LUP 20
 DB %00000001,%00010000,%00000000,%00000000
 DB %00000100,%01000000,%00000000
 --^
MAINVI
 LUP 20
 DB %00000000,%00000001,%00010010,%00100000
 DB %00000000,%00000100,%01001000
 --^
AUXVI
 LUP 20
 DB %00001001,%00010000,%00000000,%00000010
 DB %00100100,%01000000,%00000000
 --^
MAING2
 LUP 20
 DB %00000000,%00000000,%00001010,%00100000
 DB %00000000,%00000010,%00101000
 --^
AUXG2
 LUP 20
 DB %00000101,%01010000,%00000000,%00000001
 DB %00010100,%01000000,%00000000
 --^
MAINPI
 LUP 20
 DB %00000000,%00000001,%00011010,%00100000
 DB %00000000,%00000110,%01101000
 --^
AUXPI
 LUP 20
 DB %00001101,%01010000,%00000000,%00000011
 DB %00110100,%01000000,%00000000
 --^
MAINMB
 LUP 20
 DB %00000000,%00000000,%00000110,%01100000
 DB %00000000,%00000001,%00011000
 --^
AUXMB
 LUP 20
 DB %00000011,%00110000,%00000000,%00000000
 DB %00001100,%01000000,%00000000
 --^
MAINLB
 LUP 20
 DB %00000000,%00000001,%00010110,%01100000
 DB %00000000,%00000101,%01011000
 --^
AUXLB
 LUP 20
 DB %00001011,%00110000,%00000000,%00000010
 DB %00101100,%01000000,%00000000
 --^
MAINAQ
 LUP 20
 DB %00000000,%00000000,%00001110,%01100000
 DB %00000000,%00000011,%00111000
 --^
AUXAQ
 LUP 20
 DB %00000111,%01110000,%00000000,%00000001
 DB %00011100,%01000000,%00000000
 --^
MAINWI
 LUP 20
 DB %00000000,%00000001,%00011110,%01100000
 DB %00000000,%00000111,%01111000
 --^
AUXWI
 LUP 20
 DB %00001111,%01110000,%00000000,%00000011
 DB %00111100,%01000000,%00000000
 --^
CLOM DB <MAINBL,<MAINMG,<MAINBR,<MAINOR,<MAINDG
 DB <MAING1,<MAINGR,<MAINYE,<MAINDB,<MAINVI
 DB <MAING2,<MAINPI,<MAINMB,<MAINLB,<MAINAQ,<MAINWI
CHIM DB >MAINBL,>MAINMG,>MAINBR,>MAINOR,>MAINDG,>MAING1
 DB >MAINGR,>MAINYE,>MAINDB,>MAINVI,>MAING2,>MAINPI
 DB >MAINMB,>MAINLB,>MAINAQ,>MAINWI
CLOA DB <AUXBL,<AUXMG,<AUXBR,<AUXOR,<AUXDG
 DB <AUXG1,<AUXGR,<AUXYE,<AUXDB,<AUXVI
 DB <AUXG2,<AUXPI,<AUXMB,<AUXLB,<AUXAQ,<AUXWI
CHIA DB >AUXBL,>AUXMG,>AUXBR,>AUXOR,>AUXDG,>AUXG1
 DB >AUXGR,>AUXYE,>AUXDB,>AUXVI,>AUXG2,>AUXPI
 DB >AUXMB,>AUXLB,>AUXAQ,>AUXWI
HTAB_LO
 LUP 4
 DB $00,$00,$00,$00,$00,$00,$00,$00
 DB $80,$80,$80,$80,$80,$80,$80,$80
 --^
 LUP 4
 DB $28,$28,$28,$28,$28,$28,$28,$28
 DB $A8,$A8,$A8,$A8,$A8,$A8,$A8,$A8
 --^
 LUP 4
 DB $50,$50,$50,$50,$50,$50,$50,$50
 DB $D0,$D0,$D0,$D0,$D0,$D0,$D0,$D0
 --^
HTAB_HI
 LUP 3
 DB $20,$24,$28,$2C,$30,$34,$38,$3C
 DB $20,$24,$28,$2C,$30,$34,$38,$3C
 DB $21,$25,$29,$2D,$31,$35,$39,$3D
 DB $21,$25,$29,$2D,$31,$35,$39,$3D
 DB $22,$26,$2A,$2E,$32,$36,$3A,$3E
 DB $22,$26,$2A,$2E,$32,$36,$3A,$3E
 DB $23,$27,$2B,$2F,$33,$37,$3B,$3F
 DB $23,$27,$2B,$2F,$33,$37,$3B,$3F
 --^
This assembles to about 5K of code - mostly tables. It's worth pointing out that there are many ways to cut this program down in terms of size.  For example, since all the colour tables and mask tables repeat in cycles of seven, we could take the value in the X register modulo 7 before doing our lookup.  This would save us a whopping 2K.  It would also significantly increase the number of cycles we spend plotting each pixel.

There are even a couple of ways to make our code plot faster, mostly using approaches that are less easy to explain.  I may cover some of these speed/size optimizations in a later article but for now this code fits our two primary goals:

  • sufficient speed to be used in an action game
  • sufficiently clear so that the reader can go on and build out their own enhancements.

Next up, we learn about some code management and put this bad boy to work with some demo code...

2 comments:

Unknown said...

It's really nice that you took the huge time and effort to produce this blog on DHR.
I have learnt quite a lot from it!

I have a question though: to write to AUX memory is a simple matter of switching page.
BUT, how to read from AUX memory? Does it work also by switching page?

In my application I want to Get as well as Set pixel colors.
Cheers,
Tony

Axon Punch said...

Hi Tony,

In order to set individual pixels I have to read as well as write to AUX memory. So if you have the 80STOREON softswitch set ($C001) you can use the PAGE1/PAGE2 ($C054/$C055) softswitches to access AUX memory for writing OR reading.

However this has a limitation. Since you are already using the PAGE1/PAGE2 softswitches you can't flip the page, which means with 80STOREON you can only use DHGR page 1.

If you need BOTH DHGR pages, then it gets tricky. I discuss the difficulties in part three: http://www.battlestations.zone/2017/04/apple-ii-double-hi-res-from-ground-up_70.html
