Thursday, May 11, 2017

Apple II - Double Hi-Res From The Ground Up
Part 8: Drawing lines with an 8-bit Bresenham algorithm 2

A big shoutout to the Folks at OpenApple for covering my humble blog. You can check out the episode here
The story so far...

After a lot of background work we now have the basis for two of the fundamental features of a graphics package: Drawing a point and drawing a line. These basic elements: points, lines, rectangles, etc.. are often referred to collectively as primitives.

Our line drawing algorithm works but can only handle lines where X1 > X0, Y1 > Y0 and DX > DY. So lets fix that!

So what happens when DY > DX?


The answer is simple! The Y axis becomes the major axis for our line. In other words, our line moves farther along the y-axis than the x-axis. Let's take a look at just part of the code from last post:
NXTLINE LDA Y1 ;Calculate DY as Y1-Y0
 SEC
 SBC Y0
 STA DY
 LDA X1 ;Calculate DX as X1-X0
 SEC
 SBC X0
 STA DX
 LSR ; Set epsilon to DX/2
 EOR #$FF
 STA EPSILON
 LDX X0 ;Start plotting at (X0,Y0)
 LDY Y0 
NXTVERT PUT DH.ROWSET ;Calculate horizontal value
NXTHORZ PHY
 PUT DH.PLOT ;Plot point
 PLY
 CPX X1 ;Are we done?
 BEQ ENDLINE
 INX ;Move along major axis
 LDA EPSILON ;Epsilon = Epsilon + DY
 CLC
 ADC DY
 STA EPSILON
 BCC NXTHORZ ;If Epsilon hasn't rolled over move to next point 
 SEC ;If epsilon has rolled over subtract DX
 SBC DX
 STA EPSILON
 INY ;Move along minor axis
 BRA NXTVERT
Notice that here, when X is the major axis we are doing five things:
  1. We set epsilon to 255-DX/2 (lines 9-11)
  2. We compare the X co-ordinate with X1 to see if were done drawing. (lines 18-19)
  3. We advance the X co-ordinate. (line 20)
  4. We add DY to epsilon at each iteration to determine if we need to move along the minor (Y) access (lines 21-25)
  5. If so, we subtract DX from epsilon and advance the Y co-ordinate (lines 26-29)
When DY is larger than DX, our algorithm changes to:
  1. Set epsilon to 255-DY/2 (lines 5-7)
  2. Compare the Y co-ordinate with Y1 to see if were done drawing. (lines 18-19)
  3. Advance the Y co-ordinate. (line 20)
  4. Add DX to epsilon at each iteration to determine if we need to move along the minor (X) access (lines 21-25)
  5. If so, we subtract DY from epsilon and advance the X co-ordinate (lines 26-29)
That code would look something like this:
NXTLINE LDA Y1 ;Calculate DY as Y1-Y0
 SEC
 SBC Y0
 STA DY
 LSR ;Set epsilon to 255-DY/2
 EOR #$FF
 STA EPSILON 
 LDA X1 ;Calculate DX as X1-X0
 SEC
 SBC X0
 STA DX
 LDX X0 ;Start plotting at (X0,Y0)
 LDY Y0 
NXTVERT PUT DH.ROWSET ;Calculate horizontal value
NXTHORZ PHY
 PUT DH.PLOT ;Plot point
 PLY
 CPY Y1 ;Are we done?
 BEQ ENDLINE
 INY ;Move along major axis
 LDA EPSILON ;Epsilon = Epsilon + DX
 CLC
 ADC DX
 STA EPSILON
 BCC NXTVERT ;NOTE: we've changed Y so we need to call DH.ROWSET
 SEC ;If epsilon has rolled over subtract DX
 SBC DY
 STA EPSILON
 INX ;Move along minor axis
 BRA NXTVERT
If you recall, last post I mentioned that one of the ways we can implement Bresenham's algorithm is to write multiple routines and that's exactly what we are going to do here. The combined code looks like this:
NXTLINE LDX X0
 LDY Y0
 LDA Y1 ;Calculate DY as Y1-Y0
 SEC
 SBC Y0
 STA DY
 LDA X1 ;Calculate DX as X1-X0
 SEC
 SBC X0
 STA DX
 CMP DY
 BCS DXLINE
DYLINE LDA DY ;Set EPSILON to 255-(DY/2)
 LSR
 EOR #$FF
 STA EPSILON
DYHORZ PUT DH.ROWSET ;Calculate horizontal value
 PHY
 PUT DH.PLOT ;Plot point
 PLY
 CPY Y1 ;Are we done?
 BEQ ENDLINE
 INY ;Move along major axis
 LDA EPSILON ;Epsilon = Epsilon + DX
 CLC
 ADC DX
 STA EPSILON
 BCC DYHORZ ;NOTE: we've changed Y so we need to call DH.ROWSET
 SEC ;If epsilon has rolled over subtract DX
 SBC DY
 STA EPSILON
 INX ;Move along minor axis
 BRA DYHORZ
DXLINE LDA DX ;Set EPSILON to 255-(DX/2)
 LSR
 EOR #$FF
 STA EPSILON
DXVERT PUT DH.ROWSET ;Calculate horizontal value
DXHORZ PHY
 PUT DH.PLOT ;Plot point
 PLY
 CPX X1 ;Are we done?
 BEQ ENDLINE
 INX ;Move along major axis
 LDA EPSILON ;Epsilon = Epsilon + DY
 CLC
 ADC DY
 STA EPSILON
 BCC DXHORZ ;If Epsilon hasn't rolled over move to next point 
 SEC ;If epsilon has rolled over subtract DX
 SBC DX
 STA EPSILON
 INY ;Move along minor axis
 BRA DXVERT
ENDLINE RTS
Keep in mind that we are still assuming that X0 < X1 and that Y0 < Y1. But what if that's not the case? Well if you think about it, as far as drawing the line goes the only difference between X0 > X1 and X0 < X1 is that we move along the X axis in the opposite direction. The proportion of steps on the X axis to the Y axis doesn't change. The same applies for the Y axis.

Clearly we could solve this problem by writing more routines but considering that there are two cases - that gives us four possibilities and each one would have to handle situations where DX > DY and DY < DX. Implementing each of these would yield eight separate routines!

So is there another way?


After all, what we really need to do is change the direction we are moving along one or both axes. We even know how to implement that! Just flip our INX to a DEX for the X axis and our INY to DEY to do the same thing for the Y axis.

To accomplish this we're going to, once again use self-modifying code. While we are calculating X1 - X0, we will check if the result is less than zero - in other words the carry bit will be cleared. At which point we will do two things: flip the sign back from negative to positive (accomplished with an EOR $FF) and then modify the INX/INY instructions in our two routines as needed.

This last part will require us to create a few more labels. CHGYA and CHGYB pointing to the two INY instructions and CHGXA and CHGXB for the two INX instructions.

The result looks like this:(self-modifying code and the portions that will be modified marked out in yellow)
NXTLINE LDX X0
 LDY Y0
 LDA Y1 ;Calculate DY as Y1-Y0
 SEC
 SBC Y0
 BCC Y0BIGGER
 STA DY
 LDA #$C8 ;SMC: Hex code for INY 
 BRA CHGY
Y0BIGGER EOR #$FF
 STA DY
 LDA #$88 ;SMC: Hex code for DEY 
CHGY STA CHGYA ;SMC: Modify instructions at CHGYA/CHGYB
 STA CHGYB
 LDA X1 ;Calculate DX as X1-X0
 SEC
 SBC X0
 BCC X0BIGGER
 STA DX
 LDA #$E8 ;SMC: Hex code for INX
 BRA CHGX
X0BIGGER EOR #$FF
 STA DX
 LDA #$CA ;SMC: Hex code for DEX
CHGX STA CHGXA ;Modify instructions at CHGXA/CHGXB
 STA CHGXB
 LDA DX
 CMP DY
 BCS DXLINE
DYLINE LDA DY ;Set EPSILON to 255-(DY/2)
 LSR
 EOR #$FF
 STA EPSILON
DYHORZ PUT DH.ROWSET ;Calculate horizontal value
 PHY
 PUT DH.PLOT ;Plot point
 PLY
 CPY Y1 ;Are we done?
 BEQ ENDLINE
CHGYA INY ;SMC: Move along major axis
 LDA EPSILON ;Epsilon = Epsilon + DX
 CLC
 ADC DX
 STA EPSILON
 BCC DYHORZ ;NOTE: we've changed Y so we need to call DH.ROWSET
 SEC ;If epsilon has rolled over subtract DX
 SBC DY
 STA EPSILON
CHGXA INX ;SMC: Move along minor axis
 BRA DYHORZ
DXLINE LDA DX ;Set EPSILON to 255-(DX/2)
 LSR
 EOR #$FF
 STA EPSILON
DXVERT PUT DH.ROWSET ;Calculate horizontal value
DXHORZ PHY
 PUT DH.PLOT ;Plot point
 PLY
 CPX X1 ;Are we done?
 BEQ ENDLINE
CHGXB INX ;SMC: Move along major axis
 LDA EPSILON ;Epsilon = Epsilon + DY
 CLC
 ADC DY
 STA EPSILON
 BCC DXHORZ ;If Epsilon hasn't rolled over move to next point 
 SEC ;If epsilon has rolled over subtract DX
 SBC DX
 STA EPSILON
CHGYB INY ;SMC: Move along minor axis
 BRA DXVERT
ENDLINE RTS
...and we now have a fully functioning line drawing routine! There's just one problem. Remember our DH.COLOUR macro? It modifies the DH.PLOT code! So what will happen here when we call SETDCOLOR?
*DH.COLOUR
*Sets colour for DH.PLOT routine.  Requires DH.PLOT.
*Jonathan Graham/Battlestations
*Free for non-commercial use with attribution
*
 LDA CLOM,Y ;Lookup low byte of MAIN memory colour table
 STA ]ORMAIN+1 ;Update the ORA instruction
 LDA CHIM,Y ;Lookup high byte of MAIN memory colour table
 STA ]ORMAIN+2  ;Update the ORA instruction
 LDA CLOA,Y ;Lookup low byte of AUX memory colour table
 STA ]ORAUX+1 ;Update the ORA instruction
 LDA CHIA,Y ;Lookup high byte of AUX memory colour table
 STA ]ORAUX+2 ;Update the ORA instruction
If you recall the ]ORMAIN and ]ORAUX variables take on the value they were defined with most recently in the code. Since they are defined in the DH.PLOT macro. These variables will point to the part of the line draw routine where DX > DY. Which means for our other code, which is used when DX < DY our line will be the default colour: green.

This problem can be solved in a number of ways but to preserve the modularity of our code we're going to be a little less than efficient. Our code, as it stands will modify the correct ORMAIN/ORAUX codes in the DXLINE routine (used when DX > DY). All we need is another to change the code in the DYLINE routine. If you take a look in DH.PLOT you will see that ]ORMAIN is 13 bytes from the begining, and ORAUX is 31 bytes from the beginning. So all we need is one more label at the point where we PUT DH.PLOT called PLOTDYL and then use the pseudo opcode EQU to do the following
SETDCOLOR TAY
 PUT DH.COLOUR
]ORMAIN EQU PLOTDYL+13 ;Redefine ORMAIN to point to DYLINE routine
]ORAUX EQU PLOTDYL+31 ;Redefine ORAUX to point to DYLINE routine
 PUT DH.COLOUR
 RTS
By placing the label PLOTDYL right where te second DH.PLOT macro is and then using the EQU's as above. We've tricked the assembler into making our second macro modify the DYLINE rounte....and that's it.

Are we on the right track?


Our original goals were to produce code that was not just understandable but also fast.  So exactly how fast is this code we are writing anyway? When benchmarking code in the real world. I always find it handy to compare my results as well as the value of any improvements I'm about to make by comparing them to the theoretical maximum for whatever technique I'm using.

As we've seen, in our approach addressing a single pixel on the DHR screen requires us to load the screen data from the place we are about to plot, clear the bits we need to use with AND, insert our pixel data with OR then write the value back to screen memory.  At it's cheapest a LDA from some arbitrary memory location is 3 machine cycles, an AND is two, an OR is two more and a store is 4 more again.  This brings us to 11 machine cycles.  Which means we could fill the entire screen with a pixel of a particular colour in 295,680 cycles or about .39 seconds on a 1 Mhz machine.  Which is about 3 FPS.  

This figure may seem small but keep in mind that in a real game you almost never address every point on the screen in a single frame. Furthermore a single pixel plot is considerably - about seven times - less efficient than drawing a bitmap on the screen a byte at a time.

I should also stress that what we are calculating is very much a unobtainable result since to draw a whole screen would require a stream of instructions about 107K in length which would consume all of the memory on a typical Apple //c or //e.

Another, perhaps more common benchmark is to compare your code against other applications or implementations. For the sake of completeness I've gathered a few popular DHR tools. In each case I've written the fastest code I could manage with the goal of filling the entire screen by drawing horizontal lines from left to right. Once the screen is filled we change colours and start again. After cycling through all 16 colours we divide our time by 16 to get the average time to fill an entire screen.

So without further ado here are our challengers...


Beagle Graphics - Created by the ever popular Beagle Bros. It contains a number of tools for drawing graphics on the DHR screen from basic. Here I used the following program:
&MODE(2):FOR C = 0 TO 15:&HCOLOR=C:FOR Y = 0 to 191:HPLOT 0,Y to 139,Y: NEXT Y,C:

Graphics Magician - Developed in 1982 by Penguin Software, this program was the basis for many graphical adventure games. It allows people to create drawings using a special program which records various primitive graphical operations (line draws, fills, text). The user can then save these in a file and redraw them on the screen from BASIC or assembly using a free-to-use engine written in assembly. In this case I reverse-engineered the file format and created a file with a series of line drawing commands covering the entire screen. This is followed by a command to change colour and another screens-worth of line drawing commands.

cc65 - One of the more popular solutions for doing cross-compilation for the Apple II. I adapted routines from Bill Buckles PDF on drawing DHR graphics using CC65 to write the following code:
int main(void)
{
 int c,i;
    videomode(VIDEOMODE_80COL);
    clrscr();
    dhireson();
    dhiresclear();
 for (c=0;c<16;c++)
  for (i=0;i<192;i++)
   dhrfbox(0, i, 139, i,c);
    return 0;
}
The source code and disk images for Bill's library can be found here

The results speak for themselves:

ApplicationScreen fill% of Theoretical
Theoretical0.39 seconds100%
Our Application2 seconds19.5%
cc653.4 seconds11%
Beagle Graphics10 seconds3.9%
Graphics Magician (DHR)24 seconds1.62%


In my experience 20% of the theoretical max on your first go is quite a good showing. We also appear to have beat out all of the other approaches we tested including commercial applications.

Can we do any better?


The answer is a definite "yes". You will notice that every time we plot a point we push the Y register to preserve it's contents, then we PULL it to get the value back so we can update it. However when DX > DY the majority of the time this is entirely unnecessary. Making this change brings us down to 1.85s or 20.1% of our theoretical max!

Another optimization comes from the fact that the values DX, DY and EPSILON never change once the line drawing has begun. Which means we could, by means of self-modifying code replace all the places where we use these values with absolute addressing rather than zero page addressing. This would save us about three cycles per plot.


Large scale vs small scale patterns.


All the optimizations I've just mentioned are all what I call "small scale" they involve taking the existing approach - in this case plotting a single point at a time - and making it run more tightly.  A large scale optimization is where we change the way we are solving the problem.  

For example, if we know we are going to be plotting lines and that lines generally contain more than one point.  We could re-write our plot routine to draw two pixels at a time. We could handle lines with an odd number of pixels by writing a special case.

If you look closely at the CC65 program you will see that we are using the dhrfbox function. This was actually designed for drawing filled boxes instead of straight lines. Why did we use it then? Because it uses this trick extensively, and will draw just about two pixels per plot by writing a whole byte to the screen at a time. If we were to use the line drawing function from the CC65 library it would be about 20 times slower!

Ok that's it for this post, next up we will build out a simple API for the drawing routine as well as an demo app doing some 3D style drawing. After that I think we should cover writing a character generator.

No comments:

Apple II - Double Hi-Res From The Ground Up - Part 9: An API and a demo!

A better interface Perhaps you've noticed that all these drawing routines we've developed require a fair amount of memory to do an...