Retro Programming: assembly

Saturday, 22 July 2017

16-Bit Xorshift Pseudorandom Numbers in Z80 Assembly

Xorshift is a simple, fast pseudorandom number generator developed by George Marsaglia. The generator combines three xorshift operations where a number is exclusive-ored with a shifted copy of itself:

/* 16-bit xorshift PRNG */

unsigned xs = 1;

unsigned xorshift( )
{
    xs ^= xs << 7;
    xs ^= xs >> 9;
    xs ^= xs << 8;
    return xs;
}

There are 60 shift triplets with the maximum period 2¹⁶-1. Four triplets pass a series of lightweight randomness tests including randomly plotting various n × n matrices using the high bits, low bits, reversed bits, etc. These are: 6, 7, 13; 7, 9, 8; 7, 9, 13; 9, 7, 13.
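As a quick sanity check of the period claim, the 7, 9, 8 generator can be modelled in a few lines of Python. This is only a sketch: `xorshift16` and `period` are names made up for the illustration, and the explicit masking stands in for the 16-bit `unsigned` the C example assumes.

```python
def xorshift16(xs):
    """One step of the 7, 9, 8 xorshift on a 16-bit state."""
    xs ^= (xs << 7) & 0xFFFF   # keep the state to 16 bits
    xs ^= xs >> 9
    xs ^= (xs << 8) & 0xFFFF
    return xs


def period(seed=1):
    """Count the steps until the state returns to the seed."""
    xs = xorshift16(seed)
    steps = 1
    while xs != seed:
        xs = xorshift16(xs)
        steps += 1
    return steps
```

Calling `period()` walks the whole cycle starting from the seed 1, so it should report the maximum period quoted above.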

7, 9, 8 is the most efficient when implemented in Z80, generating a number in 86 cycles. For comparison, the example in C takes approximately 1200 cycles when compiled with HiSoft C v1.3.

; 16-bit xorshift pseudorandom number generator
; 20 bytes, 86 cycles (excluding ret)

; returns   hl = pseudorandom number
; corrupts   a

xrnd:
  ld hl,1       ; seed must not be 0

  ld a,h
  rra
  ld a,l
  rra
  xor h
  ld h,a
  ld a,l
  rra
  ld a,h
  rra
  xor l
  ld l,a
  xor h
  ld h,a

  ld (xrnd+1),hl

  ret
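The routine never shifts by 7, 9 or 8 directly; it gets the same effect from four single-bit rotates through carry. A rough Python model of the register traffic (the helper names `rra`, `xrnd_z80` and `xorshift16` are invented for this sketch, and only the carry flag is modelled) can be compared against the shift-based version:

```python
def rra(a, carry):
    """Z80 RRA: rotate A right through carry. Returns (new A, new carry)."""
    return (carry << 7) | (a >> 1), a & 1


def xrnd_z80(hl):
    """Follow the body of xrnd for a given 16-bit state in HL."""
    h, l = hl >> 8, hl & 0xFF
    a, carry = rra(h, 0)       # ld a,h / rra  (only the carry out matters)
    a, carry = rra(l, carry)   # ld a,l / rra  -> high byte of hl << 7
    h = a ^ h                  # xor h / ld h,a (xor clears carry)
    a, carry = rra(l, 0)       # ld a,l / rra
    a, carry = rra(h, carry)   # ld a,h / rra  -> low bit of l, then h >> 1
    l = a ^ l                  # xor l / ld l,a
    h = l ^ h                  # xor h / ld h,a
    return (h << 8) | l


def xorshift16(xs):
    """Reference 7, 9, 8 xorshift on a 16-bit state."""
    xs ^= (xs << 7) & 0xFFFF
    xs ^= xs >> 9
    xs ^= (xs << 8) & 0xFFFF
    return xs
```

Comparing the two functions over all 65536 states is a cheap way to convince yourself the rotate trick is equivalent.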

Wednesday, 19 July 2017

A Fast Z80 Integer Square Root

A project I'm working on needs a fast square root but I couldn't find anything suitable online. After implementing several versions of the bit-by-bit algorithm I discovered the following code is particularly efficient when unrolled:

/* Return the square root of numb */

int isqrt( numb )
{
    int root = 0, bit = 0x4000;
    while ( bit != 0 )
    {
        if ( numb >= root + bit )
        {
            numb = numb - root - bit;
            root = root / 2 + bit;
        }
        else
            root = root / 2;
        bit = bit / 4;
    }
    return root;
}
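The bit-by-bit algorithm is easy to check exhaustively on a modern machine. The following Python sketch mirrors the C routine line for line (`isqrt16` is a name chosen for the illustration) so it can be compared with the standard library's `math.isqrt`:

```python
import math


def isqrt16(numb):
    """Bit-by-bit integer square root of a 16-bit number."""
    root, bit = 0, 0x4000
    while bit != 0:
        if numb >= root + bit:
            numb -= root + bit
            root = root // 2 + bit
        else:
            root //= 2
        bit //= 4
    return root


# exhaustive comparison over every 16-bit input
mismatches = [n for n in range(65536) if isqrt16(n) != math.isqrt(n)]
```

`mismatches` should come out empty if the algorithm returns the floor of the square root for every 16-bit input.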

First Make It Work

The looping version is small but clunky. It spends almost 400 t-states shifting bits. We'll be able to eliminate most of the shifting by hard-coding the bit positions when the loop is unrolled:

; 16-bit integer square root
; 34 bytes, 1005-1101 cycles (average 1053)

; call with de = number to square root
; returns   hl = square root
; corrupts  bc, de

  ld bc,08000h
  ld h,c
  ld l,c
sqrloop:
  srl b
  rr c
  add hl,bc
  ex de,hl
  sbc hl,de
  jr c,sqrbit
  ex de,hl
  add hl,bc
  jr sqrfi
sqrbit:
  add hl,de
  ex de,hl
  or a
  sbc hl,bc
sqrfi:
  srl h
  rr l
  srl b
  rr c
  jr nc,sqrloop

Then Make It Work Faster

First the loop is unrolled. The first 4 iterations are optimized to work on 8-bit values and bit positions are hard-coded. The first and last iterations are optimized as special cases; we work with the bitwise complement of the root instead of the root itself, and small jumps are replaced with overlapping code. The final code finds the root in an average of 362 t-states:

; fast 16-bit integer square root
; 92 bytes, 344-379 cycles (average 362)
; v2 - 3 t-state optimization spotted by Russ McNulty

; call with hl = number to square root
; returns    a = square root
; corrupts  hl, de

  ld a,h
  ld de,0B0C0h
  add a,e
  jr c,sq7
  ld a,h
  ld d,0F0h
sq7:

; ----------

  add a,d
  jr nc,sq6
  res 5,d
  db 254
sq6:
  sub d
  sra d

; ----------

  set 2,d
  add a,d
  jr nc,sq5
  res 3,d
  db 254
sq5:
  sub d
  sra d

; ----------

  inc d
  add a,d
  jr nc,sq4
  res 1,d
  db 254
sq4:
  sub d
  sra d
  ld h,a

; ----------

  add hl,de
  jr nc,sq3
  ld e,040h
  db 210
sq3:
  sbc hl,de
  sra d
  ld a,e
  rra

; ----------

  or 010h
  ld e,a
  add hl,de
  jr nc,sq2
  and 0DFh
  db 218
sq2:
  sbc hl,de
  sra d
  rra

; ----------

  or 04h
  ld e,a
  add hl,de
  jr nc,sq1
  and 0F7h
  db 218
sq1:
  sbc hl,de
  sra d
  rra

; ----------

  inc a
  ld e,a
  add hl,de
  jr nc,sq0
  and 0FDh
sq0:
  sra d
  rra
  cpl

Saturday, 29 April 2017

ZX Spectrum Scanline Flood Fill

A flood fill is a graphical algorithm to colour an area of screen bounded by pixels of another colour. The scanline technique is a fast, stack-efficient flood fill which can be implemented in 99 bytes of Z80, as demonstrated below:

; scanline fill by John Metcalf
; call with d=x-coord, e=y-coord

; set end marker

fill:
  ld l,255
  push hl

; calculate bit position of pixel

nextrun:
  ld a,d
  and 7
  inc a
  ld b,a
  ld a,1
bitpos:
  rrca
  djnz bitpos
  ld c,b
  ld b,a

; move left until hitting a set pixel or the screen edge

seekleft:
  ld a,d
  or a
  jr z,goright
  dec d
  rlc b
  call scrpos
  jr nz,seekleft

; move right until hitting a set pixel or the screen edge,
; setting pixels as we go. Check rows above and below and
; save their coordinates to fill later if necessary

seekright:  
  rrc b
  inc d
  jr z,rightedge
goright:
  call scrpos
  jr z,rightedge
  ld (hl),a
  inc e
  call checkadj
  dec e
  dec e
  call checkadj
  inc e
  jr seekright

; check to see if there's another row waiting to be filled

rightedge:
  pop de
  ld a,e
  inc a
  jr nz,nextrun
  ret  

; calculate the pixel address and whether or not it's set

scrpos:
  ld a,e
  and 248
  rra
  scf
  rra
  rra
  ld l,a
  xor e
  and 248
  xor e
  ld h,a
  ld a,l
  xor d
  and 7
  xor d
  rrca
  rrca
  rrca
  ld l,a
  ld a,b
  or (hl)
  cp (hl)
  ret

; check and save the coordinates of an adjacent row

checkadj:
  sla c
  ld a,e
  cp 192
  ret nc
  call scrpos+1
  ret z
  inc c
  bit 2,c
  ret nz
  pop hl
  push de
  jp (hl)
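For readers who want the scanline idea without the register juggling, here is a minimal Python sketch. It is not a translation of the routine above: it assumes a list-of-lists bitmap where 0 means an empty pixel, uses a Python list in place of the machine stack, and `scanline_fill` is a name invented for the example.

```python
def scanline_fill(grid, x, y):
    """Fill the empty (0) region containing (x, y) with 1s, in place."""
    height, width = len(grid), len(grid[0])
    pending = [(x, y)]                        # seeds of runs still to fill
    while pending:
        x, y = pending.pop()
        if grid[y][x]:
            continue                          # already filled from another seed
        while x > 0 and not grid[y][x - 1]:   # seek the left end of the run
            x -= 1
        in_run = {-1: False, 1: False}        # open run on the row above / below?
        while x < width and not grid[y][x]:   # move right, setting pixels
            grid[y][x] = 1
            for dy in (-1, 1):
                ny = y + dy
                empty = 0 <= ny < height and not grid[ny][x]
                if empty and not in_run[dy]:
                    pending.append((x, ny))   # first pixel of a new adjacent run
                in_run[dy] = empty
            x += 1


grid = [[0, 0, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
scanline_fill(grid, 0, 0)
```

Only one seed is saved per run of empty pixels on an adjacent row, which is what keeps the stack use low.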

Sunday, 29 May 2016

Divide and Conquer Line Algorithm for the ZX Spectrum

While attempting to write a game in 256 bytes I needed a routine to draw lines, but Bresenham's line algorithm weighs in at approximately 120 bytes. The only suitable alternative I'm aware of is recursive divide and conquer: divide a line into two smaller lines and call the draw routine with each in turn:

/* Draw a line from (ax,ay) to (bx,by) */

int draw ( ax, ay, bx, by )
{
    int midx, midy;
    midx = ( ax+bx ) / 2;
    midy = ( ay+by ) / 2;
    if ( midx != ax || midy != ay )
    {
        draw( midx, midy, ax, ay );
        draw( bx, by, midx, midy );
        plot( midx, midy );
    }
}

At 32 bytes of Z80 this is significantly smaller than Bresenham's. However, there are a couple of compromises: it's slower and the lines aren't perfect because the rounding errors accumulate.

; draw lines using recursive divide and conquer
; from de = end1 (d = x-axis, e = y-axis)
; to   hl = end2 (h = x-axis, l = y-axis)

DRAW:
  call PLOT

  push hl

; calculate hl = centre pixel

  ld a,l
  add a,e
  rra
  ld l,a
  ld a,h
  add a,d
  rra
  ld h,a

; if de (end1) = hl (centre) then we're done

  or a
  sbc hl,de
  jr z,EXIT
  add hl,de

  ex de,hl
  call DRAW    ; de = centre, hl = end1
  ex (sp),hl
  ex de,hl
  call DRAW    ; de = end2, hl = centre

  ex de,hl
  pop de
  ret

EXIT:
  pop hl
  ret

; ---------------------------

; plot d = x-axis, e = y-axis

PLOT:
  push hl
  ld a,d
  and 7
  ld b,a
  inc b
  ld a,e
  rra
  scf
  rra
  or a
  rra
  ld l,a
  xor e
  and 248
  xor e
  ld h,a
  ld a,l
  xor d
  and 7
  xor d
  rrca
  rrca
  rrca
  ld l,a
  ld a,1
PLOTBIT:
  rrca
  djnz PLOTBIT
  or (hl)
  ld (hl),a
  pop hl
  ret
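The recursion can be tried out at a high level with a short Python sketch. It stops when the midpoint coincides with the first end point, as the Z80 routine does, but it only records midpoints (the Z80 version also plots `de` on entry). The names `draw` and `points` are made up for the illustration.

```python
def draw(ax, ay, bx, by, plot):
    """Recursively plot midpoints of the line from (ax, ay) to (bx, by)."""
    midx = (ax + bx) // 2
    midy = (ay + by) // 2
    if (midx, midy) != (ax, ay):          # stop once the midpoint reaches end1
        draw(midx, midy, ax, ay, plot)
        draw(bx, by, midx, midy, plot)
        plot(midx, midy)


points = set()
draw(0, 3, 16, 3, lambda x, y: points.add((x, y)))
```

For this horizontal line the sketch visits every pixel strictly between the two end points, all on row 3.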

Alternatively the de(end1) = hl(centre) test can be replaced with a recursion depth count to create an even slower 28 byte routine:

; draw lines using recursive divide and conquer
; from de = end1 (d = x-axis, e = y-axis)
; to   hl = end2 (h = x-axis, l = y-axis)

DRAW:
  ld c,8

DRAW2:
  dec c
  jr z,EXIT

  push de

; calculate de = centre pixel

  ld a,l
  add a,e
  rra
  ld e,a
  ld a,h
  add a,d
  rra
  ld d,a

  call DRAW2   ; de = centre, hl = end1
  ex (sp),hl
  call DRAW2   ; de = centre, hl = end2

  call PLOT
  ex de,hl
  pop hl
EXIT:
  inc c
  ret

Friday, 27 May 2016

Langton's Ant for the ZX Spectrum

Langton's Ant is an automaton which creates a complex pattern by following a couple of simple rules:

If the ant is on an empty pixel, turn 90° right, set the pixel then move forward
If the ant is on a set pixel, turn 90° left, reset the pixel then move forward

The ant's path appears chaotic at first before falling into a repetitive “highway” pattern, moving 2 pixels diagonally every 104 cycles.
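The two rules are simple enough to simulate directly. Here is a small Python sketch on an unbounded grid, unlike the Spectrum screen; `langton` and the heading table are inventions of this example, and the choice of which way is "up" is arbitrary.

```python
def langton(steps):
    """Return the ant's (x, y) position after each step."""
    headings = [(0, -1), (1, 0), (0, 1), (-1, 0)]   # up, right, down, left
    x = y = heading = 0
    set_pixels = set()
    path = []
    for _ in range(steps):
        if (x, y) in set_pixels:
            heading = (heading - 1) % 4    # set pixel: turn left, reset it
            set_pixels.remove((x, y))
        else:
            heading = (heading + 1) % 4    # empty pixel: turn right, set it
            set_pixels.add((x, y))
        dx, dy = headings[heading]
        x, y = x + dx, y + dy              # move forward
        path.append((x, y))
    return path


path = langton(11500)
x0, y0 = path[10999]          # well inside the highway phase
x1, y1 = path[10999 + 104]    # one highway period later
shift = (abs(x1 - x0), abs(y1 - y0))
```

Once the highway has started, `shift` gives the distance covered in one 104-step period.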

Here's the code to display Langton's Ant on the ZX Spectrum in 61 bytes. It runs in just over a second so you might want to add a halt to slow things down:

  org 65472

  ld de,128*256+96 

ANT:
; halt
  ld a,c      ; check direction
  and 3
  rrca
  add a,a
  dec a
  jr nc,XMOVE

  add a,e     ; adjust y position +/-1
  ld e,a
  cp 192
  ret nc  
  xor a

XMOVE:
  add a,d     ; adjust x position +/-1
  ld d,a

; ----------
  and 7       ; calculate screen address
  ld b,a
  inc b
  ld a,e
  rra
  scf
  rra
  or a
  rra
  ld l,a
  xor e
  and 248
  xor e
  ld h,a
  ld a,d
  xor l
  and 7
  xor d
  rrca
  rrca
  rrca
  ld l,a
  ld a,1
PLOTBIT:
  rrca
  djnz PLOTBIT
; ----------

  ld b,a      ; test pixel
  and (hl)

  jr nz,LEFT  ; turn left/right
  inc c
  inc c
LEFT:
  dec c

  ld a,b      ; flip pixel
  xor (hl)
  ld (hl),a
  jr ANT

Saturday, 3 October 2015

The Matrix Digital Rain for the ZX Spectrum

A few days ago I coded The Matrix digital rain effect, a fictional representation of the code for the virtual reality of The Matrix. The technique is simple: fill the screen with random characters and scroll down columns of attributes, occasionally switching between black and green.

Here's the final code - 147 bytes of Z80 using the default Sinclair font:

        org 08000h

; black border / black attributes

        xor a
        out (0FEh),a
        ld hl,05AFFh
attr:   ld (hl),a
        dec hl
        bit 2,h
        jr z,attr

; fill screen with random characters

        ld e,a
fillscr:ld d,040h
fill:   call rndchar
        ld a,d
        cp 058h
        jr nz,fill
        inc e
        jr nz,fillscr

; digital rain loop

frame:  ld b,06h
        halt
column: push bc

; randomize one character

        call random
        and 018h
        jr z,docol
        add a,038h
        ld d,a
        call random
        ld e,a
        call rndchar

; select a random column

docol:  call random
        and 01Fh
        ld l,a
        ld h,058h

; ~1% chance black -> white

        ld a,(hl)
        or a
        ld bc,0247h
        jr z,check

; white -> bright green

white:  cp c
        ld c,044h
        jr z,movecol

; bright green -> green

        cp c
        ld c,04h
        jr z,movecol

; ~6% chance green -> black

        ld bc,0F00h
check:  call random
        cp b
        jr c,movecol
        ld c,(hl)

; move column down

movecol:ld de,020h
        ld b,018h
down:   ld a,(hl)
        ld (hl),c
        ld c,a
        add hl,de
        djnz down
        pop bc
        djnz column

; test for keypress

        ld bc,07FFEh
        in a,(c)
        rrca
        jr c,frame
        ret

; display a random glyph

rndchar:call random
crange: sub 05Fh
        jr nc,crange
        add a,a
        ld l,a
        ld h,0
        add hl,hl
        add hl,hl
        ld bc,(05C36h)
        add hl,bc
        ld b,8
char:   ld a,(hl)
        ld (de),a
        inc d
        inc hl
        djnz char
        ret

; get a byte from the ROM

random: push hl
        ld hl,(seed)
        inc hl
        ld a,h
        and 01Fh
        ld h,a
        ld (seed),hl
        ld a,(hl)
        pop hl
        ret

seed:
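The colour logic is easier to follow written out as a small state machine. The Python sketch below follows the attribute values and thresholds used in the listing (0 black, 04h green, 44h bright green, 47h white; thresholds 2 and 15 out of 256), but the function names are invented here and `rnd` is simply assumed to be a byte from 0 to 255.

```python
BLACK, GREEN, BRIGHT_GREEN, WHITE = 0x00, 0x04, 0x44, 0x47


def next_top_colour(colour, rnd):
    """Attribute to feed into the top of a column, given its current top."""
    if colour == WHITE:
        return BRIGHT_GREEN                      # white -> bright green
    if colour == BRIGHT_GREEN:
        return GREEN                             # bright green -> green
    if colour == BLACK:
        return WHITE if rnd < 2 else BLACK       # rare black -> white
    return BLACK if rnd < 15 else colour         # occasional green -> black


def scroll_down(column, new_top):
    """Shift a column of attributes down one row, inserting new_top."""
    return [new_top] + column[:-1]
```

Each frame the listing applies this to six random columns, so a white head followed by a bright green cell and a green tail appears to fall down the screen.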

Saturday, 15 March 2014

Plotting the Mandelbrot Set on the ZX Spectrum

The Mandelbrot set is a fractal produced by iterating the equation z_{n+1} = z_n² + c in the complex plane and plotting the points c for which the sequence stays bounded rather than tending to infinity. Plotting the set with Sinclair BASIC takes over 24 hours so I was curious how much faster it would be in assembly.

It turns out if we use fast 16-bit fixed-point arithmetic we can plot the Mandelbrot in about 5 minutes. To minimise multiplications each iteration is calculated as:

r_{n+1} = ( r_n + i_n ) × ( r_n - i_n ) + x

i_{n+1} = 2 × i_n × r_n + y

The following test is used to detect points which tend to infinity:

|i_n| + |r_n| ≥ 2 × √ 2.
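Before diving into the fixed-point version, the iteration and escape test can be sketched in floating point. This is an illustration only: `in_set` is a made-up name, Python floats replace the 16-bit fixed-point values, and the default of 15 iterations is taken from the listing.

```python
def in_set(x, y, iterations=15):
    """True if the point x + yi has not escaped after the given iterations."""
    r = i = 0.0
    for _ in range(iterations):
        # (r + i)(r - i) = r*r - i*i, which saves one multiplication
        r, i = (r + i) * (r - i) + x, 2 * i * r + y
        if abs(i) + abs(r) >= 2 * 2 ** 0.5:
            return False          # the point tends to infinity
    return True
```

The escape test is safe because |r| + |i| can never exceed √2 × |z|, so passing the threshold implies |z| ≥ 2.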

  org 60000
  ld de,255*256+191
XLOOP:
  push de
  ld hl,-180   ; x-coordinate
  ld e,d
  call SCALE
  ld (XPOS),bc
  pop de
YLOOP:
  push de
  ld hl,-96    ; y-coordinate
  call SCALE
  ld (YPOS),bc
  ld hl,0
  ld (IMAG),hl
  ld (REAL),hl
  ld b,15      ; iterations
ITER:
  push bc
  ld bc,(IMAG)
  ld hl,(REAL)
  or a
  sbc hl,bc
  ld d,h
  ld e,l
  add hl,bc
  add hl,bc
  call FIXMUL
  ld de,(XPOS)
  add hl,de
  ld de,(REAL)
  ld (REAL),hl
  ld hl,(IMAG)
  call FIXMUL
  rla
  adc hl,hl
  ld de,(YPOS)
  add hl,de
  ld (IMAG),hl
  call ABSVAL
  ex de,hl
  ld hl,(REAL)
  call ABSVAL
  add hl,de
  ld a,h
  cp 46        ; 46 ≅ 2 × √ 2 << 4
  pop bc
  jr nc,ESCAPE
  djnz ITER
  pop de
  call PLOT
  db 254       ; trick to skip next instruction
ESCAPE:
  pop de
  dec e
  jr nz,YLOOP
  dec d
  jr nz,XLOOP
  ret

FIXMUL:        ; hl = hl × de >> 12
  call MULT16BY16
  ld a,b
  ld b,4
FMSHIFT:
  rla
  adc hl,hl
  djnz FMSHIFT 
  ret

SCALE:         ; bc = (hl + e) × zoom
  ld d,0
  add hl,de
  ld de,48     ; zoom

MULT16BY16:    ; hl:bc (signed 32 bit) = hl × de
  xor a
  call ABSVAL
  ex de,hl
  call ABSVAL
  push af
  ld c,h
  ld a,l
  call MULT8BY16
  ld b,a
  ld a,c
  ld c,h
  push bc
  ld c,l
  call MULT8BY16
  pop de
  add hl,de
  adc a,b
  ld b,l
  ld l,h
  ld h,a
  pop af
  rra
  ret nc
  ex de,hl
  xor a
  ld h,a
  ld l,a
  sbc hl,bc
  ld b,h
  ld c,l
  ld h,a
  ld l,a
  sbc hl,de
  ret

MULT8BY16:     ; returns a:hl (24 bit) = a × de
  ld hl,0
  ld b,8
M816LOOP:
  add hl,hl
  rla
  jr nc,M816SKIP
  add hl,de
  adc a,0
M816SKIP:
  djnz M816LOOP
  ret

PLOT:          ; plot d = x-axis, e = y-axis
  ld a,7
  and d
  ld b,a
  inc b
  ld a,e
  rra
  scf
  rra
  or a
  rra
  ld l,a
  xor e
  and 248
  xor e
  ld h,a
  ld a,d
  xor l
  and 7
  xor d
  rrca
  rrca
  rrca
  ld l,a
  ld a,1
PLOTBIT:
  rrca
  djnz PLOTBIT
  or (hl)
  ld (hl),a
  ret

ABSVAL:        ; returns hl = |hl| and increments
  bit 7,h      ; a if the sign bit changed
  ret z
  ld b,h
  ld c,l
  ld hl,0
  or a
  sbc hl,bc
  inc a
  ret

XPOS:dw 0
YPOS:dw 0
REAL:dw 0
IMAG:dw 0

Friday, 3 January 2014

Fast Z80 Bit Reversal

For years I've been using the following simple code to reverse the bits in the A register by rotating the bits left out of one register and right into another:

; reverse bits in A
; 8 bytes / 206 cycles

  ld b,8
  ld l,a
REVLOOP:
  rl l
  rra
  djnz REVLOOP

Recently I wondered if it's possible to save a few cycles. It turns out the bits are at most 3 rotations away from their position in the reverse:

7	6	5	4	3	2	1	0
⇐1	⇐3	3⇒	1⇒	⇐1	⇐3	3⇒	1⇒
0	1	2	3	4	5	6	7

With this in mind I devised a bit-twiddling hack to reverse the bits in about a third of the time using only 6 rotates and a bit of logic to recombine the rotated bits. Here's the code, which no doubt has been done many times before:

; reverse bits in A
; 17 bytes / 66 cycles

  ld l,a    ; a = 76543210
  rlca
  rlca      ; a = 54321076
  xor l
  and 0xAA
  xor l     ; a = 56341270
  ld l,a
  rlca
  rlca
  rlca      ; a = 41270563
  rrc l     ; l = 05634127
  xor l
  and 0x66
  xor l     ; a = 01234567
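The bit positions in the comments can be checked mechanically. The Python sketch below follows the same sequence of rotates and masked merges on an 8-bit value (`rlc8`, `rrc8` and `reverse_bits` are names chosen for the example) and compares the result with a string-based reversal:

```python
def rlc8(v, n=1):
    """Rotate an 8-bit value left by n bits."""
    return ((v << n) | (v >> (8 - n))) & 0xFF


def rrc8(v):
    """Rotate an 8-bit value right by one bit."""
    return ((v >> 1) | (v << 7)) & 0xFF


def reverse_bits(a):
    l = a
    a = rlc8(a, 2)                 # rlca / rlca
    a = ((a ^ l) & 0xAA) ^ l       # take bits 7,5,3,1 from a, the rest from l
    l = a
    a = rlc8(a, 3)                 # rlca / rlca / rlca
    l = rrc8(l)                    # rrc l
    a = ((a ^ l) & 0x66) ^ l       # take bits 6,5,2,1 from a, the rest from l
    return a


# compare against a straightforward string reversal for every byte
wrong = [v for v in range(256) if reverse_bits(v) != int(f"{v:08b}"[::-1], 2)]
```

The `xor` / `and` / `xor` pattern is a masked merge: wherever the mask has a 1 the bit comes from A, elsewhere it comes from L.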

Thursday, 1 August 2013

ZX Spectrum Koch (Lévy C) Curve

A few years ago I submitted a couple of type-in programs (C-Curve and Curtains) to Your Sinclair and they featured in the penultimate issue (August 1993).

Encouraged by a shiny new YS badge I sent off a new batch of programs. Unfortunately it was too late. The September issue would be Your Sinclair's "Big Final Issue".

C-Curve is one of the simplest fractal curves. It starts with a straight line. To find the next iteration, each line is replaced by two lines at 90°:

[Image: C Curve fractal]

Here's a later 69 byte version of the program which plots the fractal in approximately 1.52 seconds! Assemble with Pasmo (pasmo ccurve.asm ccurve.bin), load the binary to address 65467 in your favourite emulator and run using RANDOMIZE USR 65467 :-)

  org 65467
  ld de,49023 ; d = position on x axis
              ; e = position on y axis
  ld bc,3840  ; b = number of iterations
              ; c = initial direction
RECURSE:
  djnz DOWN

  ld a,6      ; check direction
  and c       ; c=0, left
  rrca        ; c=2, up
  rrca        ; c=4, right
  add a,a     ; c=6, down
  dec a
  jr nc,XMOVE

  add a,e     ; adjust y position +/-1

  ld e,a      ; calculate high byte of screen pos
  rrca
  scf
  rra
  rrca
  xor e
  and 88
  xor e
  and 95
  ld h,a
  sub h

XMOVE:
  add a,d     ; adjust x position +/-1

  ld d,a      ; calculate low byte of screen pos
  rlca
  rlca
  rlca
  xor e
  and 199
  xor e
  rlca
  rlca
  ld l,a

  ld a,7      ; calculate bit position of pixel
  and d
  ld b,a
  inc b
  ld a,1
SHIFTBIT:
  rrca
  djnz SHIFTBIT

  xor (hl)    ; plot
  ld (hl),a
  ret

DOWN:
  inc c       ; turn 45° clockwise
  call RECURSE
  inc b
  dec c       ; turn 90° anti-clockwise
  dec c
  call RECURSE
  inc b
  inc c       ; turn 45° clockwise
  ret
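The recursion in DOWN boils down to "turn one way, recurse, turn back twice as far, recurse, straighten up". A minimal Python sketch that lists the heading of each unit step, in 45° units, might look like this; `c_curve_headings` is an invented name and the sign of the turn is an arbitrary choice here.

```python
def c_curve_headings(depth, heading=0):
    """Headings (in 45-degree units) of each unit step of the C-Curve."""
    if depth == 0:
        return [heading]
    # each line is replaced by two lines at 90 degrees to each other
    return (c_curve_headings(depth - 1, heading + 1) +
            c_curve_headings(depth - 1, heading - 1))
```

At even depths every heading is a multiple of 90°, so only the four axis directions are needed when plotting.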

Finally here's a short type-in program to poke the code into a real Spectrum!

Sunday, 8 November 2009

Secret Opcodes of the 8 Bit Processors

Undocumented instructions were common on early processors. A few would crash the computer (HCF - halt and catch fire) while others had strange but occasionally useful behaviour. Any self-respecting programmer would make use of these to squeeze out the last few cycles of performance.

The effect of undocumented opcodes would vary between different versions of some processors, no doubt leading to the classic excuse “it worked on my machine”. Here are a few examples I've found useful.

Secrets of the Z80

Zilog's Z80 was used in a number of popular 8 bit computers including the Sinclair Spectrum, Amstrad CPC, TRS-80 and MSX. There are a number of undocumented opcodes with the CB, DD, ED and FD prefixes.

CB30-CB37 - SLL reg shifts a register left, setting bit 0.
DD - when used as a prefix to instructions which use H or L, either the high or low 8 bits of IX are used.
FD - as DD, but the high or low 8 bits of IY will be used.
ED70 - IN (C) reads from i/o port C, setting the flags and discarding the result.
ED71 - OUT (C),0 outputs a zero to port C.

Secrets of the 8086/8088

Intel's 8088 was used in the original IBM PC and has spawned an entire family of processors.

D6 - SALC sets the AL register to either 00 or FF depending on the carry flag. SALC was finally documented with the introduction of the Pentium Pro 17 years later.
0F - POP CS pops the CS register from the stack. Only works on 8086 processors.
0F05 - LOADALL loads all registers from memory location 0800. Only works on 80286 processors.
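Two of the opcodes above are simple enough to describe as one-line functions. The Python sketch below models only the result and carry of the Z80's SLL and the 8086's SALC as described in the lists; the function names are made up and no other flags are modelled.

```python
def sll(value):
    """Z80 SLL: shift left and set bit 0. Returns (result, carry out)."""
    return ((value << 1) | 1) & 0xFF, value >> 7


def salc(carry):
    """8086 SALC: AL becomes FF if carry is set, otherwise 00."""
    return 0xFF if carry else 0x00
```

For example, `sll(0x80)` gives a result of 01 with the carry set, and feeding that carry to `salc` gives FF.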

Which processors have you programmed and did you find any undocumented opcodes useful?

Sunday, 18 January 2009

Hello World for the RSSB Virtual Computer

The RSSB virtual computer has a single instruction, reverse subtract and skip if borrow. Each instruction has one operand, a pointer into memory. RSSB subtracts the accumulator from the contents of memory and the result is stored in both. If the accumulator was greater than the number in memory, the next instruction is skipped.

Jumps can be implemented by manipulating the instruction pointer at memory location 0, which normally requires 4 instructions. A conditional jump requires 6 instructions. Other special locations are as follows:

1 - accumulator
2 - always contains 0
3 - character input
4 - character output
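To make the behaviour of a single instruction concrete, here is a deliberately simplified Python sketch. It keeps the accumulator outside memory and ignores the special locations entirely, so it is not a full RSSB machine; `rssb` and its return convention are assumptions of this example.

```python
def rssb(mem, acc, operand):
    """Apply one RSSB to mem[operand]. Returns (new accumulator, skip next?)."""
    skip = acc > mem[operand]       # borrow: accumulator exceeds memory
    result = mem[operand] - acc     # reverse subtract
    mem[operand] = result           # the result is stored in memory...
    return result, skip             # ...and in the accumulator


mem = [10, 3]
first = rssb(mem, 3, 0)     # 10 - 3, no borrow
second = rssb(mem, 10, 1)   # 3 - 10, borrow so the next instruction is skipped
```

After these two calls `mem` holds the two results, 7 and -7.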

The four lines of code below demonstrate the shortest jump:

        rssb   acc       ; set acc to 0
        rssb   $+2       ; set acc to loop offset
        rssb   ip        ; subtract acc from ip
        rssb   $-loop    ; the loop offset

The code below implements hello world for the RSSB virtual computer. The sum deserves an explanation. Each character is subtracted from sum until sum passes zero, indicating all characters have been printed. The final value of sum is the offset required by the conditional jump!

loop    rssb   acc       ; acc = character from ptr
ptr     rssb   hello        

        rssb   out       ; display character

        rssb   zero      ; acc = -acc

        rssb   zero      ; always skipped

        rssb   sum       ; subtract acc from sum

        rssb   ip        ; skipped if sum is <0
                         ; otherwise jump to 0

one     rssb   acc       ; subtract 1 from ptr
        rssb   one
        rssb   ptr

        rssb   acc       ; jump to loop
        rssb   loopoff
        rssb   ip
loopoff rssb   $-loop

sum     rssb   -1116

        rssb   33        ; '!'
        rssb   100       ; 'd'
        rssb   108       ; 'l'
        rssb   114       ; 'r'
        rssb   111       ; 'o'
        rssb   87        ; 'W'
        rssb   32        ; ' '
        rssb   44        ; ','
        rssb   111       ; 'o'
        rssb   108       ; 'l'
        rssb   108       ; 'l'
        rssb   101       ; 'e'
hello   rssb   72        ; 'H'

If you can improve the code above, or you've seen any other programs implemented with RSSB, please leave a message below.

Tuesday, 16 December 2008

What can you Write in 128 Bytes?

In contrast to the ever increasing memory available, a number of programmers are choosing to demonstrate their coding prowess by squeezing as much as possible into an incredibly tiny program. It has become a popular pastime to code an impressive graphical display in either 128 or 256 bytes.

Before zooming off to try this for yourself, why not check out some of the competition? Five of the best are listed below, and also my own contribution.

OKO by Ind. The coloured rings sway gently. Good use of colour.

Interference by New Generation Crew. A fast moving interference pattern between two sets of rings. Supplied with source code.

Ctverecky by RRRola. A chaotic, spiralling pattern. Only 93 bytes long. Supplied with source code.

Corkscrewed by lord Kelvin - a smooth, gentle swirl slowly draws you in. Great use of colour. Supplied with source code.

Color Dream by Digimind - a chaotic looking swarm of spheres. Impressive in motion.

Plasma Wave by John Metcalf - my own contribution. A display of flowing plasma. Supplied with source code.

If there's a program you think I should have included, or you want to show off your own demo, please leave a comment below.

Sunday, 21 September 2008

Optimising Assembly Like an 80's Hacker

Forget about fancy algorithms and data structures. If you want respect as an 80's hacker, follow these simple tips.

Never get caught setting a register to zero without using xor:

Z80 Code

ld a,0           ; bad, 2 bytes / 7 cycles

xor a            ; good, 1 byte / 4 cycles

8088 Code

mov ax,0         ; bad, 3 bytes / 4 cycles

xor ax,ax        ; good, 2 bytes / 3 cycles

Never set two 8 bit registers independently. Code readability is not required:

Z80 Code

ld b,10          ; bad, 4 bytes / 14 cycles
ld c,32

ld bc,10*256+32  ; good, 3 bytes / 11 cycles

8088 Code

mov ch,10        ; bad, 4 bytes / 8 cycles
mov cl,32

mov cx,10*256+32 ; good, 3 bytes / 4 cycles

Never compare to zero:

Z80 Code

cp 0             ; bad, 2 bytes / 7 cycles

or a             ; good, 1 byte / 4 cycles

8088 Code

cmp ax,0         ; bad, 3 bytes / 4 cycles

test ax,ax       ; good, 2 bytes / 3 cycles

Remember, you don't need to worry about code alignment, order of instructions or processor penalties. Follow these simple tips and your super-optimised bubble sort will demand the utmost respect!

Sunday, 29 June 2008

Infinite Loop

More often than not, infinite loops are created by programming errors and are quickly dealt with. In sympathy for this highly persecuted group of programs, a safe haven has been created on retro code. The two infinite loops below are highly optimized examples. In fact, the actual loops are only 1 byte long!

Z80 Infinite Loop

  ld hl, HERE
HERE:jp (hl)

8080 Infinite Loop

  lxi h, HERE
HERE:pchl

Corewar Infinite Loop

  jmp #0, <-5

In Corewar, the infinite loop finds its niche destroying small mobile programs called imps!

If you know any infinite loops in need of shelter, please post them in the comments below.