## Saturday, 22 July 2017

### 16-Bit Xorshift Pseudorandom Numbers in Z80 Assembly

Xorshift is a simple, fast pseudorandom number generator developed by George Marsaglia. The generator combines three xorshift operations where a number is exclusive-ored with a shifted copy of itself:

``````/* 16-bit xorshift PRNG */

unsigned xs = 1;

unsigned xorshift( )
{
xs ^= xs << 7;
xs ^= xs >> 9;
xs ^= xs << 8;
return xs;
}
``````

There are 60 shift triplets with the maximum period 216-1. Four triplets pass a series of lightweight randomness tests including randomly plotting various n × n matrices using the high bits, low bits, reversed bits, etc. These are: 6, 7, 13; 7, 9, 8; 7, 9, 13; 9, 7, 13.

7, 9, 8 is the most efficient when implemented in Z80, generating a number in 86 cycles. For comparison the example in C takes approx ~1200 cycles when compiled with HiSoft C v1.3.

``````; 16-bit xorshift pseudorandom number generator
; 20 bytes, 86 cycles (excluding ret)

; returns   hl = pseudorandom number
; corrupts   a

xrnd:
ld hl,1       ; seed must not be 0

ld a,h
rra
ld a,l
rra
xor h
ld h,a
ld a,l
rra
ld a,h
rra
xor l
ld l,a
xor h
ld h,a

ld (xrnd+1),hl

ret
``````

## Wednesday, 19 July 2017

### A Fast Z80 Integer Square Root

A project I'm working on needs a fast square root but I couldn't find anything suitable online. After implementing several versions of the bit-by-bit algorithm I discovered the following code is particularly efficient when unrolled:

``````/* Return the square root of numb */

int isqrt( numb )
{
int root = 0, bit = 04000h;
while ( bit != 0 )
{
if ( numb >= root + bit )
{
numb = numb - root - bit;
root = root / 2 + bit;
}
else
root = root / 2;
bit = bit / 4;
}
return root;
}
``````

### First Make It Work

The looping version is small but clunky. It spends almost 400 t-states shifting bits. We'll be able to eliminate most of the shifting by hard-coding the bit positions when the loop is unrolled:

``````; 16-bit integer square root
; 34 bytes, 1005-1101 cycles (average 1053)

; call with de = number to square root
; returns   hl = square root
; corrupts  bc, de

ld bc,08000h
ld h,c
ld l,c
sqrloop:
srl b
rr c
ex de,hl
sbc hl,de
jr c,sqrbit
ex de,hl
jr sqrfi
sqrbit:
ex de,hl
or a
sbc hl,bc
sqrfi:
srl h
rr l
srl b
rr c
jr nc,sqrloop
``````

### Then Make It Work Faster

First the loop is unrolled. The first 4 iterations are optimized to work on 8-bit values and bit positions are hard-coded. The first and last iteration are optimized as a special case, we work with the bitwise complement of the root instead of the root and small jumps are replace with overlapping code. The final code finds the root in an average of 362 t-states:

``````; fast 16-bit integer square root
; 92 bytes, 344-379 cycles (average 362)
; v2 - 3 t-state optimization spotted by Russ McNulty

; call with hl = number to square root
; returns    a = square root
; corrupts  hl, de

ld a,h
ld de,0B0C0h
jr c,sq7
ld a,h
ld d,0F0h
sq7:

; ----------

jr nc,sq6
res 5,d
db 254
sq6:
sub d
sra d

; ----------

set 2,d
jr nc,sq5
res 3,d
db 254
sq5:
sub d
sra d

; ----------

inc d
jr nc,sq4
res 1,d
db 254
sq4:
sub d
sra d
ld h,a

; ----------

jr nc,sq3
ld e,040h
db 210
sq3:
sbc hl,de
sra d
ld a,e
rra

; ----------

or 010h
ld e,a
jr nc,sq2
and 0DFh
db 218
sq2:
sbc hl,de
sra d
rra

; ----------

or 04h
ld e,a
jr nc,sq1
and 0F7h
db 218
sq1:
sbc hl,de
sra d
rra

; ----------

inc a
ld e,a