Retro Programming: 16-Bit Xorshift Pseudorandom Numbers in Z80 Assembly

Saturday, 22 July 2017

16-Bit Xorshift Pseudorandom Numbers in Z80 Assembly

Xorshift is a simple, fast pseudorandom number generator developed by George Marsaglia. The generator combines three xorshift operations where a number is exclusive-ored with a shifted copy of itself:

/* 16-bit xorshift PRNG */

unsigned xs = 1;

unsigned xorshift( )
{
    xs ^= xs << 7;
    xs ^= xs >> 9;
    xs ^= xs << 8;
    return xs;
}

There are 60 shift triplets with the maximum period 2¹⁶-1. Four triplets pass a series of lightweight randomness tests including randomly plotting various n × n matrices using the high bits, low bits, reversed bits, etc. These are: 6, 7, 13; 7, 9, 8; 7, 9, 13; 9, 7, 13.

7, 9, 8 is the most efficient when implemented in Z80, generating a number in 86 cycles. For comparison the example in C takes approx ~1200 cycles when compiled with HiSoft C v1.3.

; 16-bit xorshift pseudorandom number generator
; 20 bytes, 86 cycles (excluding ret)

; returns   hl = pseudorandom number
; corrupts   a

xrnd:
  ld hl,1       ; seed must not be 0

  ld a,h
  rra
  ld a,l
  rra
  xor h
  ld h,a
  ld a,l
  rra
  ld a,h
  rra
  xor l
  ld l,a
  xor h
  ld h,a

  ld (xrnd+1),hl

  ret

5 comments:

Zeda7 September 2017 at 06:40
Oh my gosh, my new favorite site! I tried implementing this a while back with no luck. I did however find that combining a simple 16-bit LFSR and a 16-bit LCG works well. It's not as fast (148cc), but it does pass CACert labs' testing. Not sure how to post code boxes, but:

prng16:
;collab with Runer112
;;Output:
;; HL is a pseudo-random int
;; A and BC are also, but much weaker and smaller cycles
;; Preserves DE
;;148cc, super fast
;;26 bytes
;;period length: 4,294,901,760
seed1=$+1
ld hl,9999
ld b,h
ld c,l
add hl,hl
add hl,hl
inc l
add hl,bc
ld (seed1),hl
seed2=$+1
ld hl,987
add hl,hl
sbc a,a
and 101101
xor l
ld l,a
ld (seed2),hl
add hl,bc
ret
ReplyDelete
Replies
veikko13 May 2019 at 06:11
Great stuff! I ported this to C64, 30 cycles without the RTS. I didn't need what is equivalent to the second lda a,l / rra because 6502 EOR does not touch carry:

rng_zp_low = $02
rng_zp_high = $03
; seeding
LDA #1 ; seed, can be anything except 0
STA rng_zp_low
LDA #0
STA rng_zp_high
...
random
LDA rng_zp_high
LSR
LDA rng_zp_low
ROR
EOR rng_zp_high
STA rng_zp_high ; high part of x ^= x << 7 done
ROR ; A has now x >> 9 and high bit comes from low byte
EOR rng_zp_low
STA rng_zp_low ; x ^= x >> 9 and the low part of x ^= x << 7 done
EOR rng_zp_high
STA rng_zp_high ; x ^= x << 8 done
RTS
ReplyDelete
Replies
Anonymous8 August 2019 at 19:40
Thanks SO much for this. This is AWESOME!
ReplyDelete
Replies
Rui Martins1 April 2020 at 08:39
The function is not correct, because the second XOR is not correctly implemented.

Shifting the contents of HL >> 9, can never result in using any of the bits of the 8 bit register L, since they would have been completely shifted out after 8 right shifts.

So the second XOR snippet is incorrect.

Instead of this:

ld a,l
rra
ld a,h
rra
xor l

It should be:

ld a,h
rra
xor l

NOTE: Since carry is already reset, by the previous XOR, no extra instruction required to set Carry bit to zero before the RRA instruction.

And hence the complete (and shorter) function would become:

xrnd:
ld hl,1 ; seed must not be 0

ld a,h
rra
ld a,l
rra
xor h
ld h,a
ld a,h
rra
xor l
ld l,a
xor h
ld h,a

ld (xrnd+1),hl

ret
ReplyDelete
Replies

Add comment

Note: only a member of this blog may post a comment.