Saturday, 22 July 2017

16-Bit Xorshift Pseudorandom Numbers in Z80 Assembly

Xorshift is a simple, fast pseudorandom number generator developed by George Marsaglia. The generator combines three xorshift operations where a number is exclusive-ored with a shifted copy of itself:

/* 16-bit xorshift PRNG */

unsigned xs = 1;

unsigned xorshift( )
{
    xs ^= xs << 7;
    xs ^= xs >> 9;
    xs ^= xs << 8;
    return xs;
}

There are 60 shift triplets with the maximum period 216-1. Four triplets pass a series of lightweight randomness tests including randomly plotting various n × n matrices using the high bits, low bits, reversed bits, etc. These are: 6, 7, 13; 7, 9, 8; 7, 9, 13; 9, 7, 13.

7, 9, 8 is the most efficient when implemented in Z80, generating a number in 86 cycles. For comparison the example in C takes approx ~1200 cycles when compiled with HiSoft C v1.3.

; 16-bit xorshift pseudorandom number generator
; 20 bytes, 86 cycles (excluding ret)

; returns   hl = pseudorandom number
; corrupts   a

xrnd:
  ld hl,1       ; seed must not be 0

  ld a,h
  rra
  ld a,l
  rra
  xor h
  ld h,a
  ld a,l
  rra
  ld a,h
  rra
  xor l
  ld l,a
  xor h
  ld h,a

  ld (xrnd+1),hl

  ret
z80 xorshift

3 comments:

  1. Oh my gosh, my new favorite site! I tried implementing this a while back with no luck. I did however find that combining a simple 16-bit LFSR and a 16-bit LCG works well. It's not as fast (148cc), but it does pass CACert labs' testing. Not sure how to post code boxes, but:

    prng16:
    ;collab with Runer112
    ;;Output:
    ;; HL is a pseudo-random int
    ;; A and BC are also, but much weaker and smaller cycles
    ;; Preserves DE
    ;;148cc, super fast
    ;;26 bytes
    ;;period length: 4,294,901,760
    seed1=$+1
    ld hl,9999
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    inc l
    add hl,bc
    ld (seed1),hl
    seed2=$+1
    ld hl,987
    add hl,hl
    sbc a,a
    and 101101
    xor l
    ld l,a
    ld (seed2),hl
    add hl,bc
    ret

    ReplyDelete
  2. Great stuff! I ported this to C64, 30 cycles without the RTS. I didn't need what is equivalent to the second lda a,l / rra because 6502 EOR does not touch carry:

    rng_zp_low = $02
    rng_zp_high = $03
    ; seeding
    LDA #1 ; seed, can be anything except 0
    STA rng_zp_low
    LDA #0
    STA rng_zp_high
    ...
    random
    LDA rng_zp_high
    LSR
    LDA rng_zp_low
    ROR
    EOR rng_zp_high
    STA rng_zp_high ; high part of x ^= x << 7 done
    ROR ; A has now x >> 9 and high bit comes from low byte
    EOR rng_zp_low
    STA rng_zp_low ; x ^= x >> 9 and the low part of x ^= x << 7 done
    EOR rng_zp_high
    STA rng_zp_high ; x ^= x << 8 done
    RTS

    ReplyDelete
  3. Thanks SO much for this. This is AWESOME!

    ReplyDelete