Saturday, 22 July 2017

16-Bit Xorshift Pseudorandom Numbers in Z80 Assembly

Xorshift is a simple, fast pseudorandom number generator developed by George Marsaglia. The generator combines three xorshift operations where a number is exclusive-ored with a shifted copy of itself:

/* 16-bit xorshift PRNG */

unsigned xs = 1;

unsigned xorshift( )
{
    xs ^= xs << 7;
    xs ^= xs >> 9;
    xs ^= xs << 8;
    return xs;
}

There are 60 shift triplets with the maximum period 216-1. Four triplets pass a series of lightweight randomness tests including randomly plotting various n × n matrices using the high bits, low bits, reversed bits, etc. These are: 6, 7, 13; 7, 9, 8; 7, 9, 13; 9, 7, 13.

7, 9, 8 is the most efficient when implemented in Z80, generating a number in 86 cycles. For comparison the example in C takes approx ~1200 cycles when compiled with HiSoft C v1.3.

; 16-bit xorshift pseudorandom number generator
; 20 bytes, 86 cycles (excluding ret)

; returns   hl = pseudorandom number
; corrupts   a

xrnd:
  ld hl,1       ; seed must not be 0

  ld a,h
  rra
  ld a,l
  rra
  xor h
  ld h,a
  ld a,l
  rra
  ld a,h
  rra
  xor l
  ld l,a
  xor h
  ld h,a

  ld (xrnd+1),hl

  ret
z80 xorshift

8 comments:

  1. Oh my gosh, my new favorite site! I tried implementing this a while back with no luck. I did however find that combining a simple 16-bit LFSR and a 16-bit LCG works well. It's not as fast (148cc), but it does pass CACert labs' testing. Not sure how to post code boxes, but:

    prng16:
    ;collab with Runer112
    ;;Output:
    ;; HL is a pseudo-random int
    ;; A and BC are also, but much weaker and smaller cycles
    ;; Preserves DE
    ;;148cc, super fast
    ;;26 bytes
    ;;period length: 4,294,901,760
    seed1=$+1
    ld hl,9999
    ld b,h
    ld c,l
    add hl,hl
    add hl,hl
    inc l
    add hl,bc
    ld (seed1),hl
    seed2=$+1
    ld hl,987
    add hl,hl
    sbc a,a
    and 101101
    xor l
    ld l,a
    ld (seed2),hl
    add hl,bc
    ret

    ReplyDelete
  2. Great stuff! I ported this to C64, 30 cycles without the RTS. I didn't need what is equivalent to the second lda a,l / rra because 6502 EOR does not touch carry:

    rng_zp_low = $02
    rng_zp_high = $03
    ; seeding
    LDA #1 ; seed, can be anything except 0
    STA rng_zp_low
    LDA #0
    STA rng_zp_high
    ...
    random
    LDA rng_zp_high
    LSR
    LDA rng_zp_low
    ROR
    EOR rng_zp_high
    STA rng_zp_high ; high part of x ^= x << 7 done
    ROR ; A has now x >> 9 and high bit comes from low byte
    EOR rng_zp_low
    STA rng_zp_low ; x ^= x >> 9 and the low part of x ^= x << 7 done
    EOR rng_zp_high
    STA rng_zp_high ; x ^= x << 8 done
    RTS

    ReplyDelete
  3. Thanks SO much for this. This is AWESOME!

    ReplyDelete
  4. The function is not correct, because the second XOR is not correctly implemented.

    Shifting the contents of HL >> 9, can never result in using any of the bits of the 8 bit register L, since they would have been completely shifted out after 8 right shifts.

    So the second XOR snippet is incorrect.

    Instead of this:

    ld a,l
    rra
    ld a,h
    rra
    xor l

    It should be:

    ld a,h
    rra
    xor l

    NOTE: Since carry is already reset, by the previous XOR, no extra instruction required to set Carry bit to zero before the RRA instruction.

    And hence the complete (and shorter) function would become:

    xrnd:
    ld hl,1 ; seed must not be 0

    ld a,h
    rra
    ld a,l
    rra
    xor h
    ld h,a
    ld a,h
    rra
    xor l
    ld l,a
    xor h
    ld h,a

    ld (xrnd+1),hl

    ret

    ReplyDelete
    Replies
    1. Probably just a typo but there's redundant sequence of LD H,A + LD A,H :) They are already equal.

      Delete
    2. I put this code in some programs I made to test RNGs on my ti-84+ calculator (Z80). This versions fails pretty hard. John's version works very well (it even competes with a combined LCG+LFSR that has a much longer cycle length even though xorshift is in the same class as LFSRs). John's code is extraordinarily optimized which makes it a bit deceptive on how each instruction is contributing and the purpose of each bit.

      Delete
  5. Hi, I'm a bit late to the party but I don't think the code does what it's supposed to do. I think the following sequence is meant to represent HL = HL ^ (HL << 7):

    ld a,h
    rra
    ld a,l
    rra ; a = HLLLLLLL (but what about L0000000?)
    xor h
    ld h,a

    ...but what about the lower byte which can still contain the lowest bit of L?

    ReplyDelete
    Replies
    1. I see now after I've read this: https://codebase64.org/doku.php?id=base:16bit_xorshift_random_generator - the code does x ^= x >> 9 and the low part of x ^= x << 7 in one step. False alarm :)

      Delete

Note: only a member of this blog may post a comment.