**The Mandelbrot set** is a fractal which iterates the equation *z _{n+1} = z_{n}² + c* in the complex plane and plots which points tend to infinity. Plotting the set with Sinclair BASIC takes over 24 hours so I was curious how much faster it would be in assembly.

It turns out if we use fast 16-bit fixed-point arithmetic we can plot the Mandelbrot in about 5 minutes. To minimise multiplications each iteration is calculated as:

r_{n+1}= ( r_{n}+ i_{n}) × ( r_{n}- i_{n}) + x

i_{n+1}= 2 × i_{n}× r_{n}+ y

The following test is used to detect points which tend to infinity:

|i| + |_{n}r| ≥ 2 × √ 2._{n}

org 60000 ld de,255*256+191 XLOOP: push de ld hl,-180 ; x-coordinate ld e,d call SCALE ld (XPOS),bc pop de YLOOP: push de ld hl,-96 ; y-coordinate call SCALE ld (YPOS),bc ld hl,0 ld (IMAG),hl ld (REAL),hl ld b,15 ; iterations ITER: push bc ld bc,(IMAG) ld hl,(REAL) or a sbc hl,bc ld d,h ld e,l add hl,bc add hl,bc call FIXMUL ld de,(XPOS) add hl,de ld de,(REAL) ld (REAL),hl ld hl,(IMAG) call FIXMUL rla adc hl,hl ld de,(YPOS) add hl,de ld (IMAG),hl call ABSVAL ex de,hl ld hl,(REAL) call ABSVAL add hl,de ld a,h cp 46 ; 46 ≅ 2 × √ 2 << 4 pop bc jr nc,ESCAPE djnz ITER pop de call PLOT db 254 ; trick to skip next instruction ESCAPE: pop de dec e jr nz,YLOOP dec d jr nz,XLOOP ret FIXMUL: ; hl = hl × de >> 24 call MULT16BY16 ld a,b ld b,4 FMSHIFT: rla adc hl,hl djnz FMSHIFT ret SCALE: ; bc = (hl + e) × zoom ld d,0 add hl,de ld de,48 ; zoom MULT16BY16: ; hl:bc (signed 32 bit) = hl × de xor a call ABSVAL ex de,hl call ABSVAL push af ld c,h ld a,l call MULT8BY16 ld b,a ld a,c ld c,h push bc ld c,l call MULT8BY16 pop de add hl,de adc a,b ld b,l ld l,h ld h,a pop af rra ret nc ex de,hl xor a ld h,a ld l,a sbc hl,bc ld b,h ld c,l ld h,a ld l,a sbc hl,de ret MULT8BY16: ; returns a:hl (24 bit) = a × de ld hl,0 ld b,8 M816LOOP: add hl,hl rla jr nc,M816SKIP add hl,de adc a,0 M816SKIP: djnz M816LOOP ret PLOT: ; plot d = x-axis, e = y-axis ld a,7 and d ld b,a inc b ld a,e rra scf rra or a rra ld l,a xor e and 248 xor e ld h,a ld a,d xor l and 7 xor d rrca rrca rrca ld l,a ld a,1 PLOTBIT: rrca djnz PLOTBIT or (hl) ld (hl),a ret ABSVAL: ; returns hl = |hl| and increments bit 7,h ; a if the sign bit changed ret z ld b,h ld c,l ld hl,0 or a sbc hl,bc inc a ret XPOS:dw 0 YPOS:dw 0 REAL:dw 0 IMAG:dw 0

It looks as though you've adapted Gauss's complex multiplication algorithm (see https://en.wikipedia.org/wiki/Multiplication_algorithm#Gauss.27s_complex_multiplication_algorithm) to your need for squaring followed by adding, then streamlined it. But that wikipedia article states "This algorithm uses only three multiplications, rather than four, and five additions or subtractions rather than two. If a multiply is more expensive than three adds or subtracts, as when calculating by hand, then there is a gain in speed. On modern computers a multiply and an add can take about the same time so there may be no speed gain." If there is a gain, you could gain even more by replacing that multiplication by 2 with a doubling by addition (I haven't looked at your code, maybe you already did).

ReplyDeleteIt also looks as though your test for unbounded divergence is based on the result that "... if the absolute value of P_c^n(0) ever becomes larger than 2, the sequence will escape to infinity" (see https://en.wikipedia.org/wiki/Mandelbrot_set#Basic_properties), with you simply testing for your simpler norm at a value that always lies outside the cartesian norm of 2 (since anything passing your test has met the requirements of the result, and anything that diverges unboundedly must necessarily get past your norm's boundary at some point). You could possibly simplify your test further by substituting a slightly larger but rational (better still, dyadic rational, i.e. with finitely many non-zero binary places) test criterion not involving the square root of 2 (again, maybe you already did).

You can speed it up considerably by not using multiplications at all, since you're just suqaring anyway. Squares can be determined from a lookup table, which, in case of the kind of 16-bit arithmetics that you use takes up 32 kilobytes and can be generated in a blink of an eye.

ReplyDeleteThus:

rₙ₊₁ = x + (rₙ)² - (iₙ)²

iₙ₊₁ = y + (rₙ + iₙ)² - [(rₙ)² + (iₙ)²]

That makes a total of 3 lookups. Moreover, if you calculate the Euclidean norm [(rₙ)² + (iₙ)²] first, it can be used to abort the iteration if it grows beyond 4.