Saturday, 23 June 2012

Itsy Forth: The Compiler

Itsy Forth is a 1kB subset of the Forth programming language. Itsy was developed top-down, implementing only the functions required to get the compiler up and running. So far we've looked at the following:

Next we'll define the words to complete the compiler.

Colon Definitions

  • : - ( -- ) define a new Forth word, taking the name from the input buffer
  • ; - ( -- ) complete the Forth word being compiled

: sets state to true to enter compile mode then creates a header for the new word. ; adds exit to the end of the word then sets state to false to end compile mode.

For example, : here dp @ ; creates a new Forth word which returns the contents of the variable dp.

: :
-1 state !
create
(;code)
docolon dec bp
        dec bp
        mov word[bp],si
        lea si,[di+2]
        jmp next

: ;
['] exit ,
0 state !
; immediate

Creating Headers

  • create - ( -- ) build a header for a new word in the dictionary, taking the name from the input buffer
  • (;code) - ( -- ) replace the xt of the word being defined with a pointer to the code immediately following (;code)

create adds a new header to the dictionary which includes a link to the previous entry, a name and execution token (xt). The xt initially points to dovar but can be modified using (;code).

For example, : variable create 0 , ; creates a new Forth word to define variables (dovar is the default xt for created words).

: create
dp @ last @ , last !
32 word count
+ dp ! 0 ,
(;code)
dovar   push bx
        lea bx,[di+2]
        jmp next

        primitive '(;code)',do_semi_code
        mov di,word[val_last]
        mov al,byte[di+2]
        and ax,31
        add di,ax
        mov word[di+3],si
        mov si,word[bp]
        inc bp
        inc bp
        jmp next

Constants

  • constant - ( x -- ) create a new constant with the value x, taking the name from the input buffer

For example, 0 constant false adds a new constant to the dictionary. When executed, false will push 0 on the stack.

: constant
create ,
(;code)
doconst push bx
        mov bx,word[di+2]
        jmp next

Testing the Compiler


It's time to give Itsy a quick test run. First we implement a few standard words: hex to switch to base 16, cr to move the cursor to the next line and variable to define new variables.

Next a simple test. We add a variable itest initialised to 041h (ASCII 'A') and a procedure to display and increment itest. Then the moment of truth... A B C it works!

Itsy Forth: The Next Step

What's next for Itsy Forth? First I'd like to implement the ANS core wordset from the Itsy prompt, then perhaps experiment with compiling to native code. In the meantime, here's the code for the current version of Itsy:

macros.asm

        %define link 0
        %define immediate 080h

        %macro head 4
        %%link dw link
        %define link %%link
        %strlen %%count %1
        db %3 + %%count,%1
        xt_ %+ %2 dw %4
        %endmacro

        %macro primitive 2-3 0
        head %1,%2,%3,$+2
        %endmacro

        %macro colon 2-3 0
        head %1,%2,%3,docolon
        %endmacro

        %macro constant 3
        head %1,%2,0,doconst
        val_ %+ %2 dw %3
        %endmacro

        %macro variable 3
        head %1,%2,0,dovar
        val_ %+ %2 dw %3
        %endmacro

itsy.asm

%include "macros.asm"

        org 0100h
        jmp xt_abort+2

; -------------------
; Variables
; -------------------

        variable 'state',state,0

        variable '>in',to_in,0

        variable '#tib',number_t_i_b,0

        variable 'dp',dp,freemem

        variable 'base',base,10

        variable 'last',last,final

        constant 'tib',t_i_b,32768

; -------------------
; Initialisation
; -------------------

        primitive 'abort',abort
        mov ax,word[val_number_t_i_b]
        mov word[val_to_in],ax
        xor bp,bp
        mov word[val_state],bp
        mov sp,-256
        mov si,xt_interpret+2
        jmp next

; -------------------
; Compilation
; -------------------

        primitive ',',comma
        mov di,word[val_dp]
        xchg ax,bx
        stosw
        mov word[val_dp],di
        pop bx
        jmp next

        primitive 'lit',lit
        push bx
        lodsw
        xchg ax,bx
        jmp next

; -------------------
; Stack
; -------------------

        primitive 'rot',rote
        pop dx
        pop ax
        push dx
        push bx
        xchg ax,bx
        jmp next

        primitive 'drop',drop
        pop bx
        jmp next

        primitive 'dup',dupe
        push bx
        jmp next

        primitive 'swap',swap
        pop ax
        push bx
        xchg ax,bx
        jmp next

; -------------------
; Maths / Logic
; -------------------

        primitive '+',plus
        pop ax
        add bx,ax
        jmp next

        primitive '=',equals
        pop ax
        sub bx,ax
        sub bx,1
        sbb bx,bx
        jmp next

; -------------------
; Peek and Poke
; -------------------

        primitive '@',fetch
        mov bx,word[bx]
        jmp next

        primitive '!',store 
        pop word[bx]
        pop bx
        jmp next

; -------------------
; Inner Interpreter
; -------------------

next    lodsw
        xchg di,ax
        jmp word[di]

; -------------------
; Flow Control
; -------------------

        primitive '0branch',zero_branch
        lodsw
        test bx,bx
        jne zerob_z
        xchg ax,si
zerob_z pop bx
        jmp next

        primitive 'branch',branch
        mov si,word[si]
        jmp next

        primitive 'execute',execute
        mov di,bx
        pop bx
        jmp word[di]

        primitive 'exit',exit
        mov si,word[bp]
        inc bp
        inc bp
        jmp next

; -------------------
; String
; -------------------

        primitive 'count',count
        inc bx
        push bx
        mov bl,byte[bx-1]
        mov bh,0
        jmp next

        primitive '>number',to_number
        pop di
        pop cx
        pop ax
to_numl test bx,bx
        je to_numz
        push ax
        mov al,byte[di]
        cmp al,'a'
        jc to_nums
        sub al,32
to_nums cmp al,'9'+1
        jc to_numg
        cmp al,'A'
        jc to_numh
        sub al,7
to_numg sub al,48
        mov ah,0
        cmp al,byte[val_base]
        jnc to_numh
        xchg ax,dx
        pop ax
        push dx
        xchg ax,cx
        mul word[val_base]
        xchg ax,cx
        mul word[val_base]
        add cx,dx
        pop dx
        add ax,dx
        dec bx
        inc di
        jmp to_numl
to_numz push ax
to_numh push cx
        push di
        jmp next

; -----------------------
; Terminal Input / Output
; -----------------------

        primitive 'accept',accept
        pop di
        xor cx,cx
acceptl call getchar
        cmp al,8
        jne acceptn
        jcxz acceptb
        call outchar
        mov al,' '
        call outchar
        mov al,8
        call outchar
        dec cx
        dec di
        jmp acceptl
acceptn cmp al,13
        je acceptz
        cmp cx,bx
        jne accepts
acceptb mov al,7
        call outchar
        jmp acceptl
accepts stosb
        inc cx
        call outchar
        jmp acceptl
acceptz jcxz acceptb
        mov al,13
        call outchar
        mov al,10
        call outchar
        mov bx,cx
        jmp next

        primitive 'word',word
        mov di,word[val_dp]
        push di
        mov dx,bx
        mov bx,word[val_t_i_b]
        mov cx,bx
        add bx,word[val_to_in]
        add cx,word[val_number_t_i_b]
wordf   cmp cx,bx
        je wordz
        mov al,byte[bx]
        inc bx
        cmp al,dl
        je wordf
wordc   inc di
        mov byte[di],al
        cmp cx,bx
        je wordz
        mov al,byte[bx]
        inc bx
        cmp al,dl
        jne wordc
wordz   mov byte[di+1],32
        mov ax,word[val_dp]
        xchg ax,di
        sub ax,di
        mov byte[di],al
        sub bx,word[val_t_i_b]
        mov word[val_to_in],bx
        pop bx
        jmp next

        primitive 'emit',emit
        xchg ax,bx
        call outchar
        pop bx
        jmp next

getchar mov ah,7
        int 021h
        mov ah,0
        ret

outchar xchg ax,dx
        mov ah,2
        int 021h
        ret

; -----------------------
; Dictionary Search
; -----------------------

        primitive 'find',find
        mov di,val_last
findl   push di
        push bx
        mov cl,byte[bx]
        mov ch,0
        inc cx
findc   mov al,byte[di+2]
        and al,07Fh
        cmp al,byte[bx]
        je findm
        pop bx
        pop di
        mov di,word[di]
        test di,di
        jne findl
findnf  push bx
        xor bx,bx
        jmp next
findm   inc di
        inc bx
        loop findc
        pop bx
        pop di
        mov bx,1
        inc di
        inc di
        mov al,byte[di]
        test al,080h
        jne findi
        neg bx
findi   and ax,31
        add di,ax
        inc di
        push di
        jmp next

; -----------------------
; Colon Definition
; -----------------------

        colon ':',colon
        dw xt_lit,-1,xt_state,xt_store,xt_create
        dw xt_do_semi_code
docolon dec bp
        dec bp
        mov word[bp],si
        lea si,[di+2]
        jmp next

        colon ';',semicolon,immediate
        dw xt_lit,xt_exit,xt_comma,xt_lit,0,xt_state
        dw xt_store,xt_exit

; -----------------------
; Headers
; -----------------------

        colon 'create',create
        dw xt_dp,xt_fetch,xt_last,xt_fetch,xt_comma
        dw xt_last,xt_store,xt_lit,32,xt_word,xt_count
        dw xt_plus,xt_dp,xt_store,xt_lit,0,xt_comma
        dw xt_do_semi_code
dovar   push bx
        lea bx,[di+2]
        jmp next

        primitive '(;code)',do_semi_code
        mov di,word[val_last]
        mov al,byte[di+2]
        and ax,31
        add di,ax
        mov word[di+3],si
        mov si,word[bp]
        inc bp
        inc bp
        jmp next

; -----------------------
; Constants
; -----------------------

        colon 'constant',constant
        dw xt_create,xt_comma,xt_do_semi_code
doconst push bx
        mov bx,word[di+2]
        jmp next

; -----------------------
; Outer Interpreter
; -----------------------

final:
        colon 'interpret',interpret
interpt dw xt_number_t_i_b,xt_fetch,xt_to_in,xt_fetch
        dw xt_equals,xt_zero_branch,intpar,xt_t_i_b
        dw xt_lit,50,xt_accept,xt_number_t_i_b,xt_store
        dw xt_lit,0,xt_to_in,xt_store
intpar  dw xt_lit,32,xt_word,xt_find,xt_dupe
        dw xt_zero_branch,intnf,xt_state,xt_fetch
        dw xt_equals,xt_zero_branch,intexc,xt_comma
        dw xt_branch,intdone
intexc  dw xt_execute,xt_branch,intdone
intnf   dw xt_dupe,xt_rote,xt_count,xt_to_number
        dw xt_zero_branch,intskip,xt_state,xt_fetch
        dw xt_zero_branch,intnc,xt_last,xt_fetch,xt_dupe
        dw xt_fetch,xt_last,xt_store,xt_dp,xt_store
intnc   dw xt_abort
intskip dw xt_drop, xt_drop, xt_state, xt_fetch
        dw xt_zero_branch,intdone,xt_lit,xt_lit,xt_comma
        dw xt_comma
intdone dw xt_branch,interpt

freemem:

4 comments:

  1. Thanks - that's really cool!

    For anyone who (like me) has never used nasm before, the command you need is (I think):

    nasm itsy.asm -fbin -o itsy.com

    The resultant 16-bit COM file won't run under 64-bit Windows, but seems to run fine under DOSBox.

    Incidentally, your blog inspired me to order a secondhand copy of 'Threaded Interpretive Languages' by Loeliger, which finally arrived (from the US) at the weekend. Because of the title I'd assumed it would be quite abstract and academic, but it's actually a really hands-on account of hacking about with a handmade version of Forth on an early micro computer. I'm really looking forward to reading it.

    ReplyDelete
    Replies
    1. Hi John, how are you getting on with TIL? Are you planning to implement your own Forth? I tried a couple of times and found it easier to implement the outer interpreter first and work down from there.

      I recommend taking a look at the Moving Forth series by Brad Rodriguez. It's available free online and covers a fair bit not mentioned by TIL.

      Delete
  2. 26/08/12 - fixed a bug in >NUMBER reported by Mike Adams.

    If >NUMBER could only convert part of the string, it returned the incorrect length for the remainder.

    ReplyDelete
    Replies
    1. That's important, because the count from >NUMBER is part of using it in a more full fledged number converter ... leading - or + sign as well as filling in the location of the double number separated when encountering a double number like 0.007 or 89:64.

      Delete