Sunday 15 April 2012

Itsy Forth: Implementing the Primitives

Itsy Forth is a tiny subset of the Forth programming language. So far we've looked at the Forth outer interpreter, inner interpreter and dictionary. This time we'll define the words required to complete the interpreter.

Peek and Poke

The Forth words to read and write memory are @ and !:
  • @ - ( addr -- x ) read x from addr
  • ! - ( x addr -- ) store x at addr
  • c@ - ( addr -- char ) read char from addr
( before -- after ) shows the contents of the stack before and after the word executes. Here's how @, c@ and ! are implemented. Remember we're keeping the top element of the data stack in the bx register.

        primitive '@',fetch
        mov bx,word[bx]
        jmp next

        primitive '!',store 
        pop word[bx]
        pop bx
        jmp next

        primitive 'c@',c_fetch
        mov bl,byte[bx]
        mov bh,0
        jmp next

Manipulating the Stack

  • drop - ( x -- ) remove x from the stack
  • dup - ( x -- x x ) add a copy of x to the stack
  • swap - ( x y -- y x ) exchange x and y
  • rot - ( x y z -- y z x ) rotate x, y and z

        primitive 'drop',drop
        pop bx
        jmp next

        primitive 'dup',dupe
        push bx
        jmp next

        primitive 'swap',swap
        pop ax
        push bx
        xchg ax,bx
        jmp next

        primitive 'rot',rote
        pop dx
        pop ax
        push dx
        push bx
        xchg ax,bx
        jmp next

Flow Control

if, else, then, begin and again all compile to branch or 0branch.
  • 0branch - ( x -- ) jump if x is zero
  • branch - ( -- ) unconditional jump
  • execute - ( xt -- ) call the word at xt
  • exit - ( -- ) return from the current word
The destination address for the jump is compiled in the cell straight after the branch or 0branch instruction. execute stores the return address on the return stack and exit removes it.

        primitive '0branch',zero_branch
        lodsw
        test bx,bx
        jne zerob_z
        xchg ax,si
zerob_z pop bx
        jmp next

        primitive 'branch',branch
        mov si,word[si]
        jmp next

        primitive 'execute',execute
        mov di,bx
        pop bx
        jmp word[di]

        primitive 'exit',exit
        mov si,word[bp]
        inc bp
        inc bp
        jmp next

Variables and Constants

  • tib - ( -- addr ) address of the input buffer
  • #tib - ( -- addr ) number of characters in the input buffer
  • >in - ( -- addr ) next character in input buffer
  • state - ( -- addr ) true = compiling, false = interpreting
  • dp - ( -- addr ) first free cell in the dictionary
  • base - ( -- addr ) number base
  • last - ( -- addr ) the last word to be defined

        constant 'tib',t_i_b,32768

        variable '#tib',number_t_i_b,0

        variable '>in',to_in,0

        variable 'state',state,0

        variable 'dp',dp,freemem

        variable 'base',base,10

        variable 'last',last,final

        ; execution token for constants
doconst push bx
        mov bx,word[di+2]
        jmp next

        ; execution token for variables
dovar   push bx
        lea bx,[di+2]
        jmp next

Compilation

  • , - ( x -- ) compile x to the current definition
  • c, - ( char -- ) compile char to the current definition
  • lit - ( -- ) push the value in the cell straight after lit

        primitive ',',comma
        mov ax,word[val_dp]
        xchg ax,bx
        add word[val_dp],2
        mov word[bx],ax
        pop bx
        jmp next

        primitive 'c,',c_comma
        mov ax,word[val_dp]
        xchg ax,bx
        inc word[val_dp]
        mov byte[bx],al
        pop bx
        jmp next

        primitive 'lit',lit
        push bx
        lodsw
        xchg ax,bx
        jmp next

Maths / Logic

  • + - ( x y -- z) calculate z=x+y then return z
  • = - ( x y -- flag ) return true if x=y

        primitive '+',plus
        pop ax
        add bx,ax
        jmp next

        primitive '=',equals
        pop ax
        sub bx,ax
        sub bx,1
        sbb bx,bx
        jmp next

Handling Strings

  • count - ( addr -- addr2 len ) addr contains a counted string. Return the address of the first character and the string's length
  • >number - ( double addr len -- double2 addr2 len2 ) convert string to number
addr contains a string of len characters which >number attempts to convert to a number using the current number base. >number returns the portion of the string which can't be converted, if any. If you're a Forth purist this is where Itsy starts to get ugly :-)

        primitive 'count',count
        inc bx
        push bx
        mov bl,byte[bx-1]
        mov bh,0
        jmp next

        primitive '>number',to_number
        pop di
        pop cx
        pop ax
to_numl test bx,bx
        je to_numz
        push ax
        mov al,byte[di]
        cmp al,'a'
        jc to_nums
        sub al,32
to_nums cmp al,'9'+1
        jc to_numg
        cmp al,'A'
        jc to_numh
        sub al,7
to_numg sub al,48
        mov ah,0
        cmp al,byte[val_base]
        jnc to_numh
        xchg ax,dx
        pop ax
        push dx
        xchg ax,cx
        mul word[val_base]
        xchg ax,cx
        mul word[val_base]
        add cx,dx
        pop dx
        add ax,dx
        dec bx
        inc di
        jmp to_numl
to_numz push ax
to_numh push cx
        push di
        jmp next

Terminal Input / Output

  • accept - ( addr len -- len2 ) read a string from the terminal
  • emit - ( char -- ) display char on the terminal
  • word - ( char -- addr ) parse the next word in the input buffer
accept reads a string of characters from the terminal. The string is stored at addr and can be up to len characters long. accept returns the actual length of the string.
word reads the next word from the terminal input buffer, delimited by char. The address of a counted string is returned. The string length will be 0 if the input buffer is empty.

        primitive 'accept',accept
        pop di
        xor cx,cx
acceptl call getchar
        cmp al,8
        jne acceptn
        jcxz acceptb
        call outchar
        mov al,' '
        call outchar
        mov al,8
        call outchar
        dec cx
        dec di
        jmp acceptl
acceptn cmp al,13
        je acceptz
        cmp cx,bx
        jne accepts
acceptb mov al,7
        call outchar
        jmp acceptl
accepts stosb
        inc cx
        call outchar
        jmp acceptl
acceptz jcxz acceptb
        mov al,13
        call outchar
        mov al,10
        call outchar
        mov bx,cx
        jmp next

getchar mov ah,7
        int 021h
        mov ah,0
        ret

outchar xchg ax,dx
        mov ah,2
        int 021h
        ret

        primitive 'word',word
        mov di,word[val_dp]
        push di
        mov dx,bx
        mov bx,word[val_t_i_b]
        mov cx,bx
        add bx,word[val_to_in]
        add cx,word[val_number_t_i_b]
wordf   cmp cx,bx
        je wordz
        mov al,byte[bx]
        inc bx
        cmp al,dl
        je wordf
wordc   inc di
        mov byte[di],al
        cmp cx,bx
        je wordz
        mov al,byte[bx]
        inc bx
        cmp al,dl
        jne wordc
wordz   mov byte[di+1],32
        mov ax,word[val_dp]
        xchg ax,di
        sub ax,di
        mov byte[di],al
        sub bx,word[val_t_i_b]
        mov word[val_to_in],bx
        pop bx
        jmp next

        primitive 'emit',emit
        xchg ax,bx
        call outchar
        pop bx
        jmp next

Searching the Dictionary

  • find - ( addr -- addr2 flag ) look up word in the dictionary
find looks in the Forth dictionary for the word in the counted string at addr. One of the following will be returned:
  • flag = 0, addr2 = counted string - if word not found
  • flag = 1, addr2 = call address if word is immediate
  • flag = -1, addr2 = call address if word is not immediate

        primitive 'find',find
        mov di,val_last
findl   push di
        push bx
        mov cl,byte[bx]
        mov ch,0
        inc cx
findc   mov al,byte[di+2]
        and al,07Fh
        cmp al,byte[bx]
        je findm
        pop bx
        pop di
        mov di,word[di]
        test di,di
        jne findl
findnf  push bx
        xor bx,bx
        jmp next
findm   inc di
        inc bx
        loop findc
        pop bx
        pop di
        mov bx,1
        inc di
        inc di
        mov al,byte[di]
        test al,080h
        jne findi
        neg bx
findi   and ax,31
        add di,ax
 inc di
        push di
        jmp next

Initialisation

  • abort - ( -- ) initialise Itsy then jump to interpret
abort initialises the stacks and a few variables before running the outer interpreter. When Itsy first runs it jumps to abort to set up the system.

        primitive 'abort',abort
        xor ax,ax
        mov word[val_state],ax
        mov word[val_to_in],ax
        mov word[val_number_t_i_b],ax
        xchg ax,bp
        mov sp,-256
        mov si,xt_interpret+2
        jmp next

Up and Running?

Itsy is now around 900 bytes and it's time to give the interpreter a quick test run:
Itsy Forth
Everything seems to be working fine. Next we'll define the compiler so we can continue building Itsy from the Itsy prompt :-)

Tuesday 3 April 2012

Itsy-Forth: the Dictionary and Inner Interpreter

Itsy Forth is a minimal Forth compiler implemented in under 1kB. Earlier we examined Itsy's outer interpreter. Now we take a closer look at the dictionary and inner interpreter.

Forth Dictionary

Itsy's dictionary is a linked list holding the name and code for each word (subroutine). Each entry in the list has a header containing a link, counted string and XT (execution token). For example here's the dictionary entry for nip:

        ; header
        dw link_to_previous_word
        db 3, 'nip'
xt_nip  dw docolon
        ; body
        dw xt_swap
        dw xt_drop
        dw xt_exit

The first line of the header links to the previous word in the dictionary. The second line holds the word's name preceded by its length. The final line contains the XT, a pointer to the routine which performs the actual operation of the word. Itsy uses four different XTs:

  • docolon - The word is a list of pointers to XTs. Call each in turn.
  • doconst - The word is a constant. Place its value on the data stack.
  • dovar - The word is a variable. Place its address on the data stack.
  • pointer to body - The word is a primitive (machine code). Execute it.

Macros

I'm not a big fan of macros. They're ugly and lock the code to a particular assembler. On the other hand they can add flexibility and make the code less prone to errors. Compare the definition of + with and without macros:

Without macros:

        dw link_to_previous_word
        db 1, '+'
xt_plus dw mc_plus
mc_plus pop ax
        add bx,ax
        jmp next

With macros:

        primitive '+',plus
        pop ax
        add bx,ax
        jmp next

The NASM macros to set up headers and maintain the linked list are pretty simple:

        %define link 0
        %define immediate 080h

        %macro head 4
        %%link dw link
        %define link %%link
        %strlen %%count %1
        db %3 + %%count,%1
        xt_ %+ %2 dw %4
        %endmacro

        %macro primitive 2-3 0
        head %1,%2,%3,$+2
        %endmacro

        %macro colon 2-3 0
        head %1,%2,%3,docolon
        %endmacro

        %macro constant 3
        head %1,%2,0,doconst
        val_ %+ %2 dw %3
        %endmacro

        %macro variable 3
        head %1,%2,0,dovar
        val_ %+ %2 dw %3
        %endmacro

Macro Examples

constant is used to define a Forth constant. E.g. to define false = 0:

        constant 'false',false,0

variable creates a Forth variable. E.g. to create base and initialise to 10:

        variable 'base',base,10

primitive sets up an assembly language word. E.g. to create drop:

        primitive 'drop',drop
        pop bx
        jmp next

colon defines a compiled Forth word. E.g. to define nip:

        colon 'nip',nip
        dw xt_swap
        dw xt_drop
        dw xt_exit

Register Allocation

Itsy's use of the registers is similar to most 8086 Forths. The system stack is used for the data stack while a register is used for the return stack. Note the top element of the data stack is kept in a register to enhance performance:

  • sp - data stack pointer
  • bp - return stack pointer
  • si - Forth instruction pointer
  • di - pointer to current XT
  • bx - TOS (top of data stack)

Itsy's Inner Interpreter

The Forth inner interpreter needs only three simple routines:

  • docolon - the XT to enter a Forth word. Save the Forth IP on the return stack then point it to the word being entered.
  • exit - return from a compiled Forth word. exit recovers the Forth IP from the return stack.
  • next - return from a primitive (machine code) word and call the next XT.
docolon dec bp
        dec bp
        mov word[bp],si
        lea si,[di+2]

next    lodsw
        xchg di,ax
        jmp word[di]

        primitive 'exit',exit
        mov si,word[bp]
        inc bp
        inc bp
        jmp next

Next we'll define approx 30 words and finally get the interpreter up and running. In the meantime I'd love to hear any comments on the code so far :-)