Wednesday, 14 March 2012

Itsy-Forth: the Outer Interpreter of a 1K Tiny Compiler

Itsy Forth is a tiny indirect-threaded compiler for a subset of the Forth programming language. The compiler contains an interactive interpreter which allows the programmer to define new words (subroutines), execute them and even extend the language.

Each word has an entry in the Forth dictionary. New words are defined with : and end with ;. The following example defines a new word inc which calls two words, 1 and +:

: inc 1 + ;

Outer Interpreter

Itsy was developed top-down, first implementing the interpreter. interpret takes the next word from the input buffer and searches for it in the dictionary. If the word isn't found, Itsy attempts to convert it to a number. In compile mode, Itsy will add the word or number to the current definition. Otherwise the word will be executed.

: interpret
begin
    #tib @ >in @ =
    if
        tib 50 accept #tib ! 0 >in !
    then
    32 word find dup
    if
        state @ =
        if
            ,
        else
            execute
        then
    else
        dup rot count >number
        if
            state @
            if
                last @ dup @ last ! dp !
            then
            abort
        then
        drop drop state @
        if
            ['] lit , ,
        then
    then
again
;

The Itsy interpreter uses 23 different words, 5 variables (state, >in, #tib, dp, last) and a constant (tib). Note the variables dp and last are non-standard. dp points to the first free cell in the data area. last points to the last word to be compiled.

Next we'll define the dictionary structure and implement the inner interpreter. In the meantime I'd love to hear any comments on the code so far :-)

13 comments:

  1. That's a very nice implementation. I took a crack at factoring it apart and commenting it: http://hastebin.com/raw/pocobaxuce

    No promises that you'll like my names. :)

    ReplyDelete
    Replies
    1. Hi Rodge,

      The word names are fine apart from NOPE! :-) The factored code is easier to understand but for now I'm trying to keep everything as compact as possible. Currently Itsy Forth is ~966 bytes.

      Some other improvements would be to use the constant BL instead of 32 and COMPILE, instead of , where appropriate. Using SOURCE would avoid #TIB and TIB which are considered obsolete. I wonder if DP can be avoided by using ALLOT?

      John

      Delete
  2. How do you deal with immmediate words? It looks like ";" will be compiled in.

    ReplyDelete
    Replies
    1. Hi Sam,

      FIND leaves -1 on the stack for a normal word, 0 for word not found or 1 for an immediate word.

      STATE @ = compares the value of STATE to the value returned by FIND. If they're not equal, the word needs to be executed.

      John

      Delete
  3. Sam: I think that's what the "state @ =" phrase does in combination with the result of "find". See the GForth manual for the contract of "find": http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Word-Lists.html#index-find-_0040var_007b-c_002daddr-_002d_002d-xt-_002b_002d1-_007c-c_002daddr-0---_007d--core_002csearch-1628

    ReplyDelete
  4. Hi. I've read your Itsy Forth series, and I was wondering what was the reason for writing the word interpret as one large function, instead of factoring it into smaller words?

    ReplyDelete
    Replies
    1. Hi amca. It'd be much better factored into several words, but I only wanted to define words which are part of the ANS standard. I'm not convinced it's the best method. Maybe two separate loops, one to compile, one to interpret.

      Delete
    2. Note that just factoring interpret into distinct high level words wouldn't change the number of primitives in the system.

      Also note that while #tib and company are standard words, they were deprecated, so building the interpreter around SOURCE instead could well allow some space saving by having some of the variable as anonymous word or byte fields doing their job under the hood of primitives.

      SOURCE >IN and REFILL and factoring would give you: ...

      : find-word? ( -- ca 0 | xt -1 | xt 1 )
      : cs>number ( ca1 -- ud ca2 len ) 0 0 rot >number ;
      : do-compile ( -- / x*i -- x*j )
      find-word? dup if
      1 = if
      execute
      else
      ,
      then
      else
      drop cs>number if
      last @ dup @ last ! dp ! abort
      then
      drop drop ['] lit , ,
      then ;

      : do-interpret ( --n / x*i -- x*j )
      find-word? dup if
      execute
      else
      drop cs>number if
      abort
      then
      drop drop
      then ;

      interpreter ( -- )
      begin
      source >in @ = if refill then
      drop state @ if
      do-compile
      else
      do-interpret
      then
      again ;

      Delete
    3. & note that all the leading spaces to indent the source were trimmed. Oh, well.

      Delete
  5. Thanks for replying. In that case you could factor it slightly by using either the word POSTPONE or COMPILE, .

    ReplyDelete
  6. Hey John,

    Sorry for posting this here. I couldn't find a way to contact you privately.

    Would you be willing to share the itsyforth code under some kind of MIT-compatible license? I've spent some time studying it (especially this outer interpreter) and would like to actually use it for a project I'm working on. ( https://github.com/sabren/b4/ )

    Either way, thanks for writing the series! :)

    ReplyDelete
    Replies
    1. Hi Michal,

      You can considered Itsy to be under a permissive MIT style license. I'll add the license to the code at some point :-)

      Delete
  7. Itsy Forth is a tiny indirect-threaded compiler for a subset of the Forth programming language. The compiler contains an interactive interpreter which allows the programmer to define new words (subroutines), execute them and even extend the language. interpret.co.uk

    ReplyDelete