Trek through Pure Reason: Gone Off the Deep End Orz

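Thinking about toyasm kept me awake, and I ended up finishing the spec just now Orz. The design of JK-extended TOY assembly somewhat imitates the early relationship between C++ and C: new features are all implemented in terms of the old language, and some details are improved. The full text so far follows.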

NAME
toyasm - Joshsoft TOY Assembler

SYNOPSIS
toyasm [-c] [-a] [-o outfile] infile

OPTIONS
-c Use classical (cyy) TOY assembly. If not specified, JK-extended TOY assembly will be used.
-a Assemble only and generate object file; do not link.
-o outfile Specify the output file name. If not specified, the default name is 'a.toy' or 'a.obj'.
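
As a rough illustration of the option handling above, here is a minimal Python sketch using argparse. The function names are hypothetical, and the mapping of the two default names (a.obj when -a is given, a.toy otherwise) is an assumption; the text only says the default is one of the two.

```python
import argparse

def build_parser():
    # Mirrors the SYNOPSIS: toyasm [-c] [-a] [-o outfile] infile
    parser = argparse.ArgumentParser(prog="toyasm")
    parser.add_argument("-c", dest="classical", action="store_true",
                        help="use classical (cyy) TOY assembly")
    parser.add_argument("-a", dest="assemble_only", action="store_true",
                        help="assemble only and generate an object file")
    parser.add_argument("-o", dest="outfile", default=None,
                        help="output file name")
    parser.add_argument("infile")
    return parser

def parse_options(argv):
    args = build_parser().parse_args(argv)
    if args.outfile is None:
        # Assumption: object files default to a.obj, linked programs to a.toy.
        args.outfile = "a.obj" if args.assemble_only else "a.toy"
    return args
```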

DESCRIPTION
The first part describes classical (cyy) TOY assembly. JK-extended TOY assembly is explained later.

The TOY assembly language is case-insensitive and line-oriented. Instructions and directives cannot span multiple lines, and a line can contain at most one instruction or directive. Within a line, you can use any amount of whitespace between tokens.

Comments begin with semicolons and extend to the end of the line.

There are two types of numeric literals: decimal and hexadecimal. Hexadecimal literals begin with 0x.
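
The comment and literal rules can be sketched in a few lines of Python. The helper names are hypothetical. Accepting an uppercase 0X prefix is an assumption drawn from the language being case-insensitive; the text does not say how negative numbers or malformed literals are treated.

```python
def strip_comment(line):
    # Everything from the first semicolon to the end of the line is a comment.
    return line.split(";", 1)[0].strip()

def parse_literal(text):
    # The language is case-insensitive, so 0x and 0X are both accepted here.
    text = text.strip().lower()
    if text.startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)
```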

Two assembly directives for data declaration:
DW for declaring a single variable, optionally initialized
DUP for declaring an uninitialized array
Examples:

  A  DW   32   ; variable A initialized to 32
  B  DW        ; uninitialized variable B
  C  DUP  10   ; array C of length 10

Data declarations must precede the first instruction.

Labels consist of letters and digits, and they must begin with a letter. No colons after labels.
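
A label check along these lines could be written as below. This is only a sketch: the function name is hypothetical, "letters" is assumed to mean ASCII letters, and whether mnemonics or register names are reserved is not covered by the text.

```python
import re

# A letter followed by any number of letters and digits.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_valid_label(name):
    return LABEL_RE.fullmatch(name) is not None
```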

Execution starts at the first instruction the assembler encounters.

Instructions:

  0 hlt 
  1 add RD, RS, RT
  2 sub RD, RS, RT
  3 and RD, RS, RT
  4 xor RD, RS, RT
  5 shl RD, RS, RT
  6 shr RD, RS, RT
  7 lda RD, addr
  8 ld  RD, addr
  9 st  RD, addr
  A ldi RD, RT
  B sti RD, RT
  C bz  RD, addr
  D bp  RD, addr
  E jr  RD
  F jl  RD, addr
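
The opcode numbers in the table can be turned into machine words roughly as follows. The bit layout is an assumption borrowed from the Princeton TOY machine (4-bit opcode, 4-bit destination register, then either two 4-bit register fields or an 8-bit address); this spec does not state it. The function and table names are hypothetical.

```python
OPCODES = {
    "hlt": 0x0, "add": 0x1, "sub": 0x2, "and": 0x3,
    "xor": 0x4, "shl": 0x5, "shr": 0x6, "lda": 0x7,
    "ld": 0x8, "st": 0x9, "ldi": 0xA, "sti": 0xB,
    "bz": 0xC, "bp": 0xD, "jr": 0xE, "jl": 0xF,
}

# Instructions whose second operand is an address rather than registers.
ADDRESS_FORMAT = {"lda", "ld", "st", "bz", "bp", "jl"}

def encode(mnemonic, d=0, s=0, t=0, addr=0):
    name = mnemonic.lower()  # mnemonics are case-insensitive
    op = OPCODES[name]
    if name in ADDRESS_FORMAT:
        # Assumed layout: opcode | RD | 8-bit address
        return (op << 12) | (d << 8) | (addr & 0xFF)
    # Assumed layout: opcode | RD | RS | RT
    return (op << 12) | (d << 8) | (s << 4) | t
```

Under these assumptions, `encode("add", d=1, s=2, t=3)` yields 0x1123 and `encode("lda", d=0xF, addr=1)` yields 0x7F01.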

The rest of this document describes JK-extended TOY assembly. The greatest change from classical TOY assembly is support for a simulated stack, on top of which several "built-in" stack operations and procedure-related directives are implemented. It also introduces label scope. To avoid potential confusion and for convenience, RF always holds the constant 1 throughout the program, and RB is read-only inside a procedure. RE also plays a special role. These are explained later.

By default, all labels (including variables and procedures) are local to the translation unit. You can precede a label with the keyword "export" to make the label globally visible. Labels inside a procedure cannot be exported.

Examples:

  A         DW  ; A is local to this translation unit
  export B  DW  ; B can be seen from other translation units

  PROC p                ; p is local to this translation unit
  ; ...
  loop1 sub R1, R1, R2  ; loop1 is local to p
  ; ...
  ENDPROC

  export PROC q         ; q can be called from other translation units
  ; ...
  loop1 add R3, R1, R2  ; ok, this loop1 does not conflict with the loop1 in p
  ; ...
  ENDPROC
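
One way an assembler might track these scopes is sketched below. The class and method names are hypothetical, and the lookup order (procedure-local labels first, then labels of the translation unit) is an assumption; the text only says that labels in different procedures do not conflict.

```python
class SymbolTable:
    def __init__(self):
        self.unit = {}         # labels local to the translation unit
        self.procs = {}        # procedure name -> labels local to it
        self.exported = set()  # labels visible to other translation units

    def define(self, name, address, proc=None, export=False):
        name = name.lower()    # the language is case-insensitive
        if proc is not None:
            if export:
                raise ValueError("labels inside a procedure cannot be exported")
            scope = self.procs.setdefault(proc.lower(), {})
        else:
            scope = self.unit
        if name in scope:
            raise ValueError("duplicate label: " + name)
        scope[name] = address
        if export:
            self.exported.add(name)

    def resolve(self, name, proc=None):
        name = name.lower()
        if proc is not None:
            local = self.procs.get(proc.lower(), {})
            if name in local:
                return local[name]
        return self.unit[name]
```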

The stack occupies the tail of memory and grows from higher addresses toward lower ones. RE is the stack pointer; it points to the slot where the next pushed element will be stored. When the program starts, RE is initialized to 0xFE and RF to 1. RF is guaranteed to remain 1, so you can use it as the constant 1. The stack operations (push, and pop with or without a destination register) and their equivalent classical code are listed below:

push RX
  sti RX, RE
  sub RE, RE, RF

pop RX    ; RX cannot be RF. Inside a procedure, RX cannot be RB, either.
  add RE, RE, RF
  ldi RX, RE

pop
  add RE, RE, RF
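
The effect of these expansions on memory can be checked with a tiny Python model. This is a sketch, not part of toyasm: the class name is hypothetical and the 256-word memory size is an assumption taken from the Princeton TOY machine.

```python
RE, RF = 0xE, 0xF  # register numbers of the stack pointer and the constant 1

class Machine:
    def __init__(self):
        self.mem = [0] * 256   # assumed memory size
        self.reg = [0] * 16
        self.reg[RE] = 0xFE    # initial stack pointer
        self.reg[RF] = 1       # constant 1

    def push(self, rx):
        self.mem[self.reg[RE]] = self.reg[rx]      # sti RX, RE
        self.reg[RE] -= self.reg[RF]               # sub RE, RE, RF

    def pop(self, rx=None):
        self.reg[RE] += self.reg[RF]               # add RE, RE, RF
        if rx is not None:
            self.reg[rx] = self.mem[self.reg[RE]]  # ldi RX, RE
```

Because RE always points at the next free slot, push stores first and then decrements, while pop increments first and then loads.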

Procedures begin with a procedure declaration, which consists of the keyword PROC and the name of the procedure, and end with the keyword ENDPROC. When the assembler sees a procedure, it automatically does the following translation, building up the stack frame:

export PROC p       export p  sti RF, RE       ; push RF (the return address)
; code ...     ==>            lda RF, 1        ; restore RF to constant 1
ENDPROC                       sub RE, RE, RF
                              sti RB, RE       ; save previous RB
                              add RB, RE, R0   ; RB <- RE
                              sub RE, RE, RF
                              ; code ...
                              ret              ; if the line above ENDPROC
                                               ; is not "ret"

If a translation unit contains only procedures and data declarations (i.e., no "exposed" instructions), it can only be assembled into an object file.

Two directives "call" and "ret" are used to transfer control to and from a procedure.

call p    ; p is a visible procedure
  jl  RF, p
  lda RF, 1

ret       ; valid only inside a procedure
  add RE, RB, R0  ; RE <- RB, clear local variables
  ldi RB, RE      ; restore previous RB
  add RE, RE, RF
  ldi RF, RE      ; return address now in RF
  jr  RF
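
Since every JK-extended operation is defined by its classical equivalent, the translation can be pictured as a line-by-line text expansion. The Python sketch below is only an illustration with a hypothetical function name. It writes the zero operand of ret as R0, matching the `add RB, RE, R0` form used in the procedure prologue, and it assumes operands are separated by commas or whitespace.

```python
def expand(line, in_proc=False):
    parts = line.replace(",", " ").split()
    if not parts:
        return []
    op, args = parts[0].lower(), parts[1:]
    if op == "push" and len(args) == 1:
        return ["sti %s, RE" % args[0], "sub RE, RE, RF"]
    if op == "pop" and not args:
        return ["add RE, RE, RF"]
    if op == "pop" and len(args) == 1:
        reg = args[0].upper()
        if reg == "RF" or (in_proc and reg == "RB"):
            raise ValueError("cannot pop into " + reg)
        return ["add RE, RE, RF", "ldi %s, RE" % args[0]]
    if op == "call" and len(args) == 1:
        return ["jl RF, %s" % args[0], "lda RF, 1"]
    if op == "ret":
        if not in_proc:
            raise ValueError("ret is valid only inside a procedure")
        return ["add RE, RB, R0", "ldi RB, RE",
                "add RE, RE, RF", "ldi RF, RE", "jr RF"]
    # Classical instructions pass through unchanged.
    return [line.strip()]
```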

The following figures show the stages of a procedure call and the status of the stack:

1. pushing arguments (caller)

   push R1
   push R2

      |    |            |    |      RE -> |____|
      |    | ==>  RE -> |____|  ==>       |0001|
RE -> |____|            |0002|            |0002|

2. calling the procedure (caller & callee)

   call p

      |    |                RE -> |____|
RE -> |____|                      |00F1| <- RB
      |001D|           ==>        |001D|
      |0001|                      |0001|
      |0002| RB = 00F1            |0002|

3. allocating spaces for local variables (callee)

   sub RE, RE, RF

RE -> |____|
      |????|
      |00F1| <- RB
      |001D|
      |0001|
      |0002|

4. returning from the procedure (callee & caller)

   ret

RE -> |____|                   |    |                       |    |
      |????|                   |    |                       |    |
      |00F1| <- RB  ==>  RE -> |____|            ==>        |    |
      |001D|                   |001D|                 RE -> |____|
      |0001|                   |0001|                       |0001|
      |0002|                   |0002| RB = 00F1             |0002| RF = 001D

5. clearing arguments (caller)

   pop
   pop

RE -> |____|             |    |             |    |
      |0001|  ==>  RE -> |____|  ==>        |    |
      |0002|             |0002|       RE -> |____|
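
The five stages can also be traced numerically. The Python sketch below replays the classical expansions by hand and records RE and RB after each stage. The function name is hypothetical, the register and return-address values are the illustrative ones from the figures, and the starting RE of 0xFE and the 256-word memory are assumptions. Note that the figures draw higher addresses lower on the page.

```python
def simulate_call(return_address=0x1D):
    mem = [0] * 256                       # assumed memory size
    reg = {"R1": 0x0002, "R2": 0x0001,    # argument values as in the figures
           "RB": 0x00F1, "RE": 0xFE, "RF": 1}
    stages = {}

    def snapshot(name):
        stages[name] = (reg["RE"], reg["RB"])

    def push(r):
        mem[reg["RE"]] = reg[r]           # sti r, RE
        reg["RE"] -= reg["RF"]            # sub RE, RE, RF

    # 1. pushing arguments (caller)
    push("R1")
    push("R2")
    snapshot("args")

    # 2. call p, followed by the generated prologue (callee)
    reg["RF"] = return_address            # jl RF, p
    mem[reg["RE"]] = reg["RF"]            # sti RF, RE
    reg["RF"] = 1                         # lda RF, 1
    reg["RE"] -= reg["RF"]                # sub RE, RE, RF
    mem[reg["RE"]] = reg["RB"]            # sti RB, RE
    reg["RB"] = reg["RE"]                 # add RB, RE, R0
    reg["RE"] -= reg["RF"]                # sub RE, RE, RF
    snapshot("frame")

    # 3. allocating one local variable (callee)
    reg["RE"] -= reg["RF"]                # sub RE, RE, RF
    snapshot("locals")

    # 4. ret (callee), then the caller restores RF
    reg["RE"] = reg["RB"]                 # RE <- RB
    reg["RB"] = mem[reg["RE"]]            # ldi RB, RE
    reg["RE"] += reg["RF"]                # add RE, RE, RF
    reg["RF"] = mem[reg["RE"]]            # ldi RF, RE
    returned_to = reg["RF"]               # jr RF
    reg["RF"] = 1                         # lda RF, 1 (second half of call)
    snapshot("returned")

    # 5. clearing arguments (caller)
    reg["RE"] += reg["RF"]                # pop
    reg["RE"] += reg["RF"]                # pop
    snapshot("cleared")
    return stages, mem, returned_to
```

After stage 5 both RE and RB are back at their starting values, which is the invariant the calling convention is meant to preserve.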

AUTHOR
Josh Ko, Department of Computer Science and Information Engineering, National Taiwan University.

ACKNOWLEDGMENTS
Department of Computer Science, Princeton for inventing the TOY machine.
Yung-Yu Chuang (cyy) at Dept. of CSIE, NTU for designing the classical TOY assembly and instructing the course on Computer Organization and Assembly Languages.

--
cyy's powers are truly boundless Orz. I hope I don't end up getting lured by him into doing graphics XD.

Labels: NTUCSIE, TOY86


2006/12/12
