Next: Instruction Sets Up: SALTO Target Description Specifications Previous: Reservation Tables

Lexical Structure of the Assembly Language

The description of the lexical structure of the assembly language specifies the format of comments, operand separators and the generic operand-matching patterns used in identifying instruction operands during program parsing.

Comments

Three types of comments are supported:

  1. Line comments begin with a specific character or string in the first non-blank position of a line.
  2. End-of-line comments appear at the end of an otherwise non-empty line; they start with a specific character or string and are terminated by the end-of-line character.
  3. Stream comments are started and terminated by specific characters or strings.

Comment identification is controlled by definitions such as comment_chars, comment_start and comment_end.

Example:

(comment_chars "#")  ; end-of-line comment start on the MIPS
(comment_chars "!#") ; end-of-line comment start on the SPARC
(comment_start "(*") ; start of a stream comment on the TM family
(comment_end "*)")   ; end of a stream comment on the TM family
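The three comment types can be illustrated with a minimal Python sketch. It is not SALTO code: the function name classify_comments is hypothetical, and it assumes that every character listed in comment_chars starts a comment and that string literals never contain comment markers.

```python
def classify_comments(source, comment_chars="#", comment_start=None, comment_end=None):
    """Return (kind, text) pairs; kind is 'stream', 'line' or 'end-of-line'."""
    found = []
    kept = []          # program text left after stream comments are removed
    pos = 0
    # Stream comments may span several lines, so they are extracted first.
    while comment_start and comment_end:
        start = source.find(comment_start, pos)
        if start < 0:
            break
        stop = source.find(comment_end, start + len(comment_start))
        # Assumption: an unterminated stream comment runs to the end of input.
        stop = len(source) if stop < 0 else stop + len(comment_end)
        found.append(("stream", source[start:stop]))
        kept.append(source[pos:start])
        pos = stop
    kept.append(source[pos:])

    for line in "".join(kept).splitlines():
        hits = [i for i in (line.find(c) for c in comment_chars) if i >= 0]
        if not hits:
            continue
        first = min(hits)
        # Only blanks before the comment start: this is a line comment.
        kind = "line" if not line[:first].strip() else "end-of-line"
        found.append((kind, line[first:]))
    return found
```

Called with comment_chars="!#" and no stream delimiters, the sketch mimics the SPARC setting above; with comment_start="(*" and comment_end="*)" it mimics the TM family setting.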

Non-Blank Separators

Single-character non-blank separators are listed in an aggregate definition of the form

<def_exact> ::= ( def_exact "<string>" )

where <string> is a concatenation of the supported terminal symbols.

Multi-character separators are listed using a definition of the form

<def_separ> ::= ( def_separ ["<string>"+] )


where each <string> is the literal representation of a multi-character separator.

Notes:

  1. The definition of instruction formats in the description of the instruction set can only use separators defined by means of a def_exact or def_separ expression.
  2. If there are multiple def_exact or def_separ definitions, only the most recent one of each form is effective.
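The effect of the two separator definitions can be sketched in Python. This is an illustration under assumptions, not the SALTO implementation: the names effective_definitions and split_operands are hypothetical, the separator strings in the usage lines are made up, and multi-character separators are assumed to take precedence over single-character ones.

```python
import re

def effective_definitions(definitions):
    """Keep only the most recent definition of each form (note 2)."""
    effective = {}
    for form, value in definitions:
        effective[form] = value      # a later definition replaces an earlier one
    return effective

def split_operands(text, exact_chars, multi_separators=()):
    """Split text at separators, returning operands and separators in order."""
    # Longer separators are tried first so that "<<" is not read as two "<".
    seps = sorted(multi_separators, key=len, reverse=True) + list(exact_chars)
    if not seps:
        return [text.strip()] if text.strip() else []
    pattern = "|".join(re.escape(s) for s in seps)
    pieces = re.split("(" + pattern + ")", text)
    return [p.strip() for p in pieces if p.strip()]

# Hypothetical usage: two def_exact definitions, of which only the last counts.
defs = effective_definitions([("def_exact", ","), ("def_exact", ",()")])
tokens = split_operands("r1, 4(r2)", defs["def_exact"])
```

Blanks around operands are simply stripped here; how SALTO treats blank separators is not modelled.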

Pattern-Matching Tokens

The matching between actual operands in the assembly program and the symbolic operands in the description of the target instruction set relies on a set of meta-variables which must be recognized in appropriate positions when parsing the assembly program. In the target machine description, the meta-variables are designated using single-character tokens, and are associated with regular expression patterns describing the expected external representation of the corresponding actual operands.

A family of predefined input functions reads specific assembly-language objects: identifiers, register names, arithmetic expressions, and immediate integer or floating-point values.

The full definition of a meta-variable follows the syntax

<meta_var_def> ::= ( def_token "<char>"
                     [ (regex "<reg_exp>") |
                       (read_exp) |
                       (read_flp) |
                       (read_imm <size> "<sign>") ] )


where

The syntax of admissible regular expressions is a subset of the classical regular expression syntax of grep and Emacs:

Example:

; TM1000 tokens: registers or expressions

#define REG_TOKEN(CHAR)\
(def_token  \
  CHAR [(regex "r1[0-1][0-9]\\|r12[0-7]\\|\
                 r[1-9][0-9]\\|r[0-9]")
       ])

#define EXP_TOKEN(CHAR)\
(def_token CHAR [ (read_exp) ])

REG_TOKEN("d")    ; destination register
REG_TOKEN("s")    ; first source register
REG_TOKEN("t")    ; second source register
REG_TOKEN("g")    ; guard register
EXP_TOKEN("m")    ; modifier
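The association between meta-variables and patterns in the example above can be mimicked with a small Python table. This is a hedged sketch: the names TOKENS and matches are hypothetical, the TM1000 register pattern is transcribed into Python's re syntax (alternation written | instead of \|), the whole operand is assumed to have to match, and the read_exp token "m" is left out because its behaviour is not specified here.

```python
import re

# TM1000 register pattern from the example, in Python regular expression syntax.
REGISTER = re.compile(r"r1[0-1][0-9]|r12[0-7]|r[1-9][0-9]|r[0-9]")

# One entry per single-character meta-variable defined with REG_TOKEN.
TOKENS = {
    "d": REGISTER,   # destination register
    "s": REGISTER,   # first source register
    "t": REGISTER,   # second source register
    "g": REGISTER,   # guard register
}

def matches(token, operand):
    """Tell whether the operand text is acceptable for the given meta-variable."""
    return TOKENS[token].fullmatch(operand) is not None
```

With this table, matches("d", "r127") holds while matches("d", "r128") does not, which reflects the register range r0 to r127 that the pattern encodes.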

Notes:

  1. It is good practice to order the patterns in a regular expression so that the most specific patterns come first and the most general ones come last.
  2. The names of meta-variables defined using def_token are arbitrary; the author of a machine description is free to choose the naming convention best suited to the target system.
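The reason for note 1 can be shown with Python's re module, whose alternation takes the first alternative that succeeds. This is only an analogy: whether the SALTO matcher behaves the same way is an assumption (a leftmost-longest matcher would not depend on the order), and the names below are hypothetical.

```python
import re

# Same alternatives as the TM1000 register pattern, in two different orders.
SPECIFIC_FIRST = re.compile(r"r1[0-1][0-9]|r12[0-7]|r[1-9][0-9]|r[0-9]")
GENERAL_FIRST = re.compile(r"r[0-9]|r[1-9][0-9]|r1[0-1][0-9]|r12[0-7]")

def matched_prefix(pattern, operand):
    """Return the part of the operand consumed by a match at its start."""
    found = pattern.match(operand)
    return found.group(0) if found else None

specific = matched_prefix(SPECIFIC_FIRST, "r127")   # consumes the full name
general = matched_prefix(GENERAL_FIRST, "r127")     # stops after "r1"
```

With the general alternative first, the match stops after the first digit and leaves "27" unconsumed, which is the kind of mis-parse the ordering advice avoids.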



Erven Rohou
Fri Oct 17 09:15:29 MET DST 1997