Next: Instruction Sets Up: SALTO Target Description Specifications Previous: Reservation Tables

Lexical Structure of the Assembly Language

The description of the lexical structure of the assembly language specifies the format of comments, operand separators and the generic operand-matching patterns used in identifying instruction operands during program parsing.

Comments

Three types of comments are supported:

line comments,
end-of-line comments,
stream comments.

Line comments begin with a specific character or string in the first non-blank position of a line. End-of-line comments are the comments at the end of an otherwise non-empty line; they start with a specific character or string and are terminated by the end-of-line character. Stream comments are started and terminated by specific characters or strings.

Comment identification is controlled by the following definitions:

line comments on a line of their own:

< line_comment_chars>::=

 ( line_comment_chars "<string>" )

where <string> is the character or sequence of characters that starts a line comment;

end-of-line comments (comments following assembly language text):

< comment_chars>::=

 ( comment_chars "<string>" )

where <string> is the character or sequence of characters that starts an end-of-line comment;

start and termination of stream comments:

< comment_start>::=

 ( comment_start "<start_string>" )

< comment_end>::=

 ( comment_end "<end_string>" )

where <start_string> is the character or sequence of characters that begins a stream comment, and <end_string> is the character or sequence of characters that terminates it.

Example:

(comment_chars "#")  ; end-of-line comment start on the MIPS
(comment_chars "!#") ; end-of-line comment start on the SPARC
(comment_start "(*") ; start of a stream comment on the TM family
(comment_end "*)")   ; end of a stream comment on the TM family
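
The example above does not include a line-comment definition. A minimal sketch following the line_comment_chars grammar; the comment string "*" is a hypothetical choice for illustration and is not taken from an actual target description:

```
(line_comment_chars "*")  ; hypothetical: a line whose first non-blank character is "*" is a comment
```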

Non-Blank Separators

Single-character non-blank separators are listed in an aggregate definition of the form

< def_exact>::=

 ( def_exact "<string>" )

where <string> is a concatenation of the supported terminal symbols.

Multi-character separators are listed using a definition of the form

< def_separ>::=

 ( def_separ ["<string>"⁺])

where each <string> is the literal representation of a multi-character separator.

Notes:

The definition of instruction formats in the description of the instruction set can only use separators defined by means of a def_exact or def_separ expression.
If there are multiple def_exact or def_separ definitions, only the most recent one of each form is effective.
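
As an illustration of the two forms and of the second note, the following sketch uses hypothetical separator sets (they are assumptions, not copied from an actual SALTO target description):

```
; Hypothetical separator sets, for illustration only
(def_exact ",")          ; superseded by the def_exact below
(def_exact ",()[]+")     ; effective definition: each character is a single-character separator
(def_separ ["<<" ">>"])  ; multi-character separators, one string each
```

Because only the most recent definition of each form is effective, the first def_exact has no effect here; the def_separ definition is unaffected, since it is of the other form.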

Pattern-Matching Tokens

The matching between actual operands in the assembly program and the symbolic operands in the description of the target instruction set relies on a set of meta-variables which must be recognized in appropriate positions when parsing the assembly program. In the target machine description, the meta-variables are designated using single-character tokens, and are associated with regular expression patterns describing the expected external representation of the corresponding actual operands.

A family of predefined input functions reads specific assembly language objects (identifiers, register names, arithmetic expressions, immediate integer or floating-point values):

regex reads a sequence of characters matching a given regular expression <reg_exp>;
read_exp reads an arithmetic expression;
read_imm reads an integer value (signed or unsigned);
read_flp reads a floating-point value.

The full definition of a meta-variable follows the syntax

< meta_var_def>::=

 ( def_token "<char>"
   [ (regex "<reg_exp>") |
     (read_exp) |
     (read_flp) |
     (read_imm <size> "<sign>")
   ]
 )

where

<char> can be any non-blank alphabetic character not in the separator character sets (defined using def_exact and def_separ);
<reg_exp> is the regular expression defining the pattern to be matched;
<size> is the size of the immediate value in bits;
<sign>, if equal to "signed", indicates that the number should be interpreted as signed; any other value of <sign> forces the number to be treated as unsigned.

The syntax of admissible regular expressions is a subset of the classical regular expression syntax of grep and Emacs:

a single dot (".") matches any single character;
a set of characters enclosed in square brackets ("[" and "]") matches any single character from that set;
two characters separated by a dash ("-") in the set define an interval in terms of the ASCII code, from the character preceding the dash to the character following the dash, inclusive;
a set starting with a caret ("^") matches any character not present in the set;
a sequence of patterns matches the longest matching sequence of characters, starting from the leftmost pattern;
an asterisk ("*") indicates zero or more occurrences of the preceding pattern;
a sequence of patterns enclosed between a matching pair of strings "\(" and "\)" is treated as a single pattern;
the string "\|" specifies the alternative between the largest single patterns adjacent to it;
a backslash ("\") suppresses the special meaning of the immediately following character (backslash, dot, asterisk etc.).

Example:

; TM1000 tokens: registers or expressions

#define REG_TOKEN(CHAR)\
(def_token  \
  CHAR [(regex "r1[0-1][0-9]\\|r12[0-7]\\|r[1-9][0-9]\\|r[0-9]") \
       ])

#define EXP_TOKEN(CHAR)\
(def_token CHAR [ (read_exp) ])

REG_TOKEN("d")    ; destination register
REG_TOKEN("s")    ; first source register
REG_TOKEN("t")    ; second source register
REG_TOKEN("g")    ; guard register
EXP_TOKEN("m")    ; modifier
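
The TM1000 example above uses only regex and read_exp. The sketch below instantiates the remaining alternatives of the def_token grammar; the token characters, the sizes, and the label pattern are hypothetical choices made for illustration and are not part of any actual target description:

```
; Hypothetical tokens, for illustration only
(def_token "i" [ (read_imm 16 "signed") ])    ; 16-bit immediate, interpreted as signed
(def_token "u" [ (read_imm 8 "unsigned") ])   ; any <sign> other than "signed" means unsigned
(def_token "f" [ (read_flp) ])                ; floating-point value
(def_token "l" [ (regex "[A-Za-z_][A-Za-z0-9_]*") ])  ; label: a set, intervals and an asterisk
```

The token characters chosen here must not appear in the separator sets declared with def_exact and def_separ.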

Notes:

It is good practice to order the patterns in a regular expression so that the most specific patterns come first and the most general ones come last.
The choice of meta-variable names defined using def_token is purely arbitrary; the author of a machine description is free to choose a naming convention that is best suited for the design of the target system.

Erven Rohou
Fri Oct 17 09:15:29 MET DST 1997