The description of the lexical structure of the assembly language specifies the format of comments, operand separators and the generic operand-matching patterns used in identifying instruction operands during program parsing.
Three types of comments are supported:
Line comments begin with a specific character or string in the first non-blank position of a line.
End-of-line comments follow the content of an otherwise non-empty line; they start with a specific character or string and are terminated by the end-of-line character.
Stream comments are started and terminated by specific characters or strings.
Comment identification is controlled by the following definitions:
<line_comment_chars> ::= (line_comment_chars "<string>")
where <string> is the character or the sequence of characters that starts a line comment;
<comment_chars> ::= (comment_chars "<string>")
where <string> is the character or the sequence of characters that starts an end-of-line comment;
<comment_start> ::= (comment_start "<start_string>")
<comment_end> ::= (comment_end "<end_string>")
where <start_string> and <end_string> are the characters or the sequences of characters that start and terminate a stream comment, respectively.
For example:
(comment_chars "#")    ; end-of-line comment start on the MIPS
(comment_chars "!#")   ; end-of-line comment start on the SPARC
(comment_start "(*")   ; start of a stream comment on the TM family
(comment_end "*)")     ; end of a stream comment on the TM family
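The three comment kinds can be illustrated with a small Python sketch. It is not the tool's actual implementation: the function `strip_comments` and its parameter names are hypothetical, and the sketch assumes that comment markers never occur inside string literals and that stream comments are removed before line-oriented ones.

```python
def strip_comments(text, line_comment=None, eol_comment=None,
                   stream_start=None, stream_end=None):
    """Remove line, end-of-line and stream comments from assembly text.

    Hypothetical helper: each argument mirrors one of the definitions
    line_comment_chars, comment_chars, comment_start and comment_end.
    """
    # Stream comments first, because they may span several lines.
    if stream_start and stream_end:
        pieces, pos = [], 0
        while True:
            start = text.find(stream_start, pos)
            if start < 0:
                pieces.append(text[pos:])
                break
            pieces.append(text[pos:start])
            end = text.find(stream_end, start + len(stream_start))
            if end < 0:
                break  # unterminated comment: discard the rest
            pos = end + len(stream_end)
        text = "".join(pieces)

    kept = []
    for line in text.splitlines():
        # Line comment: marker in the first non-blank position.
        if line_comment and line.lstrip().startswith(line_comment):
            continue
        # End-of-line comment: marker up to the end of the line.
        if eol_comment:
            cut = line.find(eol_comment)
            if cut >= 0:
                line = line[:cut]
        kept.append(line.rstrip())
    return "\n".join(kept)


# MIPS-style end-of-line comment, using the "#" marker quoted above.
print(strip_comments("add r1, r2 # sum", eol_comment="#"))  # add r1, r2
```

Passing `stream_start="(*"` and `stream_end="*)"` handles the TM-family stream comments in the same way.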
Single-character non-blank separators are listed in an aggregate definition of the form
<def_exact> ::= (def_exact "<string>")
where <string> is a concatenation of the supported terminal symbols.
Multi-character separators are listed using a definition of the form
<def_separ> ::= (def_separ ["<string>"+])
where each <string> is the literal representation of a multi-character separator.
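A scanner could combine the two separator definitions roughly as in the following Python sketch. The function `split_operands` and the separator sets in the usage lines are hypothetical and are not taken from any real machine description. The sketch assumes that blanks also separate tokens and that a multi-character separator takes precedence over a single-character one that is its prefix.

```python
def split_operands(text, exact_chars, multi_separators=()):
    """Split text into tokens, returning separators as tokens of their own.

    exact_chars      -- the <string> of a def_exact definition
    multi_separators -- the <string> list of a def_separ definition
    """
    # Longest separators first, so that "<<" is tried before "<".
    separators = sorted(set(multi_separators) | set(exact_chars),
                        key=len, reverse=True)
    tokens, current, i = [], "", 0

    def flush():
        nonlocal current
        if current:
            tokens.append(current)
            current = ""

    while i < len(text):
        if text[i].isspace():          # blanks end the current token
            flush()
            i += 1
            continue
        for sep in separators:
            if text.startswith(sep, i):
                flush()
                tokens.append(sep)
                i += len(sep)
                break
        else:                          # ordinary character
            current += text[i]
            i += 1
    flush()
    return tokens


print(split_operands("ld 4(r2), r3", ",()"))
# ['ld', '4', '(', 'r2', ')', ',', 'r3']
print(split_operands("r1 << 2", "<", ["<<"]))
# ['r1', '<<', '2']
```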
The matching between actual operands in the assembly program and the symbolic operands in the description of the target instruction set relies on a set of meta-variables which must be recognized in appropriate positions when parsing the assembly program. In the target machine description, the meta-variables are designated using single-character tokens, and are associated with regular expression patterns describing the expected external representation of the corresponding actual operands.
A family of predefined input functions reads specific assembly language objects: identifiers, register names, arithmetic expressions, and immediate integer or floating-point values.
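The full definition of a meta-variable follows the syntax
<meta_var_def> ::=
    (def_token "<char>"
        [ (regex "<reg_exp>") |
          (read_exp) |
          (read_flp) |
          (read_imm <size> "<sign>")
        ]
    )
where <char> is the single-character token that designates the meta-variable and <reg_exp> is a regular expression in the syntax described below.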
The syntax of admissible regular expressions is a subset of the classical regular expression syntax of grep and Emacs:
a character set that begins with ``^'' (``[^...]'') matches any character not present in the set;
a pattern enclosed between ``\('' and ``\)'' is treated as a single pattern;
``\|'' specifies the alternative between the largest single patterns adjacent to it;
``\'' (backslash) suppresses the special meaning of the immediately following character (backslash, dot, asterisk etc.).
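Readers used to Perl- or Python-style regular expressions should note that grouping and alternation are written here with a leading backslash, whereas bare parentheses and bars are ordinary characters. The hypothetical helper below is only a sketch covering the constructs listed above: it converts such a pattern into the syntax of Python's `re` module, and it does not handle other dialect differences such as braces.

```python
import re


def emacs_to_python(pattern):
    """Translate \\( \\) \\| operators into Python's ( ) | operators.

    Bare '(', ')' and '|' are literals in the source syntax, so they
    are escaped in the result. Other escapes are copied unchanged.
    """
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "(|)":
                out.append(nxt)           # operator in the source syntax
            else:
                out.append("\\" + nxt)    # escaped literal, kept as is
            i += 2
        elif ch in "(|)":
            out.append("\\" + ch)         # literal in the source syntax
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(emacs_to_python(r"r[0-9]\|\(sp\)"))   # r[0-9]|(sp)
print(bool(re.fullmatch(emacs_to_python(r"\(ab\)*"), "abab")))  # True
```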
; TM1000 tokens: registers or expressions
#define REG_TOKEN(CHAR)\
  (def_token \
    CHAR [(regex "r1[0-1][0-9]\\|r12[0-7]\\|\
r[1-9][0-9]\\|r[0-9]") ])
#define EXP_TOKEN(CHAR)\
  (def_token CHAR [ (read_exp) ])
REG_TOKEN("d") ; destination register
REG_TOKEN("s") ; first source register
REG_TOKEN("t") ; second source register
REG_TOKEN("g") ; guard register
EXP_TOKEN("m") ; modifier
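To see which names the register pattern accepts, the regular expression can be rewritten in Python syntax and tested against a range of candidates. This is a sketch under two assumptions: that each doubled backslash in the quoted string stands for one backslash, so that the alternation operator is ``\|'', and that a token must match the whole pattern. The names `REG_PATTERN` and `is_register` are illustrative only.

```python
import re

# The four alternatives of the REG_TOKEN pattern, in Python syntax:
#   r1[0-1][0-9]  -> r100 .. r119
#   r12[0-7]      -> r120 .. r127
#   r[1-9][0-9]   -> r10  .. r99
#   r[0-9]        -> r0   .. r9
REG_PATTERN = re.compile(r"r1[0-1][0-9]|r12[0-7]|r[1-9][0-9]|r[0-9]")


def is_register(name):
    """Return True if the whole name matches the register pattern."""
    return REG_PATTERN.fullmatch(name) is not None


accepted = [n for n in range(200) if is_register(f"r{n}")]
print(accepted[0], accepted[-1], len(accepted))  # 0 127 128
```

Under these assumptions the pattern accepts exactly r0 through r127 and rejects names such as r128 or r01.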