CS405 Assignment 1
Below is the EBNF grammar for a small language manipulating vector and 2-dimentional matrix.
<program> -> <block>
<block> -> <var-decl> <proc-decl> <statement>
<var-decl> -> { (<integer-decl> | <vector-decl> | <matrix-decl>)}
<integer-decl> -> int id {
, id } ;
<vector-decl> -> vector id [ integer ] {, id [integer] } ;
<matrix-decl> -> matrix id [ integer, integer] { , id [ integer, integer] } ;
<proc-decl> -> { procedure id { <block> } }
<statement> -> { (<assignment-statement>| <function-call-statement>
| <conditional-statement>|
<loop-statement>)}
<assignment-statement>-> id = <expression> ;
<expression> -> <term> [ <adding-operator> <term>]
<adding-operator> -> (+ | -)
<term> -> (<factor>{* <factor>})
<factor>->(id| integer | <integer-list> | <list-of-integer-list>| (<expression>) )
<integer-list> -> ( integer {
,integer} )
<list-of-integer-list> -> (<integer-list> { , <integer-list>} )
<function-call-statement> -> call id ;
<conditional-statement> -> if (<condition>) then { <statement> }
<condition>->
(id | integer)
<relational-expr> (id | integer)
<relational-expr> -> (
==
| != | > | < | >= | <= )
<loop-statement>->
while
(<condition>) {
<statement>}
Note the EBNF meta-symbol [], () and {} are represented in italicized form. They are in blue if you print it in color printer or read the softcopy. Those not italicized are grammar tokens.
id represents identifier token.Assume that an identifier can only contain letters (only alphabetic characters), digits, and under-scores (_) with the restrictions that it must begin with a letter, cannot end with an underscore and cannot have two consecutive underscores. For example, give_2_Joe, tell_me and A45Asm3 are valid identifiers, but 6gh, two__ bad, and no_end_ are not legal identifiers. integer is a token of unsigned integer. Assume that comments are indicated by being preceded by //.
What to do:
1.Identify the tokens in the grammar
2 Design and implement a lexical analyzer procedure to read a source program in the
above language and print the next token in the input stream. If the token detected is a
valueless token, such as a keyword, then it is sufficient to print only the keyword. If it
has a value, then both the token type and lexeme should be printed.
Suggestions:
1. Construct a
token class to represent the token data structure, including a method to print
a token.
2. Use the JLex
tool to automatically construct a lexical analyzer in Java from a set of
regular expressions specifying tokens. This can interface with your token class
to return a token to the main program which then prints the token.
3. JLEX example: http://www.cis.uab.edu/cs405/jlexslides.pdf
Sample source code:
-----------------------------------------
int i, j;
vector A[3], B[3], C[3];
matrix M[3][3];
A= (1,2,4);
B= (4,6,7)
C= A+B;
i =9;
procedure foo {
int m , n;
vector G[2], H[2];
matrix T[2,2];
G= (3,6);
H= 8*G+G;
T= G*H;
}
If(i >0)
{
C=2*A-B;
j= i;
M= A*B;
i=i-1;
}
call foo;
-----------------------------------------