yylex
The value that yylex returns must be the positive numeric code for the type
of token it has just found; a zero or negative value signifies end-of-input.
When a token is referred to in the grammar rules by a name, that name in the
parser implementation file becomes a C macro whose definition is the proper
numeric code for that token type. So yylex can use the name to indicate that
type. See Symbols.
When a token is referred to in the grammar rules by a character literal, the
numeric code for that character is also the code for the token type. So yylex
can simply return that character code, possibly converted to unsigned char to
avoid sign-extension. The null character must not be used this way, because
its code is zero and that signifies end-of-input.
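
The conversion to unsigned char matters on platforms where plain char is
signed: a byte such as 0xE9 would otherwise be returned as a negative int,
which the parser would read as end-of-input. The following is a minimal
sketch of that conversion; the helper name char_token_code is hypothetical
and is not part of Bison.

```c
#include <assert.h>

/* Hypothetical helper (not a Bison API): map a character that appears as
   a literal in the grammar to the token code yylex should return. */
static int
char_token_code (char c)
{
  /* Code 0 is reserved for end-of-input, so the null character must
     never be returned as a token. */
  assert (c != '\0');

  /* Converting through unsigned char keeps the result positive even
     when plain char is a signed type. */
  return (unsigned char) c;
}
```

With this helper, char_token_code ('+') yields the same value as '+', and a
byte such as (char) 0xE9 yields 0xE9 rather than a negative number.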
Here is an example showing these things:
    int
    yylex (void)
    {
      …
      if (c == EOF)    /* Detect end-of-input. */
        return 0;
      …
      if (c == '+' || c == '-')
        return c;      /* Assume token type for '+' is '+'. */
      …
      return INT;      /* Return the type of the token. */
      …
    }
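
For readers who want something they can compile, here is one possible
self-contained sketch of the same convention. It is not the manual's code:
the scanner reads from a string instead of a stream, the value 258 for INT is
an assumption (in a real parser Bison supplies the definition of INT), and
set_input, input_cursor and last_int_value are hypothetical names introduced
only for this illustration. A real scanner would normally pass the semantic
value through yylval instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed token code for this sketch; in a real parser the definition
   of INT comes from the Bison-generated parser implementation file. */
#define INT 258

/* Hypothetical scanner state: where we are in the input, and the value
   of the most recent INT token. */
static const char *input_cursor = "";
static int last_int_value;

/* Hypothetical helper: point the scanner at a NUL-terminated string. */
static void
set_input (const char *text)
{
  assert (text != NULL);
  input_cursor = text;
}

int
yylex (void)
{
  /* Skip blanks between tokens. */
  while (*input_cursor == ' ' || *input_cursor == '\t')
    input_cursor++;

  /* Detect end-of-input; the cursor is not advanced, so later calls
     keep returning 0. */
  if (*input_cursor == '\0')
    return 0;

  /* A run of decimal digits forms one INT token. */
  if (isdigit ((unsigned char) *input_cursor))
    {
      last_int_value = 0;
      while (isdigit ((unsigned char) *input_cursor))
        last_int_value = last_int_value * 10 + (*input_cursor++ - '0');
      return INT;
    }

  /* Any other character is its own token code, kept positive by the
     conversion to unsigned char. */
  return (unsigned char) *input_cursor++;
}
```

After set_input ("12 + 7-"), successive calls to this yylex return INT, '+',
INT, '-', and then 0 on every later call.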
This interface has been designed so that the output from the lex utility can
be used without change as the definition of yylex.