At the end of the 2nd formal language lecture of the TDDB44/TDDD55 compiler
courses I presented a number of grammar construction problems. I promised you
the solutions by e-mail. Well, it became a text file on the web instead:
-------------------------------------------------------------------------------
L0={a^n b^n | n>=0} This was given at the lecture slides. I just include it
as a reference
L1={a^n b^(2n) | n>=0}
L2={a^n b^m | n>=m>=0}
L3={a^m b^n a^n b^m | n>=0}
L4={a^m b^m a^n b^n | n>=0}
L5={a^m b^n a^m b^n | n>=0}
To L3-L5 I added the comment "Which is/are impossible?".
In the following text small leter "e" stands for epsilon, the empty string.
In all grammars S is the start symbol.
-------------------------------------------------------------------------------
The main idea behind this given grammar for L0:
S->aSb
|e
is that at each step of derivation you get another a to the left and another b
to the right of a central S. Finally the S is replaced by an e and
"disappears".
-------------------------------------------------------------------------------
L1: S->aSbb
|e
The same idea as for L0, but there are 2 b:s each time to the right.
-------------------------------------------------------------------------------
L2: S->aS
|T
T->aTb
|e
T follows L0. Before that the first rule for S puts extra a:s in the front.
Another solution is
L2: S->aSb
|T
T->aT
|e
S builds equal number of a:s and b:s around T which generates the extra number
of a:s.
You could alse use
L2: S->aSb
|aS
|e
since the grammar don't need to be unambiguous. In this case you use the first
rule to get equal number of a:s and b:s and now and then you shift to the
second one to get some extra a:s.
-------------------------------------------------------------------------------
L3: S->aSb
|T
T->bTa
|e
First, the first S rule handles the number of symbols with the m exponent in
the set description of the language above. in the center of that then the T
rules handles the number of symbols with the n exponent.
-------------------------------------------------------------------------------
L4: S->TT
T->aTb
|e
Yes, it's OK to have the same nonterminal, T, twice. That doesn't define the
language {a^n b^n a^n b^n | n>=0}. The first T could be derived to some string
independently of what string the second T derives to. If you write
L4: S->TU
T->aTb
|e
U->aUb
|e
it's correct, but maybe it shows that you really don'tt know what you are
dealing with...
-------------------------------------------------------------------------------
L5: impossible.
Loose, sloppy, hand-waving sort-of-motivation: The to m:s could be matched
against each other using a technique like L0. But you can't on the same time
match the n:s - one in the middle of the m:s and one to the right.
Little bit more technical description: Context free languages are parsed using
*ONE* stack. L0 is parsed by pushing something on the stack for each a that is
read and then popping the stack for each b that is read. For L5 you should
first push for each a in the first a^m part and later pop for each a in the
second a^m part. In between something must be done to the stack for the first
b^n sequence, but that can't be kept on the stack, it would hide the symbols to
pop for the second a^m part. So there will be nothing left on the stack for the
second b^n part to match against.