Welp, we’ve proven a lot of languages aren’t regular. Any language that requires counting cannot be regular, since they do not require a constant amount of state.
The class of context-free languages is an intermediate class between the class of regular languages and the class of decidable languages, meaning it is a strict superset of the regular languages
Context-free languages are described by context-free grammars
This class includes N-PDA and the less powerful D-PDA
Instead of writing out a system to recognize languages, we make a system to generate strings inductively. To recognize the language , we start with a a basis is that is in the language, and use the recursive step "".
A grammar is a set of substitution rules for recursively generating strings
Definition: A context-free grammar is a -tuple , where:
- is a finite set called the variables
- is a finite set, disjoint from , called the terminals
- is a finite set of rules taking the form a string of variables and terminals
- is the start symbol
Example: The language for
, , ,
We may apply rules recursively however we’d like until all symbols are terminal. This is very powerful.
A derivation of a string is leftmost if we always expand the leftmost variable. Two derivations are different the leftmost (or rightmost) derivation are different.
A parse tree is a tree encoding the steps in a derivation, where leaves are labeled by a terminal or , interior nodes are labeled by a variable, and the root is the start symbol. This is an non-ordered representation of a derivation. We can traverse the leaves with pre-order to get the correct language.
Theorem: For every parse tree, there is a unique leftmost and a unique rightmost derivation
On the topic of ambiguous grammars, Antonella is very amused by “Let’s eat_,_ Grandma”
A grammar is ambiguous if it can yield a string in multiple ways (uniqueness defined as above), or in other words, if there is a string in the language with two different leftmost (or rightmost) derivations. We can make ambiguous grammars for any language, but many have good non-ambiguous grammars. However, some languages are inherently ambiguous (for example, ).
Example:
is ambiguous
is not
In Chomsky Normal Form, every rule takes the form of or
To convert a CFG to CNF,
- Add a new start symbol
- Determine the nullable variables and remove the null transitions
- Eliminate unit productions
- Replace long productions with shorter ones
- Move terminals to unit productions
Swapping steps 4 and 5 works too
What closure properties can we rely on for context-free languages?
- Union
- Concatenation
- Star
IntersectionComplement
Pushdown Automata
PDAs are equivalent in specification power with CFG, however they recognize a context-free language instead of generating one
They are similar to N-DFA with the addition of a stack
Definition: A pushdown automaton is a 6-tuple (), where and are all finite sets, and
- is the set of states,
- is the input alphabet
- is the stack alphabet
- is the transition function,
- is the start state, and
- is the set of accept states.
Some CFL are more easily described one way or another
We can also derive relationships between a CFL and RL. For example, . To see this, consider crossing a pushdown automata with a DFA.
Theorem: A language is context free some pushdown automaton recognizes it.
It’s not too hard to construct a PDA that mimics a CFG. The other way around is slightly trickier though.
Non-Context-Free Languages
Turns out there are also limitations to context-free languages
There is a pumping lemma on context-free languages, stating that all longer strings can be “pumped”, which in this context means dividing the string into five parts (so that the second and fourth may be repeated together any number of times)
Theorem: If is a context-free language, where such that , where , , , and
This takes slightly more advanced reasoning than the DFA pumping lemma, but ultimately uses the pigeon-hole principle in a similar way
Deterministic CFL
Deterministic PDAs are actually weaker than the PDAs we’ve examined so far. We still allow the use of transitions but there can only ever be one valid transition at a time.
Definition: A deterministic pushdown automaton is a 6-tuple,
- is the set of states,
- is the input alphabet,
- is the stack alphabet,
- is the transition function,
- is the start state, and
- is the set of accept states.
Exactly one of the states is not .
Acceptance occurs as in PDAs. If a DPDA enters an accept state after it has read the last input symbol of an input string, it accepts that string. Rejection occurs if the DPDA doesn’t arrive at an accept state at the end, tries to pop an empty stack (hangs), or makes an endless sequence of -input moves (loops).
Definition: The language of a DPDA is called a deterministic context-free language
To reason more easily about DPDAs, we can first convert any DPDA into one that reads the entire input string.
Theorem: The class of DCFLs is closed under complementation
To see this, we modify a DPDA so that it hits at most one accept state per input symbol (by splitting reading and popping states). Then we invert the reading states.
This theorem implies that some CFLs are not DCFLs. If a CFL’s complement is not a CFL then that CFL is not a DCFL (think )
DCFLs are not closed under union, intersection, star, and reversal
As a useful tool, we use the fact that is a DCFL (with an endmark) is a DCFL
Deterministic pushdown automata are equivalent in power to deterministic context-free grammars, provided we restrict our attention to endmarked languages
Deterministic CFGs are defined in terms of reductions instead of “generations”. If and are strings of variables and terminals, we write to mean can be obtained from via a reduce step. means can be obtained from through a series of reduce steps.
In a CFG with start variable and string , we write a leftmost reduction of as . We say that determines its entire leftmost reduction, which implies the grammar is unambiguous. This is an important but not sufficient requirement.
and means is a handle of
Example: Consider
This is the language of
It should be clear that this is not a DCFL
But if we use then it is
The key idea is we need handles to be restricted so that a DPDA can identify them easily. As in, one must be able to find the handle without knowing the reduce step in advance.
We introduce the requirement that the reduce step is uniquely determined by the prefix of up through and including the reducing string of that reduce step. In other words, we say that is a forced handle if is the unique handle in every valid string
Definition: A deterministic context-free grammar is a context-free grammar such that every valid string has a forced handle
For simplicity, we say isn’t on the right-hand side of any rule, and that there are no unused variables
This is a good definition but doesn’t make it obvious how to determine whether a CFG is deterministic
The DK-test says that for any CFG we can construct an associated DFA to identify handles
Each state of the DFA represents our matching progress for each rule. We include transitions for each terminal and symbol. When we reach a symbol, we expand it by each rule.
This test fails,

This test passes (note the bottom right box should be ),

We have a DCFG if each accept state has,
- Exactly one completed rule
- No dotted rule in which a terminal symbol immediately follows the dot
Note our definition of a forced handle.
At each step, we run the current string through the machine to end up in a single accept state. If an accept state has multiple rules then we wouldn’t know which one to apply. If an accept state has a terminal symbol transition then we’d be able to move on to another rule, and the result would still be ambiguous. However, if we can transition out of the accept state with a symbol then that previous state is not actually a forced handle, so we safely move on.
Any DCFG can be converted to a DPDA, but only endmarked languages can be converted from a DPDA to a DCFG
Theorem: An endmarked language is generated by a DCFG it is a DCFL
To make a DPDA for a DCFG, we essentially simulate , keeping its states on the stack, and using that to iteratively reduce any input string. Once we reduce to the start state , we accept.
To do the other way around, is much more complicated and involves the -test
LR(K) Grammars
DCFGs are great but can be a bit inconvenient with their restrictions
grammars are a broader class of grammars, close to DCFGs, that can be directly converted into DPDAs
Definition: In an grammar, a handle is forced according to the symbols following the handle
- Left to right input processing
- Rightmost derivations
- symbols of lookahead
Therefore, a DCFG is the same as an grammar
We can convert any grammar to DPDA with a test
A test works quite similar to a test, but we keep track of the possible length strings that could follow a handle after reducing. This lets us track ambiguities in a procedural way.