Non-Context Grammar and Pushdown Automata

2.35. Let NFA  be represented graphically as follows:

Start

Maybe you are interested!

Assessment of surface water resources in Dong Nai river basin to serve sustainable development goals in the context of climate change - 2
Directions for Promoting Traditional Ethical Values for Building a New Lifestyle for Students in the Current Context of Globalization
Managing sex education activities for students at Nong Trang secondary school, Viet Tri city, Phu Tho province in the current context - 19
Vietnam - Japan economic and trade relations in the international context. Current situation and solutions - 1
Managing moral education for students in secondary schools in Hung Yen city, Hung Yen province in the context of the 4.0 Industrial Revolution - 15



1 1 

1) Construct a linear grammar that must be equivalent to the above NFA  .

2) Construct a left linear grammar equivalent to the above NFA  .

3) Find the regular expression equivalent to the above NFA  .

2.36. Given the regular expressions: a) 0 (1 + 2) 3 *

b) 0 (1+ 2 + 3) +

c) (0 + 1) * + (23) +

1) Construct NFAs  that recognize the languages represented by each of the above expressions.

2) Construct a regular grammar for the languages represented by each expression above.

2.37. Given a right linear grammar G: A → 0A| 0B| 1C | ε ;

B → 0C| 1A ;

C → 0B| 1C| 0 | 1.

1) Identify the components of G.

2) Construct a finite automaton M: L(G) = L(M).

3) Represent M as a graph.

4) Write a regular expression to represent L(G).

2.38. Given a regular grammar G: S → aA | bB| aC | b| c;

A → aA | b;

B → aC | aA | a | b | ε;

C → bC| bB | a | b.

1) Identify the components of G.

2) Construct a finite automaton M: L(G) = L(M).

3) Represent M in tabular and graphical form.

4) Write a regular expression to represent L(G).

2.39. Given a regular grammar G: S → Aa | B| Ca | b|  ;

B → Cb | Ah | a | b; C → Cb| B | b.

1) Identify the components of G.

2) Construct a finite automaton M: L(G) = L(M).

3) Represent M as a graph.

4) Write a regular expression to represent L(G).

2.40. Given a left linear grammar G: A → A0| B| C1 | ε ;

B → C0| A1 ;

C → B0| C1| 0 | A.

1) Identify the components of G.

2) Construct a finite automaton M: L(G) = L(M).

3) Represent M as a graph.

4) Write a regular expression to represent L(G).

2.41. Prove that the language L = {0 1 | n is a positive integer} is not regular.

2.42. Which of the following languages is not a regular language?. Prove.

a) L = {0 2n | n is a positive integer}.

b) L = {0 n 1 m 0 n+m | m, n is a positive integer}.

c) L = {0 n | n is a prime number}.

Chapter 3. Context-Free Grammars and Pushdown Automata

Chapter 3. CONTEXT-FREE GRAMMAR AND PUSH DOWN AUTOMATICS

(Contexet free Grammar - CFG and push down Automata - PDA)

Target:

To enable students to:

- Understand the concept and identify the components of a CFG.

- Identify the class of context-free languages (CFL) generated by CFG grammar and the properties of CFL.

- Construct the components of a CFG that specify a CFL class.

- Understand and build derivation and derivation trees.

- CFG can be shortened and standardized.

- Understand the concept and identify the components of a PDA

- Build components of PDA to recognize language generated by CFG and build CFG to generate language recognized by PDA. Main content:

- Context-free grammar: definition, derivation and languages and derivation trees.

- Context-free grammar shortening.

- Standardize context-free grammar.

- Properties of CFL.

- Push-down automaton: definition, language recognized by PDA.

- Relationship between PDA and CFG.

3.1. Context Free Grammar (CFG)

The origin of context-free grammar is description through natural languages. We can write the syntactic rules to express the sentence “An is a good student” as follows:

< simple sentence > → < subject > < predicate >

< subject > → < noun >

< predicate > → < verb > < complement >

< complement > → < noun > < adjective >

<noun> → An

<noun> → student

< verb > → is

< adjective > → good

The words in the pair of < > brackets such as <simple sentence>, <subject>, <predicate>, ... are syntactic categories, giving us the roles of the sentence components. We see a sentence being generated through gradual steps of development according to syntactic rules. This is also the form of the production rules in context-free grammar. Thus, context-free grammar can also be chosen as a model for the grammars of natural languages.

However, in computer science, with the need to represent programming languages, the context-free grammar CFG is also designed into an equivalent form called the BNF grammar ( Back-Nautical Form ) . This is also the CFG grammar with minor changes in form and some abbreviations that computer scientists often apply in expressing the syntax of high-level programming languages (such as ALGOL, PASCAL, ...). In the form of the BNF grammar, the symbol::= is used instead of the symbol →. For example, to define an arithmetic expression (expression) including identifiers (identifiers) participating in the operations +, * or sub-expressions nested in parentheses, we write:

<expression>::= ( <expression> )

The study of context-free grammars has provided a solid theoretical basis for the representation of programming languages, the search for parsing algorithms used in compilers, and many other string processing applications. For example, it is very useful in describing arithmetic expressions with nested parentheses or block structures in structured programming languages that cannot be specified by regular expressions.

A context-free grammar is a finite set of variables ( also called non-terminated symbols ), each representing a language. The language is represented by

Variables are described recursively in terms of another concept called the terminating symbol . The rules that relate variables to terminating symbols are called production rules. Each production rule has the form a variable on the left-hand side that produces a string that may contain both variables and terminating symbols in the grammar.

3.1.1. Definition

A context-free grammar (CFG) is a system of four components, denoted by G = (N, T, P, S); where:

- N is a finite set of non-terminating characters (or variables);

- T is a finite set of terminal characters; N  T = ∅ ;

- P is a finite set of productions where each production has the form A → α with A  N and

α  (N  T) * ;

- S is a special non-terminal character called the grammar start character.

For example: Context-free grammar G = (N, T, P, S); where: N = {S, A, B};

T = {a, b}; S = S;

P includes the following production rules:

S → AB;

A → aA;

A → a;

B → bB;

B → b.

Symbol convention:

- The uppercase letters A, B, C, D, E, ... start the Latin alphabet and S stands for non-ending characters (S is often used as the starting character).

- Small letters a, b, c, d, e, ..., numbers and some other characters represent ending characters.

- The uppercase letters X, Y, Z represent characters that may or may not end.

- The Greek letters α, β, γ, ... denote strings consisting of terminating and non-terminating characters.

We will represent the grammar in a concise way by just listing its productions. If A → α 1 ; A → α 2 ; ... ; A → α k are productions of the variable A in a certain grammar, we will write it briefly as A → α 1 | α 2 | ... | α k

For example: The grammar given in the above example can be abbreviated as:

S → AB;

A → aA | a;

B → bB | b‟

3.1.2. Derivatives and languages

1) Derivative

Given the CFG grammar: G = (N, T, P, S), suppose the production rule A → β  P and the string

αAγ  (N  T) * with α , β, γ  (N  T) * . If we replace the non-terminal character A in the string αAγ with the string β to obtain the string αβγ or apply the production rule A → β to the string αAγ to obtain the string αβγ; then, we say that the string αAγ is directly derived from the string αβγ or the string αβγ is directly derived from the string αAγ in the grammar G and is denoted as αAγ  G αβγ.

Suppose α 1 , α 2 , ..., α m are strings belonging to (N  T) * with m ≥ 1 and α 1  G α 2 ; α 2  G α 3 ; …; α m -1  G α m

then we denote it as α 1 *  G α m and say that α 1 derives (zero, one or more steps) into α m or that α m is derived from α 1 in the grammar G.

If α 1 is derived into α m in i steps, then we denote it as α 1 i  G α m .

If α 1 is derived into α m by one or more derivation steps, then we denote it as α 1 +  G α m .

Note that α *  G α for every string α. and if α *  G  ; *  G  then α *  G  with

every string α,  ,  .

Normally, if there is no confusion, we will use the symbols  , *  , + 

i  replaces the corresponding symbols  , *  , +  , i  .

GGGG

2) Language generated by context-free grammar

Given CFG grammar: G = (N, T, P, S) , Language generated by non-linguistic grammar

scene G is L(G) = {w | w  T

and S * 

w} and is called a context-free language.

That is, a string is in L(G) if:

1. String consisting entirely of terminating characters.

2. The string is derived from the starting character S.

A string α consisting of terminal and non-terminal characters is called a

A sentence is generated from grammar G if S *  α .

Two context-free grammars G , G are called equivalent if L(G ) = L(G ).

1 2 1 2

For example: 1. Consider the CLG grammar: G = (N, T, P, S), in which:

N = {S}; T = {a, b}; P = {S → aSb, S → ab}.

By applying the first generation rule n-1 times and the second generation rule 1 time, we have: S  aSb  aaSbb  a 3 Sb 3  ...  a n-1 b n-1  a n b n

So, L(G) contains strings of the form a n b n with n ≥ 1; or L(G) = {a n b n | n ≥ 1}.

2. Consider the CLG grammar: G' = (N', T, P', S), in which:

N' = {S, A}; T = {a, b}; P‟ = {S → aAb; A → aAb; A →  }.

By applying the first generation rule once, the second generation rule n -1 times and the third generation rule 1 time, we have:

S  aAb  aaAbb  a 3 Ab 3  ...  a n-1 Ab n-1  a n Ab n  a n b n

So, L(G‟) contains strings of the form a n b n with n ≥ 1; or L(G‟) = {a n b n | n ≥ 1}. And we have L(G) =L(G‟) so the two context-free grammars above are equivalent.

3.1.3. Derivative tree

To visualize the generation of strings in a context-free grammar, we often represent a derivation sequence as a tree. Formally, we define a derivation tree as follows:

1) Definition

Given a CFG grammar: G = (N, T, P, S). The derivation tree (or parse tree) of G is defined as follows:

1. Each node has a label, which is a character  (N  T  {ε}).

2. The root node is labeled with the starting letter S.

3. If the internal (intermediate) node has label A then A  N.

4. If node n has label A and nodes n , n , ..., n are children of n in order from left to right

1 2k

to the right labeled X , X , ..., X then A → XX ... X is a production rule

1 2k 1 2k

in the P. law set

5. If node n has the label empty word ε then n must be a leaf node and the only child of its parent.

2) For example:

a) Given grammar G = ({S, A}, {a, b}, P, S), in which P consists of: S → aAS | a;

A → SbA | SS | ba.

A derivation tree from a grammar looks like this:

a S

S b A

aba

Figure 3.1. Tree derived from grammar

We see that node 1 has label S and its children are a, A, S (note that S → aAS is a production). Similarly, node 3 has label A and its children are S, b, A (from production A → SbA). Nodes 4, 5 have the same label S and have child node labeled a (production S → a). Finally, node 7 has label A and has child nodes b, a (production A → ba).

On the derivation tree, if we read the leaves in order from “left to right”, we have a sentence in G. We call this string the string generated by the derivation tree.

Comment