Property 5 (Recursive Nature of Language)

1.3.2. Grammar and language type 1

A grammar G is called a type 1 grammar if its generation rules are of the form:    , with the condition   (N  T) + and     

Type 1 grammar is also known as Context Sensitive Grammar (CSG).

Languages generated by type 1 grammars are called type 1 languages and are also known as context-sensitive languages (CSLs).

1.3.3. Grammar and language type 2

A grammar G is called a type 2 grammar if its generation rules have the form:    ; with  N  (N  T) * .

Type 2 grammar is also known as Context free Grammar (CFG).

Languages generated by type 2 grammars are called type 2 languages and are also known as context-free languages (CFLs).

1.3.4. Grammar and language type 3

A grammar G is called a type 3 grammar if all its generation rules have one of two forms:

1. A  Bw or A  w (left linear grammar);

2. A  wB or A  w (right linear grammar). Where: A, B  N and w  T * .

Type 3 grammar is also called Regular Grammar (RG).

Languages generated by type 3 grammars are called type 3 languages and are also called regular languages (RL).

Attention:

From the above classification, we have the following observations: Type 3 grammar is also type 2 grammar; type 2 grammar is also type 1 grammar; type 1 grammar is also type 0 grammar. That is: G 3  G 2  G 1  G 0 .

From here we can deduce that the above statement is also true for languages. That is:

L 3  L 2  L 1  L 0 .

1.3.5. Example

1) Consider the grammar G = (N, T, P, S): a) - N = {S, A};

- T = {a, b};

- S = S;

- P = {S → aS; S → aA; A → bA; A → b}

This is a type 3 grammar (because the production set is right linear).

For example, a derivative of S has the form:

3 4

S  aS  aaS  aaaA  aaabA  aaabbA  aaabbbA  aaabbbb = ab

Or the grammar that generates the language L(G) = {ab | m, n ≥ 1}

b) - N = {S};

- T = {a, b};

- S = S;

- P: S → abS  a

is a right linear grammar. c) - N = {S, A, B};

- T = {a, b};

- S = S;

- P: S → Aab; A → Aab  B; B → a  

is left linear grammar.

2) Consider the grammar G = (N, T, P, S): N = {S};

T = {a, b};

P = { S → aSb; S → ab } This is a type 2 grammar (CFG).

For example, a derivative of S has the form:

4 4

S  aSb  aaSbb  aaaSbbb  aaaabbbb = ab

Or the grammar that generates the language L(G ) = {ab

| n ≥ 1}

3) Consider the grammar G = (N, T, P, S): N = {S, B, C};

T = {a, b, c};

P = { S → aSBC; S → aBC; CB → BC; aB →ab; bB → bb; bC → bc; cC → cc}. This is a type 1 grammar (CSG).

For example, a derivative of S has the form:

S  aSBC  aaBCBC  aabCBC  aabBCC  aabbCC  aabbcC

2 2 2

 aabbcc = abc

nnn

Or the grammar that generates the language L(G ) = {abc

| n > 0}.

4) Consider the grammar G = (N, T, P, S): N = {S, A};

T = {a, b, c};

S = S;

P: SA → a; S → bcA; acS → cA. This is a type 0 grammar.

1.4. Some properties of language

The nature of language is a set, so language has all the properties of a set. On the other hand, language is generated by grammar, so we need to consider what other properties language has besides the properties of sets. This section will present some properties of language generated by grammar.

1.4.1. Property 1

Type of closed language with union.

In other words, property 1 asserts that the union of two languages of the same type is a language of the same type.

Suppose given 2 grammars of the same type G 1 = (N 1 , T, P 1 , S 1 ), G 2 = (N 2 , T, P 2 , S 2 ), grammars G 1 , G 2 respectively generate languages L 1 , L 2 . Suppose L = L 1  L 2 we need to prove: L is the language generated by grammar G of the same type as grammars G 1 , G 2 . It is worth noting here that the two grammars must have the same alphabet T for the consideration to be meaningful.

We construct a grammar G = (N, T, P, S) that satisfies 2 conditions:

- G is of the same type as G 1 , G 2 ;

- G generates L such that L = L 1  L 2 .

To do so, we take N = N 1  N 2  S  (S does not belong to N 1 , N 2 ); P = P 1  P 2   S  S 1 , S  S 2  .

With the above construction of G, it is clear that G is of the same type as G 1 , G 2 . Now, we

need to prove: L = L(G) = L 1  L 2 .

Suppose w  L, by definition S *  w in G, from the construction of P, the first step in G can only be:

S  S 1 *  w or S  S 2 *  w; but from S 1 *  w or S 2 *  w we deduce that w  L 1  L 2 . Conversely, if w  L 1  L 2 , then w  L 1 or w  L 2 .

By definition it follows that S 1 *  w or S 2 *  w. If w  L 1 it means that

S 1 *  w . It follows that S  S 1 *  w ; that is, S *  w or w  L, and if w  L 2 it means that S 2 *  w. It follows that S  S 2 *  w , it follows that S *  w or w  L.

1.4.2 Property 2

Type of closed language with iteration.

Property 2 asserts that the iteration of a language is a homology language. That is, if M = LL…L = L i then M is a homology language of L. We can check the correctness of the above properties.

1.4.3. Property 3

Given G = (N, T, P, S) is not a type 0 grammar, with initial symbol S on the right side of the generating rule, then there exists a grammar G‟ of the same type and equivalent to G without initial symbol on the right side of the generating rule.

Prove:

Suppose G = (N, T, P, S) is a type 1, or type 2, or type 3 grammar, we construct G'= (N', T, P', S') as follows:

N‟= N   S‟  ; here S‟ is a new symbol not yet present in N and T. P‟ = P   S‟   if S    P,   (N  T) + 

In other words, P‟ includes all the generation rules of G with additional generation rules of the form S‟   if P has a generation rule of the form S   .

With the above construction it is clear that in G‟ there is no rule that the starting symbol S‟ appears on the right-hand side. Now, we only need to prove that G and G‟ are equivalent, that is, L(G) = L(G‟).

Suppose w  L(G) then by definition we have the derivative S *  w, separating the first step of this derivative we have S   *  w, with   (N  T) + . According to the construction of G‟ because in G there is the rule S   so in G‟ there is the rule S‟   . Then, the derivative S‟   *  w is in G‟ because the derivative  *  w in G is also a derivative in G‟.

Hence w  L(G‟) or L(G)  L(G‟).

Conversely, suppose w  L(G‟) then by definition we have S‟ *  w. By similar reasoning as above we have L(G)  L(G‟). Hence L(G) = L(G‟).

Attention:

Property 3 allows us to consider grammars of types 1, 2, 3 as having no starting symbol on the right-hand side of the production rules. Therefore, if the empty word   L(G) then there must be a rule S   in P.

1.4.4. Property 4

Let G = (N, T, P, S) be not a type 0 grammar, then L(G)  or L(G)   is a language of the same type as L(G).

Prove:

Property 4 is easily proved by property 3 because if the grammar is not a type 0 grammar generated from the empty, then the set of generating rules P contains the rule S   . Therefore, the language set L(G)-  or L(G)   corresponds to removing from or adding to P the generating rule S   and this clearly does not change the type of the language.

1.4.5. Property 5 (Recursive nature of language)

We say that a grammar G is recursive if there exists an algorithm to determine whether a given word w is in L(G) or not? Before talking about the recursive nature of languages, we have the following observations:

- Suppose G = (N, T, P, S) is a context-sensitive grammar, the empty string   L(G) if and only if the set P has the rule S   . Thus, we need to check whether the rule S   exists in P or not? By removing the rule S   from P, we have a new context-sensitive grammar:

G‟ = (N, T, P‟, S)

Grammar G‟ generates L(G)  , every derivative in G‟ satisfies the condition of context-sensitive grammar. That is, in every generating rule of G‟, the length of the right-hand side is greater than or equal to the left-hand side.

Suppose V = N  T,  V  = n and w is a non-empty string in L(G‟). We deduce that S *  w. Suppose the derivation has the form:

S  1  2 …  m (a) Here  m = w. From the above observation we deduce:

 1  2  …  m 

Suppose that there exists i such that the strings  i ,  i+j in the above derivation sequence have the same length and are equal to p. Suppose that j > n p then it follows that in the derivation sequence (a) there are at least two identical strings, therefore it follows that in V * there are at most n p strings of length p in the opposite case we can skip at least one step in the above derivation sequence. Suppose  r =  s with r < s then the derivation (a) can be rewritten more briefly as:

S  i …  r  s+1 …  m = w.

This intuitively allows us to observe that if there is a derivative of w, it will have a derivative that is not too long. This observation is the basis for the following theorem:

Theorem: Let G = (N, T, P, S) be a context-sensitive grammar then G is a recursive grammar.

Prove:

From the above observation, suppose L(G) does not contain the empty word, then we have the right to assume that P does not contain the rule S   and w belongs to L(G), w  T + and  w  = n. We need to show the algorithm to determine whether w belongs to L(G) or not?

We define the set T m as follows:

T m =   V + ;   n ; S *   has no more than m steps  Clearly T 0 =  S  ;

We can also easily see that we can find T m through T m-1 T m = T m-1     ;  T m-1;    n  (b)

If S *   and   n then  must belong to some T m , otherwise if  cannot be derived or  > n then  will not belong to any T m .

From the above construction of T m, we can deduce that T m is a non-decreasing sequence T m-1  T m for all m  1; furthermore, this sequence is bounded so there exists an m such that T m = T m-1 = T‟. If w does not belong to T m then w does not belong to L(G) and of course if w belongs to T m then S *  w.

Suppose V = N  T and  V  = q, then in V + the number of strings with length less than or

equal to n is s = q + q 2 + q 3 +……+ q n . We have s < (1+q) n+1 , clearly the number of these words is finite in T‟. Therefore, we can compare w with the words in T‟. That is, there exists an algorithm to determine whether w is in L(G) or not?.

For example:

Give

G = (N, T, P, S):

N=  S, B, C  ;

T=  a, b, c  ;

S  aSBC; S  aBC;

CB  BC; aB  ab;

BB  bb; bC  bc;

cC  cc;

Maybe you are interested!

Considering from w = abac in L(G); according to the construction of T m in the above theorem we have: T 0 =  S  ;

T 1 =  S, aSBC, aBC  ;

T 2 =  S, aSBC, aBC, abc  ; T 3 =  S, aSBC, aBC, abc  ; T 3 = T 2 = T'

Since w = abac does not belong to T‟, w does not belong to L(G).

1.5. Automat

1.5.1. Automated description

In addition to grammar, people also use another means to represent language; that is automaton. An automaton is understood as an abstract “machine” with a very simple structure and operation but capable of predicting language. With any string, after a number of steps, the automaton will give the answer whether that string belongs to the language or not. To have such an automatic process, people often have to pre-program it with an “execution route” and the machines only need to operate according to this route. One of these typical and most powerful automatons can be said to be

digital computers today. Although they operate in a “machine” manner, each step of an automaton is essentially a symbolic substitution, that is, a derivative step as mentioned above. In general, an automaton model usually includes the following main components:

a 1

a 2

a 3

a 4

…

a i

…

a n

Get in

Reader

Controller

Figure 1. General model for an automaton

The input string to be determined will be stored on the input tape. At each point in time, corresponding to the current state, the machine reads a character on the input tape, possibly combined with examining the corresponding symbol in memory, the automaton's controller will decide the next step to the next state.

The types of automatons corresponding to some grammar classes will be introduced in turn in the following chapters.

1.5.2. Classification of automation

Based on the operation of the automaton, it can be divided into two types: deterministic automaton and non-deterministic automaton.

Deterministic Automata: An automaton in which at each step, the controller has only one possibility to choose the next state when reading a character on the input tape. This uniqueness represents determinism, meaning that the transfer function of this type of automaton is always single-valued.

Non-deterministic Automata: An automaton in which at each step the controller has some choice of which state to move to next when reading a character on the input tape. This choice is non-deterministic, meaning that the transfer function of this type of automaton is multi-valued.

Comment