A tree-adjoining grammar for Vietnamese was also built in [22] by extraction from the Vietnamese treebank. In terms of expressive power, tree-adjoining grammars can represent some context-sensitive languages. This approach is effective when the Vietnamese treebank is large enough.
1.2. Approach through feature structures and unification grammar
Unification grammar is built on the operation of unifying feature structures. A feature structure is represented as an attribute-value matrix (AVM) of the form:
Feature 1 Value 1
Feature 2 Value 2
… …
Feature n Value n
For example, a feature structure for an English noun describes its features, Category: noun, Number: singular, Person: 3, as follows:
CAT NP
NUMBER SG
PERSON 3
A feature structure is defined as a mapping F → VF, where F is the set of features and VF is the set of values that can be assigned to the features.
The example above is a feature structure over the feature set F = { CAT, NUMBER, PERSON } with the value set VF = { NP, SG, 3 }.
An augmented grammar contains rules of the form A → X1…Xn, where A is the name of the parent feature structure and X1, …, Xn are the child feature structures.
Rules in an augmented grammar are written as feature structures containing variables, so that one rule can apply in many different situations. For example, the augmented rule for a simple noun phrase:
(NP NUMBER ?n) → (ART NUMBER ?n) (N NUMBER ?n )
expresses number agreement between the article and the noun.
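The way a variable such as ?n enforces agreement can be sketched in a few lines of Python. This is only an illustrative sketch, not the formalism's reference implementation: the helper names (is_var, match, apply_rule) and the lexical entries for "a", "book" and "books" are assumptions introduced here, and feature structures are simplified to flat dictionaries.

```python
def is_var(value):
    """Variables are written as strings starting with '?', e.g. '?n'."""
    return isinstance(value, str) and value.startswith("?")

def match(pattern, fs, bindings):
    """Match one child pattern against a feature structure.

    Returns the extended variable bindings, or None if the match fails.
    """
    new = dict(bindings)
    for feature, wanted in pattern.items():
        if feature not in fs:
            return None
        actual = fs[feature]
        if is_var(wanted):
            # A variable must take the same value everywhere in the rule.
            if new.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    return new

def apply_rule(rule, constituents):
    """Build the parent feature structure, or return None if the rule does not apply."""
    parent, children = rule
    if len(children) != len(constituents):
        return None
    bindings = {}
    for pattern, fs in zip(children, constituents):
        bindings = match(pattern, fs, bindings)
        if bindings is None:
            return None
    return {f: bindings.get(v, v) if is_var(v) else v for f, v in parent.items()}

# (NP NUMBER ?n) -> (ART NUMBER ?n) (N NUMBER ?n)
NP_RULE = (
    {"CAT": "NP", "NUMBER": "?n"},
    [{"CAT": "ART", "NUMBER": "?n"}, {"CAT": "N", "NUMBER": "?n"}],
)

# Hypothetical lexical entries, used only for illustration.
a = {"CAT": "ART", "NUMBER": "SG"}
book = {"CAT": "N", "NUMBER": "SG"}
books = {"CAT": "N", "NUMBER": "PL"}

print(apply_rule(NP_RULE, [a, book]))   # {'CAT': 'NP', 'NUMBER': 'SG'}
print(apply_rule(NP_RULE, [a, books]))  # None
```

Because ?n is bound once and then compared, "a book" yields a singular NP while "a books" is rejected.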
If feature structures are represented as graphs, then the graphs of several feature structures can be merged into one larger graph. This merging operation, unification, is the main component of unification grammar.
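A minimal sketch of this merging operation, assuming feature structures are modelled as nested Python dictionaries with atomic values at the leaves. The function name unify is an assumption of this sketch, and shared substructures (reentrancy), which full unification grammars support, are deliberately left out.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only when they are equal.
    return a if a == b else None

noun = {"CAT": "NP", "NUMBER": "SG"}
agreement = {"NUMBER": "SG", "PERSON": 3}

print(unify(noun, agreement))                      # {'CAT': 'NP', 'NUMBER': 'SG', 'PERSON': 3}
print(unify({"NUMBER": "SG"}, {"NUMBER": "PL"}))   # None
```

Unification succeeds when the two structures carry compatible information and fails as soon as one feature receives two different values.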
Unification grammar can represent type-0 languages, the largest language class in Chomsky’s hierarchy [63]. According to the group of Tran Ngoc Tuan [26], unification grammar can account for some phenomena in Vietnamese, such as the way certain words combine. Words can combine only when their feature structures can be unified. For example, the word “book” with the feature SHAPE: square/thin combines only with objects that have the same SHAPE feature description, such as “book”. However, describing most phenomena of Vietnamese grammar in enough detail to build a concrete analyzer is too complicated. The authors of [26] deal only with a subset of Vietnamese nouns.
1.3. Dependency approach
1.3.1. Some concepts
Dependency grammar traces its origins to the ancient Indian grammarian Pāṇini; the modern model was introduced by Lucien Tesnière [75]. The study of dependency grammar has flourished for Slavic languages [92] and Turkish because of their free word order.
The central notion in the dependency grammar model is an asymmetric relation called a dependency. A dependency holds between a dependent word and the word on which it depends, which is called the head word.
Dependency grammar uses two alphabets: the terminal symbol set and the auxiliary symbol set.
Each element of the terminal symbol set is a smallest syntactic unit (atomic unit), e.g. a morpheme (in morphologically inflected languages), a syllable, or a word. An utterance is treated as a sequence of elements of the terminal symbol set.
The auxiliary symbol set is the set of names of the occurrence types (categories) of the terminal symbols. Auxiliary symbols must not be ambiguous; each symbol has fixed syntactic properties.
There are different models of dependency grammars. The first model was formally described by Hays [62] and Gaifman [57].
Definition 1.3. [57]
A dependency grammar is a 4-tuple DG = ( L, C, F, R ), where
L: the terminal alphabet.
C: the auxiliary alphabet.
F: L → C, the assignment function.
R: the set of dependency rules, each taking one of the following three forms:
- Xi(Xj1, Xj2, …, *, …, Xjn), where Xi is the head word, Xj1, Xj2, …, Xjn are its dependents, and n is the number of dependents. The order of the symbols in the rule is the order in which the words appear in the sentence (other words may intervene between the words mentioned in the rule). The * marks the position of the head word relative to its dependents in the utterance.
- Xi(*), indicating that a terminal of category Xi can appear without dependents.
- *(Xi), indicating that a unit of category Xi can occur without a head word. Such a unit is the head of the utterance in which it appears.
Example:
Grammar DG = ( L, C, F, R )
L = { John, loves, a, woman }
C = { N, V, Det }
F: John → N, woman → N, loves → V, a → Det
R consists of the rules:
- *(V)
- V(N, *, N)
- N(Det, *)
- N(*)
- Det(*)
Usually, an artificial ROOT word is added so that units without a head, such as V, can be handled uniformly. The sentence “John loves a woman” can be represented as a tree, as shown in Figure 1.4 below:
Figure 1.4. Analysis of the sentence “John loves a woman” in the dependency grammar model
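The example grammar can be written down directly as data, which makes it easy to check whether a proposed analysis is licensed by the rules. The sketch below is illustrative only: the encoding of a rule as (head category, left dependents, right dependents), the function name licensed, and the head assignment used for “John loves a woman” are assumptions of this sketch. It checks rule licensing only and does not test for cycles.

```python
# Assignment function F: terminal -> auxiliary symbol (category).
F = {"John": "N", "woman": "N", "loves": "V", "a": "Det"}

# Rules X(left dependents, *, right dependents), encoded as
# (head category, categories left of *, categories right of *).
R = {
    ("V", ("N",), ("N",)),   # V(N, *, N)
    ("N", ("Det",), ()),     # N(Det, *)
    ("N", (), ()),           # N(*)
    ("Det", (), ()),         # Det(*)
}
ROOT_CATEGORIES = {"V"}      # *(V)

def licensed(words, heads):
    """heads[i] is the index of the head of words[i], or None for the root."""
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1 or F[words[roots[0]]] not in ROOT_CATEGORIES:
        return False
    for i, word in enumerate(words):
        deps = [j for j, h in enumerate(heads) if h == i]  # in sentence order
        left = tuple(F[words[j]] for j in deps if j < i)
        right = tuple(F[words[j]] for j in deps if j > i)
        if (F[word], left, right) not in R:
            return False
    return True

words = ["John", "loves", "a", "woman"]
# Assumed analysis: John <- loves -> woman -> a
print(licensed(words, [1, None, 3, 1]))  # True
print(licensed(words, [1, None, 1, 1]))  # False: no rule V(N, *, Det, N)
```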
With regard to dependency grammars, there are several important concepts and properties, which are discussed below.
The following definitions are taken from [75].
Definition 1.4.
A sentence is a sequence of tokens (words) denoted by S = w0w1…wn.
For simplicity, assume that w1, …, wn are pairwise distinct tokens; for example, in the sentence “Mary saw John and Fred saw Susan”, the two occurrences of the word “saw” are treated as distinct.
Definition 1.5.
Suppose R = { r1, …, rm } is a finite set of possible dependency relation types that can hold between two words in a sentence. A relation type r ∈ R is called an arc label.
Definition 1.6.
A dependency graph G = (V, A) is a directed graph consisting of a vertex set V and an arc set A such that, for the sentence S = w0w1…wn and the label set R, the following statements hold:
- V ⊆ { w0, w1, … wn }.
- A ⊆ V× R × V.
- If (wi, r, wj) ∈ A, then (wi, r’, wj) ∉ A for all r’ ≠ r.
Example: The dependency graph of the sentence “Economic news had little effect on financial market” is shown in Figure 1.5.
Figure 1.5. Dependency graph of the sentence “Economic news had little effect on financial market”
G = (V, A)
V = VS = { ROOT, Economic, news, had, little, effect, on, financial, market, . }
A = { (ROOT, PRED, had), (had, SBJ, news), (had, OBJ, effect), (had, PU,.), (news, ATT, Economic), (effect, ATT, little), (effect, ATT, on), (on, PC, market), (market, ATT, financial) }
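The three conditions of Definition 1.6 can be checked mechanically. The following sketch does so for the example graph; the function name is_dependency_graph is an assumption of this sketch, the label set is read off the arcs listed above, and the punctuation mark "." is treated as a node because the arc (had, PU, .) refers to it.

```python
def is_dependency_graph(V, A, sentence_nodes, labels):
    """Check the three conditions of Definition 1.6."""
    V = set(V)
    if not V <= set(sentence_nodes):          # V is a subset of { w0, ..., wn }
        return False
    seen = {}
    for wi, r, wj in A:
        if wi not in V or wj not in V or r not in labels:   # A is a subset of V x R x V
            return False
        if seen.setdefault((wi, wj), r) != r:  # at most one label per ordered pair
            return False
    return True

sentence_nodes = ["ROOT", "Economic", "news", "had", "little",
                  "effect", "on", "financial", "market", "."]
labels = {"PRED", "SBJ", "OBJ", "PU", "ATT", "PC"}
A = {
    ("ROOT", "PRED", "had"), ("had", "SBJ", "news"), ("had", "OBJ", "effect"),
    ("had", "PU", "."), ("news", "ATT", "Economic"), ("effect", "ATT", "little"),
    ("effect", "ATT", "on"), ("on", "PC", "market"), ("market", "ATT", "financial"),
}

print(is_dependency_graph(sentence_nodes, A, sentence_nodes, labels))  # True
# Adding a second label between the same pair of words violates the third condition.
print(is_dependency_graph(sentence_nodes, A | {("had", "OBJ", "news")},
                          sentence_nodes, labels))                     # False
```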
The definition of a dependency (wi, r, wj) is not unique; it varies across linguistic theories.
Definition 1.7.
A well-formed dependency graph G = (V, A) for a sentence S and a set of dependency types R is a directed tree that originates from node w0 and has the spanning node set V = VS, where VS = { w0, w1, …, wn } is the set of all nodes of the sentence. We call such a dependency graph a dependency tree.
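Definition 1.7 can likewise be turned into a small check. The sketch below is one possible reading, assuming that "tree-shaped" means every node other than w0 has exactly one head and reaches w0 by following head links; the function name is_dependency_tree is hypothetical, and the data are the arcs of the example in Figure 1.5, with "." treated as a node.

```python
def is_dependency_tree(V, A, sentence_nodes, root):
    """Check that (V, A) is a spanning directed tree rooted at `root`."""
    V = set(V)
    if V != set(sentence_nodes):              # spanning node set: V = VS
        return False
    heads = {}
    for wi, _label, wj in A:
        if wi not in V or wj not in V:
            return False
        if wj in heads:                       # a node may have only one head
            return False
        heads[wj] = wi
    if root in heads or set(heads) != V - {root}:
        return False                          # root has no head, all others have one
    for node in V:                            # every node must reach the root
        visited = set()
        while node != root:
            if node in visited:               # a cycle never reaches the root
                return False
            visited.add(node)
            node = heads[node]
    return True

VS = ["ROOT", "Economic", "news", "had", "little",
      "effect", "on", "financial", "market", "."]
A = {
    ("ROOT", "PRED", "had"), ("had", "SBJ", "news"), ("had", "OBJ", "effect"),
    ("had", "PU", "."), ("news", "ATT", "Economic"), ("effect", "ATT", "little"),
    ("effect", "ATT", "on"), ("on", "PC", "market"), ("market", "ATT", "financial"),
}

print(is_dependency_tree(VS, A, VS, "ROOT"))                             # True
print(is_dependency_tree(VS, A - {("had", "PU", ".")}, VS, "ROOT"))      # False
```

Removing the arc to "." leaves a node without a head, so the graph is no longer a dependency tree even though it still satisfies Definition 1.6.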