After studying this language model and identifying its features, the thesis focuses on the following problems:
- Parsing. Parsing is indispensable when building a new syntactic representation model.
- Machine translation. The link grammar model captures many distinctive features of Vietnamese that must be carried over into the target language. The thesis therefore chooses Vietnamese-English translation, which exploits the model's ability to represent direct relationships between words.
CHAPTER 2
VIETNAMESE LINK GRAMMAR MODEL
2.1. Link grammar for Vietnamese
From the formal definition of a link grammar, it follows that the most important task in building a grammar is to map words to connectors.
If the elementary unit of parsing in some languages is the morpheme, in Vietnamese that unit is the word. According to the Social Science Committee [28], a Vietnamese word may consist of several morphemes. Word boundaries in the text are detected by an automatic word segmenter.
Vietnamese differs from other languages in several respects. Semantically, it has no morphological categories (such as gender, number or case); in sentence formation, grammatical relations are expressed by word order rather than by inflection [16]. The links of a link grammar are well suited to representing these relations.
Links appear when words are combined. According to Nguyen Tai Can [2], there are three main types of combination: coordination, clause and phrase. Coordinations and clauses are considered in the complex parsing steps and are covered in the next chapter. A phrase is a combination consisting of a head connected to its subordinate elements by a head-dependent relation [2]. Depending on the type of head, a phrase is a noun phrase, a verb phrase or an adjective phrase. The links are built from the structure of these phrases. In addition, some relations are not marked by any relational word. For example, in “mẹ tôi” (“my mother”) and “áo anh” (“your shirt”), two nouns stand side by side and the second noun indicates the owner of the first. This is one of many special phenomena of Vietnamese syntax. Representing such relations effectively supports a machine translation system whose source language is Vietnamese.
All linking cases are stored in the link grammar dictionary.
2.1.1. Link dictionary structure
The English link grammar dictionary was built by Sleator and Temperley [111]. In 2003, Szolovits added a set of medical terms [113]. From 2008 to 2011, the dictionary was updated by Linas Vepstas, who added clause relations; Mike Ross also added new entries, mainly for subordinate clauses with “than” and for links formed by “wh”-words [137].
The dictionary is divided into 12 large sections, seven of which cover categories of English words: nouns, determiners, pronouns, verbs, adjectives, adverbs and prepositions. The remaining sections are:
- Number formats.
- Words indicating time and place.
- Conjunctions, question words.
- Comparison words.
- Punctuation, other words.
To make the rules easy to store, [111] introduces a notation for writing link rules as formulas:
Link direction:
A “+” sign after a connector name means the connector links only to a word on the right.
A “-” sign after a connector name means the connector links only to a word on the left.
Operators:
&: both component links must occur together.
or: either component link, or both, may occur.
xor: exactly one of the two component links is chosen. This operator is added by the thesis to the Vietnamese parser to handle cases where only one of two ways of linking is allowed. For example, the word “beautiful” can form “very beautiful” or “wonderfully beautiful”, but not “very wonderfully beautiful”.
{C}: C may or may not appear.
@C: several connections of type C can occur at the same time. For example, in the phrase “the cute red hat”, the two adjectives “cute” and “red” both modify the noun “hat”.
Macros: a number of “macros” can be defined to make formulas shorter and easier to understand, for example a macro that defines a clause. In the formulas that follow, every occurrence of the macro name is replaced by the expression on the right-hand side of its definition.
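The notation above can be illustrated with a minimal Python sketch that expands a formula into the sets of connectors a word may use. This is not the thesis's parser: the class names, the `expand` function and the connector names `Dl` and `Dr` are hypothetical, chosen only for illustration. The operators follow the definitions given above (“or” allows either or both, “xor” allows exactly one), and `@C` is only recorded as a flag rather than enumerated.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Conn:
    """A connector such as A+ (links to the right) or A- (links to the left)."""
    name: str
    direction: str       # "+" or "-"
    multi: bool = False  # True renders as @C; repetitions are not enumerated here

    def __str__(self):
        return ("@" if self.multi else "") + self.name + self.direction


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Xor:
    left: object
    right: object


@dataclass(frozen=True)
class Opt:
    """{C}: the inner expression may or may not appear."""
    inner: object


def expand(expr):
    """Return the set of connector tuples that satisfy the formula."""
    if isinstance(expr, Conn):
        return {(expr,)}
    if isinstance(expr, Opt):
        return {()} | expand(expr.inner)
    if not isinstance(expr, (And, Or, Xor)):
        raise TypeError(f"unsupported formula node: {expr!r}")
    left, right = expand(expr.left), expand(expr.right)
    both = {l + r for l, r in product(left, right)}
    if isinstance(expr, And):
        return both                # both components together
    if isinstance(expr, Xor):
        return left | right        # exactly one component
    return left | right | both     # "or": either component, or both


# Hypothetical connectors for a degree modifier on the left or on the right.
left_degree = Conn("Dl", "-")
right_degree = Conn("Dr", "+")
formula = Xor(left_degree, right_degree)
print(sorted(" & ".join(str(c) for c in d) for d in expand(formula)))
# prints ['Dl-', 'Dr+']
```

Replacing `Xor` with `Or` in the last formula would add a third alternative that uses both connectors, which is exactly the combination the xor operator is meant to exclude.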
The Vietnamese link dictionary has the same structure as the English one: each formula is defined for words of the same type. According to [16], Vietnamese words are divided into the categories shown in Table 2.1 below:
Table 2.1. Types of Vietnamese words
No. | Type code | Type name |
1 | N | noun |
2 | V | verb |
3 | A | adjective |
4 | M | numeral |
5 | P | pronoun |
6 | R | adverb |
7 | E | preposition |
8 | C | conjunction |
9 | I | auxiliary word |
10 | O | interjection |
11 | D | determiner |
12 | Z | word-formation element (real, no, etc.) |
13 | X | unknown |
Each category is further divided into subcategories. Table 2.2 below lists the subcategories, which follow the hierarchy of [16] with a number of subcategories added to distinguish links as required by the machine translation system of the thesis.
Table 2.2. Vietnamese word subcategories
No. | Symbol | Type code | Subtype name |
1 | Np | N | proper noun |
2 | Nc | N | individual noun |
3 | Ng | N | collective noun |
4 | Na | N | abstract noun |
5 | Ns | N | classifier noun |
6 | Nu | N | unit noun |
7 | Nl | N | position noun |
8 | Vi | V | intransitive verb |
9 | Vt | V | transitive verb |
10 | Vs | V | stative verb |
11 | Vm | V | modal verb |
12 | Vr | V | relational verb |
13 | Ap | A | qualitative adjective |
14 | Ar | A | relational adjective |
15 | Ao | A | onomatopoeic adjective |
16 | Ai | A | image-depicting adjective |
17 | Mc | M | cardinal numeral |
18 | Mo | M | ordinal numeral |
19 | Pp | P | personal pronoun |
20 | Pd | P | demonstrative pronoun |
21 | Pq | P | quantity pronoun |
22 | Pi | P | interrogative pronoun |
23 | Rt | R | present-time adverb |
24 | Rp | R | past-time adverb |
25 | Rf | R | future-time adverb |
26 | Rl | R | adverb of degree |
27 | Rc | R | comparative adverb |
28 | Ra | R | affirmative adverb |
29 | Rn | R | negative adverb |
30 | Rs | R | adverb of range |
31 | Es | E | range preposition |
32 | Ep | E | position preposition |
33 | Eo | E | possessive preposition |
34 | Em | E | material preposition |
35 | Eg | E | purpose preposition |
36 | Cs | C | subordinating conjunction |
37 | Cc | C | coordinating conjunction |
38 | I | I | auxiliary word |
39 | O | O | interjection |
40 | Dp | D | quantity determiner |
41 | Dp | D | plural determiner |
42 | Ds | D | singular determiner |
43 | Z | Z | word-formation element (real, no, etc.) |
44 | X | X | unknown |
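In the two tables, the type code of every subcategory is the first letter of its symbol. A minimal Python sketch of that lookup follows, assuming the symbols are stored as plain strings; the names `CATEGORY_CODES`, `SUBCATEGORY_SYMBOLS`, `category_of` and `group_by_category` are hypothetical and are not taken from the thesis. Each distinct symbol is listed once.

```python
from collections import defaultdict

# Type codes from Table 2.1.
CATEGORY_CODES = set("NVAMPRECIODZX")

# Distinct subcategory symbols from Table 2.2, in table order.
SUBCATEGORY_SYMBOLS = (
    "Np", "Nc", "Ng", "Na", "Ns", "Nu", "Nl",
    "Vi", "Vt", "Vs", "Vm", "Vr",
    "Ap", "Ar", "Ao", "Ai",
    "Mc", "Mo",
    "Pp", "Pd", "Pq", "Pi",
    "Rt", "Rp", "Rf", "Rl", "Rc", "Ra", "Rn", "Rs",
    "Es", "Ep", "Eo", "Em", "Eg",
    "Cs", "Cc",
    "I", "O",
    "Dp", "Ds",
    "Z", "X",
)


def category_of(symbol):
    """Return the type code (first letter) of a subcategory symbol such as 'Vt'."""
    code = symbol[:1]
    if code not in CATEGORY_CODES:
        raise ValueError(f"unknown type code in symbol {symbol!r}")
    return code


def group_by_category(symbols=SUBCATEGORY_SYMBOLS):
    """Group subcategory symbols under their type code, keeping table order."""
    groups = defaultdict(list)
    for symbol in symbols:
        groups[category_of(symbol)].append(symbol)
    return dict(groups)


print(category_of("Vt"))         # prints V
print(group_by_category()["M"])  # prints ['Mc', 'Mo']
```

A dictionary organised this way could attach one formula to each subcategory symbol while still allowing rules to be written at the level of the whole category.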