Vietnamese linking grammar model - 8

After studying and drawing out the features of this language model, the thesis will focus on solving the following problems:

  • Parsing. This task is mandatory when building a new syntactic representation model.
  • Machine translation. The link grammar model captures many distinctive features of Vietnamese that must be transformed when translating into another language. The thesis therefore chooses Vietnamese–English translation, to exploit the model's ability to represent direct relationships between words.



2.1. Link grammar for Vietnamese

From the formal definition of a link grammar, it follows that the most important job when building a grammar is to map each word to its connectors.

In some languages the elementary unit of parsing is the morpheme; in Vietnamese that unit is the word. According to the document of the Social Science Committee [28], a Vietnamese word can consist of several morphemes. Word boundaries in the text are detected by an automatic word segmenter.

Vietnamese differs from many other languages: it has no morphological categories (such as gender, number or case), and in sentence formation grammatical relationships are expressed not by inflection but by word order [16]. The links of link grammar can represent these relationships well.

Links appear when words are combined. According to Nguyen Tai Can [2], there are three main types of combination: coordination, clause and phrase. Coordination and clauses are considered in the complex parsing steps and are covered in the next chapter. A phrase is a combination consisting of a head connected to its dependent elements by a head-dependent relation [2]. Depending on the type of head, a phrase is a noun phrase, a verb phrase or an adjective phrase. The links are built from the structure of these phrases. In addition, some relationships are not marked by any relation word. For example, in “mẹ tôi” (“my mother”) or “ao anh”, two nouns stand side by side and the second noun indicates the owner of the first. This is one of many special phenomena of Vietnamese syntax. Representing these relationships effectively supports a machine translation system whose source language is Vietnamese.

All linking cases are stored in the link grammar dictionary.

2.1.1. Link dictionary structure

The English link grammar dictionary was built by Sleator and Temperley [111]. In 2003, Szolovits added a set of medical words [113]. From 2008 to 2011, Linas Vepstas updated the dictionary, adding clause relations; Mike Ross also added new entries, mainly for subordinate clauses with the word “than” and for “wh”-form links [137].

The system is divided into 12 large sections with 7 categories for English words: nouns, determiners, pronouns, verbs, adjectives, adverbs and prepositions. Also included are the following items:

  • Number formats.
  • Words indicating time and place.
  • Conjunctions, question words.
  • Comparative words.
  • Punctuation, other words.

To make the dictionary easy to store, [111] introduces a notation for the formulas that express the linking rules:

Link direction:

  • A “+” sign after the connector name means the connector links only to a word on the right.
  • A “-” sign after the connector name means the connector links only to a word on the left.

Operators:

  • &: both component connectors must be satisfied at the same time.
  • or: either component connector, or both, may be satisfied.
  • xor: exactly one of the two component connectors is used. The thesis adds this operator to the Vietnamese parser for cases where only one of two ways of linking is allowed. For example, the word for “beautiful” can form “very beautiful” or “wonderfully beautiful”, but not both at once (“very wonderfully beautiful”).
  • {C}: C may or may not appear.
  • @C: several connections of type C can occur at the same time. For example, in the phrase “the cute red hat”, the two adjectives “cute” and “red” both modify the noun “hat”.
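The notation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the parser of the thesis: the class names, the `alternatives` and `can_link` helpers, and the connector names `RL-`, `XA+`, `A-` and `D-` are all hypothetical, the order of connectors inside `&` is ignored, and `or` is treated as inclusive, as described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Union


@dataclass(frozen=True)
class Conn:
    label: str           # connector name plus direction sign, e.g. "A-" or "D+"
    multi: bool = False  # True for "@C": the connector may be used several times


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Xor:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Opt:               # {C}: C may or may not appear
    inner: "Formula"


Formula = Union[Conn, And, Or, Xor, Opt]


def alternatives(f: Formula) -> Set[FrozenSet[str]]:
    """Return every set of connectors that satisfies the formula."""
    if isinstance(f, Conn):
        return {frozenset({("@" if f.multi else "") + f.label})}
    if isinstance(f, Opt):
        return {frozenset()} | alternatives(f.inner)
    left, right = alternatives(f.left), alternatives(f.right)
    both = {a | b for a in left for b in right}
    if isinstance(f, And):
        return both                  # both sides at the same time
    if isinstance(f, Or):
        return left | right | both   # either side, or both
    return left | right              # Xor: exactly one side


def can_link(left_word_conn: str, right_word_conn: str) -> bool:
    """A "+" connector on the left word meets a "-" connector of the same
    name on the right word (subscript matching is not modelled here)."""
    return (left_word_conn.endswith("+")
            and right_word_conn.endswith("-")
            and left_word_conn[:-1] == right_word_conn[:-1])


# Hypothetical formulas: "hat" takes any number of adjectives and a determiner
# on its left; "beautiful" takes a degree word on the left or an intensifier
# on the right, but not both.
hat = And(Opt(Conn("A-", multi=True)), Conn("D-"))
beautiful = Xor(Conn("RL-"), Conn("XA+"))
```

Replacing `Xor` with `Or` in the last formula would add a third alternative containing both connectors, which is exactly the combination the xor operator is meant to exclude.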

Macros: a formula can be given a name (a “macro”) to make other formulas shorter and easier to understand. For example, a macro that defines a clause:

: {({@COd-} & (C- or )) or ({@CO-} & (Wd- & {CC+})) or [Rn-]};

In subsequent formulas, every occurrence of the expression on the right-hand side can be replaced by the macro name.
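Macro substitution of this kind can be sketched as plain text replacement. The example assumes that macro names are written in angle brackets, as in the English link grammar dictionary files; the macro names `<adj>` and `<noun-left>`, the connector names and the `expand` helper are hypothetical and do not come from the thesis.

```python
import re
from typing import Dict

# Assumption: a macro name is any run of non-space characters in angle brackets.
MACRO = re.compile(r"<[^<>\s]+>")


def expand(formula: str, macros: Dict[str, str], depth: int = 0) -> str:
    """Replace every known macro name in a formula by its parenthesised body."""
    if depth > 20:
        raise ValueError("macro definitions appear to be recursive")

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name not in macros:
            return name  # unknown names are left untouched
        # Parentheses keep the macro body together as one sub-formula.
        return "(" + expand(macros[name], macros, depth + 1) + ")"

    return MACRO.sub(substitute, formula)


# Hypothetical macro table: one macro is defined in terms of another.
macros = {
    "<adj>": "{@A-}",
    "<noun-left>": "<adj> & D-",
}
expanded = expand("<noun-left> & S+", macros)
```

Here `expanded` holds the fully written-out formula, so a parser only ever needs to handle connectors and operators, never macro names.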

The Vietnamese link dictionary has the same structure as the English one: each formula is defined for a group of words of the same type. According to [16], Vietnamese words are divided into the categories shown in Table 2.1 below:

Table 2.1. Types of Vietnamese words

No.  Type code  Type name
4    M          numeral
9    I          auxiliary word
11   D          determiner
12   Z          word elements (real, no, etc.)

Word types are further divided into subcategories. Table 2.2 below lists the subcategories. They follow the hierarchy of [16], with some subcategories added so that the machine translation system of the thesis can distinguish links when translating.

Table 2.2. Vietnamese word subcategories

No.  Symbol  Type code  Subtype name
1    Np      N          proper noun
2    Nc      N          monosyllabic noun
3    Ng      N          collective noun
4    Na      N          abstract noun
5    Ns      N          noun of type
6    Nu      N          unit noun
7    Nl      N          position noun
8    Vi      V          intransitive verb
9    Vt      V          transitive verb
10   Vs      V          state verb
11   Vm      V          modal verb
12   Vr      V          relational verb
14   Ar      A          relational adjective
16   Ai      A          pictographic adjective
17   Mc      M          cardinal numeral
18   Mo      M          ordinal numeral
19   Pp      P          address pronoun
21   Pq      P          quantity pronoun
22   Pi      P          interrogative pronoun
23   Rt      R          present time adverb
24   Rp      R          past time adverb
25   Rf      R          future time adverb
26   Rl      R          adverb of degree
27   Rc      R          comparative adverb
28   Ra      R          affirmative adverb
29   Rn      R          negative adverb
30   Rs      R          adverb of range
31   Es      E          range preposition
32   Ep      E          position preposition
33   Eo      E          possessive preposition
34   Em      E          material preposition
35   Eg      E          purpose preposition
36   Cs      C          subordinating conjunction
37   Cc      C          coordinating conjunction
38   I       I          auxiliary word
40   Dp      D          quantity determiner
41   Dp      D          plural determiner
42   Ds      D          singular determiner
43   Z       Z          word elements (real, no, etc.)
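
In every row of Table 2.2 the type code is the first letter of the subcategory symbol, so a dictionary implementation can derive the type from the symbol. The sketch below shows this with a handful of rows copied from the table; the `Subtype` record, the `SUBTYPES` mapping and the `type_code` helper are hypothetical names chosen for illustration.

```python
from typing import Dict, NamedTuple


class Subtype(NamedTuple):
    type_code: str  # one-letter word type, e.g. "N" for nouns
    name: str       # subtype name as listed in Table 2.2


# A few rows of Table 2.2, keyed by subcategory symbol.
SUBTYPES: Dict[str, Subtype] = {
    "Np": Subtype("N", "proper noun"),
    "Vt": Subtype("V", "transitive verb"),
    "Rn": Subtype("R", "negative adverb"),
    "Eo": Subtype("E", "possessive preposition"),
    "I": Subtype("I", "auxiliary word"),
}


def type_code(symbol: str) -> str:
    """Derive the word type from a subcategory symbol (its first letter)."""
    if not symbol or not symbol[0].isupper():
        raise ValueError(f"not a subcategory symbol: {symbol!r}")
    return symbol[0]


# The derived code agrees with the code stored in every sample row.
consistent = all(type_code(sym) == sub.type_code for sym, sub in SUBTYPES.items())
```

Keeping the subcategory symbol as the lookup key lets formulas be attached at the finer level while still allowing a fallback to the coarse word type.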
Date published: 01/11/2021