The important role of the dependency model is clear. However, the dependency grammar model faces linguistic difficulties. According to Nguyen Tai Can [2], the dependency between elements in Vietnamese sentences is much debated: for example, some elements play a minor role syntactically but a major role grammatically or lexically. There are also many different views on which element plays the central role in noun and verb phrases. Therefore, although dependency grammar is mentioned in some documents such as [6], no work on it has been published in Vietnamese linguistics. A graph-based dependency parser exists for Vietnamese [17], but it is difficult to develop further because there is no complete dependency grammar system behind it. Seeking a model that follows the dependency approach but is fully lexicalized, the thesis takes as its topic the link grammar model.
Link grammar is a model proposed by D. Sleator and D. Temperley [111]. It allows each word to form links with words to its left or right, subject to the requirements of planarity, connectivity, satisfaction, ordering, and exclusion. Link grammar follows the dependency approach, as the following points show:
- A link analysis (linkage) contains no nonterminal symbols, so its structure is even simpler than a dependency tree. A linkage can be thought of as a linear list in which each node has no more than 3 relationships with other nodes. A bank of linkages is therefore simpler than a treebank. Many databases have been built from large linkage banks, such as multimedia data banks [128]. Link parsing is commonly used in other applications such as information extraction [84], [106], [110], machine translation [35], and question answering [95], [105]. Parsers based on the link grammar model have been built for many languages, including English [111], Russian [132], German [76], and Turkish [68].
- Link grammars can also directly represent relationships between words that are not necessarily adjacent. Link grammar therefore allows a relatively free word order: for example, two variants of the sentence “I am very tired today” that differ only in word order have the same set of links, and their analyses differ only in the order of the links. Of course, according to Schneider [109], because the link grammar model requires planarity, it is not as flexible as dependency grammar in representing long-distance dependencies in a sentence. This is acceptable for Vietnamese, because Vietnamese sentences generally follow SVO order, the structure of noun, verb, and adjective phrases is generally fixed, and few components can change position arbitrarily.
- Link grammars can represent semantic relationships. They do so more easily than dependency grammars because a linkage may contain cycles.
- Distinguishing main and subordinate components in a sentence becomes more complicated because links are not directed the way dependencies are. Therefore, for some problems, such as text summarization, the link grammar model is not as convenient as dependency grammar. However, in many fields such as knowledge representation and machine translation, link grammar is very effective.
- Link grammars do not require a rule-dependency relationship, so the analyses of component clauses can easily be combined into one larger analysis, which makes compound sentences with multiple clauses easier to analyze.
- Link grammar is one of the very few models that are completely lexicalized, so it can represent lexical relationships in much more detail than dependency grammars or phrase structure grammars, whose relationships are specified only down to the word category. This feature makes it possible to represent many phenomena in Vietnamese. For example, directional verbs such as “run”, “carry”, “open”, and “cover” can combine with directional auxiliary elements such as “out”, “in”, “up”, and “down”. The DR link is established between these words and does not occur with any other word type.
- Links can be used to represent knowledge [53]. They are also very close to conceptual graphs, so a linkage is easy to convert into a conceptual graph [131]. Link analysis is also used to extract information [50], [52], [90], [97], especially semantic information [82].
- Because links are labeled, they directly represent predicate-modifier relationships and other relationships, which facilitates translation into morphologically rich languages better than models whose dependency relations are unlabeled (according to Zamin [129]).
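Three of the requirements named above (planarity, connectivity, and exclusion) can be checked on a linkage without consulting the dictionary. The following is a minimal sketch, not the thesis's implementation: the representation of a linkage as `(left_index, right_index, label)` triples, the function names, and the example sentence with its labels are all assumptions made for illustration. Satisfaction and ordering depend on each word's linking formula and are not covered here.

```python
from itertools import combinations

# A linkage is assumed to be a list of (i, j, label) triples, where i and j
# are word positions in the sentence. This representation is hypothetical.

def is_planar(links):
    """Planarity: no two links cross when drawn above the sentence."""
    spans = [tuple(sorted((i, j))) for i, j, _ in links]
    for (a, b), (c, d) in combinations(spans, 2):
        if a < c < b < d or c < a < d < b:
            return False
    return True

def is_connected(n_words, links):
    """Connectivity: the links join all words into a single component."""
    if n_words == 0:
        return True
    adjacency = {i: set() for i in range(n_words)}
    for i, j, _ in links:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == n_words

def satisfies_exclusion(links):
    """Exclusion: at most one link connects any given pair of words."""
    pairs = [tuple(sorted((i, j))) for i, j, _ in links]
    return len(pairs) == len(set(pairs))

# Illustrative English example; the labels are placeholders.
words = ["the", "cat", "chased", "a", "mouse"]
links = [(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D")]

valid = (is_planar(links)
         and is_connected(len(words), links)
         and satisfies_exclusion(links))

# Two links that cross each other violate planarity.
crossing = [(0, 2, "X"), (1, 3, "Y")]
```

In this sketch a linkage with a cycle, for instance the example above with an extra link between words 1 and 4, still passes all three checks, which matches the point that a linkage is a graph rather than a tree.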
From initial research and experimentation, the thesis makes the following observations:
- Up to now, the most common way to represent Vietnamese syntax has been a (context-free) phrase structure grammar with parse trees. However, Vietnamese has characteristics that this structure does not represent easily: omission of possessive prepositions, word-class conversion, combinations of numerals and unit nouns, and so on. These characteristics can be represented flexibly and simply in the link model. In particular, when translating from Vietnamese into another language, detecting the direct relationships between words makes it possible to convert to the structure of the target language with high quality.
- Sentence analysis in the link model is very close to human thinking, so it can effectively support learners of Vietnamese when they study syntax and construct sentences. The link analysis of a sentence is much simpler than its parse tree. Although it is a graph, a link analysis resembles a linear list of words, each of which is related to no more than three other words. This makes lookups in a linkage bank easier than in a treebank, which facilitates statistical approaches.
- Because compound and complex sentences have complicated structure, few studies on automatic parsing address this type of sentence, especially in Vietnamese. The link grammar model provides a way of joining clauses through large links, making it possible to analyze and process compound and complex sentences effectively.
- Currently, because Vietnamese has few resources for machine translation, existing machine translation systems are mainly English-Vietnamese and follow a rule-based approach. Since the link grammar model flexibly represents many syntactic phenomena of Vietnamese, and syntactic links are quite easy to convert into other languages, it can be used to build a rule-based Vietnamese-English machine translation system that easily handles many differences between the source and target languages and supports many practical translation requirements. Such a system can be integrated with other approaches, such as example-based and statistical translation, to produce translations of good quality that are fluent and grammatically correct.
On that basis, the thesis sets the goal of researching and building a Vietnamese link grammar model with the following characteristics:
- Based on the link grammar model proposed by Sleator and Temperley [111].
- Based on the characteristics of Vietnamese syntax and morphology.
- Can be used to parse Vietnamese by the link analysis method. The parser covers simple sentences as well as compound sentences containing multiple coordinate and subordinate clauses.
- Can be applied to solve the problem of Vietnamese – English machine translation.
- Produces research resources: a link dictionary and a bilingual dictionary with annotated disjuncts.
To do that, several core topics must be studied: approaches to syntax representation (especially the dependency approach), the link grammar model and its relationship with the dependency grammar model, and the link grammar models already built for English, Russian, and some other languages. The English parser and clause-splitting algorithms for compound sentences are studied in order to build the Vietnamese link parser. To illustrate the expressive power of the Vietnamese link grammar model, the thesis also studies translation systems in order to build a machine translation system that uses link grammar.
Within the framework of the thesis, the work will be limited to:
- Building a link model to represent Vietnamese syntax. The Vietnamese link dictionary was built experimentally, covering the most basic syntactic phenomena and some special cases commonly encountered in practice.
- The Vietnamese link parser goes through the same preprocessing stages as any other parser. In this approach, the parser does not assign part-of-speech tags before parsing, but the word segmentation stage cannot be skipped. The thesis uses the vnTokenizer word segmenter by Dr. Le Hong Phuong, which is freely available online.
- Researching a probabilistic link grammar model to resolve ambiguity in parsing. The thesis limits this work to testing the proposed algorithms.
- Researching discourse structure theory and a sentence-level discourse segmentation algorithm to split compound sentences into clauses, and proposing large links between clauses on the basis of discourse relations to produce the overall analysis of a compound sentence.
- Building a Vietnamese-English translation system based on annotated disjuncts as an illustration of how the Vietnamese link grammar model can be applied. This system is tested on a corpus of sentence patterns from the basic and advanced Vietnamese programs for foreigners of the Faculty of Vietnamese Studies and Vietnamese, University of Social Sciences and Humanities, Hanoi National University [18].
The thesis is divided into 4 chapters and 4 appendices as follows:
Chapter 1: An overview of grammar models for natural languages introduces the grammar models used to describe natural language syntax and the relationship between the link grammar model and other grammar models.
The PhD candidate's research results related to the thesis are presented in Chapters 2, 3, and 4.
Chapter 2: The Vietnamese link grammar model gives details of the Vietnamese link grammar system developed by the PhD candidate.
Chapter 3: Parsing with link grammar describes the link parser, the approach to parsing compound sentences, the problem of syntactic ambiguity, and its solution.
Chapter 4: A translation system based on annotated disjuncts presents a test of the link grammar model on the Vietnamese-English machine translation problem.
Conclusion and directions for development.
There are four appendices:
Appendix 1: Linking formulas for Vietnamese word subcategories.
Appendix 2: Parsing results of some simple and compound sentences with two clauses.
Appendix 3: Some typical rules in the Vietnamese-English translation rule set.
Appendix 4: Comparison of translation results for some sample sentences.