The thesis aims to build a new model for representing Vietnamese syntax. The model follows the currently popular dependency-based, lexicalized approach.

Because existing linguistic studies offered no ready support for a link grammar model, the thesis had to survey and synthesize knowledge about Vietnamese syntax, identify the characteristic patterns of word linkage and structure, and consult linguists to arrive at an acceptable link grammar model for Vietnamese.

To test and demonstrate the advantages of the link grammar representation, the thesis built a link parser. The results on simple and compound sentences are satisfactory and comparable to those of traditional models, while storing and searching the parses is much simpler.

Vietnamese is an Asian language whose characteristics differ greatly from those of European languages, especially with respect to morphological change. Taking advantage of link grammar's ability to represent morphological information, the thesis tested a translation system built on a link grammar tool, the annotated disjunct (ADJ). The initial results are acceptable on a small corpus.

The main contributions of the thesis


For the first time, a link grammar model has been built for Vietnamese. It is a dependency-style model that is very flexible and has many practical applications. Link grammar flexibly represents many Vietnamese phenomena that, to our knowledge, other models have not handled.

The Vietnamese parser produces a very compact syntactic representation, which facilitates the construction of a bank of link parses. Its analysis of compound sentences in their various forms makes it applicable to other tasks, such as producing high-quality translations.

Specifically, the thesis has made the following contributions:

  1. Built a link grammar model for Vietnamese at the syntactic level.
  2. Completed a link grammar dictionary with 40,000 entries, more than 150 formulas, and 77 link types.
  3. Built and tested a Vietnamese parser at the simple-sentence level.
  4. Proposed a Viterbi-style algorithm that resolves component ambiguity using a 3-gram model.
  5. Improved a sentence-level discourse analysis algorithm and combined it with link parsing. Built a parsing algorithm for compound sentences that addresses the following problems:
    1. Link parsing of compound sentences consisting of many clauses with many types of complex discourse relations.
    2. Coordination ambiguity: resolving the ambiguity that arises when the word “and” can act either as a discourse marker or as a conjunction.
  6. Built a Vietnamese – English machine translation model based on annotated disjuncts (ADJ).
  7. Built a Vietnamese – English ADJ dictionary of the same size as the link grammar dictionary.
  8. Built a set of about 300 Vietnamese – English translation rules.
  9. Built and tested a Vietnamese – English machine translation system based on annotated disjuncts. The system gives acceptable results on basic and advanced Vietnamese sentence patterns.
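
The Viterbi-style disambiguation in contribution 4 can be illustrated with a minimal sketch. It is not the thesis's implementation: the candidate labels, the function name `viterbi_trigram`, the log-probability table, and the fallback score `floor` are all hypothetical, and the only point taken from the text is the idea of choosing the best sequence under a 3-gram model by dynamic programming.

```python
import math

START = "<s>"  # hypothetical padding label for the sentence start


def viterbi_trigram(candidates, logp, floor=-20.0):
    """Pick the highest-scoring label sequence under a 3-gram model.

    candidates: list with one list of candidate labels per word position.
    logp: dict mapping (a, b, c) to the log-probability of c after a, b.
    floor: assumed log-score for trigrams missing from the table.
    """
    # A state is the pair (previous label, current label); for every
    # state we keep only the best score and the path that produced it.
    best = {(START, START): (0.0, [])}
    for options in candidates:
        nxt = {}
        for (a, b), (score, path) in best.items():
            for c in options:
                s = score + logp.get((a, b, c), floor)
                key = (b, c)
                if key not in nxt or s > nxt[key][0]:
                    nxt[key] = (s, path + [c])
        best = nxt
    # Return the path of the best final state.
    return max(best.values(), key=lambda item: item[0])[1]


# Toy example with made-up labels and probabilities.
toy_logp = {
    (START, START, "N"): math.log(0.9),
    (START, "N", "V"): math.log(0.8),
    (START, "N", "N"): math.log(0.2),
    ("N", "V", "N"): math.log(0.7),
    ("N", "V", "A"): math.log(0.3),
    ("N", "N", "N"): math.log(0.1),
    ("N", "N", "A"): math.log(0.9),
}
toy_candidates = [["N"], ["V", "N"], ["N", "A"]]
result = viterbi_trigram(toy_candidates, toy_logp)
```

Keeping one best path per pair of adjacent labels is what makes the search linear in sentence length rather than exponential in the number of ambiguous positions.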

These results are entirely new, because a link grammar model had never been built for Vietnamese. The analysis of multiple-clause compound sentences with link grammar has not previously been solved for any language. Although the translation model based on annotated disjuncts has been used for English – Indonesian translation, it was built entirely anew for the Vietnamese – English system, reflecting important characteristics of Vietnamese and overcoming the major syntactic differences between the two languages.

Practical significance

  1. The link grammar model provides a new parsing method for Vietnamese.
  2. The dictionary system gives good support to those who want to approach the problem with this model.
  3. The bank of link parses enables research on link grammar with statistical approaches.
  4. Supports the promotion of information about tourism, culture, and society to the world.
  5. Provides good support for Vietnamese language teaching.
  6. Parsing results are easy to understand and intuitive for learners, especially those who do not major in linguistics.
  7. The translator produces good-quality output on a small set of sample sentences (suitable for basic and advanced Vietnamese programs).

Limitations and development directions

The parser works quite effectively on simple and compound sentences. However, the thesis has not modeled the links in complex sentences, where clauses are nested or interleaved, as in the sentence “The fan you gave me yesterday runs very well”. The thesis handles adverbial clauses only at the beginning of a sentence; some cases of adverbial clauses in other positions are not yet analyzed.

With the parser available, a multimedia database of link parses could be built to help learners of Vietnamese understand its syntactic structures.

The sample corpus and the parse bank need to be expanded to allow more accurate and comprehensive evaluation.

Given a Vietnamese – English bilingual sentence bank and a more complete Vietnamese – English dictionary, the ADJ dictionary could be revised to resolve ambiguity better.

The purely rule-based ADJ translation system has worked quite effectively. Due to time constraints, the thesis translates only simple sentences and two-clause compound sentences. With the existing compound-sentence parser, translating multiple-clause compound sentences is feasible. In addition, if combined with a statistical translation system, this system could take part in refining the translations and is expected to yield much better quality. The thesis has run an initial test on a system with similar properties, an example-based translation system, with positive results.


01/11/2021
