Vietnamese link grammar model

MINISTRY OF EDUCATION AND TRAINING

Hanoi University of Science and Technology

NGUYEN THI THU HUONG

VIETNAMESE LINK GRAMMAR MODEL

Specialization: Computer Science

Code: 62.48.01.01

PhD THESIS IN INFORMATION TECHNOLOGY

Scientific supervisors:

  1. Prof. Dr. NGUYEN THUC HAI

  2. Prof. Dr. NGUYEN THANH THUY

Hanoi – 2013

Acknowledgments

Before presenting the research content of the thesis, I would like to express my sincere gratitude to my two supervisors, Prof. Dr. Nguyen Thuc Hai and Prof. Dr. Nguyen Thanh Thuy, who not only guided and helped me enthusiastically but also encouraged me greatly to complete this thesis.

Many thanks to my colleagues at the Department of Computer Science and the Institute of Information and Communication Technology, Hanoi University of Science and Technology, for supporting me and sharing much of my work in difficult times.

I would like to thank Assoc. Prof. Dr. Luong Chi Mai, Assoc. Prof. Dr. Le Thanh Huong, Assoc. Prof. Dr. Nguyen Thi Kim Anh, Assoc. Prof. Dr. Dang Van Chey, Dr. Nguyen Van Vinh, and Dr. Nguyen Thi Minh Huyen, who helped me and contributed many valuable comments to the thesis.

I would like to express my sincere thanks to the linguists Assoc. Prof. Dr. Pham Van Tinh, Assoc. Prof. Dr. Nguyen Chi Hoa, Vu Xuan Luong, and Dao Van Hung for their enthusiastic support in understanding the characteristics of Vietnamese.

Sincere thanks to the former students Le Van Chuong, Pham Nguyen Quang Anh, Luyen Thanh Dat, and Le Ngoc Minh for helping me test the link grammar model. Sincere thanks to the VLSP research team, especially Prof. Dr. Ho Tu Bao and Dr. Nguyen Phuong Thai, for providing the Vietnamese corpus for my experiments.

Hanoi, March 20, 2012

Thesis author

Nguyen Thi Thu Huong

DECLARATION

I hereby declare that this is my own research work. The data and results in the thesis are truthful and have not been published in any other work.

Thesis author

Nguyen Thi Thu Huong

CONTENTS

LIST OF SYMBOLS AND ABBREVIATIONS

LIST OF FIGURES

LIST OF TABLES

LIST OF IMPORTANT CONNECTORS

INTRODUCTION

CHAPTER 1 OVERVIEW OF GRAMMAR MODELS FOR NATURAL LANGUAGE

1.1. Context-free grammar and the structural approach

1.1.1. Context-free grammar for natural language representation

1.1.2. Probabilistic context-free grammar

1.1.3. Lexicalized probabilistic context-free grammar

1.1.4. Tree-adjoining grammar

1.2. Approach through feature structures and unification grammar

1.3. Dependency approach

1.3.1. Some concepts

1.3.2. Characteristics of the dependency tree

1.4. Link grammar

1.4.1. The concept of link grammar

1.4.2. Formal definitions of link grammar

1.5. Conclusion

CHAPTER 2 VIETNAMESE LINK GRAMMAR MODEL

2.1. Link grammar for Vietnamese

2.1.1. Link dictionary structure

2.1.2. Building links for nouns

2.1.3. Links for verbs

2.1.4. Links for adjectives

2.1.5. Linking clauses in simple compound sentences

2.2. Expanding the link grammar dictionary

2.2.1. Dictionary expansion algorithm

2.2.2. Applying the algorithm to expand the Vietnamese dictionary

2.3. Conclusion

CHAPTER 3 PARSING WITH LINK GRAMMAR

3.1. Link parser

3.1.1. Parsing algorithm

3.1.2. Pruning

3.1.3. Test results of parsing simple sentences and simple compound sentences

3.2. Parsing compound sentences

3.2.1. Building the discourse tree

3.2.2. Compound sentence parsing algorithm

3.2.3. Finding words to connect clauses

3.2.4. Compound sentence parsing test results

3.2.5. Computational complexity

3.3. Disambiguation

3.3.1. Component ambiguity

3.3.2. Coordination disambiguation

3.4. Conclusion

CHAPTER 4 MACHINE TRANSLATION SYSTEM USING ANNOTATED DISJUNCTS

4.1. Overview of machine translation

4.1.1. The state of machine translation development in Vietnam

4.1.2. Methods for assessing the quality of machine translation

4.2. Linguistic differences between Vietnamese and English

4.2.1. Morphological differences

4.2.2. Differences in word order

4.3. Machine translation system using annotated disjuncts

4.3.1. Finding word meanings in the ADJ dictionary

4.3.2. Developing the translation rules

4.3.3. Completing the translation

4.3.4. Test results of the translator based on annotated disjuncts

4.4. Conclusion

CONCLUSION AND DEVELOPMENT DIRECTIONS

Summary

Main contributions of the thesis

Scientific aspects

Practical aspects

Limitations and development directions

PUBLISHED WORKS

REFERENCES

VIETNAMESE

ENGLISH

RUSSIAN

WEBSITES

APPENDIX 1: DETAILS OF THE MAIN FORMULAS FOR VIETNAMESE LINKS

APPENDIX 2: LINK PARSING RESULTS FOR SOME SIMPLE AND COMPOUND SENTENCES

APPENDIX 3: SOME TYPICAL TRANSLATION RULES

  1. Attribute definition rules
  2. Phrase translation rules
  3. Structural transformation rules

APPENDIX 4: COMPARATIVE TRANSLATION RESULTS FOR SOME SENTENCES

LIST OF SYMBOLS AND ABBREVIATIONS

HMM Hidden Markov Model

BNF Backus-Naur Form

ADJ Annotated Disjunct

RST Rhetorical Structure Tree (discourse structure tree)

CCR Chunks/Constituents/Relation

SVO Subject-Verb-Object (sentence order of the subject-verb-object type)

SVM Support Vector Machine

CRF Conditional Random Fields

EDU Elementary Discourse Unit

HPSG Head-driven Phrase Structure Grammar

EBNF Extended Backus-Naur Form

LIST OF FIGURES

Figure 1.1. Structure tree of the sentence “I like chicken feet”.

Figure 1.2. Two structure trees of the sentence “They will not load the goods into the boat tomorrow”.

Figure 1.3. Probabilistic context-free grammar and structure tree of the sentence “Last week IBM bought Lotus”

Figure 1.4. Analysis of the sentence “John loves a woman” in the dependency grammar model

Figure 1.5. Dependency graph of the sentence “Economic news had little effect on financial market”

Figure 1.6. The grammatically correct sentence “Why didn’t you come”

Figure 1.7. The large connector of the word “and”

Figure 1.8. A cycle in sentence analysis

Figure 1.9. Link node

Figure 2.1. Structure of a noun phrase with all elements

Figure 2.2. Links in the phrase “tables”

Figure 2.3. Links in the phrase “spring bed”.

Figure 2.4. Links in the phrase “wooden table”

Figure 2.5. Links in the phrase “my desk”

Figure 2.6. Two links for the phrase “my wooden table”

Figure 2.7. Links around the central noun “chair”

Figure 2.8. Auxiliary elements before the verb

Figure 2.9. Links in the phrase “still working”

Figure 2.10. Links in the phrase “don’t read this book often”

Figure 2.11. Links in the phrase “afraid”

Figure 2.12. Links in the phrase “two thousand meters deep”

Figure 2.13. Links in a two-clause compound sentence with a conjunction in the middle

Figure 2.14. Links in a two-clause compound sentence with an initial conjunction and a comma

Figure 2.15. Links in a compound sentence with a conjunction in both clauses

Figure 2.16. An excerpt from the link grammar dictionary

Figure 2.17. Intuitive mapping

Figure 2.18. The process of building the Vietnamese link grammar dictionary

Figure 3.1. Analysis algorithm

Figure 3.2. Local solution

Figure 3.3. Link parsing algorithm

Figure 3.4. The COUNT function for counting the analyses of a sentence.

Figure 3.5. Formula tree (NN- & {NN+}) or ({PqNt-} & {NN+})

Figure 3.6. Number of disjuncts after pruning and power pruning

Figure 3.7. Link analysis results of the sentence “We want to win titles brand”

Figure 3.8. Link analysis results of the sentence “Every empty-handed season is difficult to swallow drifting”

Figure 3.9. Link analysis results of the sentence “Most mantises eat insects”

Figure 3.10. Discourse analysis tree of the sentence “[It rained heavily and]A1 [the wind was very strong, so]B1 [I had to leave school,]C1 [my mother had to take time off from work.]D1”

Figure 3.11. Discourse segmentation algorithm (with disambiguation)

Figure 3.12. The isClause function

Figure 3.13. Discourse structure trees

Figure 3.14. Parsing algorithm for compound sentences

Figure 3.15. The Insert_Link_From_RST_Tree function

Figure 3.16. Illustration of how the link analysis of the sentence “I bought a flower” is stored

Figure 3.17. Analysis of the sentence “I bought a flower”

Figure 3.18. Analysis of the phrase “a very good pen”

Figure 3.19. Analysis results of the sentence “It rained heavily and the wind was strong, so I had to leave school, my mother has to take a break from work”

Figure 3.20. Two analyses of the sentence “I bought a flower”

Figure 3.21. Viterbi-style algorithm for predicting the analysis with the highest probability

Figure 3.22. How to calculate the probability PrO ⊲ left(L, W, l ⊳, ⊲ leftd)

Figure 3.23. Illustration of the relationship for calculating O

Figure 3.24. Analysis of the sentence “I like cake and candy, you like wine and beer”

Figure 3.25. An analysis with the F connector for the word “and”

Figure 3.26. The G connector joins multiple commas and the word “and”

Figure 4.1. Rearranging word order

Figure 4.2. Architecture of the translation system based on annotated disjuncts

Figure 4.3. Changing the word order for the translation of the sentence “The little girl is very pretty”

Figure 4.4. The process of translating the sentence “The cheetah is the fastest animal in the world”

Figure 4.5. Comparison of BLEU scores of systems

LIST OF TABLES

Table 1.1. Example of a dictionary

Table 2.1. Types of Vietnamese words

Table 2.2. Subcategories of Vietnamese words

Table 3.1. Details of the sample corpus for the link parser

Table 3.2. Link parsing results for the sample sets

Table 3.3. Discourse analyzer test results (not yet combined with the parser)

Table 3.4. Regular expressions representing some potential discourse cues

Table 3.5. Actions corresponding to some discourse cues

Table 3.6. Details of the compound sentence sample set

Table 3.7. Analysis results of compound sentence samples

Table 4.1. Important morphological differences between Vietnamese and English

Table 4.2. English pronouns

Table 4.3. Vietnamese pronouns

Table 4.4. Comparison of the results of the translation systems

LIST OF IMPORTANT CONNECTORS

CLI Connector indicating material (implicit preposition).

DI Connects the verb “go” with another verb.

DpN Connects plural determiners with nouns.

DpNt Connects plural determiners with concrete nouns.

DsN Connects singular determiners with nouns.

DT_LA Connects nouns and pronouns with the copular verb “is”.

DT_DONE Connects a verb with the verb “done”.

EoPp Connects the preposition “of” with a pronoun.

EpNt Connects prepositions of place with concrete nouns.

EsNt Connects concrete nouns with prepositions of scope.

LA_DT Connects the verb “to be” with a noun.

McNu Connects numerals with unit nouns.

NcNt1 Connects classifier nouns with concrete nouns denoting people.

NcNt2 Connects classifier nouns with concrete nouns denoting animals.

NcNt3 Connects classifier nouns with concrete nouns denoting plants.

NcNt4 Connects classifier nouns with concrete nouns denoting utensils and objects.

NcNt5 Connects classifier nouns with concrete nouns denoting phenomena.

NcNt6 Connects classifier nouns with concrete nouns denoting concepts.

NEo Connects nouns with possessive prepositions.

NN Connects nouns with nouns; can express relationships of content, location, etc.

NtEm Connects concrete nouns with prepositions of material.

NtEs Connects prepositions of scope with concrete nouns.

NtPd Connects concrete nouns with demonstrative pronouns.

NuNt Connects unit nouns with concrete nouns.

NHAT_DT Connects the word “most” with the noun after it.

O Connects verbs with direct objects.

RcV Connects verbs with comparative adverbs.

RfA Connects adverbs of time (future) with adjectives.

RfVt Connects the future tense determiner with the verb.

RfVt Connects verbs with adverbs of time (future).

RhA Connects adverbs of time (present perfect) with adjectives.

RhV Connects adverbs of time (present perfect) with verbs.

RmV Connects verbs with imperative adverbs.

RnV Connects negative adverbs with verbs.

RnV1 Connects verbs with negative auxiliaries.

RpA Connects adverbs of time (past) with adjectives.

RpV Connects verbs with adverbs of time (past).

RpVt Connects the past tense determiner with the verb.

RtA Connects adverbs of time (present) with adjectives.

RtV Connects verbs with adverbs of time (present).

SA Connects nouns and pronouns with adjectives.

SA Connects nouns with adjectives.

SH Connects a possessive preposition with the owner noun.

SHA Connects two nouns in an implicit possessive relation.

SS_NHAT Connects adjectives with the word “most”.

SV Connects nouns and pronouns acting as subjects with verbs.

THS Connects question words that follow the verb with the verb.

THT Connects question words that precede the verb with the verb.

VmVt Connects modal verbs with specific verbs.

VtAp Connects transitive verbs with property adjectives.

VtEp Connects transitive verbs with prepositions of place.

VtVs Connects transitive verbs with stative verbs.

Date published: 01/11/2021