MINISTRY OF EDUCATION AND TRAINING
Hanoi University of Science and Technology
NGUYEN THI THU HUONG
VIETNAMESE LINK GRAMMAR MODEL
Specialization: Computer Science
Code: 62.48.01.01
PhD THESIS IN INFORMATION TECHNOLOGY
Scientific supervisors:
Prof. Dr. NGUYEN THUC HAI
Prof. Dr. NGUYEN THANH THUY
Hanoi – 2013
Acknowledgments
Before presenting the research content of this thesis, I would like to express my sincere gratitude to my two supervisors, Prof. Dr. Nguyen Thuc Hai and Prof. Dr. Nguyen Thanh Thuy, who not only guided and helped me wholeheartedly but also gave me great encouragement to complete this thesis.
I am very grateful to my colleagues at the Department of Computer Science and the Institute of Information and Communication Technology, Hanoi University of Science and Technology, for supporting me in my work and sharing a great deal with me in difficult times.
I would like to thank Assoc. Prof. Dr. Luong Chi Mai, Assoc. Prof. Dr. Le Thanh Huong, Assoc. Prof. Dr. Nguyen Thi Kim Anh, Assoc. Prof. Dr. Dang Van Chey, Dr. Nguyen Van Vinh, and Dr. Nguyen Thi Minh Huyen, who helped me and contributed many valuable comments on the thesis.
I would like to express my sincere thanks to the linguists Assoc. Prof. Dr. Pham Van Tinh, Assoc. Prof. Dr. Nguyen Chi Hoa, Vu Xuan Luong, and Dao Van Hung for their enthusiastic support in helping me understand the characteristics of Vietnamese.
My sincere thanks go to the former students Le Van Chuong, Pham Nguyen Quang Anh, Luyen Thanh Dat, and Le Ngoc Minh for helping me test the link grammar model. My sincere thanks also go to the VLSP research team, especially Prof. Dr. Ho Tu Bao and Dr. Nguyen Phuong Thai, for providing the Vietnamese corpus used in my experiments.
Hanoi, March 20, 2012
Thesis author
Nguyen Thi Thu Huong
DECLARATION
I hereby declare that this is my own research work. The data and results in the thesis are truthful and have not been published in any other work.
Thesis author
Nguyen Thi Thu Huong
CONTENTS
LIST OF SYMBOLS AND ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
LIST OF IMPORTANT CONNECTORS
INTRODUCTION
CHAPTER 1 OVERVIEW OF GRAMMAR MODELS FOR NATURAL LANGUAGE
1.1. Context-free grammar and the structural approach
1.1.1. Context-free grammar for natural language representation
1.1.2. Probabilistic context-free grammar
1.1.3. Lexicalized probabilistic context-free grammar
1.1.4. Tree-adjoining grammar
1.2. Approach through feature structures and unification grammar
1.3. Dependency approach
1.3.1. Some concepts
1.3.2. Characteristics of the dependency tree
1.4. Link grammar
1.4.1. The concept of link grammar
1.4.2. Formal definitions of link grammar
1.5. Conclusion
CHAPTER 2 VIETNAMESE LINK GRAMMAR MODEL
2.1. Link grammar for Vietnamese
2.1.1. Structure of the link dictionary
2.1.2. Building links for nouns
2.1.3. Links for verbs
2.1.4. Links for adjectives
2.1.5. Linking clauses in simple compound sentences
2.2. Expanding the link grammar dictionary
2.2.1. Dictionary expansion algorithm
2.2.2. Applying the algorithm to expand the Vietnamese dictionary
2.3. Conclusion
CHAPTER 3 PARSING WITH LINK GRAMMAR
3.1. Link parser
3.1.1. Parsing algorithm
3.1.2. Pruning
3.1.3. Test results for parsing simple sentences and simple compound sentences
3.2. Parsing compound sentences
3.2.1. Building the discourse tree
3.2.2. Compound sentence parsing algorithm
3.2.3. Finding words to connect clauses
3.2.4. Test results for compound sentence parsing
3.2.5. Computational complexity
3.3. Disambiguation
3.3.1. Constituent disambiguation
3.3.2. Coordination disambiguation
3.4. Conclusion
CHAPTER 4 MACHINE TRANSLATION SYSTEM USING ANNOTATED DISJUNCTS
4.1. Overview of machine translation
4.1.1. Development of machine translation in Vietnam
4.1.2. Methods for assessing machine translation quality
4.2. Linguistic differences between Vietnamese and English
4.2.1. Morphological differences
4.2.2. Differences in word order
4.3. Machine translation system using annotated disjuncts
4.3.1. Finding the meaning of words in the ADJ dictionary
4.3.2. Developing translation rules
4.3.3. Completing the translation
4.3.4. Test results for the translator based on annotated disjuncts
4.4. Conclusion
CONCLUSION AND DEVELOPMENT DIRECTIONS
Summary
Main contributions of the thesis
Scientific aspects
Practical aspects
Limitations and development directions
PUBLISHED WORKS
REFERENCES
VIETNAMESE
ENGLISH
RUSSIAN
WEBSITES
APPENDIX 1: DETAILS OF THE MAIN FORMULAS IN THE VIETNAMESE LINK GRAMMAR
APPENDIX 2: LINK PARSING RESULTS FOR SOME SIMPLE AND COMPOUND SENTENCES
APPENDIX 3: SOME TYPICAL TRANSLATION RULES
- Attribute definition rules
- Phrase translation rules
- Structural transformation rules
APPENDIX 4: COMPARATIVE TRANSLATION RESULTS FOR SOME SENTENCES
LIST OF SYMBOLS AND ABBREVIATIONS
HMM Hidden Markov Model
BNF Backus-Naur Form
ADJ Annotated Disjunct
RST Rhetorical Structure Tree (discourse structure tree)
CCR Chunks/Constituents/Relation
SVO Subject-Verb-Object (sentence word order of the subject-verb-object type)
SVM Support Vector Machine
CRF Conditional Random Fields
EDU Elementary Discourse Unit
HPSG Head-driven Phrase Structure Grammar
EBNF Extended Backus-Naur Form
LIST OF FIGURES
Figure 1.1. Phrase structure tree of the sentence “I like chicken feet”
Figure 1.2. Two structures of the sentence “They will not load the goods into the boat tomorrow”
Figure 1.3. Probabilistic context-free grammar and structure tree of the sentence “Last week IBM bought Lotus”
Figure 1.4. Analysis of the sentence “John loves a woman” in the dependency grammar model
Figure 1.5. Dependency graph of the sentence “Economic news had little effect on financial market”
Figure 1.6. Grammatically correct sentence “Why didn’t you come”
Figure 1.7. Fat connector of the word “and”
Figure 1.8. A cycle in sentence analysis
Figure 1.9. Link node
Figure 2.1. Structure of a noun phrase with all elements
Figure 2.2. Links in the phrase “tables”
Figure 2.3. Links in the phrase “spring bed”
Figure 2.4. Links in the phrase “wooden table”
Figure 2.5. Links in the phrase “my desk”
Figure 2.6. Two linkages for the phrase “my wooden table”
Figure 2.7. Links around the central noun “chair”
Figure 2.8. Auxiliary elements before the verb
Figure 2.9. Links in the phrase “still working”
Figure 2.10. Links in the phrase “don’t read this book often”
Figure 2.11. Links in the phrase “afraid”
Figure 2.12. Links in the phrase “two thousand meters deep”
Figure 2.13. Links in a two-clause compound sentence with a conjunction in the middle
Figure 2.14. Links in a two-clause compound sentence with an initial conjunction and a comma
Figure 2.15. Links in a compound sentence with a conjunction in both clauses
Figure 2.16. An excerpt from a link grammar dictionary
Figure 2.17. Intuitive mapping
Figure 2.18. The process of building a Vietnamese link grammar dictionary
Figure 3.1. Parsing algorithm
Figure 3.2. Local solution
Figure 3.3. Link parsing algorithm
Figure 3.4. COUNT function for the number of analyses of a sentence
Figure 3.5. Formula tree of (NN- & {NN+}) or ({PqNt-} & {NN+})
Figure 3.6. Number of disjuncts after pruning and power pruning
Figure 3.7. Link analysis results for the sentence “We want to win titles brand”
Figure 3.8. Link analysis results for the sentence “Every empty-handed season is difficult to swallow drifting”
Figure 3.9. Link analysis results for the sentence “Most mantises eat insects”
Figure 3.10. Discourse analysis tree of the sentence “[It rained heavily and A1] [the wind was very strong, so B1] [I had to leave school, C1] [my mother had to take time off from work. D1]”
Figure 3.11. Discourse segmentation algorithm (with disambiguation)
Figure 3.12. The isClause function
Figure 3.13. Discourse structure trees
Figure 3.14. Parsing algorithm for compound sentences
Figure 3.15. The Insert_Link_From_RST_Tree function
Figure 3.16. Illustration of how the link analysis of the sentence “I bought a flower” is stored
Figure 3.17. Analysis of the sentence “I bought a flower”
Figure 3.18. Analysis of the phrase “a very good pen”
Figure 3.19. Analysis results for the sentence “It rained heavily and the wind was strong, so I had to leave school, my mother has to take a break from work”
Figure 3.20. Two analyses of the sentence “I bought a flower”
Figure 3.21. Viterbi-style algorithm for predicting the analysis with the highest probability
Figure 3.22. Illustration of how the probability PrO ⊲ left(L, W, l ⊳, ⊲ leftd) is calculated
Figure 3.23. Illustration of the relationships used to calculate O
Figure 3.24. Analysis of the sentence “I like cake and candy, you like wine and beer”
Figure 3.25. An analysis with the F connector for the word “and”
Figure 3.26. The G connector joining multiple commas and the word “and”
Figure 4.1. Rearranging word order
Figure 4.2. Architecture of the translation system based on annotated disjuncts
Figure 4.3. Changing the word order when translating the sentence “The little girl is very pretty”
Figure 4.4. The process of translating the sentence “The cheetah is the fastest animal in the world”
Figure 4.5. Comparison of the BLEU scores of the systems
LIST OF TABLES
Table 1.1. Example of a dictionary
Table 2.1. Types of Vietnamese words
Table 2.2. Subcategories of Vietnamese words
Table 3.1. Details of the sample corpus for the link parser
Table 3.2. Link analysis results for the sample sets
Table 3.3. Discourse analyzer test results (not yet combined with the syntactic parser)
Table 3.4. Regular expressions representing some potential discourse cues
Table 3.5. Actions corresponding to some discourse cues
Table 3.6. Details of the compound sentence sample set
Table 3.7. Analysis results for the compound sentence samples
Table 4.1. Important morphological differences between Vietnamese and English
Table 4.2. English pronouns
Table 4.3. Vietnamese pronouns
Table 4.4. Comparison of the results of the translation systems
LIST OF IMPORTANT CONNECTORS
CLI Connection indicating material (implicit preposition).
DI Connects the verb “go” with another verb.
DpN Connects plural determiners with nouns.
DpNt Connects plural determiners with concrete nouns.
DsN Connects singular determiners with nouns.
DT_LA Connects nouns and pronouns with the copular verb “to be”.
DT_DONE Connects a verb with the verb “done”.
EoPp Connects the preposition “of” with pronouns.
EpNt Connects prepositions of place with concrete nouns.
EsNt Connects concrete nouns with prepositions of scope.
LA_DT Connects the verb “to be” with nouns.
McNu Connects numerals with unit nouns.
NcNt1 Connects classifier nouns with concrete nouns denoting people.
NcNt2 Connects classifier nouns with concrete nouns denoting animals.
NcNt3 Connects classifier nouns with concrete nouns denoting plants.
NcNt4 Connects classifier nouns with concrete nouns denoting utensils and objects.
NcNt5 Connects classifier nouns with concrete nouns denoting phenomena.
NcNt6 Connects classifier nouns with concrete nouns denoting concepts.
NEo Connects nouns with possessive prepositions.
NN Connects nouns with nouns; can express relationships of content, location, and so on.
NtEm Connects concrete nouns with prepositions of material.
NtEs Connects prepositions of scope with concrete nouns.
NtPd Connects concrete nouns with demonstrative pronouns.
NuNt Connects unit nouns with concrete nouns.
NHAT_DT Connects the word “most” with the noun after it.
O Connects verbs with direct objects.
RcV Connects verbs with comparative adverbs.
RfA Connects adverbs of time (future) with adjectives.
RfVt Connects the future-tense determiner with the verb.
RfVt Connects verbs with adverbs of time (future).
RhA Connects adverbs of time (present perfect) with adjectives.
RhV Connects adverbs of time (present perfect) with verbs.
RmV Connects verbs with imperative adverbs.
RnV Connects negation words with verbs.
RnV1 Connects verbs with negative auxiliaries.
RpA Connects adverbs of time (past) with adjectives.
RpV Connects verbs with adverbs of time (past).
RpVt Connects the past-tense determiner with the verb.
RtA Connects adverbs of time (present) with adjectives.
RtV Connects verbs with adverbs of time (present).
SA Connects nouns and pronouns with adjectives.
SA Connects nouns with adjectives.
SH Connects possessive prepositions with owner nouns.
SHA Connects two nouns in an implicit possessive relation.
SS_NHAT Connects adjectives with the word “most”.
SV Connects nouns and pronouns as subjects with verbs.
THS Connects question words that follow the verb with the verb.
THT Connects question words that precede the verb with the verb.
VmVt Connects modal verbs with specific verbs.
VtAp Connects transitive verbs with adjectives of properties.
VtEp Connects transitive verbs with prepositions of place.
VtVs Connects transitive verbs with stative verbs.
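
As a reading aid for the connector names above, the following is a minimal sketch of how directed connectors pair up. It assumes the usual link grammar convention, visible in formulas such as (NN- & {NN+}), that a connector written with "+" looks for a partner to the right and a connector written with "-" looks for a partner to the left. The names CONNECTORS, parse_connector and can_link are hypothetical, the glosses are paraphrased from the list, and matching is simplified to exact name equality; this is not the parser described in the thesis.

```python
# Hypothetical sample of connector glosses, paraphrased from the list above.
CONNECTORS = {
    "SV": "nouns and pronouns as subjects with verbs",
    "O": "verbs with direct objects",
    "SA": "nouns and pronouns with adjectives",
}


def parse_connector(text):
    """Split a directed connector such as 'SV+' into ('SV', '+')."""
    if len(text) < 2 or text[-1] not in ("+", "-"):
        raise ValueError(f"not a directed connector: {text!r}")
    return text[:-1], text[-1]


def can_link(left, right):
    """Return True if `left` (on the earlier word) can link to `right`
    (on the later word). Simplifying assumption: names must be identical."""
    left_name, left_dir = parse_connector(left)
    right_name, right_dir = parse_connector(right)
    return left_name == right_name and left_dir == "+" and right_dir == "-"


if __name__ == "__main__":
    # A subject carrying SV+ can link to a following verb carrying SV-.
    print(can_link("SV+", "SV-"), "-", CONNECTORS["SV"])
    # Different connector names never link in this simplified model.
    print(can_link("SV+", "O-"))
```

Under these assumptions, can_link("SV+", "SV-") holds, while can_link("SV-", "SV+") and can_link("SV+", "O-") do not, because the direction or the name fails to match.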