Vietnamese linking grammar model - 19

Figure 3.16 below shows the storage structure for the link analysis of the sentence “I bought a flower”. The numbers 1, 2, …, 5 are the ordinal positions of the words. Each word has a linked list of its connections to the words on its right. Each connection is stored as a triple (type, destination, rank). For example, (SV, 2, 0) represents the connection between the first word (“I”) and the second word (“bought”): its type is SV, its destination is word 2, and its rank is 0.
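
The storage scheme described above can be sketched in Java roughly as follows. This is only an illustrative sketch, not the thesis's code: the class names `Connection` and `LinkStore` and their method names are assumptions, and only the connection (SV, 2, 0) is taken from the text; the remaining links of Figure 3.16 are left out because the figure is not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// One stored connection: (type, destination, rank), as described in the text.
// The class and field names are hypothetical.
final class Connection {
    final String type;     // link type, e.g. "SV"
    final int destination; // 1-based position of the word on the right
    final int rank;        // degree of the link; 0 is the top link

    Connection(String type, int destination, int rank) {
        this.type = type;
        this.destination = destination;
        this.rank = rank;
    }

    @Override
    public String toString() {
        return "(" + type + ", " + destination + ", " + rank + ")";
    }
}

// Per-word linked lists of connections to words further to the right.
final class LinkStore {
    private final List<LinkedList<Connection>> byWord = new ArrayList<>();

    LinkStore(int wordCount) {
        for (int i = 0; i < wordCount; i++) {
            byWord.add(new LinkedList<>());
        }
    }

    // Stores a link on its left word; positions are 1-based.
    void addLink(int from, String type, int to, int rank) {
        if (from < 1 || to > byWord.size() || from >= to) {
            throw new IllegalArgumentException("a link must run left to right inside the sentence");
        }
        byWord.get(from - 1).add(new Connection(type, to, rank));
    }

    // Returns the connections stored on the word at the given 1-based position.
    List<Connection> connectionsOf(int word) {
        return byWord.get(word - 1);
    }
}
```

For the five-word example, `new LinkStore(5)` followed by `addLink(1, "SV", 2, 0)` stores the connection (SV, 2, 0) in the list of the first word.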

Each link drawn above the words is assigned a value called the degree of the link. The problem requires that the word chosen to represent a clause can later receive further links from conjunctions without the old and new links intersecting. The word selected is therefore a word on the link with the lowest degree, that is, the link drawn at the top.

Figure 3.16. Storage of the link analysis of the sentence “I bought a flower”

The degree of the link is calculated as follows:

According to the parsing algorithm, the first links drawn have degree 0; these are the SV and O connections. Then, as the analysis algorithm in [111] is applied recursively to the words on the left and right of the word under consideration, the McN and McNt3 connections receive degree 1. If this sentence acts as a clause linked to another clause, the connection selected for linking is the top connection, i.e. the connection of degree 0 (SV or O in the example in Figure 3.17).

Figure 3.17. Analysis of the sentence “I bought a flower”

As another example, consider the sentence “If I had a lot of time, I would have been in Nghe An and Ho Chi Minh City”. When choosing the word that represents the clause “I was in Nghe An and Ho Chi Minh City”, the top link is the SV link between “I” and “in”. This link has degree 0, while the other links have degree 1, 2, …

The word chosen can therefore be “I” or “in”. This guarantees planarity: if the link from the preceding clause were drawn to the word “already” or to “Nghe An”, it could not be drawn without intersecting other links.
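
The non-crossing requirement can be checked mechanically. The sketch below is an illustration under assumptions, not the thesis's code: the class `Planarity`, its methods, and the word positions used in the note after the code are all hypothetical. Two links cross when their endpoints interleave, that is, when exactly one endpoint of one link lies strictly between the endpoints of the other.

```java
import java.util.List;

// Hypothetical helper for the non-crossing (planarity) condition on links.
final class Planarity {
    private Planarity() {}

    // Links are int pairs {left, right} with left < right (1-based positions).
    // Two links cross when they interleave: a < c < b < d or c < a < d < b.
    static boolean crosses(int[] x, int[] y) {
        int a = x[0], b = x[1], c = y[0], d = y[1];
        return (a < c && c < b && b < d) || (c < a && a < d && d < b);
    }

    // True if the candidate link can be drawn without crossing any existing link.
    static boolean canAdd(List<int[]> existing, int[] candidate) {
        for (int[] link : existing) {
            if (crosses(link, candidate)) {
                return false;
            }
        }
        return true;
    }
}
```

As a made-up illustration, suppose a clause has its top link between positions 4 and 6 and a lower link between positions 5 and 6. A link from position 1 to position 4 or to position 6 crosses nothing, whereas a link from position 1 to position 5 crosses the top link (4, 6), because 1 < 4 < 5 < 6.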

Choosing the word to link

After a suitable connection has been found, with candidate words at its left and right ends, the question is whether to choose the left word or the right word. The thesis's criterion is to choose the more important word. For connections such as McNt, RlAp, … the word on the right (the noun) is chosen; for the SV and SA connections, the word on the left (the noun) is chosen. Whether the left or the right word is selected is stored for each connection type. Figure 3.18 below shows an analysis of the phrase “a very good pen”.

Figure 3.18. Analysis of the phrase “a very good pen”

Since the link between the word “a” and the word “pen” is McNt, the word “pen” has the higher priority, so “pen” is the word selected.
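
The selection rule can be sketched as a lookup from connection type to the side of the preferred word. This is a minimal sketch under assumptions rather than the thesis's implementation: the class `WordChooser`, the `Side` enum and the method `choose` are hypothetical, and only the four connection types named in the text are filled in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup of which end of a connection represents it.
final class WordChooser {
    enum Side { LEFT, RIGHT }

    private static final Map<String, Side> PREFERRED = new HashMap<>();
    static {
        // From the text: McNt and RlAp prefer the right word, SV and SA the left.
        PREFERRED.put("McNt", Side.RIGHT);
        PREFERRED.put("RlAp", Side.RIGHT);
        PREFERRED.put("SV", Side.LEFT);
        PREFERRED.put("SA", Side.LEFT);
    }

    private WordChooser() {}

    // Returns the preferred word of a connection of the given type.
    static String choose(String type, String leftWord, String rightWord) {
        Side side = PREFERRED.get(type);
        if (side == null) {
            throw new IllegalArgumentException("no preference stored for type " + type);
        }
        return side == Side.LEFT ? leftWord : rightWord;
    }
}
```

With this table, `WordChooser.choose("McNt", "a", "pen")` returns `"pen"`, matching the example above.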

3.2.4. Compound sentence analysis test results

The compound sentence parser was developed from the link analyzer for simple and compound sentences; it is written in Java and runs on Windows. Figure 3.19 shows the parsing result for the compound sentence “It rained heavily and the wind was very strong, so I had to leave school, my mother had to take a break from work”, which consists of 4 clauses, together with the discourse relations between them.

Figure 3.19. Analysis results of the sentence “It rained heavily and the wind was very strong, so I had to leave school, my mother had to take a break from work”

http://www.mediafire.com/?6ajt9btbrtxidr9

http://www.vietnamtourism.com/v_pages/tourist/destination.asp?mt=8420&uid=533

http://dantri.com.vn/c26/s26-484690/barcelona-mu-giac-mo-noi-thien-duong.htm

Table 3.6. Details of the compound sentence sample sets

No. | Sample set | Number of sentences | Average number of words per sentence
1 | Universal Vietnamese corpus (Ho Quoc Bao) | 50 | 9.7
2 | Sport | 25 | 11.5
3 | Travel | 25 | 12.5

On the sample sets listed in Table 3.6, the new analyzer gives much better results than the old analyzer:

Table 3.7. Analysis results for the compound sentence sample sets (after disambiguation)

Sample set | Accuracy (old analyzer) | Coverage (old analyzer) | Accuracy (new analyzer) | Coverage (new analyzer)
1 | 42.5% | 35.7% | 75.1% | 65.7%
2 | 9.5% | 6.1% | 33.5% | 21.6%
3 | 28.3% | 20.5% | 47.4% | 58.5%

Among the corpora used in this test, the universal Vietnamese corpus (actually the Vietnamese part of the general English-Vietnamese corpus) mainly contains two-clause compound sentences that are quite similar to one another, so the rate of sentences whose discourse analysis is correct is 100%; in addition, the structure of each clause is quite simple. The travel corpus consists of sentences from tourism promotion texts, many of them with more than 3 clauses, but their structure still follows the syntactic rules. The sports corpus, which contains many special forms of compound sentences, has the lowest rate.

The thesis's link parser achieves fairly good results on compound sentences consisting of many non-overlapping clauses, including sentences that contain explanatory passages set off by brackets or dashes (-). However, there are still some forms of compound sentence that it cannot yet handle. Examples of these forms follow:

  • Compound sentences that lack conjunctions, for example “Even if I die, I won’t follow”. This sentence looks like a simple sentence but is actually a compound sentence with no linking word and with the subject “tao” (“I”) hidden in the first clause.
  • Sentences with complex predicates, for example “To avoid boredom, the princess often throws a golden ball to play”. This sentence has no comma before the verb “toss”, so the relationship between the verb “take” and the verb “toss” is undetermined.
  • Sentences with too many clauses and many coordinated components, in which some clauses omit the subject, for example “I often forget to eat at mealtime, pat my pillow in the middle of the night, my stomach hurts like hell, my eyes are full of tears; only angry can not slaughter, skinning, eating liver, drinking blood of the enemy; Even if a hundred of my bodies are left to dry in the grass, and a thousand of them are wrapped in horse skins, I will still do it.”
Date published: 01/11/2021