3.3.2. Conjunction Disambiguation
Conjunction ambiguity is ambiguity involving phrases that play equivalent roles in a sentence. As noted in [70], the word “and” plays a special role when a sentence is analyzed with the link grammar model, because it can carry ordinary connectors as well as large connectors.
According to discourse structure theory, “and” is itself a discourse marker. It is therefore necessary to distinguish the case where “and” is a discourse marker from the case where it merely connects two simple words or phrases rather than two clauses.
In [66], Le Thanh Huong also discussed ambiguity in discourse segmentation, where a word can act either as a discourse marker or in some other role; the most obvious example is the word “and” in English. Whether “and” is a discourse marker is tested by checking whether the sentence remains syntactically correct when the word is removed, as in “Mary borrowed that book from our library, and she returned it this morning”. This test works well for English, where “and” used as a discourse marker is often accompanied by a comma and nouns usually carry articles. In Vietnamese, “and” plays a similar role when it acts as a conjunction. However, Vietnamese “and” usually appears without a comma, as in “I study and you sleep”. Moreover, removing “and” from the coordinated noun phrase “she and princess” can yield a perfectly well-formed phrase meaning “princess”, which is not syntactically incorrect, so the removal test cannot rule out this “and” as a discourse marker.
If a discourse marker occurs immediately after “and”, and the left boundary of the current elementary unit lies to the left of “and”, then a new elementary unit is created whose right boundary is the word immediately before “and”. In this case, “and” is considered to have a discourse function.
For example, for the sentence “Even though it rained heavily and although everyone prevented it, it went ahead”, discourse segmentation yields [Even though it rained heavily] [and although everyone prevented it,] [it went ahead.]. Here “and” has a discourse role because it is immediately followed by “although”, a marker of the concession relation.
Apart from this case, the shallow analyzer in [89] ignores every other “and”, i.e. it applies the NOTHING action to it.
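The boundary rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the analyzer of [89]:

- The marker lexicon `DISCOURSE_MARKERS` is a hypothetical sample.
- Multi-word markers such as “Even though” are assumed to be pre-joined into single tokens.
- The comma rule is a simplification added only so that the example sentence yields the three units shown above.

Every other “and” is left untouched, which mirrors the NOTHING action.

```python
# Hypothetical discourse-marker lexicon (lower-cased); the real lexicon is not given in the text.
DISCOURSE_MARKERS = {"even though", "although", "because", "but"}


def segment(tokens):
    """Split a token list into elementary discourse units (sketch).

    Rule from the text: if "and" is immediately followed by a discourse marker and
    the current unit started to the left of "and", close the current unit at the
    word before "and" and open a new unit at "and".
    Simplifying assumption (not from the source): a comma also closes the current unit.
    """
    units, start = [], 0
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        if tok.lower() == "and" and nxt in DISCOURSE_MARKERS and start < i:
            units.append(tokens[start:i])      # right boundary: the word before "and"
            start = i                          # "and" opens the new unit
        elif tok == ",":
            units.append(tokens[start:i + 1])  # the comma stays with its unit
            start = i + 1
        # any other "and" is ignored (the NOTHING action)
    if start < len(tokens):
        units.append(tokens[start:])
    return units


sentence = ["Even though", "it", "rained", "heavily", "and", "although",
            "everyone", "prevented", "it", ",", "it", "went", "ahead"]
for unit in segment(sentence):
    print("[" + " ".join(unit) + "]")
```

In a sentence such as “I study and you sleep”, no marker follows “and”, so this rule alone keeps the sentence as one unit. That is exactly the Vietnamese difficulty described earlier.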
In Vietnamese syntax, the subject of a clause in a compound sentence is mostly a noun, and the predicate is mostly a verb or an adjective. Other core patterns exist, for example clauses whose subject is a verb, but the thesis bases its processing algorithm on the following idea:
A phrase in a compound sentence is a genuine clause (proposition) if its link grammar analysis contains at least one SV link (between the subject and the verb), at least one SA link (between the subject and an adjective), or the pair of links DT_LA and LA_DT (the links of the copula “is”).
The thesis resolves this ambiguity by parsing the phrases that appear before and after “and”. If both phrases are syntactically correct clauses (propositions in the sense above), “and” acts as a discourse marker; otherwise, it acts as a coordinating conjunction. The algorithm is shown in Figure 3.11, and the analysis of the sentence “I like cake and candy, you like wine and beer” is shown in Figure 3.24.
Figure 3.24. Analysis of the sentence “I like cake and candy, you like wine and beer”
When analyzing the phrase “I was in Nghe An and Ho Chi Minh City”, “I was in Nghe An” is a clause, but “Ho Chi Minh City” is not, so “and” is not a discourse marker here.
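The clause test and the decision rule can be sketched as follows. The sketch is hypothetical in several respects:

- The real link parser is replaced by `TOY_PARSES`, a hand-written table of linkages for the example phrases.
- Every link label other than SV, SA, DT_LA and LA_DT is invented for illustration.
- A phrase that fails to parse is simply treated as not being a clause.

```python
# Hypothetical linkages standing in for the output of the Vietnamese link parser.
# Links are (left_word, right_word, label); VP, PN and NN are invented labels.
TOY_PARSES = {
    "I was in Nghe An": [("I", "was", "SV"), ("was", "in", "VP"), ("in", "Nghe An", "PN")],
    "Ho Chi Minh City": [("Ho Chi Minh", "City", "NN")],
    "I study": [("I", "study", "SV")],
    "you sleep": [("you", "sleep", "SV")],
}


def is_clause(links):
    """True if the linkage has an SV link, an SA link, or both copula links DT_LA and LA_DT."""
    labels = {label for _, _, label in links}
    return "SV" in labels or "SA" in labels or {"DT_LA", "LA_DT"} <= labels


def classify_and(left, right, parse=TOY_PARSES.get):
    """Return "discourse" if the phrases on both sides of "and" are clauses,
    otherwise "conjunction". `parse` returns a list of links, or None on failure."""
    for phrase in (left, right):
        links = parse(phrase)
        if links is None or not is_clause(links):
            return "conjunction"
    return "discourse"


print(classify_and("I was in Nghe An", "Ho Chi Minh City"))  # conjunction
print(classify_and("I study", "you sleep"))                  # discourse
```

Passing the parser in as the `parse` argument keeps the decision rule separate from the parser. A real link parser returning link triples could then be plugged in without changing `classify_and`.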
On the sample corpus used for the compound-sentence parser, conjunction disambiguation significantly improves the results of discourse analysis. Table 3.8 below compares the results of discourse analysis with and without disambiguation.
Table 3.8. Comparison of discourse analysis results with and without disambiguation
Input set | Compound sentences | Clauses | Correctly analyzed clauses (without disambiguation) | Correctly analyzed clauses (with disambiguation) |
1 | 50 | 87 | 62 (71.26%) | 87 (100%) |
2 | 25 | 62 | 27 (43.55%) | 36 (58.06%) |
3 | 25 | 56 | 33 (58.93%) | 41 (73.21%) |
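Each percentage in Table 3.8 is a count divided by the number of clauses in that set. The short Python check below recomputes them rounded to two decimal places. It also reports the gain in percentage points and the figures pooled over all three sets. The counts are copied from the table; the pooled totals are simple sums computed here and are not reported in the source.

```python
# (clauses, correct without disambiguation, correct with disambiguation), from Table 3.8
TABLE_3_8 = {1: (87, 62, 87), 2: (62, 27, 36), 3: (56, 33, 41)}


def pct(part, whole):
    """Percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


def summarize(table):
    """Map each set (and "all") to (% before, % after, gain in points)."""
    rows = {}
    for set_id, (clauses, before, after) in table.items():
        rows[set_id] = (pct(before, clauses), pct(after, clauses), pct(after - before, clauses))
    # Pooled over all sets: sum each column of counts, then divide.
    clauses, before, after = (sum(col) for col in zip(*table.values()))
    rows["all"] = (pct(before, clauses), pct(after, clauses), pct(after - before, clauses))
    return rows


for key, (before, after, gain) in summarize(TABLE_3_8).items():
    print(f"set {key}: {before}% -> {after}% (+{gain} points)")
```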
The gain in correctly analyzed clauses after disambiguation varies with how often the words that can cause ambiguity occur in the input. The remaining errors, which involve “and”, “or” and commas, arise mainly because the clauses contain noun-adjective phrases. A noun-adjective phrase can be the core of a clause, but it can also be just a noun phrase acting as a subject. For example, in the sentence “Sa Pa is the ‘kingdom’ of fruits, peach blossoms, big yellow peaches, small yellow peaches, queen plums, purple plums, Tam Hoa plums, gladiolus flowers, plum blossoms, pear blossoms, peach blossoms, chrysanthemums, roses … especially immortal flowers that live forever with time”, the commas cause ambiguity. Phrases such as “big yellow peach”, “small yellow peach” and “purple plum” are split off as separate clauses, when in reality they are only noun phrases that enumerate examples supporting the assertion before them.
When acting as a coordinating conjunction, “and” carries connectors that let it play the role of each element of its list. The selection form (disjunct) of “and” has a large connector F that points to both sides of “and”; in addition, the other connectors of “and” are extensions of F, i.e. connectors whose names begin with F. This lets “and” join the two parts of its list and stand in for the list elements in the sentence, as described in Chapter 1.
Applied in the link parser, this yields the analysis shown in Figure 3.25.
Figure 3.25. An analysis with the F connector for the word “and”
However, this can also produce a link between “brother” and “sister”. Although link grammar allows cycles, this link does not represent an actual relationship in the sentence.
To remove this link, [111] attaches extra information to large connectors and modifies the matching condition for connectors. Each connector is assigned a priority of 0, 1 or 2. Normal connectors (not large connectors) have priority 0, a large connector on an ordinary word has priority 1, and large connectors on “and” have priority 2. For two connectors to match, they must first match under the normal criteria, and their priorities must also be compatible: 0 is compatible with 0, 1 with 2, and 2 with 1. No other combination is compatible.
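The priority rule can be sketched as a matching function. This is a minimal Python illustration assuming a simplified connector model:

- The `Connector` class and `can_link` function are hypothetical.
- The “normal criteria” are reduced to identical names and opposite directions, whereas a real link grammar dictionary also compares connector subscripts.

Assuming the unwanted brother–sister link would join two priority-1 large connectors, the sketch shows why it is rejected.

```python
from dataclasses import dataclass

# Compatible (left, right) priority pairs: 0-0, 1-2 and 2-1; nothing else matches.
COMPATIBLE = {(0, 0), (1, 2), (2, 1)}


@dataclass(frozen=True)
class Connector:
    name: str       # connector type, e.g. "F"
    direction: str  # "+" points to the right, "-" points to the left
    priority: int   # 0 normal, 1 large connector on a word, 2 large connector on "and"


def can_link(left: Connector, right: Connector) -> bool:
    """Check whether `left` (on the left word) can link to `right` (on the right word)."""
    # Simplified "normal criteria" (an assumption): opposite directions, identical names.
    normal = left.direction == "+" and right.direction == "-" and left.name == right.name
    return normal and (left.priority, right.priority) in COMPATIBLE


brother_f = Connector("F", "+", 1)
and_f = Connector("F", "-", 2)
sister_f = Connector("F", "-", 1)
print(can_link(brother_f, and_f))     # True: priority 1 with 2
print(can_link(brother_f, sister_f))  # False: priority 1 with 1 is not compatible
```

Keeping the compatibility table as an explicit set of pairs mirrors the rule as stated. That makes it easy to compare against [111] if the rule is extended.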
Applying this method, the thesis handles a number of practical cases involving “and” effectively. Some further phenomena involving “and” are treated following [111].
The most common case is a list of more than two elements, where the elements of the “and” list are separated by commas, for example “grandpa, grandma, dad and mom”. The comma then has the selection form ((G₂) (G₁, G₂)), where the subscript denotes the priority of the connector.
Figure 3.26. G connectors joining multiple commas and the word “and”
In the example in Figure 3.26, the second comma uses this form to connect to the first comma through a G connector of priority 2 (because the G connector on the first comma already has priority 1). Its G connector of priority 1 then connects the second comma to “dad”, and its G connector of priority 2 connects the second comma to “and” (a G connector of priority 1 is used to connect “and” to “mom”).
3.4. Conclusion
Parsing is the central problem to be solved when building a new syntactic model. Using the link grammar model built for Vietnamese, the thesis's link parser addresses the following problems:
- Parsing simple sentences.
- Parsing compound sentences with multiple clauses.
- Resolving the conjunction ambiguity problem fairly completely.
- Testing the component disambiguation algorithm.
The experimental results of the parsing algorithms are acceptable. However, due to the complexity of natural language and to time constraints, the thesis has not yet addressed the following issues:
- Parsing sentence types in which some constituents can occur in arbitrary positions. Since link grammar is by nature a dependency-style grammar, this problem is not very difficult, although in some cases it may violate planarity.
- Parsing compound sentences without conjunctions. This problem is also potentially solvable. When the parser concludes that a sentence is not syntactically correct, it has already produced all possible analyses of every phrase in the sentence, and a violation of the connectivity of the analysis can signal a missing conjunction. Fully solving this problem requires a more in-depth study of the language as well as a large corpus.
- Parsing complex sentences. This is a very difficult problem in other languages as well, and it requires statistical methods to find clause boundaries. This problem may be solved in the future, once a large enough corpus has been built.
Another direction of interest is integrating semantic links into the Vietnamese link grammar. This is feasible with the link grammar model, which allows a sentence analysis to be represented as an acyclic link graph, but it is also a large problem that requires a substantial investment of time.