The COUNT function takes as inputs the two words L and R and a pair of two pointers to join lists: l points to a connection in the list to the right of a selection of L and r points to a connection belonging to L and R. list to the left of a selection form of R. COUNT returns a number, that is, the number of ways to draw the connections of words from L to from R, with the connections in the list pointed to by l and r.
The calculated result of the COUNT function at each call is stored in a hash array (just before returning). In the next time, the result of the previous calculation is looked up in the hash table. Since there is a hash table that stores the results, the runtime cost is O(c2 d) where d is the number of pools and c is the number of connections. For a given grammar, d = O(n) and c = O(n), so the running time is O(n3).
3.1.2. Trimming comb
With the formulas presented in Chapter 3, in order to cover Vietnamese syntactic phenomena, the number of selection forms to consider is very large. However, most conjunctions are not used because they contain connections that do not match a connection of a word in the sentence. In particular, suppose a word W has the form d with the connection C in the list on the right. If none of the words to the right of W have a join on the left that matches C, the candidate form d cannot be included in any correct analysis. Therefore, this type of recruitment can be deleted without changing the results of the association analysis. The deletion of that selection is called the pruning step [111].
The trimming comb process is divided into two steps: the trimming comb and the strong trimming comb.
Trimming comb
Sequentially traverse the words in the sentence from left to right, then right to left, and so on until no more forms can be eliminated.
Assume the mth word in the sentence under consideration. The set S of connections in the list must be in a collection of words 1,…, m – 1 stored in a hash table, with the hash using the initial capital letters of the connection name. This will save you a lot of time looking for a match to match it.
Maybe you are interested!
- Link Building Based On Adjective Structure
- Vietnamese linking grammar model - 13
- Application Algorithm To Expand Vietnamese Dictionary
- Vietnamese linking grammar model - 16
- Vietnamese linking grammar model - 17
- Vietnamese linking grammar model - 18
In fact, the parsing process of [111] shows that it never takes more than five passes to finish the pruning process.
Strong Trimming Comb
Call a connection shallow if it is the first connection in its list of connections.
In contrast the connection is deep. Strong pruning comb is based on the following criteria:
1. The nearest word criterion must be satisfied for both connections to form the association.
2. There cannot be a link between two deep connections.
3. Two connections of a link between two adjacent words must be the last connection in their list.
4. Two connections of a link between two non-adjacent words cannot be the last connection in their list at the same time (except in the case of a large connection).
Trimming comb on expression tree
Although according to [111], after building all the newly-trimmed categories, the thesis has chosen the method of English association analyzers [137], which is to build a tree representing the association formulas. the concatenation of each word, and then pruning the tree before constructing the aggregates. This treatment allows for much faster execution than the one introduced in [111].
If the connection name is treated as operands, &, or, xor are operators, the join formula has the same structure as an arithmetic expression ({X} is converted to X or()). Figure 3.5.below depicts a tree representing the association formula
Figure 3.5. Formula tree (NN- &{NN+}) or ({PqNt-} & {NN+})
When traversing the association of words for pruning as introduced in Chapter 4, if we find a connection that does not match any of the connections on the right, we will remove the nodes in the tree according to the following rules:
- If a child node of the node labeled “&” is deleted, delete the node.
- If the node labeled “or” “xor” has no children, remove the node.
In addition, the following three rules should be applied sequentially.
- If more than one node labeled “( )” is a child of a certain node, keep only one node
- If a node labeled “&” has more than one child node including a node labeled “( )” then it will be removed from the tree.
- If a node is labeled “&” or “or” , “xor” contains only one child node, replace its label with the child node’s label.
Of course, the process is still performed in the order of left → right then right → left, etc. The result is the same as the result of two steps of pruning comb and strong pruning comb, but the execution speed is much faster.
The effect of trimming comb in Vietnamese
Because Vietnamese does not change morphology, tenses, forms, numbers … are expressed by adding words, so the number of initial forms of each word, especially nouns and verbs, is much larger than that of English. . However, the pruning algorithms are very effective: after two strong pruning and combing processes, the number of selection forms is only equivalent to English, and there is no need for any sentences in the thesis’s example set. 5 times trimming comb.
Figure 3.6. Number of types after trimming combs and strong trimming combs
In figure 3.6. is a pruning result image made by the analyzer with the phrase “we want to win titles”.
3.1.3. Test results for analyzing simple sentences and simple compound sentences
Link parser built in Java, working on Windows environment. To test the parser model according to [111], the thesis has collected 200 sentences, typical for different types from articles on the Internet on a number of topics: Vietnamese conversation, science casual, sport, travel. Below are the results of the program implementation with the sentence “We want to win titles”
Figure 3.7. Results of the link analysis of the sentence “We want to win titles”