Complexity Assessment and Algorithm Implementation


T 3.1: 1 2 3

T 3.2: 1 2 5

T 6.1: 1 2 3 4 5 6


We have:

- T 2,1 = T 1,1 ∪ T 1,2;

- T 3,3 = T 2,4 ∪ T 1,2 = T 2,5 ∪ T 1,2;

- T 5,1 = T 4,1 ∪ T 1,5 = T 4,2 ∪ T 1,4 = T 3,2 ∪ T 2,4 = T 4,4 ∪ T 1,1.

3.4.4. Optimization principle

This section shows that the distributed tree query optimization problem obeys the optimization principle [34]. Suppose that an i-class query T i,j is a subquery of a tree query T with n classes and that it is divided into an r-class subquery T r,p and an (i-r)-class subquery T i-r,q. Let a u(x) be one of the possible plans that deliver the result of T r,p at station x, and let b v(x) be one of the possible plans that deliver the result of T i-r,q at station x. Let Cost(p) be the cost of plan p. The execution plans for implicit joins and transfers are generated by the following functions.

- The function buildJPlan x (a u(x), b v(x)) combines plans a u(x) and b v(x) into a plan in which the results of T r,p and T i-r,q at station x are implicitly joined at this station, that is, buildJPlan x (a u(x), b v(x)) = IJ x (a u(x), b v(x)). Let c w(x) denote the plan generated by this function. Executing this plan explicitly computes the result of T i,j at station x.

- The function buildTPlan x,t (c w(x)) extends this plan into another plan in which the result of T i,j at station x is transmitted to station t, that is, buildTPlan x,t (c w(x)) = TR x,t (c w(x)). Executing this plan delivers the result of T i,j at station t.

Since the results of T r,p and T i-r,q must be at the same station to be implicitly joined, the cost of the plan generated by buildJPlan x (a u(x), b v(x)) is the sum of the costs of delivering the results of the queries T r,p and T i-r,q at station x and the cost of the implicit join at this station. Similarly, the cost of the plan generated by buildTPlan x,t (c w(x)) is the sum of the cost of explicitly computing the query at station x and the cost of transmitting the result of query T i,j from station x to station t. Therefore, we have:

\[ \mathrm{Cost}(\mathrm{buildJPlan}_{x}(a_{u(x)}, b_{v(x)})) = \mathrm{Cost}(a_{u(x)}) + \mathrm{Cost}(b_{v(x)}) + \mathrm{join}_{x}(T_{r,p}, T_{i-r,q}) \tag{3.6} \]

\[ \mathrm{Cost}(\mathrm{buildTPlan}_{x,t}(c_{w(x)})) = \mathrm{Cost}(c_{w(x)}) + \mathrm{trans}_{x,t}(T_{i,j}) \tag{3.7} \]

where join x (T r,p, T i-r,q) is the cost of the implicit join between the results of T r,p and T i-r,q at station x, and trans x,t (T i,j) is the cost of transmitting the result of T i,j from station x to station t.

To select an optimized plan among the plans generated by buildJPlan and buildTPlan respectively, the optimization principle is used: If the plans differ only in their subplans, the plan with the optimal subplan is the optimal plan. This principle can be expressed formally as follows:

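\[ \mathrm{Cost}(a_{l(x)}) \le \mathrm{Cost}(a_{u(x)})\ \ \forall u \quad \text{and} \quad \mathrm{Cost}(b_{l(x)}) \le \mathrm{Cost}(b_{v(x)})\ \ \forall v \]

\[ \Rightarrow\ \mathrm{Cost}(\mathrm{buildJPlan}_{x}(a_{l(x)}, b_{l(x)})) \le \mathrm{Cost}(\mathrm{buildJPlan}_{x}(a_{u(x)}, b_{v(x)}))\ \ \forall u, v \tag{3.8} \]

\[ \mathrm{Cost}(c_{l(x)}) \le \mathrm{Cost}(c_{w(x)})\ \ \forall w \]

\[ \Rightarrow\ \mathrm{Cost}(\mathrm{buildTPlan}_{x,t}(c_{l(x)})) \le \mathrm{Cost}(\mathrm{buildTPlan}_{x,t}(c_{w(x)}))\ \ \forall w \tag{3.9} \]

where the sub-plans a l(x), b l(x) and c l(x) are the lowest-cost plans among the plans a u(x), b v(x) and c w(x), respectively.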

Theorem 3.1: The cost model for the implicit join plan and the transmission plan satisfies the optimization principle.

Proof: To prove the theorem, we prove expressions (3.8) and (3.9). The proof of (3.8) is as follows, where steps 1 and 3 follow from the definition of the cost of buildJPlan and step 2 follows from the assumption that Cost(a l(x)) <= Cost(a u(x)) and Cost(b l(x)) <= Cost(b v(x)) for arbitrary values of u and v.

\[
\begin{aligned}
\mathrm{Cost}(\mathrm{buildJPlan}_{x}(a_{l(x)}, b_{l(x)})) &= \mathrm{Cost}(a_{l(x)}) + \mathrm{Cost}(b_{l(x)}) + \mathrm{join}_{x}(T_{r,p}, T_{i-r,q}) \\
&\le \mathrm{Cost}(a_{u(x)}) + \mathrm{Cost}(b_{v(x)}) + \mathrm{join}_{x}(T_{r,p}, T_{i-r,q}) \\
&= \mathrm{Cost}(\mathrm{buildJPlan}_{x}(a_{u(x)}, b_{v(x)}))
\end{aligned}
\]

The proof of (3.9) is as follows, where steps 1 and 3 follow from the definition of the cost of buildTPlan and step 2 follows from the assumption that Cost(c l(x)) <= Cost(c w(x)) for arbitrary values of w.

\[
\begin{aligned}
\mathrm{Cost}(\mathrm{buildTPlan}_{x,t}(c_{l(x)})) &= \mathrm{Cost}(c_{l(x)}) + \mathrm{trans}_{x,t}(T_{i,j}) \\
&\le \mathrm{Cost}(c_{w(x)}) + \mathrm{trans}_{x,t}(T_{i,j}) \\
&= \mathrm{Cost}(\mathrm{buildTPlan}_{x,t}(c_{w(x)}))
\end{aligned}
\]
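The optimization principle can be illustrated with a small numeric check. The sketch below is not part of the original work: the function names and all cost values are hypothetical, chosen only to show that combining the cheapest sub-plans gives the same cost as searching every combination, as expressions (3.8) and (3.9) state.

```python
from itertools import product


def cost_join_plan(cost_a, cost_b, join_cost):
    """Equation (3.6): both sub-results are at station x and joined there."""
    return cost_a + cost_b + join_cost


def cost_trans_plan(cost_c, trans_cost):
    """Equation (3.7): the result at station x is shipped to station t."""
    return cost_c + trans_cost


def best_join_by_principle(plans_a, plans_b, join_cost):
    """Combine only the cheapest sub-plan of each side (left side of 3.8)."""
    return cost_join_plan(min(plans_a), min(plans_b), join_cost)


def best_join_exhaustive(plans_a, plans_b, join_cost):
    """Try every pair of sub-plans (right side of 3.8, minimized over u, v)."""
    return min(cost_join_plan(a, b, join_cost)
               for a, b in product(plans_a, plans_b))


# ASSUMPTION: made-up costs of alternative plans delivering results at station x.
plans_a = [40, 25, 31]  # candidate plans for T r,p
plans_b = [18, 22]      # candidate plans for T i-r,q
join_at_x = 7           # join x (T r,p, T i-r,q), the same for every pair
trans_x_t = 12          # trans x,t (T i,j)

best_c = best_join_by_principle(plans_a, plans_b, join_at_x)
best_shipped = cost_trans_plan(best_c, trans_x_t)
```

Because the join and transmission terms do not depend on which sub-plans were chosen, the exhaustive search cannot beat the combination of the cheapest sub-plans. This is what allows the algorithm in the next section to keep only one plan per subtree and station.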

3.4.5. PathExpOpt optimization algorithm

The algorithm consists of the following three steps:

- The first step is initialization.

- The next step is to find the optimal solution for the induced subtrees, from trees with one vertex to trees with n vertices.

- The final step executes the path expression according to the optimal plan found.

a. Initialization

- In the initialization step, reduce the classes (or class fragments) by projecting onto the attributes that need to be retained. These attributes belong to one of the following three types:

o OIDs of classes

o Complex attributes used for joins

o Attributes to be selected in the query result

- Generate the induced subtrees and calculate their sizes.

b. Find the optimal solution

- Step 1: Initialize the cost of single-vertex trees. The cost of a single-vertex tree at each station is the cost of transmitting that vertex to this station.

- Step 2: Build optimal solutions for trees with 2 or more vertices.

o Step 2.1: Find the optimal plan for computing T i,j through joins. For each tree T i,j (the j-th induced subtree with i vertices), search all ways to split the tree into 2 induced subtrees T r,p and T i-r,q and find the split that gives the smallest cost for T i,j. The cost of T i,j at station t is the sum of the costs of T r,p and T i-r,q at station t and the cost of joining these 2 trees at station t. Save the optimal plan.

o Step 2.2: Find the optimal plan for computing T i,j through transmissions. The cost of T i,j at station t is replaced if it is greater than the sum of the cost of T i,j at station x and the cost of transmitting T i,j from station x to station t. Save the optimal plan.

Algorithm 3.5 PathExpOpt

Input:

- G = (V, E); V = {F 1, F 2, …, F n};

- fragSize: Array that stores the sizes of the fragments

- fragSite: Array of the stations where the fragments are located

- transCost: Array that stores the transmission costs between stations

- T: The induced subtrees of the original query tree

- treeSize: Array of tree sizes, calculated when the trees are generated

- k: Station that generates the query

Output: Optimal execution plan for the query tree (result)

Algorithm steps:

//Step 1: Initialize the cost of 1-vertex subtrees
for (u = 1; u <= n; u++) //n is the number of fragments
begin
    x = fragSite u;
    for (t = 1; t <= s; t++) //s is the number of stations
        Cost_Trans t (T 1,u) = fragSize u * transCost x,t;
end; {for u}

//Step 2: Build optimal solutions for trees with 2 or more vertices
for (i = 2; i <= n; i++)
begin
    //Step 2.1: Find the optimal plan for computing T i,j through joins
    for (j = 1; j <= m i; j++) //m i is the number of induced subtrees with i vertices
        for (t = 1; t <= s; t++)
            JoinPlan t (T i,j) = Find_JoinPlan(T i,j);

    //Step 2.2: Find the optimal plan for computing T i,j through transmissions
    for (t = 1; t <= s; t++)
        for (j = 1; j <= m i; j++)
            TransPlan t (T i,j) = Find_TransPlan t (T i,j);
end; {for i}

result = TransPlan k (T n,1); //k is the station that generates the query

end; {Algorithm 3.5}

Algorithm 3.6 Find_JoinPlan(T i,j)

//Find the optimal plan for computing T i,j through joins

Input:

- G = (V, E); V = {F 1, F 2, …, F n};

- transCost: Array that stores the transmission costs between stations

- T: The induced subtrees of the original query tree

- treeSize: Array of sizes of the subtrees, calculated when the trees are generated

Output: JoinPlan t (T i,j)

Steps of the algorithm:

//Find the smallest cost of the induced subtree T i,j at station t
Cost_Join t (T i,j) = ∞;

//Enumerate the ways of splitting the tree
for (each pair (T r,p, T i-r,q) such that T i,j = T r,p ∪ T i-r,q)
begin
    //calculate the cost of joining T r,p and T i-r,q using the Bloom filter
    Cost_Temp = cost_BloomJoin t (join(T r,p, T i-r,q));
    if (Cost_Join t (T i,j) > Cost_Temp)
    begin
        //save the cheaper plan
        Cost_Join t (T i,j) = Cost_Temp;
        save JoinPlan t (T i,j);
    end; {if}
end; {for each}

end; {Algorithm 3.6}

Algorithm 3.7 Find_TransPlan(T i,j)

//Find the optimal plan for computing T i,j through transmissions

Input:

- G = (V, E); V = {F 1, F 2, …, F n};

- transCost: Array that stores the transmission costs between stations

- T: The induced subtrees of the original query tree

- treeSize: Array of sizes of the subtrees, calculated when the trees are generated

Output: TransPlan t (T i,j)

Steps of the algorithm:

//If joining at some station x and then transmitting the result to station t
//is cheaper than the current plan at t, replace the plan
Cost_Trans t (T i,j) = ∞;

for (x = 1; x <= s; x++)
begin
    Cost_Temp = Cost_Join x (T i,j) + treeSize(T i,j) * transCost x,t;
    if (Cost_Trans t (T i,j) > Cost_Temp)
    begin
        //replace the join at t with a join at x followed by a transmission to t
        Cost_Trans t (T i,j) = Cost_Temp;
        save TransPlan t (T i,j);
    end; {if}
end; {for x}

end; {Algorithm 3.7}
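The three algorithms above can be sketched together in Python for the simplest case, a chain query, where every induced subtree is a contiguous range of fragments. This is an illustrative sketch and not the authors' Java implementation. In particular, `result_size` and `join_cost` are placeholder assumptions standing in for treeSize and the Bloom-filter join cost, which are not specified in this section; stations are 0-based in the code and printed 1-based in the plan strings.

```python
def path_exp_opt(frag_size, frag_site, trans_cost, k):
    """Return (cost, plan) for evaluating a chain query F1..Fn at station k."""
    n, s = len(frag_size), len(trans_cost)

    def result_size(lo, hi):
        # ASSUMPTION: placeholder estimate of the result size of fragments lo..hi-1.
        return min(frag_size[lo:hi])

    def join_cost(left, right):
        # ASSUMPTION: placeholder for the Bloom-filter join cost of two subtrees.
        return result_size(*left) + result_size(*right)

    cost, plan = {}, {}

    # Step 1: a single fragment costs its size times the transfer cost to t.
    for u in range(n):
        x = frag_site[u]
        name = f"F{u + 1}"
        cost[(u, u + 1)] = [frag_size[u] * trans_cost[x][t] for t in range(s)]
        plan[(u, u + 1)] = [
            name if t == x else f"TRANS_{x + 1}->{t + 1}({name})" for t in range(s)
        ]

    # Step 2: subtrees with i = 2..n vertices, smallest first.
    for i in range(2, n + 1):
        for lo in range(n - i + 1):
            tree = (lo, lo + i)

            # Step 2.1 (Find_JoinPlan): best split of the tree at each station t.
            join_c, join_p = [], []
            for t in range(s):
                best, best_plan = float("inf"), None
                for mid in range(lo + 1, lo + i):
                    left, right = (lo, mid), (mid, lo + i)
                    c = cost[left][t] + cost[right][t] + join_cost(left, right)
                    if best > c:
                        best = c
                        best_plan = f"JOIN_{t + 1}({plan[left][t]},{plan[right][t]})"
                join_c.append(best)
                join_p.append(best_plan)

            # Step 2.2 (Find_TransPlan): join at x, then ship the result to t.
            size = result_size(*tree)
            cost[tree], plan[tree] = [], []
            for t in range(s):
                best, best_plan = float("inf"), None
                for x in range(s):
                    c = join_c[x] + size * trans_cost[x][t]
                    if best > c:
                        best = c
                        best_plan = (join_p[x] if x == t
                                     else f"TRANS_{x + 1}->{t + 1}({join_p[x]})")
                cost[tree].append(best)
                plan[tree].append(best_plan)

    return cost[(0, n)][k], plan[(0, n)][k]
```

For example, with two fragments of sizes 10 and 4 stored at stations 1 and 2 and a unit transfer cost of 1 between them, `path_exp_opt([10, 4], [0, 1], [[0, 1], [1, 0]], 0)` returns a cost of 18 with the plan `JOIN_1(F1,TRANS_2->1(F2))`. The sketch assumes `trans_cost[x][x]` is 0, so the transmission step never makes a plan more expensive than the local join.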

c. Execute the query according to the optimal plan

Tracing back from the tree T n,1 recovers the optimal plan, and the query is executed according to this plan. During execution, Bloom filters are used to transfer data.

3.4.6. Complexity assessment and algorithm implementation

The complexity of the algorithm depends on the number of classes, the number of stations and the number of generated subtrees. In the best case, the query graph is a chain graph (corresponding to a chain query). The number of generated subtrees with i vertices is then n-i+1, and the number of join plans and transfer plans, counting one plan for each combination of indices, is:

\[
\sum_{i=2}^{n}\left(\sum_{j=1}^{n-i+1}\sum_{t=1}^{s}\sum_{r=1}^{i-1} 1 \;+\; \sum_{t=1}^{s}\sum_{j=1}^{n-i+1}\sum_{x=1}^{s} 1\right) = s\,\frac{(n-1)\,n\,(n+1)}{6} + s^{2}\,\frac{(n-1)\,n}{2} \tag{3.10}
\]

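Thus, the best-case complexity of the algorithm is O(s n^3). In the worst case, the query graph is a star graph (corresponding to a star query). The number of induced subtrees with i vertices is then the binomial coefficient C(n-1, i-1), and the number of join plans and transfer plans is:

\[
\sum_{i=2}^{n}\left(\sum_{j=1}^{C_{n-1}^{i-1}}\sum_{t=1}^{s}\sum_{r=1}^{i-1} 1 \;+\; \sum_{t=1}^{s}\sum_{j=1}^{C_{n-1}^{i-1}}\sum_{x=1}^{s} 1\right) = s\,(n-1)\,2^{n-2} + s^{2}\,(2^{n-1}-1) \tag{3.11}
\]

The complexity of the algorithm in this case is O(s 2^(n-1)).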
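The two closed forms can be checked by running the nested sums literally. The short Python sketch below does this for small n and s, counting one plan per combination of indices; the function names are illustrative and do not come from the original implementation.

```python
from math import comb


def chain_subtrees(n, i):
    """Induced subtrees with i vertices in a chain of n vertices."""
    return n - i + 1


def star_subtrees(n, i):
    """Induced subtrees with i >= 2 vertices in a star: the centre plus i-1 leaves."""
    return comb(n - 1, i - 1)


def count_plans(n, s, subtrees_with):
    """Left-hand side of (3.10) and (3.11): count join and transfer plans."""
    total = 0
    for i in range(2, n + 1):
        m_i = subtrees_with(n, i)
        join_plans = sum(1 for _j in range(m_i)
                         for _t in range(s) for _r in range(1, i))
        trans_plans = sum(1 for _t in range(s)
                          for _j in range(m_i) for _x in range(s))
        total += join_plans + trans_plans
    return total


def chain_closed_form(n, s):
    """Right-hand side of (3.10)."""
    return s * (n - 1) * n * (n + 1) // 6 + s * s * (n - 1) * n // 2


def star_closed_form(n, s):
    """Right-hand side of (3.11), valid for n >= 2."""
    return s * (n - 1) * 2 ** (n - 2) + s * s * (2 ** (n - 1) - 1)
```

For the 20-class, 4-station setting used in the experiments, the closed forms give 8360 plans for a chain query and 28,311,536 plans for a star query, which shows how quickly the search space grows in the worst case.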

For a query graph with many induced subtrees, a heuristic can be applied that considers only the ways of splitting T i,j into trees T 1,p and T i-1,q. Experiments show that, for a 20-vertex star graph, this heuristic reduces the execution time of the algorithm from 15 minutes to less than 1 minute, and the resulting plan is near-optimal, with a cost not much different from that of the optimal plan.

The algorithm was implemented in Java and tested with path expression queries involving 20 classes.

3.4.7. Experimental results

An example of experimental data and the generated results is shown in Figure 3.6. The experimental data consists of 20 fragments belonging to classes, with predetermined sizes and stations. The join graph is a star-shaped graph with many nodes, and the network consists of 4 stations with a predetermined cost of transmitting each unit of data between any 2 stations.

T[11,2492]: 4,7,5,6,10,11,16,17,18,19,20

1: T[1,4] + T[10,2312] = [12400, 17000, 25400, 4600]

2: T[1,5] + T[10,1850] = [6800, 8000, 5600, 8500]

3: T[1,16] + T[10,1388] = [6600, 5500, 7200, 9800]

4: T[1,17] + T[10,1387] = [6400, 7000, 5600, 7300]

5: T[1,18] + T[10,1386] = [14200, 18300, 27200, 6400]

6: T[1,19] + T[10,1385] = [6800, 8000, 5600, 9100]

7: T[1,20] + T[10,1384] = [8200, 5500, 10400, 20200]

8: T[3,5] + T[8,1346] = [7100, 6900, 7900, 5600]

9: T[4,6] + T[7,950] = [7600, 6900, 7800, 6100]

10: T[5,10] + T[6,515] = [8200, 8100, 7800, 6700]

- JOIN: [6400, 5500, 5600, 4600]

- J_PLAN: [[1,17]+[10,1387], [1,16]+[10,1388],

[1,5]+[10,1850], [1,4]+[10,2312]]

- TRANS: [5800, 5500, 5600, 4600]

- T_PLAN: [3, 0, 0, 0]

JOIN_2(F2,JOIN_2(F9,JOIN_2(F13,JOIN_2(F16,JOIN_2(F20,TRANS_1->2(JOIN_1(F8,JOIN_1(F14, JOIN_1(F15,TRANS_3->1(JOIN_3(F3,JOIN_3(TRANS_1->3(F1),JOIN_3(F5,JOIN_3(F17,JOIN_3(F19, TRANS_1->3(JOIN_1(TRANS_3->1(F12),TRANS_4->1(JOIN_4(F4,JOIN_4(F18,JOIN_4(F11,

TRANS_1->4(JOIN_1(F7,TRANS_2->1(JOIN_2(F6,TRANS_3->2(F10))))))))))))))))))))))))))

Figure 3.6: Illustration of test results
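The JOIN and J_PLAN rows of Figure 3.6 can be reproduced from the ten candidate splits listed for T[11,2492]. The Python sketch below applies the comparison used in Find_JoinPlan (keep the current plan unless a strictly cheaper one is found) to the cost vectors copied from the figure. The function name is illustrative. The TRANS and T_PLAN rows are not reproduced, because they also depend on treeSize and transCost values that the figure does not show.

```python
# Candidate splits of T[11,2492] and their join costs at stations 1..4,
# copied from Figure 3.6.
CANDIDATES = [
    ("[1,4]+[10,2312]", [12400, 17000, 25400, 4600]),
    ("[1,5]+[10,1850]", [6800, 8000, 5600, 8500]),
    ("[1,16]+[10,1388]", [6600, 5500, 7200, 9800]),
    ("[1,17]+[10,1387]", [6400, 7000, 5600, 7300]),
    ("[1,18]+[10,1386]", [14200, 18300, 27200, 6400]),
    ("[1,19]+[10,1385]", [6800, 8000, 5600, 9100]),
    ("[1,20]+[10,1384]", [8200, 5500, 10400, 20200]),
    ("[3,5]+[8,1346]", [7100, 6900, 7900, 5600]),
    ("[4,6]+[7,950]", [7600, 6900, 7800, 6100]),
    ("[5,10]+[6,515]", [8200, 8100, 7800, 6700]),
]


def pick_join_plans(candidates):
    """Per station, keep the first split with the smallest join cost."""
    stations = len(candidates[0][1])
    join = [float("inf")] * stations
    j_plan = [None] * stations
    for label, costs in candidates:
        for t in range(stations):
            if join[t] > costs[t]:  # strict comparison: ties keep the earlier split
                join[t] = costs[t]
                j_plan[t] = label
    return join, j_plan


join, j_plan = pick_join_plans(CANDIDATES)
```

The result matches the figure: `join` is `[6400, 5500, 5600, 4600]` and `j_plan` lists the splits `[1,17]+[10,1387]`, `[1,16]+[10,1388]`, `[1,5]+[10,1850]` and `[1,4]+[10,2312]`. Ties at stations 2 and 3 are resolved in favour of the earlier candidate, which is consistent with the strict comparison in Algorithm 3.6.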
