Query Optimization in Distributed Databases - 11

The relationship diagram between the tables is as follows:



3.2. Distributed database model, tools, programming languages

Maybe you are interested!

The database is distributed across 3 stations, using SQL Server 2008 database management system.

Station 1: includes 2 tables Customer (18484 records) and Geography (655 records).

Station 2: IntenetSales table (60398 records)

Station 3: Product table (606 records) Explains the steps of the algorithm.

Step 1: Split the query into queries on each workstation

Step 2: Find the maximum cost of data transmission (ignoring network bandwidth speed)

The cost of network data transmission is mainly the number of bytes to be queried. To reduce the communication cost, the cost on the largest workstation will not be transmitted.

Instead, smaller costs are passed on to perform the data integration process.

Main interface:


Workstation connection interface 1

Information to connect to station 1 includes: workstation IP (or machine name), database connection type (SQL server, Mysql, Microsoft Access, ...), login name, password, database name used.

Workstation connection interface 2.

The information of station 2 includes the same components as station 1.


Workstation connection interface 3.

The information of station 3 includes the same components as station 1 and station 2.

The application allows queries on each workstation, with queries only retrieving data on a single machine.


3.3. Applied algorithm

The program will install the test algorithm R* presented in section 2.4.2

3.4. Test results

Test 1: Query with the following question:

Select Customer. GeographyKey, Customer.FirstName, InternetSales.ProductKey, InternetSales.TotalProductCost From Customer, InternetSales

Where InternetSales.CustomerKey = Customer.CustomerKey


Query with centralized database:

- Time to execute query and display: 1093.75 Milliseconds

- Number of records returned is 60398



Querying a distributed database:

- Time to execute query and display: 1968.75 Miliseconds

- Number of records returned is 60398



The steps to execute the query are shown below:


Test 2: Query with the following question:

Select Customer.GeographyKey, Customer.FirstName, InternetSales.ProductKey, InternetSales.TotalProductCost, Product.EnglishProductName

From Customer, InternetSales, Product

Where InternetSales.CustomerKey = Customer.CustomerKey and Product.ProductKey = InternetSales.ProductKey and Customer.GeographyKey = 11 and Product.Color ='Silver'

Query with centralized database:

- Time to execute query and display: 62.5 Milliseconds

- Number of records returned is 17



Querying a distributed database:

- Time to execute query and display: 875 Milliseconds

- Number of records returned is 17


The steps to execute the query are shown below:


Comment


Agree Privacy Policy *