Query Optimization in Distributed Databases

The relationship diagram between the tables is as follows:

3.2. Distributed database model, tools, programming languages

Maybe you are interested!

Query Processing and Optimization in Distributed Object-Oriented Databases - 17
Query Processing and Optimization in Distributed Object-Oriented Databases - 1
Distributed Database - 35
Evaluate the performance of some distributed hash table algorithms DHT and propose solutions to improve the performance of the CHORD algorithm - 1
Mapreduce And Hdfs (Optimal Features Of Mapreduce When Combined With Hdfs):hdfs Is Just A Distributed File System With Management Mechanisms Inside It.

The database is distributed across 3 stations, using SQL Server 2008 database management system.

• Station 1: includes 2 tables Customer (18484 records) and Geography (655 records).

• Station 2: IntenetSales table (60398 records)

• Station 3: Product table (606 records) Explains the steps of the algorithm.

Step 1: Split the query into queries on each workstation

Step 2: Find the maximum cost of data transmission (ignoring network bandwidth speed)

The cost of network data transmission is mainly the number of bytes to be queried. To reduce the communication cost, the cost on the largest workstation will not be transmitted.

Instead, smaller costs are passed on to perform the data integration process.

Main interface:

Workstation connection interface 1

Information to connect to station 1 includes: workstation IP (or machine name), database connection type (SQL server, Mysql, Microsoft Access, ...), login name, password, database name used.

Workstation connection interface 2.

The information of station 2 includes the same components as station 1.

Workstation connection interface 3.

The information of station 3 includes the same components as station 1 and station 2.

The application allows queries on each workstation, with queries only retrieving data on a single machine.

3.3. Applied algorithm

The program will install the test algorithm R* presented in section 2.4.2

3.4. Test results

Test 1: Query with the following question:

Select Customer. GeographyKey, Customer.FirstName, InternetSales.ProductKey, InternetSales.TotalProductCost From Customer, InternetSales

Where InternetSales.CustomerKey = Customer.CustomerKey

Query with centralized database:

- Time to execute query and display: 1093.75 Milliseconds

- Number of records returned is 60398

Querying a distributed database:

- Time to execute query and display: 1968.75 Miliseconds

- Number of records returned is 60398

The steps to execute the query are shown below:

Test 2: Query with the following question:

Select Customer.GeographyKey, Customer.FirstName, InternetSales.ProductKey, InternetSales.TotalProductCost, Product.EnglishProductName

From Customer, InternetSales, Product

Where InternetSales.CustomerKey = Customer.CustomerKey and Product.ProductKey = InternetSales.ProductKey and Customer.GeographyKey = 11 and Product.Color ='Silver'

Query with centralized database:

- Time to execute query and display: 62.5 Milliseconds

- Number of records returned is 17