4376

Consider the Employees and Departments relations described in Exercise 21.3. They are now stored in a distributed DBMS with all of Employees stored at Naples and all of Departments stored at Berlin. There are no indexes on these relations. The cost of various operations is as described in Exercise 21.3. Consider the query:

The query is posed at Delhi, and you are told that only 1 percent of employees are managers. Find the cost of answering this query using each of the following plans:

1. Compute the query at Naples by shipping Departments to Naples; then ship the result to Delhi.

2. Compute the query at Berlin by shipping Employees to Berlin; then ship the result to Delhi.

3. Compute the query at Delhi by shipping both relations to Delhi.

4. Compute the query at Naples using Bloomjoin; then ship the result to Delhi.

5. Compute the query at Berlin using Bloomjoin; then ship the result to Delhi.

6. Compute the query at Naples using Semijoin; then ship the result to Delhi.

7. Compute the query at Berlin using Semijoin; then ship the result to Delhi.

Exercise 21.3

Consider a parallel DBMS in which each relation is stored by horizontally partitioning its tuples across all disks.

The mgrid field of Departments is the eid of the manager. Each relation contains 20-byte tuples, and the sal and budget fields both contain uniformly distributed values in the range 0 to 1,000,000. The Employees relation contains 100,000 pages, the Departments relation contains 5,000 pages, and each processor has 100 buffer pages of 4,000 bytes each. The cost of one page I/O is td, and the cost of shipping one page is ts; tuples are shipped in units of one page by waiting for a page to be filled before sending a message from processor i to processor j. There are no indexes, and all joins that are local to a processor are carried out using a sort-merge join. Assume that the relations are initially partitioned using a round-robin algorithm and that there are 10 processors.

For each of the following queries, describe the evaluation plan briefly and give its cost in terms of td and ts. You should compute the total cost across all sites as well as the ‘elapsed time’ cost (i.e., if several operations are carried out concurrently, the time taken is the maximum over these operations).

1. Find the highest paid employee.

2. Find the highest paid employee in the department with did 55.

3. Find the highest paid employee over all departments with budget less than 100,000.

4. Find the highest paid employee over all departments with budget less than 300,000.

5. Find the average salary over all departments with budget less than 300,000.

6. Find the salaries of all managers.

7. Find the salaries of all managers who manage a department with a budget less than 300,000 and earn more than 100,000.

8. Print the of all employees, ordered by increasing salaries. Each processor is connected to a separate printer, and the answer can appear as several sorted lists, each printed by a different processor, as long as we can obtain a fully sorted list by concatenating the printed lists (in some order).