A Greedy Algorithm for Distributed Multiway Spatial Join Scheduling using FM Linear-Integer Model
Abstract: A multiway spatial join is an important query in spatial databases, which has been widely used in many scientific applications. Because it is both data and computation-intensive, it may be processed in distributed systems, where each machine is responsible for processing a query fragment. The query fragment is a pair of data partitions aligned by a spatial predicate, which we will term as a task. For these tasks to be processed, they must be scheduled in such a way that the query load is evenly distributed across the machines, thus minimizing the makespan, or in other words, the total runtime. Moreover, processing a task requires data partitions to be transferred to the machine on which the task was scheduled, incurring a communication cost, which must also be minimized. However, if the data partition is already on the machine, no communication costs will be incurred. For this task-assignment problem, two models seek to solve it, the FM model, which considers the scenario described above, and the SM model, which assumes that the cost of communication will be incurred every time a task is processed in a machine. For the SM model, algorithms have been developed that find approximate feasible solutions. This work had as its object of study the effectiveness of the algorithms for multiway spatial scheduling. We proposed a greedy algorithm, called EBCVDMC, which uses the coefficient of variation to measure the dispersion of tasks load and to calculate how unbalanced the task distribution is. The solutions found by the algorithm were better than the ones provided by a standard linear relaxation (LP) method in 68 of 80 evaluated instances. Compared to a method based on Lagrangian Relaxation (LR), our algorithm provided competitive solutions in general and found better ones for 4 instances. As EBCVDMC complexity is O(nm^2), where n is the number of tasks and m the number of machines, it can be used in small multiway spatial query instances for which runtime constrains query scheduling time.
Keywords: Spatial Data; Distributed Processing; Scheduling; Greedy Algorithm.
Complete monograph. Copyright © 2019. All rights reserved.
Disclaimer: Although the student carefully wrote the original abstract, and it was revised and improved, English is not him or the advisor' mother language. The original work is written in Portuguese.
Citation: Ronaldo Nogueira de Sousa. Algoritmo Guloso para Escalonamento de Multijunções Espaciais em Sistemas Distribuídos usando o Modelo FM. Monograph. Bacharelado em Ciências da Computação. Universidade Federal de Goiás, Regional Jataí. Jataí, GO, Brasil. 2019. 53p.
Copy citation in bibtex format.