Posts
- Scheduling distributed multiway spatial join queries: optimization models and algorithms
- Comparative Analysis of the AB Histogram for Window Queries in Line and Polygon Spatial Datasets
- Teaching functional memory architecture, illusions, beliefs, and techniques for effective learning: becoming a sophisticated learner
- A Greedy Algorithm for Distributed Multiway Spatial Join Scheduling using FM Linear-Integer Model
- Cost Estimation of Multiway Spatial Joins using Intermediate Euler Histograms and Datasets of Lines and Polygons
- Majestic: An Extension of a Programming Language for Educational Robotics
- Gain-Loss: A Method of Data Distribution for Distributed Processing of Multiway Spatial Joins
- Intermediate Euler Histogram for Selectivity Estimation of Multiway Spatial Joins
- Technical Evaluation of Electronic Components and Microprocessors for Using in Teaching-Learning of Exact Sciences
- Development of a Domain-Specific Programming Language for Educational Robotics
- Accuracy of Selectivity Estimation for Distributed Spatial Join Tasks using Euler Histograms
- A Method of Data Distribution for Distributed Processing of Multiway Spatial Joins
- Selectivity Estimate Accuracy Validation in Grid Histograms for Decomposed Line Spatial Objects
- A Method for Reducing Spatial Grid Histogram Resolution and Improve Query Estimation Accuracy
- Evaluation of Histograms to Partitioning Spatial Data in Distributed Systems
- Distributed Execution Plans for Multiway Spatial Join Queries using Multidimensional Histograms
- A New Method for Hashing Complex Objects in Spatial Histograms
- Escalability Evaluation of High-Performing Applications in Public and Private Clouds
- Evaluation of R-Tree Split Methods for Spatial Datasets of Line Type
- Evaluation of High-Performance Application Escalability on Public and Private Clouds
- Escalability Evaluation of High-Performing Applications in Physical Cluster and Cloud
Multiway spatial joins are a commonly occurring and fundamental type of query for spatial data processing. This article presents models and algorithms to schedule this type of query in distributed database systems while attempting to strike a balance between makespan and communication costs. We propose three algorithms based on combinatorial optimization methods: the well-known linear relaxation technique of rounding a solution generated by linear programming (LP), a more sophisticated Lagrangian Relaxation method (LR), as well...
The processing of spatial queries has a notably high computing cost, especially considering multiway spatial joins where the query may be executed in different ways called execution plans. We usually use spatial histograms to select the best plan based on the number of objects returned by each query. One relevant type of histogram, due to its high precision, is the Annular Bucket Histogram or AB. However, the experiments made by the authors that proposed the...
Although we dedicate a large part of our lives to formal education, we have little or even no opportunity during all those years to learn how to study. We hope, by teaching and evaluating other concepts and competencies, that the study practice, on its own, can enable us to learn effectively. Is 'knowing how to learn' an innate competence of individuals? Is study practice a process capable of fully developing it? Did the undergraduate students...
A multiway spatial join is an important query in spatial databases and has been widely used in many scientific applications. Because it is both data- and computation-intensive, it may be processed in distributed systems, where each machine is responsible for processing a query fragment. The query fragment is a pair of data partitions aligned by a spatial predicate, which we call a task. For these tasks to be processed, they must be scheduled...
Spatial join queries are essential to spatial data processing and are also very compute-intensive, particularly multiway spatial joins, which can be computed in many distinct ways called execution plans. A poorly chosen plan increases the processing time and the usage of computational resources; consequently, we need very effective methods for estimating query cost, such as spatial histograms. Recent studies identified that the type of spatial object in datasets (whether of line or...
One of the motivating techniques in the learning process that has stood out, thanks to how easily it conveys complex ideas, is educational robotics. However, the main programming languages used in Educational Robotics are general-purpose languages, which makes teaching and learning harder because of the complexity of mastering their syntax. An alternative to these languages is Domain-Specific Languages, which are designed to assist in the process of...
Data distribution is a challenge in the distributed execution of multiway spatial join queries. An efficient execution requires both a balanced data distribution and a distribution with spatial data colocalization. In this paper, we compare two methods of spatial data distribution and propose a new one called Gain-Loss, based on the R0-tree algorithms. Our evaluation shows that Gain-Loss has a reduced area overlay between servers in all tested scenarios and also a competitive...
This article presents a new method for building Intermediate Euler Histograms to estimate the selectivity of multiway spatial join queries. The new method is based on the original Euler Histogram and considers that the spatial extent of the spatial datasets is not the same (not aligned), a real scenario for spatial databases. Preliminary results have shown that the proposed method improved the cardinality estimation when compared to Grid Histogram, the most frequently mentioned histogram in...
Educational Robotics or Robotics in Education are terms commonly used to describe the use of robotics as an instrument to support teaching-learning, helping teachers to introduce concepts considered complex since the beginning of student training, such as electronics, computer programming, applied mechanics, and basic robotic building. In addition to offering an attractive learning space that stimulates students’ interest and curiosity, Educational Robotics is a unique tool that offers practical and fun activities. It stands on...
Considering the advancement of technology and, as a consequence, the ease of accessing it, robotics has been gaining considerable attention in new teaching methodologies. When applied to teaching, robotics is known as Educational Robotics, which consists of the application of robotic kits in the educational process. Those robotic kits can be divided into proprietary and open-source. Proprietary kits have their specific programming language, are fully developed around education, have...
Spatial data processing has grown significantly in recent years, and computing devices equipped with GPS (Global Positioning System) and communication networks (2G, 3G, and others) such as mobile phones, smartphones, and sensors are increasingly common and affordable. There is a great availability of spatial data: geolocalized images, open data from federal, state, and municipal governments, mapping of commercial stores, georeferenced data collection by governmental entities, among others. All these data enable us to produce new...
Data partitioning is a challenge in the distributed execution of multiway spatial join queries. An efficient execution requires a balanced data distribution in the cluster computers as well as a distribution that maintains spatial data colocalization. In this monograph, two spatial data distribution methods were compared: Round-Robin and Proximity Area, and a new one was proposed, called Gain-Loss, based on the R0-tree algorithms. Our experiments in a controlled environment, using synthetic datasets, show that the...
Selectivity estimation is an important metric for choosing efficient spatial database execution plans. Working with such a metric requires that spatial objects be represented by approximations. One of the most commonly used approximations is the MBR (minimum bounding rectangle). However, for some spatial objects, such as lines, the MBR generates a high error rate in the selectivity estimation. Although some works proposed the decomposition of the object as a relevant method for dealing with errors in query estimates, fewer studies...
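As a rough illustration of why an MBR can approximate a line poorly, and why decomposing the line can help, consider the minimal Python sketch below. The helper names (`mbr`, `area`, `decompose`) and the diagonal test line are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(points: List[Point]) -> Box:
    """Minimum bounding rectangle of a sequence of vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(box: Box) -> float:
    """Area of an axis-aligned rectangle."""
    return (box[2] - box[0]) * (box[3] - box[1])


def decompose(points: List[Point], segments_per_piece: int) -> List[List[Point]]:
    """Split a polyline into consecutive pieces that share their end vertices."""
    pieces = []
    for start in range(0, len(points) - 1, segments_per_piece):
        pieces.append(points[start:start + segments_per_piece + 1])
    return pieces


# Hypothetical data: a diagonal polyline from (0, 0) to (10, 10) with 11 vertices.
line = [(float(i), float(i)) for i in range(11)]

whole = area(mbr(line))                                  # 100.0
split = sum(area(mbr(p)) for p in decompose(line, 1))    # 10.0
print(whole, split)
```

The line itself has zero area, yet its single MBR covers the whole 10 x 10 square; the ten per-segment MBRs together cover only a tenth of that, so fewer windows that miss the line would be counted as hits.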
Estimating spatial query selectivity with grid spatial histograms is one of the methods proposed in the literature. Defining the spatial histogram grid so as to reduce the use of computational resources and the execution time of the queries is therefore an important challenge. In this work, we examine the definition of the number of cells for complex object types such as line and polygon. The avglimit method we propose...
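For readers unfamiliar with grid histograms, the following minimal Python sketch shows one simple way such a histogram could estimate the cardinality of a window query. It is not the avglimit method: it assumes each object is counted once in the cell containing its MBR center and that objects are uniformly spread inside a cell. The function names and the sample data are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def build_grid_histogram(mbrs: List[Box], extent: Box, nx: int, ny: int) -> List[List[int]]:
    """Count objects per cell, assigning each object to the cell of its MBR center."""
    xmin, ymin, xmax, ymax = extent
    cw, ch = (xmax - xmin) / nx, (ymax - ymin) / ny
    counts = [[0] * ny for _ in range(nx)]
    for x0, y0, x1, y1 in mbrs:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        i = min(int((cx - xmin) / cw), nx - 1)  # clamp centers on the upper border
        j = min(int((cy - ymin) / ch), ny - 1)
        counts[i][j] += 1
    return counts


def estimate_window(counts: List[List[int]], extent: Box, window: Box) -> float:
    """Estimate how many objects a window hits, weighting each cell by its overlap fraction."""
    xmin, ymin, xmax, ymax = extent
    nx, ny = len(counts), len(counts[0])
    cw, ch = (xmax - xmin) / nx, (ymax - ymin) / ny
    total = 0.0
    for i in range(nx):
        for j in range(ny):
            cx0, cy0 = xmin + i * cw, ymin + j * ch
            ox = max(0.0, min(cx0 + cw, window[2]) - max(cx0, window[0]))
            oy = max(0.0, min(cy0 + ch, window[3]) - max(cy0, window[1]))
            total += counts[i][j] * (ox * oy) / (cw * ch)
    return total


# Hypothetical dataset: four small MBRs in a 4 x 4 extent, histogram of 2 x 2 cells.
extent = (0.0, 0.0, 4.0, 4.0)
objects = [(0.5, 0.5, 1.5, 1.5), (0.5, 0.5, 1.5, 1.5),
           (2.5, 2.5, 3.5, 3.5), (2.5, 0.5, 3.5, 1.5)]
hist = build_grid_histogram(objects, extent, 2, 2)
print(estimate_window(hist, extent, (1.0, 0.0, 3.0, 2.0)))  # 1.5
```

A finer grid reduces the error introduced by the uniformity assumption but costs more memory and estimation time, which is the trade-off behind choosing the number of cells.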
Spatial data processing has grown in size since the creation of information retrieval equipment, such as GPS (Global Positioning System), smartphones, drones, and satellites. With this spatial data, new information can be acquired. An example of spatial data processing is a spatial query, which finds correlated information in two or more datasets. Processing a spatial query can be quite complex because of the amount of data involved, and the computational systems that perform it have...
Multiway spatial join is a common and heavyweight type of query for spatial data processing on relational database management systems. This article presents a complete solution to process this type of query in distributed systems. We propose a cost-based optimizer for multiway spatial join queries, based on a novel use of multidimensional histograms, which are used to represent two metrics that describe a dataset: cardinality and size of the spatial objects, together with one feature...
The selectivity estimate is an important metric when selecting efficient execution plans on spatial databases. However, little effort has been dedicated to enhancing the methods and data structures that support the calculation of these estimates. In this paper, we propose an enhancement to the method used to build a multidimensional grid histogram. The proposed method reduced the estimation error by up to 30.16%, when estimating the cardinality of spatial window queries, compared to the grid...
The analysis of high-performance computing (HPC) applications has recently been facilitated by the use of cloud platforms. However, the performance of HPC applications relies heavily on the platform's I/O support, mainly the communication network between the VMs. In this study, the performance of the NPB-NAS Parallel Benchmark suite and the DGEO suite, which processes spatial queries, was compared in a physical cluster, a private cloud (XenServer), and a public cloud (Microsoft Azure). The experiment demonstrates a...
Storing and retrieving spatial data requires a special structure for multidimensional or complex data. Various indexing structures have been proposed in the literature, each with its specific characteristics and behaviors. The R-Tree is a hierarchical tree, similar to the B-Tree, which groups co-located objects using surrounding rectangles called MBRs (minimum bounding rectangles). The implementation of these structures internally has a...
Recently, subsidies granted by IaaS providers (Infrastructure as a Service) for research projects in universities have facilitated the performance analysis of high-performance computing (HPC) applications. However, the performance of HPC applications relies heavily on cloud platform I/O support, especially the virtual communication network between machines. In this paper, we evaluate the scalability of the NPB-NAS benchmark suite and applications for processing spatial data in a public cloud, Azure, and a private cloud using XenServer. As...
The performance analysis of High Performance Computing (HPC) applications has recently been facilitated by the use of cloud platforms. The grants offered by infrastructure providers to research projects in universities have also contributed to this migration. However, the performance of HPC applications depends heavily on the Input/Output support of the platform, mostly the intercommunication network between virtual machines. In this study, we compared the performance of selected applications of the NPB NAS Parallel Benchmark and...