A Framework for Geographical based Approximate String Search

International Journal of Computer Trends and Technology (IJCTT)
© 2014 by IJCTT Journal
Volume-16 Number-4
Year of Publication : 2014
Authors : G. Vidya Praveena , K. Krishna reddy , Janapati Venkata Krishna


G. Vidya Praveena , K. Krishna reddy , Janapati Venkata Krishna. "A Framework for Geographical based Approximate String Search". International Journal of Computer Trends and Technology (IJCTT) V16(4):161-165, Oct 2014. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
This work deals with the approximate string search in large spatial databases. We investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We make this query the spatial approximate Existing (Apr 19, 2013) string (SAS) query. In Euclidean space, we propose an approximate solution MHR-tree, which will embed min-wise signatures into R-tree. These min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the sub-tree of u. We here analyze the pruning functionality of such signatures based on the set resemblance between he query string and the q-grams from the sub-trees with index nodes. Here we also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which here we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information stored in the tree as discussed. For queries on road networks, we propose a novel method, the RSASSOL, which significantly outperforms the baseline algorithm in practice. RSASSOL combines the q-gram based inverted lists and the pruning based on reference nodes. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of the approach.

[1] S. Acharya, V. Poosala, and S. Ramaswamy. Selectivity estimation in spatial databases. In SIGMOD, pages 13–24, 1999.
[2] S. Alsubaiee, A. Behm, and C. Li. Supporting location-based approximate-keyword queries. In GIS, pages 61–70, 2010.
[3] A. Arasu, S. Chaudhuri, K. Ganjam, and R. Kaushik. Incorporating string transformations in record matching. In SIGMOD, pages 1231– 1234, 2008.
[4] A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918–929, 2006.
[5] N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger. The R_-tree: an efficient and robust access method for points and rectangles. In SIGMOD, pages 322–331, 1990.
[6] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Minwiseindependent permutations (extended abstract). In STOC, pages 327–336, 1998.
[7] X. Cao, G. Cong, and C. S. Jensen. Retrieving top-k prestige-basedrelevant spatial web objects. Proc. VLDB Endow., 3:373–384, 2010.
[8] K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin. An efficient filterfor approximate membership checking. In SIGMOD, pages 805–818, 2008.