Indexing Genome with the External Construction of Compressed Suffix Tree Using LCP Array
DOI:
https://doi.org/10.51983/ajeat-2013.2.1.652Keywords:
Genome Indexing, Compressed Suffix Tree, Data Structure, DNA IndexingAbstract
We are proposing the genome indexing algorithm, which depends upon compressed form of suffix trees, in which every node has four parts; suffix array number, suffix start number, LCP count, and a pointer to another node. The proposed algorithm does not use the whole suffix array, it just takes some necessary information like LCP of two suffix array, compare them and suffix start number, to align the suffix to proper position and suffix array number to distinguish among all the partitions. The use of compressed suffix array minimizes the number of trees, eventually; it also minimizes the random access to input data, as it creates the compressed suffix tree for two suffix arrays using pairwise sorting, sequentially.
References
Benjarath Phoophakdee and Mohammed J. Zaki, “Genome-scale disk-based suffix tree indexing”. SIGMOD ‘07: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM. pp. 833–844, 2007.
Marina Barsky, Ulrike Stege, Alex Thomo, and Chris Upton . “A new method for indexing genomes using on-disk suffix trees”. CIKM ‘08: Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM. pp. 649–658, 2008.
http://en.wikipedia.org/wiki/Suffix_tree
Donald R. Morrison, “PATRICIA – Practical Algorithm To Retrieve Information Coded in Alphanumeric”, Journal of the ACM, Vol. 15, NO 4, pp. 514-534, 1968.
Niko Välimäki, Wolfgang Gerlach, Kashyap Dixit and Veli Mäkinen, “Compressed Suffix Tree - A Basis for Genome-scale Sequence Analysis”. Bioinformatics, 23(5), Application note, pp 629-630, 2007
Simon Gog, Enno Ohlebusch, “Fast and Lightweight LCP-Array Construction Algorithms”, ALENEX, 2011.
N.J. Larsson and K. Sadakane, “Faster suffix sorting”, Tech. Rep. LUCS-TR: 99-214 of the Dept. of Comp. Sc., Lund University, Sweden, 1999.
http://en.wikipedia.org/wiki/Lexicographical_order
Md. Jahangir Alam, Muhammad Monsur Uddin, Mohammad Shabbir Hasan, Abdullah Al Mahmood, “ Pair Wise Sorting: A New Way of Sorting”, International Journal of Computer Science and Information Security, Volume 8:9, pp. 116-120,2010.
Simonas Salteni, “External memory sorting”, Deptt. Of Computer Science, Aalborg University, Denmark, LNCS, pp 1-7, 2001. 11
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2013 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.