A Comparison of BWT Approaches to Compressed-Domain Pattern Matching

dc.contributor.authorFirth, Andrew Philip
dc.date.accessioned2015-10-16T01:52:35Z
dc.date.available2015-10-16T01:52:35Z
dc.date.issued2002en
dc.description.abstractA number of algorithms have recently been developed to search files compressed with the Burrows-Wheeler Transform (BWT) without the need for full decompression first. This allows the storage requirement of data to be reduced through the exceptionally good compression offered by BWT, while still allowing fast access to the information for searching. We provide a detailed description of five of these algorithms: Compressed-Domain Boyer-Moore (Bell et al. 2002), Binary Search (Bell et al. 2002), Suffix Arrays (Sadakane & Imai 1999), q-grams (Adjeroh et al. 2002) and the FM-index (Ferragina & Manzini 2001), and also present results from a set of extensive experiments that were performed to evaluate and compare the algorithms. Furthermore, we introduce a technique to improve the search times of Binary Search, Suffix Arrays and q-grams by around 20%, as well as reduce the memory requirement of the latter two by 40% and 31%, respectively. Our results indicate that, while the compressed files of the FM-index are larger than those of the other approaches, it is able to perform searches with considerably less memory. Additionally, when only counting the occurrences of a pattern, or when locating the positions of a small number of matches, it is the fastest algorithm. For larger searches, q-grams provides the fastest results.en
dc.identifier.urihttp://hdl.handle.net/10092/11200
dc.language.isoen
dc.publisherUniversity of Canterbury. Mathematics and Statisticsen
dc.relation.isreferencedbyNZCUen
dc.rightsCopyright Andrew Philip Firthen
dc.rights.urihttps://canterbury.libguides.com/rights/thesesen
dc.subject.anzsrcField of Research::01 - Mathematical Sciencesen
dc.titleA Comparison of BWT Approaches to Compressed-Domain Pattern Matchingen
dc.typeDiscussion / Working Papers
uc.collegeFaculty of Engineeringen
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
firth_report.pdf
Size:
1.71 MB
Format:
Adobe Portable Document Format