Compressed-domain pattern matching with the Burrows-Wheeler transform
Type of content
UC permalink
Publisher's DOI/URI
Thesis discipline
Degree name
Publisher
Journal Title
Journal ISSN
Volume Title
Language
Date
Authors
Abstract
This report investigates two approaches for online pattern-matching in files compressed with the Burrows-Wheeler transform (Burrows & Wheeler 1994). The first is based on the Boyer-Moore pattern matching algorithm (Boyer & Moore 1977), and the second is based on binary search. The new methods use the special structure of the Burrows-Wheeler transform to achieve efficient, robust pattern matching algorithms that can be used on files that have been only partly decompressed. Experimental results show that both new methods perform considerably faster than a decompress-and-search approach for most applications, with binary search being faster than Boyer-Moore at the expense of increased memory usage. The binary search in particular is strongly related to efficient indexing strategies such as binary trees, and suggests a number of new applications of the Burrows-Wheeler transform in data storage and retrieval.
Description
Citation
Keywords
Ngā upoko tukutuku/Māori subject headings
ANZSRC fields of research
Field of Research::08 - Information and Computing Sciences::0806 - Information Systems::080604 - Database Management