Compressed-domain pattern matching with the Burrows-Wheeler transform

Type of content
Reports
Publisher's DOI/URI
Thesis discipline
Degree name
Publisher
University of Canterbury. Computer Science and Software Engineering
Journal Title
Journal ISSN
Volume Title
Language
Date
2001
Authors
Powell, Matt
Abstract

This report investigates two approaches for online pattern-matching in files compressed with the Burrows-Wheeler transform (Burrows & Wheeler 1994). The first is based on the Boyer-Moore pattern matching algorithm (Boyer & Moore 1977), and the second is based on binary search. The new methods use the special structure of the Burrows-Wheeler transform to achieve efficient, robust pattern matching algorithms that can be used on files that have been only partly decompressed. Experimental results show that both new methods perform considerably faster than a decompress-and-search approach for most applications, with binary search being faster than Boyer-Moore at the expense of increased memory usage. The binary search in particular is strongly related to efficient indexing strategies such as binary trees, and suggests a number of new applications of the Burrows-Wheeler transform in data storage and retrieval.

Description
Citation
Keywords
Ngā upoko tukutuku/Māori subject headings
ANZSRC fields of research
Field of Research::08 - Information and Computing Sciences::0801 - Artificial Intelligence and Image Processing::080109 - Pattern Recognition and Data Mining
Field of Research::08 - Information and Computing Sciences::0806 - Information Systems::080604 - Database Management
Rights
Copyright Matt Powell