• Admin
    UC Research Repository
    View Item 
       
    • UC Home
    • Library
    • UC Research Repository
    • College of Engineering
    • Engineering: Theses and Dissertations
    • View Item
       
    • UC Home
    • Library
    • UC Research Repository
    • College of Engineering
    • Engineering: Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Browse

    All of the RepositoryCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    Statistics

    View Usage Statistics

    A Comparison of BWT Approaches to Compressed-Domain Pattern Matching

    Thumbnail
    View/Open
    hons_0205.pdf (630.2Kb)
    Author
    Firth, Andrew
    Date
    2002
    Permanent Link
    http://hdl.handle.net/10092/14789
    Degree Grantor
    University of Canterbury
    Degree Level
    Doctoral
    Degree Name
    Other

    A number of algorithms have recently been developed to search files compressed with the Burrows-Wheeler Transform (BWT) without the need for full decompression first. This allows the storage requirement of data to be reduced through the exceptionally good compression offered by BWT, while still allowing fast access to the information forsearching. We provide a detailed description of five of these algorithms: CompressedDomain Boyer-Moore (Bell et al. 2002), Binary Search (Bell et al. 2002), Suffix Arrays (Sadakane & Imai 1999), q-grams (Adjeroh et al. 2002) and the FM-index (Ferragina & Manzini 2001), and also present results from a set of extensive experiments that were performed to evaluate and compare the algorithms. Furthermore, we introduce a technique to improve the search times of Binary Search, Suffix Arrays and q-grams by around 20%, as well as reduce the memory requirement of the latter two by 40% and 31%, respectively. Our results indicate that, while the compressed files of the FM-index are larger than those of the other approaches, it is able to perform searches with considerably less memory. Additionally, when only counting the occurrences of a pattern, or when locating the positions of a small number of matches, it is the fastest algorithm. For larger searches, q-grams provides the fastest results.

    Collections
    • Engineering: Theses and Dissertations [2163]
    Rights
    https://canterbury.libguides.com/rights/theses

    UC Research Repository
    University Library
    University of Canterbury
    Private Bag 4800
    Christchurch 8140

    Phone
    364 2987 ext 8718

    Email
    ucresearchrepository@canterbury.ac.nz

    Follow us
    FacebookTwitterYoutube

    © University of Canterbury Library
    Send Feedback | Contact Us