An exploration of dynamic Markov compression
Thesis DisciplineComputer Science
Degree GrantorUniversity of Canterbury
Degree NameMaster of Science
Data Compression is the process of removing redundancy from data. Dynamic Markov Compression (DMC), developed by Cormack and Horspool, is a method for performing statistical data compression of a binary source. DMC generates a finite context state model by adaptively generating a Finite State Machine (FSM) that captures symbol frequencies within the source message. Traditionally DMC has operated on a binary alphabet creating a FSM of binary transitions between states. The compression of DMC is comparable with PPM, which is one of the best compression methods in use today. However because DMC is a bit-wise modeller, it is considerably slower than PPM and requires large amounts of memory to achieve this compression. This thesis extends the bit-wise modeller DMC to a finite character alphabet, which promises to improve compression speed. Considerable attention is given to efficient data structures capable of representing individual states of the FSM. While the performance of Character DMC fails to match bit-wise DMC or PPM, a significant improvement in compression speed can be achieved. Evolving Character DMC to capture variable length phrases was the next approach taken. Several inherent problems involving locating words within each state have surfaced. Again, the comparable compression with PPM was disappointing, and the compression speed was less than half that of Character DMC. A final implementation attempted to avoid the large resource requirements of DMC It achieves this by restricting both the length of the source message, and the amount of memory available for the FSM. Reasonable compression is achieved, at very high speeds. Such an implementation is ideal for network packets, or hard disk block compression.