Sequential and Parallel Algorithms for the Generalized Maximum Subarray Problem
Thesis DisciplineComputer Science
Degree GrantorUniversity of Canterbury
Degree NameDoctor of Philosophy
The maximum subarray problem (MSP) involves selection of a segment of consecutive array elements that has the largest possible sum over all other segments in a given array. The efficient algorithms for the MSP and related problems are expected to contribute to various applications in genomic sequence analysis, data mining or in computer vision etc. The MSP is a conceptually simple problem, and several linear time optimal algorithms for 1D version of the problem are already known. For 2D version, the currently known upper bounds are cubic or near-cubic time. For the wider applications, it would be interesting if multiple maximum subarrays are computed instead of just one, which motivates the work in the first half of the thesis. The generalized problem of K-maximum subarray involves finding K segments of the largest sum in sorted order. Two subcategories of the problem can be defined, which are K-overlapping maximum subarray problem (K-OMSP), and K-disjoint maximum subarray problem (K-DMSP). Studies on the K-OMSP have not been undertaken previously, hence the thesis explores various techniques to speed up the computation, and several new algorithms. The first algorithm for the 1D problem is of O(Kn) time, and increasingly efficient algorithms of O(K² + n logK) time, O((n+K) logK) time and O(n+K logmin(K, n)) time are presented. Considerations on extending these results to higher dimensions are made, which contributes to establishing O(n³) time for 2D version of the problem where K is bounded by a certain range. Ruzzo and Tompa studied the problem of all maximal scoring subsequences, whose definition is almost identical to that of the K-DMSP with a few subtle differences. Despite slight differences, their linear time algorithm is readily capable of computing the 1D K-DMSP, but it is not easily extended to higher dimensions. This observation motivates a new algorithm based on the tournament data structure, which is of O(n+K logmin(K, n)) worst-case time. The extended version of the new algorithm is capable of processing a 2D problem in O(n³ + min(K, n) · n² logmin(K, n)) time, that is O(n³) for K ≤ n/log n For the 2D MSP, the cubic time sequential computation is still expensive for practical purposes considering potential applications in computer vision and data mining. The second half of the thesis investigates a speed-up option through parallel computation. Previous parallel algorithms for the 2D MSP have huge demand for hardware resources, or their target parallel computation models are in the realm of pure theoretics. A nice compromise between speed and cost can be realized through utilizing a mesh topology. Two mesh algorithms for the 2D MSP with O(n) running time that require a network of size O(n²) are designed and analyzed, and various techniques are considered to maximize the practicality to their full potential.