How far away is it?: depth estimation by a moving camera.
Thesis Discipline: Electrical Engineering
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
This thesis considers the challenge of autonomous robot navigation. Effective self-guiding robots are applicable to many important and critical tasks, such as fire-fighting, transporting dangerous materials, and even bomb disposal. In many cases the robots are even more useful if their method of guidance is passive and utilises common technology such as CCD cameras. Using biological models to inspire the design of such robots is an exhilarating approach to the problem and provides sensible and novel solutions.

The method of determining distance to objects from the optical flow in sequences of camera images is well known, and many techniques for estimating optical flow have been proposed. This thesis explores those differential optical flow techniques which solve the aperture problem by using a window of pixels and a model of the structure of the optical flow within that window. It shows that a number of these methods can be incorporated into a general framework that represents the flow as a sum of basis functions over the window. A more or less complicated structure for the optical flow can be modelled by selecting a larger or smaller number of these basis functions. Certain choices of basis function correspond to published models, such as those of Lucas and Kanade (1981), Campani and Verri (1992), Schalkoff and McVey (1982), Nagle and Srinivasan (1996), and Waxman and Wohn (1985). A number of these models were compared over different image sequences, both real and synthetic, and the errors in each case were quantified. This comparison shows that the best choice of model is dictated both by the size of the pixel window and by the surface being viewed. A set of basis functions will bias the optical flow estimates if the surface structure is more complex than the model can fit, causing errors in the location of the focus of expansion.

A new method, only recently proposed for robot navigation, is known as volumetric stereo or voxel colouring.
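To illustrate the simplest case of this basis-function framework, the flow can be modelled as constant over the window (a single basis function that is identically 1, as in Lucas and Kanade), and the flow vector recovered by least squares from the image gradients. The following is a minimal sketch, assuming the spatial and temporal gradient arrays for the window have already been computed; the function name and interface are illustrative, not the thesis's own implementation:

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Estimate a single (u, v) flow vector for one pixel window.

    Ix, Iy, It are flattened arrays of the spatial and temporal image
    gradients over the window.  Modelling the flow as constant across
    the window (one basis function) turns the brightness-constancy
    equation  Ix*u + Iy*v + It = 0  at each pixel into an
    overdetermined linear system, resolving the aperture problem.
    """
    A = np.stack([Ix, Iy], axis=1)   # N x 2 design matrix
    b = -It                          # right-hand side
    # Least-squares solution; reliable when A^T A is well conditioned,
    # i.e. when the window contains gradients in more than one direction.
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv
```

Richer models in the framework simply add columns to the design matrix, one per extra basis function (for example, terms linear in the window coordinates for an affine flow model).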
Most of the work performed in this area uses the method for computer graphics purposes, to produce photo-realistic scenes or images. It can also be used to produce accurate and detailed depth maps of a scene. Rather than using multiple pixels from a single camera, as optical flow does, it relies upon multiple camera observations of a single point. The camera observations of points in space are compared, and those on which the cameras agree are deemed to be surface points. The concepts behind this approach are explained, including a number of ways this method can reconstruct partially occluded objects. The emphasis then shifts to specific implementations for robot navigation. These include assumptions about camera motion and methods to speed up the calculation procedure. Results for real and synthetic sequences are shown, and a comparison is performed with optical flow, showing that the volumetric technique is greatly superior in a number of important respects, not the least of which is accuracy. Finally, some important extensions to the algorithm are discussed. These extensions make it robust to three problems often ignored in computer vision: inaccurate calibration, variable lighting, and specular surfaces. The first of these is overcome by showing that the algorithm is capable of self-calibration, allowing it to substantially improve depth estimates in the case of inaccurate camera positions or rotations. By using a lighting-invariant colour model, the algorithm can successfully reconstruct depth even when the sequence lighting is altered. Finally, the algorithm successfully reconstructs specularities in images at the same time as reconstructing the Lambertian regions. This is done by observing the pattern of intensity variation in the camera observations. Results for these situations are shown for real image sequences, and the improvements are demonstrated quantitatively.
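The core test in voxel colouring is photo-consistency: a voxel is projected into every camera that sees it, and it is accepted as a surface point only if those cameras agree on its colour. A minimal sketch of that check follows; the data structures (a list of 3x4 projection matrices and corresponding images) and the fixed agreement threshold are assumptions for illustration, not the specific formulation used in the thesis:

```python
import numpy as np

def photo_consistent(voxel_xyz, cameras, images, threshold=10.0):
    """Return True if the cameras that see this voxel agree on its colour.

    voxel_xyz : 3-vector, world coordinates of the voxel centre.
    cameras   : list of 3x4 projection matrices (one per view).
    images    : list of HxWx3 arrays, same order as `cameras`.
    """
    p = np.append(voxel_xyz, 1.0)            # homogeneous coordinates
    samples = []
    for P, img in zip(cameras, images):
        x, y, w = P @ p                       # project into the image
        if w <= 0:                            # voxel is behind this camera
            continue
        col, row = int(round(x / w)), int(round(y / w))
        h, width = img.shape[:2]
        if 0 <= row < h and 0 <= col < width:
            samples.append(img[row, col].astype(float))
    if len(samples) < 2:
        return False                          # too few observations to decide
    # Consistent if the per-channel spread of the observed colours is small.
    return bool(np.std(np.stack(samples), axis=0).max() < threshold)
```

Sweeping the voxels in depth order and carving away every voxel that fails this test leaves the photo-consistent surface, from which a dense depth map can be read off directly.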