Verification based annotation for visual recognition.
Applying deep learning and Convolutional Neural Networks (CNNs) to new domains usually implies a data collection and annotation problem. While several large datasets exist and provide a great deal of utility, there is a need to apply deep learning to new domains more easily, and to experiment without first spending a large amount of time annotating data.
The major work in this thesis has been in evaluating and characterising proposed methods centred on the collaboration between a human and a machine learning model, which I term Verification Based Annotation (VBA), intended for human-efficient annotation as well as rapid prototyping. A proposed difference from similar works is the use of online training, as opposed to either (a) strong models trained on large datasets or (b) staged systems with alternating periods of annotation and training.
Contrary to the popular belief that CNNs require large amounts of data and long training times, I demonstrate that, with few images and very little training time, a CNN can be trained to a level that provides genuine assistance to a human annotator. I propose methods for high-resolution object detection that can improve both accuracy and the speed of learning, and I study how noise and systematic bias degrade performance.
I demonstrate the effectiveness of VBA methods by annotating a variety of real-world image sets. I find it is especially effective on image sets whose object instances are uniform, reducing the required annotation outright by 75–93% on many datasets, and by a further 10% in several cases using novel methods for utilising weakly confident detections.
One successful demonstration of VBA is verified counting on images of Adélie penguins and Weddell seals, where it has the promise of revolutionising the field. Verified counting takes a fraction of the effort of widely used methods such as crowdsourcing, and improves consistency. Verification based methods offer immediate visual feedback and improved engagement, as the chore of annotating many images becomes the exciting task of teaching a machine to recognise the objects.
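As an illustration only, the verify-and-correct loop with online training described above can be sketched as a toy simulation. Everything in it is an assumption made for the sketch rather than the method evaluated in the thesis: the function name `simulate_vba`, the reduction of the detector to a single `recall` value, the update rule, and all parameter values are hypothetical, and false positives are ignored. It only shows how the human's effort can be counted as corrections and compared with annotating every object from scratch.

```python
import random


def simulate_vba(num_images=50, objects_per_image=20, seed=0):
    """Toy simulation of a verify-and-correct annotation loop.

    Hypothetical model: the detector is reduced to one recall value that
    improves a little after every verified image (online training).
    Returns (corrections, total_objects).
    """
    rng = random.Random(seed)
    recall = 0.0          # the untrained model proposes nothing
    learning_rate = 0.2   # assumed speed of online improvement
    max_recall = 0.95     # assumed ceiling for the toy detector

    corrections = 0       # annotations the human must add by hand
    total_objects = 0     # annotations needed without any assistance

    for _ in range(num_images):
        # The model proposes each object with probability `recall`.
        proposed = sum(rng.random() < recall for _ in range(objects_per_image))
        # The human confirms the proposals and adds only the missed objects.
        corrections += objects_per_image - proposed
        total_objects += objects_per_image
        # Online update: train on the image as soon as it has been verified.
        recall += learning_rate * (max_recall - recall)

    return corrections, total_objects


if __name__ == "__main__":
    corrections, total = simulate_vba()
    reduction = 1.0 - corrections / total
    print(f"corrections: {corrections} of {total} objects "
          f"(toy reduction: {reduction:.0%})")
```

The printed reduction is a property of the assumed toy parameters and should not be read as a result; the measured reductions are those reported above for the real image sets.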