Department of Informatics

Robot Reading

Master's thesis in the TAMS group


Handwritten digit recognition is one of the standard benchmarks for computer vision algorithms, and deep networks have improved object detection and classification in many real-world scenarios. Nevertheless, there is currently no software framework that allows a robot to detect and read text. In this thesis, we aim to develop and prototype an architecture that allows a robot to detect and read sufficiently large text on flat objects, e.g. the titles of books, the headlines of a newspaper, or the name on a medicine box.


The first part of the proposed thesis is a literature review of existing approaches (if any). Many standard benchmarks such as MNIST work on carefully centered, aligned, and scaled input images. Common OCR programs also expect high-resolution input images that are at least correctly aligned.

These conditions are not met for a robot that looks down on a table cluttered with objects, so object segmentation and inversion of the perspective transformation might be a first step in the final computer vision algorithm.
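
To illustrate what inverting the perspective transformation could look like for a flat object, the following is a minimal sketch, not a proposed implementation. It assumes that the four corners of the object (e.g. a book cover) have already been found in the camera image, and it estimates the 3x3 homography that maps them onto an upright rectangle. The function names and the corner coordinates are hypothetical and chosen for illustration only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src, dst):
    """Return a 3x3 homography (nested lists) mapping each src point to dst.

    Uses exactly four point pairs and fixes the bottom-right entry to 1,
    which leaves eight unknowns and eight linear equations.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a 2D point through the homography h (with perspective division)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical corners of a book cover in the camera image (pixels),
# listed clockwise starting at the top-left corner of the cover.
image_corners = [(120, 80), (420, 110), (400, 330), (90, 280)]
# Target: an upright 300 x 200 pixel rectangle.
upright_corners = [(0, 0), (300, 0), (300, 200), (0, 200)]

H = homography_from_corners(image_corners, upright_corners)
```

With `H` available, `apply_homography(H, p)` maps any image point `p` on the object into the upright view. In practice, the corners would come from a segmentation step and would be noisy, so a robust estimate from more than four correspondences is likely preferable.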

Alternatively, it would be possible to manually annotate and label a number of input images and then train one of the popular object-detection networks to look for large text, such as headlines or titles, on objects.

Once text candidates have been found and the perspective transformation is (roughly) known, it might be possible to just run existing OCR methods on the corresponding cropped region of the input image to read and output the text.
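
The step described above can be sketched as follows, under several assumptions: the image is a plain grayscale array (a list of rows), the homography `h_inv` that maps pixels of the upright crop back into the camera image is already known, and `recognize` is a placeholder for whichever existing OCR method is eventually chosen. All names here are hypothetical; the sketch uses nearest-neighbour sampling for brevity.

```python
def rectify_region(image, h_inv, width, height):
    """Sample an upright width x height crop from a grayscale image.

    h_inv is a 3x3 homography (nested lists) that maps a crop pixel (u, v)
    to the corresponding image position (x, y). Pixels that fall outside
    the image are filled with 0.
    """
    rows, cols = len(image), len(image[0])
    crop = []
    for v in range(height):
        line = []
        for u in range(width):
            w = h_inv[2][0] * u + h_inv[2][1] * v + h_inv[2][2]
            x = (h_inv[0][0] * u + h_inv[0][1] * v + h_inv[0][2]) / w
            y = (h_inv[1][0] * u + h_inv[1][1] * v + h_inv[1][2]) / w
            xi, yi = int(round(x)), int(round(y))  # nearest neighbour
            inside = 0 <= xi < cols and 0 <= yi < rows
            line.append(image[yi][xi] if inside else 0)
        crop.append(line)
    return crop


def read_text(image, h_inv, size, recognize):
    """Rectify a text candidate and hand the crop to an OCR callable.

    `recognize` stands in for any existing OCR method; it receives the
    rectified crop and returns the recognized string.
    """
    width, height = size
    crop = rectify_region(image, h_inv, width, height)
    return recognize(crop)
```

Passing the OCR method as a callable keeps the rectification independent of any particular OCR program, so different engines can be compared on the same crops.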

Thesis Goals: