Robot Reading
Master's Thesis at the TAMS Group
Motivation
While handwritten digit recognition is one of the standard benchmarks
for computer vision algorithms
and deep networks have improved object detection and classification
in many real-world scenarios,
there is currently no software framework that allows a robot
to detect and read text.
In this thesis, we will try to develop and prototype an architecture
that allows a robot to detect and read (large-enough) text on flat objects,
e.g. the titles of books, the headlines of a newspaper,
or the name on a medicine box.
State of the Art
The first part of the proposed thesis is a literature search
for existing approaches (if any).
Many standard benchmarks like MNIST work on carefully centered,
aligned and scaled input images.
Common OCR programs also expect high-resolution input images
that are at least correctly aligned.
These conditions are not met for a robot that looks down
on a cluttered table full of objects, so object segmentation
and inversion of the perspective transformation might be a
first step in the final computer vision algorithm.
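To illustrate what inverting the perspective transformation could look like, here is a minimal sketch that estimates a homography from four corner correspondences with the direct linear transform, using only NumPy. The function names and the corner coordinates are hypothetical and chosen for illustration; how the corners of a flat object would actually be found (e.g. by segmentation) is left open, as in the proposal.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src (direct linear transform).

    src and dst are sequences of at least four (x, y) point pairs.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H (flattened) is the null vector of the stacked constraint matrix,
    # i.e. the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, points):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical pixel coordinates of a book cover seen at an angle
# (clockwise from top-left), and the upright rectangle it should map to.
book_corners = [(120, 80), (420, 110), (400, 330), (90, 280)]
target = [(0, 0), (300, 0), (300, 200), (0, 200)]

H = homography_from_points(book_corners, target)
rectified = apply_homography(H, book_corners)
```

With `H` known, the image region could be resampled into the upright rectangle before it is passed to an OCR stage. Libraries such as OpenCV provide equivalent functionality, so this sketch is only meant to show the underlying computation.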
Alternatively, it would be possible to manually annotate
a number of input images and then train one of the
popular object-detection networks to look for (large) text
such as headlines or titles on objects.
Once text candidates have been found and the perspective
transformation is (roughly) known, it might be possible
to just run existing OCR methods on the corresponding
cropped region of the input image to read and output the text.
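The crop-then-read step described above can be sketched as follows. This is only an outline under stated assumptions: the text candidates are taken to be axis-aligned boxes in an already rectified image, and the `ocr` argument is a hypothetical callable standing in for whatever existing OCR method is eventually chosen. The placeholder used here does not recognise any text; it only reports the crop size so that the control flow can be run.

```python
import numpy as np


def crop_region(image, box):
    """Cut an axis-aligned box (x_min, y_min, x_max, y_max) out of an image,
    clipping the box to the image bounds."""
    height, width = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return image[y0:y1, x0:x1]


def read_text_regions(image, boxes, ocr):
    """Run the given OCR callable on every non-empty candidate crop.

    Returns a list of (box, text) pairs; boxes that fall outside the
    image are skipped.
    """
    results = []
    for box in boxes:
        crop = crop_region(image, box)
        if crop.size == 0:
            continue
        results.append((box, ocr(crop)))
    return results


def placeholder_ocr(crop):
    """Hypothetical stand-in for a real OCR engine (assumption, not a real API)."""
    return f"<{crop.shape[1]}x{crop.shape[0]} crop>"


# Blank grey-scale test image and three made-up text candidates;
# the last one lies entirely outside the image and is dropped.
image = np.zeros((480, 640), dtype=np.uint8)
candidates = [(100, 50, 300, 90), (600, 400, 700, 500), (650, 10, 660, 20)]
texts = read_text_regions(image, candidates, placeholder_ocr)
```

Keeping the OCR engine behind a plain callable would make it easy to compare several existing OCR methods on the same detected regions.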
Thesis Goals
- Development of a computer vision module that detects
and reads text on objects.
- If possible, integration into a ROS software package.
- A couple of real-world tests on a service robot.
- Architecture concept and implementation are fully open,
but a combination of deep learning and existing OCR algorithms seems likely.
- Collection of ideas and experimental data leading to a publication.
Requirements
- as always, interest in the topic area
- knowledge of computer vision (e.g. CV lectures)
- knowledge of deep-learning object detection/classification
- basic knowledge of the ROS framework
Contact
- Norman Hendrich, Room F-314, Tel.: 42883-2399
- this thesis could be done in cooperation with the Computer Vision group