View based mapping for wheeled robots
During my PhD I performed research on robust and efficient vision based mapping methods for mobile robots. The aim was to enable a mobile robot driving around in a home environment to build a map of that environment. This map consists of images, taken using camera mounted on the robot, and a set of connections linking these images. The emphasis of the research was to build a qualitative/semantic map as opposed to quantitative/geometric map.
Ben Kröse supervised my PhD together with Zoran Zivkovic. The research was partly carried out within the EU Integrated Project COGNIRON (The Cognitive Robot Companion).
Download thesis: pdf (5.4 mb).
Download front cover: pdf (2.4 mb).
Download the slides I used during my "leken-praatje": pdf (7.0 mb).
Long abstract:
A common task for a mobile robot, is to build a map of its environment. It needs such a map to localize itself in the environment, and to navigate to a specific place. In this thesis, we propose methods to build such a map on the basis of images. These images are taken with a camera that is mounted onto the robot. The methods are specifically developed for robots operating in real home environments, such as autonomous vacuum cleaners. Therefore, all methods are evaluated on image sets which were acquired in real home environments.
Building a map is particularly difficult in human inhabited home environments. We can identify two main challenges. First, the captured images are often of poor quality. As opposed to brightly lit office spaces and outdoor environments, home environments usually have bad lighting condition. On top of this, humans can block the view of the robot and change the appearance of the environment, for example by switching off lights. Such circumstances call for a robust mapping approach. Second, the robot should be able to communicate with humans about the map and add labels, such as `kitchen'' or `chair'', to the map. To allow for such communication, during map building, the method should be efficient enough to perform in real-time.
Different types of maps have been proposed to solve this robot task. In this thesis, we focus on a specific type of map termed the `view-based topological map''. In Chapter 2, we see that this is a commonly used map in the field of vision-based robotics. It can be seen as a graph for which each node denotes an image and each link between two images denotes that they depict an overlapping part of the environment. The basic task of map-building and localization is finding for a new image all other images that could link with it. This later task is called `view-based data association'' and depends on the ability to compare images with each other.
Two images can be compared by automatically extracting a set of salient image points from both of them and matching these sets to obtain a number of image point correspondences. If two images depict an overlapping part of the environment, then the spatial layout of the correspondences depends greatly on the relative robot pose. This relative pose can be parameterized by a rotation and a translation up to scale. To make image comparison robust and efficient, it is beneficial to incorporate constraints on the possible robot poses. We thoroughly investigate how the assumption that the robot moves over a planar surface can be used to improve existing algorithms. We derive an algorithm that computes the planar relative pose given only two correspondences and combine it with the well-known RANSAC algorithm (RANdom SAmple Consensus). On top of that, we propose an efficient method based on the Hough Transform that determines a probability density over the space of all planar relative poses. On the one hand, this can be used to estimate the relative pose given two images. In Chapter 3, we show that our relative pose solution is more accurate and efficient than the popular RANSAC-based method. On the other hand, it can be used as an image similarity measure, by estimating the probability that robot pose is indeed correct. In Chapter 4, we show that this image similarity measure is better than existing measures used for both topological view-based mapping and the related task of image-based place recognition.
Even if an efficient image comparison method is used, for very large view-based maps with thousands of images, it becomes infeasible to compare with all images. In Chapter 5, we propose a method that finds a set of key images that best represents the complete set of images of the map. This set of key images defines a subgraph of the complete graph representing the map. It is based on a graph theoretic technique called the `Connected Dominating Set'' (CDS). We explain why this algorithm is the optimal choice and how it can be used for view-based mapping and localization procedures. The results of these procedures are close to the ones obtained when comparing with all images, while being an order of magnitude faster.
Combining the image comparison technique with the CDS method, results in scalable real-time data association. We show, in two appendices, that this can be used to perform real-time interactive map-building and goal-directed navigation. In addition, the proposed data association method could be used to improve existing SLAM systems (Simultaneous Localization And Mapping). This would allow for efficient and robust 3D geometric map-building of real home environments.
Errata:
On page 47, I made some errors mixing up singular value decomposition with
eigendecomposition. The statement:
"The essential matrix has rank 2 and its first two eigenvalues are
equal",
is incorrect. It should be
"The essential matrix has rank 2 and its first two singular values are
equal".
(Credits go to Leo Dorst, for finding this (stupid) mistake.)
In addition, on page 47, I wrote:
"applying a singular value decomposition (SVD) to $A^T A$ and taking the
eigenvector corresponding to the smallest singular value."
Eigenvector should be replaced by left singular vector. However, it is
indeed also possible to apply an eigendecomposition to $A^T A$ and take the
eigenvector corresponding to the smallest eigenvalue. One could also
apply a singular value decomposition (SVD) to $A$ and take the
right singular vector corresponding to the smallest singular value. However,
for large n, A is bigger than $A^T A$, making svd($A^T A$) more efficient
than svd(A).