Generic motion based object segmentation for assisted navigation

CASBliP – Computer Aided System for the Blind

The CASBliP project proposes a robust approach to annotating independently moving objects captured by head-mounted stereo cameras worn by an ambulatory, visually impaired user. Initially, sparse optical flow is extracted from a single image stream, in tandem with dense depth maps. Then, using the assumption that apparent movement generated by camera egomotion is dominant, flow corresponding to independently moving objects (IMOs) is robustly segmented using MLESAC. Next, the mode depth of the feature points defining this flow (the foreground) is obtained by aligning them with the depth maps. Finally, a bounding box is scaled proportionally to this mode depth and robustly fitted to the foreground points such that the number of inliers is maximised. The system runs at around 8 fps and has been tested by visually impaired volunteers.
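
As a rough illustration of the segmentation and mode-depth steps described above, the following Python sketch may help. It is not the CASBliP implementation. It assumes that the egomotion-induced flow can be approximated by an affine model, it uses a fixed mixing weight where MLESAC proper estimates one, and all function names and parameter values (sigma, gamma, nu, bin_width) are hypothetical. The bounding-box fitting step is not covered.

```python
import numpy as np


def _design(pts):
    # Rows of [x, y, 1] so that flow is modelled as [x, y, 1] @ A (A is 3x2).
    return np.hstack([pts, np.ones((len(pts), 1))])


def fit_affine_flow(pts, flow):
    """Least-squares affine flow model (an assumed stand-in for egomotion)."""
    A, *_ = np.linalg.lstsq(_design(pts), flow, rcond=None)
    return A


def segment_dominant_motion(pts, flow, sigma=1.0, gamma=0.5, nu=50.0,
                            iters=200, seed=0):
    """Return (A, foreground_mask) using an MLESAC-style score.

    Each hypothesis is scored by the negative log-likelihood of a mixture:
    Gaussian residuals for the dominant motion, uniform for everything else.
    gamma (inlier weight) is fixed here; MLESAC proper estimates it.
    """
    rng = np.random.default_rng(seed)
    X = _design(pts)
    p_out = (1.0 - gamma) / nu          # uniform outlier density (assumed)
    best_cost, best_A = np.inf, None
    for _ in range(iters):
        sample = rng.choice(len(pts), size=3, replace=False)
        A = fit_affine_flow(pts[sample], flow[sample])
        r = np.linalg.norm(X @ A - flow, axis=1)
        p_in = gamma * np.exp(-r**2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
        cost = -np.sum(np.log(p_in + p_out))
        if cost < best_cost:
            best_cost, best_A = cost, A
    # A point is foreground when the outlier term explains it better.
    r = np.linalg.norm(X @ best_A - flow, axis=1)
    p_in = gamma * np.exp(-r**2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return best_A, p_out > p_in


def mode_depth(depth_map, pts, bin_width=0.25):
    """Most frequent depth (bin centre) at the given (x, y) feature points."""
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, depth_map.shape[1] - 1)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, depth_map.shape[0] - 1)
    d = depth_map[rows, cols]
    d = d[np.isfinite(d) & (d > 0)]     # drop invalid depth samples
    if d.size == 0:
        return None
    counts = np.bincount(np.floor(d / bin_width).astype(int))
    return (np.argmax(counts) + 0.5) * bin_width
```

Given feature positions `pts` (N x 2), their flow vectors `flow` (N x 2) and a depth map aligned with the image, `segment_dominant_motion(pts, flow)` yields a boolean foreground mask, and `mode_depth(depth_map, pts[mask])` gives the depth that could be used to scale the bounding box.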

For more information, see CASBliP – Computer Aided System for the Blind.

Visual SLAM

Simultaneous localisation and mapping (SLAM) is the problem of determining the position of an entity, such as a robot (localisation), whilst at the same time determining the structure of the surrounding environment (mapping). This has been a major topic of research for many years in robotics, where it is a central challenge in facilitating navigation in previously unseen environments. Recently, there has been a great deal of interest in performing SLAM with a single camera, enabling the 6-D pose of a moving camera to be tracked whilst simultaneously determining structure in terms of a depth map. This has been dubbed ‘monocular SLAM’, and several systems now exist which are capable of running in real time, giving the potential for a highly portable and cheap location sensor.

We have the following projects running on real-time visual SLAM:

  • Robust feature matching for visual SLAM: Matching image features reliably from frame to frame is a central component in visual SLAM. This project is designing new matching techniques that combine image descriptors with the estimated camera pose, with the aim of greater robustness to changes in camera viewpoint.
  • Extracting higher-order structure in visual SLAM: Previous visual SLAM algorithms are based on mapping the depth of sparse points in the scene. This project is expanding the SLAM framework to allow the mapping of higher-order structure, such as planes and 3-D edges, hence producing more useful representations of the surrounding environment.
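
The two project ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in either project: the function names are hypothetical, the camera is assumed to be a pinhole with intrinsic matrix K, and the pose (R, t) is assumed to map world coordinates into the camera frame. The first function predicts where a mapped 3-D point should appear given the estimated pose, which could be used to centre a descriptor search window. The second fits a plane to a set of mapped 3-D points by least squares.

```python
import numpy as np


def predict_image_position(point_world, R, t, K):
    """Project a mapped 3-D point using the estimated camera pose.

    Assumes x_cam = R @ x_world + t and a pinhole camera with intrinsics K.
    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    p_cam = R @ point_world + t
    if p_cam[2] <= 0:
        return None
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]


def fit_plane(points):
    """Least-squares plane through N x 3 points.

    Returns (normal, d, rms) with normal @ x + d = 0 on the plane and
    rms the root-mean-square point-to-plane distance.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, s, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = -normal @ centroid
    rms = s[-1] / np.sqrt(len(points))
    return normal, d, rms
```

In a matching step, the predicted position would typically bound the region in which descriptors are compared. For plane extraction, a small `rms` value suggests that the chosen points are close to coplanar.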

Our SLAM system is also the central component in the ViewNet project.

An introduction to visual SLAM is available in the slides from the BMVC Tutorial on visual SLAM given by Andrew Calway, Andrew Davison and Walterio Mayol-Cuevas.