Robust Visual SLAM for Fast Moving Platforms

Dr. Jose Martinez-Carranza

In recent years, considerable progress has been made in what is known as visual Simultaneous Localisation and Mapping (SLAM).

Visual SLAM is a technology that provides fast, accurate 6D pose estimation of a moving camera together with a 3D representation of the scene observed by the camera. Applications for this technology include navigation in GPS-denied environments, virtual augmentation of objects in video footage and video-game interaction.
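
As a minimal illustration of the quantities a visual SLAM system estimates, the Python sketch below represents a 6D camera pose as a rotation and translation and projects a mapped 3D point into the image with a pinhole model; the intrinsics, pose values and map point are invented for the example and are not taken from any particular system.

```python
import numpy as np

# A 6D camera pose: 3D rotation (here about the yaw axis) plus 3D translation.
# Values are illustrative only.
yaw = np.deg2rad(10.0)
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
t = np.array([0.2, 0.0, 1.5])            # translation into the camera frame (metres)

# Pinhole intrinsics (assumed focal length and principal point).
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_world):
    """Project a 3D map point into the image given the current pose."""
    p_cam = R @ point_world + t          # world frame -> camera frame
    p_img = K @ p_cam                    # camera frame -> homogeneous pixels
    return p_img[:2] / p_img[2]

print(project(np.array([0.5, -0.3, 4.0])))   # pixel (u, v) of one map point
```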

Despite these achievements, there are still challenges to be faced. A practical yet important one is running visual SLAM systems on low-budget platforms, where computing power and memory are limited.

My main research therefore focuses on the design of strategies that allow visual SLAM systems to keep working on low-budget platforms without sacrificing real-time response. This also includes maintaining robustness against loss of tracking, vibration, image blur and strong changes in lighting conditions.

Applications of my research are oriented towards fast-moving robotic platforms such as walking robots, mobile vehicles and Unmanned Aerial Vehicles (UAVs).

Full details about my ongoing research can be found here.

Error resilience and transport

The Group’s relationship with the Communication Systems and Networks Group has produced a body of successful research into the reliable transport of video. The Group has proposed a number of error-resilient methods which reduce the propagation of errors and conceal, rather than correct, them.

Early work (EU FP4 Project “WINHOME”) found that error resilience methods based on EREC could be combined with adaptive packetisation strategies and data partitioning to provide a robust MPEG-2 transport for in-home TV distribution. WINHOME provided the first European demonstration of robust video transport over WLANs, highlighting the weakness of media-unaware systems and the potential of selective reuse of corrupted packets.

In the EPSRC-funded project SCALVID (GR/L43596/01), robust and scalable video coding schemes for heterogeneous communication systems were investigated. In particular, the Group investigated a new coding approach based on matching pursuits. SCALVID significantly reduced the complexity of matching pursuits through a hierarchical (primitive operator) correlator structure (patented by NDS) and through the optimisation of basis function dictionaries. This work was widely cited internationally. SCALVID was the first to show that matching pursuits can form the basis of an inherently robust coding system.
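
For readers unfamiliar with matching pursuits, the sketch below shows the basic greedy decomposition the SCALVID work builds on: at each step the dictionary atom most correlated with the residual is selected and its contribution subtracted. The small random dictionary and signal are placeholders, not the optimised dictionaries or hierarchical correlator developed in the project.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching-pursuit decomposition of `signal` over unit-norm atoms
    (columns of `dictionary`). Returns chosen atom indices, coefficients and residual."""
    residual = signal.astype(float).copy()
    atoms, coeffs = [], []
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual       # inner product with every atom
        best = int(np.argmax(np.abs(correlations)))  # most correlated atom
        c = correlations[best]
        residual -= c * dictionary[:, best]          # remove its contribution
        atoms.append(best)
        coeffs.append(c)
    return atoms, coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
x = rng.standard_normal(64)                          # stand-in for a residual block
idx, c, r = matching_pursuit(x, D, n_atoms=10)
print(f"residual energy after 10 atoms: {np.dot(r, r):.3f} (was {np.dot(x, x):.3f})")
```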

With BT and JISC funding, JAVIC (Joint Audio Visual Internet Coding) investigated packet-loss-robust internet video coding for multicast applications. Using H.263, reliable streaming with up to 10% packet loss was demonstrated by combining cross-packet FEC, prioritisation and judicious reference-frame selection. Following this, the 3CResearch ROAM4G project delivered a novel 3-loop Multiple Description Coding (MDC) scheme which, with minimal redundancy, provides highly robust video transmission over congestion-prone networks (Figure 2). Recently extended to exploit path diversity in MIMO video systems, this has shown for the first time that MDC with spatial multiplexing can deliver up to 8 dB PSNR improvement over corresponding single-description (SDC) systems. ROAM4G also produced a 3D wavelet embedded coder which competes well with MPEG-4 SVC and which additionally provides excellent congestion management. Trials are underway with Thales Research.
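
The 3-loop MDC scheme itself is not reproduced here; the simplified sketch below only illustrates the general multiple-description idea it builds on, with a sequence split into two descriptions (even and odd frames) so that, if one description is lost, the decoder can still reconstruct an approximation by interpolating the surviving frames.

```python
import numpy as np

def split_descriptions(frames):
    """Temporal splitting into two descriptions: even-indexed and odd-indexed frames."""
    return frames[0::2], frames[1::2]

def reconstruct_from_one(description):
    """If the other description is lost, approximate each missing frame by
    averaging its temporal neighbours (simple concealment, not the 3-loop scheme)."""
    out = []
    for i, f in enumerate(description):
        out.append(f)
        nxt = description[i + 1] if i + 1 < len(description) else f
        out.append((f + nxt) / 2.0)          # stand-in for the lost frame
    return out

frames = [np.full((4, 4), k, dtype=float) for k in range(8)]   # toy "video"
even, odd = split_descriptions(frames)
approx = reconstruct_from_one(even)          # description carrying the odd frames was lost
print([float(f.mean()) for f in approx])
```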

In the EU FP5 WCAM project (Wireless Cameras and Seamless Audiovisual Networking), the Group, in collaboration with Thales Communications (France) and ProVision Communications (UK), produced a wireless H.264 platform incorporating a new spatio-temporal error concealment algorithm, which provides substantial gains (up to 9 dB PSNR improvement over the H.264 JM reference) with up to 20% packet loss (Figure 3). This was singled out by the reviewers and has been patented and successfully licensed. WCAM also provided an understanding of link adaptation (switching between a range of modulation and FEC schemes according to channel conditions). Realising that throughput-based switching metrics are inherently flawed for video, new quality-derived metrics were developed which substantially outperform existing methods.
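
The WCAM algorithm combines spatial and temporal cues; as a much-simplified illustration of the spatial side only, the sketch below fills a lost 16x16 macroblock by distance-weighted interpolation from the pixels bordering it (a toy example, not the patented method).

```python
import numpy as np

def conceal_block(frame, y, x, size=16):
    """Fill a lost size x size block in `frame` by distance-weighted interpolation
    from the boundary pixels of its four neighbours (simplified spatial concealment)."""
    top    = frame[y - 1,      x:x + size]
    bottom = frame[y + size,   x:x + size]
    left   = frame[y:y + size, x - 1]
    right  = frame[y:y + size, x + size]
    for i in range(size):
        wv = (i + 1) / (size + 1)                      # vertical weight
        for j in range(size):
            wh = (j + 1) / (size + 1)                  # horizontal weight
            vert = (1 - wv) * top[j] + wv * bottom[j]
            horz = (1 - wh) * left[i] + wh * right[i]
            frame[y + i, x + j] = 0.5 * (vert + horz)
    return frame

img = np.tile(np.linspace(0, 255, 64), (64, 1))        # smooth toy image
img[24:40, 24:40] = 0                                  # simulate a lost macroblock
conceal_block(img, 24, 24)
```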

Work on High Definition coding and transport has progressed further under the Group’s participation in the EU FP5 MEDIANET project, where pre- and post-processing algorithms have been developed. Finally, the 3CResearch project VISUALISE integrated much of the above work into a live-viewing infrastructure in which video compression and streaming technology are efficiently deployed over wireless broadband networks in difficult environments. This collaboration between BT Broadcast, Inmarsat, Node, ProVision, U4EA and ISC has developed a way for spectators at large-scale live events to gain near-real-time access to the action as it unfolds via portable terminals, for an enhanced experience.

Efficient image and video algorithms & architectures

The Group is involved in a broad range of activities related to image and video coding at various bit rates, ranging from sub-20 kb/s to broadcast rates, including High Definition.

We are currently conducting research in the following topics:

  • Parametric Coding – a paradigm for next generation video coding
  • Modelling and coding for 3G-HDTV and beyond – preserving production values and increasing immersivity (through resolution and dynamic range)
  • Scalable Video Coding – a paradigm for codec based congestion management
  • Distributed video coding – shifting the complexity from the encoder(s) to the decoder(s)
  • Complexity reductions for HDTV and post-processing
  • Biologically and neurally inspired media capture/coding algorithms and architectures
  • Architectures and sampling approaches for persistent surveillance – analysis using spatio-temporal volumes
  • Eye tracking and saliency as a basis for context specific systems
  • Quality assessment methods and metrics (a minimal PSNR sketch follows this list)
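
Connected to the last topic above, the sketch below computes PSNR, the standard objective quality metric quoted elsewhere on this page (e.g. the 8 dB and 9 dB gains); it is a generic definition rather than one of the Group's own metrics.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy = np.clip(ref + rng.normal(0, 5, size=ref.shape), 0, 255)
print(f"PSNR: {psnr(ref, noisy):.2f} dB")
```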

Early work in the Group developed the concept of Primitive Operator Signal Processing, which enabled the realisation of high-performance, multiplier-free filter banks. This led to a collaboration with Sony, enabling the first ASIC implementation of a sub-band video compression system for professional use. In EPSRC project GR/K25892 (architectural optimisation of video systems), world-leading complexity results were achieved for wavelet and non-linear filterbank implementations.
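
As a small illustration of the primitive-operator idea (not one of the Group's actual filter designs), the sketch below realises a toy 3-tap filter with coefficients 0.25, 0.625 and 0.25 using only shifts and adds, so no hardware multiplier is required.

```python
def tap_0_625(x):
    """Multiply an integer sample by 0.625 using shifts and adds only:
    0.625 = 2**-1 + 2**-3, so x * 0.625 ~= (x >> 1) + (x >> 3)."""
    return (x >> 1) + (x >> 3)

def fir_multiplier_free(samples):
    """Toy 3-tap FIR with coefficients (0.25, 0.625, 0.25) realised with
    primitive operators (shift and add) on integer samples."""
    out = []
    padded = [0] + list(samples) + [0]
    for i in range(1, len(padded) - 1):
        acc = (padded[i - 1] >> 2) + tap_0_625(padded[i]) + (padded[i + 1] >> 2)
        out.append(acc)
    return out

print(fir_multiplier_free([64, 128, 256, 128, 64]))
```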

International interest has been stimulated by our work on Matching Pursuits (MP) video coding, which preserves the superior quality of MP for displaced-frame-difference coding while offering dramatic complexity savings and efficient dictionaries. The Group has also demonstrated that long-term prediction is viable for real-time video coding; its simplex minimisation method offers up to 2 dB improvement over single-frame methods at comparable complexity.

Following the Group’s success in reduced-complexity multiple-reference-frame motion estimation, interpolation-free sub-pixel motion estimation techniques were produced in ROAM4G (UIC 3CResearch), offering improvements of up to 60% over competing methods. Also in ROAM4G, a novel mode-refinement algorithm was invented for video transcoding which reduces complexity by up to 90% compared with full search. Both works have generated patents, licensed to ProVision and STMicroelectronics respectively. Significant work on H.264 optimisation has been conducted in both ROAM4G and the EU FP6 WCAM project.
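
The ROAM4G sub-pixel and mode-refinement algorithms are not reproduced here; the sketch below only shows the integer-pel full-search baseline they improve upon, choosing the displacement that minimises the sum of absolute differences (SAD) over a small search window.

```python
import numpy as np

def full_search(cur, ref, by, bx, block=16, radius=8):
    """Integer-pel full-search motion estimation for one block.
    Returns the (dy, dx) displacement into `ref` that minimises SAD, and the SAD."""
    target = cur[by:by + block, bx:bx + block].astype(int)
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue                               # candidate falls outside the frame
            cand = ref[y:y + block, x:x + block].astype(int)
            sad = np.abs(target - cand).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv, best

rng = np.random.default_rng(2)
ref = rng.integers(0, 256, size=(64, 64))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))   # content moves down 2 rows and left 3 columns
print(full_search(cur, ref, by=16, bx=16))       # expect displacement (-2, 3) with zero SAD
```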

In 2002, region-of-interest coding was successfully extended to sign language (DTI). Using eye tracking to reveal viewing patterns, foveation models provided bit-rate reductions of 25-40% with no loss in perceived quality. This has led to a research programme with the BBC on sign language video coding for broadcasting.
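
As a loose illustration of the foveation idea underlying this work (with invented parameters), the sketch below scales a quantiser step with distance from a fixation point, so regions far from the viewer's gaze are coded more coarsely.

```python
import numpy as np

def foveated_qstep(height, width, fixation, base_q=8.0, slope=0.05):
    """Return a per-pixel quantiser step that grows linearly with distance
    from the fixation point (illustrative foveation model, made-up parameters)."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.hypot(ys - fixation[0], xs - fixation[1])
    return base_q * (1.0 + slope * dist)

q = foveated_qstep(288, 352, fixation=(100, 176))     # CIF frame, gaze near the signer
print(q.min(), q.max())                               # fine near the gaze, coarse far away
```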

In collaboration with the Metropolitan Police, VIGELANT (EPSRC 2003) produced a novel joint optimisation for rapidly deploying wireless-video camera systems incorporating both multi-view and radio-propagation constraints. With Heriot-Watt and BT, the Group has developed novel multi-view video algorithms which, for the first time, optimise the trade-off between compression and view synthesis (EPSRC).

Methods of synthesising high-throughput video signal processing systems that jointly optimise algorithm performance and implementation complexity have been developed using genetic algorithms. Using a constrained architectural style, results have been obtained for 2D filters, wavelet filterbanks and transforms such as the DCT. In 2005, innovative work in the Group led to the development of the X-MatchPROvw lossless data compressor (BTG patent assignment), which at the time was the fastest in its class.
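
The synthesis framework itself is not reproduced here; the sketch below is a toy genetic algorithm in the same spirit, with all parameters invented: it evolves power-of-two filter taps under a fitness that jointly penalises deviation from a target response and the number of adders a shift-add implementation would need.

```python
import numpy as np

rng = np.random.default_rng(3)
TARGET = np.array([0.10, 0.25, 0.30, 0.25, 0.10])   # desired (toy) filter taps
LEVELS = np.array([0.0, 0.0625, 0.125, 0.25, 0.5])  # allowed power-of-two tap values

def fitness(genome):
    """Joint cost: approximation error plus a penalty per non-zero tap (adder-count proxy)."""
    taps = LEVELS[genome]
    return np.sum((taps - TARGET) ** 2) + 0.001 * np.count_nonzero(taps)

def evolve(pop_size=40, generations=200, mutation=0.1):
    pop = rng.integers(0, len(LEVELS), size=(pop_size, len(TARGET)))
    for _ in range(generations):
        scores = np.array([fitness(g) for g in pop])
        parents = pop[np.argsort(scores)[:pop_size // 2]]        # truncation selection
        children = parents.copy()
        cut = rng.integers(1, len(TARGET), size=len(children))   # one-point crossover
        for i, c in enumerate(cut):
            children[i, c:] = parents[(i + 1) % len(parents), c:]
        mask = rng.random(children.shape) < mutation             # random mutation
        children[mask] = rng.integers(0, len(LEVELS), size=mask.sum())
        pop = np.vstack([parents, children])
    best = min(pop, key=fitness)
    return LEVELS[best], fitness(best)

print(evolve())   # evolved power-of-two taps and their joint cost
```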