Bio-Inspired 3D Mapping

Geoffrey Daniels

Supervised by: David Bull, Walterio Mayol-Cuevas, J Burn

Using state-of-the-art computer vision techniques, together with insights into the biological processes animals use to traverse varied terrain, we have created a system that enables a robotic platform to gather the information it needs to move safely through an unknown environment. A goal of this project is a system that runs in real time on an arbitrary locomotion platform, providing local route planning and hazard detection. With this real-time aim in mind, the core parts of the current algorithm have been developed in NVIDIA's CUDA language for general-purpose computing on GPUs: the computation is embarrassingly parallel, and GPUs provide a huge speed increase for such workloads. Currently, without significant optimisation, the system computes the 3D surface ahead of the camera in approximately 100 ms.
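The per-pixel independence of the surface computation is what makes GPU offload attractive. As a rough illustration only, the sketch below shows an embarrassingly parallel, plane-sweep-style depth estimate, with CuPy standing in for the project's hand-written CUDA kernels; the function name and inputs are illustrative assumptions, not the actual code.

```python
import cupy as cp  # GPU array library used here as a stand-in for raw CUDA

def plane_sweep_depth(ref, warped_stack, depths):
    """ref: HxW reference image; warped_stack: DxHxW array holding the second
    view pre-warped to the reference under each of D candidate depths;
    depths: length-D vector of those candidate depths."""
    ref = cp.asarray(ref, dtype=cp.float32)
    stack = cp.asarray(warped_stack, dtype=cp.float32)
    cost = cp.abs(stack - ref[None, :, :])   # photometric cost per pixel/depth
    best = cp.argmin(cost, axis=0)           # each pixel decided independently
    return cp.asarray(depths, dtype=cp.float32)[best]  # HxW depth map
```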

This system will form one module of a larger funded project to develop a bio-inspired concept system for overall terrestrial perception and safe locomotion.

Interesting Results

Some example footage of the system generating a virtual 3D world from a single camera in real time:
https://www.youtube.com/watch?v=h36hVOerMFU&list=PLJmQZRbc9yWWrg6A0R_NFYHl6WP4FoNe9#t=15

Context Based Compression

Richard Vigars

Supervised by: Dave Bull, Andrew Calway

State-of-the-art video coding standards (such as H.264 and H.265) are extremely reliable and facilitate straightforward rate-quality control. However, certain scenarios remain challenging for the block-based motion and residual coding paradigm. For example, where there is camera motion with respect to highly textured surfaces, particularly if perspective effects are strong, the block-based model produces a large and highly textured residual signal.

To address this problem, we have developed a video codec framework which exploits extrinsic scene knowledge to target higher-order motion models to specific geometrical structures in the scene. We create a textural-geometric model of the scene prior to coding. During coding, we use camera tracking algorithms to track regions of salient scene geometry. Foreground regions are detected and coded by a host codec, such as H.264.

This approach allows us to replace a large volume of the host codec's bitstream with our own compact motion parameters and side information. Compared to H.264 operating at equivalent quantisation parameters, our hybrid codec can achieve bitrate savings of up to 48%.

In brief:

  • Applying extrinsic knowledge of the scene in which a video is captured in order to exploit geometrical redundancy in the motion of objects with respect to the camera.
  • A perspective motion model is targeted at planar background objects in the scene. This allows large regions of frames in the video sequence to be interpolated from reference frames. These interpolated regions are perceptually acceptable, so no residual is required, which translates into large bitrate savings (see the warping sketch after this list).
  • In order to target the perspective motion model to the appropriate regions, tracking techniques similar to those used in augmented reality applications are used.
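To make the interpolation step concrete, here is a minimal OpenCV sketch (the function and its interface are assumptions, not the codec's actual code): given a homography H estimated between a reference frame and the current frame, the planar background is synthesised by warping, so only the compact motion parameters and side information need to be transmitted.

```python
import cv2
import numpy as np

def predict_background(reference_frame, H, out_shape):
    """Synthesise the planar background of the current frame by warping a
    reference frame under the estimated perspective (homography) model."""
    h, w = out_shape[:2]
    return cv2.warpPerspective(reference_frame, np.asarray(H, np.float64), (w, h))
```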

In terms of theory, this project has turned out to be quite an interesting mix of perceptual video coding and computer vision.


Key algorithms

In order to extract a context from the scene, a few key operations are performed:

[Figures, left to right: plane modelling, plane tracking, motion estimation]

From left to right these are:

  1. A set of textural models of key scene planes is extracted using SURF features. These are augmented with polygons giving the approximate locations of other planes in the scene.
  2. These models are matched to the features in each frame of the video in order to approximately localise the planar structures.
  3. These approximate locations provide an initialisation for frame-to-frame motion estimation. SURF feature point matching combined with RANSAC isolates the background motion from that of the foreground (a code sketch follows).
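A minimal OpenCV sketch of steps 2 and 3, assuming model_image and current_frame are greyscale images already loaded as arrays (SURF ships in the opencv-contrib xfeatures2d module; all thresholds here are illustrative):

```python
import cv2
import numpy as np

# Step 2: match SURF features between the stored plane model and the frame.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_model, des_model = surf.detectAndCompute(model_image, None)
kp_frame, des_frame = surf.detectAndCompute(current_frame, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des_model, des_frame, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # ratio test

src = np.float32([kp_model[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_frame[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Step 3: RANSAC rejects foreground matches as outliers, so the fitted
# homography H models the background (planar) motion only.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```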


Foreground segmentation using anisotropic diffusion and morphological operations [1]

The process is depicted in the images below:

[Figures, left to right: pixel differences between predicted and actual frame, anisotropic diffusion, thresholding, cleanup, map]

From left to right:

  1. Absolute pixel differences are computed between the predicted frame and the actual frame.
  2. These differences are then filtered using anisotropic diffusion. This smooths out imperceptible errors in the texture while leaving the large errors caused by the foreground intact.
  3. A thresholding stage creates a mask.
  4. The mask is grown into a map.
  5. Finally, the map is cleaned up using morphological operations (a code sketch of the full pipeline follows).
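A compact sketch of the whole pipeline, using a basic Perona-Malik diffusion in place of whatever variant [1] specifies; all parameter values are illustrative guesses:

```python
import cv2
import numpy as np

def segment_foreground(predicted, actual, iters=15, kappa=30.0, lam=0.2, thresh=20):
    diff = cv2.absdiff(predicted, actual).astype(np.float32)      # step 1

    # Step 2: anisotropic diffusion smooths small, imperceptible texture
    # errors while preserving the large, edge-like foreground differences.
    for _ in range(iters):
        grads = [np.roll(diff, shift, axis) - diff
                 for axis in (0, 1) for shift in (-1, 1)]
        diff += lam * sum(g * np.exp(-(g / kappa) ** 2) for g in grads)

    mask = (diff > thresh).astype(np.uint8) * 255                 # step 3
    grown = cv2.dilate(mask, np.ones((5, 5), np.uint8))           # step 4
    kernel = np.ones((7, 7), np.uint8)                            # step 5
    return cv2.morphologyEx(grown, cv2.MORPH_CLOSE, kernel)
```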

References

  1. A. Krutz et al. – Motion-based object segmentation using sprites and anisotropic diffusion, WIAMIS 2007


Published work

  • Context-Based Video Coding – R. Vigars, A. Calway, D. R. Bull; ICIP 2013 (accepted)

Efficient image and video algorithms & architectures

The group is involved in a broad range of activities related to image and video coding at various bit rates, ranging from below 20 kb/s to broadcast rates, including High Definition.

We are currently conducting research in the following topics:

  • Parametric Coding – a paradigm for next generation video coding
  • Modelling and coding for 3G-HDTV and beyond – preserving production values and increasing immersivity (through resolution and dynamic range)
  • Scalable Video Coding – a paradigm for codec based congestion management
  • Distributed video coding – shifting the complexity from the encoder(s) to the decoder(s)
  • Complexity reductions for HDTV and post processing.
  • Biologically and neurally inspired media capture/coding algorithms and architectures
  • Architectures and sampling approaches for persistent surveillance – analysis using spatio-temporal volumes
  • Eye tracking and saliency as a basis for context specific systems
  • Quality assessment methods and metrics

Early work in the Group developed the concept of Primitive Operator Signal Processing, which enabled the realisation of high performance, multiplier-free filter banks. This led to collaboration with Sony, enabling the first ASIC implementation of a sub-band video compression system for professional use. In EPSRC project GR/K25892 (architectural optimisation of video systems), world leading complexity results were achieved for wavelet and non-linear filterbank implementations.
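The primitive-operator idea can be illustrated with a toy example (not one of the Group's actual filter designs): restricting filter coefficients to sums of powers of two turns every multiplication into shifts and adds, as in this 3-tap binomial smoother with taps (0.25, 0.5, 0.25):

```python
def binomial_smooth(x):
    """y[n] = 0.25*x[n-1] + 0.5*x[n] + 0.25*x[n+1] for integer samples,
    realised using only shifts and adds, i.e. no multipliers."""
    return [(x[i - 1] >> 2) + (x[i] >> 1) + (x[i + 1] >> 2)
            for i in range(1, len(x) - 1)]
```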

International interest has been stimulated by our work on Matching Pursuits (MP) video coding which preserves the superior quality of MP for displaced-frame difference coding while offering dramatic complexity savings and efficient dictionaries. The Group has also demonstrated that long-term prediction is viable for real-time video coding; its simplex minimisation method offers up to 2dB improvement over single-frame methods with comparable complexity.

Following the Group’s success in reduced-complexity multiple-reference-frame motion estimation, interpolation-free sub-pixel motion estimation techniques were produced in ROAM4G (UIC 3CResearch) offering improvements up to 60% over competing methods. Also in ROAM4G, a novel mode-refinement algorithm was invented for video transcoding which reduces the complexity over full-search by up to 90%. Both works have generated patents which have been licensed to ProVision and STMicroelectronics respectively. Significant work on H.264 optimisation has been conducted in both ROAM4G and the EU FP6 WCAM project.

In 2002, region-of-interest coding was successfully extended to sign language (DTI). Using eye tracking to reveal viewing patterns, foveation models provided bit-rate reductions of 25 to 40% with no loss in perceived quality. This has led to a research programme with the BBC on sign language video coding for broadcasting.

In collaboration with the Metropolitan Police, VIGELANT (EPSRC 2003) produced a novel joint optimisation for rapidly deploying wireless-video camera systems incorporating both multi-view and radio-propagation constraints. With Heriot-Watt and BT, the Group has developed novel multi-view video algorithms which, for the first time, optimise the trade-off between compression and view synthesis (EPSRC).

Methods of synthesising high-throughput video signal processing systems which jointly optimise algorithm performance and implementation complexity have been developed using genetic algorithms. Using a constrained architectural style, results have been obtained for 2D filters, wavelet filterbanks and transforms such as the DCT. In 2005, innovative work in the Group led to the development of the X-MatchPROvw lossless data compressor (BTG patent assignment), which was at the time the fastest in its class.

Error resilience and transport

The Group’s relationship with the Communication Systems and Networks Group has produced a body of successful research into the reliable transport of video. The Group has proposed a number of error-resilient methods which reduce the propagation of errors and conceal, rather than correct, them.

Early work (EU FP4 project “WINHOME”) found that error resilience methods based on EREC could be combined with adaptive packetisation strategies and data partitioning to provide robust MPEG-2 transport for in-home TV distribution. WINHOME provided the first European demonstration of robust video transport over WLANs, highlighting the weakness of media-unaware systems and the potential of selective reuse of corrupted packets.

In the EPSRC-funded project SCALVID (GR/L43596/01), robust and scalable video coding schemes for heterogeneous communication systems were investigated. In particular, the Group investigated a new coding approach based on matching pursuits. SCALVID significantly reduced the complexity of matching pursuits through a hierarchical (primitive operator) correlator structure (patented by NDS) and through the optimisation of basis function dictionaries. This work was widely cited internationally. SCALVID was the first to show that matching pursuits can form the basis of an inherently robust coding system.

With BT and JISC funding, JAVIC (Joint Audio Visual Internet Coding) investigated packet-loss-robust internet video coding for multicast applications. Using H.263, reliable streaming with up to 10% packet loss was demonstrated by combining cross-packet FEC, prioritisation and judicious reference frame selection. Following this, the 3CResearch ROAM4G project delivered a novel 3-loop Multiple Description Coding (MDC) scheme which, with minimal redundancy, provides highly robust video transmission over congestion-prone networks (Figure 2). Recently extended to exploit path diversity in MIMO video systems, this has shown for the first time that MDC with spatial multiplexing can deliver up to 8 dB PSNR improvement over corresponding single-description (SDC) systems. ROAM4G also produced a 3D wavelet embedded coder which competes well with MPEG-4 SVC and additionally provides excellent congestion management. Trials are underway with Thales Research.

In the EU FP6 WCAM project (Wireless Cameras and Seamless Audiovisual Networking), the Group, in collaboration with Thales Communications (France) and ProVision Communications (UK), produced a wireless H.264 platform incorporating a new spatio-temporal error concealment algorithm which provides substantial gains (up to 9 dB PSNR improvement over the H.264 JM reference) at up to 20% packet loss (Figure 3). This was singled out by the reviewers and has been patented and successfully licensed. WCAM also provided an understanding of link adaptation (switching between a range of modulation and FEC schemes according to channel conditions). Realising that throughput-based switching metrics are inherently flawed for video, new quality-derived metrics were developed which substantially outperform existing methods.

Work on High Definition coding and transport has progressed further under the Group’s participation in the EU FP5 MEDIANET project, where pre- and post-processing algorithms have been developed. Finally, the 3CResearch project VISUALISE integrated much of the above work into a live-viewing infrastructure in which video compression and streaming technology are efficiently deployed over wireless broadband networks in difficult environments. This collaboration between BT Broadcast, Inmarsat, Node, ProVision, U4EA and ISC has developed a way for spectators at large-scale live events to have near real-time access to events as they unfold via portable terminals, for an enhanced experience.

Visual saliency

Being able to predict gaze from multiple cues for open signed video content allows coding gains without loss of perceived quality. We have developed a face orientation tracker based upon grid-based likelihood-ratio trackers, using profile and frontal face detections. These cues are combined using a grid-based Bayesian state estimation algorithm to form a probability surface for each frame (see the sketch below). This gaze predictor outperforms both a static gaze prediction and one based on face locations within the frame.
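A minimal sketch of the grid-based fusion step, assuming Gaussian random-walk dynamics and per-cue likelihood surfaces already produced by the detectors; this is an illustration, not the published tracker:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grid_bayes_step(prior, cue_likelihoods, motion_sigma=2.0):
    """One predict/update cycle on the probability grid: diffuse the prior
    under a random-walk motion model, multiply in each cue's likelihood
    surface, then renormalise to a per-frame probability surface."""
    posterior = gaussian_filter(prior, motion_sigma)   # predict step
    for like in cue_likelihoods:                       # fuse detector cues
        posterior = posterior * like
    return posterior / (posterior.sum() + 1e-12)       # normalise
```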