Context Based Compression

Richard Vigars

Supervised by: Dave Bull, Andrew Calway

State-of-the-art video coding standards (such as H.264 and H.265) are extremely reliable, and facilitate straightforward rate-quality control. However, certain scenarios remain challenging to the block-based motion and residual coding paradigm. For example, where there is camera motion with respect to highly textured surfaces, particularly if perspective effects are strong, the block-based model produces a large and highly textured residual signal.

To address this problem, we have developed a video codec framework which exploits extrinsic scene knowledge to target higher-order motion models to specific geometrical structures in the scene. We create a textural-geometric model of the scene prior to coding. During coding, we use camera tracking algorithms to track regions of salient scene geometry. Foreground regions are detected and coded by a host codec, such as H.264.

This approach allows us to replace a large volume of the host codec’s bitstream with our own compact motion parameters and side information. Compared to H.264 operating at equivalent quantisation parameters, our hybrid codec can achieve bitrate savings of up to 48%.

In brief:

  • Extrinsic knowledge of the scene in which a video is captured is applied in order to exploit geometrical redundancy in the motion of objects with respect to the camera.
  • A perspective motion model is targeted to planar background objects in the scene. This allows large regions of frames in the video sequence to be interpolated from reference frames. These interpolated regions are perceptually acceptable, and as such no residual is required. This translates into large bitrate savings!
  • In order to target the perspective motion model to the appropriate regions, tracking techniques similar to those used in augmented reality applications are used.
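As a rough illustration of the perspective motion model mentioned above, the sketch below maps pixel positions through a 3x3 homography, which is the usual eight-parameter description of how a plane moves between two views. The function names and the matrix `H_EXAMPLE` are hypothetical and chosen only for this example; they are not taken from the codec itself.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography H given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by w is what produces the perspective (foreshortening) effect.
    return (u / w, v / w)


def warp_block_corners(H, x0, y0, size):
    """Return the warped corners of a square block with top-left corner (x0, y0)."""
    corners = [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]
    return [apply_homography(H, c) for c in corners]


# Hypothetical parameters: a small shift plus mild perspective along x.
H_EXAMPLE = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, -1.0],
    [0.001, 0.0, 1.0],
]

if __name__ == "__main__":
    for corner in warp_block_corners(H_EXAMPLE, 0.0, 0.0, 16.0):
        print(corner)
```

A single set of eight parameters like this can describe the motion of an entire planar region, whereas a block-based model needs a separate vector for every block.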

In terms of theory, this project has turned out to be quite an interesting mix of perceptual video coding and computer vision.


Key algorithms

In order to extract a context from the scene a few key operations are performed:

[Figures: Plane modelling, Plane tracking, Motion estimation]

From left to right these are:

  1. A set of textural models of key scene planes is extracted using SURF features. These are augmented with polygons giving the approximate locations of other planes in the scene.
  2. These models are matched to the features in each frame of the video in order to approximately localise the planar structures.
  3. These approximate locations provide an initialisation for frame-to-frame motion estimation. The use of SURF feature point matching and RANSAC isolates the background motion from that of the foreground.
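The third step can be sketched as follows: RANSAC repeatedly fits a homography to four random feature matches and keeps the model that the largest number of matches agree with. Matches that agree are treated as background and the rest as foreground. This is a minimal standard-library sketch on synthetic matches, not the project's implementation; the SURF matching stage is assumed to have already produced the point pairs, and all names, thresholds and test data are hypothetical.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None  # degenerate sample, e.g. three collinear points
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_homography(pairs):
    """Fit H (normalised so H[2][2] = 1) from exactly four ((x, y), (u, v)) matches."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(H, p):
    """Map point p through H; returns infinities if p maps to infinity."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return (float("inf"), float("inf"))
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def reprojection_error(H, pair):
    """Euclidean distance between the projected source point and its match."""
    (u, v) = pair[1]
    px, py = project(H, pair[0])
    return ((px - u) ** 2 + (py - v) ** 2) ** 0.5


def ransac_background(pairs, threshold=1.0, iterations=200, seed=0):
    """Return the best homography and the indices of matches consistent with it."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iterations):
        H = fit_homography(rng.sample(pairs, 4))
        if H is None:
            continue
        inliers = [i for i, p in enumerate(pairs)
                   if reprojection_error(H, p) < threshold]
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers


# Synthetic data: 12 background matches follow one homography, 4 foreground matches do not.
H_TRUE = [[1.02, 0.01, 3.0], [-0.01, 0.98, -2.0], [1e-4, 0.0, 1.0]]
grid = [(float(x), float(y)) for x in (0, 60, 130, 200) for y in (0, 45, 110)]
background = [(p, project(H_TRUE, p)) for p in grid]
foreground = [((x, y), (x + 12.0, y - 9.0))
              for x, y in ((80.0, 60.0), (95.0, 70.0), (85.0, 90.0), (100.0, 80.0))]
matches = background + foreground
H_est, inliers = ransac_background(matches)
outliers = [i for i in range(len(matches)) if i not in inliers]
```

In this toy setup the inlier set corresponds to the background plane and the outliers to the independently moving foreground, which mirrors how the background motion is isolated in the text.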


Foreground segmentation using anisotropic diffusion and morphological operations [1]

The process is depicted in images below:

[Figures: Pixel differences between predicted frame and actual frame, Anisotropic diffusion, Thresholding, Cleanup, Map]

From left to right:

  1. Absolute pixel differences between the predicted frame and the actual frame are computed.
  2. These differences are then filtered using anisotropic diffusion. This smooths out invisible errors in the texture, while leaving the large errors due to differences in foreground intact.
  3. A thresholding stage creates a mask.
  4. The mask is grown into a map.
  5. Finally, the map is cleaned up using morphological operations.
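The five steps above can be sketched on a small synthetic frame as follows. This is an illustrative standard-library version, not the implementation from [1]: it assumes Perona-Malik diffusion with an exponential conduction function, a 3x3 dilation for growing the mask, and a 3x3 morphological closing for the cleanup. All parameter values (`kappa`, `step`, the threshold of 35) and the test pattern are hypothetical.

```python
import math


def abs_difference(predicted, actual):
    """Step 1: per-pixel absolute difference between two greyscale frames."""
    return [[abs(p - a) for p, a in zip(prow, arow)]
            for prow, arow in zip(predicted, actual)]


def anisotropic_diffusion(img, iterations=10, kappa=60.0, step=0.2):
    """Step 2: Perona-Malik diffusion over the 4-neighbourhood."""
    h, w = len(img), len(img[0])
    cur = [[float(v) for v in row] for row in img]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for y in range(h):
            for x in range(w):
                flow = 0.0
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        grad = cur[ny][nx] - cur[y][x]
                        # Conduction is near 1 for small gradients, near 0 across strong edges.
                        flow += math.exp(-(grad / kappa) ** 2) * grad
                nxt[y][x] = cur[y][x] + step * flow
        cur = nxt
    return cur


def threshold(img, level):
    """Step 3: binary mask of pixels whose value exceeds the level."""
    return [[1 if v > level else 0 for v in row] for row in img]


def morph(mask, pick):
    """3x3 dilation (pick=max) or erosion (pick=min); outside the frame counts as 0."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [mask[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                    for ny in (y - 1, y, y + 1) for nx in (x - 1, x, x + 1)]
            row.append(pick(vals))
        out.append(row)
    return out


# Synthetic frame: checkerboard texture error of 40 plus a 3x3 foreground error of 200.
SIZE = 9
predicted = [[100] * SIZE for _ in range(SIZE)]
actual = [[100 + (40 if (x + y) % 2 == 0 else 0) for x in range(SIZE)]
          for y in range(SIZE)]
for y in range(3, 6):
    for x in range(3, 6):
        actual[y][x] = 300  # hypothetical foreground object

diff = abs_difference(predicted, actual)
raw_mask = threshold(diff, 35)                     # without diffusion: texture leaks in
mask = threshold(anisotropic_diffusion(diff), 35)  # steps 2 and 3
fg_map = morph(mask, max)                          # step 4: grow the mask
fg_map = morph(morph(fg_map, max), min)            # step 5: closing as cleanup
```

Thresholding the raw differences marks the texture error as foreground, whereas thresholding after diffusion keeps only the 3x3 object, which is then grown to a 5x5 map.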

References

  1. Krutz et al – Motion-based object segmentation using sprites and anisotropic diffusion, IWIAMIS 2007


Published work

  • Context-Based Video Coding – R. Vigars, A. Calway, D. R. Bull; ICIP 2013 (accepted)

Spot the Penguin

Tilo Burghardt, Neill Campbell, Peter Barham, Richard Sherley

This research was conducted from 2006 to 2009. It aimed to provide non-invasive solutions to problems in field biology and to help better understand and conserve endangered species. Specifically, we developed approaches to permit remote monitoring and identification of large populations using techniques that originated in computer vision and human biometrics. Initial work was centred around the African penguin (Spheniscus demersus).

During the project we provided a proof of concept for an autonomously operating prototype system capable of monitoring and recognising individual African penguins in their natural environment without tagging or otherwise disturbing the animals. The system was limited to very good environmental conditions.

Research was conducted together with the Animal Demography Unit at the University of Cape Town, South Africa. The project was funded by the Leverhulme Trust, with long-term support in the field from the Earthwatch Institute, and with pilot tests run in collaboration with Bristol Zoo Gardens.

Human pose estimation using motion

Ben Daubney, David Gibson, Neill Campbell

Currently we are researching how to extract human pose from a sparse set of moving features. This work is inspired by psychophysical experiments using the Moving Light Display (MLD), where it has been shown that a small set of moving points attached to the key joints of a person can convey a wealth of information to an observer about the person being viewed, such as their mood or gender. Unlike the typical MLDs used in the psychophysics community, ours are automatically generated by applying a standard feature tracker to a sequence of images.

The result is a set of features that are far more noisy and unreliable than those traditionally used. The purpose of this research is to better understand how the temporal dimension of a sequence of images can be exploited at a much lower level than is currently used to estimate pose.

Analysis of moth camouflage

David Gibson, Neill Campbell

This project is a half-million-pound BBSRC collaboration with Biological Sciences and Experimental Psychology. Its aim is to develop a computational theory of animal camouflage, with models specific to the visual systems of birds and humans. Moths have been chosen for this study as they are particularly good demonstrators of a wide range of cryptic and disruptive camouflage in nature. Using psychophysically plausible low-level image features, learning algorithms are used to determine the effectiveness of camouflage examples. The ability to generate and process large numbers of camouflage examples enables predictive computational models to be created and compared to the performance of human and bird subjects. Such comparisons will give insights into which aspects of moth camouflage are important for avoiding detection and recognition by birds and humans, and thereby into the mechanisms employed by bird and human visual systems.

Active contours

Majid Mirmehdi, Xianghua Xie, Ronghua Yang

Active contour models, commonly known as snakes, have been widely used for object localisation, shape recovery, and visual tracking due to their natural handling of shape variations. The introduction of the Level Set method into snakes has greatly enhanced their potential in real world applications.

Since 2002, we have developed some novel active contour models. The first one aims to bridge the (image gradient) boundary-based approach and the region-based approach. In this work, a level set based geometric snake, enhanced for more tolerance towards weak edges and noise, is introduced. It is based on the principle of combining the traditional gradient flow forces with new region constraints. We refer to this as the Region-aided Geometric Snake or RAGS. The image gradient provides local information about object boundaries, while the region information offers a global definition of boundaries. In this framework, the region constraints can be conveniently customised and plugged into the snake model.
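For orientation, the traditional gradient flow that such geometric snakes start from is commonly written in level set form as below. This is the standard textbook geodesic snake, not the RAGS equation itself, and the region force that RAGS adds is not reproduced here. The notation is assumed for this illustration: \phi is the level set function, I the image, G_\sigma a Gaussian smoothing kernel, g the edge stopping function, \kappa the curvature and c a constant velocity term.

```latex
\frac{\partial \phi}{\partial t}
  = g(|\nabla I|)\,(\kappa + c)\,|\nabla \phi| + \nabla g \cdot \nabla \phi,
\qquad
\kappa = \nabla \cdot \frac{\nabla \phi}{|\nabla \phi|},
\qquad
g(|\nabla I|) = \frac{1}{1 + |\nabla (G_\sigma * I)|^{2}}
```

Because g approaches zero only where the image gradient is large, this flow relies on local edge strength alone, which is why weak edges and noise are difficult for it.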

The second model, called the Charged Contour Model (CCM), is a migration of the Charged Particle Model (CPM) into the active contour framework. The basic idea is to introduce particle dynamics into contour based deformable models. CCM performs better than CPM in the sense that it guarantees closed contours, i.e. it eliminates the ambiguities in contour reconstruction. Also, CCM is much more efficient. In comparison to the geodesic snake, CCM is more robust to weak edges and less sensitive to noise interference.

The third model, CACE (Charged Active Contour model based on Electrostatics), is a further development of the CCM. The snake, implicitly embedded in level sets, propagates under the joint influence of a boundary attraction force and a boundary competition force. Its vector field dynamically adapts by updating itself when a contour reaches a boundary (which differs from CCM). The model is then more invariant to initialisation and possesses better convergence abilities. Analytical and comparative results are presented on synthetic and real images.

The MAC model is the result of our most recent effort in developing new active contour models. The proposed external force field is based on magnetostatics and hypothesized magnetic interactions between the active contour and object boundaries. The major contribution of the method is that the interaction of its forces can greatly improve the active contour in capturing complex geometries and dealing with difficult initializations, weak edges and broken boundaries. The proposed method is shown to achieve significant improvements when compared against six well-known and state-of-the-art shape recovery methods, including the geodesic snake, the generalized version of GVF snake, the combined geodesic and GVF snake, and the charged particle model.

For more information, please see our active contours site.

On-line learning of shape information

John Chiverton, Majid Mirmehdi, Xianghua Xie

Tracking an object while simultaneously identifying an accurate outline of it is a complicated computer vision problem because of the changing nature of the high-dimensional image information. Prior information, such as probability distribution functions on a prior definition of shape, is often included in models to alleviate potential problems due to, e.g., ambiguity as to what should actually be tracked in the image data. However, supervised learning and/or training is not always possible for new unseen objects or unforeseen configurations of shape, e.g. for silhouettes of 3-D objects. We are therefore investigating ways to include high-level shape information in active contour based tracking frameworks without a supervised pre-processing stage.