Perceptual Quality Metrics (PVM)

RESEARCHERS

Dr. Fan (Aaron) Zhang

INVESTIGATORS

Prof. David Bull, Dr. Dimitris Agrafiotis and Dr. Roland Baddeley

DATES

2012-2015

FUNDING

ORSAS and EPSRC

SOURCE CODE 

PVM Matlab code is available to download.

INTRODUCTION

It is known that the human visual system (HVS) employs independent processes (distortion detection and artefact perception, often referred to as near-threshold and supra-threshold distortion perception) to assess video quality at different distortion levels. Visual masking effects also play an important role in video distortion perception, especially within spatial and temporal textures.

Figure: Algorithmic diagram for PVM.
It is well known that small differences in textured content can be tolerated by the HVS. In this work, we employ the dual-tree complex wavelet transform (DT-CWT) in conjunction with motion analysis to characterise this tolerance within spatial and temporal textures. The DT-CWT is particularly powerful in this context because of its shift invariance and orientation selectivity. In highly compressed material, blurring is one of the most commonly occurring artefacts; our approach detects it by comparing high-frequency subband coefficients from the reference and distorted frames, again facilitated by the DT-CWT. The blur measure is motion-weighted in order to simulate the tolerance of the HVS to blurring in content with high temporal activity. Inspired by the previous work of Chandler and Hemami, and of Larson and Chandler, thresholded differences (defined as noticeable distortion) and blurring artefacts are non-linearly combined using a modified geometric mean model, in which the proportion of each component is adaptively tuned (see the sketch below). The performance of the proposed video metric has been assessed and validated on the VQEG FRTV Phase I and LIVE video databases, showing clear improvements in correlation with subjective scores over existing metrics such as PSNR, SSIM, VIF, VSNR, VQM and MOVIE, and in many cases over STMAD.
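The precise formulation is given in the T-CSVT paper [1]; the following is a minimal sketch of a weighted geometric mean pooling of the two components, with illustrative function and parameter names:

```python
import numpy as np

def pvm_pool(noticeable_distortion, blur, alpha=0.5):
    """Combine the two distortion components with a weighted geometric mean.

    noticeable_distortion, blur : non-negative per-frame scores
    alpha : weight in [0, 1]; in PVM it is adaptively tuned, here it is a
            fixed illustrative parameter.
    """
    d = np.asarray(noticeable_distortion, dtype=float)
    b = np.asarray(blur, dtype=float)
    eps = 1e-12  # avoids 0**0 issues for distortion-free frames
    frame_scores = (d + eps) ** alpha * (b + eps) ** (1.0 - alpha)
    return frame_scores.mean()  # temporal pooling over the sequence
```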

RESULTS

Figure: Scatter plots of subjective DMOS versus different video metrics on the VQEG database.
Figure: Scatter plots of subjective DMOS versus different video metrics on the LIVE video database.

REFERENCES

  1. A Perception-based Hybrid Model for Video Quality Assessment. F. Zhang and D. Bull, IEEE T-CSVT, June 2016.
  2. Quality Assessment Methods for Perceptual Video Compression. F. Zhang and D. Bull, ICIP, Melbourne, Australia, September 2013.

 

Parametric Video Coding

RESEARCHERS

Dr. Fan (Aaron) Zhang

INVESTIGATORS

Prof. David Bull, Dr. Dimitris Agrafiotis and Dr. Roland Baddeley

DATES

2008-2015

FUNDING

ORSAS and EPSRC

INTRODUCTION

In most cases, the goal of video compression is to provide good subjective quality rather than simply to produce pictures that are numerically closest to the originals. Based on this observation, it is possible to conceive of a compression scheme in which an analysis/synthesis framework is employed rather than the conventional energy-minimisation approach. If such a scheme were practical, it could offer lower bitrates through reduced residual and motion vector coding, using a parametric approach to describe texture warping and/or synthesis.

Figure: Method diagram for the parametric coding framework.

Instead of encoding whole images or prediction residuals after translational motion estimation, our algorithm employs a perspective motion model to warp static textures and utilises texture synthesis to create dynamic textures. Texture regions are segmented using features derived from the complex wavelet transform and further classified according to their spatial and temporal characteristics. Moreover, a compatible artefact-based video metric (AVM) is proposed with which to evaluate the quality of the reconstructed video; this is also employed in-loop to prevent warping and synthesis artefacts. The proposed algorithm has been integrated into an H.264 video coding framework. The results show significant bitrate savings of up to 60% compared with H.264 at the same objective quality (based on AVM) and equivalent subjective scores.
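As an illustration of the texture-warping step, a static texture region can be predicted by applying a 3x3 perspective transform (homography) to a reference frame. A minimal OpenCV sketch with hypothetical point correspondences; the actual codec estimates the model per segmented region and validates the result in-loop using AVM:

```python
import cv2
import numpy as np

def warp_static_texture(reference_frame, homography, size):
    """Predict a static-texture region by perspective-warping the reference.

    homography : 3x3 matrix mapping reference coordinates into the current
                 frame (estimated per texture region at the encoder).
    size       : (width, height) of the output frame.
    """
    return cv2.warpPerspective(reference_frame, homography, size)

# Example: a homography from four point correspondences (hypothetical data).
src = np.float32([[10, 10], [200, 12], [205, 150], [8, 148]])
dst = np.float32([[12, 14], [198, 10], [210, 155], [6, 152]])
H = cv2.getPerspectiveTransform(src, dst)
```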


REFERENCES

  1. Perception-oriented Video Coding based on Image Analysis and Completion: a Review. P. Ndjiki-Nya, D. Doshkov, H. Kaprykowsky, F. Zhang, D. Bull, T. Wiegand, Signal Processing: Image Communication, July 2012.
  2. A Parametric Framework For Video Compression Using Region-based Texture Models. F. Zhang and D. Bull, IEEE J-STSP, November 2011.

Marie Skłodowska-Curie Actions: PROVISION

Creating a ‘Visually’ Better Tomorrow

Figure: PROVISION team photo.

PROVISION is a network of leading academic and industrial organisations in Europe, comprising international researchers working on the problems facing current video coding technologies. The ultimate goal is to make noteworthy technical advances and further improvements to existing state-of-the-art techniques for compressing video material.

The project aims not only to enhance broadcast and on-demand video material, but also to produce a new generation of scientists equipped with the research and soft skills needed by industry, academia and society at large. In line with the principles laid down by the Marie Skłodowska-Curie actions of the European Commission, PROVISION is a great example of an ensemble of researchers with varied geographical and academic backgrounds, all channelling their joint effort towards creating a technologically, or more specifically a ‘visually’, better tomorrow.

PROVISION website, PROVISION Facebook page

Context Based Compression

Richard Vigars

Supervised by: Dave Bull, Andrew Calway

State-of-the-art video coding standards (such as H.264 and H.265) are extremely reliable and facilitate straightforward rate-quality control. However, certain scenarios remain challenging for the block-based motion and residual coding paradigm. For example, where there is camera motion relative to highly textured surfaces, particularly when perspective effects are strong, the block-based model produces a large, highly textured residual signal.

To address this problem, we have developed a video codec framework which exploits extrinsic scene knowledge to target higher-order motion models to specific geometrical structures in the scene. We create a textural-geometric model of the scene prior to coding. During coding, we use camera tracking algorithms to track regions of salient scene geometry. Foreground regions are detected and coded by a host codec, such as H.264.

This approach allows us to replace a large volume of the host codec’s bitstream with our own compact motion parameters and side information. Compared to H.264 operating at equivalent quantisation parameters, our hybrid codec can achieve bitrate savings of up to 48%.
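Conceptually, the decoder rebuilds each frame by warping the tracked background geometry from a reference frame and compositing the host-codec output on top. A minimal sketch, assuming a per-frame background homography and a foreground map are available as side information (names are illustrative):

```python
import cv2
import numpy as np

def reconstruct_frame(ref_frame, H_background, fg_map, fg_decoded):
    """Composite a context-coded frame.

    H_background : 3x3 homography tracking the background (side information).
    fg_map       : boolean per-pixel foreground map.
    fg_decoded   : foreground pixels decoded by the host codec (e.g. H.264).
    """
    h, w = ref_frame.shape[:2]
    # The background is predicted "for free" from compact motion parameters.
    background = cv2.warpPerspective(ref_frame, H_background, (w, h))
    # Foreground regions come from the conventionally coded stream.
    return np.where(fg_map[..., None], fg_decoded, background)
```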

In brief:

  • Applying extrinsic knowledge of the scene in which a video is captured in order to exploit geometrical redundancy in the motion of objects with respect to the camera.
  • A perspective motion model is targeted at planar background objects in the scene. This allows large regions of frames in the video sequence to be interpolated from reference frames. These interpolated regions are perceptually acceptable, so no residual is required. This translates into large bitrate savings!
  • In order to target the perspective motion model at the appropriate regions, tracking techniques similar to those used in augmented reality applications are employed.

In terms of theory, this project has turned out to be quite an interesting mix of perceptual video coding and computer vision.


Key algorithms

In order to extract a context from the scene, a few key operations are performed:

Figures (left to right): plane modelling; plane tracking; motion estimation.

From left to right these are:

  1. A set of textural models of key scene planes is extracted using SURF features. These are augmented with polygons giving the approximate locations of other planes in the scene.
  2. These models are matched to the features in each frame of the video in order to approximately localise the planar structures.
  3. These approximate locations provide an initialisation for frame-to-frame motion estimation. The use of SURF feature point matching and RANSAC isolates the background motion from that of the foreground (see the sketch below).
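A minimal sketch of steps 2 and 3 using OpenCV (SURF requires the opencv-contrib build; the Hessian threshold and ratio-test constant are illustrative choices, not values from the project):

```python
import cv2
import numpy as np

# SURF lives in opencv-contrib (cv2.xfeatures2d).
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

def background_homography(model_img, frame):
    """Match plane-model features to a frame and fit a homography with RANSAC."""
    kp1, des1 = surf.detectAndCompute(model_img, None)
    kp2, des2 = surf.detectAndCompute(frame, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    # Lowe's ratio test discards ambiguous matches.
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC fits the dominant (background) plane; the inlier mask separates
    # background-consistent matches from foreground ones.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inlier_mask
```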


Foreground segmentation using anisotropic diffusion and morphological operations [1]

The process is depicted in the images below:

Figures (left to right): pixel differences between predicted and actual frames; anisotropic diffusion; thresholding; clean-up; final map.

From left to right:

  1. Absolute pixel differences between the predicted frame and the actual frame are computed.
  2. These differences are then filtered using anisotropic diffusion. This smooths out invisible errors in the texture while leaving the large errors caused by foreground differences intact.
  3. A thresholding stage creates a mask.
  4. The mask is grown into a map.
  5. Finally, the map is cleaned up using morphological operations (a sketch of the whole pipeline follows below).
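A minimal sketch of this pipeline using OpenCV, assuming 8-bit BGR frames (anisotropicDiffusion lives in opencv-contrib's ximgproc module; all parameter values are illustrative, not taken from [1]):

```python
import cv2

def segment_foreground(predicted, actual, thresh=25):
    """Foreground map from prediction error, in the spirit of Krutz et al. [1].

    predicted, actual : 8-bit BGR frames.
    """
    diff = cv2.absdiff(predicted, actual)
    # Anisotropic diffusion smooths small, invisible texture errors while
    # preserving the strong edges caused by genuine foreground differences.
    # (anisotropicDiffusion expects an 8-bit, 3-channel input.)
    smoothed = cv2.ximgproc.anisotropicDiffusion(diff, alpha=0.1, K=20, niters=10)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    # Grow the mask into a map, then clean it up with morphology:
    # closing fills small holes, opening removes isolated speckle.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    grown = cv2.dilate(mask, kernel)
    cleaned = cv2.morphologyEx(grown, cv2.MORPH_CLOSE, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, kernel)
    return cleaned
```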

References

  1. A. Krutz et al., Motion-based Object Segmentation using Sprites and Anisotropic Diffusion, WIAMIS 2007.


Published work

  • Context-Based Video Coding – R. Vigars, A. Calway, D. R. Bull; ICIP 2013 (accepted)

Visual saliency

By predicting multi-cue gaze for open signed video content, coding gains can be achieved without loss of perceived quality. We have developed a face orientation tracker based upon grid-based likelihood ratio trackers, using profile and frontal face detections. These cues are combined using a grid-based Bayesian state estimation algorithm to form a probability surface for each frame (a sketch follows below). This gaze predictor outperforms both a static gaze prediction and one based on face locations within the frame.
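The exact tracker formulation is not reproduced here; the following is a minimal sketch of a grid-based Bayesian update, assuming each cue (frontal and profile face detections) has been rendered as a per-cell likelihood surface (function and parameter names are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grid_bayes_update(prior, cue_likelihoods, motion_sigma=2.0):
    """One step of a grid-based Bayesian gaze estimator.

    prior           : 2D array over grid cells summing to 1 (last posterior).
    cue_likelihoods : list of 2D arrays, one per cue (e.g. frontal and
                      profile face detections rendered onto the grid).
    motion_sigma    : illustrative diffusion strength for the prediction step.
    """
    # Prediction: diffuse the prior to model gaze drift between frames.
    predicted = gaussian_filter(prior, motion_sigma)
    # Correction: fuse cue likelihoods multiplicatively (independence assumed).
    posterior = predicted
    for lik in cue_likelihoods:
        posterior = posterior * lik
    posterior += 1e-12  # guard against an all-zero surface
    return posterior / posterior.sum()
```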

Error resilience and transport

The Group’s relationship with the Communication Systems and Networks Group has produced a body of successful research into the reliable transport of video. The Group has proposed a number of error-resilient methods which reduce the propagation of errors and conceal, rather than correct, them.

Early work (EU FP4 project “WINHOME”) found that error resilience methods based on EREC (error-resilient entropy coding) could be combined with adaptive packetisation strategies and data partitioning to provide robust MPEG-2 transport for in-home TV distribution. WINHOME provided the first European demonstration of robust video transport over WLANs, highlighting the weakness of media-unaware systems and the potential of selective reuse of corrupted packets.

In the EPSRC-funded project SCALVID (GR/L43596/01), robust and scalable video coding schemes for heterogeneous communication systems were investigated. In particular, the Group investigated a new coding approach based on matching pursuits. SCALVID significantly reduced the complexity of matching pursuits through a hierarchical (primitive operator) correlator structure (patented by NDS) and through the optimisation of basis function dictionaries. This work was widely cited internationally. SCALVID was the first to show that matching pursuits can form the basis of an inherently robust coding system.

With BT and JISC funding, JAVIC (Joint Audio Visual Internet Coding) investigated packet-loss-robust internet video coding for multicast applications. Using H.263, reliable streaming with up to 10% packet loss was demonstrated by combining cross-packet FEC, prioritisation and judicious reference frame selection. Following this, the 3CResearch ROAM4G project delivered a novel three-loop Multiple Description Coding (MDC) scheme which, with minimal redundancy, provides highly robust video transmission over congestion-prone networks (Figure 2). Extended recently to exploit path diversity in MIMO video systems, this has shown for the first time that MDC with spatial multiplexing can deliver up to 8dB PSNR improvement over corresponding single description coding (SDC) systems. ROAM4G also produced a 3D wavelet embedded coder which competes well with MPEG4-SVC and additionally provides excellent congestion management. Trials are underway with Thales Research.

In the EU FP5 WCAM (Wireless Cameras and Seamless Audiovisual Networking) project, the Group, in collaboration with Thales Communications (France) and ProVision Communications (UK), produced a wireless H.264 platform incorporating a new spatio-temporal error concealment algorithm, which provides substantial gains (up to 9dB PSNR improvement over the H.264 JM reference) with up to 20% packet loss (Figure 3). This was singled out by the project reviewers and has been patented and successfully licensed. WCAM also provided an understanding of link adaptation (switching between a range of modulation and FEC schemes according to channel conditions). Realising that throughput-based switching metrics are inherently flawed for video, new quality-derived metrics were developed which substantially outperform existing methods.

Work on High Definition coding and transport has progressed further through the Group’s participation in the EU FP5 MEDIANET project, where pre- and post-processing algorithms have been developed. Finally, the 3CResearch project VISUALISE integrated much of the above work into a live-viewing infrastructure in which video compression and streaming technology are efficiently deployed over wireless broadband networks in difficult environments. This collaboration between BT Broadcast, Inmarsat, Node, ProVision, U4EA and ISC has developed a way for spectators at large-scale live events to gain near real-time access to the action as it unfolds, via portable terminals, for an enhanced experience.