ViSTRA: Video Compression based on Resolution Adaptation

Fan Zhang, Mariana Afonso and David Bull

ABSTRACT

We present a new video compression framework (ViSTRA2) which exploits adaptation of spatial resolution and effective bit depth, down-sampling these parameters at the encoder based on perceptual criteria, and up-sampling at the decoder using a deep convolutional neural network. ViSTRA2 has been integrated with the reference software of both HEVC (HM 16.20) and VVC (VTM 4.01), and evaluated under the Joint Video Exploration Team Common Test Conditions using the Random Access configuration. Our results show consistent and significant compression gains against both HM and VTM based on Bjøntegaard Delta measurements, with average BD-rate savings of 12.6% (PSNR) and 19.5% (VMAF) over HM and 5.5% (PSNR) and 8.6% (VMAF) over VTM.
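
For readers unfamiliar with Bjøntegaard Delta measurements, the following is a minimal sketch of how a BD-rate figure is commonly computed from four rate points per codec. It is not the evaluation script used for the results above. The function names and the sample numbers are hypothetical, and the sketch assumes exactly four (rate, quality) points per codec, so that the cubic fit reduces to exact interpolation.

```python
import math

def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial that interpolates the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mean_log_rate(quality, rate, lo, hi):
    """Mean of the cubic log10(rate)-versus-quality curve over [lo, hi]."""
    log_rate = [math.log10(r) for r in rate]

    def curve(q):
        return lagrange_eval(quality, log_rate, q)

    # Simpson's rule is exact for cubic polynomials, so this integral
    # carries no quadrature error for a four-point fit.
    integral = (hi - lo) / 6.0 * (curve(lo) + 4.0 * curve((lo + hi) / 2.0) + curve(hi))
    return integral / (hi - lo)

def bd_rate(quality_anchor, rate_anchor, quality_test, rate_test):
    """Average bitrate change (%) of the test codec at equal quality.

    Negative values mean the test codec needs fewer bits than the anchor.
    """
    if len(quality_anchor) != 4 or len(quality_test) != 4:
        raise ValueError("this sketch expects exactly four rate points per codec")
    # Compare the two curves only where their quality ranges overlap.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    diff = (mean_log_rate(quality_test, rate_test, lo, hi)
            - mean_log_rate(quality_anchor, rate_anchor, lo, hi))
    return (10.0 ** diff - 1.0) * 100.0

# Made-up numbers for illustration only (PSNR in dB, rate in kbps).
psnr = [32.0, 35.0, 38.0, 41.0]
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
test_rate = [0.9 * r for r in anchor_rate]
print(round(bd_rate(psnr, anchor_rate, psnr, test_rate), 3))
```

In this toy case the test codec uses exactly 10% fewer bits at every quality point, so the printed value should be very close to -10. The same routine can take VMAF in place of PSNR as the quality axis.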

PROPOSED ALGORITHM

RESULTS

BD-rate results of ViSTRA2 when HM 16.20 was employed as host codec.

BD-rate results of ViSTRA2 when VTM 4.01 was employed as host codec.

REFERENCE

[1] F. Zhang, M. Afonso and D. R. Bull, ViSTRA2: Video Coding using Spatial Resolution and Effective Bit Depth Adaptation. arXiv preprint arXiv:1911.02833.

[2] M. Afonso, F. Zhang and D. R. Bull, Video Compression based on Spatio-Temporal Resolution Adaptation. IEEE T-CSVT (Letter), 2019.

[3] M. Afonso, F. Zhang, A. Katsenou, D. Agrafiotis, D. Bull, Low Complexity Video Coding Based on Spatial Resolution Adaptation, ICIP, 2017.

Rate-distortion Optimization Using Adaptive Lagrange Multipliers

Fan Zhang and David Bull

ABSTRACT

This page introduces our work on rate-distortion optimisation using adaptive Lagrange multipliers. In current standardized hybrid video encoders, the Lagrange multiplier determination model is a key component in rate-distortion optimization. The model in use originated some 20 years ago, based on an entropy-constrained high-rate approximation and experimental results obtained using an H.263 reference encoder on limited test material. In this work, we conducted a comprehensive analysis of the results of a Lagrange multiplier selection experiment conducted on various video content using H.264/AVC and HEVC reference encoders. These results show that the original Lagrange multiplier selection methods, employed in both video encoders, are able to achieve optimum rate-distortion performance for I and P frames, but fail to perform well for B frames. A relationship is identified between the optimum Lagrange multipliers for B frames and distortion information obtained from the experimental results, leading to a novel Lagrange multiplier determination approach. The proposed method adaptively predicts the optimum Lagrange multiplier for B frames based on the distortion statistics of recent reconstructed frames. After integration into both H.264/AVC and HEVC reference encoders, this approach was evaluated on 36 test sequences with various resolutions and differing content types. The results show consistent bitrate savings for various hierarchical B frame configurations with minimal additional complexity. BD savings average approximately 3% when constant QP values are used for all frames, and 0.5% when non-zero QP offset values are employed for different B frame hierarchical levels.
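
To make the role of the Lagrange multiplier concrete, the sketch below shows a Lagrangian mode decision, where the encoder picks the candidate that minimises J = D + λR. The multiplier model shown, λ = 0.85 · 2^((QP − 12) / 3), is the fixed form commonly associated with the H.264/AVC reference encoder. The sketch does not reproduce the adaptive predictor proposed in this work; the `scale` argument only marks where an adaptive method would substitute a different value. The candidate modes and their distortion and rate figures are made up for illustration.

```python
def lambda_mode(qp, scale=0.85):
    """Fixed multiplier model: scale * 2^((QP - 12) / 3).

    `scale` is a hypothetical hook; an adaptive scheme would vary it
    (or the multiplier itself) per frame instead of keeping it constant.
    """
    return scale * 2.0 ** ((qp - 12) / 3.0)

def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def best_mode(candidates, lam):
    """Return the (name, distortion, rate_bits) tuple with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))

# Made-up candidates: (mode name, distortion as SSD, rate in bits).
candidates = [
    ("skip", 900.0, 2),
    ("inter", 400.0, 40),
    ("intra", 250.0, 120),
]

for qp in (12, 18, 27):
    lam = lambda_mode(qp)
    print(qp, round(lam, 2), best_mode(candidates, lam)[0])
```

As the multiplier grows with QP, the decision moves from the low-distortion, expensive mode towards the cheap one. This is why a multiplier that is poorly matched to a frame type shifts every such decision and costs bitrate.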

REFERENCE

[1] Fan Zhang and David R. Bull, “Rate-distortion Optimization Using Adaptive Lagrange Multipliers”, IEEE Trans. on CSVT, accepted in 2018.

[2] F. Zhang and D. Bull, “An Adaptive Lagrange Multiplier Determination Method for Rate-distortion Optimisation in Hybrid Video Codecs”. IEEE ICIP, 2015.

FRQM: A Frame Rate Dependent Video Quality Metric

Fan Zhang, Alex Mackin and David Bull

ABSTRACT

This page introduces an objective quality metric (FRQM), which characterises the relationship between variations in frame rate and perceptual video quality. The proposed method estimates the relative quality of a low frame rate video with respect to its higher frame rate counterpart, through temporal wavelet decomposition, subband combination and spatiotemporal pooling. FRQM was tested alongside six commonly used quality metrics (two of which explicitly relate frame rate variation to perceptual quality) on the publicly available BVI-HFR video database, which spans a diverse range of scenes and frame rates up to 120fps. Results show that FRQM offers significant improvement over all other tested quality assessment methods with relatively low complexity.
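
As a rough illustration of why a temporal wavelet decomposition is sensitive to frame rate, the sketch below applies a one-level Haar transform along the time axis. It is not the FRQM implementation and omits the subband combination and pooling stages. It assumes the low frame rate video is brought to the reference frame rate by frame repetition, a choice made here for illustration. The function names and the tiny two-pixel "frames" are hypothetical.

```python
import math

def temporal_haar(frames):
    """One-level Haar transform along time.

    `frames` is a list with an even number of frames, each frame being
    a flat list of pixel values of equal length. Returns (low, high).
    """
    s = 1.0 / math.sqrt(2.0)
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) * s for x, y in zip(a, b)])
        high.append([(x - y) * s for x, y in zip(a, b)])
    return low, high

def repeat_frames(frames, factor=2):
    """Up-convert the frame rate by repeating each frame `factor` times."""
    return [list(f) for f in frames for _ in range(factor)]

def energy(band):
    """Sum of squared coefficients in a subband."""
    return sum(v * v for frame in band for v in frame)

# Toy reference: brightness rises steadily over four frames.
reference = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# Half-rate version: keep every other frame, then repeat to match length.
half_rate = repeat_frames(reference[0::2], factor=2)

_, high_ref = temporal_haar(reference)
_, high_half = temporal_haar(half_rate)
print(energy(high_ref), energy(high_half))
```

In this toy case the repeated frames cancel in the high-frequency temporal subband, while the full-rate reference retains energy there. A gap of this kind between subbands is the sort of signal a frame rate dependent metric can build on.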

PROPOSED ALGORITHM

SOURCE CODE DOWNLOAD

[DOWNLOAD] Matlab code

REFERENCE

[1] Fan Zhang, Alex Mackin, and David R. Bull, “A Frame Rate Dependent Video Quality Metric based on Temporal Wavelet Decomposition and Spatiotemporal Pooling”, IEEE ICIP, 2017.

BVI-HD: A Perceptual Video Quality Database for HEVC and Texture Synthesis Compressed Content

Fan Zhang, Felix Mercer Moss, Roland Baddeley and David Bull

ABSTRACT

This page introduces a new high definition video quality database, referred to as BVI-HD, which contains 32 reference and 384 distorted video sequences plus subjective scores. The reference material in this database was carefully selected to optimise the coverage range and distribution uniformity of five low level video features, while the included 12 distortions, using both original High Efficiency Video Coding (HEVC) and HEVC with synthesis mode (HEVC-SYNTH), represent state-of-the-art approaches to compression. The range of quantisation parameters included in the database for HEVC compression was determined by a subjective study, the results of which indicate that a wider range of QP values should be used than the current recommendation. The subjective opinion scores for all 384 distorted videos were collected from a total of 86 subjects, using a double stimulus test methodology. Based on these results, we compare the subjective quality between HEVC and synthesised content, and evaluate the performance of nine state-of-the-art, full-reference objective quality metrics. This database has now been made available online, representing a valuable resource to those concerned with compression performance evaluation and objective video quality assessment.
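
Evaluating an objective metric against subjective scores usually involves a rank correlation. The sketch below computes the Spearman rank-order correlation coefficient (SROCC) between metric outputs and opinion scores. It is a generic illustration, not the evaluation procedure used for BVI-HD, and the sample scores are made up rather than taken from the database.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def srocc(metric_scores, subjective_scores):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(subjective_scores))

# Made-up example: metric outputs and opinion scores for five videos.
metric = [31.2, 34.8, 36.1, 38.5, 40.3]
opinion = [22.0, 41.0, 39.0, 60.0, 75.0]
print(round(srocc(metric, opinion), 3))
```

Because SROCC depends only on rank order, it measures how monotonically a metric tracks subjective quality without assuming a linear relationship.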

DATABASE DOWNLOAD

[DOWNLOAD] instructions and related file.

[DOWNLOAD] all videos from CDVL (you may need to register a personal account first).

[DOWNLOAD] all subjective data.

Please read the README file before using the data.

If you mention or use this content in a research publication, please give credit to both CDVL and the University of Bristol by referencing the following papers:

[1] Fan Zhang, Felix Mercer Moss, Roland Baddeley, and David R. Bull, “BVI-HD: A Video Quality Database for HEVC Compressed and Texture Synthesised Content”, IEEE Trans. on Multimedia, 2018.

[2] Margaret H. Pinson, “The Consumer Digital Video Library [Best of the Web]”, IEEE Signal Processing Magazine, vol. 30, no. 4, pp. 172, 174, July 2013, doi: 10.1109/MSP.2013.2258265.

Computational cameras

A novel close-to-sensor computational camera has been designed and developed at the ViLab. ROIs can be captured and processed at 1000fps; the concurrent processing enables low latency sensor control and flexible image processing. With 9DoF motion sensing, the low size, weight and power form factor makes it ideally suited for robotics and UAV applications. The modular design allows multiple configurations and output options, easing development of embedded applications. General purpose output can directly interface with external devices such as servos and motors, while Ethernet offers a conventional image output capability. A binocular system can be configured with self-driven pan/tilt positioning, as an autonomous verging system or as a standard stereo pair. More information can be found in the xcamflyer.

Terrain analysis for biped locomotion

Numerous scenarios exist where it is necessary or advantageous to classify surface material at a distance from a moving forward-facing camera. Examples include the use of image based sensors for assessing and predicting terrain type in association with the control or navigation of autonomous vehicles. In many real scenarios, the upcoming terrain might not just be flat but may also be oblique and vehicles may need to change speed and gear to ensure safe and clean motion.

Blur-robust texture features

Videos captured with moving cameras, particularly those attached to biped robots, often exhibit blur due to incorrect focus or slow shutter speed. Blurring effects generally alter the spatial and frequency characteristics of the content and this may reduce the performance of a classifier. Robust texture features are therefore developed to deal with this problem. [Matlab Code]

Terrain classification from body-mounted cameras during human locomotion

A novel algorithm for terrain type classification based on monocular video captured from the viewpoint of human locomotion is introduced. A texture-based algorithm is developed to classify the path ahead into multiple groups that can be used to support terrain classification. Gait is taken into account in two ways. Firstly, for key frame selection, when regions with homogeneous texture characteristics are updated, the frequency variations of the textured surface are analysed and used to adaptively define filter coefficients. Secondly, it is incorporated in the parameter estimation process where probabilities of path consistency are employed to improve terrain-type estimation [Matlab Code]. Figures below show the proposed process of terrain classification for tracked regions and a result. [PDF]

Label 1 (green), Label 2 (red) and Label 3 (blue) correspond to the areas classified as hard surfaces, soft surfaces and unwalkable areas, respectively. The size of each circle indicates the classification probability: a bigger circle implies higher confidence.

Planar orientation estimation by texture

The gradient of a road or terrain influences the appropriate speed and power of a vehicle traversing it. Therefore, gradient prediction is necessary if autonomous vehicles are to optimise their locomotion. A novel texture-based method for estimating the orientation of planar surfaces, under the basic assumption of homogeneity, has been developed for scenarios in which only a single image source exists. This includes cases where a region of interest is too far away for a depth estimation technique to be employed.

References

  • Terrain classification from body-mounted cameras during human locomotion. N. Anantrasirichai, J. Burn and D. Bull. IEEE Transactions on Cybernetics. [PDF] [Matlab Code]
  • Projective image restoration using sparsity regularization. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2013. [PDF] [Matlab Code]
  • Robust texture features for blurred images using undecimated dual-tree complex wavelets. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2014. [PDF] [Matlab Code]
  • Orientation estimation for planar textured surfaces based on complex wavelets. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2014. [PDF]
  • Robust texture features based on undecimated dual-tree complex wavelets and local magnitude binary patterns. N. Anantrasirichai, J. Burn and D. Bull. ICIP 2015. [PDF]