Katie Black – Visual Information Laboratory

Electron Microscopy Image Segmentation

Posted on 12th July 2016 (updated 23rd February 2021) by Katie Black

David Nam, Judith Mantell, David Bull, Paul Verkade, Alin Achim

The following work presents a graphical user interface (GUI) for automatic segmentation of granule cores and membranes in transmission electron microscopy images of beta cells. The system is freely available for academic research, and two test images are included. The highlights of our approach are:

A fully automated algorithm for granule segmentation.
A novel shape regularizer to promote granule segmentation.
A dual region-based active contour for accurate core segmentation.
A novel convergence filter for granule membrane verification.
A precision of 91% and a recall of 87%, measured against manual segmentations.
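
For readers unfamiliar with the two figures in the last highlight, the sketch below shows how precision and recall are conventionally computed from detection counts. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper, and the function name is ours.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts."""
    detected = true_positives + false_positives   # everything the algorithm marked
    actual = true_positives + false_negatives     # everything in the manual segmentation
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): 87 granules found correctly,
# 9 spurious detections, 13 granules missed.
precision, recall = precision_recall(true_positives=87, false_positives=9, false_negatives=13)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision penalises spurious detections, while recall penalises missed granules, so the two are reported together.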

Further details can be found in:

D. Nam, J. Mantell, D. Bull, P. Verkade, and A. Achim, “A novel framework for segmentation of secretory granules in electron micrographs,” Med. Image Anal., vol. 18, no. 2, pp. 411–424, 2014.

Granule Segmenter Download (Matlab)

Place Recognition From Disparate Views

Posted on 2nd October 2013 (updated 23rd February 2021) by Katie Black

Visual place recognition methods which use image matching techniques have shown success in recent years; however, their reliance on local features restricts their use to images which are visually similar and which overlap in viewpoint. We suggest that a semantic approach to the problem would provide a more meaningful relationship between views of a place and so allow recognition when views are disparate and database coverage is sparse. As initial work towards this goal, we present a system which uses detected objects as the basic feature and demonstrates a promising ability to recognise places from arbitrary viewpoints. We build a 2D place model of object positions and extract features which characterise a pair of models. We then use distributions learned from training examples to compute the probability that the pair depict the same place, together with an estimate of the relative pose of the cameras. Results on a dataset of 40 urban locations show good recognition performance and pose estimation, even for highly disparate views.

[Figure: system diagram]

Notable Results

To assess the performance of our system, we collected a dataset of 40 locations, each with between 2 and 4 images from widely different viewpoints. Since we are simply learning distributions over comparisons of places, not about the places themselves, we decided to train the system on a subset of the test dataset to maximise use of the data. To verify that the results were not biased, we repeatedly trained the system on a random 50% subset of the dataset and ran the test again. We found that the learned probability distributions were very similar in each iteration, and that the recognition performance did not change by more than about 2%.

A place recognition experiment was then performed. Each image from the dataset was compared against every other image to compute the posterior probability that the images depict the same place. The table below states the performance of our system under several conditions. The “grouped” score is simply the percentage of test images for which an image from the same place was chosen as the most likely match, simulating a place recognition scenario in which we have made a small number of previous observations of each place. It is interesting, however, to consider a harder case in which, for each test image, there is only a single matching image in the database. The “pairwise” score simulates this situation by removing all but one of the matching images for each test image.
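
As a rough illustration of the two scores, the sketch below computes them from a matrix of same-place posteriors. It is not the code used for the experiment: the posterior values are made up, the function names are ours, and for the pairwise score we assume that every possible choice of the single retained matching image counts as one trial, which the text above does not specify.

```python
def grouped_score(posterior, place):
    """Percentage of images whose most likely match is from the same place."""
    n = len(place)
    hits = 0
    for i in range(n):
        best = max((j for j in range(n) if j != i), key=lambda j: posterior[i][j])
        hits += place[best] == place[i]
    return 100.0 * hits / n


def pairwise_score(posterior, place):
    """As above, but with only one matching image left in the database per trial."""
    n = len(place)
    hits = trials = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        matches = [j for j in others if place[j] == place[i]]
        distractors = [j for j in others if place[j] != place[i]]
        best_distractor = max((posterior[i][j] for j in distractors), default=float("-inf"))
        for m in matches:  # assumption: each retained match is a separate trial
            trials += 1
            hits += posterior[i][m] > best_distractor
    return 100.0 * hits / trials if trials else 0.0


# Made-up posteriors for five images of two places (diagonal entries are ignored).
place = [0, 0, 0, 1, 1]
posterior = [
    [0.0, 0.9, 0.2, 0.4, 0.1],
    [0.9, 0.0, 0.6, 0.3, 0.2],
    [0.2, 0.6, 0.0, 0.5, 0.1],
    [0.4, 0.3, 0.5, 0.0, 0.7],
    [0.1, 0.2, 0.1, 0.7, 0.0],
]
grouped = grouped_score(posterior, place)
pairwise = pairwise_score(posterior, place)
print(f"grouped={grouped:.1f}%, pairwise={pairwise:.1f}%")
```

On this toy data the grouped score is higher than the pairwise score, because a weak match can be rescued by a stronger one from the same place only in the grouped setting.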

We also observed that some of the system's discriminative ability is provided by the different object classes: a place with objects of class “sign” and “bollard” cannot possibly match a place containing only “traffic light” objects. Whilst this is a legitimate place recognition scenario, we wanted to observe the discriminative ability of the features alone. Thus, we also tested the system on a “restricted class” subset of the dataset with 30 locations, all of which contained the same two object classes, meaning that almost every image was capable of valid object correspondences with every other image. Clearly this is a harder case; however, the table shows that performance was still reasonable.

Condition	Grouped	Pairwise
Restricted class dataset	67.9%	54.5%
Full dataset	73.1%	61.8%
GIST (Oliva and Torralba, 2001)	19.2%	21.4%

Plane Detection From Single Images

Posted on 1st September 2013 (updated 23rd February 2021) by Katie Black

Our work involves the detection of planar structures from single images. This is inspired by human vision: humans have an impressive ability to understand the content of both the real world and 2D images without necessarily needing depth or parallax cues. We therefore take a machine learning route and learn, from a large set of images, the relationship between image appearance and 3D structure.

There are two main parts to our method. The first is plane recognition, which, for a given pre-segmented image region, classifies the region as planar or not and, for planar regions, estimates the 3D orientation with respect to the camera. This is done by representing the image region with standard image descriptors, within a bag of words framework enhanced with spatial information. These are used as input to a relevance vector machine classifier, to identify planes, and a regression algorithm, to estimate orientation.

The second is plane detection. Since we do not generally know the location of potentially planar regions in the image, we apply the plane recognition step repeatedly to overlapping segments of the image. These overlapping regions allow us to estimate, at each of a set of salient points, whether the point is likely to belong to a plane and, if so, its likely orientation (by considering all the regions in which it lies). This point-wise local plane estimate is then segmented to give a discrete set of non-planar and oriented planar regions.
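
The point-wise step can be sketched as follows. This is a simplified illustration, not our implementation: it assumes axis-aligned boxes for the overlapping regions, takes the fraction of covering regions classified as planar as the point's plane likelihood, and averages the unit normals of those planar regions. The data layout and the function name are hypothetical.

```python
import math


def point_estimates(points, regions):
    """For each point, combine the estimates of all regions that contain it.

    Each region is a dict with 'box' (x0, y0, x1, y1), 'is_plane' (bool) and
    'normal' (3-tuple, or None for non-planar regions).
    Returns a list of (plane_fraction, normal) pairs; (None, None) if uncovered.
    """
    estimates = []
    for x, y in points:
        covering = [r for r in regions
                    if r["box"][0] <= x <= r["box"][2] and r["box"][1] <= y <= r["box"][3]]
        if not covering:
            estimates.append((None, None))
            continue
        planar = [r for r in covering if r["is_plane"]]
        plane_fraction = len(planar) / len(covering)
        normal = None
        if planar:
            # Assumption: average the normals of the planar regions, then renormalise.
            summed = [sum(r["normal"][k] for r in planar) for k in range(3)]
            length = math.sqrt(sum(c * c for c in summed))
            if length > 0:
                normal = tuple(c / length for c in summed)
        estimates.append((plane_fraction, normal))
    return estimates


# Hypothetical regions: two overlapping planar regions and one non-planar region.
regions = [
    {"box": (0, 0, 10, 10), "is_plane": True, "normal": (0.0, 0.0, 1.0)},
    {"box": (5, 0, 15, 10), "is_plane": True, "normal": (1.0, 0.0, 0.0)},
    {"box": (12, 0, 20, 10), "is_plane": False, "normal": None},
]
points = [(2, 5), (7, 5), (13, 5), (30, 30)]
results = point_estimates(points, regions)
for point, (fraction, normal) in zip(points, results):
    print(point, fraction, normal)
```

A point covered by both planar regions receives an orientation between the two, while a point covered by one planar and one non-planar region receives a plane likelihood of 0.5; the later segmentation step then has to resolve such ambiguous points.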

We have also shown (work in collaboration with José Martínez-Carranza) how this single-image plane detection can be useful for visual odometry: by detecting likely planar structures from one frame while traversing an outdoor urban environment, planar features can be quickly initialised into the map with a good prior estimate of their orientation. This allows rough 3D maps of the environment, incorporating higher-level structures, to be rapidly built.

Experimental Results

Plane Recognition

We found that the plane recognition algorithm was able to work well in a variety of outdoor scenes. As well as comprehensive cross-validation, we tested the algorithm on a set of images taken from a completely independent area of the city from the location of the test images (where the region of interest has been marked up by hand). The average classification (plane/non-plane) accuracy was 91.6%, and the average orientation (normal vector estimation) error was 14.5 degrees. Some example results from this data set are shown here:

[Figure: plane recognition examples]

The first three show successful plane detection with estimated orientations (green) compared to ground truth (blue); the last two show identification of non-planar regions.
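
The orientation error quoted above is given in degrees. A minimal sketch of one common way to measure such an error, assuming it is the angle between the estimated and ground-truth normal vectors (the function name and example vectors are ours), is:

```python
import math


def orientation_error_degrees(estimated, truth):
    """Angle in degrees between two 3D normal vectors (need not be unit length)."""
    dot = sum(a * b for a, b in zip(estimated, truth))
    norms = math.sqrt(sum(a * a for a in estimated)) * math.sqrt(sum(b * b for b in truth))
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


# Hypothetical example: ground truth tilted 15 degrees away from the estimate.
estimated = (0.0, 0.0, 1.0)
truth = (0.0, math.sin(math.radians(15)), math.cos(math.radians(15)))
error = orientation_error_degrees(estimated, truth)
print(f"{error:.1f} degrees")
```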

Plane Detection

The full plane detection algorithm, involving finding planes in previously unseen images, and estimating their orientation, was also tested on an independent data set of images. A few example results are shown here:

[Figure: plane detection examples]

References

Visual mapping using learned structural priors (ICRA 2013)
Detecting planes and estimating their orientation from a single image (BMVC 2012)
Estimating planar structure in single images by learning from examples (ICPRAM 2012)

Parametric Video Compression

Posted on 2nd September 2014 (updated 23rd February 2021) by Katie Black

This project presents a novel means of video compression based on texture warping and synthesis. Instead of encoding whole images or prediction residuals after translational motion estimation, our algorithm employs a perspective motion model to warp static textures and utilises texture synthesis to create dynamic textures. Texture regions are segmented using features derived from the complex wavelet transform and further classified according to their spatial and temporal characteristics. Moreover, a compatible artefact-based video metric (AVM) is proposed with which to evaluate the quality of the reconstructed video. This is also employed in-loop to prevent warping and synthesis artefacts. The proposed algorithm has been integrated into an H.264 video coding framework. The results show significant bitrate savings of up to 60% compared with H.264 at the same objective quality (based on AVM) and subjective scores.
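
To make the difference between a translational and a perspective motion model concrete, the sketch below maps a pixel position through a 3x3 projective matrix. We assume a planar homography as the perspective model; the matrices and the function name are illustrative and are not taken from the codec.

```python
def warp_point(h, x, y):
    """Map pixel (x, y) through a 3x3 homography h using homogeneous coordinates."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / w, yh / w  # perspective division


# A pure translation is the special case with an identity upper-left block
# and a bottom row of (0, 0, 1): every pixel moves by the same offset.
translation = [[1.0, 0.0, 2.0],
               [0.0, 1.0, 3.0],
               [0.0, 0.0, 1.0]]

# A non-zero entry in the bottom row makes the displacement depend on position,
# which a translational model cannot represent.
perspective = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.01, 0.0, 1.0]]

moved = warp_point(translation, 100.0, 50.0)
warped = warp_point(perspective, 100.0, 50.0)
print(moved, warped)
```

Because a full perspective warp is described by only a handful of parameters per region, a static texture can be reconstructed at the decoder from those parameters rather than from a coded residual.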

It is currently a very exciting and challenging time for video compression. The predicted growth in demand for bandwidth, especially for mobile services, will be driven by video applications and is probably greater now than it has ever been. David Bull (VI-Lab), Dimitris Agrafiotis (VI-Lab) and Roland Baddeley (Experimental Psychology) have won a new £600k EPSRC research grant to investigate perceptual redundancy in, and new representations for, digital video content. With EPSRC funding and collaboration with the BBC and Fraunhofer HHI, Berlin, the team will investigate video compression schemes where an analysis/synthesis framework replaces the conventional energy minimisation approach. A preliminary coding framework of this type has been created by Zhang and Bull, in which scene content is modelled using computer graphics techniques to replace target textures at the decoder. This approach is already producing world-leading results and has the potential to create a new content-driven framework for video compression, where region-based parameters are combined with perceptual quality metrics to inform and drive the coding processes.

Published Work

P. Ndjiki-Nya, D. Doshkova, H. Kaprykowsky, F. Zhang, D. Bull, T. Wiegand, Perception-oriented video coding based on image analysis and completion: A review, Signal Processing: Image Communication, Volume 27, Issue 6, July 2012, Pages 579–594.
Zhang, F. and Bull, D.R., A Parametric Framework for Video Compression Using Region-based Texture Models, IEEE Journal of Selected Topics in Signal Processing (Special Issue), Vol. 5, No. 7, November 2011, Pages 1378–1392.
Ierodiaconou, S.; Byrne, J.; Bull, D.R.; Redmill, D.; Hill, P., Unsupervised image compression using graphcut texture synthesis, 16th IEEE International Conference on Image Processing (ICIP), 2009, Pages 2289–2292.
Byrne, J.; Ierodiaconou, S.; Bull, D.; Redmill, D.; Hill, P., Unsupervised image compression-by-synthesis within a JPEG framework, 15th IEEE International Conference on Image Processing (ICIP), 2008, Pages 2892–2895.

Bio-Inspired 3D Mapping

Posted on 1st September 2013 (updated 23rd February 2021) by Katie Black

Geoffrey Daniels

Supervised by: David Bull, Walterio Mayol-Cuevas, J Burn

Using state-of-the-art computer vision techniques and insights into the biological processes animals use to traverse any terrain, a system has been created that enables a robotic platform to gather the information required to move safely through an unknown environment. A goal of this project is to produce a system that can run in real time on an arbitrary locomotion platform and provide local route planning and hazard detection. With the real-time aim in mind, the core parts of the current algorithm have been developed using NVidia’s CUDA language for general-purpose computing on GPUs, since the code is embarrassingly parallel and GPUs can provide a huge speed increase for parallel processes. Currently, without significant optimisation, the system is able to compute the 3D surface ahead of the camera in approximately 100ms.

This system will be a module of a larger grant to develop a bio-inspired concept system for overall terrestrial perception and safe locomotion.

Interesting Results

Some example footage of the system generating a virtual 3D world from a single camera in real time:
https://www.youtube.com/watch?v=h36hVOerMFU&list=PLJmQZRbc9yWWrg6A0R_NFYHl6WP4FoNe9#t=15

Robust Visual SLAM for Fast Moving Platforms

Posted on 1st March 2013 (updated 23rd February 2021) by Katie Black

Dr. Jose Martinez-Carranza

In recent years, considerable progress has been made in what is known as visual Simultaneous Localisation and Mapping (SLAM).

Visual SLAM is a technology that provides fast, accurate 6D pose estimation of a moving camera and a 3D representation of the scene observed with the camera. Applications for this technology include navigation in GPS-denied environments, virtual augmentation of objects in video footage and video-game interaction.

Despite these achievements, there are still challenges to be faced. A practical yet important one is running visual SLAM systems on low-budget platforms, where computing power is reduced and memory is limited.

Given the above, my main research focuses on the design of strategies that allow visual SLAM systems to keep working on low-budget platforms without sacrificing real-time response. This also includes maintaining robustness against loss of tracking, vibration, image blur and strong changes in lighting conditions.

Applications of my research are oriented to fast moving robotic platforms such as walking robots, mobile vehicles and Unmanned Aerial Vehicles (UAVs).

Full details about my ongoing research can be found here.