VIL Seminars

VI Lab seminars take place regularly during term time, are free to attend and feature special invited guests from across the globe.

For details about upcoming seminars, which take place on Zoom until further notice, please contact Jonathan Munro.

  • VILSS: How to tag billions of photos: The evolution of image auto-tagging from a technology to a global service - 09.10.2017

    Dr Stavri Nikolov, Co-founder and Research Director, Imagga Technologies

    Imagga is one of the world's pioneers in large-scale image tagging. Our cloud and on-premise software solutions have analysed and tagged billions of photos for clients around the world, ranging from telcos and cloud service providers to digital media companies, stock photo agencies, media sharing platforms, and real estate and advertising agencies. In this talk we shall present an overview of how image auto-tagging has evolved over the last decade from a technology into a global service, and share our views on how it may develop in the future. We shall discuss the challenges, the client needs and the solutions that have evolved, and showcase some interesting applications of image tagging and categorisation.


    Dr Stavri Nikolov is Co-founder and Research Director of Imagga Technologies Ltd, a company that develops and offers technologies, services and online tools for large-scale image analysis, recognition and tagging in the cloud and on premise. Dr Nikolov is also Founding Director of Digital Spaces Living Lab (DSLL) in Sofia, Bulgaria. DSLL is one of the leading Living Labs in Europe and a member of the European Network of Living Labs (ENoLL); it develops and tests new technologies, services and apps for digital media, wearables, lifelogging, museums and smart cities. In the past, Dr Nikolov was a Senior Scientist (Digital Identity and Information Search) at the Institute for Prospective Technological Studies of the European Commission in Seville, Spain (2009-2011), and a Senior Research Fellow in Image Processing at the University of Bristol, UK (1998-2007). His research interests over the years have spanned image analysis, image recognition, image fusion, image search, mobile search, new methods for data visualisation and navigation, gaze tracking, HCI, VR, the construction of attentive and interactive information displays, video surveillance, digital identity and biometrics, location-based services, and new technologies for cultural heritage in museums and galleries. In the last 20 years he has coordinated and participated in many large international and national research projects in Austria, Portugal, the UK, Spain and Bulgaria. He has published more than 80 refereed or invited papers, including eight invited book chapters, as well as numerous technical reports in these areas, and has given many invited lectures around the world.

    He was the creator and coordinator of The Online Resource for Research in Image Fusion and The Online Archive of Scanpath Data. Dr Nikolov has been a member of the British Machine Vision Association, the Applied Vision Association, the International Society of Information Fusion, ACM SIGGRAPH and IEEE, and he served on the Editorial Board of the Information Fusion journal, published by Elsevier, for nearly 10 years. Over the years, Dr Nikolov has been a technology mentor to various organisations, including Eleven, a €12M acceleration fund; LAUNCHub, a €9M seed and acceleration fund; the European Business Network; and the European Satellite Navigation Competition. He was the founding director of Smart Fab Lab, the first fab lab (digital fabrication lab) in Bulgaria. Dr Nikolov was Working Group 5 (Industry and End Users) Leader and a Management Committee member of the European Network on Integrating Vision and Language (iV&L Net) COST Action in the first two years of the network.

  • VILSS: Human Action Recognition and Detection from Noisy 3D Skeleton Data - 20.06.2016

    Mohamed Hussein, Egypt-Japan University of Science and Technology

    Human action recognition and human action detection are two closely related problems. In human action recognition, the purpose is to determine the class of an action performed by a human subject from spatio-temporal measurements of the subject, which are cropped in the time dimension to include only the performed action. In human action detection, by contrast, the input is not cropped in time and may include multiple action instances, possibly from different classes, and the purpose is to determine the action class and the time period of each action instance in the input sequence. Recent years have witnessed a surge in research on both problems when the measurements are noisy 3D skeleton data obtained from cheap consumer-level depth sensors, such as the Microsoft Kinect. In this talk, I will present our efforts in this domain. I will first describe our earlier work on designing fixed-length descriptors for human action recognition from 3D skeleton data. I will then introduce a direct deployment of these techniques for human action detection via multi-scale sliding-window search, which runs in real time but can only process sequences offline. Finally, I will explain our most recent results on real-time online human action detection using a simple linear-time greedy search strategy that we call 'Efficient Linear Search', which overcomes the limitations of a more sophisticated dynamic programming strategy for this problem.
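    The multi-scale sliding-window search mentioned above can be sketched in a few lines. This is an illustrative outline only; the function name, the per-frame descriptor format and the scoring callback are assumptions, not the authors' implementation:

```python
import numpy as np

def detect_best_window(frames, classify, window_sizes):
    """Multi-scale sliding-window action detection (sketch).

    `frames` is a (T, D) array of per-frame skeleton descriptors and
    `classify` maps a window of frames to a confidence score.  Every
    window of every size is scored and the best one is returned as
    (start, end, score).
    """
    T = len(frames)
    candidates = []
    for w in window_sizes:                  # multi-scale: several window lengths
        for start in range(T - w + 1):      # slide the window over the sequence
            score = classify(frames[start:start + w])
            candidates.append((start, start + w, score))
    return max(candidates, key=lambda c: c[2])
```

    A real detector would keep all windows above a confidence threshold and apply non-maximum suppression, rather than return a single best window.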


  • VILSS: Upper body pose estimation for sign language and gesture recognition - 17.06.2016

    James Charles, University of Leeds

    In this talk I present methods for estimating the upper body pose of people performing gestures and sign language in long video sequences. Our methods are based on random forest classifiers and regressors, which have proved successful for inferring pose from depth data (Kinect). Here, I will show how we develop methods to: (1) achieve real-time 2D upper body pose estimation without depth data; (2) produce structured pose output from a mixture of random forest experts; (3) use more image context while keeping the learning problem tractable; and (4) incorporate temporal context using dense optical flow.
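    As a rough illustration of the regression-forest idea, per-pixel votes for a joint position can be aggregated into a single estimate by a confidence-weighted mean. This is a simplification of Hough-style vote aggregation; the names and the weighting scheme are assumptions, not the speaker's method:

```python
import numpy as np

def estimate_joint(pixel_coords, offset_votes, weights):
    """Aggregate per-pixel regression votes into one 2D joint estimate.

    Each pixel at `pixel_coords[i]` casts a relative-offset vote
    `offset_votes[i]` towards the joint; the estimate is the
    confidence-weighted mean of the absolute voted positions.
    """
    absolute = np.asarray(pixel_coords, float) + np.asarray(offset_votes, float)
    w = np.asarray(weights, float)
    return (absolute * w[:, None]).sum(axis=0) / w.sum()
```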



  • VILSS: Intelligent signal processing and learning in imaging - 17.06.2016

    Panagiotis Tsakalides, University of Crete

    Modern technologies, including the proliferation of high performance sensors and network connectivity, have revolutionized imaging systems used in applications ranging from medical and astronomical imaging to consumer photography. These applications demand ever higher speed, scale, and resolution, which are typically limited by specific imaging and processing components. While striving for more complex and expensive hardware is one path, an alternative approach involves the intelligent design of architectures that capitalize on advances in cutting-edge signal processing to achieve these goals. This talk will motivate the need for smart integration of hardware components, on one hand, and software-based recovery, on the other. The talk will showcase the benefits that stem from Compressed Sensing and Matrix Completion, two paradigm-shifting frameworks in signal processing and learning, in two imaging problems, namely range imaging and hyperspectral imaging. Challenges associated with properties of imaging data, such as complexity and volume, will also be presented and viewed under the prism of these algorithms.
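    As a minimal example of the software-based recovery the talk refers to, the classic iterative soft-thresholding algorithm (ISTA) solves the l1-regularised least-squares problem at the heart of compressed-sensing reconstruction. This generic sketch is not tied to the speaker's imaging pipelines:

```python
import numpy as np

def ista(A, y, lam=0.1, iters=200):
    """Sparse recovery by iterative soft-thresholding (ISTA).

    Approximately solves  min_x 0.5*||Ax - y||^2 + lam*||x||_1,
    the basic reconstruction problem behind compressed sensing.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L the Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - step * A.T @ (A @ x - y)          # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return x
```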

  • VILSS: Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects - 17.06.2016

    Zeeshan Zia, Imperial College London

    Current systems for scene understanding typically represent objects as 2D or 3D bounding boxes. While these representations have proven robust in a variety of applications, they provide only coarse approximations to the true 2D and 3D extent of objects. As a result, object-object interactions, such as occlusions or supporting-plane contact, can be represented only superficially. We approach the problem of scene understanding from the perspective of 3D shape modeling, and design a 3D scene representation that reasons jointly about the 3D shape of multiple objects. This representation allows expressing 3D geometry and occlusion on the fine detail level of individual vertices of 3D wireframe models, and makes it possible to treat dependencies between objects, such as occlusion reasoning, in a deterministic way. The talk will further describe experiments which demonstrate the benefit of jointly estimating the 3D shape of multiple objects in a scene over working with coarse boxes.

  • VILSS: Wirewax – engineering vision algorithms for the wild - 17.06.2016

    John Greenall, Wirewax, London

    Wirewax is a platform for turning your videos into rich interactive experiences. Backing the website is a powerful suite of computer vision algorithms that run on a scalable cloud architecture. This talk will detail some of the experiences of training and deploying algorithms for use “in the wild”, including discussion of face detection, recognition and motion tracking.

  • VILSS: 2D Pairwise Geometry for Robust and Scalable Place Recognition - 17.06.2016

    Edward Jones, Dyson Research Lab, Imperial College London

    In this talk, I will present an overview of my PhD research on extending recent trends in visual place recognition to offer robustness and scalability. The underlying theme of my work is the exploitation of 2D geometry between pairs of local image features, which is often overlooked in favour of stronger 3D constraints. I will show how 2D geometry can be applied effectively to complement the limitations of 3D geometry, or even replace it at a fraction of the computational cost. The talk is divided into three sections, each discussing one example of such a method. First, I will show how 2D pairwise geometry can help to eliminate false-positive feature correspondences which arise from RANSAC-based 3D geometric constraints. Then, an inverted index consisting of pairwise geometries will be introduced, which makes scalable recognition with geometry possible. Finally, I will introduce a topological robot localisation system which aims to encode probability into place recognition attempts, and hence offers suitability for visual SLAM frameworks.
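    The core 2D pairwise constraint can be illustrated as follows: two feature correspondences are mutually consistent if the distance and orientation between the two features change little between the images. The function and tolerances here are illustrative assumptions, not the actual formulation from this work:

```python
import numpy as np

def pairwise_consistent(p1, p2, q1, q2, scale_tol=0.2, angle_tol=0.35):
    """Check 2D pairwise geometry between two feature correspondences.

    Keypoints (p1, p2) in image A are matched to (q1, q2) in image B.
    The pair is accepted if the inter-feature distance ratio and the
    relative orientation agree within tolerance -- a cheap 2D check,
    in contrast to full 3D RANSAC verification.
    """
    dp, dq = np.subtract(p2, p1), np.subtract(q2, q1)
    lp, lq = np.linalg.norm(dp), np.linalg.norm(dq)
    if lp == 0 or lq == 0:
        return False
    scale_ok = abs(lq / lp - 1.0) <= scale_tol
    angle = np.arccos(np.clip(dp @ dq / (lp * lq), -1.0, 1.0))
    return bool(scale_ok and angle <= angle_tol)
```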


  • VILSS: Transductive Transfer Learning for Computer Vision - 17.06.2016

    Teo de Campos, University of Surrey

    One of the ultimate goals of open-ended learning systems is to take advantage of previous experience when dealing with future problems. We focus on classification problems where labelled samples are available in a known problem (the source domain), but where the distribution of samples is different when the system is deployed on the target dataset. Although the number of classes and the feature extraction method remain the same, a change of domain happens because the typical distribution of source samples differs from that of target samples. This is a very common situation in computer vision applications, e.g., when a synthetic dataset is used for training but the system is applied to images “in the wild”. We assume that a set of unlabelled samples is available in the target domain. This constitutes a Transductive Transfer Learning problem, also known as Unsupervised Domain Adaptation. We proposed to tackle this problem by adapting the feature space of the source domain samples so that their distribution becomes more similar to that of the target domain samples; a classifier retrained on the updated source space can therefore give better results on the target samples. Our pipeline consists of three main components: (i) a method for global adaptation of the marginal distribution of the data using Maximum Mean Discrepancy; (ii) a sample-based adaptation method, which translates each source sample towards the distribution of the target samples; and (iii) a class-based conditional distribution adaptation method. We conducted experiments on a range of image classification and action recognition datasets and showed that our method gives state-of-the-art results.
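    Component (i) relies on Maximum Mean Discrepancy, which can be estimated directly from samples. A minimal sketch with an RBF kernel, using the biased estimator for brevity (the kernel choice and bandwidth are assumptions, not the paper's configuration):

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel.

    Measures the distance between the distributions of source samples X
    and target samples Y (rows are samples); domain adaptation of the
    kind described here shifts the source features to shrink it.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

    The MMD is zero when the two sample sets share a distribution and grows as they diverge, which is what makes it usable as an adaptation objective.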


  • VILSS: Ortho-diffusion decompositions of graph-based representations of images - 17.06.2016

    Adrian Bors, University of York

    In this presentation I introduce the ortho-diffusion operator. I consider graph-based data representations in which full data interconnectivity is modelled using probability transition matrices. Multi-scale dimensionality reduction is used in order to extract the meaningful data representations. The QR orthonormal decomposition algorithm, alternating with diffusion and data reduction stages, is applied recursively at each scale level to the given data representation. Columns in the ortho-diffusion representation matrix represent characteristic features of the data; those columns not considered essential for the data representation are removed at each scale. The proposed methodology is used to model features extracted from images, which are then used for image matching and face recognition. Image matching is applied to optical flow estimation from image sequences. For the face recognition application I consider both global appearance models, based on either the correlation or the covariance of training sets, and semantic representations of biometric features. The proposed methodology is shown to be robust in face classification applications under image corruption by various noise statistics.
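    A schematic reading of the alternating diffusion/QR scheme is sketched below, under the assumption that diffusion is implemented by repeated application of the transition matrix; the actual recursive multi-scale algorithm is richer than this:

```python
import numpy as np

def ortho_diffusion(P, levels=2, keep=None):
    """Alternate diffusion and QR orthonormalisation (sketch).

    `P` is a row-stochastic transition matrix over the data graph.
    At each level the current basis is diffused by P and then
    re-orthonormalised; optionally only the leading `keep` columns
    (the "essential" features) are retained.
    """
    Q = np.linalg.qr(P)[0]                 # initial orthonormal basis
    for _ in range(levels):
        Q = np.linalg.qr(P @ Q)[0]         # diffuse, then re-orthonormalise
        if keep:
            Q = Q[:, :keep]                # drop non-essential columns
    return Q
```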


  • VILSS: Ultrasound imaging and inverse problems - 17.06.2016

    Denis Kouame, Universite Paul Sabatier Toulouse

    Among all the medical imaging modalities, ultrasound imaging is the most widely used, due to its safety, cost-effectiveness, flexibility and real-time nature. However, compared to other modalities such as Magnetic Resonance Imaging (MRI) or Computed Tomography (CT), ultrasound images suffer from the presence of speckle and have low resolution in most standard applications. Although most manufacturers of ultrasound scanners have developed many device-based routines to overcome these issues, many challenges in terms of signal and image processing remain. In this tutorial, we will review basic and advanced ultrasound imaging, then focus on the current signal and image processing challenges and show some recent results.


  • VILSS: Global description of images. Application to robot mapping and localisation - 15.06.2016

    Luis Payá, Miguel Hernández University, Spain

    Nowadays, the design of fully autonomous mobile robots is a key discipline. Building a robust model of the unknown environment is an important ability the robot must develop. Using this model, the robot must be able to estimate its current position and to navigate to the target points. Omnidirectional vision sensors are commonly used to solve these tasks. When using this source of information, the robot must extract relevant information from the scenes both to build the model and to estimate its position. The possible frameworks include the classical approach of extracting and describing local features, and working with the global appearance of the scenes, which has emerged as a conceptually simple and robust solution. In this talk, the role of global-appearance techniques in robot mapping and localisation is analysed.
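    At its simplest, global-appearance localisation reduces to nearest-neighbour search over stored view descriptors. The sketch below assumes descriptors have already been extracted from the omnidirectional views (the extraction step, the interesting part in practice, is omitted):

```python
import numpy as np

def localise(query_desc, map_descs):
    """Appearance-based localisation as nearest-neighbour search (sketch).

    The map is a list of global descriptors, one per stored view; the
    position estimate is the index of the stored view whose descriptor
    is closest to the current view's descriptor.
    """
    dists = np.linalg.norm(np.asarray(map_descs, float) - query_desc, axis=1)
    return int(dists.argmin())
```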



  • VILSS: Joint Tracking and Event Analysis for Carried Object Detection - 17.11.2015

    Aryana Tavanai, University of Leeds

    Tracking and event analysis are areas of video analysis of great importance in robotics applications and automated surveillance. Although they have been studied extensively in isolation, there has been little work on performing them jointly so that they mutually influence and improve each other. In this talk I will present our novel approach for jointly estimating the track of a moving object and recognising the events in which it participates. First, I will introduce our geometric carried-object detector. Then I will present our tracklet-building approach, which enforces spatial consistency between the carried objects and other pre-tracked entities in the scene. Finally, I will present our joint tracking and event analysis framework, posed as maximisation of a posterior probability defined over event sequences and temporally disjoint subsets of tracklets. We evaluate our approach using tracklets from three state-of-the-art trackers and demonstrate improved tracking performance in each case as a result of jointly incorporating events, while also subsequently improving event recognition.


  • VILSS: Discriminative Feature Learning for Large-scale Data - 09.11.2015

    Mengyang Yu, Northumbria University

    Computation on large-scale data spaces is involved in many active problems in computer vision and pattern recognition. However, in realistic applications, most existing algorithms are heavily restricted by the huge number and high dimensionality of feature descriptors. Generally speaking, there are two main ways to speed up such algorithms: (1) projecting features onto a lower-dimensional subspace; (2) embedding features into a Hamming space. In this talk, I will present our recent work on dimensionality reduction and the binarisation of features for various applications. First, I will show a novel subspace learning algorithm which realises discriminant analysis for large-scale local feature descriptors, and a generalised orthogonalisation method leading to a more compact and less redundant subspace. Next, local-feature-based hashing for similarity search will be introduced. Most existing hashing methods for image search and retrieval are based on global representations, e.g., Fisher vectors and VLAD, which lack analysis of the intrinsic geometric properties of local features and heavily limit the effectiveness of the hash code. Finally, I will present how to efficiently reduce very high-dimensional representations to medium-dimensional binary codes with a small memory cost and low coding complexity.
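    As a generic baseline for the Hamming-space embedding in (2), random-hyperplane hashing binarises descriptors so that Hamming distance approximates angular distance. This is a standard LSH sketch, not the speaker's learned hash codes:

```python
import numpy as np

def binarize(features, n_bits=16, seed=0):
    """Embed real-valued descriptors into a Hamming space (sketch).

    Each bit is the sign of a random projection of the descriptor,
    so nearby descriptors tend to receive nearby binary codes.
    `features` has shape (N, D); the result has shape (N, n_bits).
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((features.shape[1], n_bits))  # random hyperplanes
    return (features @ H > 0).astype(np.uint8)
```

    Learned hashing methods replace the random hyperplanes with projections optimised on data, but the binary-code interface stays the same.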