Autonomous Systems and Robotics

Computer Vision for Pixel Processor Arrays

In the EPSRC-funded AGILE project, we have developed novel image-plane sensor-processor systems and associated high-level vision algorithms, and integrated them within a control-aware architecture for autonomous robotic systems, in particular Micro Aerial Vehicles (MAVs).

We use the SCAMP Vision System, which runs general-purpose computer vision algorithms at several thousand frames per second. Examples of some of these algorithms can be seen on the Demonstrators page.

Micro Aerial Vehicle platforms are being developed to take advantage of the high-rate sensing and processing made available by SCAMP. With the high-bandwidth, high-level information from SCAMP, the MAV can perceive relevant features in its environment and perform agile manoeuvres for tasks such as tracking, navigation and obstacle avoidance.

The AGILE project aims to provide samples of the system to the community for evaluation. Watch this space for more details.

Main website for images and other materials-

Contributions include:

Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays
[pdf] [video]

High-speed Light-weight CNN Inference via Strided Convolutions on a Pixel Processor Array
[pdf] [video]

A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays
[pdf] [video]

Handheld Robotics and Vision

Handheld robots aim to share physical proximity with users, but they are neither fully independent, as a humanoid robot is, nor part of the user’s body, as exoskeletons are. Handheld robots should be cognitive and cooperative. The aim is to capitalize on the intuitiveness of traditional handheld tools while adding embedded intelligence and action to open up new capabilities.

For further information, please see:

Contributions include:

Rebellion and Obedience: The Effects of Intention Prediction in Cooperative Handheld Robots
[pdf] [video]

I Can See Your Aim: Estimating User Attention from Gaze for Handheld Robot Collaboration
[pdf] [video]

Understanding and Determining Skill

In VI-Lab we have developed new models to determine relative skill from long videos, through learnable temporal attention modules. Skill determination is formulated as a ranking problem, making it suitable for common and generic tasks. However, in long videos, some parts are irrelevant for assessing skill, and the skill exhibited may vary throughout the video. We therefore propose methods which assess the relative overall level of skill in a long video by attending to its skill-relevant parts.
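To make the ranking formulation concrete, here is a minimal sketch of a pairwise margin ranking loss, one common way to train a model from "video A shows more skill than video B" annotations. It is an illustration only and not necessarily the loss used in the papers listed below; the function name, the margin value and the example scores are all assumptions.

```python
def pairwise_ranking_loss(score_better, score_worse, margin=1.0):
    """Hinge-style margin ranking loss for one ordered pair of videos.

    The loss is zero once the higher-skill video outscores the
    lower-skill one by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0, margin - (score_better - score_worse))


# Hypothetical predicted skill scores for three annotated pairs,
# written as (higher-skill video, lower-skill video).
pairs = [(2.3, 0.4), (0.9, 1.1), (1.5, 1.0)]

losses = [pairwise_ranking_loss(better, worse) for better, worse in pairs]
mean_loss = sum(losses) / len(losses)
```

Only the relative order within each pair is supervised, so no absolute skill label is needed for any individual video.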

Our approach trains temporal attention modules, learned with only video-level supervision, using a novel rank-aware loss function. We also employ two attention modules to separately indicate higher (pros) and lower (cons) skill.
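As a rough illustration of how temporal attention can summarise a long video, the sketch below pools per-segment skill scores into one video-level score using softmax attention weights. It is a simplified, assumed formulation with a single attention module. In a trained system the attention logits would come from a learned module, whereas here they are hand-set, hypothetical numbers, and the names `softmax` and `attention_pooled_score` are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pooled_score(segment_scores, attention_logits):
    """Weight per-segment scores by softmax attention and sum them."""
    if len(segment_scores) != len(attention_logits):
        raise ValueError("one attention logit is needed per segment")
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, segment_scores))


# Hypothetical per-segment skill scores for a three-segment video.
segment_scores = [1.0, 2.0, 3.0]

# Equal logits give uniform weights, i.e. a plain average.
uniform_score = attention_pooled_score(segment_scores, [0.0, 0.0, 0.0])

# A large logit on the last segment makes it dominate the video score.
focused_score = attention_pooled_score(segment_scores, [0.0, 0.0, 10.0])
```

Because the weights sum to one, segments with low attention contribute little to the video-level score, which is how skill-irrelevant parts of a long video can be down-weighted.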

We evaluate our approach on the EPIC-Skills dataset.

Recent papers include:

The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos
[pdf] [video]

Who’s Better? Who’s Best? Pairwise Deep Ranking for Skill Determination
[pdf] [video]