[Image: man riding a fish, with a diagram of a neural network next to him]

Generalised policies for probabilistic planning

A generalised policy for a family of planning problems is a function which can suggest an appropriate action to take for any state of any problem from that family. For instance, if one wants to solve a series of slightly different truck routing problems (e.g. with different numbers of trucks or different road networks), then it might be possible to obtain a single generalised policy which solves all possible truck routing problems, rather than obtaining a separate policy for each problem. I showed that it's possible to use deep learning—in the form of specially-structured neural networks—to acquire effective generalised policies for PPDDL-style discrete probabilistic planning. This strategy allows groups of similar problems to be solved much faster than they could be by obtaining a separate policy for each problem in the group. This idea was explored in depth in an AAAI'18 paper, and in my honours (undergraduate) thesis:

S Toyer, F Trevizan, S Thiébaux, L Xie. “Action Schema Networks: Generalised Policies with Deep Learning”. AAAI. 2018.

S Toyer. “Generalized Policies for Probabilistic Planning with Deep Learning”. Honours thesis, ANU. 2017.

Links: [AAAI'18 paper], [Thesis], [Github]
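In code, the interface of a generalised policy can be sketched as follows. This is a hypothetical illustration, not the network architecture from the paper: the names (`Problem`, `State`, `greedy_generalised_policy`) and the greedy stand-in for the learned network are invented for the example.

```python
# Hypothetical sketch of the "generalised policy" interface.
from typing import Callable, FrozenSet, List

# A state is a set of true propositions; an action is just its name here.
State = FrozenSet[str]
Action = str

class Problem:
    """One instance from a planning family (e.g. one road network)."""
    def __init__(self, name: str, applicable: Callable[[State], List[Action]]):
        self.name = name
        self.applicable = applicable  # maps a state to its legal actions

# A per-problem policy maps states of *that* problem to actions...
PerProblemPolicy = Callable[[State], Action]

# ...whereas a generalised policy covers every problem in the family at
# once: given any (problem, state) pair, it proposes an action.
GeneralisedPolicy = Callable[[Problem, State], Action]

def greedy_generalised_policy(problem: Problem, state: State) -> Action:
    # Stand-in for a learned network: just take the first legal action.
    return problem.applicable(state)[0]
```

The point of the type signatures is that a single `GeneralisedPolicy` can be applied to problems with, say, different numbers of trucks, because it receives the problem itself as an input rather than being trained per-instance.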

[Image: block diagram extract showing 'poses' being fed into 'deep Markov model']

Pose forecasting

Given a sequence of observed locations for a person's joints, pose forecasting is the task of predicting where each of those joints will move in the future (e.g. over 500ms to 5s). Pose forecasting has potential applications in collaborative robotics, as it can be helpful for a robot to anticipate a person's movement in order to coordinate with them or avoid colliding with them. In our DICTA paper, we proposed a novel pose forecasting method which used recurrent variational autoencoders to model a distribution over future poses. In contrast, previous methods could predict only a single trajectory, and thus did not adequately account for the uncertainty inherent to predicting human motion. Our DICTA paper also introduces Ikea Furniture Assembly—a new dataset for pose forecasting, action anticipation, and related tasks.

S Toyer, A Cherian, T Han, S Gould. “Human Pose Forecasting via Deep Markov Models”. DICTA. 2017.

Links: [DICTA paper], [Github (baselines)], [Github (DMM code)], [Ikea FA dataset]
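The benefit of modelling a distribution over futures can be illustrated with a toy sampler. The sketch below only mimics the structure of a deep Markov model (a stochastic latent transition followed by an emission step); the linear maps, noise scales, and one-dimensional "pose" are all made up for illustration, in place of the learned networks and real joint coordinates.

```python
# Toy sketch of sampling many plausible futures from a state-space model.
import random

def transition(z, rng):
    # z_t = 0.9 * z_{t-1} + noise  (stand-in for a learned transition net)
    return 0.9 * z + rng.gauss(0.0, 0.1)

def emit(z, rng):
    # The observed 1-D "pose" is a noisy readout of the latent state.
    return z + rng.gauss(0.0, 0.05)

def sample_futures(z0, horizon, n_samples, seed=0):
    """Draw several plausible future trajectories from one current state."""
    rng = random.Random(seed)
    futures = []
    for _ in range(n_samples):
        z, traj = z0, []
        for _ in range(horizon):
            z = transition(z, rng)
            traj.append(emit(z, rng))
        futures.append(traj)
    return futures
```

Because sampling is stochastic, repeated draws yield different plausible trajectories from the same starting state, which is exactly the kind of uncertainty that single-trajectory predictors cannot express.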

[Image: Landsat scene depicting Canberra; a few visible bands have been picked out for their colour]

Coverage data on the semantic web

Geoscientists often rely on data gathered from a range of satellites operated by many different agencies and countries—for instance, it's possible to make use of data from Japan's Himawari satellite series, the United States' Landsat program, and the EU's Sentinel missions. However, geospatial information is currently presented in a raft of complex (and sometimes proprietary) formats, thereby making it difficult to combine the many available data sources. Many geoscientists are thus excited about the semantic web, which is a set of interoperable standards for publishing machine-readable data on the web.

I spent two semesters working alongside a student team on methods for publishing coverage data—that is, data with both spatial and temporal dimensions—as linked data, with a particular focus on satellite imagery. My contribution was a server which allows users to retrieve satellite imagery as linked data in response to queries expressed in a limited dialect of SPARQL. We later presented our work in a Note (the W3C equivalent of a technical report) published by the W3C/OGC Spatial Data on the Web Working Group:

D Brizhinev, S Toyer, K Taylor. “Publishing and Using Earth Observation Data with the RDF Data Cube and the Discrete Global Grid System”. W3C Working Group Note. 2017.

Links: [W3C Note], [AGU extended abstract], [Github]

Human pose estimation from video

Given an image of a person, human pose estimation is the task of extracting the (image) coordinates of a subset of that person's joints. Microsoft's Kinect sensor, which estimates pose from RGB images and depth maps, is perhaps the best-known example of a pose estimation system. However, it's also possible to effectively estimate human pose from RGB video alone.

[Image: excerpt from a figure depicting images and optical flow going into a neural network, and pairs of poses coming out the other side]


Human pose estimation from still images is often accomplished with tree-structured undirected graphical models in which each vertex represents the position of a joint. However, such graphical models are challenging to apply to video pose estimation, since adding connections across time tends to introduce cycles into the graph, thereby making inference intractable. Biposelets, which I introduce in the project report below, are a novel method for video pose estimation with graphical models in which each vertex represents the positions of a single joint across two consecutive frames. This approach can capture temporal relationships between joint positions, but still allows for tractable inference.
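The tractability claim comes from the fact that exact MAP inference on a tree of joints is just dynamic programming. The sketch below is a generic max-sum pass over a chain (the simplest tree) with made-up score tables; it is not the actual biposelet implementation, but it shows the O(n·k²) structure that cycles in the graph would destroy.

```python
# Exact MAP inference on a chain-structured model via max-sum DP.

def map_inference(unary, pairwise):
    """unary[i][s]: score of joint i taking label s.
    pairwise[i][s][t]: score of joint i having label s and joint i+1
    having label t. Returns the jointly best label per joint."""
    n, k = len(unary), len(unary[0])
    # Forward pass: best score of a prefix ending with joint i at label s.
    best = [list(unary[0])]
    back = []
    for i in range(1, n):
        scores, ptrs = [], []
        for t in range(k):
            cands = [best[-1][s] + pairwise[i - 1][s][t] for s in range(k)]
            s_star = max(range(k), key=lambda s: cands[s])
            scores.append(cands[s_star] + unary[i][t])
            ptrs.append(s_star)
        best.append(scores)
        back.append(ptrs)
    # Backtrack from the best final label to recover the full assignment.
    labels = [max(range(k), key=lambda t: best[-1][t])]
    for ptrs in reversed(back):
        labels.append(ptrs[labels[-1]])
    return labels[::-1]
```

For biposelets, each label would index a candidate pair of joint positions across two consecutive frames rather than a single-frame position, but the message-passing structure is identical.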

Links: [Project report], [Github]

[Image: three video frames showing a person lifting themselves out of a wheelchair; each frame is annotated with a pose]

CNNs with temporal smoothing

An earlier project, described in the report below, explored how an approximation algorithm could be used to add temporal connections to a graphical-model-based pose estimation system for single images. The project later led to the proposal of biposelets, as described above.

Links: [Project report], [Github]