T16: Latest Advances in Deep Learning for Multimodal and Multisensorial Signal Analysis

Monday, 20 July 2020, 13:30 – 17:30


Nicholas Cummins

ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany


Björn W. Schuller

ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany
GLAM – Group on Language, Audio & Music, Imperial College London, UK



The objective of this tutorial is to give an overview of the latest deep learning methods for the optimal and efficient fusion, processing, analysis, and synthesis of multimodal and multisensorial signals for the next generation of intelligent interfaces. Human-Computer Interaction is becoming increasingly multimodal and multisensorial: everyday speech interaction and interaction based on video and depth information are spreading, and even physiological sensor data analysis is becoming a daily standard, for example in current smart watches, alongside more traditional and modern haptic interaction. The vast amount of interaction data collected in this way can currently best be exploited by deep learning methods.


Content and benefits:

In this tutorial, we will present methods for optimal deep fusion, analysis and synthesis of such data for tomorrow's intelligent interaction. This content will include deep fusion at various levels, from early via intermediate to late fusion. Further, to overcome one of the primary obstacles in this field, we will introduce participants to methods for coping with asynchronous cross-sensorial and cross-modal data fusion by deep algorithms such as Deep Canonical Time Warping.
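As a rough illustration of the difference between the early and late fusion levels mentioned above, the following minimal Python sketch may help. It is not taken from the tutorial material: the function names, the toy linear scorer standing in for a trained network, and the mixing weight `alpha` are all assumptions made for illustration.

```python
# Minimal sketch of early vs. late fusion for two modalities.
# All names and the toy linear "model" are hypothetical illustrations.


def linear_score(features, weights, bias=0.0):
    """Toy stand-in for a trained model: a weighted sum of the features."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion(audio, video, joint_weights):
    """Early fusion: concatenate the feature vectors, then apply one joint model."""
    joint = list(audio) + list(video)
    return linear_score(joint, joint_weights)


def late_fusion(audio, video, audio_weights, video_weights, alpha=0.5):
    """Late fusion: score each modality with its own model, then mix the scores."""
    audio_score = linear_score(audio, audio_weights)
    video_score = linear_score(video, video_weights)
    return alpha * audio_score + (1.0 - alpha) * video_score


if __name__ == "__main__":
    audio = [0.2, 0.7]       # hypothetical audio features
    video = [0.9, 0.1, 0.4]  # hypothetical video features
    print(early_fusion(audio, video, [1.0, 1.0, 1.0, 1.0, 1.0]))
    print(late_fusion(audio, video, [1.0, 1.0], [1.0, 1.0, 1.0], alpha=0.3))
```

Intermediate fusion would sit between these two extremes, merging learned per-modality representations inside the network rather than raw features or final decisions.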

The focus will then shift to unsupervised representation learning, for example in an end-to-end manner from raw sensor signals. This aspect of the tutorial will include the exploitation of convolutional and recurrent network topologies with memory to best handle the typical interaction time series data. Alongside signal-type data, the handling of symbolic information such as text or events is also covered. This section of the tutorial will also cover attention mechanisms, a current state-of-the-art technique in the deep learning community.
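To give a flavour of what an attention mechanism does on time series data, here is a minimal sketch of dot-product attention pooling in plain Python. It is an assumption-laden illustration rather than the tutorial's own code: the names `softmax` and `attention_pool` are hypothetical, and the `query` vector, which would be learned in a real network, is simply passed in by hand.

```python
import math

# Minimal sketch of dot-product attention pooling over a sequence of frames.
# Function names and the hand-set query vector are hypothetical illustrations.


def softmax(scores):
    """Turn raw scores into positive weights that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight each time step by its similarity to the query, then average.

    frames: list of equal-length feature vectors, one per time step.
    query:  vector of the same length (learned in a real model).
    Returns the pooled vector and the attention weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical sensor frames
    pooled, weights = attention_pool(frames, query=[2.0, 0.0])
    print(pooled, weights)
```

Time steps that align more strongly with the query receive larger weights, so the pooled vector emphasises the parts of the sequence the model considers most relevant.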

Participants are further introduced to Automatic Machine Learning, which allows deep networks to self-optimise in such contexts. As a result, multimodal and multisensorial fusion is increasingly opening up to the non-expert interface designer as well, since mainly labelled data is needed to set up a system ready for rich intelligent input and output processing. To allow for mobile-based interactions, we will present methods for model complexity reduction for restricted hardware devices. We will also cover methods for coping with low availability of user data. These include transfer learning and Generative Adversarial Models for the generation of interaction data through deep methods. Finally, we will introduce participants to the latest advances in Explainable AI techniques.

The tutorial is based on open-source toolkits, so that attendees have the tools at hand to benefit from the methods described above right away. These include auDeep, End2You, and openXBOW, which also open up the material to participants who are not Python-savvy. At the same time, experts in deep fusion will find interest in the latest methods presented and in the outlook given after the general introduction and practical parts.


Target Audience:

The target audience is the broad audience of HCI International. The tutorial starts with an introduction to the general principles of deep learning and then moves to recent approaches for fusion by deep learning, synchronization by suitable means of deep warping, and analysis and synthesis. It therefore also targets intermediate to advanced level participants, from general HCI and intelligent interaction attendees to deep learning and machine learning experts.

Bio Sketches of Presenters:

Nicholas Cummins is a habilitation candidate at the ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg in Germany. Dr Cummins was awarded his PhD in Electrical Engineering from UNSW Australia in February 2016. He is actively involved in the DE-ENIGMA, RADAR-CNS, TAPAS and sustAGE Horizon 2020 projects, in which his roles include contributions towards management of the technical work packages. His current research covers areas of behavioural signal processing, with a focus on the automatic multisensory analysis and understanding of different health states. Dr Cummins has been lecturing since autumn 2017, writing and delivering courses in Speech Pathology, Deep Learning and Intelligent Signal Analysis in Medicine. He has (co-)authored over 90 conference and journal papers leading to over 1200 citations (h-index: 20). He is a frequent reviewer for IEEE, ACM and ISCA journals and conferences, and serves on program and organisational committees. He has repeatedly given tutorials at leading international conferences such as HCI, IEEE EMBC, Interspeech, and IEEE SAM.

Björn Schuller received the Diploma in 1999, the Doctoral degree in 2006, and the Habilitation and Adjunct Teaching Professorship in the subject area of signal processing and machine intelligence in 2012, all in electrical engineering and information technology from TUM in Munich, Germany. He is Professor of Artificial Intelligence in the Department of Computing, Imperial College London, U.K., and a Full Professor and head of the ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, as well as co-founding CEO and current CSO of the audio intelligence company audEERING. He has (co-)authored five books and more than 800 publications in peer-reviewed books, journals, and conference proceedings in the fields of Affective Computing, HCI, and deep learning, leading to more than 26000 citations (h-index = 74). He is a Fellow of the IEEE, president-emeritus of the AAAC, and Senior Member of the ACM. His service to the community includes serving as Editor in Chief of the IEEE Transactions on Affective Computing and as past and present General Chair and Program Chair of conferences in the field such as ACM ICMI, IEEE ACII, and Interspeech. He has given 15 tutorials to date at leading conferences such as HCI, ACII, ACM Multimedia, EMBC, ICASSP, IJCAI, Interspeech, SAM, IUI, and UMAP.