# Wed. June 21, 2023

## 1:30pm - 2:00pm Registration

## 2:00pm - 2:15pm Welcome to FBK (Paolo Traverso)

## 2:15pm - 2:30pm Introduction to the Workshop (Pierre Baldi)

## 2:30pm - 3:30pm Technical Session (Chair: Pierre Baldi)

2:30 pm

Speaker: Piero Fariselli (Università di Torino)

3:00pm

Speaker: Masashi Sugiyama (RIKEN/The University of Tokyo)

Supervised learning from noisy outputs is one of the classical problems in machine learning. While this task is relatively straightforward in regression, since independent additive noise averages out with enough data, classification from noisy labels is still a challenging research topic. Recently, it has been shown that when the noise transition matrix, which specifies the label-flipping probabilities, is available, the bias caused by label noise can be cancelled by appropriately correcting the loss function. However, when the noise transition matrix is unknown, which is often the case in practice, estimating it from noisy labels alone is not straightforward due to its non-identifiability. In this talk, I will give an overview of recent advances in classification from noisy labels, including joint estimation of the noise transition matrix and a classifier, analysis of identifiability conditions, and extensions to instance-dependent noise.
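The loss-correction idea mentioned in the abstract can be sketched in a few lines. Assuming a known transition matrix `T` with `T[i, j] = P(noisy label j | clean label i)`, the "forward" correction composes the model's clean-class probabilities with `T` before taking the negative log-likelihood (a generic sketch, not the speaker's specific formulation; the function name is illustrative):

```python
import numpy as np

def forward_corrected_nll(probs, noisy_label, T):
    """Forward loss correction with a known noise transition matrix.
    probs: model's clean-class probabilities, shape (K,)
    noisy_label: observed (possibly flipped) label index
    T: noise transition matrix, T[i, j] = P(noisy = j | clean = i)
    """
    noisy_probs = probs @ T          # predicted distribution over noisy labels
    return -np.log(noisy_probs[noisy_label])

# With an identity T (no noise), the correction reduces to the ordinary NLL.
p = np.array([0.7, 0.2, 0.1])
print(forward_corrected_nll(p, 0, np.eye(3)))
```

Minimizing this corrected loss over noisy data yields an unbiased objective for the clean classifier, which is why the unknown-`T` case (the focus of the talk) matters so much in practice.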

## 3:30pm - 4:00pm Coffee Break

## 4:00pm - 6:00pm Technical Session (Chair: Barbara Hammer)

4:00pm

### Title: The Science of Generative AI: Focus on Reasoning

Speaker: Dan Roth (University of Pennsylvania and AWS AI Labs)

The rapid progress made over the last few years in generating linguistically coherent natural language has blurred, in the minds of many, the difference between natural language generation, understanding, and the ability to reason about the world. Nevertheless, robust support of high-level decisions that depend on natural language understanding, and that require dealing with “truthfulness”, is still beyond our capabilities, partly because most of these tasks are very sparse, often require grounding, and may depend on new types of supervision signals.

I will discuss some of the challenges underlying reasoning – and present some of our work in this space, focusing on supporting task decomposition and planning.

4:30pm

Speaker: Stefano Soatto (UCLA)

I will present a view of large language models (LLMs) as stochastic dynamical systems, for which the notion of controllability is well established. From this view, it is easy to see that the “state of mind” of an LLM can be steered by a suitable choice of input, given enough time and memory. However, the space of interest for an LLM is not that of words, but rather the set of “meanings” expressible as sentences that a human could have spoken and would understand. Unfortunately, unlike controllability, the notions of “meaning” and “understanding” are not usually formalized in a way that is relatable to the LLMs in use today.

I will propose a simplistic definition of meaning that is compatible with at least some theories found in Epistemology, and relate it to functional characteristics of a trained LLM. Then, I will describe both necessary and sufficient conditions for controllability in the space of meanings. I will show that a well-trained LLM establishes a topology and geometry in the space of meanings, whose embedding space is Euclidean and has words (tokens) as coordinate axes. In this space, meanings are not sentences (trajectories), but equivalence classes of trajectories.

I will then argue that meaning attribution requires an external grounding mechanism, and relate LLMs to models of the physical scene inferred from images. There, I will highlight the analogy between meanings inferred from sequences of words and the “physical scene” inferred from collections of images. But while the entity that generates textual meanings (the human brain) is not accessible for experimentation, the physical scene can be probed and falsified.

5:00pm

### Title: Can deep novelty detection help atrial fibrillation therapy?

Speaker: Paolo Frasconi (Università di Firenze)

Atrioventricular junction catheter ablation is a recommended therapy for a class of drug-refractory atrial fibrillation patients. However, the success rate of this intervention is not yet satisfactory, and a better identification of ablative targets may help to improve current therapeutic strategies. We conjecture that non-physiologic atrial electrograms (EGMs) may be identified by novelty detection. We compare several deep anomaly detection algorithms against currently recognized indicators such as fractionated or low-voltage EGMs. Our preliminary results show that anomaly scores strongly correlate with existing indicators but also reveal novel potential ablation candidates.
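As background, the generic novelty-detection recipe underlying such comparisons is: compute a score that is small for normal (physiologic) signals and flag test signals whose score is unusually high. A minimal distance-based sketch (the deep methods in the talk learn the score instead; names and the k-nearest-neighbour choice are illustrative):

```python
import numpy as np

def novelty_scores(train, test, k=3):
    """Score each test point by its mean distance to its k nearest
    training points; large scores suggest 'novel' signals."""
    d = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

# A point far from all training data receives a much larger score.
train = np.zeros((10, 2))
test = np.array([[0.1, 0.0], [5.0, 5.0]])
print(novelty_scores(train, test))
```

Thresholding such a score gives a candidate list; the clinical question in the talk is whether the flagged EGMs coincide with, or extend, the established ablation indicators.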

5:30pm

### Title: AI and medicine

Speaker: Pietro Liò (University of Cambridge, UK)

In this talk I will focus on how to build a digital patient twin using graph and hypergraph representation learning, considering physiological (cardiovascular), clinical (inflammation) and molecular (multi-omics and genetics) variables. I will discuss how this approach could also keep clinicians in the loop, using logic and explainer frameworks to avoid excessive automation.

# Thu. June 22, 2023

## 9:00am - 10:30am Technical Session (Chair: Alessandro Sperduti)

9:00am

9:30am

Speaker: Joachim M. Buhmann (ETH Zürich)

Data Science (DS) algorithms interpret outcomes of empirical experiments with random influences. The inputs to these algorithms are realizations of random variables and, consequently, DS algorithms necessarily return random variables, even for a deterministic sequence of computations. The uncertainty in the input causes a rate-distortion tradeoff in the output when the DS algorithm is adapted by learning. We discuss consequences of this setting and present design choices for algorithm validation.

10:00am

Speaker: Marco Gori (University of Siena)

In this talk we propose an orthogonal research direction where we expect that perceptual cognitive skills (e.g. in language, vision, and control) can emerge simply from environmental interactions, without needing to store and properly organize big data collections. The proposed approach relies on moving from the framework of statistical machine learning to that of learning over time, by solving optimization problems similar to those at the basis of the laws of Physics. It turns out that the longstanding discussion on the biological plausibility of Backpropagation (BP) finds its natural evolution in terms of diffusion processes. This interpretation goes beyond the peaceful interlude where BP is defined in the framework of algorithms for efficient gradient computation. We show that, in the proposed framework, learning processes arise from the forward (on-line) solution of classic variational problems, and provide preliminary experimental evidence of the effectiveness of the theory.

## 10:30am - 11:00am Coffee Break

## 11:00am - 12:30pm Technical Session (Chair: Cesare Alippi)

11:00am

This talk is based on joint work (arXiv:2212.14457) with Alexander Zlokapa, which gives exact non-asymptotic formulas for Bayesian posteriors in deep linear networks. After providing some general motivation, I will focus on explaining results of two kinds. First, I will state a precise result showing that infinitely deep linear networks compute optimal posteriors starting from universal, data-agnostic priors. Second, I will explain how a novel scaling parameter -- given by # data * depth / width -- controls the effective depth and complexity of the posterior.

11:30am

### Title: Compositional Sparsity as the key to (Deep) Learning

Speaker: Tomaso Poggio (MIT)

12:00pm

### Title: An optimal transport and information geometric framework for Gaussian processes

Speaker: Minh Ha Quang (RIKEN)

Information geometry (IG) and Optimal transport (OT) have been attracting much research attention in various fields, in particular machine learning and statistics. In this talk, we present results on the generalization of IG and OT distances for finite-dimensional Gaussian measures to the setting of infinite-dimensional Gaussian measures and Gaussian processes. Our focus is on the Entropic Regularization of the 2-Wasserstein distance and the generalization of the Fisher-Rao distance and related quantities. In both settings, regularization leads to many desirable theoretical properties, including in particular dimension-independent convergence and sample complexity. The mathematical formulation involves the interplay of IG and OT with Gaussian processes and the methodology of reproducing kernel Hilbert spaces (RKHS). All of the presented formulations admit closed form expressions that can be efficiently computed and applied practically. The mathematical formulations will be illustrated with numerical experiments on Gaussian processes.
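For finite-dimensional Gaussians, the (unregularized) 2-Wasserstein distance already has the closed form that the talk generalizes: W2² = ‖m1 − m2‖² + tr(C1 + C2 − 2(C1^{1/2} C2 C1^{1/2})^{1/2}). A sketch of that baseline formula (the entropic-regularized and infinite-dimensional versions discussed in the talk modify it):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_squared_gaussian(m1, C1, m2, C2):
    """Squared 2-Wasserstein distance between N(m1, C1) and N(m2, C2)."""
    s1 = sqrtm(C1)
    cross = sqrtm(s1 @ C2 @ s1)          # (C1^{1/2} C2 C1^{1/2})^{1/2}
    bures = np.trace(C1 + C2 - 2 * cross.real)
    return float(np.sum((m1 - m2) ** 2) + bures)

m = np.zeros(1)
print(w2_squared_gaussian(m, np.array([[1.0]]), m, np.array([[4.0]])))
```

Entropic regularization replaces the Bures term with a smoothed version that, as the abstract notes, enjoys dimension-independent sample complexity, which is what makes the Gaussian-process extension tractable.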

## 12:30pm - 2:00pm Lunch Break

## 2:00pm - 3:30pm Technical Session (Chair: Davide Bacciu)

2:00pm

Speaker: Cesare Alippi (Politecnico di Milano / Università Svizzera Italiana)

The well-known Kalman filters model dynamical systems by relying on state-space representations, with the next state updated, and its uncertainty controlled, by fresh information associated with newly observed system outputs. We derive Kalman and extended Kalman filters for discrete-time settings where inputs, states, and outputs are represented as attributed graphs whose topology and attributes can change over time. The setup also allows us to adapt the framework to cases where the output is a vector or a scalar (node/graph-level tasks). Within the proposed theoretical framework, the unknown state-transition and readout functions are learned end-to-end along with the downstream prediction task.
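For reference, the classical vector-space recursion that the talk lifts to graph-structured states and outputs is the usual predict/update cycle (a textbook sketch, not the graph version):

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of the classical Kalman filter.
    x, P: current state estimate and covariance
    z: fresh observation; A, H: state-transition and readout matrices
    Q, R: process and observation noise covariances
    """
    # Predict the next state from the dynamics
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update with the newly observed output
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In the graph setting of the talk, `A` and `H` are replaced by learned functions over attributed graphs, trained end-to-end with the downstream task.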

2:30pm

Although the investigation of learning in structured domains started in the late 1990s, deep learning for graphs has recently attracted tremendous research interest and increasing attention in applications, also posing new challenges to the field of deep learning (DL).

The talk will briefly introduce the area of DL for graphs. We will also discuss advanced topics and current open issues, including examples of recent progress by my research group, with an emphasis on the efficiency issue and on the interplay between the depth of the models and complex data representation learning.

3:00pm

### Title: A Unified Framework for Backpropagation-Free Soft and Hard Gated Graph Neural Networks

Speaker: Alessandro Sperduti (University of Padua)

We introduce a framework for the definition of neural models for graphs that do not rely on back-propagation for training.

Our proposed framework is inspired by Gated Linear Networks and allows the adoption of multiple graph convolutions.

Specifically, each neuron is defined as a set of graph convolution filters (weight vectors) and a gating mechanism that, given a node and its topological context, generates the weight vector to use for processing the node's attributes. Two different graph processing schemes are studied, i.e., a message-passing aggregation scheme where the gating mechanism is embedded directly into the graph convolution, and a multi-resolution one where neighbouring nodes at different topological distances are jointly processed by a single graph convolution layer.

We also compare the effectiveness of different alternatives for defining the context function of a node, i.e., based on hyper-planes or on prototypes, and using a soft or hard gating mechanism.

We propose a unified theoretical framework allowing us to theoretically characterise the proposed models' expressiveness.
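A minimal sketch of the hard, hyperplane-based gating described in the abstract, in the spirit of Gated Linear Networks (a toy illustration only; the actual models add graph convolutions and their backpropagation-free training rules):

```python
import numpy as np

def halfspace_context(z, hyperplanes):
    """Map a node's topological context z to an integer context id via the
    signs of its projections onto fixed hyperplanes (hard gating)."""
    bits = (hyperplanes @ z > 0).astype(int)
    return int(bits @ (2 ** np.arange(len(bits))))

def gated_neuron(x, z, weights, hyperplanes):
    """Each neuron holds one weight vector per context; the gate picks
    which filter processes the node attributes x, given the context z."""
    return weights[halfspace_context(z, hyperplanes)] @ x
```

A soft gate would instead mix all weight vectors with context-dependent coefficients; the prototype-based alternative replaces the hyperplane signs with nearest-prototype assignment.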

## 3:30pm - 4:00pm Coffee Break

## 4:00pm - 5:30pm Technical Session (Chair: Alessio Micheli)

4:00pm

### Title: The Quest for How the Brain Learns

Speaker: Pierre Baldi (University of California, Irvine)

We will first showcase two cutting-edge applications of modern Artificial Intelligence (AI) in biomedical imaging and in reasoning. These applications, as well as most modern AI applications (e.g., ChatGPT, AlphaFold, AlphaGo, Google Translate, self-driving cars), are based on deep learning, a modern rebranding of neural networks dating back to the 1980s, or even the 1950s. We will then briefly review the neuroscience-inspired, tortuous historical path that has led to deep learning, and the key discoveries made along the way, highlighting the synergies and discrepancies between neuroscience and deep learning. One key conclusion is that approximate gradient descent is essential for learning. However, the standard gradient descent algorithm of deep learning is not biologically plausible for multiple reasons. We will examine these reasons one by one and identify biologically plausible solutions for each of them. In particular, we will introduce and demonstrate a general class of neural architectures and learning algorithms capable of learning from data in a largely unsupervised and asynchronous manner, without the need for symmetric connectivity.
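One published route around the weight-symmetry problem mentioned above is feedback alignment (Lillicrap et al.), named here only as an example; the talk presents its own family of solutions. The backward pass routes the error through a fixed random matrix instead of the transposed forward weights:

```python
import numpy as np

def fa_step(W1, W2, B, x, y, lr=0.05):
    """One feedback-alignment update for a tiny two-layer network: the
    hidden error is computed with a fixed random matrix B rather than
    W2.T, so forward and backward connectivity need not be symmetric."""
    h = np.tanh(W1 @ x)
    out = W2 @ h
    e = out - y                       # output-layer error (exact)
    dh = (B @ e) * (1.0 - h ** 2)     # hidden error via fixed B, not W2.T
    W2 = W2 - lr * np.outer(e, h)
    W1 = W1 - lr * np.outer(dh, x)
    return W1, W2
```

Empirically, the forward weights tend to align with the fixed feedback weights over training, which is why learning still proceeds despite the asymmetry.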

4:30pm

Deep networks and other machine learning models can give rise to peculiar behavior such as adversarial examples or unfair decisions. This is caused by the fact that most systems rely on correlations rather than causalities and implement their functionality as a black-box mechanism. In recent years, a large number of technologies has been presented that try to open such black boxes and enrich them with elements that enable lay persons to inspect model decisions.

Within the talk, I will take a glimpse at a few recent methods and developments in this context which address different modalities of explanation: How to enrich deep models with global visualization tools to inspect their decision boundary? How to compute feature relevances for decisions in the case of feature interactions? How to explain decisions in terms of exemplars in a meaningful way?
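One widely used, model-agnostic baseline for the feature-relevance question is permutation importance: break one feature at a time and measure the performance drop (a generic sketch; the talk's methods for interacting features are more refined):

```python
import numpy as np

def permutation_relevance(model, X, y, seed=0):
    """Relevance of each feature = accuracy drop when that column is
    shuffled, destroying its relationship with the labels."""
    rng = np.random.default_rng(seed)
    base = np.mean(model(X) == y)
    rel = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        rel.append(base - np.mean(model(Xp) == y))
    return np.array(rel)
```

Permuting features one at a time is exactly where this baseline fails under feature interactions, which motivates the more careful relevance methods discussed in the talk.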

5:00pm

Speaker: Robert Jenssen (UiT The Arctic University of Norway)

This talk will illustrate some new approaches to representation learning where information theoretic criteria play a key role. Particular focus is placed on criteria based on kernel functions both in the context of embedding to the sphere and for a graph-based information bottleneck approach. The talk then turns to feature interpretability, introducing the first representation learning interpretability method which we call “Relax”.

# Fri. June 23, 2023

## 9:00am - 10:30am Technical Session (Chair: Tomaso Poggio)

9:00am

### Title: Modular Continual Learning

Speaker: Marc'Aurelio Ranzato (Google)

9:30am

Most current AI applications (e.g., image classification or, more generally, perception tasks) are based on deep-learning models that are trained on datasets available before deployment. However, in many cases, e.g., in agents that operate in real-world open-ended dynamic environments, it is unrealistic to assume that all the training datasets are available at once for all possible configurations of all potential environments. AI agents should autonomously learn/adapt/extend their deep learning models online. They should continuously collect training data by exploring and interacting with the environment and use them to regularly re-train/fine-tune their deep learning models. In this talk, I will discuss how symbolic planning techniques generate a strategy for automating the process of collecting training data and using them to train agents' deep learning models. Moreover, I will show how an agent can automatically evaluate the quality of the predictions of a deep neural model by identifying, via clustering, the circumstances in which the model's predictions are correct with a certain level of confidence.

10:00am

Automated Machine Learning (AutoML) and Continual Learning have become intensely active research areas in recent years. Interestingly, very little work has been done at the intersection of these fields. It is, however, a very promising area that may resolve some of the most significant limitations of state-of-the-art approaches. Continual learning, for instance, usually assumes that the model architecture remains largely fixed, which allows fast learning from little data but limits the range of tasks it can adapt to. It doesn't explore how this architecture was created or how it could be further adapted to solve different new tasks. AutoML, on the other hand, generally assumes that models have to be created from scratch for every new task, which can be very inefficient and unnatural. The combination of these fields could allow AutoML methods to become much faster by starting from previously trained models, while enabling Continual Learning techniques to adapt their architecture to potentially address any new task. In this talk, we will explore the opportunities that lie in this mostly unexplored area and report on our early endeavors.

## 10:30am - 11:00am Coffee Break

## 11:00am - 1:00pm Technical Session (Chair: Marc'Aurelio Ranzato)

11:00am

Speaker: Battista Biggio (University of Cagliari)

AI security testing has become crucial due to the widespread use of AI-based solutions in safety-critical and security-sensitive systems, and to comply with the emerging regulations. This talk aims to shed light on how to develop practical approaches to evaluate AI security properties in a realistic, pragmatic manner. I will discuss the emerging challenges and opportunities in AI security testing, along with the state-of-the-art techniques developed to date. The presentation will describe several security testing methods and explain the related challenges they pose. It will also quickly touch upon novel approaches derived from adversarial machine learning that can help overcome the existing AI security limitations.

11:30am

### Title: Some considerations on intelligent machines

Speaker: Nello Cristianini (University of Bath)

Artificial Intelligence did not emerge in 2022 with ChatGPT; it is the result of a series of choices that we have made over the past decades. Reviewing these choices helps us understand the trajectory of this field and imagine its future.

12:00pm

We investigate sup-norm convergence of estimators based on deep neural networks in a nonparametric regression problem. Numerous studies have clarified several properties of deep neural networks in this regression problem; however, only a limited choice of norms has been available to evaluate their estimation error, which suggests local instability of neural networks in practical applications of deep learning. In this study, we analyze a neural network estimator obtained with the adversarial training framework. We show that ordinary adversarial training leads to inconsistency in regression problems, and that correcting for its bias allows us to construct a neural network estimator that converges in the sup-norm sense.
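To make the adversarial-training objective concrete in the simplest case: for a linear model f(x) = w·x under an L∞ perturbation of radius ε, the inner maximization has a closed form, since the adversary can shift the residual by at most ε‖w‖₁ (a toy illustration of the training objective, not the paper's estimator or its bias correction):

```python
import numpy as np

def adversarial_sq_loss(w, x, y, eps):
    """Worst-case squared loss for a linear model under an L-inf
    perturbation of radius eps: sup over ||d||_inf <= eps of
    (w.(x + d) - y)^2 = (|w.x - y| + eps * ||w||_1)^2."""
    r = w @ x - y
    return (abs(r) + eps * np.abs(w).sum()) ** 2
```

Adversarial training minimizes this worst-case loss over the training set; the talk's point is that doing so naively biases the regression estimate, and removing that bias is what yields sup-norm consistency.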

12:30pm

### Title: A Theory for Emergence of Complex Skills in Language Models

Speaker: Anirudh Goyal (University of Montreal)

The emergence of complex skills as a result of scaling up LLMs is a major phenomenon (albeit one with no clear definition) for which we lack scientific understanding. Mathematical analysis of gradient-based training seems forbiddingly difficult. We ask: what understanding is possible using existing notions? Using a new statistical framework of language skills together with Scaling Laws, we give an analysis of emergence. It highlights the powerful inductive bias implicit in Scaling Laws. One consequence is a surprising phenomenon: competence at k-tuples of skills is acquired almost as easily as competence at individual skills, despite examples of most k-tuples being missing from the training data. This relates to Chomsky's old point about the "poverty of the stimulus" in human learning.