# Horizon Maths 2018: Artificial Intelligence

**Horizon Maths** will take place on **Friday, November 23rd, 2018** at the **Ecole Normale Supérieure** (29 rue d'Ulm, Paris 5e), in **room Jean Jaurès**. The subject of the conference is **Artificial Intelligence**. The organizers are **Francis Bach** (Inria, ENS), **Gabriel Peyré** (CNRS, ENS) and **Cordelia Schmid** (Inria).

### Speakers

- Florence d'Alché-Buc (Telecom Paristech)
- Alexandre Allauzen (Université Paris-Sud)
- Marco Baroni (Facebook)
- Rémi Munos (Deepmind)
- Naila Murray (Naver)
- Patrick Perez (Valeo)
- Lorenzo Rosasco (Genoa University)
- Joseph Salmon (Université de Montpellier)

### Program

09:00-09:30 Welcome and opening addresses

09:30-10:05 **NLP for all languages: some challenges in machine learning**, **Alexandre Allauzen** (Université Paris-Sud)

10:05-10:40 **Compositional generalization biases in artificial neural networks and natural human beings**, **Marco Baroni** (Facebook)

10:40-11:10 Coffee break

11:10-11:45 **Interferences in Match Kernels**, **Naila Murray** (Naver)

11:45-12:20 **Fast neural solvers**, **Patrick Perez** (Valeo)

12:20-14:00 Lunch

14:00-14:35 **Celer: a Fast Solver for the Lasso with Dual Extrapolation**, **Joseph Salmon** (Université de Montpellier)

14:35-15:10 **Unconventional regularization for efficient machine learning**, **Lorenzo Rosasco** (Genoa University)

15:10-15:40 Coffee break

15:40-16:10 **Distributional reinforcement learning**, **Rémi Munos** (Deepmind)

16:10-16:45 **Auto-encoding any data with Kernel Auto-Encoder**, **Florence d'Alché-Buc** (Telecom Paristech)

### Abstracts and videos

**Welcome addresses from Francis Bach (Inria, ENS), organizer of the conference, and Gabrielle Costa de Beauregard, representing Région Île-de-France.**

Welcome addresses, Francis Bach (Inria, ENS) and Gabrielle Costa de Beauregard (Région Île-de-France) from Contact FSMP on Vimeo.

*NLP for all languages: some challenges in machine learning*, Alexandre Allauzen (Université Paris-Sud)

In the last decades, statistical models and deep-learning approaches have renewed research in Natural Language Processing (NLP), and many applications are now widely used in our everyday life. Their success relies on the availability of large (annotated) corpora tailored to build robust and useful software. However, for the vast majority of languages around the world, access to such linguistic resources is uneven and patchy. Moreover, the wide linguistic diversity across languages implies challenges for research in machine learning. This presentation will focus on two of them: the large vocabulary challenge for neural language models, and a Bayesian approach for unsupervised natural language documentation.

NLP for all languages: some challenges in machine learning, Alexandre Allauzen (Université Paris-Sud) from Contact FSMP on Vimeo.

*Compositional generalization biases in artificial neural networks and natural human beings*, Marco Baroni (Facebook)

In the last decade, "deep" artificial neural networks have led to astonishing empirical progress in tasks that require considerable generalization skills. Recurrent neural networks trained to translate from one language to the other must deal almost exclusively with sentences they have not been exposed to in training. A network playing Go against a human master must handle board configurations that it has never seen before. We must conclude that neural networks possess compositional abilities: they are able to combine knowledge they have previously acquired in novel ways, in order to solve new problems. However, more direct inquiries into how neural networks handle explicit compositional problems suggest that they do not efficiently discover the expected combinatorial strategies. For example, our recent experiments show that a network trained to execute instructions such as "run", "run twice" and "jump" is not able to generalize to "jump twice". In this talk, I will survey our experiments probing the compositional generalization abilities of neural networks, and report ongoing work in which we test human subjects in comparable tasks. I will conclude with some conjectures about which priors emerging from the human data might serve as inspiration, if we want to instill more systematic compositional capabilities into artificial neural networks.

(Joint work with Brenden Lake, Joao Loula and Tal Linzen.)
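To make the "jump twice" example concrete, here is a minimal sketch of what a fully compositional rule looks like when written by hand. It is not the experimental setup of the talk: the dictionaries, the `interpret` function and the action names are hypothetical and chosen only for illustration.

```python
# Toy, hand-written interpreter for a tiny command language.
# All names here are hypothetical and only illustrate the idea of a
# compositional rule; this is not the benchmark used in the talk.
PRIMITIVES = {"run": "RUN", "jump": "JUMP"}  # primitive command -> action
MODIFIERS = {"twice": 2}                     # modifier -> repetition count


def interpret(command):
    """Map a command such as 'jump twice' to its list of actions."""
    words = command.split()
    action = PRIMITIVES[words[0]]
    repeat = MODIFIERS[words[1]] if len(words) > 1 else 1
    return [action] * repeat


# Commands seen during training in the abstract's example, and the
# held-out combination the trained network fails to execute.
train_commands = ["run", "run twice", "jump"]
held_out_command = "jump twice"

for cmd in train_commands + [held_out_command]:
    print(cmd, "->", interpret(cmd))
```

Because the rule treats the primitive and the modifier independently, it handles the held-out combination without ever having seen it. This is the kind of systematic behaviour the abstract reports is not reliably learned from examples alone.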

Compositional generalization biases in artificial neural networks and natural human beings, Marco Baroni (Facebook) from Contact FSMP on Vimeo.

*Interferences in Match Kernels*, Naila Murray (Naver)

We consider the design of an image representation that embeds and aggregates a set of local descriptors into a single vector. Popular representations of this kind include the bag-of-visual-words, the Fisher vector and the VLAD. When two such image representations are compared with the dot-product, the image-to-image similarity can be interpreted as a match kernel. In match kernels, one has to deal with interference, i.e. with the fact that even if two descriptors are unrelated, their matching score may contribute to the overall similarity. We formalise this problem and propose two related solutions, both aimed at equalising the individual contributions of the local descriptors in the final representation. These methods modify the aggregation stage by including a set of per-descriptor weights. They differ by the objective function that is optimised to compute those weights. The first is a "democratisation" strategy that aims at equalising the relative importance of each descriptor in the set comparison metric. The second one involves equalising the match of a single descriptor to the aggregated vector. These concurrent methods give a substantial performance boost over standard aggregation methods, as demonstrated by our experiments on standard public image retrieval benchmarks.
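The match-kernel interpretation can be checked on a toy example. The sketch below assumes plain sum aggregation with optional per-descriptor weights. The function names and the descriptor values are made up for illustration, and the way the talk computes the weights is not reproduced here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate(descriptors, weights=None):
    """Weighted sum of local descriptors (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(descriptors)
    dim = len(descriptors[0])
    return [sum(w * d[k] for w, d in zip(weights, descriptors))
            for k in range(dim)]


def match_kernel(xs, ys, wx=None, wy=None):
    """Sum of weighted dot products over ALL descriptor pairs."""
    wx = wx if wx is not None else [1.0] * len(xs)
    wy = wy if wy is not None else [1.0] * len(ys)
    return sum(a * b * dot(x, y)
               for a, x in zip(wx, xs)
               for b, y in zip(wy, ys))


# Hypothetical 2-D local descriptors for two images.
image_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
image_b = [[1.0, 0.0], [0.8, 0.6]]

similarity_from_vectors = dot(aggregate(image_a), aggregate(image_b))
similarity_from_pairs = match_kernel(image_a, image_b)
print(similarity_from_vectors, similarity_from_pairs)
```

The two quantities agree up to floating-point rounding because the dot product is bilinear. Every pair of descriptors contributes to the sum, including pairs that are not true matches, which is the interference effect described above.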

Interferences in Match Kernels, Naila Murray (Naver) from Contact FSMP on Vimeo.

*Fast neural solvers*, Patrick Perez (Valeo)

Modern artificial neural networks dominate classic machine learning tasks, classification and regression alike, in a wide range of application domains. What is probably less known is that they also offer new ways to attack certain optimization problems, such as inverse problems arising in physics or image processing. While a variety of powerful iterative solvers usually exist for such problems, deep learning may offer an appealing alternative: With or without supervision, neural networks can be trained to produce approximate solutions, possibly of lower quality, but orders of magnitude faster and with no need for initialization. We shall discuss different ways to design and train such fast neural solvers, with examples from computer vision and graphics.

Fast neural solvers, Patrick Perez (Valeo) from Contact FSMP on Vimeo.

*Celer: a Fast Solver for the Lasso with Dual Extrapolation*, Joseph Salmon (Université de Montpellier)

Convex sparsity-inducing regularizations are ubiquitous in high-dimensional machine learning, but solving the resulting optimization problems can be slow. To accelerate solvers, state-of-the-art approaches reduce the size of the optimization problem at hand. In the context of regression, this can be achieved either by discarding irrelevant features (screening techniques) or by prioritizing features likely to be included in the support of the solution (working set techniques). Duality comes into play at several steps in these techniques. Here, we propose an extrapolation technique starting from a sequence of iterates in the dual that leads to the construction of improved dual points. This enables a tighter control of optimality as used in stopping criteria, as well as better screening performance of Gap Safe rules. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules. Thanks to our new dual point construction, we show significant computational speedups on multiple real-world problems.

(This is joint work with M. Massias and A. Gramfort.)
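As background for the role of duality in the abstract above, the sketch below computes a Lasso duality gap from a dual point obtained by rescaling the residual, which is the standard construction. It does not implement the dual extrapolation proposed in the talk. The unnormalized objective, the function name and the toy data are assumptions made for illustration.

```python
def lasso_duality_gap(X, y, w, lam):
    """Return (primal, dual, gap) for the Lasso at coefficients w.

    Assumed formulation (no 1/n scaling):
      primal: 0.5 * ||y - X w||^2 + lam * ||w||_1
      dual:   0.5 * ||y||^2 - 0.5 * lam^2 * ||theta - y / lam||^2
              subject to ||X^T theta||_inf <= 1
    """
    n, p = len(X), len(X[0])
    residual = [y[i] - sum(X[i][j] * w[j] for j in range(p))
                for i in range(n)]
    primal = 0.5 * sum(r * r for r in residual) + lam * sum(abs(c) for c in w)

    # Standard dual point: rescale the residual so that the constraint
    # ||X^T theta||_inf <= 1 holds (this is NOT the extrapolated point).
    corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
    scale = max(lam, max(abs(c) for c in corr))
    theta = [r / scale for r in residual]

    dual = (0.5 * sum(v * v for v in y)
            - 0.5 * lam ** 2 * sum((t - v / lam) ** 2
                                   for t, v in zip(theta, y)))
    return primal, dual, primal - dual


# Hypothetical toy problem: 3 samples, 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

print(lasso_duality_gap(X, y, [0.0, 0.0], lam=1.0))
print(lasso_duality_gap(X, y, [0.5, 1.0], lam=1.0))
```

By weak duality the gap is non-negative for any feasible dual point, so it bounds the suboptimality of `w` and can serve as a stopping criterion. A dual point closer to the optimum gives a tighter bound, which is what the improved dual points in the abstract aim for.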

Celer: a Fast Solver for the Lasso with Dual Extrapolation, Joseph Salmon (Université de Montpellier) from Contact FSMP on Vimeo.

*Unconventional regularization for efficient machine learning*, Lorenzo Rosasco (Genoa University)

Classic algorithm design is based on penalizing or imposing explicit constraints on an empirical objective function, which is eventually optimized. In practice, however, a number of algorithmic solutions are employed. Their effect on final performance is hard to assess a priori, and the assessment is typically done empirically. In this talk, we consider a linear least squares framework and take a regularization perspective to understand the effect of two commonly used ideas: sketching and iterative optimization. Our analysis highlights the role and the interplay of different algorithmic choices, including training time, step and mini-batch size, and the choice of sketching, among others. Indeed, one can view all these choices as controlling a form of "algorithmic regularization". The obtained results provide practical guidelines for algorithm design and suggest that optimal statistical accuracy can be achieved while dramatically improving computational efficiency. Theoretical findings are illustrated in the context of large scale kernel methods, where we develop the first solvers able to scale to millions of training points.
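One simple instance of algorithmic regularization is early stopping of gradient descent on a least squares problem. The sketch below is a minimal illustration under assumed toy data, step size and function name. It is not the analysis or the solvers presented in the talk.

```python
import math


def gradient_descent_path(X, y, step, n_iters):
    """Gradient descent on 0.5 * ||X w - y||^2 from w = 0.

    Returns the list of all iterates, starting with the zero vector.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    path = [list(w)]
    for _ in range(n_iters):
        residual = [sum(X[i][j] * w[j] for j in range(p)) - y[i]
                    for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n))
                for j in range(p)]
        w = [w[j] - step * grad[j] for j in range(p)]
        path.append(list(w))
    return path


# Hypothetical toy problem; the step is below 1 / (largest eigenvalue of
# X^T X), which is 1/3 for this X.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
path = gradient_descent_path(X, y, step=0.1, n_iters=50)
norms = [math.sqrt(sum(c * c for c in w)) for w in path]

for t in (0, 1, 5, 20, 50):
    print(t, norms[t])
```

Starting from zero with a small enough step, the norm of the iterate grows with the number of iterations, so stopping early returns a smaller-norm solution. In that sense the iteration count acts like a regularization parameter, without any explicit penalty in the objective.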

Unconventional regularization for efficient machine learning, Lorenzo Rosasco (Genoa University) from Contact FSMP on Vimeo.

*Distributional reinforcement learning*, Rémi Munos (Deepmind)

I'll talk about recent work related to distributional reinforcement learning, where one models the full return distribution instead of its expectation. We generalize Bellman equations to this setting and describe a deep-learning approach for approximating the distributions. We report experiments on Atari games.
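For readers unfamiliar with the setting, the contrast between the two Bellman equations can be written as follows. This is the usual textbook formulation, and the notation is an assumption that may differ from the one used in the talk: $\pi$ is a policy, $\gamma$ a discount factor, $R(x,a)$ the random reward, and $Z^\pi(x,a)$ the random return.

$$
Q^\pi(x,a) = \mathbb{E}\left[R(x,a)\right] + \gamma\, \mathbb{E}\left[Q^\pi(X',A')\right],
\qquad
Z^\pi(x,a) \overset{D}{=} R(x,a) + \gamma\, Z^\pi(X',A'),
$$

where $X' \sim P(\cdot \mid x,a)$, $A' \sim \pi(\cdot \mid X')$, and $\overset{D}{=}$ denotes equality in distribution. The classical value is recovered as $Q^\pi(x,a) = \mathbb{E}\left[Z^\pi(x,a)\right]$.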

Distributional reinforcement learning, Rémi Munos (Deepmind) from Contact FSMP on Vimeo.

*Auto-encoding any data with Kernel Auto-Encoder*, Florence d'Alché-Buc (Telecom Paristech)

This paper investigates a novel algorithmic approach to data representation based on kernel methods. Assuming that the observations lie in a Hilbert space $\mathcal{X}$, the introduced Kernel Autoencoder (KAE) is the composition of mappings from vector-valued Reproducing Kernel Hilbert Spaces (vv-RKHSs) that minimizes the expected reconstruction error. Beyond a first extension of the auto-encoding scheme to possibly infinite dimensional Hilbert spaces, KAE further allows autoencoding any kind of data by choosing $\mathcal{X}$ to be itself an RKHS. A theoretical analysis of the model is carried out, providing a generalization bound, and shedding light on its connection with Kernel Principal Component Analysis. The proposed algorithms are then detailed at length: they crucially rely on the form taken by the minimizers, revealed by a dedicated Representer Theorem. Finally, numerical experiments on both simulated data and real labeled graphs (molecules) provide empirical evidence of the KAE's performance.

(Joint work with Pierre Laforgue and Stephan Clémençon.)

Auto-encoding any data with Kernel Auto-Encoder, Florence d'Alché-Buc (Telecom Paristech) from Contact FSMP on Vimeo.