MIDI seminars

Next events

March 16, 2023 – Faïcel Chamroukhi (IRT-X, Université de Caen)

Principled and interpretable learning via new mixtures for heterogeneous high-dimensional and distributed data.

Modern machine learning algorithms deal with real-world data problems that arise in complex scenarios, including unlabeled heterogeneous data, prediction with high-dimensional or functional predictors, and massive or distributed data. In this framework, we present a new family of mixture-of-experts (ME) models that enjoy denseness properties, statistical estimation guarantees, and easier interpretation, to deal with heterogeneous data with high-dimensional and functional inputs and in a distributed scenario. First, we present ME models for high-dimensional predictors, or for predictors that are potentially noisy observations of entire functions, together with Lasso-like regularizations and EM-Lasso optimization to provide sparse and interpretable representations. Then, we consider the situation in which the data are potentially massive and may be distributed for computational purposes, or are distributed by nature. We present a distributed learning approach for these ME models and an aggregation strategy based on optimal transport to aggregate local estimators fitted in parallel, and provide a reduced estimator that enjoys statistical guarantees and is computationally efficient.

Bio: Since September 2022, Faïcel Chamroukhi has been the scientific lead for data science and artificial intelligence at IRT SystemX, on secondment from Université de Caen, where he has been a professor of statistics and data science since September 2016. His primary research interests include statistical inference and machine learning, latent variable models and unsupervised learning in high-dimensional and large-scale scenarios.

Past events

February 10, 2023 – Xuan Son Nguyen

Neural networks on matrix manifolds

Data lying on matrix manifolds are commonly encountered in applied areas such as medical imaging, shape analysis, drone classification, image recognition, and human behavior analysis. Due to the non-Euclidean nature of these data, traditional optimization algorithms usually fail to obtain good results in the matrix manifold setting. While a number of approaches have been developed to generalize traditional optimization algorithms to this setting, little work has translated the language of differential geometry into basic operations on matrix manifolds that can serve as computational building blocks of neural network models on these manifolds, just as basic operations on Euclidean spaces are used in deep neural networks. In this talk, I will present an approach for building neural networks on matrix manifolds based on the theory of gyrogroups and gyrovector spaces.

Slides: https://cloud.etis-lab.fr/index.php/f/10541664

January 19, 2023 – Loïc Jézéquel, Hervé-Madelein Attolou

Loïc Jézéquel: Fraud detection via deep anomaly detection approaches

Detecting observations that stray from a well-defined normal baseline lies at the center of many modern machine learning challenges. Given the complexity of the anomalous class and the high cost of obtaining labeled anomalies, anomaly detection differs considerably from classical binary classification. This has given rise to many deep anomaly detection (deep AD) methods that produce more stable results when the training dataset is extremely unbalanced. Deep AD has been successful in various applications such as fraud detection, medical imaging, video surveillance, and visual defect detection.
In this talk, I will present my work on deep anomaly detection with applications to generic one-vs-all image classification and face presentation attack detection (FPAD). First, we will focus on the one-class setting, where only normal samples are available. Then we will consider the unbalanced setting, which provides a few anomalies during training. Finally, we combine these two settings into a single efficient unified model. We improve the one-vs-all relative error by up to 19% in the one-class setting and up to 28% in the unbalanced setting. We also keep an edge on FPAD applications, with a relative error improvement of up to 14% on paper prints.

Hervé-Madelein Attolou: Why-Not explanations for recommenders

Recommenders suggest pertinent items to users from a wide variety of possibilities. However, it is crucial for both the user and the system developer to understand why the system recommends certain items (why), and why it does not recommend others that they might expect (why not). In this thesis, we aim to explore explanations for the why-not problem, which is less studied but equally important.

December 13, 2022 – Guillaume Renton

Graph Neural Networks (GNN).

Abstract: Although theorised about fifteen years ago, the scientific community’s interest in graph neural networks has only really taken off recently. These models aim to transpose the representation learning capacity inherent in deep neural networks onto graph data, by learning hidden states associated with the graph nodes. These hidden states are computed and updated according to the information contained in the neighborhood of each node. In this presentation, we give an overview of the different GNN strategies, namely spatial and spectral GNNs, with their advantages and weaknesses. This will lead to several proposed models, on the one hand to reduce the number of parameters, and on the other hand to include the edge attributes.
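The neighborhood-based update of hidden states mentioned in the abstract can be illustrated with a minimal sketch. It is not one of the speaker's models: it assumes a plain mean over each node and its neighbors, with no learned weights or nonlinearity, and the graph, node names, and the function `message_passing_step` are hypothetical.

```python
def message_passing_step(hidden, neighbors):
    """One update round: each node's new state is the mean of its own
    state and its neighbors' states (assumption: no learned parameters)."""
    updated = {}
    for node, state in hidden.items():
        # Gather the node's own state plus the states of its neighbors.
        group = [state] + [hidden[n] for n in neighbors.get(node, [])]
        dim = len(state)
        updated[node] = [sum(vec[i] for vec in group) / len(group)
                         for i in range(dim)]
    return updated


# Hypothetical path graph a - b - c with 2-dimensional hidden states.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
hidden = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}

hidden = message_passing_step(hidden, neighbors)
print(hidden)
```

After one round, node b already mixes information from both a and c; stacking several rounds lets information travel further along the graph.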

Bio: Guillaume Renton obtained his PhD from the University of Rouen in 2021. Since 2022, he has been an assistant professor in the Multimedia Indexing and Data Integration (MIDI) team of ETIS at ENSEA. His research focuses on machine learning applied to structured data represented as graphs, with a particular interest in the analysis and application of deep learning methods dedicated to graphs, called Graph Neural Networks.

Video: https://cloud.etis-lab.fr/index.php/f/9548038

November 25, 2022 – Ioannis Tsamardinos

Automated Machine Learning for Knowledge Discovery

Abstract: Automated Machine Learning, or AutoML, is a newly emerging field in Machine Learning. It promises to automate predictive modeling, democratize machine learning to non-experts, boost the productivity of experts, ensure the statistical validity of the modeling process, and even surpass human experts in quality. AutoML should strive to produce not only a high-quality model, but also all the information, explanations, interpretations, and decision support that a human expert would provide. In this talk, we’ll present the challenges of AutoML and the design choices we made to construct the Just Add Data Bio, or JADBio for short, AutoML platform. JADBio is particularly suited for very high dimensional data with millions of features, and low-sample datasets that present statistical estimation challenges. In particular, JADBio focuses on Knowledge Discovery in the form of Feature Selection and identifying one or more minimal-size subsets that lead to the optimal model. Feature Selection is often the primary goal of the analysis as a first step to understanding the causal relations in our data. We’ll also discuss ongoing efforts to construct an Automated Causal Discovery engine that strives to take AutoML a step further and return the best possible Causal Model that fits the data.

Short Bio: Ioannis Tsamardinos, Ph.D., is a Professor at the Computer Science Department of the University of Crete, and CEO and co-founder of JADBio (Gnosis Data Analysis PC), a University start-up. He obtained his Ph.D. from the Intelligent Systems Program at the University of Pittsburgh in 2001. Prof. Tsamardinos’ main research directions include machine learning, bioinformatics, and artificial intelligence. More specifically, his computer science work emphasizes automated machine learning, feature selection, and causal discovery. Prof. Tsamardinos has over 140 publications in international journals, conferences, and books. Distinctions with colleagues and students include a Gold Medal in the Student Paper Competition in MEDINFO 2004, the Outstanding Student Paper Award in AIPS 2000, the NASA Group Achievement Award for participation in the Remote Agent team, and others. Statistics on recognition of work include more than 10000 citations (1000+ a year) and an h-index of 40 (as estimated by Google Scholar). Ioannis has been awarded the European and Greek national grants of excellence, the ERC Consolidator and the ARISTEIA II grants, respectively.

Video: https://cloud.etis-lab.fr/index.php/f/9547953

October 13, 2022 – Dominique Laurent

Handling Inconsistencies in Tables with Nulls and Functional Dependencies

Dominique Laurent (ETIS-MIDI) & Nicolas Spyratos (LISN, Paris-Saclay)

Abstract: Integrating tables produced by various data sources is an increasingly frequent task in current applications. Moreover, these tables are very often incomplete and must satisfy key constraints or, more generally, functional dependencies. It is well known that in such a context, even if the sources produce data satisfying the functional dependencies, their integration, which is in fact the union of the produced tables, does not necessarily satisfy these functional dependencies.
The proposed approach is set in this context, in which the data take the form of a table that is incomplete and potentially inconsistent with respect to a given set of functional dependencies. Such tables have not been studied in the literature, but many papers address the following cases:
(1) The table is incomplete but satisfies the functional dependencies: the usual semantics is that of weak instances, and the associated algorithm is known as the chase (Universal relation [Fagin, Mendelson, Ullman – TODS 1982]).
(2) The table is complete but does not satisfy the functional dependencies (Consistent query answering [Koutris, Wijsen – TODS 2017]).
The semantics proposed in our approach is strongly inspired by Belnap's four-valued logic and extends the partition semantics of [Spyratos – TODS 1987]. We thus generalize the chase algorithm and show how one of the four truth values (true, false, unknown, inconsistent) can be associated with each tuple.
The talk will show how to define and build such a semantics. Restricting attention to true or inconsistent tuples, the link with ‘consistent query answering’ approaches will then be sketched.
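As a small illustration of the starting observation (two sources that each satisfy a functional dependency can violate it once their tables are united), here is a minimal sketch. It is not the authors' semantics or the chase algorithm; the tables, the attribute names, the function `satisfies_fd`, and the choice to skip rows containing nulls are all assumptions made for this example.

```python
from typing import Optional

Row = dict[str, Optional[str]]  # None stands for a null (missing) value


def satisfies_fd(rows: list[Row], lhs: list[str], rhs: str) -> bool:
    """Check the functional dependency lhs -> rhs on rows.

    Assumption for this sketch: a row with a null in lhs or rhs is
    skipped, so nulls never cause a violation on their own.
    """
    seen: dict[tuple, str] = {}
    for row in rows:
        key = tuple(row.get(a) for a in lhs)
        value = row.get(rhs)
        if value is None or any(k is None for k in key):
            continue
        # Two rows agreeing on lhs must agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Two hypothetical sources, each consistent with the dependency emp -> dept.
source_a = [{"emp": "e1", "dept": "sales", "city": None}]
source_b = [{"emp": "e1", "dept": "hr", "city": "Paris"},
            {"emp": "e2", "dept": None, "city": "Caen"}]

merged = source_a + source_b  # integration as a plain union of the tables

print(satisfies_fd(source_a, ["emp"], "dept"))
print(satisfies_fd(source_b, ["emp"], "dept"))
print(satisfies_fd(merged, ["emp"], "dept"))
```

Each source passes the check on its own, while the union fails because employee e1 is assigned two different departments.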

This work has been published under the same title in the Journal of Intelligent Information Systems.

Presentation: https://cloud.etis-lab.fr/index.php/f/8468124

September 29, 2022 – Claudia Paris

Timely monitoring and short-term forecasting of crop phenology integrating Satellite Image Time Series (SITS) and meteorological information

Abstract: Increasingly frequent extreme weather conditions and changes in temperature and precipitation strongly affect agriculture and pose a threat to sustainable food production. How crops are affected by adverse weather conditions strongly depends on the crop’s stage of development. In this context, timely crop phenology monitoring systems are needed to understand and assess the impact of climate change on crop production. To this end, information provided by long and dense Satellite Image Time Series (SITS) acquired at high spatial (e.g., 10 m) and temporal (e.g., 5 days) resolutions, as well as by the meteorological parameters continuously collected by weather stations (e.g., every 15 minutes), can be exploited. Despite recent advances in Earth Observation (EO) data and artificial intelligence (AI), little has been done to investigate the potential of combining detailed meteorological information (e.g., temperature and precipitation) and SITS to forecast and monitor crop phenology. Most studies on crop phenology are site-specific, which hampers their generalization capability in space and time, mainly because the same crop type can have substantially different phenology in different areas due to different climatic conditions and management decisions. This talk will provide an overview of the limitations, challenges and opportunities of combining EO and meteorological data to bring new insights into the relations between crop plant phenology and extreme weather events.

Bio: Claudia Paris received the B.S. and M.S. (summa cum laude) degrees in telecommunication engineering and the Ph.D. degree in information and communication technology from the University of Trento, Trento, Italy, in 2010, 2012, and 2016, respectively. She is currently an Assistant Professor with the Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, the Netherlands. Her main research interests include image and signal processing, machine learning and deep learning with applications to remote sensing image analysis and, in particular, the design of novel and automatic methods for large-scale environmental monitoring. Her research interests also cover classification and fusion of multisource remote sensing data (LiDAR data, hyperspectral, multispectral and high resolution optical images), multi-temporal image analysis, domain-adaptation methods, land cover map update, and remote sensing single date and time series image classification. She has been conducting research on these topics in the framework of national and international projects. She is a member of the scientific and programme committee of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) and the SPIE International Symposium on Remote Conferences, respectively, and is also a referee for several international journals. Dr Paris won the prestigious Symposium Prize Paper Award (SPPA) at the International Symposium on Geoscience and Remote Sensing 2016 (Beijing, China, 2016) and the International Symposium on Geoscience and Remote Sensing 2017 (Fort Worth, Texas, USA, 2017).

Presentation: https://cloud.etis-lab.fr/index.php/f/8418331