Title: Coalescing Data Repair: How to recover from losing your sensor data?
With the emergence of the Internet of Things (IoT), time series streams have become ubiquitous in our daily life. Recording
such data is rarely a neat process, as sensor failures, power outages, and transmission problems frequently occur, yielding
occasional blocks of data that go missing in one or multiple time series. These blocks can be large, as it can take an arbitrarily long
time to fix a faulty sensor. Most data management systems assume that no such gaps exist in the data. Even if a system can work with
incomplete data (e.g., NULLs in databases), leaving missing values untreated can cause incorrect or ill-defined results.
Missing values can also hinder downstream applications such as classification.
In this talk, I will introduce the problem of missing values in time series data and discuss why traditional
statistical solutions are ill-suited to repair incomplete time series. Next, I will present different batch, streaming,
and in-database techniques to recover large missing blocks in time series. I will also describe our ImputeBench benchmark, the most
comprehensive benchmark to date for missing-value recovery techniques. Finally, I will discuss our solutions to curate and
repair other types of time series inconsistencies such as anomalies or outliers.
The speaker is a senior researcher at the Department of Computer Science of the University of Fribourg, Switzerland.
He obtained his PhD from the University of Zurich, Switzerland, under the supervision of Prof. Michael Böhlen.
His research interests include time series analytics, data repair, and temporal data storage. His time series benchmark won
the VLDB 2020 Most Reproducible Paper Award. He has served on the program committees of many conferences, including
VLDB, ICDE, and WWW, and as a regular reviewer for many journals, including the VLDB Journal and TKDE.
He also served as a Senior PC member for the 29th ACM International Conference on Information and Knowledge Management (CIKM)
and as a mentor for PhD students at conferences such as EDBT 2013 and CIKM 2020.