Cartesian Pattern Matching

Abstract : Cartesian trees are associated with strings of numbers. They are structured as heaps, and the original string can be recovered by a symmetric (in-order) traversal of the tree. Let x be a string of numbers of length m. The Cartesian tree of x is the binary tree where:

– the root corresponds to the index i of the minimal element of x (if there are several occurrences of the minimal element, the leftmost one is chosen);

– the left subtree of the root corresponds to the Cartesian tree of x[1..i-1];

– the right subtree of the root corresponds to the Cartesian tree of x[i+1..m].

Cartesian pattern matching can be applied to find patterns in time series data.
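The recursive definition above can be sketched directly. A minimal illustration (using Python's 0-based indexing instead of the 1-based indices above):

```python
def cartesian_tree(x, lo=0, hi=None):
    """Cartesian tree of x[lo:hi] as nested tuples
    (index, left_subtree, right_subtree), or None if empty."""
    if hi is None:
        hi = len(x)
    if lo >= hi:
        return None
    # Root: leftmost occurrence of the minimum of x[lo:hi]
    # (the tuple key breaks ties in favor of the smallest index).
    i = min(range(lo, hi), key=lambda j: (x[j], j))
    return (i, cartesian_tree(x, lo, i), cartesian_tree(x, i + 1, hi))

def inorder(t):
    """Symmetric (in-order) traversal, yielding indices left to right."""
    if t is None:
        return
    i, left, right = t
    yield from inorder(left)
    yield i
    yield from inorder(right)

x = [3, 1, 4, 1, 5, 9, 2, 6]
t = cartesian_tree(x)
# The traversal recovers the original index order, as stated above.
assert list(inorder(t)) == list(range(len(x)))
```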

In this talk, we will review the existing Cartesian pattern matching algorithms and describe in more detail solutions to the following problems:

– given a text and a pattern that consist of sequences of numbers, find all the substrings of the text that have the same Cartesian tree as the pattern;

– given a text and a finite set of patterns that consist of sequences of numbers, find all the substrings of the text that have the same Cartesian tree as one of the patterns;

– given two strings that consist of sequences of numbers, find the length of the longest substrings, one of each string, that have the same Cartesian tree.
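A useful building block in the Cartesian tree matching literature is the parent-distance encoding: PD[i] is the distance to the nearest previous position j with x[j] <= x[i] (0 if none). Two equal-length strings have the same Cartesian tree exactly when their parent-distance sequences coincide, so one encoding pass reduces these problems to ordinary string matching. A sketch:

```python
def parent_distance(x):
    """Parent-distance encoding of the string of numbers x."""
    stack = []  # stack of positions whose values are non-decreasing bottom-to-top
    pd = []
    for i, v in enumerate(x):
        # Pop strictly greater values; ties keep the earlier (leftmost) element,
        # matching the leftmost-minimum rule of the Cartesian tree definition.
        while stack and x[stack[-1]] > v:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd

# These two strings have the same Cartesian tree (same relative order pattern):
assert parent_distance([3, 1, 4, 1, 5]) == parent_distance([9, 2, 7, 2, 8])
```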

On a Greedy Approach for Genome Scaffolding

Abstract : Scaffolding is a bioinformatics problem that aims to complete the contig assembly process by determining the relative positions and orientations of the assembled contigs. It can be seen as a path and cycle cover problem on a particular graph called the “scaffold graph”. In this talk, we provide some complexity results on this problem. We also adapt a greedy approximation algorithm for complete graphs so that it works on a special class of graphs intended to be close to real instances. The described algorithm is the first polynomial-time approximation algorithm designed for this problem on non-complete graphs.
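This is not the algorithm from the talk, but a generic greedy heuristic of the same flavor for covering a weighted graph with vertex-disjoint paths: scan edges by decreasing weight and keep an edge only if every endpoint keeps degree at most 2 and no cycle is closed (tested with union-find). The instance below is purely hypothetical.

```python
def greedy_path_cover(n, edges):
    """Greedy path cover: edges is a list of (weight, u, v) on vertices 0..n-1;
    returns the kept edges, heaviest first."""
    parent = list(range(n))          # union-find to detect cycles
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    degree = [0] * n
    kept = []
    for w, u, v in sorted(edges, reverse=True):
        # Keep the edge only if both endpoints stay on a path (degree <= 2)
        # and the edge does not close a cycle.
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            kept.append((w, u, v))
    return kept

# Toy instance: 4 contigs with hypothetical link weights.
edges = [(5, 0, 1), (4, 1, 2), (3, 0, 2), (2, 2, 3)]
assert greedy_path_cover(4, edges) == [(5, 0, 1), (4, 1, 2), (2, 2, 3)]
```

The edge (3, 0, 2) is rejected because it would close the cycle 0-1-2; the result is the single path 0-1-2-3.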

Universal graphs and implicit representations for planar graphs

Abstract: The number of planar graphs on n vertices (unlabeled, up to isomorphism) is of order c^(n+o(n)), where c = O(1). What is the smallest graph that contains all of them as induced subgraphs? We show here that n^(1+o(1)) vertices suffice. This result, clearly asymptotically optimal, relies on major advances in structural graph theory, which we will also discuss.

Fully dynamic longest increasing subsequence

We revisit the problem of maintaining the longest increasing subsequence (LIS) of an array under

(i) inserting an element, and
(ii) deleting an element.

In a recent breakthrough, Mitzenmacher and Seddighin [STOC 2020] designed an algorithm that maintains an O((1/ϵ)^O(1/ϵ))-approximation of LIS under both operations with worst-case update time Õ(n^ϵ), for any constant ϵ>0. We exponentially improve on their result by designing an algorithm that maintains a (1+ϵ)-approximation of LIS under both operations with worst-case update time Õ(ϵ^(−5)). Instead of working with the grid packing technique introduced by Mitzenmacher and Seddighin, we take a different approach, building on a new tool that might be of independent interest: LIS sparsification.
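For reference, the naive dynamic baseline under the two update operations above is to recompute the LIS length from scratch with the classical patience-sorting method, at a cost of O(n log n) per update; the algorithms discussed here maintain (approximate) answers far faster than this. A sketch of the static computation:

```python
import bisect

def lis_length(a):
    """Length of a longest strictly increasing subsequence of a,
    via patience sorting in O(n log n)."""
    tails = []  # tails[k] = smallest possible tail of an increasing subsequence of length k+1
    for v in a:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

a = [3, 1, 4, 1, 5, 9, 2, 6]
assert lis_length(a) == 4   # e.g. 1, 4, 5, 9

# Naive dynamic use: after each insertion or deletion, recompute from scratch.
a.insert(4, 7)
a.pop(0)
assert lis_length(a) == 4
```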

While this essentially settles the complexity of the approximate version of the problem, the exact version seems more elusive. The only known sublinear solution was given very recently by Kociumaka and Seddighin [STOC 2021] and takes Õ(n^(2/3)) time per update. We show polynomial conditional lower bounds for two natural extensions of this problem: weighted LIS and LIS in any subarray.

Joint work with Wojciech Janczewski.

Computing Runs in Strings Over General Ordered Alphabets

A run in a string is a maximal periodic substring. For example, the string “bananatree” contains exactly the runs “anana” and “ee”. Any string of length n contains fewer than n runs, and computing all runs in a string over a linearly-sortable alphabet takes O(n) time (Bannai et al., SIAM J. Comput. 2017).
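The definition can be checked with an illustrative brute-force sketch (polynomial time, nothing like the linear-time algorithm of the talk): a run is a maximal substring whose smallest period p fits at least twice, reported here as a triple (start, end, period).

```python
def smallest_period(s):
    """Smallest p such that s[k] == s[k-p] for all k >= p."""
    n = len(s)
    return next(p for p in range(1, n + 1)
                if all(s[k] == s[k - p] for k in range(p, n)))

def runs(x):
    """All runs of x as triples (a, b, p): x[a:b] is maximal with smallest period p."""
    n = len(x)
    found = set()
    for i in range(n):
        for j in range(i + 2, n + 1):
            p = smallest_period(x[i:j])
            if j - i >= 2 * p:        # the period repeats at least twice
                a, b = i, j           # extend to maximality, keeping period p
                while a > 0 and x[a - 1] == x[a - 1 + p]:
                    a -= 1
                while b < n and x[b] == x[b - p]:
                    b += 1
                found.add((a, b, p))
    return sorted(found)

# Matches the example above: exactly the runs "anana" and "ee".
r = runs("bananatree")
assert ["bananatree"[a:b] for (a, b, p) in r] == ["anana", "ee"]
```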

We consider the problem of computing all runs in a string over a general ordered alphabet (i.e., the symbols are totally ordered, and testing the order of any two symbols takes constant time, but linear-time sorting of the symbols is not possible). We show that by exploiting powerful properties of the Lyndon array, we can achieve optimal linear time and space. This improves the previously best known result by Crochemore et al., who solved the problem in O(nα(n)) time, where α is the inverse Ackermann function (SPIRE 2016). Our result positively answers the at least 29-year-old question of whether square-freeness can be tested in linear time over general ordered alphabets (Breslauer, PhD thesis, Columbia University 1992).

The presentation will cover a brief introduction to the different alphabet types, as well as selected techniques used for the linear time runs computation. Some of them (like the efficient computation of the Lyndon array) are of independent interest.
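To make the central object concrete, here is a naive quadratic-time illustration of the Lyndon array (the talk computes it in linear time): lyn[i] is the length of the longest Lyndon word starting at position i, where a Lyndon word is a string strictly smaller than all of its proper suffixes.

```python
def is_lyndon(s):
    """A Lyndon word is strictly smaller than every proper suffix."""
    return all(s < s[k:] for k in range(1, len(s)))

def lyndon_array(x):
    """lyn[i] = length of the longest Lyndon word starting at position i."""
    n = len(x)
    return [max(l for l in range(1, n - i + 1) if is_lyndon(x[i:i + l]))
            for i in range(n)]

assert lyndon_array("banana") == [1, 2, 1, 2, 1, 1]
```

For "banana", for instance, "an" is a Lyndon word ("an" < "n") but "ana" is not ("a" is a smaller suffix), hence the value 2 at the positions of the a's.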

Paper: https://doi.org/10.4230/LIPIcs.ICALP.2021.63
Video presentation: https://www.youtube.com/watch?v=bHbXVlb_EJ4

Joint work with Johannes Fischer.

Random matrices could steer the dangerous path taken by AI but even that is likely not enough

Abstract: Like most of our technologies today, AI dramatically increases the world’s carbon footprint, thereby worsening the severity of the coming downfall of life on the planet. In this talk, I propose that recent advances in large-dimensional mathematics, and especially random matrices, could help AI engage in future economic degrowth. That being said, even these mitigating solutions are only temporary in view of the imminence of collapse, which calls for drastically more decisive changes across the research and industry worlds. I will discuss these aspects in a second part and hope to leave ample time for discussion.

Sorting presorted arrays

Abstract: Timsort, a surprisingly efficient sorting algorithm, was invented twenty years ago. One crucial aspect of Timsort is that it sorts presorted arrays (rare if the array entries are elements of a large set chosen uniformly at random, but frequent in practice) in far fewer than the N log N operations that might otherwise be expected. We will present the algorithm and adequate measures of presortedness, explain why Timsort is so good on presorted data, and show how one may still improve upon Timsort and upon some of its most naive implementations.
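A minimal sketch of the idea Timsort-style algorithms exploit (a simplification: real Timsort also detects and reverses decreasing runs, and merges adjacent runs pairwise): decompose the array into maximal non-decreasing runs, then merge them. A presorted array decomposes into few runs, and a k-way merge of r sorted runs of total length N takes only O(N log r) comparisons.

```python
import heapq

def runs_decomposition(a):
    """Split a non-empty list into its maximal non-decreasing runs."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:       # run boundary: order breaks here
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    return runs

def merge_runs(runs):
    """k-way merge of the sorted runs via a heap (O(N log r) comparisons)."""
    return list(heapq.merge(*runs))

a = [1, 2, 5, 3, 4, 9, 0, 7]
r = runs_decomposition(a)
assert r == [[1, 2, 5], [3, 4, 9], [0, 7]]   # only 3 runs: the input is presorted-ish
assert merge_runs(r) == sorted(a)
```

On a fully sorted input there is a single run and no merging work at all, which is one way to see why presorted data is cheap.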