An S-adic characterization of sublinear complexity subshifts

In the context of symbolic dynamics, the class of "sublinear complexity subshifts" is of particular relevance, as it occurs in a variety of areas such as geometric dynamical systems, language theory, number theory, and numeration systems. During the intensive study carried out on this subject since the beginning of the 1990s, it was proposed that a hierarchical decomposition based on S-adic sequences characterizing sublinear complexity subshifts would be useful for understanding this class. The problem of finding such a characterization was given the name "S-adic conjecture" and inspired several influential results in symbolic dynamics. In this talk, I will present an S-adic characterization of sublinear complexity subshifts and some of its applications, which in particular give a solution to this conjecture.

Algorithmic enumeration of West-3-stack sortable permutations

Abstract: In his seminal book The Art of Computer Programming, Knuth discussed the enumeration of permutations that can be sorted by a single stack. West later defined the stack-sorting map in a deterministic way, such that repeatedly applying this map to a permutation of length n results in the identity permutation after at most n−1 steps. The number of permutations that are sorted by k applications of this map is known only for k ≤ 2. I will discuss recent work on the case k = 3; in particular, I will describe an algorithm we applied to enumerate these permutations up to length n = 1000, and our subsequent analysis of these numbers. This is joint work with Colin Defant and Anthony J. Guttmann.
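As a concrete reference for readers, West's deterministic stack-sorting map s can be computed in one left-to-right pass with a single stack; the following is a minimal sketch (the function name is mine, not from the talk):

```python
def stack_sort(perm):
    """One application of West's stack-sorting map s.

    Before pushing each entry, pop (to the output) every stack entry
    smaller than it; at the end, flush the stack.  A permutation is
    k-stack-sortable if k applications of s yield the identity.
    """
    stack, out = [], []
    for x in perm:
        while stack and stack[-1] < x:
            out.append(stack.pop())
        stack.append(x)
    out.extend(reversed(stack))
    return out
```

For example, s sends 231 to 213, and a second application yields the identity, so 231 is 2-stack-sortable.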

Enumerative properties of fighting fish

Abstract:

Fighting fish are combinatorial structures made of square tiles that form two-dimensional branching surfaces. They share the same counting sequence as other classical combinatorial structures such as non-separable planar maps, 2-stack-sortable permutations, and synchronized intervals. In this talk I will present a direct bijection between fighting fish and non-separable planar maps, as well as the link with left ternary trees, another class of objects sharing the same counting sequence.

Image Synthesis: Texture and Text line

In this talk, we will present two applications of image synthesis.

First, we will focus on texture synthesis. This task consists in synthesizing a new image from a reference image: the synthesized image must be perceptually similar to the reference image while being different from it. This is an old problem that has recently been revisited with the use of convolutional neural networks (CNNs). Most current methods are based on Gram matrices of feature maps from ImageNet-trained CNNs. We have developed a simple multi-resolution strategy to take large-scale structures into account. This strategy can be coupled with long-range constraints, such as imposing the Fourier spectrum of the image or using the autocorrelation of the feature maps. The multi-resolution strategy alone yields excellent high-resolution syntheses, and combining it with additional constraints improves the results for regular textures. We compared our methods to alternative approaches on various texture examples and corroborated our visual observations with quantitative and perceptual evaluations.
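For readers unfamiliar with the Gram-matrix texture statistic mentioned above, here is a minimal pure-Python sketch of its computation on flattened feature maps (the function name and the normalization by the number of positions are illustrative choices, not the exact implementation discussed in the talk):

```python
def gram_matrix(feature_maps):
    """Gram matrix of CNN feature maps, the texture statistic of
    Gatys-style synthesis: G[i][j] is the inner product of channels
    i and j over spatial positions, normalized by the number of
    positions.  `feature_maps` is a list of C flattened channels.
    """
    n = len(feature_maps[0])  # number of spatial positions
    return [[sum(fi[k] * fj[k] for k in range(n)) / n
             for fj in feature_maps]
            for fi in feature_maps]
```

Because it averages over spatial positions, the Gram matrix discards layout and keeps only channel co-activation statistics, which is why it characterizes texture rather than a specific image.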

Second, we will describe a new unsupervised and document-specific approach for character recognition from lines of text. Our main idea is to build on unsupervised approaches to object discovery, and in particular on recent "analysis-by-synthesis" methods, which reconstruct images from a limited number of visual elements called sprites. We extend these approaches to learn up to a hundred characters and to analyze full lines of text by introducing a suitable architecture and an efficient sprite-selection strategy. We illustrate the effectiveness of our model on printed documents and ancient manuscripts.

Bio: Nicolas Gonthier received a Data Science M.Eng. degree from ISAE-Supaéro and an M.Sc. degree in statistics from the University of Toulouse, both in 2017, and a Ph.D. degree in image processing from the University Paris-Saclay in 2021. His PhD was funded by an interdisciplinary grant (IDI IDEX) from the University Paris-Saclay and hosted by Télécom Paris, Institut polytechnique de Paris. He is currently a postdoctoral researcher at ENPC Paris-Tech. His research interests include deep learning, image processing, and machine learning, mainly for cultural heritage (historical documents, artworks, etc.).

The seminar will take place in hybrid mode: at ESIEE Paris in amphitheater 160, and via meet.google.com/nux-njnv-vud for those who cannot attend in person.

Ordering the leaves of a tree and the evolution of the idiolect of 19th-century novelists

In this multi-voice presentation, we will show how questions related to the evolution of literary style in certain 19th-century authors motivated the search for an optimal order on the leaves of a tree, one that best matches an order provided as input. This question corresponds to two problems previously introduced in bioinformatics in the context of phylogenetic tree comparison, OTCM and OTDE. The first consists in finding an order that minimizes the number of inversions with the input order on the leaves, while the second consists in deleting the minimum number of leaves from the tree to make it consistent, on the remaining leaves, with the input order. We show that both problems are NP-complete in the general case. We provide a polynomial-time algorithm for the second problem when the maximum degree of the tree is bounded, and an algorithm whose complexity is parameterized by a quantity bounded by the number of leaves to delete. We explore the possibilities of practical use of these algorithms on trees obtained by hierarchical clustering of corpora of 19th-century novels. We also present a direct approach to studying the evolution of the idiolect of 19th-century novelists.

Reordering a tree according to an order on its leaves and studying the evolution of the idiolect of writers

In this multi-voice presentation, we will show how questions related to the evolution of literary style among several authors of the 19th century motivated the search for the optimal order on the leaves of a tree, one that best corresponds to an order provided as input. This question corresponds to two problems previously introduced in bioinformatics in the context of phylogenetic tree comparison, OTCM and OTDE. The first problem consists in finding an order that minimizes the number of inversions with an input order on the leaves, while the second consists in removing the minimum number of leaves from the tree to make it consistent with the input order on the remaining leaves. We show that both problems are NP-complete when the maximum degree is not bounded. We provide a polynomial-time algorithm for OTDE in the case where the maximum degree is bounded by a constant, and an FPT algorithm in a parameter bounded above by the number of leaves to delete. We explore the practical use of these algorithms on trees obtained by clustering French novels of the 19th century. We also present a more direct approach to studying the evolution of the idiolect of French 19th-century writers.
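To make the OTCM objective concrete, the following sketch counts the inversions between a candidate leaf order and the input order; it illustrates the quantity being minimized, and is not the speakers' algorithm:

```python
def inversions(leaf_order, input_order):
    """Number of leaf pairs appearing in opposite relative order in
    the two sequences (the objective that OTCM minimizes over all
    orders realizable by flipping internal nodes of the tree)."""
    pos = {leaf: i for i, leaf in enumerate(input_order)}
    n = len(leaf_order)
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               if pos[leaf_order[i]] > pos[leaf_order[j]])
```

An order identical to the input has 0 inversions; the reversed order of n leaves has n(n−1)/2.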

Geometry of spin clusters in triangulations endowed with an Ising model

Abstract:

In this talk, I will present results, obtained in collaboration with Laurent Ménard, on the geometry of spin clusters in triangulations endowed with an Ising model, building on earlier work carried out with Laurent Ménard and Gilles Schaeffer.

In this model, we consider triangulations whose vertices are decorated with + or - spins. This allows us to define a probability law under which a decorated triangulation is sampled with probability depending on its number of monochromatic edges, via a parameter nu. The existence of a critical value of this parameter was first established in the physics literature by Kazakov, and was later recovered by combinatorial methods by Bousquet-Mélou and Schaeffer and by Bouttier, Di Francesco and Guitter.

I will show how the phase transition of this model can be studied geometrically through the volume and perimeter of its monochromatic clusters. In particular, I will show that when nu is critical or subcritical, the cluster of the root is almost surely finite, whereas it is infinite with positive probability when nu is supercritical.

This talk assumes no prior knowledge of maps or of the Ising model!

A cartographic quest between lambda calculus, logic, and combinatorics

The lambda calculus was invented by Church in the late 1920s, as part of an ambitious project to build a foundation for mathematics around the concept of function. Although his original system turned out to be logically inconsistent, Church was able to extract from it two separate usable systems that remain of great interest to this day, with a typed calculus for logic and an untyped calculus for pure computation. In the talk, I will survey some surprising connections discovered over recent years between linear subsystems of lambda calculus and enumeration of graphs on surfaces, or "maps", which is an active subfield of combinatorics with roots in W. T. Tutte's work in the 1960s. Part of the interest in these links is that they suggest we may be able to develop new logical perspectives on maps and related combinatorial objects, and at the same time new quantitative perspectives on lambda calculus and related systems.

Compositional Properties of Alignments

Alignments, i.e., position-wise comparisons of two or more strings or ordered lists, are of utmost practical importance in computational biology and a host of other fields, including historical linguistics and emerging areas of research in the Digital Humanities. The problem is well known to be computationally hard as soon as the number of input strings is not bounded. Due to its practical importance, a huge number of heuristics have been devised, which have proved very successful in a wide range of applications. Alignments nevertheless have received hardly any attention as formal, mathematical structures. Here, we focus on the compositional aspects of alignments, which underlie most algorithmic approaches to computing alignments. We also show that the concepts naturally generalize to finite partially ordered sets and partial maps between them that in some sense preserve the partial orders. As a consequence of this discussion we observe that alignments of even more general structures, in particular graphs, are essentially characterized by the fact that the restriction of an alignment to a row must coincide with the corresponding input graph. Pairwise alignments of graphs are therefore determined completely by common induced subgraphs. In this setting alignments of alignments are well-defined, and alignments can be decomposed recursively into subalignments. This provides a general framework within which different classes of alignment algorithms can be explored for objects very different from sequences and other totally ordered data structures.
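The defining property invoked in the abstract (restricting an alignment to a row and deleting gaps must recover the corresponding input) can be stated concretely for strings; the sketch below is illustrative and not taken from the work presented:

```python
GAP = "-"

def rows(alignment):
    """Read off the rows of an alignment given as a list of columns,
    each column being a tuple with one symbol (or gap) per sequence."""
    num_rows = len(alignment[0])
    return ["".join(col[i] for col in alignment) for i in range(num_rows)]

def is_valid_alignment(alignment, inputs):
    """Check the defining property: row i with gaps removed must
    equal the i-th input string."""
    return all(r.replace(GAP, "") == s
               for r, s in zip(rows(alignment), inputs))
```

For instance, the alignment with rows "c-at" and "coat" is a valid alignment of "cat" and "coat", since deleting the gap from the first row recovers "cat".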

Working with digital convexity

Unlike in continuous geometry, the notion of convexity is not clearly defined in digital geometry. Arguably, the most natural definition of digital convexity is to say that a lattice set S in Z^d is digital convex if S is equal to the intersection of its convex hull with Z^d. The issue with that definition, which motivated the investigation of alternate definitions, is that it does not ensure any connectivity property for S (in terms of the induced grid subgraph). In this talk we study some computational geometry problems, such as the potato-peeling problem and the recognition problem, in the digital world using this definition of digital convexity. We also study unimodular affine transformations (the set of lattice-preserving affine transformations) and their consequences for the connectivity of digital convex sets.
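To make the definition concrete, here is a brute-force pure-Python check of digital convexity for small planar sets; it tests every lattice point of the bounding box using Carathéodory-style triangle tests, assumes S is not contained in a single line, and is a sketch rather than the recognition algorithm discussed in the talk:

```python
from itertools import combinations, product

def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_triangle(p, a, b, c):
    """Closed-triangle membership test, independent of orientation."""
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def is_digital_convex(S):
    """S in Z^2 is digital convex iff every lattice point of conv(S)
    belongs to S.  By Caratheodory's theorem in the plane, a point of
    conv(S) lies in a triangle spanned by three points of S, so we scan
    the bounding box and test all triangles (fine for small sets)."""
    S = set(S)
    xs = [p[0] for p in S]
    ys = [p[1] for p in S]
    box = product(range(min(xs), max(xs) + 1), range(min(ys), max(ys) + 1))
    for q in box:
        if q in S:
            continue
        if any(in_triangle(q, a, b, c) for a, b, c in combinations(S, 3)):
            return False  # q is in conv(S) but missing from S
    return True
```

For example, the triangle {(0,0),(2,0),(0,2)} without its interior lattice points is not digital convex, while the full set of lattice points of that triangle is.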
Bio: After working four years as a software developer and two years as a math teacher, Loïc Crombez completed his PhD, then spent a year as an ATER at Université Clermont Auvergne – LIMOS, where he worked on algorithmic problems at the intersection of computational and digital geometry. This work earned him the best student paper award at DGCI 2019.

Word and Graph Embeddings for Machine Learning

Distributed word embeddings (e.g., word2vec) provide a powerful way to reduce large text corpora to concise features (vectors) readily applicable to a variety of problems in NLP and data science. I will introduce word embeddings and apply them in a variety of new and interesting directions, including:

(1) Multilingual NLP — The Polyglot project (www.polyglot-NLP.com) employs deep learning and other techniques to build a basic NLP pipeline (including entity recognition, POS tagging, and sentiment analysis) for over 100 different languages. We train our systems over each language's Wikipedia edition, which provides unified data resources in the absence of explicitly annotated data, but poses substantial challenges in interpretation and evaluation.

(2) Detecting Historical Shifts in Word Meaning — Words like "gay" and "mouse" have substantially shifted their meanings over time in response to societal and technological changes. We use word embeddings trained over texts drawn from different time periods to detect changes in word meanings. This is part of our efforts in historical trend analysis.

(3) Feature Extraction from Graphs — We present DeepWalk, our approach for learning latent representations of vertices in a network, which has become extremely popular. DeepWalk uses local information on truncated random walks to learn embeddings, by treating walks as the equivalent of sentences in a language. It is suitable for a broad class of applications such as network classification and anomaly detection. We also introduce new graph embedding techniques based on random projections, which produce DeepWalk-quality embeddings thousands of times faster than previous algorithms.
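To illustrate the first step of DeepWalk, the sketch below generates truncated random walks from an adjacency list; in the full method these walks are then fed, like sentences, to a word2vec-style model (the function and parameter names are illustrative, not the published implementation):

```python
import random

def deepwalk_walks(adj, num_walks=10, walk_length=5, seed=0):
    """Generate truncated random walks over a graph given as an
    adjacency list {node: [neighbors]}.  In DeepWalk, each walk plays
    the role of a sentence and each node the role of a word, so the
    walks can be handed directly to a word2vec-style trainer."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:            # one walk per node per round
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:         # dead end: truncate the walk
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

Treating walks as sentences means nodes that co-occur on short walks get nearby embeddings, which is exactly the locality signal the abstract refers to.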