We study indexing data structures for substring compression problems.Given an input text \(T\), a substring compression problem for a compression scheme \(C\) and a query interval \([i..j]\) asks to compute the compressed form of the substring \(T[i..j]\) with respect to \(C\) — that is, to apply \(C\) to \(T[i..j]\), which we write as \(C(T[i\ldots j])\). While \(T\) and \(C\) are given in advance, the interval \([i\ldots j]\) is specified at query time.
This setting allows preprocessing \(T\) to facilitate efficient answers to substring compression queries under \(C\). Space- or time-optimal solutions are straightforward but impractical:
— A space-optimal solution applies \(C(T[i\ldots j])\) at query time, without any preprocessing (selecting the optimal implementation of \(C\) with respect to space).
— A time-optimal solution precomputes \(C(T[i\ldots j])\) for all text positions \(i, j\), but requires at least quadratic space and preprocessing time.
Here, optimal time refers to linearity in the size of the compressed output. Thus, a natural question arises: Can we build an index over \(T\) that supports substring compression queries with near-optimal query time and subquadratic space?
In this talk, we explore approaches for compression schemes \(C\) related to the Lempel–Ziv 78 factorization, and outline promising directions for future research.
