*Bounty: 100*

*Bounty: 100*

This question is migrated from MathOverflow, where it did not receive any answers a year ago.

For a language $L$ over the finite alphabet $Sigma$, let $L_n$ denote the set of words in $L$ of length $n$. The word $u$ is a *subsequence* of $w$ if $u$ can be obtained from $w$ by deleting letters (note that $u$ does *not* have to occur consecutively in $w$). The language $L$ is *subsequence-closed* if whenever $win L$ and $u$ is a subsequence of $w$ then $uin L$ (thus subsequence-closed languages are piecewise testable, but not vice versa). It can be shown (see below) that for all subsequence-closed languages $L$,

$$

lim_{ntoinfty} sqrt[n]{|L_n|}

$$

exists and is an integer. Does anyone know of a reference for this fact? (I have stated it with proof in one of my papers, but I am trying to find the “correct” reference for it now.)

Here is the proof I know, thanks to Michael Albert. First, subsequence-closed languages are necessarily regular (if $L$ is subsequence-closed then there are only finitely many minimal words not in $L$ by Higman’s Lemma, and this leads quickly to the existence of a regular expression for it).

Next we claim that every subsequence-closed language $LsubseteqSigma^ast$ can be expressed as a finite union of regular expressions of the form

$$

ell_1Sigma_1^astell_2Sigma_2^astcdotsell_kSigma_k^astell_{k+1}

$$

for letters $ell_iinSigma$ and subsets $Sigma_isubseteqSigma$. (I would also be interested if anyone has a reference for this decomposition of subsequence-closed languages.) This follows by induction on the regular expression defining $L$. The base cases where $L$ is empty or a single letter are trivial. If the regular expression defining $L$ is a union or a concatenation then the claim follows inductively. The only other case is when this regular expression is a star, say $L=E^ast$ for a regular expression $E$. In this final case we see that $L=Pi^ast$ where $PisubseteqSigma$ is the set of all letters occurring in $E$ because $L$ is subsequence-closed.

With the claim above established, it follows that $limsqrt[n]{|L_n|}$ is equal to the size of the largest set $Sigma_i$ occurring in such an expression for $L$.