least one edit that we are allowed to perform, we
descend into the left, middle, and right children.
When considering a possible deletion, the search
continues with the first letter of the search word
removed. When considering a possible insertion, the
search continues with a wildcard flag ω ∉ Σ that signifies
that the first letter of the search word now matches
all possible letters from Σ. Likewise, when considering
a possible replacement, the search continues with
a search word where the first letter has been replaced
by the wildcard flag ω. For example, if the search
word is “evenshtein”, the search continues with
“venshtein” for deletion, “ωevenshtein” for insertion, and
“ωvenshtein” for replacement.
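The three continuations can be illustrated with a minimal Python sketch. The names `Continuation` and `continuations` are hypothetical and not part of the original text, and the wildcard is modelled as a Boolean flag that is only rendered as ω for display.

```python
from typing import NamedTuple

WILDCARD = "ω"  # display symbol only; assumed not to be a letter of the alphabet


class Continuation(NamedTuple):
    edit: str       # "insert", "delete", or "replace"
    word: str       # remaining search word after the edit
    wildcard: bool  # True if a wildcard precedes the remaining word

    def render(self) -> str:
        """Show the continuation in the notation used in the text."""
        return (WILDCARD if self.wildcard else "") + self.word


def continuations(word: str) -> list[Continuation]:
    """Return the search words with which the search continues after one edit."""
    result = [Continuation("insert", word, True)]
    if word:  # deletion and replacement need a first letter to act on
        result.append(Continuation("delete", word[1:], False))
        result.append(Continuation("replace", word[1:], True))
    return result


for c in continuations("evenshtein"):
    print(c.edit, c.render())
```

For the search word “evenshtein”, this prints the same three continuations as in the example above.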
The use of a wildcard flag and threshold corre-
sponds to a compressed Levenshtein automaton. In
order to prune redundant paths through the tree, the
algorithm also keeps track of whether possible edits
have already been explored for this part of the path.
The approximate search algorithm is presented in
Algorithm 3. Here, n is the current node under
consideration, s is the search word, t is the threshold, v is
the value associated with an entry to possibly return
as a result, w is a flag indicating that a wildcard
precedes the search word, and d is a flag indicating that
the current part of the path has already been explored
in relation to possible edits. The final parameter, e, keeps
track of the entry such that entries and values can be
returned as pairs. We denote the empty word as ε and
string concatenation as an infix operator ·. The
algorithm returns a list of results consisting of pairs of
entries and associated values.
For the sake of clarity, the construction of this list
is made implicit by the yield and yield from statements,
which implement the popular generator semantics
of high-level languages such as Python. Here, yield
adds a single value to the implicit result list while
yield from adds all values from the implicit result list
of a recursive call.
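As a small illustration of these generator semantics, the following Python sketch traverses a binary tree in order. The `Node` class and the `walk` function are hypothetical and unrelated to the ternary search tree itself; they only show how yield and yield from build a result list implicitly.

```python
from typing import Iterator, Optional


class Node:
    """Plain binary tree node, used only to demonstrate the two statements."""

    def __init__(self, value: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.value = value
        self.left = left
        self.right = right


def walk(node: Optional[Node]) -> Iterator[int]:
    """Produce the values of the tree in order."""
    if node is None:
        return
    yield from walk(node.left)   # adds every value produced by the recursive call
    yield node.value             # adds a single value
    yield from walk(node.right)


tree = Node(2, Node(1), Node(3))
print(list(walk(tree)))  # [1, 2, 3]
```

No list is ever constructed explicitly inside `walk`; the caller materializes the results with `list`.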
The presented algorithm for approximate search
in ternary search trees is guaranteed to find all exact
matches, as well as matches that require at most t edits
from the search word. Likewise, it is guaranteed not
to find any other matches.
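The following Python sketch shows one way the approximate search might be written with generators. It is not the authors' implementation: the `Node` class and the `insert` helper are hypothetical, UNDEF is modelled as `None`, and the entry is built by appending each consumed letter, which is an assumption about the intended order of the concatenation. The sketch also deviates from the listing of Algorithm 3 in one place: the delete branch forwards v instead of UNDEF, because with `None` in that position this sketch did not report a match that needs a deletion at the very end of the search word (for example, “cats” against the entry “cat”).

```python
from typing import Iterator, Optional, Tuple


class Node:
    """Tree node with letter c, optional value v, and children l, m, r."""

    def __init__(self, c: str) -> None:
        self.c = c
        self.v: Optional[str] = None
        self.l: Optional["Node"] = None
        self.m: Optional["Node"] = None
        self.r: Optional["Node"] = None


def insert(n: Optional[Node], s: str, v: str) -> Node:
    """Hypothetical helper: add the non-empty entry s with value v."""
    if n is None:
        n = Node(s[0])
    if s[0] < n.c:
        n.l = insert(n.l, s, v)
    elif s[0] > n.c:
        n.r = insert(n.r, s, v)
    elif len(s) == 1:
        n.v = v
    else:
        n.m = insert(n.m, s[1:], v)
    return n


def get(n: Optional[Node], s: str, t: int, v: Optional[str] = None,
        w: bool = False, d: bool = False,
        e: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (entry, value) pairs within t edits of the search word s."""
    if not s and not w and v is not None:
        yield e, v
    if n is not None and (s or w):
        if w or s[0] < n.c:
            yield from get(n.l, s, t, None, w, True, e)
        if w or s[0] > n.c:
            yield from get(n.r, s, t, None, w, True, e)
        if w or s[0] == n.c:  # consume letter (a wildcard matches any letter)
            yield from get(n.m, s if w else s[1:], t, n.v, False, False, e + n.c)
    if not d and t >= 1 and not w:  # edit allowed
        if n is not None:
            yield from get(n, s, t - 1, None, True, False, e)      # insert
        if s:
            # Assumption: v is forwarded here, unlike the UNDEF in the listing.
            yield from get(n, s[1:], t - 1, v, False, False, e)    # delete
        if n is not None and s:
            yield from get(n, s[1:], t - 1, None, True, False, e)  # replace


root = None
for word, value in [("cat", "1"), ("cart", "2"), ("dog", "3")]:
    root = insert(root, word, value)

print(sorted(set(get(root, "cat", 1))))  # [('cart', '2'), ('cat', '1')]
```

The same pair can be yielded more than once, since different edit sequences may lead to the same entry, so the usage example collects the results into a set.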
Lemma 1 (Soundness and Completeness). Let T =
⟨Σ, LEV, ρ, N⟩ be a ternary search tree, s ∈ Σ* be a
string, and t ≥ 0 be an integer.
Let E = {e ∈ Σ* | GET(ρ, e) = ⟨e, v⟩ for some
v ∈ Σ* and LEV(s, e) ≤ t} be the set of all entries
represented by T that have a distance of at most t from the
search word s.
Let S = {e ∈ Σ* | ⟨e, v⟩ ∈ GET(ρ, s, t, UNDEF,
FALSE, FALSE, ε)} be the set containing all the
projections of the first element of the pairs of the result
Algorithm 3: Approximate search in a ternary search tree.
procedure GET(n, s, t)
    yield from GET(n, s, t, UNDEF, FALSE, FALSE, ε)
end procedure
procedure GET(n, s, t, v, w, d, e)
    if |s| = 0 and ¬w and v ≠ UNDEF then
        yield ⟨e, v⟩
    end if
    if n ≠ UNDEF and (|s| ≥ 1 or w) then
        if w or HD(s) < n.c then
            yield from GET(n.l, s, t, UNDEF, w, TRUE, e)
        end if
        if w or HD(s) > n.c then
            yield from GET(n.r, s, t, UNDEF, w, TRUE, e)
        end if
        if w or HD(s) = n.c then    ▷ consume letter
            yield from GET(n.m, (w ? s : TL(s)), t, n.v, FALSE, FALSE, n.c · e)
        end if
    end if
    if ¬d and t ≥ 1 and ¬w then    ▷ edit allowed
        if n ≠ UNDEF then
            yield from GET(n, s, t − 1, UNDEF, TRUE, FALSE, e)    ▷ insert
        end if
        if |s| ≥ 1 then
            yield from GET(n, TL(s), t − 1, UNDEF, FALSE, FALSE, e)    ▷ delete
        end if
        if n ≠ UNDEF and |s| ≥ 1 then
            yield from GET(n, TL(s), t − 1, UNDEF, TRUE, FALSE, e)    ▷ replace
        end if
    end if
end procedure
list constructed by Algorithm 3.
Then, we have that S = E, i.e., that these two sets
are identical.
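The set E can be computed directly by brute force, which gives a reference against which a search implementation can be compared. The following Python sketch assumes that the entries are available as a plain dictionary; the names `lev` and `expected_entries` are hypothetical, and `lev` is the standard dynamic-programming formulation of the Levenshtein distance rather than anything specific to this paper.

```python
def lev(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # replace or match
        prev = cur
    return prev[-1]


def expected_entries(entries: dict[str, str], s: str, t: int) -> set[str]:
    """Brute-force version of E: all entries within distance t of s."""
    return {e for e in entries if lev(s, e) <= t}


entries = {"cat": "1", "cart": "2", "dog": "3"}
print(sorted(expected_entries(entries, "cat", 1)))  # ['cart', 'cat']
```

This reference scans every entry, so it is only practical for small dictionaries, but it makes the statement S = E easy to check on examples.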
Proof. We split the proof into two parts: (i) soundness,
i.e., S ⊆ E, and (ii) completeness, i.e., E ⊆ S.
For (i), for any string e ∈ S, we need to show
that (a) GET(ρ, e) = v for some v ∈ Σ* and (b)
LEV(s, e) ≤ t. Claim (a) can be proven straightforwardly
by structural induction over the ternary search
tree and Algorithm 2.
Claim (b) can be proven by induction over the
threshold t. For the base case t = 0, Algorithm 3 is
obviously equivalent to returning the result of Algo-
rithm 2 as a singleton list.
For the step case t > 0, the induction hypothesis is
that Claim (b) holds for t − 1. The condition for the
if statement marked as “edit allowed” in Algorithm 3
can be proven to evaluate to TRUE for any prefix p
of s by straightforward structural induction over the
ternary search tree and Algorithm 3. For a given pre-
fix p, if an edit is possible at this stage, i.e., the re-
mainder of s without prefix p is not the empty word
for deletion and replacement, the three recursive calls
marked as “insert”, “delete”, and “replace” are exe-
cuted. In the case of “delete”, the induction hypothe-
sis is immediately applicable. In the case of “insert”
and “replace”, the induction hypothesis is applicable
when the wildcard flag is consumed through the recur-
sive call in the body of the if statement marked “con-
sume letter”. Claim (b) thus follows from the obser-
vation that the prefix is concatenated with the strings
from the resulting set in the body of the if statement
marked “consume letter”.
Approximate Dictionary Searching at a Scale using Ternary Search Trees and Implicit Levenshtein Automata