Lemma 1. Minimizing the APTA as a DFA, and then
applying the world states from the starts of each sub-
sequence to the entire subsequence yields a minimal
unambiguous FST.
Proof.
1. The original APTA contains no cycles, by definition of a tree; therefore the minimized DFA contains no directed cycles, or it would accept arbitrarily long strings not accepted by the APTA.
2. When converting the DFA to an FST, the addi-
tion of constraints in the form of the world states
(the numbers) on the transitions cannot enable a
smaller machine. The FST could be made smaller
than the DFA only if adding the world states en-
abled more merges on the minimal DFA. Since
mergeability of two states is still determined in
part by the output character, the additional con-
straints will not enable more states to merge in ei-
ther the minimal or any other DFA. Therefore the
FST is also minimal.
3. Non-determinism in the FST could arise only from a node having more than one outgoing transition carrying the same world-state. Since each partition substring is uniquely labeled, ambiguity cannot arise from labels from multiple substrings. Because there are no directed cycles, when labeling the transitions for a particular substring, each node can be visited at most once. Therefore the FST is unambiguous.
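The unambiguity property argued in point 3 can be stated as a simple check. The sketch below assumes an illustrative representation (not from the paper): `transitions` maps each state to a list of (world_state, output_char, next_state) triples, and `is_unambiguous` is a hypothetical helper name.

```python
def is_unambiguous(transitions):
    """Return True if no state has two outgoing transitions labeled with
    the same world-state -- the only possible source of non-determinism
    identified in the lemma."""
    for state, outs in transitions.items():
        seen = set()
        for world, char, nxt in outs:
            if world in seen:
                # Two transitions from this state share a world-state,
                # so the machine cannot decide between them.
                return False
            seen.add(world)
    return True
```

Under this encoding, a state whose outgoing transitions all carry distinct world-states is deterministic given the current world-state.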
Algorithm 2
1. Compute the APTA from the data, ignoring world-states.
2. Minimize the APTA as a deterministic finite state automaton.
3. For each data string d:
   (a) s = start state
   (b) For each character c ∈ d:
       i. t = transition from s that matches c
       ii. Label t with the world-state from d
       iii. s = NextState(t)
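The labeling pass in step 3 can be sketched as follows, under an assumed representation that is ours rather than the paper's: the minimized DFA is a dictionary `delta` keyed by (state, character), with start state 0.

```python
def label_transitions(delta, strings, world_states):
    """Step 3 of Algorithm 2: walk each data string d through the DFA,
    labeling every transition taken with d's world-state."""
    labels = {}  # (state, char) -> set of world-state labels
    for d, w in zip(strings, world_states):
        s = 0  # (a) s = start state
        for c in d:  # (b) for each character c in d
            t = (s, c)                          # i.   transition matching c
            labels.setdefault(t, set()).add(w)  # ii.  label t with d's world-state
            s = delta[t]                        # iii. s = NextState(t)
    return labels
```

Because the DFA is deterministic, the transition `t` taken for each character is unique, so the loop body is a straight lookup per character.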
3.3 Example
An example run through the algorithm follows. Given an initial action sequence of "adcdedcdebcdebfdabcdabcxxbcd", the partitioning stage generates two potential partitions, each with an estimated program size of 12. The first uses "abcd" as p, partitioning the string as {"adcd", "edcd", "ebcd", "ebfd", "abcd", "abcx", "xbcd"}. The second uses "ebcd" as p, but yields the same partition. The FST builder takes that partition and generates the FST shown in figure 5.
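In this particular example every partition element happens to have length |p| = 4, so the partition can be reproduced with a simple fixed-width split (the general algorithm allows unequal pieces):

```python
sequence = "adcdedcdebcdebfdabcdabcxxbcd"
p = "abcd"
# Split the sequence into consecutive substrings of length |p|.
partition = [sequence[i:i + len(p)] for i in range(0, len(sequence), len(p))]
print(partition)
# ['adcd', 'edcd', 'ebcd', 'ebfd', 'abcd', 'abcx', 'xbcd']
```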
3.4 Run Time Analysis
In algorithm 1, the main loop is executed O(n²) times. Building each distance table takes O(qi) (letting q = |p|, with i from line 1(a) in algorithm 1), so all the tables together take O(n²q), roughly O(n³). In step 3a, there are n comparisons in each call to min, and tᵢ is computed n times, for a total of O(n²); therefore algorithm 1 is O(n⁵). In algorithm 2, step 1 is O(n), step 2 can be done in O(n log n) (Hopcroft, 1971), and the loop in step 3 is just O(n), making algorithm 2 O(n log n) and the overall process O(n⁵).
One caveat to this analysis is that it assumes the action data genuinely comes from a loop and there is a sensible period. If there is no good period, such as when all the characters are unique, all partitions are equally bad. In this case, although p can be found quickly, all possible periods are equivalent, and all possible partitions of the sequence for each candidate p are equivalent, so there are n²·2ⁿ⁻¹ total equivalent results. As a practical matter, we cut off the number of partitions at 20 per period, but with input strings that do have a component period, we do not hit the threshold in practice.
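The 2ⁿ⁻¹ factor in that count is the number of ways to cut a length-n string into contiguous non-empty pieces: one binary choice (cut or don't cut) at each of the n−1 gaps between characters. A small enumeration, written here as an illustrative sketch, confirms the count:

```python
from itertools import combinations

def contiguous_partitions(s):
    """Enumerate every way to split s into contiguous non-empty pieces."""
    n = len(s)
    result = []
    for k in range(n):  # number of cut points chosen among the n-1 gaps
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            result.append([s[a:b] for a, b in zip(bounds, bounds[1:])])
    return result

print(len(contiguous_partitions("abcd")))  # 8 == 2**(4-1)
```

Multiplying by the O(n²) candidate periods gives the n²·2ⁿ⁻¹ figure quoted above.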
4 EXAMPLE TRIAL
To test the algorithm, we ran it on a small data set gathered by observing the behavior of a single individual retrieving coffee from the department coffee maker over a period of five days. This data set differs slightly from the type of data described above in that, while it is repetitive, the time differences between each pass could be used to partition the passes correctly. However, if we ignore those breaks and run the data together into a single stream, then it fits the paradigm of the algorithm. The behavior sequence we observed corresponds to two normal coffee retrieval events, an event where the pot was empty and needed to be remade, one normal coffee event, and a coffee retrieval at the end of the day, after which the machine was shut off, for a total of 64 steps. Details of the data and the results can be found in (Crabbe, 2011).
In the first stage of the process, algorithm 1 correctly identifies the normal coffee retrieval as the period. Because one of the variations from the period is long compared to the period itself, algorithm 1 finds that the cost of treating the variation as a single partition is equal to that of splitting the sequence in the middle, considering the first part as a suffix to one partition and the second as its own. Algorithm 1 generates nine partitions, all of which differ in where
ICAART 2012 - International Conference on Agents and Artificial Intelligence