To be clear, this is not an ideal validation. Although we have used cross-validation approaches throughout, a sizable held-out test set would be more convincing; obtaining one is future work.
The pipeline was run on the 116 work orders comprising 1106 images, of which 117 are FAA documents (one work order contained a duplicate FAA form). All 117 FAA documents were correctly oriented. The document type classifier predicted 121 documents to be FAA, of which four were false positives (non-FAA images classified as FAA). The text classifier predicted three of these four as other and one as tested (i.e., it mitigated three of the four false positives). The text classifier correctly predicted the status of 116 FAA documents, with one incorrectly predicted as tested versus actual
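As a worked check on the counts above, the sketch below derives the precision and recall implied for the document type classifier, the false positives remaining after the text classifier, and the status accuracy over the true FAA documents. It uses only the numbers reported in this section; the function name `summarize` and its parameter names are illustrative, not part of our implementation.

```python
from fractions import Fraction


def summarize(actual_faa: int, predicted_faa: int, false_pos: int,
              fp_rejected: int, status_correct: int) -> dict:
    """Derive classifier metrics from the end-to-end counts reported in the text."""
    true_pos = predicted_faa - false_pos  # FAA documents the classifier actually found
    return {
        # Share of images predicted as FAA that really are FAA forms.
        "doc_precision": Fraction(true_pos, predicted_faa),
        # Share of actual FAA forms that were predicted as FAA.
        "doc_recall": Fraction(true_pos, actual_faa),
        # False positives still flagged (not 'other') after the text classifier.
        "residual_false_pos": false_pos - fp_rejected,
        # Correct status predictions over the true FAA documents.
        "status_accuracy": Fraction(status_correct, actual_faa),
    }


m = summarize(actual_faa=117, predicted_faa=121, false_pos=4,
              fp_rejected=3, status_correct=116)
for name, value in m.items():
    print(f"{name}: {value} ({float(value):.4f})")
```

With these counts the document type classifier has perfect recall (117/117) and a precision of 117/121 (about 0.9669); the text classifier leaves one false positive and gets 116/117 statuses right (about 0.9915).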
In this paper, we proposed and demonstrated an approach for document analysis that combines supervised machine learning models for orientation classification and document type classification. The form-style documents were first partitioned to produce symbols from which features were generated. These features were then used to train machine learning algorithms. Once an image is oriented and its document type identified, document streams are sent to an OCR engine to produce a text file, against which a simple match is made to determine the desired form's status. We then employed a feature selection approach for the document type classifier to produce a parsimonious model and showed that it was as accurate as the full model. Finally, we presented end-to-end results demonstrating the effectiveness of our approach.
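The end-to-end flow summarized above can be sketched as a chain of stages. This is an illustrative sketch only, not our implementation: the stage callables `orient`, `classify_type`, and `ocr` are hypothetical stand-ins for the trained orientation model, the document type model, and the OCR engine, and the `STATUS_PATTERNS` table is a placeholder rather than the match rules actually used.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical status patterns; the real match rules are not given here.
# Word boundaries keep e.g. "untested" from matching "tested".
STATUS_PATTERNS = {"tested": re.compile(r"\btested\b")}


@dataclass
class Result:
    doc_type: str
    status: Optional[str]  # None when the document is not the target form


def match_status(text: str) -> str:
    """Simple pattern match on OCR text; falls back to 'other'."""
    lowered = text.lower()
    for status, pattern in STATUS_PATTERNS.items():
        if pattern.search(lowered):
            return status
    return "other"


def run_pipeline(image: Any,
                 orient: Callable[[Any], Any],
                 classify_type: Callable[[Any], str],
                 ocr: Callable[[Any], str],
                 target: str = "FAA") -> Result:
    upright = orient(image)            # orientation classifier, then rotate upright
    doc_type = classify_type(upright)  # document type classifier
    if doc_type != target:
        return Result(doc_type, None)  # only target forms are sent to OCR
    text = ocr(upright)                # OCR engine produces plain text
    return Result(doc_type, match_status(text))
```

For example, with trivial stub stages, `run_pipeline("img", orient=lambda im: im, classify_type=lambda im: "FAA", ocr=lambda im: "Status: TESTED")` returns `Result(doc_type='FAA', status='tested')`. Passing each stage in as a callable keeps the trained models swappable and lets each stage be tested in isolation.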
This work was supported in part by the University of Montevallo Contract #19-0501-001. The authors greatly appreciate the support of the airline company employees involved in the project. Without their efforts, this research could not have been conducted.
