operators.
The operators listed above (see Table 1) and
their functionality turned out to be “intuitive”, i.e.
they matched the information consumers’ mental
model. Some syntax elements were considered
highly intuitive, such as the quotation marks
indicating fixed text in an otherwise parametric
presentation of a pattern. The users immediately
perceived dots as separators between the building
blocks of their patterns. The role of the comma was
apparent, too. However, there were also things to
learn, for instance the difference between comma
and semicolon. Some users had to learn that the
question mark marking an expression as optional
must precede the expression rather than follow it;
for others, however, a leading question mark was
more intuitive for this purpose than a trailing one.
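To illustrate how such operators can be interpreted, the following Python sketch translates a toy pattern into a regular expression. The pattern syntax used here is purely hypothetical: it only mimics the elements discussed above (quoted fixed text, dots as block separators, a leading question mark for optional elements) and is not the actual DISL grammar.

```python
import re

def pattern_to_regex(pattern: str) -> str:
    """Translate a toy, DISL-inspired pattern into a regular expression.

    Hypothetical semantics, mimicking the elements discussed above:
      "..."   fixed text that must occur literally
      name    a parametric element matching one word, captured under `name`
      ?elem   a leading question mark marks an element as optional
      .       separates the building blocks of the pattern
    """
    pieces = []
    for block in (b.strip() for b in pattern.split('.')):
        optional = block.startswith('?')
        if optional:
            block = block[1:].strip()
        if block.startswith('"') and block.endswith('"'):
            piece = re.escape(block[1:-1])    # fixed text, matched literally
        else:
            piece = fr'(?P<{block}>\w+)'      # parametric element, captured
        piece += r'\s*'                       # tolerate whitespace between blocks
        if optional:
            piece = fr'(?:{piece})?'
        pieces.append(piece)
    return ''.join(pieces)

# A pattern with fixed text, a mandatory and an optional parameter.
rx = pattern_to_regex('"Contract No" . number . ?annex')
m = re.fullmatch(rx, 'Contract No 4711')
print(m.group('number'), m.group('annex'))   # 4711 None
```

The sketch deliberately leaves out the comma/semicolon distinction, whose exact semantics the text does not specify; it is only meant to show that a small set of such operators already yields a usable pattern matcher.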
We ran our experiments with about 2,000
documents (real estate contracts with related
certificates) distributed over 18 data sources. At
first glance, this sample may seem too small to
validate our approach or to underpin its scalability.
However, the inherently unstructured character of
data distributed over different sources reflects the
nature of the challenge we face in data discovery,
even in the context of Big Data. The language used
in contracts and related certificates is quite uniform
and not narratively complex. We are convinced that
our document sample covers this language in its
entirety, and thus that our approach scales to even
larger collections. In many information ecosystems
we rarely have to deal with highly complex narrative
forms. For this reason, we consider our approach
scalable to thematic areas outside legal information
as well, as long as the narrative style is relatively
uniform, as is the case for legal texts.
5 CONCLUSION
Our work-in-progress demonstrates the feasibility
of self-service data discovery and information
sharing. With a simple instrument like DISL the
information consumers can leverage their shallow
engineering knowledge for managing discovery
services on their own. Over time, information
consumers naturally develop a data-driven mindset
and, with their growing computer literacy, a certain
level of “natural” engineering knowledge that
enables them to handle these discovery tools,
including the tools’ command language. Our
experiments indicate that users can develop such
shallow engineering knowledge without much
effort. If the discovery tool requires no more than
that level of knowledge, they can easily transform
their domain knowledge into machine instructions.
This smooth integration of domain and tool
knowledge completes the picture of self-service
discovery, which in the meantime is also demanded
by industry (Sallam et al., 2014).
There are many discovery tasks that serve
individual, ad hoc, and transient purposes.
Mainstream discovery, in contrast, supports
recurring discovery requests commonly shared by
large user communities and operates on large data
collections, sometimes including the entire Web.
We can conceive of manifold scenarios for
non-mainstream discovery. Users may, for instance,
have to analyse dozens of failure descriptions or
complaints from time to time. The corresponding
data collections are personal or shared among small
groups and consist, for instance, of bunches of PDF
files or emails, but rarely of documents on the Web.
Dynamically changing small-scale requests would
mean permanent system adaptation, which is too
intricate and too expensive in the majority of cases.
With a flexible self-service solution like DISL,
information consumers can reap the benefits of
automatic information discovery and sharing while
avoiding the drawbacks of mainstream discovery.
REFERENCES
Brandt, D. S., Uden, L., 2003. Insight Into Mental Models
of Novice Internet Searchers, Communications of the
ACM, vol. 46 no. 7, pp. 133-136.
Cowie, J., Lehnert, W., 1996. Information Extraction,
Communications of the ACM, vol. 39 no. 1, pp. 80-91.
Ding, L., Finin, T., Joshi, A., Pan, R., Peng, Y., Reddivari,
P., 2005. Search on the Semantic Web, IEEE
Computer, vol. 38, no. 10, pp. 62-69.
Fan, J., Kalyanpur, A., Gondek, D.C., Ferrucci, D.A.,
2012. Automatic knowledge extraction from
documents, IBM Journal of Research and
Development, vol. 56, no. 3.4, pp. 5:1-5:10.
Iwanska, L.M., 2000. Natural Language Is a Powerful
Knowledge Representation System: The UNO Model,
in: L.M. Iwanska and S.C. Shapiro (eds.), Natural
Language Processing and Knowledge Representation,
AAAI Press, Menlo Park, USA, pp. 7-64.
Margaria, T., Hinchey, M., 2013. Simplicity in IT: The
Power of Less, IEEE Computer, vol. 46, no. 11, pp.
23-25.
Norman, D., 1987. Some observations on mental models,
in: D. Gentner and A. Stevens (eds.), Mental Models,
Lawrence Erlbaum, Hillsdale, NJ.
Pentland, A., 2013. The data-driven society. Scientific
American, vol. 309, no. 4, pp. 64-69.
Sallam, R., Tapadinhas, J., Parenteau, J., Yuen, D.,
Hostmann, B., 2014. Magic Quadrant for Business
Intelligence and Analytics Platforms, Gartner,
February 2014.