review without getting lost, and is insufficient for
mass manipulation of data objects. The search step
does not allow real exploration but merely helps to
shrink the result list; hence, it still requires the user
to know what they are looking for.
Modern and future devices allow a more intuitive
tactile- and gesture-based interaction, and provide
various kinds of context information (e.g. location,
orientation and acceleration). Based on the
capabilities of these devices, new user interfaces offer a
freely interactive and explorative way to work with
data. With the ability to present data at instantly and
easily changeable levels of detail and to dynamically
create context-sensitive views or filters, these new
interfaces supply the user with exactly the
information that is relevant to their current situation.
This flexible and intelligent way of data presentation
and interaction lets users search, view, handle, and
manipulate very large datasets of varying kinds and
structure (Perlin et al., 1993); (Morris et al., 2010);
(Werner et al., 2011).
Putting the large variety of available data and the
new UI capabilities together, a wide range of new
applications that leverage potentially large, weakly
structured, and only partially integrated data becomes
possible. For instance, in the area of competitive
market intelligence, future applications can not only
analyze customer sentiment but also detect market
trends, competitor moves, regulatory constraints, and
so on at an early stage, and handle them
proactively. Such applications will be able to crawl
the Web and combine information extracted from
unstructured content (typically blogs, news pages,
press releases) with publicly available structured
information (e.g., from Freebase) and private
structured information (contained mostly in business
applications).
In contrast to established transactional
applications, the content of these collaborative
applications (a shared and incrementally composed
work context) is less predictable. Hence, it does not
fit well with a predefined rigid schema, which is the
key design concept of relational database systems.
More flexibility in the data model is provided by
key-value stores or document-oriented database
management solutions. However, abandoning a rigid
database schema defined up front also implies that
the intended meaning of the data, which so far has been
captured in the schema, must be made explicit and
handled entirely within the application code.
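The following minimal sketch illustrates this shift of meaning from the schema into application code. It is an illustration only: plain Python dictionaries stand in for a document store, and all field names and helper functions (`company_name`, `revenue_in_musd`) are hypothetical and not taken from any particular system.

```python
# Hypothetical records in a schema-free document store (plain dicts here).
# Nothing in the store says that "company" and "org_name" mean the same thing.
documents = [
    {"company": "Acme", "revenue_musd": 12.5},
    {"org_name": "Globex", "revenue": "8.1 million USD"},
]

# The interpretation that a relational schema would have fixed up front
# now lives in application code (field names are illustrative only).
NAME_FIELDS = ("company", "org_name")


def company_name(doc):
    """Return the company name, whichever field the document uses."""
    for field in NAME_FIELDS:
        if field in doc:
            return doc[field]
    return None


def revenue_in_musd(doc):
    """Normalize revenue to millions of USD, or None if unknown."""
    if "revenue_musd" in doc:
        return float(doc["revenue_musd"])
    text = doc.get("revenue")
    if isinstance(text, str) and text.endswith("million USD"):
        return float(text.split()[0])
    return None


normalized = [(company_name(d), revenue_in_musd(d)) for d in documents]
```

Every application reading these documents would have to repeat or share such interpretation logic, which is the cost of dropping the predefined schema.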
Semantic Web technology is one attempt to
create a more explicit representation of the meaning
of data. But representing meaning by complete and
consistent sets of computer-processable rules requires
a rigid standardization of metadata vocabularies,
and does not provide a concept for dealing with
different context-specific views of identical entities,
or with incompleteness and inconsistencies.
In summary, we see this kind of heterogeneity
and the strong scalability demand resulting from the
fast-growing amount of available and potentially
relevant data as one of the main challenges for
database technology. Future data repositories need to
support the coexistence of heterogeneous content
and its incremental integration to the degree needed,
based on specific application knowledge and
requirements. We therefore see the need for data
management systems that do not require the upfront
definition of rigid database schemas and that support
the efficient representation and processing of large
volumes of irregularly structured, not necessarily
fully integrated data.
In this paper, we describe an extension to SAP’s
HANA data management and analytics engine
(Färber et al., 2011); (Gupta et al., 2012). HANA is
a main-memory-centric database system that
leverages new technologies such as multi-core processors, SSDs,
and large main-memory capacities to significantly
increase the performance of analytical and transactional
applications. HANA is based on a fundamentally
new system architecture that allows data to be stored
in either column- or row-oriented form. We leverage HANA
as the foundation of a new schema-flexible data
management system that we call Active Information
Store (AIS). AIS capabilities are realized directly on
top of the HANA database engine to guarantee good
query performance and scalability.
2 REQUIREMENTS
We studied a broad range of use cases, such as
ad-hoc data integration and analysis, competitive
market intelligence, social network analysis, idea
management, business user enablement, and
next-generation collaboration tools. Based on these use
cases, we identified common requirements for an
optimal underlying data store. A data store
developed specifically for such applications has to
be able to efficiently manage schema-flexible,
irregularly structured and partially integrated data
with a soft-coded or system-generated schema.
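As a rough illustration of what a system-generated schema over such data might look like, the sketch below derives the set of attributes observed per type label from the data objects themselves. This is an assumption-laden toy example, not the data model described in this paper: the objects, the `type` attribute, the `<untyped>` label, and the `derive_schema` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical self-describing data objects: each carries its own attribute
# names, and the optional "type" label is just another piece of data.
objects = [
    {"type": "Company", "name": "Acme", "country": "DE"},
    {"type": "Company", "name": "Globex", "founded": 1989},
    {"type": "Organization", "name": "Initech"},  # similar entity, other type
    {"name": "untyped note", "text": "no type assigned yet"},
]


def derive_schema(objs):
    """Build a system-generated schema: attributes observed per type label."""
    schema = defaultdict(set)
    for obj in objs:
        label = obj.get("type", "<untyped>")
        schema[label].update(k for k in obj if k != "type")
    return {label: sorted(attrs) for label, attrs in schema.items()}


schema = derive_schema(objects)
```

Here the schema is a by-product of the stored data rather than a precondition for storing it, and objects of the same type are free to differ in their attributes.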
With such a data store, schema information does
not need to be defined upfront. Instead, the data is
self-describing (a soft-coded schema). Data objects
of similar structure, or representing similar kinds of
entities, do not need to have the same semantic type
(that is, they are not fully integrated), and data objects of the
DATA 2012 - International Conference on Data Technologies and Applications