A Survey of Object-Relational Transformation Patterns
for High-performance UML-based Applications
Nemanja Kojić and Dragan Milićev
Faculty of Electrical Engineering, University of Belgrade, Bulevar Kralja Aleksandra 73, Belgrade, Serbia
Keywords: Object-relational Mapping, Relational Databases, Denormalization, UML.
Abstract: We outline a methodology for automatic and efficient object-relational mapping (ORM) in the context of
model-driven development (MDD) of high-performance information systems specified with executable
UML models. Although there are various approaches to performance tuning, we focus here on the
persistence layer ̶ the relational database. The relational data model is usually designed following the well-
known normal forms. However, a fully normalized relational model often does not provide sufficient
performance, and improper relational model design can easily lead to a slow and unusable relational
database for particular operations. Our ORM approach is intended to exploit smart optimization techniques
from the relational paradigm that abandon normalization and its positive effects, and trade them off for
better performance. Our ORM approach hence combines the classical denormalization transformations,
based on reducing or eliminating expensive database operations by the model restructuring, but applies them
to a non-redundant conceptual UML model. In this paper, we also present the first step towards this goal: a
catalogue of ORM transformation patterns.
1 INTRODUCTION
There are two broad classes of information systems:
transactional (OLTP) and analytical (OLAP). OLTP
information systems, apart from storing live and
active data, are characterized by intensive short
online transactions, fast query processing, and
maintaining strong data consistency in a concurrent
environment. Data in OLTP information systems are
usually persisted in OLTP relational databases, since
they are mature and reliable persistence technology.
Performance optimization and efficient handling of
data is tightly coupled with the data model in the
relational database. In our context, a special UML
profile, customized for information systems
modeling, is used for capturing key data and
operations. In the MDD approach, the data model
(DDL schema) is automatically generated from the
UML model by object-relational mapping (ORM).
In addition to (statically) generating a relational
database schema, the runtime component of ORM
has to provide operations of (dynamically) persisting
data in a relational database during transaction
processing. Conceptual UML models, that are the
input into this process, are usually normalized,
regarding the data aspect, which means there is no
redundancy. The normal forms, in general, minimize
effort for ensuring strong data consistency (Codd,
1971; Maier, 1983). Regarding ORM approaches,
UML models are often in practice transformed to
normalized relational data models. However, we
have witnessed that a fully normalized relational
data model cannot provide desired scalability and
performance of a large-scale information system
with intensive transactional processing. In addition,
numerous researchers in the domain of relational
databases argue that in practice, a relational data
model must be denormalized to fit in a form that is
handled most efficiently by a relational database.
They also provide numerous denormalization
techniques that increase performance of queries and
reduce or even eliminate expensive database
operations (Shin and Sanders, 2006; Sanders and
Shin, 2001; Keller and Coldewey, 1997). These
techniques have been traditionally associated with
OLAP systems, which assume none or very little
updates, and complex and intensive retrieval
operations on high volumes of data. On the other
hand, OLTP systems may still benefit from the
denormalization techniques, although the penalty
expressed through increased volume of update
operations for the sake of consistency of redundant
280
Koji
´
c N. and Mili
´
cev D..
A Survey of Object-Relational Transformation Patterns for High-performance UML-based Applications.
DOI: 10.5220/0005242302800285
In Proceedings of the 3rd International Conference on Model-Driven Engineering and Software Development (MODELSWARD-2015), pages 280-285
ISBN: 978-989-758-083-3
Copyright
c
2015 SCITEPRESS (Science and Technology Publications, Lda.)
data has been traditionally considered as a detractor
from applying these techniques to OLTP systems.
The paper is structured as follows. Section 2
describes the motivation for enhancing ORM to
exploit database features in a more efficient way,
according to the knowledge of experts in that area
and proven optimization techniques. In section 3, we
give a catalog of ORM transformations for mapping
object-oriented models to optimized relational data
models. Section 4 gives the main conclusions and
addresses directions for future work.
2 MOTIVATION
Accessing a relational database in a most efficient
way and maximizing usage of its most efficient
features is the key approach for achieving good
performance. Efficient access to a relational
database is tightly related to the complexity of
queries. The complexity of queries is directly
dictated by the relational model itself. The more
normalized model is, the more complex queries are
in general, because of joins, repeating calculations of
derived data, etc. In the relational database theory, it
is well known that the normal forms are often
considered and widely adopted as a principle of a
good database design that promotes elimination of
data redundancy, while minimizing effort of
maintaining data consistency (Agarwal, Keene and
Keller, 1995).
In the context of an OO information system that
persists data in an OLTP relational database, the
ORM approach has to be sophisticated enough to
automatically create a relational model that is most
efficient and optimized (even denormalized) for the
particular database. Each denormalization technique
brings both advantages and disadvantages. The
choice of an appropriate denormalization technique
highly depends on the nature of the system and data
access patterns. For example, if a data value is
derived (computed) from other data values that are
very infrequently modified, or not modified at all
after initialization, it will be an excellent candidate
for storing as a redundant persistent value at all
places (relational tables) where it is retrieved from
with other data. Also, constraints in the logical UML
models can sometimes limit the number of available
denormalization options. Unlike the other
denormalization approaches, our approach differs in
one important detail. While the denormalization is
considered as a process of restructuring an existing
normalized relational data model (Shin and Sanders,
2006; Sanders and Shin, 2001; Keller and Coldewey,
1997), our approach exploits the knowledge from
the denormalization techniques and applies them
directly to create an initially denormalized relational
model from a (non-redundant) conceptual UML
model in the context of OLTP information systems.
Denormalization is a complex process in
practice, done (or at least instructed and steered)
exclusively by human experts, who are the only ones
who understand the semantics of applications and
the static and dynamic nature of the structural and
behavioral model of the system. Static aspects of the
model are based on recognition of structural class
patterns that are good candidates for mapping to
denormalized relations. If the class patterns are not
isolated from other classes in the model, the static
rules are not always applicable easily, since there are
many combinations to examine while the model is
being compiled to the relational model. For example,
if a complex application does not contain any code
for actions that access a structure of related data in a
particular manner, it is of no use to optimize the
piece of relational model for that particular access.
This task is error-prone if done manually, without a
systematic approach. That is one of the reasons why
we investigate a fully automatic ORM approach, that
should consider all available denormalization
options in a systematic way. In spite of all positive
effects of denormalization, it is worth repeating that
it makes updates more complex (although they are
done automatically by the ORM runtime, they still
put additional workload to the database). It must be
carried out in a controlled manner, balanced
carefully between the achieved performance and the
relational model maintainability. Uncontrolled
denormalization can lead to an even more
complicated relational model, derived as a result of
the denormalization explosion. This can be solved
by considering the dynamic aspects of the system's
model, by online data access profiling and
discovering the most dominant operations in the
system. A hybrid combination of the static and
dynamic aspects of the model would lead to a more
scalable and efficient ORM approach that controls
the denormalization explosion by focusing on the
most frequently accessed data. All of these
optimizations are hidden and transparent for the
developers, since ORM is responsible for creating
the appropriate relational model and mapping
application's operations to the optimized relational
data model. Finally, what we find extremely
necessary regarding efficient application of
denormalization is a comprehensive quantitative
analysis of denormalization techniques that should
result in a guide that will provide for each
ASurveyofObject-RelationalTransformationPatternsforHigh-performanceUML-basedApplications
281
denormalization technique a context in which it is
most effective.
3 ORM TRANSFORMATION
PATTERNS
As the first prerequisite for the research path, we
have described in the previous section, we establish
and describe a catalogue and classification of ORM
transformations in the described context.
There are four classes of denormalization
strategies: (1) collapsing relations, (2) partitioning
relations, (3) adding redundant properties, and (4)
adding derived properties (Shin and Sanders, 2006).
In this chapter, we present transformations of
generalization/specialization relationships and
associations (aggregations and compositions, as
special kinds of associations in UML, are not
covered separately), relying on the denormalization
techniques from the relational paradigm. We also
outline some optimization techniques that are not
directly related to the structure of the relational
model, but rather represent optimization tricks.
3.1 Object Identifier
Object identifier generation should not be
centralized in the relational database, as it may
impose unnecessary database load. It should be
rather decentralized and stateless. One way to
accomplish this requirement is to generate an object
identifier as a GUID. Yet another important aspect
of object identifier is that may carry the object type
identifier. That way, it is possible to dynamically
infer the type of an object without querying the
database, which leverages scalability of the
persistence layer and makes the polymorphic queries
more efficient. In addition, having the object type
identifier encoded in object identifier, eliminates
high load of tables near to the root inheritance table
(Keller and Coldewey, 1997; Keller and Coldewey,
1998).
3.2 Mapping Inheritance
We do not consider multiple inheritance, but only
single class inheritance. The authors in (Keler, 1995;
Agarwal, Keene and Keller, 1995) presented a few
relational model transformations for (efficient)
mapping of inheritance. In this chapter we combine
the existing denormalization approaches with
requirements of ORM.
One Table per Class: Among probably many
other places, this approach was presented in (Keller,
1997; Agarwal, Keene and Keller, 1995), as a
vertical partitioning. The main idea of the approach
is to map each class in the model to one table in the
relational database. Abstract classes also have their
own tables in the relational model. The tables in the
database form a tree, with one root table that holds
object identifiers. Records in each child table are
linked to the corresponding records in the parent
table with the object identifier as the foreign key.
The advantage of such an approach is getting an
easy-to-maintain and normalized relational model,
optimized for updates, but not for reads. Queries that
generalize objects (e.g., a query that searches for all
instances of a base class, possibly abstract, that
satisfy a criterion over properties of that base class)
are straightforward and efficient. Other queries that
fetch both inherited and specific properties of
derived classes may be far from simple and efficient.
The root table, and tables near to the root table, are
thus often under heavy load of queries, because of
frequent joins, which may affect the scalability of
the system. The approach may not be usable in case
of deep inheritance hierarchies, since multi-way
joins are required for retrieving basic object
information (Agarwal, Keene and Keller, 1995).
One Table per each Concrete Class: Usually,
there is no need to create separate tables for abstract
classes, but all their properties are copied to the
tables of the inherited classes. The properties from
the abstract class are thus duplicated in the schema,
or repeated in each table that corresponds to a
concrete inherited classes (note, however, that
values are not duplicated, unless an object belongs to
more than one derived class). The rule may be
generalized to a sub-hierarchy of abstract classes
related with generalization/specialization.
Eliminating tables for abstract classes may improve
reading performance, as some joins are eliminated.
One Table per One Inheritance Path: This
approach is useful in situations when the previous
two cannot provide sufficient reading performance,
due to the mentioned heavy load of the root tables
and multi-way join operations. This approach is
characterized with producing one table per each
inheritance path (assuming single inheritance). An
inheritance path starts from the root class and ends at
each concrete class, no matter if it is the leaf or not
(abstract classes are not considered). All inherited
properties on the inheritance path are collected into
the table for that path. This approach introduces
even more redundancy in the relational model (but
still not on the data), but eliminates joins for
MODELSWARD2015-3rdInternationalConferenceonModel-DrivenEngineeringandSoftwareDevelopment
282
retrieving basic object information that are already
present in the table. As a consequence, the
elimination of multi-way joins completely eliminates
heavy load of the root tables. Although the
bottleneck near the root table is eliminated, the
relational model now complicates generalized
polymorphic queries, since more tables must be
combined/joined to retrieve the desired information
(Keller, 1997).
One Table for One Inheritance Tree: This
approach assumes mapping of a whole inheritance
tree into one single table. This is also named as the
typed partitioning, as mentioned in (Agarwal, Keene
and Keller, 1995). The records in the table unify all
the properties from all classes in the inheritance
hierarchy, which eliminates expensive join
operations and optimizes polymorphic queries, but
creates a highly denormalized relational model. This
approach may not show good results in case of deep
hierarchies, since the table gets too big and
cumbersome. Since all data are stored in only one
table, the problem of bottleneck arises again, along
with a large waste of storage, because of a lot of null
values. Hence, this approach is recommended only
in case of shallow inheritance hierarchies and low
concurrency (Keller, 1997).
One Table per each Concrete Class with
Controlled Redundancy of Properties: We
propose this hybrid approach that leverages
advantages of the presented approaches and refines
the "one table per one concrete class" approach, by
copying properties from a base class table to the
tables of inherited classes, in order to speed up
generalized queries. However, instead of copying all
properties from a base class table to the tables of
inherited classes, only those base class properties
that are most often retrieved in combination with the
inherited class properties in the queries, may be
replicated. In particular, values of redundant
properties are copied in several tables (for base class
to which the property belongs and for derived
classes for optimized retrieval). This way, the degree
of denormalization is smaller than in the "one table
for inheritance path" approach, but performance of
reads is optimized. In addition, in the “one table for
inheritance path” approach, producing records with
great number of columns may also have some
negative effects on performance of read queries. For
example, if the database cannot store one record in
one physical page, then the number of accessed
pages may be increased. Hence, this selective
copying of properties from the tables of base classes
to the tables of inherited classes controls the
explosion of columns in records and keeps the
physical model under control. This approach must
be supported by the online data access profiling. To
the best of our knowledge, this approach has not
been published or systematically implemented in an
automated ORM except in our SOLoist
(www.soloist4uml.com) framework for model-
driven development (Milicev, 2009).
3.3 Mapping Associations
Efficient mapping of associations is another
challenge for a sophisticated ORM. The multiplicity
constraint on association ends is usually the main
factor that influences the selection of a proper
mapping transformation. In this section, we do not
consider aggregations and compositions separately,
since all that works for associations, works also for
aggregations and compositions (composition has
implications on the semantics of actions that are not
relevant for our discussion).
There are three well known approaches for
mapping associations, with respect to the
multiplicity constraint: (1) distinct table approach,
(2) embedded foreign key approach, and (3)
embedded class approach (Agarwal, Keene and
Keller, 1995).
It is necessary to mention that the
transformations, presented in this section, are
considered with one important assumption: we
examine isolated classes, neglecting their relations
with other classes in the model and other roles they
may play in the model. Otherwise, the combinatorial
complexity of available options and established
constraints increases significantly. At this moment,
this is beyond the scope of this paper.
3.3.1 Mapping Associations 1:1
Associations of type 1:1 usually relate one main
class and one dependent class, or both classes may
represent strong entities, but always related as 1:1.
All of the three mentioned mappings can be applied
for this type of associations.
The embedded Class Approach (1:1): If the
properties of both classes are often combined and
retrieved in queries, than this is the most efficient
transformation. This mapping eliminates frequent
joins, while keeping the updates still reasonably
easy. It is important to mention that objects share the
same record, no matter if one of them is existentially
dependent or not. It is important to know the aspect
of the association's semantics and respond
appropriately on operations of deleting links.
It is worth mentioning that some ORM frameworks,
ASurveyofObject-RelationalTransformationPatternsforHigh-performanceUML-basedApplications
283
such as Hibernate, support this feature but on the
level of class: a class can be annotated as one whose
instances will be embedded into the instances of the
other class (on the other side of compositions).
However, it is important to understand that this is
actually a property of an association, not of an entire
class: in our practice, we have come across
situations in which it was very useful (according to
the usage of data) to have some instances of a
certain class, which are aggregated into other objects
over one composition, embedded into those other
objects, while the other instances of that class should
be stored in a separate table. Again, to the best of
our knowledge, we are not aware of any ORM or
publication that supports this feature.
The embedded Foreign Key and the Distinct
Table Approach (1:1): If properties of one class are
dominantly retrieved, while objects of the other class
are rarely accessed, then the normalized models 1
and 2 may be a better choice, since negative effects
of frequent joins are minimized, while the
normalized relation model responds better to further
changes in the model.
Mapping Associations 1:n
The Distinct Table Approach (1:n). This is the
most flexible way of mapping, but it requires
expensive joins if the relation is traversed
frequently. This approach provides the
implementation of the bidirectional navigability.
The embedded Foreign Key Approach with
Unidirectional Navigability (1:n): The embedded
key approach is the most convenient and efficient
for this kind of associations. The embedded foreign
key is incorporated into the table of the dependent
class. This saves one level of joins, while still being
flexible and keeps the relational model in a
normalized form. The problem with this approach is
that it does not provide the bidirectional navigability
(on the level of foreign keys, which are stored in
only table on the n side of the association).
The embedded Foreign Key Approach with
Bidirectional Navigability (1:n): As mentioned in
the case of the embedded key approach, it does not
provide bidirectional navigability (foreign keys are
stored in one table only). In a special situation, the
bidirectional navigability can be implemented even
without the distinct association table. We may use a
technique called repeating groups (Shin and Sanders,
2006). Namely, if the multiplicity of the dependent
class is low, and with specified maximum
cardinality, IDs of the dependent objects can be
incorporated as foreign keys in separate columns in
the owner object's record (Zaker, Phon-Amnuaisuk
and Haw, 2009). In our future work, we will be
experimenting with an ORM approach of
bidirectional navigability with unlimited maximal
cardinality.
The embedded Class Approach (1:n): In the
case of low multiplicity of the dependent class, it
may be useful to embed it to the table of the owner
class. Hence, joins are not needed for traversing the
association, all property values are available directly
in the owner object's record.
Mapping Associations m:n
The Distinct Table Approach (m:n): The distinct
table approach is a natural solution for persisting
links of this kind of associations. Each record of the
table contains pairs of object identifiers, for objects
from both sides of the association. Although
flexible, this approach requires two joins to retrieve
combined attribute values from the related objects.
On the other hand, updates remain fast and easy
(Agarwal, Keene and Keller, 1995).
Distinct Table Approach with Controlled
Redundancy (m:n): In the previous paragraph, we
mentioned that the distinct table approach requires
two joins to retrieve properties of the related objects.
Usually, applications need to retrieve only subsets of
the properties from the related classes. If the
property values are rarely changed, then it makes
sense to copy those properties that are accessed most
frequently to the association table. This way,
similarly to the hybrid approach of mapping
inheritance described before, we apply a controlled
redundancy to keep copies of properties that are
most often retrieved when the association is
traversed in the association table. This approach
eliminates join operations in these cases. As for the
similar approach for mapping inheritance, we are not
aware of an implementation or publication of this
mapping, that we propose.
The embedded Foreign Key Approach (m:n):
The embedded key approach may be considered in
the situation when the multiplicity on at least one
side is low, known and limited with the upper
bound. That way, the table of the class at the
opposite side may contain repeating groups of the
finite number of object identifiers as the foreign
keys (Shin and Sanders, 2006). Interesting to
mention, such a mapping improves performance of
queries that need to read direct links of an object. If
both association ends have low and limited
multiplicities, we may apply the same technique on
the opposite table, too. That way we get
bidirectional navigability, compared to the case
when only one role has low and limited multiplicity.
MODELSWARD2015-3rdInternationalConferenceonModel-DrivenEngineeringandSoftwareDevelopment
284
The embedded Class Approach (m:n): The
embedded class approach is not suitable for
representing associations of this kind, so we do not
examine it further.
3.4 Optimizing Transitive Associations
Transitive associations are often present in UML
models and traversed. If the navigation from one
object to its transitively related object goes through a
sequence of links, such request is accomplished by
using multiple joins. Again, if this operation is done
frequently, it may impose significant slowdown and
high load of the database. This issue is often solved
by using pre-joint tables, as recommended in (Zaker,
Phon-Amnuaisuk & Haw, 2009). That is, objects
that are accessed together frequently are stored in
the table for the direct navigation. In addition, this
optimization can be justified only by the dynamic
profiling of data access in the relational database. In
fact, this technique is just a special case of storing
derived properties (associations in this case).
3.5 Storing Derived Values Instead of
Frequent Recalculation
This transformation provides optimized access to
derived values. If derived values are calculated
frequently, and if the basic values change rarely,
then it is highly recommended to store the derived
values in redundant columns and retrieve them on
demand (Shin and Sanders, 2006). This is a large
category of particular techniques that cover
attributes as well as associations, including
functional and recursive ones. Due to the lack of
space, we will not investigate this category any
further in this paper.
4 CONCLUSIONS
In this paper, we presented a survey of ORM
transformations aimed for creating optimized
relational models, specifically structured to
eliminate expensive database operations in queries.
Based on the given survey, we plan to implement a
systematic approach for automatic mapping UML
models to the optimized relational model, although
some of the techniques are already present in our
SOLoist framework as particular solutions. The
approach is intended to provide static, as well as
dynamic application profiling that will feed ORM
with information needed for adapting the relational
model to support the most dominant data access
patterns. Finally, one of the most important
contributions of this research is an initial framework
for a detailed analysis and comparison of the
presented ORM approaches based on
denormalization of the relational model. As a result
of the analysis, we expect to produce a methodology
for using the denormalization techniques in a most
efficient way.
REFERENCES
Batini, C., Stefano C., Navathe S., 1989. Conceptual
Database Design. Entity Relationship Approach,
Elsevier Science Publishers BV (North Holland).
Shin, S. K., Sanders, G. L., 2006. Denormalization
strategies for data retrieval from data warehouses.
Decision Support Systems. 42 (1). p. 267-282.
Sanders, G., Shin. S. K., 2001. Denormalization effects on
performance of RDBMS. System Sciences, 2001."
Proceedings of the 34th Annual Hawaii International
Conference on.
Maier, D. (1983). The theory of relational databases.
Rockville: Computer science press. Vol. 11.
Keller, W. 1997. Mapping Objects to Tables: A Pattern
Language. Proceedings of the 1997 European Pattern
Languages of Programming Conference. Irrsee.
Germany.
Keller, W., Coldewey, J. 1997. Relational Database
Access Layers: A Pattern Language. Collected Papers
from the PLoP’96 and EuroPLoP’96 Conferences.
Washington University, Department of Computer
Science, Technical Report WUCS 97-07.
Keller, W., Coldewey, J. 1998. Accessing Relational
Databases: A Pattern Language. Pattern Languages of
Program Design 3. Addison-Wesley.
Milicev, D. (2009). Model-driven development with
executable UML. Wrox.
Codd, E.F. 1971. Normalized data base structure: A brief
tutorial. ACM SIG- FIDET Workshop on Data
Description, Access, and Control. San Diego,
California.
Agarwal, S., Keene, C., Keller, A. M. 1995. Architecting
object applications for high performance with
relational databases. OOPSLA Workshop on Object
Database Behaviour, Benchmarks, and Performance,
Austin (Vol. 196).
Zaker, M., Phon-Amnuaisuk, S., & Haw, S. C. 2009.
Hierarchical Denormalizing: A Possibility to Optimize
the Data Warehouse Design. International Journal of
Computers. (1).
ASurveyofObject-RelationalTransformationPatternsforHigh-performanceUML-basedApplications
285