Fault Tolerance Logging-based Model for Deterministic Systems
Óscar Mortágua Pereira, David Simões and Rui L. Aguiar
Instituto de Telecomunicações, DETI, University of Aveiro, Aveiro, Portugal
Keywords: Fault Tolerance, Logging Mechanism, Software Architecture, Transactional System.
Abstract: Fault tolerance allows a system to remain operational to some degree when some of its components fail.
One of the most common fault tolerance mechanisms consists of logging the system state periodically, and
recovering the system to a consistent state in the event of a failure. This paper describes a general fault
tolerance logging-based mechanism, which can be layered over deterministic systems. Our proposal
describes how a logging mechanism can recover the underlying system to a consistent state, even if an
action or set of actions was interrupted mid-way due to a server crash. We also propose different methods
of storing the logging information, and describe how to deploy a fault tolerant master-slave cluster for
information replication. We adapt our model to a previously proposed framework, which provided common
relational features, like transactions with atomic, consistent, isolated and durable properties, to NoSQL
database management systems.
1 INTRODUCTION
Fault tolerance enables a system to continue its
operation in the event of failure of some of its
components (Randell et al., 1978). A fault tolerant
system either maintains its operating quality in case
of failure or decreases it proportionally to the
severity of the failure. On the other hand, a fault
intolerant system completely breaks down with a
small failure. Fault tolerance is particularly valued in
high-availability or life-critical systems.
Relational Database Management Systems
(DBMS) are systems that usually enforce
information consistency and provide atomic,
consistent, isolated and durable (ACID) properties in
transactions (Sumathi and Esakkirajan, 2007).
However, without any sort of fault-tolerance mechanism, neither atomicity nor consistency is guaranteed in case of failure (Gray and others, 1981).
We have previously proposed a framework
named Database Feature Abstraction Framework
(DFAF) (Pereira et al., 2015), based on Call Level Interfaces (CLI), which acts as an external layer and
provides common relational features to NoSQL
DBMS. These features included ACID transactions,
but our framework lacked fault-tolerance
mechanisms and, in case of failure, did not
guarantee atomicity or consistency of information.
This paper presents a model that can be used to
provide fault-tolerance to deterministic systems
through external layers. We describe how to log the
system state, so that it is possible to recover and
restore it when the system crashes; possible ways to
store the state, either remotely or locally; and how to
revert the state after a crash.
We prove our concept by extending DFAF with
the proposed logging mechanisms in order to
provide fault-tolerant ACID transactions to NoSQL
DBMS. DFAF acts as the external layer over a deterministic system (a DBMS). We consider that non-deterministic events can happen in the deterministic system and are either expected (e.g., receiving a message), triggering deterministic behaviour, or unexpected (e.g., crashing), leading to undefined behaviour.
The remainder of this paper is organized as
follows. Section 2 describes common fault tolerance
techniques and presents the state of the art. Section 3
provides some context about the DFAF and Section
4 formalizes our fault tolerance model, describing
what information is stored and how to store it.
Section 5 describes a fault-tolerant data replication
cluster which can be used for performance
enhancements and Section 6 shows our proof of
concept and evaluates our results. Finally, Section 7 presents our conclusions.
2 STATE OF THE ART
Fault tolerance is usually achieved by anticipating
exceptional conditions and designing the system to
cope with them. Randell et al. define an erroneous
state as a state in which further processing, by the
normal algorithms of the system, will lead to a
failure (Randell et al., 1978). When failures leave
the system in an erroneous state, a roll-back
mechanism can be used to set the system back in a
safe state. Systems rely on techniques like check-
pointing, a popular and general technique that
records the state of the system, to roll-back and
resume from a safe point, instead of restarting
completely. Log-based protocols (Johnson, 1989)
are check-pointing techniques that require
deterministic systems. Non-deterministic events,
such as the contents and order of incoming
messages, are recorded and used to replay events
that occurred since the previous checkpoint. Other
non-deterministic events, such as hardware failures,
are meant to be recovered from. Indirectly, they are
recorded as lack of information.
In the fault tolerance context, logging
mechanisms and their concepts and implementation
techniques have been discussed and researched
extensively (Gray and Reuter, 1992), with popular
write-ahead logging approaches (Mohan et al., 1992)
having become common in DBMS to guarantee both
atomicity and durability in ACID transactions. There
are also other approaches which do not rely on
logging systems to provide fault tolerance, like
Huang et al.’s method and schemes for error
detection and correction in matrix operations (Huang
et al., 1984); Rabin’s algorithm to efficiently and reliably transmit information in a network
(Rabin, 1989); or Hadoop’s data replication
approach for reliability in highly distributed file
systems (Borthakur, 2007). Some relational DBMS
use shadow paging techniques (Ylönen, 1992) to
provide the ACID properties. However, the above
described fault tolerance mechanisms are not
suitable to be used in an external fault-tolerance
layer, since they are very dependent on the
architecture of the systems they were designed for.
The most general proposals fall into the category of data replication, where several algorithms and
mechanisms have been proposed. These include
Hadoop’s data replication approach for reliability in
highly distributed file systems (Borthakur, 2007);
(Oki and Liskov, 1988), which is based on a primary
copy technique; (Shih and Srinivasan, 2003), an
LDAP-based replication mechanism; or (Wolfson et
al., 1997), which provides an adaptive algorithm that
replicates information based on its access pattern.
Recently, proposals have also focused on byzantine
failure tolerance (Castro and Liskov 1999; Cowling
et al. 2006; Merideth and Iyengar 2005; Chun et al.
2008; Castro and Liskov 2002; Kotla and Dahlin
2004). Byzantine fault-tolerant algorithms have been
considered increasingly important because malicious
attacks and software errors can cause faulty nodes to
exhibit arbitrary behaviour. However, the byzantine
assumption requires a much more complex protocol
with cryptographic authentication, an extra pre-
prepare phase, and a different set of techniques to
reach consensus.
To the best of our knowledge, there has not been
work done with the goal of defining a general
logging model that provides fault tolerance as an
external layer to an underlying deterministic system.
Some solutions provide fault tolerance, but are adapted to a specific context or system. Others are overly abstract general models, like data replication, and do not cover how an external layer can generate the data needed to provide fault tolerance to the underlying system. Not only that, but many
data replication systems also assume conditions we
do not, such as the possibility of byzantine failures,
or overly complex data access patterns. While
byzantine failures are of enormous importance in
distributed unsafe systems, such as in the BitCoin
environment (Nakamoto, 2008), we consider their
countermeasures to be complex and performance-
hindering in the scope of our research. Moreover, byzantine assumptions have been proven to allow only up to 1/3 of the nodes to be faulty. We
intend to focus on fault-tolerance for underlying
deterministic systems through a logging system, and
while distributed data replication is used for
reliability, expected DFAF use cases do not assume
malicious attacks to tamper with the network.
However, our model is general enough that it
supports the use of any data replication techniques to
replicate logging information across several
machines.
3 CONTEXT
We have previously mentioned the DFAF, which
allows a system architect to simulate non-existent
features on the underlying DBMS for client
applications to use, transparently to them. Our
framework acts as a layer that interacts with the
underlying DBMS and with clients, which do not
access the DBMS directly. It allowed ACID
transactions, among other features, on NoSQL
DBMS, but was not fault tolerant. Typically, NoSQL DBMS provide no support for ACID transactions. An ACID transaction allows a database
system user to arrange a sequence of interactions
with the database which will be treated as atomic, in
order to maintain the desired consistency constraints.
For reasons of performance, transactions are usually
executed concurrently, so atomicity, consistency and
isolation can be provided by file- or record-locking
strategies. Transactions are also a way to prevent
hardware failures from putting a database in an
inconsistent state. Our framework must therefore be adjusted to take hardware failures into account in multi-statement transactions. In a failure-free execution,
our framework registers what actions are being
executed in the DBMS and how to reverse them,
using a reverser mechanism (explained further
below). Actions are executed in the DBMS
immediately and are undone if the transaction is
rolled-back.
However, during a DFAF server crash, the ACID
properties are not enforced. As an example, consider
a transaction with two insert statements. If the
DFAF server crashed after the first insert, even
though the client had not committed the transaction,
the value would remain in the database, meaning the atomicity of the transaction was not enforced. To enforce it, we propose a
logging mechanism, whose records are stored
somewhere deemed safe from hardware crashes.
That logging system will keep track of the
transactions occurring at all times and what actions
have been performed so far. When a hardware crash
occurs, the logging system is verified and
interrupted transactions are rolled-back before the
system comes back on-line. Our logging system is
an extension to DFAF and is a log-based protocol
where the underlying DBMS acts as the
deterministic system mentioned previously. Each
action in a transaction represents a non-deterministic
event and is, as such, recorded, so that the chain of
events can be recreated and undone when the system
is recovering from failure.
4 LOGGING SYSTEM
Logging systems for fault-tolerance mechanisms
have several different aspects that need to be
defined: firstly, the logging system must be designed
in a way that the logging is not affected by hardware
failures. In other words, if the server crashes while a
database state was being logged, the system must be
able to handle an incomplete log and must be able to
recover its previous state. Secondly, logging an
action is not done at the same time as that action is
executed. Taking an insertion in a database as an
example, the system logs that a value is going to be
inserted, the value is inserted and the system logs
that the insertion is over. However, if the system
crashes between both log commands, there is no
record of whether the insert took place or not. To
solve this, the underlying system must be analysed
to check if it matches the state prior to the insertion
or not. Thirdly, while recovering from a failure, the
server can crash again, which means the recovery
process itself must also be fault tolerant. Finally,
cascading actions imply multiple states of the
underlying system, all of which must be logged so
that they can all be rolled-back. In other words, if an
insert in a database triggers an update, then the
database has three states to be logged: the initial
state, the state with the insertion and the state with
the insertion and the update. Because the server can
crash at any of these states, they all need to be
logged so that the recovery process rolls back all the
states and nothing more than those states.
4.1 Logging Information
In order to provide fault tolerance, there are two choices to compensate for a failure (Garcia-Molina and Salem, 1987): undoing the actions already executed (backward recovery), or executing the remainder of the transaction (forward recovery).
For forward recovery, it is necessary to know a
priori the entire execution flow of the transaction,
which is not always possible. DFAF uses the
backward recovery model to avoid leaving the
system in an inconsistent state when a rollback is
issued by a client. To do so, along with the actions
performed, DFAF registers how to undo them. In
other words, when a client issues a command, the
command to revert it, referred to as the reverser, is
calculated. In a SQL database, for example, an
insert’s reverser is a delete. Reversers are executed
backwards in a recovery process to keep the
underlying system in a consistent state. However,
logging actions and performing them cannot be done
at the same time. It is also not adequate to log an
action after it has already been performed, since the
server could crash between both stages, and there
would be no record that anything had happened.
Therefore, actions (and their reversers) must be
logged before they are executed on the underlying
system. However, if the server crashes between the
log and the execution, the recovery process would
try to reverse an action that had not been executed.
Because we have no assumptions regarding when
the system can crash, the only way to solve this
problem is to directly assess the underlying system’s
state to figure out whether the action has been
performed or not. Since we have access to the
underlying system’s state prior to the action being
executed, we can find a condition that describes
whether the action has been executed or not. This
condition will be referred to as verifier from now on.
For example, after the insertion of value A, it is trivial to verify whether the value has been inserted by comparing the number of rows with value A against the number that existed prior to the insertion. If there were two As and the
transaction crashed during the insertion of a third, by
counting how many exist in the database, we can
infer whether we need to reverse this action in the
transaction (if we now have three As) or if the action
did not get completed (if we still have two As). The
concept is extended to cascading actions. A reverser
is determined for each cascading action in DFAF,
which means a verifier must also be calculated to
determine whether that effect happened and needs to
be rolled-back or not. If the server crashes during
these triggered actions or during a rollback, each
verifier must be checked before applying the
corresponding reverser, to ensure that 1) we are not
reverting the same action twice, and that 2) we are
not reverting an action that was not executed. During
the recovery process, reversers are executed
backwards. If a verifier shows that an action has not
been completed, or after an action has been reversed,
its record (along with the reverser and verifier) is
removed from the log. If the server crashes during a recovery, thanks to the verifier system, there is no risk of reverting actions that need not be reverted or that have not yet been executed.
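To make the recovery flow concrete, the following sketch walks a transaction log backwards, checks each verifier and applies the corresponding reverser. It is a minimal illustration in Python; the log and database handles and their methods are hypothetical, not DFAF's actual API.

```python
def recover(log, db):
    """Roll back an interrupted transaction by replaying its log backwards.

    Each record holds the executed action, a verifier (a query whose result
    tells whether the action actually took effect) and a reverser (the
    command that undoes it). All names here are illustrative.
    """
    for record in reversed(log.records()):
        # Only undo actions that were actually applied to the underlying system.
        if db.evaluate(record.verifier):
            db.execute(record.reverser)
        # Removing the record makes the recovery itself fault tolerant: if the
        # server crashes here, a later recovery pass never reverts this action twice.
        log.remove(record)
```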
4.2 Logging Information Storage
We have implemented two possible information
storage mechanisms: a local and a remote one. Both can be used with today’s regular hardware and standard computational resources. Other storage
mechanisms are supported, such as using a relational
DBMS to store and retrieve the logs. The only
requirement is that the mechanisms are fault-
tolerant.
The local mechanism relies on writing the
logging information to disk: fault tolerance is
supported even in a complete system crash, but with
heavy performance costs. It does not require any
additional software, other than file system calls. The
remote mechanism tries to leverage both
performance and fault tolerance and relies on a
remote machine to keep the logging information in
memory. Network I/O operations are not as heavy on performance as writing to disk, but fault tolerance is only guaranteed if the logging server does not crash.
We have designed a fault-tolerant master-slave
architecture, deemed a Cluster Network (CN), to
allow several machines to coordinate and replicate
information among them. This system can be used to
store the logs from the remote mechanism, which
allows some machines to crash without loss of
information. In a CN, the only case where the logs
would be lost would be a scenario where all
machines crashed, which is unlikely if the machines
are geographically spread. We expect the
performance of this mechanism to be superior in
comparison with the local mechanism. The remote
mechanism uses TCP sockets to exchange
information between the servers. Because TCP
provides reliability and error control, both machines
know when a message has been properly delivered
and the system server can perform the requested
actions while the logging server keeps the
information in memory. Both servers can detect if the network has failed or the other server has crashed. In these cases, the recovery process can be initiated while connectivity is re-established.
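As an illustration only, a minimal exchange for the remote mechanism could look like the sketch below; the newline-delimited JSON framing and the explicit ACK reply are assumptions of ours, since no wire format is fixed here.

```python
import json
import socket

def log_remotely(sock: socket.socket, action: str, reverser: str, verifier: str) -> None:
    """Send one log record and wait for an acknowledgement before returning.

    The JSON framing and the ACK reply are assumptions of this sketch,
    not a protocol specified by DFAF.
    """
    record = json.dumps({"action": action, "reverser": reverser, "verifier": verifier})
    sock.sendall(record.encode() + b"\n")
    # TCP guarantees ordered, error-checked delivery; waiting for an explicit
    # application-level ACK additionally guarantees the logging server has
    # stored the record in memory before the action runs on the DBMS.
    if sock.recv(16).strip() != b"ACK":
        raise ConnectionError("logging server did not acknowledge the record")
```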
The local mechanism, as previously stated, was
designed to store the information in the file system.
We assume that the hardware crashes will not be so
severe that they render the hard drive contents
unrecoverable, or that a back-up system is deployed
to allow the recovery of a defective file system.
Most file systems do not provide fault-tolerant
atomic file creation, removal, copy, movement,
appending or writing operations, so we need to first
address this issue and prevent the logging system
from entering an inconsistent state, if there is a crash
during a logging operation. We start by creating a
file for each transaction occurring in the system. The
file is created as soon as a transaction is started and
deleted just before it is complete. If the server
crashes when the transaction is starting and creating
the file, the file can either exist and be empty, or not
exist. There are no actions to be rolled-back, so
either case is fine and the file is ignored. If the
server crashes when deleting the file and closing the
transaction, the file can either exist with its contents
still intact, or not exist. If it does not exist, the
transaction was already over. If it still exists, then it
is possible to read it and rollback the database. The
log file update must be done in such a way that the logging system’s last state is recoverable. As
such, to prevent file corruption, a copy of the old
state is kept until the new one is completely defined.
Firstly, we create a file, temp, that signals we were
updating the log and whose existence means that the
original log file is valid. After we create it, we copy
the log to a copy file. When all of the contents have
been copied, we delete temp. If the server crashes at
any point and temp exists, log is still valid and the
server ignores copy. If temp does not exist, but copy
exists, then copy is valid and the server ignores the
original log. After temp has been deleted, log is
updated with the new information (a new state in the
database, for example). After log has been fully
updated, copy can be deleted, since it is no longer
necessary. Table 1 shows the several stages
described above.
Table 1: A log-update cycle, with the several stages of the update, the state of each of the files (✓ = exists and valid, ? = exists but possibly incomplete, blank = does not exist), and which file is chosen at each stage.

Stage:    1  2  3  4  5  6  7  8
log (L)   ✓  ✓  ✓  ✓  ✓  ?  ✓  ✓
temp         ✓  ✓  ✓
copy (C)        ?  ✓  ✓  ✓  ✓
File:     L  L  L  L  C  C  C  L
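A minimal sketch of this update cycle, following the stages of Table 1 (in Python; the file names and helper functions are ours for illustration, not DFAF's implementation):

```python
import os
import shutil

def update_log(log_path: str, new_contents: str) -> None:
    """Update the log so that a valid version always survives a crash.

    Follows the stages of Table 1; file names are illustrative.
    """
    temp_path, copy_path = log_path + ".temp", log_path + ".copy"
    open(temp_path, "w").close()          # stage 2: temp exists, so log is valid
    shutil.copyfile(log_path, copy_path)  # stages 3-4: duplicate the old log
    os.remove(temp_path)                  # stage 5: copy becomes the valid file
    with open(log_path, "w") as log_file: # stages 6-7: rewrite the log itself
        log_file.write(new_contents)
    os.remove(copy_path)                  # stage 8: log is the valid file again

def valid_log(log_path: str) -> str:
    """Pick the valid file after a crash, per the 'File' row of Table 1."""
    temp_path, copy_path = log_path + ".temp", log_path + ".copy"
    if not os.path.exists(temp_path) and os.path.exists(copy_path):
        return copy_path  # temp gone and copy complete: copy is authoritative
    return log_path       # otherwise the original log is still valid
```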
With the two proposed mechanisms, a recorded
log of executed actions on the database can be safely
stored and used to return the underlying system to a
consistent state.
5 CLUSTER NETWORK
Our remote logging mechanism can rely on a
cluster-based system to store the needed
information. This allows for fast interactions,
reliability and consistency. Data replication techniques such as byzantine-tolerant approaches are a valid option, but have an associated performance decay due to the byzantine assumption and a low threshold for the number of faulty machines. As such, we designed a fault-tolerant master-slave network that replicates information across all the slaves and better fits DFAF’s requirements.
We require our Cluster Network to be able to grow as needed, without interrupting service or incurring maintenance downtime. We considered that nodes should be symmetrical, to avoid the human error factor present in id-based systems. We also want a stable algorithm (a master stays as master until it crashes), to avoid unnecessary operations when an ex-master is turned back on. Finally, we consider that an IP network is not perfect and that network elements (switches, routers) as well as network links can crash at any time. We
therefore allow a set of any number of nodes that
communicate through IP where any of the nodes can
crash and be restarted at any given time. The master
node is contacted by clients and it forwards the
information to the slave nodes. Clients can find the
master node through any number of methods, like
DNS requests, manual configuration, broadcast
inquiries, etc. If the master crashes, one of the slaves
is nominated to be master and, because all the
information was replicated among the slaves, it can
resume the master’s process.
Our election algorithm is inspired by Gusella and Zatti’s election algorithm (Gusella and Zatti, 1985).
While many other leader election algorithms would
be supported, this one suits the DFAF requirements
the best. The authors have developed a Leader
Election algorithm that is dynamic (nodes can crash
and restart at any time), symmetric (randomization is used to differentiate between nodes), stable (no leader is
elected unless there is no leader in the cluster) and
that uses User Datagram Protocol (UDP)
communication (non-reliable, non-ordered). It
supports dynamic topology changes to some degree,
but it is not self-stabilizing (nodes start in a defined
state, not in an arbitrary one). When a master is
defined, the master is the one receiving requests
from clients. In order to guarantee consistency
among all the nodes, the master forwards any
incoming requests to the slaves before answering the
client with the corresponding response. This
guarantees that all the slaves will have the same
information as the master. If the master crashes during this process, because the client has still not been answered, it will retry the request with the new master, which will store it (while avoiding request duplication) and forward it to the slaves. When a slave joins the network, it contacts the master and
requests the current system information (in this case,
the current log). A mutual exclusion mechanism is
necessary to avoid information inconsistency when
information is being relayed to a new slave. To
avoid request duplication from clients when the
master node crashes, a request identification number
is used. Using this approach means that up to N-1
nodes in the CN can crash without information being
lost or corrupted. Using other approaches for data replication, such as (Castro and Liskov, 1999), only allows up to N/3 nodes to be faulty and is expected to have worse performance. However, byzantine-
tolerant approaches are more robust and, as
previously stated, our logging model is general
enough that any data replication mechanism can be
used to safe-keep the logging information.
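A condensed sketch of the master's behaviour described above (illustrative Python; class and method names are ours, and slave connections are abstracted away):

```python
class MasterNode:
    """Sketch of the master's request handling; names are ours, not DFAF's.

    The master replicates each request to every slave before answering the
    client, and remembers request ids so that a client retrying after a
    master crash does not get the same request applied twice.
    """

    def __init__(self, slaves):
        self.slaves = slaves   # connections to the current slave nodes
        self.log = []          # the replicated logging information
        self.seen_ids = set()  # request ids already applied

    def handle_request(self, request_id, record):
        # Deduplicate retries: the previous master may have replicated this
        # record to us (then a slave) before crashing without answering.
        if request_id not in self.seen_ids:
            self.seen_ids.add(request_id)
            self.log.append(record)
            # Forward to every slave *before* replying, so that any slave
            # promoted to master already holds the same information.
            for slave in self.slaves:
                slave.replicate(request_id, record)
        return "OK"  # only now may the client consider the record stored
```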
6 PROOF OF CONCEPT
We extended the previously mentioned DFAF with
our proposed logging mechanism, in order to
guarantee the atomic and consistent properties of
transactions. This way, even if the DFAF server crashes during multiple concurrent transactions, those transactions will all be rolled back and the underlying database will be in a consistent state when the recovery process has finished. The reverser
and verifier system in DFAF depends on the
underlying DBMS schema and query language.
Different schemas can imply different cascading
actions, if, for example, different triggers are defined
in each schema. However, NoSQL DBMS do not usually support cascading actions such as triggers,
and they do not fall under the expected use cases of
DFAF. Different query languages also imply
different reversers and verifiers, since an insert in
SQL has a very different syntax from a NoSQL
DBMS’s custom query language. However, the
reverser and verifier creation mechanism is trivial
for most SQL and SQL-like languages. Verifiers are select statements related to the values being inserted, deleted or updated. Reversers are delete statements for insert statements, insert statements for delete statements, and update statements for update statements.
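As an illustration, a much-simplified generator for single-row inserts and deletes could be sketched as follows (in Python; DFAF's actual generator must also handle updates and each DBMS's dialect):

```python
def reverser_and_verifier(kind, table, column, value):
    """Derive a reverser and a verifier for a single-row SQL action.

    Illustrative only: a real generator must also handle updates (which need
    the previous value) and restrict the delete to the affected row.
    """
    if kind == "INSERT":
        reverser = f"DELETE FROM {table} WHERE {column} = '{value}'"
    elif kind == "DELETE":
        reverser = f"INSERT INTO {table} ({column}) VALUES ('{value}')"
    else:
        raise ValueError("updates need the previous value as well")
    # The verifier counts matching rows; comparing this count with the one
    # taken before the action ran tells whether the action took effect (4.1).
    verifier = f"SELECT COUNT(*) FROM {table} WHERE {column} = '{value}'"
    return reverser, verifier
```

Having multiple transactions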
occurring at the same time implies having either
multiple log files or a single log file with
information from all transactions. This could lead to
problems during the recovery process, if the order of
actions in separate transactions was not being
logged. However, the fact that transactions
guarantee the isolation property means that each of
their actions will not affect other transactions.
Therefore, the order in which each transaction is
rolled-back is irrelevant, as long as the statements in
each transaction are executed backwards. To prove
our concept, we tested the local logging mechanism
using DFAF with a single client connecting to the
database. The client starts a transaction, inserts a
value and updates that value, finishing the
transaction. During this process, the logging
information is stored in a local file. We crashed the transaction at several stages (shown in Table 1) and verified that the recovery process could correctly interpret the valid log file and set the database in a correct state, the one previous to the transaction. In
order to interrupt the process at particular stages, exceptions were purposely induced in the code and thrown at the appropriate moments. The
recovery process was then started and tested as to
whether it could successfully recover and interpret
logged information and, if needed, roll back the
database to a previous state. Results showed that the
system was always able to recover from a failed
transaction and returned the database to a safe state.
To prove our concept with the remote mechanism,
we deployed a network with a client connected to a
DBMS and to a CN, as shown in Figure 1.
Figure 1: The deployed network for tests with the remote
mechanism and a single client.
We used the same transaction used to test the
local mechanism. In our first test, we checked
whether the CN could detect and roll back failed
transactions. We crashed the client after the first
insertion and the CN immediately detected the crash
and rolled back the transaction. In our second test, we checked whether a correct rollback ensued after crashes at different stages of the transaction. We
crashed the client at several stages of the transaction
(before logging the action, after logging but before
performing the action, after performing but before
logging that it has been performed and after logging
that the action had been done) and monitored the
roll-back procedure to guarantee the database was in
the correct state after the recovery process had
finished. Finally, we checked whether several
concurrent transactions occurring in a DFAF server
could all be rolled-back without concurrency issues.
We used a DFAF server to handle several clients
while connected to a CN, as can be seen in Figure 2,
and crashed the server during the clients’
transactions. The CN detected the crash and rolled-
back all transactions, leaving the database once more
in a consistent state.
Figure 2: The deployed network for tests with the remote
mechanism and multiple clients.
To demonstrate the soundness of our approach in
a practical environment, we examined the
performance of our logging mechanism’s
implementation and of our CN using a 64-bit Linux
Mint 17.1 with an Intel i5-4210U @ 1.70GHz, 8GB
of RAM and a Solid State Drive. For tests involving
a CN, a second machine was used, running 64-bit
Windows 7 with an Intel i7 Q720 @ 1.60GHz, 8GB
of RAM and a Hard Disk Drive. A 100Mbit cable
network was used as an underlying communication
system between both nodes. Figure 3 shows how the local (green) and remote (red) logging mechanisms perform, using as a basis for comparison a transaction with up to 1000 statements on a SQLite table. This number
of statements was based on previous DFAF
evaluations. Tests were repeated several times to get
an average of the values, the 95% confidence
interval was calculated, and the base time for
operations was removed to allow for a more intuitive
graph analysis. The CN used for the remote mechanism was a single local node, which removed most of the network interference from the tests.
Figure 3: Performance (in milliseconds) of the different
logging mechanisms.
As expected, the most performant mechanism is
the remote mechanism, where a sub-second
performance decay is noticed (around 321±209
milliseconds for 1000 operations). The baseline time
for 1000 operations was 10295±1142 milliseconds, which means the remote mechanism has a performance decay of approximately 3.1%. The local mechanism
is the least performant, due to the high amount of
disk operations, with around 2047±237 milliseconds
for 1000 operations, a 19.8% performance decay.
The performance difference of an order of
magnitude between both mechanisms is due to the
fact that, as the logging file gets bigger, it takes
longer to read, copy and write it. This means that,
with a transaction of 1000 insertions, for example,
the 1000th insertion will take a lot longer than the
1st insertion, while the remote mechanism takes the
same amount of time for any insertion.
We tested Cluster Networks to find how long it takes to elect a master and make the information consistent among the nodes. These values correlate directly with the time-outs defined for each state of the network, as defined by Gusella and Zatti’s algorithm.
We created two-node networks (1 master, 1 slave)
and measured the times taken for each node to
become a master/slave (with a confidence interval of
95%) and to guarantee the consistency of
information among them. Tests with more nodes were not feasible, due to hardware constraints. Tests
show an average of 5±1 milliseconds to get a node
from any given phase of the election algorithm to the
next, excluding the defined time-outs. The time
taken to exchange all the information from a master
to a slave depends on the current information state,
but in our tests, any new slave took approximately
8±1 milliseconds to check whether information was
consistent with the master. Transferring the log with
1000 records from the first test took approximately
20±4 milliseconds.
7 CONCLUSIONS
We have previously proposed DFAF, a CLI-based
framework that implements common relational
features on any underlying DBMS. These features
include ACID transactions, local memory structure
operations and database-stored functions, like Stored
Procedures. However, the proposal lacked a fault
tolerance mechanism to ensure the atomic property
of transactions in case of failure. We now propose a
fault tolerance model, general enough to work with
several underlying deterministic systems, but
adapted to DFAF.
Our model is a logging mechanism which records each performed action, its verifier (which checks whether it has been executed or not) and its reverser (which undoes it, in case of failure). We describe
two ways of storing the information: either locally in
the file system, or remotely in a dedicated server.
Because operating systems do not usually provide atomic file operations, we also describe how to update the logging information without risking its corruption. In order to
guarantee that the remote server is also fault tolerant
and the information is not lost in case of failure, we
describe a master-slave network that can be used to
replicate the information. Clients contact the master,
which replicates the information to slaves without
consistency issues. Our performance results show
that the use of our logging mechanism can be
suitable for a real-life scenario. There is an expected
performance degradation, but a fault tolerant system
provides several advantages over a slightly more performant fault-intolerant system. Moreover, the performance decay using the remote mechanism is nearly negligible.
In the future, we intend to improve both the local
and remote mechanisms. Regarding the file system,
we intend to develop a highly performant algorithm that does not rely on copying the previous log on each update. Regarding the remote mechanism, we
intend to adapt the CN for other requirements, in
order to improve performance. This can be done by
allowing priority nodes and removing the symmetry
factor. This way, servers can preferentially become
masters, if they have better hardware or conditions.
The CN can also be improved by changing the
underlying communication protocol, which at the
moment is assumed to be unreliable. We also intend
to develop a master look-up mechanism, like DNS
registration. At the moment, there is no such
mechanism, and clients resort to finding masters
manually.
In conclusion, we extended DFAF with a log-
based fault-tolerance model, this way guaranteeing
ACID properties on the underlying DBMS
transactions. We describe two ways of storing the
information, to leverage performance and reliability,
but support other models. We also propose a master-
slave fault tolerant network which can be used as a
remote server to keep information replicated and
consistent. Both the logging model and the CN can
be used for other applications as well; we have for
example adapted the CN to act as a concurrency
handler in another module of DFAF.
ACKNOWLEDGEMENTS
This work is funded by National Funds through FCT
- Fundação para a Ciência e a Tecnologia under the
project UID/EEA/50008/2013.
REFERENCES
Borthakur, D., 2007. The hadoop distributed file system:
Architecture and design. Hadoop Project Website,
11(2007), p.21.
Castro, M. and Liskov, B., 1999. Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI ’99).
Castro, M. and Liskov, B., 2002. Practical Byzantine fault
tolerance and proactive recovery. ACM Transactions
on Computer Systems (TOCS).
Chun, B., Maniatis, P. and Shenker, S., 2008. Diverse
Replication for Single-Machine Byzantine-Fault
Tolerance. USENIX Annual Technical Conference.
Cowling, J., Myers, D. and Liskov, B., 2006. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI ’06).
Garcia-Molina, H. and Salem, K., 1987. Sagas. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
Gray, J. and others, 1981. The transaction concept: Virtues
and limitations. In VLDB. pp. 144–154.
Gray, J. and Reuter, A., 1992. Transaction Processing:
Concepts and Techniques 1st ed., San Francisco, CA,
USA: Morgan Kaufmann Publishers Inc.
Gusella, R. and Zatti, S., 1985. An election algorithm for a distributed clock synchronization program. Technical report, University of California, Berkeley.
Huang, K.-H., Abraham, J. and others, 1984. Algorithm-based fault tolerance for matrix operations. IEEE Transactions on Computers, C-33(6), pp.518–528.
Johnson, D.B., 1989. Distributed System Fault Tolerance Using Message Logging and Checkpointing. Ph.D. thesis, Rice University.
Kotla, R. and Dahlin, M., 2004. High throughput Byzantine fault tolerance. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2004).
Merideth, M. and Iyengar, A., 2005. Thema: Byzantine-fault-tolerant middleware for web-service applications. In Proceedings of the 24th IEEE Symposium on Reliable Distributed Systems (SRDS 2005).
Mohan, C. et al., 1992. ARIES: a transaction recovery
method supporting fine-granularity locking and partial
rollbacks using write-ahead logging. ACM
Transactions on Database Systems (TODS), 17(1),
pp.94–162.
Nakamoto, S., 2008. Bitcoin: A peer-to-peer electronic cash system. Available at: http://www.cryptovest.co.uk/resources/Bitcoin paper Original.pdf [Accessed February 15, 2016].
Oki, B.M. and Liskov, B.H., 1988. Viewstamped
replication: A new primary copy method to support
highly-available distributed systems. In Proceedings
of the seventh annual ACM Symposium on Principles
of distributed computing. pp. 8–17.
Pereira, Ó.M., Simões, D.A. and Aguiar, R.L., 2015.
Endowing NoSQL DBMS with SQL Features
Through Standard Call Level Interfaces. In SEKE
2015 - Intl. Conf. on Software Engineering and
Knowledge Engineering. pp. 201–207.
Rabin, M.O., 1989. Efficient dispersal of information for
security, load balancing, and fault tolerance. Journal
of the ACM (JACM), 36(2), pp.335–348.
Randell, B., Lee, P. and Treleaven, P.C., 1978. Reliability
Issues in Computing System Design. ACM Computing
Surveys, 10(2), pp.123–165.
Shih, K.-Y. and Srinivasan, U., 2003. Method and system
for data replication.
Sumathi, S. and Esakkirajan, S., 2007. Fundamentals of
relational database management systems, Springer.
Wolfson, O., Jajodia, S. and Huang, Y., 1997. An adaptive
data replication algorithm. ACM Transactions on
Database Systems (TODS), 22(2), pp.255–314.
Ylönen, T., 1992. Concurrent Shadow Paging: A New
Direction for Database Research.