leges the node itself grants to different tools and users.
An important feature of the data layout is the capability to protect its integrity. In this context, we only need to detect corruption in the system, since data correction can be performed with the help of the lower layer, and in particular of the data replication mechanism. Corruption may occur either because of bugs in the implementation of the API at the layer above, or because of malfunctioning hardware devices. The latter problem can be addressed by adding simple error detection codes, such as CRC checksums or SHA digests. As for the former, automatic tools will check the status of the system after risky operations to ensure its integrity. In either case, once corruption has been detected, another copy will be used to keep the cloud service active, and an appropriate notification will be issued.
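As a concrete illustration of this check, a digest can be stored next to each object when it is written and re-verified after risky operations. The Python sketch below is only a minimal example under assumed conventions (a side-car .sha256 file per object); it does not describe the actual on-disk format of our layout.

    import hashlib
    from pathlib import Path

    def write_object(path: Path, data: bytes) -> None:
        # Store the object together with a side-car digest (assumed
        # convention, for illustration only) for later verification.
        path.write_bytes(data)
        digest = hashlib.sha256(data).hexdigest()
        Path(str(path) + ".sha256").write_text(digest)

    def verify_object(path: Path) -> bool:
        # Recompute the digest and compare it with the stored one; a
        # mismatch signals corruption, after which the system would
        # switch to another replica and issue a notification.
        data = path.read_bytes()
        expected = Path(str(path) + ".sha256").read_text().strip()
        return hashlib.sha256(data).hexdigest() == expected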
Privileges. We define each individual user as a developer. Each developer can create his own projects and, at the same time, participate in collaborative projects. He can also link other projects to use them as external libraries or plugins. Each collaborative project, or simply project, may involve a large number of developers, each with his own set of privileges. Some developers may be actively developing the software and are therefore granted full access. Others may simply be reviewing the code or managing releases; in this case they are granted read-only access.
When a project is made available for others to use as a library, a new set of privileges can be expressed. For example, the inherited code may be read-only, writable but without repackaging capability, or fully accessible. Moreover, some developers may also be allowed to delete entire portions of the stored data, or to change the access privileges of other developers or of the project as a whole. Finally, automatic tools, though not developers themselves, have their own specific access privileges. For example, they may be allowed to harvest anonymous information about the engineering process for statistical analysis, or to propose new revision histories that better present the project to reviewers.
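This privilege model lends itself to a small access-control table. The sketch below is a hedged illustration in Python; the role names and the permission bits (e.g. REPACKAGE, ADMIN) are our own shorthand and not identifiers defined by the system.

    from enum import Flag, auto

    class Permission(Flag):
        READ      = auto()
        WRITE     = auto()
        REPACKAGE = auto()   # re-export the project as a library or plugin
        DELETE    = auto()   # remove portions of the stored data
        ADMIN     = auto()   # change privileges of developers or of the project

    # Illustrative role-to-permission mapping; names are assumptions.
    ROLES = {
        "owner":     Permission.READ | Permission.WRITE | Permission.REPACKAGE
                     | Permission.DELETE | Permission.ADMIN,
        "developer": Permission.READ | Permission.WRITE,
        "reviewer":  Permission.READ,     # read-only: code review, releases
        "tool":      Permission.READ,     # e.g. anonymous statistics harvesting
    }

    def allowed(role: str, needed: Permission) -> bool:
        # A request is granted only if every needed bit is present.
        granted = ROLES.get(role, Permission(0))
        return (granted & needed) == needed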
4.3 Physical Storage
The underlying physical storage design is critical to the overall performance, especially when supporting a large-scale VCS in the cloud, where failures are common.
Our design considers the various aspects of a dis-
tributed storage architecture: scalability, high avail-
ability, consistency, and security.
Scalability. A single large centralized storage serving all the repositories would lead to poor performance and would not scale. Therefore, our design adopts a distributed storage architecture. User repositories will be partitioned and distributed across different storage nodes. As the number of users grows, new partitions can easily be added on demand, allowing the system to scale out quickly. Naturally, the principles driving the partitioning of repositories need careful design, so that when new storage nodes are added, the data movement necessary to restore a correct partitioning layout is minimized.
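Although our design does not prescribe a specific partitioning function, consistent hashing is one common technique with exactly this property: adding a storage node only relocates the repositories that fall into its arc of the ring. The Python sketch below, including its use of virtual nodes, is an illustration under that assumption rather than our actual placement algorithm.

    import bisect
    import hashlib

    class ConsistentHashRing:
        # Maps repository identifiers to storage nodes; adding a node
        # only moves the keys between the new node and its predecessor.

        def __init__(self, nodes, vnodes=64):
            self._ring = []                      # sorted list of (hash, node)
            for node in nodes:
                self.add_node(node, vnodes)

        @staticmethod
        def _hash(key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add_node(self, node: str, vnodes: int = 64) -> None:
            # Virtual nodes spread each physical node over the ring,
            # smoothing the load when nodes are added or removed.
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
            self._ring.sort()

        def node_for(self, repo_id: str) -> str:
            # Walk clockwise to the first virtual node at or after the key.
            idx = bisect.bisect(self._ring, (self._hash(repo_id),))
            return self._ring[idx % len(self._ring)][1]

With three nodes in such a ring, adding a fourth relocates roughly a quarter of the repositories, instead of reshuffling almost all of them as a modulo-based scheme would.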
Replication. Users expect continuous access to the
cloud VCS. However, in cloud environments built
with commodity hardware, failures occur frequently,
leading both to disk storage unavailability and to network partitions in which certain geographical regions become temporarily inaccessible. To ensure fault tolerance and high availability, user repositories are replicated across different storage nodes and across data centers. Therefore, even when failures leave some storage nodes unreachable, other copies can still be accessed and used to provide continuous service. The replication scheme eliminates the single point of failure present in other centralized, or even distributed, architectures: in our scheme, there are no master and slave storage nodes.
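A possible placement policy is sketched below; the choice of three replicas and the preference for distinct data centers are illustrative assumptions, not fixed parameters of our design.

    import hashlib
    from collections import defaultdict

    def place_replicas(repo_id, nodes, replicas=3):
        # `nodes` is a list of (node_id, data_center) pairs; every chosen
        # replica is a peer, with no master among them.
        by_dc = defaultdict(list)
        for node_id, dc in nodes:
            by_dc[dc].append(node_id)

        key = int(hashlib.md5(repo_id.encode()).hexdigest(), 16)
        chosen = []
        # Prefer one replica per data center, so a whole-region outage
        # still leaves reachable copies elsewhere.
        for dc in sorted(by_dc):
            members = sorted(by_dc[dc])
            chosen.append(members[key % len(members)])
            if len(chosen) == replicas:
                return chosen
        # Fewer data centers than replicas: fill up with remaining nodes.
        for node_id, _ in sorted(nodes):
            if node_id not in chosen and len(chosen) < replicas:
                chosen.append(node_id)
        return chosen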
Consistency. Software development involves fre-
quent changes and updates to the repositories. In a tra-
ditional scenario, changes are immediately reflected
globally (e.g., a git push command atomically updates
the remote repository). Consistency becomes more
complicated when the repository is distributed as mul-
tiple copies on different storage nodes, and multiple
users have access to the same repository and perform
operations simultaneously. If strong consistency is enforced, the system's availability suffers (Brewer, 2010). Many distributed cloud storage systems therefore adopt eventual consistency to enable high availability, meaning that if no new updates are made to an object, all accesses will eventually return its last updated copy. In our scheme, we guarantee "read-your-writes" consistency for any particular user: old copies of an object will never be seen by a user after he has updated that object. We also guarantee "monotonic write consistency": for a particular process all writes are serialized, so that later writes supersede previous ones.
Finally, we guarantee “causal consistency”, meaning
1) that if an object X written by process A is read by
process B, which later writes object Y, then any other
process C that reads Y must also read X; and 2) that
once process A has informed process B about the up-
date of object Z, B cannot read the old copy of Z any-
more.
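Read-your-writes and monotonic-write behavior can be obtained by having each client session track the versions it has already observed, as in the sketch below. The store interface (a write/read pair returning replica-assigned versions) is hypothetical, and full causal consistency additionally requires propagating dependency information between processes, which is beyond this sketch.

    class Session:
        # Client-side bookkeeping for read-your-writes and monotonic
        # writes: remember the highest version written or read per object
        # and reject older copies coming from lagging replicas.

        def __init__(self, store):
            self.store = store   # hypothetical replica-backed key-value store
            self.seen = {}       # object id -> minimum acceptable version

        def write(self, key, value):
            version = self.store.write(key, value)  # replica assigns a version
            self.seen[key] = max(self.seen.get(key, 0), version)
            return version

        def read(self, key):
            value, version = self.store.read(key)
            if version < self.seen.get(key, 0):
                # The replica lags behind this session's own writes or
                # earlier reads; the caller retries on a fresher replica.
                raise RuntimeError("stale replica: retry on another node")
            self.seen[key] = version
            return value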