Provenance and Formal Methods: The Case of Digital Image Processing
Carlos Sáenz-Adán
Departamento de Matemáticas y Computación, Universidad de La Rioja, Logroño, Spain
1 STAGE OF THE RESEARCH
There is a well-known problem of balancing efficiency and reliability when researchers attempt to combine scientific computation with the formal verification of algorithms. Verified programs usually cannot compete in performance with applications in production. Both aspects (efficiency and reliability) are particularly important in bioinformatics applications (for instance, in the context of biomedical image processing).
To address this problem, this research aims to set up an environment in which scientific computation with digital images and the formal verification of the corresponding algorithms are combined using provenance techniques and standards. Such an environment should facilitate the reproducibility of the processes and also explain why each processing step has been performed.
2 STATE OF THE ART
In the project “Formalisation of Mathematics”, included in the 7th European Framework Programme (ForMath), formal proof libraries were developed (using proof assistants such as Coq, Isabelle and ACL2) in several mathematical fields. One of the main aims of this project was the formalization of scientific computing algorithms in order to increase trust in Symbolic Computation and Computer Algebra systems. In particular, the Spanish node of the project (led by the University of La Rioja team) made relevant contributions to the homological processing of digital images (Lambán et al., 2014; Poza et al., 2014), including a PhD thesis (Poza, 2013).
An important characteristic of the project is that a real collaboration with a team of biologists, who study the synthesis of drugs for alleviating neurodegenerative diseases (such as Alzheimer's), provides us with real examples for processing digital images. The company SpineUp (SpineUp), led by the researcher Miguel Morales, has posed problems on biomedical images that have been solved by the Computer Science group at the University of La Rioja. These contributions have been presented at the Spanish Neuroscience conference (Mata et al., 2013; Mata et al., 2011). Some of the algorithms implemented in these developments were formally verified within the ForMath project.
This research line, which joins (scientific) computation and deduction (verification of programs and algorithms), has recently been funded by the Spanish Government (project MTM2014-54151-P).
In order to understand our interest in these issues, we must explain that the Computer Science group at the University of La Rioja pursues several lines of research. In this team, researchers from the fields of scientific computing and formal methods in Software Engineering coexist with others who come from research in Information Systems (more specifically, from the area of data and knowledge metamodeling) and with others coming from the process and workflow management fields.
Modeling can be related to provenance. This relationship comes from the concept of Occurrence-Oriented system, which has been fruitfully exploited in collaboration with the Noesis research group at the University of Zaragoza, led by Eladio Domínguez (Domínguez et al., 2014). Regarding process management, previous contributions are related to service-oriented architectures (SOA), specifically devoted to ensuring an agreed security policy covering the whole chain, from the establishment of a service to its consumption (Rodriguez-Priego and García-Izquierdo, 2007).
All these interests converge on the provenance field. This is a large and fast-growing research topic that is not yet mature. Several ways of organizing provenance have been considered. First, the provenance of information can be considered (data-oriented workflows). It has been studied in the case of formal theorem proving (Ikeda et al., 2013) (which is related to the ForMath project), in the case of program execution (Cheney et al., 2011) (also using formal methods (Acar et al., 2013)), and finally in the case of databases (Cheney et al., 2009). Data provenance and workflow
provenance may also be distinguished (Buneman and Davidson, 2010). As discussed below, in our case both data provenance (digital images) and the processes involved in dealing with the data (image processing) are relevant.
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)
3 RESEARCH PROBLEM
The research line applying formal methods to image processing has been very successful but has some drawbacks (well known in the international community working on these topics). The main weakness concerns how to combine efficiency (required in real applications, in particular in bioinformatics) with reliability (increased by verifying programs using proof assistants). Reliability is an important property in scientific computing (in particular, in the area of biomedicine), but it conflicts with the pursuit of efficiency. It is well known that verified programs cannot compete in speed with the applications used in production at scientific laboratories (see, for example, Poza et al., 2014).
In this research we want to reduce the distance between deduction and final applications. To this end, we propose to use information system techniques, in particular modern techniques from the provenance field.
4 EXPECTED OUTCOME
Our proposal, which to the best of our knowledge is new in the literature, consists of including in a single provenance network both causal chains (in other words, those that produce the various transformations of the digital images) and chains of arguments (those that explain why a certain process has been implemented). This would not only facilitate the reproducibility of experiments (one of the explicit aims of provenance), but also make it possible to recover the reasons why the different workflow steps were designed, making it easier for an external agent to understand the process as a whole.
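As an illustration only, a single provenance network holding both kinds of chain could be sketched in Python as follows. The names used here (ProvenanceNet, add_causal, add_argument, the "causal" and "argument" labels, and the image and process names) are hypothetical; they are assumptions made for this sketch and do not come from any provenance standard or from an existing prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str  # node the edge starts from
    target: str  # node the edge points back to
    kind: str    # "causal" or "argument" (hypothetical labels)


@dataclass
class ProvenanceNet:
    edges: list = field(default_factory=list)

    def add_causal(self, output, process, inputs):
        # Causal chain: the output was produced by a process that used the inputs.
        self.edges.append(Edge(output, process, "causal"))
        for item in inputs:
            self.edges.append(Edge(process, item, "causal"))

    def add_argument(self, process, justification):
        # Chain of arguments: why the process was designed this way.
        self.edges.append(Edge(process, justification, "argument"))

    def neighbours(self, node, kind):
        # Targets reachable from a node in one step, restricted to one edge kind.
        return [e.target for e in self.edges
                if e.source == node and e.kind == kind]


# Hypothetical two-step workflow on a digital image.
net = ProvenanceNet()
net.add_causal("binary.png", "threshold", ["raw.png"])
net.add_causal("reduced.png", "reduce", ["binary.png"])
net.add_argument("reduce", "technical report explaining the reduction step")

how = net.neighbours("reduced.png", "causal")   # which process produced the image
why = net.neighbours("reduce", "argument")      # why that process is justified
```

Keeping both edge kinds in the same structure means that a single traversal can answer "how was this image obtained?" and "why was this step chosen?" without consulting a second store.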
In the most basic case, a conceptual provenance relationship could be a reference to an external document (for example, a technical report, a well-known algorithm or a journal paper) that explains why the associated functional process is consistent (with respect to the workflow objectives). In a more formalized context, the reference may contain a formal proof in a theorem prover (Coq, Isabelle, ACL2, etc.) showing that the algorithm is correct. Finally, in addition to a formal proof, a verified program could be available as an “explanatory artifact”. This certified program (whose correctness is ensured by means of a theorem prover) could be applied to obtain results that can be compared against the outputs obtained in the computing part of the provenance net.
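The three levels of explanation described above (external document, formal proof, verified program) could be modelled, as a rough sketch, with one small type per level. All class, field and function names below are hypothetical, and the lambda is only a stand-in for a program extracted from a proof assistant; nothing in this example has actually been verified.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class DocumentReference:
    citation: str  # e.g. a technical report or a journal paper


@dataclass(frozen=True)
class FormalProof:
    prover: str    # e.g. "Coq", "Isabelle" or "ACL2"
    theorem: str   # name of the correctness statement (hypothetical)


@dataclass(frozen=True)
class VerifiedProgram:
    proof: FormalProof
    run: Callable[[list], int]  # executable artifact backed by the proof


Justification = Union[DocumentReference, FormalProof, VerifiedProgram]


def level(justification: Justification) -> str:
    # Classify a justification by how formal it is.
    if isinstance(justification, VerifiedProgram):
        return "verified program"
    if isinstance(justification, FormalProof):
        return "formal proof"
    return "external document"


def compare_with_recorded(justification, inputs, recorded_output):
    # Only a verified program can be executed and compared with the output
    # recorded in the computing part of the net; otherwise return None.
    if not isinstance(justification, VerifiedProgram):
        return None
    return justification.run(inputs) == recorded_output


# Stand-in "verified" program: counts foreground pixels in a flat pixel list.
proof = FormalProof(prover="Coq", theorem="count_foreground_correct")
program = VerifiedProgram(proof=proof,
                          run=lambda pixels: sum(1 for p in pixels if p))
agrees = compare_with_recorded(program, [0, 1, 1, 0, 1], 3)
```

Only the third level supports an automatic comparison; the first two remain explanations that an agent has to read or re-check in the prover.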
In a paradigmatic case, an agent could consult the provenance network to find out which programs have been applied to a certain image (including querying the previous images used as parameters, if any), could request the re-calculation (with the selected software platform) and, in addition (and this is the most innovative aspect of our proposal), could run an automated test against a verified program, which could show that the production program is appropriate in that particular case. Since the verified program can be (and, in general, will be) less efficient than the programs in production, one can consider performing the testing off-line, so as not to harm the agility of the reproduction of the functional part of the experiment.
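The off-line testing idea can be sketched as follows, under strong simplifying assumptions. The task (counting 4-connected foreground components of a binary image) was chosen only for illustration. Both functions are ordinary Python: the flood-fill version merely plays the role of the slower verified program and has not been formally verified, while the union-find version plays the role of the production program. All function names and the record layout are hypothetical.

```python
def count_components_reference(image):
    # Stand-in for the verified program: plain flood fill, 4-connectivity.
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and (r, c) not in seen:
                count += 1
                seen.add((r, c))
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


def count_components_production(image):
    # Stand-in for the production program: single pass with union-find.
    rows = len(image)
    cols = len(image[0]) if image else 0
    parent = {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            parent[(r, c)] = (r, c)
            for q in ((r - 1, c), (r, c - 1)):  # already visited neighbours
                if q in parent:
                    parent[find(q)] = find((r, c))
    return len({find(p) for p in parent})


def check_offline(records):
    # records: iterable of (image_id, image, recorded_output) taken from the
    # provenance net. Returns the ids whose recorded output disagrees with
    # the reference program on that particular input.
    return [image_id for image_id, image, recorded in records
            if count_components_reference(image) != recorded]


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
recorded = count_components_production(image)             # fast, on-line step
mismatches = check_offline([("img-001", image, recorded)])  # slow, off-line step
```

An empty list of mismatches supports only the recorded cases. It says nothing about inputs that were never processed, which is exactly the weaker guarantee that the proposal accepts.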
It should be noted that our proposal avoids the problem (intractable with the current state of the technology) of formally verifying the scientific computing algorithms used in production, since we do not seek to ensure that the operational program is equivalent, for all inputs, to the verified program. Nevertheless, it allows a flexible and uncoupled integration of formal methods that would increase trust in the experiment as a whole.
5 OUTLINE OF OBJECTIVES
Based on the problems identified above, we have de-
fined the following four objectives:
Objective A. Define a setting where deduction and processing provenance coexist.
Objective B. Devise a representation language which integrates (scientific) computing and deduction (verified algorithms and programs).
Objective C. Set up a query language over the previous representation.
Objective D. Develop a proof of concept. We will use the particular case of homological processing of images, with a prototype that will provide a provenance net. This net will include both existing formal proofs and ex-novo proofs.
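To give a flavour of the kind of question the query language of the third objective should answer, the following sketch walks a toy provenance record backwards from a result and lists the programs that contributed to it. The dictionary layout, the function name and the file and program names are hypothetical; an actual query language would be defined over the representation language of the second objective, not over Python dictionaries.

```python
# Hypothetical provenance records: artifact -> (program, inputs used).
GENERATED_BY = {
    "components.txt": ("count_components", ["reduced.png"]),
    "reduced.png": ("reduce", ["binary.png"]),
    "binary.png": ("threshold", ["raw.png"]),
}


def programs_applied(artifact, generated_by):
    # Breadth-first walk from the artifact towards its sources, collecting
    # the program of every step, most recent step first.
    result = []
    pending = [artifact]
    visited = set()
    while pending:
        current = pending.pop(0)
        if current in visited or current not in generated_by:
            continue  # already handled, or an original input with no producer
        visited.add(current)
        program, inputs = generated_by[current]
        result.append(program)
        pending.extend(inputs)
    return result


history = programs_applied("components.txt", GENERATED_BY)
```

The same traversal could be extended to return, for each program, the explanation attached to it, so that one query covers both the functional and the deductive side of the net.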
6 METHODOLOGY
The proposed project is multidisciplinary; hence, from a methodological point of view, we will use techniques and methods from different scientific fields. It will be necessary to use the most appropriate one at each stage of development.
ICSOFT2015-DoctoralConsortium
For instance, basic literature search techniques can be combined with more advanced techniques such as systematic review (Kitchenham et al., 2002). When tasks related to mathematics are addressed, we have to use more formal methods, which may be combined with methods from mechanized theorem proving. Since the project will likely require software development and systems integration, it will also be necessary to apply methods from the design of information systems and software engineering, such as requirements analysis and conceptual modeling.
6.1 Stages
Each stage corresponds to one objective and has been split into several sub-stages.
E1. Define a contextual environment with the aim
of integrating deduction and computing prove-
nance.
E1.1. Study previous works in the research group,
related both to formal verification of algorithms
and to information systems.
E1.2. Systematic review of the literature related
to provenance.
E1.3. Study of the expressiveness of different pro-
posals in the literature, trying to adapt some of
them (or a mixture of several ones) to achieve our
objectives.
E1.4. Set up a semi-formal definition of a prove-
nance model which allows integrating the work-
flow of a process from a functional perspective,
together with explanations describing why the
process has been produced in that way.
E2. Set up a formal definition of a language which
represents models corresponding to the previous
stage.
E2.1. Study of different languages inside the lit-
erature to represent provenance networks (at least
PLM (Del Rio et al., 2010), OPM (Moreau et al.,
2011) and W3C Prov (Missier et al., 2013)).
E2.2. Propose a representation language with the
aim of dealing with the second objective.
E3. Define a query and definition language for
networks constructed with the previous represen-
tation language.
E3.1. Analyze the available tools (in particular those developed by the group, such as RCM (Rodriguez-Priego et al., 2013)) for managing data and processes.
E3.2. Formal definition of a query language for
provenance networks.
E4. Development of a prototype which will use
the proposals and definitions mentioned above.
E4.1. Deployment, in a particular network, of
some of the processes already developed for the
manipulation of biomedical images, including
formal proofs with Isabelle / HOL, Coq or ACL2.
E4.2. Development of new features with formal
proofs.
E4.3. Justify that the prototype can also integrate new sources of data and arguments developed in the previous stage.
6.1.1 Schedule
Based on the objectives, a doctoral plan has been drawn up. It has been divided into four stages, each corresponding to one year.
Throughout the PhD plan there are, in addition, coordination and supervision meetings with the thesis advisors and other members of the research group. Participation in training courses, and in conferences to present the partial results obtained, is also foreseen.
Furthermore, there will be tasks related to the generation of documentation (internal reports, journal and proceedings papers) and to the development of programs and formal proofs.
REFERENCES
Acar, U. A., Ahmed, A., Cheney, J., and Perera, R. (2013).
A core calculus for provenance. Journal of Computer
Security, 21(6):919–969.
Buneman, P. and Davidson, S. B. (2010). Data provenance:
the foundation of data quality.
Cheney, J., Ahmed, A., and Acar, U. A. (2011). Provenance
as dependency analysis. Mathematical Structures in
Computer Science, 21(06):1301–1337.
Cheney, J., Chiticariu, L., and Tan, W.-C. (2009). Prove-
nance in databases: Why, how, and where, volume 4.
Now Publishers Inc.
Del Rio, N., da Silva, P. P., and Porras, H. (2010). Browsing
proof markup language provenance: Enhancing the
experience. In Provenance and Annotation of Data
and Processes, pages 274–276. Springer.
Domínguez, E., Pérez, B., Rubio, Á. L., Zapata, M. A.,
Lavilla, J., and Allué, A. (2014). Occurrence-oriented
design strategy for developing business process monitoring
systems. Knowledge and Data Engineering, IEEE
Transactions on, 26(7):1749–1762.
ProvenanceandFormalMethods:TheCaseofDigitalImageProcessing
5
ForMath. http://wiki.portal.chalmers.se/cse/pmwiki.php/
ForMath/ForMath.
Ikeda, R., Das Sarma, A., and Widom, J. (2013). Logical
provenance in data-oriented workflows? In Data En-
gineering (ICDE), 2013 IEEE 29th International Con-
ference on, pages 877–888. IEEE.
Kitchenham, B. A., Pfleeger, S. L., Pickard, L. M., Jones,
P. W., Hoaglin, D. C., El Emam, K., and Rosenberg, J.
(2002). Preliminary guidelines for empirical research
in software engineering. Software Engineering, IEEE
Transactions on, 28(8):721–734.
Lambán, L., Rubio, J., Martín-Mateos, F.-J., and
Ruiz-Reina, J.-L. (2014). Verifying the bridge between
simplicial topology and algebra: the Eilenberg–Zilber
algorithm. Logic Journal of IGPL, 22(1):39–65.
Mata, G., Cuesto, G., Morales, M., Rubio, J., and Heras, J.
(2011). Synapcountj: un software para el estudio de
la densidad sináptica. In XIV Congreso de la Sociedad
Española de Neurociencia (SENC 2011).
http://www.senc2011.com/docs/programa senc2011.pdf.
Mata, G., Fernández, P., Romero, A., Rubio, J., Cuesto,
G., and Morales, M. (2013). Nucleusj: desarrollo
de un plugin en fiji para el análisis de modelos
de muerte neuronal. In XV Congreso de la
Sociedad Española de Neurociencia (SENC 201).
http://www.senc2013.com/.
Missier, P., Belhajjame, K., and Cheney, J. (2013). The W3C
PROV family of specifications for modelling provenance
metadata. In Proceedings of the 16th International
Conference on Extending Database Technology,
pages 773–776. ACM.
Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y.,
Groth, P., Kwasnikowska, N., Miles, S., Missier, P.,
Myers, J., et al. (2011). The open provenance model
core specification (v1.1). Future Generation Computer
Systems, 27(6):743–756.
Poza, M. (2013). Certifying homological algorithms to
study biomedical images. PhD thesis, Universidad de
La Rioja.
Poza, M., Domínguez, C., Heras, J., and Rubio, J. (2014).
A certified reduction strategy for homological image
processing. ACM Transactions on Computational
Logic (TOCL), 15(3):23.
Rodriguez-Priego, E. and García-Izquierdo, F. J. (2007).
Securing code in services oriented architecture. In
Web Engineering, pages 550–555. Springer.
Rodriguez-Priego, E., García-Izquierdo, F. J., and Rubio,
Á. L. (2013). References-enriched concept map: a
tool for collecting and comparing disparate definitions
appearing in multiple references. Journal of Information
Science, page 0165551513487848.
SpineUp. http://spineup.es.
ICSOFT2015-DoctoralConsortium
6