models, DeVisa includes a web service for scoring,
composing, comparing, and searching the stored
models. DeVisa also includes a library that builds
PMML from Weka classifier model classes using the
Java Reflection API. It is still work in progress
and, at the time of writing, supports only a few
classification models (trees, naive Bayes, rule sets)
and association rules. The system will be extended to
support most of the models covered by the PMML
specification.
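As an illustration of the reflection-based approach, the following minimal sketch reads the fields of a model object and writes them out as name/value elements. It is not the DeVisa library: the class ToyStump, the wrapper element Model, and the method describe are hypothetical, and the output is not a schema-valid PMML document. Only the Extension element with name and value attributes is borrowed from PMML.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class ReflectivePmmlSketch {

    // Hypothetical stand-in for a trained model class; a real Weka
    // classifier exposes different, library-specific fields.
    static final class ToyStump {
        private final String splitAttribute = "age";
        private final double splitPoint = 30.5;
        private final String leftClass = "young";
        private final String rightClass = "adult";
    }

    // Reads every non-static declared field of the model via reflection
    // and writes it as an Extension element (name/value pair). The
    // enclosing <Model> element is an assumption made for this sketch.
    static String describe(Object model) {
        StringBuilder xml = new StringBuilder();
        xml.append("<Model class=\"")
           .append(escape(model.getClass().getSimpleName()))
           .append("\">\n");
        for (Field field : model.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            field.setAccessible(true); // private fields are not readable otherwise
            Object value;
            try {
                value = field.get(model);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
            xml.append("  <Extension name=\"").append(escape(field.getName()))
               .append("\" value=\"").append(escape(String.valueOf(value)))
               .append("\"/>\n");
        }
        return xml.append("</Model>\n").toString();
    }

    // Minimal escaping for XML attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(describe(new ToyStump()));
    }
}
```

A real exporter would have to map each model class to the corresponding PMML model element (for example a tree or rule set model) rather than dump raw fields, which is presumably why support is added one model type at a time.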
The freshness of a model is one of the factors on
which its accuracy strongly depends. In the current
implementation, freshness depends only on the model
producer. DeVisa cannot control it, so models can
easily become outdated. However, the consumer can
specify the required freshness, and DeVisa can rank
the models accordingly. One possible solution is for
DeVisa to trigger the upload of models through the
web service interfaces of the model producers.
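Ranking by freshness could look roughly like the sketch below. It assumes that each stored model carries the time of its last upload and that the consumer expresses the freshness requirement as a maximum age; the names ModelEntry, rankByFreshness, and maxAge are hypothetical and do not come from DeVisa.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FreshnessRankingSketch {

    // Hypothetical catalogue entry: model id plus the producer's last upload time.
    static final class ModelEntry {
        final String id;
        final Instant uploadedAt;

        ModelEntry(String id, Instant uploadedAt) {
            this.id = id;
            this.uploadedAt = uploadedAt;
        }
    }

    // Keeps only the models whose age does not exceed maxAge (the consumer's
    // freshness requirement, an assumption of this sketch) and orders them
    // from most to least recently uploaded.
    static List<ModelEntry> rankByFreshness(List<ModelEntry> models,
                                            Instant now, Duration maxAge) {
        List<ModelEntry> fresh = new ArrayList<>();
        for (ModelEntry m : models) {
            Duration age = Duration.between(m.uploadedAt, now);
            if (age.compareTo(maxAge) <= 0) {
                fresh.add(m);
            }
        }
        fresh.sort(Comparator.comparing((ModelEntry m) -> m.uploadedAt).reversed());
        return fresh;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2007-06-01T00:00:00Z");
        List<ModelEntry> catalogue = Arrays.asList(
            new ModelEntry("tree-a", Instant.parse("2007-03-20T00:00:00Z")),
            new ModelEntry("bayes-b", Instant.parse("2006-01-15T00:00:00Z")),
            new ModelEntry("rules-c", Instant.parse("2007-05-25T00:00:00Z")));
        // With a 90-day limit, bayes-b is filtered out as too old.
        for (ModelEntry m : rankByFreshness(catalogue, now, Duration.ofDays(90))) {
            System.out.println(m.id + " uploaded " + m.uploadedAt);
        }
    }
}
```

The upload time is only a proxy: it says when the producer last published the model, not how recent the training data is, which is why the text notes that freshness ultimately remains under the producer's control.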
Another item on the DeVisa roadmap is a mash-up
interface for the scoring services. The system will
be able to recommend the appropriate service by
selecting the appropriate model, using a technique
similar to the one used in the look-up phase of PMQL
execution (see Section 2.3.1).
In the DeVisa system, the collection of models
together with the related ontologies forms a
knowledge base. A promising research direction is the
design of a technique for ensuring the consistency of
this knowledge base. Each upload triggers a check
against the existing knowledge base, and a conflict
might occur. There are a few ways of dealing with
conflicts: reject the whole knowledge base (the FOL
approach, which is unacceptable in a web setting),
determine a maximal consistent subset of the
knowledge base, or use a paraconsistent logic
approach. The knowledge base also allows new
knowledge to be derived from it, so that DeVisa could
in the future include rule management and inference
engines (RuleML, 2007).
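The second way of handling conflicts, keeping a maximal consistent subset, can be sketched as a single greedy pass. The sketch assumes that inconsistency can be detected pairwise, which is a strong simplification for ontology-backed knowledge; the type Fact and the toy conflict rule are hypothetical. The result is maximal with respect to inclusion (no rejected statement could be added back without a conflict), but it depends on the input order and is not necessarily the largest consistent subset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class ConsistentSubsetSketch {

    // Hypothetical knowledge-base statement: a subject mapped to one value.
    static final class Fact {
        final String subject;
        final String value;

        Fact(String subject, String value) {
            this.subject = subject;
            this.value = value;
        }

        @Override
        public String toString() {
            return subject + "=" + value;
        }
    }

    // Toy conflict rule (an assumption): same subject, different value.
    static boolean conflict(Fact a, Fact b) {
        return a.subject.equals(b.subject) && !a.value.equals(b.value);
    }

    // Greedy pass: a statement is kept only if it conflicts with nothing
    // kept so far. Earlier statements therefore take priority, so existing
    // knowledge should be listed before newly uploaded statements.
    static <T> List<T> maximalConsistentSubset(List<T> statements,
                                               BiPredicate<T, T> conflicts) {
        List<T> kept = new ArrayList<>();
        for (T candidate : statements) {
            boolean clash = false;
            for (T existing : kept) {
                if (conflicts.test(existing, candidate)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Fact> knowledge = Arrays.asList(
            new Fact("attribute-x", "numeric"),      // already stored
            new Fact("attribute-x", "categorical"),  // uploaded, conflicts
            new Fact("attribute-y", "numeric"));     // uploaded, compatible
        System.out.println(
            maximalConsistentSubset(knowledge, ConsistentSubsetSketch::conflict));
    }
}
```

Unlike rejecting the whole knowledge base, this keeps everything that is still mutually compatible, at the price of silently dropping the conflicting upload; a production system would more likely report the rejected statements back to the producer.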
REFERENCES
Axis (2007). Apache Axis. http://ws.apache.org/axis/.
Boag, S., Chamberlin, D., Fernandez, M., Florescu, D.,
Robie, J., and Simeon, J. (2007). XQuery 1.0: An XML
query language. http://www.w3.org/TR/xquery/.
Chaves, J., Curry, C., Grossman, R. L., Locke, D., and
Vejcik, S. (2006). Augustus: the design and
architecture of a PMML-based scoring engine. In DMSSP
’06: Proceedings of the 4th International Workshop on
Data Mining Standards, Services and Platforms, pages
38–46, New York, NY, USA. ACM.
Tsai, C.-Y. and Tsai, M.-H. (2005). A dynamic web
service based data mining process system. In The Fifth
International Conference on Computer and Information
Technology CIT 2005, pages 1033–1039.
DeVisa (2007). DeVisa. http://devisa.sourceforge.net.
DMG (2007). Data Mining Group. http://www.dmg.org.
eXist (2007). eXist - open source native XML database.
http://www.exist-db.org/.
Frawley, W. and Piatetsky-Shapiro, G. (1991).
Knowledge Discovery in Databases: An Overview.
Knowledge Discovery in Databases. AAAI Press/MIT
Press, Cambridge, MA.
GO (2007). The Gene Ontology.
Gorea, D. (2007). Towards storing and interchanging data
mining models. In Proceedings of the 3rd Balkan
Conference in Informatics, volume 2, pages 229–236.
Griffiths-Jones, S., Grocock, R., van Dongen, S.,
Bateman, A., and Enright, A. (2006). miRBase: microRNA
sequences, targets and gene nomenclature. Nucleic
Acids Research, 34:140–144.
Tsoumakas, G. and Vlahavas, I. (2007). An interoperable
and scalable web-based system for classifier sharing
and fusion. Expert Systems with Applications,
33(3):716–724.
Hand, D., Mannila, H., and Smyth, P. (2001). Principles of
Data Mining. The MIT Press.
Witten, I. H. and Frank, E. (2005). Data Mining:
Practical Machine Learning Tools and Techniques.
Morgan Kaufmann Series in Data Management Systems.
Elsevier, 2nd edition.
Nam, J.-W. (2005). Human microRNA prediction through a
probabilistic co-learning model of sequence and
structure. Nucleic Acids Research, 33(11):3570–3581.
PMML (2007). PMML version 3.2.
http://www.dmg.org/pmml-v3-2.html.
Ritchie, W., Legendre, M., and Gautheret, D. (2007). RNA
stem-loops: To be or not to be cleaved by RNase III.
RNA, 13:457–462.
RuleML (2007). Ruleml. http://www.ruleml.org.
Sewer, A. (2005). Identification of clustered microRNAs
using an ab initio prediction method. BMC
Bioinformatics.
Studer, R., Grimm, S., and Abecker, A., editors (2007).
Semantic Web Services: Concepts, Technologies and
Applications, chapter 2. Springer.
Kim, S.-K. (2006). miTarget: microRNA target-gene
prediction using a support vector machine. BMC
Bioinformatics, 7(1):411.
SW (2007). W3C Semantic Web Activity.
http://www.w3.org/2001/sw/.
Weka (2007). Weka 3 - data mining software in Java.
http://www.cs.waikato.ac.nz/ml/weka.
XUpdate (2003). XUpdate.
http://xmldb-org.sourceforge.net/xupdate/.
DeVisa - Concepts and Architecture of a Data Mining Models Scoring and Management Web System