• Each $V_O$ has outgoing edges to the corresponding
property value vertices $V_{P,Val}$. These edges are
labeled according to the property names.
• Each vertex $V_O$ and $V_{P,Val}$ has outgoing edges
to the corresponding vertices representing the
type hierarchy.
These principles lead to normalization of data dur-
ing the conversion and allow fast retrieval of objects
with the same property value.
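The indexing principles above can be illustrated with a minimal sketch. It is not the storage layer of the described tool: the class `ObjectGraph`, the method `objects_with`, the vertex encoding as tuples, and the sample trial records are all hypothetical and chosen only to show why a shared property value vertex makes it cheap to retrieve objects with the same property value.

```python
from collections import defaultdict


class ObjectGraph:
    """Toy graph: object vertices point to shared property value vertices."""

    def __init__(self):
        self.edges = defaultdict(dict)      # object id -> {property name: value vertex}
        self.objects_at = defaultdict(set)  # value vertex -> ids of objects pointing to it
        self.type_edges = {}                # vertex -> type vertex

    def add_object(self, obj_id, obj_type, properties):
        # Edge from the object vertex to its type vertex.
        self.type_edges[("O", obj_id)] = ("T", obj_type)
        for name, value in properties.items():
            # One vertex per distinct (property, value) pair, shared by all
            # objects that have this value (this is the normalization step).
            vertex = ("P", name, value)
            self.edges[obj_id][name] = vertex    # edge labeled by property name
            self.objects_at[vertex].add(obj_id)  # reverse index for retrieval
            self.type_edges[vertex] = ("T", type(value).__name__)

    def objects_with(self, name, value):
        # A single dictionary lookup instead of a scan over all objects.
        return set(self.objects_at.get(("P", name, value), set()))


# Hypothetical sample data, for illustration only.
graph = ObjectGraph()
graph.add_object("trial-1", "ClinicalTrial", {"outcome": "improved", "phase": 2})
graph.add_object("trial-2", "ClinicalTrial", {"outcome": "improved", "phase": 1})
graph.add_object("trial-3", "ClinicalTrial", {"outcome": "adverse event", "phase": 2})

same_outcome = graph.objects_with("outcome", "improved")
```

In this sketch, `same_outcome` holds the identifiers of the two objects that share the `outcome` value vertex, and the lookup cost does not depend on the total number of objects.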
Therefore, rule generation can be implemented
efficiently using a sequential covering technique
similar to that described in (Huysmans et al., 2008).
With the data indexed in this way, rules can be
applied very efficiently. A detailed discussion of the
developed text mining engine will be the subject of a
separate paper.
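For readers unfamiliar with sequential covering, the general idea can be sketched as follows. This is a generic textbook-style variant, not the Minerva algorithm of Huysmans et al. and not the engine described in this paper: the functions `covers`, `learn_one_rule`, and `sequential_covering`, the attribute names, and the sample records are all assumptions made for illustration. A rule is a conjunction of attribute tests; one rule is learned greedily, the examples it covers are removed, and the process repeats until no positive examples remain.

```python
def covers(rule, example):
    """A rule is a dict of attribute tests; it covers an example if all tests hold."""
    return all(example.get(attr) == val for attr, val in rule.items())


def learn_one_rule(examples, target):
    """Greedily add attribute tests until the rule covers only positive examples."""
    rule, covered = {}, list(examples)
    while any(not ex[target] for ex in covered):
        best = None
        candidates = {(a, v) for ex in covered for a, v in ex.items()
                      if a != target and a not in rule}
        for attr, val in sorted(candidates, key=repr):  # sorted for determinism
            subset = [ex for ex in covered if ex.get(attr) == val]
            positives = sum(ex[target] for ex in subset)
            score = (positives / len(subset), positives)  # precision, then coverage
            if positives and (best is None or score > best[0]):
                best = (score, attr, val, subset)
        if best is None:
            return None  # no test left that still covers a positive example
        _, attr, val, covered = best
        rule[attr] = val
    return rule


def sequential_covering(examples, target):
    """Learn rules one at a time, removing the examples each new rule covers."""
    rules, remaining = [], list(examples)
    while any(ex[target] for ex in remaining):
        rule = learn_one_rule(remaining, target)
        if rule is None:
            break
        rules.append(rule)
        remaining = [ex for ex in remaining if not covers(rule, ex)]
    return rules


# Hypothetical records, for illustration only.
data = [
    {"cell_source": "autologous", "dose": "low", "safe": True},
    {"cell_source": "autologous", "dose": "high", "safe": True},
    {"cell_source": "allogeneic", "dose": "low", "safe": True},
    {"cell_source": "allogeneic", "dose": "high", "safe": False},
]
rules = sequential_covering(data, "safe")
```

On this toy data the procedure yields two single-test rules that together cover every positive record and no negative one. Each candidate test in `learn_one_rule` amounts to retrieving the objects that share a property value, which is the operation the indexing scheme is meant to make fast.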
5 CONCLUSIONS
In this paper, we have described the problem of
preparing for clinical trials, proposed a methodology,
and partially described a corresponding tool that
facilitates estimating the safety and effectiveness of
regenerative medicine methods. No such tool has
existed or been proposed so far.
The tool is currently under development. So far we
have implemented the components for metasearch and
linguistic processing of the downloaded papers, as
well as part of the quality assessment module. The
text mining engine was evaluated on the CLEF eHealth
2014 data, where it achieved an average F1-measure of
about 0.5 to 0.6 on the most difficult characteristic,
which is rather close to the results of the winners of
this year's shared task. The work is still in progress,
so these results are preliminary. A more detailed
explanation and analysis of the text mining engine is
needed and probably deserves a separate paper.
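For reference, the F1-measure mentioned above is conventionally defined as the harmonic mean of precision $P$ and recall $R$. The formula below is the standard definition, with $TP$, $FP$, and $FN$ denoting true positives, false positives, and false negatives; the exact matching criterion used in the shared task evaluation is not specified here.

```latex
F_1 = \frac{2 P R}{P + R},
\qquad
P = \frac{TP}{TP + FP},
\qquad
R = \frac{TP}{TP + FN}
```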
Future work includes developing the rest of the
system, applying it to build a test data set, assessing
the quality of the results produced by all implemented
processing steps, and improving the methods according
to that assessment. The most important problem for the
practical application of the system is the reliability
of the estimates it produces for regenerative medicine
methods. One possible solution is to build a data set
of papers about well-known, manually assessed
treatments. However, it is unclear how to verify that
the method extracts meaningful rules. This in turn can
be addressed either by cross-validation on a large
data set (although it is doubtful that a sufficiently
large one can be collected) or by involving a group of
experts.
ACKNOWLEDGEMENTS
The project is supported by the Russian Foundation for
Basic Research, grant 13-07-12156.
REFERENCES
Cauwenberghs, G. and Poggio, T. (2001). Incremen-
tal and decremental support vector machine learning.
Advances in neural information processing systems,
pages 409–415.
Chapman, W. W., Nadkarni, P. M., Hirschman, L.,
D’Avolio, L. W., Savova, G. K., and Uzuner, O.
(2011). Overcoming barriers to NLP for clinical text:
the role of shared tasks and the need for additional
creative solutions. Journal of the American Medical
Informatics Association, 18(5):540–543.
Crump, K. S., Chen, C., and Louis, T. A. (2010). The future
use of in vitro data in risk assessment to set human ex-
posure standards: challenging problems and familiar
solutions. Environ. Health Perspect, 118:1350–1354.
De Raedt, L., Kimmig, A., and Toivonen, H. (2007).
ProbLog: A probabilistic Prolog and its application in
link discovery. In IJCAI, volume 7, pages 2462–2467.
Demner-Fushman, D., Chapman, W. W., and McDonald,
C. J. (2009). What can natural language processing do
for clinical decision support? Journal of biomedical
informatics, 42(5):760–772.
Etzioni, O., Banko, M., Soderland, S., and Weld, D. S.
(2008). Open information extraction from the web.
Communications of the ACM, 51(12):68–74.
Huysmans, J., Setiono, R., Baesens, B., and Vanthienen, J.
(2008). Minerva: Sequential covering for rule extrac-
tion. Systems, Man, and Cybernetics, Part B: Cyber-
netics, IEEE Transactions on, 38(2):299–309.
Jensen, P. B., Jensen, L. J., and Brunak, S. (2012). Mining
electronic health records: towards better research ap-
plications and clinical care. Nature Reviews Genetics,
13(6):395–405.
Kiritchenko, S., de Bruijn, B., Carini, S., Martin, J., and
Sim, I. (2010). ExaCT: automatic extraction of clinical
trial characteristics from journal publications. BMC
medical informatics and decision making, 10(1):56.
Li, D., Kipper-Schuler, K., and Savova, G. (2008). Con-
ditional random fields and support vector machines
for disorder named entity recognition in clinical texts.
In Proceedings of the workshop on current trends in
biomedical natural language processing, pages 94–
95. Association for Computational Linguistics.
Lindberg, D. A., Humphreys, B. L., and McCray, A. T.
(1993). The unified medical language system. Meth-
ods of information in medicine, 32(4):281–291.
Marchant, C. A., Briggs, K. A., and Long, A. (2008). In
silico tools for sharing data and knowledge on toxicity
and metabolism: Derek for Windows, Meteor, and Vitic.
Toxicology mechanisms and methods, 18(2-3):177–187.
Assessment of the Extent of the Necessary Clinical Testing of New Biotechnological Products Based on the Analysis of
Scientific Publications and Clinical Trials Reports