4 CONCLUSIONS
In this paper, we propose VisualMLTCGA, an easy-to-use web tool for downloading, pre-processing, visualizing, processing and analysing TCGA data. Along with TCGA data, external data can also be uploaded and analysed. Finally, decision trees can be used to extract relevant features from clinical and genomic datasets for classification purposes.
After analysing several existing TCGA processing and visualization applications, we did not find any tool that combines downloading, pre-processing, processing and visualization of clinical and genomic data as VisualMLTCGA does. In addition, VisualMLTCGA lets users build decision trees directly within the tool. For these reasons, the tool is suitable for researchers and clinicians without a bioinformatics background.
Nevertheless, the tool is currently being validated, and addressing the feedback gathered during this phase will be the first part of our future work. We also plan to support downloading other types of TCGA data, such as Copy Number Variation and DNA Methylation data. Furthermore, we expect to include additional machine learning algorithms, such as Random Forest, k-Nearest Neighbours and Support Vector Classification (SVC).
ACKNOWLEDGEMENTS
This project has received funding from the Regional
Council of Gipuzkoa through the Science,
Technology and Innovation program.
REFERENCES
Akveo/ngx-admin [TypeScript]. (2019). Retrieved from
https://github.com/akveo/ngx-admin (Original work
published 2016)
Angular. (n.d.). Retrieved 7 October 2019, from
https://angular.io/
Cibulskis, K., Lawrence, M. S., Carter, S. L., Sivachenko,
A., Jaffe, D., Sougnez, C., … Getz, G. (2013). Sensitive
detection of somatic point mutations in impure and
heterogeneous cancer samples. Nature Biotechnology,
31(3), 213–219. https://doi.org/10.1038/nbt.2514
Colaprico, A., Silva, T. C., Olsen, C., Garofano, L., Cava, C., Garolini, D., … Noushmehr, H. (2016). TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Research, 44(8), e71. https://doi.org/10.1093/nar/gkv1507
Deng, M., Brägelmann, J., Schultze, J. L., & Perner, S. (2016). Web-TCGA: An online platform for integrated analysis of molecular cancer data sets. BMC Bioinformatics, 17(1), 72. https://doi.org/10.1186/s12859-016-0917-9
EASL-EORTC Clinical Practice Guidelines: Management of Hepatocellular Carcinoma. (n.d.). Retrieved 8 October 2019, from EASL-The Home of Hepatology website: https://easl.eu/publication/easl-eortc-clinical-practice-guidelines-management-of-hepatocellular-carcinoma/
Fan, Y., Xi, L., Hughes, D. S. T., Zhang, J., Zhang, J.,
Futreal, P. A., … Wang, W. (2016). MuSE: Accounting
for tumor heterogeneity using a sample-specific error
model improves sensitivity and specificity in mutation
calling from sequencing data. Genome Biology, 17(1),
178. https://doi.org/10.1186/s13059-016-1029-6
GATK | BP Doc #11145 | Germline short variant discovery (SNPs + Indels). (n.d.). Retrieved 22 October 2019, from https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145
GATK | BP Doc #24216 | Pipeline Index. (n.d.). Retrieved 8 October 2019, from https://software.broadinstitute.org/gatk/best-practices/
GDC. (n.d.). Retrieved 17 October 2019, from
https://portal.gdc.cancer.gov/
HCC dataset. (n.d.). Retrieved 8 October 2019, from
https://kaggle.com/mrsantos/hcc-dataset
Hornik, K., Buchta, C., Hothorn, T., Karatzoglou, A., Meyer, D., & Zeileis, A. (2019). RWeka: R/Weka Interface (Version 0.4-40). Retrieved from https://CRAN.R-project.org/package=RWeka
Hothorn, T., Hornik, K., Strobl, C., & Zeileis, A. (2019). party: A Laboratory for Recursive Partytioning (Version 1.3-3). Retrieved from https://CRAN.R-project.org/package=party
Hothorn, T., & Zeileis, A. (2014). partykit: A modular toolkit for recursive partytioning in R (Working Paper No. 2014–10). Retrieved from Working Papers in Economics and Statistics website: https://www.econstor.eu/handle/10419/101073
Koboldt, D. C., Zhang, Q., Larson, D. E., Shen, D., McLellan, M. D., Lin, L., … Wilson, R. K. (2012). VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research, 22(3), 568–576. https://doi.org/10.1101/gr.129684.111
Kuhn, M., Weston, S., Culp, M., Coulter, N., & Quinlan, R. (2018). C50: C5.0 Decision Trees and Rule-Based Models (Version 0.1.2). Retrieved from https://CRAN.R-project.org/package=C50
Larson, D. E., Harris, C. C., Chen, K., Koboldt, D. C., Abbott, T. E., Dooling, D. J., … Ding, L. (2012). SomaticSniper: Identification of somatic point mutations in whole genome sequencing data. Bioinformatics, 28(3), 311–317. https://doi.org/10.1093/bioinformatics/btr665
VisualMLTCGA: An Easy-to-Use Web Tool for the Visualization, Processing and Classification of Clinical and Genomic TCGA Data