Two-stage Neural-network based Prognosis Models using Pathological Image and Transcriptomic Data: An Application in Hepatocellular Carcinoma Patient Survival Prediction

Zhucheng Zhan, Noshad Hosseni, Olivier Poirion, Maria Westerhoff, Eun-Young Choi, Travers Ching and Lana X. Garmire*

School of Science and Engineering, Chinese University of Hong Kong, Shenzhen Campus, Shenzhen, P.R. China

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, U.S.A.

Department of Cellular and Molecular Medicine, UC-San Diego, La Jolla, CA, U.S.A.

Department of Pathology, University of Michigan, Ann Arbor, MI, U.S.A.

Adaptive Biotechnologies, Seattle, Washington, U.S.A.

*corresponding author

Keywords: Prognosis, Survival, Prediction, Neural Network, Modelling, Cox Proportional Hazards, Pathology, Image,

Gene Expression, Omics, RNA-Seq, Data Integration.

Abstract: Pathological images are an easily accessible data type with potential as prognostic biomarkers. Here we extend

Cox-nnet, a neural network based prognosis method previously used for transcriptomics data, to predict

patient survival using hepatocellular carcinoma (HCC) pathological images. Cox-nnet based imaging

predictions are more robust and accurate than those of the Cox proportional hazards model. Moreover, using a novel two-stage

Cox-nnet complex model, we are able to combine histopathology image and transcriptomics RNA-Seq

data to make impressively accurate prognosis predictions, with C-index close to 0.90 and log-rank p-value

of 4e-21 in the testing dataset. This work provides a new, biologically relevant and relatively interpretable

solution to the challenge of integrating multi-modal and multiple types of data, particularly for survival

prediction.

1 INTRODUCTION

Previously, we developed a neural network model

called Cox-nnet to predict patient survival, using

transcriptomics data (T. Ching, et al., 2018). Cox-

nnet is an alternative to the conventional methods,

such as Cox proportional hazards (Cox-PH) methods

with LASSO or ridge penalization. We demonstrated

that Cox-nnet is more optimized for survival

prediction from high throughput gene expression

data, with comparable or better performance than

other conventional methods, including Cox-PH,

Random Survival Forests (H. Ishwaran and M. Lu,

2019) and CoxBoost (R. De Bin,

2016). Moreover, Cox-nnet reveals much richer

biological information, at both the pathway and gene

levels, through analysing the survival related

“surrogate features” represented in the hidden layer

nodes in Cox-nnet.

One question that remains unexplored is

whether other data types previously shown to

have prognostic value are also good input features

for Cox-nnet. One such data type

is pathological image data, e.g. H&E staining data.

These images are much more easily accessible and

cheaper to obtain, compared to RNA-Seq

transcriptomics data.

Therefore, in this study, we extend Cox-nnet to

take up pathological image features extracted by

the image processing tool CellProfiler (C. McQuin, et

al., 2018), and compare the predictive performance of

Cox-nnet relative to Cox proportional hazards, the

second best method in the original study. Moreover,

we also propose a new kind of 2-stage complex Cox-nnet

model as a proof of concept. The 2-stage Cox-nnet

model combines the hidden node features from

the 1st-stage Cox-nnet models in parallel, where

each Cox-nnet model is optimized to fit either image

or RNA-Seq based data, and then uses these combined

features as the input nodes to train a 2nd-stage Cox-nnet

model. We applied the models to the TCGA

hepatocellular carcinoma (HCC) cohort, for which we had

Zhan, Z., Hosseni, N., Poirion, O., Westerhoff, M., Choi, E., Ching, T. and Garmire, L.

Two-stage Neural-network based Prognosis Models using Pathological Image and Transcriptomic Data: An Application in Hepatocellular Carcinoma Patient Survival Prediction.

DOI: 10.5220/0009381002960301

In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 3: BIOINFORMATICS, pages 296-301

ISBN: 978-989-758-398-8; ISSN: 2184-4305

previously processed data and accumulated

experience (K. Chaudhary et al., 2018; K. Chaudhary

et al., n.d.). In summary, our work here not only

extends the previous Cox-nnet model to process

pathological imaging data, but also creatively

addresses the multi-modal data integration challenges

for patient survival prediction.

2 METHODS

2.1 Datasets

The histopathology images and their associated

clinical information are downloaded from The Cancer

Genome Atlas (TCGA). A total of 384 liver tumor

images are collected. Among them 322 samples are

clearly identified with tumor regions by pathology

inspection. Among these samples, 290 have gene

expression RNA-Seq data, and thus are selected for

pathology-gene expression integrated prognosis

prediction. The gene expression RNA-Seq dataset is

also downloaded from TCGA, and each feature is then

normalized into RPKM using the function

ProcessRNASeqData from TCGA-Assembler.

2.2 Tumor Image Processing

For each image, the tumor regions are labelled by

pathologists at University of Michigan. The tumor

regions are then extracted using Aperio software

ImageScope (C. Marinaccio and D. Ribatti, 2015). To

reduce computational complexities, each extracted

tumor region is divided into non-overlapping 1000 by

1000 pixel tiles. The density of each tile is computed

as the sum of its red, green and blue values, and the

10 tiles with the highest density are selected for

further feature extraction, similar to others (K.-H. Yu, et al.,

2016). To ensure that the quantitative features are

measured on the same scale, the red, green and

blue values are rescaled for each image. Image #128

with the standard background color (patient barcode:

TCGA-DD-A73D) is selected as the reference image

for the others to be compared with. The means of the red,

green and blue values of the reference image are

computed, and each of the remaining images is normalized

by the scaling factors of its mean red, green and

blue values relative to those of the reference image.
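The tile selection and colour rescaling described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names are hypothetical, a tile is simplified to a flat list of (r, g, b) pixel tuples, and the scaling factor is assumed to be the reference mean divided by the image mean for each channel.

```python
from statistics import mean


def tile_density(tile):
    """Sum of red, green and blue values over all pixels of a tile."""
    return sum(r + g + b for r, g, b in tile)


def select_top_tiles(tiles, k=10):
    """Return the k tiles with the highest density, highest first."""
    return sorted(tiles, key=tile_density, reverse=True)[:k]


def channel_means(tile):
    """Mean red, green and blue value of a tile."""
    return tuple(mean(px[c] for px in tile) for c in range(3))


def normalize_to_reference(tile, reference_means):
    """Rescale each channel so that its mean matches the reference image.

    Assumption: factor = reference mean / image mean, per channel.
    A channel whose mean is zero is left unchanged.
    """
    means = channel_means(tile)
    factors = [ref / m if m else 1.0 for ref, m in zip(reference_means, means)]
    return [tuple(v * f for v, f in zip(px, factors)) for px in tile]
```

In practice the same operations would be applied to image arrays rather than Python lists; the list form is used here only to keep the sketch dependency-free.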

2.3 Feature Extraction from Image Set

CellProfiler is used for feature extraction

(L. Kamentsky, et al., 2011). Images are first

preprocessed by the 'UnmixColors' module to separate the H&E

stains for further analysis. The 'IdentifyPrimaryObjects'

module is used to detect unrelated tissue folds, which are

then removed by the 'MaskImage' module to increase the

accuracy of tumour cell detection. Nuclei of

tumour cells are then identified by the

'IdentifyPrimaryObjects' module again, with parameters

set by the Otsu algorithm. The identified nuclei

objects are utilised by the 'IdentifySecondaryObjects'

module to detect the cell body objects and cytoplasm

objects which surround the nuclei. Related biological

features are computed from the detected objects, by a

series of feature extraction modules, including

'MeasureGranularity', 'MeasureObjectSizeShape',

'MeasureObjectIntensity',

'MeasureObjectIntensityDistribution',

'MeasureTexture', 'MeasureImageAreaOccupied',

'MeasureCorrelation', 'MeasureImageIntensity' and

'MeasureObjectNeighbors'. To aggregate the features

from the primary and secondary objects, the related

summary statistics (mean, median, standard deviation

and quartiles) are then calculated to summarize data

from object level to image level, yielding 2429

features in total. Each patient is represented by 10

images, and the median of each feature is selected to

represent the patient's image biological feature.
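The two aggregation steps above (object level to image level, then image level to patient level) can be illustrated with a short sketch. The function names and the dictionary keys are hypothetical, and the quartiles use the default method of Python's statistics module, which may differ from the definition used by the authors.

```python
from statistics import mean, median, stdev, quantiles


def summarize_objects(values):
    """Collapse object-level measurements of one feature into image-level statistics."""
    q1, _, q3 = quantiles(values, n=4)  # lower and upper quartiles
    return {
        "Mean": mean(values),
        "Median": median(values),
        "StDev": stdev(values),
        "Q1": q1,
        "Q3": q3,
    }


def patient_features(tile_summaries):
    """Median of each image-level feature across the tiles of one patient."""
    names = tile_summaries[0].keys()
    return {name: median(t[name] for t in tile_summaries) for name in names}
```

For one patient, summarize_objects would be called once per measured feature and tile, and patient_features would then take the median over the 10 resulting tile-level dictionaries.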

2.4 Survival Prediction Models

Cox-nnet: The Cox-nnet model is implemented in the

Python package named Cox-nnet (T. Ching, et al.,

2018). The current implementation of Cox-nnet is a fully

connected, two-layer neural network model, with a

hidden layer and an output layer for Cox regression.

The drop-out method is used to avoid overfitting. We

used the hold-out method, randomly splitting the

dataset into an 80% training set and a 20% testing set. We

used grid search and 5-fold cross-validation to

optimise the hyper-parameters for the deep learning

model on the selected training set. The model is then

trained under the optimised hyper-parameter setting

using the training set and further evaluated on the

remaining testing set. The procedure is repeated 5

times to assess the average performance. More details

about Cox-nnet are described in Ching et al. (T.

Ching, et al., 2018).
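To make the two-layer structure concrete, the sketch below shows a forward pass through one hidden layer followed by a Cox regression output, together with the negative Cox partial log-likelihood that such a model minimises. It is an illustration only and not the Cox-nnet package API: the function names are hypothetical, a tanh activation is assumed, and dropout, regularisation and training are omitted.

```python
import math


def hidden_layer(x, W, b):
    """Hidden-node activations for one patient (tanh activation assumed)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def log_hazard(x, W, b, beta):
    """Output Cox regression layer: linear combination of the hidden nodes."""
    h = hidden_layer(x, W, b)
    return sum(bk * hk for bk, hk in zip(beta, h))


def neg_partial_log_likelihood(theta, time, event):
    """Negative Cox partial log-likelihood.

    theta: predicted log hazard ratios, one per patient
    time:  follow-up times
    event: 1 if the event (death) was observed, 0 if censored
    Tied event times are not given any special treatment.
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(time, event)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at time t_i.
        risk = sum(math.exp(theta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        total += theta[i] - math.log(risk)
    return -total
```

A lower value of this loss means that patients who die earlier tend to receive higher predicted log hazard ratios.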

Cox Proportional Hazards Model: Since the

number of features produced by CellProfiler exceeds

the sample size, an elastic net Cox proportional

hazards model is built to select features and compute

the prognosis index (PI) (S. Huang, et al., 2014).

The function cv.glmnet in the glmnet R package is used

to perform cross-validation to select the tuning

parameter lambda. The parameter alpha, which

controls the trade-off between the quadratic penalty and

the linear penalty, is selected using grid search. The same

hold-out setting is employed: the model is trained

on 80% randomly selected data and evaluated on

the remaining 20% testing set. The procedure is

repeated 5 times to calculate the mean accuracy of

the model.
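For reference, the elastic net penalty in the parametrisation used by glmnet can be written as below. The symbols are introduced here for illustration: beta denotes the Cox regression coefficients and x_i the feature vector of patient i. Setting alpha = 1 gives the LASSO (linear penalty only) and alpha = 0 gives ridge (quadratic penalty only); the prognosis index is taken here to be the fitted linear predictor, which is the usual definition.

```latex
P_{\alpha}(\beta) = \lambda \left[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2
                                  + \alpha \lVert \beta \rVert_1 \right],
\qquad
\mathrm{PI}_i = x_i^{\top} \hat{\beta}
```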

2.5 Model Evaluation

Similar to the previous studies (T. Ching, et al., 2018;

K. Chaudhary, et al., 2018; K. Chaudhary, et al., n.d.),

we also use the concordance index (C-index) and the log-rank

p-value as the metrics to evaluate model

accuracy. The C-index signifies the fraction of all pairs of

individuals whose predicted survival times are

correctly ordered and is based on Harrell's C statistic.

Conventionally, a C-index around 0.70 indicates a

good model, whereas a score around 0.50 means

randomness. As both the Cox-nnet and Cox-PH models

quantify the patient's prognosis by log hazard ratios,

we use the median of the predicted log hazard ratios to stratify

patients into two risk groups (high vs. low survival

risk groups). We also compute the log-rank p-value to

test if the two Kaplan-Meier survival curves produced by

the dichotomised patients are significantly different.
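The two evaluation steps can be sketched as follows: a direct pairwise computation of Harrell's C-index and a median split of the predicted risks. The function names are hypothetical, ties in predicted risk are counted as half-concordant (a common convention, assumed here), and the log-rank test itself is left to a survival analysis library.

```python
from statistics import median


def concordance_index(risk, time, event):
    """Harrell's C: fraction of comparable pairs that are ordered correctly.

    A pair (i, j) is comparable when patient i has the shorter follow-up time
    and an observed event. The pair is concordant when patient i also has the
    higher predicted risk. Ties in risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def stratify_by_median(risk):
    """Label each patient as high or low risk relative to the median risk."""
    cutoff = median(risk)
    return ["high" if r > cutoff else "low" for r in risk]
```

The resulting high and low groups are the two sets of patients whose Kaplan-Meier curves are compared by the log-rank test.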

2.6 Feature Evaluation

The input feature importance score is calculated by

drop-out. The values of a variable are set to its mean

and the log likelihood of the model is recalculated.

The difference between the original log likelihood

and the new log likelihood is considered as feature

importance (Y. Bengio, et al., 2013). We select the 100

features with the highest feature scores from Cox-

nnet for association analysis between pathology

image and gene expression features. We regress each

selected image feature (y) on all the gene expression

features (x) using LASSO penalization, and then

use the R-squared statistic as the correlation metric.
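The importance score described above can be sketched as follows. The log_likelihood argument is a hypothetical stand-in for a fitted model: any callable that takes a feature matrix (a list of rows) and returns the model's log likelihood on it.

```python
from statistics import mean


def feature_importance(log_likelihood, X):
    """Drop in log likelihood when each feature is replaced by its mean.

    log_likelihood: callable mapping a feature matrix to a log likelihood
                    (hypothetical stand-in for a fitted Cox-nnet model)
    X:              list of rows, one row of feature values per patient
    Returns one score per feature; larger means more important.
    """
    baseline = log_likelihood(X)
    scores = []
    for k in range(len(X[0])):
        col_mean = mean(row[k] for row in X)
        # Copy the matrix with feature k set to its mean for every patient.
        X_masked = [row[:k] + [col_mean] + row[k + 1:] for row in X]
        scores.append(baseline - log_likelihood(X_masked))
    return scores
```

Sorting the features by this score in descending order and keeping the first 100 gives the kind of ranking used for the association analysis.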

2.7 Data Integration

We construct 1st-stage Cox-nnet models using the

image data and gene expression data of HCC,

respectively. For each model, grid search is used on the

training set to optimize the hyper-parameters under 5-

fold cross-validation. Then we extract and combine the

nodes of the hidden layer from each Cox-nnet model

as the new input features for the 2nd-stage model. This

new Cox-nnet model is constructed and evaluated with

the same parameter-optimization strategies.
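The combination step can be sketched as below: for each patient, the hidden-node activations of the two fitted first-stage models are concatenated to form the input vector of the second-stage model. The names are hypothetical, each first-stage model is represented only by its hidden-layer weights and biases, and a tanh activation is assumed.

```python
import math


def hidden_features(x, W, b):
    """Hidden-node activations of a fitted first-stage model (tanh assumed)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def second_stage_inputs(x_image, x_rna, image_model, rna_model):
    """Concatenate hidden features from both first-stage models for one patient.

    image_model and rna_model are (W, b) pairs, hypothetical stand-ins for the
    hidden layers of the fitted image and RNA-Seq Cox-nnet models.
    """
    return hidden_features(x_image, *image_model) + hidden_features(x_rna, *rna_model)
```

The length of the combined vector equals the total number of hidden nodes in the two first-stage models, which is typically far smaller than the number of raw image and gene expression features.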

3 RESULTS

3.1 Overview of Cox-nnet Model on

Pathological Image Data

In this study, we tested if pathological images can be

used to predict cancer patient survival. As described

in the Methods, pathological images of 322 TCGA

HCC patients are individually annotated with tumor

contents by pathologists, before being subjected to a series

of processing steps. The tumor regions of these

images then undergo segmentation, and the top 10

tiles (as described in section 2.2) out of the 1000 by 1000 pixel

tiles are used to represent each patient. These tiles are

next normalized for RGB coloring against a common

reference sample, and 2429 image features of

different categories are extracted by CellProfiler.

Summary statistics (mean, median, standard

deviation and quartiles) are calculated for each image

feature, and their median values over the 10 tiles

are used as the input imaging features for survival

prediction.

We applied these imaging features to Cox-nnet, a

neural-network based prognosis prediction method

previously developed by our group. The architecture

of Cox-nnet is shown in Figure 1. Briefly, Cox-nnet

is composed of the input layer, one fully connected

hidden layer and an output “proportional hazards”

layer. We use 5-fold cross-validation (CV) to find the

optimal regularization parameters. Based on our

previous results on RNA-Seq transcriptomics data, we

use dropout as the regularization method.

Additionally, to evaluate the results on pathology

image data, we compare Cox-nnet with the Cox-PH

model, previously the 2nd-best prognosis model on

RNA-Seq data.

Figure 1: The architecture of the Cox-nnet model: a sketch

of the Cox-nnet model for prognosis prediction, based on a

single data type.

C2C 2020 - Workshop on COMP2CLINIC: Biomedical Researchers Clinicians Closing The Gap Between Translational Research And

Healthcare Practice

3.2 Comparison of Prognosis

Prediction between Cox-nnet and

Cox-PH over Pathology Imaging

Data

We use two accuracy metrics to evaluate the

performance of models in comparison: C-index and

log-rank P-values. C-index measures the fraction of all

pairs of individuals whose predicted survival times are

correctly ordered by the model. The higher the C-index,

the more accurate the prognosis model is. On the other

hand, log-rank p-value tests if the two Kaplan-Meier

survival curves based on the survival risk-stratification

are significantly different (log-rank p-value <0.05). In

this study, we stratify the patients by the median score

of predicted prognosis index (PI) from the model. As

shown in Figure 2, the C-index values from the Cox-

PH model are much more variable (less stable),

compared to those from Cox-nnet. Moreover, the

median C-index score from Cox-nnet is higher (around

0.75) than Cox-PH (less than 0.70).

Figure 2: Comparison of prognosis prediction with different

models and data types.

Additionally, the discrimination power of Cox-nnet

on patient Kaplan-Meier survival difference (Figure

3 C and D) is much better than that of the Cox-PH model

(Figure 3 A and B), using median PI based survival

risk stratification. In the training dataset, Cox-nnet

achieves a log-rank P-value of 1e-13, compared to 3e-5

for Cox-PH; in the testing dataset, Cox-nnet

achieves a log-rank P-value of 1e-6, whereas Cox-PH

gives a result of 0.01.

Figure 3: Comparison of Kaplan-Meier survival curves

resulting from Cox-PH and Cox-nnet models, based on

pathological images.

We next investigated the top 100 image features

according to Cox-nnet ranking (Figure 4).

Interestingly, the most frequent features are those

involved in textures of the image, accounting for 43%

of raw input features. Intensity and Area/Shape

parameters make up the 2nd and 3rd highest categories,

with 21% and 15% of features, respectively. Density, on the other

hand, is less important (6%). It is also worth noting

that among the 49 features selected by the

conventional Cox-PH model, 63% (31) are also

found in the top 100 features found by Cox-nnet.

Figure 4: Categories of the top 100 most important image

features in Cox-nnet.

3.3 Prognosis Prediction by Combining

Histopathology Imaging and Gene

Expression RNA-Seq Data

Multi-modal and multi-type data integration is

challenging, particularly so for survival prediction.

We next ask whether the Cox-nnet framework can be used

for this purpose, exemplified by pathology imaging

and gene expression RNA-Seq based survival

prediction.

Towards this, we propose a two-stage Cox-nnet

complex model, inspired by other two-stage models

in genomics fields (T. Schulz-Streeck, et al., 2013;

R. Wei, et al., 2016; F. R. Pinu, et al., 2019). The

two-stage Cox-nnet model is depicted in Figure 5

below.

Figure 5: The architecture of the 2-stage Cox-nnet complex

model for prognosis prediction, which integrates multiple

data types (e.g. pathology image and gene expression).

For the first stage, we construct two Cox-nnet

models in parallel, using the image data and gene

expression data of HCC, respectively. For each

model, we optimize the hyper-parameters using grid

search under 5-fold cross-validation. Then we

extract and combine the nodes of the hidden layer

from each Cox-nnet model as the new input features

for the second-stage Cox-nnet model. We construct

and evaluate the second-stage Cox-nnet model with

the same parameter-optimisation strategy as in the

first-stage.

As shown in Figure 6, the resulting two-stage

Cox-nnet model yields impressive performance,

judged by the C-index values on both training set and

testing set, both of which are close to 0.90. In fact,

from our experience over the years, none of the

prognosis models based on one omic data type had

yielded a predictive C-index score nearly as high.

This outstanding performance of the two-stage Cox-nnet

model is also confirmed by the log-rank P-values

in the Kaplan-Meier survival curves (Figure 6). In the

training dataset, Cox-nnet achieves a log-rank P-value

of 6e-17; in the testing dataset, Cox-nnet has an

even lower (more significant) log-rank P-value of 4e-21. The fact that

the testing dataset obtains a better log-rank p-value

than the training dataset suggests that over-fitting

is less of a concern. Note: the C-index values in

Figure 6 are different from those in Figure 2, since

the objective in these plots is to differentiate the

stratified risk groups after fitting the Cox-nnet model, rather

than fitting the survival data directly.

Figure 6: Kaplan-Meier survival curves resulting from the

2-stage Cox-nnet model, combining pathological images

and gene expression RNA-Seq data from the same HCC

patients. A. training set. B. testing set.

We also investigate the correlations between the

top imaging features and the RNA-Seq gene

expression features. For this we regress each selected

image feature (y) on all the gene expression features

(x) using LASSO penalization. Interestingly, among

the top 20 imaging features, only one feature

(StDev_Nuclei_AreaShape_MajorAxisLength) has a

decent correlation value (R-squared = 0.30) with gene

expression features. This result shows that imaging

features extracted using CellProfiler have mostly

orthogonal (or non-overlapping) predictive

information to the RNA-Seq gene expression

features. This is also consistent with the observed significant

improvement in C-index (Figure 2) and log-rank p-values

(Figure 6), after adding RNA-Seq features to

imaging features.

4 CONCLUSIONS

Driven by the objective to build a uniform framework

to integrate multi-modal and multi-type data to

predict patient survival, we extend the Cox-nnet model, a

neural-network based survival prediction method, to

pathology imaging data and beyond. Using TCGA

HCC pathology images as the example, we

demonstrate that Cox-nnet is more robust and

accurate on the testing dataset, relative to Cox-PH,

the standard method for survival prediction

(which was also the second-best method in the

original RNA-Seq transcriptomic study (T. Ching, et

al., 2018)). Moreover, we propose a new two-stage

complex Cox-nnet model to integrate imaging and

RNA-Seq transcriptomic data, and showcase its

outstanding predictive accuracy on the testing dataset (C-index

almost as high as 0.90). The two-stage Cox-nnet

model combines the transformed, hidden node

features from the first-stage Cox-nnet models for

imaging and RNA-Seq based data, respectively, and uses

these combined features as the inputs to train a

second-stage Cox-nnet model.

Rather than using convolutional neural network

(CNN) models that are more complex, we utilized a

less complex but perhaps more biologically relevant

approach, where we extract imaging features using

the tool CellProfiler. These features are then fed into a

relatively simple, two-layer neural network model,

and still achieve credible predictive performance.

Such success argues that in the biological domain, it is

possible to use relatively simple neural network

models that have prior biological relevance (such as

in the input features). In summary, our work here not

only extends the previous Cox-nnet model to process

pathological imaging data, but also creatively

addresses the multi-modal data integration challenges

for patient survival prediction.

ACKNOWLEDGEMENTS

LXG would like to acknowledge the support of grants

K01ES025434 awarded by NIEHS through funds

provided by the trans-NIH Big Data to Knowledge

(BD2K) initiative (www.bd2k.nih.gov), R01

LM012373 and LM awarded by NLM, R01

HD084633 awarded by NICHD to L.X. Garmire.

REFERENCES

T. Ching, X. Zhu, and L. X. Garmire, “Cox-nnet: An

artificial neural network method for prognosis

prediction of high-throughput omics data,” PLoS

Comput. Biol., vol. 14, no. 4, p. e1006076, Apr. 2018.

H. Ishwaran and M. Lu, “Random Survival Forests,” Wiley

StatsRef: Statistics Reference Online. pp. 1–13, 2019.

R. De Bin, “Boosting in Cox regression: a

comparison between the likelihood-based and the

model-based approaches with focus on the R-packages

CoxBoost and mboost,” Computational Statistics, vol.

31, no. 2. pp. 513–531, 2016.

C. McQuin, A. Goodman, V. Chernyshev, L. Kamentsky,

B. A. Cimini, K. W. Karhohs, M. Doan, L. Ding, S. M.

Rafelski, D. Thirstrup, W. Wiegraebe, S. Singh, T.

Becker, J. C. Caicedo, and A. E. Carpenter,

“CellProfiler 3.0: Next-generation image processing for

biology,” PLoS Biol., vol. 16, no. 7, p. e2005970, Jul.

2018.

K. Chaudhary, O. B. Poirion, L. Lu, and L. X. Garmire,

“Deep Learning-Based Multi-Omics Integration

Robustly Predicts Survival in Liver Cancer,” Clin.

Cancer Res., vol. 24, no. 6, pp. 1248–1259, Mar. 2018.

K. Chaudhary, O. B. Poirion, L. Lu, S. Huang, T. Ching,

and L. X. Garmire, “Multi-modal meta-analysis of 1494

hepatocellular carcinoma samples reveals vast impacts

of consensus driver genes on phenotypes.”

C. Marinaccio and D. Ribatti, “A simple method of image

analysis to estimate CAM vascularization by APERIO

ImageScope software,” Int. J. Dev. Biol., vol. 59, no. 4–

6, pp. 217–219, 2015.

K.-H. Yu, C. Zhang, G. J. Berry, R. B. Altman, C. Ré, D.

L. Rubin, and M. Snyder, “Predicting non-small cell

lung cancer prognosis by fully automated microscopic

pathology image features,” Nat. Commun., vol. 7, p.

12474, Aug. 2016.

S. Huang, C. Yee, T. Ching, H. Yu, and L. X. Garmire, “A

novel model to combine clinical and pathway-based

transcriptomic information for the prognosis prediction

of breast cancer,” PLoS Comput. Biol., vol. 10, no. 9, p.

e1003851, Sep. 2014.

T. Schulz-Streeck, J. O. Ogutu, and H.-P. Piepho,

“Comparisons of single-stage and two-stage

approaches to genomic selection,” Theor. Appl. Genet.,

vol. 126, no. 1, pp. 69–82, Jan. 2013.

R. Wei, I. De Vivo, S. Huang, X. Zhu, H. Risch, J. H.

Moore, H. Yu, and L. X. Garmire, “Meta-dimensional

data integration identifies critical pathways for

susceptibility, tumorigenesis and progression of

endometrial cancer,” Oncotarget, vol. 7, no. 34, pp.

55249–55263, Aug. 2016.

F. R. Pinu, D. J. Beale, A. M. Paten, et al., “Systems

Biology and Multi-Omics Integration: Viewpoints from

the Metabolomics Research Community,”

Metabolites, vol. 9, no. 4, p. 76, Apr. 2019.

Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu,

“Advances in optimizing recurrent networks,” in

2013 IEEE International Conference on Acoustics,

Speech and Signal Processing (ICASSP), IEEE, 2013.

L. Kamentsky, et al., “Improved structure, function and

compatibility for CellProfiler: modular high-throughput

image analysis software,” Bioinformatics, vol. 27,

pp. 1179–1180, 2011.
