Length of Hospital Stay Prediction through Unorganised Turing Machines

Luigi Lella and Ignazio Licata

Azienda Sanitaria Unica Regionale delle Marche, Ancona, Marche, Italy

Institute for Scientific Methodology, Bagheria, Sicily, Italy

Keywords: Data Mining, Pattern Recognition and Machine Learning, Healthcare Management Systems.

Abstract: Length of hospital stay (LoS) prediction is one of the most important goals in Health Informatics, because it makes it possible to optimize the management of health structure resources. Italian local healthcare systems are going through a health cost containment process, and the minimization of care costs is considered an important objective. For this reason we have tested several data mining models, trained with hospital discharge data, that are capable of making accurate LoS predictions. In another work we reached encouraging results using unsupervised models that autonomously detect the subset of non-class attributes to be considered in these classification tasks. Here we also study another intelligent data analysis model, Turing's unorganised A-type machine, which is capable of representing the acquired knowledge in a logic formalism. In other words, this solution can explain its predictions through a set of self-acquired knowledge base rules.

1 INTRODUCTION

Length of hospital stay (LoS) prediction is considered an important strategic objective for the optimization of healthcare system resources (Wright et al., 2003, Gomez and Abasolo, 2009). This kind of knowledge can lead to cost containment through the reduction of hospital stays and readmission rates (Chang et al., 2002, Robinson et al., 1966). This is considered a factor of vital importance in Italian regions such as Marche, where the central health cost containment maneuver has led to an overall reorganization of healthcare system processes and to a considerable reduction of hospital structures and beds. But this kind of prediction can also have important clinical outcomes, not just economic ones. It has been shown that knowing the potential discharge date can also improve the planning of long-term care and discharge activities (Rowan et al., 2007).

Several solutions have been adopted to cope with LoS prediction. A first group is based on statistical methods such as the t-test, one-way ANOVA and multifactor regression (Arab et al., 2010).

A second kind of method is based on AI algorithms such as decision trees and artificial neural networks (ANNs). ANNs have produced important results in the postoperative phase of cardiac patients (Rowan et al., 2007) and in emergency rooms (Wrenn et al., 2005).

However, the best results have been achieved by adopting ensemble models (Jiang et al., 2010).

Learning techniques are generally based on a structural knowledge representation, which can be either symbolic or subsymbolic. Subsymbolic models reach the best results in LoS prediction (Tu and Guerriere, 1992). These models can be further subdivided into classification algorithms (Jiang et al., 2010, Tu and Guerriere, 1992), association algorithms (Agrawal and Srikant, 1994) and clustering algorithms (Kohonen, 1999, Van Hulle, 2012, Licata and Lella, 2007).

In classification learning, a system is trained with a set of classified samples so that it can assign a class to newly presented inputs. Unfortunately, this approach is effective only when the correlation between the class and non-class attributes is clearly known beforehand.

In the LoS context this prerequisite cannot be guaranteed. Sometimes the adoption of new therapies and diagnostic techniques can result in an increase of hospital stay. For this reason it can be very difficult to determine a classified set of samples beforehand, especially when guidelines or clinical pathways are lacking.

Lella, L. and Licata, I.

Length of Hospital Stay Prediction through Unorganised Turing Machines.

DOI: 10.5220/0006577804020407

In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF, pages 402-407

ISBN: 978-989-758-281-3

In association learning, classes are not defined at all: the system just tries to detect interesting correlations among attributes. However, these kinds of systems do not cope very well with LoS class prediction.

Finally, clustering algorithms are “unsupervised”: there is no set of classified examples to be used in the training phase of the system. Once the class attribute (i.e. the LoS class) has been selected, the system is able to extract different clusters characterized by certain LoS values. It can therefore be argued that human expert knowledge is not needed.

SOM models (Kohonen, 1999) are the clustering algorithms that have been used in LoS prediction (Gorunescu et al., 2010), but in another work (Lella and Licata, 2017) we successfully deployed an unsupervised algorithm that can operate in contexts, like the LoS one, where there is no strong correlation between the class attribute and the other ones. The Growing Neural Gas (GNG) model by B. Fritzke (1994) that we used is able to detect the exact number of attributes needed to predict the class of hospital stay. We achieved interesting prediction accuracy levels, but this subsymbolic model was not able to explain its results using a logic formalism.

In this preliminary work we study another unsupervised clustering algorithm, based on Turing's A-type unorganised machine (Turing, 1948). Turing's unorganised machine is generated “in a unsystematic and random way” from a set of two-input NAND gates. Turing chose the NAND gate because every other logical operation can be implemented by a set of NAND units. A Turing A-type unorganised machine can be considered “a kind of Boolean neural network without a layered structure, due to the fact that recurrent connections are allowed with no constraints” (Teuscher and Sanchez, 2000). We used a genetic algorithm (GA) (Mitchell, 1996) to determine the best A-type network configuration.
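To make the A-type description concrete, the following is a minimal Python sketch, not the authors' Java implementation. It shows that NOT, AND and OR can be built from NAND alone, wires each unit to two randomly chosen units (self-loops and cycles allowed), and advances the machine with synchronous clocked updates. The function names, the seed handling and the all-zero start state are our own illustrative assumptions.

```python
import random
from typing import List, Tuple

def nand(a: int, b: int) -> int:
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)

# NAND is functionally complete: NOT, AND and OR can be built from it alone.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def random_a_type(n_units: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Wire each NAND unit to two randomly chosen units (possibly itself),
    so recurrent connections are allowed without constraints."""
    rng = random.Random(seed)
    return [(rng.randrange(n_units), rng.randrange(n_units)) for _ in range(n_units)]

def step(state: List[int], wiring: List[Tuple[int, int]]) -> List[int]:
    """Synchronous update: every unit reads the previous state at the same time."""
    return [nand(state[i], state[j]) for i, j in wiring]

# Usage: a 6-unit random machine started from all zeros, run for 4 clock ticks.
wiring = random_a_type(6, seed=42)
state = [0] * 6
for _ in range(4):
    state = step(state, wiring)
```

Because every unit reads the previous state, the order in which gates are listed does not affect the result; this is the property that makes unconstrained recurrent wiring well defined.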

GAs are used to find high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection.

After the evolution, i.e. the training phase, the best A-type network configuration is able to make LoS predictions and to explain its results through a logic formalism.

2 DATASET PREPROCESSING

We processed the hospital discharge summary forms provided by our health structures. In particular, we considered only a part of this dataset, namely the attributes filled in at the admission of the patients. The set of non-class attributes was: recovery regimen, admission discipline, admission division, provenance, recovery type, trauma, hospital day care reason, hospital day care recovery type, main diagnosis, main intervention, complications, sex, age, marital status, qualification. The hospital stay period was codified in discretized form as the class attribute: one-day hospital stay, two-day hospital stay, three-day hospital stay, below regional threshold stay, over regional threshold stay (5 days).

The Weka platform (Witten et al., 2011) was used to run the ZeroR, OneR and J48 algorithms, which required all the discretized values to be converted to nominal form with the “NumericToNominal” filter.

We assumed that all the technologies and processes of care remained unchanged during 2013, and we processed all the hospital discharge summary forms of that year. The initial dataset, made up of 274962 instances of hospital stay, was reduced to 1374 instances (about 0.5% of the original) with the Weka “Resample” filter, in order to speed up the training phase of the tested models.

The chosen self-organizing networks (SOM, GNG and A-type network) were trained using the methodology suggested by Kohonen (1999). Each input vector was built by concatenating a context part, representing the length of hospital stay of the instance, and a symbol part, consisting of the other attributes. The symbol part and the context part formed a vectorial sum of two orthogonal components, such that the norm of the context part predominated over the norm of the symbol part. Both parts were encoded in binary form. In particular, discrete variables having relatively few values were encoded using a one-hot code: for example, the context part was codified by 5 bits, with just one of them in the high (1) state. The main diagnosis and main intervention attributes were instead coded in base-2 representation. In this way, each hospital discharge case was codified by an array of 104 bits for the symbol part (the binary representation of the non-class attributes) and an array of 5 bits for the context part.
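The encoding just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the mapping from days to the 5 LoS classes uses a hypothetical `threshold` parameter for the regional threshold (its value and the exact boundary convention are assumptions), the attribute widths in the test are illustrative, and no scaling is applied to make the context norm predominate.

```python
from typing import List

CONTEXT_BITS = 5  # one bit per LoS class

def los_class(days: int, threshold: int) -> int:
    """Hypothetical mapping of a stay length onto the 5 LoS classes:
    0 = one day, 1 = two days, 2 = three days,
    3 = longer than three days but within the regional threshold,
    4 = over the regional threshold."""
    if days <= 1:
        return 0
    if days <= 3:
        return days - 1
    return 3 if days <= threshold else 4

def one_hot(index: int, size: int) -> List[int]:
    """Few-valued discrete attribute: exactly one bit set to 1."""
    bits = [0] * size
    bits[index] = 1
    return bits

def to_binary(value: int, width: int) -> List[int]:
    """Base-2 code, most significant bit first (for large code sets
    such as main diagnosis and main intervention)."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> k) & 1 for k in range(width - 1, -1, -1)]

def encode_case(symbol_bits: List[int], days: int, threshold: int) -> List[int]:
    """Concatenate the symbol part (non-class attributes) and the 5-bit context part."""
    return symbol_bits + one_hot(los_class(days, threshold), CONTEXT_BITS)
```

With a 104-bit symbol part, `encode_case` returns the 109-bit vectors used for training.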


3 TRAINING AND TEST

66% of the resampled dataset was used as the training set, while the remaining 34% was used as the test set. Both the symbol part and the context part of the training set were used to train the self-organizing networks (SOM, GNG and A-type network), while only the symbol part of the test set was used to test the predictive accuracy of these models.
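A minimal Python sketch of the two sampling steps (reduction of the 2013 dataset to 1374 instances, then the 66%/34% split) is shown below. It is an assumption-laden stand-in for the Weka filters: it samples without replacement, whereas Weka's Resample filter can also sample with replacement, and it rounds the training size to the nearest integer, which may differ from Weka's own rounding.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def subsample(data: Sequence[T], size: int, seed: int = 1) -> List[T]:
    """Random subsample without replacement (an assumption: Weka's
    Resample filter can also sample with replacement)."""
    return random.Random(seed).sample(list(data), size)

def split(data: Sequence[T], train_fraction: float = 0.66,
          seed: int = 1) -> Tuple[List[T], List[T]]:
    """Shuffle, then cut into a training part and a test part."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]
```

Applied to 274962 instance indices, `subsample(..., 1374)` followed by `split` yields 907 training and 467 test instances with this rounding rule.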

The first tested algorithm was ZeroR (Witten et al., 2011), which is often used as a benchmark. ZeroR always predicts the majority class in the case of a nominal class attribute, and it is considered the simplest predictive algorithm.
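The ZeroR baseline is small enough to sketch completely. The Python class below is our own illustration of the idea (not Weka's implementation): it ignores every attribute and returns the majority class of the training labels. The class labels in the usage line are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence

class ZeroR:
    """Baseline: ignore every attribute and predict the training majority class."""

    def fit(self, labels: Sequence[Hashable]) -> "ZeroR":
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows: Sequence[object]) -> List[Hashable]:
        return [self.majority for _ in rows]

model = ZeroR().fit(["2 days", "1 day", "2 days", "over threshold"])
print(model.predict([None, None]))  # ['2 days', '2 days']
```

Any useful LoS model has to beat the accuracy this baseline obtains simply by always predicting the most frequent class.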

The second tested algorithm was OneR (Witten et al., 2011, Holte, 1993), standing for “one rule”, which generates a one-level decision tree. For each attribute, a set of rules assigns every value of that attribute to the most frequent class among the training instances having that value. At the end of the training phase, only the rule set of the attribute with the lowest error rate is used to make predictions in the test phase. This method has been shown to have a predictive power only a little lower than that of other decision tree models.
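The OneR procedure just described can be sketched in Python for nominal attributes (all attributes were converted to nominal form before running Weka). This is an illustrative version, not Weka's code: it does not discretize numeric attributes, and sending unseen attribute values to the overall majority class is our own hypothetical choice.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[Hashable]
Model = Tuple[int, Dict[Hashable, Hashable], Hashable]

def one_r(rows: List[Row], labels: List[Hashable]) -> Model:
    """Return (chosen attribute index, value -> class rules, fallback class)."""
    fallback = Counter(labels).most_common(1)[0][0]
    best = None
    for attr in range(len(rows[0])):
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[attr]][y] += 1
        # one rule per attribute value: predict its most frequent class
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # training errors = instances not in the majority class of their value
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    _, attr, rules = best
    return attr, rules, fallback

def predict(model: Model, rows: List[Row]) -> List[Hashable]:
    attr, rules, fallback = model
    # hypothetical choice: unseen values fall back to the overall majority class
    return [rules.get(row[attr], fallback) for row in rows]
```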

The third tested algorithm was J48 (Witten et al., 2011), Weka's implementation of revision 8 of C4.5 (Quinlan, 1993), the last version of this family of algorithms to be distributed freely. J48 is based on a “divide and conquer” strategy, and its decision tree is generated recursively. At each training step the attribute providing the highest information gain (measured in C4.5 as the gain ratio) is selected, and a branch is created for each of its possible values. This process stops when all the instances in a node belong to the same class.
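The split criterion can be made concrete with a short Python sketch of entropy, information gain and gain ratio for a nominal attribute. This is a simplified illustration of the C4.5 criterion, not J48's code: numeric thresholds, missing values, the average-gain filter and pruning are all left out.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_by(values: Sequence[Hashable], labels: Sequence[Hashable]) -> List[List[Hashable]]:
    """Group the class labels by the value the attribute takes."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_by(values, labels))
    return entropy(labels) - remainder

def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    # split information = entropy of the attribute's own value distribution
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0
```

At each node, the attribute with the highest `gain_ratio` would be chosen and the data partitioned by its values.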

The fourth tested algorithm was the SOM (Kohonen, 1999). A Self-Organizing Map maps a higher-dimensional input space onto a lower-dimensional grid of units; a two-dimensional map was tested in this work. During the training phase, different parts of the network learn to respond similarly to certain input patterns. The training is based on competitive learning: for each training input vector just one unit is selected as the winner, namely the one whose weight vector is closest to the input.
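A single SOM training step can be sketched in pure Python as follows. It follows the standard Kohonen update (find the best-matching unit, then pull every unit toward the input, weighted by a Gaussian of its grid distance from the winner). The Gaussian neighbourhood, the random initialization and the parameter names `lr` and `sigma` are our assumptions; the decay schedules used over the 500 training epochs are omitted.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Grid = Dict[Tuple[int, int], Vector]

def make_som(rows: int, cols: int, dim: int, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    return {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(som: Grid, x: Vector) -> Tuple[int, int]:
    """Competitive step: the unit whose weight vector is closest to x wins."""
    return min(som, key=lambda pos: sq_dist(som[pos], x))

def train_step(som: Grid, x: Vector, lr: float, sigma: float) -> None:
    """Move every unit toward x, weighted by a Gaussian of its grid distance to the winner."""
    wr, wc = best_matching_unit(som, x)
    for (r, c), w in som.items():
        h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma ** 2))
        for k in range(len(w)):
            w[k] += lr * h * (x[k] - w[k])
```

For a 12x12 map on 109-bit vectors one would call `make_som(12, 12, 109)` and apply `train_step` to every training vector in each epoch, while shrinking `lr` and `sigma`.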

The fifth tested model was the GNG (Fritzke, 1994), which is based on the Competitive Hebbian Learning (CHL) (Martinetz, 1993) and Neural Gas (NG) (Martinetz and Schulten, 1991) algorithms. The former deploys an initial number of centers, i.e. the weight vectors of the units, which have the same dimension as the input space, and subsequently adds a topological connection between the two centers closest to each presented input. The latter adapts the k nearest centers, with k decreasing from a large initial value to a small final value. In this way the network topology is generated incrementally by CHL, with a locally varying dimensionality, and the NG algorithm is used to move the center of the nearest unit and those of its topological neighbours towards the input signal by fractions $\epsilon_b$ and $\epsilon_n$, respectively, of the total distance.
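The adaptation step just described can be sketched in Python, following Fritzke's (1994) formulation: find the nearest and second-nearest units, age the winner's edges, accumulate its squared error, move the winner by $\epsilon_b$ and its neighbours by $\epsilon_n$, connect the two winners (CHL), and drop edges older than a maximum age. This is only a partial illustration: periodic unit insertion, removal of isolated units and error decay are omitted, and the names `eps_b`, `eps_n` and `a_max` are our own rendering of Fritzke's parameters.

```python
from typing import Dict, FrozenSet, List

Vector = List[float]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gng_adapt(units: Dict[int, Vector], edges: Dict[FrozenSet[int], int],
              error: Dict[int, float], x: Vector,
              eps_b: float, eps_n: float, a_max: int) -> None:
    # 1. nearest (s1) and second-nearest (s2) units
    s1, s2 = sorted(units, key=lambda u: sq_dist(units[u], x))[:2]
    # 2. age every edge emanating from the winner
    for e in edges:
        if s1 in e:
            edges[e] += 1
    # 3. accumulate the winner's local squared error
    error[s1] = error.get(s1, 0.0) + sq_dist(units[s1], x)
    # 4. move the winner by eps_b and its topological neighbours by eps_n
    units[s1] = [w + eps_b * (xi - w) for w, xi in zip(units[s1], x)]
    for e in edges:
        if s1 in e:
            (n,) = e - {s1}
            units[n] = [w + eps_n * (xi - w) for w, xi in zip(units[n], x)]
    # 5. competitive Hebbian learning: create (or refresh) the s1-s2 edge
    edges[frozenset((s1, s2))] = 0
    # 6. drop edges older than a_max
    for e in [e for e, age in edges.items() if age > a_max]:
        del edges[e]
```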

Finally, we chose an A-type model consisting of 24 NAND gates.

The first three algorithms were tested with Weka default parameters. The output of the ZeroR, OneR and J48 algorithms provided by the Weka Explorer is shown in Figures 1, 2 and 3. As expected, J48 appears to perform better than the other two.

Figure 1: ZeroR prediction accuracy.

Figure 2: OneR prediction accuracy.

HEALTHINF 2018 - 11th International Conference on Health Informatics


Figure 3: J48 prediction accuracy.

The SOM and GNG models were developed as two Java implementations. The resampled dataset was pre-processed as explained in Section 2, obtaining a training set and a test set of 109-bit vectors. In the test set we replaced the 5 bits representing the context part with zero padding.
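In this Kohonen-style setup, prediction consists of presenting a test vector whose context bits have been zeroed, finding the closest prototype, and reading the LoS class off that prototype's context part. The Python sketch below shows this readout under our own assumptions, not the authors' Java code: prototypes are plain lists of floats (SOM or GNG weight vectors), the distance is squared Euclidean over all components including the zeroed ones, and the predicted class is the arg-max of the winner's 5 context weights.

```python
from typing import List, Sequence

CONTEXT_BITS = 5

def mask_context(vector: Sequence[float], context_bits: int = CONTEXT_BITS) -> List[float]:
    """Keep the symbol part and replace the context part with zero padding."""
    return list(vector[:-context_bits]) + [0.0] * context_bits

def predict_class(prototypes: List[Sequence[float]], case: Sequence[float],
                  context_bits: int = CONTEXT_BITS) -> int:
    probe = mask_context(case, context_bits)
    winner = min(prototypes,
                 key=lambda w: sum((a - b) ** 2 for a, b in zip(w, probe)))
    context = winner[-context_bits:]
    # assumed readout: the strongest context weight gives the LoS class index
    return max(range(context_bits), key=lambda k: context[k])
```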

A 12x12 SOM was trained for 500 epochs with the following parameters: start = 1, start = 0.1, start = 0.5, end = 0.005.

The GNG model was tested with the following parameters: … The training was stopped when the mean square error, i.e. the mean of the local squared errors related to each unit (expected distortion error), dropped below the threshold E = 1.

The prediction accuracy of the GNG model (96.3597%) was considerably higher than that of the SOM algorithm (87.5912%) and of the J48 algorithm (56.9593%).

Finally, the A-type unorganised Turing machine was tested with a Java implementation. The output of the network was provided by just 5 units, giving a one-hot answer. Each of the two inputs of a logic gate was connected either to the output of another NAND gate or to an input unit, i.e. one of the bits used to codify the non-class attributes of the hospital admission form. The overall network was made up of 128 units: the 104 input units (non-class attributes) plus the 24 NAND gates. Each of these units was addressed by a 7-bit code, which can represent exactly 2^7 = 128 units. The resulting GA chromosome, modelling a certain network configuration, was therefore an array of 7 x 2 x 24 = 336 bits.
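The chromosome layout can be illustrated with a Python sketch that decodes the 336 bits into two 7-bit source addresses per gate, runs the network on a 104-bit admission vector, and counts correctly classified cases (the fitness defined later in this section). Several details are hypothetical and not stated in the text: the last 5 gates act as output units, gates start at 0, updates are synchronous for 24 ticks, and a case counts as correct only when the 5 outputs exactly equal the one-hot code of its class.

```python
from typing import List, Sequence, Tuple

N_INPUTS, N_GATES, ADDR_BITS, N_OUTPUTS = 104, 24, 7, 5

def decode(chromosome: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn 7 * 2 * 24 = 336 bits into (source_a, source_b) unit addresses per gate.
    Addresses 0-103 are input bits, 104-127 are NAND gates."""
    if len(chromosome) != ADDR_BITS * 2 * N_GATES:
        raise ValueError("expected a 336-bit chromosome")
    addrs = [int("".join(map(str, chromosome[i:i + ADDR_BITS])), 2)
             for i in range(0, len(chromosome), ADDR_BITS)]
    return list(zip(addrs[0::2], addrs[1::2]))

def run(wiring: List[Tuple[int, int]], inputs: Sequence[int],
        steps: int = N_GATES) -> List[int]:
    """Clamp the input bits, start all gates at 0 and update them synchronously."""
    gates = [0] * N_GATES
    for _ in range(steps):
        state = list(inputs) + gates
        gates = [1 - (state[a] & state[b]) for a, b in wiring]
    return gates[-N_OUTPUTS:]  # hypothetical: the last 5 gates are the output units

def fitness(chromosome: Sequence[int], cases) -> int:
    """Number of (inputs, class) cases whose outputs equal the class's one-hot code."""
    wiring = decode(chromosome)
    return sum(run(wiring, x) == [int(k == y) for k in range(N_OUTPUTS)]
               for x, y in cases)
```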

We evolved a population of 7000 chromosomes with a mutation rate of 0.015.

We employed a tournament selection method (Miller and Goldberg, 1995). Tournament selection involves running several “tournaments” among a few individuals, i.e. chromosomes, chosen at random from the population. The winner of each tournament, that is the one with the best fitness, is selected for crossover. Selection pressure is easily adjusted by changing the tournament size: the larger the tournament, the smaller the chance that weak individuals are selected. We chose a tournament size of 30 individuals. Crossover was implemented in its single-point version.

We also employed elitism (Baluja and Caruana, 1995): at the end of each generation, the best-performing individual was preserved from the effects of the mutation and crossover operators.
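The GA configuration above (tournament selection, single-point crossover, bit-flip mutation, elitism) can be sketched generically in Python. This is our illustration, not the authors' implementation: each crossover produces a single child, tournament contenders are drawn without replacement, and the defaults simply mirror the numbers given in the text (7000 individuals, mutation rate 0.015, tournament size 30, about 30 generations). In the LoS setting, `fitness` would count the correctly classified training cases of the network a chromosome encodes.

```python
import random
from typing import Callable, List

Chromosome = List[int]

def tournament(pop: List[Chromosome], scores: List[float], size: int,
               rng: random.Random) -> Chromosome:
    """Pick `size` distinct individuals at random; the fittest one wins."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: scores[i])]

def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Single-point crossover producing one child."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in c]

def evolve(fitness: Callable[[Chromosome], float], length: int,
           pop_size: int = 7000, generations: int = 30, rate: float = 0.015,
           tournament_size: int = 30, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [list(elite)]  # elitism: the best individual survives untouched
        while len(children) < pop_size:
            p1 = tournament(pop, scores, tournament_size, rng)
            p2 = tournament(pop, scores, tournament_size, rng)
            children.append(mutate(crossover(p1, p2, rng), rate, rng))
        pop = children
    return max(pop, key=fitness)
```

Larger `tournament_size` values raise selection pressure, exactly as described above: weak chromosomes rarely win a tournament of 30.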

The fitness of the network was defined as the number of correctly classified cases.

The evolution was stopped once we had reached a prediction accuracy similar to that of J48.

The evolution of the chromosome population needed on average just 30 generations, after which the system was also able to justify its answers.

To decode an answer, we took into consideration only the inputs of the activated output unit.
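One way to turn the inputs of an activated output unit into an explanation is to trace the wiring back to the input bits and then map those bits to admission-form attributes. The Python sketch below is a hypothetical variant of that idea: it follows connections transitively through intermediate gates (handling cycles), whereas restricting the explanation to the direct inputs would just be `wiring[output_gate]`. The `bit_to_attribute` lookup table is an assumption introduced for illustration.

```python
from typing import Dict, List, Set, Tuple

def explain(wiring: List[Tuple[int, int]], output_gate: int, n_inputs: int) -> Set[int]:
    """Input-bit indices that feed the given output gate, directly or through other gates.
    Units 0..n_inputs-1 are inputs; gate g is unit n_inputs + g."""
    found: Set[int] = set()
    seen: Set[int] = set()
    stack = [n_inputs + output_gate]
    while stack:
        unit = stack.pop()
        if unit in seen:  # recurrent wiring may contain cycles
            continue
        seen.add(unit)
        if unit < n_inputs:
            found.add(unit)
        else:
            stack.extend(wiring[unit - n_inputs])
    return found

def attributes_used(bits: Set[int], bit_to_attribute: Dict[int, str]) -> List[str]:
    """Map the contributing bits back to attribute names (hypothetical lookup table)."""
    return sorted({bit_to_attribute[b] for b in bits})
```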

Only a few attributes were taken into consideration by the system to give an answer, as represented in Figure 4.

Figure 4: A-type prediction accuracy.

The correct answers provided by the system were subsequently validated by a team of human experts chosen from the ASUR medical staff.


4 CONCLUSIONS

We know that there are no universal data mining techniques or methodologies able to deal with every kind of problem or task. For length of hospital stay prediction, we think that only unsupervised models can achieve the best results, because there is a lack of precise guidelines and best practices capable of inferring exactly the length of stay of patients, especially in contexts characterized by rapid changes in technologies and organizational settings. In other words, in these cases the knowledge of human experts cannot be exploited to define an accurate LoS prediction system.

For this reason, in our research we have focused on unsupervised machine learning algorithms, in particular clustering algorithms and self-organizing networks.

In a previous research work we obtained encouraging results using subsymbolic models such as B. Fritzke's Growing Neural Gas, but we are now trying to develop more “intelligent” data analysers that are also capable of giving a human-understandable explanation of their predictions. A response produced according to a logic formalism could indeed support decision makers in their health resources and services management activities.

That is why we have chosen an A-type unorganised Turing machine to process the admission forms of hospital patients. The structure of the model itself could be used as a kind of “dynamic” guideline, to be taken into consideration by a group of human experts in order to optimally organize the healthcare activities performed on patients.

The knowledge acquired by an unorganised Turing machine through its pattern of NAND gate connections could also be used to explain the reasons that led the system to its LoS predictions, as we have shown in this preliminary work.

We stopped the training just after reaching the prediction accuracy of the best-performing decision tree algorithm, J48. This model could also be used to build a knowledge representation for the LoS prediction problem, but its tree-like structure is probably too simple to generate the complex set of rules needed in this kind of decision process.

We think that these first results can be further improved by adopting another unorganised Turing machine model, the B-type (Turing, 1948). A B-type machine may also contain any number of NAND gates connected in any pattern; Turing just added the further condition that each unit-to-unit connection must pass through a modifier device. The modifier can be set to “pass mode”, in which the output of a NAND gate passes through it unchanged, or to “interrupt mode”, in which the signal is always 1, no matter what the output of the NAND gate is (Copeland and Proudfoot, 1996). The presence of the modifiers can enable what Turing described as “appropriate interference, mimicking education”.
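The effect of the modifiers can be shown with a small Python sketch that extends a synchronous NAND update so that each of a gate's two incoming links passes through a modifier in pass or interrupt mode. This is a deliberate simplification for illustration: in Turing's B-type the modifier is itself a small circuit whose state is set from outside, while here it is just a mode string attached to each link. Note that an interrupted link feeds a constant 1, so the gate degenerates to NOT of its other input (or to a constant 0 if both links are interrupted).

```python
from typing import List, Tuple

PASS, INTERRUPT = "pass", "interrupt"

def through_modifier(signal: int, mode: str) -> int:
    """Pass mode lets the signal through; interrupt mode always yields 1."""
    if mode == PASS:
        return signal
    if mode == INTERRUPT:
        return 1
    raise ValueError(f"unknown modifier mode: {mode}")

def b_type_step(state: List[int], wiring: List[Tuple[int, int]],
                modes: List[Tuple[str, str]]) -> List[int]:
    """Synchronous NAND update in which every incoming link passes through a modifier."""
    return [1 - (through_modifier(state[i], m1) & through_modifier(state[j], m2))
            for (i, j), (m1, m2) in zip(wiring, modes)]
```

Toggling modes leaves the wiring untouched, which is what would make a separate "learning" phase on top of an evolved configuration possible.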

We are going to design and test a two-phase training, similar to the one proposed by Teuscher and Sanchez (2000), with a first “evolutionary” phase, in which the best network configuration is selected, and a “learning” phase, in which the switches of the NAND gates are enabled and properly configured to optimize the prediction accuracy rate.

ACKNOWLEDGEMENTS

Special thanks go to Eng. Antonio Di Giorgio for his support in the Weka data mining processes.

REFERENCES

Agrawal R., Srikant R., 1994. Fast Algorithms for Mining Association Rules. Proc. of the 20th VLDB Conference, Santiago, Chile, 1994.

Arab M., Zarei A., Rahimi A., Rezaiean F., Akbari F., 2010. Analysis of factors affecting length of stay in public hospitals in Lorestan Province, Iran. Hakim Res, Vol. 12, No. 4, 2010, pp. 27-32.

Baluja S., Caruana R., 1995. Removing the genetics from the standard genetic algorithm. ICML.

Chang K.C., Tseng M.C., Weng H.H., Lin Y.H., Liou C.W., Tan T.Y., 2002. Prediction of length of stay of first-ever ischemic stroke. Stroke, Vol. 33, No. 11, 2002, pp. 2670-4.

Copeland B.J., Proudfoot D., 1996. Alan Turing’s forgotten ideas in computer science. Sci. Am., n. 280, pp. 76-81.

Fritzke B., 1994. A Growing Neural Gas Network Learns Topologies. In Advances in Neural Information Processing Systems 7, NIPS, 1994.

Gomez V., Abasolo J.E., 2009. Using data mining to describe long hospital stays. Paradigma, Vol. 3, No. 1, 2009, pp. 1-10.

Gorunescu F., El-Darzi E., Belciug S., Gorunescu M., 2010. Patient grouping optimization using hybrid Self-Organizing Map and Gaussian Mixture Model for length of stay-based clustering system. Intelligent Systems (IS), 2010 5th International Conference.

Holte R.C., 1993. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 1993.


Jiang X., Qu X., Davis L., 2010. Using data mining to analyze patient discharge data for an urban hospital. In Proceedings of the 2010 International Conference on Data Mining, 2010 Jul 12-15, Las Vegas, NV, pp. 139-44.

Kohonen T., 1999. The Self Organizing Map. Proc. of the IEEE, Vol. 78, No. 9, 1999.

Lella L., Licata I., 2017. Prediction of Length of Hospital Stay using a Growing Neural Gas Model. In Proceedings of the 8th International Multi-Conference on Complexity, Informatics and Cybernetics (IMCIC 2017), pp. 175-178.

Licata I., Lella L., 2007. Evolutionary Neural Gas (ENG): A model of self-organizing network from input categorization. EJTP, Vol. 4, No. 14, 2007.

Martinetz T.M., 1993. Competitive Hebbian learning rule forms perfectly topology preserving maps. In ICANN’93: International Conference on Artificial Neural Networks, pp. 427-434. Amsterdam, Springer, 1993.

Martinetz T.M., Schulten K.J., 1991. A neural gas network learns topologies. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, Editors, Artificial Neural Networks, pp. 397-402. North-Holland, Amsterdam, 1991.

Miller B., Goldberg D., 1995. Genetic Algorithms, Tournament Selection, and the Effects of Noise. Complex Systems, 9, pp. 193–212.

Mitchell M., 1996. An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press, 1996.

Quinlan J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.

Robinson G.H., Davis L.E., Leifer R.P., 1966. Prediction of hospital length of stay. Health Serv Res, Vol. 1, No. 3, 1966, pp. 287-300.

Rowan M., Ryan T., Hegarty F., O'Hare N., 2007. The use of artificial neural networks to stratify the length of stay of cardiac patients based on preoperative and initial postoperative factors. Artif Intell Med, Vol. 40, No. 3, 2007, pp. 211-21.

Teuscher C., Sanchez E., 2000. A Revival of Turing’s Forgotten Connectionist Ideas: Exploring Unorganized Machines. In Proceedings of the 6th Neural Computation and Psychology Workshop, NCPW6, University of Liège, 2000.

Tu J.V., Guerriere M.R., 1992. Use of a neural network as a predictive instrument for length of stay in the intensive care unit following cardiac surgery. Proc Annu Symp Comput Appl Med Care, pp. 666-72, 1992.

Turing A., 1948. Intelligent Machinery. In Collected Works of A.M. Turing: Mechanical Intelligence, edited by D.C. Ince. Elsevier Science Publishers, 1992.

Van Hulle M.M., 2012. Self Organizing Maps. Handbook of Natural Computing, pp. 585-622, 2012.

Witten I.H., Frank E., Hall M.A., 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, 2011.

Wrenn J., Jones I., Lanaghan K., Congdon C.B., Aronsky D., 2005. Estimating patient's length of stay in the Emergency Department with an artificial neural network. AMIA Annu Symp Proc, pp. 2005-1155, 2005.

Wright S.P., Verouhis D., Gamble G., Swedberg K., Sharpe N., Doughty R.N., 2003. Factors influencing the length of hospital stay of patients with heart failure. Eur J Heart Fail, Vol. 5, No. 2, 2003, pp. 201-
