to extract visual features manually from images were
conducted (Blum et al., 2010; Lalys et al., 2012;
Klank et al., 2008). Recently, with the development
of convolutional neural networks (CNNs), CNNs
have come into use for various image recognition
tasks. Studies using CNNs have also been proposed in
the field of surgical phase recognition (Twinanda et
al., 2017). Many CNN-based studies (Raju et al.,
2016; Sahu et al., 2016) use the M2CAI tool data set
to recognize instruments and phases at the frame level.
On the other hand, the surgical phase evolves
continuously over time. Therefore, it is essential to
utilize temporal information to effectively extract
these continuous dynamics and achieve accurate
phase recognition. For example, Twinanda et al.
achieved surgical phase recognition by constructing a
9-layer CNN for visual features and designing a
2-level hierarchical HMM for modelling temporal
information (Twinanda et al., 2017).
Also, with the development of the long short-term
memory (LSTM) network, it has become possible to
model nonlinear long-range temporal dependencies.
SV-RCNet (Jin et al., 2018), one of the
state-of-the-art approaches to surgical phase
recognition using LSTM, was proposed to learn both
spatial (visual) information and temporal
information.
This paper aims at achieving automatic analysis
of surgical phases using intraoperative microscopic
video images as one of the operator-support
functions of the intelligent operating room (SCOT)
project (Okamoto et al., 2017) for awake brain
tumor removal surgery. This surgery removes a
brain tumor while preserving maximal brain
function; to this end, the doctors communicate with
the awake patient during the surgery. Difficulties in
this surgery arise from differences among individual
patients' brains. As a result, the surgical phases
become complicated; therefore, only experienced
doctors can perform this surgery. It is difficult for
surgical staff other than the experienced doctor to
grasp the surgical situation and predict the next
surgical step; consequently, the flow of the operation
stagnates. To solve these problems, phase recognition
is also required in awake brain tumor removal surgery.
However, in brain tumor removal surgery, it is
difficult to recognize phases with frame-level
annotation as in the conventional methods. This is
because brain tumor removal surgery uses multiple
tools in each phase, and the same tool is also used
across different phases; namely, the phases and the
tools used do not have a one-to-one relationship.
Therefore, to recognize phases in brain tumor
removal surgery, it is important to focus on detailed
information about the tools: specifically, the temporal
motion of a tool, the pose of the tool, the type of the
tool, and the like.
TSSD (Chen et al., 2018) is an object detection
method that uses both spatial and time-series
information. However, it is not practical to train on
both spatial and temporal information as in Chen et
al.'s method, because an enormous amount of human
annotation work would be necessary for recognizing
surgical phases. Therefore, to exploit temporal
information, this paper utilizes a fast conventional
tracking method combined with a deep learning
detector, rather than an LSTM.
2 DATA SET
No data set for recognizing surgical tools in
awake brain tumor removal surgery has been publicly
released. Hence, we gave spatial annotations
(bounding boxes) to surgical tools in frames of videos
of actual awake brain tumor removal surgeries
performed at Tokyo Women's Medical University
Hospital, and constructed a new data set that enables
higher-level phase recognition.
Our data set consists of videos of 8 brain tumor
removal surgeries recorded at 30 fps. We sampled
every 15th frame, randomly selected 11,175 frames,
and labeled those frames with spatial bounding
boxes as tool candidates. The 11,175 frames consist
of 7,755, 2,270, and 1,150 frames for training,
validation, and testing, respectively. The surgical
tools included in the data set are Bipolar, Electrode,
Scissors, Suction tube, Forceps, and Clippers, which
are the tools mainly used in brain tumor removal
surgery. The number of annotated instances per tool
category is shown in Table 1. Figure 1 shows an
example of each tool in the data set. The frequency
of use of each surgical tool varies greatly depending
on tumor location, grade, and so on. Therefore, if
learning were performed with per-patient n-fold
cross validation, bias could occur in the current data
set; hence n-fold cross validation is not used in this
paper.
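The sampling and split described above can be reproduced schematically as follows. This is a minimal sketch: the helper name `split_dataset`, the fixed random seed, and the treatment of frame IDs as integers are assumptions for illustration; only the step size and the split sizes come from the text.

```python
import random

def sampled_indices(num_frames, step=15):
    """Indices retained when every 15th frame of a 30 fps video is
    kept, i.e. an effective sampling rate of 2 fps."""
    return list(range(0, num_frames, step))

def split_dataset(frame_ids, counts=(7755, 2270, 1150), seed=0):
    """Randomly partition frame IDs into disjoint train/val/test
    subsets with the sizes reported for the data set
    (hypothetical helper, not the authors' actual script)."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    n_train, n_val, n_test = counts
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test
```

Note that the three split sizes sum to 11,175, the total number of annotated frames.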
Table 1: Number of annotated instances for each tool.