Real-time Nonlinear Signal Processing Super Resolution of 8K Endoscope Cameras

Seiichi Gohshi, Chinatsu Mori, Kenkichi Tanioka and Hiromasa Yamashita

Kogakuin University, 1-24-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, 163-8677, Japan

Medical Consortium Network Group, Japan

Kairos Co., Ltd., 4-13-2 Shiba, Minato City, Tokyo-to 105-0014, Japan

Keywords: Video, Image, Non-linear Signal Processing, Super Resolution, Focus, 8K, Endoscope.

Abstract:

Presently, 8K is the highest resolution available in video systems. Research on 8K originally started only for broadcasting services. However, aside from broadcasting, 8K video systems also have an important application in the medical field, where 8K endoscopic surgery is in its practical stage. Endoscopic surgery uses a 0.02 mm thread, which is not visible with a 4K endoscope. The thread is visible with an 8K endoscope, but fine focus is still necessary. Moreover, adjusting the focus of an 8K camera by using only common tools, such as view finders or small monitors, is very difficult. Commercial HD/4K cameras are equipped with auto-focus functions; however, the focus points are not always in the central area. The focus is very deep and the focus points change during endoscopic surgery. For these reasons, a surgeon must control the endoscope focus manually, and adjusting the focus accurately is always very difficult. Super resolution (SR) has been proposed to sharpen out-of-focus images; however, the 8K endoscope requires a real-time SR technology. In this study, nonlinear signal processing super resolution (NLSP) is introduced to improve the resolution of 8K endoscope cameras. NLSP can enhance 8K endoscope images and improve the camera's focus depth.

1 INTRODUCTION

The progress in video technologies has been remarkable. High-definition television (HDTV) has long been the standard TV format. HDTV (2K) sets, which are no longer available on the market, have been replaced with 4K televisions. Broadcasting in 8K, which has four times the pixel count of 4K, began in 2018, and commercial 8K TVs were also released. Aside from broadcasting services, 4K/8K is also applied in the medical field. Endoscope cameras are one of the important applications of 4K/8K.

The endoscopic operation targeted by the 8K endoscope system is laparoscopic surgery. During laparoscopic surgery, small holes are made in the abdominal or chest wall of the patient. The surgeon inserts the endoscope into the body cavity and performs the entire operation while watching a monitor; the surgeon cannot view the target areas directly. Laparoscopic surgery is popular because the physical burden on the patient is small and the recovery time is shorter than that of laparotomy.

Since laparoscopic surgery became covered by health insurance in Japan in 1992, the number of operations has been increasing annually. The Ministry of Health, Labor and Welfare has proposed shifting 70% of the total number of operations to laparoscopic surgery. However, the actual percentage of laparoscopic surgeries remains at 30%–40%. Because current laparoscopic surgery requires a high degree of skill and experience owing to the insufficient resolution of endoscopes, trained surgeons are in short supply, and the needs of patients are not fully met. An 8K endoscope provides sufficient video resolution for laparoscopic surgery; in other words, it offers good visibility and lowers the difficulty of endoscopic surgery. Aside from improving the safety of patients, it is expected to increase the number of capable surgeons. With 8K endoscopes, surgeons can see the thread used in laparoscopic surgery, which has a diameter of 0.02–0.029 mm and is not visible with 4K endoscopes. In other words, the 4K endoscope cannot meet the resolution required for laparoscopic surgery. However, the focus of the 8K endoscope must be adjusted accurately to locate the 0.02–0.029 mm thread on a monitor.

Gohshi, S., Mori, C., Tanioka, K. and Yamashita, H.

Real-time Nonlinear Signal Processing Super Resolution of 8K Endoscope Cameras.

DOI: 10.5220/0007950603430349

In Proceedings of the 16th International Joint Conference on e-Business and Telecommunications (ICETE 2019), pages 343-349

ISBN: 978-989-758-378-0

According to clinical experiments, the surgeon cannot precisely adjust the focus of the 8K endoscope because the focus point on the screen constantly shifts during the operation. Thus, the high-resolution (HR) advantage of the 8K endoscope cannot be fully utilized. During a surgical operation, the region of interest (ROI) always changes. Although the ROI might be at the center of the screen at the beginning of the operation, the image must often be refocused on the top-left or bottom-right regions. Generally, auto-focus functions adjust the focus at the center of the screen. Moving the endoscope to place the ROI at the center of the screen is possible.

However, freely moving the endoscope during the operation is not advisable because organs might be affected, causing the patient pain. Hence, the endoscope should be kept in a fixed position during the operation, and the focus is controlled manually. In addition to the operating surgeon, another surgeon is required to supervise the focus of the 8K endoscope. Because adjusting the focus with a small monitor is impractical, an LCD monitor larger than 50 in. must be used for the surgery. However, controlling the focus accurately is still very challenging even with a 50 in. monitor. Focus control of 8K cameras is also an issue in the broadcasting and content-making industries. Commercial HD/4K/8K cameras are not equipped with an auto-focus function because focus control is one of the specialized skills of content production. The choice of focus point in a frame is one of the techniques of content direction, and the focus position is not always at the center of the frame. Hence, focus adjustment depends on the camera man's technique, and the camera man controls the focus manually according to the director's request. Up to the HDTV era, the camera man could adjust the focus manually by using the view finder, which was usually built into the camera. However, adjusting the focus of 4K cameras with the view finder is challenging even for a professional camera man because the view finder is too small for accurate focus control. Even if the focus seems fine on the view finder, the footage is often out of focus when viewed on a larger screen. Focus control is even more difficult for 8K cameras. In 8K content production, a 55 in. 8K monitor and a dedicated focus person are necessary.

Professional 4K cameras are equipped with a focus assist function (Funatsu et al., 2013)(Ikegami, 2015)(Hitachi, 2015). The principle of the focus assist function is very simple: edges are detected from the image and superimposed on it, and the focus is controlled by maximizing the superimposed edges in the ROI. This method is similar to the enhancer technique (unsharp mask). The edges are detected using a high-pass filter (HPF), their absolute value is calculated, and the absolute-value edges are superimposed on the image. Then, the camera man adjusts the focus by maximizing the edges. Because the edges are detected using an HPF, edges caused by noise appear over the entire frame when noise is mixed into the image. When the lighting condition is good, the noise is suppressed. However, good lighting conditions are very rare, and noise is generally mixed into the images. Owing to this noise issue, focus assist is not applicable to general videos, because the noise makes it difficult to find the appropriate focus position. Even if the focus is not exact, clear images can be captured if the depth of focus is large. However, the depth of the organs captured by the endoscope varies over a wide range. Thus, a signal processing method is required to widen the focus depth.
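The focus assist principle described above (HPF, absolute value, superimposition) can be sketched in one dimension as follows. This is only an illustrative sketch: the 3-tap filter taps, the `gain` value, the box blur used to imitate defocus, and all function names are assumptions made here, not details taken from the cited cameras.

```python
def high_pass(row):
    """3-tap high-pass filter [-1, 2, -1] / 4 with edge replication (assumed taps)."""
    n = len(row)
    out = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        out.append((2 * row[i] - left - right) / 4.0)
    return out


def edge_strength(row):
    """Absolute value of the HPF output, i.e. the unsigned edge signal."""
    return [abs(v) for v in high_pass(row)]


def focus_assist(row, gain=4.0, white=255.0):
    """Superimpose the unsigned edges on the row, clipped to `white`."""
    return [min(p + gain * e, white) for p, e in zip(row, edge_strength(row))]


def focus_score(row):
    """Total edge strength; larger when the row is in sharper focus."""
    return sum(edge_strength(row))


def box_blur(row):
    """3-tap box blur, used here only to imitate defocus."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


sharp = [0.0] * 8 + [200.0] * 8          # ideal step edge
blurred = box_blur(box_blur(sharp))      # defocused version of the same edge
# Flat region with alternating +/-7 "noise": no real edge is present.
noisy_flat = [100.0 + (7.0 if i % 2 else -7.0) for i in range(16)]
```

In this sketch `focus_score(sharp)` exceeds `focus_score(blurred)`, which is the quantity the camera man maximizes, while `focus_score(noisy_flat)` is nonzero although the region contains no edge, which illustrates the noise problem noted above.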

2 SUPER RESOLUTION (SR)

SR is a technology for enhancing the resolution of an image or video. Although many SR methods have been proposed (Ledig et al., 2017)(Houa and Liu, 2011)(van Eekeren et al., 2010)(Shahar et al., 2011)(Bannore, 2010), they are applicable only to still images and cannot work in real time because they require iterations. Real-time signal processing is necessary for laparoscopic surgery. One issue with existing SR studies is that they first capture an HR image and then generate low-resolution (LR) images from it. In practice, however, no HR image is available and we have only one LR image. Thus, we must create an HR image from only one LR image. If we apply SR to the endoscope video, we must create an HR image from every frame, several parts of which are out of focus. In previous SR papers, the reconstructed HR image is compared with the original HR image based on the peak signal-to-noise ratio (PSNR). However, determining the PSNR in practical applications is impossible because we do not have the original HR image and have only one LR image. Because we cannot measure the PSNR, we must define SR. In this study, we define SR as a technology that, first, produces high-frequency elements that the original LR image does not possess and, second, enhances image quality. The former condition can be easily verified by using the two-dimensional fast Fourier transform (2D-FFT). However, most SR studies did not report the FFT results of the LR and HR images. The latter condition means that invisible things become visible through the SR processing in this study. In laparoscopic surgery, a thread with a diameter of 0.02 mm is used, and it is visible only when an 8K endoscope with fine focus is used. Even with fine focus, viewing the 0.02 mm thread with a 4K endoscope is impossible. Meanwhile, even if the focus is not fine, the 0.02 mm thread is still visible when an 8K endoscope is utilized. If the 0.02 mm thread in out-of-focus areas can be seen with the SR signal processing, SR is shown to improve the image quality.

SIGMAP 2019 - 16th International Conference on Signal Processing and Multimedia Applications
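For reference, the PSNR used in previous SR papers is conventionally computed as below. The symbols $I_{\mathrm{HR}}$ (original HR image), $\hat{I}$ (reconstructed image), $W$, $H$ (image width and height), and $I_{\max}$ (maximum pixel value) are notation assumed here for illustration.

$$\mathrm{MSE} = \frac{1}{WH}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(I_{\mathrm{HR}}(i,j) - \hat{I}(i,j)\right)^{2}, \qquad \mathrm{PSNR} = 10\log_{10}\frac{I_{\max}^{2}}{\mathrm{MSE}}$$

The MSE term requires the reference image $I_{\mathrm{HR}}$, so the measure cannot be evaluated when only one LR image exists, which is the situation in endoscope video.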

Figure 1: One-dimensional NLSP signal flow (HPF, NLF, LMT, and ADD).

3 NLSP

NLSP was originally proposed for up-converting images from HDTV to 4KTV (Gohshi et al., 2017) and works in real time. Currently, most commercial 4KTV sets are equipped with SR functions. However, the SR functions presently in use are inferior to NLSP (Mori et al., 2015). The basic idea of NLSP is illustrated by the one-dimensional signal processing shown in Figure 1. The input is distributed to two paths. The upper path creates high-frequency elements that the original image does not have, as follows. The original image is processed using an HPF to detect edges, which have a sign (i.e., plus or minus) for each pixel. After the HPF, the edges are processed with a nonlinear function (NLF). If an even function (e.g., $y = x^2$) is used as the NLF, the sign is lost. To prevent this information loss, the most significant bit is taken from the edge information before the NLF and restored after the NLF. NLFs generate harmonics, which introduce frequency elements higher than those of the original image. NLSP using any of a number of NLFs should therefore be able to produce high-frequency elements. Here we propose a cubic function, $y = x^3$, as the NLF.
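The harmonic generation by the cubic NLF can be checked on a single sinusoid with the standard triple-angle identities. Here $\theta$ is shorthand, assumed for illustration, for the phase of one edge component:

$$\cos^{3}\theta = \tfrac{3}{4}\cos\theta + \tfrac{1}{4}\cos 3\theta, \qquad \sin^{3}\theta = \tfrac{3}{4}\sin\theta - \tfrac{1}{4}\sin 3\theta$$

Cubing a component therefore yields a component at three times its frequency, and because the cubic function is odd, the sign of each edge is kept without separate handling.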

Generally, images are expanded in a Fourier series (Mertz and Gray, 1934). Herein, we utilize the one-dimensional image $f(x)$:

$$f(x) = \sum_{n=-N}^{N} \left( a_n \cos(n\omega_0 x) + b_n \sin(n\omega_0 x) \right) \tag{1}$$

where $\omega_0$ is the fundamental frequency and $N$ is a positive integer. The HPF attenuates low-frequency elements, including the zero-frequency element (DC). We denote the output of the HPF by $g(x)$, which becomes as follows.

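$$g(x) = \sum_{n=-N}^{-M} \left( a_n \cos(n\omega_0 x) + b_n \sin(n\omega_0 x) \right) + \sum_{n=M}^{N} \left( a_n \cos(n\omega_0 x) + b_n \sin(n\omega_0 x) \right) \tag{2}$$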

where $M$ is also a positive integer and $N > M$. The frequency elements between $-M$ and $M$ are eliminated by the HPF. DC has the largest energy in images, and it sometimes causes saturation, whereby the images become either all white or all black. Eliminating DC prevents the output of the NLF from saturating, and the NLF then has the following effect. Edges are represented by $\sin(n\omega_0 x)$ and $\cos(n\omega_0 x)$ functions. The cubic function $y = x^3$ generates $\sin^3(n\omega_0 x)$ and $\cos^3(n\omega_0 x)$ from $\sin(n\omega_0 x)$ and $\cos(n\omega_0 x)$, and these in turn contain $\sin(3n\omega_0 x)$ and $\cos(3n\omega_0 x)$. Theoretically, this can be expressed as follows.

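$$(g(x))^3 = \sum_{n=-3N}^{-M} \left( c_n \cos(n\omega_0 x) + d_n \sin(n\omega_0 x) \right) + \sum_{n=M}^{3N} \left( c_n \cos(n\omega_0 x) + d_n \sin(n\omega_0 x) \right) \tag{3}$$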

where $c_n$ and $d_n$ are the expansion coefficients of Equation 3. Equation 3 contains high-frequency elements from $(N+1)\omega_0$ to $3N\omega_0$ that do not exist in the input image, that is, in Equation 1. Because these high-frequency elements are produced by the NLF, some of them are too large and must be processed with the limiter (LMT). After the LMT processing, the generated high-frequency elements are added to the input by the ADD block in Figure 1. The high-frequency elements produced by the NLF extend to three times the highest frequency of the input, and they can be used when doubling the size of images horizontally and vertically, such as during up-conversion from 4K to 8K. Moreover, because images and videos are two-dimensional signals, NLSP must be applied both horizontally and vertically. NLSP is not a simple edge enhancement such as the enhancer. Aside from the enhancer, other technologies, such as blind deconvolution (Richardson, 1972)(Lucy, 1974) and SR, are also available. However, because these technologies require iterations, they cannot work in real time for the 8K endoscope.
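The signal flow of this section (HPF, cubic NLF, LMT, ADD) can be sketched in one dimension as below, together with a small DFT check that a third harmonic absent from the input appears in the output. This is a minimal sketch under stated assumptions: the 3-tap HPF, the `gain` and `limit` values, the circular boundary handling, and the test signal are all chosen here for illustration and are not the parameters of the actual hardware.

```python
import cmath
import math


def high_pass(signal):
    """Circular 3-tap HPF [-1, 2, -1] / 4 (assumed taps); removes DC."""
    n = len(signal)
    return [(2 * signal[i] - signal[i - 1] - signal[(i + 1) % n]) / 4.0
            for i in range(n)]


def cubic_nlf(edges):
    """Cubic NLF y = x^3; an odd function, so the edge sign is preserved."""
    return [e ** 3 for e in edges]


def limiter(values, limit):
    """LMT: clip values to the range [-limit, +limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def nlsp_1d(signal, gain=1.0, limit=0.5):
    """HPF -> NLF -> LMT on the upper path, then ADD to the input."""
    detail = limiter(cubic_nlf(high_pass(signal)), limit)
    return [s + gain * d for s, d in zip(signal, detail)]


def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT bin, normalised by the signal length."""
    n = len(signal)
    acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
              for t in range(n))
    return abs(acc) / n


# Test input: DC offset plus a single cosine in bin 10 of a 64-sample period.
N_SAMPLES = 64
BIN = 10
source = [0.5 + math.cos(2 * math.pi * BIN * t / N_SAMPLES)
          for t in range(N_SAMPLES)]
enhanced = nlsp_1d(source)

before = dft_magnitude(source, 3 * BIN)    # third-harmonic bin of the input
after = dft_magnitude(enhanced, 3 * BIN)   # same bin after NLSP
```

Here `before` is zero up to rounding error, whereas `after` is clearly nonzero, which corresponds to the elements above $N\omega_0$ in Equation 3. For images, the same one-dimensional processing would be applied along rows and then along columns.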


Figure 2: Example of image enlargement. (a) Original image (256 × 256 pixels). (b) Enlarged image of (a) with Lanczos2 filter (512 × 512 pixels).

Figure 3: 2D-FFT result. (a) 2D-FFT result of Figure 2(a). (b) 2D-FFT result of Figure 2(b).

Figure 4: NLSP processed result of Figure 2(a).

Figure 5: 2D-FFT result of Figure 4.

4 CAPABILITY OF THE NLSP

In this section, a simulation result is discussed to demonstrate the capability of NLSP. Herein, we use an image enlarged with the Lanczos-2 filter (Duchon, 1979) because enlargement always causes blur. Figure 2(a) shows a 256 × 256 pixel image, and Figure 2(b) shows the 512 × 512 pixel image obtained by enlarging Figure 2(a). Compared with Figure 2(a), Figure 2(b) is blurred by the enlargement. Figure 3 shows the two-dimensional fast Fourier transform (2D-FFT) results of Figure 2. In Figure 3(a), the same spectrum is repeated horizontally and vertically every 2π, which is the sampling frequency. In Figure 3(b), the spectrum appears only at the center because the horizontal and vertical sampling frequencies are doubled and the repeated spectra fall outside the image. Figure 4 shows the NLSP-processed image of Figure 2(b). Comparing Figures 2(b) and 4, Figure 4 is evidently better. Figure 5 shows the 2D-FFT of Figure 4. Figure 5 has high-frequency elements that are absent from the spectrum of Figure 2(b) shown in Figure 3(b), indicating that NLSP can produce high-frequency elements that the input image does not possess. In Section 2, we defined SR as a technology that, first, produces high-frequency elements that are not present in the original LR image and, second, improves the image quality. The comparison of Figures 3(b) and 5 shows that the former condition is satisfied, and the comparison of Figures 2(b) and 4 shows that the latter condition is also met. We also developed the real-time 4KTV NLSP hardware shown in Figure 6. Although many devices are present on the circuit board, most of them are interface devices for the input and output. The NLSP algorithm is implemented in a field-programmable gate array (FPGA) under the heat sink at the center of the circuit board.

Figure 6: Real time NLSP hardware.

Figure 7: 2D-LPF characteristic.

Figure 8: 8K endoscope image and processed images 1. (a) 8K endoscope image 1. (b) NLSP processed image of Figure 8(a).
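The Lanczos-2 enlargement used to prepare the blurred test image can be sketched in one dimension as follows. This is a minimal sketch, not the procedure used for Figure 2: the sample alignment (output sample j at input position j / 2), the edge replication, and the weight normalisation are assumptions made here. The kernel itself is the standard Lanczos window with a = 2.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos2(x):
    """Lanczos kernel with a = 2: sinc(x) * sinc(x / 2) for |x| < 2, else 0."""
    if abs(x) >= 2.0:
        return 0.0
    return sinc(x) * sinc(x / 2.0)


def enlarge_2x(row):
    """Double the length of a 1-D row by Lanczos-2 interpolation.

    Output sample j sits at input position j / 2 (assumed alignment);
    samples beyond the borders are taken from the nearest valid pixel.
    """
    n = len(row)
    out = []
    for j in range(2 * n):
        pos = j / 2.0
        base = math.floor(pos)
        acc = 0.0
        weight_sum = 0.0
        for k in range(base - 1, base + 3):      # four nearest input taps
            w = lanczos2(pos - k)
            acc += w * row[min(max(k, 0), n - 1)]
            weight_sum += w
        out.append(acc / weight_sum)             # normalise to unit DC gain
    return out


row = [0.0, 0.0, 0.0, 100.0, 100.0, 100.0]       # step edge
big = enlarge_2x(row)
```

In this sketch the even output samples reproduce the input pixels and the odd samples are interpolated, so the step edge is spread over more output pixels than in the input. The enlarged row contains no frequency elements beyond those of the original, which is why it serves as a blurred input for NLSP.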

5 EXPERIMENTAL RESULT OF THE 8K ENDOSCOPE IMAGE USING A REAL-TIME NLSP HARDWARE

Because the input and output of the real-time hardware are 4K, four circuit boards operating in parallel are needed to process an 8K video in real time. The full 8K image cannot be shown in this paper owing to space limitations. Figures 8–10 show the experimental results. These areas are cropped from the full 8K endoscope images and contain the 0.02 mm threads. Figures 8(a), 9(a) and 10(a) show the areas of the original 8K endoscope images. Figures 8(b), 9(b) and 10(b) show the images processed with NLSP by using the real-time hardware shown in Figure 6. Comparing the original images (Figures 8(a), 9(a) and 10(a)) with the NLSP-processed images (Figures 8(b), 9(b) and 10(b)), the threads become more visible in the out-of-focus areas. However, the NLSP processing has an undesirable side effect, namely noise. A two-dimensional low-pass filter (2D-LPF) is introduced to reduce the noise. Figure 7 presents the characteristic of the 2D-LPF. The 2D-LPF attenuates only the diagonal high-frequency elements to reduce the noise. Because the 2D-LPF is programmed in the FPGA shown in Figure 6, it can also work in real time.

Figure 9: 8K endoscope image and processed images 2. (a) 8K endoscope image 2. (b) NLSP processed image of Figure 9(a).

Figure 10: 8K endoscope image and processed images 3. (a) 8K endoscope image 3. (b) NLSP processed image of Figure 10(a).

Figures 8(c), 9(c) and 10(c) show the NLSP- and 2D-LPF-processed results of the original images shown in Figures 8(a), 9(a) and 10(a), respectively. Comparing Figures 8(c), 9(c) and 10(c) with Figures 8(a), 9(a) and 10(a), the noise is reduced and the 0.02 mm threads remain visible. However, the noise level is still not sufficiently low in the 2D-LPF-processed images shown in Figures 8(c), 9(c) and 10(c). Although an infinite impulse response (IIR) filter-type noise reducer is a practical signal processing method for videos, it does not work well (Lee, 1980). Three things are necessary to reduce the noise completely: (1) sufficient lighting, (2) a high-sensitivity photoelectric device, and (3) noise-reducing signal processing. Improving the lighting conditions seems easy. However, heat increases in proportion to brightness, and because the light is used inside the body, a brighter light emits more heat that might damage organs. Although a high-sensitivity photoelectric device would be useful, developing one is time consuming. Currently, a new noise-reducing signal processing method is believed to be the practical option.
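A 2D-LPF that attenuates only diagonal high-frequency elements, as described in this section, can be sketched with a 3 × 3 kernel as below. The kernel is an assumption made here for illustration (identity minus the product of a horizontal and a vertical [-1, 2, -1] / 4 high-pass filter); the actual characteristic implemented in the FPGA is the one shown in Figure 7 and may differ.

```python
# 3x3 kernel: identity minus the product of a horizontal and a vertical
# [-1, 2, -1] / 4 high-pass filter, so only diagonal detail is removed.
# This kernel is a hypothetical choice, not the one used in the hardware.
KERNEL = [[-1, 2, -1],
          [2, 12, 2],
          [-1, 2, -1]]
KERNEL_SCALE = 16.0


def diagonal_lpf(image):
    """Apply the 3x3 kernel with wrap-around borders to a 2-D list."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += (KERNEL[dy + 1][dx + 1]
                            * image[(y + dy) % h][(x + dx) % w])
            out[y][x] = acc / KERNEL_SCALE
    return out


SIZE = 8
# Highest diagonal frequency: checkerboard of +1 / -1.
checker = [[1.0 if (x + y) % 2 == 0 else -1.0 for x in range(SIZE)]
           for y in range(SIZE)]
# Highest horizontal frequency: vertical stripes of +1 / -1.
stripes = [[1.0 if x % 2 == 0 else -1.0 for x in range(SIZE)]
           for _ in range(SIZE)]
# Constant image (DC only).
flat = [[10.0] * SIZE for _ in range(SIZE)]

checker_out = diagonal_lpf(checker)
stripes_out = diagonal_lpf(stripes)
flat_out = diagonal_lpf(flat)
```

With this kernel the checkerboard, which is the highest diagonal frequency, is removed, while the vertical stripes and the constant image pass unchanged. Horizontal and vertical edges such as the thread outline are therefore largely kept while diagonal noise components are reduced.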

6 CONCLUSION

In this study, a real-time SR system that consists of NLSP and a 2D-LPF is proposed. The SR system is applied to 8K endoscope video. It enhances the image quality and makes the out-of-focus 0.02 mm thread visible, indicating that the proposed real-time SR system improves the focus depth. Future work will focus on developing a novel noise-reducing signal processing method for the 8K endoscope.

REFERENCES

Bannore, V. (2010). Iterative-Interpolation Super-

Resolution Image Reconstruction.

Duchon, C. E. (1979). Lanczos ﬁltering in one and two

dimensions. Journal of Applied Meteorology, Vol. 18,

pp. 1016-1022.

Funatsu, R., Yamashita, Y., Mitani, K., and Nojiri, Y.

(2013). Focus-aid signal for super hi-vision cameras.

Technical Report 53, NHK.

Gohshi, S., Nakamura, S., and Tabata, H. (2017). Development of real-time HDTV-to-8K TV upconverter. VISIGRAPP 2017, VISAPP, 4:52–59.

Hitachi (2015). http://www.rbbtoday.com/article/2015/04/07/130240.html.

Houa, X. and Liu, H. (2011). Super-resolution image re-

construction for video sequence. IEEE Transactions

on Image Processing.

Ikegami (2015). http://www.ikegami.co.jp/news/detail.html

/news id=907.

Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunning-

ham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J.,

Wang, Z., and Shi, W. (2017). Photo-realistic single

image super-resolution using a generative adversarial

network. IEEE CVPR.

Lee, J. S. (1980). Digital image enhancement and noise filtering by use of local statistics. IEEE Trans. on Pattern Analysis and Machine Intelligence, pages 165–168.

Lucy, L. B. (1974). An iterative technique for the rectiﬁca-

tion of observed distributions. THE ASTRONOMICAL

JOURNAL, 79(6):745–754.


Mertz, P. and Gray, F. (1934). A theory of scanning and its

relation to the characteristics of the transmitted signal

in telephotography and television. IEEE Transactions

on Image Processing.

Mori, C., Sugie, M., Takeshita, H., and Gohshi, S. (2015).

Subjective assessment of super-resolution: High-

resolution effect of nonlinear signal processing. AP-

SITT 2015.

Richardson, W. H. (1972). Bayesian-based iterative method

of image restoration. Journal of The Optical Society

of America, 62(1):55–59.

Shahar, O., Faktor, A., and Irani, M. (2011). Space-time superresolution from a single video. CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, 19(11):3353–3360.

van Eekeren, A. W. M., Schutte, K., and van Vliet, L. J.

(2010). Multiframe super-resolution reconstruction of

small moving objects. IEEE Transactions on Image

Processing.
