IMAGE PROCESSING AND MACHINE LEARNING FOR THE
DIAGNOSIS OF MELANOMA CANCER
Arushi Raghuvanshi¹ and Marek Perkowski
Department of Electrical and Computer Engineering, Portland State University, Portland, OR 97207, U.S.A.
¹Jesuit High School, 9000 SW Beaverton Hillsdale Hwy, Portland, OR 97225, U.S.A.
Keywords: Melanoma, Skin Cancer, Image Processing, Machine Learning, Medical Diagnosis.
Abstract: Melanoma cancer is one of the most dangerous and potentially deadly types of skin cancer; however, if
diagnosed early, it is nearly one-hundred percent curable (UnderstMel09). Here we propose an efficient
system which helps with the early diagnosis of melanoma cancer. Different image processing techniques
and machine learning algorithms are evaluated to distinguish between cancerous and non-cancerous moles.
Two image feature databases were created: one compiled from a dermatologist-training tool for melanoma
from Hosei University, and the other created by extracting features from digital pictures of lesions using a
program called Skinseg. We then applied various machine learning techniques to the image feature
databases using a Python-based tool called Orange. The experiments suggest that, among the methods tested,
the combination of Bayes machine learning with Hosei image feature extraction is the best method for
detecting cancerous moles. Using this method, we then developed a computer tool that returns the probability
that an imaged mole is cancerous. This is a very practical application, as it lets users estimate at home the
probability that a mole is cancerous. It does not replace visits to a doctor, but provides early information
that allows people to be proactive in the diagnosis of melanoma cancer.
1 INTRODUCTION
The warning signs of melanoma cancer can be
summarized by the ABCDE method as described by
the Skin Cancer Foundation (UnderstMel09). Each
letter in ABCDE stands for a feature of a mole that
indicates that it might be malignant: Asymmetry,
Border irregularity, Color, Diameter, and Evolving
(Fig. 1). A mole that evolves, changing in color,
shape, or size, is a particularly important warning
sign of melanoma (UnderstMel09). The diagnosis of
melanoma is not based on just one of these factors
but on a combination of all of them.
Many dermatologists use a surgical method,
called an excisional biopsy, to further test for
melanoma at the microscopic level. Ideally, the mole
would be noticed early on, so the cancer would still
be isolated in the mole and not have spread to the
lymph nodes. If it is noticed at this stage, only one
surgery is needed to cure the body of cancer. The
problem is, however, that often moles are not
diagnosed until the cancer has developed past this
stage. A device that would give simple feedback on
moles, therefore, would be beneficial in helping
patients check their moles at home and therefore
encouraging early diagnosis.
An emerging technology uses imaging techniques
to diagnose melanoma (Stevens09). Although the
concept is good, current imaging technologies are
not suited for individual home use. By making the
system more accessible to individual users, the
process can help in the early diagnosis of melanoma.
Overall, the process involves image capturing,
image processing, feature extraction, and machine
learning for the diagnosis of melanoma. Although
these techniques cannot replace surgical diagnosis
by doctors, they provide a foundation for the early
diagnosis of melanoma. Because early diagnosis is
so important, this process has very practical
applications in the real world and could potentially
be used to save lives.
The first step in the imaging process is image
capturing or image acquisition. One method of
image acquisition some dermatologists use is a
method called dermoscopy, which allows them to
obtain an image which displays colors of the
epidermis, the dermoepidermal junction, and the
papillary dermis that are not visible to the naked eye
(Stanganelli08).
Figure 1: Distinguishing using the ABCD method. Source:
The Ear, Nose, and Throat Alliance:
http://www.allianceent.net/index.php?section=3&pid=198.
Before extracting features, it is important to
perform some pre-processing and noise reduction to
enhance the images. One technique for noise
reduction is combining many images by frame
averaging (Bosdogianni99). Another technique,
called neighborhood averaging, involves adding
together the color or brightness values for pixels in a
certain area and then dividing by the number of
pixels in that area. This average value is then used to
construct a new image with less noise. Another type
of neighborhood averaging involves replacing each
pixel with the average of its neighbors
(Bosdogianni99). Neighborhood averaging reduces
noise; however, it also blurs edges, displaces
boundaries, and reduces contrast. Other image
processing techniques can be used to correct non-
uniform illumination (Russ95).
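As a minimal illustration of neighborhood averaging (our own sketch, not code from any of the cited tools), the following Python fragment replaces each pixel with the mean of its surrounding window using SciPy; the window size is an arbitrary choice:

```python
# Minimal sketch of neighborhood averaging: each pixel is replaced by
# the mean of the pixels in a size x size window around it, which
# suppresses noise at the cost of blurring edges.
import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_average(image, size=3):
    """Replace each pixel with the mean of its size x size neighborhood."""
    return uniform_filter(image.astype(np.float64), size=size)

# Example: smooth a noisy 8-bit grayscale image with a 5 x 5 window.
noisy = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
smoothed = neighborhood_average(noisy, size=5)
```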
One currently available program, DullRazor, uses
image processing and noise reduction to digitally
remove hair from images of moles. It identifies the
dark hair locations by a generalized grayscale closing
operation and verifies that the hair pixels form thin,
long structures. It then replaces the hair pixels by
bilinear interpolation and smooths the replaced pixels
with an adaptive median filter (DermWeb07).
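As a rough, hypothetical approximation of this hair-removal pipeline (not DullRazor itself), the sketch below uses OpenCV: thresholding a black-hat transform stands in for the generalized grayscale closing, and OpenCV's inpainting stands in for the bilinear interpolation and adaptive median filtering; the kernel size and threshold are arbitrary choices.

```python
# Hypothetical DullRazor-style hair removal (an approximation, not the
# original algorithm).
import cv2

def remove_hair(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Black-hat transform: highlights thin dark structures (hairs) that
    # disappear under a grayscale closing.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    # Binary mask of candidate hair pixels.
    _, mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)
    # Fill the masked pixels from their surroundings (stands in for the
    # bilinear interpolation and adaptive median filtering).
    return cv2.inpaint(bgr_image, mask, 3, cv2.INPAINT_TELEA)
```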
The next step is feature extraction. For the
purposes of our project, the features we would need
are the ones described by the ABCDE method. Two
important first steps in feature extraction are edge
detection and image segmentation (Bosdogianni99).
In image segmentation, we must divide up the image
into uniform regions. In order to do so, there are
many methods available, the simplest of which are
histogramming and thresholding (Bosdogianni99).
For an image of a mole, the histogram will usually
have two peaks. However, if the mole has multiple
colors, and therefore is possibly malignant, the
histogram would have three peaks, or one of the
peaks would not be well defined. Therefore, by
looking at the histogram, we can determine a
variation in color of the mole. Once the image is
thresholded, we know the points of the outer edge of
the image (Bosdogianni99). Using these points, we
can determine the perimeter of the mole and use an
integral function to find the area. By comparing the
perimeter to the area using some predefined
algorithm we can extract the asymmetry, border
irregularity, and diameter of a mole. Finally, given
multiple images over time and comparing their
features, we can determine whether a mole is
evolving. For this project, however, we focus on
features at a single point in time.
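To make this step concrete, here is a hypothetical sketch (not the Skinseg implementation) that thresholds a grayscale image with Otsu's method and derives simple shape features from the mole's contour; circularity, 4πA/P², is one common proxy for border irregularity (1.0 for a perfect circle, smaller for ragged borders):

```python
# Hypothetical shape-feature extraction after histogram thresholding
# (assumes OpenCV 4.x, where findContours returns two values).
import cv2
import numpy as np

def shape_features(gray_image):
    # Otsu's method picks a threshold between the histogram's two peaks;
    # THRESH_BINARY_INV assumes the mole is darker than the skin.
    _, binary = cv2.threshold(gray_image, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mole = max(contours, key=cv2.contourArea)  # largest region = mole
    area = cv2.contourArea(mole)
    perimeter = cv2.arcLength(mole, True)
    circularity = 4.0 * np.pi * area / (perimeter ** 2)
    (_, _), radius = cv2.minEnclosingCircle(mole)
    return {"area": area, "perimeter": perimeter,
            "circularity": circularity, "diameter": 2.0 * radius}
```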
There are many available tools for feature
extraction. One tool is CVIPtools (CVIP06). We can
use this software for image processing and feature
extraction. This tool can do the segmentation of an
image using Fuzzy C-Means, Grey Level
Quantization, Histogram Thresholding, and many
other techniques. It can also perform edge detection
and various transforms, including the Fast Fourier,
Hadamard, and Walsh transforms. Finally, we can
use this tool to extract texture features and spectral
features, and for pattern classification and image
segmentation (CVIP06). Other similar tools that can
be used for feature extraction or preprocessing of
images of moles are Dull Razor, Hosei tool, and
Skinseg (DermWeb07) (Hosei09) (Skinseg98).
After extracting the features, the next step is to
create a machine learning database. In this database,
we store the images, their features, and whether or
not they were cancerous, as determined by trained
dermatologists through microscopic evaluation. Then,
using the database, we perform machine learning
algorithms to determine patterns of cancerous moles.
In order to do this, we can use various methods, one
of which is decision trees. Using this approach, we
create a decision tree based on the features in the
database. This can be done with the top-down ID3
method, which is a greedy algorithm. In this method,
construction of a decision tree starts by picking a
key variable (feature) to segment the database and
then applying other features one by one until all the
elements have been mapped to an
outcome/decision. In order to choose variables that
optimize the decision tree, we can look at the
entropy of each split. For a set S of examples, the
entropy is

H(S) = -p+ log2(p+) - p- log2(p-),

where p+ is the proportion of positive (cancerous)
examples in S and p- is the proportion of negative
(benign) examples. At each step, we choose the
variable with the highest information gain, that is,
the largest reduction in entropy. Then, using this
decision tree, we can predict whether an image not
in the current database is cancerous
(DeLaCruz09). Other methods of machine learning
are neural networks, constructive induction, and
support vector machines.
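A small Python sketch of this entropy and information-gain computation, on a made-up example feature, looks as follows:

```python
# Entropy and information gain for discrete features, as used by ID3.
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p * log2(p)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in entropy from splitting the set on one feature."""
    total = len(labels)
    split = {}
    for value, label in zip(feature_values, labels):
        split.setdefault(value, []).append(label)
    remainder = sum(len(subset) / total * entropy(subset)
                    for subset in split.values())
    return entropy(labels) - remainder

# Made-up example: does "multiple colours present?" separate the classes?
colours = ["yes", "yes", "no", "no", "yes"]
labels = ["malignant", "malignant", "benign", "benign", "benign"]
print(information_gain(colours, labels))  # ~0.42 bits
```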
2 IMAGE ACQUISITION
The first step in this process was to acquire a set of
preliminary images for the machine learning
process. Some of these images needed to be of
cancerous moles and others of benign moles. The
images we used are standard photographs taken with
an ordinary commercial household camera. We chose
to use images from an ordinary camera because this
fits our low-cost application criteria and is accessible
to the average person. We contacted local
dermatologists and collected some images, and then
collected more from dermatologists’ training sites on
the web (Stevens09). Our overall database included
150 images, of which 30% were of benign moles and
70% of cancerous moles.
3 FEATURE EXTRACTION
The next step in the process was feature extraction.
We explored a variety of different tools for feature
extraction. The first tool we experimented with was
Skinseg, a tool developed by Wright State
University. This tool segments a given image to
isolate the portion of interest (i.e. the mole) and
extracts a set of features from this segment.
From the images collected, we opened each
image individually within the Skinseg program and
used it to identify the region of interest (mole) using
available methods of segmentation. Fig. 2 shows a
picture of a mole after automatic segmentation.
Figure 2: Segmented Image using Skinseg tool.
Once the image was segmented, the tool allowed
us to view the features and save them to a text file,
as illustrated in Fig. 3.
Figure 3: Extracted Image Features using Skinseg tool.
Once all of the feature files were saved, we used
a Python script to read all the files and create a
single database (skinsegdb.tab) of the selected
extracted features. The database contained one row
for each image with all the feature values separated
by tabs (a partial snapshot of the database is shown
in Fig. 4). This was the format required by the
machine learning tools in the next part of the project.
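The script itself is not reproduced here; the following is a hedged reconstruction of what it might look like, assuming one "name=value" feature file per image (the file layout, feature names, and paths are illustrative, and Orange's full .tab format adds type and flag header rows omitted here for brevity):

```python
# Hypothetical reconstruction of the database-building script: read one
# Skinseg feature file per image and write a single tab-separated table.
import glob

FEATURES = ["asymmetry", "border", "colour_variance", "diameter"]  # illustrative

with open("skinsegdb.tab", "w") as out:
    out.write("\t".join(FEATURES + ["cancerous"]) + "\n")
    for path in sorted(glob.glob("features/*.txt")):
        values = {}
        with open(path) as f:
            for line in f:
                if "=" in line:
                    name, value = line.strip().split("=", 1)
                    values[name.strip()] = value.strip()
        # Unknown values are written as "?", Orange's missing-value marker.
        row = [values.get(name, "?") for name in FEATURES]
        row.append(values.get("cancerous", "?"))
        out.write("\t".join(row) + "\n")
```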
The second tool we used was the Hosei Tool,
created by Hosei University in Japan. This is a
learning tool for dermatologists, and it has
predetermined features for given images. Most
likely, these features were determined by doctor
inspection.
Figure 4: Partial snapshot of the database of features
extracted from images using the Skinseg tool.
Using the tool’s website, we retrieved a
set of pictures along with their image features.
These features included Symmetry, Borders, Color,
Pigment Network, Branched Streaks, Homogeneous,
Dots & Globules, Atypical Pigment, Blue-Whitish
Veil, Atypical Vascular Pattern, Irregular Streaks,
Irregular Pigmentation, and Regression Structures.
We then created the second database (hoseidb.tab)
using a process similar to that used for the Skinseg
feature database. A partial snapshot of hoseidb.tab is
shown in Fig. 5. As before, the database contained
one row for each image, with the feature values
separated by tabs, as required by the machine
learning tools in the next part of the project.
Figure 5: Partial snapshot of the database of features
extracted from images using Hosei tool.
We also explored a few other tools, but did not
use them for the data gathering and comparison. The
CVIP tool, developed by Southern Illinois
University at Edwardsville, is very powerful, but
mostly interactive, so we did not use it for this
project. We realized that it is possible to write code
that performs the feature extraction more
automatically, but we chose to use Skinseg and the
Hosei tool instead (Skinseg98) (Hosei09); this can
be explored in future research. Mole Expert Micro
is commercial software for the feature extraction
of melanoma images. We were able to receive an
evaluation version of this software. Unfortunately,
this software required a value for the number of
pixels per millimeter of the image. Since this data
was not available for our images, we could not use
this tool.
The OpenCV library from Intel would be very
powerful in completely automating the process of
feature extraction; however, it is not specifically
designed for melanoma images and would require
adapting and customizing for this project. In the
future, we plan to use OpenCV, or to obtain the
source code for Skinseg, in order to completely
automate the feature extraction process for
deployment on a website.
4 MACHINE LEARNING
Once we had compiled the extracted features into
the databases, we applied machine learning
algorithms to them. We tested a variety of machine
learning methods:
1. Majority Learning: This is a basic technique
which gives a probability of a given mole
being cancerous based on the probability that
any given mole in the training set is
cancerous.
2. Bayes Learning: In Bayes learning, Bayesian
networks are created that represent the
relationship between a given feature and the
probability that the mole is cancerous.
Combined, these networks give an overall
probability of whether or not the mole is
cancerous (a small sketch follows this list).
3. Decision Trees: This machine learning
method creates a tree based on the training
data. There are a variety of different
techniques for how to create the best tree and
to distinguish which features are important
and which are not. In this method the leaves
of the tree describe positive or negative
decisions.
4. kNN (k-Nearest Neighbours): This method
classifies a new sample according to the
classes of the k training samples that lie
closest to it in the feature space.
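To make the Bayes method concrete, below is a toy naive Bayes classifier over discrete features, written from scratch (a simplification: Orange's Bayes learner is more elaborate, and the feature encoding here is hypothetical):

```python
# Toy naive Bayes: per-feature likelihoods multiplied with the class
# prior give a posterior probability of malignancy.
from collections import Counter, defaultdict

def train(rows, labels):
    """rows: list of feature tuples; labels: 'malignant' / 'benign'."""
    prior = Counter(labels)
    likelihood = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            likelihood[(i, label)][value] += 1
    return prior, likelihood

def posterior(row, prior, likelihood):
    total = sum(prior.values())
    scores = {}
    for label, count in prior.items():
        p = count / total
        for i, value in enumerate(row):
            counts = likelihood[(i, label)]
            # Laplace smoothing so unseen values don't zero the product.
            p *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        scores[label] = p
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical usage with two discrete features per mole.
rows = [("asym", "irregular"), ("sym", "regular"), ("asym", "regular")]
labels = ["malignant", "benign", "benign"]
prior, likelihood = train(rows, labels)
print(posterior(("asym", "irregular"), prior, likelihood))
```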
In order to test these methods, we used a Python-
based toolkit called Orange (Orange09). We wrote
code in Orange to test the percent accuracy of each
combination of machine learning method and
feature extraction method on the corresponding
data.
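Our code is not reproduced here, but a minimal sketch in the style of the Orange 2.x scripting API of that period might look as follows (module and function names may differ between Orange versions; hoseidb.tab is one of our database files):

```python
# Cross-validate the four learners on one feature database (a sketch in
# the style of the Orange 2.x scripting API).
import orange, orngTree, orngTest, orngStat

data = orange.ExampleTable("hoseidb.tab")
learners = [orange.MajorityLearner(),
            orange.BayesLearner(),
            orngTree.TreeLearner(),
            orange.kNNLearner()]
names = ["majority", "bayes", "tree", "knn"]

# 10-fold cross-validation over all learners at once, then report
# classification accuracy for each.
results = orngTest.crossValidation(learners, data, folds=10)
for name, accuracy in zip(names, orngStat.CA(results)):
    print("%-10s %.3f" % (name, accuracy))
```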
5 RESULTS
We ran four different machine learning methods
(Majority, Bayes, Decision Tree, and kNN) on the
two databases created using the Skinseg and Hosei
tools and measured the accuracy of the diagnosis.
For the purposes of the experiment, the Orange tool
was used to split the database into a ‘learning set’
and a ‘test set’. The ‘learning set’ allows the
algorithm to learn, while the ‘test set’ is used to test
the accuracy of the learning method. For example, if
the database had 150 entries, then 2 entries could
form the test set while the remaining 148 entries are
used for learning. This
could also be specified as a percentage. In the
practical implementation of the melanoma detection
tool using this method, the entire dataset becomes
the learning set. A new image submitted by the
patient is the test set for which the algorithm would
provide the probability of it being cancerous or
benign.
We ran the tests with multiple runs for each
combination of the two feature extraction tools
(Hosei and Skinseg) and the four machine learning
methods (Majority Learning, Naïve Bayes, Decision
Tree, and k-Nearest Neighbour). The number of
entries in the ‘test set’ ranged from 1 to 10, and then
a final run was made in which the ‘test set’ was 10%
of the entries in the set (15 entries for a 150-entry
database). Then, for each combination, we computed
the average accuracy over these 11 runs.
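A hedged sketch of this leave-n-out protocol, again in the style of the Orange 2.x API (the function names are our assumption of that API; the learner and file name are illustrative):

```python
# Leave-n-out evaluation: hold out n random examples, train on the
# rest, and average accuracy over runs with n = 1..10 and n = 10%.
import random
import orange, orngTest, orngStat

def leave_n_out(learners, data, n):
    idx = list(range(len(data)))
    random.shuffle(idx)
    test = orange.ExampleTable(data.domain, [data[i] for i in idx[:n]])
    train = orange.ExampleTable(data.domain, [data[i] for i in idx[n:]])
    results = orngTest.learnAndTestOnTestData(learners, train, test)
    return orngStat.CA(results)

data = orange.ExampleTable("hoseidb.tab")
learners = [orange.BayesLearner()]
sizes = list(range(1, 11)) + [len(data) // 10]
accuracies = [leave_n_out(learners, data, n)[0] for n in sizes]
print(sum(accuracies) / len(accuracies))
```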
For the Majority Learning method used on the
Skinseg and Hosei feature databases, the percentage
accuracy was best when 2 entries were held out in
the ‘test set’. We noted that the method performed
the same on either of the two databases. With the
Naïve Bayes learning method, the average accuracy
was low for the Skinseg database but quite good for
the Hosei database. It was interesting to note that,
with two entries in the test set, the accuracy was
good for this method as well. The Decision Tree
learning method did not perform well on either
database compared with the other methods. The
k-Nearest Neighbour learning method showed better
results for the Hosei database.
Fig. 6 summarizes the results of comparing the
different learning methods over the two databases
(Hosei and Skinseg). Clearly, the Naïve Bayes
learning method used with the Hosei database
produced the best results.
6 CONCLUSIONS
Overall, we found that the Hosei tool for feature
extraction and the Bayes machine learning method
was the most effective combination for this
application. The Hosei tool gave better results than
Skinseg. This result was expected, as errors in the
segmentation of moles were a factor in the feature
extraction with Skinseg but not in the extraction
using the Hosei tool.
Figure 6: Results of different Machine Learning methods
using each of the two databases.
For the Hosei database, most of the machine
learning methods gave close results, but Bayes
performed slightly better. Overall, we had expected
the accuracy to be more than what we found. The
average accuracy for the best method (Bayes/Hosei)
was about 75%. It should be noted that our learning
database was not very large, and accuracy improves
with a larger database. In order to improve the
results, we plan to repeat the experiment with a
different data set of additional images. Furthermore,
these results are good given that the quality of the
input data was low, because none of the pictures
were taken with a standardized camera or lighting.
In our experiment, we eliminated some variables
from the machine learning database, such as the
number of pixels, the perimeter and area of the
mole, and the file name. We eliminated the pixel
count, perimeter, and area because the pictures were
scaled differently, so including these variables
would have skewed the results. Along the same
lines, one limitation of our project was that we could
not use the size of the mole, because every picture
was taken at a different scale. In the future, we can
either set a standard size
or use a field-size reference, such as a quarter, to
establish relative sizes. Also, we did not use the
‘evolving’ feature of moles because we did not have
images of evolving moles available.
In discussing our results with the dermatologists,
we received positive feedback on the method, along
with some suggested improvements for real-world
use (Stevens09) (Koppula09). We discussed the
omission of mole size from our experiment, and Dr.
Koppula felt that not using the size itself is not a big
limitation, since size is often very misleading by
itself; it has to be compared to other moles on the
skin. The evolution of a mole, its changing size over
time, is important, and it would be valuable to
capture this in the machine learning algorithm
(Koppula09).
We are now using the information that we
gathered from this experiment to create a website
that will allow users at home to upload an image of
a mole and receive the probability that the mole is
cancerous. The only limitation so far is that Skinseg,
i.e. the feature extraction step, is not automated. We
continue to work on this project and to automate
this step, either by writing code with OpenCV or by
finding the source code for Skinseg. We are also
planning to create a smartphone application. This
application would allow a user to take an image of a
mole with the phone camera, upload it to the
website, and get an immediate result. If the resulting
probability is very high, the application could even
call the doctor. This would also allow us to collect
more images: if the user confirms the prognosis, we
can add the image to our learning database, which
would then improve the accuracy of the results.
With a larger database, we plan on using a parallel
computing architecture, such as CUDA, for faster
computation on the backend server. This would be
beneficial in providing a real-time response to the
user.
ACKNOWLEDGEMENTS
Dr. A. Goshtasby, Wright State University, on the
Skinseg image processing tool
Dr. Scott E. Umbaugh, Southern Illinois University,
Edwardsville, on CVIP tools
Ms. Iris Cheng, University of California Berkeley
for sharing her research on image processing
Dr. B.J. Shrestha, Missouri University of Science
and Technology
Dr. Kristin Stevens, MD, Dermatologist, Providence
Medical Group, Portland, OR
Dr. Sandhya Koppula, MD, Dermatologist, Cornell
Dermatology Clinic, Beaverton, OR
REFERENCES
Petrou, Maria and Panagiota Bosdogianni. Image
Processing: The Fundamentals. New York: John Wiley
& Sons, Ltd, 1999.
CVIPtools. Southern Illinois University. 7 November
2006.
DeLaCruz, Jomer and Dinesh Mital. “Classification of
Malignant Melanoma and Dysplastic Nevi Using
Image Analysis: A Visual Texture Approach.”
University of Medicine and Dentistry of New Jersey,
March 2009.
“DermWeb: Dull Razor.” UBC Dermatology Department.
21 March 2007. <http://www.dermweb.com/dull_razor/>.
Hosei on-line tool. Hosei University. Nov 2009.
<https://b0112-web.k.hosei.ac.jp/DermoPerl/>.
Koppula, Sandhya. MD. Personal Interview.
December 2009.
Stevens, Kristen. MD. Personal Interview.
December 2009.
Russ, John C. The Image Processing Handbook, Second
Edition. Boca Raton: CRC, 1995.
Orange, Machine Learning tool. Artificial Intelligence
Laboratory, University of Ljubljana. 7 Nov 2009.
<http://www.ailab.si/orange/>.
Skinseg. Wright State University. 27 Oct 1998.
<http://www.cs.wright.edu/~agoshtas/skinseg.html>.
Stanganelli, Ignazio. “Dermoscopy.” Center for Cancer
Prevention, Italy. 27 May 2008.
<http://emedicine.medscape.com/article/1130783-overview>.
“Understanding Melanoma.” The Skin Cancer Foundation.
New York, New York. 13 December 2009.
<http://www.skincancer.org/Melanoma/>.