cancer in the US (Sierra-Torres 2003). Specifically, there is a positive relationship between smoking and the diagnosis of cervical disease. This difference is probably due to the differing circumstances of the two countries: Venezuela is a low-income developing country, whereas the United States is a developed country. Not every woman in Venezuela may be able to afford smoking, owing to financial constraints and the high excise tax on cigarettes.
By comparison, people in the US may have easier access to cigarettes regardless of their income. Moreover, Venezuela implements more extensive and stricter smoking bans and enforces more advertising bans than the US, which may result in a lower smoking rate among women (Venezuela 2019, United States Tobacco Atlas 2021). These differences may therefore explain the negative correlation in Venezuela and the positive correlation in the US. Further evidence and more comprehensive research are needed to test this inference.
Unlike previous studies that considered the effects of risk factors on CIN or cervical cancer separately, we focused on generalized cervical diseases, including both CIN and cervical cancer. Combining CIN and cervical cancer might contribute to the early control and prevention of generalized cervical diseases. We also compared models fitted to the data before and after balancing (oversampling), and to data with different treatments of missing values.
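The data-preparation variants compared above can be illustrated with a minimal sketch. It is not the authors' code: it assumes a hypothetical record layout (one dict per patient, `None` for an unreported value, a 0/1 `diagnosis` label), uses two common missing-value strategies (dropping incomplete records versus mean imputation) as stand-ins for whatever manipulations the study applied, and balances classes by simple random oversampling of the minority class. All feature names and values are made up.

```python
import random
from collections import Counter

# Hypothetical record layout: one dict per patient, feature name -> value,
# None marks a value the patient did not report, "diagnosis" is the 0/1 label.


def drop_incomplete(rows, features):
    """Strategy A: keep only records with no missing feature values."""
    return [r for r in rows if all(r[f] is not None for f in features)]


def impute_mean(rows, features):
    """Strategy B: fill each missing value with that feature's observed mean."""
    means = {}
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        means[f] = sum(observed) / len(observed) if observed else 0.0
    return [{**r, **{f: means[f] for f in features if r[f] is None}} for r in rows]


def random_oversample(rows, label="diagnosis", seed=0):
    """Duplicate random records of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(r[label] for r in rows)
    target = max(counts.values())
    balanced = list(rows)
    for cls, n in counts.items():
        pool = [r for r in rows if r[label] == cls]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced


# Tiny synthetic example (made-up values, not study data).
features = ["age", "partners"]
data = [
    {"age": 30, "partners": 2, "diagnosis": 0},
    {"age": 45, "partners": None, "diagnosis": 0},
    {"age": 28, "partners": 4, "diagnosis": 0},
    {"age": None, "partners": 1, "diagnosis": 0},
    {"age": 52, "partners": 3, "diagnosis": 1},
]

for name, prepared in [("drop", drop_incomplete(data, features)),
                       ("mean", impute_mean(data, features))]:
    balanced = random_oversample(prepared)
    print(name, len(prepared), "->", len(balanced), Counter(r["diagnosis"] for r in balanced))
```

Each prepared variant (before or after oversampling) would then be passed to the same model so the fits can be compared. In practice, oversampling is usually applied only to the training split, so that duplicated records do not leak into the evaluation data.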
Nevertheless, our study has some limitations. Owing to the limits of our dataset, we considered only the diagnosis of CIN and the diagnosis of cervical cancer. With access to data covering other cervical diseases, such as cervical polyps and cervical cysts, the models could be further improved and optimized. Moreover, since our dataset was collected in Venezuela, caution is needed when generalizing the results and conclusions to other regions. Venezuela is a low-income country, so the data may represent only the conditions in low-income countries rather than those in other developed or developing countries. In addition, because some women did not share complete information during data collection owing to privacy concerns, biases were introduced into the analyses.
Lastly, risk factors were screened in our study using logistic regression; the results could subsequently be confirmed using random forest.
5 CONCLUSIONS
‘Diagnosis of HPV infection’, ‘IUD’, ‘Number of sexual partners’ and ‘Age’ are risk factors for cervical cancer in Venezuela. The logistic regression models in our study can estimate patients’ risks of cervical diseases and can be used as a tool for prevention. In future work, we will employ random forest to analyse the statistical correlation between cervical diseases and all the independent variables discussed in this paper, and compare the two statistical methods.
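How a fitted logistic regression model turns risk factors into an individual risk estimate can be sketched as follows. This is an illustration only, not the study's model: it fits the model with plain batch gradient descent in standard-library Python (a swapped-in technique; statistical packages typically use Newton-type solvers), and the two binary features, loosely named after ‘Diagnosis of HPV infection’ and ‘IUD’, together with all data values, are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(b0 + b.x) by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(X[0]) + 1)  # weights[0] is the intercept b0
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], xi)))
            err = p - yi  # derivative of the log-loss with respect to the linear predictor
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def predict_risk(weights, x):
    """Estimated probability of disease for one patient's feature vector x."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def odds_ratios(weights):
    """exp(coefficient): multiplicative change in the odds per one-unit increase."""
    return [math.exp(w) for w in weights[1:]]


# Hypothetical binary features [hpv_diagnosis, iud]; synthetic labels, not study data.
X = [[0, 0], [0, 1], [0, 0], [0, 0], [0, 1],
     [1, 0], [1, 1], [1, 0], [1, 1], [1, 0]]
y = [0, 1, 0, 0, 0, 1, 1, 0, 1, 1]
weights = fit_logistic(X, y)
print("odds ratios (HPV, IUD):", odds_ratios(weights))
print("risk, HPV+ IUD-:", predict_risk(weights, [1, 0]))
print("risk, HPV- IUD-:", predict_risk(weights, [0, 0]))
```

An odds ratio above 1 marks a feature associated with higher risk, and `predict_risk` gives the per-patient probability that a screening tool would threshold or report.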
REFERENCES
Bardach, A. E., Garay, O. U., Calderón, M., Pichón-Riviére, A., Augustovski, F., Martí, S. G., Cortiñas, P., Gonzalez, M., Naranjo, L. T., Gomez, J. A., & Caporale, J. E. (2017). Health economic evaluation of human papillomavirus vaccines in women from Venezuela by a lifetime Markov cohort model. BMC Public Health, 17, 152.
Boateng, E. Y., & Abaye, D. A. (2019). A review of the logistic regression model with emphasis on medical research. Journal of Data Analysis and Information Processing, 7, 190–207.
Cancer.Net. (2021). Cervical cancer. Retrieved from https://www.cancer.net/cancer-types/cervical-cancer/statistic.
Correnti, M., Medina, F., Cavazza, M. E., Rennola, A., Ávila, M., & Fernándes, A. (2011). Human papillomavirus (HPV) type distribution in cervical carcinoma, low-grade, and high-grade squamous intraepithelial lesions in Venezuelan women. Gynecologic Oncology, 121, 527–531.
Denny, L. (2012). Cervical cancer: Prevention and treatment. Retrieved from https://www.discoverymedicine.com/Lynette-Denny/2012/08/27/cervical-cancer-prevention-and-treatment/.
Drolet, M., Bénard, É., Pérez, N., Brisson, M., Ali, H., Boily, M.-C., Baldo, V., Brassard, P., Brotherton, J. M., Callander, D., Checchi, M., Chow, E. P., Cocchio, S., Dalianis, T., Deeks, S. L., Dehlendorff, C., Donovan, B., Fairley, C. K., Flagg, E. W., … Yu, B. N. (2019). Population-level impact and herd effects following the introduction of human papillomavirus vaccination programmes: Updated systematic review and meta-analysis. The Lancet, 394, 497–509.
Fernandes, K., Cardoso, J. S., & Fernandes, J. (2017). Transfer learning with partial observability applied to cervical cancer screening. Pattern Recognition and Image Analysis, 10255, 243–250.
Gershenson, D. M., McGuire, W. P., Gore, M., Quinn, M. A., & Thomas, G. (2004). Gynecologic cancer: Controversies in management. Philadelphia: Elsevier Ltd.