ses which are the same as infrequent tags on Data.gov.
Our future work is to recommend tags that are
infrequent in the training data. In our current experiments,
we eliminated in advance tags that appear fewer than
twenty times in the data sets. Nevertheless,
the accuracy for tags that are infrequent in the training data was
low. Infrequent tags tend to express the concrete content
of OGD. Therefore, infrequent tags are important
for understanding OGD without actually reading the data.
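The frequency cut-off described above can be illustrated with a minimal sketch. It assumes each data set is represented as a list of tag strings and that a tag is counted once per data set; the function name `filter_infrequent_tags` and the parameter `min_count` are hypothetical and are not taken from our implementation.

```python
from collections import Counter


def filter_infrequent_tags(tag_lists, min_count=20):
    """Remove tags that appear in fewer than min_count data sets.

    tag_lists: one list of tag strings per data set (assumed layout).
    min_count: hypothetical name for the cut-off; 20 mirrors the
               threshold stated in the text.
    """
    # Count each tag once per data set, even if it is listed twice there.
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    kept = {tag for tag, count in counts.items() if count >= min_count}
    # Preserve the original tag order within each data set.
    return [[tag for tag in tags if tag in kept] for tags in tag_lists]


# Toy example with a lowered threshold so the effect is visible.
toy = [
    ["health", "budget"],
    ["health", "transit"],
    ["health", "budget", "census"],
]
filtered = filter_infrequent_tags(toy, min_count=2)
# filtered == [["health", "budget"], ["health"], ["health", "budget"]]
```

A data set whose tags are all below the threshold ends up with an empty tag list; whether such data sets should be dropped before training is a separate design choice that this sketch leaves open.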
Future work also includes the development of a Web
system that recommends tags when users input
OGD information. The system will display candidate
tags output by multi-label classification and ones extracted
from various viewpoints, including our particular
noun phrase extraction.
ACKNOWLEDGEMENTS
This work was partially supported by JSPS KAKENHI
Grant Number 15K00426.