WoS dataset was considered by invoking Llama2-7B-chat through a specifically designed algorithm based on zero-shot prompting.
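As a purely illustrative sketch (not the exact prompt, model configuration, or sampling parameters of our algorithm, which are all assumptions here), the following Python snippet shows how a synthetic training document for a WoS leaf label could be obtained by zero-shot prompting Llama2-7B-chat through the Hugging Face transformers library.

    # Illustrative sketch only: prompt template, model identifier, and
    # sampling parameters are assumptions, not the paper's exact algorithm.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-2-7b-chat-hf",  # gated model, requires HF access
        device_map="auto",
    )

    def generate_synthetic_doc(parent_label: str, leaf_label: str) -> str:
        # Zero-shot prompt: ask the chat model for a document belonging to a
        # leaf node of the taxonomy, conditioned on its parent node.
        prompt = (
            f"[INST] Write a short scientific abstract about '{leaf_label}', "
            f"a subtopic of '{parent_label}'. [/INST]"
        )
        out = generator(prompt, max_new_tokens=200, do_sample=True,
                        temperature=0.9, return_full_text=False)
        return out[0]["generated_text"].strip()

    # Example: one synthetic training document for a WoS (parent, leaf) pair.
    print(generate_synthetic_doc("Computer Science", "Machine Learning"))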
Regarding the HTC task, we employed the Hierarchy-Guided Contrastive Learning model, which represents the current state of the art in supervised HTC. Experiments in which this model was trained on a synthetic dataset and tested on real data extracted from WoS showed that our approach, HTC-GEN, outperforms the current zero-shot state-of-the-art approach.
As future work, we plan to test our approach on more datasets, with larger models than Llama2-7B-chat, and to integrate other state-of-the-art techniques (Chung et al., 2023) in order to further refine the quality of the synthetic dataset.
DISCLAIMER
The content of this article reflects only the author’s
view. The European Commission and MUR are not
responsible for any use that may be made of the infor-
mation it contains.
ACKNOWLEDGEMENTS
This work was supported by FOSSR (Fostering Open
Science in Social Science Research), funded by the
European Union - NextGenerationEU under NRRP
Grant agreement n. MUR IR0000008.
REFERENCES
Bhambhoria, R., Chen, L., and Zhu, X. (2023). A simple and effective framework for strict zero-shot hierarchical classification.
Bongiovanni, L., Bruno, L., Dominici, F., and Rizzo, G. (2023). Zero-shot taxonomy mapping for document classification. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pages 911–918.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR.
Chung, J., Kamar, E., and Amershi, S. (2023). Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.
Cox, S. R., Wang, Y., Abdul, A., von der Weth, C., and Lim, B. Y. (2021). Directed diversity: Leveraging language embedding distances for collective creativity in crowd ideation. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI '21. ACM.
Haj-Yahia, Z., Sieg, A., and Deleris, L. A. (2019). Towards unsupervised text classification leveraging experts and word embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 371–379.
Jeronymo, V., Bonifacio, L., Abonizio, H., Fadaee, M., Lotufo, R., Zavrel, J., and Nogueira, R. (2023). InPars-v2: Large language models as efficient dataset generators for information retrieval. arXiv preprint arXiv:2301.01820.
Ko, Y. and Seo, J. (2000). Automatic text categorization by unsupervised learning. In COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics.
Kowsari, K., Brown, D. E., Heidarysafa, M., Meimandi, K. J., Gerber, M. S., and Barnes, L. E. (2017). HDLTex: Hierarchical deep learning for text classification. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 364–371. IEEE.
Liu, R., Liang, W., Luo, W., Song, Y., Zhang, H., Xu, R., Li, Y., and Liu, M. (2023). Recent advances in hierarchical multi-label text classification: A survey. arXiv preprint arXiv:2307.16265.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
Stammbach, D. and Ash, E. (2021). DocSCAN: Unsupervised text classification via learning from neighbors. arXiv preprint arXiv:2105.04024.
Wang, Z., Wang, P., Huang, L., Sun, X., and Wang, H. (2022). Incorporating hierarchy into text encoder: A contrastive learning approach for hierarchical text classification.
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., and Le, Q. V. (2021). Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652.
Zangari, A., Marcuzzo, M., Schiavinato, M., Rizzo, M., Gasparetto, A., Albarelli, A., et al. (2023). Hierarchical text classification: A review of current research. Expert Systems with Applications, 224.
Zhang, Y., Yang, R., Xu, X., Xiao, J., Shen, J., and Han, J. (2024). TELEClass: Taxonomy enrichment and LLM-enhanced hierarchical text classification with minimal supervision. arXiv preprint arXiv:2403.00165.