
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners.
Chiang, W.-L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., Zheng, L., Zhuang, S., Zhuang, Y., Gonzalez, J. E., Stoica, I., and Xing, E. P. (2023). Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality.
Guo, M., Ainslie, J., Uthus, D., Ontañón, S., Ni, J., Sung, Y.-H., and Yang, Y. (2021). LongT5: Efficient text-to-text transformer for long sequences. arXiv preprint arXiv:2112.07916.
Johnson, J., Douze, M., and Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547.
Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.-t. (2020). Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906.
Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola,
P., Maschinot, A., Liu, C., and Krishnan, D. (2021).
Supervised contrastive learning.
Leite, B. and Lopes, H. (2023). Towards enriched controllability for educational question generation.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81. Association for Computational Linguistics.
Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., and Zhu, C. (2023). G-Eval: NLG evaluation using GPT-4 with better human alignment.
Ainslie, J., Ontañón, S., Alberti, C., Cvicek, V., Fisher, Z., Pham, P., Ravula, A., Sanghai, S., Wang, Q., and Yang, L. (2020). ETC: Encoding long and structured inputs in transformers.
MosaicML (2023). Introducing MPT-7B: A new standard for open-source, commercially usable LLMs. Accessed: 2023-05-16.
Nicolescu, L. and Tudorache, M. (2022). Human-computer interaction in customer service: The experience with AI chatbots—a systematic literature review. Electronics, 11:1579.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright,
C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K.,
Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller,
L., Simens, M., Askell, A., Welinder, P., Christiano,
P., Leike, J., and Lowe, R. (2022). Training language
models to follow instructions with human feedback.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Phang, J., Zhao, Y., and Liu, P. (2023). Investigating efficiently extending transformers for long input summarization. In Bouamor, H., Pino, J., and Bali, K., editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3946–3961, Singapore. Association for Computational Linguistics.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S.,
Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2019).
Exploring the limits of transfer learning with a unified
text-to-text transformer. CoRR, abs/1910.10683.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
Reiter, E. and Belz, A. (2009). An investigation into the
validity of some metrics for automatically evaluating
natural language generation systems. Computational
Linguistics, 35(4):529–558.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms.
Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., and Hashimoto, T. B. (2023). Stanford Alpaca: An instruction-following LLaMA model. GitHub repository.
Tay, Y., Dehghani, M., Bahri, D., and Metzler, D. (2022).
Efficient transformers: A survey.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., and Lample, G. (2023). LLaMA: Open and efficient foundation language models.
Ushio, A., Alva-Manchego, F., and Camacho-Collados, J. (2023). A practical toolkit for multilingual question and answer generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Toronto, Canada. Association for Computational Linguistics.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017).
Attention is all you need. Advances in neural informa-
tion processing systems, 30:5998–6008.
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester,
B., Du, N., Dai, A. M., and Le, Q. V. (2022). Finetuned
language models are zero-shot learners.
Xiong, W., Gupta, A., Toshniwal, S., Mehdad, Y., and Yih, W.-t. (2022). Adapting pretrained text-to-text models for long text sequences.
Zaheer, M., Guruganesh, G., Dubey, A., Ainslie, J., Alberti, C., Ontañón, S., Pham, P., Ravula, A., Wang, Q., Yang, L., et al. (2020). Big Bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33.
Zaib, M., Zhang, W. E., Sheng, Q. Z., Mahmood, A., and
Zhang, Y. (2021). Conversational question answering:
A survey.
Zhang, J., Zhao, Y., Saleh, M., and Liu, P. J. (2020). PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., and Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.