tion, while still being able to access the powerful AI
features of Smart Surveys.
5 CONCLUSION AND FUTURE RESEARCH
In this position paper, I proposed the Smart Surveys architecture, a pipeline for the automated generation, distribution, and analysis of survey data. The architecture is built in the Python programming language and leverages a large language model together with multiple modules for text processing, question generation, recommendations for improving the survey questionnaire, custom prompt-based insights, and a full report that automatically aggregates the generated insights. By applying Smart Surveys to two selected use cases, I have demonstrated how it can empower undergraduate students and researchers to conduct accurate research with minimal technical requirements, and how it can provide course instructors and academic departments with actionable, AI-assisted insights.
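The sequence of modules described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the Smart Surveys implementation: the `LLM` callable type, the `fake_llm` stub, the prompts, and every function name are hypothetical, and the distribution stage is omitted. Passing the model in as a plain function keeps each stage testable offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical LLM interface: any function from a prompt string to a
# completion string. A real deployment would call a hosted model here.
LLM = Callable[[str], str]


@dataclass
class SurveyReport:
    questions: List[str]
    recommendations: str
    insights: List[str] = field(default_factory=list)


def preprocess(text: str) -> str:
    """Text processing: collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def generate_questions(llm: LLM, topic: str, n: int = 3) -> List[str]:
    """Question generation: request n questions, one per line."""
    raw = llm(f"Write {n} survey questions about {topic}, one per line.")
    return [line.strip() for line in raw.splitlines() if line.strip()][:n]


def recommend(llm: LLM, questions: List[str]) -> str:
    """Recommendations for improving the questionnaire."""
    return llm("Suggest improvements to this questionnaire:\n" + "\n".join(questions))


def custom_insights(llm: LLM, responses: List[str], prompts: List[str]) -> List[str]:
    """Custom prompt-based insights: run each user prompt over the cleaned responses."""
    data = "\n".join(preprocess(r) for r in responses)
    return [llm(f"{p}\n\nResponses:\n{data}") for p in prompts]


def run_pipeline(llm: LLM, topic: str, responses: List[str],
                 prompts: List[str]) -> SurveyReport:
    questions = generate_questions(llm, topic)
    return SurveyReport(
        questions=questions,
        recommendations=recommend(llm, questions),
        insights=custom_insights(llm, responses, prompts),
    )


def render(report: SurveyReport) -> str:
    """Full report: aggregate every generated section into one document."""
    lines = ["Questions:", *report.questions,
             "Recommendations:", report.recommendations,
             "Insights:", *report.insights]
    return "\n".join(lines)


# Offline stub standing in for the language model (hypothetical).
def fake_llm(prompt: str) -> str:
    return "Q1?\nQ2?\nQ3?" if prompt.startswith("Write") else "stub answer"


report = run_pipeline(fake_llm, "course feedback",
                      ["Great   course ", "too fast"],
                      ["Summarize the main themes."])
print(render(report))
```

Swapping `fake_llm` for a function that calls an actual model would leave the stage functions unchanged.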
One of the main benefits of the Smart Surveys architecture is its scalability. Because it is built with Python and open-source libraries, it is easy to adapt to different industries and research goals, and it can be scaled to handle a large number of survey responses. However, the quality of the generated survey questions and insights depends on the quality of the data and of the information provided by the user. It will therefore be important to incorporate educational material that teaches users best practices for communicating with LLMs.
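One common way to let an LLM-based pipeline handle many responses is to split them into prompt-sized batches and combine partial results (a map-reduce pattern). The paper does not specify how Smart Surveys scales, so the following is only a sketch under assumptions: `batch_responses`, `summarize_all`, the prompts, and the 4000-character budget are hypothetical, and character count is used as a rough proxy for the model's token limit.

```python
from typing import Callable, Iterator, List


def batch_responses(responses: List[str], max_chars: int = 4000) -> Iterator[List[str]]:
    """Group consecutive responses so each group's total length (ignoring
    separators) stays within max_chars. A single response longer than
    max_chars forms its own group rather than being cut mid-text."""
    batch: List[str] = []
    size = 0
    for text in responses:
        if batch and size + len(text) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        yield batch


def summarize_all(llm: Callable[[str], str], responses: List[str],
                  max_chars: int = 4000) -> str:
    """Map-reduce summary: summarize each batch, then combine the partial summaries."""
    partials = [llm("Summarize these responses:\n" + "\n".join(b))
                for b in batch_responses(responses, max_chars)]
    return llm("Combine these summaries:\n" + "\n".join(partials))
```

For very large surveys the combining step could itself exceed the budget; applying `batch_responses` to the partial summaries and repeating would address that.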
Looking forward, several areas of future research could further enhance the capabilities of the Smart Surveys architecture. One avenue is to extend the baseline features of the pipeline with more robust survey design methods and sampling techniques, which would strengthen the AI's suggestions and capabilities. Another important area is the ethical implications of using AI in survey research, such as privacy, bias, and explainability. As AI becomes more common in survey research, it is crucial to ensure that the collected data are protected and that the AI systems used do not perpetuate bias and discrimination. It is also important to make AI systems more transparent and explainable, so that users can better trust and understand the results they generate. This could be achieved by incorporating techniques such as counterfactual analysis, which can help identify and understand the factors that influence the AI's predictions and decisions, and by providing more detailed explanations of the reasoning behind the AI's insights and recommendations. I hope this research encourages researchers to explore other ways in which AI can help enhance survey data, a valuable resource for all the social sciences.
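A simple form of counterfactual analysis is a minimal-pair test: change one term in an input, hold everything else fixed, and measure how the model's output moves. The sketch below shows that idea only; it is not the CausaLM method or the approach Smart Surveys would necessarily use. `toy_score` and its `LEXICON` are hypothetical stand-ins for an AI component, and `counterfactual_effect` is an illustrative helper.

```python
import re
from typing import Callable

# Toy stand-in for an AI scoring component (hypothetical): sums word weights.
LEXICON = {"great": 1.0, "helpful": 1.0, "boring": -1.0, "confusing": -1.0}


def toy_score(text: str) -> float:
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(w, 0.0) for w in words)


def counterfactual_effect(model: Callable[[str], float], text: str,
                          original: str, replacement: str) -> float:
    """Swap every whole-word occurrence of `original` for `replacement`,
    holding the rest of the input fixed, and return the change in the
    model's output (counterfactual minus factual)."""
    pattern = rf"\b{re.escape(original)}\b"
    if not re.search(pattern, text):
        raise ValueError(f"{original!r} does not occur in the input")
    counterfactual = re.sub(pattern, lambda _m: replacement, text)
    return model(counterfactual) - model(text)


print(counterfactual_effect(toy_score, "The labs were confusing.", "confusing", "helpful"))  # 2.0
print(counterfactual_effect(toy_score, "She said the course was great.", "She", "He"))      # 0.0
```

The first probe shows a factor the model relies on; the second is a bias check, where a nonzero effect from swapping a demographic term would flag unwanted sensitivity. Whole-word matching keeps a swap such as "he" from altering words like "the".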
CSEDU 2023 - 15th International Conference on Computer Supported Education