bucketing compares favorably against these
alternatives, demonstrating the framework algorithm's
ability to handle repetitive operations and process
complex nested data structures. A limitation of this
study is that the performance of index bucketing was
not compared on larger datasets. Further evaluations
on other datasets and implementations would yield
more insight. Future work will, in part, explore the
implications of index bucketing for handling
repetitive operations and processing complex nested
data structures.