Related Work

This framework is heavily inspired by other open-source projects:

- Hugging Face Datasets
- TensorFlow Datasets
- NVIDIA's NeMo-Curator