Does this help? https://docs.pytorch.org/data/0.7/generated/torchdata.datapipes.iter.ParquetDataFrameLoader.html
PyTorch used to have a torchtext library, but it has been deprecated for over a year. You can check it here: https://docs.pytorch.org/text/stable/index.html
Otherwise, your best bet is to subclass one of the base dataset classes: https://github.com/pytorch/pytorch/blob/main/torch/utils/data/dataset.py
Here is an example of someone attempting just that: https://discuss.pytorch.org/t/efficient-tabular-data-loading-from-parquet-files-in-gcs/160322