79743191

Date: 2025-08-22 09:11:33
Score: 3

Does torchdata's ParquetDataFrameLoader help? https://docs.pytorch.org/data/0.7/generated/torchdata.datapipes.iter.ParquetDataFrameLoader.html

PyTorch used to have a torchtext library, but it has been deprecated for over a year. You can check it here: https://docs.pytorch.org/text/stable/index.html

Otherwise, your best bet is to subclass one of the base dataset classes (Dataset or IterableDataset) defined in https://github.com/pytorch/pytorch/blob/main/torch/utils/data/dataset.py

Here is an example of someone attempting just that: https://discuss.pytorch.org/t/efficient-tabular-data-loading-from-parquet-files-in-gcs/160322

Reasons:
  • Probably link only (1):
  • Low length (0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: ZeSeb