You're right that torchvision.datasets.ImageFolder doesn’t natively support loading images directly from S3. The 2019 limitation still stands: it expects a local file system path. However, AWS released the S3 plugin for PyTorch in 2021, which lets you read S3 data through dataset classes that work with torch.utils.data.DataLoader, much as if the files were local. Alternatively, you can mount the S3 bucket using s3fs or fsspec, copy the data to a temporary local directory, or write a custom Dataset class that streams images directly from S3 using boto3. For large datasets and training at scale, the S3 plugin is usually the cleanest and most efficient path.
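To make the custom Dataset option concrete, here is a minimal sketch rather than a definitive implementation. The class name S3ImageFolder, the bucket/prefix layout (prefix/class_name/file, mirroring ImageFolder), and the extension list are all assumptions for illustration. It only implements __len__ and __getitem__, which is what a map-style dataset needs for DataLoader; in a real project you would probably subclass torch.utils.data.Dataset. The client is expected to be a boto3 S3 client, e.g. boto3.client("s3").

```python
from typing import Any, Callable, Optional, Tuple

# Assumed set of image extensions; adjust to your data.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")


class S3ImageFolder:
    """Hypothetical map-style dataset over s3://bucket/prefix/<class>/<file>."""

    def __init__(self, client: Any, bucket: str, prefix: str = "",
                 transform: Optional[Callable[[bytes], Any]] = None):
        self.client = client          # e.g. boto3.client("s3")
        self.bucket = bucket
        self.transform = transform
        prefix = prefix.strip("/")
        base = f"{prefix}/" if prefix else ""

        # List every object under the prefix; the paginator handles
        # listings that span more than one response page.
        keys = []
        paginator = client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=base):
            for obj in page.get("Contents", []):
                rel = obj["Key"][len(base):]
                # Keep only <class>/<file> entries with an image extension.
                if rel.count("/") == 1 and rel.lower().endswith(IMG_EXTENSIONS):
                    keys.append(obj["Key"])

        # Class names come from the first path component, sorted for
        # stable label indices (the same convention ImageFolder uses).
        classes = sorted({k[len(base):].split("/")[0] for k in keys})
        self.class_to_idx = {name: i for i, name in enumerate(classes)}
        self.samples = [
            (k, self.class_to_idx[k[len(base):].split("/")[0]])
            for k in sorted(keys)
        ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        key, label = self.samples[index]
        # Download the object body as raw bytes on each access.
        data = self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()
        if self.transform is not None:
            data = self.transform(data)
        return data, label
```

A transform would typically decode the bytes first, for example with PIL's Image.open(io.BytesIO(data)).convert("RGB"), followed by your usual torchvision transforms. Each __getitem__ call is a separate S3 request, so this is simple but not fast; if you use DataLoader with several workers, you may also need to create the boto3 client inside each worker instead of sharing one.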