Just wanted to pop by to say that I did eventually come up with a solution, in case anyone stumbles upon this thread with the same issue. I'm not sure whether the non-pickleability applies to all TF datasets, but it applied to this one, so that is what needed to be addressed (or worked around).

What I did was put the TFRecord files on distributed storage (a UC Volume in this case) and then instantiate the TF dataset object inside the objective function. I imagine that could add some overhead in some cases, but even with an image dataset of about 8k images it took well under a second, so it was fine. That also tends to be the approach for any other object that won't pickle (which did come up once the dataset was sorted out): just build it inside the objective function. That can be the dataset built from files on distributed storage, the model itself, or really anything else.
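To make the pattern concrete, here is a minimal sketch that uses only the standard library, so it runs without TensorFlow or a Spark cluster. Everything in it is a stand-in I made up for illustration: `FakeDataset` plays the role of the object that won't pickle, the temp file plays the role of the files on distributed storage, and the `params` keys (`data_path`, `offset`) are hypothetical. In the real case the line that builds `FakeDataset` would be something like `tf.data.TFRecordDataset(...)` pointed at the files on the volume.

```python
import pickle
import tempfile
import threading
from pathlib import Path


class FakeDataset:
    """Hypothetical stand-in for an object that will not pickle."""

    def __init__(self, path):
        self._lock = threading.Lock()  # holding a lock makes the instance unpicklable
        self.values = [float(line) for line in Path(path).read_text().split()]


def objective(params):
    """Receives only plain, picklable values (a path string and a float)."""
    # Build the unpicklable object here, on the worker, from shared storage.
    # With TensorFlow this would be something like
    # tf.data.TFRecordDataset(params["data_path"]).
    dataset = FakeDataset(params["data_path"])
    # Toy loss: mean squared distance between the data and a tunable offset.
    return sum((v - params["offset"]) ** 2 for v in dataset.values) / len(dataset.values)


# A tiny data file standing in for files on distributed storage.
data_path = Path(tempfile.mkdtemp()) / "data.txt"
data_path.write_text("1.0\n2.0\n3.0\n")

params = {"data_path": str(data_path), "offset": 2.0}

# The arguments survive a pickle round trip, so they can be shipped to workers.
restored = pickle.loads(pickle.dumps(params))
loss = objective(restored)

# The dataset object itself does not survive pickling.
try:
    pickle.dumps(FakeDataset(str(data_path)))
    dataset_pickles = True
except (TypeError, pickle.PicklingError):
    dataset_pickles = False
```

The point is that only the path crosses the process boundary; the heavy or unpicklable object is created fresh inside each call. The cost is rebuilding it on every trial, which in my case was negligible.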

This might be a totally basic "duh" answer to folks in the know, but it was my first time trying to actually leverage the power of a Spark cluster, so I was definitely in over my head and could have used the insight. Maybe someone else is in the same boat and will benefit from this answer as well. Cheers!
