It has been almost 2 years, but I think I found a solution, although I do not completely understand why it works. It seems that in some cases display() does not evaluate the result of the previous lazy operations. So if you just call df.cache() after these lazy operations, it should work correctly. Hope this helps!
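
A minimal sketch of the workaround, assuming a Databricks notebook where display() is available and df is a Spark DataFrame. The helper name cache_before_display and its show parameter are made up for illustration, and the table and filter in the usage comment are placeholders rather than anything from the original problem.

```python
def cache_before_display(df, show=None):
    """Mark `df` for caching after its lazy operations, then optionally display it.

    `df` is assumed to be a Spark DataFrame; the only method used is `.cache()`.
    `show` is a hypothetical hook: in a Databricks notebook, pass `display`.
    """
    # DataFrame.cache() marks the DataFrame for caching and returns it.
    cached = df.cache()
    if show is not None:
        show(cached)
    return cached


# Usage in a notebook (placeholder names, not from the original question):
# df = spark.read.table("some_table").filter("value > 0")  # lazy operations
# df = cache_before_display(df, show=display)
```

Writing df = df.cache() followed by display(df) directly in the cell does the same thing; the helper only makes the order of the calls explicit.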