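While push_back is as efficient as it can be (read up on "amortized complexity"), it is still relatively slow: whenever the vector outgrows its capacity, it has to allocate a larger buffer and move all existing elements into it. If you know an upper bound on the final size, you can call reserve with that bound, which avoids the repeated reallocations and speeds up push_back considerably.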
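
To make the effect of reserve visible, here is a minimal sketch that counts how often the vector's buffer gets reallocated while appending n elements, with and without reserving first. The function name count_reallocations is hypothetical, and a reallocation is detected simply as a change in capacity(); the exact count without reserve depends on your standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n elements and counts how many times
// the vector's capacity changes, i.e. how often it had to reallocate.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<double> v;
    if (reserve_first) {
        v.reserve(n);  // single allocation up front, using the known upper bound
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != last_capacity) {  // buffer was replaced
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve, count_reallocations(n, true) returns 0, because push_back never reallocates while size() stays within the reserved capacity. Without it, you get one reallocation per growth step (roughly logarithmic in n for the usual geometric growth), each of which moves every element stored so far.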

This does not entirely explain why the parallel version would be slower than the sequential one, though. Could it be that the parallel version overflows your memory and you're swapping to disk?

But really, I would try to reformulate the algorithm. Do you actually need the vector, or is there a way to do whatever computation you later perform on it without storing it first?
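
As an illustration of that reformulation, here is a minimal sketch assuming the stored values are only ever reduced to a single number (a sum here). compute_value is a hypothetical stand-in for whatever the original loop produces; the streaming version folds each value into the result as soon as it is computed, so no vector is allocated at all.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical per-element computation standing in for the real one.
double compute_value(std::size_t i) {
    return static_cast<double>(i) * 0.5;
}

// Store-then-reduce: builds the whole vector first, then sums it.
double sum_stored(std::size_t n) {
    std::vector<double> values;
    values.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(compute_value(i));
    }
    return std::accumulate(values.begin(), values.end(), 0.0);
}

// Streaming: accumulates each value immediately; no storage needed.
double sum_streaming(std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += compute_value(i);
    }
    return total;
}
```

Both functions add the same values in the same order, so they return the same result; the streaming one simply never pays for the allocation, the push_back calls, or the memory footprint. Whether this applies depends on what you actually do with the vector afterwards.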

Posted by: Victor Eijkhout

79345909