For the same problem I built a std::vector of Eigen::Triplet in parallel: each thread collects its triplets into a thread-local vector, and I merge those vectors after the parallel block.
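To make the setup concrete, here is a minimal sketch of that collection step. It assumes OpenMP, and it falls back to a serial run when OpenMP is not enabled. The `Triplet` struct is a hypothetical stand-in for Eigen::Triplet so the snippet compiles without Eigen, and `collect_triplets` with its 1D element loop is only an illustrative workload, not my actual code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for Eigen::Triplet<double> (assumption, not Eigen's type).
struct Triplet {
    int row;
    int col;
    double value;
};

// Illustrative workload: element e couples nodes e and e+1 and contributes
// a 2x2 block, so neighbouring elements produce duplicate (row, col) entries.
std::vector<Triplet> collect_triplets(int n) {
    int num_threads = 1;
#ifdef _OPENMP
    num_threads = omp_get_max_threads();
#endif
    std::vector<std::vector<Triplet>> per_thread(num_threads);

#pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        std::vector<Triplet> local;  // thread-local storage, no locking needed
#pragma omp for
        for (int e = 0; e < n - 1; ++e) {
            local.push_back({e, e, 1.0});
            local.push_back({e, e + 1, -1.0});
            local.push_back({e + 1, e, -1.0});
            local.push_back({e + 1, e + 1, 1.0});
        }
        per_thread[tid] = std::move(local);
    }

    // Merge outside the parallel block.
    std::size_t total = 0;
    for (const auto& v : per_thread) total += v.size();
    std::vector<Triplet> merged;
    merged.reserve(total);
    for (const auto& v : per_thread) merged.insert(merged.end(), v.begin(), v.end());
    return merged;
}
```

The merged vector is unsorted and still contains duplicates, which is what the questions below are about.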
What is the most efficient way to assemble the matrix from these triplets in parallel?
Do I have to sort the triplets by column and row, merge the duplicate entries, and then assign each column to a thread?
Is there another option to avoid sorting?
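For reference, the sort-then-merge approach I have in mind looks roughly like the serial sketch below. It sorts by (column, row), sums duplicates, and builds compressed-column arrays. `Triplet`, `Csc` and `assemble_csc` are hypothetical names used only for illustration (they are not Eigen types), and the per-column parallel step is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for Eigen::Triplet<double> (assumption, not Eigen's type).
struct Triplet {
    int row;
    int col;
    double value;
};

// Hypothetical compressed-column storage, for illustration only.
struct Csc {
    int rows;
    int cols;
    std::vector<int> col_ptr;    // size cols + 1
    std::vector<int> row_idx;    // one entry per stored nonzero
    std::vector<double> values;  // one entry per stored nonzero
};

Csc assemble_csc(std::vector<Triplet> t, int rows, int cols) {
    // Sort by column first, then by row, so duplicates become adjacent.
    std::sort(t.begin(), t.end(), [](const Triplet& a, const Triplet& b) {
        return std::tie(a.col, a.row) < std::tie(b.col, b.row);
    });

    Csc m{rows, cols, std::vector<int>(cols + 1, 0), {}, {}};
    for (std::size_t k = 0; k < t.size(); ++k) {
        assert(t[k].row >= 0 && t[k].row < rows);
        assert(t[k].col >= 0 && t[k].col < cols);
        const bool duplicate =
            k > 0 && t[k].col == t[k - 1].col && t[k].row == t[k - 1].row;
        if (duplicate) {
            m.values.back() += t[k].value;  // merge by summing
        } else {
            m.row_idx.push_back(t[k].row);
            m.values.push_back(t[k].value);
            ++m.col_ptr[t[k].col + 1];      // count nonzeros per column
        }
    }
    // Prefix sum turns per-column counts into column start offsets.
    for (int j = 0; j < cols; ++j) m.col_ptr[j + 1] += m.col_ptr[j];
    return m;
}
```

After the sort, each column occupies a contiguous range of the triplet list, so in principle columns could be handed to different threads. The sort itself is the part I would like to avoid.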