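Based on @Pete Becker's answer, I decided to use the following lock-free approach: build the whole line in a std::stringstream, then send it to std::cerr with a single operator<< call. Note that the standard only guarantees that concurrent writes to std::cerr do not cause a data race. It does not promise that the characters of one call stay together, so the single call is expected, not guaranteed, to come out as one uninterrupted line.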
#include <iostream>
#include <sstream>
[...]
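// Build the complete line first; nothing reaches std::cerr yet.
std::stringstream lineToPrint;
// '\n' is enough here: std::endl would only flush the stringstream, which does nothing useful.
lineToPrint << " Hello " << " World " << '\n';
// Hand the finished line to std::cerr in a single operator<< call.
std::cerr << lineToPrint.str();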
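To see how this behaves with several threads, here is a minimal sketch. The helper names formatLine and runWorkers are my own and purely illustrative. Each worker builds a full line first and then passes it to std::cerr in one call, without a mutex. Whether lines from different threads ever get mixed still depends on the implementation, so treat this as a way to try the idea out, not as proof that it is atomic.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: build one complete line, trailing newline included.
std::string formatLine(int threadId, int lineNo) {
    std::ostringstream line;
    line << "thread " << threadId << " line " << lineNo << '\n';
    return line.str();
}

// Hypothetical helper: every worker writes whole lines to std::cerr,
// one operator<< call per line, with no lock around the stream.
void runWorkers(int threadCount, int linesPerThread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([t, linesPerThread] {
            for (int i = 0; i < linesPerThread; ++i) {
                std::cerr << formatLine(t, i);
            }
        });
    }
    // Wait for all workers before returning.
    for (auto& w : workers) {
        w.join();
    }
}
```

Calling runWorkers(4, 100) from main and checking stderr for broken lines is a quick sanity test on a given platform. On most toolchains it has to be linked with thread support (for example -pthread with GCC or Clang).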