Possible scenario:
Thread A calls write(big buffer), which writes only part of the buffer.
The kernel is notified that the device driver's write buffers have become available.
Thread B calls write(big buffer), which also writes only part of its buffer.
Thread A continues after write() returns.
So the problem is not the atomicity of write() itself, but the fact that write() and the processing after write() returns are two separate steps, not a single atomic operation.
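
One way to make those two steps behave as a single unit is to hold a lock across the whole "write, check the return value, write the remainder" loop. The following is only a minimal sketch under assumptions: write_all_locked and write_lock are hypothetical names (not from any library), the descriptor is assumed to be blocking, and every thread that writes to it is assumed to go through this same helper.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock shared by every thread that writes to the descriptor. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: write all len bytes of buf to fd while holding
 * write_lock, so the retry after a partial write cannot interleave with
 * another caller of this function.
 * Returns 0 on success, -1 on error with errno as set by write(). */
int write_all_locked(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;
    int saved_errno = 0;

    pthread_mutex_lock(&write_lock);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing anything: retry */
            saved_errno = errno; /* remember the real failure reason */
            rc = -1;
            break;
        }
        p += n;                  /* partial write: skip what was accepted */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&write_lock);

    if (rc != 0)
        errno = saved_errno;
    return rc;
}
```

With this, thread B in the scenario above would block on the mutex until thread A has pushed out its entire buffer. The lock only helps if all writers cooperate; it does nothing against another process, or another code path, calling write() on the same descriptor directly.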