It doesn't have to be a single message queue. There could be multiple queues, one per cache line, and CPU1 could then respond to messages out of order either because the incoming messages have different latencies or because of how CPU1 prioritizes its tasks.
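To make the idea concrete, here is a toy simulation, not a model of any real hardware. It assumes one FIFO queue per cache line, that message number `seq` is sent at time `seq`, and that the receiving CPU always handles the queue whose head message arrived earliest. The function name `handling_order`, the message tuples, and the latencies are all made up for illustration.

```python
from collections import defaultdict, deque

def handling_order(messages):
    """messages: list of (cache_line, latency, label) in the order they are sent.
    Returns the labels in the order the receiving CPU handles them."""
    # Hypothetical model: one FIFO queue per cache line.
    queues = defaultdict(deque)
    for seq, (line, latency, label) in enumerate(messages):
        arrival = seq + latency  # assumption: message `seq` is sent at time `seq`
        queues[line].append((arrival, label))

    order = []
    while any(queues.values()):
        # Handle the non-empty queue whose head message arrived first.
        # Order is preserved within a cache line, but not across lines.
        line = min((l for l in queues if queues[l]),
                   key=lambda l: queues[l][0][0])
        order.append(queues[line].popleft()[1])
    return order

messages = [
    ("A", 5, "invalidate A"),  # sent first, but slow to arrive
    ("B", 1, "read B"),        # sent second, arrives quickly
    ("A", 1, "read A"),        # same line as the first message
]
order = handling_order(messages)
print(order)  # ['read B', 'invalidate A', 'read A']
```

In this sketch "read B" is handled before "invalidate A" even though it was sent later, while the two messages for cache line A keep their relative order. A different scheduling policy in the `min(...)` step would stand in for CPU1 prioritizing its tasks differently.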