After getting feedback and further analyzing the results, I think the answer is that putting a thread to sleep increases how often Windows reschedules it to a different core, which results in more thread context switches, which in turn cause more cache invalidations (and those are very expensive). That's how putting a thread to sleep can hurt cache performance.
In a 3000-sample test, putting the thread to sleep for 1 ms caused Windows to reschedule it to a different core about 2.6 times more often than not putting it to sleep:
The sleeping thread was rescheduled to a different core about 2.6 times more often, and its performance cost was about 2.4 times higher on average, compared to a non-sleeping thread. I'm not sure those two numbers can be compared directly, but there does seem to be a correlation.
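For anyone who wants to reproduce this, here's a minimal sketch of how you can count core migrations around a sleep. The 3000 samples and 1 ms sleep match the test above; the rest (function and variable names) is my own illustration, not the exact benchmark code. `GetCurrentProcessorNumber()` is a real Win32 call that reports which logical processor the calling thread is currently running on:

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    const int kSamples = 3000;
    int migrations = 0;
    DWORD lastCore = GetCurrentProcessorNumber();

    for (int i = 0; i < kSamples; ++i)
    {
        Sleep(1); // the 1 ms sleep under test; remove it for the baseline run

        // If the core reported after the sleep differs from the one
        // before it, the scheduler moved the thread.
        DWORD core = GetCurrentProcessorNumber();
        if (core != lastCore)
            ++migrations;
        lastCore = core;
    }

    printf("rescheduled to a different core %d times out of %d samples\n",
           migrations, kSamples);
    return 0;
}
```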
Pinning the thread to core 0 significantly reduced the cost of sleeping, presumably because Windows was no longer moving it between cores, so cache invalidations happened less often (although other processes and their threads could still evict the cache if they ran on that core while it was sleeping):
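For reference, pinning is a one-liner with the Win32 affinity API; the helper name here is mine. A mask of `1` (only bit 0 set) restricts the thread to logical processor 0:

```cpp
#include <windows.h>

// Pin the calling thread to core 0 so the scheduler can't migrate it.
// SetThreadAffinityMask returns the previous affinity mask, or 0 on failure.
void PinCurrentThreadToCore0()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);
}
```

Note that pinning trades flexibility for cache locality: the thread now has to wait for core 0 even when other cores are idle, so it's not automatically a win for every workload.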
Thanks to all for your help, and hopefully this post can help others.