I'm hitting this in 2025: on a modern AMD CPU, memset with a nonzero fill value is massively slower than with zero, and I wonder why. Could it be some microcode trickery, or something else? A bit of context: I'm writing a software rasterizer and clearing a large depth buffer. With a zero fill, the fastest approach is simply to memset the whole buffer. With a nonzero fill, I need to parallelize, and even then it's slower than the serial zero fill. In the debugger, both cases simply end up in rep stosb.
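A minimal way to reproduce the comparison might look like the sketch below. It is not the rasterizer code, just a rough timing harness: `time_memset` is a hypothetical helper name, the 0x5A warm-up value is arbitrary, and the buffer size and repetition count are up to the caller. It touches the buffer once before timing so page faults are not counted, and reads a byte back after each fill so the compiler is less likely to drop the memset. Note that `clock()` is coarse, so this is only meaningful for large buffers and several repetitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the original post): runs `reps` memset
   calls over an n-byte buffer and returns the average seconds per call,
   or -1.0 if the allocation fails. */
double time_memset(size_t n, int fill, int reps)
{
    assert(n > 0 && reps > 0);

    unsigned char *buf = malloc(n);
    if (buf == NULL)
        return -1.0;

    /* Warm-up pass: touch every page so first-touch page faults
       are not included in the timed region. */
    memset(buf, 0x5A, n);

    volatile unsigned char sink = 0;
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        memset(buf, fill, n);
        /* Read one byte back so the fill has an observable effect. */
        sink ^= buf[(size_t)i % n];
    }
    clock_t end = clock();
    (void)sink;

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `time_memset(size, 0, reps)` and `time_memset(size, 0xFF, reps)` with the same `size` and `reps` gives the two numbers to compare. If the gap shrinks or disappears here, the difference is more likely coming from how the buffer is allocated or first touched than from the fill instruction itself.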