In the first case the compiler has more flexibility in dealing with the problem.
"count" is declared in the function and can be returned when the job is done. This means the compiler can keep it wherever it wishes, and it will likely keep it in a CPU register.
In the second case you pass a reference to "count". That causes a few problems which make vectorization hard:
"count" is kept somewhere in RAM and the function gets its address. As far as the compiler can tell, every update is a store to that memory location, which is much more awkward to vectorize than a value held in a register.
The compiler does not know your intentions, so it has to play it safe. Do you only want the rest of the code to look at "count" after the job is done, or should every intermediate change be potentially observable from the outside? The compiler may only batch the updates if it can prove that nothing can tell the difference, and compilers differ in how hard they try. The old ICC compiler was probably making assumptions different to those that Clang and GCC make.
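What if the "buf" array and the location of "count" overlap? Technically, that is possible. Unless the compiler can rule it out, it has to assume that each write to "count" may change what the next read from "buf" returns.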
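
To make the overlap point concrete, here is a minimal sketch. The question's actual code is not reproduced here, so all function names and the "count the zero bytes" loop are hypothetical, and the counter is an unsigned char only so that the overlap demo does not depend on endianness. It shows that accumulating in a local and writing back once (which is effectively what a vectorizing compiler would like to do) gives a different answer from the by-reference loop when "count" lives inside "buf":

```cpp
#include <cstddef>

// Hypothetical "second case": every increment goes through the reference.
void count_zeros_ref(const unsigned char* buf, std::size_t n,
                     unsigned char& count) {
    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i] == 0) {
            ++count;  // a store that might modify buf[] if they overlap
        }
    }
}

// Same loop, but accumulating in a local and writing back once at the end.
// The programmer may make this change; the compiler may not, unless it can
// prove (or check at run time) that count does not overlap buf.
void count_zeros_local(const unsigned char* buf, std::size_t n,
                       unsigned char& count) {
    unsigned char local = count;  // local copy, free to live in a register
    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i] == 0) {
            ++local;
        }
    }
    count = local;
}

// Overlap demo: the counter is the last element of the buffer itself.
unsigned char overlap_demo_ref() {
    unsigned char data[4] = {0, 0, 0, 0};
    count_zeros_ref(data, 4, data[3]);
    return data[3];  // 3: data[3] is already non-zero when the loop reads it
}

unsigned char overlap_demo_local() {
    unsigned char data[4] = {0, 0, 0, 0};
    count_zeros_local(data, 4, data[3]);
    return data[3];  // 4: the loop never sees its own updates
}
```

When the two do not overlap, both versions give the same result, so copying "count" into a local yourself is usually the simplest way to give the compiler back the freedom it has in the first case.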