Simpler intuitive non-math explanation (I hope) ...
The two shift register diagrams are exactly the same, i.e. one of them is "wrong" (it's not what you were trying to demonstrate, but I understand what you intended).
Consider only diagram (B) for the following, but either will work for this.
[This would be a bit easier to understand using CRC-8: C(x) = x^8 + x^2 + x^1 + x^0, since the message and the LFSR would then both be 8 bits long]
The LFSR is 5 bits long and the message is 8 bits long, so it takes 8 shifts to get the message completely shifted-in. After those 8 shifts, the last 5 bits of the message are still sitting in the LFSR and don't cause feedback until they are shifted-out, so an additional 5 zero bits have to be shifted-in to push them through. Voila, 13 shifts!
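A minimal Python sketch of the 13-shift count, under assumptions not in the question: a hypothetical CRC-5 polynomial x^5 + x^2 + 1, the message fed MSB first into the bottom of the LFSR, and feedback taken from the bit shifted out of the top (my reading of a diagram-(B)-style register). The message value 0b11010011 is also hypothetical.

```python
# Sketch of the "more shifts" (augmented message) algorithm.
# ASSUMPTIONS (hypothetical): CRC-5 polynomial x^5 + x^2 + 1, message fed
# MSB first at the bottom of the LFSR, feedback from the bit shifted out of the top.
WIDTH = 5                 # LFSR length in bits
MSG_LEN = 8               # message length in bits
POLY = 0b00101            # x^2 + 1; the x^5 term is the bit shifted out of the top
MASK = (1 << WIDTH) - 1


def shift_in(reg, bit):
    """One LFSR clock: shift `bit` in at the bottom, feed back if a 1 falls out of the top."""
    out = (reg >> (WIDTH - 1)) & 1
    reg = ((reg << 1) | bit) & MASK
    return reg ^ POLY if out else reg


def crc_more_shifts(message, init=0):
    """Return (crc, shift_count): shift in the whole message, then WIDTH zero bits."""
    bits = [(message >> i) & 1 for i in range(MSG_LEN - 1, -1, -1)] + [0] * WIDTH
    reg = init
    for bit in bits:
        reg = shift_in(reg, bit)
    return reg, len(bits)


crc, shifts = crc_more_shifts(0b11010011)
print(f"crc={crc:05b} shifts={shifts}")  # prints "crc=11100 shifts=13"
```

The 5 trailing zeros are there only to clock the last 5 message bits out of the top of the register, where they finally cause feedback.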
As for why one algorithm needs more shifts than the other: the difference is in how the initial LFSR value is used!
More Shifts Algorithm (as above)
Set the initial LFSR value (say to 0, but it doesn't matter for counting shifts). The message is then shifted-in, and it has to be followed by 5 extra zero bits to get feedback from all the message bits.
Fewer Shifts Algorithm
XOR the initial LFSR value with the first 5 bits of the message and load the result into the LFSR. The step of shifting-in the message is then already (mostly) complete: effectively 5 shifts have already been done!
Now shift in the 3 remaining message bits followed by 5 zero bits = 8 shifts!
The message (8 bits) is longer than the LFSR (5 bits), so 3 zero bits are appended to the initial LFSR value to line it up with the message before the XOR; the last 3 message bits pass through unchanged.
Setting the initial LFSR value in this way does two things at once: it sets the LFSR initial value and effectively shifts-in 5 bits of the message.
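The fewer-shifts steps can be sketched in Python under the same kind of hypothetical model: CRC-5 polynomial x^5 + x^2 + 1, message fed MSB first at the bottom, feedback from the top bit, and an example message 0b11010011. None of these values come from the question.

```python
# Sketch of the "fewer shifts" algorithm.
# ASSUMPTIONS (hypothetical): CRC-5 polynomial x^5 + x^2 + 1, message fed
# MSB first at the bottom of the LFSR, feedback from the bit shifted out of the top.
WIDTH = 5                 # LFSR length in bits
MSG_LEN = 8               # message length in bits
POLY = 0b00101            # x^2 + 1; the x^5 term is the bit shifted out of the top
MASK = (1 << WIDTH) - 1


def shift_in(reg, bit):
    """One LFSR clock: shift `bit` in at the bottom, feed back if a 1 falls out of the top."""
    out = (reg >> (WIDTH - 1)) & 1
    reg = ((reg << 1) | bit) & MASK
    return reg ^ POLY if out else reg


def crc_fewer_shifts(message, init=0):
    """Return (crc, shift_count) after preloading the LFSR with part of the message."""
    extra = MSG_LEN - WIDTH                  # 3 message bits that don't fit in the LFSR
    padded = (init << extra) ^ message       # init with 3 zero bits appended, XORed with the message
    reg = padded >> extra                    # init XOR first 5 message bits: "5 shifts already done"
    rest = padded & ((1 << extra) - 1)       # last 3 message bits, untouched by init
    bits = [(rest >> i) & 1 for i in range(extra - 1, -1, -1)] + [0] * WIDTH
    for bit in bits:
        reg = shift_in(reg, bit)
    return reg, len(bits)


crc, shifts = crc_fewer_shifts(0b11010011)
print(f"crc={crc:05b} shifts={shifts}")  # prints "crc=11100 shifts=8"
```

With an initial value of 0, this gives the same CRC as shifting the whole message plus 5 zero bits into an all-zero LFSR; only the number of clocked shifts drops from 13 to 8.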
In both cases, effectively 13 bits are shifted!
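To check both claims (effectively 13 shifts each, and the only difference being the initial LFSR value), here is a brute-force Python sketch over every 8-bit message and every 5-bit initial value. It uses the same hypothetical model as before (CRC-5 polynomial x^5 + x^2 + 1, MSB first, feedback from the top bit). In this model, the fewer-shifts algorithm matches the more-shifts one when its initial value is the more-shifts initial value with 5 zero bits already shifted through it. For an initial value of 0, the two initial values are identical.

```python
# Brute-force comparison of the two algorithms.
# ASSUMPTIONS (hypothetical): CRC-5 polynomial x^5 + x^2 + 1, MSB first,
# feedback from the bit shifted out of the top of the LFSR.
WIDTH, MSG_LEN = 5, 8
POLY = 0b00101
MASK = (1 << WIDTH) - 1


def shift_in(reg, bit):
    out = (reg >> (WIDTH - 1)) & 1
    reg = ((reg << 1) | bit) & MASK
    return reg ^ POLY if out else reg


def run(reg, bits):
    """Clock each bit into the LFSR; return (final register, number of shifts)."""
    for bit in bits:
        reg = shift_in(reg, bit)
    return reg, len(bits)


def msb_bits(value, n):
    """The low n bits of value, most significant first."""
    return [(value >> i) & 1 for i in range(n - 1, -1, -1)]


def more_shifts(message, init):
    return run(init, msb_bits(message, MSG_LEN) + [0] * WIDTH)


def fewer_shifts(message, init):
    extra = MSG_LEN - WIDTH
    padded = (init << extra) ^ message   # padded init XORed with the message
    return run(padded >> extra, msb_bits(padded, extra) + [0] * WIDTH)


mismatches = 0
for init in range(1 << WIDTH):
    preload = run(init, [0] * WIDTH)[0]  # init advanced by 5 zero shifts
    for message in range(1 << MSG_LEN):
        crc_a, n_a = more_shifts(message, init)
        crc_b, n_b = fewer_shifts(message, preload)
        # the preload stands in for WIDTH shifts, so both effectively total 13
        assert n_a == 13 and WIDTH + n_b == 13
        if crc_a != crc_b:
            mismatches += 1
print("mismatches:", mismatches)  # prints "mismatches: 0"
```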