Looking at the pull request, this change was made for performance reasons.
It seems that aligning to 32 bits speeds up processing, or at least makes the results less noisy.
I never actually expected that a compiler would not align functions to 32 bits on a 32-bit microcontroller architecture. A misaligned function would incur a performance penalty when its instructions are fetched.
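
For illustration, here is a minimal sketch of how such alignment can be requested per function and then checked, assuming a GCC- or Clang-compatible toolchain. The names `hot_function` and `is_word_aligned` are hypothetical and not taken from the pull request. GCC also has a global `-falign-functions=4` option; whether the pull request uses that option or an attribute is not something I have verified.

```c
#include <stdint.h>

/* Hypothetical example function. The aligned(4) attribute (GCC/Clang
 * extension) asks the toolchain to place the function's entry point on
 * a 4-byte (32-bit) boundary. */
__attribute__((aligned(4)))
int hot_function(int x)
{
    return x * 2 + 1;
}

/* Hypothetical helper: returns 1 if the function's entry address is a
 * multiple of 4, otherwise 0. Converting a function pointer to an
 * integer is implementation-defined, but works on GCC and Clang. */
int is_word_aligned(int (*fn)(int))
{
    /* On some targets (e.g. ARM Thumb) bit 0 of a function pointer is a
     * mode flag rather than part of the address, so clear it first. */
    uintptr_t addr = (uintptr_t)fn & ~(uintptr_t)1;
    return (addr % 4u) == 0;
}
```

Calling `is_word_aligned(hot_function)` should return 1 when the attribute is honoured. Without the attribute, the result depends on the compiler's default alignment for the target and optimisation level.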