I just wrote a big rant about this on Reddit:
https://www.reddit.com/r/arm/comments/1igprj8/arm_branch_prediction_hardware_is_fubar/
I present an example there where the condition codes are set 14 instructions, and at least 40 clock cycles, before the conditional branch that uses them.