I ran into the same problem. 90% of the time was spent generating the perfectly optimized execution plan. If I add the ORDERED hint, the resulting plan is not optimal, but the total time (planning plus actual execution) is much shorter than with the perfect plan.
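
For anyone who wants to try the same thing, here is a rough sketch of how the hint could be added to a query string before it is sent to the database. It assumes Oracle-style hint syntax (`/*+ ORDERED */` right after `SELECT`), where the hint asks the optimizer to join tables in the order they appear in the FROM clause instead of searching for the best join order. The helper name `add_ordered_hint` and the `customers`/`orders` tables are made up for illustration.

```python
import re


def add_ordered_hint(sql: str) -> str:
    """Insert an Oracle-style /*+ ORDERED */ hint after the leading SELECT.

    Hypothetical helper: only the first SELECT at the start of the
    statement is touched; subqueries are left as they are.
    """
    return re.sub(
        r"^\s*SELECT\b",
        "SELECT /*+ ORDERED */",
        sql,
        count=1,
        flags=re.IGNORECASE,
    )


# Hypothetical query. With the hint, the join order follows the FROM
# clause (customers first, then orders), so the optimizer skips the
# join-order search that was dominating the total time.
query = """SELECT o.id, c.name
FROM customers c, orders o
WHERE o.customer_id = c.id"""

hinted = add_ordered_hint(query)
print(hinted)
```

Since the hint fixes the join order to whatever the FROM clause says, the table order there matters. It is worth timing both versions end to end (planning plus execution) before keeping the hint.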