The problem was that DummyVecEnv is never done, i.e. it never returns terminated or truncated as True. As a result, evaluate_policy (called here) never increments its count of completed episodes, so its while loop never ends.
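
To make the failure mode concrete, here is a minimal pure-Python sketch. It is not the actual evaluate_policy code: `NeverDoneEnv`, `StepLimit`, and `count_episodes` are hypothetical names made up for this illustration, and the `step_budget` guard exists only so that the demo itself terminates. The loop counts an episode only when `terminated` or `truncated` is True, which is the behaviour described above.

```python
class NeverDoneEnv:
    """Hypothetical environment whose episodes never end."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (obs, reward, terminated, truncated, info);
        # both end-of-episode flags are always False.
        return 0, 1.0, False, False, {}


class StepLimit:
    """Hypothetical wrapper that sets truncated=True after max_steps steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.steps += 1
        if self.steps >= self.max_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


def count_episodes(env, n_eval_episodes, step_budget):
    """Simplified evaluation loop.

    Stops when n_eval_episodes have finished, or when step_budget steps
    have been taken (a guard added for this demo only).
    """
    episodes = 0
    steps = 0
    env.reset()
    while episodes < n_eval_episodes and steps < step_budget:
        _, _, terminated, truncated, _ = env.step(0)
        steps += 1
        if terminated or truncated:
            episodes += 1
            env.reset()
    return episodes, steps


# Without a done signal, the episode count stays at 0 until the guard trips.
stuck = count_episodes(NeverDoneEnv(), n_eval_episodes=2, step_budget=1000)

# With a step limit, both episodes complete and the loop exits on its own.
fixed = count_episodes(StepLimit(NeverDoneEnv(), max_steps=10),
                       n_eval_episodes=2, step_budget=1000)
```

Without the guard, the first call would spin forever. In real code, one way to get the same effect as `StepLimit` is probably Gymnasium's `TimeLimit` wrapper applied to the underlying environment, though whether that suits this case depends on the environment in question.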