
After class, my teacher told us where the problem is: it is log_interval. For the DQN, PPO, SAC, and TD3 models, a value of 1000 is too large. The default values can be found in the library documentation; I just set it to 1, as that was fine for my goal in the assignment.

Keep in mind that if you want to compare the models at the same points in time, you need to make another fix: for some models log_interval is counted in epochs, so the models don't all write to TensorBoard at the same timesteps.
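To see why 1000 can be too large, here is a rough sketch of the arithmetic in plain Python. It assumes the library is Stable-Baselines3, that on-policy models such as PPO log once every log_interval rollouts of n_steps * n_envs timesteps (with PPO's default n_steps=2048), and that off-policy models such as DQN, SAC and TD3 log once every log_interval finished episodes. The function names and the episode lengths are made up for illustration; check the documentation of your version before relying on this.

```python
def on_policy_log_timesteps(total_timesteps, log_interval, n_steps=2048, n_envs=1):
    """Timesteps at which an on-policy model would write a log entry,
    assuming one entry every `log_interval` rollouts (assumption, see above)."""
    rollout = n_steps * n_envs  # timesteps collected per rollout
    logged, iteration, timesteps = [], 0, 0
    while timesteps < total_timesteps:
        timesteps += rollout
        iteration += 1
        if iteration % log_interval == 0:
            logged.append(timesteps)
    return logged


def off_policy_log_timesteps(episode_lengths, log_interval):
    """Timesteps at which an off-policy model would write a log entry,
    assuming one entry every `log_interval` finished episodes (assumption)."""
    logged, timesteps = [], 0
    for episode_num, length in enumerate(episode_lengths, start=1):
        timesteps += length
        if episode_num % log_interval == 0:
            logged.append(timesteps)
    return logged


# With 350000 timesteps, log_interval=1000 never triggers for the on-policy case:
# the first entry would only come after 1000 * 2048 = 2048000 timesteps.
print(on_policy_log_timesteps(350000, log_interval=1000))    # []
print(len(on_policy_log_timesteps(350000, log_interval=1)))  # 171
# Hypothetical episode lengths, only to show the off-policy counting.
print(off_policy_log_timesteps([200, 200, 200, 200], log_interval=2))  # [400, 800]
```

Under these assumptions the two kinds of models log at different timesteps even with the same log_interval, which is why comparing them at identical points needs the extra fix.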

Correct code:

model_ppo.learn(total_timesteps=350000, log_interval=1, progress_bar=True)
