
Just throwing out a few ideas:

Learn a world model (or assume one is given to you) and use it to solve your problem.
Load a trained agent and let it both explore and exploit to collect the states it visits. [As Q-learning assumes, you would probably cover the entire state space if you sample from the simulator often enough.]
gym provides a high and a low value for each element of the state vector. Sample within that range (equivalent to sampling uniformly over the state space, which is not necessarily the distribution of states the agent actually visits).
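
A rough sketch of the last idea, in plain Python so it runs without gym installed. The helper `sample_states` and the bound values are made up for illustration; with gym the bounds would come from `env.observation_space.low` and `env.observation_space.high`, and for a `Box` space `env.observation_space.sample()` does the sampling for you.

```python
import random


def sample_states(low, high, n, seed=None):
    """Draw n states uniformly from the box [low, high], one range per dimension.

    Hypothetical helper, not part of gym.
    """
    if len(low) != len(high):
        raise ValueError("low and high must have the same length")
    rng = random.Random(seed)  # local generator so the seed does not affect global state
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
        for _ in range(n)
    ]


# Illustrative bounds for a 2-dimensional state (assumed values).
# With gym: low = env.observation_space.low, high = env.observation_space.high
low = [-1.2, -0.07]
high = [0.6, 0.07]

states = sample_states(low, high, n=5, seed=0)
for s in states:
    print(s)
```

Keep in mind that some environments report unbounded or extremely large limits for certain dimensions, in which case you would have to pick finite bounds yourself, and that uniformly sampled states may include ones the agent can never actually reach.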
