Just throwing out a few ideas:
- Learn a world model (or assume one is given to you for your problem) and query it for states.
- Load a trained agent and run it with a mix of exploration and exploitation to collect the states it visits [as with Q-learning, you would probably cover most of the reachable state space if you sample often enough from the simulator].
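- gym provides a `low` and `high` value for each element of the state vector. Sample within that range. This amounts to sampling uniformly over the state space, which is not necessarily the distribution of states the agent actually visits, and some bounds can be infinite.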
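
A rough sketch of the last idea, in case it helps. It assumes you have already read the per-dimension bounds (e.g. from `env.observation_space.low` and `env.observation_space.high` on a gym `Box` space) into plain sequences. The `sample_states` helper, the `clip` cap for unbounded dimensions, and the example bounds are all made up for illustration, not part of gym:

```python
import math
import random


def sample_states(low, high, n, clip=10.0, seed=None):
    """Draw n states uniformly from the box [low, high].

    low/high are per-dimension bounds, e.g. env.observation_space.low and
    env.observation_space.high for a gym Box space. Infinite or very large
    limits are clipped to +/- clip, which is an arbitrary assumption here.
    """
    rng = random.Random(seed)
    # Replace unbounded limits with a finite cap so uniform sampling is defined.
    bounds = [
        (max(float(lo), -clip), min(float(hi), clip))
        for lo, hi in zip(low, high)
    ]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]


# Hypothetical bounds: a 4-dimensional state with two unbounded velocities.
low = [-4.8, -math.inf, -0.42, -math.inf]
high = [4.8, math.inf, 0.42, math.inf]
states = sample_states(low, high, n=5, seed=0)
```

If the bounds are all finite, calling `env.observation_space.sample()` directly is probably simpler. Either way, some sampled vectors may be states the simulator never actually reaches.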