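You need to write a wrapper that one-hot-encodes the states. If the environment returns each state as a single integer index, feeding that raw number to the DQN suggests an ordering and a magnitude between states that do not exist (state 7 is not "more" than state 3). A one-hot vector gives each state its own input unit instead, which typically makes training the DQN much more effective.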
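A minimal sketch of such a wrapper follows. It assumes a Gymnasium-style interface (`reset()` returns `(state, info)` and `step(action)` returns `(state, reward, terminated, truncated, info)`) and integer states in `[0, n_states)`. `OneHotStateWrapper` and the toy `ChainEnv` are hypothetical names used only for illustration.

```python
import numpy as np


class OneHotStateWrapper:
    """Hypothetical wrapper: converts integer states in [0, n_states) into
    one-hot float32 vectors. Assumes a Gymnasium-style reset()/step() API."""

    def __init__(self, env, n_states):
        self.env = env
        self.n_states = n_states

    def one_hot(self, state):
        # One input unit per discrete state; exactly one entry is 1.0.
        vec = np.zeros(self.n_states, dtype=np.float32)
        vec[int(state)] = 1.0
        return vec

    def reset(self, **kwargs):
        state, info = self.env.reset(**kwargs)
        return self.one_hot(state), info

    def step(self, action):
        state, reward, terminated, truncated, info = self.env.step(action)
        return self.one_hot(state), reward, terminated, truncated, info


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 on a line.
    Action 1 moves right, any other action moves left; reaching n-1 ends the episode."""

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self, **kwargs):
        self.state = 0
        return self.state, {}

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)
        terminated = self.state == self.n - 1
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}


if __name__ == "__main__":
    env = OneHotStateWrapper(ChainEnv(n=5), n_states=5)
    obs, _ = env.reset()
    print(obs)  # [1. 0. 0. 0. 0.]
    obs, reward, terminated, truncated, _ = env.step(1)
    print(obs)  # [0. 1. 0. 0. 0.]
```

The DQN's input layer then has `n_states` units rather than one. If you are working with Gymnasium directly, the same `one_hot` logic can instead go in the `observation` method of a `gymnasium.ObservationWrapper` subclass, together with an updated `observation_space` (for example a `Box(0.0, 1.0, (n_states,), np.float32)`).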