Reports

Could you give a bit more context on what you are trying to accomplish and what the network is trying to learn? It is easier to see what is wrong with some context. What is the model structure? What hyperparameters are you using? Could you share the rest of the code?

Also, are you sure you want to flip the sign of the reward? If the reward for a state is negative, it will lead to negative Q-values, which is sensible, since it indicates that you want to avoid such states.
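
To illustrate the point about negative rewards, here is a minimal sketch of a tabular Q-learning update. It is not your code: the function `q_update`, the values of `alpha` and `gamma`, and the single "bad" state that always leads to a terminal state are all assumptions made for the example.

```python
# Hypothetical setup: one state-action pair that always yields a fixed reward
# and then transitions to a terminal state (whose max Q-value is 0).
ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(q, reward, next_max_q, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update to a single Q-value."""
    td_target = reward + gamma * next_max_q
    return q + alpha * (td_target - q)


def learn(reward, steps=20):
    """Repeat the update from Q = 0 with a constant reward."""
    q = 0.0
    for _ in range(steps):
        q = q_update(q, reward, next_max_q=0.0)
    return q


q_negative_reward = learn(reward=-1.0)  # reward left as-is
q_flipped_reward = learn(reward=1.0)    # same reward with the sign flipped

print(q_negative_reward < 0)  # the penalized state ends up with a negative Q-value
print(q_flipped_reward > 0)   # flipping the sign makes the same state look attractive
```

With the sign left alone, the Q-value of the penalized state moves toward the negative reward, so a greedy policy would steer away from it. Flipping the sign gives that state a positive Q-value instead, which is usually the opposite of what you want.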

Reasons:

Long answer (-0.5):
No code block (0.5):
Contains question mark (0.5):
Low reputation (1):

Posted by: APasagic

79736365