Changes
・Fixed the Bellman equation
・Changed learning rate to 0.2
・Changed epsilon decay rate to 0.995
・Fixed when the state variable is set
・Slightly refactored the exploit algorithm
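
For context, here is a minimal sketch of the kind of update these changes touch. It assumes a tabular, epsilon-greedy Q-learning agent, which the change list does not confirm. The function names, the dictionary-based Q-table, the discount factor `GAMMA`, and the epsilon floor are all hypothetical; only the learning rate (0.2) and the decay rate (0.995) come from the list above.

```python
import random

ALPHA = 0.2            # learning rate, from the change list
EPSILON_DECAY = 0.995  # multiplicative epsilon decay, from the change list
GAMMA = 0.9            # assumed discount factor (not stated in the change list)


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning (Bellman) update in place; return the new value."""
    # Best value reachable from the next state; unseen entries default to 0.0.
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    q[(state, action)] = new
    return new


def choose_action(q, state, n_actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))  # first action with the highest value


def decay_epsilon(epsilon, decay=EPSILON_DECAY, floor=0.01):
    """Shrink epsilon by the decay factor, never below the assumed floor."""
    return max(floor, epsilon * decay)
```

In a training loop built on this sketch, the current state would be overwritten with `next_state` only after `q_update` has run, so the update still sees the state the action was taken from.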