John's Machine Learning and Deep Learning Blog

Blog
About
Tags

RL example

Here are my thoughts on trying to provide an example of RL.

Policy gradient
“Standing derivative”, weighted by the return
Instead, we should do Q-learning

← Previous Post
Next Post →

John Chen • © 2020 • John's Machine Learning and Deep Learning Blog

Hugo v0.59.0 powered • Theme Beautiful Hugo adapted from Beautiful Jekyll