Roadmap (almost): VAEs


  1. The score function estimator: even a single sample DOES give an unbiased estimate of the gradient of the expectation, via the score function (see the REINFORCE sketch after this list).
  2. The actual point of REINFORCE: not taking the derivative through the expectation, but taking it through a random variable. And the big kicker is that once x is sampled, it has no further dependence on the probability distribution governing it, so we treat it as a constant. The problem then reduces (essentially) to maximum likelihood estimation: just trying to max/min the log-probability of what actually happened, weighted by the reward function (see the REINFORCE sketch after this list).
  3. The two views of taking a derivative through a random variable (see the two-views sketch after this list).
  4. The two views of VAEs: the probability view and the neural-network view. We can also discuss all the ELBOs and all formulations of the training; the goal is to maximize the log pdf of the data (see the VAE sketch after this list).
  5. The super authoritative guide to neural nets. This one involves all the important and nice quantities: the network PRODUCES distribution parameters at the very end, and then we do a form of maximum likelihood on these parameters, trying to maximize the probability of our ORIGINAL input. Simple!
    1. This also HAS the effect of regenerating our image, for all intents and purposes.
  6. Additional articles and details about machine learning:
    1. Where does cross-entropy loss come from? It is COMPLETELY just a loss function of our choosing! But it is one that encodes our intuition about how examples should be penalized: it is exactly the negative log-probability the model assigns to the true label (see the cross-entropy sketch after this list).
  7. Want to relearn the derivation! (For cross-entropy loss and how it has a linear residual; see the worked derivation after this list.)
  8. Explain the difference between continuous and discrete variables. For instance, when we do classification, are we doing something discrete or continuous? (A: we have a discrete number of outputs, but each output variable is itself continuous.)
    1. Follow-up: say we train a VAE on MNIST with a latent dimension of 10 variables. Technically, these are discrete factors of variation, but each one can vary continuously inside.
  9. But there could be multiple formulations:
    1. We could predict a single vector of logits and then apply Gumbel-softmax, as an alternative approach to classification etc. (see the Gumbel-softmax sketch after this list).
  10. https://github.com/vithursant/VAE-Gumbel-Softmax
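
A minimal REINFORCE sketch for items 1-2, assuming PyTorch; the one-parameter Bernoulli policy and the reward function are made-up illustrations, not anything from the roadmap itself. The point it demonstrates: the sampled x carries no gradient, so the surrogate is just reward-weighted maximum likelihood, and even this single sample is an unbiased gradient estimate.

```python
import torch

# A one-parameter Bernoulli "policy": theta = P(x = 1).
theta = torch.tensor(0.3, requires_grad=True)

def reward(x):
    return 4.0 * x - 1.0  # arbitrary illustrative reward

dist = torch.distributions.Bernoulli(probs=theta)
x = dist.sample()  # sampled x has NO dependence on theta (it is a constant)

# REINFORCE surrogate: reward-weighted log-likelihood of what happened.
surrogate = reward(x) * dist.log_prob(x)
surrogate.backward()

# theta.grad is a single-sample, unbiased estimate of d/dtheta E[reward(x)].
print(theta.grad)
```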
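
For item 3, a two-views sketch contrasting what I take the two views to be: the score-function view (sample first, then differentiate the log-probability) and the reparameterization view (write the sample as a deterministic function of the parameters plus noise, and differentiate through the sample itself). The Gaussian, the objective f, and the sample count are assumptions for illustration.

```python
import torch

mu = torch.tensor(0.5, requires_grad=True)
sigma = 1.0
f = lambda x: x ** 2   # toy objective; exactly, d/dmu E[x^2] = 2 * mu
n = 100_000

# View 1: score function. No gradient flows through the samples; we
# differentiate the log-density instead.
dist = torch.distributions.Normal(mu, sigma)
x = dist.sample((n,))
g_score, = torch.autograd.grad((f(x) * dist.log_prob(x)).mean(), mu)

# View 2: reparameterization. x = mu + sigma * eps, so the gradient
# flows through the sample itself.
eps = torch.randn(n)
g_rep, = torch.autograd.grad(f(mu + sigma * eps).mean(), mu)

print(g_score.item(), g_rep.item())  # both approximate 2 * mu = 1.0
```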
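
For items 4-5, a minimal VAE sketch tying the two views together: the probability view lives in q(z|x), the KL term, and the ELBO; the neural-network view is that the decoder PRODUCES distribution parameters (here, Bernoulli logits) on which we do maximum likelihood against the original input. The layer sizes are arbitrary assumptions; the latent dimension of 10 echoes item 8.1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, H, Z = 784, 256, 10  # flattened MNIST-sized input; sizes are illustrative

enc = nn.Sequential(nn.Linear(D, H), nn.ReLU(), nn.Linear(H, 2 * Z))
dec = nn.Sequential(nn.Linear(Z, H), nn.ReLU(), nn.Linear(H, D))

def neg_elbo(x):
    # Probability view: the encoder outputs the parameters of q(z|x).
    mu, logvar = enc(x).chunk(2, dim=-1)
    # Reparameterization: z = mu + sigma * eps, so gradients reach the encoder.
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
    # Neural-net view: the decoder PRODUCES Bernoulli logits, and we maximize
    # the log-probability of the ORIGINAL input under them -- which is what
    # regenerates the image, for all intents and purposes.
    rec = F.binary_cross_entropy_with_logits(dec(z), x, reduction="sum")
    # Closed-form KL( q(z|x) || N(0, I) ) for a diagonal Gaussian.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum()
    # ELBO = E_q[log p(x|z)] - KL; training minimizes the negative ELBO.
    return rec + kl

x = torch.rand(32, D)  # stand-in batch; real MNIST pixels also lie in [0, 1]
print(neg_elbo(x))
```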
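
For item 6.1, a tiny cross-entropy sketch of where the loss "comes from" once you choose it: it is just the negative log-probability that the softmax model assigns to the true class. The logits and target are made up.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # made-up scores for 3 classes
target = torch.tensor([0])                  # index of the true class

# By hand: negative log-probability of the true label under softmax(logits).
by_hand = -F.log_softmax(logits, dim=-1)[0, target[0]]

# PyTorch's built-in cross entropy computes exactly the same quantity.
print(by_hand.item(), F.cross_entropy(logits, target).item())
```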
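
For item 7, the standard softmax-plus-cross-entropy derivation of the linear residual. With logits z, softmax probabilities p, and a one-hot target y (so the y_i sum to 1):

```latex
p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
L = -\sum_i y_i \log p_i, \qquad
\frac{\partial \log p_i}{\partial z_k} = [i = k] - p_k
\\[6pt]
\frac{\partial L}{\partial z_k}
  = -\sum_i y_i \big([i = k] - p_k\big)
  = -y_k + p_k \sum_i y_i
  = p_k - y_k
```

So the gradient with respect to the logits is literally "prediction minus target": a linear residual.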
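
For item 9.1, a Gumbel-softmax sketch written by hand to show the mechanics (PyTorch also ships this as `torch.nn.functional.gumbel_softmax`). The 10 categories and the temperature are illustrative choices.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    # Gumbel(0, 1) noise: g = -log(-log(U)), U ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # Softmax over the perturbed logits gives a differentiable, nearly
    # one-hot sample from Categorical(softmax(logits)); tau -> 0 sharpens it.
    return F.softmax((logits + g) / tau, dim=-1)

logits = torch.randn(1, 10)  # e.g. 10 discrete classes, as in item 8.1
y = gumbel_softmax_sample(logits, tau=0.5)
print(y, y.argmax(dim=-1))   # near one-hot, and gradients flow to the logits
```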