Choose Your Own Adventure


You train on only 5k examples of a dataset (the first 5k indices), and you notice an improvement in training.

Do you follow this, or do you ignore it?

Generally, I would ignore it. It could be (and is very likely to be) noise. If I really thought this were a promising area, I would need a theory (e.g. curriculum learning), and then I would explore it as a contribution to that field.

One reason it might improve: restricting to the first 5k indices shifts our UNLABELLED indices too.

As it turns out:

1. When we make the dataset smaller, we have fewer indices for both the labelled and the unlabelled pools.
2. I do some training on the labelled pool (as part of the task learner).
3. However, our test dataset is unaffected.

Hence it IS surprising that using a smaller number of examples (in sorted order) yields better performance. But chalk it up to non-convergence: if we never see examples again, we don't really learn fast enough to adjust our gradients.

Ultimately, this is the line of thinking! If we set a lower number of images, then we are sampling from a smaller sorted range. This HAS an effect on the unlabelled images too, both how many there are and which ones we select. But the test images are always the same. On the other hand, if we set the regular number of images and a lower budget, then we are sampling from the wide range.

This has implications both for the images we sample and for the images we give to the VAE.
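To make this concrete, here is a minimal sketch of the index bookkeeping (all names here are mine, not the actual code): capping the number of images restricts BOTH pools to the same smaller sorted range, while the test set is built separately.

```python
import random

# Hypothetical sketch: with a capped `num_images`, both pools are drawn
# from the first `num_images` (sorted) indices. The test set is built
# elsewhere and is unaffected.
def split_pools(num_images, initial_budget, seed=0):
    indices = list(range(num_images))       # a smaller sorted range
    random.Random(seed).shuffle(indices)
    labelled = indices[:initial_budget]     # trains the task learner
    unlabelled = indices[initial_budget:]   # fed to the VAE / sampler
    return labelled, unlabelled

# num_images=5000: both pools live entirely inside the first 5k indices.
labelled, unlabelled = split_pools(num_images=5000, initial_budget=1000)
```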

Q: why does it take so long to iterate through, even with few iterations? Can we avoid iterating through the ENTIRE dataset? When we used the original query dataloader, it took forever!

So here is the answer: when we use all the images, we need to fetch the unlabelled ones as well. This is costly and pointless if we have tons of unlabelled images to start with and only label a few images each round.
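One cheap fix, sketched under my own naming (not the repo's API), is to query only a random subsample of the unlabelled pool each round rather than the whole thing:

```python
import random
from torch.utils.data import DataLoader, SubsetRandomSampler

# Hypothetical helper: build the query loader over a random subsample of
# the unlabelled indices, so each round touches at most `max_query`
# images instead of the entire unlabelled pool.
def make_query_loader(dataset, unlabelled_indices, max_query=2048, batch_size=64):
    subsample = random.sample(unlabelled_indices,
                              min(max_query, len(unlabelled_indices)))
    sampler = SubsetRandomSampler(subsample)
    return DataLoader(dataset, sampler=sampler, batch_size=batch_size)
```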

OK, so the new strategy: we should have a function which shifts the labelled indices, so that we are considering a different interval. For now, we can still just use Eric's method. And let's quickly wrap up and move on to things like CSC2541.
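The shift itself could be as simple as sliding the window of candidate indices; a hypothetical helper (not Eric's actual method) might look like:

```python
# Hypothetical: slide the interval of candidate indices so that each run
# considers a different contiguous slice of the full dataset.
def shift_interval(num_images, total_size, offset):
    start = offset % max(1, total_size - num_images + 1)
    return list(range(start, start + num_images))

# offset=5000 considers indices [5000, 10000) instead of [0, 5000).
indices = shift_interval(num_images=5000, total_size=50000, offset=5000)
```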

(But when we try the query dataloader, it DOES take a long time.)

But they also have a problem on their hands: it is possible to call next() beyond the number of train iterations!
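The culprit, as far as I can tell, is the infinite-generator pattern wrapped around the dataloader (a sketch from memory; the actual VAAL helper may differ):

```python
def read_data(dataloader):
    # Restart the underlying loader forever, so callers can next() as
    # many times as they like -- including beyond the intended number
    # of train iterations, with no error to warn them.
    while True:
        for batch in dataloader:
            yield batch
```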

Some experiments (see the toy sketch below):

1. Testing the length of a dataset when a sampler has been applied to it.
2. Testing whether we can still iterate through the dataset once we have exhausted the data in it. (Interestingly, due to their iteration/generator pattern, this cannot happen: the loader never exhausts!)

(Also worth noting: when the labelled dataset is smaller, we have fewer images to help training.)
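Here is a toy version of both experiments (the dataset and indices are made up; `itertools.cycle` stands in for the generator pattern above):

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# A 10-item dataset, but the sampler only exposes 3 indices.
data = TensorDataset(torch.arange(10).float().unsqueeze(1))
loader = DataLoader(data, sampler=SubsetRandomSampler([0, 1, 2]), batch_size=1)

# Experiment 1: the loader's length follows the sampler, not the dataset.
print(len(loader))            # -> 3, even though the dataset holds 10 items

# Experiment 2: once wrapped in a cycling iterator, next() keeps working
# well past one pass over the data -- exhaustion never happens.
batches = itertools.cycle(loader)
for _ in range(10):           # more than len(loader) steps, no StopIteration
    _ = next(batches)
```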

Wrap up: problems with scope and framing are some of the most difficult issues, and indeed the essence of research. Starting from the VAAL paper, we could look at scoping within the VAAL universe: things like correlated batch improvement, etc.

But we could also take a step back, say that VAAL is just one approach, and talk about other directions!