A persistent open problem in deep learning is the scarcity of labelled data. Labelling unlabelled data is costly, labour-intensive, time-consuming, and does not scale. Active learning addresses this by selecting \textit{good} data points from the unlabelled dataset to label. This discrete selection of unlabelled data points is called a \textit{query}, and several heuristics exist for selecting the best query, including maximum entropy and diversity sampling. The quality of a query is generally measured by the resulting performance of the task learner. In this report, we investigate the correlated batch problem in active learning and propose a general solution framework based on reinforcement learning.