Keywords: [ Learning Theory ] [ Statistical Learning Theory ] [ Supervised Learning ] [ Transfer and Multitask Learning ] [ Unsupervised and Semi-supervised Learning ]

Abstract:

We study generalization properties of weakly supervised learning, that is, learning where only a few "strong" labels (the actual target for prediction) are present but many more "weak" labels are available. In particular, we show that pretraining on weak labels and fine-tuning on strong labels can accelerate the learning rate for the strong task to the fast rate of $O(1/n)$, where $n$ is the number of strongly labeled data points. This acceleration can happen even if the strongly labeled data, by itself, admits only the slower $O(1/\sqrt{n})$ rate. The acceleration depends continuously on the number of weak labels available and on the relation between the two tasks. Our theoretical results are reflected empirically across a range of tasks and illustrate how weak labels speed up learning on the strong task.