I've been reading up on the Lottery Ticket Hypothesis, which is super interesting.
Basically, the observation is that these days we build *vast* neural networks with billions of parameters, but most of the parameters aren't needed. That is, after training, you can just throw away 95% of the network (pruning), and it will still work fine.
The LTH paper asks: could we instead start with a network just 5% of the size, and train it to comparable results? If so, that would be a *huge* performance win for Deep Learning.
What's interesting is that you *can* do this, but only by first training the full network (perhaps several times) to see which weights are needed, then resetting the survivors to their original initial values and retraining. They argue that training a neural network isn't so much *creating* a model as finding a lucky sub-network (a lottery ticket) within the randomly initialized network, a bit like a sculptor "finding" the bust hidden in a block of marble.
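To make the "finding a ticket" step concrete, here's a rough sketch of one round of magnitude pruning in plain Python. It assumes the weights are a flat list and that "needed" means "largest absolute value after training", which is the pruning criterion the original paper uses. The names `magnitude_mask`, `winning_ticket` and `keep_fraction` are mine, not from the paper, and the numbers are made up purely for illustration.

```python
def magnitude_mask(trained, keep_fraction):
    """Return a 0/1 mask that keeps the largest-magnitude trained weights."""
    n_keep = max(1, round(len(trained) * keep_fraction))
    # Rank weight indices by descending absolute value after training.
    ranked = sorted(range(len(trained)), key=lambda i: abs(trained[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(trained))]


def winning_ticket(initial, trained, keep_fraction):
    """Prune by trained magnitude, then reset survivors to their initial values."""
    mask = magnitude_mask(trained, keep_fraction)
    # Surviving weights go back to their *initial* values; pruned ones become zero.
    ticket = [w * m for w, m in zip(initial, mask)]
    return ticket, mask


# Toy example (illustrative values only, not real network weights).
initial = [0.10, -0.20, 0.05, 0.30, -0.15]
trained = [0.90, -0.01, 0.02, -1.20, 0.03]
ticket, mask = winning_ticket(initial, trained, keep_fraction=0.4)
```

The resulting `ticket` is the sparse sub-network you would then retrain from scratch (keeping the pruned weights at zero); in the iterative version you repeat train, prune, reset several times, removing a bit more each round.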
Initial LTH paper: http://arxiv.org/abs/1803.03635
Follow-up with major clarifications: http://arxiv.org/abs/1905.01067
#science #ai #machinelearning