Computer Vision with EfficientNet: A primer on the EfficientNet family, transfer learning, and tricks for training your neural networks.
How do you measure how big a convolutional neural network is?
You can’t weigh it or hold a ruler up to it. And if you can’t measure it, how can you scale it? Until 2019, there was no well-understood way to size up a convolutional neural network. That changed when researchers set out to answer an important question:
Is there a principled method to scale up ConvNets, so they achieve better accuracy and efficiency?
And in the process, they accomplished two feats which changed the direction of deep learning:
1) Discovered a novel scaling method called compound scaling.
2) Created a new family of SOTA architectures called EfficientNet.
Now, back to the original question: how do we measure the size of a ConvNet?
By looking at three factors:
1) Resolution (the height and width of the input images)
2) Width (the number of channels, or feature maps, in each layer)
3) Depth (the number of layers in the network)
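To make these three factors concrete, here is a minimal sketch (the function name is illustrative, not from any library) of how each one drives a ConvNet's computational cost: FLOPs grow roughly linearly with depth, but roughly quadratically with width and with input resolution.

```python
def relative_flops(depth_mult: float, width_mult: float, res_mult: float) -> float:
    """Approximate FLOPs relative to a baseline network.

    Depth adds layers, so cost grows ~linearly with it. Width and
    resolution each enter every convolution's cost twice, so FLOPs
    grow ~quadratically with both.
    """
    return depth_mult * width_mult**2 * res_mult**2

# Doubling depth roughly doubles cost...
print(relative_flops(2.0, 1.0, 1.0))  # 2.0
# ...while doubling width (or resolution) roughly quadruples it.
print(relative_flops(1.0, 2.0, 1.0))  # 4.0
```

This asymmetry is why, as discussed below, scaling a single dimension in isolation is a poor trade of compute for accuracy.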
All three factors (depth, width, and resolution) impact the accuracy and efficiency of your network. Ideally, you want to scale all three up while still accomplishing the following:
• Retain the baseline model architecture, i.e. keep the operations in each layer fixed.
• Keep the memory footprint of your model within the limits of your target hardware.
• Keep the number of FLOPs below some predefined threshold.
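The last two constraints can be sketched as a simple feasibility check. This is a toy illustration with assumed names and rough cost approximations, not code from the paper or any library:

```python
def is_feasible(depth_mult: float, width_mult: float, res_mult: float,
                baseline_flops: float, flops_budget: float,
                baseline_memory: float, memory_budget: float) -> bool:
    """Check whether a scaled network respects FLOPs and memory budgets.

    Rough approximations: FLOPs scale ~linearly with depth and
    ~quadratically with width and resolution; activation memory
    scales ~linearly with depth and width and ~quadratically
    with resolution.
    """
    flops = baseline_flops * depth_mult * width_mult**2 * res_mult**2
    memory = baseline_memory * depth_mult * width_mult * res_mult**2
    return flops <= flops_budget and memory <= memory_budget

# Doubling resolution quadruples FLOPs, so it only fits a 4x budget.
print(is_feasible(1.0, 1.0, 2.0, 1.0, 4.0, 1.0, 4.0))  # True
print(is_feasible(1.0, 1.0, 2.0, 1.0, 3.0, 1.0, 4.0))  # False
```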
But there’s a catch…
Scaling up only one network dimension (width, depth, or resolution) improves accuracy, but the gains diminish rapidly as the network grows. For better accuracy and efficiency, you must balance width, depth, and resolution together when scaling a ConvNet.
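This balancing act is exactly what compound scaling does: a single coefficient φ scales depth by α^φ, width by β^φ, and resolution by γ^φ, with α·β²·γ² ≈ 2 so that each unit increase in φ roughly doubles FLOPs. A minimal sketch using the constants found by grid search in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15):

```python
# Constants from the EfficientNet paper (Tan & Le, 2019).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution

def compound_scale(phi: int) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA**phi, BETA**phi, GAMMA**phi

d, w, r = compound_scale(1)
# alpha * beta^2 * gamma^2 ~= 1.92, so each step of phi roughly doubles FLOPs.
print(f"FLOPs multiplier per phi step: {d * w**2 * r**2:.2f}")  # 1.92
```

Increasing φ walks up the family from EfficientNet-B0 toward the larger variants, keeping the three dimensions in balance rather than inflating any one of them.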