Google engineers discuss the importance of Bengio's deep learning paper
The paper "Understanding Deep Learning Requires Rethinking Generalization" has sparked extensive discussion and raised many questions among researchers, including debates on platforms like Quora. Eric Jang, a Google Brain engineer, believes that understanding the mechanisms behind deep learning can significantly improve its real-world applications, and that Zhang et al. (2016) may become an important reference in this field.
In 2017, many machine learning researchers were focused on a key question: How do deep neural networks actually work? Why are they so effective at solving real-world problems?

Even if people aren't deeply interested in theoretical analysis or complex mathematics, understanding how deep learning functions can help us better apply it in everyday life.
The paper "Understanding Deep Learning Requires Rethinking Generalization" highlights some fascinating properties of neural networks. Specifically, it shows that these networks have enough capacity to memorize random data: trained with Stochastic Gradient Descent (SGD), they can drive training error toward zero even on datasets as large as ImageNet with randomized labels.
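The randomization test can be illustrated with a toy sketch (my own, not the paper's setup): an overparameterized one-hidden-layer network, trained by plain gradient descent, fits labels that are purely random, so there is nothing to "learn", only memorize. All sizes and hyperparameters below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 10, 200                    # 20 samples, 10 features, 200 hidden units
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, n).astype(float)  # random labels: no structure to discover

W1 = rng.standard_normal((d, h)) * 0.5   # hidden-layer weights
b1 = np.zeros(h)
w2 = rng.standard_normal(h) * 0.5        # output weights
b2 = 0.0
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):                    # full-batch gradient descent
    hdn = np.tanh(X @ W1 + b1)           # hidden activations
    p = sigmoid(hdn @ w2 + b2)           # predicted P(y = 1)
    g = (p - y) / n                      # gradient of mean cross-entropy wrt logits
    ghd = np.outer(g, w2) * (1 - hdn**2) # backprop through tanh
    w2 -= lr * (hdn.T @ g); b2 -= lr * g.sum()
    W1 -= lr * (X.T @ ghd); b1 -= lr * ghd.sum(axis=0)

# measure final training accuracy on the memorized random labels
hdn = np.tanh(X @ W1 + b1)
p = sigmoid(hdn @ w2 + b2)
train_acc = ((p > 0.5) == (y > 0.5)).mean()
print(train_acc)
```

With far more parameters than samples, the network interpolates the random labels; nothing here implies it will generalize, which is exactly the paper's point.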
This contradicts a long-standing belief: "Deep learning miraculously discovers low-level, intermediate, and high-level features, similar to how the mammalian brain’s V1 system behaves when learning to compress data."
From 2012 to 2015, many researchers used the concept of "inductive bias" to explain how deep networks achieve good test performance, implying some form of generalization.
However, if a deep network can memorize random data, this suggests that inductive bias alone (convolutional layers, pooling, dropout, batch normalization) may not fully explain generalization.
Part of the reason this paper attracted so much attention is that it received perfect review scores and the ICLR 2017 Best Paper Award, which fueled heated discussion in a feedback loop. I think it's a great paper because it raises a question that had never been asked before and provides strong experimental evidence for surprising results.
That said, I believe it will take 1–2 years for the deep learning community to fully agree on the significance of papers like this one, which rest on empirical findings rather than analytical proofs.
Tapabrata Ghosh pointed out that some researchers argue that while deep networks have memory capabilities, this may not be what they actually do in practice. They suggest that the time required to "memorize" a semantically meaningful dataset is shorter than that needed for random data, indicating that deep networks can leverage existing semantic patterns in the training set.
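Ghosh's observation can be probed with a small experiment of my own design (a toy analogue, not the original protocol): train the same architecture twice, once on labels generated by a simple rule and once on random labels, and count the gradient steps needed to reach a fixed training accuracy. All sizes, seeds, and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 50, 10, 200
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y_real = (X @ w_star > 0).astype(float)       # labels follow a simple rule
y_rand = rng.integers(0, 2, n).astype(float)  # labels carry no structure

def steps_to_fit(X, y, target=0.95, max_steps=5000, lr=0.1):
    """Train a one-hidden-layer net; return steps until train acc >= target."""
    r = np.random.default_rng(1)
    W1 = r.standard_normal((X.shape[1], h)) * 0.5
    b1 = np.zeros(h)
    w2 = r.standard_normal(h) * 0.5
    b2 = 0.0
    for t in range(1, max_steps + 1):
        hdn = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(hdn @ w2 + b2)))
        if ((p > 0.5) == (y > 0.5)).mean() >= target:
            return t
        g = (p - y) / len(y)                   # cross-entropy gradient wrt logits
        ghd = np.outer(g, w2) * (1 - hdn**2)
        w2 -= lr * (hdn.T @ g); b2 -= lr * g.sum()
        W1 -= lr * (X.T @ ghd); b1 -= lr * ghd.sum(axis=0)
    return max_steps

s_real = steps_to_fit(X, y_real)
s_rand = steps_to_fit(X, y_rand)
print(s_real, s_rand)
```

In this sketch the structured labels are typically fit in far fewer steps than the shuffled ones, consistent with the claim that networks exploit semantic patterns when they exist.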
I believe Zhang et al. (2016) could become an important reference in understanding how deep networks operate, but it doesn’t fully solve the problem of generalization. Someone might challenge the paper’s claims soon—this is the nature of experimental science.
In summary, the paper is considered significant because it demonstrates that deep networks can fit random data purely by memorization. This leads to the next question: how do deep networks learn non-random datasets?
Here are my thoughts on the issue of generalization:
A high-capacity parametric model with a well-designed optimization objective absorbs data like a sponge. I believe the optimization goal of deep networks is "lazy" yet powerful: given the right model bias and input data compatible with it, deep networks build a hierarchical structure of semantic features. However, if optimization becomes too difficult, they may fall back on simply memorizing the data.
What we currently lack is a way to control the balance between memorization and generalization, and tools more effective than weight regularization and dropout for managing this trade-off.
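The two standard knobs mentioned above can be sketched in a few lines; this is a minimal illustration of the mechanisms, not a claim about how well they control the trade-off. The function names and hyperparameter values are my own.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_step(w, grad, lr=0.1, weight_decay=1e-4):
    # L2 weight regularization folded into the update:
    # each step shrinks the weights toward zero.
    return w - lr * (grad + weight_decay * w)

def dropout(h, p=0.5, training=True):
    # inverted dropout: zero each unit with probability p during training,
    # rescale survivors by 1/(1-p) so activations keep the same expectation.
    if not training:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

h = np.ones(8)
print(dropout(h, p=0.5))          # some units zeroed, survivors scaled to 2.0
print(dropout(h, training=False)) # unchanged at inference time
```

Both techniques discourage pure memorization by limiting how precisely the network can commit to individual training points, but neither gives a direct, interpretable dial between memorizing and generalizing.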