Master’s Thesis

Convolutional Kernel Optimization for Deep Neural Networks using Constructivist Augmented Machine Learning (CAML) Methodology

Hongjin Yu and Corey Clark

Guildhall, Southern Methodist University

ABSTRACT

Deep convolutional networks have achieved state-of-the-art results in various areas. Notably, Leela Zero, a reproduction of the famous AlphaGo Zero, has achieved superhuman performance. Despite these recent achievements, the inner workings of these networks remain a black box, which has made it difficult to apply human knowledge directly to them. This work proposes a method to introduce human knowledge directly into the network that mimics the instructor-student relationship seen in Constructivist learning theory. It utilizes the Constructivist Augmented Machine Learning (CAML) methodology to replace existing kernels in a DNN with ideal kernels constructed by humans. Initially, mean shift clustering is applied to the convolution kernels to reduce the problem space. This allows the human-in-the-loop methodology to identify and modify convolutional kernels toward an ideal state. Our experiments show that a significant number of network kernels converge towards the ideal kernels in later versions of the network. This demonstrates that humans can identify improved convolution filters and suggests that, with the aid of human knowledge, the networks can be improved upon.

METHODS

The Leela Zero network was chosen for the following reasons:

  • Ideal for clustering
    • Small convolutional kernels of uniform size (3×3)
    • Large number of kernels (5,242,880 kernels)
  • Readily available trained networks
    • Known network strength: successive network versions increase in strength

In this work, 7 networks were chosen (LZ-187 to LZ-193), and clustering was performed only on LZ-190. Kernels were normalized, and the sign of each kernel was conditionally flipped so that the center value is always positive. The norms and signs are saved so that this process can later be reversed to recover the original kernels.
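A minimal sketch of this preprocessing step is shown below; the function names and the use of the L2 norm are illustrative assumptions, not the original implementation:

    import numpy as np

    def normalize_kernel(kernel):
        """Normalize a 3x3 kernel and flip its sign so the center is positive.

        Returns the normalized kernel plus the norm and sign needed to
        reverse the transformation and recover the original kernel.
        """
        norm = np.linalg.norm(kernel)
        if norm == 0.0:
            return kernel, norm, 1.0
        normalized = kernel / norm
        # Flip the whole kernel when the center is negative so that
        # sign-mirrored kernels fall into the same cluster.
        sign = -1.0 if normalized[1, 1] < 0 else 1.0
        return normalized * sign, norm, sign

    def restore_kernel(normalized, norm, sign):
        """Invert normalize_kernel using the saved norm and sign."""
        return normalized * sign * norm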

Mean shift clustering is applied to the normalized kernels. The single input parameter for mean shift (the bandwidth) is adjusted to produce clusters with an average of hundreds of data points. Plotting the top centroids shows that these kernels have recognizable patterns such as pass-through filters, edge detectors, gradients, etc.
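The clustering step might look like the following sketch using scikit-learn's MeanShift; the bandwidth value and the random stand-in data are illustrative (the real run covers millions of kernels, which would require a far more scalable setup):

    import numpy as np
    from sklearn.cluster import MeanShift

    # Stand-in for the real data: each row is a flattened, normalized
    # 3x3 kernel (9 values).
    kernels = np.random.randn(1000, 9)

    # Bandwidth is the single tunable parameter; it is adjusted until
    # clusters average hundreds of data points.
    ms = MeanShift(bandwidth=0.5)
    labels = ms.fit_predict(kernels)
    centroids = ms.cluster_centers_

    # Rank clusters by size to find the top centroids to inspect.
    ids, sizes = np.unique(labels, return_counts=True)
    top_centroids = centroids[ids[np.argsort(sizes)[::-1]]]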

Figure: Centroids of the top clusters

The top 32 centroid kernels are adjusted with the following methods:

  • Sharpening
  • Forcing symmetry
  • Removing noise by setting small values to 0

These adjustments are made based upon human knowledge of what an ideal kernel should look like; e.g., a pass-through filter would have a center value of 1 and all other values 0.
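The adjustments could be expressed as in the following sketch; the noise threshold and the specific symmetry and sharpening rules are illustrative assumptions, since the actual edits were made by hand:

    import numpy as np

    def adjust_kernel(k, noise_threshold=0.05):
        """Apply the three human adjustments to a 3x3 centroid kernel."""
        k = k.copy()
        # Remove noise: zero out values that are small relative to the peak.
        k[np.abs(k) < noise_threshold * np.abs(k).max()] = 0.0
        # Force symmetry: average the kernel with its transpose.
        # (Other symmetries, e.g. left-right, could be enforced the same way.)
        k = (k + k.T) / 2.0
        # Sharpen: rescale so the dominant value sits at exactly +/-1.
        return k / np.abs(k).max()

    # Example: a noisy pass-through filter collapses to the ideal one.
    noisy = np.array([[ 0.02, -0.01, 0.03],
                      [ 0.00,  0.95, 0.01],
                      [-0.02,  0.01, 0.02]])
    ideal = adjust_kernel(noisy)   # center 1.0, all other values 0.0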

A distance threshold of d = 1.2 was chosen empirically. Kernels whose distance to their cluster centroid is less than d are considered 'core kernels'. It is hypothesized that these kernels are more likely to eventually converge to an ideal kernel.
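Selecting core kernels is then a simple threshold on the distance to the centroid, as in this sketch (names are illustrative; Euclidean distance is assumed):

    import numpy as np

    def select_core_kernels(kernels, centroid, d=1.2):
        """Return the kernels within distance d of their cluster centroid.

        kernels: (n, 9) flattened normalized kernels assigned to one cluster.
        centroid: (9,) the cluster centroid.
        """
        dists = np.linalg.norm(kernels - centroid, axis=1)
        return kernels[dists < d]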

Figure: Adjusting kernels to the ideal version (sharpened to emphasize noise)

RESULTS

The average distances of the core kernels to the original cluster centroids are plotted across the different networks. As shown in the graph, the lowest minimum distance occurs at LZ-190 in all but one case. This is not surprising, since the clustering was done on LZ-190. The graph shows that all kernels selected in a cluster converged over time towards the network where clustering was performed, which demonstrates that kernels can evolve towards patterns identifiable by humans through the CAML process.
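The per-network tracking behind these plots can be sketched as follows; the data layout is an assumption, and extracting the kernels from the Leela Zero weight files is omitted:

    import numpy as np

    def average_distance_per_network(networks, core_ids, centroid):
        """Average distance of one cluster's core kernels to a centroid,
        tracked across network versions.

        networks: dict mapping a version name (e.g. 'LZ-187') to an
            (n_kernels, 9) array of that version's normalized kernels,
            with rows aligned so index i is the same kernel in every version.
        core_ids: indices of the core kernels selected at LZ-190.
        centroid: (9,) the original or human-adjusted centroid.
        """
        return {name: np.linalg.norm(k[core_ids] - centroid, axis=1).mean()
                for name, k in networks.items()}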


The average distances of the core kernels to the adjusted cluster centroids are plotted across the different networks. For several of the clusters, the minimum distance has shifted to newer networks; this indicates that the adjusted kernels were closer to a newer version of the network and demonstrates that the human was able to correctly predict the direction the kernels would shift in during training. It can also be noted that several of the graphs are flatter on the right side: while the network did not converge on the adjusted kernel, it diverged from it less than from the original kernel, which also marks an improvement. Finally, 3 of the minimum distances shifted to an older network, indicating that those clusters were diverging from the adjusted kernels.

Of the 32 kernels that were adjusted, 7 had their minimum distance shifted to a newer network and 6 showed flattening in newer networks. Thus 13 of the 32 kernels (40.6%) were improved upon, while 3 (9.4%) were worse than the original; the remaining 16 showed no significant change.

This improvement over the original network shows that humans could predict where the kernels would converge if additional training data were present.

CONCLUSIONS

This work shows that, by utilizing clustering, humans can identify patterns in the convolution kernels of trained networks and can construct ideal kernels that newer networks converge towards. This is, in essence, the transfer of human intuition and pattern-recognition capabilities into a neural network. It mirrors Constructivist learning theory, in which knowledge gained from past experiences is supplemented by social interaction and collaboration with an expert. This suggests that, with the aid of human knowledge, the network training process can be accelerated and the model can be improved even after training. The method shown here could prove especially valuable in cases where training data is limited.
