# Invariant Information Clustering: Tuning Mass Equalization over Prediction Reinforcement for Few Ground-Truth Classes to Avoid Clustering Degeneracy

#### Abstract

Invariant Information Clustering (IIC) presents a principled clustering objective based on maximizing Mutual Information (MI) between paired data samples under a bottleneck, equivalent to distilling their shared abstract content (co-clustering), which tends to avoid degenerate clustering solutions [15]. IIC can be “written as a convolution in the case of segmentation” [15], i.e., pixel-wise classification into perceptually, if not semantically, meaningful regions. The method can be “trained end-to-end and without any labels” while remaining “robust to noisy data from unknown or distractor classes” [15] through auxiliary over-clustering. The driving motivation is to produce cluster assignments that “persist through spatio-temporal or non-material distortion” [15], such as geometric or photometric transformations, by training a “bottlenecked” convolutional neural network to distill shared abstract content that is invariant to perturbations that leave the original image content intact. “Information is the only criteria used” [15]. The MI loss naturally balances prediction reinforcement of pixel-wise class labels against mass equalization of cluster assignments, “preventing degenerate clustering solutions that other methods are susceptible to” [15], in which one cluster may come to dominate or some clusters may vanish during iterative training. As a result, IIC needs neither cluster re-initialization to avoid degeneracy nor cumbersome feature post-processing pipelines such as feature whitening or PCA [21]. However, for small numbers of ground-truth classes, a tunable coefficient can be introduced into the MI loss, skewing the natural balance of its entropy terms to discourage premature prediction reinforcement of cluster assignments, since minimizing the conditional entropy encourages certainty in the probabilistic pixel-wise cluster assignments. Scaling the prediction (marginal) entropy term instead encourages mass equalization, imparting sustained tolerance of ambiguous clustering solutions, as when a single cluster dominates in a perceptually, but not semantically, meaningful way.
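The MI objective described above can be made concrete with a short sketch: given soft cluster assignments for a batch of samples and their perturbed copies, one forms the joint distribution over cluster pairs and computes MI from it. The following is a minimal NumPy sketch under our own naming conventions, not the reference implementation [13]:

```python
import numpy as np

def iic_mutual_information(z, z_prime, eps=1e-12):
    """Mutual information between paired soft cluster assignments.

    z, z_prime: (n, C) arrays of per-sample cluster probabilities
    (softmax outputs) for a batch and its perturbed copy.
    """
    # Joint distribution over cluster pairs, symmetrized and normalized.
    p = z.T @ z_prime / z.shape[0]          # (C, C)
    p = (p + p.T) / 2.0
    p = np.clip(p, eps, None)               # avoid log(0)
    p /= p.sum()
    pi = p.sum(axis=1, keepdims=True)       # marginal over z's clusters
    pj = p.sum(axis=0, keepdims=True)       # marginal over z''s clusters
    # I(z, z') = sum_ij p_ij * log(p_ij / (p_i * p_j))
    return float((p * (np.log(p) - np.log(pi) - np.log(pj))).sum())
```

Perfectly correlated, balanced one-hot assignments over two clusters attain the maximum I = log 2, while uniform (maximally ambiguous) assignments give I = 0, illustrating how the objective rewards both confident and mass-equalized clusterings.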

## 1. Introduction

## 2. Conceptual Background and Literature Review

## 3. Proposed Approach

### 3.1. Overview of Experiment

### 3.2. Avoiding clustering degeneracy

*semantically* meaningful regions. In particular, let us “consider inserting a coefficient” [15], λ, into the entropy expansion of the MI objective, I(z, z′) = H(z) − H(z | z′), yielding λH(z) − H(z | z′): setting λ > 1 emphasizes mass equalization, via the marginal entropy H(z), over prediction reinforcement, via the conditional entropy term.
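The re-weighted objective follows directly from decomposing MI into its entropy terms. Below is a minimal NumPy sketch (function and variable names are ours; see [13] for the reference PyTorch implementation), where `lam` plays the role of the tunable coefficient and `lam = 1.0` recovers plain mutual information:

```python
import numpy as np

def weighted_iic_objective(z, z_prime, lam=1.0, eps=1e-12):
    """lam * H(z) - H(z|z'), computed from the joint cluster-pair
    distribution of paired soft assignments z, z_prime (shape (n, C))."""
    p = z.T @ z_prime / z.shape[0]
    p = (p + p.T) / 2.0                     # symmetrize: marginals coincide
    p = np.clip(p, eps, None)
    p /= p.sum()
    pi = p.sum(axis=1)                      # marginal P(z)
    h_z = -(pi * np.log(pi)).sum()          # H(z): mass-equalization term
    h_joint = -(p * np.log(p)).sum()        # H(z, z')
    h_cond = h_joint - h_z                  # H(z|z') = H(z,z') - H(z')
    return lam * h_z - h_cond               # lam > 1 favors equalization
```

For balanced, perfectly correlated one-hot assignments over two clusters, H(z) = log 2 and H(z | z′) ≈ 0, so the objective evaluates to approximately λ log 2, making the effect of the coefficient directly visible.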

## 4. Discussion of Experiment

## 5. Motivating Extensions

## 6. Conclusion

## 7. Acknowledgements

## 8. References

1. M. I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, and R. D. Hjelm. MINE: Mutual information neural estimation. arXiv preprint arXiv:1801.04062, 2018.
2. M. Caron, P. Bojanowski, A. Joulin, and M. Douze. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 132–149, 2018.
3. A. Coates, A. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 215–223, 2011.
4. T. M. Cover and J. A. Thomas. Entropy, relative entropy and mutual information. Elements of Information Theory, 2:1–55, 1991.
5. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
6. A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox. Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1734–1747, 2015.
7. N. Friedman, O. Mosenzon, N. Slonim, and N. Tishby. Multivariate information bottleneck. arXiv preprint arXiv:1301.2270, 2013.
8. P. Haeusser, J. Plapp, V. Golkov, E. Aljalbout, and D. Cremers. Associative deep clustering: Training a classification network with no labels. In German Conference on Pattern Recognition, pages 18–32. Springer, 2018.
9. R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
10. W. Hu, T. Miyato, S. Tokui, E. Matsumoto, and M. Sugiyama. Learning discrete representations via information maximizing self-augmented training. arXiv preprint arXiv:1702.08720, 2017.
11. K. Y. Hui. Direct modeling of complex invariances for visual object features. In International Conference on Machine Learning, pages 352–360, 2013.
12. M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. Advances in Neural Information Processing Systems, 28:2017–2025, 2015.
13. X. Ji. https://github.com/xu-ji/iic (version b7602b7), 2019.
14. X. Ji, J. F. Henriques, and A. Vedaldi. Invariant information clustering for unsupervised image classification and segmentation: supplementary material, 2019.
15. X. Ji, J. F. Henriques, and A. Vedaldi. Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 9865–9874, 2019.
16. E. G. Learned-Miller. Entropy and mutual information. Department of Computer Science, University of Massachusetts, Amherst, 2013.
17. M. MacKay, P. Vicol, J. Lorraine, D. Duvenaud, and R. Grosse. Self-tuning networks: Bilevel optimization of hyperparameters using structured best-response functions. arXiv preprint arXiv:1903.03088, 2019.
18. H. Mazidi, T. Ding, A. Nehorai, and M. D. Lew. Measuring localization confidence for quantifying accuracy and heterogeneity in single-molecule super-resolution microscopy. In Single Molecule Spectroscopy and Superresolution Imaging XIII, volume 11246, page 1124611. International Society for Optics and Photonics, 2020.
19. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017.
20. K. Sohn and H. Lee. Learning invariant representations with local transformations. arXiv preprint arXiv:1206.6418, 2012.
21. J. Xie, R. Girshick, and A. Farhadi. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pages 478–487, 2016.