Privacy-preserving k-means clustering across multiple parties
K-means clustering is a widely used unsupervised machine learning technique for discovering patterns in data. However, in many practical scenarios, the data is distributed across multiple parties and contains sensitive information, raising significant privacy concerns. In this talk, I will present two recent advances in privacy-preserving k-means clustering: one tailored for horizontally partitioned data and another for vertically partitioned data. We focus on two complementary notions of privacy: Input secrecy, ensuring that nothing beyond the final cluster centroids is revealed about the private inputs. Output privacy, ensuring that even the released centroids leak only a limited amount of information about the input data. Our protocols achieve the strongest known privacy guarantees while also being highly efficient. Remarkably, we improve performance by up to six orders of magnitude compared to the current state-of-the-art. This efficiency boost is achieved by combining secure multi-party computation (for input secrecy) with differential privacy (for output privacy), demonstrating that differentially private k-means can be securely computed more efficiently than exact k-means.