Privacy Preserving Blockchain Mining
A proof-of-useful-work consensus scheme
Hjalmar K. Turesson1, Alexandra Roatis2, Henry Kim1, and Marek Laskowski1
1 Schulich School of Business, York University, Toronto, Ontario, Canada
2 Nuco Networks, Toronto, Ontario, Canada
AIM
Distributed, decentralized and permissionless blockchains are generally secured by some kind of proof-of-work based consensus. The consensus is achieved by a competition where the winning proof demonstrates that computational work has been done and gives the right to append the next block. However, the work done has no application beyond securing the blockchain. Here, we will explore an alternative proof-of-work scheme aimed at harnessing the computational power of the miners towards training machine learning (ML) models, a task with wide applicability [1].
Applying ML to sensitive data, such as medical, personal or financial records, often results in an apparent contradiction: training the models requires access to large and varied data sets while, at the same time, security and privacy need to be preserved. Recent news reports have highlighted repeated failures to achieve the latter. However, advances in cryptography have produced new tools that allow operations on encrypted data, promising both access and maintained privacy and security. One such tool is homomorphic encryption (HE). In contrast to conventional encryption schemes, HE allows mathematical operations on the encrypted data without requiring access to the decryption key [2]. As an illustration, the ciphertexts obtained by HE of the numbers 2 and 3 can be “added”, and the resulting ciphertext returns 5 when decrypted. However, ML on HE data (ML-HE) is currently computationally very expensive, limiting its spread and development.
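The 2 + 3 illustration can be made concrete with a toy example. The sketch below uses the Paillier cryptosystem, which is additively homomorphic; the choice of Paillier, the tiny demo primes and all function names are assumptions made here for illustration, not the HE scheme used in this proposal, and the parameters offer no real security.

```python
import math
import secrets

# Toy Paillier parameters. The primes are hypothetical demo values and are
# far too small for real use.
P, Q = 1789, 1861
N = P * Q                                        # public modulus
N2 = N * N
G = N + 1                                        # common choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                             # private: inverse of LAM mod N


def encrypt(m):
    """Encrypt an integer 0 <= m < N under the public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1         # random blinding factor
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt a ciphertext with the private key (LAM, MU)."""
    l_value = (pow(c, LAM, N2) - 1) // N
    return (l_value * MU) % N


def add_encrypted(c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % N2


c2, c3 = encrypt(2), encrypt(3)
print(decrypt(add_encrypted(c2, c3)))            # prints 5
```

The party calling add_encrypted never touches LAM or MU, which is the property that lets miners compute on data they cannot read.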
By developing a new hybrid consensus scheme relying on ML-HE data for its proof-of-work component, we aim to create a marketplace for secure and privacy-preserving data mining, capable of bootstrapping the spread and development of ML-HE.
METHOD
Core to our proposal is structuring the mining problem as an ML competition, where the miner that best predicts the test set targets wins and gets to assemble the block. This requires a hybrid consensus scheme made up of stakers and miners. The scheme relies on the actions of four parties: 1) a data provider, 2) miners, 3) an organizer controlling the block timing and 4) block validators. In short, the data provider preprocesses the data and makes it available to the network. The miners train ML-HE models on the provided data and submit their predictions for the test data. The organizer, composed of a set of stakers using threshold encryption and signing schemes, ends the submission period by releasing the test targets and thereby controls the block timing. Finally, the validators accept the block with the best proof, made up of the test targets and the best predictions.
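The validators' choice of the best proof can be sketched as follows. The text does not specify the scoring metric or how ties are resolved, so mean squared error, the tie-break on miner identifier and the names select_winner and mean_squared_error are all assumptions made for this example; predictions and targets are treated as plain numbers that can be compared directly.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and released targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def select_winner(submissions, targets):
    """Return (miner_id, error) for the submission with the lowest error.

    submissions maps a hypothetical miner identifier to its list of test
    predictions. Ties go to the lexicographically smallest identifier,
    which is an arbitrary choice made for this sketch.
    """
    scores = {
        miner: mean_squared_error(preds, targets)
        for miner, preds in submissions.items()
    }
    winner = min(sorted(scores), key=scores.get)
    return winner, scores[winner]


# Example: two miners predicting three binary test targets.
released_targets = [1, 0, 1]
submissions = {
    "miner_a": [0.9, 0.2, 0.8],
    "miner_b": [0.5, 0.5, 0.5],
}
winner, error = select_winner(submissions, released_targets)
print(winner)  # prints miner_a
```

Because the score depends only on the released test targets and the submitted predictions, any validator can recompute it independently and reach the same winner.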
PROTOCOL OVERVIEW
Arbitrarily long in advance of block n-1, the data provider performs five tasks: 1) encrypts the data set with an HE scheme, 2) splits the HE data set into training and test sets, 3) hashes the test targets together with a nonce, 4) encrypts the test targets with the organizer's public key, and 5) commits, via a contract, to a payment for the winning model. The data provider then releases the HE training data and test inputs, the threshold-encrypted test targets and the hashed test targets. Based on the data provider's payment commitment, the miners select a data set to mine. The miner winning block n-1 includes the test target hash for block n in the header, thereby tying the following block to a particular data set. After the publication of block n-1, the miners can submit blocks with their test target predictions for block n. Submissions are signed by the organizer up until the deadline, at which point the organizer decrypts the encrypted test targets and the validators evaluate the submissions and accept the winning block.
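Step 3 and the later release of the test targets together form a commit-and-reveal pattern: the hash in the block header binds block n to one set of targets before anyone can see them. A minimal sketch follows, assuming SHA-256 and a JSON serialization of the targets; neither the hash function nor the encoding is specified in the text, and the function names are hypothetical.

```python
import hashlib
import json
import secrets


def commit_targets(targets, nonce):
    """Hash the test targets together with a nonce (the published commitment)."""
    payload = nonce + json.dumps(targets).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify_reveal(commitment, targets, nonce):
    """Check that revealed targets and nonce match the earlier commitment."""
    return commit_targets(targets, nonce) == commitment


# Data provider, before block n-1: commit to the targets.
nonce = secrets.token_bytes(32)      # keeps low-entropy targets from being guessed
targets = [1, 0, 1, 1]
commitment = commit_targets(targets, nonce)

# Validators, after the organizer releases the targets at the deadline.
print(verify_reveal(commitment, targets, nonce))        # prints True
print(verify_reveal(commitment, [1, 1, 1, 1], nonce))   # prints False
```

The nonce matters because test targets such as class labels come from a small set of values; without it, a miner could recover the targets by hashing candidate label vectors.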
RESULTS AND CONCLUSION
This scheme provides two incentives for miners. The incentives are aligned but work on different time scales. The short-term incentive is the fixed block reward; over a longer time period, miners are also incentivized to perform a service for the data providers. The data provider pays a variable part of the total mining reward, the size of which depends on the value of the model to the data provider. Thus, beyond simply winning the block, it is in the miners' long-term interest to provide good models to data providers. It is worth noting that, after the block is won, the resulting model is valuable only to the party that can decrypt its output, that is, the data provider. Thus, there is no incentive to steal or withhold the model from the data provider.
Finally, by rewarding the winning miner with both the block reward and the data provider's payment, the mining work is allowed to be more costly than the trained model is worth to the data provider. Currently, ML-HE is computationally much more expensive than ML on unencrypted data, which limits its adoption. By offering a greater total reward, we hope to create an economic incentive that will stimulate the development of more efficient ML-HE algorithms.
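The reward argument can be written compactly. The symbols below are introduced here for illustration and do not appear in the original; the sketch considers only the winning miner and ignores the probability of winning.

```latex
% Illustrative symbols (assumptions, not notation from the paper):
%   R_b : fixed block reward
%   P_d : payment committed by the data provider
%   V   : value of the trained model to the data provider
%   C   : computational cost borne by the winning miner
\begin{align}
  P_d &\le V
      && \text{(the provider pays at most what the model is worth to it)} \\
  C &\le R_b + P_d
      && \text{(mining is worthwhile for the winner)} \\
  V < C &\le R_b + P_d
      && \text{(possible whenever } R_b > V - P_d \text{)}
\end{align}
```

The last line is the case described above: the work may cost more than the model is worth to the data provider, with the block reward covering the difference.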
REFERENCES
[1] M. Spoke and Nuco Engineering Team, “Aion: The third-generation blockchain network,” 2017 [Online] Available: https://aion.network/media/en-aion-network-technical-introduction.pdf
[2] L. J. M. Aslett, P. M. Esperança and C. C. Holmes, “A review of homomorphic encryption and software tools for encrypted statistical machine learning,” 2017 [Online] Available: https://arxiv.org/abs/1508.06574v1