Supported by a global cultural shift in how individuals view the ownership of their personal data, new privacy-based data management technologies are being developed that are changing the way we leverage and share our personal data without the need to completely expose it. Businesses are reluctant to release their hold on our data until they see further maturity of this technology and its full benefits but are slowly coming around to accept this new paradigm.
To understand why these new technologies are a win-win for both individuals and businesses, it’s helpful to first get some perspective on recent data privacy breaches.
Whether it was the ransoming of Uber’s 57M rider and driver accounts, the hack of 1B Yahoo email accounts (now being revised to 3B accounts), the heist of 143M individuals’ unencrypted financial and credit data from Equifax, or the 50M US Facebook accounts sold without consent to Cambridge Analytica, the result was the same: non-anonymized personal data was put into the hands of people we didn’t approve of, to be used whenever and however the recipient chose. The exposure of our daily activities in Uber’s breach gives the acquirer the ability to rewind our lives to any point in time: where we’ve been, when, and for how long. The violation of communications privacy with the Yahoo data provided direct access to our interactions within social, professional, and personal networks. Equifax’s breach laid bare our personal finances, offering insight into our primary assets and, even more importantly, our liabilities. Although not a data breach in the technical sense, the Facebook-Cambridge Analytica data sold without consent (and now similar findings with analytics firm Crimson Hexagon) resulted in unabridged access to our personal demographics and affinities. The trove of data accessible to these firms gives them insight into our personal social graphs. Social graphs include data beyond basic attributes like sex, age, and income. They can answer questions like: Is someone an introvert or an extravert? Do they like women, men, or both? How liberal are they, and do they believe in G-d? If you think this kind of leakage is all just hypothetical, a breach at the marketing firm Exactis last month touched almost every American with this exact type of data.
For businesses using our data for legitimate purposes, new regulations like the General Data Protection Regulation (GDPR) in the EU and now the California Consumer Privacy Act of 2018 are beginning to address the issue of consent. One outcome of these regulations is clearer data usage policies and greater individual control over how our data can be used. Unfortunately, consent doesn’t cover something even more important: the incredible vulnerability of storing consolidated silos of personal data in one place.
To protect these data silos, businesses will spend some $96B globally in 2018 on cybersecurity software and services, an increase of 8% from 2017. Over the next five years, enterprise cybersecurity spending is expected to rise 10% year-over-year.
In addition to threat prevention and intrusion detection, symmetric encryption (a single key for encryption & decryption of data) is the most commonly used approach to protect the data itself. But depending on the protection services implemented, personal data like passwords are still likely to be handled in plain text at some stage of the encryption process, opening up opportunities for internal data leakage. No matter your prevention, detection, or encryption techniques, the number one source of breaches remains human error. Using phishing techniques, hackers can acquire account passwords, administrative logins, and decryption keys. And leaving all of the data in one central place still makes it an easy target.
In the 1983 techno-thriller WarGames, an intelligent computer system called WOPR (War Operation Plan Response) takes over a US nuclear arms facility and runs multiple simulations of what it thinks is a game of Global Thermonuclear War. Ultimately WOPR comes to the conclusion that the only way to win the ‘game’ is simply not to play. With an endless litany of data breaches worldwide, and the pace quickly accelerating, isn’t it about time we learned that part of the answer to addressing the problem is also to simply not play?
Why put all that data in one place if the data can be provided in a way that lets companies use it without having to store it? Businesses already use a similar approach for credit card processing. Companies like Google, Samsung, and Apple provide what are called tokenized payment services. During the payment process, instead of processing an individual’s actual credit card number, tokenized payments generate a temporary one-time use number, making it possible for the retailer or servicer to verify and maintain a record of the transaction without the risk of the actual credit card information being hacked. This significantly reduces risk for the individual and especially for the business, not to mention the reduction in insurance and financial compliance costs.
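To make the idea concrete, here is a minimal sketch of a one-time token service in Python. It is only an illustration of the description above, not how Google, Samsung, or Apple actually implement their services; the `TokenVault` class, its method names, and the sample card number are all hypothetical.

```python
import secrets


class TokenVault:
    """Hypothetical token service that keeps card numbers away from merchants."""

    def __init__(self):
        # Maps each one-time token to the real card number.
        # Only the token service ever holds this table.
        self._cards_by_token = {}

    def issue_token(self, card_number: str) -> str:
        """Return a random single-use token that stands in for the card."""
        token = secrets.token_hex(8)
        self._cards_by_token[token] = card_number
        return token

    def redeem(self, token: str) -> str:
        """Resolve a token exactly once; a replayed token is rejected."""
        card_number = self._cards_by_token.pop(token, None)
        if card_number is None:
            raise KeyError("unknown or already used token")
        return card_number


vault = TokenVault()
token = vault.issue_token("4111 1111 1111 1111")  # sample test number

# The merchant stores only the token as its record of the sale.
merchant_record = {"order": "A-1001", "payment_token": token}

# The payment network resolves the token once to charge the real card.
charged_card = vault.redeem(token)
```

If the merchant’s records are stolen, the thief gets a token that has already been used and cannot be redeemed again.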
So how can a similar approach be used for personal data?
Zero-Knowledge Architecture – The Brains Behind It
The concept of Zero-Knowledge Architecture (ZKA), at its simplest, is any approach that allows someone or something to verify a set of data (aka ‘knowledge’) so it can be used for a particular purpose without giving the service visibility into the data itself.
The term Zero Knowledge (ZK) is often used very loosely, but one implementation of the ZK concept refers to the practice of encrypting data to ensure only the provider of the data can view the data itself. Examples of this include Sync.com, pCloud, SpiderOak, Tresorit, and MEGA, where public-private key encryption is used for cloud data storage and backup services such that the owner of the data holds a private key that can be used to read or decrypt data encrypted with the matching public key, which is managed by the servicer. SpiderOak is even trying to establish a set of Zero-Knowledge Privacy Standards.
Alternative approaches to asymmetric cloud storage are being implemented by companies like Storj and SAFE Network that truly decentralize and distribute personal storage by leveraging blockchain technology and incentivizing the crowd to store and care for your data.
Zero-knowledge capabilities are also beginning to appear in database platforms. Using a technique known as homomorphic encryption, one can perform database computations and functions directly on encrypted data without having to decrypt it. Microsoft, Google, and SAP have already incorporated this kind of functionality in their database products, building on CryptDB, an open source system developed at MIT.
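As a rough illustration of computing on encrypted data, the sketch below implements a toy version of the Paillier cryptosystem, a well-known scheme that is homomorphic for addition only. This is an assumption chosen for illustration, not a description of how CryptDB or any vendor product works internally, and the primes are far too small for real use. The function names are hypothetical.

```python
import math
import secrets

# Toy Paillier parameters. Real deployments use primes of 1024+ bits.
P, Q = 104729, 1299709
N = P * Q
N_SQUARED = N * N
G = N + 1  # a standard, simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)  # modular inverse; valid because G = N + 1


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(ciphertext: int) -> int:
    """Recover the plaintext using the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQUARED)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N_SQUARED


# A server can total two encrypted balances without ever seeing them.
encrypted_total = add_encrypted(encrypt(1200), encrypt(345))
total = decrypt(encrypted_total)  # only the key holder can do this step
```

The party doing the addition handles nothing but ciphertexts; only the holder of the private values can read the total.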
The term Zero-Knowledge comes from the work of three cryptography researchers from MIT and the University of Toronto who were working on a paper in 1985 called “The Knowledge Complexity of Interactive Proof Systems”. Shafi Goldwasser, Silvio Micali, and Charles Rackoff wanted to show that through a series of interactive challenge questions, a provider of some knowledge, the prover, could demonstrate to a verifier that a proposition about the knowledge was valid. They also wanted to ensure that in the process, the questions or challenges didn’t reveal any further knowledge about the information other than its veracity. The result was the concept of Zero-Knowledge Proofs.
To verify the data is complete and accurate, the verifier plays a game similar to 20 questions with the prover (the supplier of the knowledge) using mathematically masked pieces of the data itself. Each time a question (aka challenge) is asked, the data (knowledge) is slightly twisted to prevent the verifier from piecing together the entire set of data. The more questions the prover answers correctly, the more likely it is that the data the prover holds is complete and accurate.
Here’s how it works. Imagine you are blindfolded and I hand you a closed box that has two sections on the inside. I tell you that on one side I have put slices of a red Honeycrisp apple (my favorite), and on the other side I have put slices of a green Granny Smith apple (my second favorite kind). How can I prove to you that I have given you two different types of apples without you being able to see them or taste them?
Zero Knowledge Box
One approach would be for you to reach into the box, hidden by the lid, and choose two slices: either both from the left, both from the right, or one from each side. You then pull the pieces out, put them on a plate, and ask me if the pieces came from one apple or two apples. Because I can see the skin color (or taste them if I was hungry), I can tell you if the pieces were from one apple or two different apples. But wait, if I couldn’t really tell the apples apart, I would still have a 50/50 chance of guessing right, so I could be lying to you. So to be sure, we run through the exercise multiple times. Probabilities tell us that with each round, my chances of guessing correctly on every consecutive try are cut in half, so by your 20th question, my chances of getting all 20 challenges in a row correct by luck are 1 in 1,048,576, or about 1 in a million!
You’ll never really know for sure if I’m telling the truth, but the odds that I’m bluffing go down dramatically the more you challenge me and I answer correctly. This is the concept behind interactive zero-knowledge proofs. Depending on the type of data or knowledge, the logic behind each challenge is slightly different, but the output of each challenge is either true or false.
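The arithmetic behind the apple game can be checked with a short simulation. This is a minimal sketch of the thought experiment only; the function names and the way the box is modeled are assumptions made for illustration.

```python
import random
from fractions import Fraction


def bluff_probability(rounds: int) -> Fraction:
    """Chance that a prover who is only guessing passes every round."""
    return Fraction(1, 2) ** rounds


def run_protocol(box_has_two_kinds: bool, rounds: int, rng: random.Random) -> bool:
    """Return True if the prover answers every challenge correctly."""
    for _ in range(rounds):
        # The verifier secretly takes both slices from one side,
        # or one slice from each side.
        picked_from_both_sides = rng.choice([True, False])
        if box_has_two_kinds:
            # The colors differ exactly when the slices came from both sides.
            says_two_apples = picked_from_both_sides
        else:
            # All slices look alike, so the prover can only guess.
            says_two_apples = rng.choice([True, False])
        if says_two_apples != picked_from_both_sides:
            return False  # caught on this round
    return True


rng = random.Random(2018)
honest_passes = run_protocol(True, 20, rng)  # an honest prover never fails
print(bluff_probability(20))  # 1/1048576
```

An honest prover passes every time, while a bluffer’s chance of surviving all 20 rounds is exactly one half multiplied by itself 20 times.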
A real-world example might be that a business wants to identify customers within a five-mile radius of its store. It then wants to send those customers a promotional coupon, but the customers don’t want to reveal their exact location.
The business might also want to verify a customer has enough money in their bank account to pay for a product or service without revealing how much money is in their bank account.
It turns out that this last example is exactly the kind of verification that is required for a Bitcoin transaction. The catch is that the verifiers (or miners) must know your balances in order to conduct these transactions. In the finance sector, it is this transparent processing required by blockchain transaction verification that has kept several financial institutions from using blockchains to conduct asset transfer and settlement.
In practice, because blockchains reveal too much about the transactional activity between peers, usage of this technology is often restricted since it would reveal too much from a competitive standpoint. And more often it is not private enough from a regulatory standpoint.
When we use Bitcoin to make a payment, a random string of 26-35 characters is used to represent the key of the sender and of the recipient, so the parties involved are not technically revealed, but the bulk of the transaction is still transparent. Visible are the transaction amounts, the balances of the sender and recipient, the asset involved, etc. One could even argue that the true identities of senders and receivers are not really hidden on the Bitcoin network, since they can be determined through network analysis using AML and black market detection services like Chainalysis and Skry.
Because of this transparency issue with standard blockchain technology, several efforts are underway to apply ZK Proofs to blockchain transactions. These efforts make the transactions completely private but still able to leverage the consensus power of public blockchains. A major hurdle was that ZK proofs require the interaction of challenges between the prover and verifier, and this type of interactivity is not generally possible on the blockchain. So how can you run a ZK Proof on the blockchain without the need for interaction?
ZK-Snarks, Bulletproof & Hawk Oh My! – The Heart of ZK-Tech
In 1988, computer science researchers Blum, Feldman, and Micali extended ZK proofs to incorporate a non-interactive approach. Their proofs required only a common string of characters known by both the prover and verifier to demonstrate that the prover had some knowledge or data, without any information leakage and without the need for a verifier to pose multiple challenges. Thus they were non-interactive. Later refinements that made such proofs short and quick to verify became known as zero-knowledge succinct non-interactive arguments of knowledge, or ZK-Snarks for short.
By using a prover’s electronic signature (a hash of a random number & private key) as the common string, ZK-Snarks could be used within blockchain transactions to hide the sender, receiver, amounts, and type of asset while still permitting miners to use the Snark computation to demonstrate that the transaction was valid. Decentralized anonymous payment (DAP) coins build on this approach, incorporating ZK-Snark and Snark-like technology into their transactions to provide zero-knowledge payment services.
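A full ZK-Snark is far too involved to show here, but the core trick of removing the back-and-forth can be sketched with a much simpler construction: a Schnorr proof of knowledge made non-interactive by hashing (the Fiat-Shamir heuristic). To be clear, this is a different and simpler technique than a Snark, swapped in for illustration. The group parameters are toy-sized assumptions and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy group: P = 2*Q + 1 with P and Q both prime, and G generating the
# subgroup of order Q. These numbers are far too small for real use.
P, Q, G = 2039, 1019, 4


def challenge(public_key: int, commitment: int) -> int:
    """Hash the public values; this replaces the verifier's live question."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int):
    """Prove knowledge of `secret` behind G**secret mod P without revealing it."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q - 1) + 1  # fresh randomness for every proof
    commitment = pow(G, nonce, P)
    response = (nonce + challenge(public_key, commitment) * secret) % Q
    return public_key, (commitment, response)


def verify(public_key: int, proof) -> bool:
    """Check the proof using public values only; no interaction is needed."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


public_key, proof = prove(secret=777)
is_valid = verify(public_key, proof)
```

Because the challenge is derived from a hash rather than asked by a live verifier, anyone, including a miner, can check the proof later without ever talking to the prover.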
Like flying monkeys, DAP coins incorporating variations on the use of ZK-Snarks seem to be appearing everywhere. Examples of these are ZCash (ZEC), 0x (ZRX), ZenCash (ZEN), ZeroCoin (ZER), PIVX (ZPIV), Komodo (KMD), ByteCoin (BCN), and Cardano (ADA).
Bitcoin has three plain text columns for each transaction: the sender’s address, the receiver’s address, and the amount of the transaction. The ZCash DAP coin has the same three columns, but all are encrypted. Additionally, ZCash has a fourth column made up of a ZK-Snark proof, which miners use to validate the transaction. No one could have created the three encrypted values and the proof in the transaction unless they had a secret key that also controlled sufficient value to cover the amount being transacted. In the event that a regulator wants a complete picture of an individual’s transactions, ZCash provides the ability to generate viewing keys that can be distributed to other parties for read-only access.
Lastly, Zcash uses homomorphic encryption. As discussed earlier, homomorphic encryption enables calculations and functions to be performed on multiple encrypted transactions at one time. New York State’s Department of Financial Services recently named Zcash as one of the six approved cryptocurrencies on the heavily regulated Gemini exchange, which is run by the infamous Winklevoss twins. Full details on ZCash’s use of ZK-Snarks can be found here.
It’s important to note that a ZK-Snark proof requires more CPU and memory than a Bitcoin transaction and takes longer to process. A Bitcoin transaction, usually about 0.3KB in size, generally takes less than 1 millisecond to process, while one from ZeroCoin, for instance, requires approximately 45KB of storage and 0.5 seconds to process. Zcash, while significantly better, still requires 1KB of storage and about 6 milliseconds to process.
Not to be outdone by ZCash, in October 2017 Ethereum introduced an update to its blockchain network called Byzantium, which supports private transactions that also use ZK-Snark verification. The first transaction using a ZK-Snark from Ethereum on their test network can be viewed here.
If you think DAP coins are just a fad, Coinbase recently announced 5 coins it is evaluating for addition to its exchange, and 3 of the 5 are DAPs: Zcash, Cardano, and 0x.
Major financial services firms have also announced support for blockchain ZK proofing in their transactions. Last year, JPMorgan introduced Quorum, an Ethereum-derived, permissioned blockchain platform which integrates a zero-knowledge security layer (ZSL) into its enterprise blockchain. Multinational banking and financial services company ING also launched its own zero-knowledge range proof (ZKRP) service. ING asserted that its ZK platform is 10X more efficient than other privacy-based transactions on the Ethereum network.
But not all coins use Snarks for anonymous payment processing. Monero (XMR) leverages multiple signers, or Ring-Signatures, for every transaction to mask the address of the sender and encrypt the transaction amount. It also creates a one-time Stealth-Address based on the recipient’s public address to hide the receiver. You can read more about Monero here. Some currencies like Verge (XVG) incorporate anonymous payment by hiding the sender’s and receiver’s IP addresses using the Tor anonymity network originally funded by DARPA. Dash (DASH) has a private send function which uses Dash masternodes that look for transactions from multiple senders that can be broken down into similar common denominations and then mix those amounts into an anonymous address from which payments are made.
One might ask why we need a blockchain in the first place if a ZK-Snark can prove the validity of a transaction. In addition to the need to maintain a historical record of transactions via an impartial party (the public), the blockchain is necessary because, given that all transaction data is encrypted, we need open agreement on the proofs themselves, i.e., that they are being verified and accepted by the receiving parties.
But using consensus has a potential vulnerability. If any one individual or group controlled 51% of the hashing power of the network, they could essentially approve fake transactions to inflate their account balances. This is known as a 51% attack. Five cryptocurrencies were recently attacked in this way, including ZenCash, from which $500K USD was stolen through double-spending. There has also been suspicion that the Chinese company Bitmain, along with others, has been amassing mining computers that use special processor chips called ASICs (short for application-specific integrated circuits), which target the hashing algorithm of a particular currency and enable them to outperform miners using standard CPUs. Some DAPs like PIVX use a different kind of consensus method, Proof-of-Stake, instead of the traditional Proof-of-Work to be more resistant to 51% attacks. Monero recently changed its hashing algorithm to prevent the use of ASIC chips, and surprisingly the hashing power of the entire network dropped by a staggering 70%, meaning that 70% of the computing power behind Monero was probably coming from these specialized chips outcompeting standard miners.
Another challenge of DAP coins like ZCash is that they rely on a trusted setup process in which two long public keys are derived from a single randomly generated private one. It’s critical that the private key is destroyed during this setup process, since anyone who possesses it can forge the proofs on which the system relies. In the case of Zcash, the private key was created in October of 2016 using an elaborate ceremony involving several well-known individuals from the cryptocurrency world, each of whom had only a partial view of the private key. This means that the Zcash network could be compromised if the participants wanted to collude maliciously. Ethereum’s implementation of ZK-Snarks, along with most DAP coins, also required a similar trusted setup. Because the output of this kind of setup could corrupt the network, it’s often referred to as toxic waste.
To solve the problem of toxic waste in a trusted setup for ZK-Snark based networks, a group at Stanford developed a variant called Bulletproof. Unlike the Snarks used by ZCash, Bulletproof proofs don’t require a trusted setup and thus produce no toxic waste. The downside of Bulletproof is that proofs can take significantly longer to generate than those that rely on trusted setups. To offset this, Bulletproof can dramatically reduce transaction size and thus transaction fee costs. According to the Bulletproof whitepaper, if Bitcoin were to adopt Bulletproof, it would reduce its transaction size by a factor of 10, from 10KB for an average transaction to 1KB. As proof of its support, Monero just recently decided to adopt the Bulletproof ZK protocol.
Another area of blockchain privacy development is within smart contracts. Smart contracts enable the processing and enforcement of conditional rules directly on the blockchain. They can involve one or more parties and can be triggered by events such as a stock price crossing a threshold, a baseball team winning a game, or simply a prescribed amount of time passing. Once an event or events are triggered, the contract can disburse funds it is holding. The primary value add of Ethereum is its ability to execute smart contracts. The problem is that if you want the advantage of using a public blockchain, your smart contract rules, and any transactions they generate, are visible to anyone. Andrew Miller at University of Illinois, Urbana-Champaign has developed a ZK algorithm for smart contracts called Hawk. Hawk works much the same way ZenCash does for transactions. It keeps the contract code, the data sent to the contract, and the money sent and received by the contract private from the public. Hawk would enable blockchains like Ethereum to execute these contracts with public consensus but private execution.
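To picture what a conditional rule looks like, here is a plain-Python sketch of a contract that holds funds and releases them when a price crosses a threshold. It is only a simulation under simple assumptions: it is not Solidity, not Hawk, and not tied to any real blockchain, and the class and field names are hypothetical.

```python
class PriceTriggeredEscrow:
    """Hypothetical contract: pay `payee` once the price reaches `threshold`."""

    def __init__(self, payee: str, amount: int, threshold: float):
        self.payee = payee
        self.held = amount  # funds locked inside the contract
        self.threshold = threshold

    def on_price_update(self, price: float, balances: dict) -> bool:
        """Disburse the held funds the first time the trigger condition is met."""
        if self.held == 0 or price < self.threshold:
            return False
        balances[self.payee] = balances.get(self.payee, 0) + self.held
        self.held = 0
        return True


balances = {"bob": 0}
contract = PriceTriggeredEscrow(payee="bob", amount=100, threshold=50.0)

first = contract.on_price_update(49.5, balances)    # below threshold, no payout
second = contract.on_price_update(50.25, balances)  # trigger fires, funds move

# On a public chain, everything in this state would be readable by anyone.
public_state = vars(contract)
```

Note that the payee, the threshold, and the amount are all sitting in plain view in `public_state`, which is exactly the exposure a private-execution approach is meant to remove.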
We’ve shown that it’s possible to use zero-knowledge techniques to protect personal data on public storage using public-private keys, and to validate blockchain transactions and execute blockchain contracts with complete privacy using implementations of non-interactive ZK-Proofs. The next step is providing an easy on-ramp for individuals to authenticate themselves and interact with ZK services in their daily lives.
Personal Data Wallets – The Courage to Decentralize
If we want to begin seeing a decline in personal data breaches, it makes sense to limit the amount of data that needs to be stored in these centralized silos in the first place. The ZK technologies we’ve already discussed, along with blockchain, make it possible to shift control of the data back to its owners, enabling them to maintain the master copy of their own data and share only what is necessary.
Unfortunately, the biggest hurdle to this is not technical but more cultural.
It’s only instinctual that businesses want to hang on to, protect, and control their property. But this property is really our personal data and the analytics generated from our activities. Businesses need to be shown that returning ownership of personal data to the individual is in their best interest, that there’s a light at the end of this tunnel.
A system of having fiefdoms owning and protecting property and resources generated by the masses should sound familiar. It’s essentially the feudal system of the Western European Middle-Ages.
The Feudal System of the Middle Ages Europe – Img src: jenkinsmiddleages.weebly.com & timeref.com
The feudal system was eventually turned on its head by the concept of constitutional systems in which the government served the people instead of the other way around. In part, France’s Declaration of the Rights of Man and of the Citizen guided these new systems to ensure individuals were provided universal rights and freedoms, including the right to free speech and the right to own and profit from one’s property. This way of thinking led us out of the dark ages into a period of enlightenment also known as the Age of Reason. We are just beginning to see this shift happening in the cyberworld of today.
A central component to this shift is in how we share and store our data. Whether it be financial, social, communications, or health related, our data tends to be siloed with the services from which we use or generate that data. Our social-graphs are with Facebook, our shopping information with Amazon, our financial information with our banks and investment services. We are all becoming fast familiar with the concept of crypto wallets to hold our Bitcoin, but a new kind of wallet is developing that will not only manage our finances, it will also manage our identities and personal data. In addition to storing crypto assets and personal data, a personal data wallet can provide the ability to authenticate our identity and interact with financial and public services.
We are starting to see development of blockchain digital wallets with identity services like SelfKey, Civic, VALID, and uPort. To aid and expedite the regulatory process financial institutions are required to go through when verifying their customers, commonly known as Know-Your-Customer or KYC, SelfKey is developing an ecosystem of banks, exchanges, credits, and government services that integrate with the SelfKey personal wallet. Normally, KYC processes require distributing sensitive government, legal, and tax-related documents amongst multiple parties. SelfKey is using the blockchain to verify these documents for all parties rather than pass around the actual documents.
SelfKey can also be used to provide attestations. For instance, a SelfKey wallet could be used to tell your mortgage broker that your credit score is above 750, that you manage total assets worth over $250,000, and that your last income tax report had net income above $100,000, without revealing the score, account balances, or tax documents themselves. uPort is currently working on efforts to provide the residents of Zug, Switzerland, with a blockchain-based ID that operates in much the same way. Known as Zug ID, it can be used to interact with local banks, retailers, and government services.
Data management oriented wallet services like MeeCo, DigiMe, DataWallet, and Nuggets are also popping up that provide mechanisms for collecting, sharing, making purchases, and in some cases, monetizing one’s own personal data. These systems allow data to be stored decentrally either using personal data storage services like Dropbox, or by using a combination of blockchain and ZK cloud data storage.
A barrier for all of this is establishing identity and data sharing standards. Organizations like the Decentralized Identity Foundation (DIF) are currently working to address these. Once these standards along with common ZK standards are established and accepted, we’ll begin to see a merger of data wallet functionality and greater adoption.
When the Internet was first developed in the late ’60s and early ’70s, it was far from mainstream. It wasn’t until standards like TCP/IP were developed in the ’80s and more user-friendly tools like the web browser were created in the ’90s that it gained wide adoption. Since the blockchain has been gaining ground over the last decade, the big question has been, what will be the internet browser of the blockchain? It’s this author’s guess that, buoyed by decentralized identity verification standards and zero-knowledge data sharing capabilities, it will likely be the personal data wallet.
Data wallets will provide a decentralized approach to enabling us to pay our bills, pay each other, apply for jobs, share data with our healthcare provider, form and manage business agreements, interact with friends and family. We will even have the ability to earn rewards for allowing services to use our data with or without anonymity. All of this in a safe, privacy compliant, zero-knowledge oriented way.
Within the next decade, we should expect to see the tipping point of ZK technology maturity and the acceptance by businesses of individuals’ sovereignty over their personal data and identities. What follows will be a terrific battle among several players to dominate the data wallet market.
New data privacy regulations like the EU’s GDPR and the California Consumer Privacy Act will help move the needle in the adoption of more transparent and consent-based exchanges of personal data, but businesses will need to recognize the benefits of shifting ownership of data and identity back into the hands of the individual. Without this shift, large product and service organizations will continue to see their centralized stores of personal data pilfered, and public outrage over personal data privacy violations will grow. Zero-knowledge technologies that leverage mathematics to enable businesses to use personal data for verifying identity, engaging in contracts, and conducting commerce without the need to reveal or even store the data itself are evolving to address these issues. Once users are given back more control and greater privacy of their data, the synergy of zero-knowledge technologies, the blockchain, and personal data wallets will make a dramatic impact on the way we interact with each other, leverage online services, and conduct everyday business, catapulting the blockchain into a mainstream technology in the same way we think of the Internet today.