Data Contracts for Privacy

The discussion around Data Contracts might be missing an important perspective: how to foster collaboration outside of tech and data.

can I haz data?
A kitten AND data contracts is all the hype you need right now.

Oops, we did it. We’re chiming in on the data contracts discussion.

Over the past weeks the chatter around data contracts has been building. Some of the most-heard voices (one, two) in data have been explaining, advocating, demonstrating and arguing against the utility of data contracts and the problems they (aim to) solve. But one important perspective is generally left out: many modern data problems span more than just the technical perspective. Data contracts can help to involve stakeholders beyond the data folks.

But wait, data contracts?

A little primer: a data contract is simply a data schema (which data?) annotated with an agreement on what should be inside - for instance that a customer ID consists of 9 digits prefixed by a “C”. Being an agreement, the parties involved sign off on it. Like “I’m buying this house for 400k plus tax and I expect nothing will be wrong with it” - but for data.
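To make the primer concrete, here is a minimal sketch of the customer ID rule expressed as a checkable contract. The contract layout and the names `CUSTOMER_CONTRACT` and `violations` are assumptions made up for this illustration, not the format of any particular data contract tool.

```python
import re

# Hypothetical contract: field name -> pattern the value must match.
# Only the customer ID rule ("C" followed by 9 digits) comes from the text.
CUSTOMER_CONTRACT = {
    "customer_id": re.compile(r"C\d{9}"),
}


def violations(record, contract):
    """Return a list of human-readable contract violations for one record."""
    problems = []
    for field, pattern in contract.items():
        value = record.get(field)
        # A missing or non-string value counts as a violation too.
        if not isinstance(value, str) or not pattern.fullmatch(value):
            problems.append(f"{field}: {value!r} does not match {pattern.pattern}")
    return problems
```

An empty list means the record honours the agreement; anything else is something the parties who signed off can point at.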

Based on the agreement, you can check and enforce whether your data adheres to this shape. This usually happens between a data producer (e.g. an app sending analytics data) and a data consumer (e.g. an analytics team upcycling that data into insights for the organisation). Data that breaks the contract is blocked, making sure no dirty dots of gooey data end up downstream where they clog and cause explosions in your pipelines (geopolitical pun intended).
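The blocking behaviour described above can be sketched as a small gate between producer and consumer. This is an illustration under assumptions: the function names `adheres` and `enforce` are hypothetical, and the single rule reuses the customer ID example from the primer.

```python
import re

# The example rule from the primer: "C" followed by exactly 9 digits.
CUSTOMER_ID = re.compile(r"C\d{9}")


def adheres(event):
    """True if the event satisfies the (single-rule) contract."""
    customer_id = event.get("customer_id")
    return isinstance(customer_id, str) and CUSTOMER_ID.fullmatch(customer_id) is not None


def enforce(events):
    """Split producer events into those passed downstream and those blocked."""
    accepted, blocked = [], []
    for event in events:
        (accepted if adheres(event) else blocked).append(event)
    return accepted, blocked
```

Only the `accepted` list would reach the consumer; what to do with `blocked` events (drop, quarantine, alert the producer) is a design choice the contract parties would have to agree on.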


data contracts streamline what to expect from data.

It’s all about the handshake. An example of data contracts in event pipelines by

STRM is coming

When we started STRM, one of our earliest challenges was to find a way to translate a core GDPR requirement (data is collected for a purpose and can only be used for that purpose) into a technology concept. We tinkered with symmetric keys (every purpose has a separate key, and data can only be decrypted using the matching key for that purpose). Our conclusion was that this would yield too low a cardinality to fulfil the data minimisation requirements on our radar (too many data points are encrypted by the same keys). But data technology (like a database) already needs to know things about the data it carries, namely the data schema, so wouldn’t another layer describing the data be redundant? No, we figured: schemas are a historical necessity for knowing how to store the data, not how one can use the data (the purpose in the context of data privacy/GDPR).

So: enter data contracts.

And so data contracts came about as the core of our platform: they enable you to codify the sensitivity of a field and the context in which data can be used.
Through the data contract, all stakeholders involved in data privacy can reach an agreement on how data can be used and should be transformed relative to the purpose - down to every field and combination of purpose x destination.
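As a rough sketch of what "down to every field and purpose" could look like, the snippet below attaches a sensitivity flag and a set of allowed purposes to each field, then derives the view of a record for a given purpose. Everything here is an assumption for illustration: the contract layout, the field names, the purposes and the `view_for` function are hypothetical and do not represent STRM's actual contract format. For brevity it models purpose only, not the purpose x destination combination.

```python
# Hypothetical privacy contract: per field, is it sensitive, and for which
# purposes may a sensitive value be shown in the clear?
CONTRACT = {
    "customer_id": {"sensitive": True, "allowed_purposes": {"billing"}},
    "email": {"sensitive": True, "allowed_purposes": {"billing", "marketing"}},
    "country": {"sensitive": False, "allowed_purposes": set()},
}


def view_for(record, purpose, contract=CONTRACT):
    """Return the record as it may be used for the given purpose."""
    view = {}
    for field, value in record.items():
        rule = contract.get(field)
        if rule is None:
            continue  # fields outside the contract are dropped entirely
        if not rule["sensitive"] or purpose in rule["allowed_purposes"]:
            view[field] = value  # non-sensitive, or purpose is allowed
        else:
            view[field] = None  # sensitive and not allowed: masked
    return view
```

The point of the sketch is that the table at the top is something legal, privacy and data teams can read and sign off on together, while the function below it is what the pipeline runs.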

Relativity in theory

And that’s a very handy trait in a domain where the key concept is “relativity”. Because the basis of most legal discussions is “well…it depends”. That’s where perspectives meet and discussions happen. What do I need data for? Can we take this risk? What can we do with data that’s already there? What can we do with data for this purpose in that country?
And discussions take time, especially across departments that speak legalese and code respectively. And time is… value.

So, dipping our toe in the data lake, there is another pro for data contracts:
Data contracts turn the agreements necessary to collect and consume data from a collaboration problem into an opportunity. As such, they help to clarify, streamline and codify the coordination on data discussions outside of just the tech domain. And that, dear folks, is great value: time saved, energy preserved, more “yeses” from legal, and so higher velocity and shorter time-to-market for those new data products of yours.

In the context of privacy, data contracts help teams beyond just data to answer a seemingly simple question together:

But can I use that data?

Decrease risk and cost, increase speed: encode privacy inside data with STRM.