Oops, we did it. We're chiming in on the data contracts discussion. That makes this edition of Mind the Gap a bit more data- and tech-focused than you're used to. Read on for our take on why the discussion around data contracts might be missing an important perspective.
But first, new week, new content. The notable quick snacks:
- Silicon Valley can't keep track of data. If the angels of tech and data heaven can't do it, how should the rest of us achieve it?
- We've released an interface, the Data Subjects API, to easily and cheaply manage Data Subject Requests (like RTBF, the right to be forgotten) on STRM-processed data. Our engineer Bart wrote an extensive tutorial using batch jobs (note: this one is tech-heavy).
- If you've been growing fat on data, it might choke you now. Some reflections on the strong winds hitting the biggest tree of them all.
- And finally, you won't believe this: American churches (driven by ethics...) are using invasive phone-monitoring tech to spy on their believers.
Can I haz data? How data contracts go beyond technology
Over the past weeks the chatter around data contracts has been building. Some of the most prominent voices in data (one, two) have been explaining, advocating, demonstrating and arguing against the utility of data contracts and the problems they (aim to) solve. But one important perspective is generally left out: many modern data problems span more than just the technical domain. Data contracts can help bring in stakeholders beyond the data folks.
But wait, data contracts?
A little primer: a data contract is simply a data schema (which data?) annotated with an agreement on what should be inside, for instance that a customer ID consists of 9 digits prefixed by a "C". Because it is an agreement, the parties involved sign off on it. Like "I'm buying this house for 400k plus tax and I expect nothing will be wrong with it", but for data.
Based on the agreement, you can check and enforce whether your data adheres to this shape. This usually happens between a data producer (e.g. an app sending analytics data) and a data consumer (e.g. an analytics team upcycling that data into insights for the organisation). Data that breaks the contract is blocked, making sure no dirty dots of gooey data end up downstream, where they clog and cause explosions in your pipelines (geopolitical pun intended).
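To make the check-and-block idea concrete, here is a minimal sketch in Python, assuming a contract is just a list of per-field rules expressed as regular expressions. The names `FieldRule`, `DataContract` and `enforce` are hypothetical illustrations, not STRM's (or anyone's) actual contract format; the only rule encoded is the customer ID example from the primer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FieldRule:
    """One agreed-upon expectation for a single field (hypothetical structure)."""
    name: str
    pattern: str  # regular expression the whole value must match
    required: bool = True


@dataclass
class DataContract:
    """A schema plus the agreement on what each field should contain."""
    name: str
    rules: list[FieldRule] = field(default_factory=list)

    def violations(self, record: dict) -> list[str]:
        """Return a human-readable list of the ways this record breaks the contract."""
        problems = []
        for rule in self.rules:
            value = record.get(rule.name)
            if value is None:
                if rule.required:
                    problems.append(f"missing field '{rule.name}'")
                continue
            if not re.fullmatch(rule.pattern, str(value)):
                problems.append(f"field '{rule.name}' has unexpected value {value!r}")
        return problems


def enforce(contract: DataContract, records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into those that may flow downstream and those that are blocked."""
    accepted, blocked = [], []
    for record in records:
        (blocked if contract.violations(record) else accepted).append(record)
    return accepted, blocked


# The example from the primer: a customer ID is a "C" followed by exactly 9 digits.
customer_contract = DataContract(
    name="customer_events",
    rules=[FieldRule(name="customer_id", pattern=r"C[0-9]{9}")],
)

events = [
    {"customer_id": "C123456789"},  # honours the contract
    {"customer_id": "123456789"},   # missing the "C" prefix
    {"customer_id": "C12345"},      # too few digits
    {},                             # no customer ID at all
]

accepted, blocked = enforce(customer_contract, events)
print(len(accepted), len(blocked))  # 1 3
```

Only the first event reaches the consumer; the other three are held back at the producer side, which is the "blocking" the contract buys you. A real setup would also return the violation messages to the producer so they can fix the source.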
Data contracts streamline what to expect from data.
STRM is coming
When we started STRM, one of our earliest challenges was to translate a core GDPR requirement (data is collected for a specific purpose and may only be used for that purpose) into a technology concept. We tinkered with symmetric keys: data for every purpose would be encrypted with a separate key, and could only be decrypted with the matching key for that purpose. Our conclusion was that this would yield too low a cardinality to fulfil the data minimisation requirements on our radar (too many data points end up encrypted with the same keys). The alternative was to describe the purpose alongside the data itself. But data technology (like a database) already needs to know things about the data it carries, in the form of a data schema. Wouldn't that make a purpose description redundant? No, we figured: schemas exist out of historical necessity, to tell a system how to store the data, not how one may use it (the purpose, in the context of data privacy and the GDPR).
So: enter data contracts.
Data contracts became the core of our platform: they let you codify how sensitive each field is, and in which context and under which circumstances the data may be used.
Through the data contract, all stakeholders involved in data privacy can agree on how data may be used and how it should be transformed for each purpose, down to every field and every purpose × destination combination.
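As an illustration of "down to every field and every purpose × destination combination", the Python sketch below assumes a contract that maps each field to an action per (purpose, destination) pair, with anything unlisted dropped by default. The field names, purposes, destinations, the `Action` values and the HMAC-based pseudonymisation are all hypothetical choices for this example; this is not how STRM's platform actually represents or applies contracts.

```python
import hashlib
import hmac
from enum import Enum


class Action(Enum):
    KEEP = "keep"            # pass the value through unchanged
    PSEUDONYMISE = "pseudo"  # replace the value with a keyed hash
    DROP = "drop"            # leave the field out entirely


# Hypothetical contract: per field, what happens to it for each (purpose, destination).
# Any combination that is not listed falls back to DROP (deny by default).
CONTRACT = {
    "customer_id": {
        ("analytics", "data_lake"): Action.PSEUDONYMISE,
        ("billing", "finance_db"): Action.KEEP,
    },
    "email": {
        ("billing", "finance_db"): Action.KEEP,
    },
    "page_viewed": {
        ("analytics", "data_lake"): Action.KEEP,
    },
}

# Illustrative only: a real key would come from a secrets store, never from source code.
PSEUDONYMISATION_KEY = b"example-key-not-for-production"


def pseudonymise(value: str) -> str:
    """Replace a value with a stable keyed hash, so records can still be joined."""
    return hmac.new(PSEUDONYMISATION_KEY, value.encode(), hashlib.sha256).hexdigest()


def apply_contract(record: dict, purpose: str, destination: str) -> dict:
    """Return the version of `record` the contract allows for this purpose and destination."""
    result = {}
    for name, value in record.items():
        action = CONTRACT.get(name, {}).get((purpose, destination), Action.DROP)
        if action is Action.KEEP:
            result[name] = value
        elif action is Action.PSEUDONYMISE:
            result[name] = pseudonymise(str(value))
        # Action.DROP: nothing is copied into the result
    return result


event = {"customer_id": "C123456789", "email": "jane@example.com", "page_viewed": "/pricing"}

print(apply_contract(event, "analytics", "data_lake"))  # email dropped, customer_id pseudonymised
print(apply_contract(event, "billing", "finance_db"))   # customer_id and email kept, page_viewed dropped
```

The same event yields a different output per purpose and destination, and a combination nobody signed off on (say, marketing) gets nothing at all. That deny-by-default choice is what makes the contract a place where legal, privacy and data teams have to meet before data flows.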
Relativity in theory
And that's a very handy trait in a domain where the key concept is "relativity", because the basis of most legal discussions is "well... it depends". That's where perspectives meet and discussions happen. What do I need this data for? Can we take this risk? What can we do with data that's already there? What can we do with data for this purpose in that country?
And discussions take time, especially across departments that speak legalese and code respectively. And time is... value.
So, dipping our toe in the data lake, we see another pro for data contracts:
Data contracts turn the agreements needed to collect and consume data from a collaboration problem into an opportunity. As such, they help clarify, streamline and codify the coordination of data discussions beyond the tech domain. And that, dear folks, is great value: time saved, energy preserved, more "yeses" from legal and, as a result, higher velocity and shorter time-to-market for those new data products of yours.
In the context of privacy, data contracts help teams other than just data answer a seemingly simple question together:
But can I use that data?