- Attaching legal ground and consent
- To process personal data under GDPR you need a legal ground (like fulfilling a contract, or legitimate interest) or consent (a user consented to a specific usage of their data). With STRM, you attach the legal ground as a simple value (e.g. "1") to every datapoint we receive. This acts as the instruction for further processing, and as proof of legitimate collection.
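As a sketch, an event might carry its consent level like this. The field names (`strmMeta`, `consentLevels`) follow the terminology used in this document, but the exact wire format is an assumption:

```python
# Illustrative event payload; the exact STRM wire format may differ.
event = {
    "strmMeta": {"consentLevels": [1]},  # 1 = e.g. legitimate interest (your own mapping)
    "sessionId": "abc-123",
    "url": "https://example.com/checkout",
}

def consent_levels(event: dict) -> list:
    """Read the attached legal ground / consent levels from an event."""
    return event["strmMeta"]["consentLevels"]

print(consent_levels(event))  # [1]
```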
- consentLevels: legal ground, granular and cumulative consent
- When setting legal ground and consent levels, you can choose whether they are granular or cumulative. Granular means each provided opt-in stands on its own, e.g. "I consent to personalized marketing" and "I consent to analytical purposes". Cumulative means a higher consent level (like "2") includes the lower ones: "I consent to analytics AND personalized marketing".
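The two interpretations can be sketched as a small check function (a minimal illustration of the semantics above, not STRM's implementation):

```python
def may_process(event_levels, required_level, cumulative=True):
    """Check whether an event may be processed for a given purpose.

    cumulative: a higher level implies all lower ones (2 covers 1).
    granular:   the exact level must have been opted into.
    """
    if cumulative:
        return max(event_levels) >= required_level
    return required_level in event_levels

# Cumulative: level 2 ("analytics AND personalized marketing") covers level 1.
print(may_process([2], 1))                    # True
# Granular: level 2 stands on its own, so level 1 was never granted.
print(may_process([2], 1, cumulative=False))  # False
```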
- Data schemas
- Data schemas describe the shape of your data: the fields and values you need. In batch mode, it's usually the header row of a file describing what the data in each column means. In streaming mode, it's an Avro or JSON Schema definition (technically, and somewhat confusingly, referred to as the "serialization schema": how the data should be serialized by a machine).
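For instance, a minimal JSON Schema for a hypothetical page-view event could look like this (field names are illustrative):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "PageView",
  "type": "object",
  "properties": {
    "sessionId":   { "type": "string" },
    "url":         { "type": "string" },
    "itemsBought": { "type": "integer" }
  },
  "required": ["sessionId", "url"]
}
```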
- Data contracts
- Data contracts describe the privacy implications of your data. They refer to a data schema and specify (a) which field connects datapoints (the key field), (b) which fields are sensitive, and (c) under which consent or legal ground these fields can be processed.
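A data contract can be pictured as a small piece of configuration plus a filter. The structure below is illustrative only, not the actual STRM contract format:

```python
# Hypothetical contract structure, for illustration.
contract = {
    "keyField": "sessionId",   # the field that connects datapoints
    "piiFields": {             # sensitive fields and the consent level each requires
        "sessionId": 1,
        "email": 2,
    },
}

def visible_fields(contract, granted_level):
    """Fields a consumer with the given (cumulative) consent level may see."""
    return {field for field, required in contract["piiFields"].items()
            if granted_level >= required}

print(visible_fields(contract, 1))  # {'sessionId'}
```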
- Privacy levels
- Privacy levels allow you to choose how data is transformed and processed into privacy streams. Upon arrival, every datapoint is encrypted; the privacy level determines what happens in further processing. We are working on additional privacy levels and customization - contact us if you need something different!
- Data de-identification & pseudonymization
- Data is de-identified by using the encryption keys to "blur" the original user: personal fields become a meaningless string, like "9e9e2b04-7869-11ec". No additional processing is applied, and over time the same user shows up under different encrypted values. Based on other features in the data (e.g. a woman of 161cm and 51kg from Lisbon who loves macarons), re-identification is still possible and usually relatively simple.
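The effect can be sketched with a keyed hash. This is a toy stand-in, not STRM's actual scheme (which uses AES256-SIV encryption): with a fresh key per time window, the same value yields unlinkable tokens.

```python
import hashlib
import secrets

def deidentify(value: str, key: bytes) -> str:
    """Turn a personal value into a meaningless token (toy stand-in for encryption)."""
    return hashlib.sha256(key + value.encode()).hexdigest()[:16]

key_monday = secrets.token_bytes(16)
key_tuesday = secrets.token_bytes(16)

# Same user, different keys over time -> different tokens, so records
# can no longer be linked across windows by this field alone.
print(deidentify("alice@example.com", key_monday) ==
      deidentify("alice@example.com", key_tuesday))  # False
```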
- Data masking
- Masked fields are encrypted with the same key over time, meaning data is de-identified (not anonymized!) with a stable key. This allows longitudinal analyses on, for instance, credit card numbers or heart rate data. No additional processing to safeguard privacy is applied: you would see the same person over time, just not who it is based on personal values. It differs from de-identification in that the same user or field remains visible over time.
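Masking can be sketched the same way, but with one stable key (again a toy HMAC stand-in for the real encryption):

```python
import hashlib
import hmac

STABLE_KEY = b"illustrative-stable-key"  # in practice a managed secret

def mask(value: str) -> str:
    """De-identify a field with a stable key: unreadable, but linkable over time."""
    return hmac.new(STABLE_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# The same card number always masks to the same token, enabling
# longitudinal analysis without exposing the raw value.
print(mask("4111-1111-1111-1111") == mask("4111-1111-1111-1111"))  # True
```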
- Data anonymization
Anonymization is achieved through strong encryption and key rotation.
Event data has a temporal dimension and is anonymized in real-time by rotating the keys between different points in time. This way, data belonging to the same person is only connected to that person within a timeframe (e.g. the same user appears as a different user for each day of the week).
Batch data is anonymized along the same temporal dimension, which means we need different timestamps per row in order to anonymize. Anonymizing batch data that is user-level only (e.g. "Bob bought 3 tomatoes", with Bob occurring in only one row) is currently *not* possible.
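Key rotation can be sketched like this. It is a toy HMAC scheme for illustration only (STRM applies AES256-SIV with managed keys): deriving a key per day means the same user gets an unrelated pseudonym in each time window.

```python
import hashlib
import hmac
from datetime import datetime, timezone

MASTER_KEY = b"illustrative-master-key"  # in practice a managed secret

def window_key(ts: datetime) -> bytes:
    """Derive a key per day; moving to the next window rotates the key."""
    return hmac.new(MASTER_KEY, ts.strftime("%Y-%m-%d").encode(),
                    hashlib.sha256).digest()

def pseudonym(user_id: str, ts: datetime) -> str:
    return hmac.new(window_key(ts), user_id.encode(),
                    hashlib.sha256).hexdigest()[:16]

mon = pseudonym("user-42", datetime(2022, 1, 3, tzinfo=timezone.utc))
tue = pseudonym("user-42", datetime(2022, 1, 4, tzinfo=timezone.utc))
print(mon == tue)  # False: the same user appears as a new user each day
```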
- Privacy streams
- Privacy streams are specific data interfaces that provide data to a team or application. They are derived from the (full) input stream or batch, and split and transformed according to the attached consent/legal ground level and the data contract. This way, you send data once (to the input stream or batch) while we take care of the transformations and processing. You can then provide, for example, your analytics team with a fully anonymous privacy stream, a specific stream for recommendations or personalization, one for customer services, and so on.
- Right to be forgotten (RTBF)
- RTBF is a right under GDPR that users can exercise to make sure they are forgotten in your systems (no trace of personal data left). Executing RTBF is a complex (and expensive) task, as user data can live in many systems and locations and be copied and transformed many times over. With STRM, RTBF becomes a simple operation: if the data contract is properly defined and all data was consumed through privacy streams, simply throwing away the keys makes you forget a user.
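The idea can be sketched with a toy key store, where the stored key is the only link between a token in your data and the person behind it:

```python
import secrets

# Toy key store: token -> material needed to re-identify the user.
key_store = {}

def pseudonymize(user_id: str) -> str:
    token = secrets.token_hex(8)
    key_store[token] = user_id  # stands in for a per-user encryption key
    return token

def forget(token: str) -> None:
    """RTBF: throwing away the key severs the link to the person."""
    key_store.pop(token, None)

t = pseudonymize("alice@example.com")
forget(t)
print(key_store.get(t))  # None: downstream tokens can no longer be resolved
```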
- Consent Management Platform (CMP)
Technical & configuration
- Mapping consent and legal ground to consentLevels
- We're working on mapping consent and legal ground to their specific meaning for your organization (e.g. "1 means legitimate interest, 2 means personalized marketing consent"), but this is not in production yet. For now, define your own mapping and send the consentLevel values as integers alongside your data.
- Data schema registry (public and private)
- With STRM, you can register your data schemas in our registry. This means we store the data schema files for public and private use. Public schemas in the registry are available to any user and serve as inspiration and examples. Private schemas are not exposed to anyone but the logged-in account.
- Simple schemas in YAML and Json
- Simple schemas are an abstraction of the complex and verbose Avro schemas we use under the hood. Just write your schema in JSON or YAML, and we'll take care of the conversion before adding it to the registry and using it for serialization and processing.
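The idea can be sketched as follows. The simple-schema shape here is hypothetical (not STRM's exact syntax), but the output is a standard Avro record schema:

```python
def to_avro(simple: dict) -> dict:
    """Expand an illustrative 'simple schema' dict into an Avro record schema."""
    return {
        "type": "record",
        "name": simple["name"],
        "fields": [{"name": f["name"], "type": f["type"]}
                   for f in simple["fields"]],
    }

simple = {
    "name": "PageView",
    "fields": [
        {"name": "sessionId", "type": "string"},
        {"name": "itemsBought", "type": "int"},
    ],
}
print(to_avro(simple)["type"])  # record
```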
- Key streams
- To achieve the transformations we apply to data, we use and need encryption keys. Many organizations prefer the abstraction of not having to deal with data transformations and focus on proper configuration; others want to make sure they can also restore data to its original state. By consuming and storing key streams, you can always achieve the latter easily.
- Simple streaming data and real-time data pipelines
- At its technical core, STRM is a streaming data platform. This means it's still valuable even if you're not too interested in our extensive privacy suite. Quickly launch, configure and consume streaming data pipelines. 5 minutes is all it takes! Register your data schema in our schema registry, fire off one of the drivers, build your events, and consume the privacy streams directly into your existing Kafka topics.
- Data quality through gateway validation (regex)
- Data quality can mean many things. First and foremost, it's the question of whether data adheres to the expected format. With STRM, you can define data validations as part of the event contract that do just this: per field, we check whether the value adheres to the rule you set, and bounce the event if it doesn't. Validations are set through regexes, a simple but powerful way to define and check patterns.
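A minimal sketch of per-field regex validation (the rule names and patterns are illustrative, not a real STRM contract):

```python
import re

# Illustrative per-field validation rules, as they might appear in a contract.
validations = {
    "sessionId": r"[a-z0-9-]{8,36}",
    "countryCode": r"[A-Z]{2}",
}

def validate(event: dict, validations: dict) -> list:
    """Return the fields whose value fails its pattern; empty means 'accept'."""
    return [field for field, pattern in validations.items()
            if not re.fullmatch(pattern, str(event.get(field, "")))]

ok = {"sessionId": "abc-123-def", "countryCode": "NL"}
bad = {"sessionId": "abc-123-def", "countryCode": "Netherlands"}
print(validate(ok, validations))   # []
print(validate(bad, validations))  # ['countryCode']
```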
- Encryption
- Data you send to STRM is encrypted in transit and at rest. For technical storage and for achieving the privacy features through encryption, we apply AES256-SIV with rotating keys. End-to-end encryption (where we don't even see your data) is currently not offered: E2E comes with a huge performance penalty and is computationally expensive, offsetting potential value and limiting the use cases you can employ.
- Data retention
- Our technical data retention period is 7 days maximum, so we never exceed the legal 30-day retention period that applies to many data types. It is important to emphasize that this is the technical maximum only: consumed data is not retained at all.
- Latency & scale
For streaming mode, we achieve sub-50ms latencies from gateway hit to egress, with all privacy transformations applied. In terms of scale, we have seen 50k events per second for a single customer without modifications.
For batch mode (within cloud, from bucket to bucket), scale and size limits are bound to connection reset times and differ per cloud and subscription - but they are sufficient if you're already running it ;-)
- Deployment modes: SaaS or in-cloud
- STRM is a SaaS solution as of now. We're working on Agent Mode to offer deployments inside your existing cloud subscription, or even outside of a cloud altogether.
- Console
- Use the console to configure and manage streams, data schemas, data contracts, and all your sinks and exporters through a visual interface.
- Command line interface (CLI)
- Our CLI (Command Line Interface) offers a quick and developer-friendly way to create and manage streams, schemas, data contracts and sinks.
- Data Drivers
- Through our drivers, you can send event-level data directly from your application. Currently we support Python, Java, PHP and NodeJS apps.
- Data Sinks
- Sinks are connections to your cloud (usually buckets) that we can read data from and send data to.
- Data Exporters
- Exporters are configurations that run data exports (from privacy streams) with a set interval.
- Role-based access control
- Privacy streams allow you to restrict access per role. You can set these inside your own systems (give a specific user or app only access to a specific stream). We're working on organization accounts with cascading users, so you can configure access to specific streams from STRM.
- Data modes: streaming or batch (under development)
- With streaming or batch you can choose how to send (and consume) your data to STRM. Our streaming mode is intended for high-volume, low-latency event data. If you have existing data you need to replay, or your systems run in scheduled processes (from nightly runs to 5-sec micro batches), you can bring and take your data through our batch mode.