Introducing the STRM Data Plane

We're introducing a self-hosted deployment option, so you can keep all sensitive data and processing inside your own cloud.

Pim Nauts, Founder

Reading time: about 2 minutes.

With STRM, we aim to shift left with data privacy and to make building with sensitive data much easier. Not by classifying after the fact or synthesizing data, but by embedding privacy (policies) inside your data: privacy by design for data.

When we set out to achieve this, one of our first important realizations was that we would already need to design and build a set of complex technologies. Coping with the intricacies of different environments (like AWS or Azure in any possible configuration) adds further complexity. As that would only push the “minimal working” threshold further away, we decided to design STRM for portability, but to launch as SaaS only.

And that’s what we did, with our SaaS platform supporting the most critical functions of a privacy by design data platform:

The STRM Privacy platform (with example input and output for medical applications)

All quick and easy to set up through console and CLI as a SaaS solution.

Splitting the STRM Data and Control Plane

Being in “privacy” means we deal with sensitive data, often in sensitive domains like health. While a SaaS solution is a great way to leap ahead on the privacy dimension, data is often regarded as so sensitive and strategic that many prospects prefer (or simply require) to keep customer data inside their own cloud subscriptions.

As we learned more about customer demands and processes, we could also fill in the important conditions for taking that next step: running our data plane inside a cloud/VPC that is (for us) foreign.

Which, surprise surprise, we’ve tested across a bunch of clouds by now and are launching today 🎂 (officially as beta).

Whole lotta benefits

There are important benefits that come with the self-hosted deployment as compared to SaaS. You get Privacy by Design for Data that is…

  • More secure: data does not leave your environment
  • Simpler to implement: security and privacy policies do not have to be extended or assessed (e.g. we’re not a data processor for customer data anymore!). Strict security and vendor requirements apply as if it were an internal service (which it effectively is)
  • Easier to integrate: existing security policies and configurations apply, and the data plane ties directly into existing data storage, or reads and writes directly on existing Kafka topics without extra roundtrips.
  • Cheaper to operate: no extra bandwidth and ingress/egress costs; the Data Plane runs on existing committed use, and the discounts and benefits of your current provider apply
  • Easier to verify: Of course you can trust us. But it certainly helps that we’ve open sourced the Helm chart.

The only drawback: you need a sysadmin, SRE or DevOps team to set it up. So we’re still offering SaaS 😉

How the STRM data plane works

With STRM’s self-hosted option, all components that touch your customers’ data are split from other platform components. Only (meta)data on system health and configurations like data contracts, input streams and privacy streams are retrieved and stored on our control plane (for sign-up your own email is still required and of course needs to be stored).

STRM Data and Control Plane

An overview of the components in the platform and how the STRM control and data plane are split

This means you can run the STRM Privacy platform without your customers’ data leaving your environment, as long as you run on a cloud that supports Kubernetes and Helm.
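To make the split a bit more tangible, here is a minimal Python sketch of the rule described above: only an explicit set of metadata categories goes to the control plane, and everything else stays in your environment. The category names, the `Item` class and the `plane_for` function are purely illustrative assumptions, not identifiers from the STRM platform.

```python
from dataclasses import dataclass

# Categories the text above says are retrieved and stored on the control plane.
# The labels are illustrative assumptions, not STRM identifiers.
CONTROL_PLANE_CATEGORIES = frozenset({
    "system_health",
    "data_contract",
    "input_stream_config",
    "privacy_stream_config",
    "account_email",
})


@dataclass(frozen=True)
class Item:
    name: str
    category: str


def plane_for(item: Item) -> str:
    """Return the plane an item belongs to under the split described above."""
    if item.category in CONTROL_PLANE_CATEGORIES:
        return "control"
    # Anything not explicitly listed stays inside the customer's environment.
    return "data"


examples = [
    Item("cluster heartbeat", "system_health"),
    Item("clickstream contract", "data_contract"),
    Item("patient event payload", "customer_event"),
]
for item in examples:
    print(f"{item.name}: {plane_for(item)} plane")
```

The point of the allowlist shape is that the default is "stays with you": a new kind of record ends up in the data plane unless someone explicitly decides otherwise.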

We have already verified the Data Plane on AWS, GCP, Azure and OVH Cloud (with storage on Clever Cloud). Technical note: Redis and Kafka (necessary for streaming mode) are packaged, or you can hook the Data Plane up to existing instances.

Requirements

In order to run the Data Plane, STRM needs:

  • An existing or new Kubernetes cluster with access to the internet to connect to the control plane
  • Helm (helm.sh) must be installed, and must have access to the cluster
  • The bare minimum even runs locally with k3s
  • For testing: at least two k8s nodes with ~8 GB of memory and two cores each are recommended
  • For production: we recommend at least 8 nodes, with 4 CPU cores and 16 GiB of available memory per node

The testing specs will handle a few hundred requests per second. Of course we scale horizontally, and the data plane is autoscalable with any good k8s scaling supervision. Good to know: in our experience, for production settings, we can run in the headroom next to existing workloads on committed use (especially for batch, as that’s spin-up, spin-down with very few resources idling).
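If you want to sanity-check a cluster against the recommendations above, a small sketch like the following could do it. The numbers come from the requirements list; the `NodePool` class and the `shortfalls` function are hypothetical helpers for illustration, and the "~8 GB" testing figure is treated as 8 GiB for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    nodes: int
    cores_per_node: int
    memory_gib_per_node: float


# Recommended minimums taken from the requirements list above.
# Assumption: the "~8 GB" testing figure is rounded to 8 GiB here.
RECOMMENDED = {
    "testing": NodePool(nodes=2, cores_per_node=2, memory_gib_per_node=8),
    "production": NodePool(nodes=8, cores_per_node=4, memory_gib_per_node=16),
}


def shortfalls(pool: NodePool, profile: str) -> list[str]:
    """List every dimension where `pool` is below the recommended profile."""
    target = RECOMMENDED[profile]
    problems = []
    if pool.nodes < target.nodes:
        problems.append(f"nodes: {pool.nodes} < {target.nodes}")
    if pool.cores_per_node < target.cores_per_node:
        problems.append(f"cores per node: {pool.cores_per_node} < {target.cores_per_node}")
    if pool.memory_gib_per_node < target.memory_gib_per_node:
        problems.append(
            f"memory per node (GiB): {pool.memory_gib_per_node} < {target.memory_gib_per_node}"
        )
    return problems


pool = NodePool(nodes=3, cores_per_node=4, memory_gib_per_node=16)
print(shortfalls(pool, "testing"))     # []
print(shortfalls(pool, "production"))  # ['nodes: 3 < 8']
```

This only compares static node specs; it says nothing about the headroom actually left next to your existing workloads, which is what matters in practice.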

Setup and quickstart

Setup is straightforward if you’re familiar with k8s and Helm charts:

  • Make sure your STRM subscription is upgraded to self-hosting (please request an upgrade if not)
  • Retrieve the necessary Helm chart and/or values from the installations pane
  • Submit the Helm chart to your existing k8s cluster
  • Watch the magic: our control plane will instruct and set up everything, and one by one all Data Plane containers will turn from red to green, ready to receive and process the data.
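As a rough sketch of the "submit the Helm chart" step, the snippet below checks that `kubectl` and `helm` are on your PATH and then builds (without running) a standard `helm upgrade --install` command. The release name, chart reference, namespace and values file are placeholders and assumptions on our side; use whatever the installations pane and the documentation give you, which may also prescribe a different invocation.

```python
import shutil


def missing_tools(tools=("kubectl", "helm")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def helm_install_command(release, chart, values_file, namespace):
    """Build (but do not run) a `helm upgrade --install` invocation."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace, "--create-namespace",
        "--values", values_file,
    ]


missing = missing_tools()
if missing:
    print("Install these first:", ", ".join(missing))

# Placeholder arguments (hypothetical): substitute the real chart and values.
command = helm_install_command(
    release="<release-name>",
    chart="<chart-reference>",
    values_file="values.yaml",
    namespace="<namespace>",
)
print(" ".join(command))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; once the placeholders are filled in, you could pass the list to `subprocess.run(command, check=True)`.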

A more comprehensive technical explanation is included in the documentation on Customer Cloud Deployments.

You can see it in action in the following video, where Bart walks you through the quickstart:

Like magic 🪄

I believe this is an important milestone for STRM and an impressive achievement by our team, and I’m excited we can deliver even more trust to our customers and prospects with all the benefits of privacy by design for data.

I’d like to end on a small personal reflection: I come from computational linguistics and led an applied ML group in e-commerce for quite some time, often with amazing outcomes. We never sold STRM as being machine learning or data science, although its place in ML and DS stacks is very clear to us. But seeing the STRM Data Plane being pulled up like magic once the Helm chart is submitted is the closest thing to “artificial intelligence” I have personally been part of. 😇

Request a demo!

Curious and ready to test or run a self-hosted STRM deployment?

Get in touch to request a demo!

PS We’re hiring!

Want to see your code inside the Data and Control Planes too? Come build STRM to help data teams deliver data products without sacrificing privacy in the process. We are hiring!

Decrease risk and cost, increase speed: encode privacy inside data with STRM.