Intro
One of the key observations in founding STRM Privacy was that true privacy-by-design in data and at scale is hard to achieve. Even if you are lucky enough to work in an environment with good privacy policies, enforcing them inside the data itself is another matter.
We think we have made this much simpler. So you can see it for yourself, our CLI includes a command that simulates sending events, which you can then read back through a privacy stream. This post guides you through the steps.
We assume you have created an account and you have our CLI installed and running. Everything in this demo will be done from the command line.
Scope
The new simulate random-events command helps you quickly see what happens to your data. We will:
- Authenticate against STRM Privacy
- Create a stream
- Create a derived stream to set consent levels
- Start a simulator to send data
- Read a privacy stream on that data with specific consent levels.
1. Authenticate to STRM Privacy
First, authenticate to our platform with the auth command:
#authenticate
❯ strm auth login
❯ Enter password: ***
This sets up the tokens needed to communicate securely with our APIs.
2. Create a new stream
Creating a stream is done with the create stream command.
#create stream
❯ strm create stream winston --save
You have now created a “stream” (hence the command), which you can think of as a pipeline through which your data flows.
Note: The --save flag stores the stream’s credentials locally, so the CLI can use that file to authenticate in the next steps. These are credentials stored on your system, so please treat them with care.
If you don’t want to store the stream credentials on disk, simply leave out the --save flag.
Creating a privacy stream
To read data back, you create privacy streams, which apply additional configuration for further processing (such as the consent levels under which the data was collected). You can think of a privacy stream as an egress endpoint that contains only the data you are allowed to use under a specific consent.
Context
For our purposes here, we assume you collect data under two consent levels, 0 - basic and 1 - personalized, and that you send events while a user is logged in (so you have a customer ID for offering that service).
Create a stream
Let’s first create the privacy stream for the 0 - basic level.
#create the 0 - basic privacy stream
❯ strm create stream --derived-from winston --levels 0 --save
If you want more fine-grained control, set the levels and the consent type explicitly:
#create a derived stream of type granular with consent level 0
❯ strm create stream --derived-from winston --levels 0 --consent-type GRANULAR --save
As 0 is the lowest level, every event sent with consent 0 or higher ends up in the winston-0 stream, with the event contract fields that require consent 0 decrypted.
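To make the rule concrete, here is a minimal Python sketch of which fields a privacy stream returns in plaintext. It assumes a simplified cumulative model in which every personal-data field in the event contract carries a required consent level, and a stream at level n decrypts the fields that require level n or lower. The mapping FIELD_LEVELS and the function decrypted_fields are hypothetical illustrations, not part of the STRM Privacy CLI or API.

```python
from typing import Dict, Optional, Set

# Hypothetical consent requirements for a few fields of the sample event.
# None marks a field that is not personal data and is never encrypted.
FIELD_LEVELS: Dict[str, Optional[int]] = {
    "url": None,
    "customer.id": 0,
    "producerSessionId": 1,
}


def decrypted_fields(stream_level: int, field_levels: Dict[str, Optional[int]]) -> Set[str]:
    """Return the fields a privacy stream at `stream_level` shows in plaintext.

    Simplified cumulative model (an assumption): a field is decrypted when its
    required level is at or below the stream's level.
    """
    return {
        name
        for name, level in field_levels.items()
        if level is None or level <= stream_level
    }


print(sorted(decrypted_fields(0, FIELD_LEVELS)))  # ['customer.id', 'url']
print(sorted(decrypted_fields(1, FIELD_LEVELS)))  # ['customer.id', 'producerSessionId', 'url']
```

Under this model, producerSessionId stays encrypted in winston-0 and becomes readable in winston-1, which matches the sample output further down in this post.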
For the purposes of this post, we’ll also create a privacy stream for the 1 - personalized consent level:
#create the 1 - personalized privacy stream
❯ strm create stream --derived-from winston --levels 1 --save
Note: The difference between --consent-type cumulative and granular is laid out in the docs.
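As a rough mental model of the two consent types, the sketch below contrasts when a derived stream receives an event. It is a simplified reading, not the platform's implementation, so check the docs for the authoritative rules. Under CUMULATIVE, levels are assumed to be ordered, so consenting to a level implies every lower one; under GRANULAR, each level is assumed to be independent, so the event must carry every level the stream asks for. The function accepts_event is hypothetical.

```python
from typing import List


def accepts_event(consent_type: str, stream_levels: List[int], event_levels: List[int]) -> bool:
    """Simplified model of whether a derived stream receives an event (an assumption)."""
    if consent_type == "CUMULATIVE":
        # Levels are ordered: consenting to level n implies all levels below n.
        # Assumed: a cumulative stream has exactly one level (unpacking fails otherwise).
        (level,) = stream_levels
        # An event without any consent level never qualifies.
        return max(event_levels, default=-1) >= level
    if consent_type == "GRANULAR":
        # Levels are independent: the event must carry every level the stream asks for.
        return set(stream_levels) <= set(event_levels)
    raise ValueError(f"unknown consent type: {consent_type}")


print(accepts_event("CUMULATIVE", [1], [0, 1]))   # True
print(accepts_event("CUMULATIVE", [1], [2]))      # True: consent 2 implies consent 1
print(accepts_event("GRANULAR", [1], [2]))        # False: consent 1 was never given
print(accepts_event("GRANULAR", [0, 1], [0, 1]))  # True
```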
Start a simulator and send data with simulate random-events
To start a simulator and send some dummy data, use the built-in simulate random-events command on your stream. The simulator uses the simple, generic privacy demo event contract.
If you did not use the --save flag earlier, pass the --client-id and --client-secret flags:
# Run a simulator and send data
❯ strm simulate random-events winston
# Or without the save flag
❯ strm simulate random-events winston --client-id [string] --client-secret [string]
Note: If you are just testing the pipeline and don’t want your terminal flooded with printed events, --quiet has your back.
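To get a feel for the kind of data being simulated, here is a hypothetical Python generator of random events. It is not the CLI's simulator: its shape is copied from the clickstream sample events in the next section (which predate the privacy demo event contract), the field list is trimmed, and randomly choosing between consent [0] and [0, 1] is an assumption made for illustration. The function random_demo_event is hypothetical.

```python
import random
import time
import uuid
from typing import Any, Dict


def random_demo_event(rng: random.Random) -> Dict[str, Any]:
    """Hypothetical event shaped like the clickstream samples; not the CLI's simulator."""
    session = f"session-{rng.randint(0, 999)}"
    return {
        "strmMeta": {
            "schemaId": "clickstream",
            "nonce": rng.randint(-2**31, 2**31 - 1),
            "timestamp": int(time.time() * 1000),  # epoch milliseconds
            # A random version-4 UUID, drawn from the seeded generator.
            "keyLink": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            # Assumption: consent to basic only, or to basic plus personalized.
            "consentLevels": rng.choice([[0], [0, 1]]),
        },
        "producerSessionId": session,
        "url": "https://www.strmprivacy.io/rules",
        "customer": {"id": f"customer-{session}"},
    }


rng = random.Random(42)  # seeded so runs are repeatable
for _ in range(3):
    print(random_demo_event(rng))
```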
Read and inspect data
With your terminal sending events, the real proof of the pudding is in seeing what’s passing through the privacy streams. The CLI includes a simple websocket interface to read back your data (demo and debug only!). With a split terminal window you can neatly see the differences in the data that flows back (we use this a lot for our demos).
Open a new terminal window or tab (CTRL + T or CMD + T) and run strm listen web-socket winston-0:
#Read the privacy stream for stream winston consent 0 - basic
❯ strm listen web-socket winston-0
{"strmMeta": {"schemaId": "clickstream", "nonce": -1890771136, "timestamp": 1626101758721, "keyLink": "81f112bb-05fe-4f06-942b-6ec012eb7c39", "billingId": "demo4678984730", "consentLevels": [0]}, "producerSessionId": "AVba3kpDl1nmFkB4OPykJYbrvSDCfS4OKMLcP21eYIk=", "url": "https://www.strmprivacy.io/rules", "eventType": "", "referrer": "", "userAgent": "", "conversion": 0, "customer": {"id": "customer-session-593"}, "abTests": []}
**Note**: The sample output comes from an earlier version of this post, which used the old CLI and the clickstream schema. We'll update it later.
This reads, through a simple WebSocket interface, the data sent over the winston stream with consent level 0 (remember, that’s the safest privacy stream).
As we set different consent levels for each of the two privacy streams, you would expect to see a difference in the egressed events:
#Read the privacy stream that is safe to use under consent level 1
❯ strm listen web-socket winston-1
{"strmMeta": {"schemaId": "clickstream", "nonce": 966067871, "timestamp": 1626101816930, "keyLink": "c123e38a-6715-4ae4-bfe6-41168eca98f2", "billingId": "demo4678984730", "consentLevels": [0, 1]}, "producerSessionId": "session-327", "url": "https://www.strmprivacy.io/rules", "eventType": "", "referrer": "", "userAgent": "", "conversion": 0, "customer": {"id": "customer-session-327"}, "abTests": []}
See what happens? Because the producerSessionId field requires consent 1 (consistentValue in the demo schema), it remains encrypted when the event we receive does not include that consent, as in the winston-0 sample, whose consentLevels is [0]. We apply additional processing (like key rotation) to safeguard privacy inside the data.
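You can also check the difference programmatically. The Python sketch below parses the two sample events from this post, trimmed to the fields that matter here, and flags whether producerSessionId is readable. Treating a value of the form session-&lt;number&gt; as plaintext is a heuristic based on these samples only, not a documented format; summarize and PLAINTEXT_SESSION are hypothetical names.

```python
import json
import re

# Sample events from the listen output above, trimmed to the compared fields.
WINSTON_0 = ('{"strmMeta": {"consentLevels": [0]}, '
             '"producerSessionId": "AVba3kpDl1nmFkB4OPykJYbrvSDCfS4OKMLcP21eYIk=", '
             '"customer": {"id": "customer-session-593"}}')
WINSTON_1 = ('{"strmMeta": {"consentLevels": [0, 1]}, '
             '"producerSessionId": "session-327", '
             '"customer": {"id": "customer-session-327"}}')

# Heuristic (assumption): plaintext session IDs in these samples look like "session-<number>".
PLAINTEXT_SESSION = re.compile(r"^session-\d+$")


def summarize(raw: str) -> dict:
    """Report an event's consent levels and whether producerSessionId looks decrypted."""
    event = json.loads(raw)
    session = event["producerSessionId"]
    return {
        "consentLevels": event["strmMeta"]["consentLevels"],
        "producerSessionId_plaintext": bool(PLAINTEXT_SESSION.match(session)),
    }


for name, raw in [("winston-0", WINSTON_0), ("winston-1", WINSTON_1)]:
    print(name, summarize(raw))
```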
What did we just do?
- We authenticated against STRM Privacy and created a new stream
- We created a derived stream to set consent levels
- We started the built-in simulator with simulate random-events on the input stream and read its privacy streams under specific consent levels
This is just a very simple example of creating privacy streams and seeing what happens to the data underneath. Please reach out if you want to learn more or have more elaborate use cases!
PS We’re hiring! Want to join in on the fun?
Do you care deeply about engineering for privacy, and do you want to contribute to tools like our CLI and help make building with data safer for both data teams and consumers? Then we’d love to hear from you.