tutorial

Introducing simpler schemas

With simple schemas we bring you a much easier way to define and read (our) data schemas and event contracts.

Pim Nauts, founder

Properly dealing with privacy inside data is a big challenge. At STRM we’re adding simplicity to the process.

We do this by binding consent to each data event. That lets you launch a privacy-proof data pipeline in just a few minutes and “encode” your privacy policies in data via event contracts and data schemas, all through a convenient CLI and console.

All in all, this makes it much easier to build privacy-by-design and helps you structure privacy from legal to data operations.

Privacy-by-design starts with data definitions

When working with our clients, we spend most of our time discussing and refining which data is needed (the data schema) and what the implications are for privacy (defined in the event contract).

When building with us, your schemas and contracts can currently be defined only in Apache’s Avro format and Json Schema.

While technically convenient (strict, efficient), they aren’t easy on the eye for non-technical users and are a bit of an acquired taste even for many data folks.

They also tend to grow big quickly, as a real-world schema used for training and evaluation of applied machine learning in media illustrates.

A great illustration of how much harder Avro is to write, read and interpret is our privacy demo schema. It is straightforward (basically 4 fields apart from the meta fields) but is already hard to read and grasp in Avro. The Json equivalent is just 16 instead of 92 lines!

Meet the new dynamic duo: Json and Yaml

With the release of Simple Schemas, we’re making defining your schemas and contracts much simpler: just write and provide your definitions in Json or Yaml and we’ll take care of serialization (under the hood we still convert and register as Avro - simply because it is more efficient).

Let’s take a look at how it’s done.

Defining a Simple Schema

Imagine I want to capture a user’s clicks on a canvas. That means I need just a few data points inside my events:

  • Who (the user)
  • When (the session)
  • Where (the URI)
  • What (the cursor position on click).

Defining this in a quick schema through Yaml looks like this:

name: Clicks
namespace: com.mycompany
nodes:
  - name: session_id
    type: STRING
  - name: user_name
    type: STRING
  - name: url
    type: STRING
  - name: mouse_positions
    repeated: true
    type: NODE
    nodes:
      - name: "x"
        type: INTEGER
      - name: "y" # careful! https://github.com/go-yaml/yaml/issues/283
        type: INTEGER

You can see how much easier it is to actually read and write these definitions. Gotta love that even if you and Yaml previously didn’t get along…! 😉
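To give a feel for what “converting to Avro under the hood” could involve, here is a minimal Python sketch that turns the Clicks definition into an Avro-style record schema. It is not our actual converter: the type mapping (STRING to string, INTEGER to int), the name given to the nested record, and the helper names to_avro_type and to_avro_record are all assumptions made for illustration, and the meta fields are left out.

```python
import json

# The Clicks definition from above, written as the equivalent Python/JSON structure.
SIMPLE_SCHEMA = {
    "name": "Clicks",
    "namespace": "com.mycompany",
    "nodes": [
        {"name": "session_id", "type": "STRING"},
        {"name": "user_name", "type": "STRING"},
        {"name": "url", "type": "STRING"},
        {
            "name": "mouse_positions",
            "repeated": True,
            "type": "NODE",
            "nodes": [
                {"name": "x", "type": "INTEGER"},
                {"name": "y", "type": "INTEGER"},
            ],
        },
    ],
}

# Assumed mapping from Simple Schema types to Avro primitives (illustrative only).
PRIMITIVES = {"STRING": "string", "INTEGER": "int"}


def to_avro_type(node, namespace):
    """Return the Avro type for a single Simple Schema node."""
    if node["type"] == "NODE":
        # Nested nodes become a nested Avro record; the "_record" suffix is hypothetical.
        avro_type = to_avro_record(
            {"name": node["name"] + "_record", "nodes": node["nodes"]}, namespace
        )
    else:
        avro_type = PRIMITIVES[node["type"]]
    if node.get("repeated", False):
        # Repeated nodes are wrapped in an Avro array.
        avro_type = {"type": "array", "items": avro_type}
    return avro_type


def to_avro_record(schema, namespace):
    """Build an Avro record schema from a Simple Schema (or nested node)."""
    return {
        "type": "record",
        "name": schema["name"],
        "namespace": namespace,
        "fields": [
            {"name": node["name"], "type": to_avro_type(node, namespace)}
            for node in schema["nodes"]
        ],
    }


avro_schema = to_avro_record(SIMPLE_SCHEMA, SIMPLE_SCHEMA["namespace"])
print(json.dumps(avro_schema, indent=2))
```

Even for this small definition the printed Avro is noticeably longer than the Yaml, which is exactly the overhead Simple Schemas is meant to hide.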

In theaters now

Simple Schemas is already available in our CLI.

Simply add it with strm create schema and the --definition flag (pointing to the file location of your definition):

❯ strm create schema STRM/yaml_test/3.0.0 --definition=/Users/pimnauts/Downloads/yaml_test.yaml
 SCHEMA                 TYPE   PUBLIC   FINGERPRINT
 STRM/yaml_test/3.0.0   AVRO   false    7985493758633408257

See the entry in our documentation for more technical background.

Implementation in the console - just upload your definitions via the editor - is coming soon.

Wrap-up 🌯

So, in bite size:

  • With Simple Schemas you can write your data definitions much more easily
  • We support Yaml and regular Json
  • Try it in the CLI and stay tuned for console and visual support!

PS We’re hiring!

Want to work on features like Simple Schemas and help data teams build awesome data products without sacrificing privacy in the process? There’s plenty of cool work left. Did we mention we are hiring!?

Decrease risk and cost, increase speed: encode privacy inside data with STRM.