tutorial

On-the-fly decryption with Snowflake

On the fly decryption with Snowflake

In this post, we’ll show how you can integrate STRM’s privacy streams and privacy transformations with (native!) role-based access controls and foreign keys inside data warehouse solutions.

In short, this brings STRM privacy streams (which are localized, purpose-bound and use case specific data interfaces) to data warehousing (centralized + use case agnostic). There’s more background here.

On the fly decryption

So you have your records processed and transformed through STRM, and the encryption keys (the key stream) are available in your databases. What’s next? In the following steps we’re going to show how to bring back the original plaintext data in Snowflake.

I’m using the UCI online retail dataset that we processed (described in this blogpost)
The processed data exist in this publicly accessible Gcloud Bucket. You must have any google account to view this bucket, although you could use any http client to download individual objects in the bucket without authentication.

Prepare

Execute these SQL commands (either in snowsql or the web interface).

Load data

First create the tables. These map the shape of the uci dataset and the keys exports (from batch jobs)

create or replace TABLE "encrypted" (
"InvoiceNo" VARCHAR(500),
"StockCode" VARCHAR(500),
"Description" VARCHAR(500),
"Quantity" VARCHAR(500),
"InvoiceDate" VARCHAR(500),
"UnitPrice" VARCHAR(500),
"CustomerID" VARCHAR(500),
"Country" VARCHAR(500),
"strmMeta.eventContractRef" VARCHAR(500),
"strmMeta.nonce" VARCHAR(500),
"strmMeta.timestamp" VARCHAR(500),
"strmMeta.keyLink" VARCHAR(500),
"strmMeta.billingId" VARCHAR(500),
"strmMeta.consentLevels" VARCHAR(500)
);

create or replace TABLE "keys" (
"keyLink" VARCHAR(50),
"encryptionKey" VARCHAR(500)
);

Load the data with an ad-hoc Snowflake stage. The query assumes you’ve downloaded the files from the Prepare section and are using snowsql from the directory that those files are stored in.

put file://encrypted-44MB.csv @~;
copy into "encrypted" from '@~/encrypted-44MB.csv.gz'
file_format=(type=CSV field_optionally_enclosed_by = '"' skip_header=1);

put file://keys.csv @~;
copy into "keys" from '@~/keys.csv.gz'
file_format=(type=CSV field_optionally_enclosed_by = '"' skip_header=1);

Have a look at the data

select * from "encrypted" limit 3;
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountrystrmMeta.eventContractRefstrmMeta.noncestrmMeta.timestampstrmMeta.keyLinkstrmMeta.billingIdstrmMeta.consentLevels
53636585123AWHITE HANGING HEART T-LIGHT HOLDER62010-12-01 08:26:002.55ASqc1Q0QalDEN+LHeyZSfGHE+s9Lqu4o+jM=United Kingdomstrmprivacy/online-retail/1.0.0012911883600000fd20015-40e4-484d-ab1f-182acff382acNULL[2]
53636571053WHITE METAL LANTERN62010-12-01 08:26:003.39ASqc1Q0QalDEN+LHeyZSfGHE+s9Lqu4o+jM=United Kingdomstrmprivacy/online-retail/1.0.0012911883600000fd20015-40e4-484d-ab1f-182acff382acNULL[2]
53636584406BCREAM CUPID HEARTS COAT HANGER82010-12-01 08:26:002.75ASqc1Q0QalDEN+LHeyZSfGHE+s9Lqu4o+jM=United Kingdomstrmprivacy/online-retail/1.0.0012911883600000fd20015-40e4-484d-ab1f-182acff382acNULL[2]
select * from "keys" limit 2;
keyLinkencryptionKey
0fd20015-40e4-484d-ab1f-182acff382ac{“primaryKeyId”:714921229,“key”:[{“keyData”:{"typeUr…
1de9b1fc-9ad5-43a2-b501-af6933624f67{“primaryKeyId”:1369926497,“key”:[{“keyData”:{"typeUr…

Create the decrypt UDF

This User Defined Function uses the compiled Kotlin code that contains the thin wrapper that we wrote around Google Tink (sources)[2].

CREATE OR REPLACE FUNCTION f_strm_decrypt(key VARCHAR, ciphertext VARCHAR) RETURNS VARCHAR
LANGUAGE JAVA
RETURNS NULL ON NULL INPUT
IMMUTABLE
IMPORTS=('@~/decrypter-1.0.0-all.jar')
HANDLER='Udf.decrypt'
AS
$$
import io.strmprivacy.aws.lambda.decrypter.Decrypt;
class Udf {
public static String decrypt(String key, String cipherText){
return Decrypt.INSTANCE.decrypt(key, cipherText);
}
}
$$;

And test it

select f_strm_decrypt('{"primaryKeyId":714921229,"key":[{"keyData":{"typeUrl":"type.googleapis.com/google.crypto.tink.AesSivKey","value":"EkBKGzBuy9C3UUmWaOzpe7NBEg6QK21FRhZ9MjuD5hpa0+hPJy0kn1HngA9QUT5aGbTNQQyow0V6qJCFoFRQNNTH","keyMaterialType":"SYMMETRIC"},"status":"ENABLED","keyId":714921229,"outputPrefixType":"TINK"}]}', 'ASqc1Q0QalDEN+LHeyZSfGHE+s9Lqu4o+jM=');

This should produce 17850 as output.

On-the-fly decryption

SELECT
f_strm_decrypt(k."encryptionKey", e."CustomerID") AS "decryptedCustomerID",
e."InvoiceNo", e."Description"
FROM
"encrypted" e
INNER JOIN "keys" k
ON e."strmMeta.keyLink" = k."keyLink"
LIMIT 10;
decryptedCustomerIDInvoiceNoDescription
17850536365WHITE HANGING HEART T-LIGHT HOLDER
17850536365WHITE METAL LANTERN
17850536365CREAM CUPID HEARTS COAT HANGER
17850536365KNITTED UNION FLAG HOT WATER BOTTLE
17850536365RED WOOLLY HOTTIE WHITE HEART.
17850536365SET 7 BABUSHKA NESTING BOXES
17850536365GLASS STAR FROSTED T-LIGHT HOLDER
17850536366HAND WARMER UNION JACK
17850536366HAND WARMER RED POLKA DOT
13047536367ASSORTED COLOUR BIRD ORNAMENT

Filtering on consent

First create a udf that figures out the maximum integer value from the strmMeta.consentLevels column [3].

CREATE OR REPLACE FUNCTION f_json_max(json VARCHAR) RETURNS INT
LANGUAGE JAVA
RETURNS NULL ON NULL INPUT
IMMUTABLE
IMPORTS=('@~/decrypter-1.0.0-all.jar') -- only for the Gson library
HANDLER='Udf.maxFromJson'
AS
$$
import java.lang.reflect.Type;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.util.Collections;
import java.util.List;

class Udf {
static Gson gson = new Gson();
static Type token = new TypeToken<List<Integer>>(){}.getType();
public static Integer maxFromJson(String json){
try {
List<Integer> ints = gson.fromJson(json, token);
return Collections.max(ints);
}
catch (Exception e) {
return null;
}
}
}
$$;
SELECT
f_strm_decrypt(k."encryptionKey", e."CustomerID") AS "decryptedCustomerID",
e."InvoiceNo", e."Description",
f_json_max(e."strmMeta.consentLevels") as "maxConsent"
FROM
"encrypted" e
INNER JOIN "keys" k
ON e."strmMeta.keyLink" = k."keyLink"
WHERE "maxConsent" >= 2
LIMIT 10;
decryptedCustomerIDInvoiceNoDescriptionmaxConsent
17850536365WHITE HANGING HEART T-LIGHT HOLDER2
17850536365WHITE METAL LANTERN2
17850536365CREAM CUPID HEARTS COAT HANGER2
17850536365KNITTED UNION FLAG HOT WATER BOTTLE2
17850536365RED WOOLLY HOTTIE WHITE HEART.2
17850536365SET 7 BABUSHKA NESTING BOXES2
17850536365GLASS STAR FROSTED T-LIGHT HOLDER2
17850536366HAND WARMER UNION JACK2
17850536366HAND WARMER RED POLKA DOT2
13047536367ASSORTED COLOUR BIRD ORNAMENT2

Performance

SELECT
f_strm_decrypt(k."encryptionKey", e."CustomerID") AS "decryptedCustomerID",
e."InvoiceNo", e."Description",
f_json_max(e."strmMeta.consentLevels") as "maxConsent"
FROM
"encrypted" e
INNER JOIN "keys" k
ON e."strmMeta.keyLink" = k."keyLink"
WHERE "maxConsent" >= 2;

Executing this on a X-Small Virtual Warehouse executes 195000 rows in 9s so ca 22000 rows/s.

PS We’re hiring!

Want to work on features like query-time decryption for privacy streams to help data teams build awesome data products without sacrificing privacy in the process? There’s plenty of cool work left. Did we mention we are hiring!?


  1. This file is a below the 50MB limit that exists for Snowflake ad-hoc loading stages. ↩︎

  2. The trivial bit of inline Java has to do with the fact the Kotlin code doesn’t provide the zero-arguments constructor that the Snowflake interface requires. ↩︎

  3. this UDF can probably be written in plain SQL, but I like this Java solution that leverages the decrypter file that we need anyway. ↩︎

Decrease risk and cost, increase speed encode privacy inside data with STRM.