tutorial

Deploying a User Defined Function for AWS Redshift

how to deploy a User Defined Function written in Kotlin so that it can be used in AWS Redshift SQL queries.

In this post, we’ll show how you can integrate STRM’s privacy streams and privacy transformations with (native) role-based access controls and foreign keys inside AWS Redshift.

In short, this brings STRM privacy streams (which are localized, purpose-bound and use case specific data interfaces) to data warehousing (centralized + use case agnostic). There’s more background here.

On the fly decryption

STRM supports on-the-fly decryption on many platforms. For AWS Redshift, this requires a little more effort than, for instance, BigQuery.

This post explains how to deploy our f_strm_decrypt User Defined Function onto AWS Lambda. The function handles decryption of Google Tink defined decryption [1]

The AWS lambda UDF code for Redshift

We published the source code for the decryption of Google Tink defined encryption keys here on GitHub and put the resulting JVM archive on Maven Central because an AWS Lambda function needs this on a public S3 bucket. The Kotlin code is quite trivial and just consists of a few calls to the Tink library.

The most challenging part of this exercise was defining the correct AWS Identity and Access Management settings (as with any AWS IAM case…).

The user deploying the lambda

I’ve created an IAM user named strm-udf-test, and attached the following policies.

aws iam list-attached-user-policies --user-name strm-udf-test \
| jq -r '.AttachedPolicies[].PolicyName'
AmazonRedshiftFullAccess
IAMFullAccess
IAMUserChangePassword
AWSCloudFormationFullAccess
AWSLambda_FullAccess
AmazonRedshiftDataFullAccess

The role

Make sure this user strm-udf-test has a trust relation with a role. I’ve created a role strm-udf-test-2 [2] with the following permissions.

aws iam get-role --role-name strm-udf-test-2
{
"Role": {
"RoleName": "strm-udf-test-2",
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "redshift.amazonaws.com" },
"Action": "sts:AssumeRole"
},
{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::079646392542:role/strm-udf-test" },
"Action": "sts:AssumeRole"
}
]
}
}
}

the cluster

Make sure that your Redshift cluster has this role associated in its “Manage IAM Roles” settings (Cluster → Action → Manage IAM Roles).

aws redshift describe-clusters | jq .Clusters[0].IamRoles
[
{ "IamRoleArn": "arn:aws:iam::079646392542:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift" },
{ "IamRoleArn": "arn:aws:iam::079646392542:role/strm-udf-test-2" }
]

Deploy the lambda function

I’ve created a script that execute these steps:

  1. create a lambda function named f-strm-decrypt
  2. define that function as a user defined function in Redshift via an SQL statement.

Download this file, and modify it so that the variables in the top of the file correspond to your values, and then execute the script.

./deployUdf.sh # or bash deployUdf.sh

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - f-strm-decrypt-varchar-varchar
d93ea2c3-3f5e-4fe4-b363-76bdb147d9d3:FINISHED

Here’s the script: deployUdf.sh

#!/bin/bash
set -e

# You don't need to change this one if you use our public jvm archive
s3Bucket=strmprivacy-udfs
packagename=decrypter-1.0.0-all.jar

# You need to CHANGE all these to point to YOUR cluster, user, role and database.
cluster=udf-test-cluster
db=dev
user=strm-udf-test
schema=public
redshiftRole=arn:aws:iam::079646392542:role/strm-udf-test-2

execQuery()
{
output=`aws redshift-data execute-statement --cluster-identifier $cluster --database $db --db-user $user --sql "set search_path to $schema; $1"`
id=`echo $output | jq -r .Id`

status="SUBMITTED"
while [ "$status" != "FINISHED" ] && [ "$status" != "FAILED" ]
do
sleep 1
status=`aws redshift-data describe-statement --id $id | jq -r .Status`
done
if [ "$status" == "FAILED" ]; then
aws redshift-data describe-statement --id $id
exit 1
else
echo $id:$status
fi
}


trap '{ rm -f -- "$lambdafile"; }' EXIT
lambdafile=$(mktemp udf-lambda-yaml-XXXXXX)

cat > $lambdafile <<'EOF'
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
S3Bucket:
Description: Bucket containing the java packages
Type: String
S3Key:
Description: Key containing the java packages
Type: String
Resources:
LambdaRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Principal:
Service:
- lambda.amazonaws.com
Action:
- sts:AssumeRole
Path: /
Policies:
- PolicyName: CloudwatchLogs
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
Resource:
- !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:*"
- Effect: Allow
Action:
- logs:CreateLogStream
- logs:PutLogEvents
Resource:
- !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
LambdaUDFFunction:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: f-strm-decrypt
Role: !GetAtt 'LambdaRole.Arn'
Timeout: 100
Code:
S3Bucket: !Ref S3Bucket
S3Key: !Ref S3Key
Handler: io.strmprivacy.aws.lambda.decrypter.Handler::handleRequest
Runtime: java11
EOF


functionname="f_strm_decrypt(varchar,varchar)"
stackname=f-strm-decrypt-varchar-varchar
paramsBuckets="S3Bucket=$s3Bucket S3Key=$packagename"

if ! aws cloudformation deploy --template-file $lambdafile --parameter-overrides ${paramsBuckets} --stack-name ${stackname} --no-fail-on-empty-changeset --capabilities CAPABILITY_IAM; then
aws cloudformation describe-stack-events --stack-name $stackname
aws cloudformation delete-stack --stack-name ${stackname}
exit 1
fi

sql="CREATE OR REPLACE EXTERNAL FUNCTION $functionname RETURNS varchar IMMUTABLE LAMBDA 'f-strm-decrypt' IAM_ROLE '$redshiftRole';"
execQuery "$sql"

Test the lambda function

Log in to Redshift and open the Query Editor. Execute

select f_strm_decrypt('{"primaryKeyId":714921229,"key":[{"keyData":{"typeUrl":"type.googleapis.com/google.crypto.tink.AesSivKey","value":"EkBKGzBuy9C3UUmWaOzpe7NBEg6QK21FRhZ9MjuD5hpa0+hPJy0kn1HngA9QUT5aGbTNQQyow0V6qJCFoFRQNNTH","keyMaterialType":"SYMMETRIC"},"status":"ENABLED","keyId":714921229,"outputPrefixType":"TINK"}]}', 'ASqc1Q0QalDEN+LHeyZSfGHE+s9Lqu4o+jM=');

and after a little while you should see 17850 in the results panel if everything went ok.


  1. there’s nothing Google specific about this encryption library, it’s just a well defined interface to common encryption standards. ↩︎

  2. yes, I don’t know much about AWS IAM 🙂. ↩︎

Decrease risk and cost, increase speed encode privacy inside data with STRM.