Certain fields in our snapshots contain personally identifiable information (PII), which we want to manage and transfer very carefully. The European General Data Protection Regulation (GDPR) provides clear guidelines around the transfer of this kind of data, and in these exports we will share a considerable amount of PII.

This means that we take certain precautions. Primarily, we ensure that information that is considered personally identifiable is encrypted during transfer. We do this by using column-level encryption in our exports.

Choice of algorithm

Our choice of encryption algorithm is somewhat limited by the capabilities of the systems that the data is exported from and the systems that the data is intended to be imported into.

For this reason, we have chosen to implement AES-256-CBC with PKCS5 padding. This method is considered cryptographically secure. We generate a random initialization vector (IV) for each row supplied, so that an IV is never shared or reused (CBC tolerates a repeated IV better than some other modes, but reuse would reveal when two values begin with the same data). A fresh IV per row also makes the encryption non-deterministic, meaning that encrypting the same value twice does not produce the same ciphertext.
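The effect of a per-row IV can be illustrated with a minimal Python sketch. It assumes the third-party `cryptography` package; the function name `encrypt_value`, the randomly generated stand-in key, and the sample value are hypothetical and are not taken from the export pipeline itself. For AES, PKCS5 and PKCS7 padding are the same thing, and this library exposes it as PKCS7.

```python
import base64
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_value(key: bytes, iv: bytes, plaintext: str) -> str:
    """Encrypt one value with AES-256-CBC and return it base-64 encoded.

    Hypothetical helper for illustration only.
    """
    # Pad the UTF-8 bytes to a multiple of the 128-bit AES block size.
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext.encode("utf-8")) + padder.finalize()

    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    return base64.b64encode(ciphertext).decode("ascii")


key = os.urandom(32)       # stand-in for the real 256-bit key
iv_row_1 = os.urandom(16)  # one fresh 128-bit IV per row
iv_row_2 = os.urandom(16)

# The same value in two rows encrypts to two different ciphertexts.
first = encrypt_value(key, iv_row_1, "jane@example.com")
second = encrypt_value(key, iv_row_2, "jane@example.com")
```

Because each row gets its own IV, `first` and `second` differ even though the plaintext is identical, so equal ciphertexts cannot be used to spot equal values across rows.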

Conventions

Decrypting data

Algorithm

To decrypt the provided data, you need a cryptography library that supports AES-256-CBC with PKCS5 padding. Some libraries call this PKCS7 padding; for AES the two are equivalent.

First, you need the decryption key. You will have been provided with this key in base-64 encoded form, so you need to base-64 decode it into a binary variable.

You decrypt the data row-by-row:

  1. Base-64 decode the value you want to decrypt into a binary value.
  2. Base-64 decode the IV (initialization vector) for that row into a binary value.
  3. Use the library’s decryption function, with the mode set to AES-256-CBC with PKCS5 padding, and pass in:
    1. The decryption key
    2. The decoded binary value (the ciphertext)
    3. The decoded initialization vector
  4. Interpret the binary output from decryption as a UTF-8 string.
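
The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes the third-party `cryptography` package, and the function name `decrypt_value` and its parameter names are hypothetical. The library exposes the padding scheme as PKCS7, which is equivalent to PKCS5 for AES.

```python
import base64

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def decrypt_value(key_b64: str, value_b64: str, iv_b64: str) -> str:
    """Decrypt one base-64 encoded value using the IV of its row.

    Hypothetical helper for illustration only.
    """
    # Decode the key, the encrypted value and the row's IV into bytes.
    key = base64.b64decode(key_b64)
    if len(key) != 32:
        raise ValueError("expected a 256-bit (32-byte) key")
    ciphertext = base64.b64decode(value_b64)
    iv = base64.b64decode(iv_b64)

    # Decrypt with AES in CBC mode.
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()

    # Strip the padding (128 is the AES block size in bits).
    unpadder = padding.PKCS7(128).unpadder()
    plaintext = unpadder.update(padded) + unpadder.finalize()

    # Interpret the result as a UTF-8 string.
    return plaintext.decode("utf-8")
```

Call the function once per encrypted value, passing the same key every time and the IV that belongs to that value's row. A wrong key or IV will usually surface as a padding error from `unpadder.finalize()` or as a UTF-8 decoding error.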

Using Snowflake

Working with encrypted data