TIMESTAMP_TZ everywhere
As all of our CSV files with timestamps contain time zone information, the TIMESTAMP_TZ data type is strongly preferable to its alternatives. Although we currently export all times in UTC, and the time zone information in the data format will indicate this, using this data type in Snowflake will simplify querying.
In short, you should avoid both TIMESTAMP_NTZ (because it doesn’t actually represent a single point in time) and TIMESTAMP_LTZ (because it is complicated to interpret correctly).
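The point about a timestamp without a time zone not identifying a single point in time can be illustrated outside Snowflake. The following is a minimal Python sketch, not Snowflake code: the two timestamp strings are made-up examples, and dropping the offset with `replace(tzinfo=None)` is only an analogy for storing a value without time zone information.

```python
from datetime import datetime, timedelta

# Hypothetical values, written the way an ISO 8601 timestamp with an
# explicit offset might appear in an exported CSV column.
utc_text = "2024-03-01T12:00:00+00:00"
offset_text = "2024-03-01T13:00:00+01:00"

utc_time = datetime.fromisoformat(utc_text)
offset_time = datetime.fromisoformat(offset_text)

# With the offset kept, both values name the same instant and compare equal.
same_instant = utc_time == offset_time

# With the offset dropped, only the wall-clock reading remains, and the
# two values now appear to be one hour apart.
naive_utc = utc_time.replace(tzinfo=None)
naive_offset = offset_time.replace(tzinfo=None)
naive_gap = naive_offset - naive_utc

print(same_instant)
print(naive_gap)
```

Keeping the offset alongside the value is what lets the comparison work without any knowledge of how the data was produced.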
Beat stores most of our identifiers as numbers internally, and it may be tempting to choose to represent them as numbers in Snowflake as well. However, we recommend that you treat Beat’s identifiers as opaque string values, and not do any numeric comparisons using them. They are meant to be used as unique identifiers, and the number itself should not be used for calculations outside of this.
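As a rough illustration of treating identifiers as opaque strings, the Python sketch below matches rows from two CSV excerpts using string equality only. The column names (`release_id`, `track_id`, `title`) and all values are hypothetical and are not taken from Beat's actual export format.

```python
import csv
import io

# Hypothetical CSV excerpts; real files and column names may differ.
releases_csv = "release_id,title\n1001,First\n1002,Second\n"
tracks_csv = "track_id,release_id\n501,1001\n502,1002\n503,1001\n"

# csv.DictReader yields every field as a string, so the identifiers are
# never converted to numbers at any point.
releases = {
    row["release_id"]: row["title"]
    for row in csv.DictReader(io.StringIO(releases_csv))
}

# Group tracks by release using the identifier purely as a lookup key.
tracks_per_release = {}
for row in csv.DictReader(io.StringIO(tracks_csv)):
    tracks_per_release.setdefault(row["release_id"], []).append(row["track_id"])

for release_id, track_ids in tracks_per_release.items():
    print(releases[release_id], track_ids)
```

The only operations applied to the identifiers are equality and lookup, which is all a unique identifier is meant to support.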
While Beat has several enum-like data types, which we represent as enums internally, importing them as actual enums in Snowflake quickly becomes complex and error-prone.
In the cases where we provide enums (e.g. the release_type column in the simplified_releases snapshot), we recommend that these are imported as string data (VARCHAR).
This prevents failures in the case where Beat adds a new enum value at a later stage.
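The failure mode described above can be sketched in Python. The `ReleaseType` enum and the values `album`, `single`, and `compilation` are hypothetical stand-ins; the actual set of `release_type` values is defined by Beat.

```python
from enum import Enum


class ReleaseType(Enum):
    # Hypothetical values known to the importer when it was written.
    ALBUM = "album"
    SINGLE = "single"


# Hypothetical incoming column values; "compilation" stands in for a
# value added at a later stage.
incoming = ["album", "single", "compilation"]

# Keeping the column as plain strings accepts every value.
as_strings = list(incoming)

# Converting to a fixed enum rejects the value it does not know about.
try:
    as_enums = [ReleaseType(value) for value in incoming]
    strict_error = None
except ValueError as exc:
    as_enums = None
    strict_error = str(exc)

print(as_strings)
print(strict_error)
```

The string version keeps loading when a new value appears, and the consumer can decide later how to handle values it does not recognize.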