Periodic data in Beat’s reports is delivered according to when it was generated. We use this model for event data, where it makes sense to export the data for a given period once per day.

To look at the complete data set, you therefore need to combine the daily exports over time.
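
As a minimal sketch of combining daily exports, the snippet below concatenates per-day rows in date order. The row layout (user_id, seconds) and the function name combine_daily_exports are hypothetical and chosen only for illustration; the actual export format may differ.

```python
from datetime import date


def combine_daily_exports(exports_by_day):
    """Concatenate per-day export rows into one list, ordered by export day."""
    combined = []
    for day in sorted(exports_by_day):
        combined.extend(exports_by_day[day])
    return combined


# Hypothetical rows; the real export columns are not specified here.
exports_by_day = {
    date(2024, 1, 2): [{"user_id": "u1", "seconds": 120}],
    date(2024, 1, 1): [
        {"user_id": "u1", "seconds": 240},
        {"user_id": "u2", "seconds": 120},
    ],
}

all_rows = combine_daily_exports(exports_by_day)
```

Sorting by day before concatenating keeps the combined data in chronological order regardless of the order in which the exports were loaded.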

Data reduction

Events are generated at 2-minute intervals, so for services with heavy consumption the number of events per day can easily reach into the millions.
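
To get a feel for the volume, here is a rough back-of-the-envelope estimate. It assumes one event per 2 minutes of active listening; the listener count and listening time are made-up figures, not measurements.

```python
EVENT_INTERVAL_MINUTES = 2  # from the text: one event per 2 minutes


def events_per_day(listeners, minutes_per_listener):
    """Estimate daily event count, assuming one event per full interval listened."""
    return listeners * (minutes_per_listener // EVENT_INTERVAL_MINUTES)


# Hypothetical service: 100,000 users each listening for one hour a day.
estimate = events_per_day(listeners=100_000, minutes_per_listener=60)
```

Under these assumptions each listener produces 30 events a day, so the service as a whole produces 3,000,000.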

There is a set of important dimensions to consider when thinking about listening data:

  1. Which user did the listening?
  2. Which release did the user listen to?
  3. How long did the user spend listening?
  4. Which parts of the book did the user listen to?

We have therefore come up with the following steps to reduce the volume of data that we have to process. Taking listening events (player_progress) as an example:

  1. We temporarily import the raw events into a table called listening_events.
  2. We combine consecutive (or continuous) events into listening sessions and insert these into a separate table called listening_sessions. The goal here is to reduce the amount of data stored without losing precision.
  3. From the listening sessions, we construct a set of materialized views that let you look at the data at a daily granularity, resulting in a list of each user's listening per day and book. From this point onwards it is easy to work with the data to determine user behavior.
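
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual pipeline: the Event and Session fields, the MAX_GAP tolerance, and the function names are hypothetical, events are assumed not to overlap, and an in-memory aggregation stands in for the materialized views.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One raw listening event (hypothetical shape)."""
    user_id: str
    release_id: str
    started_at: datetime
    seconds: int  # listening time covered by this event

    @property
    def ended_at(self):
        return self.started_at + timedelta(seconds=self.seconds)


@dataclass
class Session:
    """A run of consecutive events for one user and release."""
    user_id: str
    release_id: str
    started_at: datetime
    ended_at: datetime
    seconds: int


# Assumed tolerance: events closer together than this count as continuous.
MAX_GAP = timedelta(seconds=5)


def build_sessions(events):
    """Merge consecutive events per user and release into sessions."""
    ordered = sorted(events, key=lambda e: (e.user_id, e.release_id, e.started_at))
    sessions = []
    for e in ordered:
        last = sessions[-1] if sessions else None
        if (
            last is not None
            and last.user_id == e.user_id
            and last.release_id == e.release_id
            and e.started_at - last.ended_at <= MAX_GAP
        ):
            # Continuous with the previous event: extend the open session.
            last.ended_at = e.ended_at
            last.seconds += e.seconds
        else:
            sessions.append(
                Session(e.user_id, e.release_id, e.started_at, e.ended_at, e.seconds)
            )
    return sessions


def daily_listening(sessions):
    """Total listening seconds per (day, user, release), keyed by session start day."""
    totals = defaultdict(int)
    for s in sessions:
        totals[(s.started_at.date(), s.user_id, s.release_id)] += s.seconds
    return dict(totals)


day = datetime(2024, 1, 1)
events = [
    Event("u1", "r1", day.replace(hour=10, minute=0), 120),
    Event("u1", "r1", day.replace(hour=10, minute=2), 120),
    Event("u1", "r1", day.replace(hour=10, minute=4), 120),
    Event("u1", "r1", day.replace(hour=15, minute=0), 120),
    Event("u2", "r1", day.replace(hour=10, minute=0), 120),
]
sessions = build_sessions(events)
totals = daily_listening(sessions)
```

In this example five raw events collapse into three sessions, and the daily view holds one row per user and book. Note that this sketch attributes a session that crosses midnight entirely to the day it started; a real implementation would need to decide how to split such sessions.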

Listening sessions

From the important dimensions mentioned above, we create listening sessions based on a windowing function over the following dimensions: