BigQuery & Snowflake Date Bucketing
Cloud data warehouses like BigQuery and Snowflake bill by data scanned (BigQuery on-demand pricing) or by compute time (Snowflake warehouses). Efficient date bucketing (partitioning) of Unix timestamps is therefore critical for both cost control and query performance.
Google BigQuery Strategies
1. Integer Range Partitioning
If you store raw Unix timestamps (e.g., event_ts), you can partition the table directly on that integer column.
CREATE TABLE my_dataset.events (
event_id STRING,
event_ts INT64
)
PARTITION BY RANGE_BUCKET(event_ts, GENERATE_ARRAY(1704067200, 1735689600, 86400));
-- Partitions by day (86400 seconds)
-- Start: 2024-01-01, End: 2025-01-01

2. Ingestion-Time Partitioning vs. Column Partitioning
If your data arrives in real-time, ingestion-time partitioning (_PARTITIONDATE) is easiest. However, for event-time accuracy, create a derived DATE or TIMESTAMP column and partition on that.
CREATE TABLE my_dataset.events (
event_id STRING,
event_ts TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(event_ts, DAY);

Snowflake Strategies
Micro-Partitions & Clustering
Snowflake manages "Micro-partitions" automatically. You rarely explicitly "create partitions". Instead, you define a Clustering Key.
CREATE TABLE events (
event_id VARCHAR,
event_ts NUMBER -- Unix Timestamp
)
CLUSTER BY (event_ts);
-- Or a simpler expression like TO_DATE(TO_TIMESTAMP(event_ts))

Tip: If you query by day, cluster by day. If you query by exact timestamp ranges, cluster by the raw timestamp. For massive tables, consider a derived date column to reduce the cardinality of the clustering key.
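To see why a day-level key has lower cardinality than the raw timestamp, here is a small Python sketch of the same bucketing logic (the epoch values are illustrative, not from any real table):

```python
# Illustrative sketch: flooring Unix timestamps (seconds) to day granularity,
# mirroring what a derived date clustering key does in Snowflake.
SECONDS_PER_DAY = 86400

def day_bucket(epoch_seconds: int) -> int:
    """Floor a Unix timestamp to midnight UTC of its day."""
    return epoch_seconds - (epoch_seconds % SECONDS_PER_DAY)

# Three distinct event timestamps within the same UTC day
# collapse to a single bucket value.
events = [1704067200, 1704100000, 1704153599]  # all on 2024-01-01 UTC
buckets = {day_bucket(ts) for ts in events}
print(buckets)  # {1704067200}
```

Three raw values become one clustering value, which is exactly the cardinality reduction the tip above is after.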
Converting on the Fly
Sometimes you just need to query existing raw data. Here are the conversion functions.
-- BigQuery
TIMESTAMP_SECONDS(1704067200)     -- From seconds
TIMESTAMP_MILLIS(1704067200000)   -- From milliseconds

-- Snowflake
TO_TIMESTAMP(1704067200)          -- From seconds
TO_TIMESTAMP(1704067200000, 3)    -- From milliseconds (scale 3)
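You can sanity-check these conversions locally; here Python's datetime stands in for the warehouse functions, using the same constants as above:

```python
from datetime import datetime, timezone

# From seconds (what TIMESTAMP_SECONDS / TO_TIMESTAMP compute)
ts_sec = datetime.fromtimestamp(1704067200, tz=timezone.utc)
print(ts_sec.isoformat())  # 2024-01-01T00:00:00+00:00

# From milliseconds (what TIMESTAMP_MILLIS / TO_TIMESTAMP(..., 3) compute):
# divide by 1000 to get back to seconds first.
ts_ms = datetime.fromtimestamp(1704067200000 / 1000, tz=timezone.utc)
print(ts_ms.isoformat())  # 2024-01-01T00:00:00+00:00
```

Both constants resolve to midnight UTC on 2024-01-01, confirming the partition bounds used in the RANGE_BUCKET example earlier.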
Frequently Asked Questions
BigQuery: Partition by Integer or Date?
BigQuery now supports partitioning by Integer Range, which is perfect for Unix timestamps. However, using a generated DATE column is often more user-friendly for analysts.
How does clustering help?
While partitioning splits data into physical files/segments, clustering sorts the data within those partitions. Clustering by timestamp helps when querying small time ranges within a large partition.
Can I partition by hour?
Yes. In BigQuery, you can partition by TIMESTAMP_TRUNC(timestamp_col, HOUR). In Snowflake, you cluster by an hourly expression such as DATE_TRUNC('HOUR', TO_TIMESTAMP(event_ts)); micro-partitions handle the physical layout automatically.
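Hour truncation is the same floor operation as day bucketing, just with a 3600-second window. A minimal Python sketch of what TIMESTAMP_TRUNC(..., HOUR) computes (the sample timestamp is illustrative):

```python
from datetime import datetime, timezone

def hour_bucket(epoch_seconds: int) -> int:
    """Floor a Unix timestamp (seconds) to the top of its UTC hour."""
    return epoch_seconds - (epoch_seconds % 3600)

ts = 1704070921  # 2024-01-01 01:02:01 UTC
print(hour_bucket(ts))  # 1704070800
print(datetime.fromtimestamp(hour_bucket(ts), tz=timezone.utc).isoformat())
# 2024-01-01T01:00:00+00:00
```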