BigQuery & Snowflake Date Bucketing

Cloud data warehouses charge for the work a query does: BigQuery's on-demand pricing bills by bytes scanned, while Snowflake bills by virtual-warehouse compute time. Efficient date bucketing (partitioning) of Unix timestamps is therefore critical for both cost control and performance.

Google BigQuery Strategies

1. Integer Range Partitioning

If you store raw Unix timestamps (e.g., event_ts), you can partition the table directly on that integer column.

BigQuery SQL
CREATE TABLE my_dataset.events (
    event_id STRING,
    event_ts INT64
)
PARTITION BY RANGE_BUCKET(event_ts, GENERATE_ARRAY(1704067200, 1735689600, 86400));
-- One partition per day (86400-second buckets)
-- Range: 2024-01-01 to 2025-01-01 (UTC), i.e. 366 daily partitions
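
With this layout, a range filter on the raw integer prunes partitions directly. A sketch reusing the table above (the epoch literals are illustrative):

```sql
-- Scans only the daily buckets covering 2024-03-01 .. 2024-03-08 (UTC)
SELECT COUNT(*) AS events
FROM my_dataset.events
WHERE event_ts >= 1709251200   -- 2024-03-01 00:00:00 UTC
  AND event_ts <  1709856000;  -- 2024-03-08 00:00:00 UTC
```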

2. Ingestion-Time Partitioning vs. Column Partitioning

If your data arrives in real-time, ingestion-time partitioning (_PARTITIONDATE) is easiest. However, for event-time accuracy, create a derived DATE or TIMESTAMP column and partition on that.

BigQuery SQL
CREATE TABLE my_dataset.events (
    event_id STRING,
    event_ts TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(event_ts, DAY);
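
If the feed delivers raw epoch seconds, the derived TIMESTAMP column can be populated on ingest. A minimal sketch, assuming a hypothetical staging table raw_events with an INT64 raw_ts column:

```sql
INSERT INTO my_dataset.events (event_id, event_ts)
SELECT
  event_id,
  TIMESTAMP_SECONDS(raw_ts)  -- epoch seconds -> TIMESTAMP
FROM my_dataset.raw_events;  -- hypothetical staging table
```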

Snowflake Strategies

Micro-Partitions & Clustering

Snowflake manages micro-partitions automatically; you rarely create partitions explicitly. Instead, you define a clustering key.

Snowflake SQL
CREATE TABLE events (
    event_id VARCHAR,
    event_ts NUMBER -- Unix Timestamp
)
CLUSTER BY (event_ts); 
-- Or a coarser expression such as TO_DATE(TO_TIMESTAMP(event_ts))

Tip: If you query by day, cluster by day. If you query by exact timestamp ranges, cluster by the raw timestamp. For massive tables, consider a derived date column to reduce the cardinality of the clustering key.
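
The tip above can be sketched as a day-grained clustering key, with Snowflake's SYSTEM$CLUSTERING_INFORMATION function used to check how well the table stays sorted (the table name is illustrative):

```sql
-- Cluster on the derived day (lower cardinality than the raw integer)
CREATE TABLE events_by_day (
    event_id VARCHAR,
    event_ts NUMBER
)
CLUSTER BY (TO_DATE(TO_TIMESTAMP(event_ts)));

-- Inspect clustering depth/overlap for the chosen expression
SELECT SYSTEM$CLUSTERING_INFORMATION(
  'events_by_day', '(TO_DATE(TO_TIMESTAMP(event_ts)))');
```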

Converting on the Fly

Sometimes you just need to query existing raw data. Here are the conversion functions.

SQL Reference
-- BigQuery
TIMESTAMP_SECONDS(1704067200)       -- From Seconds
TIMESTAMP_MILLIS(1704067200000)     -- From Milliseconds

-- Snowflake
TO_TIMESTAMP(1704067200)            -- From Seconds
TO_TIMESTAMP(1704067200000, 3)      -- From Milliseconds (scale 3)
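
Applied to an ad-hoc rollup, for example, the BigQuery conversion lets you bucket raw seconds by day without touching the schema (table name as in the earlier examples):

```sql
SELECT
  DATE(TIMESTAMP_SECONDS(event_ts)) AS event_day,
  COUNT(*) AS events
FROM my_dataset.events
GROUP BY event_day
ORDER BY event_day;
```

One caveat: in a WHERE clause, prefer comparing the raw event_ts to epoch literals; wrapping the partition column in a conversion function can prevent partition pruning.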

Frequently Asked Questions

BigQuery: Partition by Integer or Date?

BigQuery supports integer-range partitioning, which maps directly onto raw Unix timestamps. A derived DATE or TIMESTAMP column is usually friendlier for analysts, though, because filters can use plain date literals instead of epoch arithmetic.
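
The trade-off is mostly ergonomic. With integer-range partitioning, analysts must convert date literals to epochs themselves, e.g. with BigQuery's UNIX_SECONDS:

```sql
SELECT COUNT(*) AS events
FROM my_dataset.events
WHERE event_ts >= UNIX_SECONDS(TIMESTAMP "2024-03-01")
  AND event_ts <  UNIX_SECONDS(TIMESTAMP "2024-03-02");
```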

How does clustering help?

While partitioning splits data into physical files/segments, clustering sorts the data within those partitions. Clustering by timestamp helps when querying small time ranges within a large partition.
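
In BigQuery the two combine naturally; a sketch partitioning by day and clustering by the timestamp within each partition (the table name is illustrative):

```sql
CREATE TABLE my_dataset.events_clustered (
    event_id STRING,
    event_ts TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(event_ts, DAY)
CLUSTER BY event_ts;  -- rows sorted by time within each daily partition
```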

Can I partition by hour?

Yes. In BigQuery, partition by TIMESTAMP_TRUNC(timestamp_col, HOUR). Snowflake has no hourly partition to declare; if your queries are that fine-grained, cluster by an expression such as DATE_TRUNC('HOUR', TO_TIMESTAMP(event_ts)), though micro-partitions handle much of this automatically.
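
For BigQuery, the hourly variant looks like this (the table name is illustrative). Keep the partition-count limit in mind: BigQuery caps partitions per table (10,000 at the time of writing), so hourly partitions cover roughly 13 months of data:

```sql
CREATE TABLE my_dataset.events_hourly (
    event_id STRING,
    event_ts TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(event_ts, HOUR);
```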