Analytics Functions in Google BigQuery
Get quick insights using SQL analytics functions
A window (analytic) function computes values over a group of rows and returns a single result for each row. This is different from an aggregate function, which returns a single result for a group of rows.
A window function includes an OVER clause, which defines a window of rows around the row being evaluated. For each row, the window function result is computed using the selected window of rows as input, possibly doing aggregation.
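To make the contrast concrete, here is a minimal sketch that runs the same SUM first as an aggregate and then as a window function. The orders table and its three rows are hypothetical, invented only for this illustration.

```sql
-- Aggregate function: rows are collapsed, one output row per customer.
-- "orders" is a hypothetical inline table used only for illustration.
with orders as (
  select 'Customer A' as customer, 10 as amount
  union all select 'Customer A', 20
  union all select 'Customer B', 100
)
select customer, sum(amount) as total_amount
from orders
group by customer
order by customer;

-- Window function: every input row is kept, and each row also
-- carries the total of its own partition (its customer).
with orders as (
  select 'Customer A' as customer, 10 as amount
  union all select 'Customer A', 20
  union all select 'Customer B', 100
)
select
  customer,
  amount,
  sum(amount) over (partition by customer) as customer_total
from orders
order by customer, amount;
```

The first statement returns two rows (one per customer), while the second returns all three input rows, each with its customer's total alongside it.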
With window functions, you can:
compute moving averages,
rank items,
calculate cumulative sums, and so on.
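As a rough sketch of those three uses in a single query, the example below works on a small hypothetical orders table (the table, its columns, and the output column names are assumptions made for illustration):

```sql
-- "orders" is a hypothetical inline table used only for illustration.
with orders as (
  select DATE '2023-11-07' as order_date, 10 as amount
  union all select DATE '2023-11-08', 20
  union all select DATE '2023-11-09', 30
  union all select DATE '2023-11-10', 40
)
select
  order_date,
  amount,
  -- moving average over the current row and the one before it
  avg(amount) over (
    order by order_date
    rows between 1 preceding and current row
  ) as moving_avg_2,
  -- rank by amount, largest amount first
  rank() over (order by amount desc) as amount_rank,
  -- cumulative sum from the first row up to the current row
  sum(amount) over (
    order by order_date
    rows between unbounded preceding and current row
  ) as running_total
from orders
order by order_date
-- order_date | amount | moving_avg_2 | amount_rank | running_total
-- 2023-11-07 | 10     | 10.0         | 4           | 10
-- 2023-11-08 | 20     | 15.0         | 3           | 30
-- 2023-11-09 | 30     | 25.0         | 2           | 60
-- 2023-11-10 | 40     | 35.0         | 1           | 100
```

Each function gets its own OVER clause, so one query can mix different orderings and frames.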
Step 1: Create Sample BigQuery Table
Let's define a simple sales_order table in Google BigQuery. The query below builds it inline as a common table expression (CTE), so it can be run as-is:
-------------- Create a sales_order table --------------
with
sales_order as (
select DATE('2023-11-07') as delivery_date, 'Customer A' as customer, 'Kenya' as shipping_country, 10 as amount
union all select DATE('2023-11-08') as delivery_date, 'Customer A' as customer, 'Kenya' as shipping_country, 20 as amount
union all select DATE('2023-11-09') as delivery_date, 'Customer A' as customer, 'Kenya' as shipping_country, 30 as amount
union all select DATE('2023-11-10') as delivery_date, 'Customer A' as customer, 'Kenya' as shipping_country, 40 as amount
union all select DATE('2023-11-08') as delivery_date, 'Customer B' as customer, 'Kenya' as shipping_country, 100 as amount
union all select DATE('2023-11-09') as delivery_date, 'Customer B' as customer, 'Kenya' as shipping_country, 200 as amount
union all select DATE('2023-11-10') as delivery_date, 'Customer B' as customer, 'Kenya' as shipping_country, 300 as amount
)
select * from sales_order
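Output:
delivery_date | customer   | shipping_country | amount
2023-11-07    | Customer A | Kenya            | 10
2023-11-08    | Customer A | Kenya            | 20
2023-11-09    | Customer A | Kenya            | 30
2023-11-10    | Customer A | Kenya            | 40
2023-11-08    | Customer B | Kenya            | 100
2023-11-09    | Customer B | Kenya            | 200
2023-11-10    | Customer B | Kenya            | 300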
Analytics Function 1: FIRST_VALUE
FIRST_VALUE returns the value of the first row in the current window frame.
Use case:
Suppose we want a unique list of customers, each with their first delivery_date from the sales_order table. Ordering each customer's window by delivery_date in ascending order puts the earliest date first, so we can simply use the FIRST_VALUE analytics function as follows:
------- get customer list with first delivery date -------
select distinct
customer,
FIRST_VALUE(delivery_date) OVER (PARTITION BY customer ORDER BY delivery_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_delivery_date
from sales_order
Output:
customer   | first_delivery_date
Customer A | 2023-11-07
Customer B | 2023-11-08
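FIRST_VALUE is also useful without DISTINCT, when every row should be compared against the first row of its partition. The sketch below is one possible illustration: it reuses the sample rows (shipping_country is dropped for brevity) and computes how many days each delivery falls after the customer's first one. The days_since_first column is a name invented for this example.

```sql
-- Same sample rows as sales_order, without shipping_country.
with sales_order as (
  select DATE '2023-11-07' as delivery_date, 'Customer A' as customer, 10 as amount
  union all select DATE '2023-11-08', 'Customer A', 20
  union all select DATE '2023-11-09', 'Customer A', 30
  union all select DATE '2023-11-10', 'Customer A', 40
  union all select DATE '2023-11-08', 'Customer B', 100
  union all select DATE '2023-11-09', 'Customer B', 200
  union all select DATE '2023-11-10', 'Customer B', 300
)
select
  customer,
  delivery_date,
  -- days between this delivery and the customer's earliest delivery
  DATE_DIFF(
    delivery_date,
    FIRST_VALUE(delivery_date) OVER (
      PARTITION BY customer
      ORDER BY delivery_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ),
    DAY
  ) as days_since_first
from sales_order
order by customer, delivery_date
```

For Customer A the values run 0, 1, 2, 3, and for Customer B they run 0, 1, 2, because each customer's count starts from their own first delivery.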
Analytics Function 2: LAST_VALUE
LAST_VALUE returns the value of the last row in the current window frame.
Use case:
Suppose we want a unique list of customers, each with their last delivery_date from the sales_order table. Ordering each customer's window by delivery_date in ascending order puts the latest date last, so we can simply use the LAST_VALUE function as follows:
------- get customer list with last delivery date -------
select distinct
customer,
LAST_VALUE(delivery_date) OVER (PARTITION BY customer ORDER BY delivery_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_delivery_date
from sales_order
Output:
customer   | last_delivery_date
Customer A | 2023-11-10
Customer B | 2023-11-10
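The explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame matters most for LAST_VALUE. When a window has an ORDER BY but no frame clause, BigQuery's default frame ends at the current row, so LAST_VALUE just returns the current row's own value. The sketch below puts both variants side by side on the sample rows (shipping_country is dropped for brevity). The column names default_frame and full_partition are invented for this illustration.

```sql
-- Same sample rows as sales_order, without shipping_country.
with sales_order as (
  select DATE '2023-11-07' as delivery_date, 'Customer A' as customer, 10 as amount
  union all select DATE '2023-11-08', 'Customer A', 20
  union all select DATE '2023-11-09', 'Customer A', 30
  union all select DATE '2023-11-10', 'Customer A', 40
  union all select DATE '2023-11-08', 'Customer B', 100
  union all select DATE '2023-11-09', 'Customer B', 200
  union all select DATE '2023-11-10', 'Customer B', 300
)
select
  customer,
  delivery_date,
  -- no frame clause: the frame stops at the current row,
  -- so this equals delivery_date on every row
  LAST_VALUE(delivery_date) OVER (
    PARTITION BY customer
    ORDER BY delivery_date
  ) as default_frame,
  -- explicit frame over the whole partition:
  -- this is the customer's latest delivery date on every row
  LAST_VALUE(delivery_date) OVER (
    PARTITION BY customer
    ORDER BY delivery_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) as full_partition
from sales_order
order by customer, delivery_date
```

With this data, default_frame repeats each row's own delivery_date, while full_partition is 2023-11-10 on every row for both customers.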