OVER
Description
Window functions, also known as analytic functions, are a powerful feature in SQL that allows to perform calculations across a set of table rows that are somehow related to the current row. This is different from regular aggregate functions which return a single aggregate value for a set of rows. Window functions retain the individual rows while calculating aggregates, ranks, running totals, or other calculations. The PARTITION BY
clause is used within window functions to divide the result set into partitions to which the window function is applied.
Window functions are not supported by all Database Management Systems (DBMS). Oracle, Microsoft SQL Server, MySQL, etc supports it.
Syntax
window_function(): This is a placeholder for any window function, such as
SUM()
,AVG()
,ROW_NUMBER()
,RANK()
, etc.OVER: Indicates the use of a window function.
[PARTITION BY partition_expression]: (Optional) Divides the result set into partitions to which the window function is applied. Each partition is processed separately.
[ORDER BY order_expression]: (Optional) Specifies the order of rows within each partition or within the entire result set if
PARTITION BY
is not used. This is necessary for functions that depend on the order, like cumulative sums or rankings.[RANGE BETWEEN start_expr AND end_expr]: (Optional) Defines a window frame, which is a set of rows relative to the current row.
RANGE
considers rows within a certain range of values, whileROWS
considers a specific number of rows. This can be especially useful for cumulative sums, moving averages, or other types of running totals.
Keywords useful when using RANGE
INTERVAL: Specifies a time interval. Used with date or timestamp columns to define the range in terms of days, months, years, etc.
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
PRECEDING: Indicates that the boundary of the frame is before the current row.
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
FOLLOWING: Indicates that the boundary of the frame is after the current row.
RANGE BETWEEN CURRENT ROW AND INTERVAL '1' DAY FOLLOWING
CURRENT ROW: Refers to the current row being processed.
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
UNBOUNDED PRECEDING: Specifies the start of the frame from the first row of the partition.
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
UNBOUNDED FOLLOWING: Specifies the end of the frame to the last row of the partition.
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
Common Window Functions
Ranking Functions:
ROW_NUMBER()
: Assigns a unique, sequential number to each row within a partition or ordered result set. It's useful for assigning row positions or creating custom ordering.RANK()
: Assigns a rank to each row based on a specified ordering criteria (e.g., descending sales amount). Rows with the same value share the same rank, and subsequent ranks might skip numbers.DENSE_RANK()
: Similar toRANK()
but assigns consecutive ranks without gaps, even if multiple rows share the same value in the ordering criteria.
Aggregate Window Functions:
SUM(expression) OVER (...)
: Calculates the running or cumulative sum of an expression over a window. It's useful for tracking totals within partitions or ordered sets.AVG(expression) OVER (...)
: Calculates the running or cumulative average of an expression over a window. It's helpful for analyzing trends within groups of data.MIN(expression) OVER (...)
: Identifies the minimum value of an expression within a window. It can be used to find the lowest value within a specific range of rows.MAX(expression) OVER (...)
: Identifies the maximum value of an expression within a window. It can be used to find the highest value within a specific range of rows.
Other Window Functions:
FIRST_VALUE(expression) OVER (...)
: Retrieves the first value of an expression encountered within a window. It's useful for grabbing the initial value within a partition or ordered set.LAST_VALUE(expression) OVER (...)
: Retrieves the last value of an expression encountered within a window. It's helpful for grabbing the final value within a partition or ordered set.LEAD(expression, offset) OVER (...)
: Looks ahead a specified number of rows (offset
) and returns the value of the expression at that position. It's useful for comparing values with future positions within the window.LAG(expression, offset) OVER (...)
: Looks behind a specified number of rows (offset
) and returns the value of the expression at that position. It's useful for comparing values with past positions within the window.RATIO_TO_REPORT(expression)
: It is an analytic function used to calculate the ratio of a value to the sum of a set of values. It is commonly used to determine the proportion of a value within a group relative to the total for that group.
All these window functions require the OVER (...)
clause to define the window for the calculation.
Examples
Calculating ROW_NUMBER() with PARTITION BY
We want to assign a unique row number to each sale within each product.
Table sales
:
sales
:Output:
Calculating SUM() with PARTITION BY
We want to calculate the total sales amount for each customer.
Table sales
:
sales
:Output:
Example: Using RANGE BETWEEN
with SUM
RANGE BETWEEN
with SUM
Sample Input
We want to calculate the cumulative sales amount within the last 90 days for each row.
Example with ROWS BETWEEN
ROWS BETWEEN
For considering the last 3 rows instead of a time interval, we would use ROWS BETWEEN
Example using PARTITION BY
with RANGE
PARTITION BY
with RANGE
We want to calculate cumulative sales within each year and the cumulative sales should be within the last 90 days for each year.
Example using RATIO_TO_REPORT
Suppose we have a table sales
with the following structure:
We want to calculate the ratio of each sales_amount
to the total sales_amount
for each region_id
Output
Last updated