MODEL

The MODEL clause is a feature specific to Oracle SQL and is not supported by all DBMS. It is unique to Oracle and provides advanced, spreadsheet-like calculation capabilities within SQL queries, which is not a standard feature across other SQL databases.

Description

The MODEL clause in Oracle SQL is a feature that allows to perform inter-row and inter-column calculations in a spreadsheet-like fashion within a SQL query. It provides a way to define a multidimensional array and apply rules to it, enabling complex data manipulations and calculations.

Key Features of the MODEL Clause

  1. Multidimensional Arrays: Treats query results as multidimensional arrays.

  2. Rules-Based Calculations: Allows the definition of rules for calculations that can span across rows and columns.

  3. Cell References: Supports references to other cells in the array, similar to how spreadsheet cells can reference each other.

Basic Syntax

Here is a simplified syntax of the MODEL clause:

SELECT ...
FROM ...
MODEL
    [ PARTITION BY (partition_clause) ]
    DIMENSION BY (dimension_clause)
    MEASURES (measure_clause)
    [ RULES (rule_clause) ];
  • PARTITION BY: Divides the data into partitions, similar to a GROUP BY clause.

  • DIMENSION BY: Specifies the dimensions of the array (e.g., rows or columns). In the context of the MODEL clause, dimensions act like rows in a spreadsheet

  • MEASURES: Defines the measures or the actual data values that will be manipulated. Measures act like the data cells in a spreadsheet that can be manipulated.

  • RULES: Contains the rules that define how calculations are performed.

Pseudo Keywords

ANY: This keyword refers to all members within a particular dimension. It's often used in conjunction with aggregate functions like SUM or AVG to calculate values across all elements in that dimension.

CURRENT: This keyword refers to the current row being processed within the model. It can be useful for calculations that depend on the values in the current row.

Pseudo Functions

CV() (Current Value): This function retrieves the value of a measure from the previous row within the same dimension partition. It's particularly helpful for calculations that involve comparisons or running totals.

DENSE_RANK() and RANK(): These functions assign ranking positions to members within a dimension, considering potential ties and gaps in the data. They can be used for calculations based on ranking or percentiles.

LAG() and LEAD(): These functions access values from previous or subsequent rows within the same dimension partition, allowing for calculations that depend on past or future values.

Example

Sample Data

select employee_id, first_name, last_name, salary, manager_id, department_id 
                      from employees;
                      
 EMPLOYEE_ID FIRST_NAME           LAST_NAME                     SALARY MANAGER_ID DEPARTMENT_ID
 ----------- -------------------- ------------------------- ---------- ---------- -------------
 198 Donald               OConnell                        2600        124            50
 199 Douglas              Grant                           2600        124            50
 200 Jennifer             Whalen                          4400        101            10
 201 Michael              Hartstein                      13000        100            20
 202 Pat                  Fay                             6000        201            20
 203 Susan                Mavris                          6500        101            40
 204 Hermann              Baer                           10000        101            70
 205 Shelley              Higgins                        12008        101           110
 206 William              Gietz                           8300        205           110
 100 Steven               King                           24000                       90
 101 Neena                Kochhar   

Simple example with no calculations or modifications (rules) applied to the data.

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )  -- employee_id is the dimension
    MEASURES ( salary )           -- salary is the measure
    ( )                           -- empty rules section
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY

Addressing a specific cell

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[employee_id = 100]= salary[employee_id = 100]+ 1000 )
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

Addressing a range of cells (using a loop)

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[
        FOR employee_id FROM 100 TO 105 INCREMENT 1
    ]= 1000 )
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

Following query throws an error

  select employee_id, salary 
  from employees
  model
    dimension by (employee_id)
    measures (salary)
    RULES (
      salary[for employee_id from 100 to 105 INCREMENT 1] = 
        salary[for employee_id from 100 to 105 INCREMENT 1] + 1000 
      )
  order by employee_id
  fetch first 10 rows only;
  
  select employee_id, salary
  *
  ERROR at line 1:
  ORA-32622: illegal multi-cell reference

Use of pseudo function cv()

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[
        FOR employee_id FROM 100 TO 105 INCREMENT 1
    ]= salary[cv()]+ 1000 )
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

Addressing a range of cells (using a list of values)

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[
        FOR employee_id IN ( 100,
        102,
        104,
        105 )
    ]= salary[cv()]+ 1000 )
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

Addressing a range of cells (using a subquery)

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[
        FOR employee_id IN (
            SELECT
                level + 99
            FROM
                dual
            WHERE
                level NOT IN ( 2, 4 )
            CONNECT BY level <= 6
        )
    ]= salary[cv()]+ 1000 )
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

The pseudo keyword ANY

-- Update all salaries by adding 1000 to the existing salary
SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[ANY]= salary[cv()]+ 1000 )
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

Addressing cells relative to the current cell

SELECT
    employee_id,
    salary
FROM
    employees
MODEL
    DIMENSION BY ( employee_id )
    MEASURES ( salary )
    RULES ( salary[ANY]
    ORDER BY
        employee_id ASC
    = salary[cv()]+ salary[cv() + 1]+ salary[cv() + 3])
ORDER BY
    employee_id
FETCH FIRST 10 ROWS ONLY;

Generating data

SELECT
    criteria
FROM
    dual
MODEL
    DIMENSION BY ( 100 criteria )
    MEASURES ( 0 x )
    RULES ( x[
        FOR criteria FROM 100 TO 105 INCREMENT 1
    ]= 0 );

Row returning behaviour

SELECT
    employee_id
FROM
    employees
MODEL RETURN UPDATED ROWS
    DIMENSION BY ( employee_id )
    MEASURES ( 0 x )
    RULES ( x[
        FOR employee_id FROM 1 TO 10 INCREMENT 1
    ]= 0 );

Last updated