Aggregate functions v16

Aggregate functions compute a single result value from a set of input values.

Built-in aggregate functions

The built-in aggregate functions are listed in the following tables.

Function	Argument type	Return type	Description
`AVG (expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`NUMBER` for any integer type, `DOUBLE PRECISION` for a floating-point argument, otherwise the same as the argument data type	The average (arithmetic mean) of all input values
`COUNT(*)`		`BIGINT`	Number of input rows
`COUNT (expression)`	Any	`BIGINT`	Number of input rows for which the value of expression is not null
`MAX (expression)`	Any numeric, string, date/time, or bytea type	Same as argument type	Maximum value of expression across all input values
`MIN (expression)`	Any numeric, string, date/time, or bytea type	Same as argument type	Minimum value of expression across all input values
`SUM (expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`BIGINT` for `SMALLINT` or `INTEGER` arguments, `NUMBER` for `BIGINT` arguments, `DOUBLE PRECISION` for floating-point arguments, otherwise the same as the argument data type	Sum of expression across all input values

Except for COUNT, these functions return a null value when no rows are selected. In particular, SUM of no rows returns null, not zero. You can use the COALESCE function to substitute zero for null when necessary.

Aggregate functions for statistical analysis

The following table shows the aggregate functions typically used in statistical analysis. Where the description mentions N, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N is zero.

Function	Argument type	Return type	Description
`CORR(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Correlation coefficient
`COVAR_POP(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Population covariance
`COVAR_SAMP(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Sample covariance
`REGR_AVGX(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Average of the independent variable (sum`(X) / N`)
`REGR_AVGY(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Average of the dependent variable (sum`(Y) / N`)
`REGR_COUNT(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Number of input rows in which both expressions are nonnull
`REGR_INTERCEPT(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	y-intercept of the least-squares-fit linear equation determined by the `(X, Y)` pairs
`REGR_R2(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Square of the correlation coefficient
`REGR_SLOPE(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Slope of the least-squares-fit linear equation determined by the `(X, Y)` pairs
`REGR_SXX(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Sum (X²) – sum (X) ² / N (“sum of squares” of the independent variable)
`REGR_SXY(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Sum (XY) – sum (X) sum (Y) / N (“sum of products” of independent times dependent variable)
`REGR_SYY(Y, X)`	`DOUBLE PRECISION`	`DOUBLE PRECISION`	Sum (Y²) – sum (Y) ² / N (“sum of squares” of the dependent variable)
`STDDEV(expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`DOUBLE PRECISION` for floating-point arguments, otherwise `NUMBER`	Historic alias for `STDDEV_SAMP`
`STDDEV_POP(expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`DOUBLE PRECISION` for floating-point arguments, otherwise `NUMBER`	Population standard deviation of the input values
`STDDEV_SAMP(expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`DOUBLE PRECISION` for floating-point arguments, otherwise `NUMBER`	Sample standard deviation of the input values
`VARIANCE(expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`DOUBLE PRECISION` for floating-point arguments, otherwise `NUMBER`	Historical alias for `VAR_SAMP`
`VAR_POP(expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`DOUBLE PRECISION` for floating-point arguments, otherwise `NUMBER`	Population variance of the input values (square of the population standard deviation)
`VAR_SAMP(expression)`	`INTEGER, REAL, DOUBLE PRECISION, NUMBER`	`DOUBLE PRECISION` for floating-point arguments, otherwise `NUMBER`	Sample variance of the input values (square of the sample standard deviation)

LISTAGG

LISTAGG is an aggregate function that concatenates data from multiple rows into a single row in an ordered manner. You can optionally include a custom delimiter for your data.

The LISTAGG function mandates the use of an ORDER BY clause under a WITHIN GROUP clause to concatenate values of the measure column and then generate the ordered aggregated data.

Objective

You can use LISTAGG without any grouping. In this case, the LISTAGG function operates on all rows in a table and returns a single row.
You can use LISTAGG with the GROUP BY clause. In this case, the LISTAGG function operates on each group and returns an aggregated output for each group.
You can use LISTAGG with the OVER clause. In this case, the LISTAGG function partitions a query result set into groups based on the expression in the query_partition_by_clause and then aggregates data in each group.

Synopsis

LISTAGG( <measure_expr> [, <delimiter> ]) WITHIN GROUP( <order_by_clause> )
[ OVER <query_partition_by_clause> ]

Parameters

measure_expr

measure_expr (mandatory) specifies the column or expression that assigns a value to aggregate. NULL values are ignored.

delimiter

delimiter (optional) specifies a string that separates the concatenated values in the result row. The delimiter can be a NULL value, string, character literal, column name, or constant expression. If ignored, the LISTAGG function uses a NULL value by default.

order_by_clause

order_by_clause (mandatory) determines the sort order in which the concatenated values are returned.

query_partition_by_clause

query_partition_by_clause (optional) allows the LISTAGG function to be used as an analytic function and sets the range of records for each group in the OVER clause.

Return type

The LISTAGG function returns a string value.

Examples

This example concatenates the values in the emp table and lists all the employees separated by a delimiter comma. First, create a table named emp. Then insert records into the emp table.

edb=# CREATE TABLE emp
edb-#        (EMPNO NUMBER(4) NOT NULL,
edb(#         ENAME VARCHAR2(10),
edb(#         JOB VARCHAR2(9),
edb(#         MGR NUMBER(4),
edb(#         HIREDATE DATE,
edb(#         SAL NUMBER(7, 2),
edb(#         COMM NUMBER(7, 2),
edb(#         DEPTNO NUMBER(2));
CREATE TABLE

edb=# INSERT INTO emp VALUES
edb-#         (7499, 'ALLEN',  'SALESMAN',  7698,
edb(#         TO_DATE('20-FEB-1981', 'DD-MON-YYYY'), 1600,  300, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7521, 'WARD',   'SALESMAN',  7698,
edb(#         TO_DATE('22-FEB-1981', 'DD-MON-YYYY'), 1250,  500, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7566, 'JONES',  'MANAGER',   7839,
edb(#         TO_DATE('2-APR-1981', 'DD-MON-YYYY'),  2975, NULL, 20);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7654, 'MARTIN', 'SALESMAN',  7698,
edb(#         TO_DATE('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7698, 'BLAKE',  'MANAGER',   7839,
edb(#         TO_DATE('1-MAY-1981', 'DD-MON-YYYY'),  2850, NULL, 30);
INSERT 0 1

Output

edb=# SELECT LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY ENAME) FROM emp;
               listagg
-------------------------------------
 ALLEN,BLAKE,JONES,MARTIN,WARD
(1 row)

This example uses a PARTITION BY clause with LISTAGG in the emp table. It generates output based on a partition by deptno that applies to each partition and not on the entire table.

edb=# SELECT DISTINCT DEPTNO, LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY
ENAME) OVER(PARTITION BY DEPTNO) FROM emp;

Output

deptno |         listagg
--------+-------------------------
     30 | ALLEN,BLAKE,MARTIN,WARD
     20 | JONES
(2 rows)

This example includes the GROUP BY clause.

edb=# SELECT DEPTNO, LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY ENAME) FROM
emp GROUP BY DEPTNO;

Output

deptno |         listagg
--------+-------------------------
     20 | JONES
     30 | ALLEN,BLAKE,MARTIN,WARD
(2 rows)

MEDIAN

The MEDIAN function calculates the middle value of an expression from a given range of values. NULL values are ignored. The MEDIAN function returns an error if a query doesn't reference the user-defined table.

Objective

You can use MEDIAN without any grouping. In this case, the MEDIAN function operates on all rows in a table and returns a single row.
You can use MEDIAN with the OVER clause. In this case, the MEDIAN function partitions a query result set into groups based on the expression specified in the PARTITION BY clause. It then aggregates data in each group.

Synopsis

MEDIAN( <median_expression> ) [ OVER ( [ PARTITION BY... ] ) ]

Parameters

median_expression

median_expression (mandatory) is a target column or expression that the MEDIAN function operates on and returns a median value. It can be a numeric, datetime, or interval data type.

PARTITION BY

PARTITION BY clause (optional) allows you to use MEDIAN as an analytic function and sets the range of records for each group in the OVER clause.

Return types

The return type is determined by the input data type of expression. The following table shows the return type for each input type.

Input type	Return type
`BIGINT`	`NUMERIC`
`FLOAT, DOUBLE PRECISION`	`DOUBLE PRECISION`
`INTEGER`	`NUMERIC`
`INTERVAL`	`INTERVAL`
`NUMERIC`	`NUMERIC`
`REAL`	`REAL`
`SMALLINT`	`NUMERIC`
`TIMESTAMP`	`TIMESTAMP`
`TIMESTAMPTZ`	`TIMESTAMPTZ`

Examples

In this example, a query returns the median salary for each department in the emp table:

edb=# SELECT * FROM emp;

Output

empno| ename |   job   | mgr  |      hiredate     |   sal  | comm   | deptno
 -----+-------+---------+------+-------------------+--------+--------+-------
 7369 | SMITH | CLERK   | 7902 | 17-DEC-80 00:00:00|  800.00|        |   20
 7499 | ALLEN | SALESMAN| 7698 | 20-FEB-81 00:00:00| 1600.00| 300.0  |   30
 7521 | WARD  | SALESMAN| 7698 | 22-FEB-81 00:00:00| 1250.00| 500.00 |   30
 7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00| 2975.00|        |   20
 7654 | MARTIN| SALESMAN| 7698 | 28-SEP-81 00:00:00| 1250.00| 1400.00|   30
(5 rows)

 edb=# SELECT MEDIAN (SAL) FROM emp;

Output

median
 --------
    1250
 (1 row)

This example uses the PARTITION BY clause with MEDIAN in the emp table and returns the median salary based on a partition by deptno:

edb=# SELECT EMPNO, ENAME, DEPTNO, MEDIAN (SAL) OVER (PARTITION BY DEPTNO)
FROM emp;

Output

empno | ename  | deptno | median
-------+--------+--------+--------
  7369 | SMITH  |     20 | 1887.5
  7566 | JONES  |     20 | 1887.5
  7499 | ALLEN  |     30 |   1250
  7521 | WARD   |     30 |   1250
  7654 | MARTIN |     30 |   1250
(5 rows)

You can compare the MEDIAN function with PERCENTILE_CONT. In this example, MEDIAN generates the same result as PERCENTILE_CONT:

edb=# SELECT MEDIAN (SAL), PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY SAL)
FROM emp;

Output

median | percentile_cont
--------+-----------------
   1250 |            1250
(1 row)

STATS_MODE

The STATS_MODE function takes a set of values as an argument and returns the value that occurs with the highest frequency. If multiple values appear with the same frequency, the STATS_MODE function arbitrarily chooses the first value and returns only that one value.

Objective

You can use STATS_MODE without any grouping. In this case, the STATS_MODE function operates on all the rows in a table and returns a single value.
You can use STATS_MODE as an ordered-set aggregate function using the WITHIN GROUP clause. In this case, the STATS_MODE function operates on the ordered data set.
You can use STATS_MODE with the GROUP BY clause. In this case, the STATS_MODE function operates on each group and returns the most frequent and aggregated output for each group.

Synopsis

STATS_MODE( <expr> )

STATS_MODE() WITHIN GROUP ( ORDER BY sort_expression )

Parameters

expr

An expression or value to assign to the column.

Return type

The STATS_MODE function returns a value that appears frequently. However, if all the values of a column are NULL, STATS_MODE returns NULL.

Examples

This example returns the mode of salary in the emp table:

edb=# SELECT * FROM emp;

Output

empno| ename |   job   | mgr  |      hiredate     |   sal  | comm   | deptno
------+-------+---------+------+-------------------+--------+--------+-------
 7369 | SMITH | CLERK   | 7902 | 17-DEC-80 00:00:00|  800.00|        |   20
 7499 | ALLEN | SALESMAN| 7698 | 20-FEB-81 00:00:00| 1600.00| 300.0  |   30
 7521 | WARD  | SALESMAN| 7698 | 22-FEB-81 00:00:00| 1250.00| 500.00 |   30
 7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00| 2975.00|        |   20
 7654 | MARTIN| SALESMAN| 7698 | 28-SEP-81 00:00:00| 1250.00| 1400.00|   30
(5 rows)

edb=# SELECT STATS_MODE(SAL) FROM emp;

Output

stats_mode
------------
   1250.00
(1 row)

This example uses GROUP BY and ORDER BY clauses with STATS_MODE in the emp table and returns the salary based on a partition by deptno:

edb=# SELECT STATS_MODE(SAL) FROM emp GROUP BY DEPTNO ORDER BY DEPTNO;

Output

stats_mode
------------
   800.00
   1250.00
(2 rows)

This example uses the WITHIN GROUP clause with STATS_MODE to perform aggregation on the ordered data set.

SELECT STATS_MODE() WITHIN GROUP(ORDER BY SAL) FROM emp;

Output

stats_mode
------------
   1250.00
(1 row)

BIT_AND_AGG

BIT_AND_AGG is a bitwise aggregation function that performs a bitwise AND operation and returns a value based on the data type of input argument.

Objective

You can use the BIT_AND_AGG function with the GROUP BY clause. In this case, the BIT_AND_AGG function operates on a group and returns the result of a bitwise AND operation.
You can use the DISTINCT or UNIQUE keywords with the BIT_AND_AGG function to ensure that unique values in the expr are used for computation.

Synopsis

BIT_AND_AGG ( [DISTINCT | ALL | UNIQUE]  <expr> )

Parameters

expr

An expression or value to assign to the column.

Return Type

The BIT_AND_AGG function returns the same value as the data type of the input argument. However, if all the values of a column are NULL, the BIT_AND_AGG returns NULL.

Examples

This example applies the BIT_AND_AGG function to the sal column in an emp table and groups the result by deptno:

edb=# SELECT deptno,BIT_AND_AGG(DISTINCT sal) FROM emp GROUP BY deptno;

Output

deptno | bit_and_agg 
--------+-------------
     20 |     2975.00
     30 |           0
(2 rows)

BIT_OR_AGG

BIT_OR_AGG is a bitwise aggregation function that performs a bitwise OR operation and returns a value based on the data type of input argument.

Objective

You can use the BIT_OR_AGG function with the GROUP BY clause. In this case, the BIT_OR_AGG function operates on a group and returns the result of a bitwise OR operation.
You can use the DISTINCT or UNIQUE keywords with the BIT_OR_AGG function to ensure that unique values in the expr are used for computation.

Synopsis

BIT_OR_AGG ( [DISTINCT | ALL | UNIQUE]  <expr> )

Parameters

expr

An expression or value to assign to the column.

Return type

The BIT_OR_AGG function returns the same value as the data type of the input argument. However, if all the values of a column are NULL, BIT_OR_AGG returns NULL.

Examples

This example applies BIT_OR_AGG to the sal column in an emp table and groups the result by deptno:

edb=# SELECT deptno,BIT_OR_AGG(DISTINCT sal) FROM emp GROUP BY deptno;

Output

deptno | bit_or_agg 
--------+------------
     20 |    2975.00
     30 |       4066
(2 rows)

← Prev

Conditional expressions

Subquery expressions

Aggregate functions v16

Built-in aggregate functions

Aggregate functions for statistical analysis

LISTAGG

Objective

Synopsis

Parameters

Return type

Examples

MEDIAN

Objective

Synopsis

Parameters

Return types

Examples

STATS_MODE

Objective

Synopsis

Parameters

Return type

Examples

BIT_AND_AGG

Objective

Synopsis

Parameters

Return Type

Examples

BIT_OR_AGG

Objective

Synopsis

Parameters

Return type

Examples

← Prev

Next →