Aggregating data v16

Like most other relational database products, EDB Postgres Advanced Server supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, some aggregates compute the COUNT, SUM, AVG (average), MAX (maximum), and MIN (minimum) over a set of rows.

As an example, you can find the highest and lowest salaries with the following query:

SELECT MAX(sal) highest_salary, MIN(sal) lowest_salary FROM emp;
Output
highest_salary | lowest_salary
----------------+---------------
       5000.00  |        800.00
(1 row)

If you want to find the employee with the largest salary, you might be tempted to try:

SELECT ename FROM emp WHERE sal = MAX(sal);
Output
ERROR:  aggregates not allowed in WHERE clause

This approach doesn't work because you can't use the aggregate function MAX in the WHERE clause. This restriction exists because the WHERE clause determines the rows that go into the aggregation stage. Hence, it has to be evaluated before aggregate functions are computed. However, you can restart the query to accomplish the intended result by using a subquery. The subquery is an independent computation that obtains its own result separately from the outer query.

SELECT ename FROM emp WHERE sal = (SELECT MAX(sal) FROM emp);
Output
ename
-------
 KING
(1 row)

Aggregates are also useful in combination with the GROUP BY clause. For example, the following query gets the highest salary in each department.

SELECT deptno, MAX(sal) FROM emp GROUP BY deptno;
Output
deptno |   max
--------+---------
     10 | 5000.00
     20 | 3000.00
     30 | 2850.00
(3 rows)

This query produces one output row per department. Each aggregate result is computed over the rows matching that department. These grouped rows can be filtered using the HAVING clause.

SELECT deptno, MAX(sal) FROM emp GROUP BY deptno HAVING AVG(sal) > 2000;
Output
deptno |   max
--------+---------
     10 | 5000.00
     20 | 3000.00
(2 rows)

This query gives the same results for only those departments that have an average salary greater than 2000.

Finally, the following query takes into account only the highest paid employees who are analysts in each department.

SELECT deptno, MAX(sal) FROM emp WHERE job = 'ANALYST' GROUP BY deptno HAVING AVG(sal) > 2000;
Output
deptno |   max
--------+---------
     20 | 3000.00
(1 row)

There's a subtle distinction between the WHERE and HAVING clauses. The WHERE clause filters out rows before grouping occurs and aggregate functions are applied. The HAVING clause applies filters on the results after rows are grouped and aggregate functions are computed for each group.

So, in the previous example, only employees who are analysts are considered. From this subset, the employees are grouped by department. Only those groups where the average salary of analysts in the group is greater than 2000 are in the final result. This is true only of the group for department 20, and the maximum analyst salary in department 20 is 3000.00.