MariaDB Crash Course - Ben Forta [37]
• If you have nested groups in your GROUP BY clause, data is summarized at the last specified group. In other words, all the columns specified are evaluated together when grouping is established (so you won’t get data back for each individual column level).
• Every column listed in GROUP BY must be a retrieved column or a valid expression (but not an aggregate function). If an expression is used in the SELECT, that same expression must be specified in GROUP BY. Aliases cannot be used.
• Aside from the aggregate calculations statements, every column in your SELECT statement should be present in the GROUP BY clause.
• If the grouping column contains a row with a NULL value, NULL will be returned as a group. If there are multiple rows with NULL values, they’ll all be grouped together.
• The GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.
* * *
Tip: Using ROLLUP
To obtain values at each group and at a summary level (for each group), use the WITH ROLLUP keyword, as seen here:
SELECT vend_id, COUNT(*) AS num_prods
FROM products
GROUP BY vend_id WITH ROLLUP;
* * *
Filtering Groups
In addition to being able to group data using GROUP BY, MariaDB also allows you to filter which groups to include and which to exclude. For example, you might want a list of all customers who have made at least two orders. To obtain this data you must filter based on the complete group, not on individual rows.
You’ve already seen the WHERE clause in action (introduced back in Chapter 6, “Filtering Data.”) But WHERE does not work here because WHERE filters specific rows, not groups. As a matter of fact, WHERE has no idea what a group is.
So what do you use instead of WHERE? MariaDB provides yet another clause for this purpose: the HAVING clause. HAVING is similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can also be used with HAVING. The only difference is that WHERE filters rows and HAVING filters groups.
* * *
Tip: HAVING Supports All of WHERE’s Operators
In Chapter 6 and Chapter 7, “Advanced Data Filtering,” you learned about WHERE clause conditions (including wildcard conditions and clauses with multiple operators). All the techniques and options you learned about WHERE can be applied to HAVING. The syntax is identical; just the keyword is different.
* * *
So how do you filter rows? Look at the following example:
Input
SELECT cust_id, COUNT(*) AS orders
FROM orders
GROUP BY cust_id
HAVING COUNT(*) >= 2;
Output
+---------+--------+
| cust_id | orders |
+---------+--------+
| 10001 | 2 |
+---------+--------+
Analysis
The first three lines of this SELECT statement are similar to the statements seen previously. The final line adds a HAVING clause that filters on those groups with a COUNT(*) >= 2—two or more orders.
As you can see, a WHERE clause does not work here because the filtering is based on the group aggregate value, not on the values of specific rows.
* * *
Note: The Difference Between HAVING and WHERE
Here’s another way to look at it: WHERE filters before data is grouped, and HAVING filters after data is grouped. This is an important distinction; rows that are eliminated by a WHERE clause are not included in the group. This could change the calculated values, which in turn could affect which groups are filtered based on the use of those values in the HAVING clause.
* * *
So is there ever a need to use both WHERE and HAVING clauses in one statement? Actually, yes, there is. Suppose you want to further filter the previous statement so it returns any customers who placed two or more orders in the past 12 months. To do that, you can add a WHERE clause that filters out just the orders placed in the past 12 months. You then add a HAVING clause to filter just the groups with two or more rows in them.
To better demonstrate this, look at the following example that lists all vendors who have 2 or more products priced at 10 or more:
Input
SELECT vend_id, COUNT(*) AS num_prods