Published on

Understanding Aggregate/Grouping Functions in SQL

Authors

Introduction

Aggregate functions are SQL functions that operate on a set of values and return a single, summarized result. They are commonly used to perform calculations on a group of records rather than individual rows. Aggregate functions are particularly useful for tasks such as finding the minimum, maximum, average, sum, or count of a set of values within a database table.

Aggregate functions groups records and then find the value i.e it finds the maximum, minimum, average, sum or counts the records. MIN(), MAX(), and COUNT() work on any type of values - Numeric, Date, or String. AVG(), and SUM() work on only Numeric values (INT and DECIMAL). Consider the table shoppe for all the examples of grouping functions

shoppe.sql

+------------+--------------+--------+-----+------------+---------+
| product_id | product_name | price  | qty |    city    | company |
+------------+--------------+--------+-----+------------+---------+
|     1      |    Laptop    | 500.00 |  10 |  New York  |   HP    |
|     2      |  Smartphone  | 300.00 |  20 | Los Angeles|  Apple  |
|     3      |      TV      | 400.00 |  5  |  Chicago   | Samsung |
|     4      |    Tablet    | 200.00 |  50 |  New York  |  Apple  |
|     5      |    Camera    | 100.00 |  56 |  Chicago   |  Canon  |
|     6      | Refrigerator | 600.00 |  30 | Los Angeles|   LG    |
|     7      |  Microwave   | 200.00 |  25 |  Chicago   |   LG    |
+------------+--------------+--------+-----+------------+---------+

Now, let's run SQL queries with explanations for each aggregate function:

Aggregate Functions

COUNT(*) - Counts the number of records:

shoppe.sql

SELECT COUNT(*) FROM shoppe;
-- OUTPUT: 7

This query counts all the records in the "shoppe" table, which is 7 in this case.

COUNT(fieldname) - Counts the values in the field excluding NULL:

shoppe.sql

SELECT COUNT(price) FROM shoppe;
-- OUTPUT: 6

Here, we count the number of non-null values in the "price" column, which is 6.

COUNT(DISTINCT fieldname) - Counts the distinct or different values in a field:

shoppe.sql

SELECT COUNT(DISTINCT city) FROM shoppe;
-- OUTPUT: 3

This query counts the number of distinct (different) values in the "city" column, which is 3 (New York, Los Angeles, and Chicago).

SUM(fieldname) - Find the sum of the values in a field:

shoppe.sql

SELECT SUM(price) FROM shoppe;
-- OUTPUT: 2100.00

The SUM() function calculates the total sum of values in the "price" column, which is $2100.00 in this case.

AVG(fieldname) - Finds the average of numbers in a field excluding NULL:

shoppe.sql

SELECT AVG(qty) FROM shoppe;
-- OUTPUT: 36.5714

AVG() computes the average of the "qty" values, excluding NULL values, resulting in an average of approximately 36.5714.

COUNT(*) - Counts the number of records:

shoppe.sql

SELECT COUNT(*) FROM shoppe;
-- OUTPUT: 7

MAX(fieldname) - Finds the highest of all numbers or date in a field:

shoppe.sql

SELECT MAX(price) FROM shoppe;
-- OUTPUT: 600.00

The MAX() function identifies the highest value in the "price" column, which is $600.00.

MIN(fieldname) - Finds and displays the lowest of all numbers/date in a field:

shoppe.sql

SELECT MIN(qty) FROM shoppe;
-- OUTPUT: 5

MIN() returns the lowest value in the "qty" column, which is 5.

These SQL queries and their respective outputs demonstrate how to use various aggregate functions to perform calculations and retrieve summarized information from a database table.

GROUP BY clause in aggregate function

Group By clause is only used with the aggregate function, it first groups the record according to a field values and then finds and displays the maximum, minimum, average, sum or counts the records.

Example 1: Grouping by City and Calculating Total Sales for Each City

shoppe.sql

SELECT city, SUM(price * qty) AS total_sales
FROM shoppe
GROUP BY city;


-- Output

|   city    | total_sales |
|-----------|-------------|
| New York  |  8000.00    |
| Chicago   |  15600.00   |
| Los Angeles| 18000.00   |

Explanation:

  • In this example, we group the data by the "city" column.
  • The SUM() function calculates the total sales for each city by multiplying the "price" and "qty" columns for each row within the city group.
  • The output table displays each unique city along with the corresponding total sales.

Example 2: Grouping by Company and Finding the Average Price of Products for Each Company

shoppe.sql

SELECT company, AVG(price) AS avg_price
FROM shoppe
GROUP BY company;

-- Output

|  company  | avg_price |
|-----------|-----------|
| Apple     |  250.00   |
| Canon     |  100.00   |
| HP        |  500.00   |
| LG        |  400.00   |
| Samsung   |  400.00   |

Explanation:

  • Here, we group the data by the "company" column.
  • The AVG() function calculates the average price of products for each company.
  • The output table displays each unique company and the average price of their products.

Example 3: Example: Grouping by City and Counting the Number of Occurrences

shoppe.sql

SELECT city, COUNT(*) AS city_count
FROM shoppe
GROUP BY city;

-- Output

|      city     | city_count |
|--------------|-------------|
|   New York    |     2      |
|  Los Angeles  |     2      |
|    Chicago    |     3      |

Explanation:

  • In this SQL query, we are performing a data analysis task on the "shoppe" table.
  • We start by selecting two columns from the table:
    • city: This is the column that we want to group our data by, and it represents the city where each product is sold.
    • COUNT() AS city_count: We use the COUNT() aggregate function to count the number of rows (records) for each unique value in the "city" column. We alias the result of this count as "city_count" for clarity.

GROUP BY Clause:

The GROUP BY clause is used to group the data based on the "city" column. This means that all rows with the same city value will be grouped together.

Output Explanation:

  • The output table displays two columns:
    • city: This column shows each unique city from the "shoppe" table.
    • city_count: This column displays the count of how many times each city appears in the table.

Here's what the output table tells us:

  • There are three unique cities in the "shoppe" table: New York, Los Angeles, and Chicago.
  • In New York, there are two records/products.
  • In Los Angeles, there are two records/products.
  • In Chicago, there are three records/products.

This query and output are helpful for summarizing data and understanding the distribution of products across different cities in the dataset.

HAVING clause in aggregate function

Having clause is only used with the aggregate function, it is used to specify condition on grouping functions

shoppe.sql

SELECT city, SUM(price * qty) AS total_sales
FROM shoppe
GROUP BY city
HAVING SUM(price * qty) > 1000;

-- Output

|     city     | total_sales |
|--------------|-------------|
|   New York   |    8000.00  |
|  Los Angeles |   18000.00  |
|    Chicago   |   15600.00  |

Explanation:

In this example, we want to find cities with a total sales value (price * qty) greater than $1,000. Here's how the query works:

  • We start by selecting two columns: city and the sum of price * qty as total_sales.
  • We use the GROUP BY clause to group the data by the "city" column, which means that all rows with the same city value are grouped together.
  • The SUM(price * qty) calculates the total sales for each city.
  • The HAVING clause is used to filter the grouped data based on a condition. In this case, we filter cities where the total sales (SUM(price * qty)) is greater than $1,000.

Output Explanation:

The output table displays two columns:

  • city: This column shows each unique city from the "shoppe" table.
  • total_sales: This column displays the total sales (price * qty) for each city.

Here's what the output tells us:

  • Only cities with a total sales value greater than $1,000 are displayed in the result.
  • New York has total sales of $8,000.00.
  • Los Angeles has total sales of $18,000.00.
  • Chicago has total sales of $15,600.00.

Example 2: Using HAVING Clause to Find Companies with the Maximum Count of Products Sold

shoppe.sql

SELECT company, COUNT(*) AS product_count
FROM shoppe
GROUP BY company
HAVING COUNT(*) = (
    SELECT MAX(product_count)
    FROM (
        SELECT COUNT(*) AS product_count
        FROM shoppe
        GROUP BY company
    ) AS counts
);

-- Output

|  company  | product_count |
|-----------|---------------|
|   LG      |      2        |


Explanation:

  • In this example, we want to find companies that have the maximum count of products sold among all companies.
  • We start by selecting two columns: company and the count of products (COUNT(*)) sold by each company.
  • The GROUP BY clause groups the data by the "company" column.
  • The HAVING clause filters the results to include only companies where the count of products sold matches the maximum count.

Subquery Explanation:

  • To find the maximum count of products sold, we use a subquery:
  • The subquery first calculates the count of products sold for each company and aliases it as "product_count."
  • The outer query then compares the count of each company with the maximum count found in the subquery.

Output Explanation:

  • The output table displays two columns:
    • company: This column shows each unique company from the "shoppe" table.
    • product_count: This column displays the count of products sold by each company.

Here's what the output tells us:

  • The company "LG" has the maximum count of products sold, with a count of 2.

Practical Applications

Aggregate functions are widely used in SQL queries for various real-world applications, including financial analysis, inventory management, and sales reporting. They provide valuable insights into datasets, enabling data professionals to make informed decisions.

Conclusion

In conclusion, Aggregate Functions are indispensable tools in SQL for summarizing and analyzing data. Whether you need to find the highest sales figure, count the number of customers, or compute the average rating, these functions empower you to extract meaningful information from your databases efficiently.

FAQs

  1. What is the primary purpose of Aggregate Functions in SQL?

Aggregate Functions in SQL are primarily used to perform calculations on groups of data records, providing summarized results.

  1. Can I use AVG() on non-numeric data types?

No, AVG() is designed to work exclusively with numeric values, such as integers and decimals.

  1. When should I use COUNT() over other Aggregate Functions?

COUNT() is useful when you need to count the number of records in a dataset, regardless of data type.

  1. What are some practical applications of Aggregate Functions?

Aggregate Functions find applications in areas like financial analysis, inventory management, and sales reporting.

  1. Where can I learn more about SQL and Aggregate Functions?

To explore SQL and its functions further, consider accessing educational resources online. You can access one such resource here.

Incorporating Aggregate Functions into your SQL repertoire can significantly enhance your ability to extract meaningful insights from your data. These functions are versatile, easy to use, and invaluable for data-driven decision-making. So, start exploring the world of Aggregate Functions and unlock the potential of your data.