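In this article, the author briefly introduces how to compute a running total in SQL, the syntax with only an ORDER BY, the syntax with both ORDER BY and PARTITION BY, and the syntax where PARTITION BY takes multiple arguments.
- In Tableau table calculations, we say that every table calculation must reference an aggregate, that is, the form WINDOW_SUM(AGG). SQL, however, does not seem to have this requirement: you can write SUM() with PARTITION BY and ORDER BY directly. In that case there is no GROUP BY either, so it can apparently be understood as a row-level query and aggregation.
- This is the odd part. I still need to confirm whether the logic I use to explain Tableau can be carried over to SQL, so that SQL window calculations can be understood in a visual, hierarchical way. In particular: "a window calculation is a calculation in which dimensions participate".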
Dorota is an IT engineer and works as a Data Science Writer for Vertabelo. She has experience as a Java programmer, webmaster, teacher, lecturer, IT specialist, and coordinator of IT systems. In her free time, she loves working in the garden, taking photos of nature, especially macro photos of insects, and visiting beautiful locations in Poland.
The SQL running total is a very common pattern, used frequently in finance and in trend analysis. In this article, you’ll learn what a running total is and how to write a SQL query to compute it.
So, without further ado, let’s get started on the first part of the question.
What’s a SQL Running Total?
In SQL, a running total is the cumulative sum of the current value and all previous values in a column. Consider an example that presents the daily registration of users for an online shop.
The first column shows the date. The second column shows the number of users who registered on that date. The third column, total_users, holds the total number of users registered up to and including that day.
For example, on the first day (2020-03-05), 32 users registered and the total number of registered users was 32. The next day (2020-03-06), 15 users registered; the total_users value became 47 (32+15). On the third day (2020-03-07), six users registered and the total_users value was 53 (47+6). In other words, total_users is a running value that changes from day to day. It is the total number of users registered so far on each day.
The next example uses the total_revenue column to deal with company revenue in a similar way. Look at the table below:
|date|revenue|total_revenue|
|2020-04-02|125 000|125 000|
|2020-04-03|125 000|250 000|
|2020-04-04|20 500|270 500|
|2020-04-05|101 000|371 500|
For each day, the total_revenue column shows the amount of revenue generated up to and including that day. On 2020-04-04, the company had reached a total revenue of $270,500 because that is the sum of all revenues from 2020-04-02 to 2020-04-04.
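The arithmetic behind the total_revenue column can be sketched outside of SQL. The snippet below is only an illustration in Python (not the article's method); it feeds the daily revenue figures from the table to `itertools.accumulate`, and the variable names are chosen for this example.

```python
from itertools import accumulate

# Daily revenue figures taken from the revenue table in the text.
dates = ["2020-04-02", "2020-04-03", "2020-04-04", "2020-04-05"]
revenue = [125_000, 125_000, 20_500, 101_000]

# accumulate() yields, for each position, the sum of the current value
# and all values before it: exactly what a running total is.
total_revenue = list(accumulate(revenue))

for day, rev, total in zip(dates, revenue, total_revenue):
    print(day, rev, total)
```

Each printed row pairs a day's revenue with the sum of everything earned up to that day, which is what the SQL window function produces later in the article.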
Relational databases (like SQL Server, Oracle, PostgreSQL, and MySQL) and even non-relational engines like Hive and Presto provide window functions that allow us to calculate a running total. Next, we’ll talk about the SQL query that builds such a sum and learn more about window functions.
How to Compute a Cumulative Sum in SQL
If you would like to compute a running total in SQL, you need to be familiar with the window functions provided by your database. Window functions operate on a set of rows and return an aggregate value for each row in the result set. If you are interested in learning more about window functions, try the Window Functions course on the LearnSQL.com platform.
The SQL window function that computes a cumulative sum across rows has the general form SUM(column) OVER ([PARTITION BY partition_list] [ORDER BY order_list]).
It’s mandatory to use the OVER clause in a window function, but the arguments in this clause are optional. We will discuss them in the next paragraphs of this article.
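To see what the optional arguments of OVER change, here is a minimal sketch that runs the SQL through Python's built-in `sqlite3` module. Several things are assumptions made for illustration: the use of SQLite (version 3.25 or later is needed for window functions), the table name `daily_revenue`, and its column names. The values are the revenue figures quoted earlier.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [("2020-04-02", 125000), ("2020-04-03", 125000),
     ("2020-04-04", 20500), ("2020-04-05", 101000)],
)

# General shape: SUM(column) OVER ([PARTITION BY ...] [ORDER BY ...])
syntax_rows = conn.execute("""
    SELECT day,
           SUM(revenue) OVER ()             AS grand_total,   -- empty OVER: sum over all rows
           SUM(revenue) OVER (ORDER BY day) AS total_revenue  -- sum up to the current day
    FROM daily_revenue
    ORDER BY day
""").fetchall()

for row in syntax_rows:
    print(row)
```

With an empty OVER clause every row receives the same grand total; adding ORDER BY turns the same SUM() into a running total.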
In this example, we will calculate the running total of registered users for each day.
The query selects the registration date and the number of users registered on that date. It also computes, for each row, the sum of all users from the first given day (2020-03-05) up to the day in that row.
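This is the result set:
|registration_date|registered_users|total_users|
|2020-03-05|57|57|
|2020-03-06|27|84|
|2020-03-07|16|100|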
To calculate the running total, we use the SUM() aggregate function and put the column registered_users as the argument; we want to obtain the cumulative sum of users from this column.
The next step is to use the OVER clause. In our example, this clause has one argument: ORDER BY registration_date. The rows of the result set are sorted according to this column (registration_date). For each value in the registration_date column, the total sum of the previous column values is computed (i.e. the sum of the number of users before the date in the current row) and the current value (i.e. users registered on the day of the current row) is added to it.
Notice that the total sum is shown in the new column, which we named total_users.
In the first step (the registration date 2020-03-05), 57 users registered, so the running total is also 57. In the next step, we add the number of users registered on the current date (2020-03-06), which is 27, to this total; this gives us a running total of 84. In the last row of the result set (for the last registration date, 2020-03-07), the running total is 100.
Thanks to SQL window functions, it is easy to find the cumulative total number of users during a given period of time. For example, during 2020-03-05 – 2020-03-06, the total number of registered users was 84.
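The walkthrough above can be reproduced with a small runnable sketch. It uses Python's built-in `sqlite3` module (an assumption for illustration; SQLite 3.25 or later is required for window functions), and the table name `registration` is hypothetical. The column names come from the text, and the value 16 for 2020-03-07 is inferred from the running totals 84 and 100.

```python
import sqlite3

# In-memory SQLite database; the table name "registration" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registration (registration_date TEXT, registered_users INTEGER)"
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    # 57 and 27 are quoted in the text; 16 is inferred as 100 - 84.
    [("2020-03-05", 57), ("2020-03-06", 27), ("2020-03-07", 16)],
)

# SUM() with ORDER BY inside OVER: each row gets the sum of its own value
# and the values of all rows with an earlier registration_date.
running = conn.execute("""
    SELECT registration_date,
           registered_users,
           SUM(registered_users) OVER (ORDER BY registration_date) AS total_users
    FROM registration
    ORDER BY registration_date
""").fetchall()

for row in running:
    print(row)
```

Note that there is no GROUP BY here: every input row is kept, and the window function only adds the total_users column next to it.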
In the second example, we’ll go into more detail about users and show them together with their countries. Look at the table below:
Notice that for each day we have the number of users for each country shown separately. In this example, we will compute a separate cumulative sum of registered users for each country.
The query calculates the running sum of users for each day, first for users from England and then for users from Poland.
Here’s the result set:
For each country, each registration day gets a running total. The PARTITION BY clause in the OVER clause has the column country as its argument. This partitions rows by country, allowing SQL to compute a running total for that country only (instead of both countries together). Thus, in England from 2020-03-05 to 2020-03-07, we have a total of 47 users. For the same period in Poland, the total of registered users was 53.
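A per-country running total of this kind might look like the sketch below, again run through Python's `sqlite3` module (SQLite 3.25 or later). The table name `registration` and the individual daily values are assumptions: they were chosen so that the country totals match the 47 and 53 quoted above and the daily sums match the first example (57, 27, 16).

```python
import sqlite3

# In-memory SQLite database; table name and daily values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registration ("
    "registration_date TEXT, country TEXT, registered_users INTEGER)"
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?, ?)",
    [
        ("2020-03-05", "England", 25), ("2020-03-05", "Poland", 32),
        ("2020-03-06", "England", 12), ("2020-03-06", "Poland", 15),
        ("2020-03-07", "England", 10), ("2020-03-07", "Poland", 6),
    ],
)

# PARTITION BY country restarts the running total for every country;
# ORDER BY registration_date accumulates within each partition.
per_country = conn.execute("""
    SELECT country,
           registration_date,
           registered_users,
           SUM(registered_users) OVER (
               PARTITION BY country
               ORDER BY registration_date
           ) AS total_users
    FROM registration
    ORDER BY country, registration_date
""").fetchall()

for row in per_country:
    print(row)
```

The last row of each country carries that country's total for the whole period, while the running total never mixes rows from different countries.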
In the last example, we’ll analyze the data in the competition table, which stores the columns game_id, gamer_id, game_level, competition_date, and score.
We need to check each gamer’s total cumulative score for each day in two different games. Look at the query below, which creates this running total:
In this result table, we can read that the gamer with ID=4 starts from a score of 4 and finishes with a total score of 11. The best was the gamer with ID=7, who finished with a total score of 12.
Once again, in the OVER clause we use PARTITION BY. This time, we pass a list of columns (game_id, gamer_id). This creates one partition for each combination of game and gamer: rows are first divided by game (game 1 and game 2) and then, within each game, by gamer_id.
In game 1, we have the gamers 4 and 5; in game 2, we have the gamers 6, 7, and 8. Within each group (a given gamer playing a given game), rows are sorted by competition_date and the score from each day is added to the running total. In each group, we can observe how each gamer’s score changes in a given game.
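Partitioning by two columns can be sketched as follows, using Python's `sqlite3` module (SQLite 3.25 or later). The table and column names come from the text, but every row of data is hypothetical; the scores were picked only so that gamer 4 goes from 4 to 11 and gamer 7 finishes with 12, as described above.

```python
import sqlite3

# In-memory SQLite database; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE competition (game_id INTEGER, gamer_id INTEGER, "
    "game_level INTEGER, competition_date TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO competition VALUES (?, ?, ?, ?, ?)",
    [
        (1, 4, 1, "2020-04-02", 4), (1, 4, 1, "2020-04-03", 5),
        (1, 4, 2, "2020-04-04", 2),
        (1, 5, 1, "2020-04-02", 3), (1, 5, 1, "2020-04-03", 2),
        (2, 6, 1, "2020-04-02", 2), (2, 6, 2, "2020-04-03", 3),
        (2, 7, 1, "2020-04-02", 4), (2, 7, 2, "2020-04-03", 3),
        (2, 7, 3, "2020-04-04", 5),
        (2, 8, 1, "2020-04-02", 1), (2, 8, 1, "2020-04-03", 6),
    ],
)

# One partition per (game_id, gamer_id) pair; inside each partition the
# scores are accumulated in competition_date order.
scores = conn.execute("""
    SELECT game_id, gamer_id, competition_date, score,
           SUM(score) OVER (
               PARTITION BY game_id, gamer_id
               ORDER BY competition_date
           ) AS total_score
    FROM competition
    ORDER BY game_id, gamer_id, competition_date
""").fetchall()

# The last row seen for each (game, gamer) pair holds the final total.
final_totals = {(game, gamer): total for game, gamer, _, _, total in scores}
print(final_totals)
```

Because the partition key is the pair (game_id, gamer_id), the running total restarts for every gamer in every game rather than once per game.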
How Will You Use SQL Running Totals?
Using a running total value in SQL reports can be very handy, especially for financial specialists. Therefore, it is worthwhile to know what a cumulative sum is and how to use SQL window functions to create one. This article presented a few selected use cases. For more about window functions, check out our article SQL Window Function Example With Explanations or the LearnSQL course Window Functions.