V0.2 Revise Part one
V0.3 Revise part one again (add SUMX)
This article will be revised for many times, and focus CALCULATE expression to help people to understand analytical method in Power BI, with a new method to instruct.
Practice makes perfect. I hop my English level will be better and better along with understanding of Power BI.
Several important points of this article：
- CALCULATE is an important function, and more likely an expression including several functions. From a certain point of view, the CALCULATE expression is equal to SUMIF function of Excel, both of which are evolution from SUM+IF combinition.
- There are two parts of CALCULATE expression: aggregate expression and filter expression. The former is mandatory and the latter is optional. All circumstances that influent aggregation are called FILTER CONTEXT, no matter internal in CALCULATE or external dimensions and filters in report.
- In a report, common context of several measures in a question should be placed in external filter context, such as report filter and dimension; while distinguishing context should be embed in measures calculation.
- The cardinal point of CALCULATION is relationship of internal filter context and external filter context.
- There are two types of context functionally: dimensional context and conditional context.
- There are two types of context in perspective of place: internal context and external context.
- CALCULATION is designed to optimized the relationship of aggregation and its filter context. Aggregation is a process from detail table to aggregate table, and filter function in CALCULATE expression is applied in detail table, not aggregate table (i.e. result table).
TIP: From now, I will use one table named “orders” from a superstore to introduce.
1. the origin of CALCULATE Expression_evolution of SUM+IF
In excel and SQL ,we can calculate total sales in any certain range of any fields by combining SUM aggregation and IF condition. It’s flexible and easy to use, although lack of elegance sometimes.
For example, Let’s to calculate “total sales of furniture” with fields [sales] and [category] with different methods, and introduce their difference. Step by step, newbies can get elegance and simplity of CALCULATE expression.
In excel, seniors can use SUMIF function to complete “total sales of Furniture” without help of any temporary field, just as picture below. Of course, the emphasis of this case is how to combine aggregation and filter condition in one calculation, so I don’t use pivot table because of its seperate design of filter region and aggregation value.
By the way, SUMIF is a combinition of SUM and logical function(if), that is the reason I usaully regard SUMIF as an expression, no a typical function. SUMIF simplify process of combinition calculation, but also brings a bit complex to understand its synatx.
Of course, product scientists
We can use the same way by using SUM aggregation and IF logical function in MySQL to interact with database. Just like this below.
SELECT SUM( IF [category] ="Furniture" THEN [sales] END) FROM orders
Although we cannot find a similar SUMIF function in SQL, there is an equivalent expression as below. It’s faster and more elegant using WHERE clause instead of IF condition .
SELECT SUM( [sales] ) FROM orders WHERE [category] ="Furniture"
In this syntax, SQL engineer calculate WHERE condition by each row of table, and return TRUE when value of category is equal to “Furniture”. Only receiving TURE value , SQL engineer will aggregate corresponding number of [sale] field, which will ignore sales values of FALSE rows.
By contrast, SQL enigneer calculate every row with SUM and IF combinition, not only rows of “Furniture” category, but all other category. This is the reason of weak performance of SUM+IF combition, compared with SUMIF expression.
It’s worth to stressing that SUMIF is a big improvement relative to simple combinition of SUM and IF function. A fatal weakness of “SUM + IF” combination is slow performance in dealing with big data. We can assume that Microsoft optimize performance in SUMIF and SUMIFS functions.
In Power BI, we can use SUM and logical funtion (if) to complete similar calcuation.
Considering IF clause is calculated and return itself or zero by every row, we should create a calculated column ahead of aggregation in report.
Because calculated columns are evaluated prior to aggregation in virtual report, they have low performance. In Power BI, product scientists have no reason to refuse a similar expression like SUMIF in excel and SUM+where clause combinition in SQL.
Why not invent a SUMIF function in POWER BI?
Indeed, it’s the simplest way to resolve this current case. But how to tackle other aggregation such as AVERAGE, MAX, MIN, COUNT and other functions? It’s not elegant to invent many functions, like AVGIF, MAXIF,MINIF ,etc.
Talented product scientists want to seperate aggregation functions and filter context on the one hand, and keep them as a whole on the other hand. In this way, analytics can control their relationship as they want.
SUMX and AVGX are created firstly to satisfy these demands , so we can complete as below:
Furniture Sales =SUMX ( FILTER ( 'Global-Superstore', 'Global-Superstore'[Category] = "Furniture" ), 'Global-Superstore'[Sales] )
SUMX can be regarded as a high improvement of SUMIF. We have unlimited possibility using FILTER clause instead of IF . In these iterators, filter context is regarded as a argument of aggregation, just like SUMIF functions. It’s a little bit complex.
Later, talented product scientists create a more simple expression to separate aggregation and filter context. CALCULATE is used to combine aggregation and filter conditions (one or many) , and we can use expression below to get same results.
Furniture Sales =CALCULATE (, SUM('Global-Superstore'[Sales]) , FILTER ('Global-Superstore', 'Global-Superstore'[Category] = "Furniture" ) )
The order is reversed and others are same. its syntax shows below:
CALCULATE(<aggregation expression>[, <filter1> , <filter2> , …])
In the syntax, the first argument is not an expression simply, but an aggregation expression exactly, which is mandatory and calculated after filter expression.
Relatively to combination of SUM and if , CALCULATE creates a temporary table, not a temporary column to optimize the process. The aggregation expression of the first argument begins from the temporary table, not the detail table instead.
From this point of view, we can see that CALCULATE is a special expression to optimize aggregation and filter condition to satisfy faster requirement of big data.
In Power BI, newbies may try to combine sum function and if contion in one expression at first. Unfornately, it’s not accessible.
The SUM function have only one argument , and only accepts a column reference, no mater you want to create a column or a measure.
Althought we can regard the logical expression as a virtual and temporary field, you must write it ahead and then reference it in other fields.
Another way is to use SUMX expression, which can have many arguments like this.
Furniture Sales =SUMX ( FILTER ( 'Global-Superstore', 'Global-Superstore'[Category] = "Furniture" ), 'Global-Superstore'[quantity] * 'Global-Superstore'[Price] )
There are valid result in either column or measure, and different logics in different places. In this article, I will focus measure and discuss their difference in later article.
2. the evaluation of aggregation and filter context
Since CALCULATE is designed to optimize aggregation and filter condition, the most import of understanding is how filter context influences the evolution of aggregation.
In DAX, CALCULATE and CALCULATETALBE are the only functions that can create new filter context. So, the cardinal point is the relationship of new filter context and existing filter context, which can be named internal and external filter also.
There are several principles in this area:
- internal filter context (new ) is prior to external ones. (surpass)
- internal filter can control external filter of the same table or columns, such as replace, neglect, or reset it.
2.1 CALCULATE with simple internal and external filter context
“In the year 2022, what is the total sales of each category?”
2.2 CALCULATE with multiple filter context
“In the year 2022, what is the total sales(YTD) and MTD sales of each category?”
3. set calculation of multiple filter context
2.3 CALCULATE with the AND filter
2.4 CALCULATE with the OR filter
4. conflict of internal and external filter context