Skip to main content

Cartesian Product in Power BI

This is the last post of my join series and it's about how to create a Cartesian product table in DAX and in Power Query. If you are wondering what is Cartesian product, google it :) I'm joking come back here :) Okay, serious mode on. First I'd like to share Wikipedia's definition and visual to describe it: 'In mathematics, specifically set theory, the Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs (a, b) where a is in A and b is in B. In terms of set-builder notation, that is:'



My definition for Cartesian product (in Power BI) is taking two different tables and combining them with all possible pairs of rows.


To get this result in DAX we can use CROSSJOIN function. Microsoft's definition for CROSSJOIN is: "Returns a table that contains the Cartesian product of all rows from all tables in the arguments. The columns in the new table are all the columns in all the argument tables." The syntax is pretty simple: 

CROSSJOIN(<table>, <table>[, <table>]…) 

My formula to obtain the result set on below is:

Like it's obvious from syntax, we are not limited to only two tables. We need different column names in every table, otherwise, it returns an error. Keep in mind that the function is not supported for use in DirectQuery mode or row-level-security RLS rules. I also want to emphasize that in this example, the columns are in the same order with parameters. Other than these, I'd like to add that we can create Cartesian product table by using GENERETE and GENERATEALL functions as well. But I'm not going to show how to do it with those functions, because their real power comes from creating inner join and left/right outer join combined with some other functions, which is the subject of another post.

Now I want to ask a hypothetical question. Is it possible to create Cartesian product of two tables with filtered rows? The answer is yes! DAX can be quite cool sometimes. For example, I want to create a Cartesian product of two tables but I don't want to include 'Hat' in my result-set. How can I do that?


It's as simple as that! What makes this possible is: the idea that wherever we can use a table name in the parameter, we can also use a table function, like FILTER, CALCULATETABLE, etc. The other thing that I'd like to emphasize is, despite the fact that the Product table is the first parameter of the formula, in the result-set, product table columns come after the Stock table columns. It shows that the order of columns is not always related to the order of parameters.

To create a Cartesian Product table in Power Query, first, we need to choose one of the tables. Which one comes first doesn't change the number of rows in the result-set table, but it changes the order of columns. In this example, I choose the product table and click on Custom Column. After that, to the custom column formula part, I write only the name of the table that I want to combine, being 'Stock', and click OK.


And this is the result after expanding newly added columns.



To create a Cartesian product table with more than 2 tables, we can add the name of other tables to Custom Column formula part with "&". To filter the result-set, we can simply choose the column and filter. If you prefer hard-coding, here is the M formula to do it in Advanced Editor.

This is all I want to say about Cartesian product. End of my join series. My next post is going to be about solving data type conflicts in Power BI and, wait for it, in SQL. Stay tuned.

Comments

Popular posts from this blog

Manipulating Grand Totals with ISINSCOPE in Power BI

ISINSCOPE function was introduced in with November 2018 Power BI update. We can say it's a very young player in the game, but it's definitely magic. It can be used to manage lots of issues related to hierarchies, but in this post, I'm going to focus on manipulating the grand total line with it.  This is my budget forecast table. The budget forecast that you see for every month is not the forecast for every month, but it's the forecast for the whole year. The budget forecast for 2020 has changed every month. That's why we have different numbers for every row. Below you see the table visual with the year-month column and a measure that shows the values for every month. The thing that I want you to pay attention to is the total line. I wanted to see the last month's value for the total line and this is the measure for that.  This measure calculates this: If the year-month column has one value then take the average of that value. If it doesn't have one value, li...

How to Add Flags to Power BI Reports

Adding flags to Power BI Reports might be easier said than done. If you have a dataset which contains data of various countries, adding country flags looks cool. The process to do it, on the other hand, might be compelling. Now I am going to share with you the obstacles I  have faced during the process and of course, how I overcame them. First I made an excel list of countries that my dataset has. Then I have downloaded to my computer all the flags I need from Wikipedia and converted them to Base64 link via an image encoder website  base64-image.de  and added those links to my excel file, which looks like this: After I added my excel file to Power BI, I converted the data category of URL column from 'uncategorized' to 'image URL', which might be a small step for you but it's a big step for the success of the flag operation. Then I created one to many relationship with DimLocation table and chose 'both' as filter direction, so that I can filter from both t...

Solving Data Type Conflicts in Power BI and SQL

Whether we like it or not, error messages are part of our lives. The other day, I faced one of those little horrid yellow messages. For me, it's impossible to stay chill when I see that message. For a second, I feel alarmed and panicked. Let me show you what I'm talking about. The first column is the customer code, quite self-explanatory. The Registration Date column shows the date when the customer registered to my shop. The First Sale Date column shows the date when the customer bought something for the first time. And this is what I'm trying to see in the Sales Day column: If Registration Date and First Sale Date are the same, meaning customer bought something at the same day customer registered to my shop, print  "First Day", otherwise show the data of First Sale Date column. But computer says no! It says "Expressions that yield variant data-type cannot be used to define calculated columns." If I need to translate it in simplified English, it means: ...