
Recursive CTE vs While Loop – Performance Analysis

When it comes to Recursive CTEs in SQL Server there are often misconceptions about their performance, especially compared to While Loops. In this post we’ll explore two iterative scenarios and compare the performance of a Recursive CTE solution and a While Loop solution for each. It’s important to understand that both a Recursive CTE and a While Loop follow a linear pattern: duration and resource use grow linearly with the number of iterations/recursions/rows required to process the query. That linear growth is where a lot of the misconceptions come from. A While Loop is a clearly iterative process, so it’s easy to see that its growth is linear, but a Recursive CTE is not immediately obvious as linear because of the way it is written, with an anchor member and a recursive member. The thing to remember is that a Recursive CTE is the equivalent of writing multiple SELECT statements with UNION ALL between them, so each recursion is essentially another UNION ALL plus SELECT. Thinking about Recursive CTEs as stacked UNION ALL statements makes the linear growth easier to understand.
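To make that unrolling concrete, here is a minimal sketch (not part of the test harness below): a Recursive CTE that counts to 3, and the hand-written UNION ALL chain it is logically equivalent to.

```sql
-- A small recursive CTE...
WITH [Numbers] AS
(
    SELECT 1 AS [N]              -- anchor member
    UNION ALL
    SELECT [N] + 1               -- recursive member, runs once per recursion
    FROM [Numbers]
    WHERE [N] < 3
)
SELECT [N] FROM [Numbers];

-- ...is logically the anchor plus one UNION ALL SELECT per recursion:
SELECT 1 AS [N]
UNION ALL
SELECT 2
UNION ALL
SELECT 3;
```

Each additional recursion adds one more "SELECT" to the chain, which is why the work grows linearly with the recursion count.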

Now onto the performance test results. There are two scenarios for this performance test. The first scenario is a data generation scenario and the second scenario is a row concatenation scenario. The data generation scenario will illustrate performance for an extreme number of recursions/iterations as only one row of data will be generated for each iteration. The row concatenation scenario will illustrate a more average scenario where the number of recursions/iterations is low, maxing out at 30 iterations, but across a wide range of rows of data.

Looking at the Data Generation scenario we can see that the Recursive CTE and While Loop have effectively equal durations: the variance is under 10.00%, and even at 10 million iterations/recursions/rows the Recursive CTE is only 3 seconds faster than the While Loop. We can also see that the Recursive CTE makes better use of CPU than the While Loop, while the While Loop does better on reads. This is typical of Recursive CTEs: they frequently outperform other query types in CPU usage, but their reads are always higher. It’s important to note, though, that the reads are all from buffer cache; there is no actual disk IO occurring. The charts depicting the ratio of each metric per row show the linear growth of both query types and confirm their equal performance.
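For readers reproducing the tests, one way to confirm that the reads are logical (buffer cache) rather than physical (disk) is SET STATISTICS IO. This is a sketch, not part of the test scripts below; the SELECT is just a stand-in for the query under test.

```sql
SET STATISTICS IO ON;

-- Run the query under test here. The Messages tab then reports, per table,
-- "logical reads" (pages read from buffer cache) separately from
-- "physical reads" and "read-ahead reads" (pages pulled from disk).
SELECT COUNT(*) FROM sys.objects;

SET STATISTICS IO OFF;
```

If the physical and read-ahead read counts stay at zero, the high read numbers for the Recursive CTE are all buffer-cache traffic.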

Finally, in the Row Concatenation scenario, which more closely mimics a real-world workload for both query types, we see the Recursive CTE really shining. Not only is it substantially more efficient in its CPU usage, but it is also almost 200% more efficient on duration. As before, the Recursive CTE performs substantially more reads, but again these come from buffer cache and no actual disk IO is occurring. The data also suggests that with additional rows of data the Recursive CTE would outperform the While Loop in terms of reads as well.

Hopefully this performance analysis will help to dispel the misconceptions about Recursive CTEs and their performance.

Since there are a lot of results from the performance tests, I’ve attached the full performance results, which include additional charts and metrics as well as the full breakout of the data. Each test was run three times and the results averaged. Additionally, the full code samples used for testing are below.
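For anyone repeating the runs, duration and CPU can be captured with SET STATISTICS TIME; this sketch is not part of the attached scripts, and the SELECT is only a placeholder for the test batch.

```sql
SET STATISTICS TIME ON;

-- Place one of the test scripts here. SQL Server prints
-- "SQL Server Execution Times: CPU time = ... ms, elapsed time = ... ms"
-- in the Messages tab for each statement. Run the batch three times
-- and average the numbers, as was done for the results in this post.
SELECT COUNT(*) FROM sys.objects;

SET STATISTICS TIME OFF;
```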

Recursive CTE vs While Loop – Data Generator – Performance Analysis

Recursive CTE vs While Loop – Row Concatenator – Performance Analysis


 

Data Generation Scenario:

Generates N rows of parent child key combinations using N iterations/recursions.

Recursive CTE

SET NOCOUNT ON;

DECLARE
      @Iteration     INT = 0
    , @MaxIterations INT = 10000000
    , @Partitioner   INT = 1
    , @ParentKey     INT = 0
    , @ChildKey      INT = 0;

CREATE TABLE #GeneratedData
(
      [ParentKey] INT
    , [ChildKey]  INT
);

--------------------------------------------------------------------------------

WITH
[GeneratedData] AS
(
    SELECT
          @Iteration AS [Iteration]
        , @Partitioner AS [Partitioner]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Iteration
              ELSE @ParentKey
          END AS [ParentKey]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Partitioner - 1
              ELSE @Iteration % @Partitioner
          END AS [ChildKey]

    UNION ALL

    SELECT
          GD.[Iteration] + 1 AS [Iteration]
        , CASE
              WHEN ((GD.[Iteration] + 1) % GD.[Partitioner]) = 0
                  THEN ((GD.[Partitioner] + 1) % 30) + 1
              ELSE GD.[Partitioner]
          END AS [Partitioner]
        , CASE
              WHEN ((GD.[Iteration] + 1) % GD.[Partitioner]) = 0
                  THEN GD.[Iteration] + 1
              ELSE GD.[ParentKey]
          END AS [ParentKey]
        , ((GD.[Iteration] + 1) % GD.[Partitioner]) AS [ChildKey]
    FROM
        [GeneratedData] AS GD
    WHERE
        GD.[Iteration] < @MaxIterations
)

--------------------------------------------------------------------------------

INSERT INTO
    #GeneratedData
    ([ParentKey], [ChildKey])
SELECT
      GD.[ParentKey] AS [ParentKey]
    , GD.[ChildKey] AS [ChildKey]
FROM
    [GeneratedData] AS GD
OPTION
    (MAXRECURSION 0);

--------------------------------------------------------------------------------

DROP TABLE #GeneratedData;

 


 

 

While Loop

SET NOCOUNT ON;

DECLARE
      @Iteration     INT = 0
    , @MaxIterations INT = 10000000
    , @Partitioner   INT = 1
    , @ParentKey     INT = 0
    , @ChildKey      INT = 0;

CREATE TABLE #GeneratedData
(
      [ParentKey] INT
    , [ChildKey]  INT
);

--------------------------------------------------------------------------------

WHILE @Iteration < @MaxIterations
BEGIN
    ----------------------------------------------------------------------------
    INSERT INTO
        #GeneratedData
        ([ParentKey], [ChildKey])
    SELECT
          CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Iteration
              ELSE @ParentKey
          END AS [ParentKey]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Partitioner - 1
              ELSE @Iteration % @Partitioner
          END AS [ChildKey];
    ----------------------------------------------------------------------------
    SELECT
          @Iteration = @Iteration + 1
        , @Partitioner =
              CASE
                  WHEN (@Iteration % @Partitioner) = 0
                      THEN ((@Partitioner + 1) % 30) + 1
                  ELSE @Partitioner
              END
        , @ParentKey =
              CASE
                  WHEN (@Iteration % @Partitioner) = 0
                      THEN @Iteration + 1
                  ELSE @ParentKey
              END
        , @ChildKey = (@Iteration % @Partitioner);
END;

--------------------------------------------------------------------------------

DROP TABLE #GeneratedData;


 

Performance Results

Recursive CTE vs While Loop - Data Generator - Performance Analysis - Duration

Recursive CTE vs While Loop - Data Generator - Performance Analysis - CPU

Recursive CTE vs While Loop - Data Generator - Performance Analysis - Reads

Recursive CTE vs While Loop - Data Generator - Performance Analysis - Charts

 


 

 

Row Concatenation Scenario:

Concatenates all child keys for each parent key. This uses the data generation query from before to populate the source data; the maximum number of child keys per parent is 30, but the actual number varies for each parent key to mimic real-world scenarios.
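As an aside, and not part of the benchmark: on SQL Server 2017 and later, the built-in STRING_AGG function performs this kind of concatenation directly, so the iterative approaches below are mainly interesting on older versions or as a performance comparison. A sketch against the #GeneratedData table produced by the scripts:

```sql
-- Equivalent concatenation using STRING_AGG (SQL Server 2017+).
SELECT
      GD.[ParentKey]
    , STRING_AGG(CAST(GD.[ChildKey] AS VARCHAR(12)), ',')
          WITHIN GROUP (ORDER BY GD.[ChildKey]) AS [ChildKeys]
FROM
    #GeneratedData AS GD
GROUP BY
    GD.[ParentKey];
```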

Recursive CTE

SET NOCOUNT ON;

DECLARE
      @Iteration     INT = 0
    , @MaxIterations INT = 10000000
    , @Partitioner   INT = 1
    , @ParentKey     INT = 0
    , @ChildKey      INT = 0;

CREATE TABLE #GeneratedData
(
      [ParentKey] INT NOT NULL
    , [ChildKey]  INT NOT NULL
);

--------------------------------------------------------------------------------

WITH
[GeneratedData] AS
(
    SELECT
          @Iteration AS [Iteration]
        , @Partitioner AS [Partitioner]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Iteration
              ELSE @ParentKey
          END AS [ParentKey]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Partitioner - 1
              ELSE @Iteration % @Partitioner
          END AS [ChildKey]

    UNION ALL

    SELECT
          GD.[Iteration] + 1 AS [Iteration]
        , CASE
              WHEN ((GD.[Iteration] + 1) % GD.[Partitioner]) = 0
                  THEN ((GD.[Partitioner] + 1) % 30) + 1
              ELSE GD.[Partitioner]
          END AS [Partitioner]
        , CASE
              WHEN ((GD.[Iteration] + 1) % GD.[Partitioner]) = 0
                  THEN GD.[Iteration] + 1
              ELSE GD.[ParentKey]
          END AS [ParentKey]
        , ((GD.[Iteration] + 1) % GD.[Partitioner]) AS [ChildKey]
    FROM
        [GeneratedData] AS GD
    WHERE
        GD.[Iteration] < @MaxIterations
)

--------------------------------------------------------------------------------

INSERT INTO
    #GeneratedData
    ([ParentKey], [ChildKey])
SELECT
      GD.[ParentKey] AS [ParentKey]
    , GD.[ChildKey] AS [ChildKey]
FROM
    [GeneratedData] AS GD
OPTION
    (MAXRECURSION 0);

--------------------------------------------------------------------------------

ALTER TABLE #GeneratedData
ADD PRIMARY KEY CLUSTERED
(
      [ParentKey]
    , [ChildKey]
);
GO

--------------------------------------------------------------------------------

CREATE TABLE #SortedData
(
      [ParentKey]    INT
    , [ChildKey]     VARCHAR(4000)
    , [Order]        INT
    , [ReverseOrder] INT
    , PRIMARY KEY
      (
            [Order]
          , [ParentKey]
      )
);

CREATE TABLE #ConcatenatedData
(
      [ParentKey] INT
    , [ChildKeys] VARCHAR(4000)
);

--------------------------------------------------------------------------------

INSERT INTO
    #SortedData
    ([ParentKey], [ChildKey], [Order], [ReverseOrder])
SELECT DISTINCT
      GD.[ParentKey] AS [ParentKey]
    , GD.[ChildKey] AS [ChildKey]
    , ROW_NUMBER() OVER (PARTITION BY GD.[ParentKey] ORDER BY GD.[ParentKey], GD.[ChildKey]) AS [Order]
    , ROW_NUMBER() OVER (PARTITION BY GD.[ParentKey] ORDER BY GD.[ParentKey], GD.[ChildKey] DESC) AS [ReverseOrder]
FROM
    #GeneratedData AS GD;

--------------------------------------------------------------------------------

WITH
[ParentKeys] AS
(
    SELECT
          SD.[ParentKey] AS [ParentKey]
        , SD.[ChildKey] AS [ChildKeys]
        , SD.[Order] + 1 AS [NextOrder]
        , SD.[ReverseOrder] AS [ReverseOrder]
    FROM
        #SortedData AS SD
    WHERE
        SD.[Order] = 1

    UNION ALL

    SELECT
          SD.[ParentKey] AS [ParentKey]
        , CAST((PK.[ChildKeys] + ',' + SD.[ChildKey]) AS VARCHAR(4000)) AS [ChildKeys]
        , SD.[Order] + 1 AS [NextOrder]
        , SD.[ReverseOrder] AS [ReverseOrder]
    FROM
        [ParentKeys] AS PK
    INNER JOIN
        #SortedData AS SD ON
            PK.[NextOrder] = SD.[Order]
            AND PK.[ParentKey] = SD.[ParentKey]
)

--------------------------------------------------------------------------------

INSERT INTO
    #ConcatenatedData
    ([ParentKey], [ChildKeys])
SELECT
      PK.[ParentKey] AS [ParentKey]
    , PK.[ChildKeys] AS [ChildKeys]
FROM
    [ParentKeys] AS PK
WHERE
    PK.[ReverseOrder] = 1;

--------------------------------------------------------------------------------

DROP TABLE #GeneratedData;
DROP TABLE #SortedData;
DROP TABLE #ConcatenatedData;


 

 

While Loop

SET NOCOUNT ON;

DECLARE
      @Iteration     INT = 0
    , @MaxIterations INT = 10000000
    , @Partitioner   INT = 1
    , @ParentKey     INT = 0
    , @ChildKey      INT = 0;

CREATE TABLE #GeneratedData
(
      [ParentKey] INT NOT NULL
    , [ChildKey]  INT NOT NULL
);

--------------------------------------------------------------------------------

WITH
[GeneratedData] AS
(
    SELECT
          @Iteration AS [Iteration]
        , @Partitioner AS [Partitioner]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Iteration
              ELSE @ParentKey
          END AS [ParentKey]
        , CASE
              WHEN @Iteration % @Partitioner = 0
                  THEN @Partitioner - 1
              ELSE @Iteration % @Partitioner
          END AS [ChildKey]

    UNION ALL

    SELECT
          GD.[Iteration] + 1 AS [Iteration]
        , CASE
              WHEN ((GD.[Iteration] + 1) % GD.[Partitioner]) = 0
                  THEN ((GD.[Partitioner] + 1) % 30) + 1
              ELSE GD.[Partitioner]
          END AS [Partitioner]
        , CASE
              WHEN ((GD.[Iteration] + 1) % GD.[Partitioner]) = 0
                  THEN GD.[Iteration] + 1
              ELSE GD.[ParentKey]
          END AS [ParentKey]
        , ((GD.[Iteration] + 1) % GD.[Partitioner]) AS [ChildKey]
    FROM
        [GeneratedData] AS GD
    WHERE
        GD.[Iteration] < @MaxIterations
)

--------------------------------------------------------------------------------

INSERT INTO
    #GeneratedData
    ([ParentKey], [ChildKey])
SELECT
      GD.[ParentKey] AS [ParentKey]
    , GD.[ChildKey] AS [ChildKey]
FROM
    [GeneratedData] AS GD
OPTION
    (MAXRECURSION 0);

--------------------------------------------------------------------------------

ALTER TABLE #GeneratedData
ADD PRIMARY KEY CLUSTERED
(
      [ParentKey]
    , [ChildKey]
);
GO

--------------------------------------------------------------------------------

DECLARE
      @Iteration     INT = 0
    , @MaxIterations INT = 0;

--------------------------------------------------------------------------------

CREATE TABLE #SortedData
(
      [ParentKey]    INT
    , [ChildKey]     VARCHAR(4000)
    , [Order]        INT
    , [ReverseOrder] INT
    , PRIMARY KEY
      (
            [Order]
          , [ParentKey]
      )
);

CREATE TABLE #ConcatenatedData
(
      [ParentKey] INT
    , [ChildKeys] VARCHAR(4000)
);

--------------------------------------------------------------------------------

INSERT INTO
    #SortedData
    ([ParentKey], [ChildKey], [Order], [ReverseOrder])
SELECT DISTINCT
      GD.[ParentKey] AS [ParentKey]
    , GD.[ChildKey] AS [ChildKey]
    , ROW_NUMBER() OVER (PARTITION BY GD.[ParentKey] ORDER BY GD.[ParentKey], GD.[ChildKey]) AS [Order]
    , ROW_NUMBER() OVER (PARTITION BY GD.[ParentKey] ORDER BY GD.[ParentKey], GD.[ChildKey] DESC) AS [ReverseOrder]
FROM
    #GeneratedData AS GD;

--------------------------------------------------------------------------------

SELECT
      @Iteration = 2
    , @MaxIterations = MAX(SD.[Order]) + 1
FROM
    #SortedData AS SD;

--------------------------------------------------------------------------------

INSERT INTO
    #ConcatenatedData
    ([ParentKey], [ChildKeys])
SELECT
      SD.[ParentKey] AS [ParentKey]
    , SD.[ChildKey] AS [ChildKeys]
FROM
    #SortedData AS SD
WHERE
    SD.[Order] = 1;

--------------------------------------------------------------------------------

WHILE @Iteration < @MaxIterations
BEGIN
    ----------------------------------------------------------------------------
    UPDATE
        CD
    SET
        [ChildKeys] = CAST((CD.[ChildKeys] + ',' + SD.[ChildKey]) AS VARCHAR(4000))
    FROM
        #ConcatenatedData AS CD
    INNER JOIN
        #SortedData AS SD ON
            CD.[ParentKey] = SD.[ParentKey]
            AND SD.[Order] = @Iteration;

    SET @Iteration = @Iteration + 1;
END;

--------------------------------------------------------------------------------

DROP TABLE #GeneratedData;
DROP TABLE #SortedData;
DROP TABLE #ConcatenatedData;


 

 

Performance Results

Recursive CTE vs While Loop - Row Concatenator - Performance Analysis - Duration

Recursive CTE vs While Loop - Row Concatenator - Performance Analysis - CPU

Recursive CTE vs While Loop - Row Concatenator - Performance Analysis - Reads

Recursive CTE vs While Loop - Row Concatenator - Performance Analysis - Charts


What is Business Intelligence?

Business Intelligence is Intuitive Actionable Data. Intuitive because a person should not need to study the data for long periods of time to understand it, and Actionable because the data should empower its user to take action.

Business intelligence data is an elevator pitch to a user: it needs to be communicated concisely and with brevity. If the user can’t understand the data, or what they are supposed to do with it, inside of thirty seconds, then the data is not intuitive. Just like an elevator pitch, business intelligence data should be summarized to communicate the right information quickly. For instance, if I want to know how much I spent on gas this month commuting, I could look at all the transactions on my bank statement, adding up the gas station transactions. However, that is not intuitive, as it requires me to dig through a mountain of data to find the desired information. If, instead, I had a summarized statement that gave me the total amount spent that month across a handful of categories, I could immediately spot the gas category and know whether it was a good, bad, or ugly month for commuting to work.
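That summarized statement maps directly onto a simple aggregation. A sketch, using a hypothetical [Transactions] table and column names invented for illustration:

```sql
-- Hypothetical [Transactions] table: one row per bank transaction.
-- One total per spending category for the month, instead of raw transactions.
SELECT
      T.[Category]                     -- e.g. 'Gas', 'Groceries', 'Dining'
    , SUM(T.[Amount]) AS [MonthlyTotal]
FROM
    [Transactions] AS T
WHERE
    T.[TransactionDate] >= '2015-01-01'
    AND T.[TransactionDate] <  '2015-02-01'
GROUP BY
    T.[Category];
```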

In addition to brevity, business intelligence data needs to communicate an idea, not answer a question. A question is always narrow in scope and context, hence the answer to a question is equally narrow in scope and context, and thus an answered question tends to spawn more questions. An idea will likewise spark additional ideas, but a person with an idea is empowered to take action, whereas a person with a question is blocked until they have an answer. So by setting a goal of communicating an idea instead of answering a question, business intelligence data can better empower users and avoid the spiral of questions. Going back to my previous example, my monthly bank statement should be designed to communicate the current state of my finances. Whether I’m trying to determine how much my commute costs each month, what my monthly revenue is, or whether I can afford a new Lexus, my bank statement can answer all these questions because it is communicating the idea of my financial state instead of trying to answer an individual question.

Business intelligence data should also be actionable. A user must be able to look at data and know what action to take; data that does not drive a user to action is not useful data. When I look at my monthly bank statement I should immediately understand the current state of my finances and be driven to take any action needed to meet my financial goals. This can only occur if my bank statement is designed to be actionable, presenting useful data that clearly and concisely tells me the state of my finances. A list of transactions does not do this; it forces me to sit and stare at my statement, trying to mentally summarize the data I believe is relevant. On the other hand, if my bank statement has summarized data about my total deposits, total withdrawals, the existing balance, and net total, I can easily assess my financial situation and determine my next action toward meeting my financial goals.

Intuitive Actionable Data, that’s business intelligence. By making data adhere to that simple fundamental, we can add value, increase productivity, and ultimately build a better business.
