Monday, July 4, 2011

MDX: Query Performance ORDER using Val

One of my colleague working on a different project had an issue with performance of a query. The query looked something like this

Key Points: It has an ORDER BY clause, the member used for ordering converts the string value to Int using Val.

SELECT 
{[Measures].[Answer Count]} on 0,
NONEMPTY(
[Location].[Region].[Region] *
ORDER
(
[Location].[Retailer Store Number].[Retailer Store Number],
Val(
[Location].[Retailer Store Number].CURRENTMEMBER. MEMBER_CAPTION
)
)
,[Measures].[Answer Count]) on 1
FROM 
cdw 
WHERE [Visit Start Date].[Last 1 Week].&[Current Week]


Limitations 
One does not have permission to change the structure of the cube as client has not given permission. Data type for store number is String. The store number stores integer values but yet the data type is String. Hence ORDER was @MDX

Initially I suggested the following

WITH
SET OrderedStoreNum AS
Order
(
    NonEmpty
    (
        [Location].[Retailer Store Number].[Retailer Store Number]
        ,[Measures].[Answer Count]
    )
    ,Cint([Location].[Retailer Store Number].CurrentMember.Member_Caption)
    ,BASC
)
SELECT
{
    [Measures].[Answer Count]
} ON 0
,NonEmpty
(
    [Location].[Region].[Region] * OrderedStoreNum
    ,[Measures].[Answer Count]
) ON 1
FROM [cdw];

Reasoning for above suggestion was as follow
It is better to do an "order by" on a set independent of the current context and then do    
cross join.
           Additionally filter out Nulls before doing order by and reduce the size of set
 Use CInt instead of Val directly and not let AS decided the base data type it needs to  
 convert to under the hood. (Val does not throw an error if value is a string value whereas 
 CInt will throw an error  Type Mismatch.)
 Resort to BDESC rather than DESC. BDESC does a simple sort and not a hierarchical sort

Additionally I landed up in this article:- .http://sqlblog.com/blogs/mosha/archive/2008/10/22/optimizing-mdx-aggregation-functions.aspx which speaks about not using the NonEmpty Clause within the aggregate function. But in our case this does not apply as the order by is not an aggregate function.

To substantiate this and dig more into it, found some interesting stuff. Hrvoje helped me in this as he had the required volume of dataset.

So here were the readings

The fact table contains around 3M rows. The dimension contains around 249,607 members: [Dim1].[Attr1].[Attr1]. Dim1].[Attr2].[Attr2] contains around 160 members. [Dim1].[Attr1] is an hierarchy that has 7 levels.

a:- this is the time (mm:ss) for query created using set
b:- this is the time (mm:ss) for query created using ORDER directly in the cross join.

Query 1: 
a) Time 07:51
b) Time 07:51 

WITH
SET setL AS
ORDER
(
      [Dim1].[Attr1].[Attr1]
      ,Val([Dim1].[Attr1].CurrentMember.MEMBER_Key) -- Key is SK of int type
      ,DESC
)

SELECT
      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1
FROM
      [Cube1]



Query 2: 
a) Time 03:57  
b) Time 03:58 

WITH
SET setL AS
ORDER
(
      NonEmpty([Dim1].[Attr1].[Attr1], [Measures].[M1])
      , Val([Dim1].[Attr1].CurrentMember.MEMBER_Key)
      ,DESC
)

SELECT
      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1
FROM
[Cube1]


Query 3:
a) Time 01:06
b) Time 01:04

WITH
SET setL AS
ORDER
(
      NonEmpty([Dim1].[Attr1].[Attr1], [Measures].[M1])
      ,Int([Dim1].[Attr1].CurrentMember.MEMBER_Key)
      ,DESC
)

SELECT
      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1
FROM
[Cube1]



Query 4:
a) Time 00:46
b) Time 00:39

WITH
SET setL AS
ORDER
(
      NonEmpty([Dim1].[Attr1].[Attr1], [Measures].[M1])
      ,[Dim1].[Attr1].CurrentMember.MEMBER_Caption
      ,DESC
)

SELECT
      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1
FROM
[Cube1]



Conclusion
1. Using Val is a big culprit. Int is better than Val. But changing the source data type is the best.
2. It is better to use NonEmpty within ORDER().
3. Though there is no considerable difference between the creating a set versus using ORDER() directly in cross join, i think it is better practice to do so.
4. The observation is when both the attributes are belonging to same dimension. I assume that the behavior would be same when we use attributes of different dimension in a cross join
5. I could not do much analysis on BDESC versus DESC.


Note: This query difference may not be substantial if the dimension and measure data is small.

LinkWithin

Related Posts with Thumbnails