Tuesday 22 February 2011

Analysis Services Slicing

I must admit that I never cease to be amazed how you can work with a product for years and still every now and again a little nugget appears that makes you go "Really? Does it really work like that?"

I had one of those moments recently whilst answering posts on the Analysis Services forum regarding why one query hit only one partition whilst another query hit all the partitions in a measure group. You can read about all of this here: http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/b4b4f28a-2ddd-4335-91eb-1732b371eb74 where Greg Galloway provides some excellent insight into all of this.

So, for the most part people know about partitioning in Analysis Services and if not, someone at some point on the forums will mention that "you really should be partitioning your data" etc. This eventually evolves into "oh yes and if you are partitioning you should be setting the slice as well" then queue many links to various articles on slicing etc. and the world is good. Or is it...

I will admit that I have been one of those citizens posting about the virtues of partitioning and slicing whilst apparently being blissfully (and dangerously) ignorant of the real mechanics of slicing. As far as I can tell, all Microsoft documentation appears to indicate that "ProcessIndexes" is required for aggregations but nowhere does it state that it is required for slicing too. For those of you that are now going "huh, you what?", yessssssss, welcome to my world. Note too, how I am deliberately ignoring the debate regarding whether or not one should manually set the slice instead of relying on auto-slicing, as funnily enough manual slicing is affected too, albeit in a different way.

Essentially it boils down to this. If you are relying on auto-slicing (either by design or merely for those attributes where it may just happen to only contain a small subset of members) then you need to guarantee that you call "ProcessIndexes" (which is naturally "called" automatically when using "ProcessFull") else auto-slicing will not take place. Yup, that's the fundamental point here. Auto-slicing, or at least the capturing of the min and max data IDs for an attribute, does not happen whilst processing the data into the partition but rather during the processing of indexes. Similarly, if you set the slice on a partition, unless you call "ProcessIndexes" on the partition, the data it contains will not be validated against the slice and thus you can actually process incorrect data into a partition.

I guess this makes sense from a programming perspective, in that tracking every value whilst processing data  into the partition would make the act of processing more expensive. However, when processing indexes this is essentially what this part of processing is doing anyway I would suspect and thus it makes sense to happen here. Note that due to this, slicing is also governed by the IndexBuildThreshold property too. The problem is, other than this link http://cwebbbi.wordpress.com/2007/11/08/viewing-partition-slice-information I have not seen anything else that indicates that you need ProcessIndexes in order to make slicing work completely.

Note also, that it is often stated in Microsoft performance tuning articles to use ProcessData followed by ProcessIndexes instead of the generic ProcessFull. However, once again it is usually stated that ProcessIndexes is required in order to rebuild aggregations and nowhere does it mention slicing.