Tuesday 26 April 2011

Implementing Analysis Services Process Add Functionality – Part 1


I think most people who have used Analysis Services know about the ability to incrementally process dimensions using “ProcessAdd” and let’s face it, without the ability to partition dimensions it’s about the only data-reducing optimisation we can perform on dimensions. Sure, we have “by table” processing versus “by attribute” processing, the ability to process attributes in parallel and we can make sure that the relational layer can produce results quickly by either using snow-flaked tables or star-schema tables with indexes etc. However, we are still bound by the amount of time taken to process a given set of rows, where a subset of this information may already have been previously processed.

To this end we can use incremental dimension processing, i.e. the “ProcessAdd” processing type, which aims to reduce the amount of time taken to process a dimension by only processing new information into the dimension. Clearly changes in hierarchy relationships will still require “ProcessUpdate” or “ProcessFull” as the entire dimension data set will need to be reprocessed in order to identify and reflect any hierarchy changes.

However, there are some fundamental limitations on how fast incremental dimension processing can go…
1.     Analysis Services performs transactional processing using files on the file system. As such, when starting a transaction its first task is to take a copy of all relevant files. It then performs the processing and, if processing is successful, the newly processed files are used. Based on this we can see that the amount of time taken to process must include at least the time taken to clone all relevant files for the dimension. This can be a non-trivial amount of time on large dimensions.
I guess an enhancement request could be to have Analysis Services clone existing dimension files using a background thread some time after processing. A per-dimension configuration would be ideal so that we don’t automatically double disk space and waste resources on small dimensions.
2.     In order to use incremental processing on dimensions there appear to be two general forms of implementation. There’s the method that presents only the new rows to Analysis Services during processing (the intended, but clearly more arduous, method) and the one that presents all rows but changes the error configuration to ignore duplicate key errors. I consider the latter option a bit of a hack to be honest and not in the true spirit of incremental processing.
3.     Even after providing only new rows, Analysis Services needs to ensure that attributes which have ordering defined based on some other attribute are updated to reflect this. Clearly, if ordering has been defined on an attribute with many members, the act of Analysis Services revisiting this set could take a considerable amount of time.

From what I’ve seen, the implementation of Analysis Services “ProcessAdd” on dimensions seems somewhat diverse and I’m not aware of anyone documenting a definitive guide to implementing this. Now I certainly don’t intend these posts to be considered a definitive guide but more of a starting point for discussion. To this end, I’ll try to cover off how I see “ProcessAdd” working with regard to the relationship between Analysis Services and SQL Server. Don’t get me wrong, others have most definitely blogged about “ProcessAdd” and you can see Greg Galloway’s blog for more information on ProcessAdd performance metrics here: http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?List=83c71d76-a4d8-4197-9257-38d6b857634f&ID=7. There are others out there like this too. However, these posts tend to focus more on the Analysis Services side, detailing the performance characteristics or the options available such as out-of-line DSVs etc., but not much about how to actually go about integrating the relational layer and Analysis Services with incremental processing of dimensions in mind.

So what are the challenges with integrating Analysis Services with SQL Server?
1.     Well, first we need to be able to identify new rows for every attribute. Note that it is insufficient to only identify new “key” rows as each attribute in a dimension is processed separately. So it is entirely possible, and indeed likely, that any new “key” row will be related to attribute members that have already been processed. (A rough sketch of this follows the list below.)
2.     Once we have the ability to identify new rows for every attribute, how do we only present these new rows to Analysis Services when processing the dimension?
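As a rough illustration of that first challenge, here’s a minimal sketch of one way to pick out only the members Analysis Services hasn’t yet seen for a single attribute. All of the names are hypothetical: it assumes a DimProduct table with a LoadDate column and a watermark table recording, per dimension attribute, the last load date that has been processed into the cube.

-- Hypothetical watermark table: one row per dimension attribute,
-- recording the last load date processed into the cube.
CREATE TABLE dbo.DimProcessWatermark
(
    DimensionName  sysname   NOT NULL,
    AttributeName  sysname   NOT NULL,
    LastProcessed  datetime  NOT NULL,
    CONSTRAINT PK_DimProcessWatermark PRIMARY KEY (DimensionName, AttributeName)
);

-- New members for the (hypothetical) Subcategory attribute only:
-- anything loaded since that attribute was last processed.
SELECT DISTINCT p.SubcategoryKey, p.SubcategoryName
FROM dbo.DimProduct AS p
JOIN dbo.DimProcessWatermark AS w
    ON  w.DimensionName = N'Product'
    AND w.AttributeName = N'Subcategory'
WHERE p.LoadDate > w.LastProcessed;

A similar query would be needed per attribute since, as noted above, a brand new “key” row may well reference attribute members that already exist in the dimension.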

Delving into the second point a bit more, we should note that this tends to be a bit of a double-edged sword. We could use an out-of-line DSV to point Analysis Services at some object(s) that present only new rows, but the use of an out-of-line DSV requires a connection to be specified too. As soon as you specify a connection in an XMLA script you lose the option of letting Analysis Services process under the service account. So you either have to specify a username/password combination in the XMLA connection configuration, which is not nice, or use Kerberos and ensure that the account submitting the process request has sufficient privileges both in Analysis Services and in SQL Server. We could also opt to simply configure the base objects upon which the dimension is built to only present new rows (sketched below), allowing the use of the service account during processing. However, if you do this, how do you go about performing a “ProcessUpdate” or “ProcessFull” on an ad hoc basis in order to fix a rigid relationship issue or to refresh the hierarchies?
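Purely as a sketch of that second option, and again with hypothetical names, the dimension could be built over a view like the one below rather than over the base table, with the same watermark table as in the earlier sketch controlling which rows are exposed. It deliberately doesn’t answer the “ProcessUpdate”/“ProcessFull” question above; that is exactly the awkward part.

-- Hypothetical view for the dimension to be built on, exposing only
-- rows loaded since the key attribute was last processed.
CREATE VIEW dbo.vwDimProduct_Incremental
AS
SELECT p.ProductKey, p.ProductName, p.SubcategoryKey, p.SubcategoryName
FROM dbo.DimProduct AS p
JOIN dbo.DimProcessWatermark AS w
    ON  w.DimensionName = N'Product'
    AND w.AttributeName = N'Product'   -- key attribute watermark
WHERE p.LoadDate > w.LastProcessed;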

In my next post (here) I’ll try to cover off one solution to this problem. Clearly this is not going to be straightforward or risk-free but hopefully it’s a starting point.

Please find part 2 here.

Sunday 17 April 2011

Configure Reporting Services Database on a SQL Server with non-default port


I know this is “so last year” when talking about SQL Server 2008 Reporting Services R1 but I figured I’d talk about it anyway, as it may apply to R2 as well. I haven’t checked so please feel free to leave a comment if you can confirm one way or the other.

I am also aware that it has been a long time since we discovered this problem and that there is now another more elegant solution to the problem as well. However, I figured I would provide the information on how we solved this same issue for reference.

So, given a SQL Server instance where the SQL Browser service is disabled (or cannot be reached) and/or SQL Server is using a non-default port number, when attempting to generate the “rights” script whilst configuring Reporting Services you get the following error:

System.Net.Sockets.SocketException: No such host is known
    at System.Net.Dns.GetAddrInfo(String name)
    at System.Net.Dns.InternalGetHostByName(String hostName, Boolean includeIPv6)
    at System.Net.Dns.GetHostEntry(String hostNameOrAddress)
    at ReportServicesConfigUI.RSDatabase.IsLocalDbServer(String dbServer)
    at ReportServicesConfigUI.RSDatabase.GrantRSConnectionRights()
    at ReportServicesConfigUI.RSDatabase.CreateNewDatabase()

In our case, our UAT environment has the SQL Browser service disabled and we do not use the default port for SQL Server, both for security reasons I suspect. As such we need to specify our SQL Server as “servername.domain,port”, which in the general case is fine. Even when creating the actual databases whilst configuring Reporting Services this is fine. However, it would appear that when generating the rights script, this form of connection string is not parsed into its individual server name and port number components. Really bizarre that one part of the product parses it and another does not, especially when both sit in the same part of the configuration tool!

For further confirmation that the connection string is not parsed, see:

So after much Googling at the time, we discovered the link above, as well as the following one, which at the time did not include a solution.

However, from both of those links we did get enough information to work out what we needed to do. Still, I’m not sure why specifying the instance name as well as the port number (as detailed in the second of those links) works when just the port number does not; that’s a bit weird, but if it works, why not use it. Note: I haven’t tested that the instance name and port number combination does actually work.

In our case though, we went with the *nix philosophy that you can always do more with the command line tools than you can with the GUI. So we did that. Essentially we used the GUI to configure Reporting Services as far as it would let us, including setting up the databases and allowing it to fail when it tried to configure the “rights scripts”. We then switched to using the command line tools by connecting as the relevant user and then ran the following:

rsconfig -c -m <RS_SERVER_NAME> -s <DB_SERVER_NAME,PORT> -d reportserver -a Windows

Where <RS_SERVER_NAME> is the server name for the Reporting Services server and <DB_SERVER_NAME,PORT> is the server name and port number for the SQL Server hosting the Reporting Services databases (with “reportserver” being the database name itself).

After running the command, we restarted Reporting Services, went back into the configuration tool to apply the changes there and continued.

Clearly this is a bit of a hack, at best, and most likely not supported but in our case it worked and we were up and running after a couple of hours of head scratching and bemoaning the insanity that is Reporting Services.

Wednesday 13 April 2011

SQL Server - PARSENAME Function


Similar to the QUOTENAME function, another one of those often unknown or overlooked functions is the PARSENAME function. I’ll confess that I’m not entirely sure why it isn’t PARSE_NAME, in line with other functions such as OBJECT_NAME etc., but then I guess the string function names all seem to be concatenated. Anyhoo, let’s ignore all of that for now.

The beauty of this function is that it allows you to get the individual components of a given 4-part object name presented as a string. So, given '[ServerNameFoo].[DatabaseBar].[SchemaBaz].[ObjectCaz]' we can get the server name component or the database component if we want, maybe to ensure that what we are evaluating exists in the current database for example. We can also get the schema or object names too. I agree that its usefulness is probably limited to esoteric edge cases but it is one of those functions that when you need it, it’s really handy. Especially when you consider trying to implement this functionality yourself and catering for object names with or without square brackets or full stops etc. in their name.

To use the function you would write something like:
SELECT PARSENAME('server.database.schema.object', 2);

This would return the value “schema”. Note that the function doesn’t evaluate whether or not the object exists and doesn’t care. All it does is return the requested component of the string if it is available else NULL. For an example of it in use, have a look at the internals of the sp_spaceused stored procedure. (i.e. Execute sp_helptext 'sp_spaceused'.)
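For example, the following show the other components, and the NULL you get back when a part simply isn’t present in the string:

SELECT PARSENAME('server.database.schema.object', 4);  -- returns 'server'
SELECT PARSENAME('server.database.schema.object', 1);  -- returns 'object'
SELECT PARSENAME('schema.object', 3);                  -- returns NULL, no database part supplied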

Also note that there are some articles on the web about using this function in order to split strings for other reasons (e.g. IP addresses), which can be quite a neat solution. (See: http://www.sqlteam.com/article/using-the-parsename-function-to-split-delimited-data) However, remember that the function is SQL Server delimiter aware, so you need to be careful. Running the function on the string 'foo.[bar]' to return the object name component will return "bar" and not "[bar]".
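To illustrate both points, using the IP address trick from that article and the delimiter behaviour just mentioned:

SELECT PARSENAME('192.168.1.100', 4);  -- returns '192', the first octet
SELECT PARSENAME('192.168.1.100', 1);  -- returns '100', the last octet
SELECT PARSENAME('foo.[bar]', 1);      -- returns 'bar', the square brackets are stripped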

I’ll be honest and say that I’ve never really had a reason to use it but at least I know it’s there if I do. And now you do too. :-)