Thursday, 17 December 2009

SQL Server 2008 SP1 - Cannot drop database because it is currently in use.

Recently I came across an issue in SQL Server 2008 SP1 where I was trying to rebuild our DEV databases from script (we have NAnt scripts that perform a continuous build process, so we know that they work) but during the rebuild the process failed because one of the databases could not be dropped. At first I thought that there must be existing connections to the database, but a quick review of the output of "sp_who2" showed no such thing.

Anyway, to shortcut the whole build process I figured I would just use SQL Server Management Studio to manually drop the database and enable the "close existing connections" option. However, this too failed, saying that the database was currently in use, even with the "close existing connections" option set. Even more bizarrely, when I checked the output of "sp_who2" once again, I could not see any connections to the particular database in question. Restarting the SQL Server service didn't help either.

A quick Google search on this error message produced quite a few hits (as you probably know if you found this page that way), a lot of which seemed to indicate that the way to resolve this is to alter the database into single user mode with the rollback immediate option set. Using this produced no joy for me either, which wasn't really surprising considering that there appeared to be no connections in the first place.
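
For reference, that commonly suggested approach boils down to something like this (the database name is just a placeholder):

-- Force the database into single user mode, rolling back any open
-- transactions, then drop it. Standard advice, but no joy here.
ALTER DATABASE [MyDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
DROP DATABASE [MyDatabase];
GO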

In the end, this is how I solved it… You need to get SQL Server to tell you who is blocking the drop command. Not rocket science is it, once you know the answer. ;-) So…

  • First of all, using SQL Server Management Studio, open a query window on to the server and ensure that it is connected to the [master] database.
  • Next, also using SQL Server Management Studio, right click on the database in question and select the option to drop the database and enable the "close existing connections" option. Click OK.
  • While this is trying to drop the database, quickly switch to the open query window and run "sp_who2". Look through the result set for the row with a status of "SUSPENDED" and note the value in the "BlkBy" column.
  • Also in the query window, execute the KILL command against the SPID value noted in the point above.
And there you go, the database will have been dropped as soon as you killed the SPID.
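
For completeness, the query-window side of the exercise amounts to nothing more than this (the SPID shown is obviously just an example; use whatever value appears in the "BlkBy" column):

-- Run this in a query window connected to [master] while the drop
-- (with "close existing connections") is in progress.
EXEC sp_who2;
-- Find the row with a Status of 'SUSPENDED', note the SPID in its
-- "BlkBy" column and kill it.
KILL 53;  -- 53 is just an example value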

Weird eh?

Monday, 22 June 2009

Analysis Services 2005 - Dynamic Default Member

Whilst migrating a database from AS2000 to AS2005 I came across an interesting issue when trying to replicate the same behaviour as the "legacy" version. Firstly, a little background...

For the sake of expediency I used the migration wizard to convert most of the AS2000 database over to AS2005. I say most because although the validation checker stated that everything was a-okay, when it came to actually doing the migration the wizard whinged about a dimension that was joined to different cubes at different levels within the dimension. In order to make progress, I deleted the dimension and migrated the rest.

So, what I was left with was a pretty complete system that, although not overly complex, wasn't trivial either. The database contained a number of cubes and virtual cubes etc. and they contained a financial line dimension making use of unary operators and custom MDX for specific members. When designing the AS2000 database originally, I specifically avoided calculated cells so there were none of those, but it did contain quite a number of calculated members. I must say that I was actually quite impressed with the conversion process once it worked. Certainly the dimensions were migrated quite well, though it made a bit of a mess of migrating the "alternate hierarchy" dimensions. I'm also not a big fan of using linked measure groups etc. so I'll probably back those out into a series of cubes using multiple measure groups etc., but I digress.

Whilst going through all the dimensions, to compare the migrated version to the original version, I noticed that the [Currency] dimension had not been migrated to be the same as the original. The dynamic default member definition was missing and needed to be added back. Now, based on AS2000 functionality, the [Currency] dimension had the default member defined using the "Filter" function to filter the members of the dimension based on a "Default Flag" member property such that when the member property was "1" it identified the default member.
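
For the record, that Filter/Properties-based definition (which, as you'll see below, is exactly the style you want to avoid in AS2005) amounted to something along these lines; the hierarchy and property names here are illustrative rather than exact:

// Old style: filter the dimension on the "Default Flag" member property
// and take the first (and only) matching member as the default.
Filter(
  [Currency].[Currency].Members,
  [Currency].CurrentMember.Properties("Default Flag") = "1"
).Item(0)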

In AS2005 the preferred method is to use the "Exists" function, which performs a lot better than filtering. Indeed, there is also a known issue in AS2005/8 where use of the "Properties" function when specifying a default member causes heap corruption during processing, so it should be avoided. Now the interesting bit: the [Currency] dimension does not have an "All" level, so I set the "IsAggregatable" property to false, changed the "Default Flag" attribute's "AttributeHierarchyEnabled" property to true (to change it from a member property to a true attribute) and set about deploying and processing the revised dimension. All of that resulted in the following error:
Errors in the metadata manager. The 'Currency' attribute with IsAggregatable=false must not have any relationship to 'Default Flag' attribute that has AttributeHierarchyEnabled=true

So, rather interestingly, you cannot have an enabled attribute hierarchy 'x' related to attribute 'y' if attribute 'y' has its "IsAggregatable" property set to false. Converting attribute 'x' back to a member property allows the dimension to deploy but prevents the use of the attribute in the "Exists" function; one would have to revert to the "Filter" function, which we really need to avoid. Now I'm not sure why this limitation exists, it just does. So how to work around it?

Well, what I did was to create a copy of the key attribute and relate it to the key. I changed the attribute relationship to "Rigid", as this should never change, and the cardinality to "one to one", even though this latter property isn't used in AS2005. Then, on the key attribute, I set the "IsAggregatable" property to true, the "AttributeHierarchyVisible" property to false and left the "Default Flag" attribute relationship on the key. The key also had to be renamed, so I added an underscore to the end of it.

I then renamed the copied attribute (i.e. the copy of the key) to the name of the key attribute but without the underscore, set the "IsAggregatable" property to false and defined the default member with MDX using the "Exists" function and the "Default Flag" attribute as follows:
Exists([Currency].[Currency].[Currency].Members, [Currency].[Default Flag].&[1]).Item(0)

This appears to do the trick at the expense of duplicating the attribute, providing a dynamically set default member on an attribute that is not aggregatable.

I hope this helps others out there who may face the same issue and hey, if you know why you can't do this straight on the key, please enlighten me!

Monday, 11 May 2009

Installing Subversion from source on CentOS 5.2

Okay, so this is not strictly BI related but hey, most of us BI types have a slight geek inclination, don't we, or is that just me?

I guess the main reason for blogging about this is that I really didn't expect it to be that hard. I mean, I've been installing Subversion from source since about Fedora Core 3 without issues (other than when the Subversion/Berkeley DB integration underwent some changes that shafted access to my repositories) so I kinda expected the same here. Um, wrong! At least this time the issue related to a fresh install rather than suddenly finding that I could no longer access my repositories. Needless to say I now use FSFS rather than Berkeley DB as the back-end, but still. A friend of mine also works on FSFS (see: http://www.farside.org.uk/200704/tree_structured_fsfs) so he'd been going on at me for a while to switch. Still took that heartache to make me use FSFS though, groan...

Anyway, a while back (I meant to blog about this a long time ago but, based on how long it has been since my last post, I can see that it really was a loooong time ago) I set about compiling Subversion from source code. I guess there are many reasons for installing Subversion from source but they're immaterial here. Needless to say, I started and, after running ./configure, soon realised that there were some missing dependencies when trying to enable SSL. After a little bit of searching and running rpm -qa etc. to find out whether certain libraries were installed, I found that my install of CentOS was missing the openssl-devel package. So, a quick install using yum and I was away, or was I?

Okay, so maybe not. After installing the missing package another error occurred during the ./configure run. This particular issue, however, is documented in the Subversion documentation and requires you to amend the parameters passed to the configure command in order to have the information passed through to the "sub-configure" scripts. Rather bizarrely, you need to pass the --with-libs parameter with "/usr/kerberos" as the value (i.e. --with-libs=/usr/kerberos).

I guess the part that really got me for a while was the fact that when you run the configure script, passing in the above-mentioned switch, a message is generated on the first line stating: "configure: WARNING: unrecognized options: --with-libs". However, this message can be safely ignored. That little tidbit of information is not specified in the documentation. So, if you leave the script to run it should actually run to completion, barring any other issues naturally.

Long and short of it is that in order to compile Subversion with SSL enabled on CentOS 5.2 (well, with my install base anyway) I had to do the following:

# Unpack the source
tar -zxvf subversion-<version>.tar.gz
cd subversion-<version>

# Check whether openssl-devel has been installed.
yum list installed openssl-devel
# If not, install it using the following:
yum install openssl-devel

# Configure subversion with FSFS (no Berkeley DB) and SSL support,
# ignoring the message "configure: WARNING:
# unrecognized options: --with-libs".
./configure --without-berkeley-db --with-ssl --with-libs=/usr/kerberos

# Compile and install (make install may need root privileges).
make
make install

And that's all there is to it!

Tuesday, 10 March 2009

Analytic Workspace Manager Locking

So, following on from my previous blog about plugins for the AWM, I thought I would provide a little information about something I learned whilst writing my own plugin.

Writing a plugin for the AWM and testing it can be cumbersome if you have to constantly start the AWM, log in and then navigate to a node in the tree that allows you to activate your plugin. Arguably a better way is to create your own testbed to simulate the AWM environment. That way you can run your code with less effort and interaction and are also afforded the ability to step through and debug your code if, like me, you need to.

In my case I merely created a Java class that opened an Oracle database connection and then called my plugin code, passing in the connection and anything else that it required. This was all fine and dandy whilst developing the code and afforded me the luxury of being able to test my code with the minimum of effort. However, when I eventually placed my code into the plugin framework (which to be fair I was doing simultaneously anyway) my code didn't really stop working but rather, well... just seemed to stall.
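
To give you an idea, the testbed was not much more than the following (the class and method names are hypothetical placeholders for my own plugin code, and the connection details are obviously made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.HashMap;
import java.util.Map;

// Minimal harness that stands in for the AWM: open a connection and invoke
// the plugin code directly, with no tree navigation required.
public class PluginTestbed {
    public static void main(String[] args) throws Exception {
        // Register the Oracle JDBC driver (pre-JDBC 4.0 style).
        Class.forName("oracle.jdbc.OracleDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "olapuser", "password");
        try {
            // Whatever the plugin expects to be handed by the AWM.
            Map<String, Object> params = new HashMap<String, Object>();

            // MyAwmPluginWorker and run(...) stand in for my own plugin entry point.
            MyAwmPluginWorker worker = new MyAwmPluginWorker();
            worker.run(conn, params);
        } finally {
            conn.close();
        }
    }
}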

This was masked by the fact that I was using multiple threads and thus parts of the plugin, though still functioning, arguably produced incorrect results. After a little investigation (okay, okay, so more than a little and certainly more than I would have liked) I discovered the reason...

There appears to be no fine-grained locking in Oracle OLAP, certainly as far as the AWM is concerned.

So what do I mean by that? Well, just that it is geared towards a single developer updating a given analytic workspace. Multiple updates to different parts of the workspace simultaneously are not supported nor, as far as I can tell, even possible.

So what does that mean to us? Well, when the AWM connects to the analytic workspace it attaches it in "read/write" mode. This means that any plugin code can only affect the contents of that analytic workspace by using the connection provided to it when the plugin is invoked. Retrieving or creating a new connection using the data source made available in the Map object presented to the plugin means that any code executed over that connection runs in a different session. As such, any code on that new connection which tries to affect the analytic workspace already attached read/write by the AWM's session will be blocked and will not be able to make changes or even process any objects in that analytic workspace.

At first I thought that that was it, game over, but there is a "trick". This "trick", albeit a bit of a hack and one not to be taken lightly, involves the use of the DBMS_AW package. More specifically, the DETACH, UPDATE and ATTACH sub-programs allow us to detach the analytic workspace over the main connection provided by the AWM to the plugin code. We can then re-attach it on the new connection in read-only mode (or read/write if you want) and perform whatever actions we like on the given analytic workspace. The UPDATE sub-program is required in order to commit any changes that were made to the analytic workspace while it was attached in read/write mode, prior to detaching it.
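
In code, the juggling act looks roughly like this from the Java side (I'm showing the shape of the DBMS_AW calls rather than the letter of them; check the package documentation for the exact parameter lists):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of the detach/re-attach "trick" described above; awName would be
// something like "MYSCHEMA.MYAW".
public final class AwAttachmentJuggler {

    // On the AWM's own connection: commit pending changes, then let go of the AW.
    static void releaseFromAwmSession(Connection awmConn, String awName) throws SQLException {
        CallableStatement cs = awmConn.prepareCall(
                "begin dbms_aw.aw_update(?); dbms_aw.aw_detach(?); end;");
        try {
            cs.setString(1, awName);
            cs.setString(2, awName);
            cs.execute();
        } finally {
            cs.close();
        }
    }

    // On the plugin's private connection: attach read-only and do the work there.
    static void attachReadOnly(Connection pluginConn, String awName) throws SQLException {
        CallableStatement cs = pluginConn.prepareCall(
                "begin dbms_aw.aw_attach(?); end;"); // defaults to read-only
        try {
            cs.setString(1, awName);
            cs.execute();
        } finally {
            cs.close();
        }
    }

    // Before the plugin exits: hand the AW back to the AWM session in read/write
    // mode, otherwise the AWM is left believing it holds a lock it no longer has.
    static void reattachReadWriteForAwm(Connection awmConn, String awName) throws SQLException {
        CallableStatement cs = awmConn.prepareCall(
                "begin dbms_aw.aw_attach(?, true); end;"); // second argument requests read/write
        try {
            cs.setString(1, awName);
            cs.execute();
        } finally {
            cs.close();
        }
    }
}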

Note that there is a potential issue here. Detaching the analytic workspace and re-attaching it introduces the risk that in the meantime another session locks the workspace before the plugin secures it for its own use. Also, failure to ensure that the analytic workspace is attached in read/write mode before the plugin exits breaks the environment in which the AWM expects to function. (i.e. The AWM still "thinks" that the workspace is attached in read/write mode and will allow a user to try to make changes even though this may not be the case.)

Clearly the functionality that this provides needs to be weighed in conjunction with the expected working environment and the associated risks. If the environment is inherently single-user then no harm is done. However, if there is a multi-user environment where multiple people are likely to request read/write locks on the same analytic workspace, well, then your mileage may vary. Clearly you may be able to use transactions to mitigate some, if not all, of these risks but that remains to be seen.

As mentioned previously, oh how nice it would have been to be able to lock a specific dimension or cube etc. which, based on the metadata stored in the database, appears quite possible but just not implemented. This could have opened up opportunities for multi-user development environments to allow teamed development, maintenance and testing etc. What a novel concept, though sorely lacking here...

Monday, 16 February 2009

Analytic Workspace Manager Plugin Best Practices

The Analytic Workspace Manager (or AWM for short) is the tool to use when you want to quickly create OLAP structures in the Oracle OLAP engine embedded within the Oracle relational database. Clearly you can use OWB etc. but AWM provides a lightweight, easy-to-use interface for creating quick demos, restructuring content or processing etc. However, what the AWM provides may well not be everything to everyone and thus it is possible to create your own plugins, using Java, to get it to do what you want.

The documentation around doing this is, I found, a little thin on the ground, so to speak. So I have tried to document my findings here to help those who may wish to follow suit. However, before I start I should probably point out that in no way is the following post endorsed by Oracle. These are merely my observations after coding my own plugin for the Analytic Workspace Manager.

So here goes...
Step one is to find the Developing_AWM_Plugins_11g.pdf document. When I started coding my plugin for the 11g version of AWM, I could only find the 10g document, which is not quite accurate for 11g. The OLAP API has changed somewhat in 11g and this is reflected in the OLAP API interface between the AWM and user plugin code. Looking at the document, I guess the pertinent point is that the 11g version of AWM requires the plugin to be compiled with Java 5 SE (or version 1.5 in "old school" numbering; just why did Sun confuse matters?). We'll come back to this, but suffice it to say that you'll need to remember that Java 5 brings with it support for generics.

Also in the document, if you have coded 10g plugins before, you may notice that the interface method signatures have changed a little. The AW object passed into the handle method is now an oracle.AWXML.AW object. The params parameter to the handle method provides a map in which the DATAPROVIDER key gives you the ability to get a connection to the underlying Oracle database without requiring username/password information.

Step two, and arguably the most important step, is understanding that on startup the Analytic Workspace Manager loads all Java code in the plugin directory and looks for any class files that implement the AWMPlugin interface. This is also documented in the above-mentioned document; however, what is not made clear is that the more classes and JAR files you add to the plugin directory, the greater the impact on AWM startup time. Also, if you use third-party JAR files you are at the mercy of those maintainers.

For example, placing the "jfreechart" JAR files in the plugin directory causes exceptions to be thrown in the AWM command window regarding missing Eclipse files. (I'm guessing that "jfreechart" is coded using Eclipse and some code dependencies are left behind.) Allowing this to happen with a user-defined plugin is pretty inelegant at best and most likely just downright rude. We should avoid that, and we can.

The best practice I have come up with is as follows:
  • The interface between the AWM and your plugin should reside in its own JAR file and only contain the minimum number of class files etc. in order to start the actual plugin. This keeps the startup time for the AWM to the minimum.
  • The actual plugin code should be packaged into its own JAR file and placed elsewhere in the file system. (I use a subdirectory of the plugin directory so that it can be found using relative paths from the plugin interface.)
  • Associated plugin dependencies should also be placed into a subdirectory (I use the same one as my own code as mentioned above) of the plugin directory.
Now, Oracle kindly provide a directory from which plugins can be found and executed but this clearly isn't scalable from a startup point of view if you require the use of large dependencies or lots of plugins. Oracle unfortunately don't provide the equivalent of an "Ext" directory into which dependencies could be placed. So here's the dilemma... Do we customise the Windows batch file (or *nix shell script) to include our dependencies in the classpath, or do we place our dependencies in one of the directories that is already referenced by the Oracle prescribed classpath?

My answer... neither! Customising the startup script is prone to failure if Oracle ever decide to update it. It also makes deploying the plugin a pain, in that everyone's installation may be different, so deploying the script needs to take that into consideration. Hijacking the Oracle directories is also pretty "rude" and prone to cause issues down the line, not to mention, once again, the uniqueness of each installation. No, best practice would be to use a URLClassLoader to dynamically load the dependencies at runtime and start the plugin. This unfortunately requires the use of reflection and is a bit more long-winded, but it will be far more robust and stand you in good stead during further developments.
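
As a rough sketch of what I mean (the directory and class names are illustrative, and the real thing needs rather more error handling):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Thin bootstrap that lives in the AWM plugin directory. All it does is build
// a class loader over a subdirectory of JAR files and hand control to the real
// plugin, so the AWM only ever scans this one small class at startup.
public class PluginBootstrap {

    public static Object createRealPlugin() throws Exception {
        // The real plugin JAR and its dependencies live below the plugin directory.
        File libDir = new File("plugin/myplugin-lib"); // illustrative relative path
        File[] jars = libDir.listFiles();

        URL[] urls = new URL[jars.length];
        for (int i = 0; i < jars.length; i++) {
            urls[i] = jars[i].toURI().toURL();
        }

        // Parent the loader on this class's loader so the AWM/OLAP API classes
        // remain visible to the plugin code.
        URLClassLoader loader =
                new URLClassLoader(urls, PluginBootstrap.class.getClassLoader());

        // Instantiate the real plugin reflectively and delegate everything to it.
        Class<?> pluginClass = loader.loadClass("com.example.awm.RealPlugin");
        return pluginClass.newInstance();
    }
}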

My plugin used both "log4j" and "jfreechart", both of which had to be loaded onto the classpath with "log4j" needing to be instantiated in order to log progress. I may blog more on this in detail in the future, as doing this can be tricky and I couldn't find a definitive guide on this on the web.

Finally, coming back to generics, note that the AWMPlugin interface needs to be backwards compatible and as such does not make use of generics. This tends to generate compiler warnings. The way I would recommend handling this is to annotate the interface methods with "@SuppressWarnings("unchecked")" to suppress those warnings, but only on the methods provided by Oracle. If you need to assign one of the variables found in the interface parameter list to your own variable, you can use the generic wildcard, such as declaring "Map<?, ?>".
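
In other words, something along these lines (the method shown is a simplified stand-in for the Oracle-defined interface methods, not their exact signatures):

import java.sql.Connection;
import java.util.Map;

public class MyPluginShim /* implements AWMPlugin */ {

    // Suppress the warnings only on the methods dictated by Oracle's interface...
    @SuppressWarnings("unchecked")
    public void handle(Connection conn, Object awObject, Map params) {
        // ...and use a wildcarded reference for our own code.
        Map<?, ?> safeParams = params;
        Object dataProvider = safeParams.get("DATAPROVIDER");
        // ... hand off to the real plugin from here ...
    }
}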

And that's all there is to it.

Monday, 2 February 2009

Analysis Services 2005/2008 Heap Corruption

Recently I have been working for a financial institution that is using Analysis Services 2005 for its ad-hoc analysis. This particular client was experiencing issues with one of the Analysis Services servers producing memory mini-dump files during processing, causing service crashes and stalls, whilst other servers were performing acceptably well.

The analysis of this issue started towards the end of 2007 and the cause has only now been diagnosed with the able help of Microsoft support. The particular system setup for this client is quite complex, using Oracle 10gR2 as the back-end relational database (with flashback for referential integrity and consistency), the 11g (11.1.0.6.0) 64-bit OLE DB driver for connectivity from AS to Oracle and SSIS for scheduled processing. The AS database also makes extensive use of linked measure groups in order to provide a single cube of data with mixed granularity.

In order to diagnose the cause of the heap corruption we had to jump through quite a series of hoops. We initially went down the route of sending the mini-dump files to Microsoft for analysis but these naturally only provided limited and static information. Sending full dump files proved very tricky in the short term as the information that the cubes contain is confidential and cannot be sent off-site. In the end, restarting AS and only processing the dimensions and cubes (i.e. no queries being sent to AS) proved adequate to be able to send the full memory dump files off-site, as it would be extremely hard to put this in-memory information together in any meaningful way.

Still, the full dump files are also static and thus proved inconclusive. It could be seen from the dump files that heap corruption was indeed occurring during processing and there always appeared to be an Oracle error in the dump file as well. It seemed logical that the Oracle 11g driver could be the cause, as most clients tend to use the 10.2.0.3.0 Oracle OLE DB driver when connecting to Oracle 10.2.0.3.0 or later. In order to try to eliminate the 11g driver as the cause, we backed out the 11g driver and installed the 10.2.0.3.0 driver. (Note that in true Oracle fashion, this proved to be an exercise in and of itself. There is no Oracle 10.2.0.3.0 64-bit client installation. Noooooooo, that's waaaaay too easy! You have to install 10.2.0.1.0 and then patch it to 10.2.0.3.0. Oh, but wait, there is a patch before you apply the patch in order to get the patch to work etc. yadayadayada.) Anyhoo, I digress...

In the end, we found that using the Oracle 10.2.0.3.0 driver did not fix the problem. The next step then, was to divide and conquer so to speak...

Part one was to try to create a reduced test case by systematically removing complexity from the AS database. This was achieved by first removing the linked measure groups (I'm not aware of many people using this feature) and then removing cubes and dimensions until a simple AS database could be created in which the issue could still be reproduced. The idea of this approach was to end up with a small AS database containing a tiny amount of non-confidential data so that Microsoft could perform in-house testing.

Part two was to use Time Travel Trace testing (TTT), which effectively allows one to debug the issue by playing forward and rewinding the process over a given time span. This would allow one to see the changes made to the heap and replay them to identify the point at which the heap was corrupted. Rather amusingly though, there is a slight catch. This system was using AS2005 on Windows Server 2003, as are most installations I would suspect. However, TTT only works on Windows Server 2003 on a 32-bit architecture, on Windows Server 2008 on a 64-bit architecture or, if virtualised, only on Hyper-V.

Right, so a few changes to the test case then. As it happened we used a 32-bit environment and, as luck would have it, we were actually able to reproduce the issue, though TTT failed to work. Initially it exhibited the same symptoms as seen when used on Vista, where it "watched" the incorrect "thread" for activity. This should have been fixed by setting the processor affinity for AS. It wasn't...

As it happens, the TTT testing wasn't required. The reduced test case was effective and we were able to remove all confidential data. Microsoft took this in-house and managed to find the answer. So, are you ready for this? Do you want to know why memory heap corruption was occurring during processing? Well, do you?

Rather bizarrely, the reason for the heap corruption was the use of the ".Properties" function when specifying the default member in a dimension. Yup, the use of a specific function when specifying the default dimension member caused heap corruption during processing; not querying but processing. So how lucky am I that the reduced test case I created used one of the only two dimensions, out of 30-odd, using that function when specifying the default member? Now I'm not sure why that function causes the issue and nor are the AS development team. They are still looking into it and, from what I understand, were equally surprised. As it happens, the MDX that was being used to specify the default member wasn't particularly efficient in the first place (I must point out that it wasn't my code) and once the code was changed to use the "Exists" function, the problem was solved.

Clearly your mileage may vary but nonetheless please be aware that the use of the ".Properties" function when specifying the default dimension member can cause heap corruption during processing. This is true for both AS2005 CU11 (or SP3 CU1) and AS2008 CU2. I'm still not sure why this only occurred on one of the servers when the other servers have the exact same database build but hey, removing the ".Properties" function solved the issue so clearly that was the problem.
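
To make the pattern concrete, the offending style of DefaultMember expression and the safer replacement look broadly like this (the dimension and attribute names are made up for illustration):

// Avoid: a default member defined via the ".Properties" function.
Filter(
  [Account].[Account].Members,
  [Account].[Account].CurrentMember.Properties("Default Flag") = "1"
).Item(0)

// Prefer: the "Exists" function against a real attribute hierarchy.
Exists(
  [Account].[Account].[Account].Members,
  [Account].[Default Flag].&[1]
).Item(0)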

You have been warned...