Saturday 17 September 2011

A dynamic type factory for Azure Table Storage - Part 1/4

This post is the first of many in which I detail some of the peculiarities/workarounds encountered while working with Azure to build a system of fair complexity (see the architecture here). In this instance I will broach the topic of working around the limitations the Azure TableService API imposes on us. (If you are not familiar with these, then 1: find out more here, 2: this article might prove challenging to follow, but try anyway, you'll love it ^^ )

The 4 parts of this series can be outlined as follows:

Part 1 : The business case and the road to a dynamic type factory (this post)
Part 2 : The code of the DynamicObjectFactory
Part 3 : A serializable DynamicObject that implements INotifyPropertyChanged
Part 4 : An actual Azure End to End showcase using the library

The business case

(1): We need an application that can sit over any DB+Cube and get that data in and out of Azure Table storage.
(2): Hooking up a new data source should not require changes to the application, let alone redeployment.

The plan of attack

The problem here is that we were facing some seriously complex data models, and using ORM tools such as EF was out of the question because of (2). Having cried over not having a strongly typed object representation of our entities, we reached back to the good old days of SqlDataAdapters and DataTables *cringes*. Getting the data out of the DB this way is very straightforward, so far so good.

So now that we have the data out, we have to upload it into Azure. (This was early on in the piece; my Azure expertise was about as impressive as my handwriting, which I stopped doing about a decade ago.) Having done some stuff with Table Storage before, I quickly got one of my test projects off the shelf and decided to plug it into the DataTable-based solution. This is where the sh*t hit the fan. Guess what: the TableService API needs a strongly typed object model to work with.

At this point I explored a few dead ends, of which the following was the most notable failure: using a custom DynamicObject as my base TableServiceEntity. All the pushing and pulling of properties and their values worked out easily enough; unfortunately, DynamicObject misses one silly attribute that our TableService really, really needs... [Serializable]. Now that's what I call a showstopper!
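Just to make that dead end concrete, here is a rough sketch of the kind of thing I tried (the class name DynamicEntityAttempt and its dictionary backing store are illustration only, not code from the actual project):

using System.Collections.Generic;
using System.Dynamic;

// Hypothetical sketch of the abandoned approach: property values live in a
// dictionary and are resolved at run-time through TryGetMember/TrySetMember.
public class DynamicEntityAttempt : DynamicObject
{
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return _values.TryGetValue(binder.Name, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _values[binder.Name] = value; // happily accepts any property name
        return true;
    }
}
// Great for dynamic member access, but the TableService never sees these
// "properties" as real, serializable CLR properties on the type.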

The first solution

The first actual solution, the one most of you will be thinking of by now, was using the Table Service REST API. In good programming style, we set out to build our own DataAdapter which catered for CRUD operations on Azure Table Storage. It soon became apparent that this wasn't going to be as easy as we had envisaged. Debugging exactly why the REST API was returning one 400 after another was just plain ridiculous, and getting the request signing, keys and HTTP headers all working was a regular pain in the buttocks. It took a little while, but we managed to get a version up and running.

We then built conversion mappings from CLR types to Table Storage types for the fields, and our upload tests were doing just great. We had some additional hiccups implementing query parameters on the lookups, but finally reached a stage where we had something to work with.
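To give an idea of what those mappings look like (a simplified sketch; the real code also has to handle nulls and types Table Storage doesn't support), Table Storage only accepts a handful of Edm property types, so each CLR type gets pinned to one of them:

using System;
using System.Collections.Generic;

// Simplified sketch of a CLR-to-Table-Storage type mapping. These are the
// Edm property types Table Storage supports; anything else has to be
// converted (typically to a string) before upload.
var clrToEdm = new Dictionary<Type, string>
{
    { typeof(string),   "Edm.String"   },
    { typeof(int),      "Edm.Int32"    },
    { typeof(long),     "Edm.Int64"    },
    { typeof(double),   "Edm.Double"   },
    { typeof(bool),     "Edm.Boolean"  },
    { typeof(DateTime), "Edm.DateTime" },
    { typeof(Guid),     "Edm.Guid"     },
    { typeof(byte[]),   "Edm.Binary"   }
};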

And then roadblocks appeared again, from every angle. Looking up more than 1000 items taught us we needed to implement support for continuation tokens. Uploads were woefully slow because we weren't batching, and it really started to dawn on me that we were stupidly writing a parallel implementation of the TableService API (which has all of this funky stuff built into it). I looked into implementing batch operations through the REST API, and proudly made the decision that there had to be an easier/better/cleaner/more maintainable solution. So we abandoned this approach.

The second/current solution

I really wanted to be able to use the StorageClient TableService API for several reasons:
  • It supports all the table storage features
  • It's optimized in the best way possible (one would hope)
  • Future changes to the REST API will not require me to rewrite a whole lot of code
  • It's easy to work with (+ transferable skillset and all that)
  • It's a Microsoft-supported API
So what do you do when you need strongly typed objects and you can't build em into the app? You build em at run-time of course! Two options presented themselves here:
  1. Using the CodeDom to compile our types into an assembly
  2. Emitting our types into IL
I am not going to discuss the pros and cons of these here; there are plenty of articles out there that do this better than I could. But after a lot of reading and some experimentation I decided to go with Reflection.Emit (even though I had previous experience with the CodeDom and not with Emit).
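The actual factory code is the subject of Part 2, but to give a flavour of the approach, here is a minimal, standalone Reflection.Emit sketch (not the DynamicObjectFactory itself) that builds a type with a single public property at run-time:

using System;
using System.Reflection;
using System.Reflection.Emit;

// Minimal sketch: emit a type "Person" with one public string property "FirstName".
// (A base type such as TableServiceEntity could be passed as a third argument to DefineType.)
var asmBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(
    new AssemblyName("DynamicEntities"), AssemblyBuilderAccess.Run);
var modBuilder = asmBuilder.DefineDynamicModule("MainModule");
var typeBuilder = modBuilder.DefineType("Person", TypeAttributes.Public | TypeAttributes.Class);

// Private backing field + the property definition itself
var field = typeBuilder.DefineField("_firstName", typeof(string), FieldAttributes.Private);
var prop = typeBuilder.DefineProperty("FirstName", PropertyAttributes.None, typeof(string), null);

MethodAttributes accessorAttrs =
    MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig;

// Getter: return this._firstName
var getter = typeBuilder.DefineMethod("get_FirstName", accessorAttrs, typeof(string), Type.EmptyTypes);
var il = getter.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldfld, field);
il.Emit(OpCodes.Ret);

// Setter: this._firstName = value
var setter = typeBuilder.DefineMethod("set_FirstName", accessorAttrs, null, new[] { typeof(string) });
il = setter.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Stfld, field);
il.Emit(OpCodes.Ret);

prop.SetGetMethod(getter);
prop.SetSetMethod(setter);

Type personType = typeBuilder.CreateType(); // a real CLR type, created at run-time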

Being a bit intimidated by IL initially, I was surprised to find how quick and easy it was to pump out this library, which I so eloquently named DynamicObjectFactory. Perhaps DynamicTypeFactory might have been more appropriate, but we can instantiate objects from it as well, so I decided not to bother changing it :). Admittedly, the only thing it supports is public properties (getter & setter methods). The reason is that I purpose-built it for our business case, and that is all we need to interact with Azure Table Storage through the Microsoft-supported API.

The outcome

--- EDIT: The code below was updated to be in line with the implementation in Part 2 ---

In Part 2 I'll actually go through the code of the DynamicObjectFactory, but before closing this one, I'll present a quick overview of its methods and actual usage with Azure.
// Create a type from a DataTable schema and a baseType to inherit from
// e.g. TableServiceEntity, so we get our PartitionKey, RowKey and Timestamp
public static Type CreateType(DataTable dataTable, string typeName, Type baseType = null)
// Create a list of entities based on a table and a type
// (Property names will be matched to Column names)
public static IList CreateList(DataTable dataTable, Type type)

We can now do anything with our DataTable using the TableService APIs that we can do with a pre-compiled type. Below is an example of how to use this to do batch uploads to Azure Table Storage. Keep in mind this is sample code only and is written with simplicity in mind.

// Create the type
Type collectionType = TypeFactory.CreateType(
    sourceData, sourceData.TableName, typeof(TableServiceEntity));
// Instantiate the data into a list of our custom type objects
IList azureData = ObjectFactory.CreateList(sourceData, collectionType);
 
// Connect to Azure
var account = CloudStorageAccount.Parse(
    string.Format(
    "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
    azureDataAccountName, azureDataAccessKey));
TableServiceContext tsc = new TableServiceContext(
    account.TableEndpoint.ToString(), account.Credentials);
CloudTableClient client = account.CreateCloudTableClient();
// Create the table
client.CreateTableIfNotExist(tableName);
 
// Upload the data to Table Storage in batches
int counter = 0;
foreach (var entity in azureData)
{
    counter++; // We processed one! Up the counter
 
    tsc.AddObject(tableName, entity); // Add the object
 
    if (counter % 100 == 0) // Commit every 100 items (the Table Storage batch limit; a batch also requires one shared PartitionKey)
        tsc.SaveChanges(SaveChangesOptions.Batch);
}
// Don't forget to commit the last batch
tsc.SaveChanges(SaveChangesOptions.Batch);
And that's how easy it is to get your data into Azure Table Storage. Extracting and manipulating the data on the other side is not much different; the only catch is that you will need to use reflection or the dynamic keyword to access the entity properties. (Which should be fine, as you don't know what the property names are in advance anyway.)
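For example (a sketch only, reusing the azureData list from the sample above), reading the run-time generated properties back out looks something like this:

using System;
using System.Reflection;

foreach (var entity in azureData)
{
    // Option 1: reflection - enumerate whatever public properties
    // the run-time generated type happens to have
    foreach (PropertyInfo prop in entity.GetType().GetProperties())
        Console.WriteLine("{0} = {1}", prop.Name, prop.GetValue(entity, null));

    // Option 2: the dynamic keyword - resolves the member at run-time
    // (RowKey is safe to use here because it comes from TableServiceEntity)
    dynamic dynamicEntity = entity;
    Console.WriteLine(dynamicEntity.RowKey);
}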

If we now head back to our business case:
(1): We need an application that can sit over any DB+Cube and get that data in and out of Azure Table storage.
SUCCESS! A couple of parameters can get us SQL/MDX (drillthrough) data out of the data source into a DataTable, and finally in and out of Azure.
(2): Hooking up a new data source should not require changes to the application, let alone redeployment.
SUCCESS! All that is required is a configuration entry specifying the data source and query parameters. Everything else comes rolling out automagically. No application changes/redeployment required.
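As an illustration of what that means in code (the setting names below are hypothetical, not the project's actual configuration schema), the application simply reads its marching orders from configuration at run-time:

using System.Configuration;

// Hypothetical sketch: hooking up a new data source is a matter of adding
// entries like these to the config file - no code changes, no redeployment.
string sourceConnection = ConfigurationManager.AppSettings["SourceConnectionString"];
string sourceQuery      = ConfigurationManager.AppSettings["SourceQuery"];
string azureTableName   = ConfigurationManager.AppSettings["AzureTableName"];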

Any suggestions/ideas on other ways of addressing this business case are more than welcome in the comments!

4 comments:

  1. Hi..How can we set partition key and row key.. I am getting a partition key as null value exception when i try to save in table storage using this code.

  2. Please add an example of how to query TableStorage with dynamic columns as the input instead of hard coding properties in a pre-defined class. I was unable to get this working using the ObjectFactory.

  3. Hi Yogeeta, sorry about the slow reply, you would just specify them as properties in your datasource
    var r1 = _sourceData.NewRow();
    r1["PartitionKey"] = "PeopleILike";
    r1["RowKey"] = "1";
    r1["FirstName"] = "Eric";
    r1["LastName"] = "Norris";
    r1["Dob"] = new DateTime(1988, 4, 7);
    r1["Gender"] = "Male";

  4. Thank man, that was exactly what I was looking for ( The devdude, then your response)
