Monday, 12 September 2011

Diving into the architecture of a burst processing real-world Azure application


In this post I will break down the architecture of a real-world hybrid SaaS BI application that runs SQL Server 2008 + Analysis Services on a traditional server system while leveraging Azure burst processing to post-process and transform that data into Deep Zoom images and PivotViewer's CXML.

The application at hand is part of a larger service offering, geared mainly towards customer engagement for large corporates, that provides online communities, surveys, and fast, easy data visualization and workflow components. We'll be diving into the heart of Resonate's flagship product, Pulse.

A quick word about me, perhaps, to give you some context for this post: I am Resonate's software architect as well as its lead developer (the smaller the company, the more hats you wear :p). Incidentally, I had to crank up my creative skills a bit as well, as I made the above video too ^^.

Because we all like human-readable blogs, I will explain what the typical flow through our system looks like using a fictitious example. Pulse can cater for many scenarios, but I'd like to focus on the most demanding one: the real-time transactional scenario.

Architecture diagram

Here's an overview of the architecture the story below refers to. The hash-numbered references in the text (#1, #2, …) correspond to the orange circles in the diagram. I'd strongly advise keeping this image open in a separate tab at actual size (click it, and be wowed) while reading through the rest of this article.

The story of Qux

Ok, so let’s quickly make up a company called FooCorp that retails BarGoods in BazLand. I love it already.

An inhabitant of BazLand, let’s call them Qux, has just bought some BarGoods and swiped their FooLoyaltyCard at the register. We already know everything about Qux, as they signed up for this card online.

Data about Qux’s transaction is fed straight into Pulse and (depending on sampling rules) a survey invitation gets sent out to Qux.

Qux is overjoyed to receive this survey (yes, it happens!), fills it out and submits it back into Pulse.

We then grab some demographics data (age, gender, …) about Qux, combine that with the transactional data (what did Qux buy, which register did they use, etc…) and attach this to the answers Qux just gave to survey questions. This package is pushed into the Pulse data visualization DB and processed into an OLAP cube. (#1)

FooCorp’s analysts dive into the Pulse Silverlight application and use a cube navigator, data mining tools and stats/charts to determine which chunk of the data needs some further investigation. There seems to be an anomaly in around 2000 transactions made in the CBD stores over the last few days; little do they know that Qux is one of the shoppers involved. The analysts decide they want to drill down on this issue and submit a request to generate a (PivotViewer) card collection of these 2000 transactions.

--- human interaction stops, kick it into a higher gear ---

The Silverlight application’s RIA service (#2) now fires a drillthrough request, based on the selected cube measures/dimensions/filters, to a web service sitting in front of SSAS (#1). The result is a set of flattened rows that each represent one entity (containing three types of data: demographic, transaction and survey), which are then uploaded to Azure table storage (#3). After the upload completes, details about the collection get written to a SqlAzure database (#4).
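The post doesn't show the flattening or upload code, so here is a minimal Python sketch of the shape of that step (the real code is .NET; field names like `demo_age` and the `flatten`/`batches` helpers are my own illustrative inventions). One detail worth knowing: Azure table storage batch transactions are capped at 100 entities that share a PartitionKey, which is why the collection id doubles as the partition key here.

```python
def flatten(entity_id, demographic, transaction, survey):
    """Merge the three per-entity dicts into one flat row (hypothetical layout)."""
    row = {"RowKey": str(entity_id)}
    for prefix, source in (("demo", demographic),
                           ("txn", transaction),
                           ("survey", survey)):
        for key, value in source.items():
            row[f"{prefix}_{key}"] = value
    return row

def batches(rows, collection_id, size=100):
    """Yield upload batches of at most `size` rows, all tagged with the
    collection id as PartitionKey (table storage's batch limit is 100)."""
    for i in range(0, len(rows), size):
        yield [dict(r, PartitionKey=collection_id) for r in rows[i:i + size]]
```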

The manager worker role (#5) polls the SqlAzure DB and picks up the collection because it's in the “Pending” state. Happy to do some work, it grabs the data from table storage, sends it off for lexical analysis (#6) and merges the generated concepts back into the source data. The manager (#5) then fills up the images queue (#a) with the updated entities for processing.
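To make the manager's poll cycle concrete, here is a sketch using plain in-memory stand-ins (a list for the SqlAzure rows, a dict for table storage, a deque for the queue) rather than the real Azure APIs; `poll_once` and all parameter names are hypothetical:

```python
from collections import deque

def poll_once(collections, table_storage, analyse, images_queue):
    """One poll cycle: find a 'Pending' collection, merge lexical-analysis
    concepts into its entities, enqueue them, and advance the state.

    collections:   list of {"id", "state"} dicts (stand-in for SqlAzure rows)
    table_storage: dict of collection id -> list of entity dicts
    analyse:       callable standing in for the lexical-analysis service
    images_queue:  deque standing in for the Azure images queue
    """
    pending = next((c for c in collections if c["state"] == "Pending"), None)
    if pending is None:
        return False                              # nothing to do this cycle
    for entity in table_storage[pending["id"]]:
        entity["concepts"] = analyse(entity)      # merge generated concepts
        images_queue.append(entity)
    pending["state"] = "Generating Images"        # tracked back in SqlAzure
    return True
```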

The builders (#7) each grab 100 entities from the images queue (#a) and bind some XAML to them to generate a stunning image, which then gets pushed into blob storage (#8). (If you want to know why 100, ask me in the comments.)
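The builder side of that hand-off looks roughly like this sketch (again in-memory stand-ins, not the real queue/blob APIs; the actual builders bind XAML and rasterise it, which the `render` callable merely stands in for):

```python
from collections import deque

BATCH_SIZE = 100  # the cutoff discussed in the comments on this post

def build_batch(images_queue, render, blob_storage):
    """Drain up to BATCH_SIZE entities from the queue, render each one,
    and store the result in blob storage. Returns the number processed."""
    processed = 0
    while images_queue and processed < BATCH_SIZE:
        entity = images_queue.popleft()
        blob_storage[entity["id"]] = render(entity)
        processed += 1
    return processed
```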

The manager (#5) keeps track via SqlAzure (#4) of how many images have actually been completed, and once they all have, it pushes jobs onto the task queue (#b) to generate the CXML, as well as the Deep Zoom pyramids and metadata files (DZC + DZI).

The builders (#7) pick up these tasks from the task queue (#b) and push the generated files into blob storage (#8). (The Morton trees are created using a custom implementation of the Deep Zoom tools, which allows distributed processing of the images on Azure worker roles.)
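Deep Zoom lays out its pyramid tiles along a Z-order (Morton) curve, which is what makes it possible to split the tree across workers: each worker can own a contiguous range of Morton indices. The post's custom tooling is not shown, but the 2-D Morton index at the heart of it is a standard bit-interleaving trick:

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton(x, y):
    """Interleave the bits of tile coordinates (x, y) into one Z-order index."""
    return part1by1(x) | (part1by1(y) << 1)
```

Tiles that are close on screen get close Morton indices, so splitting the index range gives each worker a spatially coherent chunk of the image pyramid.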

At every step of the way, the manager (#5) updates the SqlAzure tracking tables (#4) with the completion state of the collection (Pending – Generating Images – Generating Collection – Complete). This way the end user can check the Pulse Silverlight application (#2) to see how far along their collection is.
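That lifecycle is a strictly linear state machine, which can be sketched in a few lines (state names taken verbatim from the tracking table above; the `advance` helper is my own illustration, not the real schema):

```python
TRANSITIONS = {
    "Pending": "Generating Images",
    "Generating Images": "Generating Collection",
    "Generating Collection": "Complete",
}

def advance(state):
    """Move a collection to its next lifecycle state."""
    if state == "Complete":
        raise ValueError("collection already complete")
    return TRANSITIONS[state]
```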

--- human interaction resumes ---

Our FooCorp analysts have eagerly watched the progress of their collection, and now that it’s ready they jump in and have a PivotViewer collection served to them securely through the Azure web role’s HTTP handler (#9).

After browsing through the cards and reading some verbatim comments (Qux’s being one of them), they see that the overly negative reaction discovered in the cube data was caused by a METEOR STRIKE FROM OUTER SPACE! Ok, ok, slight dramatization here, but I hope you get the picture :)

In the interest of time and length I have tried to keep this post as short as possible, which was very hard to do. In future posts I will take a closer look at different parts of the above architecture, as well as some of the quirkier things we did to get it all working as smoothly as possible.


  1. "The builders (#7) each grab 100 entities from the images queue (#a)"…

    Where did you get the number 100 from? What is it based on? Has it anything to do with blob storage?

  2. It's not a technical limitation, but rather a cutoff determined by the overhead + time it takes to process 100 queue items versus the number of instances that can grab items. (Not real numbers:) if 10 items take 1 minute to process and 100 items take 2 minutes to process, and I want 500 items to be processed in 2 minutes with 5 instances, then 100 is the perfect number. I've done heaps of calcs around this and it came out at between 85 and 110, so I decided to go for a nice round 100 :)