Tuesday 27 September 2011

A dynamic type factory for Azure Table Storage - Part 2/4

This article is the second part of how to overcome some of the limitations that come with the Azure TableService Api. In Part1 I take a look at the path taken to solve a specific business case, while in this part I put my magnifying glass over the source code of my DynamicObjectFactory library so we can have some fun with reflection and IL :)

As I am a strong believer in sharing knowledge, I encourage you to go nuts and use any or all of this to your own delight. A reference would be nice, but is not necessary: Original code by @jorisdries

You can download the binaries and source code here

I rewrote the code a little from the real version to be a bit more suited for this blog. Remember to put any questions in the comments and I will get back to you.

There are 2 classes that form the core of this library, 1 to generate custom types, and 1 to generate (custom) typed objects. They are eloquently named TypeFactory and ObjectFactory respectively.

1. The TypeFactory

Can generate a custom type from either a DataTable, or an IEnumerable<TypeField>. TypeField is a very simple construct to facilitate generating our type:
public class TypeField
{
    public string Name { get; set; }
    public object Value { get; set; }
    public Type Type { get { return Value.GetType(); } }
}

The functions for creating a type have the following signatures:
public static Type CreateType(DataTable dataTable, string typeName,
    Type baseType = null)
public static Type CreateType(IEnumerable<TypeField> fieldCollection, string typeName,
    Type baseType = null)

dataTable/fieldCollection: contains both the field definitions and the field values (because I'm lazy and the scenario I built this for is actually transforming a bunch of data into a custom typed list. Cfr Part1.)
typeName: the name given to the type that is created in a new dynamic assembly
baseType: the parent type you want the custom generated type to inherit from.

The overload taking the DataTable as its first argument is actually implemented using the second method, using a conversion from DataTable to a TypeFieldCollection, which is basically nothing more than a wrapped List<List<TypeField>> which is implemented as follows:
public class TypeFieldCollection : List<List<TypeField>>
{
    public IEnumerable<TypeField> Schema { get { return this.FirstOrDefault(); } }
}

The conversion from a DataTable to a TypeFieldCollection is quite unimaginative, but I'll give it for completeness' sake, as it might help in understanding the purpose of TypeField:

public static TypeFieldCollection ToTypeFieldCollection(this DataTable dataTable)
{
    TypeFieldCollection fieldCollection = new TypeFieldCollection();
 
    foreach (DataRow row in dataTable.Rows)
    {
        List<TypeField> entityRepresentation = new List<TypeField>();
        foreach (DataColumn col in dataTable.Columns)
            entityRepresentation.Add(new TypeField
            {
                Name = col.ColumnName,
                Value = row[col]
            });
 
        fieldCollection.Add(entityRepresentation);
    }
 
    return fieldCollection;
}


Right, so now we're ready for the big one. The comments in the code shouldn't be too hard to follow, but as always, questions are more than welcome. A few things I will elaborate on first though:

A high level look at the method:
  1. Build a dynamic assembly to hold our type (if we haven't built it before)
  2. Create the type (possibly inheriting from a base type)
  3. Define the type's properties by iterating over the IEnumerable<TypeField>. For every TypeField we:
    1. Create the corresponding backing field and the property
    2. Create the get and set methods and their IL code
  4. Finalize and return the type. Done!
Don't worry, you don't need to get it, to use it
If you don't get the IL code, don't worry about it. The first method simply reads a value and puts it on the stack, the second one updates a value with another one from the stack. (get and set property methods)
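
For readers who prefer C# to IL, the two emitted methods correspond to an ordinary property with a backing field. The sketch below is a hand-written equivalent for a single string field named Bar. The class name HandWrittenMyType is made up for illustration and is not part of the library.

```csharp
// Hand-written equivalent of a generated type with one string property "Bar".
public class HandWrittenMyType
{
    // The private backing field (the factory names it "_" + property name)
    private string _Bar;

    public string Bar
    {
        // get: ldarg.0 (this), ldfld _Bar, ret
        get { return _Bar; }

        // set: ldarg.0 (this), ldarg.1 (value), stfld _Bar, ret
        set { _Bar = value; }
    }
}
```

The factory emits this same pattern once for every TypeField it is given.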

TypeFactory.CreateType code
public static Type CreateType(IEnumerable<TypeField> fieldCollection, string typeName,
    Type baseType = null)
{
    // If we've previously built a type with this name, return that one
    if (_types.ContainsKey(typeName))
        return _types[typeName];
 
    // create a dynamic assembly
    AssemblyName assemblyName = new AssemblyName();
    assemblyName.Name = typeName + "Assembly";
    AssemblyBuilder assemblyBuilder = Thread.GetDomain().DefineDynamicAssembly(
        assemblyName, AssemblyBuilderAccess.Run);
 
    // Create a module
    ModuleBuilder module = assemblyBuilder.DefineDynamicModule(typeName + "Module");
 
    // create a new type builder (with or without use of a base type)
    TypeBuilder typeBuilder = baseType == null ?
        module.DefineType(typeName, TypeAttributes.Public | TypeAttributes.Class) :
        module.DefineType(typeName, TypeAttributes.Public | TypeAttributes.Class,
        baseType);
 
    // Loop over the fields that will be used as the properties in our new type
    foreach (TypeField f in fieldCollection)
    {
        Type propertyType = f.Type;
        string propertyName = f.Name;
 
        // If the property already exists we skip it, feel free to change this
        // approach to override
        if (baseType != null && baseType.GetMember(propertyName).Any())
            continue;
 
        // Generate the field that will be manipulated with the property's
        // get and set methods
        FieldBuilder field = typeBuilder.DefineField(
            "_" + propertyName, propertyType, FieldAttributes.Private);
        // Generate the public property. The last argument lists index
        // parameters; a plain (non-indexed) property has none, so pass null.
        PropertyBuilder property = typeBuilder.DefineProperty(
            propertyName, System.Reflection.PropertyAttributes.None,
            propertyType, null);
 
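        // Define the attributes for the getter and setter of the property.
        // SpecialName marks them as property accessors rather than plain methods.
        MethodAttributes propertyAttributes =
            MethodAttributes.Public | MethodAttributes.SpecialName |
            MethodAttributes.HideBySig;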
 
        // Declare the accessor (get) method for the field we made previously.
        // Named get_<PropertyName> so every property gets its own accessor.
        MethodBuilder getMethod = typeBuilder.DefineMethod(
            "get_" + propertyName, propertyAttributes, propertyType, Type.EmptyTypes);
 
        // Write a method in IL to read our field and return it
        ILGenerator accessorILGenerator = getMethod.GetILGenerator();
        accessorILGenerator.Emit(OpCodes.Ldarg_0);
        accessorILGenerator.Emit(OpCodes.Ldfld, field);
        accessorILGenerator.Emit(OpCodes.Ret);
 
        // Declare the mutator (set) method for the field we made previously.
        // Named set_<PropertyName>; a null return type means void.
        MethodBuilder setMethod = typeBuilder.DefineMethod(
            "set_" + propertyName, propertyAttributes, null, new Type[] { propertyType });
 
        // Write a method in IL to update our field with the new value
        ILGenerator mutatorILGenerator = setMethod.GetILGenerator();
        mutatorILGenerator.Emit(OpCodes.Ldarg_0);
        mutatorILGenerator.Emit(OpCodes.Ldarg_1);
        mutatorILGenerator.Emit(OpCodes.Stfld, field);
        mutatorILGenerator.Emit(OpCodes.Ret);
 
        // Now we tie our fancy IL to the actual property in our assembly,
        // and the deed is done!
        property.SetGetMethod(getMethod);
        property.SetSetMethod(setMethod);
    }
 
    // Create and store the type for future use (while memory lasts)
    Type resultingType = typeBuilder.CreateType();
    _types.Add(typeName, resultingType);
 
    // And finally, return our brand new custom type
    return resultingType;
}

2. The ObjectFactory

Excellent, so now we have a method to create a type, let's dive into the ObjectFactory and take a look at how to instantiate some of our highly anticipated objects. The ObjectFactory implements methods to create objects from an existing type, or it can generate a new type for you to spit out objects with. Support for DataTables is implemented by generating an IList of custom objects.

The following method is the implementation for creating an object (or entity if you will) from a collection of TypeFields:
public static object CreateEntity(IEnumerable<TypeField> fieldCollection, string typeName,
    Type baseType = null)
{
    Type type = TypeFactory.CreateType(fieldCollection, typeName, baseType);
    return CreateObject(fieldCollection, type);
}

The real magic though, happens here:

ObjectFactory.CreateObject code
private static object CreateObject(IEnumerable<TypeField> fieldCollection, Type type)
{
    object instance = Activator.CreateInstance(type);
    foreach (TypeField f in fieldCollection)
    {
        var property = type.GetProperty(f.Name);
 
        // Skip properties that do not exist in the type
        if (property == null)
            continue;
 
        try
        {
            property.SetValue(instance, f.Value, null);
        }
        catch
        {
            // If setting the value failed, fall back to the default value
            // of the property's type
            property.SetValue(instance, GetDefault(property.PropertyType), null);
        }
    }
    return instance;
}
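
The GetDefault helper used in the catch block is not shown in this post. A minimal sketch of what such a helper could look like follows, assuming it takes a Type and returns the same value that default(T) would give. The wrapper class name DefaultValueSketch is made up for illustration.

```csharp
using System;

public static class DefaultValueSketch
{
    // Assumed shape of the helper: the runtime equivalent of default(T).
    public static object GetDefault(Type type)
    {
        // Value types get a zero-initialized boxed instance,
        // reference types (and Nullable<T>) get null.
        return type.IsValueType ? Activator.CreateInstance(type) : null;
    }
}
```

With this sketch, GetDefault(typeof(int)) gives 0, GetDefault(typeof(DateTime)) gives DateTime.MinValue and GetDefault(typeof(string)) gives null.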

To see this in action, consider the following array of TypeFields
static TypeField[] myDataSet =
                new TypeField[]
                {
                    new TypeField { Name="Foo", Value = 1 },
                    new TypeField { Name="Bar", Value = "Banana" },
                    new TypeField { Name="Baz", Value = 0.5d }
                };

We can now generate an entity out of this, and access its properties using reflection:
var obj = ObjectFactory.CreateEntity(myDataSet, "MyType");
string bar = obj.GetType().GetProperty("Bar").GetValue(obj, null) as string;
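
If you read properties this way in many places, the reflection call can be wrapped in a small generic helper. This is only a sketch and not part of the library: the names ReflectionAccessSketch and GetPropertyValue are made up for illustration.

```csharp
using System;
using System.Reflection;

public static class ReflectionAccessSketch
{
    // Reads a public property by name and casts the result to T.
    public static T GetPropertyValue<T>(object target, string propertyName)
    {
        PropertyInfo property = target.GetType().GetProperty(propertyName);
        if (property == null)
            throw new ArgumentException(
                "No public property named '" + propertyName + "'.", "propertyName");

        // Throws InvalidCastException when the property is not of type T
        return (T)property.GetValue(target, null);
    }
}
```

The lookup above then becomes ReflectionAccessSketch.GetPropertyValue<string>(obj, "Bar"), and a misspelled property name fails with a descriptive message instead of a NullReferenceException.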

To see a DataTable example, please see Part1

A few considerations:
  • DataTables allow column names with spaces in them, Properties in .NET do not. This will happily throw you an exception with that exact notification. The reason I don't remove the spaces, is because that will simply shift the problem: e.g.: We have a DataTable with 2 columns: 'FooBar' and 'Foo Bar'; In this scenario, we would lose the data contained in 1 of them (if a property already exists, as the library gracefully skips creating it).
  • Regarding the treatment of DBNull for non-nullable types (all value types): should we change the type to its nullable equivalent, or initialize it with the default value? A good example of this is DateTime: when receiving a null date, do we use DateTime's default value (01/01/0001), or do we change the type into a DateTime? and make it actually null? I decided this was a business decision and didn't change the type to a nullable one. Feel free to make the change.
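
If you do want to go the nullable route, the type mapping itself is short. The following is a sketch under my own assumptions, not code from the library: the names NullableSketch, ToNullableType and NormalizeValue are made up, and you would still have to decide where in TypeFactory and ObjectFactory to call them.

```csharp
using System;

public static class NullableSketch
{
    // Returns Nullable<T> for non-nullable value types,
    // and the type unchanged for everything else.
    public static Type ToNullableType(Type type)
    {
        bool isNonNullableValueType =
            type.IsValueType && Nullable.GetUnderlyingType(type) == null;

        return isNonNullableValueType
            ? typeof(Nullable<>).MakeGenericType(type)
            : type;
    }

    // Maps DBNull.Value to a plain null so it can be assigned
    // to a nullable property.
    public static object NormalizeValue(object value)
    {
        return value is DBNull ? null : value;
    }
}
```

Here ToNullableType(typeof(DateTime)) yields typeof(DateTime?), while typeof(string) and types that are already nullable are returned as they are.
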
So what's next?
In Part3 we'll take a look at using a Serializable DynamicObject that implements INotifyPropertyChanged to avoid using reflection to interact with our custom typed objects, which is a bit easier on the eye. The code for this is already part of the library here in Part2.

6 comments:

  1. Hey man, I find your stuff very interesting as I'm working on a cloud project and have the same obstacles. Can I know when part 3 is out?

  2. This solution is flawed. What's the point of adding data to the Table in a dynamic way if you still cannot query it dynamically? The CreateQuery method needs a real type and will not work with TypeField[] nor DataTable.

  3. What I meant is CreateQuery needs a type that is inheriting from TableServiceEntity and neither DataTable nor TypeField[] can inherit from TableServiceEntity so it will not work.

  4. Hi devdude, firstly, sorry about the tardiness of the reply, but it is quite easy to make a type that inherits from TableServiceEntity, when creating the dynamic type you can simply supply typeof(TableServiceEntity) as the baseType parameter.

    The newly generated dynamic type will then have TableServiceEntity as its superclass. Running CreateQuery with this will require reflection to generate the Generic method though.

    This would look something like this:

    static DataServiceQuery CreateQuery(TableServiceContext tsc, Type t, string entitySetName)
    {
        // Retrieve all entities
        DataServiceQuery query = (dynamic)typeof(TableServiceContext)
            .GetMethod("CreateQuery")
            .MakeGenericMethod(t)
            .Invoke(tsc, new object[] { entitySetName });
        return query;
    }

  5. Thanks for such a useful post. Can you please provide the full code showing how we can use the dynamically generated type to query records from the table? I am using the code below, which does not work:

    var obj = ObjectFactory.CreateEntity(myDataSet, "MyType");


    var specificEntity1 =
    (from e in serviceContext.CreateQuery("myazuretable")
    where e.PartitionKey == "TestPartition" && e.FirstName="Ravi"
    select e).AsTableServiceQuery();

  6. Today I celebrate a year of being too lazy to complete parts 3 and 4. Well done at antagonizing me @phermens!
