Sunday, September 16, 2012

Changing .NET Assembly Platforms with Mono.Cecil

By default, .NET assemblies can only be loaded by the platform they are built for, or potentially anything later in the same stream (.NET 4 can load .NET 2 assemblies, Silverlight 5 can load Silverlight 4 assemblies, etc.). However, some of the stuff I’ve been working on for IronPython would be a lot easier if I could just build one assembly and use it anywhere. While it’s not possible with just one assembly, I can generate all of the other assemblies from the base one, with a few caveats.

The problem is caused by Reflection.Emit. IronPython uses RefEmit to generate .NET classes at runtime, and has the ability to store those on disk. However, RefEmit will only generate assemblies for the .NET runtime it is currently running under, which is usually .NET 4. Not all platforms support RefEmit, and re-running the compilation on every platform that needed it would be a pain anyway.

IKVM.Reflection offers a RefEmit-compatible API that can target any platform, but using it would require changing IronPython’s output code to use IKVM RefEmit instead of standard RefEmit when compiling, which is a fairly large change I didn’t want to make right now (maybe for 3.0).

Mono.Cecil is a library for manipulating .NET assemblies. It’s often used to inject code into assemblies and other mundane tasks. What I wanted to know was whether I could take a .NET 4 assembly generated by IronPython and produce a .NET 2 assembly that would run on IronPython for .NET 2. The answer turns out to be yes, but it’s a bit of a pain.

Rewriting assemblies for fun and profit(?)

There are a few things in an assembly that may have to change to get it to work on a different runtime. The first is simple: change the target platform, which is part of the assembly metadata. The next is a bit trickier, but not too bad: change the versions of referenced assemblies to match the platform you are targeting. The third part requires some tedious cataloguing: find any types that are located in different assemblies and change them to point at the correct ones for that target platform. The final piece is the most difficult: potentially, rewrite the actual IL code so that it works on any platform.

The first part, changing the assembly version, is trivial:

ad.Modules[0].Runtime = TargetRuntime.Net_2_0

The second part is not much harder, but requires some extra setup: we need to know what version to change it to. This is potentially an impossible problem, because you don’t always know what types might be present, and any references that aren’t to framework assemblies could break. Right now, I’m just hardcoding everything, but it would be better to pull this  from whatever version of mscorlib.dll is being targeted.

{
    "mscorlib": (2,0,0,0),
    "System": (2,0,0,0),
    "System.Core": (3,5,0,0),
}

The next part of the process is changing the assembly a type belongs in. It’s not actually that hard to do in Mono.Cecil, it just takes a low of upfront knowledge about how things have moved around. In the .NET 2 version of IronPython, the DLR is in Microsoft.Scripting.Core; in .NET 4, it’s in System.Core. Successfully loading a generated assembly means changing the relevant types from System.Core to Microsoft.Scripting.Core. In some cases, the namespace has also changed; the expression trees are in System.Linq.Expressions in .NET 4 but in Microsoft.Scripting.Ast for .NET 2.

The key here is to use Module.GetTypeReferences() to get all of the types an assembly references, and then change the Scope property to point to the new assembly and the Namespace property to the new namespace.

The final part (which is actually done first) is having to rewrite the IL code to replace any unsupported functions with ones that are. Thankfully, there is only one case of that so far: StrongBox<T>(), which exists in .NET 4 (and is used by LambdaExpression.CompileToMethod()) but does not exist in .NET 3.5. The paramaterless constructor call gets replaced by passing null to the constructor that takes an initial value, which is all the paramaterless one does. This is actually pretty straightforward:

strongbox_ctor_v2 = clr.GetClrType(StrongBox).MakeGenericType(Array[object]).GetConstructor((Array[object],))
strongbox_ctor_v4 = clr.GetClrType(StrongBox).MakeGenericType(Array[object]).GetConstructor(Type.EmptyTypes)

method = dcc.Methods[0]
il = method.Body.GetILProcessor()
instr = method.Body.Instructions[4] # This is specific to my use; YMMV
il.InsertBefore(instr, il.Create(OpCodes.Ldnull))
il.Replace(instr, 
    il.Create(OpCodes.Newobj, 
        method.Module.Import(strongbox_ctor_v2)))

There is one caveat here: because of how assembly references work, the IL rewriting should be done before the reference versions are changed, so that there are no stray references to 4.0 assemblies.

Next Steps

This is all just a proof of concept; there’s a few more things to do to make it usable. For example, it needs to be able to look at a set of references and work out which types moved where based on that (this is really important for Windows 8, which moved everything, it seems). Still, the approach seems promising; hopefully there aren’t anymore landmines to deal with.