Jeff Hardy's Blog

Thursday, June 27, 2013

Anonymous Function Blocks in Python

Python has anonymous functions in the form of lambdas, but they are limited to a single expression. For the most part, this is enough (especially now that print() is a function in Python 3), but there are cases where being able to have multiple statements would be useful. Right now, the way to do this in Python is to use a named, nested function:

def upload_data(dest, *urls):
    def _fetch(x):
        data = fetch_url(url)
        sent = 0
        for line in data:
            sent += send_data(dest, line)
        
        return sent
    
    return map(_fetch, urls)

Now, this could be rewritten as a set of expressions, but what if we had multi-statement anonymous functions?

The Idea

I’m not calling this a proposal because, quite frankly, I’m not sure it’s worth the effort, and I certainly don’t have the time or energy to try and champion it. I also haven’t looked to see if someone else has already had the same idea: it just occurred to me and I thought I’d write it down. The thought of wading through python-ideas to see if someone already had the same does not strike me as a good use of time.

Anyway, I was working on implementing function annotations for IronPython, and I realized that the arrow operator (->) was not used anywhere else in the grammar, and as far as I could tell was completely unambiguous – there’s no existing Python code that would contain the arrow. So, rather than being able to put multi-line lambdas anywhere (like C# or C++), what if they were restricted, like Ruby’s blocks? Python can’t use do like Ruby does, but maybe it could use the arrow instead?

Then, because no existing Python functions expect blocks, there needs to be a way to refer to a block in a statement. I decided to copy Ruby’s use of &, but in a slightly different way – as a placeholder for the block attached to that statement. A bare & is also not valid Python code, and I could not think of anything it could combine with that would be currently valid code.

Syntax

The result is something like this:

def upload_data(dest, *urls):
    return map(&, urls) -> (url):
        data = fetch_url(url)
        sent = 0
        for line in data:
            sent += send_data(dest, line)
        
        return sent

The –> is used to introduce the block; it’s followed by a parameter list, a :, and a suite, just like a normal funcdef. In fact, even the type annotations would be usable, although the resulting double arrow (map(&, foo) –> (f : int) –> str:) looks a bit weird.

OK, so it’s workable within the grammar (I actually implemented in IronPython’s parser, just to be sure). What does it mean?

Semantics

Semantically, these blocks are just a prettied-up version of the first function. The block is transformed into a nested function immediately before the statement with a generated name, and any block references (&) are replaced with the generated name. Some tricks would have to be played with line numbers to make debugging make sense, but that’s not insurmountable.

Multiple references would be allowed, and although I can’t think of a use case for that, it makes no sense to disallow it.

Even decorators (which are just functions, after all) can still be used:

map(my_decorator(&), foos) -> (foo):
    pass

There’s no reason they couldn’t be generators, either:

list(&()) -> ():
    i = 0
    while i < 10:
        yield i
        i += 1

The idea is to make them as close to named Python function as possible. The object passed to map is still a function instance, so all existing Python functions that take a callable should be immediately usable.

Implicit Blocks

Explicitly passing around block references is necessary to deal with existing Python functions (and we all know “explicit is better than implicit”) but it’s kind of ugly. Borrowing, again, from Ruby, it would be nice to have blocks be implicit:

def map(&func, iterable):
    return [func(e) for e in iterable]

map(foos) -> (foo):
    pass

This gets a lot trickier to implement in the general case, where there might be multiple functions with implicit blocks in the same statement. A rule of “outermost-rightmost” would probably work. I’m not exactly sure what restrictions Ruby imposes.

Use Cases

Blocks are possible to implement, and probably not too hard either. However, that doesn’t mean they’re worth doing. There aren’t too many situations where you can’t use list comprehensions, generator expressions, or lambdas, and nested name functions already exist to handle the remaining cases.

There are a couple of things that they do make nicer, though. Implementing decorators that take arguments, for one:

def timed(name):
    return & -> (func):
        return functools.wraps(func)(&) -> (*args, **kwargs):
            with timer(name):
                return func(*args, **kwargs)

Speaking of with statements, they wouldn’t be necessary with blocks:

def with_(obj, &func):
    obj.__enter__()
    try:
        return func(obj)
    finally:
        obj.__exit__()

with(open("foo.txt")) -> (f):
    upload(f)

It’s not exactly the same, since the block cannot return from the enclosing function, and you’d need nonlocal to modify variables in the outer scope. A similar treatment could be applied to for as well.

Finally, there’s the many things Ruby does with its own blocks, such as Sinatra:

get('/hi') -> ():
    return 'Hello, World'

But Flask does basically the same thing in the confines of existing Python. Still, it is very nice syntax sugar.

The Verdict

I think the idea is sound – if blocks are added to Python, they should look something like this. The work required for blocks using explicit block references should be relatively simple for someone familiar with CPython. Implicit block references are harder, but probably still doable.

That said, the use cases aren’t enough to motivate me to want to implement it (except for possibly the decorator – I can never figure out what to name those nested functions). If anyone else wants to, feel free to reuse the syntax. And if someone else already had the same idea, my apologies.

Now that I’ve written it down, I can page this idea out and never think of it again.

Tuesday, June 25, 2013

IronPython 3 TODO

This is my own list of things I want to see in IronPython 3 (after I get 2.7.4 out). It’s unlikely all of them will make 3.0 (which I’m targeting for PyCon 2014 next April), but hopefully most of them will.

Python 3 Features

Obviously, this is the most important thing. A couple are already implemented in branches (function annotations and part of keyword-only args), a few are relativelyeasy (metaclass changes, removal of old-style classes), and a few are hard (nonlocal, super(), yield from).

The first step will be to bring the new standard library and work from there. In addition, any changes needed to make the 3.x stdlib work on IronPython will be rolled into CPython so that we don’t have to maintain our own fork, with all of the work that that entails.

Better Test Coverage

IronPython’s test are currently a mess: they take too long to run and a huge number don’t even pass. Also, the test runner only works on Windows. It will need to be heavily modified (or probably replaced) with one that is more portable. It would also be nice to be able to generate coverage metrics, and be able to mark tests as “expected failure” so that TeamCity will actually build and run the tests successfully.

More Platforms

Windows desktop/server is no longer the only game in town. IronPython already works well on Mono, but without test coverage it’s hard to know how well. On top of that, the tablet/phone market is huge and getting bigger. Xamarin’s wonderful tools will make Android and iOS ports possible, and I’ve seen several requests for Windows Phone 8 and Windows Store (“Metro”) support as well, but they may not happen without someone volunteering to maintain them. Likewise, support for Silverlight will probably be dropped unless someone volunteers to maintain it.

The current code base is a mess of FEATURE_ and platform #ifdefs that is rather hard to maintain. There is the concept of a Platform Adaptation Layer (PAL), but it doesn’t get used everywhere. I’m going to see how feasible it is to expand the PAL to get rid of as many #ifdefs as possible.

At first each platform will only support embedding IronPython, but will eventually be extended to support building apps in pure Python.

“Static” Compilation and MSBuild support

IronPython currently has pyc.py to generate executables and DLLs, but it’s a bit clunky, only supports .NET 4, and is missing some useful features like arbitrary resources. Improvements to pyc.py will add those missing features to generate truly standalone executables.

The other issue is that the DLLs it generates are not usable from any .NET language; they must be loaded in an IronPython host. “Static” compilation will allow DLLs to be generated with actual .NET classes that can be reflected over and thus will be useful as plugins (for e.g. MSBuild or IIS) or to build pure-Python Andoid/iOS/WP8/Metro apps, all of which are basically plugins as well. Python classes just need some special adornment and pyc.py will pick them up and compile them to real types (or something similar):

import clr
of = clr.of

class RealClass(metaclass=clr.object):
    @clr.method(visibility="public")
    def Foo(self, x : of(int), y : of(str)) -> float:
        pass

This is both a lot and a little bit of work; the code to do the type generation is already there, but it makes some assumptions about global state that need to be pulled apart and refactored.

Yikes

OK, so there’s a lot there for nine months. And, keep in mind, this is just what I want to work on. However, I think this list will make sure that IronPython stays viable in the future. If you have any other suggestions, let me know.

As always, if you’re interested in helping, get in touch. I’m more then willing to help anyone get started. The codebase is intimidating at first, but once you got over the initial learning wall it’s not so bad.

Sunday, September 16, 2012

Changing .NET Assembly Platforms with Mono.Cecil

By default, .NET assemblies can only be loaded by the platform they are built for, or potentially anything later in the same stream (.NET 4 can load .NET 2 assemblies, Silverlight 5 can load Silverlight 4 assemblies, etc.). However, some of the stuff I’ve been working on for IronPython would be a lot easier if I could just build one assembly and use it anywhere. While it’s not possible with just one assembly, I can generate all of the other assemblies from the base one, with a few caveats.

The problem is caused by Reflection.Emit. IronPython uses RefEmit to generate .NET classes at runtime, and has the ability to store those on disk. However, RefEmit will only generate assemblies for the .NET runtime it is currently running under, which is usually .NET 4. Not all platforms support RefEmit, and re-running the compilation on every platform that needed it would be a pain anyway.

IKVM.Reflection offers a RefEmit-compatible API that can target any platform, but using it would require changing IronPython’s output code to use IKVM RefEmit instead of standard RefEmit when compiling, which is a fairly large change I didn’t want to make right now (maybe for 3.0).

Mono.Cecil is a library for manipulating .NET assemblies. It’s often used to inject code into assemblies and other mundane tasks. What I wanted to know was whether I could take a .NET 4 assembly generated by IronPython and produce a .NET 2 assembly that would run on IronPython for .NET 2. The answer turns out to be yes, but it’s a bit of a pain.

Rewriting assemblies for fun and profit(?)

There are a few things in an assembly that may have to change to get it to work on a different runtime. The first is simple: change the target platform, which is part of the assembly metadata. The next is a bit trickier, but not too bad: change the versions of referenced assemblies to match the platform you are targeting. The third part requires some tedious cataloguing: find any types that are located in different assemblies and change them to point at the correct ones for that target platform. The final piece is the most difficult: potentially, rewrite the actual IL code so that it works on any platform.

The first part, changing the assembly version, is trivial:

ad.Modules[0].Runtime = TargetRuntime.Net_2_0

The second part is not much harder, but requires some extra setup: we need to know what version to change it to. This is potentially an impossible problem, because you don’t always know what types might be present, and any references that aren’t to framework assemblies could break. Right now, I’m just hardcoding everything, but it would be better to pull this from whatever version of mscorlib.dll is being targeted.

{
    "mscorlib": (2,0,0,0),
    "System": (2,0,0,0),
    "System.Core": (3,5,0,0),
}

The next part of the process is changing the assembly a type belongs in. It’s not actually that hard to do in Mono.Cecil, it just takes a low of upfront knowledge about how things have moved around. In the .NET 2 version of IronPython, the DLR is in Microsoft.Scripting.Core; in .NET 4, it’s in System.Core. Successfully loading a generated assembly means changing the relevant types from System.Core to Microsoft.Scripting.Core. In some cases, the namespace has also changed; the expression trees are in System.Linq.Expressions in .NET 4 but in Microsoft.Scripting.Ast for .NET 2.

The key here is to use Module.GetTypeReferences() to get all of the types an assembly references, and then change the Scope property to point to the new assembly and the Namespace property to the new namespace.

The final part (which is actually done first) is having to rewrite the IL code to replace any unsupported functions with ones that are. Thankfully, there is only one case of that so far: StrongBox<T>(), which exists in .NET 4 (and is used by LambdaExpression.CompileToMethod()) but does not exist in .NET 3.5. The paramaterless constructor call gets replaced by passing null to the constructor that takes an initial value, which is all the paramaterless one does. This is actually pretty straightforward:

strongbox_ctor_v2 = clr.GetClrType(StrongBox).MakeGenericType(Array[object]).GetConstructor((Array[object],))
strongbox_ctor_v4 = clr.GetClrType(StrongBox).MakeGenericType(Array[object]).GetConstructor(Type.EmptyTypes)

method = dcc.Methods[0]
il = method.Body.GetILProcessor()
instr = method.Body.Instructions[4] # This is specific to my use; YMMV
il.InsertBefore(instr, il.Create(OpCodes.Ldnull))
il.Replace(instr, 
    il.Create(OpCodes.Newobj, 
        method.Module.Import(strongbox_ctor_v2)))

There is one caveat here: because of how assembly references work, the IL rewriting should be done before the reference versions are changed, so that there are no stray references to 4.0 assemblies.

Next Steps

This is all just a proof of concept; there’s a few more things to do to make it usable. For example, it needs to be able to look at a set of references and work out which types moved where based on that (this is really important for Windows 8, which moved everything, it seems). Still, the approach seems promising; hopefully there aren’t anymore landmines to deal with.

Wednesday, April 4, 2012

IronPython Samples

One thing that I think has been missing from IronPython for a while now is a set of embedding samples. There are many host environments that IronPython can run in, and while they are all similar they have some differences to.

To correct this, I put together a set of IronPython Samples demonstrating how to embed IronPython in a console, WinForms, and WPF app, as well as writing a complete WPF app in IronPython.

Any feedback (and pull requests!) is welcome. In particular, I'd like to know what other platforms people are interested: Android, Windows Phone, Silverlight, ASP.NET, etc.

Tuesday, March 13, 2012

Mea Culpa

I was so excited about getting IronPython 2.7.2 out the door, I briefly dropped my common sense and made a change to IronPython that never should have been made without triggering another RC release. So what the hell happened?

The change in question is f8cce37. The correction is 4a76497.

What Broke?

The property in question – MaybeNotImplemented – checks to see if a method’s return type has the MaybeNotImplemented attribute, which tells IronPython that the operator may return NotImplemented; this indicates that the attempted operation doesn’t work and that other options should be tried. Without specifying [return:MaybeNotImplemented] on a native method, IronPython won’t generate code to perform the other operations.

Windows Phone Fixes

The issue that triggered the initial change was #32374. This is interesting in itself, as it turns out that MethodInfo.ReturnParameter is not supported on Windows Phone 7 – it exists, and it compiles, but it throws NotSupportException. Joy.

Since mobile support was new, I figured that making a change specific to Windows Phone should be OK. And it probably would have been, had I done what I originally intended and put the new WP7 code in an #if block and left the original code intact. But instead I decided that if the new code worked for both, why not use it?

Static Typing is Not Enough

Notice how small the fix is? MethodInfo.ReturnTypeCustomAttributes returns an ICustomAttributeProvider, which has the IsDefined method. As it turns out, MethodInfo also implements ICustomAttributeProvider. This means that the original fix compiled, ran, and worked for most cases, but failed on others. And they failed in the worst possible way – silently (except for the part where the program breaks).

But but but … TESTS!

Yes, the tests should have caught it. Unfortunately IronPython has been running for a while without paying much attention to the state of the tests, and there’s really no one to blame for this except me. Most of the CPython standard library tests fail at some point or another, which drowns out the useful failures in a sea of noise. This, of course, has to change, so for 2.7.3 I’m focusing on the much smaller set of tests specifically for IronPython (not those inherited from CPython).

After this, there’s no way 2.7.3 is going out without at least that baseline set of tests green, and once I get them in order all new contributions will have to pass the relevant tests. This should be the only time that I have to do an emergency release.

In the meantime, IronPython 2.7.2.1 is available, which fixes this issue.