Thursday, June 27, 2013

Anonymous Function Blocks in Python

Python has anonymous functions in the form of lambdas, but they are limited to a single expression. For the most part, this is enough (especially now that print() is a function in Python 3), but there are cases where being able to have multiple statements would be useful. Right now, the way to do this in Python is to use a named, nested function:

def upload_data(dest, *urls):
    def _fetch(url):
        data = fetch_url(url)
        sent = 0
        for line in data:
            sent += send_data(dest, line)
        
        return sent
    
    return map(_fetch, urls)

Now, this could be rewritten as a set of expressions, but what if we had multi-statement anonymous functions?
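For comparison, here is a rough sketch of what the expression-only version might look like. The post never defines fetch_url or send_data, so the stand-ins below are purely hypothetical and exist only to make the example runnable.

```python
# Hypothetical stand-ins: the post never defines fetch_url or send_data.
def fetch_url(url):
    # Pretend every URL yields two lines of text.
    return [url + " line 1", url + " line 2"]

def send_data(dest, line):
    # Pretend to send the line and report how many characters went out.
    dest.append(line)
    return len(line)

def upload_data(dest, *urls):
    # The loop-and-accumulate body collapses into a single sum() expression.
    return map(
        lambda url: sum(send_data(dest, line) for line in fetch_url(url)),
        urls,
    )

sink = []
# map() is lazy in Python 3, so list() is what actually triggers the uploads.
totals = list(upload_data(sink, "a", "bb"))
```

It works, but the more statements the body needs, the more contorted the single expression becomes.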

The Idea

I’m not calling this a proposal because, quite frankly, I’m not sure it’s worth the effort, and I certainly don’t have the time or energy to try to champion it. I also haven’t looked to see if someone else has already had the same idea: it just occurred to me and I thought I’d write it down. The thought of wading through python-ideas to see if someone already had the same idea does not strike me as a good use of time.

Anyway, I was working on implementing function annotations for IronPython, and I realized that the arrow operator (->) was not used anywhere else in the grammar, and as far as I could tell was completely unambiguous – there’s no existing Python code that would contain the arrow. So, rather than being able to put multi-line lambdas anywhere (like C# or C++), what if they were restricted, like Ruby’s blocks? Python can’t use do like Ruby does, but maybe it could use the arrow instead?

Then, because no existing Python functions expect blocks, there needs to be a way to refer to a block in a statement. I decided to copy Ruby’s use of &, but in a slightly different way – as a placeholder for the block attached to that statement. A bare & is also not valid Python code, and I could not think of anything it could combine with that would be currently valid code.

Syntax

The result is something like this:

def upload_data(dest, *urls):
    return map(&, urls) -> (url):
        data = fetch_url(url)
        sent = 0
        for line in data:
            sent += send_data(dest, line)
        
        return sent

The -> is used to introduce the block; it’s followed by a parameter list, a :, and a suite, just like a normal funcdef. In fact, even the type annotations would be usable, although the resulting double arrow (map(&, foo) -> (f : int) -> str:) looks a bit weird.

OK, so it’s workable within the grammar (I actually implemented it in IronPython’s parser, just to be sure). What does it mean?

Semantics

Semantically, these blocks are just a prettied-up version of the first function. The block is transformed into a nested function immediately before the statement with a generated name, and any block references (&) are replaced with the generated name. Some tricks would have to be played with line numbers to make debugging make sense, but that’s not insurmountable.

Multiple references would be allowed, and although I can’t think of a use case for that, it makes no sense to disallow it.

Even decorators (which are just functions, after all) can still be used:

map(my_decorator(&), foos) -> (foo):
    pass

There’s no reason they couldn’t be generators, either:

list(&()) -> ():
    i = 0
    while i < 10:
        yield i
        i += 1

The idea is to make them as close to named Python functions as possible. The object passed to map is still a function instance, so all existing Python functions that take a callable should be immediately usable.

Implicit Blocks

Explicitly passing around block references is necessary to deal with existing Python functions (and we all know “explicit is better than implicit”) but it’s kind of ugly. Borrowing, again, from Ruby, it would be nice to have blocks be implicit:

def map(&func, iterable):
    return [func(e) for e in iterable]

map(foos) -> (foo):
    pass

This gets a lot trickier to implement in the general case, where there might be multiple functions with implicit blocks in the same statement. A rule of “outermost-rightmost” would probably work. I’m not exactly sure what restrictions Ruby imposes.
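Existing Python can get part of the way there by leaning on decorator syntax. The sketch below is only an approximation, not the proposed feature: with_block is a hypothetical helper, and my_map is the map above without the & marker (renamed to avoid shadowing the builtin).

```python
def my_map(func, iterable):
    # Same as the map() above, minus the hypothetical &func marker.
    return [func(e) for e in iterable]

def with_block(target, *args, **kwargs):
    # Hypothetical helper: returns a decorator that calls target with the
    # decorated function as its first argument and returns the result.
    def apply(block):
        return target(block, *args, **kwargs)
    return apply

foos = [1, 2, 3]

@with_block(my_map, foos)
def squares(foo):
    return foo * foo

# squares is now the list returned by my_map, not a function.
```

The call still has to come before the body and the result still needs a name, so this sidesteps none of the naming problem; it only keeps the call and the body next to each other.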

Use Cases

Blocks are possible to implement, and probably not too hard either. However, that doesn’t mean they’re worth doing. There aren’t too many situations where you can’t use list comprehensions, generator expressions, or lambdas, and nested named functions already exist to handle the remaining cases.

There are a couple of things that they do make nicer, though. Implementing decorators that take arguments, for one:

def timed(name):
    return & -> (func):
        return functools.wraps(func)(&) -> (*args, **kwargs):
            with timer(name):
                return func(*args, **kwargs)
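For reference, the same decorator in today's Python needs two invented names. This is a sketch that assumes a hypothetical timer context manager, since the post does not define one; here it just records elapsed seconds in a dict.

```python
import contextlib
import functools
import time

timings = {}

@contextlib.contextmanager
def timer(name):
    # Hypothetical timer: records elapsed seconds under the given name.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

def timed(name):
    def decorator(func):                # first name that has to be invented
        @functools.wraps(func)
        def wrapper(*args, **kwargs):   # second name that has to be invented
            with timer(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator

@timed("add")
def add(a, b):
    return a + b

result = add(2, 3)
```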

Speaking of with statements, they wouldn’t be necessary with blocks:

def with_(obj, &func):
    value = obj.__enter__()
    try:
        return func(value)
    finally:
        obj.__exit__(None, None, None)

with_(open("foo.txt")) -> (f):
    upload(f)

It’s not exactly the same, since the block cannot return from the enclosing function, and you’d need nonlocal to modify variables in the outer scope. A similar treatment could be applied to for as well.
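Both caveats show up if with_ is written as a plain function in today's Python, taking the block as an ordinary second argument. This is a simplified sketch: a real with statement also passes exception details to __exit__ and lets it suppress the exception, which is skipped here, and io.StringIO stands in for the file.

```python
import io

def with_(obj, func):
    # Plain-function version of the sketch above; func plays the role of the block.
    value = obj.__enter__()
    try:
        return func(value)
    finally:
        # Simplification: always report "no exception" to __exit__.
        obj.__exit__(None, None, None)

def read_and_count():
    count = 0

    def body(f):
        nonlocal count   # needed because the block is a separate function
        for _ in f:
            count += 1
        return "done"    # returns from body, not from read_and_count

    status = with_(io.StringIO("a\nb\nc\n"), body)
    return status, count

status, count = read_and_count()
```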

Finally, there are the many things Ruby does with its own blocks, such as Sinatra’s routes:

get('/hi') -> ():
    return 'Hello, World'

But Flask does basically the same thing in the confines of existing Python. Still, it is very nice syntax sugar.
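The decorator-based equivalent looks something like the sketch below. It is a toy route registry written in the style of Flask's route decorators, not Flask itself; the routes dict and the get function are made up for illustration.

```python
routes = {}

def get(path):
    # Toy route registry in the style of Flask's decorators; not Flask itself.
    def register(handler):
        routes[path] = handler
        return handler
    return register

@get('/hi')
def hi():
    return 'Hello, World'

# Dispatching a request is just a dictionary lookup followed by a call.
response = routes['/hi']()
```

The only real cost compared to the block version is having to name the handler.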

The Verdict

I think the idea is sound – if blocks are added to Python, they should look something like this. The work required for blocks using explicit block references should be relatively simple for someone familiar with CPython. Implicit block references are harder, but probably still doable.

That said, the use cases aren’t enough to motivate me to want to implement it (except for possibly the decorator – I can never figure out what to name those nested functions). If anyone else wants to, feel free to reuse the syntax. And if someone else already had the same idea, my apologies.

Now that I’ve written it down, I can page this idea out and never think of it again.

Tuesday, June 25, 2013

IronPython 3 TODO

This is my own list of things I want to see in IronPython 3 (after I get 2.7.4 out). It’s unlikely all of them will make 3.0 (which I’m targeting for PyCon 2014 next April), but hopefully most of them will.

Python 3 Features

Obviously, this is the most important thing. A couple are already implemented in branches (function annotations and part of keyword-only args), a few are relatively easy (metaclass changes, removal of old-style classes), and a few are hard (nonlocal, super(), yield from).

The first step will be to bring in the new standard library and work from there. In addition, any changes needed to make the 3.x stdlib work on IronPython will be pushed upstream to CPython so that we don’t have to maintain our own fork, with all of the work that that entails.

Better Test Coverage

IronPython’s tests are currently a mess: they take too long to run and a huge number don’t even pass. Also, the test runner only works on Windows. It will need to be heavily modified (or probably replaced) with one that is more portable. It would also be nice to be able to generate coverage metrics, and be able to mark tests as “expected failure” so that TeamCity will actually build and run the tests successfully.
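CPython's unittest module already has a notion of expected failures, so one possible shape for this is sketched below. It assumes the replacement runner would build on unittest, which is not something decided here, and the test names are made up.

```python
import unittest

class ExampleTests(unittest.TestCase):
    def test_passes(self):
        self.assertEqual(1 + 1, 2)

    @unittest.expectedFailure
    def test_known_bug(self):
        # Stand-in for a feature that is not implemented yet.
        self.assertEqual(1 + 1, 3)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TestResult()
suite.run(result)
```

The known-bad test is recorded under expectedFailures instead of failures, so the run as a whole still counts as successful.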

More Platforms

Windows desktop/server is no longer the only game in town. IronPython already works well on Mono, but without test coverage it’s hard to know how well. On top of that, the tablet/phone market is huge and getting bigger. Xamarin’s wonderful tools will make Android and iOS ports possible, and I’ve seen several requests for Windows Phone 8 and Windows Store (“Metro”) support as well, but they may not happen without someone volunteering to maintain them. Likewise, support for Silverlight will probably be dropped unless someone volunteers to maintain it.

The current code base is a mess of FEATURE_ and platform #ifdefs that is rather hard to maintain. There is the concept of a Platform Adaptation Layer (PAL), but it doesn’t get used everywhere. I’m going to see how feasible it is to expand the PAL to get rid of as many #ifdefs as possible.

At first each platform will only support embedding IronPython, but will eventually be extended to support building apps in pure Python.

“Static” Compilation and MSBuild support

IronPython currently has pyc.py to generate executables and DLLs, but it’s a bit clunky, only supports .NET 4, and is missing some useful features like arbitrary resources. Improvements to pyc.py will add those missing features to generate truly standalone executables.

The other issue is that the DLLs it generates are not usable from other .NET languages; they must be loaded in an IronPython host. “Static” compilation will allow DLLs to be generated with actual .NET classes that can be reflected over and thus will be useful as plugins (e.g. for MSBuild or IIS) or to build pure-Python Android/iOS/WP8/Metro apps, all of which are basically plugins as well. Python classes just need some special adornment and pyc.py will pick them up and compile them to real types (or something similar):

import clr
of = clr.of

class RealClass(metaclass=clr.object):
    @clr.method(visibility="public")
    def Foo(self, x : of(int), y : of(str)) -> float:
        pass

This is both a lot and a little bit of work; the code to do the type generation is already there, but it makes some assumptions about global state that need to be pulled apart and refactored.

Yikes

OK, so there’s a lot there for nine months. And, keep in mind, this is just what I want to work on. However, I think this list will make sure that IronPython stays viable in the future. If you have any other suggestions, let me know.

As always, if you’re interested in helping, get in touch. I’m more than willing to help anyone get started. The codebase is intimidating at first, but once you get over the initial learning wall it’s not so bad.