lexical subroutines in perl 5

One of the big new experimental features in Perl 5.18.0 is lexical subroutines. In other words, you can write this:

my sub quickly { ... }
my @sorted = sort quickly @list;

my sub greppy (&@) { ... }
my @grepped = greppy { ... } @input;

These two examples highlight cases where lexical references to anonymous subroutines would not have worked. The first argument to sort must be a block or a subroutine name, which leads to awful code like this:

sort { $subref->($a, $b) } @list

With our greppy, above, we get to benefit from the parser-affecting behaviors of subroutine prototypes. Although you can write sub (&@) { ... }, it has no effect unless you install that into a named subroutine, and it needs to be done early enough.

On the other hand, lexical subroutines aren’t just drop-in replacements for code refs. You can’t pass them around and have them retain their named-sub behavior, because you’ll still just have a reference to them. They won’t be “really named.” So if you can’t use them as parameters, what are their benefits over named subs?

First of all, privacy. Sometimes, I see code like this:

package Abulafia;

our $Counter = 0;

...

Why isn’t $Counter lexical? Is it part of the interface? Is it useful to have it shared? Would my code be safer if that was lexical, and thus hidden from casual accidents or stupid ideas? In general, I make all those sorts of variables lexical, just to make myself think harder before messing around with their values. If I need to be able to change them, after all, it’s only a one word diff!

Well, named subroutines are, like our variables, global in scope. If you think you should be using lexical variables for things that aren’t API, maybe you should be using lexical subroutines, too. Then again, you may have to be careful in thinking about what “aren’t API” means. Consider this:

package Service::Client;
sub _ua { LWP::UserAgent->new(...) }

In testing, you’ve been making a subclass of Service::Client that overrides _ua to use a test UA. If you make that subroutine lexical, you can’t override it in the subclass. In fact, if it’s lexical, it won’t participate in method dispatch at all, which means you’re probably breaking your main class, too! After all, method dispatch starts in the package on which a method was invoked, then works its way up the packages in @INC. Well, package means package variables, and that excludes lexical subroutines.

So, it may be worth doing, but it means more thinking (about whether or not to lexicalize each non-public sub), which is something I try to avoid when coding.

So when is it useful? I see two scenarios.

The first is when you want to build a closure that’s only used in one subroutine. You could make a big stretch, here, and talk about creating a DSL within your subroutine. I wouldn’t, though.

# Please forgive this extremely contrived example. -- rjbs, 2013-09-25
sub dothings {
  my ($x, $y, @rest) = @_;

  my sub with_rest (&) { map $_[0]->(), @rest; }

  my @to_x = with_rest { $_ ** $x };
  my @to_y = with_rest { $_ ** $y };

  ...
}

I have no doubt that I will end up using this pattern someday. Why do I know this? Because I have written Python, and this is how named functions work there, and I use them!

There’s another form, though, which I find even more interesting.

In my tests, I often make a bunch of little packages or classes in one file.

package Tester {
  sub do_testing {
    ...
  }
}

package Targeter {
  sub get_targets {
    ...
  }
}

Tester->do_testing($_) for Targeter->get_targets(%param);

Sometimes, I want to have some helper that they can all use, which I might write like this:

sub logger { diag shift; diag explain(shift) }

package Tester {
  sub do_testing {
    logger(testing => \@_);
    ...
  }
}

package Targeter {
  sub get_targets {
    logger(targeting => \@_);
    ...
  }
}

Tester->do_testing($_) for Targeter->get_targets;

Well… I might write it like that, but it won’t work. logger is defined in one package (presumably main::) and then called from two different packages. Subroutine lookup is per-package, so you won’t find logger. What you need is a name lookup that isn’t package based, but, well, what’s the word? Lexical!

So, you could make that a lexical subroutine by sticking my in front of the subroutine declaration (and adding use feature 'lexical_subs (and, for now, no warnings 'experimental::lexical_subs')). There are problems, though, like the fact that caller doesn’t give great answers, yet. And we can’t really monkeypatch that subroutine, if we wanted, which we might. (Strangely abusing stuff is more acceptable in tests than in the production code, in my book.) What we might want instead is a lexical name to a package variable. We have that already! We just write this:

our sub logger { ... }

I’m not using lexical subs much, yet, but I’m pretty sure I will use them a good bit more in the future!

Written on September 25, 2013