Back

After a few weeks in Mountain View I am now back in Denmark.

While I was in Mountain View I had a chance to hang out with Peter Ahé who told me about some of the interesting new developments in Java tools, especially in javac. I expect to write a few posts about that.

If your feed reader has experienced a small hiccup when reading this blog’s feed, that’s because the new version of blogger is now far enough along that I’ve been able to convert this blog to blogger beta. The most visible consequence of this is that posts can now be tagged with labels, the same way conversations can be tagged in gmail. Each label, for instance java, then gets its own index page. Not exactly a new idea but that doesn’t make it any less useful.

Recursion

..and the winner in the category stupidest animal name is: the south american bat anoura caudifer, known in english as the tailed tailless bat. Well done.

I guess it’s just a matter of time before someone discovers the tailless tailed tailless bat.

Work

And now, a public service announcement. After more than three months of glorious unemployment I have now relapsed into a job. Indeed, a job with the very company that powers this blog: google.

In other, related, news: I will be in Mountain View for the next few weeks.

That is all.

APT

As I mentioned in my last post I’ve been playing around with database interfaces. While doing this, I’ve discovered something new: APT, the annotation processing tool for Java. I’ve known for a while that it existed but only now have I discovered what the big deal is. APT. Kicks. Ass!

In the database interface for Java I’m working on, you specify the structure of a database through annotated interfaces, essentially the same way things work in DLINQ. For instance, a database called test containing two tables would be specified like this:

public @Database("test") interface ITestDatabase extends IDatabase {
public @Table IDataSet public @Table IDataSet orders();
}

Using this interface you can open a connection to an actual relational database and fetch the contents either directly (by calling the persons or orders method) or you can specify more complex queries through something more “magical” which I’ll describe later.

The thing is that there’s a lot of ways you can get these interfaces wrong. For instance, you must use interfaces, not classes. Database interfaces must extends IDatabase. The methods that correspond to tables must return a IDataSet of some sort. And so on.

I considered this to be a problem until I discovered the annotation processor framework. An annotation processor is essentially a compiler plugin that is given access to parts of the syntax tree during compilation. That can be used for all sorts of interesting things, including issuing errors and warnings. So I’ve created an annotation processor that you can plug into your compiler and then it will check that all database-related interfaces are well-formed, and it works beautifully.

To top it all off, the compiler in eclipse now supports APT. If you just configure it correctly, which is no more difficult than using an external library, you get custom checking right in eclipse:

Of course, as with all power tools, you have show a little restraint and not issue all sorts of lame warnings just because you can…

…even if it it’s so easy the code almost writes itself:

for (MethodDeclaration method : decl.getMethods()) {
if (method.getSimpleName().equals("foo")) {
env.getMessager().printError(method.getPosition(),
"Enought with the 'foo' methods already!");
}
]

I know this tool has been around for a while. Why has it taken me so long to discover how useful it is — has there been some kind of hype that I’ve missed?

And yes, I know I’ve misspelled “enough”.

Databases

Has it really been a month since I last posted? It’s not that I’ve been particularly busy, in fact it’s been quite the opposite: after being unemployed for two months I’ve entered a state of profound laziness. It culminated last week: I did nothing but lie on my couch in an almost liquid state listening reading a song of ice and fire and listening to sigur rós. What a week that was.

But I’m slowly regaining my energy and have found a new area of interest: databases. Not databases a such, though, but support for databases in programming languages. I don’t really know anything about databases but happened to read a paper by William Cook and Ali Ibrahim, Integrating Programming Languages & Databases: What’s the Problem?.

Ideally, a programming language interface to a database should provide static type checking while still allowing queries to be parameterized a runtime, and must allow the database to optimize queries. Especially the issue of allowing queries to be optimized makes this problem difficult. Imagine that you’re given a huge database of employees and want to find all the ones called “Bob”. If you don’t care about performance you can just fetch the set of all employees from the database into your program and then select the ones with the right name:

for (IEmployee e : db.getEmployees()) {
if (e.name().equals("Bob"))
bobs.add(e);
}

If you do care the least about performance, though, you should ask the database to do all the work and just return the Bobs:

SELECT * FROM employees
WHERE name = 'Bob'

This is where things become difficult: you want to write ordinary code in your program that describes the query, like e.name().equals("Bob") but on the other hand you don’t want that code to be executed but instead sent in a query to the database.

Microsoft’s LINQ solves this by allowing code to be translated into data in the form of expression trees:

Func<int, bool> filter = (n => n < 2);
Exprint, bool>> filterCode = filter;
Console.WriteLine("{0}", filterCode.Body.Right);

Here we convert the lambda expression, (n => n < 2), into a piece of data, the second line, and then in the third line extract the Body field, n < 2, and finally the Right hand side of the body, the constant 2. When run, the program prints

ConstantExpression(2)

This solves the problem since you can now take executable code and disassemble it, which enables you to send it to the database to be executed instead of doing it yourself. But it this approach also has problems. First of all, it limits what you can write in a lambda expression since it must be possible to convert lambdas into expression trees. Also, it requires changes to the language and it feels like something that has been grafted onto the language without being generally useful, just to make things work in this special case.

So I've been playing around with a solution of my own in Java, one that doesn't require changes to the language. Instead it uses proxy objects to turn code into data. There's still plenty of problems to solve but I think it looks promising. Actually, today I ran my first query built this way, not in a database but on an ordinary collection. Here's the program:

IDataSet strs = dataSet("Burke", "Connor", "Frank", ...);
Querynew Query IString str = from(strs);
where(str.length().is(5));
select(str.toUpperCase());
}};
query.printTranscript(System.out);

Notice how the body of the query looks a whole lot like a SQL query, but is written in Java and fully type checked. And the important part is that the query body is not executed as such but turned into data. The last statement prints the disassembled query:

--- query ---
def $x0 = from(DataSet["Burke", "Connor", "Frank", ...])
where($x0.length().is(5))
select($x0.toUpperCase())
-------------

The query is not executed until it is asked to:

IDataSet result = query.perform();
System.out.println(result);

Only when you call perform is the body of the query executed. In this case the program prints

DataSet[BURKE, FRANK, DAVID]

Since the Query object contains a data representation of the code, it's a small problem to construct a corresponding SQL query to send to the database. As I said before there's still plenty of work left on this but I think solving the problem of converting code into data is a big step. If I make more progress I'll be back with an update. In any case I hope to post more often from now on.

Character pairs

Finally! My mac is back. I haven’t made much progress with saturn since my last post but in the meantime I’ve been working on something else: a contribution to eclipse.

When you create a text editor in eclipse, you can specify a character pair matcher that is typically used for highlighting matching parentheses and brackets.

In many cases when eclipse defines an interface, it also provides a default implementation that you can customize and use in your own plugin without having to deal with the full complexity of implementing the interface from scratch. In the case of ICharacterPairMatcher, however, there is no such default implementation. A few months ago I filed a feature request in the eclipse bug system for such a default implementation that you could give a set of which charaters to match, for instance "()", "[]" and "{}" and then this implementation would do all the work.

…yada yada yada, my implementation of such a character pair matcher, DefaultCharacterPairMatcher, is now a part of eclipse, as well as a new implementation of the matcher used in the Java editor. So, barring some unforeseen incident, from the next release of eclipse, my code will spring into action whenever you type a parenthesis, brace or bracket in the Java editor or any other editors that use the default character pair matcher. I think that’s pretty cool.

Saturn UI

Sigh. My mac is still being repaired. Fortunately, it turns out that my access code still works at the university so even though it’s not exactly ideal, it gives me a place to work when I really want to. So I have made some progress over the last few weeks.

Threading now works (see) but the scheduler is exceedingly stupid. One nice thing about this implementation, though, is that threading is completely deterministic. Interrupts appear to be nondeterministic but are actually based on values returned by a random number generator. All you need to know to repeat an execution is the seed of the generator and then the run will be exactly the same. That’s been pretty useful.

I’ve also found time to work on an eclipse plugin. Now there’s a perspective with a saturn editor with syntax coloring and integrated launching:


Notice the little saturn perspective marker in the top right corner. It took me forever to draw that.

The compiler is integrated with the IDE so there’s immediate compilation when files are saved:


Finally, there’s a simple debugger:


There’s still a lot of work left to do before the IDE is decent but I think it’s already surprisingly useful. You get a lot of functionality with relatively little work if you ask eclipse nicely.

By the way, if you look closely at the code in the screenshots you’ll notice that I’ve changed the block syntax from fun (...) -> ... to fn (...) -> .... I don’t really care if it’s fun or fn but I remember caring once, a few years ago when I first saw ML and O’Caml, and I remember preferring fn. I thought my former self should have some influence on this language. Even though he probably wouldn’t have liked it; he thought object orientation was for wimps. Good thing I’m running things now and not him.

Offline

I’ve sent my laptop off to be repaired, the harddisk died (quietly — that’s not my mac in the picture). This means that I will be more or less offline for the next two to three weeks. I will probably check my mail occasionally but I can’t guarantee that I’ll do it every day.

No computer. For three weeks. The horror!

Also, I’ve just moved quenta.org from b-one to dreamhost which I’m sure will cause all sorts of problem. Of which I, offline as I am, will be blissfully ignorant. Yeah, things are going pretty well here.

Tasks

Among my areas of lack-of-expertise when it comes to language implementation are threads and scheduling. It’s not that implementing a scheduler is that hard, really, but I always have trouble figuring out what the right primitives are, what should happen, exactly, when you create a new thread and such. So I’ve just ignored threading in saturn for a while and kept the model simple and single-threaded.

Of course you can only get so far without considering threading and this weekend, while trying to solve other problems, I think I accidentally ended up figuring out what the threading model should be based on. The solution is a new concept I currently call tasks, for lack of a better name.

Before getting to the threading part, the problem I was trying to solve actually has to do with collections and iteration. In smalltalk, neptune and saturn, you iterate through collections using blocks. You pass a block object to the collection and ask it to invoke the block for each of its elements:

class Vector {

...

for(block) {
for (i: 0 .. this.size())
block(this[i]);
}

}

This is in most cases more convenient for both the user and implementer of the collection than using iterators like many other languages. With blocks, all intermediate state can be stored in local variables and on the call stack, whereas with iterators, all intermediate state has to be stored in the iterator object:

iterator() {
return new ListIterator(this);
}

class ListIterator is Iterator {

def collection;
var index = 0;

// constructor

get_next() {
def result = collection[index];
index = index + 1;
return result;
}

has_more() {
return index < collection.size();
}

}

Unfortunately, even though blocks work better than iterators most of the time they have some severe limitations. The classic example of this is the fact that you can’t iterate through two collections simultaneously using blocks. In the OSVM implementations of smalltalk and neptune, collections (at least some of them) provided both block iteration and iterator objects for exactly this reason. That is of course unfortunate: two different implementations of what is essentially the same functionality.

The core of the problem is the following. With iterator objects, iteration is suspended after each call to get_next. You can wait however long you want before calling get_next again, or not do it at all. This for instance means that you can interleave the iteration of two different collections. With blocks, you can’t suspend iteration and then resume it later where you left off. You can abort block iteration in various ways, but then you can’t resume it again.

I had looked at various approaches to solving this problem that didn’t require iterators and only found one that seemed somewhat promising: what they call generators in Python and iterators in C# (and which also exists in other languages). Essentially this is a co-routine-like mechanism that allows you to suspend the execution of a method and then later resume where you left off. For instance, consider method in python:

def ints():
k = 0
while 1:
yield k
k = k + 1

When called, ints executes until it reaches a yield statement at which point it returns a value to the caller. When called again it continues immediately after the yield statement. In this case, successive calls to ints will return the integers starting from zero. The mechanism in C# is relatively similar.

Then I went to Copenhagen and back this weekend (to see Beck play live, one of the oddest concerts I’ve ever seen) and the train ride gave me some time to think about this. The result was the concept of tasks, which is similar to Python and C#’s mechanisms but cleaner.

In the simple case a task is similar to a block, and the syntax is also very similar:

def f = fun -> 4; // block returning 4
def t = task -> 4; // task returning 4

Tasks can be invoked the same way blocks can:

f()
> 4
t()
> 4

One difference between tasks and blocks is that tasks can only be invoked once. Or, to be precise, if a task has once finished executing, it is dead and calling it again will raise an error:

t()
> 4
t()
--> error!

Unlike blocks, tasks can yield values as in python and C#. Yielding a value causes the task to be suspended and the value to be returned to the caller. When the task invoked again it will resume execution immediately after where it yielded before:

def ints = task {
var k = 0;
while (true) {
yield k;
k = k + 1;
}
}

ints()
> 0
ints()
> 1
ints()
> 2
...

Finally, you can ask a task whether it has finished executing or if it has more values:

ints.is_done()
> false

And that’s it, that’s a task.

How does this solve the iteration problem? By allowing block-based iteration to be suspended:

def coll = new Vector("Ni!", "Peng!", "Neeee-wom!");
def iter = task {
for (o: coll)
yield o;
};

iter()
> "Ni!"
iter()
> "Peng!"
iter()
> "Neeee-wom!"

This way, Vector doesn’t have to implement iterators; instead, tasks can be used to allow block iteration to be suspended and resumed at will.

I’ve considered letting tasks take arguments. I can’t think of any uses for that but it does seem appropriate somehow, what with tasks and blocks looking so similar:

def get_sym = task (prefix) {
var k = 0;
while (true) {
yield "${prefix}-${k}";
k = k + 1;
}
};

gen_sym("mip");
> "mip-0"
gen_sum("fab")
> "fab-1"

It’s not too difficult to implement, I just feel silly doing so if it’s completely useless. But maybe I or someone else can come up with a neat use, I hope so.

Originally, on the train to Copenhagen, I didn’t think of tasks as tasks but as generators: objects that generate values. And the keyword wasn’t task, it was gen. But on the way back it occurred to me that maybe this is a more general concept than a generator — it’s more like a piece of work that can be paused and resumed during execution, which is where the name task came from. Also, since a task has its own execution state and stack it is really very similar to a thread. With some more work I think it’s possible to design a pretty decent threading model based on this abstraction. That might be the next thing I’ll work on.

I suspect that tasks will play the same kind of role that selector objects did in OSVM smalltalk and neptune: they’re invaluable in one or two key classes that implement some general abstractions, and then the abstractions can be used instead of the “raw” tasks or selectors.

For

Since my last two posts about for expressions I’ve been working on an implementation. Implementing new features in Saturn is pretty easy, by design, but this feature has taken longer than usual for two reasons. First of all, I really only had the syntax when I started so I first had to come up with a decent semantics. Also, implementing it revealed a bunch of bugs in other parts of the system — for instance, it turns out that mixing tail-call optimization and non-local returns gives some interesting problems. But all that has been fixed. Here’s a method that returns the factors of a given integer

factors(k) -> for (x : 2 .. k, k % x == 0) -> x;
// factors(18) = [2, 3, 6, 9]

It’s not particularly efficient but it is definitely succinct. You can also create infinite data structures:

def primes = for (x: Naturals, x > 1, factors(x).is_empty()) -> x;
// primes = [2, 3, 5, 7, ...]
// primes[100] = 541

Of course you can also use it for plain, everyday stuff like

def all_names = for (person: employees) -> person.get_name()

Compare this with, let’s pick a random language, Java:

List all_names = new ArrayList();
for (Person person: employees)
names.add(person.getName());

It’s just not the same…

Now, how does this actually work? The for expression uses a single method on collections: map_filter (I’m not too happy about this name so if you have a better idea feel free to tell me). This method takes two block arguments, map and filter and returns a new collection which contains the results of map applied to the elements in this collection for which filter returns true. A straightforward implementation could be:

map_filter(map, filter) {
def result = new ArrayList();
for (elm: this) {
if (filter(elm))
result.add(map(elm));
}
return result;
}

Note that for statements (for (...) { ... }) are not the same as for expressions (for (...) -> ...).

All a for expression is, really, is calls to the map_filter method. For instance, the body of the factor function above is expanded into

(2 .. k).map_filter(fun x -> x, fun x -> k % x == 0)

Implementing the map_filter method should be pretty straightforward and the good thing is that the implementer is free to decide how to represent the result. I definitely like this approach better than requiring collections to implement several different methods. Scala, for instance, uses three: map, filter and flatMap.

The freedom to choose the representation of the result is the key to implementing the list of primes: the implementation of map_filter in Naturals applies the map and filter lazily:

class Naturals is Collection {

map_filter(map, filter)
-> new MapFilterCollection(this, map, filter);

operator [index] -> index;

for(block) {
var i = 0;
while (true) {
block(i);
i = i + 1;
}
}

}

class MapFilterCollection is Collection {

def collection, map, filter;

// constructor, etc.

map_filter(map, filter)
-> new MapFilterCollection(this, map, filter);

for(block) {
for (obj: collection) {
if (filter(obj))
block(map(obj));
}
}

operator [index] {
var i = 0;
for (obj : this) {
if (index == i) return obj;
i = i + 1;
}
}

}

This is not an efficient implementation of MapFilterCollection but it should give an idea of how things work underneath: the collection just postpones all the work until you iterate through it or ask for a particular element.

Apart from the name of the method, map_filter, I think that’s a pretty decent approach. There’s still one issue left: what happens when you iterate through several collections at once:

for (x: 0 .. 100, y: 0 .. x, z: 0 .. y,
x + y + z == 50) -> new Tuple(x, y, z)

This expression uses three map_filters, one for each variable, but the result has to be turned into a single collection. I have implemented a solution but I think is still needs a bit of work. I’ll write more about that later.