Wikipedia

Notice anything odd about the picture on the right? It’s a piece of the wikipedia front page which shows that one of the handful of languages with more than 100 000 articles is, wait for it, Volapük, a language with an estimated 25-30 speakers worldwide. Hmm…

Obviously the vast majority of these articles have been created by a bot and it turns out that the culprit is SmeiraBot, which automatically translates English articles into Volapük. Most of the articles are about locations, for instance small towns in the US, and range from stubs a few lines long to longer articles of unknown quality to some mostly un-translated articles like this one about Monkeys Eyebrow, KY.

There is a proposal to close this wikipedia and I have to agree: it should be shut down. It exploits the wikipedia project to get attention for someone’s hobby language and devalues the effort that has gone into creating the other local wikipedias. However, the proposal apparently fizzled out so the “vükipedia” may survive. Hrmph.

On the subject of wikipedia, doesn’t it seem suspect when someone with an IP from within one of the largest Danish banks makes an article about a scandal in one of that bank’s subsidiaries less negative? See for instance these edits. Not okay!

Did you mean…?

Boy that firefox spell checker sure read my mind!


As seen on the Daily WTF.

Update: this has been fixed.

Fabaceae

A few weeks ago I needed a few integer values whose hexadecimal representations were clearly recognizable when debugging: magic numbers. I searched the web for a list of good magic numbers, which I was sure would exist, but couldn’t find one. I ended up using 0xBADDEAD and 0xBEEFDAD, which I thought were pretty good.

Then I made a mistake: I decided that I should make that list of magic numbers if no one else would. It wasn’t until I was done that I realized the depth of pointlessness of the thing I had just done. But here it is: a list of all hexadecimal words that are meaningful in the English language, in the form of a google spreadsheet. As proof of the words’ meaningfulness each word has a link to a dictionary or encyclopedia describing the meaning. Most of them are pretty obscure, like dace, a small fish, or eba, a food eaten in West Africa. However, some are pretty meaningful (scroll down to see the list, for some reason blogger messes up the formatting):

ace A playing card
bad Not good; unfavorable; negative
bead A small decorative object
bee A flying insect
beef Culinary name for meat from bovines
cab A type of public transport
cafe A coffee-shop
dad Father
dead Bereft of life
deaf Insensitive to sound
decade A period of 10 years
decaf Decaffeinated coffee
fabaceae A family of flowering plants
facade The front of a building
face The front part of the head

The next time I or anyone else has to come up with a magic number it should be a lot easier. I’ll try to ignore the fact that I probably spent ten times as long compiling this list as any programmer spends inventing magic numbers in a lifetime.
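For the record, most of the mechanical part of this could be scripted; the slow part is checking that each word actually means something. Here is a minimal Python sketch, assuming you already have a word list to feed it. The names hex_words and as_magic_number and the sample list are made up for illustration.

```python
import re

# A "hex word" uses only the letters that double as hexadecimal digits.
HEX_WORD = re.compile(r"[a-f]+")


def hex_words(words):
    """Return the words that use only the letters a-f, lowercased and sorted."""
    found = {w.lower() for w in words if HEX_WORD.fullmatch(w.lower())}
    return sorted(found)


def as_magic_number(*words):
    """Join hex words into one literal, e.g. ("bad", "dead") -> "0xBADDEAD"."""
    literal = "".join(words)
    int(literal, 16)  # raises ValueError if a non-hex letter slipped in
    return "0x" + literal.upper()


# Hypothetical sample input; a real run would read a dictionary file instead.
sample = ["ace", "beef", "cafe", "decade", "facade", "zebra", "Dad", "hex"]

print(hex_words(sample))
print(as_magic_number("bad", "dead"))
print(as_magic_number("beef", "dad"))
```

The filter only tells you which words are spellable in hex, not whether they are any good as magic numbers, so the dictionary links in the spreadsheet still had to be collected by hand.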

Update

Jeez, it’s been months since I last posted. Partly I’ve been busy and partly I just haven’t had anything to say. But now I’ve come up with a few half-interesting links to post so I thought I’d give a sign of life.

First, in case you haven’t seen it already, there’s this brilliant flash game that I’ve wasted far too much time playing: gravity pods.

Second, I keep meeting people who have never read Olin Shivers’ acknowledgements in the scheme shell reference manual. If you haven’t already, go read them.

Finally, I came across this great description of a person, from Tim Moore’s Frost on my Moustache, which had me laughing out loud on the train. It is Lord Dufferin’s description of his butler, Wilson:

Of all the men I ever met he is the most desponding. Whatever is to be done, he is sure to see a lion in the path. Life in his eyes is a perpetual filling of leaky buckets and a rolling of stones uphill. He brushes my clothes, lays the cloth, opens the champagne, with the air of one advancing to his execution. I have never seen him smile but once, when he came to report to me that a sea had nearly swept his colleague, the steward, overboard.

If you think that’s funny you should read the whole book.

What The F—

Jobs

If you’re interested in programming languages and their implementation (and since you’re reading this there’s a good chance that you are) you may be interested to know that google is currently looking for software engineers for our Århus office:

Google Århus is hiring software engineers to design and implement high performance virtual machines. We are looking for world-class software developers who know how to create robust and optimized system software. You should have a good understanding of programming languages and practical experience with implementing them.

Just, you know, FYI.

JavaFX

Whaddayaknow, Sun has created a new “family of products”, JavaFX. This thing promises to “simplify and speed the creation and deployment of high-impact content for a wide range of devices”. The benefits are that it “reduces integration costs, improves device software consistency, and enables handset manufactures to provide new offerings with substantially faster time-to-market”. You have to hand it to Sun’s marketing people: that’s top-grade bullshit.

But if you look beyond the marketing crap, one member of this “family of products” is a brand new programming language, JavaFX Script. Not a lot has been written about it yet but I did manage to find this article that gives a quick introduction.

It is not a general-purpose language but:

[…] designed to optimize the creative process of building rich and compelling UIs leveraging Java Swing, Java 2D and Java 3D for developers and content authors.

Behind the superficial similarities with Java and JavaScript this new language has some very interesting features of its own. Here are a few of the features that caught my interest.

It is an object oriented language (runs on the JVM) but not everything is an object.

Arrays represent sequences of objects. In JavaFX arrays are not themselves objects, however, and do not nest. Expressions that produce nested arrays […] are automatically flattened […]

Hmm…

Arrays have a special status in the JavaFX language and are supported by some special syntactic constructs, for instance

var ints = [1, 2, 3, 4, 5];
insert 10 after ints[. == 3];
// ints is now [1, 2, 3, 10, 4, 5]

Note that the language is not dynamically typed, as the var declaration might seem to suggest; the ints variable is statically inferred to be an array of integers. Also note the use of a predicate to identify an element of an array. This is pretty neat:

var smallInts = ints[. < 4];
// smallInts is now [1, 2, 3]
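To pin down what these array operations do, here is a rough Python analogue of the three behaviours mentioned so far: flattening, selecting by predicate and inserting after a matching element. It is only a sketch of the semantics as described above, not of how JavaFX implements them. The helper names are invented, and the text doesn’t say what happens when several elements match the predicate of an insert, so this version assumes only the first match counts.

```python
def flatten(items):
    """Flatten nested lists, mimicking arrays that never nest."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def select(items, predicate):
    """Analogue of ints[. < 4]: keep the elements matching the predicate."""
    return [x for x in items if predicate(x)]


def insert_after(items, value, predicate):
    """Analogue of `insert value after items[predicate]`.

    Assumption: only the first matching element is used.
    """
    result = list(items)
    for index, item in enumerate(result):
        if predicate(item):
            result.insert(index + 1, value)
            break
    return result


ints = [1, 2, 3, 4, 5]
ints = insert_after(ints, 10, lambda x: x == 3)
small_ints = select(ints, lambda x: x < 4)

print(ints)
print(small_ints)
print(flatten([1, [2, 3], [[4], 5]]))
```

The difference is that in JavaFX the predicate is part of the indexing syntax, with the dot standing for the current element, whereas here it has to be spelled out as a lambda.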

JavaFX distinguishes between procedures and pure functions. There is a special syntax for declaring a pure function:

function dist(x, y) {
    var xSqr = x * x;
    var ySqr = y * y;
    return sqrt(xSqr + ySqr);
}

A pure function declaration must be a sequence of variable declarations and a single final return statement. I guess they check statically that all functions called from within a pure function are also pure. If a procedure has side effects it must be declared with an operation declaration.

There are list comprehensions (array comprehensions?) with a relatively standard syntax:

select n * n from n in [1 .. 100]

Is it really so hard to find a list comprehension syntax where the declaration of the variable doesn’t come after its use? Apparently…

While the syntax is Java-like there are some syntactic differences from other languages in the C family. For logical operators they use the keywords and, or and not rather than &&, || and !. They also don’t mandate parentheses around conditions in if and while statements. Those are great changes, especially the logical operator keywords; this,

not (a == null or a.isEmpty())

reads much better than

!(a == null || a.isEmpty())

Also, if the precedence is right, it means that you can write

if (not x instanceof Foo) ...

rather than

if (!(x instanceof Foo)) ...
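Python is one language where the precedence does work out this way, so it makes a handy illustration of what “right” means here: not binds more loosely than comparison and membership tests. This is a sketch in Python, not JavaFX; the class Foo and the variables are placeholders.

```python
class Foo:
    pass


x = "not a Foo"
items = [1, 2, 3]

# `not` has lower precedence than `in` and `==`, so no parentheses are needed.
is_missing = not 4 in items          # parsed as not (4 in items)
is_different = not x == "a Foo"      # parsed as not (x == "a Foo")
is_not_foo = not isinstance(x, Foo)  # closest Python spelling of the instanceof test

print(is_missing, is_different, is_not_foo)
```

If not bound tighter than the test, as ! does in C and Java, the first line would ask whether the value False is in the list, which is a different question entirely.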

Another great feature is string interpolation; you can write stuff like

var answer = true;
var str = "The answer is {if answer then "Yes" else "No"}.";
// str is now "The answer is Yes."

I’ve used string interpolation in various languages and it’s great to have.

There is a reflective interface that looks a lot like Java’s except that it looks more straightforward and easier to use. There are a few extensions; a particularly spicy one is an operation that yields the extent of a class. That’s right, the syntax *:Foo yields an array of all instances of class Foo:

// Print all strings in this program
for (str in *:String)
    System.out.println(str);

The clever reader might wonder if the result of *:Array contains itself since the result is an array? The really clever reader remembers that arrays aren’t objects and so can’t be enumerated by *:.

Why any language should have such a feature is a mystery to me and, it seems, also to them:

Note: this is an optional feature and is disabled by default.

Hopefully they will come to their senses and remove it completely before it’s too late and they have to maintain an implementation.

There are many more interesting features in this language which you can read about in the language reference. An implementation can be downloaded from their website; it runs on the JVM.

Overall it looks like a language with its very own peculiar flavor not quite like any languages I know. It’ll be interesting to see how well it succeeds in attracting programmers.

sh

Here’s an interesting tidbit for the code style fanatics out there. I recently found the source code of the original Bourne shell, sh (the father of bash), as shipped with Version 7 Unix. Wow. The code style is so awful that it’s almost a work of art. How bad is it? Well, I think a few lines from mac.h give a pretty good impression:

#define IF   if(
#define THEN ){
#define ELSE } else {
#define ELIF } else if (
#define FI   ;}

Here’s a bit of the resulting code (with pseudo-keywords highlighted):

FOR m=2*j-1; m/=2;
DO  k=n-m;
    FOR j=0; j<k; j++
    DO  FOR i=j; i>=0; i-=m
        DO  REG STRING *fromi; fromi = &from[i];
            IF cf(fromi[m],fromi[0])>0
            THEN break;
            ELSE STRING s;
                 s=fromi[m];
                 fromi[m]=fromi[0];
                 fromi[0]=s;
            FI
        OD
    OD
OD

As they would say over at the daily WTF: My eyes! The goggles do nothing! It’s C code alright, but not C code as we know it. It’s known as Bournegol, since it’s inspired by Algol. By the way, IOCCC, the International Obfuscated C Code Contest, was started partly out of disgust with this code.
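To see what the compiler is actually handed, you can expand the macros by hand. Here is a small Python sketch that does plain whole-word substitution using only the five definitions quoted from mac.h above. It is a simplification of what the C preprocessor does (it knows nothing about string literals or comments, for instance), and the other pseudo-keywords such as FOR, DO and STRING are left alone because their definitions aren’t shown here.

```python
import re

# The five macros quoted from mac.h, as name -> replacement text.
MACROS = {
    "IF": "if(",
    "THEN": "){",
    "ELSE": "} else {",
    "ELIF": "} else if (",
    "FI": ";}",
}

# Match macro names only as whole words, so e.g. "FIX" is left untouched.
MACRO_NAME = re.compile(r"\b(" + "|".join(MACROS) + r")\b")


def expand(source):
    """Replace each whole-word macro name with its replacement text."""
    return MACRO_NAME.sub(lambda match: MACROS[match.group(1)], source)


bournegol = (
    "IF cf(fromi[m],fromi[0])>0 THEN break; "
    "ELSE STRING s; s=fromi[m]; fromi[m]=fromi[0]; fromi[0]=s; FI"
)
print(expand(bournegol))
```

Notice that FI expands to a semicolon followed by a closing brace, which is why the last statement before FI in the original code can get away without a semicolon of its own.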

Lucky

Some time last week, I don’t know exactly when, a change occurred on the Danish version of the Google front page. Here it is before the change:


and here it is after:


See the difference? They’ve changed the translation of I’m feeling lucky from Jeg føler mig heldig to Jeg prøver lykken. The original phrase is more or less a word-for-word translation of the English phrase, literally “I feel myself lucky”. To a Danish speaker this phrase sounds about as odd as the direct translation of the new phrase would to an English speaker: “I’ll try happiness”. It sort of makes sense, and you might imagine a situation where someone might use it, but it’s certainly not a common expression. So I was very happy to hear that they were considering changing it, and now they have. Well, they’re actually not quite done yet: the old phrase is still used in a few places, including the personalized homepage, but they’re getting there.

Bytecode

So, it’s been a month and a half since I last posted anything. It’s not that I’ve given up writing here; it’s just that half of what I’m doing these days I’m not allowed to tell anyone about and the other half I’m too busy doing to write about. However, I came across a useful compiler implementation technique that I thought I’d take the effort to write about. It has to do with abstract syntax trees pretending to be bytecode (and vice versa).

I am, as usual, playing around with a toy language, the second one since neptune died. This time I thought I’d shake things up a little and implement both the compiler and the runtime in C++ rather than Java. My first attempt at a compiler was straightforward: the parser read the source and turned it into a syntax tree, the syntax tree was used to generate bytecode, and the bytecode was executed by the runtime.

It turns out, however, that the obvious approach is pretty far from optimal. The compiler is really stupid and doesn’t actually use the syntax tree except to do a simple recursive traversal when generating the bytecode. On the other hand, the syntax tree would be really handy if I wanted to JIT compile the code at runtime, but at that point it is long gone and there is only the bytecode left.

I tried out different approaches to solving this problem and ended up with something that is a combination of bytecode and syntax trees. In particular, it has the advantages of both bytecode and syntax trees at the cost of a small overhead. Here’s an example. The code:

def main() {
    if (1 < 2) return a + b;
    else Console.println("Huh?");
}

is turned by the parser into the following code:

Syntax Tree Code (entry point: 49)
0 literal 1
2 literal 2
4 invoke @0.<(@2)
9 slap 1
11 if-false ~17
13 global "a"
15 global "b"
17 invoke @13.+(@15)
22 slap 1
24 return @17
26 goto ~13
28 global "Console"
30 literal "Huh?"
32 invoke @28.println(@30)
37 slap 1
39 if-else @4 ? @24 : @32
43 pop 1
45 literal #
47 return @45
49 block [@39 @47]

It is essentially a bytecode format with some extra information that allows you to reconstruct the syntax tree. A bytecode interpreter would execute it by starting at instruction 0 and ignoring the extra structural information:

0 literal 1
2 literal 2
4 invoke @0.<(@2)
9 slap 1
11 if-false ~17
13 global "a"
15 global "b"
17 invoke @13.+(@15)
22 slap 1
24 return @17
26 goto ~13
28 global "Console"
30 literal "Huh?"
32 invoke @28.println(@30)
37 slap 1
39 if-else @4 ? @24 : @32
43 pop 1
45 literal #
47 return @45
49 block [@39 @47]

On the other hand, if you want to reconstruct the syntax tree you can read the code backwards, starting from the last instruction. In this case the last instruction says that the preceding code was generated from a statement block with two statements, the ones ending at 39 and 47. If you then jump to position 39 you can see that the code ending there was generated from an if statement with the condition ending at 4, the then-part ending at 24 and the else-part ending at 32. Just like an interpreter would ignore the extra structural information, a traversal will ignore the instructions that carry no structural information such as goto and pop:


 /-->  0 literal 1
 +-->  2 literal 2
 \---  4 invoke @0.<(@2)         <---\
       9 slap 1                      |
      11 if-false ~17                |
      13 global "a"        <--\      |
      15 global "b"        <--+      |
 /--> 17 invoke @13.+(@15) ---/      |
 |    22 slap 1                      |
 \--- 24 return @17              <---+
      26 goto ~13                    |
 /--> 28 global "Console"            |
 +--> 30 literal "Huh?"              |
 \--- 32 invoke @28.println(@30) <---+
      37 slap 1                      |
/---> 39 if-else @4 ? @24 : @32  ----/
|     43 pop 1
|     45 literal #
+---> 47 return @45
\---- 49 block [@39 @47]

Here I’ve tried to illustrate the way the instructions point backwards using puny ASCII art. Anyway, you get the picture.
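The backwards traversal is easy to try out on the listing above. What follows is a minimal Python sketch, not the C++ implementation: the dictionary encoding, the Ref class and the function names are all made up for illustration, and the literal # is simply kept as the string "#". It builds nested tuples purely so that the result can be printed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A backward pointer (written @n in the listing) to the instruction at n."""
    pos: int


# position -> (opcode, operands), transcribed from the listing above.
# Jump targets (~n) and counts are plain ints; only Ref operands carry structure.
CODE = {
    0: ("literal", [1]),
    2: ("literal", [2]),
    4: ("invoke", [Ref(0), "<", Ref(2)]),
    9: ("slap", [1]),
    11: ("if-false", [17]),
    13: ("global", ["a"]),
    15: ("global", ["b"]),
    17: ("invoke", [Ref(13), "+", Ref(15)]),
    22: ("slap", [1]),
    24: ("return", [Ref(17)]),
    26: ("goto", [13]),
    28: ("global", ["Console"]),
    30: ("literal", ["Huh?"]),
    32: ("invoke", [Ref(28), "println", Ref(30)]),
    37: ("slap", [1]),
    39: ("if-else", [Ref(4), Ref(24), Ref(32)]),
    43: ("pop", [1]),
    45: ("literal", ["#"]),
    47: ("return", [Ref(45)]),
    49: ("block", [Ref(39), Ref(47)]),
}


def tree(pos, code=CODE):
    """Rebuild the subtree whose code ends at pos by following the @ references."""
    opcode, operands = code[pos]
    children = [tree(o.pos, code) if isinstance(o, Ref) else o for o in operands]
    return (opcode, *children)


def reachable(pos, code=CODE):
    """Positions visited by the backward traversal starting at pos."""
    seen = {pos}
    for operand in code[pos][1]:
        if isinstance(operand, Ref):
            seen |= reachable(operand.pos, code)
    return seen


syntax_tree = tree(49)  # 49 is the entry point given in the listing
skipped = sorted(set(CODE) - reachable(49))

print(syntax_tree)
print([(pos, CODE[pos][0]) for pos in skipped])
```

The positions the traversal never touches are exactly the slap, if-false, goto and pop instructions, which is the flip side of the interpreter ignoring block and if-else.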

When you want to traverse the syntax tree embedded in a piece of code you don’t actually have to materialize it. Using magical and mysterious features of C++ you can create a visitor that gives the appearance of visiting a proper syntax tree when it’s really just traversing the internal pointers in one of these chunks of syntax-tree-/byte-code.

Pretty neat, I think. Having a closed syntax tree format (by which I mean one that doesn’t use direct memory pointers) means that you can let it flow from the compiler into the runtime, so you can dynamically inspect the syntax tree for a piece of code. It’s also easy to save in a binary file or send over the network.

More importantly it allows syntax trees to flow from running programs back into the compiler at runtime. One of the things I’m experimenting with this time is metaprogramming and representing code as data. In particular, I’m going for something in the style of MetaML but with dynamic rather than static typing. For that, this format will be very handy.