I’m still working on this

April 23rd, 2007

I finished my 390.  I got an A+ for it.  Rock on.

Now I’m doing a 520 (honours project) in a similar area.  I got a Playstation 3 for it’s multi-core capabilities, and so I’ve been tinkering with that a bit.

Mostly, however, I’ve been working on getting automatic parallelisation working at the theoretical level.  I now have a program which I can give some sequential code to, and it turns it into N different programs which, when executed concurrently, are semantically equivalent.  I’ll post some screenshots of the super-high-tech visualisation GUI I came up with to display some of the intermediate output when I’m at uni next.

Ruby - my least favourite language

April 3rd, 2006

Ruby is horrible. Truly. Having been required to use Ruby in non-trivial situations for COMP313, I feel that I am now in a position to make comments about what I feel are some major flaws in the language design. It’s a pity, really, because it looks like it had the makings of a really good language. A good object system (without the difficulties inherent with Java’s primitives, for example), a large and active user community and a nice clean implementation of the Uniform Access Principle (i.e. properties are just methods with arity 0).

However, despite these positive things, Ruby is the least pleasant language I’ve ever had to use. This mostly comes down to a matter of syntax. Here is some Ruby code I grabbed from the ‘Programming Ruby’ book:

class Song
@@plays = 0
def initialize(name, artist, duration)
@name     = name
@artist   = artist
@duration = duration
@plays    = 0
end
def play
@plays += 1
@@plays += 1
"This  song: #@plays plays. Total #@@plays plays."
end
end

Now, here is the same code snippet, but this time I have taken the liberty of replacing various Ruby syntactic constructs with images of fish:

class Song
@@plays = 0
def initialize(name, artist, duration)
@name     = name
@artist   = artist
@duration = duration
@plays    = 0
end
def play
@plays += 1
@@plays += 1
“This  song: #@plays plays. Total #@@plays plays.”
end
end

Strangely enough, this has actually done nothing to reduce the readability of Ruby code! I’m not just being facetious here (well, not entirely) - if we were to replace these exact same constructs with fish in Java, C++, Python or any other language, we would _significantly_ reduce the readability of the code. Ruby does not have this problem, because all of its bizarre syntactic constructs have no syntactic meaning to begin with! @, @@, $, #, %, ! all suffer from this. Let’s look at another example:

foo = “Hello, World”
foo.chomp!
bar = foo.chomp

Versus:

foo = “Hello, World”
foo.chomp!
bar = foo.chomp

The latter has just as much meaning as the first.  In fact, I find that having ‘!’ littered all over the place reduces readability more than fish.

I believe that Ruby syntax is flawed for this reason.  A few meaningless delimiters are fine because they aid experienced programmers (at the expense of new or uninitiated programmers), but Ruby code is meaningless and unreadable if you do not know Ruby.  I cannot think of any other language in common use today which suffers from this problem (or at least not to this extent).

The second problem I have with Ruby is the type system.  This problem is not unique to Ruby, but it is particularly pronounced.  Dynamic typing can work fine, but Ruby’s strongly and dynamically typed approach is horrible.  If you’re going to do dynamic typing, at least build in some useful conversion and coercion.  Why on earth should I have to explicitly do type checking on parameters (think: raise “This argument is of the wrong type!” if !x.kind_of?(SomeType)), and have to explicitly call to_s methods (toString for Java-natives)?  It makes no sense!  I am being forced to a lot of extra work as a programmer.  To me, this is the sign of a poorly designed programming language.  Let’s think about this one.  I’m manually writing argument type-checking, and yet operands like ‘+’ are very picky about their parameter types.  INFER THE GODDAMNED TYPE INFORMATION, OR PUT IN EXPLICIT STATIC TYPE CHECKING!  It’s not that hard.  Either way, the current implementation leads to programmers doing more work than they should have to.  I personally favour a militant type system (SML!).  If you look at your average block of ML code, there is very little type information which cannot be inferred:

fun foobar (x) = x + 37

In case the compiler can’t work it out (or I don’t want it to work it out), I can add additional explicit type annotations (and these type annotations can go anywhere - x is a string, foobar returns a float, 37 is an Andalusian dog - that sort of thing).  The static type checker goes through and makes sure that it all makes sense.  Your type annotations must agree with both what is written (37 is plainly not an Andalusian dog), and with the know types for operators (’+’ requires homogenous arguments, and the operator is only defined for floats and integers.  Because 37 is an integer, x must be an integer too!).  Because the type of the body of the function is known, we know the return type of the function too!  The above has absolutely no explicit type information, and yet an SML compiler can tell me at compile time if I’m trying to pass incorrect parameter types to a function.  We can go even further with static assertion checking!

Ruby has none of this.

Finally (there were a lot more things I thought of today, but I can’t remember half of them - I’ll add them as they come to me!) - why is ++ not defined for integers?  I mean, Ruby has every piece of syntactic sugar under the sun, and no increment operator?  What the hell?

Syntax for the masses

March 21st, 2006

Trying to figure out just how my web language is going to look is difficult, so my hope is that my new-found Planet WLUG readership will be able to help me pick.

Functions

Option 1:  Familiar imperative syntax with explicit argument types.  Function return type can be inferred.

function fooBar(string bar) {
        return “Foo! ” . bar;
}

Option 2:  Familiar functional syntax

function fooBar (string bar) = “Foo! ” . bar;

Option 3: Python-esque

function fooBar (string bar):
        return “Foo! ” . bar

Option 4: Some combination?

Records/”Objects”

type User { string name, int age };

User(”Gian”, 84); // Create a new user within this domain:

User(”Gian”, 84) <- User("Gian", 35); // Modify the object on the LHS.

/* The above expressions return the resultant object, as well as modifying program state. */

// A search type:

User $foo;

$foo.weight >= 30;
$foo.name ~= “G*”;

$foo <- User("Googoogah", _); // Update the 'name' field in matching records.

$foo <- User(); // Delete all matching records.

displayUser($foo); // Call ‘displayUser’ once per matching record, with that record as a parameter.

// $foo has type ‘User search’

Relational searching possible:

User $foo;
Dog $bar;

$bar.name == “Sparky”;
$foo.name == $bar.ownerName;

/* The above is lazily evaluated, so evaluation of $bar first or $foo first works
in the correct way */

A Basic Demonstration App

The template file: ownerInfo.tmpl

(some HTML and a call to ownerInfo(Request.dogName)

The function:

function ownerInfo(string dogName) {
        Person $owner;
        Dog $dog;
        $dog.name == dogName;
        $owner.name == $dog.ownerName;

        return displayOwner($owner);
}

function displayOwner(Person owner) {
        return “Name: ” . owner.name . “\n” .
                  ”Age: ” . owner.age . “\n”;
}

Fin.

We all do crazy things…

March 20th, 2006

So, I’ve set this blog up as a way of recording the development process for my COMP390 directed study project, as one day in the future I will have to write a report detailing all of this. With that in mind, I will attempt to lay things out from a high-level perspective:

What:

I’m writing a distributed web application framework. That is, the environment and tools and technologies to allow people to easily write massively scalable dynamic web applications.

Why:

I think by now that it should be common knowledge that PHP sucks. Big time. If you don’t know this, then you probably have never been forced to use PHP for anything non-trivial. Despite this fact, people still write big applications in PHP. This is fine when it’s just you and your friends using the site, but once it becomes popular, you will have a problem. Now, considering that you’ve used PHP, you’ve probably also used MySQL. If you were really sensible, you might have written a simple database abstraction layer. You could perhaps add additional (or bigger) servers at the front-end (running your PHP application), but all that does is put additional strain on your database backend, and probably has very little in the way of tangible benefits.

Web applications very rarely deal with more than a very small subset of the possible operations within a relational database. The success of frameworks such as Ruby on Rails suggests that the majority of users really only need the basic CRUD operations (create, read, update, delete). The seperation between the web application and the database is usually a very artificial one within the context of web application development. It seems to be a tradition that persists because that’s the done thing. The datasets which applications deal with are rarely tied to a particular method of representation.

How:

So, in writing my framework, I have three major goals:

  1. Provide a suitable development platform for web applications that are bigger than PHP/MySQL and smaller than Oracle.
  2. Take good advantage of the increasingly parallel computing power available in commodity hardware (think: dual cores/cell processors) while providing the ability to also scale across hosts.
  3. Remove the artificial division between the application and the database. By combining the two, you can take advantage of complete control over data and logic to enact highly aggressive (read: reaching a theoretical maxima in terms of efficiency) caching schemes on very dynamic data.

A web application, in my experience, generally boils down to:

  • A reasonably simple dataset (which may be large or small).
  • A whole bunch of simple logic to take data and perform some sort of business logic, or to prepare it for display.
  • A huge amount of display code, sometimes featuring a small amount of display logic.

Traditionally, the divisions between the first component and the second two is quite clear. The data is in the database, it gets queried out by the business logic and transformed for display. The second two pieces are more often than not munged together in some horrible way, or seperated via some basic templating mechanism which is usually very code-heavy and provides very little gain for the programmer (think: smarty).

Taken on its own, “a whole bunch of simple logic to take data and perform some sort of business logic” sounds much like a pure functional operation (think Haskell or ML functions, rather than C or PHP). A function f : x -> y without any side effects. In type theory, a displayUser function could take a User as a parameter and return a string which was the users’ details formatted as a nice HTML output, for example. Ignoring exactly where that User object comes from for now, this function is pure. It does nothing to modify the environment in which it runs. The program state S remains unaltered before, during and after execution. This means that given the same input, our displayUser function will always return the same output. This is a very useful piece of information from a caching perspective, and it forms the basis of both the aggressive caching and the ultra-scalability that takes place within this system.

If the User object were coming from a database, all the advantage we gained from knowing that the function is pure and effect-free is lost. A normal PHP application will have a bunch of inline database queries and then a bunch of code to perform some sort of computation or to prepare the result for display. Because the possibility of the database being modified exists (either by another part of your application or by an external user), the sorts of caching schemes which can be implemented are feeble at best, usually resorting to query-cache style caching.

Within my system, data objects are tied to functions explicitly. The domain of a function f will consist of a number of objects. An object may be part of any number of function domains, but it cannot exist independent of them. The domains of which a given object is a member of is determined programatically, and I envisage this information being able to be inferred statically at compile time, or at worst it will be defined by the user upon entering the data. By having explicit links between logic and data (and yet ensuring they still remain seperate), our caching scheme no longer has the uncertainty of external modification. If an object is modified, it is a simple matter of notify all upstream caches within the dependency tree, simply propagating upwards and invalidating the caches. Once again, the depedency tree can be computed statically at compile time, so the computational expense associated with this is minimal (and is far outweighed by the computational savings from the effectiveness of the caching scheme). Updates to objects can be done in a ‘mostly-functional’ manner. If a function attempts to take as input a given object from its domain and modify it, then its range must wholly intersect its domain, meaning that the result of the computation modifies the program state S, but the operation is still type-checked, and the appropriate cache-invalidation may take place. Update operations can be sequenced, which becomes important when we reach the topic of concurrency.
That’s all good and well, but what happens when you want to access an external datasource, for example, an external web service, or a file on the disk? The strategy I intend to employ is to have explicit ‘impure’ declarations for functions which can never be cached reliably. Once again, the pure/impure status of a function can be statically determined at compile time, and could potentially provide a basic mechansim for sandboxing and user security. By knowing where they external data sources lie, we do not waste computational effort on attempting to cache the uncacheable, which is yet another flaw of basic “dumb” caching schemes employed elsewhere.

So, assuming that a programmer has defined a number of objects (and corresponding custom types for these objects), a number of processes which can act on input from the caller and upon objects within their domain (and some which may modify the program state), then all that is required is a basic templating system which has the ability to output content verbatim, or to output the result of a given function (for example, displayPage(Request.id)), where Request.id is some user-provided input to the page. These templates could potentially be JIT compiled.

The benefit of enforcing this seperation of logic and display is that a) the usual issues associated with doing I/O in functional programming languages no longer apply, as the order of evaluation is not important, and b) all purely functional processes can be distributed and duplicated across as many distinct hosts or processors as you like. Exactly which processes can be distributed can be determined to within some infinitesimally small quantity away from optimum, because of the existence of a multi-headed dependency tree. The usual concurrency issues of deadlocks and thread starvation and the like are eliminated because program state is not being modified. And best of all, all of this is achievable without any extra work for the programmer!

Where:

Here, of course!  And at the University of Waikato Computer Science department.

Anyway. I think I’ve written enough today.  I do hope that it’s at least partially comprehensible when I come to try to read it back later.  Perhaps tomorrow I will post some pretty diagrams explaining the process distribution method and some samples of what I think the programming and template languages should look like!