Back To The Code

How do you do that again?

Alternative to Thread.sleep()

I stumbled upon this in our code base.

long slept = 0;

while (_queue.size() < minCount && slept < timeout) {
  Thread.sleep(100);
  slept += 100;
}

if (_queue.size() < minCount || slept > timeout)
  return new ArrayList<Delivery>();

return drain(timeout - slept);

A thread (publisher) publishes to the queue while another thread (consumer) attempts to drain the queue if the size of the queue is above some threshold. If the queue is below said threshold, the consumer thread will sleep for 100ms and check the size again. If the total sleep time has exceeded some timeout value, we return an empty list. In all other cases we drain the queue into a list and return the list.

There is a cleaner way to do this, and I think most situations where we sleep for some interval and then check some value can be rewritten to use a ReentrantLock with Conditions.

// consume method //

long elapse = timeout + System.currentTimeMillis();

drainLock.lock();
try {
  while (_queue.size() < minCount && System.currentTimeMillis() < elapse)
     newDelivery.await(timeRemaining(elapse), TimeUnit.MILLISECONDS);
} finally {
  drainLock.unlock();
}

if (_queue.size() < minCount)
  return new ArrayList<Delivery>();

return drain(timeRemaining(elapse));

// publish method //

this._queue.add(new Delivery(envelope, properties, body));

drainLock.lock();
try {
  newDelivery.signal();
} finally {
  drainLock.unlock();
}

The JavaDoc is not trivial to understand. One would think that the consumer holding the lock would block the publisher from publishing to the queue. The explanation is that await() atomically releases the lock while the thread waits. So in the example above, when the consumer thread calls await(), it releases the lock, which allows another thread, in this case the publisher thread, to grab the lock and call signal(). The signal() method awakens the waiting consumer, and the consumer thread resumes once the lock is released by the publisher.
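Putting the two halves together, here is a minimal, self-contained sketch of the pattern. The class name, the generic element type and the timeRemaining helper are my own inventions for illustration, and drain() is replaced by a simple poll loop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class DeliveryBuffer<T> {
    private final Queue<T> _queue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock drainLock = new ReentrantLock();
    private final Condition newDelivery = drainLock.newCondition();

    // Milliseconds left until the deadline, never negative (assumed helper).
    private long timeRemaining(long deadline) {
        return Math.max(0, deadline - System.currentTimeMillis());
    }

    public void publish(T delivery) {
        _queue.add(delivery);
        drainLock.lock();
        try {
            newDelivery.signal(); // wake a consumer blocked in await()
        } finally {
            drainLock.unlock();
        }
    }

    public List<T> consume(int minCount, long timeout) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeout;
        drainLock.lock();
        try {
            // Holding the lock between the size check and await() means
            // the publisher's signal() cannot slip in unnoticed.
            while (_queue.size() < minCount && timeRemaining(deadline) > 0)
                newDelivery.await(timeRemaining(deadline), TimeUnit.MILLISECONDS);
        } finally {
            drainLock.unlock();
        }
        if (_queue.size() < minCount)
            return new ArrayList<>();
        List<T> out = new ArrayList<>(); // simplified stand-in for drain()
        for (int i = 0; i < minCount; i++)
            out.add(_queue.poll());
        return out;
    }
}
```

The key property: because the consumer only releases the lock inside await(), a signal from the publisher can never be lost between the size check and the wait.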

From GWT to greener pastures

Google Web Toolkit (GWT) is a development framework for web applications that allows one to write Java code for the web by compiling it to JavaScript. We were using an older version of GWT together with an incubator library for creating tables. These tables became a staple of the entire web application.

We had a few issues:

  • making changes during development meant waiting for 20 seconds to see the effect – GWT has to reload the code in “hosted mode”
  • “hosted mode” is not production mode – regular expressions, for example, work differently in hosted mode (Java environment) and in production mode (JavaScript environment)
  • the incubator library did not support later versions of GWT; we were locked in unless we wanted to convert all table code
  • future support for GWT is questionable (the GWT team is now working on Dart)
  • it is a bit niche – questions and answers are hard to find compared to other frameworks
  • GWT RPC is neither human-readable nor sane

At some point we decided that we should make an effort to make GWT upgradeable, which meant replacing all the tables – which actually meant rewriting the entire application. Given that some of the problems mentioned above would not be alleviated by upgrading, I decided to write a report that weighed the pros and cons of staying with GWT and presented it to our team. Based on this report, we decided to plan how we could migrate away from GWT to a less niche web framework collection.

We would incrementally add new components using the new framework, replace GWT components with the new framework, and eventually remove GWT altogether. Throughout this transition, we would need to facilitate communication between GWT and the new framework in order for GWT to invoke displaying components in the new framework and vice-versa. We created a so-called Native Event Bus, written in JavaScript, which allowed us to write native JavaScript from GWT (via JSNI) that interfaced with this Native Event Bus. The new framework did the same thing. This created a communication bridge between GWT Java code and the new JavaScript-based framework.

Once the Native Event Bus was in place, we were able to create components using the new framework and place these components inside GWT components and vice-versa. We’re no longer developing new components using GWT. Having near-immediate feedback, not having to worry about Java vs. JavaScript, and being able to use sane server communication like REST – simply doing web development in a normal way is just great!

The new framework consists of a collection of libraries such as backbone, asm, jquery and mustache – libraries that had substantial documentation compared to alternatives, enjoyed community support, and were generally accepted within the web development community.


Constructing Master String From Substrings

I stumbled upon a problem where I had to construct a master string m given a corpus of substrings of the master string. In other words, the set S contains strings s such that the test m.contains(s) == true holds for all s. The result had to be constructed from as few substrings as possible. A substring may be used multiple times in order to reconstruct the master string.

m = ABCDEFGHIJKLM

S = { ABCD, AB, CDEF, EFGH, BCDEFG, JKLM, … }

The first step in finding a solution is to find the position of s in m for all s in S. Once we have the positions, we can construct a directed graph whereby an edge connects two consecutive substrings s. We then keep connecting substrings to each other until we have reconstructed the entire master string.

position : substring
0 : ABCD

0 : AB
2 : CDEF
4 : EFGH
etc…

In order to find the solution which uses the least number of substrings, we should traverse our graph using breadth-first search (BFS). Starting from the root node, we visit each child and check if we have satisfied the end condition: the last substring matches the ending of the master string m. If the end condition is not met, we go to the next level of substrings connected to the nodes that we’ve just visited.

When the end condition is met, we can traverse back up the branch in order to collect all the substrings needed. Alternatively, we could keep track of the nodes that we have visited so far, but this seems an unnecessary use of memory.

Some substrings may occur in multiple positions. We should use all the possible positions of a substring when we construct our graph since we’re allowed to use a substring more than once. Note that we do not actually need to fully create the graph; instead, we can keep the strings in a map indexed by their positions and walk through the map as though it were a graph structure.

In the case of missing substrings, e.g., if we have no substrings that contain one of the letters in m, we could try to find the closest match by simply increasing the position counter until we hit the next substring. This means we tolerate a gap in our solution.

Since we use BFS traversal, we are guaranteed to stop when we have the solution with the least number of substrings. Depth-first search (DFS) would result in us traversing all possible branches first before finding the optimal solution, which is not what we want, but the problem can be solved this way too.
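To make the approach concrete, here is a sketch in Java. All names are mine; the BFS state is simply “how many characters of m are covered so far”, parent links let us walk back up the branch, and the gap-tolerance trick is left out for brevity:

```java
import java.util.*;

class MasterString {
    /** Minimal list of corpus substrings that tile master (overlaps allowed),
     *  or null if some character of master cannot be covered. */
    static List<String> reconstruct(String master, Collection<String> corpus) {
        // 1. Index every position in master at which each substring occurs.
        Map<Integer, List<String>> byPos = new HashMap<>();
        for (String s : corpus)
            for (int p = master.indexOf(s); p >= 0; p = master.indexOf(s, p + 1))
                byPos.computeIfAbsent(p, k -> new ArrayList<>()).add(s);

        // 2. BFS over covered-prefix lengths; first time we reach the goal,
        //    we have used the fewest substrings.
        int goal = master.length();
        Map<Integer, Integer> prevCover = new HashMap<>();
        Map<Integer, String> chosen = new HashMap<>();
        Deque<Integer> frontier = new ArrayDeque<>(List.of(0));
        Set<Integer> seen = new HashSet<>(List.of(0));
        while (!frontier.isEmpty()) {
            int covered = frontier.poll();
            if (covered == goal) {
                // 3. Walk the parent links back up to collect the substrings.
                LinkedList<String> result = new LinkedList<>();
                for (int c = covered; c != 0; c = prevCover.get(c))
                    result.addFirst(chosen.get(c));
                return result;
            }
            for (Map.Entry<Integer, List<String>> e : byPos.entrySet()) {
                int start = e.getKey();
                if (start > covered) continue; // would leave a gap
                for (String s : e.getValue()) {
                    int end = start + s.length();
                    if (end > covered && seen.add(end)) {
                        prevCover.put(end, covered);
                        chosen.put(end, s);
                        frontier.add(end);
                    }
                }
            }
        }
        return null; // uncovered gap somewhere in master
    }
}
```

For example, reconstruct("ABCDEF", List.of("ABC", "DEF", "ABCD", "CDEF")) finds a two-substring tiling, while a corpus with no substring covering some character returns null.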

It would be nice to find out if anyone has a better solution – please let me know if you do.

Case Study: Complex Bi-Directional Validation

The problem

Not too long ago, we built a web user interface component that contained a form with three sections. The user was able to select several options from each section. The choices made in section A affected the availability of the choices in sections B and C. Similarly, selecting an option from section B affected the availability of options in sections A and C. The same applied to choices made in section C.

A bit like this

For this example, when the user selects chocolate – apples, melons and caramel are no longer available as choices. Conversely, selecting melons or caramel or apples will render chocolate as disabled. In other words, the choices in each of the sections have a bi-directional effect on the availability of choices in other sections.

Expressing the rules

Since we are likely to add new options, and writing if-else statements is a very tiring process, we express the rules in a human readable form and save them to a file which we can later use as the input to our validator. The rules are expressed as follows:

(flavours):(fruits):(toppings)

Example:

v:a*b:a*s*c // this means that you can combine vanilla with apples 
            // and bananas with almond, sprinkles or caramel toppings

This allows us to easily communicate about which rules are missing or need to be removed. It’s easy to see that we will end up with many rules in order to express the constraints for the different options.

Parsing the rules

Now that we have all the rules stored in a file, we can begin to parse the file into an in-memory data-structure. Maybe something that isn’t quite obvious to begin with is that we can save each rule in a single integer by switching bits on or off corresponding to valid combinations – as long as the sum of the number of choices in the sections doesn’t exceed 32 (or 64 if you’re into longs; larger requirements should use BitSet). For this example, we would reserve 3 bits for each section, which means we have a requirement of 9 bits, leaving the other 23 bits free given that we will work with 32-bit integers.

32-bit integer

[0 ... {flavours} {fruits} {toppings}]

Example:

[0 ... 001 011 111] // which is the rule shown above [0 ... v a*b a*s*c]

In essence, we have created a bit mask for each rule which also corresponds to a unique integer.
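As a sketch of the encoding step – the per-section option orderings below are assumptions I invented so that the example rule maps to the mask shown above:

```java
import java.util.List;

class RuleCodec {
    // Assumed option orders: bit 0 of each 3-bit group is the first letter listed.
    static final List<String> FLAVOURS = List.of("v", "s", "c"); // vanilla, strawberry, chocolate
    static final List<String> FRUITS   = List.of("a", "b", "m"); // apples, bananas, melons
    static final List<String> TOPPINGS = List.of("a", "s", "c"); // almond, sprinkles, caramel
    static final int BITS = 3; // bits reserved per section

    // Turns "a*b" into the bit mask 011; unknown options are not handled.
    static int group(String part, List<String> options) {
        int bits = 0;
        for (String opt : part.split("\\*"))
            bits |= 1 << options.indexOf(opt);
        return bits;
    }

    /** Encodes "flavours:fruits:toppings" into a single int bit mask. */
    static int encode(String rule) {
        String[] parts = rule.split(":");
        return group(parts[0], FLAVOURS) << (2 * BITS)
             | group(parts[1], FRUITS)   << BITS
             | group(parts[2], TOPPINGS);
    }
}
```

With these orderings, encode("v:a*b:a*s*c") produces the mask 001 011 111.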

Validation

Saving these integers into a set V allows us to check whether a combination of selected options is valid or not in constant time. Furthermore, given the current state of selected options as integer c, we can find all possible valid future states by performing bitwise or operations to come up with a superset of valid future selections as integer f. We can then use f to enable/disable checkboxes in the UI. An added bonus is that, in the case of an invalid state, we can suggest the nearest valid state, n, to the user by performing xor operations on c and the integers in V, and picking the state whose xor result has the lowest bit cardinality:

n = argmin( cardinality( xor( v, c ) ) ) for all v in V.
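A sketch of the three operations on the rule set V – the class and method names are mine, and the rule masks in the test are arbitrary examples:

```java
import java.util.HashSet;
import java.util.Set;

class Validator {
    private final Set<Integer> rules; // V: every valid combination as a bit mask

    Validator(Set<Integer> rules) { this.rules = new HashSet<>(rules); }

    /** O(1): is this exact combination a valid one? */
    boolean isValid(int c) { return rules.contains(c); }

    /** f: OR of every rule that the current selection c is a subset of;
     *  the bits set in f are the options still selectable. */
    int futureStates(int c) {
        int f = 0;
        for (int v : rules)
            if ((v & c) == c) f |= v;
        return f;
    }

    /** n: the valid state differing from c in the fewest bits. */
    int nearest(int c) {
        int best = -1, bestCard = Integer.MAX_VALUE;
        for (int v : rules) {
            int card = Integer.bitCount(v ^ c); // xor then bit cardinality
            if (card < bestCard) { bestCard = card; best = v; }
        }
        return best;
    }
}
```

futureStates and nearest are linear in |V|, but each step is a couple of machine-level bit operations, which is where the speed-up over collection types comes from.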

In real life

The initial solution didn’t use bits or bitwise operations; instead we used Java collection types such as HashSet and ArrayList. The performance wasn’t bad, but it certainly wasn’t great either. Moving to this solution decreased the CPU time by a factor of 20. Also, the real solution doesn’t deal with just 3 sections of 3 choices; instead there are 3 sections with varying numbers of choices, requiring 24 bits in total and leaving 8 bits free for choices to be implemented in the future.

What I like about this solution is that it is simple and it has a lot of benefits that I didn’t think of to begin with, such as finding the nearest valid state. This solution was somewhat inspired by reading Programming Pearls by Jon Bentley. Even though the book isn’t the new hotness, it has aged with grace and I found it a truly inspiring read.

Levenshtein Distance using Clojure

I’m pretty much a total beginner when it comes to writing Clojure. In order to familiarize myself more with the language, I decided to try to implement the Levenshtein distance algorithm.

The Levenshtein distance is a metric that measures the number of single-character operations (insertions, deletions and substitutions) required to transform one word into another. For example, it takes 3 operations to transform the word “kitten” into the word “sitting”.

My reason for choosing the Levenshtein distance for learning Clojure is the fact that implementing dynamic programming solutions in functional languages is non-trivial. The solutions to dynamic programming problems usually require populating a matrix for intermediate values that hold the current optimal solution.

I ended up writing three different solutions. I wasn’t really happy with the first two, the last solution is quite ok, in my opinion.

My first solution was to use loop and recur, whereby I pass the current distance table as an argument in the loop. It was pretty much a straight-up rip-off of the pseudo-code implementation given on Wikipedia. Note also that I created a separate function for creating the initial values of the distance table.

(defn init-distance [m n]
  (reduce #(assoc %1 %2 (apply max %2)) {}
    (concat
      '([0 0])
      (for [i (range 1 (inc m))] [i 0])
      (for [j (range 1 (inc n))] [0 j]))))

(defn levenshtein-distance1 [source target]
  (let [m (count source)
        n (count target)]
    (loop [j 1
           i 1
           d (init-distance m n)]
      (cond
        (> j n) (d [m n])
        (> i m) (recur (inc j) 1 d)
        (= (get source (dec i)) (get target (dec j))) (recur j (inc i) (assoc d [i j] (d [(dec i) (dec j)])))
        :else (recur j (inc i) (assoc d [i j] (inc (min
                                                     (d [(dec i) j])
                                                     (d [i (dec j)])
                                                     (d [(dec i) (dec j)])))))))))

Not very elegant. I use conditions to iterate over [i j], generally the code is pretty messy. Also, I’m pretty sure we don’t need a separate function for creating the initial distance values. Using a loop/recur usually means that we can use a map operation instead.

Here comes version 2.

(def dist (memoize (fn [pred index] (let [[i j] index]
                                      (cond
                                        (zero? (min i j)) (max i j)
                                        (pred index) (dist pred [(dec i) (dec j)])
                                        :else (inc (min
                                                     (dist pred [(dec i) j])
                                                     (dist pred [i (dec j)])
                                                     (dist pred [(dec i) (dec j)]))))))))

(defn levenshtein-distance2 [source target]
  (let [pred (fn [index] (let [[i j] index]
                           (=
                             (get source (dec i))
                             (get target (dec j)))))]
    (->> (for [j (range (inc (count target)))
               i (range (inc (count source)))]
           [i j])
      (map (partial dist pred))
      last)))

I removed the loop and replaced it with a recursive call. The init-distance function is no longer needed since I added an extra condition that returns the correct value when either i or j is zero. By calling the named function through memoize, we cache intermediate results, which means that we don’t have to recalculate the same values over and over again in the recursive calls. Still, there is quite a bit of repeated code in that we always call dist with the pred parameter. Another concern is that we don’t really control how the cache of intermediate values is managed, which could lead to huge memory usage.

Let’s take a look at the next and last version.

(defn dist-step [pred d index]
  (let [[i j] index]
    (assoc d [i j]
      (cond
        (zero? (min i j)) (max i j)
        (pred index) (d [(dec i) (dec j)])
        :else (inc (min
                     (d [(dec i) j])
                     (d [i (dec j)])
                     (d [(dec i) (dec j)])))))))

(defn levenshtein-distance3 [source target]
  (let [m (count source)
        n (count target)
        pred (fn [index] (let [[i j] index]
                           (=
                             (get source (dec i))
                             (get target (dec j)))))
        step (partial dist-step pred)
        dist (reduce step {} (for [j (range (inc n)) i (range (inc m))] [i j]))]
    (dist [m n])))

I removed the recursive call; instead, dist-step takes the current form of the distance table as an argument and returns a new table containing the distance value for [i j]. This allows us to use the function as a reducing step, folding a list of indices into a single distance table, which is what we do in the actual levenshtein function. Above all, this version is a lot more readable than the previous two. It also exposes a general strategy for solving dynamic programming problems in functional languages: instead of making the recursive call that is so inherent in dynamic programming, we create a step function that performs one such operation on a piece of data and returns the new data value, and then use it inside a reduce operation.

I’m adamant that there is a much simpler and more elegant way of writing this in Clojure – would love to get some comments and feedback.


A Quick BreakOut Game in JavaScript

Every third Friday, the company that I work at lets us do whatever we want, as long as it is programming-related and helps us become better developers. It had been a while since I had written anything in pure JavaScript, and I was feeling the need to get back into it. I decided to take the opportunity to write BreakOut using the HTML canvas element. The goal was to have something up and running by the end of the day.

The idea was to make a BreakOut game that has support for multiple balls and multiple players as well as bricks of differing ‘hardness’.

I ended up writing this. Here’s the code for the startGame method:

function startGame() {

    Game.ctx = getContext();
    Game.screen = getScreen();

    var screen = Game.screen;

    Events.init();

    var p = new Player();
    p.x = Game.screen.hmiddle - p.w / 2;
    p.y = Game.screen.height - p.h - 5;
    Game.players.push(p);

    for (var i = 10; i < screen.vmiddle; i += 30)
        for (var j = 10; j < screen.width - 60; j += 60)
            Game.bricks.push(new Brick(j, i, randInt(3)));

    Game.balls.push(new Ball(10, Game.screen.vmiddle, 0.3, 0.3));

    // extra ball - weeeeee!  😀
    setTimeout(function () {
        Game.balls.push(new Ball(10, Game.screen.vmiddle, 0.3, 0.3));
    }, 5000);

    step(new Clock());
}

And here’s a quick screenshot of the game running in JSFiddle. I discovered JSFiddle just after I finished writing the game. It looks like a great online development tool, and I look forward to using it more in the future.

Growl Notifications Using Clojure

I just pushed some Clojure code that allows one to create Growl notifications from within Clojure. It works by instantiating an AppleScript engine and sending AppleScript snippets to the engine, which in turn communicates with the GrowlHelperApp.

The implementation is based on the Java version posted here. I just re-wrote it in Clojure, here.

Here are some unit tests to show how it can be used.

(ns growl-clj.test-core
  (:use [growl-clj.core])
  (:use [clojure.test]))

(deftest test-growl-enabled?
  (if growl-enabled?
    (print ">>>>>>>>>>> Growl is enabled")
    (print ">>>>>>>>>>> Growl is NOT enabled")))

(deftest test-register-app
  "Registering application with Growl"
  (let [available-notifications ["system" "user"]
        enabled-notifications ["system" "user"]
        app-name "MyApp"]

    (register-app
      available-notifications
      enabled-notifications
      app-name)))

(deftest test-notify
  "You should see a Growl notification after running this test"
  (let [notification-name "user"
        title "hello from test"
        message "nice! it works"
        app-name "MyApp"]

    (notify
      notification-name
      title
      message
      app-name)))