Monday 28 December 2015

Take that Turing and Church!!!



Halting problem is undecidable ala Dr Seuss:   

No program can say what another will do. 
Now, I won’t just assert that, I’ll prove it to you:
I will prove that although you might work til you drop,
you can’t predict whether a program will stop.

Imagine we have a procedure called P
that will snoop in the source code of programs to see
there aren’t infinite loops that go round and around;
and P prints the word “Fine!” if no looping is found.

You feed in your code, and the input it needs,
and then P takes them both and it studies and reads
and computes whether things will all end as they should
(as opposed to going loopy the way that they could).

Well, the truth is that P cannot possibly be,
because if you wrote it and gave it to me,
I could use it to set up a logical bind
that would shatter your reason and scramble your mind.

Here’s the trick I would use – and it’s simple to do.
I’d define a procedure – we’ll name the thing Q –
that would take any program and call P (of course!)
to tell if it looped, by reading the source;

And if so, Q would simply print “Loop!” and then stop;
but if no, Q would go right back to the top, 
and start off again, looping endlessly back,
til the universe dies and is frozen and black.

And this program called Q wouldn’t stay on the shelf;
I would run it, and (fiendishly) feed it itself.
What behaviour results when I do this with Q?
When it reads its own source, just what will it do?

If P warns of loops, Q will print “Loop!” and quit;
yet P is supposed to speak truly of it.
So if Q’s going to quit, then P should say, “Fine!” –
which will make Q go back to its very first line!

No matter what P would have done, Q will scoop it:
Q uses P’s output to make P look stupid.
If P gets things right then it lies in its tooth;
and if it speaks falsely, it’s telling the truth!

I’ve created a paradox, neat as can be –
and simply by using your putative P.
When you assumed P you stepped into a snare;
Your assumptions have led you right into my lair.

So, how to escape from this logical mess?
I don’t have to tell you; I’m sure you can guess.
By reductio, there cannot possibly be
a procedure that acts like the mythical P.

You can never discover mechanical means
for predicting the acts of computing machines.
It’s something that cannot be done. So we users
must find our own bugs; our computers are losers!



Geoffrey K. Pullum

Monday 23 November 2015

Currying in Kotlin

As an example consider summing over a range of integers applying a provided function on each. Something like
fun sigma(range: IntRange, f: (Int) -> Int): Int {
    var result = 0
    for (i in range) result += f(i)
    return result
}
and we have
println(sigma(1..5) { x -> x  })    // prints: 15
println(sigma(1..5) { x -> x * x }) // prints: 55
We would like to to fix f() so that we could define:
val sigmaOfSum = sigma { x -> x }
val sigmaOfProduct = sigma { x -> x * x }
without committing to a range, such that later on, we can invoke
println(sigmaOfSum(1..5))     // prints: 15
println(sigmaOfProduct(1..5)) // prints: 55
This can be done in straight Kotlin:
fun sigma(f: (Int) -> Int): (IntRange) -> Int {
    fun applyF(range: IntRange): Int {
        var result = 0
        for (i in range) result += f(i)
        return result
    }
    return ::applyF
}
Here, the (higher order) function sigma takes f() as a parameter and returns a function that applies f() on the provided range.
A more down to earth example; suppose we want to perform a DB lookup. In principal, we need specify a key/query and the th DB instance but we want to break these two things apart in the spirit of the first example:
fun lookUp(db: Map): (String) -> String? {
    fun dbLookup(key: String): String? {
        return db.get(key)
    }
    return ::dbLookup
}
now the following works:
val customerLookup = lookUp(mapOf("1" to "Customer1", "2" to "Customer2"))
val productLookup = lookUp(mapOf("1" to "Product1", "2" to "Product2"))

println(customerLookup("1")) // prints Customer1
println(productLookup("1"))  // prints Product1
Our lookUp() function is quite simple and can be collapsed to:
fun lookUp(map: Map): (String) -> String? = { map.get(it) }   

Friday 20 November 2015

Class delegating in Kotlin

The Kotlin language documentation is somewhat patchy and the class delegating functionality is under documented. To see what it does it's best to look at an example:
class Customer()
class Product()

interface CustomersFinder {
    fun findCustomers(query: String): Set<Customer>
}

interface ProductFinder {
    fun findProducts(query: String): Set<Product>
}
Now, suppose we want to have a MegaFinder class that is capable of querying both Customers and Products. The standard way to do this is

  1. pass instances of the finders to the MegaFinder
  2. have the MegaFinder stash those instances in private properties, and
  3. call on them as needed

All that is in the realm of boilerplate stuff and Kotlin has a concise alternative. This is how it looks like:
class MegaFinder(customerFinder: CustomerFinder, productFinder: ProductFinder)
 : CustomerFinder by customerFinder, ProductFinder by productFinder 
{
    fun loadAll() {
        val customers = findCustomers("*:*")
        val products = findProducts("*:*")
        //...
    }
}
What this says is that MegaFinder implements the finder interfaces by delegating to the designated instances. It is more terse then the traditional approach with two caveats that I see
  1. The delegating approach requires the MegaFinder to implement the finder interfaces. This is not necessarily bad, but the freedom not to do that is not available
  2. When reading the code it is not immediately obvious where the findCustomers(...) and findProducts(...) are implemented - it is almost is if you need some sort of IDE to figure this out :-)

Wednesday 24 June 2015

Traversing LinkedList considered (somewhat) harmful

I compare traversal of LinkedList, ArrayList and an array. First things first, given the mechanics of linked lists, iterating with a for loop, e.g:
        for (int i = 0; i < N; i++) {
            linkedList.get(i));
        }
is expected to be comparatively slow since the list has to be scanned every time to get() to the i-th element. While sub-optimal, it may still happen in the wild, especially when a call returns a List and the client programmer is not aware of the actual implementation.
 The benchmark measures performance of "for-style" traversal of LinkedList, ArrayList and array as well as "iterator-style" traversal of LinkedList and ArrayList e.g:
        Iterator iter = list.iterator();
        while (iter.hasNext()) {
            iter.next();
        } 

The code is available here. The output from JMH is
Benchmark                           Mode  Cnt       Score      Error  Units        
ListsBenchmark.forArray            thrpt   10  423016.591 ± 2745.121  ops/s        
ListsBenchmark.forArrayList        thrpt   10  345255.508 ± 1367.974  ops/s        
ListsBenchmark.forLinkedList       thrpt   10    2559.053 ±   14.089  ops/s        
ListsBenchmark.iteratorArrayList   thrpt   10  299156.440 ± 8655.560  ops/s        
ListsBenchmark.iteratorLinkedList  thrpt   10  212684.595 ± 1036.017  ops/s        
The score is number of operations per second (so higher is better) where an operation is a single traversal of a list/array of size 1000

Comments
  • As expected, traversal of a LinkedList with a for loop performs badly - about n * n/2 on the average
  • Although ArrayList is backed by an array, it performs significantly worse then a plain array. 
  • Iterating over LinkedList is significantly worse then iterating over ArrayList. This is most likely due to the fact that to move from one node to the next in a LinkedList,  the next node if fetched from the heap and then the contained value is fetched from the heap again. In the case of the ArrayList the references to the values are loaded into CPU caches and so only one visit to the heap is needed

Tuesday 16 June 2015

Lazy Map initialization

Here is one Java idiom I am happy to see go away with the tools made available in Java 8. In this scenario we have a Map with mutable values for which not all keys are known at creation time. Before Java 8, the way to add a mapping would look like this:
    private final Map< String, List< String > > map = new HashMap<>();

    public void addValue(String key, String value) {
        List valueList = map.get(key);
        if (valueList == null) {
            valueList = new ArrayList<>();
            map.put(key, valueList);
        }
        valueList.add(value);
    }
The equivalent logic with Java 8 is:
    public void addValue(String key, String value) {
        map.computeIfAbsent(key, v -> new ArrayList<>()).add(value);
    }
If the Map values are immutable, the following also works:
    private final Map< String, String > map = new HashMap<>();

    public void addValue(String key, String value) {
        map.merge(key, value, (v1, v2) -> value);
    }

Thursday 11 June 2015

Generating collisions in String.hashCode

In a previous post I mentioned collisions in String.hashCode(). Turns out that generating such collisions is quite easy.
String.hashCode() is defined as:
 
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
For a string A of length N (a0, a1,..., aN-2, aN-1),
this can be written in a closed form as: 31^(N-1)a0 + 31^(N-2)a1 + ... + 31aN-2 + aN-1.


Pad by zeros

The first observation is that 0 is a valid character and prepending arbitrary number of 0 characters doesn't change the value of the hash. The following code takes advantage of this:
    public static String padByZeroes(String original, int numberOfExtraZeroes) {
        int length = original.length();
        char[] chars = new char[length + numberOfExtraZeroes];
        for (int i = 0; i < length; i++) {
            chars[i + numberOfExtraZeroes] = original.charAt(i);
        }
        return new String(chars);
    }
padByZeroes produces strings of length greater then the original with identical hashcode.

Pair Perturbation

As a starting point to this approach we take a string B which is identical to A: (b0, b1,..., bN-2, bN-1). We then modify two consecutive character of B without affecting it's hashcode.
If the modified characters are are at index K and K+1 we have: hashCode(A)-hashCode(B) = ... = 31(aK - bK)+(aK+1 - bK+1).
We want the last expression to be zero, that is:  31(aK - bK) =  bK+1 - aK+1. As we are free to choose bK and bK+1, to make things simple (and reduce the chance of overflowing/underflowing the two byte value of char) we choose bK = aK - 1 which gives us bK+1 = aK+1 + 31.
    public static String pairPerturbation(String original, int pairIndex) {
        int length = original.length();
        if (pairIndex > length - 2) {
            throw new IllegalArgumentException("pairIndex can't be greater then " + (length - 2));
        }

        char[] chars = original.toCharArray();

        chars[pairIndex] = (char) (original.charAt(pairIndex) - 1);
        chars[pairIndex + 1] = (char) (original.charAt(pairIndex + 1) + 31);

        return new String(chars);
    }
Here is a link to the Gist of the code and tests: https://gist.github.com/sorokod/a6848e6cd4c9fe32a523#file-stringhashcollisions

Sunday 10 May 2015

A kink in the API

In TreeSet Javadoc we have:

"Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface."

Lets see what this means in practice. In this example we have a Transaction object which has an id and amount instance variables. Transactions are considered equal if they have the same id.
   public class Transaction {
        private final String id;
        private final int amount;

        public Transaction(String id, int amount) {
            this.id = id;
            this.amount = amount;
        }

        public int getAmount() {
            return amount;
        }

        @Override
        public int hashCode() {
            return id != null ? id.hashCode() : 0;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || !(o instanceof Transaction)) return false;
            return id != null ? id.equals(((Transaction) o).id) : ((Transaction) o).id == null;
        }
    }
To compare Transaction objects based on the value of the amount so we introduce the relevant Comparator:
    public static class TransactionComparator implements Comparator {
        @Override
        public int compare(Transaction t1, Transaction t2) {
            return t1.getAmount() - t2.getAmount();
        }
    }
We now have everything in place for a small experiment:
TreeSet set = new TreeSet<>(new TransactionComparator());
Transaction t1 = new Transaction("T1", 100);
Transaction t2 = new Transaction("T2", 100);

System.out.printf("set did not already contain t1: %b\n", set.add(t1)); // set did not already contain t1: true
System.out.printf("set did not already contain t2: %b\n", set.add(t2)); // set did not already contain t2: false
System.out.printf("|set| = %d\n", set.size()); // |set| = 1
The quote at the top of the post means that as far as the TreeSet (and its friendss in SortedSet interface) is concerned, object equality is derived from compareTo() and not equal(), hence the two transactions t1 and t2 are equal. This is pretty unusual but one could argue that a fair warning was given.

Except that "The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface." doesn't really match:
System.out.printf("t1 is in set: %b\n", set.contains(t1)); // t1 is in set: true
System.out.printf("t2 is in set: %b\n", set.contains(t2)); // t2 is in set: true
And there we have it: |set| == 1 and set contains two elements. I think that it is more correct to say that "All bets are off regarding the behavior of a set if its ordering is inconsistent with equals"

-- Gist link 


Friday 24 April 2015

Hash function benchmarks

Some hashing benchmarks using JMH. Code and results are on GitHub. SipHash and CRC-32C look good, some generate collisions on the test data though.


Monday 13 April 2015

Setting up Java 8 Maven Ubuntu on Vagrant

Prerequsites:

  • VirtualBox is installed
  • Vagrant is installed
  • ssh and wget are available in PATH. Cygwin will do, e.g.: c:\cygwin\bin
  • We will use some of the Ubuntu setup scripts from ubuntu-equip - have a look.
You may want to download the Ubuntu Vagrant box so its available locally like so:
  E:\VagrantBoxes>wget http://files.vagrantup.com/precise64.box
Create a project directory, I have E:\vagrant_java. In the project directory create a file named Vagrantfile with the following content:
Vagrant.configure(2) do |config|
  config.vm.box = "ubuntu.lts.64"
  config.vm.box_url = "file:///E:/VagrantBoxes/precise64.box"
  config.vm.provision :shell, inline: 'wget --no-check-certificate https://github.com/aglover/ubuntu-equip/raw/master/equip_java8.sh && bash equip_java8.sh'
  config.vm.provision :shell, inline: 'wget --no-check-certificate https://github.com/resilva87/ubuntu-equip/raw/master/equip_maven3.sh && bash equip_maven3.sh'
end
Bring up the vm, this may take a while as the relevant packages are provisioned from the net:
E:\vagrant_java>vagrant up
Lets have a look at what we ended up with
E:\vagrant_java>vagrant ssh
 vagrant@precise64:~$ java -version                                   
 java version "1.8.0_40"                                              
 Java(TM) SE Runtime Environment (build 1.8.0_40-b25)                 
 Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)                                                    

 vagrant@precise64:~$ readlink -f $(which java)
 /usr/lib/jvm/java-8-oracle/jre/bin/java

 vagrant@precise64:~$  /opt/maven/bin/mvn -version
 Apache Maven 3.2.5 (...)
 Maven home: /opt/maven
 Java version: 1.8.0_40, vendor: Oracle Corporation
 Java home: /usr/lib/jvm/java-8-oracle/jre
 Default locale: en_US, platform encoding: ISO-8859-1
 OS name: "linux", version: "3.2.0-23-generic", arch: "amd64", family: "unix"
All done :-)

Saturday 28 February 2015

Need... more... time...

Machine Learning: https://www.coursera.org/learn/machine-learning/outline

Thursday 5 February 2015

Slide

There is a technical presentation, just waiting to be written, in which this will be the first slide:



Dagen H, was the day, 3 September 1967, on which traffic in Sweden switched from driving on the left-hand side of the road to the right.


http://en.wikipedia.org/wiki/Dagen_H

Friday 30 January 2015

ThreadLocal for thread saferty


Some code I saw recently that is really a kind of concurrency pattern in Java

Problems and Constraints 

There is an implementation of an interface that is not thread safe. You would like to use it in a multi threaded scenario without modifying the interface.

Example:
public interface Calculator {
    public int calculate(int input) ;
}

public class NonThreadSafeCalculatorImpl implements Calculator {
    public int calculate(int input) {
        //...
        // non thread safe logic here
        // ...
    }
}
An instance of NonThreadSafeCalculatorImplcan't be safely shared by multiple threads.

Solution
Implement the interface by delegating to a ThreadLocal that wraps the unsafe implementation

Example:
public class ThreadSafeCalculatorImpl implements Calculator {

    private static final ThreadLocal threadLocal = new ThreadLocal() {
        @Override
        protected Calculator initialValue() {
            return new NonThreadSafeCalculatorImpl();
        }
    };

    public int calculate(int input) {
        return threadLocal.get().calculate(input);
    }
}
An instance of ThreadSafeCalculatorImpl can be safely shared by multiple threads.

Monday 26 January 2015

NIST Randomness Beacon

Perhaps it's odd but I find that someone is publishing a 512-bit, full-entropy random number every minute of every day, well, romantic. There is a nice write-up on Hackaday  and I wrote some code that uses the Bouncy Castle library  to validate the cryptographic promises made by NIST. The code is available here.

Friday 16 January 2015

Scala traits as Java objects

One of the use cases for traits mentioned in Programming in Scala, Odersky et all, Ch. 12.3 is "enriching interfaces".

 With this approach "To enrich an interface using traits, simply define a trait with a small number of abstract methods—the thin part of the trait’s interface—and a potentially large number of concrete methods, all implemented in terms of the abstract methods. Then you can mix the enrichment trait into a class, implement the thin portion of the interface, and end up with a class that has all of the rich interface available."

The following Scala code is an example:
class Point(val x: Int, val y: Int)

trait Rectangular {
  // abstract methods
  def topLeft: Point
  def bottomRight: Point
  
  
  def left = topLeft.x
  def right = bottomRight.x
  def width = right - left
}

class Rectangle(val topLeft: Point, val bottomRight: Point) extends Rectangular { 
  // other methods...
}

With this setup we can write:
    val  rect : Rectangular = new Rectangle(new Point(1, 1), new Point(10, 10))
    println(rect left)  //prints 1
    println(rect right) //prints 10
    println(rect width) //prints 9
We will now have a peek and the resulting decompiled .class files and see how this magic looks like from the Java point of view (1). The Point class is unsurprising:
public class Point {

    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }

    public int y() { return y; }
}

Things become more interesting with Rectangular. The trait results in two Java classes: Rectangular.java and Rectangular$class.java.
public interface Rectangular {
    public abstract Point topLeft();
    public abstract Point bottomRight();
    public abstract int left();
    public abstract int right();
    public abstract int width();
}

public abstract class Rectangular$class {
    public static int left(Rectangular $this) {
        return $this.topLeft().x();
    }

    public static int right(Rectangular $this) {
        return $this.bottomRight().x();
    }

    public static int width(Rectangular $this) {
        return $this.right() - $this.left();
    }

    public static void $init$(Rectangular rectangular) {}
}
Effectively the trait is broken up into two parts - an interface part (Rectangular.java) and an implementing class (Rectangular$class.java) which is not related to Rectangular at all and carries the implemented trait methods as static Java methods.

Finally, the Rectangle class looks like this:
public class Rectangle implements Rectangular {

    private final Point topLeft;
    private final Point bottomRight;

    public Rectangle(Point topLeft, Point bottomRight) {
        this.topLeft = topLeft;
        this.bottomRight = bottomRight;
        Rectangular.class.$init$(this);
    }


    public Point topLeft() { return topLeft; }
    public Point bottomRight() { return bottomRight; }

    // the mixin
    public int left()  { return Rectangular.class.left(this); }
    public int right() { return Rectangular.class.right(this); }
    public int width() { return Rectangular.class.width(this); }
}
where we see that mixin logic is statically delegated to Rectangular.

(1) Compiled with Scala 2.11, decompiled with Jad.