Blog

Scala Pitfalls: concurrent.Map.getOrElseUpdate is not atomic

A surprising pitfall in Scala's standard library: Scala's traits for concurrent maps inherit several methods that are unsafe to use concurrently, including getOrElseUpdate.

This problem is discussed in Scala ticket SI-7943 and has recently been fixed in TrieMap (but not in the wrapper of Java's ConcurrentHashMap). As of Scala 2.11.5, the fix for TrieMap has not made it into a release yet.

Background: ConcurrentMap and the putIfAbsent idiom

ConcurrentHashMaps are a popular primitive to build concurrency-friendly code in the Java ecosystem.

Scala provides a wrapper trait and alternative implementation for CHMs in the form of concurrent.Map and TrieMap.

Like the Java version, these traits provide concurrency-safe primitives like putIfAbsent that can be used to implement a concurrency-friendly get-or-put idiom on a hash map.

A classic example of a thread-safe getOrPut:

import scala.collection.JavaConverters._
import scala.collection.concurrent

// create a concurrent map by wrapping a juc.ConcurrentHashMap
val map: concurrent.Map[MyKey, MyHandler] =
  new java.util.concurrent.ConcurrentHashMap[MyKey, MyHandler]().asScala

def getOrCreate(key: MyKey): MyHandler = {
  // retrieve the handler from the map if present...
  map.get(key).getOrElse {
    // if not present, allocate a new handler
    val newHandler = new MyHandler()
    // putIfAbsent atomically inserts the handler into the map, if not previously
    // installed. Otherwise, returns the previously present entry (and does not modify)
    val maybePresentHandler = map.putIfAbsent(key, newHandler)
    // if maybePresentHandler has a value, we got pre-empted => abandon our new handler
    // and return the present one. Otherwise, return newHandler.
    maybePresentHandler.getOrElse(newHandler)
  }
}

The (present = putIfAbsent(...)) != null idiom is fairly common in concurrency-friendly Java. It is fast and safe. However, the code above is hardly elegant.

GetOrElseUpdate: tempting, but broken

Now, looking at the Scala API doc, it would seem that the getOrElseUpdate method provides a much more elegant alternative to achieve the same goal:

    def getOrElseUpdate(key: A, op: => B): B
If given key is already in this map, returns associated value.
Otherwise, computes value from given expression op, stores with key in map and returns that value.

This would lend itself to the following code:

// Appealing, but broken - do not use!
def getOrCreateBroken(key: MyKey): MyHandler = {
  map.getOrElseUpdate(key, new MyHandler())
}

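However, a look at the source code reveals that this does not work. It turns out that getOrElseUpdate is inherited from the base trait MapLike and is not overridden anywhere on the way to JConcurrentMapWrapper, the Scala wrapper around ConcurrentHashMap. The inherited implementation performs a get followed by a separate update, so two threads can both miss the key, both evaluate op, and end up holding different handlers.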

Summary

Scala's concurrent.Map interface (and the corresponding wrapper to Java's ConcurrentHashMap) is problematic, because it inherits a wealth of methods, not all of which are thread-safe. Be very careful when using those maps. Generally speaking, only the methods directly ported from ConcurrentHashMap are safe, while all the convenience methods from Scala's MapLike are not.

You may want to consider just using java.util.concurrent.ConcurrentHashMap directly.

More generally, this can also serve as a cautionary tale about rich interfaces and traits that provide implementations: while convenient, they can make it very hard to provide proper semantic guarantees in subclasses, to the point where even the Scala library designers missed this obvious hole in the standard library.


Functional programming in Scala

I did a short presentation (1.5 h) on functional programming with Scala for the software development teams at one of our customers.

The aim was to give a glimpse of functional programming in general, show some of the features of Scala by example and give a rough idea of what domain-specific functional code for a life insurance company could look like in Scala.

The source code of the slides as well as all code examples are available on GitHub.

The presentation slides (in German) are on SlideShare.


Avoiding apt-get update when installing Ruby via RVM on Ubuntu

One of the best practices for installing RVM (BTW: one of the greatest tools available when working with Ruby) is to install it as a regular user (and not as root).

But if you try to install a new Ruby version via, e.g.

    rvm install 1.9.3

you may get a prompt for your root password, since rvm is trying to update your operating system.

To avoid this, simply disable the autolibs feature of rvm:

    rvm autolibs disable
    rvm install <whatever version you want to install>

However, you should make sure that your system is up to date by running (as root)

    apt-get --quiet --yes update

Using Ruby Version Manager (RVM) in Jenkins CI

The Ruby Version Manager (RVM) is a great tool if your Ruby and Rails applications are based on different versions of Ruby and/or use different gem sets.

To integrate RVM with the Continuous Integration tool Jenkins, you can either follow this description on the RVM website, or use this Jenkins RVM Plugin. Both solutions have some drawbacks, so we were looking for a new solution.

The description on the RVM website suggests running the build steps as shell scripts and specifying the desired version in every script, which seems unnecessary and error-prone.

The Jenkins RVM Plugin claims to run the whole build process in the RVM environment specified for the project as a whole. That's true, but we make heavy use of Jenkins tasks, e.g. for deployment scripts.

Unfortunately, the RVM environment is not available inside the task scripts. To fix this, you can do something like the following (I captured this code from the console log of builds with the RVM plugin):

    bash -c "source ~/.rvm/scripts/rvm && rvm_install_on_use_flag=1 && rvm use --create 1.9.3@projectname && export > rvm.env"
    source rvm.env

Again, you have to specify the Ruby version in every task.

But we would like to have an easy RVM project workflow, specifying the project-specific Ruby setting only once to stay DRY.

How it works

The comment section of this blog post led me to a solution which seems to work very well for our purposes without any major drawbacks.

  1. Set up RVM for user jenkins (or whatever user the Jenkins process runs as); in particular, add the RVM setup to the user's profile

  2. Set up your projects with project-specific RVM files (.ruby-version and .ruby-gemset if you use a current version of RVM)

  3. Use shell scripts for all ruby or rake based build steps and add this code to all scripts and tasks:

    #!/bin/bash -l
    cd "${WORKSPACE}"

The trick is to add the -l option to the shebang, since the RVM setup code only executes if the shell is a login shell.
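
Whether a given bash invocation counts as a login shell can be checked with the login_shell shell option. The following is only a minimal sketch: the helper name probe_shell is made up for illustration, and what a login shell loads in addition depends on the profile files of the user running it.

```shell
#!/bin/bash
# probe_shell is a hypothetical helper: it starts a child bash with the
# given flags and reports whether that child considers itself a login shell.
probe_shell() {
  bash "$@" -c 'if shopt -q login_shell; then echo login; else echo non-login; fi'
}

probe_shell       # started like a "#!/bin/bash" script
probe_shell -l    # started like a "#!/bin/bash -l" script
```

The second call corresponds to the shebang used in the build scripts here.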

Now it is sufficient to keep .ruby-version and .ruby-gemset up to date, since the cd into the workspace directory sets up the correct RVM environment.
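
For illustration, the two project files could be created like this. It is only a sketch: write_rvm_project_files is a made-up helper, the temporary directory stands in for a real project checkout, and the Ruby version and gemset name are the ones from the rvm use command shown earlier.

```shell
#!/bin/bash
# Hypothetical helper: write the RVM project files into a directory.
# $1 = project directory, $2 = Ruby version, $3 = gemset name
write_rvm_project_files() {
  printf '%s\n' "$2" > "$1/.ruby-version"
  printf '%s\n' "$3" > "$1/.ruby-gemset"
}

# A temporary directory stands in for the real project checkout.
project_dir="$(mktemp -d)"
write_rvm_project_files "$project_dir" "1.9.3" "projectname"
cat "$project_dir/.ruby-version" "$project_dir/.ruby-gemset"
```

In a real project both files would be committed to version control, so that Jenkins and every developer pick up the same settings.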

A complete example of a Rails build script would be:

    #!/bin/bash -l
    cd "${WORKSPACE}"
    rvm info        # print info about rvm (for debugging)
    bundle
    export RAILS_ENV=test
    bundle exec rake db:migrate
    bundle exec rake test

Create a complete OpenStack test installation with devstack, Vagrant and VirtualBox on a MacBook Pro

If you go in for virtualization, continuous integration (CI) and DevOps, sooner or later the cloud technology OpenStack pops up. Unfortunately, it's not easy to get up and running with OpenStack.

In order to install a complete OpenStack infrastructure you should already have some experience with OpenStack. On the other hand, a running OpenStack environment is almost essential in order to learn OpenStack. A typical chicken-and-egg problem.

A good starting point for OpenStack is DEVSTACK, a script that is able to set up a complete OpenStack installation on different flavors of Linux.

In combination with VirtualBox and Vagrant I was able to set up an OpenStack playground on my MacBook Pro.

Here is how it works, shown step by step and by hand, so you know what happens [1]:

System requirements

VirtualBox and Vagrant are installed on the MacBook.

Create the VM for the installation

First we need a base system. Since we currently run Ubuntu 12.04 LTS (Precise) on our production systems, I chose this flavor.

Important: This virtual machine should have quite some memory. We will start some VMs in this VM after all. To access the OpenStack VM it is handy to have a private IP address. With this address we can access the services in the VM without having to do all the port forwarding.

So let's create a project directory and put this Vagrantfile into it:

Vagrantfile
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  config.vm.network :private_network, ip: "10.12.14.16"
  config.vm.provider "virtualbox" do |v|
    v.name = "devstack-test-vm"
    v.customize ["modifyvm", :id, "--memory", "2048"]
    v.customize ["modifyvm", :id, "--cpuexecutioncap", "75"]
  end
end

The VM gets 2 GB of memory and at most 75% of my Mac's CPU.

After typing

    $ vagrant up

it takes some time to bring our base system up.

Update the VM and install required base packages

Now we have to bring our fresh Linux up to date.

Connect to the new machine with

    $ vagrant ssh

and upgrade the system packages with

    sudo su
    echo deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/grizzly main >> /etc/apt/sources.list.d/grizzly.list
    apt-get install python-software-properties software-properties-common python-keyring -y
    apt-get update
    apt-get install ubuntu-cloud-keyring -y
    apt-get update
    apt-get upgrade
    apt-get dist-upgrade
    apt-get install git -y

Hint: Running apt-get update twice is necessary because the keys for the Ubuntu cloud packages are not contained in the base image. The first update refreshes the package lists so that ubuntu-cloud-keyring can be installed; the second one then picks up the cloud packages we need.

Furthermore we need git to pull the devstack repository.

Since there was probably a kernel upgrade, we reboot our system and log in again.
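
Whether a reboot is actually pending can be checked before restarting. The sketch below assumes the usual Ubuntu convention that package hooks create the flag file /var/run/reboot-required; on a minimal image that file may never be written, so a missing flag is not proof that no reboot is needed. The function name reboot_pending is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the reboot flag file exists.
# $1 = flag file to check (defaults to the path Ubuntu normally uses)
reboot_pending() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

if reboot_pending; then
  echo "reboot required"
else
  echo "no reboot flag found"
fi
```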

Create the devstack user

The devstack script should run in a non-root user account which has sudo rights. We could use the default vagrant user, but I prefer to have a separate user for this. Therefore we create a new user stack.

    sudo su
    adduser stack
    echo "stack ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

Now log in as user stack and download devstack:

    su - stack
    git clone git://github.com/openstack-dev/devstack.git
    cd devstack

Install the cloud

Finally we start the devstack script:

    ./stack.sh

At this point it's a good time to get some coffee or even have your lunch break, since the download and installation of the complete OpenStack installation takes about half an hour.

After completion you see something like:

    Horizon is now available at http://10.0.2.15/
    Keystone is serving at http://10.0.2.15:5000/v2.0/
    Examples on using novaclient command line is in exercise.sh
    The default users are: admin and demo
    The password: d19e329a91e25271a369
    This is your host ip: 10.0.2.15
    stack.sh completed in 1426 seconds.
    stack@precise64:~/devstack
    $

Voilà, we do have a complete working cloud on a MacBook Pro :-). The only gotcha is that stack.sh took the wrong IP address (it should be 10.12.14.16, as configured in the Vagrantfile).

What have we got so far?

Since I did not install any hypervisor, the script chose QEMU, a software-based virtualization by emulation. This is good enough for learning purposes. For production environments you would of course use something like KVM or Xen.

The installed image is CirrOS, which is good for testing purposes as well. The OpenStack web site contains several other VM images.

To access the OpenStack web interface, you have to use the IP address 10.12.14.16 (specified in the Vagrantfile) instead of the one printed by the script.

Unfortunately, some links in the OpenStack UI include the wrong IP address, e.g. the console of the VMs. I do not have a workaround for this issue right now. It would probably be a good idea to map ports 80, 5000 and 6000 (console) to localhost in the Vagrantfile.
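
One direction that might be worth trying, untested in this setup: devstack reads settings such as HOST_IP from a localrc file in its checkout, so setting HOST_IP to the private address before running stack.sh could make the script use 10.12.14.16. The helper name set_host_ip below is made up, and whether this also fixes the console links is an open assumption.

```shell
#!/bin/bash
# Hypothetical helper: add HOST_IP to devstack's localrc unless it is already set.
# $1 = devstack checkout directory, $2 = IP address to use
set_host_ip() {
  local_rc="$1/localrc"
  touch "$local_rc"
  if ! grep -q '^HOST_IP=' "$local_rc"; then
    echo "HOST_IP=$2" >> "$local_rc"
  fi
}

# Intended use, as user stack and before ./stack.sh (not executed here):
#   set_host_ip "$HOME/devstack" 10.12.14.16
```

The helper leaves an existing HOST_IP line untouched, so running it repeatedly does not pile up duplicate entries.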

Next steps

I plan to automate this setup even further (probably with Puppet) to be able to set up my playground with a single command.

The next step would then be to include KVM and/or Xen within my cloud VM.

And finally, build a bare metal cloud server…


Footnotes
  1. I created a GitHub project for this HowTo, which eventually automates most of the steps out of the box using Vagrant and Puppet.