
About Erlang/OTP and Multi-core performance in particular – Kenneth Lundin

I attended an awesome talk by Kenneth Lundin about Erlang/OTP at the Erlang Factory in London. The main topic was SMP and its improvements in the latest release(s). That’s exactly one of the main reasons for using Erlang: parallelizing computations across many cores without worrying about locks in shared memory.

Some of the issues they’ve been working on:

  1. Erlang now detects the CPU topology automatically at startup.
  2. Multiple run queues
  3. You can lock schedulers to logical CPUs.
  4. Improved message passing – reduced lock time

They improved more things, of course, but with regard to SMP these are the most important ones.

  1. Erlang now detects the CPU topology of your system automatically at startup. You may still override this automatic setup (see the example right after this list) using:
    erl +sct L0-3c0-3
    erlang:system_flag(cpu_topology, CpuTopology).
  2. Multiple run queues … what does that mean? We should first take a look at how Erlang does SMP:
    • Erlang without SMP:
      Without SMP support the Erlang VM had one scheduler working on one run queue. All jobs were pushed onto that one queue and fetched by that one scheduler.
    • Erlang SMP / before R13
      They started more schedulers that were all pulling jobs from one shared queue. That sounds more parallel, but it still did not perform as well as desired on many cores.
    • Erlang SMP R13
      Several schedulers as in the former solution, but each of them has its own run queue. The problem with this approach is that, because processes have different run times, you can of course end up with some empty and some full queues. So they built something called migration logic that controls and balances the different run queues.

    The migration logic does the following:

    • collect statistics about the maximum length of all schedulers’ run queues
    • set up migration paths
    • take jobs away from fully loaded schedulers and push them onto the queues of schedulers with low load

    Running on full load or not! If not all schedulers are fully loaded, jobs will be migrated to schedulers with lower IDs, thus making some schedulers inactive.

    This makes perfect sense, because the more schedulers and run queues you have, the more migration has to be done. Using SMP support with many schedulers only makes sense if you are really optimizing for many cores; on systems with few cores you will see decreased performance.

  3. Binding schedulers to CPUs is really worth looking at. The more cores your CPU has, the more important it becomes and the more performance improvement you’ll gain. You can force the Erlang VM to do scheduler binding with:
    erl +sbt db
    erlang:system_flag(scheduler_bind_type,default_bind).
    1> erlang:system_info(cpu_topology).
    [{processor,[{core,{logical,0}},
                 {core,{logical,3}},
                 {core,{logical,1}},
                 {core,{logical,2}}]}]
    2> erlang:system_info(scheduler_bindings).
    {unbound,unbound,unbound,unbound}
    fabrizio@machine:~$ erl +sbt db
    1> erlang:system_info(scheduler_bindings).
    {0,1,3,2}
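
As mentioned in item 1, you can also set the topology yourself. The term below is a made-up quad-core layout in the same format that erlang:system_info(cpu_topology) returns above; your own topology term will look different:

    erlang:system_flag(cpu_topology,
                       [{processor,[{core,{logical,0}},
                                    {core,{logical,1}},
                                    {core,{logical,2}},
                                    {core,{logical,3}}]}]).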

Benchmark - Scheduler Binding - Kenneth Lundin
Source: presentation by Kenneth Lundin at the Erlang Factory

You can test and benchmark SMP using the following flags:
fabrizio@machine:~$ erl -smp disable     (default is auto)
fabrizio@machine:~$ erl +S 4:2           (number of schedulers : schedulers online)
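
To get a rough feel for what these settings do, a tiny CPU-bound workload like the sketch below can be run under different flags (e.g. erl +S 1 versus erl +S 8). This is not the benchmark program used later in this post; the module and function names are made up for illustration.

-module(smp_probe).
-export([run/2]).

%% Spawn N CPU-bound workers, wait for all of them and report the elapsed
%% wall-clock time next to the number of schedulers currently online.
run(N, Work) ->
    Self = self(),
    T0 = erlang:now(),
    Pids = [spawn(fun() -> loop(Work), Self ! done end) || _ <- lists:seq(1, N)],
    [receive done -> ok end || _ <- Pids],
    Millis = timer:now_diff(erlang:now(), T0) div 1000,
    io:format("~p workers, ~p schedulers online: ~p ms~n",
              [N, erlang:system_info(schedulers_online), Millis]).

%% A tight loop that only burns CPU, so the work is purely compute-bound.
loop(0) -> ok;
loop(K) -> _ = K * K, loop(K - 1).

Calling smp_probe:run(100, 1000000). in shells started with different +S values shows how the elapsed time changes with the number of schedulers.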

With erlang:system_info/1 you can use the following atoms:

# cpu_topology
# multi_scheduling
# scheduler_bind_type
scheduler_bindings
logical_processors
multi_scheduling_blockers
scheduler_id
schedulers
# schedulers_online
smp_support

The ones marked with # can be set using system_flag/2.
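
A quick way to inspect several of these at once, and to change one of the settable ones at runtime, looks like this (the values shown are from a hypothetical 8-core machine):

1> [{A, erlang:system_info(A)} || A <- [schedulers, schedulers_online, logical_processors, smp_support]].
[{schedulers,8},{schedulers_online,8},{logical_processors,8},{smp_support,true}]
2> erlang:system_flag(schedulers_online, 4).
8
3> erlang:system_info(schedulers_online).
4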

Erlang R13A Benchmark

I made a little benchmark to check out the new Erlang release R13A and the behavior of the multiple run queues. The benchmarking program was the same one I used in another benchmark, which you may find here; you will also find the sources at that location. As already noted there, the slope from 1 CPU to 2 CPUs is due to the “bad” implementation made to challenge the Erlang SMP features. The machine was an 8-core Intel Xeon at 3 GHz running a 64-bit 2.6.9 Linux kernel, with Erlang kernel polling active.

Benchmark results: Erlang R13A on 8 CPUs

Webserver Scalability and Reliability

Everybody knows Apache and Tomcat, but when I try to talk about strange things like Yaws or Mochiweb, nobody knows what I am actually getting at. These two are HTTP server implementations written in the old-fashioned functional language Erlang and running on the famous Open Telecom Platform, or OTP. Erlang/OTP was developed in the late 80s as a fault-tolerant and highly scalable system for telecom applications. Nowadays, in the social networking community, it is daily business to serve tens of thousands of PHP requests per second. So we are facing problems telcos have faced for a long time.

Apache is the canonical web server for serving PHP to the world. Thinking about technological alternatives in the backend domain, we had a look at both Java and Erlang. A rather quick and easy test to compare the scalability of the technologies was to set up web servers delivering the same static document. The image below shows the results.
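
A test like this is typically driven by an HTTP load generator; with ApacheBench, for example, the invocation looks roughly like the line below (the numbers and URL are placeholders, not our actual test parameters):

ab -n 100000 -c 1000 http://localhost:8080/index.html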

Web Server Scaling by Concurrency

Apache and Tomcat scale nearly linearly up to a concurrency of 1000. We find that Mochiweb has a breakdown when concurrency reaches 300 but afterwards still scales linearly, as Yaws does at a lower level. Absolute performance is less interesting here; what counts is the scaling behavior. Mochiweb, for example, is not designed primarily to deliver static content but to act as an HTTP endpoint for arbitrary OTP applications. Neither Yaws nor Mochiweb seems to cache documents in the default setup. Also, we did not use HiPE.
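
To illustrate what “an HTTP endpoint for arbitrary OTP applications” means, here is a minimal sketch in the Mochiweb style of that era (parameterized request modules); the module name is made up and details of the API differ between Mochiweb versions:

-module(hello_http).
-export([start/0]).

%% Start a Mochiweb listener on port 8080; every request is handed to the
%% anonymous loop fun, which in a real system would call into the rest of
%% the OTP application (gen_servers etc.) instead of returning a constant.
start() ->
    mochiweb_http:start([{name, ?MODULE},
                         {port, 8080},
                         {loop, fun(Req) ->
                                        Req:ok({"text/plain", "hello from erlang"})
                                end}]).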

Unfortunately, we have not yet had the chance to verify the measurements given in this post: http://www.sics.se/~joe/apachevsyaws.html, where Yaws still scales linearly (on average) when Apache has long since given up, at concurrencies of 50,000 or even 80,000, and clearly seems to survive DDoS attacks.

In this picture, Erlang/OTP may not be a recommendation for classical web delivery, but rather for building reliable services that provide internal or external API endpoints. An interesting alternative, at least.