About Erlang/OTP and Multi-core performance in particular – Kenneth Lundin
I attended an awesome talk by Kenneth Lundin about Erlang/OTP at the Erlang Factory in London. The main topic was SMP and the improvements made to it in the latest releases. That’s exactly one of the main reasons to use Erlang: parallelizing computations across many cores without worrying about locks in shared memory.
Some of the issues they’ve been working on:
- Erlang now detects CPU Topology automatically at startup.
- Multiple run-queues
- You can lock schedulers to logical CPUs
- Improved message passing – reduced lock time
They improved more things, of course, but as far as SMP is concerned these are the most important ones.
- Erlang now detects the CPU topology of your system automatically at startup. You may still override this automatic setup using:
erl +sct L0-3c0-3
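To see which topology the VM is actually using, you can ask the runtime from the Erlang shell. This is a minimal sketch; the return values depend entirely on your machine, so no output is shown here.

```erlang
%% Run these in the Erlang shell (erl).

%% CPU topology currently used by the emulator: the one given with +sct
%% if present, otherwise the automatically detected one.
%% Returns a nested list of tuples, or 'undefined' if nothing was detected.
erlang:system_info(cpu_topology).

%% Number of logical processors the VM found, or 'unknown'.
erlang:system_info(logical_processors).
```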
- Multiple run queues … what does that mean? We should first take a look at how Erlang does SMP:
- Erlang without SMP:
Without SMP support, the Erlang VM has one scheduler and one runqueue, so all jobs are pushed onto a single queue and fetched by a single scheduler.
- Erlang SMP / before R13
The VM started several schedulers, all pulling jobs from one shared queue. That sounds more parallel, but it still did not perform as well as desired on many cores, since every scheduler contends for the same queue.
- Erlang SMP R13
Several schedulers, as in the former solution, but each of them has its own runqueue. The problem with this approach is that you can end up with some empty and some full queues, because processes run for different lengths of time. So they built something called migration logic that controls and balances the different runqueues.
The migration logic does the following:
- collects statistics about the maximum length of every scheduler’s runqueue
- sets up migration paths
- takes jobs away from fully loaded schedulers and pushes them onto the queues of schedulers with low load
What happens next depends on whether the system is running at full load. If not all schedulers are fully loaded, jobs are migrated to the schedulers with lower IDs, which leaves some schedulers inactive.
This makes perfect sense, because the more schedulers and runqueues are in use, the more migration has to be done. Using SMP support with many schedulers only makes sense if you’re really optimizing for many cores; on systems with few cores you will see decreased performance.
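The two behaviours described above (evening out under full load, packing onto low scheduler IDs otherwise) can be illustrated with a toy model. This is only a sketch of the idea and not the VM’s actual algorithm: the module name `rq_model`, the function `rebalance/2` and the `Limit` parameter (the number of jobs at which a scheduler counts as fully loaded) are all assumptions made up for this example.

```erlang
-module(rq_model).
-export([rebalance/2]).

%% rebalance(Lens, Limit) -> NewLens
%% Lens:  runqueue lengths, one per scheduler, lowest scheduler id first.
%% Limit: hypothetical job count at which a scheduler is "fully loaded".
rebalance([_ | _] = Lens, Limit) when is_integer(Limit), Limit > 0 ->
    N = length(Lens),
    Total = lists:sum(Lens),
    case Total >= N * Limit of
        true  -> spread(Total, N);          % full load: even out the queues
        false -> compact(Total, Limit, N)   % low load: pack onto low ids
    end.

%% Distribute Total jobs as evenly as possible over N queues.
spread(Total, N) ->
    Base = Total div N,
    Extra = Total rem N,
    [Base + 1 || _ <- lists:seq(1, Extra)] ++
        [Base || _ <- lists:seq(1, N - Extra)].

%% Fill the queues with the lowest ids up to Limit; the rest stay empty.
compact(_Total, _Limit, 0) ->
    [];
compact(Total, Limit, N) ->
    Take = min(Total, Limit),
    [Take | compact(Total - Take, Limit, N - 1)].
```

With a limit of 2, `rq_model:rebalance([8,0,1,3], 2)` returns `[3,3,3,3]` (full load, evened out), while `rq_model:rebalance([1,0,1,1], 2)` returns `[2,1,0,0]`, leaving the last two schedulers with nothing to do.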
- Binding schedulers to CPUs is really worth looking at. The more cores your CPU has, the more important it becomes and the more performance you’ll gain. You can force the Erlang VM to bind schedulers with:
fabrizio@machine:~$ erl +sbt db
Source: presentation Kenneth Lundin – Erlang-Factory
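To check whether the binding took effect, the runtime can report it from the Erlang shell. A minimal sketch; what you see depends on your operating system and hardware, and on some platforms binding is not supported at all.

```erlang
%% Run these in the Erlang shell (erl).

%% How binding was requested, for example via +sbt; 'unbound' if not at all.
erlang:system_info(scheduler_bind_type).

%% A tuple with one element per scheduler: the id of the logical
%% processor it is bound to, or 'unbound'.
erlang:system_info(scheduler_bindings).
```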
You can test and benchmark SMP using the following flags:
fabrizio@machine:~$ erl -smp disable   # default is auto
fabrizio@machine:~$ erl +S 2:4         # number of schedulers : schedulers online
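To have something to measure under these flags, a small CPU-bound workload is enough. The module below is a hedged sketch, not a serious benchmark: the name `smp_bench` and its `run/2` function are made up for this example. It spawns a number of worker processes that each count down from `N` and reports the wall-clock time in microseconds.

```erlang
-module(smp_bench).
-export([run/2]).

%% run(Workers, N) -> Microseconds
%% Spawns Workers processes that each count down from N, waits for all
%% of them to finish and returns the elapsed wall-clock time.
run(Workers, N) ->
    {Micros, ok} = timer:tc(fun() -> spawn_and_wait(Workers, N) end),
    Micros.

spawn_and_wait(Workers, N) ->
    Parent = self(),
    Pids = [spawn(fun() -> burn(N), Parent ! {done, self()} end)
            || _ <- lists:seq(1, Workers)],
    %% Collect one 'done' message per worker.
    [receive {done, Pid} -> ok end || Pid <- Pids],
    ok.

%% Pure CPU work: a tail-recursive countdown.
burn(0) -> ok;
burn(N) -> burn(N - 1).
```

Compile it with `c(smp_bench).` and call, for example, `smp_bench:run(8, 10000000).` in VMs started with different scheduler settings, then compare the numbers.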
With erlang:system_info/1 you can use the following atoms
The ones marked with # can be set using system_flag/2
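The original list of atoms is not reproduced here. As an illustration only, the sketch below uses a few scheduler-related atoms that exist in `erlang:system_info/1`, one of which can also be set through `erlang:system_flag/2`; it is not meant to be the complete list from the talk.

```erlang
%% Run these in the Erlang shell (erl).

%% Is the emulator running with SMP support?
erlang:system_info(smp_support).

%% Number of scheduler threads, and how many of them are online.
erlang:system_info(schedulers).
erlang:system_info(schedulers_online).

%% Change the number of schedulers online at runtime.
%% The call returns the previous value.
erlang:system_flag(schedulers_online, 1).
```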