Category Archives: Performance

Fourth VZGeek Night on March 19th, 2010

On March 19th, 2010 the now fourth VZGeek Night takes place. As always, we invite you to interesting talks and a beer or two in a relaxed atmosphere.
Besides topics around VZ and the technologies that excite us, we have invited Dr. Johannes Mainusch of XING AG to take a closer look at the topic of "Web Performance Tuning".
Everyone whose heart beats for innovative technologies and highly available websites is invited.

The agenda in detail:

19:45 – 20:00
Welcome
Sebastian Galonska (VZnet Netzwerke)

20:00 – 20:15
Updates on OpenSocial – Messages, Notifications and Activity Stream
Bastian Hofmann (VZnet Netzwerke)

20:30 – 21:30
Wer langsam ist, wird verlassen ("Whoever is slow gets abandoned") – Web Performance Tuning
Dr. Johannes Mainusch (XING AG)

21:45 – 22:45
Social networks and the Richness of Data: Getting distributed webservices done with NoSQL
Fabrizio Schmidt (VZnet Netzwerke)

This time the event takes place at the Steinhaus (Straßburger Str. 55, 10405 Berlin).
As usual, doors open at 7 pm.

VZ-Networks @ QCon London 2010

This year the fourth annual QCon London conference takes place from March 10th to 12th. VZ-Networks participates with the following presentation:


Social networks and the Richness of Data: Getting distributed webservices done with NoSQL

Social networks by their nature deal with large amounts of user-generated data that must be processed and presented in a time-sensitive manner. Much more write-intensive than previous generations of websites, social networks have been on the leading edge of non-relational persistence technology adoption. This talk presents how Germany’s leading social networks schuelerVZ, studiVZ and meinVZ are incorporating Redis and Project Voldemort into their platform to run features like activity streams.
Jodok Batlogg, Fabrizio Schmidt

More Information

Serving objects is more than plain delivery

On April 26th 2007 Steve Souders wrote:

The user’s proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user’s perspective. […] Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. […] A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users.

Steve posted this approx. e^3.25809654 days after(!) we started to use a CDN for our web sites. Just a few days later we noticed the desired effect: our users generated more and more traffic, and activity grew. Of course a CDN is something of a luxury, but at the right moment it is worth investing in such a service. From our point of view, that moment had come. We were right.

Currently, roughly 286,356.421^2 objects are requested per month by our users. More than half of those (5.4e10 objects) are photos: small, medium and large ones. So each photo file we store is loaded round(pow(2,4.91)) times a month. That adds up to a monthly traffic volume of more or less 265,334,489,612,288 bytes for this kind of object alone. The total traffic of all delivered objects per month is about 1.402939962446178 times higher.

At peak times there are over 11000011010100000₂ requests per second hitting our CDN, and we are happy that our origin servers only get the (5^5)th part of it.
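For readers who would rather not do the mental arithmetic, the obfuscated figures above decode roughly as follows (a quick sketch; the variable names are ours, the values are the approximations from the text):

```python
import math

# Decoding the playfully obfuscated numbers from the paragraphs above.
days_before_post = math.e ** 3.25809654    # days we used the CDN before Souders' post
objects_per_month = 286356.421 ** 2        # objects requested per month, ~8.2e10
loads_per_photo = round(2 ** 4.91)         # times each stored photo is loaded monthly
peak_requests_per_s = 0b11000011010100000  # peak requests per second on the CDN
origin_fraction = 5 ** 5                   # origin servers see 1/origin_fraction of it

print(round(days_before_post))   # -> 26
print(loads_per_photo)           # -> 30
print(peak_requests_per_s)       # -> 100000
print(origin_fraction)           # -> 3125
```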

As a side effect we can learn something about the behaviour of our users, because the performance graphs show us, for example, what they do in the evening. Maybe the Schimanski episodes on July 26th were one of the reasons for the spikes after 8 pm (see graph above), which are nothing but commercial breaks. Have a break, have a visit at studiVZ.

mckoy – [m]em[c]ache [k]ey [o]bservation [y]ield

We wanted to speed up our web applications by reducing the load on our databases, so we decided to use memcached, the distributed memory object caching system. Because our memcached systems handle so many requests (about 1.5 million requests per second), we built a tool called mckoy, which can produce statistics and debugging information about all memcache requests in our network.

mckoy is a memcache protocol sniffer (based on the pcap library) and statistics builder. It automatically detects and parses each key (and its value) and all memcache API methods. At the end of the sniffing session, the results are used to build the statistics. mckoy was written to analyse our web application and its usage of the memcache API in PHP. For example: we wanted to know how many set() and get() methods were invoked in a given time. Based on these results, we made changes to improve our usage of the memcache API in PHP. You can run mckoy on any UNIX-based system. It was tested on many *BSD and Linux systems. mckoy is licensed under the GPLv3 and is completely published as an open-source project!
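To give an idea of the statistics step, here is a tiny illustrative sketch (Python, not the actual mckoy code, which sits on top of libpcap): given already-captured lines of the memcached ASCII protocol, it counts how often each method was invoked per key.

```python
import re
from collections import Counter

# Matches the command word and the key of a memcached ASCII protocol line.
CMD_RE = re.compile(r"^(get|gets|set|add|replace|delete|incr|decr)\s+(\S+)")

def count_requests(lines):
    """Count (method, key) pairs, the kind of numbers mckoy reports."""
    stats = Counter()
    for line in lines:
        m = CMD_RE.match(line)
        if m:
            stats[m.groups()] += 1
    return stats

# Usage with a hand-written capture snippet:
capture = [
    "set foobar 0 900 5",
    "get foobar",
    "get foobar",
    "get other_key",
]
stats = count_requests(capture)
print(stats[("get", "foobar")])  # -> 2
print(stats[("set", "foobar")])  # -> 1
```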

You can run mckoy in various modes (see the manpage!). For example, if you want to sniff for the pattern "foobar" across all memcache API methods with live capturing, use:

mckoy -i <interface> -e "port 11211" -m 5 -k foobar -v

And this is how it looks:

Unfortunately, there are some known bugs. :) For example, a SIGSEGV occurs when the user sends ^C. We also noticed that mckoy isn’t able to handle memcached versions from 1.2.8 up to 1.4.* correctly. These bugs will be fixed in the next version as soon as possible! For the next version I also plan to build in UDP and binary protocol support.

You can officially download mckoy from:
http://www.lamergarten.de/releases.html
or
http://sourceforge.net/projects/mckoy/

cheers.

About Erlang/OTP and Multi-core performance in particular – Kenneth Lundin

I attended an awesome talk by Kenneth Lundin about Erlang/OTP at the Erlang Factory in London. The main topic was SMP and its improvements in the latest release(s). That is exactly one of the main reasons for using Erlang: parallelizing computations across many cores without worrying about locks on shared memory.

Some of the issues they’ve been working on:

  1. Erlang now detects the CPU topology automatically at startup.
  2. Multiple run queues
  3. You can lock schedulers to logical CPUs
  4. Improved message passing – reduced lock time

They improved more things of course, but with regard to SMP these are the most important ones.

  1. Erlang now detects the CPU topology of your system automatically at startup. You may still override this automatic setup using:
    erl +sct L0-3c0-3
    erlang:system_flag(cpu_topology,CpuTopology).
  2. Multiple run queues … what does that mean? We should first take a look at how Erlang does SMP:
    • Erlang without SMP:
      Without SMP support the Erlang VM had one scheduler and one run queue. All jobs were pushed onto one queue and fetched by one scheduler.
    • Erlang SMP / before R13:
      They started several schedulers, all pulling jobs from one shared queue. Sounds more parallel, but still not performing as well as desired on many cores.
    • Erlang SMP R13:
      Several schedulers as in the former solution, but each of them has its own run queue. The problem with this approach is that you can end up with some empty and some full queues, because processes have different runtimes. So they built something called migration logic that monitors and balances the different run queues.

    The migration logic does the following:

    • collects statistics about the maximum length of all schedulers’ run queues
    • sets up migration paths
    • takes jobs away from fully loaded schedulers and pushes them onto the queues of schedulers with low load

    Full load matters here: if not all schedulers are fully loaded, jobs are migrated to the schedulers with lower IDs, making some schedulers inactive.

    This makes perfect sense, because the more schedulers and run queues you need, the more migration has to be done. SMP support with many schedulers only pays off if you are really optimizing for many cores; on systems with few cores it will actually decrease performance.

  3. Binding schedulers to CPUs is really worth looking at. The more cores your CPU has, the more important it becomes and the more performance you will gain. You can make the Erlang VM bind its schedulers with either of:
    erl +sbt db
    erlang:system_flag(scheduler_bind_type, default_bind).

    1> erlang:system_info(cpu_topology).
    [{processor,[{core,{logical,0}},
                 {core,{logical,3}},
                 {core,{logical,1}},
                 {core,{logical,2}}]}]
    2> erlang:system_info(scheduler_bindings).
    {unbound,unbound,unbound,unbound}

    fabrizio@machine:~$ erl +sbt db
    1> erlang:system_info(scheduler_bindings).
    {0,1,3,2}
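The migration logic from point 2 can be illustrated with a toy sketch (Python, purely illustrative; the BEAM's real logic works with run-queue length statistics and migration paths, not this naive loop):

```python
from collections import deque

class Scheduler:
    """Toy scheduler: just an id and its own run queue."""
    def __init__(self, sid):
        self.sid = sid
        self.runq = deque()

def balance(schedulers):
    """Naive migration: move jobs from the fullest to the emptiest
    run queue until the queue lengths differ by at most one."""
    while True:
        full = max(schedulers, key=lambda s: len(s.runq))
        low = min(schedulers, key=lambda s: len(s.runq))
        if len(full.runq) - len(low.runq) <= 1:
            return
        low.runq.append(full.runq.popleft())

# Usage: four schedulers, all twelve jobs initially on one queue.
scheds = [Scheduler(i) for i in range(4)]
scheds[0].runq.extend(range(12))
balance(scheds)
print([len(s.runq) for s in scheds])  # -> [3, 3, 3, 3]
```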

Benchmark - Scheduler Binding - Kenneth Lundin
Source: presentation Kenneth Lundin – Erlang-Factory

You can test and benchmark SMP using the following flags:
fabrizio@machine:~$ erl -smp disable   # default is auto
fabrizio@machine:~$ erl +S 2:4         # number of schedulers : schedulers online

With erlang:system_info/1 you can use the following atoms:

# cpu_topology
# multi_scheduling
# scheduler_bind_type
scheduler_bindings
logical_processors
multi_scheduling_blockers
scheduler_id
schedulers
# schedulers_online
smp_support

The ones marked with # can be set using system_flag/2.