Category Archives: Operations

Incorporating a remote Subversion repository into a local spin-off

To render and display the OpenSocial apps available on our platform, we are using the Apache Shindig gadget server, currently version 1.0-incubating.
Up until now we had simply exported this Shindig version and committed it into our local in-house Subversion repository. From then on we added all kinds of patches, bug fixes and extensions to it. Over the months this accumulated to about 50 different changes to the original Shindig codebase, which made it very hard to upgrade to a newer Shindig version or to apply any patches for the original version.

Now that the time is coming to migrate to the OpenSocial 0.9 spec with Shindig 2.0, we rethought our Shindig development strategy and came up with a different process:

First we heavily refactored the changes we had made, to keep them as separate as possible from the original Shindig codebase.
What we couldn’t strip out, we extracted as diffs into separate patch files.
This left us with the original codebase, some 20 patch files and a few extra PHP classes and JavaScript packages.
This allowed us to commit only the patches and extra files into our own Subversion repository. We then added a small build script which exports the remaining folders and files we need from the Apache Shindig Subversion server. These exported files are not under version control, so Subversion will not try to commit them back to the Apache Shindig repository (for which we have no write access) when they change. For these files and folders an svn:ignore property is set in our Subversion, so that they are not committed accidentally.
After the export, the build script applies all patches and we have a working Shindig server in our file system.


#!/bin/bash
apache_svn_base="https://svn.apache.org/repos/asf/"
shindig_svn_base="shindig/trunk"


paths[0]="config/OSML_library.xml"
paths[1]="config/container.js"
paths[2]="config/oauth.json"
paths[3]="content"
paths[4]="extras/src/main/javascript/features-extras"
paths[5]="features"
paths[6]="php/config/container.php"
paths[7]="php/docs"
paths[8]="php/external"
paths[9]="php/src"
paths[10]="php/test"
paths[11]="php/.htaccess"
paths[12]="php/index.php"


revision[0]="965715"
revision[1]="965715"
revision[2]="965715"
revision[3]="965715"
revision[4]="965715"
revision[5]="965715"
revision[6]="965715"
revision[7]="965715"
revision[8]="965715"
revision[9]="965715"
revision[10]="965715"
revision[11]="965715"
revision[12]="965715"


ELEMENTS=${#paths[@]}


for (( i=0; i<ELEMENTS; i++ )); do
    # remove local files
    rm -rf "${paths[${i}]}"
    # export remote files to local copy
    svn export -r "${revision[${i}]}" "${apache_svn_base}${shindig_svn_base}/${paths[${i}]}" "${paths[${i}]}"
done


# apply patches
patches="patches/*.patch"


for f in $patches
do
    echo "********** load patch $f"
    patch -p0 -N -i "$f"
done
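
The svn:ignore property mentioned above is set with plain Subversion property commands. A minimal sketch could look like the following; the ignore lists here are only examples and have to match whatever the build script exports into otherwise versioned folders:

#!/bin/bash
# mark exported top-level folders as ignored so they are never committed by accident
svn propset svn:ignore "content
features" .
# the same idea for exported files inside an already versioned folder, e.g. config/
svn propset svn:ignore "OSML_library.xml
container.js
oauth.json" config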

With this solution we still need to see, while developing, what we changed in the original files, so that we can add new patch files or modify existing ones. For this we wrote a small diff script which downloads the corresponding revision of a file from the Apache Shindig SVN and diffs it against the local copy.


#!/bin/bash
apache_svn_base="https://svn.apache.org/repos/asf/"
shindig_svn_base="shindig/trunk"


paths[0]="config/OSML_library.xml"
paths[1]="config/container.js"
paths[2]="config/oauth.json"
paths[3]="content"
paths[4]="extras/src/main/javascript/features-extras"
paths[5]="features"
paths[6]="php/config/container.php"
paths[7]="php/docs"
paths[8]="php/external"
paths[9]="php/src"
paths[10]="php/test"
paths[11]="php/.htaccess"
paths[12]="php/index.php"


revision[0]="965715"
revision[1]="965715"
revision[2]="965715"
revision[3]="965715"
revision[4]="965715"
revision[5]="965715"
revision[6]="965715"
revision[7]="965715"
revision[8]="965715"
revision[9]="965715"
revision[10]="965715"
revision[11]="965715"
revision[12]="965715"


ELEMENTS=${#paths[@]}
line="==================================================================="


for (( i=0; i<ELEMENTS; i++ )); do
    # For every local file: fetch the pinned remote revision via the repository's
    # "!svn/bc/<revision>/<path>" URL, diff it against the local copy and prepend an
    # "Index: <file>" header plus separator line, just like svn diff does.
    # Positional parameters of the inner shell: $1=file, $2=SVN base URL,
    # $3=revision, $4=repository path, $5=separator ({} is passed twice because
    # the first occurrence only fills $0).
    find ${paths[${i}]} -type f -exec sh -c '(curl -s $2!svn/bc/$3/$4/$1 | diff -w - $1 | sed -e "1i\\
Index: $1\\
$5")' {} {} $apache_svn_base ${revision[${i}]} $shindig_svn_base $line \;
done

For every folder we traverse all files, fetch each file from the remote Shindig SVN with curl, diff the fetched content against the local file, and print a ready-to-use diff patch with a leading Index entry and separator line, just like the output of the svn diff command.
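
The output for a locally changed file then looks roughly like this (illustrative only – the file name, line numbers and contents are made up):

Index: php/src/gadgets/SomeClass.php
===================================================================
42c42
< the line as it exists in the Apache Shindig SVN at the pinned revision
> the line as it was modified locally

Redirected into a file under the patches directory (for example something like ./shindig_diff.sh > patches/my-change.patch – the script name is just a placeholder), this output becomes a patch that the build script above can apply again.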

Announcement: Change of OpenSocial ID Prefix – Migration plan

In order to simplify the communication between app backends and our OpenSocial API, and to fix the issue of user migration between studiVZ and meinVZ, we are planning to reduce the number of possible endpoints by one.

Therefore, we will change the OpenSocial User ID Prefixes as follows:

  • www.studivz.net:abcdefg → www.vz.net:abcdefg
  • www.meinvz.net:abcdefg → www.vz.net:abcdefg
  • www.schuelervz.net:abcdefg → www.schuelervz.net:abcdefg
  • sandbox.developer.studivz.net:abcdefg → sandbox.developer.studivz.net:abcdefg

The ID suffix will not change!

We won’t change the hostnames of the different endpoints, but if you have a www.vz.net ID it won’t matter whether you request the meinVZ or the studiVZ endpoint. This means that the following endpoints are available:

  • http://meinvz.gadgets.apivz.net/social/rest or http://studivz.gadgets.apivz.net/social/rest
  • http://schuelervz.gadgets.apivz.net/social/rest
  • http://sandbox.gadgets.apivz.net/social/rest
  • http://meinvz.gadgets.apivz.net/social/rpc or http://studivz.gadgets.apivz.net/social/rpc
  • http://schuelervz.gadgets.apivz.net/social/rpc
  • http://sandbox.gadgets.apivz.net/social/rpc

Our migration plan has two steps:

April 13th – May 18th:
While we still send the old OpenSocial IDs in our requests to your backend and in our API responses, our REST and RPC API will accept both old and new IDs as request parameters. In this timeframe you have to change your backend to replace the old prefix with the new prefix in all our requests and responses (see example code below) and to store only the new IDs in your database.

May 18th:
We will start sending out the new IDs in our requests to your backend and in our API responses. Your backend should be ready for this after phase one: the replace logic is still applied to every request and response, it just won’t find any matches anymore. You can now remove this replace logic from your backend if you want.

Example source code for replacement logic:

When your backend receives a request from the gadget (makeRequest):

$viewerId = str_replace('www.studivz.net', 'www.vz.net', $_GET['opensocial_viewer_id']);
$viewerId = str_replace('www.meinvz.net', 'www.vz.net', $viewerId);
$ownerId = str_replace('www.studivz.net', 'www.vz.net', $_GET['opensocial_owner_id']);
$ownerId = str_replace('www.meinvz.net', 'www.vz.net', $ownerId);

When your backend receives a response from our REST or RPC API (example for receiving a list of users):

foreach ($decodedResponse['entry'] as $id => $user) {
    $decodedResponse['entry'][$id]['id'] = str_replace('www.studivz.net', 'www.vz.net', $decodedResponse['entry'][$id]['id']);
    $decodedResponse['entry'][$id]['id'] = str_replace('www.meinvz.net', 'www.vz.net', $decodedResponse['entry'][$id]['id']);
}

If you are a developer with a live app that has its own backend, you will be contacted in the next few days by our support team in order to coordinate your personal migration process.

Apache Hadoop Get Together Berlin

We would like to announce the December 2009 Hadoop Get Together at the newthinking store in Berlin.

When: December 16th, 2009 at 5:00 pm
Where: newthinking store, Tucholskystr. 48, Berlin, Germany

As always there will be slots of 20 minutes each for talks on your Hadoop topic. After each talk there will be plenty of time for discussion. You can order drinks directly at the bar in the newthinking store, and if you like you can order pizza. After the event we will go to Cafe Aufsturz for some beer and something to eat.

Talks scheduled so far:

Richard Hutton (nugg.ad): “Moving from five days to one hour.” – This talk explains how we made data processing scalable at nugg.ad. The company’s core business is online advertisement targeting. Our servers receive 10,000 requests per second, resulting in 100 GB of data per day.

As the classical data warehouse solution reached its limit, we moved to a framework built on top of Hadoop to make analytics speedy, data mining detailed and all of our lives easier. We will give an overview of our solution involving file system structures, scheduling, messaging and programming languages from the future.

Jörg Möllenkamp (Sun): “Hadoop on Sun”
Abstract: Hadoop is a well-known technology inside of Sun. This talk wants to show some interesting use cases of Hadoop in conjunction with Sun technologies. The first use case demonstrates how Hadoop can be used to load a massive multicore system with up to 256 threads in a single machine to the max. The second use case shows how several mechanisms integrated in Solaris can ease the deployment and operation of Hadoop, even in non-dedicated environments. The last use case will show the combination of the Sun Grid Engine and Hadoop. The talk may contain command-line demonstrations ;).

Nikolaus Pohle (nurago): “M/R for MR – Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns.”

We would like to invite you, the visitor, to tell your own Hadoop story as well; if you like, you can bring slides – there will be a projector. Thanks to Isabel Drost for organizing this event and to the newthinking store for providing the space. VZnet Netzwerke is sponsoring the video recording of the talks.

Registration:

  • http://upcoming.yahoo.com/event/4842528/
  • https://www.xing.com/events/apache-hadoop-berlin-426025
Serving objects is more than plain delivery

On April 26th, 2007, Steve Souders wrote:

The user’s proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user’s perspective. […] Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. […] A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users.

Steve posted this approx. e^3.25809654 days after(!) we started to use a CDN for our web sites. Just a few days later we noticed the desired effect: our users started to generate more and more traffic, and the activity grew. Of course a CDN is a kind of luxury, but it is worth investing in such a service at the right moment – and from our point of view that moment had come. We were right.

Currently, roughly 286,356.421^2 objects are requested per month by our users. More than half of them (5.4E10 objects) are photos: small, medium and big sized ones. So each photo file we store is loaded round(pow(2,4.91)) times a month. That makes a monthly traffic volume of more or less 265,334,489,612,288 bytes for this kind of object alone. The total traffic of all delivered objects per month is about 1.402939962446178 times higher.

At high-traffic times there are over 11000011010100000₂ requests per second hitting our CDN, and we are happy that our origin servers only get the (5^5)th part of it.

As a side effect we can learn something about the behaviour of our users, because the performance graphs show us, for example, what they do in the evening. Maybe the Schimanski episodes on July 26th were one of the reasons for the spikes after 8 pm (see graph above), which are nothing other than commercial breaks. Have a break, have a visit to studiVZ.

mckoy – [m]em[c]ache [k]ey [o]bservation [y]ield

We wanted to speed up our web applications by reducing our database load, so we decided to use memcached, the distributed memory object caching system. Because of the many requests hitting our memcached systems (about 1.5 million requests per second), we built a tool called mckoy which can gather statistics and debugging information about all memcache requests in our network.

mckoy is a memcache protocol sniffer (based on the pcap library) and statistics builder. It automatically detects and parses each key (and its value) and the memcache API methods. At the end of the sniffing session, the results are used to build the statistics. mckoy was written to analyse our web application and its usage of the memcache API in PHP. For example, we wanted to know how many set() and get() calls were made in a given time; based on these results, we had to make changes to improve the usage of the memcache API in PHP. You can run mckoy on any UNIX-based system; it was tested on many *BSD and Linux systems. mckoy is licensed under the GPLv3 and is completely published as an open source project!

You can run mckoy in various modes (see the manpage!). For example, if you want to sniff for the pattern "foobar" across all memcache API methods with live capturing, use:

mckoy -i <interface> -e "port 11211" -m 5 -k foobar -v

And this is what it looks like:

Unfortunately, there are some known bugs. :) For example, a SIGSEGV occurs when ^C is sent by the user. Also, we noticed that mckoy isn’t able to handle memcached 1.2.8 <= 1.4.* correctly. These bugs will be fixed in the next version as soon as possible! For the next version I also plan to build in UDP and binary protocol support.

You can officially download mckoy from:
http://www.lamergarten.de/releases.html
or
http://sourceforge.net/projects/mckoy/

cheers.