Data structures and collections are one of the most wanted features for the Standard PHP Library SPL over the last few years. With PHP 5.3 we’ll finally get a little of what we want and this is good news. With data structures like stack, queue, heap or priority queue implemented in C we expect PHP programming to become somewhat more efficient.
Inspired by this post http://blueparabola.com/blog/spl-deserves-some-reiteration we decided to run our own benchmarks to either verify or disapprove the results posted. Our benchmarks were executed on a 64bit RHEL with PHP 5.3.0beta1. As you may expect, we carefully excluded startup or compilation time and measured only the code blocks in question. We used getrusage() to determine CPU time consumption. A huge number of iterations guaranteed smooth results.
The first structure under consideration was the SplFixedArray. If you only need numerical indices you now can create an array of fixed size that does not have to grow while more and more items are inserted. Dealing with an SplFixedArray saves you about 10 percent of runtime compared to a plain old PHP array.
Next we tried the SplStack and SplQueue. It is usually easy to implement a stack and a queue with plain arrays. Using array_push(), array_pop() and array_shift() is straightforward. It may be a surprise to the average PHP programmer to learn about the runtime behaviour of these functions. Worst is array_shift() because of the internal rehashing and the experienced PHP programmer may – for critical code at least – try to access arrays by indices maintaining counters, for example. This is much more efficient. Compared to the functions, at least SplQueue is something like an upset, but it is possible to find comparable solutions with plain PHP.
There is a little danger to compare apples and pears when turning towards SplHeap and SplPriorityQueue. What is the proper representation of a heap implemented using plain old arrays only? It’s a sorted array, ok. But a heap is sorted for each insert, so, do we really have to sort the array for each insert? Who will do this in real life?
It’s the use case that decides about the sorting strategy. If you are supposed to carefully separate writing the heap and reading from it, it is sufficient to sort it once. That way you beat SPL. But if you have to mix reading and writing arbitrarily the SPL will beat plain arrays by far. This is shown in the pictures below. For the mixed strategy we read once for 5 inserts and the SplMinHeap scales very well. The same holds for SplMaxHeap and SplPriorityQueue.
- SPL rules
- use SPL data structures where appropriate for a particular use case, they are efficient and comfortable
- benchmarking is error prone
- anyway, always benchmark performance critical code blocks