Archive for November 2008

Benchmarking PHP HTTP Clients

If you read my blog semi-regularly, you might remember when I mentioned that my book would be released later this year. Unfortunately, that project had to be put on hold in favor of a few other projects. Now that those are winding down, however, I’m able to return to working on the book. I’m hoping the manuscript will be completed by the end of March 2009.

One of the interesting bits of research that I’ve done is benchmarking various mainstream PHP HTTP clients. Of course, we all know that there are lies, damned lies, statistics, and benchmarks, so take these with a grain of salt. They were run on my Sony Vaio, which has an Intel Core 2 Duo T5550 @ 1.83GHz and 2 GB of RAM and runs Ubuntu 8.10 (Intrepid Ibex) with its standard php5 package. According to Speedtest.net, my Cox Cable connection has a 12,375 kb/s download rate and a 5,998 kb/s upload rate.

<?php
// pecl_http (1.6.1)
$response = http_get(
    'http://paste2.org/new-paste',
    array(
        'connecttimeout' =>  15
    )
);
echo 'http ', strlen($response), PHP_EOL;

// streams http wrapper
$response = file_get_contents('http://paste2.org/new-paste');
echo 'streams ', strlen($response), PHP_EOL;

// curl (php5-curl Ubuntu package:
// libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8)
$ch = curl_init('http://paste2.org/new-paste');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
echo 'curl ', strlen($response), PHP_EOL;

// PEAR::HTTP_Client (PEAR 1.7.2, HTTP_Client 1.2.1)
$error = error_reporting(E_ALL);
require_once 'HTTP/Client.php';
$client = new HTTP_Client();
$client->get('http://paste2.org/new-paste');
$response = $client->currentResponse();
$response = $response['body'];
echo 'pear ', strlen($response), PHP_EOL;
error_reporting($error);

// Zend_Http_Client (SVN r12780)
require_once 'Zend/Http/Client.php';
$client = new Zend_Http_Client('http://paste2.org/new-paste');
$response = $client->request()->getBody();
echo 'zend ', strlen($response), PHP_EOL;

The Ubuntu packages for Xdebug (php5-xdebug) and KCachegrind produced the following results for this script.

pecl_http 20.08%
streams 19.81%
curl 19.83%
pear 19.73%
zend 19.88%

So the performance of these components is roughly equivalent. Interestingly, the call tree for PEAR is actually the deepest (four calls underneath the one shown in the source here), and at the bottom is a call to gethostbyname that takes 18.97% of the script’s runtime. That leaves only 0.76% for the PEAR calls above it (19.73% minus 18.97%). This suggests that most of the time taken by the other components is likely spent on DNS resolution as well.
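A quick way to sanity-check that suspicion outside the profiler is to time the DNS lookup on its own and compare it with the time for a whole request. The following is only a rough sketch: time_call() and percent_of() are helper names I made up for illustration, and the timed workload is a stand-in so the snippet runs without a network connection.

```php
<?php
// Hypothetical helper (not part of any library above): runs a callback
// once and returns the elapsed wall-clock time in seconds.
function time_call($callback)
{
    $start = microtime(true);
    call_user_func($callback);
    return microtime(true) - $start;
}

// Share of $total taken up by $part, as a percentage rounded to two places.
function percent_of($part, $total)
{
    return $total > 0 ? round($part / $total * 100, 2) : 0.0;
}

// Stand-in workload so the sketch runs offline; replace it with the real
// lookup and request described in the note that follows.
$elapsed = time_call(function () {
    usleep(1000);
});
printf("stand-in call took %.4f s\n", $elapsed);
```

For a real measurement, you would pass in a callback that calls gethostbyname('paste2.org') and another that calls file_get_contents('http://paste2.org/new-paste'), then feed the two timings to percent_of(). Keep in mind that any resolver caching between the two calls will skew the comparison.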

Let’s try a slightly more complex request.

<?php
$post = array(
    'lang' => 'php',
    'description' => '',
    'code' => 'test',
    'parent' => '0'
);

// pecl_http
$response = http_post_fields(
    'http://paste2.org/new-paste',
    $post,
    null,
    array('connecttimeout' => 15)
);
echo 'http ', strlen($response), PHP_EOL;

// streams http wrapper
$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => 'Content-Type: application/x-www-form-urlencoded',
        'content' => http_build_query($post)
    )
));
$response = file_get_contents('http://paste2.org/new-paste', false, $context);
echo 'streams ', strlen($response), PHP_EOL;

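// curl
$params = array(
    // Note: this URL uses the www. host, unlike the other clients' requests
    CURLOPT_URL => 'http://www.paste2.org/new-paste',
    CURLOPT_POST => true,
    // Include response headers in the output, so strlen() below counts them too
    CURLOPT_HEADER => true,
    CURLOPT_RETURNTRANSFER => true,
    // Passing an array sends multipart/form-data; a URL-encoded string
    // (e.g. http_build_query($post)) would send application/x-www-form-urlencoded
    CURLOPT_POSTFIELDS => $post
);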
$ch = curl_init();
foreach ($params as $key => $value) {
    curl_setopt($ch, $key, $value);
}
$response = curl_exec($ch);
curl_close($ch);
echo 'curl ', strlen($response), PHP_EOL;

// PEAR::HTTP_Client
$error = error_reporting(E_ALL);
require_once 'HTTP/Client.php';
$client = new HTTP_Client();
$client->post('http://paste2.org/new-paste', $post);
$response = $client->currentResponse();
$response = $response['body'];
echo 'pear ', strlen($response), PHP_EOL;
error_reporting($error);

// Zend_Http_Client
require_once 'Zend/Http/Client.php';
$client = new Zend_Http_Client('http://paste2.org/new-paste');
$client->setParameterPost($post);
$response = $client->request('POST')->getBody();
echo 'zend ', strlen($response), PHP_EOL;

And here are the Xdebug + KCachegrind results for the execution of this script.

pecl_http 12.56%
streams 25.02%
curl 12.69%
pear 24.81%
zend 24.81%

The gethostbyname call in the PEAR call stack again takes up the majority of its runtime, 21.05% in this case, which leaves 3.76% for the rest of the PEAR code. pecl_http and curl perform roughly the same as each other and take about half the time of the others. Oddly, streams (implemented in C, like pecl_http and curl) takes about as long as the libraries written in PHP.

I have a semi-educated guess as to why this is. PEAR makes two gethostbyname calls to process the request, presumably one for the initial POST and one for a GET that follows because the POST response includes a Location header. Zend appears to make two stream_socket_client calls for the same reason. Streams do not appear to implicitly cache DNS lookups, so the HTTP streams wrapper is most likely in the same situation.
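If the repeated lookup really is the cost, one possible workaround is to resolve the host name once per script run and reuse the answer. This is my own sketch and not something any of these libraries does for you: cached_resolve() is a hypothetical helper, and the resolver is injectable so the demonstration can use a fake that counts how often it is called instead of hitting DNS.

```php
<?php
// Hypothetical helper: resolves a host name once per script run and reuses
// the answer. $resolver defaults to gethostbyname() but can be swapped out.
function cached_resolve($host, $resolver = 'gethostbyname')
{
    static $cache = array();
    if (!isset($cache[$host])) {
        $cache[$host] = call_user_func($resolver, $host);
    }
    return $cache[$host];
}

// Offline demonstration with a fake resolver that counts its invocations.
$lookups = 0;
$fake = function ($host) use (&$lookups) {
    $lookups++;
    return '192.0.2.1'; // documentation-only address (RFC 5737)
};

$first  = cached_resolve('paste2.org', $fake);
$second = cached_resolve('paste2.org', $fake);
echo $first, ' resolved with ', $lookups, ' lookup(s)', PHP_EOL;
```

To put this to work with the streams wrapper you would presumably request the URL by IP address and send a Host: paste2.org header through the context options. I haven’t tested that against paste2.org, so treat it as an assumption.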

Both the CURLOPT_DNS_USE_GLOBAL_CACHE option and the http.request.datashare.dns configuration setting exist and are enabled by default, which leads me to believe that the curl and pecl_http extensions do cache DNS lookups and thus don’t suffer the performance hit of repeating them. Can anyone confirm or deny this?
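Another variable worth ruling out is redirect handling itself. libcurl does not follow a Location header unless CURLOPT_FOLLOWLOCATION is set, whereas the HTTP streams wrapper follows redirects by default, so the clients above may not all be doing the same amount of work. Here is a minimal sketch of making that explicit on the streams side, assuming a PHP version that supports the follow_location context option. build_post_context_options() is a name I made up.

```php
<?php
// Hypothetical helper: builds HTTP stream context options for a
// form-encoded POST, with redirect following made explicit.
function build_post_context_options(array $fields, $followRedirects)
{
    return array(
        'http' => array(
            'method'          => 'POST',
            'header'          => 'Content-Type: application/x-www-form-urlencoded',
            'content'         => http_build_query($fields),
            // 0 tells the wrapper not to follow a Location header
            'follow_location' => $followRedirects ? 1 : 0,
        ),
    );
}

$post = array('lang' => 'php', 'description' => '', 'code' => 'test', 'parent' => '0');
$options = build_post_context_options($post, false);
$context = stream_context_create($options);

// Creating the context does not touch the network; pass it to
// file_get_contents() as in the benchmark to issue the request.
echo $options['http']['content'], PHP_EOL;
```

The curl counterpart would be setting CURLOPT_FOLLOWLOCATION to true or false explicitly. With every client configured the same way, any difference that remains is easier to pin on DNS.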

AWDG November 2008 Meetup Slides are up

While I was in Atlanta this past week for php|works / PyWorks Conference, I volunteered to speak at the November meetup for the Atlanta Web Designers Group. Slides and demo code from my presentation can now be found in the Publications area of this web site. Thanks to the group for their invitation and hospitality and to Ben Ramsey for introducing us.

Natural Ordering in MySQL

I ran into an instance recently where I wanted to implement natural sorting of a result set in MySQL. When you’re dealing with numerical strings or strings that start with a number followed by a common non-numeric suffix, the common solution of casting the order column to a number by adding zero to it works fine. However, if neither of the aforementioned conditions is the case, it takes a little more work.

What actually happens when you add zero to a string column depends on the characters at the beginning of the column value. If the value does not begin with a sequence of one or more numeric characters, then adding zero to it produces zero. (Ex: “dog” + 0 = 0) If the value does begin with numeric characters, then adding zero to it produces the number formed by those characters, up to the first non-numeric character in the original value or the end of the value, whichever comes first. (Ex: “12 dogs” + 0 = 12) An example might be the easiest way to illustrate this.

mysql> SELECT name+0<>0, name+0, name
    -> FROM `recommendation`
    -> ORDER BY name+0<>0 DESC, name+0, name;
+-----------+--------+------------------------+
| name+0<>0 | name+0 | name                   |
+-----------+--------+------------------------+
|         1 |      3 | 3 month follow-up      |
|         1 |      6 | 6 month follow-up      |
|         1 |     12 | 12 month follow-up     |
|         0 |      0 | Intervention           |
|         0 |      0 | Observation            |
|         0 |      0 | Specialty Consultation |
+-----------+--------+------------------------+
6 rows in set (0.00 sec)

The first ORDER BY expression checks whether the string begins with numeric characters (strictly, whether its numeric value is non-zero) and places those results first. If you prefer that numeric results appear after non-numeric results, then you can omit this expression.

The second ORDER BY expression orders the numeric results by casting them to numbers and ordering by those numbers.

The third expression orders the non-numeric results by the original column value.
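If the sorting ever has to happen in PHP rather than in the query, the same three-level ordering can be approximated with a comparison function. This is only a rough sketch: leading_number() and natural_compare() are names I made up, the cast handles just a leading run of digits (MySQL also accepts signs, decimals, and leading whitespace), and strcmp() is case-sensitive where MySQL’s default collations typically are not.

```php
<?php
// Rough stand-in for MySQL's `name + 0`: the leading run of digits as an
// integer, or 0 when the string does not start with a digit.
function leading_number($value)
{
    return preg_match('/^\d+/', $value, $match) ? (int) $match[0] : 0;
}

// Mirrors ORDER BY name+0<>0 DESC, name+0, name
function natural_compare($a, $b)
{
    $na = leading_number($a);
    $nb = leading_number($b);

    // name+0<>0 DESC: values with a non-zero numeric prefix come first
    if (($na != 0) !== ($nb != 0)) {
        return $na != 0 ? -1 : 1;
    }
    // name+0: ascending by numeric prefix
    if ($na !== $nb) {
        return $na < $nb ? -1 : 1;
    }
    // name: fall back to a plain string comparison
    return strcmp($a, $b);
}

$names = array(
    'Observation',
    '12 month follow-up',
    'Intervention',
    '3 month follow-up',
    'Specialty Consultation',
    '6 month follow-up',
);
usort($names, 'natural_compare');
print_r($names);
```

Run against the names from the result set above, this yields the same order as the query: the 3, 6, and 12 month follow-ups, then Intervention, Observation, and Specialty Consultation.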

And that’s all there is to it. Hope this proves helpful to someone.

Scaling Zend_Form

An adage often exchanged between Zend Framework enthusiasts goes something like this: “The bad thing about Zend Framework is that there’s a dozen ways to do anything. The great thing about Zend Framework is that there’s a dozen ways to do anything.” To a degree, this is a boon for the project. I think it’s fair to say that it’s one of the more flexible framework projects out there when it comes to how to do things with it.

I came across an instance using Zend_Form recently where the level of flexibility offered was a bit of a double-edged sword. In order to provide maximum flexibility per form element instance, each element has not only its own filter, validator, and decorator instances, but also its own plugin loader instance for each of these three types of plugins. These add up quickly when you have a form with several hundred elements in it.

But maybe you don’t have a reason for needing each element to have its own plugin loaders. I honestly can’t see a use case for that, but I’ve heard it claimed that one exists. For large forms, you can improve performance and memory usage by manually instantiating a plugin loader for each type of plugin, configuring them, and then having all elements added to the form use those plugin loader instances. To do this, subclass Zend_Form like so:

require_once 'Zend/Form.php';
require_once 'Zend/Loader/PluginLoader.php';

class Custom_Form extends Zend_Form
{
    private $_elementLoaders;

    public function init()
    {
        // Clear default form decorator paths so elements don't inherit them
        $this->getPluginLoader('decorator')->clearPaths();

        // Instantiate and configure central plugin loaders for elements
        $this->_elementLoaders = array();

        $this->_elementLoaders['decorator'] = new Zend_Loader_PluginLoader();
        // $this->_elementLoaders['decorator']->addPrefixPath( ... );

        $this->_elementLoaders['validate'] = new Zend_Loader_PluginLoader();
        // $this->_elementLoaders['validate']->addPrefixPath( ... );

        $this->_elementLoaders['filter'] = new Zend_Loader_PluginLoader();
        // $this->_elementLoaders['filter']->addPrefixPath( ... );
    }

    public function addElement($element, $name = null, $options = null)
    {
        if (!is_array($options)) {
            $options = array();
        }

        // A plugin loader is implicitly created if default decorators are loaded
        $options['disableLoadDefaultDecorators'] = true;

        // Add the element to the form
        parent::addElement($element, $name, $options);

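        // Configure the element to use the central plugin loaders.
        // When an element instance was passed in, $name may be null,
        // so take the name from the element itself.
        if ($element instanceof Zend_Form_Element) {
            $name = $element->getName();
        }
        $element = $this->getElement($name);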
        foreach ($this->_elementLoaders as $type => $loader) {
            $element->setPluginLoader($loader, $type);
        }

        // Now load default decorators for the element
        $element->loadDefaultDecorators();

        return $this;
    }
}
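To get a feel for why sharing helps, it can be useful to simply count the loader objects involved. The sketch below uses stand-in classes that I made up for illustration (Sketch_Loader and Sketch_Element are not Zend Framework classes), and 300 is an arbitrary figure standing in for “several hundred elements.”

```php
<?php
// Stand-in classes for illustration only; NOT Zend Framework classes.
class Sketch_Loader
{
}

class Sketch_Element
{
    public $loaders = array();

    // Uses a shared loader for each plugin type when one is supplied,
    // otherwise creates a private loader, as each element does by default.
    public function __construct(array $sharedLoaders = array())
    {
        foreach (array('decorator', 'validate', 'filter') as $type) {
            $this->loaders[$type] = isset($sharedLoaders[$type])
                ? $sharedLoaders[$type]
                : new Sketch_Loader();
        }
    }
}

// Counts the distinct loader objects referenced by a list of elements.
function count_loaders(array $elements)
{
    $seen = array();
    foreach ($elements as $element) {
        foreach ($element->loaders as $loader) {
            $seen[spl_object_hash($loader)] = true;
        }
    }
    return count($seen);
}

$shared = array(
    'decorator' => new Sketch_Loader(),
    'validate'  => new Sketch_Loader(),
    'filter'    => new Sketch_Loader(),
);

$perElement = array();
$sharing = array();
for ($i = 0; $i < 300; $i++) {
    $perElement[] = new Sketch_Element();
    $sharing[] = new Sketch_Element($shared);
}

echo count_loaders($perElement), ' vs ', count_loaders($sharing), PHP_EOL; // 900 vs 3
```

The real plugin loaders also carry prefix paths and loaded-class lists, so the actual saving per object is larger than these empty stand-ins suggest, but I haven’t measured it precisely.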

I find the Internationalization of Zend_Form page in the Reference Guide to be a bit misleading. While no i18n is done by default, Zend_Translate components are still loaded by default. In my opinion, Zend_Form should have been designed such that you would enable this feature if you needed it rather than disabling it if you didn’t. Be that as it may, if you don’t need it, here’s how you can turn it off to gain a little extra performance.

class Custom_Form extends Zend_Form
{
    public function init()
    {
        $this->setDisableTranslator(true);
    }

    public function addElement($element, $name = null, $options = null)
    {
        if (!is_array($options)) {
            $options = array();
        }
        $options['disableTranslator'] = true;
        return parent::addElement($element, $name, $options);
    }
}

If you are currently running ZF 1.6.2, changeset 12201 was committed recently to address a performance issue with translation of multi elements (which applies whether translation is disabled or not). It should be included in 1.7, but in the meantime the patch is easy to apply and shouldn’t conflict with any existing code that relies on those elements.

The only other bottleneck that I noticed was how quickly calls to render individual form elements added up. I’m not sure if there’s any way around this or if any particular area of the task is causing a bottleneck, but it’s something that I hope to investigate further in the future. For the time being, I hope the changes I’ve mentioned here are helpful to someone. If you have any relevant comments, please feel free to post them to this entry. Thanks in advance for your contributions to the discussion!