Posts tagged ‘PHP’

php|tek 2009 Webcast Series

Probably should have lumped this in with my last post, but it’s a rare convenience that related thoughts like these all occur to me at the same time. Another event related to the php|tek 2009 conference coming up in May is a free webcast series for which I have been invited to present.

Webcasts will be held roughly every two weeks leading up to the beginning of the conference. I’m scheduled for Friday, February 27 at 1 PM EST, and my topic will be "When RSS Fails: Web Scraping with HTTP." Participation is free, but the number of participants is limited, so register early. Note that participation requires a machine running either Windows XP or later, or Mac OS X 10.4 or later. Look forward to seeing you there!

php|tek 2009 Hackathon

Planning on attending php|tek 2009? Think you might like to see a Hackathon included in the Unconference? (Or want to know more about what such an event would entail?) Please visit the Hackathon thread in the tek09 Google Group and post your comments and suggestions. Also, please use the address http://tinyurl.com/tek09hackathon to help spread the word! Look forward to receiving your feedback.

Benchmarking PHP HTTP Clients

If you read my blog semi-regularly, you might remember when I mentioned that my book would be released later on this year. Unfortunately, that project had to be put on hold in favor of a few other projects. Now that those are winding down, however, I’m able to return to working on the book. I’m hoping the manuscript will be completed by the end of March 2009.

One of the interesting bits of research that I’ve done is benchmarking various mainstream PHP HTTP clients. Of course, we all know that there are lies, damned lies, statistics, and benchmarks, so take these with a grain of salt. They were run on my Sony Vaio, which has an Intel Core 2 Duo T5550 @ 1.83GHz with 2 GB of RAM, running Ubuntu 8.10 (Intrepid Ibex) and its standard php5 package. According to Speedtest.net, my Cox Cable connection has a 12,375 kb/s download rate and a 5,998 kb/s upload rate.

<?php
// pecl_http (1.6.1)
$response = http_get(
    'http://paste2.org/new-paste',
    array(
        'connecttimeout' =>  15
    )
);
echo 'http ', strlen($response), PHP_EOL;

// streams http wrapper
$response = file_get_contents('http://paste2.org/new-paste');
echo 'streams ', strlen($response), PHP_EOL;

// curl (php5-curl Ubuntu package:
// libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8)
$ch = curl_init('http://paste2.org/new-paste');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
echo 'curl ', strlen($response), PHP_EOL;

// PEAR::HTTP_Client (PEAR 1.7.2, HTTP_Client 1.2.1)
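// Report E_ALL only (E_STRICT is not part of E_ALL before PHP 5.4)
// so E_STRICT notices from the PEAR code are not reported
$error = error_reporting(E_ALL);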
require_once 'HTTP/Client.php';
$client = new HTTP_Client();
$client->get('http://paste2.org/new-paste');
$response = $client->currentResponse();
$response = $response['body'];
echo 'pear ', strlen($response), PHP_EOL;
error_reporting($error);

// Zend_Http_Client (SVN r12780)
require_once 'Zend/Http/Client.php';
$client = new Zend_Http_Client('http://paste2.org/new-paste');
$response = $client->request()->getBody();
echo 'zend ', strlen($response), PHP_EOL;

The Ubuntu packages for Xdebug (php5-xdebug) and KCachegrind produced the following results for this script.

pecl_http 20.08%
streams 19.81%
curl 19.83%
pear 19.73%
zend 19.88%

So the performance of these components is roughly equivalent. Interestingly, the call tree for PEAR is the longest (four calls beneath the one shown in the source here), and at the bottom is a call to gethostbyname, which takes 18.97% of the script’s runtime, leaving 0.76% for the calls above it. This suggests that DNS resolution likely accounts for most of the time taken by the other components as well.
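Profiles like these can be reproduced with a php.ini fragment along the following lines. This is a minimal sketch assuming Xdebug 2.x, the generation packaged as php5-xdebug at the time; Xdebug 3 replaced these settings with xdebug.mode=profile and xdebug.output_dir. The output directory is an arbitrary choice.

```ini
; Assumes Xdebug 2.x; the php5-xdebug package takes care of loading the extension
; Profile every script run
xdebug.profiler_enable = 1
; Where the cachegrind.out.* files that KCachegrind opens are written
xdebug.profiler_output_dir = /tmp
```

Each profiled run writes a cachegrind.out.<pid> file (the default naming) to that directory, which KCachegrind can open directly.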

Let’s try a slightly more complex request.

<?php
$post = array(
    'lang' => 'php',
    'description' => '',
    'code' => 'test',
    'parent' => '0'
);

// pecl_http
$response = http_post_fields(
    'http://paste2.org/new-paste',
    $post,
    null,
    array('connecttimeout' => 15)
);
echo 'http ', strlen($response), PHP_EOL;

// streams http wrapper
$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => 'Content-Type: application/x-www-form-urlencoded',
        'content' => http_build_query($post)
    )
));
$response = file_get_contents('http://paste2.org/new-paste', false, $context);
echo 'streams ', strlen($response), PHP_EOL;

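// curl
// Note: unlike the other requests, this URL uses the www. host name.
// No CURLOPT_FOLLOWLOCATION is set, so libcurl does not follow a Location
// header; CURLOPT_HEADER includes response headers in $response, and an
// array passed to CURLOPT_POSTFIELDS is sent as multipart/form-data.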
$params = array(
    CURLOPT_URL => 'http://www.paste2.org/new-paste',
    CURLOPT_POST => true,
    CURLOPT_HEADER => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POSTFIELDS => $post
);
$ch = curl_init();
foreach ($params as $key => $value) {
    curl_setopt($ch, $key, $value);
}
$response = curl_exec($ch);
curl_close($ch);
echo 'curl ', strlen($response), PHP_EOL;

// PEAR::HTTP_Client
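// Report E_ALL only (E_STRICT is not part of E_ALL before PHP 5.4)
// so E_STRICT notices from the PEAR code are not reported
$error = error_reporting(E_ALL);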
require_once 'HTTP/Client.php';
$client = new HTTP_Client();
$client->post('http://paste2.org/new-paste', $post);
$response = $client->currentResponse();
$response = $response['body'];
echo 'pear ', strlen($response), PHP_EOL;
error_reporting($error);

// Zend_Http_Client
require_once 'Zend/Http/Client.php';
$client = new Zend_Http_Client('http://paste2.org/new-paste');
$client->setParameterPost($post);
$response = $client->request('POST')->getBody();
echo 'zend ', strlen($response), PHP_EOL;

And here are the Xdebug + KCachegrind results for the execution of this script.

pecl_http 12.56%
streams 25.02%
curl 12.69%
pear 24.81%
zend 24.81%

The gethostbyname call in the PEAR call stack again takes up the majority of its runtime, 21.05% in this case, which puts the remainder of the time for PEAR at 3.76%. pecl_http and curl are roughly equivalent to each other and take about half the time of the others. Oddly, streams (implemented in C, like pecl_http and curl) suffers a performance hit similar to that of the libraries written in PHP.

I have a semi-educated guess as to why this is. PEAR makes two gethostbyname calls to process the request, presumably one for the initial POST and one for a GET that follows because the POST response includes a Location header. Zend appears to make two stream_socket_client calls for the same reason. Streams do not appear to implicitly cache DNS lookups, so the HTTP streams wrapper is most likely in the same situation.

The existence of the CURLOPT_DNS_USE_GLOBAL_CACHE option and the http.request.datashare.dns configuration setting and the fact that both are enabled by default lead me to believe that the curl and pecl_http extensions do cache DNS lookups and thus don’t suffer the performance hit of repeating them. Can anyone confirm or deny this?
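One way to test the redirect part of this guess for the streams wrapper is to rerun its request with redirect following turned off, so that it sends only the initial POST, as the curl example (which sets no CURLOPT_FOLLOWLOCATION) does. This is a minimal sketch assuming PHP 5.3.4 or later, where the follow_location context option was added; make_post_context() is a hypothetical helper name.

```php
<?php
// Sketch: build the same POST context as the streams benchmark, with an
// option to stop the HTTP wrapper from following Location headers.
// Assumes PHP 5.3.4+, where 'follow_location' is available.
function make_post_context(array $fields, $followLocation = true)
{
    return stream_context_create(array(
        'http' => array(
            'method'          => 'POST',
            'header'          => 'Content-Type: application/x-www-form-urlencoded',
            'content'         => http_build_query($fields),
            // 1 (the default) follows redirects; 0 stops after the first response
            'follow_location' => $followLocation ? 1 : 0,
        ),
    ));
}

// Usage (requires network access):
// $context = make_post_context($post, false);
// $response = file_get_contents('http://paste2.org/new-paste', false, $context);
```

If the streams figure drops toward the curl and pecl_http numbers with follow_location set to 0, the second request (and its DNS lookup) would be the likely culprit.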

AWDG November 2008 Meetup Slides are up

While I was in Atlanta this past week for php|works / PyWorks Conference, I volunteered to speak at the November meetup for the Atlanta Web Designers Group. Slides and demo code from my presentation can now be found in the Publications area of this web site. Thanks to the group for their invitation and hospitality and to Ben Ramsey for introducing us.

Scaling Zend_Form

An adage often exchanged between Zend Framework enthusiasts goes something like this: “The bad thing about Zend Framework is that there’s a dozen ways to do anything. The great thing about Zend Framework is that there’s a dozen ways to do anything.” To a degree, this is a boon for the project. I think it’s fair to say that it’s one of the more flexible framework projects out there when it comes to how to do things with it.

I came across an instance using Zend_Form recently where the level of flexibility offered was a bit of a double-edged sword. To provide maximum flexibility per form element instance, each element has not only its own filter, validator, and decorator instances, but also a plugin loader instance for each of these three types of plugins. These add up quickly when you have a form with several hundred elements in it.

But maybe you don’t have a reason for needing each element to have its own plugin loaders. I honestly can’t see a use case for that, but I’ve heard it claimed that one exists. For large forms, you can improve performance and memory usage by manually instantiating a plugin loader for each type of plugin, configuring them, and then having all elements added to the form use those plugin loader instances. To do this, subclass Zend_Form like so:

require_once 'Zend/Form.php';
require_once 'Zend/Loader/PluginLoader.php';

class Custom_Form extends Zend_Form
{
    private $_elementLoaders;

    public function init()
    {
        // Clear default form decorator paths so elements don't inherit them
        $this->getPluginLoader('decorator')->clearPaths();

        // Instantiate and configure central plugin loaders for elements
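        // Each loader needs prefix paths (e.g. the stock Zend_Form_Decorator_,
        // Zend_Validate_ and Zend_Filter_ prefixes) or plugin lookups,
        // including loadDefaultDecorators(), will fail
        $this->_elementLoaders = array();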

        $this->_elementLoaders['decorator'] = new Zend_Loader_PluginLoader();
        // $this->_elementLoaders['decorator']->addPrefixPath( ... );

        $this->_elementLoaders['validate'] = new Zend_Loader_PluginLoader();
        // $this->_elementLoaders['validate']->addPrefixPath( ... );

        $this->_elementLoaders['filter'] = new Zend_Loader_PluginLoader();
        // $this->_elementLoaders['filter']->addPrefixPath( ... );
    }

    public function addElement($element, $name = null, $options = null)
    {
        if (!is_array($options)) {
            $options = array();
        }

        // A plugin loader is implicitly created if default decorators are loaded
        $options['disableLoadDefaultDecorators'] = true;

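        // Add the element to the form
        parent::addElement($element, $name, $options);

        // An element instance carries its own name (parent::addElement()
        // also ignores $options in that case)
        if ($element instanceof Zend_Form_Element) {
            $name = $element->getName();
        }

        // Configure the element to use the central plugin loaders
        $element = $this->getElement($name);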
        foreach ($this->_elementLoaders as $type => $loader) {
            $element->setPluginLoader($loader, $type);
        }

        // Now load default decorators for the element
        $element->loadDefaultDecorators();

        return $this;
    }
}

I find the Internationalization of Zend_Form page in the Reference Guide to be a bit misleading. While no i18n is done by default, Zend_Translate components are still loaded by default. In my opinion, Zend_Form should have been designed so that you enable this feature if you need it rather than disable it if you don’t. Be that as it may, if you don’t need it, here’s how to turn it off for a little extra performance.

require_once 'Zend/Form.php';

class Custom_Form extends Zend_Form
{
    public function init()
    {
        $this->setDisableTranslator(true);
    }

    public function addElement($element, $name = null, $options = null)
    {
        if (!is_array($options)) {
            $options = array();
        }
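        // Only honored when $element is a type string; element instances
        // passed in directly keep their own translator setting
        $options['disableTranslator'] = true;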
        return parent::addElement($element, $name, $options);
    }
}

If you are currently running ZF 1.6.2, changeset 12201 was committed recently to address a performance issue with translation (which applies whether it’s disabled or not) of multi elements. It should be included in 1.7, but in the meantime the patch is easy to apply and shouldn’t conflict with any existing code using it.

The only other bottleneck that I noticed was how quickly calls to render individual form elements added up. I’m not sure if there’s any way around this or if any particular area of the task is causing a bottleneck, but it’s something that I hope to investigate further in the future. For the time being, I hope the changes I’ve mentioned here are helpful to someone. If you have any relevant comments, please feel free to post them to this entry. Thanks in advance for your contributions to the discussion!

ZendCon 2008 Slides Up

The slides from my talks at ZendCon 2008 are now up for your viewing pleasure. I’ll try to have the traditional conference wrap-up post up a little later on this weekend.

Creating Web Services with Zend Framework

Web Scraping

Environmental Awareness Quickie

I ran into an instance recently where someone was trying to run Phergie in an environment where the exec function was disabled. This causes a warning in the Quit plugin, which, on non-Windows systems, uses exec to automatically detect the full path to the PHP CLI binary; it later uses that path to start a new PHP CLI process to “restart” the bot.

I realized when I started digging into this issue that I wasn’t aware of a way to check the PHP configuration to see whether or not a function was disabled, save for using the ini_get function to get the value of the disable_functions setting and parse it manually.
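The manual approach mentioned above amounts to something like the following minimal sketch; is_function_disabled() is a hypothetical helper name, not part of PHP or Phergie.

```php
<?php
// Sketch: decide whether a function is disabled by reading the
// disable_functions directive and parsing it by hand.
function is_function_disabled($name)
{
    $setting = ini_get('disable_functions');
    if ($setting === false || trim($setting) === '') {
        return false; // Nothing is disabled
    }
    // The directive is a comma-separated list; function names are case-insensitive
    $disabled = array_map('strtolower', array_map('trim', explode(',', $setting)));
    return in_array(strtolower($name), $disabled, true);
}
```

A plugin like Quit could call is_function_disabled('exec') up front and skip the path detection rather than trigger a warning.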

Thanks to Johannes Schlüter for cluing me in to the fact that the Reflection API’s ReflectionFunction class has an isDisabled method, which is exactly what I was looking for. Unfortunately, there’s no equivalent method in the ReflectionClass class for the disable_classes setting. Thankfully, though, I haven’t run into a use case for that yet.
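Here is a minimal sketch of the Reflection-based check, with one caveat for later PHP versions: as of PHP 8.0, disabled functions are no longer registered at all, so the ReflectionFunction constructor throws for them and isDisabled() is deprecated. The helper name function_is_unavailable() is hypothetical.

```php
<?php
// Sketch: report whether a function cannot be called, either because it
// does not exist or because disable_functions switched it off.
function function_is_unavailable($name)
{
    try {
        $function = new ReflectionFunction($name);
    } catch (ReflectionException $e) {
        // Unknown function (on PHP 8.0+ this also covers disabled ones)
        return true;
    }
    if (PHP_VERSION_ID >= 80000) {
        // Registered on PHP 8.0+, so it is not disabled; avoid the deprecated call
        return false;
    }
    return $function->isDisabled();
}
```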

Update: Apparently I inspired Johannes to write a patch to add ReflectionClass::isDisabled(). It most likely won’t make it in until 5.3.1, but at least the patch is there if you need the feature. Thanks Johannes!

EAV Modeling – Square Peg in a Round Hole?

So I got the June 2008 issue of php|architect (or volume 7 issue 6, for those of you who track it that way) in recently. Right off, I found the cover article on EAV modeling interesting, seeing as I currently work in the medical IT industry and I’d never heard of this technique for storing data. Actually, I more or less knew what it was, but had never put a name to the face, so to speak.

The mental image that came to me when reading about this approach to data modeling was taking the traditional relational table and turning it on its head. Despite what the Wikipedia article on the topic might tout early on, there are disadvantages to using the EAV approach. To get equal functionality, EAV has to circumvent, work around, or reimplement features that most mainstream database servers today provide “for free” to its traditional relational counterparts. These include native data type validation and data type-specific operations without explicit typecasting (if you’re not separating EAV values by data type), row-level referential integrity, and schema metadata. EAV also adds a dimension of complexity to query construction in an era where storage is becoming cheaper and database technologies are evolving. It may work, but I don’t foresee it scaling very well for larger systems. In short, it seems an attempt to force a square peg into the round hole that is traditional relational database systems.
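To make the trade-off concrete, here is a minimal in-memory sketch of an EAV layout, with plain PHP arrays standing in for database rows and hypothetical data and function names. Each fact is an (entity, attribute, value) triple, so rebuilding a record means pivoting rows, and the type check a typed column would enforce has to be written by hand.

```php
<?php
// Hypothetical EAV rows: one generic table holds every attribute as a string
$eavRows = array(
    array('entity' => 1, 'attribute' => 'name',   'value' => 'Alice'),
    array('entity' => 1, 'attribute' => 'weight', 'value' => '61.5'),
    array('entity' => 2, 'attribute' => 'name',   'value' => 'Bob'),
    array('entity' => 2, 'attribute' => 'weight', 'value' => 'heavy'), // nothing rejects this
);

// Pivot triples back into conventional records (a plain table gives you this directly)
function eav_pivot(array $rows)
{
    $records = array();
    foreach ($rows as $row) {
        $records[$row['entity']][$row['attribute']] = $row['value'];
    }
    return $records;
}

// Application-level stand-in for the type validation a numeric column provides
function eav_invalid_numbers(array $records, $attribute)
{
    $invalid = array();
    foreach ($records as $entity => $record) {
        if (isset($record[$attribute]) && !is_numeric($record[$attribute])) {
            $invalid[] = $entity;
        }
    }
    return $invalid;
}

$records = eav_pivot($eavRows);
```

Every query that needs more than one attribute repeats this pivot in some form, which is where the extra query complexity comes from.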

In a MySQL world, there are alternative approaches for deploying DDL modifications. One is to implement master-slave replication to propagate DDL modifications and load balancing to maintain uptime as changes are propagated from server to server. Another is to use MySQL Proxy to direct queries to servers hosting unmodified schemas and queue DML operations in the binary log while DDL modifications are made. Once DDL is complete, the server goes into “read only” mode while queued DML operations are applied, and incoming DML operations received during that time are blocked until the queue is empty. (This may be a point for improvement.)

Outside of MySQL, there are document-focused database systems such as Apache Lucene and its current .NET and PHP ports, as well as Apache CouchDB. While some of these are still fairly early in development, I see them as better suited to applications that demand more fluid data storage, and I hope that development of similar solutions continues.

Speaking at ZendCon

It appears the schedule for ZendCon is at least partly up. I’d been holding back this announcement mostly out of a personal sense of superstition, but since it seems to be official now, I’ll go ahead and pipe up: I’m speaking at ZendCon.

Out of four proposals, one managed to make it onto the conference schedule, and that was Pick Your Protocol: Creating Web Services with Zend Framework. It won’t be my first time at ZendCon, but it will be my first time there as a speaker. I’m looking forward to being among their ranks as well as meeting friends new and old.

See you all in September!

Output Filters in Zend_View

A feature of Zend Framework MVC that isn’t currently very well documented is output filters. They’re mentioned in passing in the Zend_View documentation, but not reviewed in detail anywhere in the Reference Guide as of version 1.5.2. I was curious enough about how to implement markup minification that I decided to trace through the Zend_View source code in an attempt to discern how output filters actually work. As it turns out, it’s pretty simple.

First, you need to get a reference to the current Zend_View instance. If you’re using the Zend_Layout MVC integration, you can get this by calling $this->_helper->layout->getLayoutInstance() within your Zend_Controller_Action class to get the current Zend_Layout instance, and then calling getView() on that to get your Zend_View instance. Otherwise, the Zend_View instance is available via the view property of the Zend_Controller_Action instance ($this->view).

Next, call addFilterPath or setFilterPath on your Zend_View instance from your Zend_Controller_Action class. Pass in a path to the directory to contain your output filter classes and a naming prefix that all of your output filter classes will use. I’m not sure why the class prefix defaults to “Zend_View_Filter_” since no such classes exist. In my opinion, it would have made more sense to derive the prefix based on the provided directory path. Anyway, create the directory you’ve specified if it doesn’t already exist and create a new class file within that directory. In my case, I named the directory Vendor/View/Filter, the file Minify.php, and the class contained in the file Vendor_View_Filter_Minify.

Within this class, you must implement at least one method, filter. This method should accept a single parameter, which will be a string containing the view output to be filtered, and should return the filtered version of that string. Optionally, if your filter requires access to the related Zend_View instance, you can also declare a setView method that accepts the Zend_View instance as its only parameter, and it will automatically be passed in when your output filter class is instantiated. Within setView, you can store the Zend_View instance in an instance property of the output filter class so it can be referred to later in the filter method.
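Here is a minimal sketch of a class following that contract. The class name and its comment-stripping behavior are hypothetical; only the filter and setView method names come from the description above.

```php
<?php
// Hypothetical output filter: filter() is required, setView() is optional.
class Vendor_View_Filter_StripComments
{
    /** @var object|null View instance passed in via setView() */
    protected $_view;

    // Receives the Zend_View instance; stored for later use in filter()
    public function setView($view)
    {
        $this->_view = $view;
        return $this;
    }

    // Takes the rendered output and returns the filtered version;
    // this example removes HTML comments (IE conditional comments included)
    public function filter($string)
    {
        return preg_replace('/<!--.*?-->/s', '', $string);
    }
}
```

It would be registered the same way as any other output filter, by passing "StripComments" (the class name minus its prefix) to addFilter.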

Once you’ve finished your output filter class, you need to explicitly add it to the output filters in use from your Zend_Controller_Action class. You can use addFilter or setFilter for this. Pass in the name of your output filter class without the class prefix. In my case, I passed in “Minify.” At this point, the filter should be used when rendering your page. I poked around in the DOM and Tidy PHP extension documentation, but couldn’t find a feature for markup minification, so I ended up using the PCRE extension to do the job. Below is the final source code for my output filter class.

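Vendor/View/Filter/Minify.php

<?php
class Vendor_View_Filter_Minify
{
    public function filter($string)
    {
        // Strip whitespace after and before tags, then collapse any
        // remaining runs of line breaks (LF/CR) into a single space
        return preg_replace(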
            array('/>\s+/', '/\s+</', '/[\x0A\x0D]+/'),
            array('>', '<', ' '),
            $string
        );
    }
}

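Vendor/Controller.php

<?php
require_once 'Zend/Controller/Action.php';

class Vendor_Controller extends Zend_Controller_Action
{
    public function init()
    {
        // Register the filter directory and class prefix, then enable Minify
        $this->_helper->layout->getLayoutInstance()->getView()
            ->addFilterPath('Vendor/View/Filter', 'Vendor_View_Filter_')
            ->addFilter('Minify');
    }
}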
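One caveat with the Minify filter: its patterns also collapse whitespace inside pre and textarea elements, where whitespace is significant. (They also drop the space between adjacent inline elements such as "</b> <i>", which this variant does not address.) Below is a sketch of a hypothetical variant that sets those elements aside behind placeholders before applying the same three patterns. It is not the original filter and uses a closure, so it assumes PHP 5.3 or later.

```php
<?php
// Hypothetical Minify variant that leaves <pre> and <textarea> content intact.
class Vendor_View_Filter_MinifyPreserving
{
    public function filter($string)
    {
        $preserved = array();

        // Swap each <pre>/<textarea> element for a numbered placeholder comment
        $string = preg_replace_callback(
            '#<(pre|textarea)\b.*?</\1>#is',
            function ($match) use (&$preserved) {
                $key = '<!--MINIFY' . count($preserved) . '-->';
                $preserved[$key] = $match[0];
                return $key;
            },
            $string
        );

        // Same whitespace rules as Vendor_View_Filter_Minify
        $string = preg_replace(
            array('/>\s+/', '/\s+</', '/[\x0A\x0D]+/'),
            array('>', '<', ' '),
            $string
        );

        // Put the preserved elements back in place of their placeholders
        return strtr($string, $preserved);
    }
}
```

Because each placeholder starts with "<" and ends with ">", surrounding whitespace is collapsed exactly as it would be around a tag.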