<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Matthew Turland</title>
	<atom:link href="http://matthewturland.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://matthewturland.com</link>
	<description></description>
	<lastBuildDate>Mon, 30 Aug 2010 02:19:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Process Isolation in PHPUnit</title>
		<link>http://matthewturland.com/2010/08/19/process-isolation-in-phpunit/</link>
		<comments>http://matthewturland.com/2010/08/19/process-isolation-in-phpunit/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 02:40:28 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[PHPUnit]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=688</guid>
		<description><![CDATA[I was recently writing a unit test for an autoloader when I came across a somewhat unintuitive behavior in PHPUnit. One requirement of the test suite was that some test methods had to be run in a separate process since class declarations reside in the global scope and persist until the process terminates. So, I [...]]]></description>
			<content:encoded><![CDATA[<p>I was recently writing a unit test for an <a title="PHP: Autoloading Classes - Manual" href="http://php.net/manual/en/language.oop5.autoload.php">autoloader</a> when I came across a somewhat unintuitive behavior in <a title="PHPUnit" href="http://www.phpunit.de">PHPUnit</a>.</p>
<p>One requirement of the test suite was that some test methods had to be run in a separate process since class declarations reside in the global scope and persist until the process terminates. So, I slapped a <code><a title="Appendix B. Annotations" href="http://www.phpunit.de/manual/3.4/en/appendixes.annotations.html#appendixes.annotations.runInSeparateProcess">@runInSeparateProcess</a></code> annotation in the docblock of a test method with that requirement, ran the test suite&#8230; and watched that test method fail because the class was still being declared.</p>
<p>It took some head-scratching and tracing through the source of PHPUnit itself to figure out what was going on. When you run the <code>phpunit</code> executable, it&#8217;s actually instantiating <code>PHPUnit_TextUI_TestRunner</code>. The eventual result of this is that the <code>run()</code> method inherited by your subclass of <code>PHPUnit_Framework_TestCase</code> is called.</p>
<p>Depending on the value of the also-inherited <code>$preserveGlobalState</code> instance property, which can be set via the <code>setPreserveGlobalState()</code> method, multiple measures are undertaken to preserve the state of the current process. One such measure is including files for all the classes currently defined in that process, which is what was tripping me up because <code>$preserveGlobalState</code> has a default value of <code>true</code>.</p>
<p><code>$preserveGlobalState</code> must contain its intended value <em>before</em> the <code>run()</code> method is called. The easiest way that I&#8217;ve found to facilitate this is to override the <code>run()</code> method in your subclass, call <code>setPreserveGlobalState()</code> there, then call the parent class implementation of <code>run()</code>. I&#8217;ve included a code sample below to illustrate this.</p>
<pre class="brush: php;">class MyTestCase extends PHPUnit_Framework_TestCase
{
    public function run(PHPUnit_Framework_TestResult $result = NULL)
    {
        $this-&gt;setPreserveGlobalState(false);
        return parent::run($result);
    }
}</pre>
<p>So, if you try to use the <code>@runInSeparateProcess</code> or <code>@runTestsInSeparateProcesses</code> annotations that PHPUnit offers, be aware that the global state will be preserved by default. You will need to explicitly disable that preservation, as shown above, if running tests in separate processes is to have the effect that you are probably intending.</p>
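<p>As a reminder of why process isolation matters for autoloader tests, here is a minimal sketch that needs no PHPUnit at all. The <code>DemoWidget</code> class name and the <code>eval()</code>-based autoloader are hypothetical stand-ins (a real autoloader would include a file), and the closure syntax assumes PHP 5.3 or later. The point is only that a class, once declared, stays declared for the rest of the process, so the autoloader is never consulted for it again.</p>
<pre class="brush: php;">&lt;?php
// Hypothetical autoloader that records every class name it is asked for.
$requested = array();
spl_autoload_register(function ($class) use (&amp;$requested) {
    $requested[] = $class;
    // Stand-in for including a file: declare an empty class on the fly.
    eval('class ' . $class . ' {}');
});

// The first lookup triggers the autoloader, which declares the class.
$first = class_exists('DemoWidget');

// The second lookup finds the existing declaration and skips the autoloader.
$second = class_exists('DemoWidget');

// $requested now holds a single entry even though the class was looked up twice.
var_dump($first, $second, count($requested));</pre>
<p>Only a fresh process starts with a clean slate, which is what a separate process gives you once global state preservation is turned off.</p>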
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/08/19/process-isolation-in-phpunit/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Few Kinks in FilterIterator</title>
		<link>http://matthewturland.com/2010/08/15/a-few-kinks-in-filteriterator/</link>
		<comments>http://matthewturland.com/2010/08/15/a-few-kinks-in-filteriterator/#comments</comments>
		<pubDate>Sun, 15 Aug 2010 21:12:28 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SPL]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=393</guid>
		<description><![CDATA[After a recent release of Phergie, I came across a few issues stemming partly from odd behavior in the PHP FilterIterator class. First, bug #52559. At the time, I was trying to troubleshoot why the first element seemed to be skipped during iteration by a subclass of FilterIterator. Not knowing that FilterIterator contained no count() method at [...]]]></description>
			<content:encoded><![CDATA[<p>After a recent release of <a title="Phergie — A PHP IRC Bot" href="http://phergie.org">Phergie</a>, I came across <a title="Known Issue in Phergie 2.0.3 — Phergie" href="http://phergie.org/2010/08/08/known-issue-in-phergie-2-0-3/">a few issues</a> stemming partly from odd behavior in the <a title="PHP: Hypertext Preprocessor" href="http://php.net">PHP</a> <a title="PHP: FilterIterator - Manual" href="http://us2.php.net/filteriterator"><code>FilterIterator</code></a> class.</p>
<p>First, <a title="PHP :: Bug #52559 :: Calling undefined method on &lt;code&gt;FilterIterator&lt;/code&gt; subclasses causes segfault" href="http://bugs.php.net/bug.php?id=52559">bug #52559</a>. At the time, I was trying to troubleshoot why the first element seemed to be skipped during iteration by a subclass of <code>FilterIterator</code>. Not knowing that <code>FilterIterator</code> contained no <code>count()</code> method at the time, I tried calling it to get the number of elements in the original array.</p>
<p>Once I discovered the <a title="Segmentation fault - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Segmentation_fault">segfault</a>, I had to come up with a short code sample exposing the bug in order to report it. Tracing that segfault to that method call was a bit tedious, but <a title="Xdebug - Debugger and Profiler Tool for PHP" href="http://xdebug.org/">Xdebug</a> helped. Lesson learned: when a segfault occurs, check calls to all methods of native classes to ensure that they actually exist. The <code>--rc</code> option of the PHP CLI binary is particularly useful for this.</p>
<p>Last, <a title="PHP :: Bug #52560 :: FilterIterator errantly returns null for first element" href="http://bugs.php.net/bug.php?id=52560">bug #52560</a>. This bug was an indirect cause of the first element being skipped during iteration. What actually happens is that <code>null</code> is returned in place of that element. The bug report goes over this in more detail, but the easiest way to work around this bug is to override the <code>FilterIterator</code> constructor in the subclass and call the <code>rewind()</code> method to explicitly reset the iterator position when the class is instantiated, as shown below.</p>
<pre class="brush: php;">&lt;?php
class MyFilterIterator extends FilterIterator {
    public function __construct(Iterator $iterator) {
        parent::__construct($iterator);
        $this-&gt;rewind();
    }
}</pre>
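<p>For a fuller picture, the sketch below applies the same workaround to a concrete filter. <code>EvenNumberFilter</code> and its sample data are hypothetical; <code>accept()</code> is the abstract method that every <code>FilterIterator</code> subclass has to implement. Since <code>FilterIterator</code> has no <code>count()</code> method, <code>iterator_count()</code> is used here to count the accepted elements instead.</p>
<pre class="brush: php;">&lt;?php
// Hypothetical subclass that keeps only even integers.
class EvenNumberFilter extends FilterIterator
{
    public function __construct(Iterator $iterator)
    {
        parent::__construct($iterator);
        // Workaround from the post: explicitly reset the iterator position.
        $this-&gt;rewind();
    }

    // On PHP 8.1 and later, declare this as accept(): bool to avoid a deprecation notice.
    public function accept()
    {
        return $this-&gt;getInnerIterator()-&gt;current() % 2 === 0;
    }
}

$filter = new EvenNumberFilter(new ArrayIterator(array(1, 2, 3, 4)));

// Collect the accepted elements.
$evens = array();
foreach ($filter as $value) {
    $evens[] = $value;
}

// FilterIterator has no count() method; iterator_count() walks the iterator instead.
$total = iterator_count($filter);</pre>
<p>With the sample data above, <code>$evens</code> ends up holding 2 and 4, and <code>$total</code> is 2.</p>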
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/08/15/a-few-kinks-in-filteriterator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Update on &#8220;Web Scraping with PHP&#8221;</title>
		<link>http://matthewturland.com/2010/07/17/an-update-on-web-scraping-with-php/</link>
		<comments>http://matthewturland.com/2010/07/17/an-update-on-web-scraping-with-php/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 02:51:46 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=385</guid>
		<description><![CDATA[Several people have asked me the same question recently, so I decided to take a blog post to provide an answer. The question is, &#8220;When will &#8216;Web Scraping with PHP&#8216; be available in print?&#8221; Answering requires a bit of background to paint a full picture of where things are now. I turned in the manuscript [...]]]></description>
			<content:encoded><![CDATA[<p>Several people have asked me the same question recently, so I decided to take a blog post to provide an answer. The question is, &#8220;When will &#8216;<a title="php|architect’s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">Web Scraping with PHP</a>&#8216; be available in print?&#8221; Answering requires a bit of background to paint a full picture of where things are now.</p>
<p>I turned in the manuscript for the book, which had already gone through technical editing, back in April 2009. That&#8217;s right, 14 months ago. After going through proofreading and layout, having a cover designed, getting an ISBN, and so forth, the digital version of the book finally went up for sale <a title="Matthew Turland  » Blog Archive » “Web Scraping with PHP” Now Available!" href="http://matthewturland.com/2010/04/20/web-scraping-with-php-now-available/">about three months ago</a>.</p>
<p>At that point, I asked if print copies would be available at <a title="PHP Conference — Chicago IL May 18-22 2010 — PHP, MySQL, Linux, Windows, Drupal, WordPress" href="http://tek.phparch.com/">php|tek</a>. The publisher sounded unsure, but told me that they would see what they could do. I was later told that the book had gone through printing and that arrangements had been made for a box to be shipped to the hotel on the Monday prior to the conference starting. It didn&#8217;t arrive at any point before or during the conference.</p>
<p>After the conference was over (now about two months ago), I was told that the box had finally arrived at the hotel the following Wednesday. I&#8217;m not sure how this was known, because the box was never recovered from the hotel to be shipped elsewhere.</p>
<p>The publisher then resolved to request another box from the printer. This box would have to be shipped to someone to proof the printed copies for quality assurance, then the books would go on sale. I was told about this resolution in early June, about five or six weeks ago. I&#8217;ve been in touch with the person responsible for proofing the books; they haven&#8217;t been received yet.</p>
<p>I have no direct contact with the printer, so all I&#8217;ve been able to do is e-mail the publisher repeatedly (six times and counting) since then asking for an update. Through an instant messenger conversation, I was told that a call was supposed to be placed to the printer early last week to determine what the status of things was. So far as I know, that call was never made, or at least I was never told of the result.</p>
<p>I&#8217;m as frustrated with this situation as anyone. I want the book to be available to people who want to read it in the format they prefer. If you are in this audience, please contact the publisher to voice your opinion. You can do so by using <a title="Contact php|architect | php|architect" href="http://www.phparch.com/policies/contact-us/">this contact form</a> on their web site or by <a title="php|architect (phparch) on Twitter" href="http://twitter.com/phparch">sending them a tweet</a> on Twitter. Thank you all very much for your continued support.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/07/17/an-update-on-web-scraping-with-php/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Webscrapers Mailing List</title>
		<link>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/</link>
		<comments>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/#comments</comments>
		<pubDate>Sat, 03 Jul 2010 12:25:57 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[cURL]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=377</guid>
		<description><![CDATA[Daniel Stenberg, one of the primary authors of the libcurl library on which the PHP cURL extension is based, was kind enough to comment on and clarify a recent blog post of mine regarding web scraping using the PHP and cURL. He later sent me a tweet to invite me to a new mailing list for [...]]]></description>
			<content:encoded><![CDATA[<p><a title="daniel.haxx.se" href="http://daniel.haxx.se">Daniel Stenberg</a>, one of the primary authors of the <a title="cURL and libcurl" href="http://curl.haxx.se">libcurl library</a> on which the <a title="PHP: cURL - Manual" href="http://us3.php.net/curl">PHP cURL extension</a> is based, was kind enough to <a title="Matthew Turland » Blog Archive » Gotcha on Scraping .NET Applications with PHP and cURL" href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/comment-page-1/#comment-5202">comment on</a> and clarify a <a title="Matthew Turland » Blog Archive » Gotcha on Scraping .NET Applications with PHP and cURL" href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/">recent blog post</a> of mine regarding web scraping using PHP and cURL. He later sent me <a title="Twitter / Daniel Stenberg: @elazar Allow me to invite ..." href="http://twitter.com/bagder/status/17590025600">a tweet</a> to invite me to a new <a title="Webscrapers - The Community" href="http://webscrapers.haxx.se">mailing list</a> for web scraping enthusiasts just before <a title="Twitter / Daniel Stenberg: Everyone is welcome to joi ..." href="http://twitter.com/bagder/status/17590320446">tweeting a public invitation</a>. In addition to the mailing list itself, the web site also has links to books (including <a title="php|architect’s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">my book</a>) and popular tools related to the subject. I think this is awesome and I encourage anyone with an interest in web scraping, professional or recreational, to join.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ledger and Building It From Source on Ubuntu 10.04</title>
		<link>http://matthewturland.com/2010/07/01/ledger-and-building-it-from-source-on-ubuntu-10-04/</link>
		<comments>http://matthewturland.com/2010/07/01/ledger-and-building-it-from-source-on-ubuntu-10-04/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 03:09:14 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=370</guid>
		<description><![CDATA[So I recently started looking around for finance software that would run on Ubuntu and quickly found reasons to dislike suggested options. Then I found Ledger. Wow, did it seem awesome by comparison. So, I added the Ubuntu PPA (see the &#8220;Platform binaries&#8221; section of this wiki page), installed it, created a data file for [...]]]></description>
			<content:encoded><![CDATA[<p>So I recently started <a title="Twitter / Matthew Turland: Any recommendations for fi ..." href="http://twitter.com/elazar/status/17336783301">looking around for finance software</a> that would run on <a title="Ubuntu homepage | Ubuntu" href="http://www.ubuntu.com">Ubuntu</a> and quickly found <a title="Twitter / Matthew Turland: So, something that GnuCash ..." href="http://twitter.com/elazar/status/17377748418">reasons to dislike suggested options</a>. Then I <a title="Twitter / Matthew Turland: Giving Ledger a shot. CLI, ..." href="http://twitter.com/elazar/status/17377850801">found Ledger</a>. Wow, did it seem awesome by comparison. So, I added the Ubuntu PPA (see the &#8220;Platform binaries&#8221; section of <a title="Home - ledger - GitHub" href="http://wiki.github.com/jwiegley/ledger/">this wiki page</a>), installed it, created a data file for my finances, and ran the ledger CLI executable on it.</p>
<p>Then I ran into a problem that appeared to be a bug: in a transaction with multiple postings and only one with a null amount, I was receiving the error, &#8220;Error: Only one posting with null amount allowed per transaction.&#8221; Checking the <a title="Ledger | Google Groups" href="http://groups.google.com/group/ledger-cli">Google Group</a> didn&#8217;t reveal any other reports of the issue, nor did searching the <a title="Bugzilla Main Page" href="http://newartisans.com/bugzilla/">Bugzilla database</a>.</p>
<p>So, I hopped onto the #ledger IRC channel on Freenode, which is the network I tend to frequent anyway. Within minutes, I was able to have the lead developer on the project confirm that the issue appeared to be a bug and politely request that I file a <a title="Bug 374 – Problem with periodic transactions having a null posting" href="http://newartisans.com/bugzilla/show_bug.cgi?id=374">bug report</a> for it, which I did.</p>
<p>I was also able to consult the <a title="README-1ST" href="http://github.com/jwiegley/ledger/raw/master/README-1ST">README-1ST</a> file for instructions on how to do a custom build from source, which I intended to use to ensure that the bug hadn&#8217;t already been fixed in the git repository. The only thing that this file lacked was a list of dependencies, but I was able to locate those through trial and error with the build tool and thought I&#8217;d post them here for anyone else looking to build ledger from source on Ubuntu 10.04.</p>
<pre class="brush: bash;">sudo apt-get install libboost-dev libboost-date-time-dev libboost-filesystem-dev libboost-iostreams-dev libboost-regex-dev libgmp3-dev libmpfr-dev texinfo</pre>
<p>Once you&#8217;ve executed the command above from the shell, you should be able to run the command below from the README-1ST file to create your build. The executable will be created in the root of the source tree and named &#8220;ledger.&#8221;</p>
<pre class="brush: bash;">./acprep update</pre>
<p>To create a debug build, which I did so that I could submit debugging output related to my issue, run this command after the one above.</p>
<pre class="brush: bash;">./acprep debug make</pre>
<p><strong>Update</strong>: As it turns out, the issue was not a bug, just a small formatting issue with my data file. However, the lead developer of ledger still plans on looking into making the issue more obvious in ledger&#8217;s output.</p>
<p><strong>Update #2</strong>: It seems the ledger build tool dependencies command supports Ubuntu, CentOS, and OS X. The way the statement was positioned in the README-1ST file, I assumed that support was limited to OS X. So, rather than going through the lengthy process I did to install dependencies on Ubuntu, you can just do this.</p>
<pre class="brush: bash;">./acprep dependencies</pre>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/07/01/ledger-and-building-it-from-source-on-ubuntu-10-04/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gotcha on Scraping .NET Applications with PHP and cURL</title>
		<link>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/</link>
		<comments>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 02:27:09 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[cURL]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=365</guid>
		<description><![CDATA[Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, php&#124;architect&#8217;s Guide to Web Scraping with PHP. I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I&#8217;d share. In this case, I [...]]]></description>
			<content:encoded><![CDATA[<p><em>Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, </em><a title="php|architect&#8217;s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/"><em>php|architect&#8217;s Guide to Web Scraping with PHP</em></a><em>.</em></p>
<p>I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I&#8217;d share. In this case, I was using the <a title="PHP: cURL - Manual" href="http://php.net/manual/en/book.curl.php">cURL extension</a>, but the tip isn&#8217;t necessarily specific to that. One thing my script did was submit a <a title="POST (HTTP) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/POST_(HTTP)">POST request</a> to simulate a form submission. The code looked something like the sample below.</p>
<pre class="brush: php;">$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL =&gt; 'http://...',
    CURLOPT_POST =&gt; true,
    CURLOPT_POSTFIELDS =&gt; array(
        'field1' =&gt; 'value1',
        // ...
    ),
    // ...
));</pre>
<p>The issue I ran into had to do with a behavior of the <code>CURLOPT_POSTFIELDS</code> setting that&#8217;s easy to overlook. This is a segment of its description from the <a title="PHP: curl_setopt - Manual" href="http://us3.php.net/curl_setopt">PHP manual page</a> for the <code>curl_setopt()</code> function.</p>
<blockquote><p>If <em>value</em> is an array, the <em>Content-Type</em> header will be set to <em>multipart/form-data</em>.</p></blockquote>
<p>If the form being submitted is not set to have an <code>enctype</code> attribute value of <code>multipart/form-data</code> in the form&#8217;s markup, .NET returns a 500-level HTTP response with no further information on what causes the error (for security purposes). This presumably happens because it&#8217;s expecting one value for the <code>Content-Type</code> request header and getting another.</p>
<p>Setting <code>CURLOPT_HEADER</code> and <code>CURLOPT_VERBOSE</code> to <code>true</code> helped to reveal that this was the issue. The fix is pretty simple: instead of passing the array itself for <code>CURLOPT_POSTFIELDS</code>, pass the result of wrapping it in a call to the  <code>http_build_query()</code> function (see its <a title="PHP: http_build_query - Manual" href="http://us.php.net/http_build_query">PHP manual page</a>). This converts it to a properly formatted query string, which causes cURL to use the default <code>Content-Type</code> header value of <code>application/x-www-form-urlencoded</code> instead.</p>
<p>Tools like <a title="Firebug" href="http://getfirebug.com">Firebug</a> can help you to examine requests made by a browser. Together with these settings for cURL, you can modify your script&#8217;s requests to match those of your browser as closely as possible, making gotchas like this less likely to trip you up.</p>
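<p>The fix described above can be sketched as follows. The field names and the URL are hypothetical placeholders, and the request itself is deliberately not sent so that the sample has no network dependency; it only shows the array being converted to a string before it is handed to <code>CURLOPT_POSTFIELDS</code>.</p>
<pre class="brush: php;">&lt;?php
// Hypothetical form fields; the real names depend on the form being scraped.
$fields = array(
    'field1' =&gt; 'value1',
    'field2' =&gt; 'value with spaces',
);

// Encode the fields as a query string instead of passing the array itself.
// The separator is passed explicitly so the result does not depend on php.ini.
$body = http_build_query($fields, '', '&amp;');

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL =&gt; 'http://example.com/form.aspx', // placeholder URL
    CURLOPT_POST =&gt; true,
    // A string body is sent as application/x-www-form-urlencoded.
    CURLOPT_POSTFIELDS =&gt; $body,
    CURLOPT_RETURNTRANSFER =&gt; true,
    // Debugging aids mentioned earlier in this post.
    CURLOPT_HEADER =&gt; true,
    CURLOPT_VERBOSE =&gt; true,
));

// Calling curl_exec($ch) here would send the request; it is omitted in this sketch.
echo $body, "\n";</pre>
<p>With the sample fields above, <code>$body</code> is <code>field1=value1&amp;field2=value+with+spaces</code>.</p>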
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Why I Write</title>
		<link>http://matthewturland.com/2010/06/23/why-i-write/</link>
		<comments>http://matthewturland.com/2010/06/23/why-i-write/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 00:53:05 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[Writing]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=312</guid>
		<description><![CDATA[Someone I know recently sent me a question that I found interesting. &#8220;I&#8217;m&#8230; exploring why I continue to pursue the insanity that is writing, and I want to get some views from people who write in other disciplines. Got any insight to share on why you wrote your book?&#8221; I&#8217;m unfortunately still seeing delays in [...]]]></description>
			<content:encoded><![CDATA[<p>Someone I know recently sent me a question that I found interesting.</p>
<blockquote><p>&#8220;I&#8217;m&#8230; exploring why I continue to pursue the insanity that is writing, and I want to get some views from people who write in other disciplines. Got any insight to share on why you wrote your book?&#8221;</p></blockquote>
<p>I&#8217;m unfortunately still seeing delays in the print edition of <a title="php|architect's Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">my book</a> being published. My apologies to those of you who have been asking after it; trust me when I say that I&#8217;m doing everything I can at this point to make it happen.</p>
<p>Unpleasantries aside, I decided to take a blog post to answer this question. I&#8217;ve actually written on this subject in the past with respect to <a title="Writing Tech Books 101 | Blue Parabola, LLC" href="http://blueparabola.com/blog/writing-tech-books-101">technical publishing</a> in particular, if you&#8217;d like more background on that.</p>
<p>As far as my personal reasons go, they certainly didn&#8217;t relate to money. Technical publishing may not be as saturated a market as mainstream fiction, but it&#8217;s also not as lucrative for authors.</p>
<p>Its relatively limited audience also eliminates fame as a reason, at least outside of that audience. A book may complement an existing reputation, but it&#8217;s more rare to establish a substantial level of notoriety through being published. Respect within that community — colleagues, peers, and prospective employers — is a more feasible goal.</p>
<p>That leads me to my personal main reason for writing my book: credentials. My knowledge and skills were vetted by <a title="books | php|architect" href="http://www.phparch.com/books/">a publisher</a> respected within the industry for the quality of the books they publish. While it may not be directly monetary, that respect has value and there are few ways for an individual to attain it. Publishing a book is one such method.</p>
<p>Lastly, I felt I had something to share with an audience with whom I had a common interest. The topic of my book may be a bit niche, but prospective readers are all the more likely to be fervorous about studying or otherwise pursuing it. Readers of science fiction and technical books have this trait in common. If I ever end up publishing a fiction piece, it will likely be in the former genre.</p>
<p>It&#8217;s one thing to publish a blog post or an article in a professional magazine, but a book signifies a higher level of commitment, dedication, and perseverance. If writing is insane, it&#8217;s in the same boat with getting married and going to college. I doubt there are enough padded rooms and straitjackets in the world for all its college students, married couples, and writers.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/06/23/why-i-write/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New SPL Features in PHP 5.3</title>
		<link>http://matthewturland.com/2010/05/20/new-spl-features-in-php-5-3/</link>
		<comments>http://matthewturland.com/2010/05/20/new-spl-features-in-php-5-3/#comments</comments>
		<pubDate>Thu, 20 May 2010 15:00:53 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SPL]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=107</guid>
		<description><![CDATA[Note: I&#8217;ve written on this topic before, but thought the subject warranted further more detailed discussion and a more comprehensive and up-to-date set of benchmarks. Hence, this post and this presentation. Enjoy. The SPL, or Standard PHP Library, is an often overlooked extension in the PHP core. It first came on the scene in PHP [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: I&#8217;ve <a title="The SPL Deserves Some Reiteration | Blue Parabola, LLC" href="http://blueparabola.com/blog/spl-deserves-some-reiteration">written on this topic before</a>, but thought the subject warranted further, more detailed discussion and a more comprehensive and up-to-date set of benchmarks. Hence, this post and <a title="New SPL Features in PHP 5.3" href="http://www.slideshare.net/tobias382/new-spl-features-in-php-53">this presentation</a>. Enjoy.</em></p>
<p>The <a title="PHP: SPL - Manual" href="http://us3.php.net/spl">SPL</a>, or Standard PHP Library, is an often overlooked extension in the PHP core. It first came on the scene in PHP 5 and a variety of <a title="PHP: Iterators - Manual" href="http://php.net/manual/en/spl.iterators.php">iterators</a> constituted the majority of its initial offerings. Though the <a title="PHP: New Classes - Manual" href="http://www.php.net/manual/en/migration53.classes.php">iterator offerings were expanded in PHP 5.3</a>, the particularly interesting additions to the SPL were several specialized <a title="Data structure - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Data_structures">data structure</a> <a title="PHP: Datastructures - Manual" href="http://php.net/manual/en/spl.datastructures.php">classes</a>, the foundational concepts for which originate in the field of <a title="Computer science - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Computer_science">computer science</a>. In this post, I will provide an overview of these new classes and explain why and when they should be used.</p>
<h3>Arrays</h3>
<p>While PHP has several data types, the ones that likely see the most frequent and varied use are <a title="PHP: Strings - Manual" href="http://php.net/manual/en/language.types.string.php">strings</a> and <a title="PHP: Arrays - Manual" href="http://php.net/manual/en/language.types.array.php">arrays</a>. They are the proverbial <a title="Duct tape - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Duct_tape#Common_uses">duct tape</a> and <a title="WD-40 - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/WD-40#Function">WD-40</a> of PHP, respectively. Like arrays, SPL data structure classes are used to store composite (i.e. non-<a title="Scalar (computing) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Scalar_(computing)">scalar</a>) data.</p>
<p>Now, that&#8217;s not to say that every instance of an array in existing codebases should be replaced with an SPL container object. There are cases where it&#8217;s appropriate to use one over the other. Knowing the difference requires an understanding of how arrays work.</p>
<p>Within the C code that makes up the PHP interpreter, arrays are implemented as a data structure called a <a title="Hash table - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Hash_table">hash table or hash map</a>. When a value contained within an array is referenced by its index, PHP uses a <a title="[svn] Contents of /php/php-src/trunk/Zend/zend_hash.h" href="http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?revision=298204&amp;view=markup#l228">hashing function</a> to convert that index into a hash that is used to locate the corresponding value within the array.</p>
<p>This hash map implementation enables arrays to store an arbitrary number of elements and to provide direct access to any of those elements using either numeric or string keys. Arrays are extremely fast for the capabilities they provide and are an excellent general purpose data structure.</p>
<h3>Fixed Arrays</h3>
<p>In contrast to arrays, <a title="PHP: SplFixedArray - Manual" href="http://php.net/manual/en/class.splfixedarray.php"><code>SplFixedArray</code></a> functions more like <a title="Arrays" href="http://www.cplusplus.com/doc/tutorial/arrays/">C arrays</a> or <a title="Arrays (The Java™ Tutorials &gt; Learning the Java Language &gt; Language Basics)" href="http://java.sun.com/docs/books/tutorial/java/nutsandbolts/arrays.html">Java arrays</a> than <a title="PHP: Arrays - Manual" href="http://php.net/manual/en/language.types.array.php">PHP arrays</a>. The maximum number of elements that it may contain is specified upon instantiation. Because its size is fixed and its keys are always integer positions, it doesn&#8217;t need to use a hashing function to resolve the position of elements within the array. While it is possible to change the size later via the <code>setSize()</code> method, doing so repeatedly negates the performance advantages of using it. <strong>It makes sense to use fixed arrays when the number of elements to be stored is known in advance and the elements only need to be accessed by sequential position.</strong></p>
<p><code>SplFixedArray</code> implements the <a title="SPL-StandardPHPLibrary: Iterator Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceIterator.html"><code>Iterator</code></a>, <a title="SPL-StandardPHPLibrary: ArrayAccess Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceArrayAccess.html"><code>ArrayAccess</code></a>, and <a title="SPL-StandardPHPLibrary: Countable Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceCountable.html"><code>Countable</code></a> interfaces. <code>Iterator</code> allows it to be iterated using a <a title="PHP: foreach - Manual" href="http://php.net/manual/en/control-structures.foreach.php"><code>foreach</code></a> loop. <code>ArrayAccess</code> provides access to its elements using <a title="PHP: Arrays - Manual" href="http://php.net/manual/en/language.types.array.php#language.types.array.syntax.modifying">array syntax</a> where elements are referred to using integer positions beginning at 0 as with enumerated arrays. <code>Countable</code> enables an instance to be passed to the <a title="PHP: count - Manual" href="http://php.net/count"><code>count()</code></a> function like an array.</p>
<p>Aside from the inability to use it in place of arrays with <a title="PHP: Array Functions - Manual" href="http://php.net/manual/en/ref.array.php">array functions</a>, instances of <code>SplFixedArray</code> function just like arrays for all intents and purposes. It&#8217;s even possible to convert them to and from arrays using the <a title="PHP: SplFixedArray::toArray - Manual" href="http://php.net/manual/en/splfixedarray.toarray.php"><code>toArray()</code></a> and <a title="PHP: SplFixedArray::fromArray - Manual" href="http://php.net/manual/en/splfixedarray.fromarray.php"><code>fromArray()</code></a> methods respectively. However, it generally makes more sense to use <code>SplFixedArray</code> exclusively within a given use case than to convert back and forth between the two.</p>
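<p>As a rough illustration of the points above, here is a minimal sketch that fills a fixed array, counts it, iterates it, and converts it to a plain array. The variable names are made up for the example, and the out-of-range check assumes that the exception thrown is a <code>RuntimeException</code> or one of its subclasses.</p>

```php
<?php
// Hypothetical example: store the first five squares in a fixed-size array.
$squares = new SplFixedArray(5);

for ($i = 0; $i < $squares->getSize(); $i++) {
    $squares[$i] = $i * $i;      // ArrayAccess: integer positions starting at 0
}

$count = count($squares);        // Countable

$sum = 0;
foreach ($squares as $value) {   // iteration with foreach
    $sum += $value;
}

// Writing past the declared size raises an exception instead of growing.
$outOfRange = false;
try {
    $squares[5] = 25;
} catch (RuntimeException $e) {
    $outOfRange = true;
}

$plain = $squares->toArray();    // back to a regular PHP array
```

<p>The size is declared once in the constructor, which is the trade-off that makes this class suitable only when the element count is known up front.</p>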
<h3>Lists</h3>
<p>In <a title="Computer science - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Computer_science">computer science</a>, a <a title="List (computing) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/List_(computing)">list</a> is defined as an ordered collection of values. A <a title="Linked list - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Linked_list">linked list</a> is a data structure in which each element in the list includes a reference to one or both of the elements on either side of it within the list. The term &#8220;<a title="Doubly-linked list - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Doubly-linked_list">doubly-linked list</a>&#8221; is used to refer to the latter case. In the SPL, this takes the form of the class <a title="PHP: SplDoublyLinkedList - Manual" href="http://php.net/manual/en/class.spldoublylinkedlist.php"><code>SplDoublyLinkedList</code></a>.</p>
<p>Like <code>SplFixedArray</code>, <code>SplDoublyLinkedList</code> also implements the <a title="SPL-StandardPHPLibrary: Iterator Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceIterator.html"><code>Iterator</code></a>, <a title="SPL-StandardPHPLibrary: ArrayAccess Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceArrayAccess.html"><code>ArrayAccess</code></a>, and <a title="SPL-StandardPHPLibrary: Countable Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceCountable.html"><code>Countable</code></a> interfaces. In addition to the methods that come with these interface implementations, elements can be added to or removed from the start or end of the list using its <a title="PHP: SplDoublyLinkedList::push - Manual" href="http://php.net/manual/en/spldoublylinkedlist.push.php"><code>push()</code></a>, <a title="PHP: SplDoublyLinkedList::pop - Manual" href="http://php.net/manual/en/spldoublylinkedlist.pop.php"><code>pop()</code></a>, <a title="PHP: SplDoublyLinkedList::shift - Manual" href="http://php.net/manual/en/spldoublylinkedlist.shift.php"><code>shift()</code></a> and <a title="PHP: SplDoublyLinkedList::unshift - Manual" href="http://php.net/manual/en/spldoublylinkedlist.unshift.php"><code>unshift()</code></a> methods, which correspond to the <a title="PHP: array_push - Manual" href="http://php.net/array_push"><code>array_push()</code></a>, <a title="PHP: array_pop - Manual" href="http://php.net/array_pop"><code>array_pop()</code></a>, <a title="PHP: array_shift - Manual" href="http://php.net/array_shift"><code>array_shift()</code></a>, and <a title="PHP: array_unshift - Manual" href="http://php.net/array_unshift"><code>array_unshift()</code></a> functions respectively. Unfortunately, as of PHP 5.3.2, there&#8217;s no way to insert an element anywhere in the list other than at the beginning or the end. 
A <a title="PHP Bugs: #48358: SplDoublyLinkedList needs an insertAfterIterator() method or something similar" href="http://bugs.php.net/bug.php?id=48358">feature request</a> has been filed for this. Add a comment or vote to show support for its addition.</p>
<p>The elements at the start and end of the list are accessible via its <a title="PHP: SplDoublyLinkedList::bottom - Manual" href="http://php.net/manual/en/spldoublylinkedlist.bottom.php"><code>bottom()</code></a> and <a title="PHP: SplDoublyLinkedList::top - Manual" href="http://php.net/manual/en/spldoublylinkedlist.top.php"><code>top()</code></a> methods respectively, which correspond to the <a title="PHP: reset - Manual" href="http://php.net/manual/en/function.reset.php"><code>reset()</code></a> and <a title="PHP: end - Manual" href="http://php.net/manual/en/function.end.php"><code>end()</code></a> functions. Like <code>SplFixedArray</code>, elements can also be accessed arbitrarily by positional index using the array syntax granted by <code>ArrayAccess</code>. <strong>It makes sense to use lists when the number of elements to be stored is not known in advance and the elements only need to be accessed by sequential position.</strong></p>
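<p>A short sketch of these list operations follows. The values are placeholders chosen for the example. Note that <code>bottom()</code> peeks at the beginning of the list and <code>top()</code> peeks at the end.</p>

```php
<?php
$list = new SplDoublyLinkedList();

$list->push('b');       // add to the end, like array_push()
$list->push('c');
$list->unshift('a');    // add to the start, like array_unshift()

$first  = $list->bottom();  // peek at the start of the list
$last   = $list->top();     // peek at the end of the list
$second = $list[1];         // positional access via ArrayAccess

$removedEnd   = $list->pop();    // remove from the end, like array_pop()
$removedStart = $list->shift();  // remove from the start, like array_shift()

$remaining = count($list);       // Countable
```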
<h3>Stacks</h3>
<p><a title="Stack (data structure) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Stack_(data_structure)">Stacks</a> are similar to lists with two major differences. First, elements can only be added to the top of the stack. Second, an element can only be accessed by taking it off the top of the stack. Because of these differences, the stack is often referred to as a Last-In-First-Out or <a title="LIFO (computing) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/LIFO_(computing)">LIFO</a> data structure. <a title="PHP: SplStack - Manual" href="http://php.net/manual/en/class.splstack.php"><code>SplStack</code></a> is the SPL stack implementation.</p>
<p><code>SplStack</code> is a bit removed from the traditional definition of a stack. It extends <code>SplDoublyLinkedList</code> and inherits its abilities, some of which don&#8217;t really apply to stacks. In order to enforce its restriction on how elements are accessed, <code>SplStack</code> overrides the <a title="PHP: SplDoublyLinkedList::setIteratorMode - Manual" href="http://php.net/manual/en/spldoublylinkedlist.setiteratormode.php"><code>setIteratorMode()</code></a> method of its parent class and implements <a title="PHP: SplStack::setIteratorMode - Manual" href="http://php.net/manual/en/splstack.setiteratormode.php">its own</a> to prevent modification of the iteration direction. Both methods allow elements to be retained or removed as they are iterated.</p>
<p><strong>Use of stacks makes sense when the number of elements to be stored is not known in advance and the only element that must be accessible is the last one stored.</strong> However, as of PHP 5.3.2, the performance of <code>SplStack</code> leaves something to be desired. Benchmarks included later in this post provide an objective illustration of this, though the cause of the behavior remains unknown.</p>
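<p>To make the LIFO behavior concrete, here is a minimal sketch with placeholder values. It relies on the default iterator mode, in which elements are kept (not removed) while being iterated.</p>

```php
<?php
$stack = new SplStack();
$stack->push('first');
$stack->push('second');
$stack->push('third');

$peek = $stack->top();       // the most recently pushed element

// Iteration runs from the top of the stack down (LIFO order).
$order = array();
foreach ($stack as $item) {
    $order[] = $item;
}

$popped    = $stack->pop();  // removes and returns the top element
$remaining = count($stack);
```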
<h3>Queues</h3>
<p><a title="Queue (data structure) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Queue_(data_structure)">Queues</a> are also similar to lists, again with two major differences. First, elements can only be added (or &#8220;enqueued&#8221;) to the end of the queue. Second, an element can only be accessed by removing (or &#8220;dequeueing&#8221;) it from the beginning of the queue. For these reasons the queue is referred to as a First-In-First-Out or <a title="FIFO - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/FIFO">FIFO</a> data structure. The <a title="PHP: SplQueue - Manual" href="http://php.net/manual/en/class.splqueue.php"><code>SplQueue</code></a> class implements this data structure in the SPL.</p>
<p><code>SplQueue</code> follows suit with <code>SplStack</code> in extending <code>SplDoublyLinkedList</code>. Just as <code>SplStack</code> thereby inherits some operations of questionable applicability, so too does <code>SplQueue</code>. Likewise, it overrides <code>setIteratorMode()</code> with <a title="PHP: SplQueue::setIteratorMode - Manual" href="http://php.net/manual/en/splqueue.setiteratormode.php">its own version</a> to restrict how elements are accessed. <strong>Use of queues makes sense when the number of elements to be stored is not known in advance and the only element that must be accessible is the remaining element that was stored earliest.</strong></p>
<p>One minor difference between <code>SplQueue</code> and <code>SplStack</code> is that the former contains two method aliases named after conceptual queue operations: <a title="PHP: SplQueue::dequeue - Manual" href="http://php.net/manual/en/splqueue.dequeue.php"><code>dequeue()</code></a> aliases <a title="PHP: SplDoublyLinkedList::shift - Manual" href="http://php.net/manual/en/spldoublylinkedlist.shift.php"><code>SplDoublyLinkedList::shift()</code></a> and <a title="PHP: SplQueue::enqueue - Manual" href="http://php.net/manual/en/splqueue.enqueue.php"><code>enqueue()</code></a> aliases <a title="PHP: SplDoublyLinkedList::push - Manual" href="http://php.net/manual/en/spldoublylinkedlist.push.php"><code>SplDoublyLinkedList::push()</code></a>. <code>SplStack</code> needs no such aliases because <code>push()</code> and <code>pop()</code>, which are named after the conceptual stack operations, are already present in its parent class.</p>
<p>Despite their common ancestry, <code>SplQueue</code> appears to have better performance than <code>SplStack</code> as of PHP 5.3.2. Benchmarks included later in this post review this in more detail.</p>
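<p>The FIFO behavior and the <code>enqueue()</code> and <code>dequeue()</code> aliases can be sketched as follows. The job names are placeholders invented for the example.</p>

```php
<?php
$queue = new SplQueue();
$queue->enqueue('first job');
$queue->enqueue('second job');
$queue->push('third job');      // enqueue() is an alias of push()

// Elements come out in the order in which they went in (FIFO).
$served = array();
while (!$queue->isEmpty()) {
    $served[] = $queue->dequeue();  // dequeue() is an alias of shift()
}

$remaining = count($queue);
```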
<h3>Heaps</h3>
<p>Up to this point, the data structures discussed have resembled lists insofar as they contain elements in the order in which they were added. By contrast, when an element is added to a <a title="Heap (data structure) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Heap_(data_structure)">heap</a>, a comparison function is used to compare the new element to other elements already in the heap and the new element is placed appropriately within the heap based on that function&#8217;s return value. The beauty of heaps is that their underlying algorithm does this with relatively few element comparisons (for a binary heap, on the order of the logarithm of the number of stored elements per insertion), so it&#8217;s very efficient. <strong>Using heaps makes sense when the number of elements to be stored is not known in advance and elements must be accessed in an order based on how they compare to each other.</strong></p>
<p><a title="PHP: SplHeap - Manual" href="http://php.net/manual/en/class.splheap.php"><code>SplHeap</code></a> is an abstract class used to create a heap by extending it and providing a comparison function in the form of its <a title="PHP: SplHeap::compare - Manual" href="http://php.net/manual/en/splheap.compare.php"><code>compare()</code></a> method. Only the root element of a heap, the one yielding the highest comparison function return value, may be accessed or removed from the heap at any given time. This is done using the <a title="PHP: SplHeap::extract - Manual" href="http://php.net/manual/en/splheap.extract.php"><code>extract()</code></a> method of <code>SplHeap</code>. <code>SplHeap</code> implements the <a title="SPL-StandardPHPLibrary: Iterator Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceIterator.html"><code>Iterator</code></a> and <a title="SPL-StandardPHPLibrary: Countable Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceCountable.html"><code>Countable</code></a> interfaces but, because only the root element can be extracted, it does not implement the <a title="SPL-StandardPHPLibrary: ArrayAccess Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceArrayAccess.html"><code>ArrayAccess</code></a> interface like the previously discussed data structure classes.</p>
<p>In addition to the abstract <code>SplHeap</code> class, two concrete implementations are also included in the SPL, namely <a title="PHP: SplMinHeap - Manual" href="http://php.net/manual/en/class.splminheap.php"><code>SplMinHeap</code></a> and <a title="PHP: SplMaxHeap - Manual" href="http://php.net/manual/en/class.splmaxheap.php"><code>SplMaxHeap</code></a>. The <a title="PHP: SplMinHeap::compare - Manual" href="http://php.net/manual/en/splminheap.compare.php"><code>compare()</code></a> method of <code>SplMinHeap</code> returns a value such that the smallest element in the heap is the root element. Likewise, the <a title="PHP: SplMaxHeap::compare - Manual" href="http://php.net/manual/en/splmaxheap.compare.php"><code>compare()</code></a> method of <code>SplMaxHeap</code> returns a value such that the largest element in the heap is the root element.</p>
<p>At first glance, using a subclass of <code>SplHeap</code> may seem equivalent to calling <a title="PHP: sort - Manual" href="http://php.net/sort"><code>sort()</code></a> or a similar function on an array and accessing the elements in sequence. This is indeed the case if all elements are added to the array prior to it being sorted. However, situations such as elements arriving over time or inadequate memory to store all elements simultaneously may preclude this approach. Use of arrays in such situations would require re-sorting the entire array each time new elements are added, which is inefficient. This is why using the corresponding heap class makes a lot more sense in that situation than repeated calls to <a title="PHP: sort - Manual" href="http://php.net/sort"><code>sort()</code></a>, <a title="PHP: min - Manual" href="http://php.net/manual/en/function.min.php"><code>min()</code></a> or <a title="PHP: max - Manual" href="http://php.net/manual/en/function.max.php"><code>max()</code></a>. Additionally, <code>SplHeap</code> can be used to implement the <a title="Heapsort - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Heapsort">heapsort algorithm</a>, which has better worst case performance than the <a title="Quicksort - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Quicksort">quicksort algorithm</a> <a title="[svn] Contents of /php/php-src/trunk/Zend/zend_qsort.c" href="http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_qsort.c?revision=296679&amp;view=markup#l56">implementation</a> <a title="[svn] Contents of /php/php-src/trunk/ext/standard/array.c" href="http://svn.php.net/viewvc/php/php-src/trunk/ext/standard/array.c?revision=298204&amp;view=markup#l541">used by arrays</a>.</p>
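<p>Here is a minimal sketch of both ideas: a built-in <code>SplMinHeap</code> receiving elements over time, and a custom heap defined by extending <code>SplHeap</code>. The <code>ShortestFirstHeap</code> class is hypothetical and exists only for this example. The <code>: int</code> return type assumes PHP 7 or later; on PHP 5.3 it would need to be dropped.</p>

```php
<?php
// Elements arrive over time; the heap always keeps the smallest at the root.
$heap = new SplMinHeap();
$heap->insert(5);
$heap->insert(1);
$first = $heap->extract();      // smallest of the elements inserted so far

$heap->insert(3);
$heap->insert(0);
$second = $heap->extract();     // smallest of the remaining elements

$rest = array();
while (!$heap->isEmpty()) {
    $rest[] = $heap->extract(); // drains in ascending order
}

// Hypothetical subclass: the shortest string becomes the root element.
class ShortestFirstHeap extends SplHeap
{
    protected function compare($value1, $value2): int
    {
        // A positive result places $value1 closer to the root than $value2.
        return strlen($value2) - strlen($value1);
    }
}

$words = new ShortestFirstHeap();
foreach (array('banana', 'fig', 'apple') as $word) {
    $words->insert($word);
}
$shortest = $words->extract();
```

<p>No re-sorting happens between insertions here; each <code>insert()</code> and <code>extract()</code> call only rearranges as much of the heap as it needs to.</p>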
<h3>Priority Queues</h3>
<p><a title="Priority queue - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Priority_queue">Priority queues</a> are somewhat similar to heaps. In fact, while it doesn&#8217;t extend <code>SplHeap</code>, <a title="PHP: SplPriorityQueue - Manual" href="http://php.net/manual/en/class.splpriorityqueue.php"><code>SplPriorityQueue</code></a> does make use of a heap structure internally to implement its functionality. The difference is that the <a title="PHP: SplPriorityQueue::insert - Manual" href="http://www.php.net/manual/en/splpriorityqueue.insert.php"><code>insert()</code></a> method of <code>SplPriorityQueue</code> accepts both a value and an associated priority, removing the need to use an array or object to store both of these and define an appropriate comparison function in an <code>SplHeap</code> instance. Elements with the highest priority, like those in <code>SplMaxHeap</code> with the highest value, are the ones that come out first when <a title="PHP: SplPriorityQueue::extract - Manual" href="http://php.net/manual/en/splpriorityqueue.extract.php"><code>extract()</code></a> is called. Note that elements with equal priority are returned in no particular order.</p>
<p>For reasons similar to those of <code>SplHeap</code>, <code>SplPriorityQueue</code> implements both <a title="SPL-StandardPHPLibrary: Iterator Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceIterator.html"><code>Iterator</code></a> and <a title="SPL-StandardPHPLibrary: Countable Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceCountable.html"><code>Countable</code></a> interfaces and does not implement the <a title="SPL-StandardPHPLibrary: ArrayAccess Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceArrayAccess.html"><code>ArrayAccess</code></a> interface. Because it stores a value and priority per element, <code>SplPriorityQueue</code> includes a <a title="PHP: SplPriorityQueue::setExtractFlags - Manual" href="http://php.net/manual/en/splpriorityqueue.setextractflags.php"><code>setExtractFlags()</code></a> method that modifies the behavior of <code>extract()</code> to return the stored value, the stored priority, or an array containing both. Priorities are not bound to a particular data type: strings, integers, or even composite data types can be used. <code>SplPriorityQueue</code> can be extended and its <a title="PHP: SplPriorityQueue::compare - Manual" href="http://www.php.net/manual/en/splpriorityqueue.compare.php"><code>compare()</code></a> method overridden to customize the comparison logic.</p>
<p><strong>It makes sense to use a priority queue when the number of elements to be stored is not known in advance and elements must be accessed in an order based on how a value associated with each element (versus the element value itself) compares to the same associated values of other elements.</strong></p>
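<p>The following sketch shows values being inserted with priorities and extracted in priority order, along with the effect of <code>setExtractFlags()</code>. The task names and priority numbers are invented for the example, and the priorities are deliberately distinct because elements of equal priority come out in no particular order.</p>

```php
<?php
$tasks = new SplPriorityQueue();
$tasks->insert('write tests', 2);
$tasks->insert('fix outage', 10);
$tasks->insert('refactor', 1);

// By default extract() returns only the stored value.
$next = $tasks->extract();      // highest priority comes out first

// Ask for both the value and its priority on subsequent extractions.
$tasks->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
$both = $tasks->extract();      // array with 'data' and 'priority' keys

$remaining = count($tasks);
```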
<h3>Sets and Composite Hash Maps</h3>
<p><a title="PHP: SplObjectStorage - Manual" href="http://php.net/manual/en/class.splobjectstorage.php"><code>SplObjectStorage</code></a> combines some of the properties of two different data structures. First, it provides the same <a title="Hash table - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Hash_table">hash table</a> functionality that a normal array has, but without the array&#8217;s inability to use objects as keys unless the <a title="PHP: spl_object_hash - Manual" href="http://php.net/spl_object_hash"><code>spl_object_hash()</code></a> function is used. In other words, it implements a composite hash map. Second, it can be used as a <a title="Set (computer science) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Set_(computer_science)">set</a> to store objects as data without a meaningful corresponding key or concept of sequential order.</p>
<p>Its <a title="PHP: SplObjectStorage::attach - Manual" href="http://php.net/manual/en/splobjectstorage.attach.php"><code>attach()</code></a> method accepts an object key and the data to associate with it and its <a title="PHP: SplObjectStorage::detach - Manual" href="http://php.net/manual/en/splobjectstorage.detach.php"><code>detach()</code></a> method allows data to be removed using its associated object key. To use the object as a set, simply exclude the <code>$data</code> parameter for <code>attach()</code> as it&#8217;s optional. The <a title="Set (computer science) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Set_(computer_science)#Operations">set operations</a> implemented by <code>SplObjectStorage</code> all have array function counterparts. For example, the <a title="PHP: SplObjectStorage::addAll - Manual" href="http://php.net/manual/en/splobjectstorage.addall.php"><code>addAll()</code></a> method and <a title="PHP: array_merge - Manual" href="http://php.net/manual/en/function.array-merge.php"><code>array_merge()</code></a> function both correspond to the union set operation. The difference operation is available using the <a title="PHP: SplObjectStorage::removeAll - Manual" href="http://php.net/manual/en/splobjectstorage.removeall.php"><code>removeAll()</code></a> method and <a title="PHP: array_diff - Manual" href="http://php.net/manual/en/function.array-diff.php"><code>array_diff()</code></a> function and its variants. The <a title="PHP: SplObjectStorage::contains - Manual" href="http://php.net/manual/en/splobjectstorage.contains.php"><code>contains()</code></a> method and <a title="PHP: in_array - Manual" href="http://php.net/manual/en/function.in-array.php"><code>in_array()</code></a> function both implement the element_of operation. 
Sadly, only arrays have an implementation of the intersection operation in the form of <a title="PHP: array_intersect - Manual" href="http://php.net/manual/en/function.array-intersect.php"><code>array_intersect()</code></a> and its variants. Tobias Schlitt has a <a title="Python. Good, bad, evil -2-: Native sets - Blog - Open Source - schlitt.info" href="http://schlitt.info/opensource/blog/0722_python_good_bad_evil_02_native_sets.html">more in-depth analysis</a> of this data structure that includes implementations of the set operations lacking in the SPL itself.</p>
<p>Like some of the other data structures in the SPL, <code>SplObjectStorage</code> implements the <a title="SPL-StandardPHPLibrary: Iterator Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceIterator.html"><code>Iterator</code></a>, <a title="SPL-StandardPHPLibrary: Countable Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceCountable.html"><code>Countable</code></a>, and <a title="SPL-StandardPHPLibrary: ArrayAccess Interface Reference" href="http://www.php.net/~helly/php/ext/spl/interfaceArrayAccess.html"><code>ArrayAccess</code></a> interfaces. Oddly, its declaration also lists the <a title="PHP: Traversable - Manual" href="http://php.net/manual/en/class.traversable.php"><code>Traversable</code></a> interface, which is redundant because <code>Iterator</code> already extends it (<code>Traversable</code> itself cannot be implemented directly by classes written in PHP). It also implements the <a title="PHP: Serializable - Manual" href="http://php.net/manual/en/class.serializable.php"><code>Serializable</code></a> interface and is the only SPL data structure class to do so.</p>
<p><strong>Using this class makes sense when data must be stored using composite keys or the ability to access data using set operations is more important than accessing data in a specific order.</strong></p>
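<p>Both uses of <code>SplObjectStorage</code> can be sketched briefly. The objects and data strings below are placeholders, and the union and difference are computed on clones so that the original sets are left untouched.</p>

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

// Use as a map: objects serve as keys for associated data.
$map = new SplObjectStorage();
$map->attach($a, 'data for a');
$map[$b] = 'data for b';        // ArrayAccess works with object keys
$dataForA = $map[$a];

// Use as a set: omit the optional $data parameter.
$set1 = new SplObjectStorage();
$set1->attach($a);
$set1->attach($b);

$set2 = new SplObjectStorage();
$set2->attach($b);
$set2->attach($c);

$union = clone $set1;
$union->addAll($set2);          // union: a, b, c

$difference = clone $set1;
$difference->removeAll($set2);  // difference: only a remains

$hasA = $difference->contains($a);
$hasB = $difference->contains($b);
```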
<h3>Benchmarks</h3>
<p><em>Standard disclaimer: There are <a title="Lies, damned lies, and statistics - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics">lies, damned lies, and benchmarks</a>. <a title="your mileage may vary - Wiktionary" href="http://en.wiktionary.org/wiki/your_mileage_may_vary"><acronym title="Your Mileage May Vary">YMMV</acronym></a>.</em></p>
<h4>Platform</h4>
<ul>
<li>System: <a title="VGN-NR298E/S | VAIO® NR Series Notebook PC | Sony | SonyStyle USA" href="http://www.sonystyle.com/webapp/wcs/stores/servlet/ProductDisplay?storeId=10151&amp;catalogId=10551&amp;langId=-1&amp;productId=8198552921665293693#specifications">Sony Vaio VGN-NR298E</a></li>
<li>CPU: <a title="Intel® Core™2 Duo Processor T5550 (2M Cache, 1.83 GHz, 667 MHz FSB) with SPEC Code(s) SLA4E" href="http://ark.intel.com/Product.aspx?id=32427&amp;processor=T5550&amp;spec-codes=SLA4E">Intel Core2Duo 1.83GHz</a></li>
<li>RAM: 4 GB DDR2</li>
<li>OS: <a title="Ubuntu Home Page | Ubuntu" href="http://www.ubuntu.com/">Ubuntu</a> 9.10 Karmic Koala Desktop Edition 64-bit</li>
<li>PHP: Custom build of <a title="PHP: Downloads" href="http://www.php.net/downloads.php#v5.3.2">5.3.2</a> (<a title="PHP 5.3 on Ubuntu — Third Party Code" href="http://thirdpartycode.com/2009/08/building-php-5-3-packages-on-ubuntu-9-04-jaunty-for-apache-2/">here&#8217;s how to create one</a>) using this configuration: <code>--without-pear --without-sqlite --without-sqlite3 --without-pdo-sqlite</code></li>
</ul>
<h4>Process</h4>
<p>Code used is located in <a title="elazar's spl-benchmarks at master - GitHub" href="http://github.com/elazar/spl-benchmarks">this GitHub repository</a>.</p>
<ol>
<li>Modify constant declarations at the top of runner.php as appropriate (50 executions per test were used to get the results below), then execute it from the command line. It will in turn execute each of the scripts in the tests directory, measuring execution time and memory usage. Results will be recorded in results/raw.csv.</li>
<li>To generate graphs, run graphs.php. This uses the <a title="eZ Components - Documentation - Tutorials" href="http://ezcomponents.org/docs/tutorials/Graph">Graph component</a> from the <a title="eZ Components" href="http://ezcomponents.org/">ezComponents library</a>. Resulting images will be written to the results directory in PNG format.</li>
</ol>
<h4>Results</h4>
<table>
<tbody>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splfixedarray_eps.png" title="SplFixedArray - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splfixedarray_eps.png" alt="SplFixedArray - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splfixedarray_memory.png" title="SplFixedArray - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splfixedarray_memory.png" alt="SplFixedArray - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/splfixedarray-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splfixedarray-array.php">Array</a></p>
<p><a title="tests/splfixedarray-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splfixedarray-spl.php">SPL</a></p></td>
</tr>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/spldoublylinkedlist_eps.png" title="SplDoublyLinkedList - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/spldoublylinkedlist_eps.png" alt="SplDoublyLinkedList - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/spldoublylinkedlist_memory.png" title="SplDoublyLinkedList - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/spldoublylinkedlist_memory.png" alt="SplDoublyLinkedList - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/spldoublylinkedlist-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/spldoublylinkedlist-array.php">Array</a></p>
<p><a title="tests/spldoublylinkedlist-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/spldoublylinkedlist-spl.php">SPL</a></p></td>
</tr>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splstack_eps.png" title="SplStack - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splstack_eps.png" alt="SplStack - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splstack_memory.png" title="SplStack - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splstack_memory.png" alt="SplStack - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/splstack-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splstack-array.php">Array</a></p>
<p><a title="tests/splstack-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splstack-spl.php">SPL</a></p></td>
</tr>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splqueue_eps.png" title="SplQueue - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splqueue_eps.png" alt="SplQueue - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splqueue_memory.png" title="SplQueue - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splqueue_memory.png" alt="SplQueue - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/splqueue-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splqueue-array.php">Array</a></p>
<p><a title="tests/splqueue-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splqueue-spl.php">SPL</a></p></td>
</tr>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splminheap_eps.png" title="SplMinHeap - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splminheap_eps.png" alt="SplMinHeap - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splminheap_memory.png" title="SplMinHeap - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splminheap_memory.png" alt="SplMinHeap - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/splminheap-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splminheap-array.php">Array</a></p>
<p><a title="tests/splminheap-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splminheap-spl.php">SPL</a></p></td>
</tr>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splpriorityqueue_eps.png" title="SplPriorityQueue - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splpriorityqueue_eps.png" alt="SplPriorityQueue - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splpriorityqueue_memory.png" title="SplPriorityQueue - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splpriorityqueue_memory.png" alt="SplPriorityQueue - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/splpriorityqueue-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splpriorityqueue-array.php">Array</a></p>
<p><a title="tests/splpriorityqueue-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splpriorityqueue-spl.php">SPL</a></p></td>
</tr>
<tr>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splobjectstorage_eps.png" title="SplObjectStorage - Executions Per Second"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splobjectstorage_eps.png" alt="SplObjectStorage - Executions Per Second" width="400" height="225" /></a></td>
<td><a href="http://github.com/elazar/spl-benchmarks/raw/master/results/splobjectstorage_memory.png" title="SplObjectStorage - Memory"><img src="http://github.com/elazar/spl-benchmarks/raw/master/results/splobjectstorage_memory.png" alt="SplObjectStorage - Memory" width="400" height="225" /></a></td>
<td><p><strong>Code</strong></p>
<p><a title="tests/splobjectstorage-array.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splobjectstorage-array.php">Array</a></p>
<p><a title="tests/splobjectstorage-spl.php at master from elazar's spl-benchmarks - GitHub" href="http://github.com/elazar/spl-benchmarks/blob/master/tests/splobjectstorage-spl.php">SPL</a></p></td>
</tr>
</tbody>
</table>
<h3>Other Data Structures</h3>
<p>If you are interested in data structure implementations for PHP beyond what SPL offers, check out the <a title="PECL :: Package :: bloomy" href="http://pecl.php.net/package/bloomy">bloomy</a> <a title="PECL :: The PHP Extension Community Library" href="http://pecl.php.net/">PECL</a> extension, an implementation of a <a title="Bloom filter - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</a> created by <a title="Bloom Filters Quickie - Andrei Zmievski" href="http://zmievski.org/2009/04/bloom-filters-quickie">Andrei Zmievski</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/05/20/new-spl-features-in-php-5-3/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>&#8220;Web Scraping with PHP&#8221; Now Available!</title>
		<link>http://matthewturland.com/2010/04/20/web-scraping-with-php-now-available/</link>
		<comments>http://matthewturland.com/2010/04/20/web-scraping-with-php-now-available/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 13:00:58 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=275</guid>
		<description><![CDATA[What I&#8217;m announcing in this blog post has been in the works since early 2008 when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small a niche or simply wasn&#8217;t marketable. php&#124;architect Press respectfully disagreed with them and decided [...]]]></description>
			<content:encoded><![CDATA[<p>What I&#8217;m announcing in this blog post has been in the works since early 2008 when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small a niche or simply wasn&#8217;t marketable. <a title="books | php|architect" href="http://www.phparch.com/books/">php|architect Press</a> respectfully disagreed with them and decided to publish what is now <a title="php|architect’s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">a book</a> written by me that you can purchase.</p>
<p>It&#8217;s currently only available in PDF format due to a delay with the printer; a dead tree version should become available within the next few weeks. To my knowledge, there are plans to offer a paper and PDF bundle, as has been done in the past with their other books.</p>
<p>Many of you reading this post probably have a personal to-do list of goals that you want to accomplish within your lifetime. Becoming the published author of a book has been an item on my own list for some time, and seeing this project through to completion has let me cross it off. I think anyone who has achieved a similar victory can relate to its significance, if only to oneself.</p>
<p>I do of course encourage you to purchase the book. I have no naïve notions that this will result in any substantial monetary return. Even if it did, that was not my reason for writing the book. I did it because I have knowledge that I believe is worth sharing with you. A number of people contributed to this, and I encourage you to read about them in the pages of the book that credit them.</p>
<p>It is also worth restating here that I have many family members, friends, and colleagues who helped to make this possible. There are too many to name, but I would like to thank each and every one of you from the bottom of my heart. I consider this a milestone in my life and my only hope is that it has as profound an effect on your life as it has on my own.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/04/20/web-scraping-with-php-now-available/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Leaving K-fx2</title>
		<link>http://matthewturland.com/2010/04/11/leaving-kfx2/</link>
		<comments>http://matthewturland.com/2010/04/11/leaving-kfx2/#comments</comments>
		<pubDate>Sun, 11 Apr 2010 14:35:51 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[Career]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=268</guid>
		<description><![CDATA[There are times in life when things don&#8217;t go to plan. You may start a new job and then, after some time in the position, come to find that it&#8217;s just not a good fit for you. Regretfully, that&#8217;s been a recent experience of mine: I&#8217;ve decided to leave my position at K-fx2. I wish [...]]]></description>
			<content:encoded><![CDATA[<p>There are times in life when things don&#8217;t go to plan. You may <a title="Matthew Turland » Blog Archive » So Long, Blue Parabola" href="http://matthewturland.com/2010/01/21/so-long-blue-parabola/">start a new job</a> and then, after some time in the position, come to find that it&#8217;s just not a good fit for you. Regretfully, that&#8217;s been a recent experience of mine: I&#8217;ve decided to leave my position at <a title="Baton Rouge Web Design, Graphic Design, SEO, Software, Print/Video - Kfx² Incorporated - Design in Motion" href="http://www.kfx2.com/">K-fx<sup>2</sup></a>. I wish my coworkers well; they have my thanks for the experiences I had in my time there. If you are a PHP developer looking for work in the Lafayette, Baton Rouge, or New Orleans area, consider <a title="PHP Developer Job at K-fx2, Inc." href="http://jobs.zend.com/job/php-developer-baton-rouge-la-k-fx2-inc-2549343e39/?d=1&amp;excluded_view=1">joining them</a>.</p>
<p>As for what&#8217;s next for me, I&#8217;ve accepted a position with <a title="Synacor — Home" href="http://www.synacor.com/">Synacor</a> as a Senior Engineer on their <a title="Synacor — Products: Platform Solutions" href="http://www.synacor.com/products/view/platform_solutions/">Content Management Platform</a> team. I will be traveling to their offices in Buffalo, NY for orientation on April 26th. The team seems excited about my coming on board and I&#8217;m looking forward to meeting them in person. If you live in the area and would like to see me while I&#8217;m in town, just let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/04/11/leaving-kfx2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
