<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Matthew Turland &#187; Web Scraping</title>
	<atom:link href="http://matthewturland.com/tag/web-scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://matthewturland.com</link>
	<description></description>
	<lastBuildDate>Sun, 18 Jul 2010 14:29:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Webscrapers Mailing List</title>
		<link>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/</link>
		<comments>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/#comments</comments>
		<pubDate>Sat, 03 Jul 2010 12:25:57 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[cURL]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=377</guid>
		<description><![CDATA[Daniel Stenberg, one of the primary authors of the libcurl library on which the PHP cURL extension is based, was kind enough to comment on and clarify a recent blog post of mine regarding web scraping using PHP and cURL. He later sent me a tweet to invite me to a new mailing list for [...]]]></description>
			<content:encoded><![CDATA[<p><a title="daniel.haxx.se" href="http://daniel.haxx.se">Daniel Stenberg</a>, one of the primary authors of the <a title="cURL and libcurl" href="http://curl.haxx.se">libcurl library</a> on which the <a title="PHP: cURL - Manual" href="http://php.net/curl">PHP cURL extension</a> is based, was kind enough to <a title="Matthew Turland » Blog Archive » Gotcha on Scraping .NET Applications with PHP and cURL" href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/comment-page-1/#comment-5202">comment on</a> and clarify a <a title="Matthew Turland » Blog Archive » Gotcha on Scraping .NET Applications with PHP and cURL" href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl">recent blog post</a> of mine regarding web scraping using PHP and cURL. He later sent me <a title="Twitter / Daniel Stenberg: @elazar Allow me to invite ..." href="http://twitter.com/bagder/status/17590025600">a tweet</a> to invite me to a new <a title="Webscrapers - The Community" href="http://webscrapers.haxx.se">mailing list</a> for web scraping enthusiasts just before <a title="Twitter / Daniel Stenberg: Everyone is welcome to joi ..." href="http://twitter.com/bagder/status/17590320446">tweeting a public invitation</a>. In addition to the mailing list itself, the web site also has links to books (including <a title="php|architect’s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">my book</a>) and popular tools related to the subject. I think this is awesome and I encourage anyone with an interest in web scraping, professional or recreational, to join.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Gotcha on Scraping .NET Applications with PHP and cURL</title>
		<link>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/</link>
		<comments>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 02:27:09 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[cURL]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=365</guid>
		<description><![CDATA[Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, php&#124;architect&#8217;s Guide to Web Scraping with PHP. I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I&#8217;d share. In this case, I [...]]]></description>
			<content:encoded><![CDATA[<p><em>Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, </em><a title="php|architect&#8217;s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/"><em>php|architect&#8217;s Guide to Web Scraping with PHP</em></a><em>.</em></p>
<p>I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I&#8217;d share. In this case, I was using the <a title="PHP: cURL - Manual" href="http://php.net/manual/en/book.curl.php">cURL extension</a>, but the tip isn&#8217;t necessarily specific to that. One thing my script did was submit a <a title="POST (HTTP) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/POST_(HTTP)">POST request</a> to simulate a form submission. The code looked something like the sample below.</p>
<pre class="brush: php;">$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL =&gt; 'http://...',
    CURLOPT_POST =&gt; true,
    CURLOPT_POSTFIELDS =&gt; array(
        'field1' =&gt; 'value1',
        // ...
    ),
    // ...
));</pre>
<p>The issue I ran into had to do with a behavior of the <code>CURLOPT_POSTFIELDS</code> setting that&#8217;s easy to overlook. This is a segment of its description from the <a title="PHP: curl_setopt - Manual" href="http://php.net/curl_setopt">PHP manual page</a> for the <code>curl_setopt()</code> function.</p>
<blockquote><p>If <em>value</em> is an array, the <em>Content-Type</em> header will be set to <em>multipart/form-data</em>.</p></blockquote>
<p>If the form being submitted is not set to have an <code>enctype</code> attribute value of <code>multipart/form-data</code> in the form&#8217;s markup, .NET returns a 500-level HTTP response with no further information on what causes the error (for security purposes). This presumably happens because it&#8217;s expecting one value for the <code>Content-Type</code> request header and getting another.</p>
<p>Setting <code>CURLOPT_HEADER</code> and <code>CURLOPT_VERBOSE</code> to <code>true</code> helped to reveal that this was the issue. The fix is pretty simple: instead of passing the array itself for <code>CURLOPT_POSTFIELDS</code>, pass the result of wrapping it in a call to the  <code>http_build_query()</code> function (see its <a title="PHP: http_build_query - Manual" href="http://php.net/http_build_query">PHP manual page</a>). This converts it to a properly formatted query string, which causes cURL to use the default <code>Content-Type</code> header value of <code>application/x-www-form-urlencoded</code> instead.</p>
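<p>For reference, here&#8217;s a minimal sketch of the corrected request. The <code>buildPostOptions()</code> helper, the example.com URL, and the field names are placeholders of my own for illustration, not part of cURL or the original script.</p>
<pre class="brush: php;">// Hypothetical helper: builds cURL options for a URL-encoded POST request.
function buildPostOptions($url, array $fields)
{
    return array(
        CURLOPT_URL =&gt; $url,
        CURLOPT_POST =&gt; true,
        // Passing a string instead of an array keeps the Content-Type
        // at application/x-www-form-urlencoded.
        CURLOPT_POSTFIELDS =&gt; http_build_query($fields, '', '&amp;'),
        CURLOPT_RETURNTRANSFER =&gt; true,
        // Debugging aids: include response headers in the output and
        // write details of the exchange to STDERR.
        CURLOPT_HEADER =&gt; true,
        CURLOPT_VERBOSE =&gt; true,
    );
}

$options = buildPostOptions('http://example.com/form.aspx', array(
    'field1' =&gt; 'value1',
    'field2' =&gt; 'value 2',
));
$ch = curl_init();
curl_setopt_array($ch, $options);
// $response = curl_exec($ch); // uncomment to actually send the request</pre>
<p>Building the options in a separate function makes it easy to inspect the encoded body before anything goes over the wire.</p>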
<p>Tools like <a title="Firebug" href="http://getfirebug.com">Firebug</a> can help you to examine requests made by a browser. Together with these settings for cURL, you can modify your script&#8217;s requests to match those of your browser as closely as possible, making gotchas like this less likely to trip you up.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Renaming a DOMNode in PHP</title>
		<link>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/</link>
		<comments>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 01:07:14 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Web Scraping]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=218</guid>
		<description><![CDATA[A recent work assignment had me using PHP to pull HTML data into a DOMDocument instance and renaming some elements, such as b to strong or i to em. As it turns out, renaming elements using the DOM extension is rather tedious. Version 3 of the DOM standard introduces a renameNode() method, but the PHP [...]]]></description>
			<content:encoded><![CDATA[<p>A recent work assignment had me using PHP to pull HTML data into a <code><a title="PHP: DOMDocument - Manual" href="http://php.net/manual/en/class.domdocument.php">DOMDocument</a></code> instance and renaming some elements, such as <a title="HTML element - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/HTML_element#Presentation">b to strong or i to em</a>. As it turns out, renaming elements using the DOM extension is rather tedious.</p>
<p>Version 3 of the DOM standard introduces a <code><a title="Document Object Model Core" href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#Document3-renameNode">renameNode()</a></code> method, but the PHP DOM extension doesn&#8217;t currently support it.</p>
<p>The <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.nodename">$nodeName</a></code> property of the <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php">DOMNode</a></code> class is read-only, so it can&#8217;t be changed that way.</p>
<p>A node can be created with a different name in the same document, but if you specify a value to go along with it, any entities in that value are automatically encoded, so it&#8217;s not possible to pass in the intended inner content of a node if it contains other nodes.</p>
<p>The only method I&#8217;ve found that works is to replicate the attributes and child nodes of the original node. Attributes are fairly easy, but I ran into an issue replicating children where only the first child of any given node was replicated within its intended replacement and the remaining children were omitted. Here&#8217;s the original code that was exhibiting this behavior.</p>
<pre class="brush: php;">foreach ($oldNode-&gt;childNodes as $childNode) {
    $newNode-&gt;appendChild($childNode);
}</pre>
<p>The reason for this behavior is that the <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.childnodes">$childNodes</a></code> property of <code>$oldNode</code> is implicitly modified when <code>$childNode</code> is transferred from it to <code>$newNode</code>, so the internal pointer of <code>$childNodes</code> to the next child in the list is no longer accurate.</p>
<p>To get around this, I took advantage of the fact that any node with any child nodes will always have a <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.firstchild">$firstChild</a></code> property pointing to the first one. The modified code that takes this approach is below and has the behavior I originally set out to implement.</p>
<pre class="brush: php;">while ($oldNode-&gt;firstChild) {
    $newNode-&gt;appendChild($oldNode-&gt;firstChild);
}</pre>
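<p>Another option, if you&#8217;d rather keep the <code>foreach</code>, is to copy the live child list into a plain array with <code>iterator_to_array()</code> before moving anything. This is just a sketch; the sample document and its element names are made up for illustration.</p>
<pre class="brush: php;">$doc = new DOMDocument();
$doc-&gt;loadXML('&lt;root&gt;&lt;old&gt;&lt;a/&gt;&lt;b/&gt;&lt;c/&gt;&lt;/old&gt;&lt;new/&gt;&lt;/root&gt;');
$oldNode = $doc-&gt;getElementsByTagName('old')-&gt;item(0);
$newNode = $doc-&gt;getElementsByTagName('new')-&gt;item(0);

// The array is a snapshot, so moving children out of $oldNode
// no longer disturbs the iteration.
foreach (iterator_to_array($oldNode-&gt;childNodes) as $childNode) {
    $newNode-&gt;appendChild($childNode);
}</pre>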
<p>If you&#8217;re curious, below is the full code segment for renaming a node.</p>
<pre class="brush: php;">$newNode = $oldNode-&gt;ownerDocument-&gt;createElement('new_element_name');
if ($oldNode-&gt;attributes-&gt;length) {
    foreach ($oldNode-&gt;attributes as $attribute) {
        $newNode-&gt;setAttribute($attribute-&gt;nodeName, $attribute-&gt;nodeValue);
    }
}
while ($oldNode-&gt;firstChild) {
    $newNode-&gt;appendChild($oldNode-&gt;firstChild);
}
// replaceChild() must be called on the parent of the node being replaced
$oldNode-&gt;parentNode-&gt;replaceChild($newNode, $oldNode);</pre>
<p>Another potential &#8220;gotcha&#8221; is the argument order of the <code><a title="PHP: DOMNode::replaceChild - Manual" href="http://php.net/manual/en/domnode.replacechild.php">replaceChild()</a></code> method, which is the new node followed by the old node rather than the reverse that most people might expect. Thanks to <a title="joshua may (notjosh) on Twitter" href="http://twitter.com/notjosh">Joshua May</a> for pointing that one out to me; I might never have understood why I was getting a <a title="PHP: DOMNode::appendChild - Manual" href="http://php.net/manual/en/domnode.appendchild.php#domnode.appendchild.errors">&#8220;Not Found Error&#8221;</a> <code><a title="PHP: DOMException - Manual" href="http://php.net/manual/en/class.domexception.php">DOMException</a></code> otherwise.</p>
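<p>To tie it together, here&#8217;s a rough sketch that wraps the steps above in a function and uses it to turn <code>b</code> elements into <code>strong</code>. The <code>renameElement()</code> name and the sample markup are my own inventions for illustration; note that it calls <code>replaceChild()</code> on the old node&#8217;s parent.</p>
<pre class="brush: php;">// Hypothetical helper: replaces $oldNode with a new element named $newName,
// carrying over its attributes and child nodes.
function renameElement(DOMElement $oldNode, $newName)
{
    $newNode = $oldNode-&gt;ownerDocument-&gt;createElement($newName);
    foreach ($oldNode-&gt;attributes as $attribute) {
        $newNode-&gt;setAttribute($attribute-&gt;nodeName, $attribute-&gt;nodeValue);
    }
    while ($oldNode-&gt;firstChild) {
        $newNode-&gt;appendChild($oldNode-&gt;firstChild);
    }
    $oldNode-&gt;parentNode-&gt;replaceChild($newNode, $oldNode);
    return $newNode;
}

$doc = new DOMDocument();
$doc-&gt;loadXML('&lt;p&gt;Some &lt;b class="x"&gt;bold &lt;i&gt;text&lt;/i&gt;&lt;/b&gt; here&lt;/p&gt;');
foreach (iterator_to_array($doc-&gt;getElementsByTagName('b')) as $b) {
    renameElement($b, 'strong');
}
$result = $doc-&gt;saveXML($doc-&gt;documentElement);</pre>
<p>The element list is copied into an array first for the same reason as the child node issue above: it&#8217;s a live list that changes as elements are renamed.</p>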
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Webcast Slides</title>
		<link>http://matthewturland.com/2009/03/05/webcast-slides/</link>
		<comments>http://matthewturland.com/2009/03/05/webcast-slides/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 14:23:19 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[HTTP]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Hard to believe it&#8217;s been that long, but two months ago I mentioned the free webcast series sponsored by Adobe and leading up to php&#124;tek 2009. I&#8217;ve posted the slides from my webcast on February 27. If you weren&#8217;t able to make it, I gave an introduction to what web scraping is, basic details of [...]]]></description>
			<content:encoded><![CDATA[<p>Hard to believe it&#8217;s been that long, but two months ago <a href="http://matthewturland.com/2009/01/08/php-tek-2009-webcast-series" title="i should be coding :: php|tek 2009 webcast series">I mentioned</a> the <a href="http://tek.mtacon.com/c/s/free-webcast-series" title="php|tek 2009 - PHP Conference in Chicago, IL">free webcast series</a> sponsored by <a href="http://www.adobe.com/" title="Adobe">Adobe</a> and leading up to <a href="http://tek.mtacon.com" title="php|tek 2009 - PHP Conference in Chicago, IL">php|tek 2009</a>.</p>
<p>I&#8217;ve posted the <a href="http://www.slideshare.net/tobias382/when-rss-fails-web-scraping-with-http" title="When RSS Fails: Web Scraping with HTTP">slides from my webcast</a> on February 27. If you weren&#8217;t able to make it, I gave an introduction to <a href="http://en.wikipedia.org/wiki/Web_scraping" title="Web scraping - Wikipedia, the free encyclopedia">what web scraping is</a>, basic details of the <a href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol" title="Hypertext Transfer Protocol - Wikipedia, the free encyclopedia">HTTP protocol</a>, available resources for developing web scraping applications, and best practices. I know there are plans to make the audio from the webcast available, and I will update this post with a link once that happens.</p>
<p>If the slides and audio aren&#8217;t enough for you, I will in all likelihood be giving an extended version of the presentation that includes both retrieval and analysis as part of the Unconference event at <a href="http://tek.mtacon.com" title="php|tek 2009 - PHP Conference in Chicago, IL">php|tek</a>. Look forward to seeing you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2009/03/05/webcast-slides/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How-To (and How-Not-To) on Web Scraping</title>
		<link>http://matthewturland.com/2008/03/12/scraping-html-with-dom/</link>
		<comments>http://matthewturland.com/2008/03/12/scraping-html-with-dom/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 23:50:27 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[A friend of mine who shall remain nameless pointed a post out to me on the PHP DZone web site recently. Noting that the article&#8217;s content was misinformed at best and downright ignorant at worst, even when examining it sheerly from the author&#8217;s knowledge of PHP as a language, this friend asked that I set [...]]]></description>
			<content:encoded><![CDATA[<p>A friend of mine who shall remain nameless pointed <a title="Writing Website Scrapers in PHP | PHP Zone" href="http://php.dzone.com/news/writing-website-scrapers-php">a post</a> out to me on the <a title="PHP Zone | Community for PHP users and developers" href="http://php.dzone.com">PHP DZone</a> web site recently. Noting that the article&#8217;s content was misinformed at best and downright ignorant at worst, even when examining it sheerly from the author&#8217;s knowledge of PHP as a language, this friend asked that I set the author straight.</p>
<p>I gladly obliged with <a title="Writing Website Scrapers in PHP | PHP Zone" href="http://php.dzone.com/news/writing-website-scrapers-php#comment-1497">a comment</a> on the post, having become something of an authority on the topic myself. Unorthodox a practice as web scraping may be, some methodologies for it are clearly better than others. The aforementioned post illustrates a lot of the ones to avoid, and my comment lays out the arguments against them.</p>
<p>Later, I randomly encountered <a title="xml.lt: Blog: Scraping HTML with DOM" href="http://www.xml.lt/Blog/2008/03/11/Scraping+html+with+DOM">a post</a> on the blog at <a title="xml.lt: Blog" href="http://www.xml.lt/Blog/">xml.lt</a> on the topic of web scraping using the <a title="PHP: DOM - Manual" href="http://php.net/dom">DOM extension</a>. This article showcases recommended practices and reasoned arguments against bad (and unfortunately common) alternatives. The author comes across as being significantly more informed on both the language and the application in the article&#8217;s content and code examples.</p>
<p>If you&#8217;re looking for references on the topic of web scraping with PHP, there&#8217;s always the article I wrote for the <a title="php|architect / December 2007&mdash; php|architect, PHP Magazine, PHP Conferences, PHP Books" href="http://www.phparch.com/c/magazine/issue/63">December 2007 issue</a> of <a title="Welcome&mdash; php|architect, PHP Magazine, PHP Conferences, PHP Books" href="http://www.phparch.com">php|architect magazine</a>, of which you can still purchase an electronic copy in PDF format. At some point, I also hope to write a short book on the subject. Until then, if you have related questions, you can generally reach me in the #phpc channel on Freenode, under the nick Elazar. I&#8217;m always glad to give out advice on web scraping and PHP, as I&#8217;m sure my good friend <a title="acloudtree" href="http://acloudtree.com">Jared Folkins</a> (who is also my &#8220;Little Sis&#8221; from the <a title="PHPWomen" href="http://phpwomen.org">PHPWomen</a> <a title="PHP Women: Member Benefits =&gt; Big Sis - Little Sis - Mentoring" href="http://www.phpwomen.org/forum/index.php?t=msg&amp;th=190&amp;start=0&amp;S=76901096cc69bea483d105b60f546fe3">Big Sis/Little Sis mentoring program</a>) will attest.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2008/03/12/scraping-html-with-dom/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Book Review: PHP Web 2.0 Mashup Projects</title>
		<link>http://matthewturland.com/2008/01/24/book-review-php-web-20-mashup-projects/</link>
		<comments>http://matthewturland.com/2008/01/24/book-review-php-web-20-mashup-projects/#comments</comments>
		<pubDate>Thu, 24 Jan 2008 13:35:13 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[Books]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Podcasts]]></category>
		<category><![CDATA[Web Scraping]]></category>
		<category><![CDATA[Web Services]]></category>

		<guid isPermaLink="false">http://ishouldbecoding.com/2008/01/05/book-review-php-web-20-mashup-projects/</guid>
		<description><![CDATA[You can find this review in podcast form on the Zend Developer Zone PHP Abstract Podcast. I received an e-mail recently from a very nice gentleman at Packt Publishing, a UK-based publishing company focused on providing hands-on application-oriented publications to IT professionals, particularly those specific to open source technologies. Their representative asked if I would [...]]]></description>
			<content:encoded><![CDATA[<p>You can find this review in podcast form on the <a href="http://devzone.zend.com/article/3006-PHP-Abstract-Podcast-Episode-33-Book-Review-PHP-Web-2.0-Mashup-Projects" title="PHP Abstract Podcast Episode 33: Book Review: PHP Web 2.0 Mashup Projects">Zend Developer Zone PHP Abstract Podcast</a>.</p>
<p>I received an e-mail recently from a very nice gentleman at <a href="http://packtpub.com" alt="Packt Publishing">Packt Publishing</a>, a UK-based publishing company focused on providing hands-on application-oriented publications to IT professionals, particularly those specific to open source technologies. Their representative asked if I would be willing to review one of their books, namely <a href="http://packtpub.com/php-web-20-mashups/book">PHP Web 2.0 Mashup Projects</a> by Shu-Wai Chow. Reviewing books is not something I had done before, so I thought I would give it a good old-fashioned college try.</p>
<p>In a supersaturated market, it is difficult to make an impression with a PHP book these days. The books of real value are those that focus on ways to apply the language to real world problems. These books delve into the depths of a particular application domain, showing PHP code and outlining design principles along the way. They are useful to current and prospective PHP programmers alike because they introduce both groups not to PHP itself, but to an existing class of problems and how PHP can be applied to solve them. PHP Web 2.0 Mashup Projects is one of these books.</p>
<p>Most technology-related books on the shelves are several inches thick and an  inherently daunting chore to sift through. Luckily, this book is not one of those. Do not let the size fool you, though; it is positively packed with useful information. It hits the high points of each topic it covers, giving you enough in the way of code samples and step-by-step explanations to get started, as well as resources to help you get better acquainted with topics that might be of particular interest to you.</p>
<p>The book is divided into six chapters, each of which covers a set of particular protocols, data formats, and APIs for acquiring and processing data in order to create a particular mashup application. These projects include:</p>
<ul>
<li>A search engine to find products on Amazon by their Universal Product Code</li>
<li>A search engine to combine results from MSN and Yahoo!</li>
<li>A video jukebox that pulls songs from Last.fm and videos from YouTube</li>
<li>A traffic incident reporting application that sends SMS alerts</li>
<li>An illustrated tube station line map using Google Maps and Flickr for related photos</li>
</ul>
<p>The book&#8217;s structure and layout make it easy to follow, whether you prefer to read it linearly or jump around to specific sections. It is an excellent reference that I can see myself returning to time and time again.</p>
<p>One of the strengths of the book is that it has a very wide base of coverage.  It starts by introducing basics in interacting with web services and extracting the desired data from their responses using core PHP libraries. The REST, XML-RPC, and SOAP protocols and the WSDL standard are all covered in enough depth to get you started, so you can work with a web service regardless of the protocol or protocols it offers. The author does an excellent job of selecting example web services and data standards from large and well-known to small and obscure.  For real world APIs, you will find the likes of Amazon, YouTube, Google, and Flickr, as well as sources that might not be household names, such as the Internet UPC Database. Data standards include general formats like XML, RDF, and JSON and more specialized formats like RSS and XSPF.</p>
<p>Another strength is that the book encourages good principles from the start. It advocates object-oriented design principles for code reuse and a DRY philosophy. It suggests using third-party libraries such as those in PEAR in order to avoid unnecessary reinvention of the wheel, but still shows you how to roll your own if and when it becomes necessary. The book also covers usability, particularly in the last chapter when it discusses AJAX and race conditions, and pays special attention to application security, an area of increasing concern in web applications. Unlike some books, this one includes tips for development outside its own showcased projects to save you from having to spend your own time troubleshooting common issues or digging for solutions to &#8220;gotcha&#8221; situations.</p>
<p>And last but certainly not least, the book demonstrates that sometimes you have to be resourceful in locating and acquiring your data, particularly in Chapter 5 where one of my own areas of interest, web scraping, is covered. The topic is explained in plain language and supplemented with examples walking you through exactly how it can be used to acquire data for your own mashups. Web scraping is not a frequently broached topic and I applaud the author for making a point to include it. I believe it is a genuinely useful methodology that can help in data acquisition when no other options are available.</p>
<p>I cannot give the book an entirely glowing review, though. There are some errata present, both in content and code samples. Most are small, but some are enough to throw off a reader not already familiar with the material being covered. I&#8217;ve submitted some of these via the publisher&#8217;s web site already, though I have yet to receive any related communications or see them show up on the web site at the time that I write this review. These issues can be corrected, though, and the quality of the book&#8217;s content outshines them.</p>
<p>Overall, PHP Web 2.0 Mashup Projects is an excellent example of creativity in finding new ways to aggregate data sets in useful combinations. It is a testament to the possibilities of the internet when access to data is opened up and freedom to use that data enables developers to create exciting and inspiring new solutions. Mashups show the internet&#8217;s potential increasing in leaps and bounds and this book can get you on your way to contributing to their future development.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2008/01/24/book-review-php-web-20-mashup-projects/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Web Scraping Article Published</title>
		<link>http://matthewturland.com/2007/12/20/web-scraping-article-published/</link>
		<comments>http://matthewturland.com/2007/12/20/web-scraping-article-published/#comments</comments>
		<pubDate>Thu, 20 Dec 2007 15:57:18 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://ishouldbecoding.com/2007/12/20/web-scraping-article-published/</guid>
		<description><![CDATA[Just a quick post to announce (albeit a little late) the December 2007 issue of php&#124;architect, which includes my article on web scraping. Please buy a copy, give it a read, and feel free to post comments on the forum thread for the article. I&#8217;d love to hear some reader feedback! You may have noticed that [...]]]></description>
			<content:encoded><![CDATA[<p>Just a quick post to announce (albeit a little late) the <a href="http://www.phparch.com/c/magazine/issue/63" title="php|architect / December 2007">December 2007 issue of php|architect</a>, which includes my article on web scraping. Please buy a copy, give it a read, and feel free to post comments on the <a href="http://forum.phparch.com/421" title="Web Scraping">forum thread for the article</a>. I&#8217;d love to hear some reader feedback!</p>
<p>You may have noticed that I&#8217;ve added a new page for <a href="http://matthewturland.com/publications" title="i should be coding :: publications">publications</a>. This will become the home for any content I produce that gains any sort of recognition, be it a podcast, article, book review, presentation slides, or what have you. Anytime anything new goes there, I&#8217;ll try to make a point to write a post about it.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2007/12/20/web-scraping-article-published/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Article for php&#124;architect</title>
		<link>http://matthewturland.com/2007/11/18/article-for-phparchitect/</link>
		<comments>http://matthewturland.com/2007/11/18/article-for-phparchitect/#comments</comments>
		<pubDate>Sun, 18 Nov 2007 13:32:52 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://ishouldbecoding.com/2007/11/18/article-for-phparchitect/</guid>
		<description><![CDATA[One of the things that has kept me away from my blog for the past few weeks is an article I&#8217;ve been working on for php&#124;architect magazine. It should be included in the December 2007 issue and is entitled &#8220;Web Scraping.&#8221; So, if the topic interests you, keep an eye out for it. If you [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that has kept me away from my blog for the past few weeks  is an article I&#8217;ve been working on for <a href="http://phparch.com" title="php|architect, PHP Magazine, PHP Conferences, PHP Books">php|architect magazine</a>. It should be included in the December 2007 issue and is entitled &#8220;Web Scraping.&#8221; So, if the topic interests you, keep an eye out for it. If you aren&#8217;t sure if the topic interests you, you can check out <a href="http://www.phppodcasts.com/2007/10/18/php-abstract-podcast-episode-22-screen-scraping/" title="PHP Abstract Podcast Episode 22: Screen Scraping | PHP Podcasts">my episode</a> on the <a href="http://devzone.zend.com" title="Zend Developer Zone" target="_blank">Zend Developer Zone</a> <a href="http://devzone.zend.com/tag/PHP_Abstract" title="PHP_Abstract">PHP Abstract podcast</a> for a brief high-level description. I&#8217;ll probably post about this again once the issue comes out, but I thought I&#8217;d give a heads up to anyone out there that might buy issues of the magazine on an issue-by-issue basis.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2007/11/18/article-for-phparchitect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP Abstract Episode 22: Screen Scraping</title>
		<link>http://matthewturland.com/2007/10/18/php-abstract-episode-22-screen-scraping/</link>
		<comments>http://matthewturland.com/2007/10/18/php-abstract-episode-22-screen-scraping/#comments</comments>
		<pubDate>Thu, 18 Oct 2007 18:38:03 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Podcasts]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://ishouldbecoding.com/2007/10/18/php-abstract-episode-22-screen-scraping/</guid>
		<description><![CDATA[Check out the latest PHP Abstract podcast (episode 22) from Dev Zone. I&#8217;m the guest speaker! The podcast is on web scraping, a practice in which I have (unfortunately) become somewhat proficient. Leave a comment on Dev Zone or on this entry and let me know what you think!]]></description>
			<content:encoded><![CDATA[<p>Check out the latest <a href="http://www.phppodcasts.com/category/phpabstract/" title="PHP Abstract | PHP Podcasts">PHP Abstract</a> podcast (<a href="http://devzone.zend.com/article/2631-PHP-Abstract-Episode-22-Screen-Scraping" title="PHP Abstract Episode 22: Screen Scraping">episode 22</a>) from <a href="http://devzone.zend.com" title="Zend Developer Zone">Dev Zone</a>. I&#8217;m the guest speaker! The podcast is on web scraping, a practice in which I have (unfortunately) become somewhat proficient. Leave a comment on Dev Zone or on this entry and let me know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2007/10/18/php-abstract-episode-22-screen-scraping/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
