<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Matthew Turland &#187; cURL</title>
	<atom:link href="http://matthewturland.com/tag/curl/feed/" rel="self" type="application/rss+xml" />
	<link>http://matthewturland.com</link>
	<description></description>
	<lastBuildDate>Tue, 24 Jan 2012 04:03:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Webscrapers Mailing List</title>
		<link>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/</link>
		<comments>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/#comments</comments>
		<pubDate>Sat, 03 Jul 2010 12:25:57 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[cURL]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=377</guid>
		<description><![CDATA[Daniel Stenberg, one of the primary authors of the libcurl library on which the PHP cURL extension is based, was kind enough to comment on and clarify a recent blog post of mine regarding web scraping using PHP and cURL. He later sent me a tweet to invite me to a new mailing list for [...]]]></description>
			<content:encoded><![CDATA[<p><a title="daniel.haxx.se" href="http://daniel.haxx.se">Daniel Stenberg</a>, one of the primary authors of the <a title="cURL and libcurl" href="http://curl.haxx.se">libcurl library</a> on which the <a title="PHP: cURL - Manual" href="http://us3.php.net/curl">PHP cURL extension</a> is based, was kind enough to <a title="Matthew Turland » Blog Archive » Gotcha on Scraping .NET Applications with PHP and cURL" href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/comment-page-1/#comment-5202">comment on</a> and clarify a <a title="Matthew Turland » Blog Archive » Gotcha on Scraping .NET Applications with PHP and cURL" href="http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/">recent blog post</a> of mine regarding web scraping using PHP and cURL. He later sent me <a title="Twitter / Daniel Stenberg: @elazar Allow me to invite ..." href="http://twitter.com/bagder/status/17590025600">a tweet</a> to invite me to a new <a title="Webscrapers - The Community" href="http://webscrapers.haxx.se">mailing list</a> for web scraping enthusiasts just before <a title="Twitter / Daniel Stenberg: Everyone is welcome to joi ..." href="http://twitter.com/bagder/status/17590320446">tweeting a public invitation</a>. In addition to the mailing list itself, the web site also has links to books (including <a title="php|architect’s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/">my book</a>) and popular tools related to the subject. I think this is awesome and I encourage anyone with an interest in web scraping, professional or recreational, to join.</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/07/03/webscrapers-mailing-list/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Gotcha on Scraping .NET Applications with PHP and cURL</title>
		<link>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/</link>
		<comments>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 02:27:09 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[cURL]]></category>
		<category><![CDATA[Web Scraping]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=365</guid>
		<description><![CDATA[Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, php&#124;architect&#8217;s Guide to Web Scraping with PHP. I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I&#8217;d share. In this case, I [...]]]></description>
			<content:encoded><![CDATA[<p><em>Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, </em><a title="php|architect&#8217;s Guide to Web Scraping with PHP | php|architect" href="http://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/"><em>php|architect&#8217;s Guide to Web Scraping with PHP</em></a><em>.</em></p>
<p>I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I&#8217;d share. In this case, I was using the <a title="PHP: cURL - Manual" href="http://php.net/manual/en/book.curl.php">cURL extension</a>, but the tip isn&#8217;t necessarily specific to that. One thing my script did was submit a <a title="POST (HTTP) - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/POST_(HTTP)">POST request</a> to simulate a form submission. The code looked something like the sample below.</p>
<pre class="brush: php; title: ; notranslate">$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL =&gt; 'http://...',
    CURLOPT_POST =&gt; true,
    CURLOPT_POSTFIELDS =&gt; array(
        'field1' =&gt; 'value1',
        // ...
    ),
    // ...
));</pre>
<p>The issue I ran into had to do with a behavior of the <code>CURLOPT_POSTFIELDS</code> setting that&#8217;s easy to overlook. This is a segment of its description from the <a title="PHP: curl_setopt - Manual" href="http://us3.php.net/curl_setopt">PHP manual page</a> for the <code>curl_setopt()</code> function.</p>
<blockquote><p>If <em>value</em> is an array, the <em>Content-Type</em> header will be set to <em>multipart/form-data</em>.</p></blockquote>
<p>If the form being submitted is not set to have an <code>enctype</code> attribute value of <code>multipart/form-data</code> in the form&#8217;s markup, .NET returns a 500-level HTTP response with no further information on what causes the error (for security purposes). This presumably happens because it&#8217;s expecting one value for the <code>Content-Type</code> request header and getting another.</p>
<p>Setting <code>CURLOPT_HEADER</code> and <code>CURLOPT_VERBOSE</code> to <code>true</code> helped to reveal that this was the issue. The fix is pretty simple: instead of passing the array itself for <code>CURLOPT_POSTFIELDS</code>, pass the result of wrapping it in a call to the <code>http_build_query()</code> function (see its <a title="PHP: http_build_query - Manual" href="http://us.php.net/http_build_query">PHP manual page</a>). This converts it to a properly formatted query string, which causes cURL to use the default <code>Content-Type</code> header value of <code>application/x-www-form-urlencoded</code> instead.</p>
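<p>As a rough sketch of that fix, the snippet below encodes the fields before handing them to cURL. The field names and values are hypothetical placeholders, the URL is left elided as in the sample above, and the options are set one at a time with <code>curl_setopt()</code> purely for readability. Treat it as an illustration rather than the exact code from my script.</p>
<pre class="brush: php; title: ; notranslate">// Hypothetical form fields, for illustration only.
$fields = array();
$fields['field1'] = 'value1';
$fields['field2'] = 'value with spaces';

// http_build_query() URL-encodes the array into a single string.
$body = http_build_query($fields);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://...');
curl_setopt($ch, CURLOPT_POST, true);
// Passing a string rather than an array avoids multipart/form-data,
// so the request goes out as application/x-www-form-urlencoded.
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);</pre>
<p>Nothing is sent until <code>curl_exec($ch)</code> is called, so the encoded body in <code>$body</code> can be inspected first.</p>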
<p>Tools like <a title="Firebug" href="http://getfirebug.com">Firebug</a> can help you to examine requests made by a browser. Together with these settings for cURL, you can modify your script&#8217;s requests to match those of your browser as closely as possible, making gotchas like this less likely to trip you up.</p>
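<p>For reference, turning on those two debugging settings might look something like the following minimal sketch. The URL is again elided, and <code>CURLOPT_RETURNTRANSFER</code> is an extra option I am assuming here so that the response is returned as a string instead of being printed.</p>
<pre class="brush: php; title: ; notranslate">$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://...');
// Include the response headers in the output.
$headerSet = curl_setopt($ch, CURLOPT_HEADER, true);
// Write verbose request and response details to STDERR.
$verboseSet = curl_setopt($ch, CURLOPT_VERBOSE, true);
// Assumption: return the response from curl_exec() as a string.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);</pre>
<p>The verbose output includes the request headers cURL sends, which is where a mismatched <code>Content-Type</code> value would show up.</p>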
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/06/30/gotcha-on-scraping-net-applications-with-php-and-curl/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
