<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Matthew Turland &#187; HTML</title>
	<atom:link href="http://matthewturland.com/tag/html/feed/" rel="self" type="application/rss+xml" />
	<link>http://matthewturland.com</link>
	<description></description>
	<lastBuildDate>Tue, 24 Jan 2012 04:03:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Renaming a DOMNode in PHP</title>
		<link>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/</link>
		<comments>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 01:07:14 +0000</pubDate>
		<dc:creator>Matthew Turland</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Web Scraping]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://matthewturland.com/?p=218</guid>
		<description><![CDATA[A recent work assignment had me using PHP to pull HTML data into a DOMDocument instance and renaming some elements, such as b to strong or i to em. As it turns out, renaming elements using the DOM extension is rather tedious. Version 3 of the DOM standard introduces a renameNode() method, but the PHP [...]]]></description>
			<content:encoded><![CDATA[<p>A recent work assignment had me using PHP to pull HTML data into a <code><a title="PHP: DOMDocument - Manual" href="http://php.net/manual/en/class.domdocument.php">DOMDocument</a></code> instance and renaming some elements, such as <a title="HTML element - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/HTML_element#Presentation">b to strong or i to em</a>. As it turns out, renaming elements using the DOM extension is rather tedious.</p>
<p>Version 3 of the DOM standard introduces a <code><a title="Document Object Model Core" href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#Document3-renameNode">renameNode()</a></code> method, but the PHP DOM extension doesn&#8217;t currently support it.</p>
<p>The <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.nodename">$nodeName</a></code> property of the <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php">DOMNode</a></code> class is read-only, so it can&#8217;t be changed that way.</p>
<p>A node can be created with a different name in the same document, but any value passed along with it is treated as text rather than markup: characters such as <code>&lt;</code> and <code>&gt;</code> are escaped instead of being parsed. That makes it impossible to pass in the intended inner content of a node if that content contains other nodes.</p>
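<p>As a minimal sketch of that limitation (the element names and text here are made up for illustration), passing markup as the value to <code>createElement()</code> produces a single text node rather than a nested element.</p>
<pre class="brush: php; title: ; notranslate">$doc = new DOMDocument();
// The second argument is treated as text content, not as markup to parse.
$strong = $doc-&gt;createElement('strong', 'some &lt;em&gt;nested&lt;/em&gt; text');
$doc-&gt;appendChild($strong);

// The new element has one text node child and contains no em element.
var_dump($strong-&gt;childNodes-&gt;length);                 // int(1)
var_dump($strong-&gt;firstChild instanceof DOMText);      // bool(true)
var_dump($strong-&gt;getElementsByTagName('em')-&gt;length); // int(0)</pre>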
<p>The only method I&#8217;ve found that works is to replicate the attributes and child nodes of the original node. Attributes are fairly easy, but I ran into an issue replicating children where only the first child of any given node was replicated within its intended replacement and the remaining children were omitted. Here&#8217;s the original code that was exhibiting this behavior.</p>
<pre class="brush: php; title: ; notranslate">foreach ($oldNode-&gt;childNodes as $childNode) {
    $newNode-&gt;appendChild($childNode);
}</pre>
<p>The reason for this behavior is that the <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.childnodes">$childNodes</a></code> property of <code>$oldNode</code> is a live list: it is implicitly modified when <code>$childNode</code> is transferred from <code>$oldNode</code> to <code>$newNode</code>. Once the current node has been moved, the iterator&#8217;s position within <code>$childNodes</code> no longer points at the next remaining child, so the loop ends early.</p>
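<p>The implicit modification is easy to observe. In this small sketch (the document and its element names are invented for the example), a reference to <code>$childNodes</code> is taken once, and its length changes as soon as a child is moved elsewhere.</p>
<pre class="brush: php; title: ; notranslate">$doc = new DOMDocument();
$doc-&gt;loadXML('&lt;root&gt;&lt;old&gt;&lt;a/&gt;&lt;b/&gt;&lt;c/&gt;&lt;/old&gt;&lt;new/&gt;&lt;/root&gt;');
$oldNode = $doc-&gt;getElementsByTagName('old')-&gt;item(0);
$newNode = $doc-&gt;getElementsByTagName('new')-&gt;item(0);

$children = $oldNode-&gt;childNodes; // a live list, not a copy
var_dump($children-&gt;length);      // int(3)

// Appending a node that already has a parent moves it out of that parent.
$newNode-&gt;appendChild($oldNode-&gt;firstChild);
var_dump($children-&gt;length);      // int(2), although $children was never touched directly</pre>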
<p>To get around this, I took advantage of the fact that any node with any child nodes will always have a <code><a title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.firstchild">$firstChild</a></code> property pointing to the first one. The modified code that takes this approach is below and has the behavior I originally set out to implement.</p>
<pre class="brush: php; title: ; notranslate">while ($oldNode-&gt;firstChild) {
    $newNode-&gt;appendChild($oldNode-&gt;firstChild);
}</pre>
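<p>An alternative I did not use originally, sketched here on an invented document, is to copy the list into a plain PHP array with <code>iterator_to_array()</code> before looping. The array is a snapshot, so moving nodes no longer disturbs the iteration.</p>
<pre class="brush: php; title: ; notranslate">$doc = new DOMDocument();
$doc-&gt;loadXML('&lt;root&gt;&lt;old&gt;one&lt;a/&gt;two&lt;b/&gt;&lt;/old&gt;&lt;new/&gt;&lt;/root&gt;');
$oldNode = $doc-&gt;getElementsByTagName('old')-&gt;item(0);
$newNode = $doc-&gt;getElementsByTagName('new')-&gt;item(0);

// Snapshot the four children (two text nodes, two elements) before moving them.
foreach (iterator_to_array($oldNode-&gt;childNodes) as $childNode) {
    $newNode-&gt;appendChild($childNode);
}

var_dump($oldNode-&gt;childNodes-&gt;length); // int(0)
var_dump($newNode-&gt;childNodes-&gt;length); // int(4)</pre>
<p>The <code>while</code> loop above avoids the extra array, so it remains the simpler choice when every child is being moved.</p>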
<p>If you&#8217;re curious, below is the full code segment for renaming a node.</p>
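<pre class="brush: php; title: ; notranslate">// Create the replacement element in the same document as the original.
$newNode = $oldNode-&gt;ownerDocument-&gt;createElement('new_element_name');
// Copy each attribute over by name and value.
if ($oldNode-&gt;attributes-&gt;length) {
    foreach ($oldNode-&gt;attributes as $attribute) {
        $newNode-&gt;setAttribute($attribute-&gt;nodeName, $attribute-&gt;nodeValue);
    }
}
// Move the children one at a time; firstChild advances as each one leaves.
while ($oldNode-&gt;firstChild) {
    $newNode-&gt;appendChild($oldNode-&gt;firstChild);
}
// replaceChild() must be called on the node's parent, which is only the
// document itself when $oldNode is the root element.
$oldNode-&gt;parentNode-&gt;replaceChild($newNode, $oldNode);</pre>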
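<p>For reuse, the same steps can be wrapped in a function. This is only a sketch: <code>renameElement()</code> is a hypothetical name of my own, not part of the DOM extension, and the sample markup is invented. It calls <code>replaceChild()</code> on <code>$oldNode-&gt;parentNode</code> so that it also works for elements nested below the root.</p>
<pre class="brush: php; title: ; notranslate">/**
 * Hypothetical helper: replaces $oldNode with a new element named $newName,
 * carrying over its attributes and children, and returns the new element.
 */
function renameElement(DOMElement $oldNode, $newName)
{
    $newNode = $oldNode-&gt;ownerDocument-&gt;createElement($newName);
    foreach ($oldNode-&gt;attributes as $attribute) {
        $newNode-&gt;setAttribute($attribute-&gt;nodeName, $attribute-&gt;nodeValue);
    }
    while ($oldNode-&gt;firstChild) {
        $newNode-&gt;appendChild($oldNode-&gt;firstChild);
    }
    // Called on the parent: new node first, old node second.
    $oldNode-&gt;parentNode-&gt;replaceChild($newNode, $oldNode);
    return $newNode;
}

$doc = new DOMDocument();
$doc-&gt;loadXML('&lt;p&gt;Some &lt;b class="x"&gt;bold &lt;i&gt;text&lt;/i&gt;&lt;/b&gt; here&lt;/p&gt;');

// getElementsByTagName() also returns a live list, so snapshot it first.
foreach (iterator_to_array($doc-&gt;getElementsByTagName('b')) as $b) {
    renameElement($b, 'strong');
}

echo $doc-&gt;saveXML($doc-&gt;documentElement);
// &lt;p&gt;Some &lt;strong class="x"&gt;bold &lt;i&gt;text&lt;/i&gt;&lt;/strong&gt; here&lt;/p&gt;</pre>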
<p>Another potential &#8220;gotcha&#8221; is the argument order of the <code><a title="PHP: DOMNode::replaceChild - Manual" href="http://php.net/manual/en/domnode.replacechild.php">replaceChild()</a></code> method, which is the new node followed by the old node rather than the reverse that most people might expect. Thanks to <a title="joshua may (notjosh) on Twitter" href="http://twitter.com/notjosh">Joshua May</a> for pointing that one out to me; I might never have understood why I was getting a <a title="PHP: DOMNode::appendChild - Manual" href="http://php.net/manual/en/domnode.appendchild.php#domnode.appendchild.errors">&#8220;Not Found Error&#8221;</a> <code><a title="PHP: DOMException - Manual" href="http://php.net/manual/en/class.domexception.php">DOMException</a></code> otherwise.</p>
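<p>To see the exception for yourself, here is a small sketch (again with invented markup) that passes the arguments in the wrong order. The node given second must already be a child of the node that <code>replaceChild()</code> is called on, and a freshly created element is not.</p>
<pre class="brush: php; title: ; notranslate">$doc = new DOMDocument();
$doc-&gt;loadXML('&lt;p&gt;&lt;b&gt;bold&lt;/b&gt;&lt;/p&gt;');
$oldNode = $doc-&gt;getElementsByTagName('b')-&gt;item(0);
$newNode = $doc-&gt;createElement('strong');

try {
    // Arguments reversed: $newNode has no parent yet, so it cannot be replaced.
    $oldNode-&gt;parentNode-&gt;replaceChild($oldNode, $newNode);
} catch (DOMException $e) {
    var_dump($e-&gt;getCode() === DOM_NOT_FOUND_ERR); // bool(true)
}</pre>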
]]></content:encoded>
			<wfw:commentRss>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
