<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>Matthew Turland &#187; HTML</title> <atom:link href="http://matthewturland.com/tag/html/feed/" rel="self" type="application/rss+xml" /><link>http://matthewturland.com</link> <description></description> <lastBuildDate>Tue, 15 May 2012 02:29:07 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.2</generator> <item><title>Renaming a DOMNode in PHP</title><link>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/</link> <comments>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/#comments</comments> <pubDate>Wed, 10 Feb 2010 01:07:14 +0000</pubDate> <dc:creator>Matthew Turland</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[DOM]]></category> <category><![CDATA[HTML]]></category> <category><![CDATA[Web Scraping]]></category> <category><![CDATA[XML]]></category><guid
isPermaLink="false">http://matthewturland.com/?p=218</guid> <description><![CDATA[A recent work assignment had me using PHP to pull HTML data into a DOMDocument instance and renaming some elements, such as b to strong or i to em. As it turns out, renaming elements using the DOM extension is rather tedious. Version 3 of the DOM standard introduces a renameNode() method, but the PHP [...]]]></description> <content:encoded><![CDATA[<p>A recent work assignment had me using PHP to pull HTML data into a <code><a
title="PHP: DOMDocument - Manual" href="http://php.net/manual/en/class.domdocument.php">DOMDocument</a></code> instance and renaming some elements, such as <a
title="HTML element - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/HTML_element#Presentation">b to strong or i to em</a>. As it turns out, renaming elements using the DOM extension is rather tedious.</p><p>Version 3 of the DOM standard introduces a <code><a
title="Document Object Model Core" href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#Document3-renameNode">renameNode()</a></code> method, but the PHP DOM extension doesn&#8217;t currently support it.</p><p>The <code><a
title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.nodename">$nodeName</a></code> property of the <code><a
title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php">DOMNode</a></code> class is read-only, so it can&#8217;t be changed that way.</p><p>A node can be created with a different name in the same document, but any value passed along with it is treated as text, so markup in that value is escaped rather than parsed. That makes it impossible to pass in the intended inner content of a node if that content contains other nodes.</p><p>The only method I&#8217;ve found that works is to replicate the attributes and child nodes of the original node. Attributes are fairly easy, but replicating children gave me trouble: only the first child of any given node was replicated within its intended replacement, and the remaining children were omitted. Here&#8217;s the original code that exhibited this behavior.</p><pre class="brush: php; title: ; notranslate">foreach ($oldNode-&gt;childNodes as $childNode) {
    $newNode-&gt;appendChild($childNode);
}</pre><p>The reason for this behavior is that the <code><a
title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.childnodes">$childNodes</a></code> property of <code>$oldNode</code> is implicitly modified when <code>$childNode</code> is transferred from it to <code>$newNode</code>, so the internal pointer of <code>$childNodes</code> to the next child in the list is no longer accurate.</p><p>To get around this, I took advantage of the fact that any node with any child nodes will always have a <code><a
title="PHP: DOMNode - Manual" href="http://php.net/manual/en/class.domnode.php#domnode.props.firstchild">$firstChild</a></code> property pointing to the first one. The modified code that takes this approach is below and has the behavior I originally set out to implement.</p><pre class="brush: php; title: ; notranslate">while ($oldNode-&gt;firstChild) {
    $newNode-&gt;appendChild($oldNode-&gt;firstChild);
}</pre><p>If you&#8217;re curious, below is the full code segment for renaming a node.</p><pre class="brush: php; title: ; notranslate">$newNode = $oldNode-&gt;ownerDocument-&gt;createElement('new_element_name');
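// Copy each attribute from the old node to the new one.
if ($oldNode-&gt;attributes-&gt;length) {
    foreach ($oldNode-&gt;attributes as $attribute) {
        $newNode-&gt;setAttribute($attribute-&gt;nodeName, $attribute-&gt;nodeValue);
    }
}
// Move the children across; firstChild is re-read on every pass, so none are skipped.
while ($oldNode-&gt;firstChild) {
    $newNode-&gt;appendChild($oldNode-&gt;firstChild);
}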
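// Replace via the parent node; calling replaceChild() on the document itself only works when $oldNode is the root element.
$oldNode-&gt;parentNode-&gt;replaceChild($newNode, $oldNode);</pre><p>Another potential &#8220;gotcha&#8221; is the argument order of the <code><a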
title="PHP: DOMNode::replaceChild - Manual" href="http://php.net/manual/en/domnode.replacechild.php">replaceChild()</a></code> method, which is the new node followed by the old node rather than the reverse that most people might expect. Thanks to <a
title="joshua may (notjosh) on Twitter" href="http://twitter.com/notjosh">Joshua May</a> for pointing that one out to me; I might never have understood why I was getting a <a
title="PHP: DOMNode::appendChild - Manual" href="http://php.net/manual/en/domnode.appendchild.php#domnode.appendchild.errors">&#8220;Not Found Error&#8221;</a> <code><a
title="PHP: DOMException - Manual" href="http://php.net/manual/en/class.domexception.php">DOMException</a></code> otherwise.</p> ]]></content:encoded> <wfw:commentRss>http://matthewturland.com/2010/02/09/renaming-a-domnode-in-php/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>
