Archive for the ‘PHP’ Category.

“Web Scraping with PHP” Now Available!

What I’m announcing in this blog post has been in the works since early 2008, when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small a niche or simply wasn’t marketable. php|architect Press respectfully disagreed with them and decided to publish what is now a book written by me that you can purchase.

It’s currently only available in PDF format due to a delay with the printer; a dead tree version should become available within the next few weeks. To my knowledge, there are plans to offer the paper and PDF bundle as has been done in the past with their other books.

Many of you reading this post probably have a personal to-do list of goals that you want to accomplish within your lifetime. Becoming the published author of a book has been an item on my own list for some time, and seeing this project through to completion has let me cross it off. I think anyone who has achieved a similar victory can relate to its significance, if only to oneself.

I do of course encourage you to purchase the book. I have no naïve notions that this will result in any substantial monetary return. Even if it did, that was not my reason for writing the book. I did it because I have knowledge that I believe is worth knowing and sharing with you. There were a number of people who contributed to this and I encourage you to read about them in the pages of the book that credit them.

It is also worth restating here that I have many family members, friends, and colleagues who helped to make this possible. There are too many to name, but I would like to thank each and every one of you from the bottom of my heart. I consider this a milestone in my life and my only hope is that it has as profound an effect on your life as it has on my own.

Models in Zend Framework

A question that frequently comes up in my interactions with other developers about Zend Framework is how to approach designing models. There’s a small collection of resources and advice that I generally give on the subject, so I thought I’d write up a blog post to give people an easy place to access it all.

More than one way to skin a cat

First, there is no one “correct” way to design a model. If there were, the framework would probably have an actual model component. It doesn’t, and as Bill Karwin, former Project Manager for Zend Framework, has said, the reason is that designing the model is your job. There are pros and cons to any approach. It’s all about finding a method that works for you, is appropriate for the situation at hand, and mitigates difficulty in long-term application maintenance.

The Model section of the Quick Start Guide and this blog post by Michelangelo van Dam include examples of a Data Mapper approach. Zend_Db uses a Table Data Gateway and Row Data Gateway approach. Doctrine, a popular ORM library that is gaining traction in the ZF community, uses an Active Record approach. And the available approaches don’t stop there. As a general rule of thumb when designing models, and components in general, I recommend favoring composition over inheritance. You’ll get a better sense of what I mean by that later in the post.

Defining the model

The Wikipedia article on the MVC architectural pattern isn’t all-encompassing, but isn’t a bad place to start either. In particular, it drives home a few important points about the model that you should bear in mind.

The model is a “domain-specific representation of the data upon which the application operates.”

In a nutshell, the model handles data: storing it, retrieving it, filtering and validating it, and providing access to it.

The model contains “domain logic” that “adds meaning to raw data.”

In other words, it handles the conversion from raw data in a data source to semantically meaningful PHP objects and back again.

“MVC does not specifically mention the data access layer because it is understood to be underneath or encapsulated by the model.”

The shorter version: $model != $database. This is one point that trips a lot of people up. The model and your database are not congruent, synonymous, or in any way equivalent. Yes, 99% of the time, your model will use a database for its data source. However, models can be more complex than that: they can serve as clients to web services, limit access to data using an ACL, access data caching resources like memcached or APC, and so forth.

Designing the model

“So which approach do you use?”

Generally, the answer is none of the above. My personal preference is to keep data, in the form of plain old PHP arrays and objects, separate from the logic that handles that data. Many common PHP tasks already provide data in one of these forms, as the superglobal arrays do, so it seems natural to just take it in the form in which it’s provided.

I define a model class that composes some other object to access the data I need, generally a Zend_Db_Adapter instance. Any methods of that model class return data using scalar types or classes that PHP supports natively. What’s great about this is that it’s fairly easy to convert data to and from these forms using type juggling regardless of the data’s origin.
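To make that concrete, here is a minimal sketch of the idea. The class names (ArraySource, PostModel) and the row layout are made up for illustration; ArraySource is only a stand-in for whatever object actually provides the data, such as a Zend_Db_Adapter instance.

```php
<?php
// Hypothetical stand-in for a real data source such as a database adapter.
class ArraySource
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function fetchAll()
    {
        return $this->rows;
    }
}

// The model composes its data source instead of extending it.
class PostModel
{
    private $source;

    public function __construct(ArraySource $source)
    {
        $this->source = $source;
    }

    // Returns plain arrays holding natively typed values.
    public function getPublishedPosts()
    {
        $posts = array();
        foreach ($this->source->fetchAll() as $row) {
            if ((bool) $row['published']) {
                $posts[] = array(
                    'id'    => (int) $row['id'],
                    'title' => (string) $row['title'],
                );
            }
        }
        return $posts;
    }
}

// Raw rows arrive as strings, much like they would from a database driver.
$model = new PostModel(new ArraySource(array(
    array('id' => '1', 'title' => 'First', 'published' => '1'),
    array('id' => '2', 'title' => 'Draft', 'published' => '0'),
)));
print_r($model->getPublishedPosts());
```

Calling code only ever sees arrays of integers and strings, so swapping the stand-in for a different data source would not change how the results are consumed.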

“But wait, how can I encapsulate the data source within my model if I need a dependency like a Zend_Db_Adapter instance for it to be able to interact with that data source?”

This is another major question that people tend to ask. If any code calling your model first has to handle injecting its dependencies, that muddies up separation of concerns because calling code then must have some knowledge of how your model operates internally. This is a problem because, if the data source of the model needs to change in the future, all calling code needs to change as well rather than only the model code. There are a few ways to approach this problem in Zend Framework.

The first method involves storing your dependencies in Zend_Registry. Declare accessor methods in the model for dependencies that retrieve them from the registry only if they are not explicitly injected from calling code. This bypasses the need for dependency injection from application code, thus preserving separation of concerns, but still allows injection to be performed for unit testing purposes.
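A rough sketch of that accessor pattern follows. To keep it runnable without the framework, it uses a tiny hypothetical Registry class that imitates the static set() and get() style of Zend_Registry; the UserModel class, the 'db' key, and the stdClass objects standing in for adapters are likewise assumptions made for illustration.

```php
<?php
// Hypothetical stand-in imitating the static get/set style of Zend_Registry.
class Registry
{
    private static $items = array();

    public static function set($key, $value)
    {
        self::$items[$key] = $value;
    }

    public static function get($key)
    {
        return self::$items[$key];
    }
}

class UserModel
{
    private $db;

    // Explicit injection, intended mainly for unit tests.
    public function setDb($db)
    {
        $this->db = $db;
        return $this;
    }

    // Falls back to the registry only when nothing was injected.
    public function getDb()
    {
        if ($this->db === null) {
            $this->db = Registry::get('db');
        }
        return $this->db;
    }

    public function countUsers()
    {
        return count($this->getDb()->rows);
    }
}

// Application bootstrap: register the shared dependency once.
$appDb = new stdClass();
$appDb->rows = array('alice', 'bob');
Registry::set('db', $appDb);

// Application code: no knowledge of the dependency is needed.
$model = new UserModel();
echo $model->countUsers(), "\n";

// Test code: inject a substitute instead.
$testDb = new stdClass();
$testDb->rows = array();
$tested = new UserModel();
$tested->setDb($testDb);
echo $tested->countUsers(), "\n";
```

The first call counts the two rows from the registered object and the second counts none, because the injected object takes precedence over the registry.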

The second method is a variation of the first and is specific to the case of Zend_Db_Adapter instances. This approach involves setting your adapter as the default adapter to use for Zend_Db_Table instances in lieu of using the registry to store it. Note that this doesn’t require actually using Zend_Db_Table in order to work. This can be set from the bootstrap or application configuration file, then retrieved using the getDefaultAdapter() method of the Zend_Db_Table_Abstract class. Again, don’t forget your accessor methods so dependencies can be injected from unit tests.

Another method might be to use a service locator as a dependency in models. This serves as a layer of abstraction between models and their dependencies when it comes to controllers dealing with both. To my knowledge, the closest thing to an implementation of this in ZF is the Zend_Application_Bootstrap classes. From a controller, the bootstrap instance can be accessed as in the code example below.

$bootstrap = $this->getInvokeArg('bootstrap');

Where to go from here

Service layers are another frequent topic related to models. Those aside, I mainly suggest picking a simple model that you can prototype and trying several approaches to see which you prefer. If you’ve got some ZF experience under your belt, I’d be interested in hearing about your own modeling approaches and experiences and encourage you to leave a comment on this post.

Renaming a DOMNode in PHP

A recent work assignment had me using PHP to pull HTML data into a DOMDocument instance and rename some elements, such as b to strong or i to em. As it turns out, renaming elements using the DOM extension is rather tedious.

Version 3 of the DOM standard introduces a renameNode() method, but the PHP DOM extension doesn’t currently support it.

The $nodeName property of the DOMNode class is read-only, so it can’t be changed that way.

A node can be created with a different name in the same document, but if you specify a value to go along with it, any entities in that value are automatically encoded, so it’s not possible to pass in the intended inner content of a node if it contains other nodes.

The only method I’ve found that works is to replicate the attributes and child nodes of the original node. Attributes are fairly easy, but I ran into an issue replicating children where only the first child of any given node was replicated within its intended replacement and the remaining children were omitted. Here’s the original code that was exhibiting this behavior.

foreach ($oldNode->childNodes as $childNode) {
    $newNode->appendChild($childNode);
}

The reason for this behavior is that the $childNodes property of $oldNode is implicitly modified when $childNode is transferred from it to $newNode, so the internal pointer of $childNodes to the next child in the list is no longer accurate.

To get around this, I took advantage of the fact that any node with any child nodes will always have a $firstChild property pointing to the first one. The modified code that takes this approach is below and has the behavior I originally set out to implement.

while ($oldNode->firstChild) {
    $newNode->appendChild($oldNode->firstChild);
}
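Another way to sidestep the live list, which I’m offering as an alternative sketch rather than what I originally used, is to copy the child nodes into a plain array before moving anything. The document contents and element names below are made up for the demonstration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><old>one<b>two</b>three</old><new/></root>');
$oldNode = $doc->getElementsByTagName('old')->item(0);
$newNode = $doc->getElementsByTagName('new')->item(0);

// Snapshot the live list so that moving nodes cannot disturb the iteration.
$children = iterator_to_array($oldNode->childNodes);
foreach ($children as $childNode) {
    $newNode->appendChild($childNode);
}

echo $doc->saveXML($newNode), "\n"; // <new>one<b>two</b>three</new>
```

The array holds references to the same nodes, so they are still moved rather than copied; only the iteration is decoupled from the document.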

If you’re curious, below is the full code segment for renaming a node.

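// Create the replacement element in the same document as the original
$newNode = $oldNode->ownerDocument->createElement('new_element_name');
// Copy each attribute from the old element to the new one
if ($oldNode->attributes->length) {
    foreach ($oldNode->attributes as $attribute) {
        $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }
}
// Move the children one at a time; firstChild changes as each one is moved
while ($oldNode->firstChild) {
    $newNode->appendChild($oldNode->firstChild);
}
// replaceChild() has to be called on the parent of the node being replaced
$oldNode->parentNode->replaceChild($newNode, $oldNode);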

Another potential “gotcha” is the argument order of the replaceChild() method, which is the new node followed by the old node rather than the reverse that most people might expect. Thanks to Joshua May for pointing that one out to me; I might never have understood why I was getting a “Not Found Error” DOMException otherwise.
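For reuse, the steps above can be wrapped in a function. This is only a sketch: the renameElement() name and the sample markup are hypothetical, and it calls replaceChild() on the old node’s parent because the node being replaced must be a child of the node the method is called on. Since getElementsByTagName() also returns a live list, the matches are copied into an array before any renaming happens.

```php
<?php
// Hypothetical helper: replaces $oldNode with a new element named $newName.
function renameElement(DOMElement $oldNode, string $newName): DOMElement
{
    $newNode = $oldNode->ownerDocument->createElement($newName);
    foreach ($oldNode->attributes as $attribute) {
        $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }
    while ($oldNode->firstChild) {
        $newNode->appendChild($oldNode->firstChild);
    }
    $oldNode->parentNode->replaceChild($newNode, $oldNode);
    return $newNode;
}

$doc = new DOMDocument();
$doc->loadXML('<p>Some <b class="x">bold <i>and italic</i></b> text</p>');

foreach (array('b' => 'strong', 'i' => 'em') as $from => $to) {
    // Snapshot the live result list before modifying the document.
    foreach (iterator_to_array($doc->getElementsByTagName($from)) as $element) {
        renameElement($element, $to);
    }
}

echo $doc->saveXML($doc->documentElement), "\n";
// <p>Some <strong class="x">bold <em>and italic</em></strong> text</p>
```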

Splitting PHP Class Files

A recent work project required me to write a PHP script to interact with a remote SOAP service. Part of the service provider’s recommended practices entailed using a slightly dated software package called wsdl2php, which generates a single PHP file containing classes corresponding to all user-defined types from a specified WSDL file.

The issue I ran into was due to all the generated PHP classes being housed in a single file. I had to process two WSDL files that had several identical user-defined types in common. As a result, I couldn’t simply include the two PHP files generated from them because PHP doesn’t allow you to define two classes with the same name.

Looking at its source code, modifying wsdl2php to change this behavior was not a very appealing option. Attempting to consolidate the two WSDL files into one with no redundant user-defined type declarations seemed futile as well. Instead, I resolved to split the generated PHP files such that each class was contained in its own file. This would also allow me to use an autoloader to determine which of the classes I actually needed for the particular service call I was making.
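As a sketch of the autoloading half of that plan, assuming one class per file with the file named after the class: the directory, the ExampleType class, and its contents below are invented so the example can run on its own.

```php
<?php
// Set up a throwaway directory holding one hypothetical split class file.
$dir = sys_get_temp_dir() . '/split_classes_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . '/ExampleType.php',
    "<?php\nclass ExampleType { public \$value = 42; }\n"
);

// Load a class file only when that class is first used.
spl_autoload_register(function ($class) use ($dir) {
    $file = $dir . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

$object = new ExampleType(); // triggers the autoloader
echo $object->value, "\n";
```

Only the classes a given service call touches ever get loaded, so identically named classes from the other WSDL file never collide as long as they live in a different directory that is not registered at the same time.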

Due to the number of classes, splitting the classes into separate files by hand would have been tedious and time-consuming. I decided to tap into my previous experience with the tokenizer extension to throw together a CLI script that would handle this for me. Once I got it working, it clocked in at just over 50 LOC with comments and whitespace. You simply call it from a shell and pass it the PHP file you want to split and the destination for the split class files.
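To give a feel for the tokenizer approach, here is a simplified sketch. It is not the script from the repository: the splitClasses() name is hypothetical, it returns the class sources as an array instead of writing files, and it assumes plain top-level class declarations like the ones a code generator emits. Modifiers such as abstract, doc comments preceding a class, namespaces, and uses of ::class outside a class body are not handled.

```php
<?php
// Hypothetical helper: maps each class name found in $source to its source text.
function splitClasses(string $source): array
{
    $classes = array();
    $capturing = false; // true while inside a class declaration
    $name = null;       // name of the class being captured
    $buffer = '';       // source text collected for that class
    $depth = 0;         // brace nesting depth within the class

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        list($id, $text) = is_array($token) ? $token : array(null, $token);

        if (!$capturing) {
            if ($id !== T_CLASS) {
                continue; // ignore everything between classes
            }
            $capturing = true;
            $name = null;
            $buffer = '';
            $depth = 0;
        }

        $buffer .= $text;

        if ($name === null && $id === T_STRING) {
            $name = $text; // first identifier after "class"
        } elseif (($id === null && $text === '{')
            || $id === T_CURLY_OPEN
            || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $depth++; // includes braces opened inside interpolated strings
        } elseif ($id === null && $text === '}') {
            if (--$depth === 0) {
                $classes[$name] = $buffer;
                $capturing = false;
            }
        }
    }

    return $classes;
}

$source = <<<'PHP'
<?php
class Foo { public $a; function f() { return "{$this->a}"; } }
class Bar extends Foo { }
PHP;

foreach (splitClasses($source) as $name => $code) {
    echo $name, ' => ', $code, "\n";
}
```

Writing each entry out would then be a matter of prefixing the code with an opening PHP tag and saving it under a file name derived from the class name.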

I thought it might be useful for others needing to process similarly formatted source code, so I threw it into a github repository for anyone who might like to take a look. I’m open to suggestions for improvements to implement if enough people find it useful. Feel free to file an issue on the repository if you happen to find a bug.

So Long, Blue Parabola

I’ve decided to leave Blue Parabola. My last day there will be Tuesday, February 16. I’d like to thank Keith Casey and Marco Tabini for choosing me to be part of the team. It’s been a privilege to work with them and I’ve learned a great deal.

As for what’s next, I’ll be starting at K-fx2. (Nope, no funemployment for me.) There, I’ll be developing Zend Framework applications and helping to streamline development processes and infrastructure.

Thanks to all my friends and family who’ve provided support during this transition. I’m looking forward to what the next year holds.

Speaking at a Conference

I can’t make any claim to the title of veteran conference speaker. Not yet, at least. However, I have done it once before at ZendCon in 2008 and I’ll be doing it again at php|tek this year. I thought I’d take a blog post to give out a few tips to any prospective first-time speakers based on my first speaking experience. I’m assuming here that you’ve already decided on a particular conference that you want to attend, you’ve submitted a session proposal, and you’ve been accepted.

First, in addition to the other things you should do before attending, be ready to give your presentation before you get on the plane. You should start on your slides as far in advance as possible. Don’t put it off or wait until the last minute, because it will likely be more work than you anticipate. This includes making sure that any live demos you intend to give will run as expected. Syntax errors and crashing web servers look very bad to the audience.

One of the reasons for this is that you’ll want to practice your talk out loud. It’s one thing to put the material onto slides, but it may sound different when it’s actually coming out of your mouth and going into the crowd. You may find stumbling points, places where you stutter or get caught off-guard when transitioning from one topic to another. Try to organize the presentation such that it matches your natural flow when talking about the topic without any slides at all.

Which reminds me, learn from the masters. People like Marco Tabini have spoken before and have a wealth of knowledge that they’ll share fairly freely most of the time, especially if alcohol (or, in Marco’s case, an espresso) is involved. Look at books like Presentation Zen by Garr Reynolds. Take the time to hone your presentation skills before you have to make your delivery.

If you’ve been to a conference before, you’ve probably already learned about my next point the hard way. Don’t depend on wifi internet access availability. Why not? Because the vast majority of the time, it will suck. There won’t be enough IP addresses, someone will do something to hog bandwidth and make latency skyrocket, or it will find some other way to refuse to work. Save local copies of files, write a minimal daemon to simulate a remote server, do whatever you need to do to avoid relying on it.

That point goes hand in hand with this one: test your equipment early and have a Plan B. In particular, hook your laptop up to the projector in the room in which you’ll be speaking (or to a test projector, if the conference hosts provide one and prefer you use that) to make sure it can display your slides. Ben Ramsey was gracious enough to loan me his MacBook at ZendCon because my Sony Vaio refused to work with the projector, and the time-sensitive situation did nothing but add to my speaking nerves. Make sure you don’t end up in the same spot.

Lastly, don’t let critical reception deter you from speaking again. I got pretty negative feedback the first time around, but I took it in stride. While I know I have plenty of room for improvement, I’m still going to give it another shot. Do your very best, then strive to be better.

Hope you enjoyed this blog post and gleaned something useful from it. If you’ve got any of your own speaking tips, please feel free to add a comment on this post. If you’ll be attending php|tek, I look forward to seeing you there!

Speaking at tek-X

As the recently released schedule shows, I will be speaking at the php|tek 2010 conference. The session I’ll be presenting is entitled “New SPL Features in PHP 5.3” and it will be an extended version of the webcast I presented as part of the CodeWorks webcast series.

While there, I also plan on participating in the Hack Track and may try to recruit a few new contributors (like you!) for the Phergie project. I am very much looking forward to the event and hope to see you there!

Database Testing with PHPUnit and MySQL

Update 2012/01/15: I finally got around to submitting a patch to document this feature in the PHPUnit manual. Sebastian has merged it, so it will hopefully be available in the online manual soon.

Update #2 2012/01/23: I got around to checking the online version of the manual and the current build includes my patch. Enjoy.

I recently made a contribution to the PHPUnit project that I thought I’d take a blog post to discuss. One of the extensions bundled with PHPUnit adds support for database testing. This extension was contributed by Mike Lively and is a port of the DbUnit extension for the JUnit Java unit testing framework. If you’re interested in learning more about database unit testing, check out this presentation by Sebastian Bergmann on the subject.

One of the major components of both extensions is the data set. Database unit tests involve loading a seed data set into a database, executing code that performs an operation on that data set such as deleting a record, and then checking the state of the data set to confirm that the operation had the desired effect. DbUnit supports multiple formats for seed data sets. The PHPUnit Database extension includes support for DbUnit’s XML and flat XML formats as well as CSV.

If you’re using MySQL as your database, CSV has been the only format supported by both the mysqldump utility and the PHPUnit Database extension up to this point. My contribution adds support for its XML format to the extension. While this support was developed to work in the PHPUnit 3.4.x branch, it won’t be available in a stable release until 3.5.0. In the meantime, this is how you can use it now.

  1. Go to the commit on Github and apply the additions and modifications included in it to your PHPUnit installation.
  2. From a shell, get your XML seed data set and store it in a location accessible to your unit test cases.
    mysqldump --xml -t -u username -p database > seed.xml
  3. Create a test case class that extends PHPUnit_Extensions_Database_TestCase. Implement getConnection() and getDataSet() as per the documentation where the latter will include a method call to create the data set from the XML file as shown below.
    $dataSet = $this->createMySQLXMLDataSet('/path/to/seed.xml');
  4. At this point, you can execute operations on the database to get it to its expected state following a test, produce an XML dump of the database in that state, and then compare that dump to the actual database contents in a test method to confirm that the two are equal.
    $expected = $this->createMySQLXMLDataSet('/path/to/expected.xml');
    $actual = new PHPUnit_Extensions_Database_DataSet_QueryDataSet($this->getConnection());
    // Specify a SELECT query as the 2nd parameter here to limit the data set, else the entire table is used
    $actual->addTable('tablename');
    $this->assertDataSetsEqual($expected, $actual);

That’s it! Hopefully this proves useful to someone else.

PHPUnit and Xdebug on Ubuntu Karmic

This is just a quick post to advise anyone who may be using PHPUnit and Xdebug together on Ubuntu Karmic. If you try to upgrade to PHPUnit 3.4.6 and you’re using the php5-xdebug Ubuntu package (which is Xdebug 2.0.4), you may get output that looks like this:

$ sudo pear upgrade phpunit/PHPUnit
Did not download optional dependencies: pear/Image_GraphViz, pear/Log, use --alldeps to download automatically
phpunit/PHPUnit can optionally use package "pear/Image_GraphViz" (version >= 1.2.1)
phpunit/PHPUnit can optionally use package "pear/Log"
phpunit/PHPUnit can optionally use PHP extension "pdo_sqlite"
phpunit/PHPUnit requires PHP extension "xdebug" (version >= 2.0.5), installed version is 2.0.4
No valid packages found
upgrade failed

There are two ways to deal with this situation. First off, note that the newer Xdebug 2.0.5 version includes several bugfixes including one related to code coverage reporting. That said, if you still want to continue using the php5-xdebug package anyway, you can force the upgrade by having the PEAR installer ignore dependencies like so:

sudo pear upgrade -n phpunit/PHPUnit

The other method involves installing Xdebug 2.0.5. First, if you have the php5-xdebug package, remove it.

sudo apt-get remove php5-xdebug

Next, use the PECL installer to install Xdebug. This requires that you have the php5-dev package installed so that the extension can be compiled locally.

sudo apt-get install php5-dev
sudo pecl install xdebug

At this point, create the file /etc/php5/conf.d/xdebug.ini if it doesn’t already exist and populate it with these contents:

zend_extension=/usr/lib/php5/20060613/xdebug.so

Then bounce Apache so that the new extension will be loaded.

sudo apache2ctl restart
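To confirm which Xdebug version PHP actually picked up, a quick check along these lines can be run from the command line. The xdebugMeetsRequirement() function name is made up for this sketch; the 2.0.5 threshold comes from the installer output above.

```php
<?php
// Hypothetical helper: true when $installed is a version string at or above $required.
function xdebugMeetsRequirement($installed, $required = '2.0.5')
{
    return $installed !== false && version_compare($installed, $required, '>=');
}

// phpversion() returns false when the named extension is not loaded.
$installed = phpversion('xdebug');

if ($installed === false) {
    echo "Xdebug is not loaded\n";
} elseif (xdebugMeetsRequirement($installed)) {
    echo "Xdebug $installed satisfies the PHPUnit requirement\n";
} else {
    echo "Xdebug $installed is older than 2.0.5\n";
}
```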

That’s it. Hope someone finds this helpful.

I’m a Honey Pot

Side note: Yes, the title of this post is a throwback to the 418 “I’m a teapot” status code, an April Fools’ joke extension to HTTP. My sense of humor is just odd that way.

I thought I’d kick things off on my new blog with a quick post on something I did while getting it set up.

Before switching to this new blog, I’d moved to using the spamhoneypot plugin on my old Habari blog to capture spam. I had a great amount of success in that switch, but in deciding to move to using WordPress on this new blog, I noticed that it had no equivalent plugins. There were several anti-spam plugins, but they all required use of a third-party service. I hadn’t seen consistent success with plugins that used this approach in the past, so I wanted to avoid repeating those experiences.

So, I decided to try my hand at writing a WordPress plugin. After wading through the filter and action documentation and googling around for a bit, I came up with a fairly simple plugin that seems to do the job.

The plugin works by adding a textarea field to the comment form that’s hidden using a CSS style. Since bots don’t generally detect CSS like this, they proceed to fill out the field like any other field. A filled-in field implies that the submitter isn’t a human being using a browser, in which case the plugin marks the comment as spam. I’ve found this catches the vast majority of spam comments with very few false positives.
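Stripped of anything WordPress-specific, the core of the technique can be sketched as below. This is not the plugin’s code: the field name and both function names are hypothetical, and hooking the check into the comment workflow is left out.

```php
<?php
// Hypothetical name for the hidden field; any name a bot would fill in works.
const HONEYPOT_FIELD = 'more_comment';

// Markup to add to the comment form; the inline style hides it from people.
function renderHoneypotField(): string
{
    return '<textarea name="' . HONEYPOT_FIELD . '" style="display:none"></textarea>';
}

// A submission that filled in the hidden field is treated as spam.
function looksLikeSpam(array $post): bool
{
    return isset($post[HONEYPOT_FIELD]) && trim($post[HONEYPOT_FIELD]) !== '';
}

// A browser user never sees the field, so it arrives empty.
var_dump(looksLikeSpam(array('comment' => 'Nice post!', HONEYPOT_FIELD => '')));     // bool(false)

// A bot fills in every field it finds.
var_dump(looksLikeSpam(array('comment' => 'Buy now', HONEYPOT_FIELD => 'Buy now'))); // bool(true)
```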

I’ve submitted to have the plugin hosted on the WordPress site, but until then, you can grab a copy off of a Github repository I’ve set up for it. Hope you find it useful!

Update 1/2/10 8:41 AM CST: The plugin is now available for download from the WordPress site.