Posts tagged ‘PHP’

New SPL Features in PHP 5.3

Note: I’ve written on this topic before, but thought the subject warranted a more detailed discussion and a more comprehensive, up-to-date set of benchmarks. Hence, this post and this presentation. Enjoy.

Update: This post and the benchmarks have been updated for PHP 5.3.4 and Ubuntu 10.10. (12/14/2010)

The SPL, or Standard PHP Library, is an often overlooked extension in the PHP core. It first came on the scene in PHP 5 and a variety of iterators constituted the majority of its initial offerings. Though the iterator offerings were expanded in PHP 5.3, the particularly interesting additions to the SPL were several specialized data structure classes, the foundational concepts for which originate in the field of computer science. In this post, I will provide an overview of these new classes and explain why and when they should be used.

Arrays

While PHP has several data types, the ones that likely see the most frequent and varied use are strings and arrays. They are the proverbial duct tape and WD-40 of PHP, respectively. Like arrays, SPL data structure classes are used to store composite (i.e. non-scalar) data.

Now, that’s not to say that every instance of an array in existing codebases should be replaced with an SPL container object. There are cases where it’s appropriate to use one over the other. Knowing the difference requires an understanding of how arrays work.

Within the C code that makes up the PHP interpreter, arrays are implemented as a data structure called a hash table or hash map. When a value contained within an array is referenced by its index, PHP uses a hashing function to convert that index into a hash that identifies the location of the corresponding value within the array.

This hash map implementation enables arrays to store an arbitrary number of elements and provide access to all of those elements simultaneously using either numeric or string keys. Arrays are extremely fast for the capabilities they provide and are an excellent general purpose data structure.

Fixed Arrays

In contrast to arrays, SplFixedArray functions more like C arrays or Java arrays than PHP arrays. The maximum number of elements that it may contain is specified upon instantiation. While it is possible to change it later via the setSize() method, this negates the performance advantages of using it: because its size is fixed, it doesn’t need to use a hashing function to resolve the position of elements within the array. It makes sense to use fixed arrays when the number of elements to be stored is known in advance and the elements only need to be accessed by sequential position.

SplFixedArray implements the Iterator, ArrayAccess, and Countable interfaces. Iterator allows it to be iterated using a foreach loop. ArrayAccess provides access to its elements using array syntax where elements are referred to using integer positions beginning at 0 as with enumerated arrays. Countable enables an instance to be passed to the count() function like an array.

Aside from the inability to use it in place of arrays with array functions, instances of SplFixedArray function just like arrays for all intents and purposes. It’s even possible to convert them to and from arrays using the toArray() and fromArray() methods respectively. However, it generally makes more sense to use SplFixedArray exclusively for each individual use case.
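To make that concrete, here is a minimal sketch of the three interfaces in use. The variable names are just for illustration, and the out-of-range check at the end reflects the behavior I'd expect from current PHP builds rather than anything specific to 5.3.4.

```php
<?php
// The size is declared up front; positions are integers starting at 0.
$squares = new SplFixedArray(5);
foreach (range(0, 4) as $i) {
    $squares[$i] = $i * $i;          // ArrayAccess
}

$total = 0;
foreach ($squares as $value) {       // iteration with foreach
    $total += $value;
}

$size = count($squares);             // Countable
$asArray = $squares->toArray();      // copy into a plain PHP array

// Writing past the declared size throws rather than growing the structure.
try {
    $squares[5] = 25;
    $outOfRange = false;
} catch (RuntimeException $e) {
    $outOfRange = true;
}
```

If the size really does need to change, setSize() is there, but as noted above, leaning on it gives away the reason for choosing a fixed array in the first place.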

Lists

In computer science, a list is defined as an ordered collection of values. A linked list is a data structure in which each element in the list includes a reference to one or both of the elements on either side of it within the list. The term “doubly-linked list” is used to refer to the latter case. In the SPL, this takes the form of the class SplDoublyLinkedList.

Like SplFixedArray, SplDoublyLinkedList also implements the Iterator, ArrayAccess, and Countable interfaces. In addition to the methods that come with these interface implementations, elements can be added to or removed from the start or end of the list using its push(), pop(), shift() and unshift() methods, which correspond to the array_push(), array_pop(), array_shift(), and array_unshift() functions respectively. Unfortunately, as of PHP 5.3.4, there’s no way to insert an element anywhere in the list other than at the beginning or the end. A feature request has been filed for this. Add a comment or vote to show support for its addition.

The elements at the start and end of the list are accessible via its bottom() and top() methods respectively, which correspond to the reset() and end() functions. Like SplFixedArray, elements can also be accessed arbitrarily by positional index using the array syntax granted by ArrayAccess. It makes sense to use lists when the number of elements to be stored is not known in advance and the elements only need to be accessed by sequential position.
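A short sketch of those operations follows. Keep in mind that bottom() peeks at the beginning of the list and top() at the end; the values used here are arbitrary.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('b');                     // append to the end
$list->push('c');
$list->unshift('a');                  // prepend to the beginning

$first  = $list->bottom();            // peek at the beginning
$last   = $list->top();               // peek at the end
$middle = $list[1];                   // positional access via ArrayAccess

$removedFromEnd   = $list->pop();     // remove from the end
$removedFromStart = $list->shift();   // remove from the beginning
$remaining        = count($list);
```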

Stacks

Stacks are similar to lists with two major differences. First, elements can only be added to the top of the stack. Second, an element can only be accessed by taking it off the top of the stack. Because of these differences, the stack is often referred to as a Last-In-First-Out or LIFO data structure. SplStack is the SPL stack implementation.

SplStack is a bit removed from the traditional definition of a stack. It extends SplDoublyLinkedList and inherits its abilities, some of which don’t really apply to stacks. In order to enforce its restriction on how elements are accessed, SplStack overrides the setIteratorMode() method of its parent class and implements its own to prevent modification of the iteration direction. Both methods allow elements to be retained or removed as they are iterated.

Use of stacks makes sense when the number of elements to be stored is not known in advance and the only element that must be accessible is the last one stored. However, as of PHP 5.3.4, the performance of SplStack leaves something to be desired. Benchmarks included later in this post provide an objective illustration of this, though the cause of the behavior remains unknown.
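As a quick illustration of the LIFO behavior, here is a small sketch using the inherited push(), pop(), and top() methods. Iterating with foreach walks the stack from the most recently added element down, leaving the elements in place under the default iterator mode.

```php
<?php
$stack = new SplStack();
$stack->push('first');
$stack->push('second');
$stack->push('third');

$peeked = $stack->top();              // last element pushed, left in place

$order = array();
foreach ($stack as $item) {           // iterates last-in, first-out
    $order[] = $item;
}

$popped    = $stack->pop();           // removes the last element pushed
$remaining = count($stack);
```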

Queues

Queues are also similar to lists, again with two major differences. First, elements can only be added (or “enqueued”) to the end of the queue. Second, an element can only be accessed by removing (or “dequeueing”) it from the beginning of the queue. For these reasons the queue is referred to as a First-In-First-Out or FIFO data structure. The SplQueue class implements this data structure in the SPL.

SplQueue follows suit with SplStack in extending SplDoublyLinkedList. Just as SplStack resultingly inherits some operations with at least questionable applicability, so too does SplQueue. Likewise, it overrides setIteratorMode() with its own version to restrict how elements are accessed. Use of queues makes sense when the number of elements to be stored is not known in advance and the only element that must be accessible is the remaining element that was stored earliest.

One minor difference between SplQueue and SplStack is that the former contains two method aliases named after conceptual queue operations: dequeue() aliases SplDoublyLinkedList::shift() and enqueue() aliases SplDoublyLinkedList::push(). This makes sense because while push() and pop() share similar applicability to conceptual stack operations, they are already present in its parent class.

Despite their common ancestry, SplQueue appears to have better performance than SplStack as of PHP 5.3.4. Benchmarks included later in this post review this in more detail.
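The queue equivalent is just as brief. This sketch uses the enqueue() and dequeue() aliases mentioned above; the job names are made up.

```php
<?php
$queue = new SplQueue();
$queue->enqueue('job-1');             // alias of push()
$queue->enqueue('job-2');
$queue->enqueue('job-3');

$processed = array();
while (!$queue->isEmpty()) {
    $processed[] = $queue->dequeue(); // alias of shift(): earliest element first
}
```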

Heaps

Up to this point, the data structures discussed have resembled lists insofar as they contain elements in the order in which they were added. By contrast, when an element is added to a heap, a comparison function is used to compare the new element to other elements already in the heap and the element is placed appropriately within the heap based on that function’s return value. The beauty of heaps is that their underlying algorithm does this with minimal element comparisons, so it’s extremely efficient. Using heaps makes sense when the number of elements to be stored is not known in advance and elements must be accessed in an order based on how they compare to each other.

SplHeap is an abstract class used to create a heap by extending it and providing a comparison function in the form of its compare() method. Only the root element of a heap, the one yielding the highest comparison function return value, may be accessed or removed from the heap at any given time. The top() method returns the root without removing it, while the extract() method removes and returns it. SplHeap implements the Iterator and Countable interfaces but, because only the root element can be extracted, it does not implement the ArrayAccess interface like the previously discussed data structure classes.

In addition to the abstract SplHeap class, two concrete implementations are also included in the SPL, namely SplMinHeap and SplMaxHeap. The compare() method of SplMinHeap returns a value such that the smallest element in the heap is the root element. Likewise, the compare() method of SplMaxHeap returns a value such that the largest element in the heap is the root element.

At first glance, using a subclass of SplHeap may seem equivalent to calling sort() or a similar function on an array and accessing the elements in sequence. This is indeed the case if all elements are added to the array prior to it being sorted. However, situations such as elements arriving over time or inadequate memory to store all elements simultaneously may preclude this approach. Use of arrays in such situations would require repeated resorting of the entire array as new elements are added, which is inefficient. This is why using the corresponding heap class makes a lot more sense in that situation than repeated calls to sort(), min() or max(). Additionally, SplHeap can be used to implement the heapsort algorithm, which has better worst case performance than the quicksort algorithm implementation used by arrays.
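Here is a rough sketch of both approaches: extending SplHeap with a custom compare() and using the bundled SplMinHeap. ShortestStringHeap is a hypothetical class invented for this example, and the int return type on compare() assumes PHP 7 or later (drop it for 5.3).

```php
<?php
// Hypothetical subclass: the shortest string should sit at the root.
class ShortestStringHeap extends SplHeap
{
    // The element that compare() ranks highest becomes the root, so the
    // length comparison is inverted to surface the shortest string first.
    protected function compare($a, $b): int
    {
        return strlen($b) - strlen($a);
    }
}

$heap = new ShortestStringHeap();
foreach (array('kiwi', 'fig', 'banana', 'apple') as $word) {
    $heap->insert($word);             // elements may arrive in any order
}

$root = $heap->top();                 // peek at the root without removing it

$extracted = array();
while (!$heap->isEmpty()) {
    $extracted[] = $heap->extract();  // remove and return the current root
}

// The bundled SplMinHeap needs no subclassing for plain numeric ordering.
$min = new SplMinHeap();
foreach (array(5, 1, 3) as $n) {
    $min->insert($n);
}
$smallest = $min->extract();
```

Draining a heap with extract() as in the while loop is essentially heapsort, which is the connection to sorting mentioned above.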

Priority Queues

Priority queues are somewhat similar to heaps. In fact, while it doesn’t extend SplHeap, SplPriorityQueue does make use of a heap structure internally to implement its functionality. The difference is that the insert() method of SplPriorityQueue accepts both a value and an associated priority, removing the need to use an array or object to store both of these and define an appropriate comparison function in an SplHeap instance. Elements with the highest priority, like those in SplMaxHeap with the highest value, are the ones that come out first when extract() is called. Note that elements with equal priority are returned in no particular order.

For reasons similar to those of SplHeap, SplPriorityQueue implements both Iterator and Countable interfaces and does not implement the ArrayAccess interface. Because it stores a value and priority per element, SplPriorityQueue includes a setExtractFlags() method that modifies the behavior of extract() to return the stored value, the stored priority, or an array containing both. Priorities are not bound to a particular data type: strings, integers, or even composite data types can be used. SplPriorityQueue can be extended and its compare() method overridden to customize the comparison logic.

It makes sense to use a priority queue when the number of elements to be stored is not known in advance and elements must be accessed in an order based on how a value associated with each element (versus the element value itself) compares to the same associated values of other elements.
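A brief sketch of insert(), extract(), and setExtractFlags() follows. The task names and priorities are invented, and the priorities are kept distinct on purpose since ordering among equal priorities isn't guaranteed.

```php
<?php
$pq = new SplPriorityQueue();
$pq->insert('write docs', 1);         // insert(value, priority)
$pq->insert('fix outage', 10);
$pq->insert('review patch', 5);

$next = $pq->extract();               // highest priority comes out first

// Ask extract() to return both the value and its priority from now on.
$pq->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
$pair = $pq->extract();               // array with 'data' and 'priority' keys

$left = count($pq);
```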

Sets and Composite Hash Maps

SplObjectStorage combines some of the properties of two different data structures. First, it provides the same functionality of a hash table that a normal array has, but without its associated inability to use objects as keys unless the spl_object_hash() function is used. In other words, it implements a composite hash map. Second, it can be used as a set to store objects as data without a meaningful corresponding key or concept of sequential order.

Its attach() method accepts an object key and the data to associate with it and its detach() method allows data to be removed using its associated object key. To use the object as a set, simply exclude the $data parameter for attach() as it’s optional. The set operations implemented by SplObjectStorage all have array function counterparts. For example, the addAll() method and array_merge() function both correspond to the union set operation. The difference operation is available using the removeAll() method and array_diff() function and its variants. The contains() method and in_array() function both implement the element_of operation. Sadly, only arrays have an implementation of the intersection operation in the form of array_intersect() and its variants. Tobias Schlitt has a more in-depth analysis of this data structure that includes implementations of the set operations lacking in the SPL itself.

Update: A patch I’ve submitted has been merged. SplObjectStorage::removeAllExcept(), which is equivalent to the set intersection operation, will become available in PHP 5.3.5. (1/5/2011)

Like some of the other data structures in the SPL, SplObjectStorage implements the Iterator, Countable, and ArrayAccess interfaces. Oddly, it also implements the Traversable interface (which is limited to internally defined classes and negates the need for implementation of the Iterator interface) and the Serializable interface (and it is the only SPL data structure class to do so).

Using this class makes sense when data must be stored using composite keys or the ability to access data using set operations is more important than accessing data in a specific order.
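The two roles can be sketched as follows. The objects are plain stdClass instances standing in for whatever you would actually store, and the team and role names are made up.

```php
<?php
$alice = new stdClass();
$bob   = new stdClass();
$carol = new stdClass();

// As a map: objects are the keys, arbitrary data is attached to each.
$roles = new SplObjectStorage();
$roles->attach($alice, 'admin');
$roles[$bob] = 'editor';              // ArrayAccess equivalent of attach()
$aliceRole = $roles[$alice];

// As a set: attach objects without any associated data.
$teamA = new SplObjectStorage();
$teamA->attach($alice);
$teamA->attach($bob);

$teamB = new SplObjectStorage();
$teamB->attach($bob);
$teamB->attach($carol);

// Union: add everything from both sets; duplicates collapse.
$union = new SplObjectStorage();
$union->addAll($teamA);
$union->addAll($teamB);

// Difference: start from team A, then remove whatever is also in team B.
$difference = new SplObjectStorage();
$difference->addAll($teamA);
$difference->removeAll($teamB);

$onlyAliceLeft = $difference->contains($alice) && count($difference) === 1;
```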

Benchmarks

Standard disclaimer: There are lies, damned lies, and benchmarks. YMMV.

Platform

Process

Code used is located in this GitHub repository.

  1. Modify constant declarations at the top of runner.php as appropriate (50 executions per test were used to get the results below), then execute it from the command line. It will in turn execute each of the scripts in the tests directory, measuring execution time and memory usage. Results will be recorded in results/raw.csv.
  2. To generate graphs, run graphs.php. This uses the Graph component from the ezComponents library. Resulting images will be written to the results directory in PNG format.

Results

[Graphs: Executions Per Second and Memory for each of SplFixedArray, SplDoublyLinkedList, SplStack, SplQueue, SplMinHeap, SplPriorityQueue, and SplObjectStorage, each with links to the array and SPL test code.]

Other Data Structures

If you have an interest in other data structure implementations for PHP outside of SPL offerings, check out the bloomy PECL extension, which is an implementation of a bloom filter created by Andrei Zmievski.

Renaming a DOMNode in PHP

A recent work assignment had me using PHP to pull HTML data into a DOMDocument instance and renaming some elements, such as b to strong or i to em. As it turns out, renaming elements using the DOM extension is rather tedious.

Version 3 of the DOM standard introduces a renameNode() method, but the PHP DOM extension doesn’t currently support it.

The $nodeName property of the DOMNode class is read-only, so it can’t be changed that way.

A node can be created with a different name in the same document, but if you specify a value to go along with it, any entities in that value are automatically encoded, so it’s not possible to pass in the intended inner content of a node if it contains other nodes.

The only method I’ve found that works is to replicate the attributes and child nodes of the original node. Attributes are fairly easy, but I ran into an issue replicating children where only the first child of any given node was replicated within its intended replacement and the remaining children were omitted. Here’s the original code that was exhibiting this behavior.

foreach ($oldNode->childNodes as $childNode) {
    $newNode->appendChild($childNode);
}

The reason for this behavior is that the $childNodes property of $oldNode is implicitly modified when $childNode is transferred from it to $newNode, so the internal pointer of $childNodes to the next child in the list is no longer accurate.

To get around this, I took advantage of the fact that any node with any child nodes will always have a $firstChild property pointing to the first one. The modified code that takes this approach is below and has the behavior I originally set out to implement.

while ($oldNode->firstChild) {
    $newNode->appendChild($oldNode->firstChild);
}

If you’re curious, below is the full code segment for renaming a node.

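// Create the replacement element in the same document as the original
$newNode = $oldNode->ownerDocument->createElement('new_element_name');
// Copy attributes from the original element
if ($oldNode->attributes->length) {
    foreach ($oldNode->attributes as $attribute) {
        $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }
}
// Move children one at a time; firstChild is re-read after each move
while ($oldNode->firstChild) {
    $newNode->appendChild($oldNode->firstChild);
}
// replaceChild() must be called on the direct parent of the node being replaced
$oldNode->parentNode->replaceChild($newNode, $oldNode);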

Another potential “gotcha” is the argument order of the replaceChild() method, which is the new node followed by the old node rather than the reverse that most people might expect. Thanks to Joshua May for pointing that one out to me; I might never have understood why I was getting a “Not Found Error” DOMException otherwise.
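For anyone who wants to try this end to end, here is a sketch that wraps the steps in a function and runs it against a small document. renameElement() is a hypothetical helper name, not part of the DOM extension. It calls replaceChild() on the node's direct parent, which is what lets it work for elements that aren't the document's root.

```php
<?php
// Hypothetical helper: replace $oldNode with an element named $newName.
function renameElement(DOMElement $oldNode, $newName)
{
    $newNode = $oldNode->ownerDocument->createElement($newName);
    foreach ($oldNode->attributes as $attribute) {
        $newNode->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }
    // Re-read firstChild each time, since moving a child mutates the list.
    while ($oldNode->firstChild) {
        $newNode->appendChild($oldNode->firstChild);
    }
    // New node first, old node second, called on the old node's parent.
    $oldNode->parentNode->replaceChild($newNode, $oldNode);
    return $newNode;
}

$doc = new DOMDocument();
$doc->loadXML('<p>Some <b class="x">bold <i>and italic</i></b> text</p>');

// Copy the live node list into an array before modifying the document.
$targets = iterator_to_array($doc->getElementsByTagName('b'));
foreach ($targets as $element) {
    renameElement($element, 'strong');
}

$result = $doc->saveXML($doc->documentElement);
```

Copying the node list into an array first sidesteps the same live-list problem described earlier, this time for the list of elements being renamed.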

Splitting PHP Class Files

A recent work project required me to write a PHP script to interact with a remote SOAP service. Part of the service provider’s recommended practices entailed using a slightly dated software package called wsdl2php, which generates a single PHP file containing classes corresponding to all user-defined types from a specified WSDL file.

The issue I ran into was due to all the generated PHP classes being housed in a single file. I had to process two WSDL files that had several identical user-defined types in common. As a result, I couldn’t simply include the two PHP files generated from them because PHP doesn’t allow you to define two classes with the same name.

Looking at its source code, modifying wsdl2php to change this behavior was not a very appealing option. Attempting to consolidate the two WSDL files into one with no redundant user-defined type declarations seemed futile as well. Instead, I resolved to split the generated PHP files such that each class was contained in its own file. This would also allow me to use an autoloader to determine which of the classes I actually needed for the particular service call I was making.

Due to the number of classes, splitting the classes into separate files by hand would have been tedious and time-consuming. I decided to tap into my previous experience with the tokenizer extension to throw together a CLI script that would handle this for me. Once I got it working, it clocked in at just over 50 LOC with comments and whitespace. You simply call it from a shell and pass it the PHP file you want to split and the destination for the split class files.
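To give a flavor of the tokenizer approach, here is a minimal sketch that only finds the names of classes declared in a chunk of PHP source. It is not the script from the repository; findDeclaredClasses() is a hypothetical helper, and a real splitter would also need to track braces to know where each class body ends.

```php
<?php
// Hypothetical helper: return the names of classes declared in $source.
function findDeclaredClasses($source)
{
    $names  = array();
    $tokens = token_get_all($source);
    $count  = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        // Multi-character tokens are arrays of (id, text, line).
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }
        // A declaration has the class name as the next non-whitespace token.
        for ($j = $i + 1; $j < $count; $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $names[] = $tokens[$j][1];
                break;
            }
            if (!is_array($tokens[$j]) || $tokens[$j][0] !== T_WHITESPACE) {
                break; // something other than a name follows; not a declaration
            }
        }
    }

    return $names;
}

$source  = "<?php\nclass Foo {}\n\nclass Bar extends Foo {}\n";
$classes = findDeclaredClasses($source);
```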

I thought it might be useful for others needing to process similarly formatted source code, so I threw it into a GitHub repository for anyone who might like to take a look. I’m open to suggestions for improvements to implement if enough people find it useful. Feel free to file an issue on the repository if you happen to find a bug.

Speaking at a Conference

I can’t make any claim to the title of veteran conference speaker. Not yet, at least. However, I have done it once before at ZendCon in 2008 and I’ll be doing it again at php|tek this year. I thought I’d take a blog post to give out a few tips to any prospective first-time speakers based on my first speaking experience. I’m assuming here that you’ve already decided on a particular conference that you want to attend, you’ve submitted a session proposal, and you’ve been accepted.

First, in addition to the other things you should do before attending, be ready to give your presentation before you get on the plane. You should start on your slides as far in advance as possible. Don’t put it off or wait until the last minute, because it will likely be more work than you anticipate. This includes making sure that any live demos you intend to give will run as expected. Syntax errors and crashing web servers look very bad to the audience.

One of the reasons for this is that you’ll want to practice your talk out loud. It’s one thing to put the material onto slides, but it may sound different when it’s actually coming out of your mouth and going into the crowd. You may find stumbling points, places where you stutter or get caught off-guard when transitioning from one topic to another. Try to organize the presentation such that it matches your natural flow when talking about the topic without any slides at all.

Which reminds me, learn from the masters. People like Marco Tabini have spoken before and have a wealth of knowledge that they’ll share fairly freely most of the time, especially if alcohol (or, in Marco’s case, an espresso) is involved. Look at books like Presentation Zen by Garr Reynolds. Take the time to hone your presentation skills before you have to make your delivery.

If you’ve been to a conference before, you’ve probably already learned about my next point the hard way. Don’t depend on wifi internet access availability. Why not? Because the vast majority of the time, it will suck. There won’t be enough IP addresses, someone will do something to hog bandwidth and make latency skyrocket, or it will find some other way to refuse to work. Save local copies of files, write a minimal daemon to simulate a remote server, do whatever you need to do to avoid it.

That point goes hand in hand with this one: test your equipment early and have a Plan B. In particular, hook your laptop up to the projector in the room in which you’ll be speaking (or to a test projector, if the conference hosts provide one and prefer you use that) to make sure it can display your slides. Ben Ramsey was gracious enough to loan me his Macbook at ZendCon because my Sony Vaio refused to work with the projector and the time-sensitive situation did nothing but add to my speaking nerves. Make sure you don’t end up in the same spot.

Lastly, don’t let critical reception deter you from speaking again. I got pretty negative feedback the first time around, but I took it in stride. While I know I have plenty of room for improvement, I’m still going to give it another shot. Do your very best, then strive to be better.

Hope you enjoyed this blog post and gleaned something useful from it. If you’ve got any of your own speaking tips, please feel free to add a comment on this post. If you’ll be attending php|tek, I look forward to seeing you there!

Speaking at tek-X

As the recently released schedule shows, I will be speaking at the php|tek 2010 conference. The session I’ll be presenting is entitled “New SPL Features in PHP 5.3” and it will be an extended version of the webcast I presented as part of the CodeWorks webcast series.

While there, I also plan on participating in the Hack Track and may try to recruit a few new contributors (like you!) for the Phergie project. I am very much looking forward to the event and hope to see you there!

Database Testing with PHPUnit and MySQL

Update 2012/01/15: I finally got around to submitting a patch to document this feature in the PHPUnit manual. Sebastian has merged it, so it will hopefully be available in the online manual soon.

Update #2 2012/01/23: I got around to checking the online version of the manual and the current build includes my patch. Enjoy.

I recently made a contribution to the PHPUnit project that I thought I’d take a blog post to discuss. One of the extensions bundled with PHPUnit adds support for database testing. This extension was contributed by Mike Lively and is a port of the DbUnit extension for the JUnit Java unit testing framework. If you’re interested in learning more about database unit testing, check out this presentation by Sebastian Bergmann on the subject.

One of the major components of both extensions is the data set. Database unit tests involve loading a seed data set into a database, executing code that performs an operation on that data set such as deleting a record, and then checking the state of the data set to confirm that the operation had the desired effect. DbUnit supports multiple formats for seed data sets. The PHPUnit Database extension supports DbUnit’s XML and flat XML formats as well as CSV.

If you’re using MySQL as your database, CSV has been the only format supported by both the mysqldump utility and the PHPUnit Database extension up to this point. My contribution adds support for its XML format to the extension. While this support was developed to work in the PHPUnit 3.4.x branch, it won’t be available in a stable release until 3.5.0. In the meantime, this is how you can use it now.

  1. Go to the commit on GitHub and apply the additions and modifications included in it to your PHPUnit installation.
  2. From a shell, get your XML seed data set and store it in a location accessible to your unit test cases.
    mysqldump --xml -t -u username -p database > seed.xml
  3. Create a test case class that extends PHPUnit_Extensions_Database_TestCase. Implement getConnection() and getDataSet() as per the documentation where the latter will include a method call to create the data set from the XML file as shown below.
    $dataSet = $this->createMySQLXMLDataSet('/path/to/seed.xml');
  4. At this point, you can execute operations on the database to get it to its expected state following a test, produce an XML dump of the database in that state, and then compare that dump to the actual database contents in a test method to confirm that the two are equal.
    $expected = $this->createMySQLXMLDataSet('/path/to/expected.xml');
    $actual = new PHPUnit_Extensions_Database_DataSet_QueryDataSet($this->getConnection());
    // Specify a SELECT query as the 2nd parameter here to limit the data set, else the entire table is used
    $actual->addTable('tablename');
    $this->assertDataSetsEqual($expected, $actual);

That’s it! Hopefully this proves useful to someone else.

PHPUnit and Xdebug on Ubuntu Karmic

This is just a quick post to advise anyone who may be using PHPUnit and Xdebug together on Ubuntu Karmic. If you try to upgrade to PHPUnit 3.4.6 and you’re using the php5-xdebug Ubuntu package (which is Xdebug 2.0.4), you may get output that looks like this:

$ sudo pear upgrade phpunit/PHPUnit
Did not download optional dependencies: pear/Image_GraphViz, pear/Log, use --alldeps to download automatically
phpunit/PHPUnit can optionally use package "pear/Image_GraphViz" (version >= 1.2.1)
phpunit/PHPUnit can optionally use package "pear/Log"
phpunit/PHPUnit can optionally use PHP extension "pdo_sqlite"
phpunit/PHPUnit requires PHP extension "xdebug" (version >= 2.0.5), installed version is 2.0.4
No valid packages found
upgrade failed

There are two ways to deal with this situation. First off, note that the newer Xdebug 2.0.5 version includes several bugfixes including one related to code coverage reporting. That said, if you still want to continue using the php5-xdebug package anyway, you can force the upgrade by having the PEAR installer ignore dependencies like so:

sudo pear upgrade -n phpunit/PHPUnit

The other method involves installing Xdebug 2.0.5. First, if you have the php5-xdebug package, remove it.

sudo apt-get remove php5-xdebug

Next, use the PECL installer to install Xdebug. This requires that you have the php5-dev package installed so that the extension can be compiled locally.

sudo apt-get install php5-dev
sudo pecl install xdebug

At this point, create the file /etc/php5/conf.d/xdebug.ini if it doesn’t already exist and populate it with these contents:

zend_extension=/usr/lib/php5/20060613/xdebug.so

Then bounce Apache so that the new extension will be loaded.

sudo apache2ctl restart

That’s it. Hope someone finds this helpful.

I’m a Honey Pot

Side note: Yes, the title of this post is a throwback to the 418 status code in the HTTP protocol. My sense of humor is just odd that way.

I thought I’d kick things off on my new blog with a quick post on something I did while getting it set up.

Before switching to this new blog, I’d moved to using the spamhoneypot plugin on my old Habari blog to capture spam. I had a great amount of success in that switch, but in deciding to move to using WordPress on this new blog, I noticed that it had no equivalent plugins. There were several anti-spam plugins, but they all required use of a third-party service. I hadn’t seen consistent success with plugins that used this approach in the past, so I wanted to avoid repeating those experiences.

So, I decided to try my hand at writing a WordPress plugin. After wading through the filter and action documentation and googling around for a bit, I came up with a fairly simple plugin that seems to do the job.

The plugin works by adding a textarea field to the comment form that’s hidden using a CSS style. Since bots don’t generally detect CSS like this, they proceed to fill out the field like any other field. A filled-in field implies that the submitter isn’t a human being using a browser, in which case the plugin marks the comment as spam. I’ve found this catches the vast majority of spam comments with very few false positives.

I’ve submitted to have the plugin hosted on the WordPress site, but until then, you can grab a copy off of a GitHub repository I’ve set up for it. Hope you find it useful!

Update 1/2/10 8:41 AM CST: The plugin is now available for download from the WordPress site.

The Configuration Pattern in Zend Framework

Several components in Zend Framework, such as Zend_Form and Zend_Layout, support what I’ve come to call the Configuration Pattern. Though it doesn’t appear to exist in any officially sanctioned capacity like traditional design patterns, and it may qualify more as a convention than a design pattern, it seems to me that it’s something worth knowing about.

Here’s how it works. Have a look at the constructor for Zend_Form. It accepts an $options parameter, which can be an associative array or Zend_Config instance. If it’s an array, setOptions() is called. If it’s a Zend_Config instance, setConfig() is called, which then converts the Zend_Config instance to an associative array and passes that to setOptions(). So, either way, you end up in the same method with the same type of data.

setOptions() then iterates over the associative array it receives. It takes the index of each element and looks for a corresponding setter method. For example, if the array contains an element with the index ‘viewScriptPath’, setOptions() would check the class definition for a method setViewScriptPath() using method_exists(). Since that method does exist in Zend_Form, it would be called and passed the value from the associative array corresponding to that index.

This alleviates the need to explicitly call a setter method for every value you want to set. While that approach is slightly more efficient — not using setOptions() means one less function call and no method_exists() calls, which are slightly expensive — it can make code using the class in question look overly verbose or cluttered.

Alternatively, the setOptions() method could be called after the class was instantiated. With respect to that approach, passing an array or Zend_Config instance to the constructor only saves you one line of code on the calling end.
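Stripped of everything specific to Zend Framework, the convention might look like the sketch below. Widget and its setters are hypothetical names made up for illustration, not Zend Framework code, and the guard against an 'options' key is my own precaution to avoid setOptions() calling itself.

```php
<?php
// Hypothetical class illustrating the convention; not Zend Framework code.
class Widget
{
    protected $title = '';
    protected $width = 0;

    public function __construct(array $options = array())
    {
        $this->setOptions($options);
    }

    public function setOptions(array $options)
    {
        foreach ($options as $key => $value) {
            if (strtolower($key) === 'options') {
                continue; // avoid recursing into setOptions() itself
            }
            // 'title' maps to setTitle(), 'width' to setWidth(), and so on.
            $method = 'set' . ucfirst($key);
            if (method_exists($this, $method)) {
                $this->$method($value);
            }
        }
        return $this;
    }

    public function setTitle($title)
    {
        $this->title = (string) $title;
        return $this;
    }

    public function setWidth($width)
    {
        $this->width = (int) $width;
        return $this;
    }

    public function getTitle()
    {
        return $this->title;
    }

    public function getWidth()
    {
        return $this->width;
    }
}

// Keys without a matching setter are silently ignored.
$widget = new Widget(array('title' => 'Sidebar', 'width' => '300', 'unknown' => true));
```

The calling code stays compact, and each value still passes through its setter, so any validation or casting done there applies no matter how the value arrived.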

To get a list of classes that use this pattern, you can issue the bash command shown below from the library directory of your Zend Framework installation.

grep -rl '\$this->setOptions' . | sort

Breadth-First Thinking

A surprisingly frequent occurrence in my day-to-day life goes something like this: I’ll get into IM or IRC conversations with friends when one technical topic or another will come up. Sometimes the conversation just branches from one tangent to another until that happens, other times the friend will ping me to ask a particular question on the topic. Some friends have even come to know this as a notable quality of mine.

The phrase that I’ve used to describe this quality in my head is “breadth-first thinking.” I thought I’d take a blog post to describe it in a bit more depth. You can find some of this information in the 2007 PHP Advent Calendar entry that Ben Ramsey did, but I’ll reiterate some of it here to bring it into context with my personal methods.

Social Bookmarking

Get an account on a social bookmarking service. I personally like Delicious as its Firefox addon makes bookmarking and tagging (which is extremely important for making things easy to find) a Ctrl+D and Alt+S away in Firefox. You’re only as likely to use this service as it is easy to use and this is going to comprise a significant part of your personal database.

Feed Reader

Find a feed reader you like. I use Google Reader myself as it’s relatively frills-free and allows me to use all the functionality I need from the keyboard. Given only a few minutes, it’s easy to make a pass and mark off items that don’t interest me.

Everything Bucket

While Alex Payne may be against them, I think everything buckets are still potentially useful tools. Originally I was using Google Notebook, but when that got shut down I had to shop around for an alternative. I had issues with Evernote consistently retaining formatting in information I saved to it. I tried a few others and finally settled on using private posts on Tumblr.

News Sites

Subscribe to relevant news sites for topics that interest you, but in particular aim for sites that host a variety of information. I find PHP Developer, Planet PHP, and Zend Developer Zone to be excellent on both counts because they often put the spotlight on experiences using PHP and software based on it in conjunction with other technologies. Don’t let it stop there, though. Further explore blogs that they syndicate and subscribe to the ones that carry a lot of subject matter you like.

Social Media

Finally, participate in social media. If you follow people who share your interests on IRC, Facebook, or Twitter, links to interesting content are unlikely to be in short supply. If you use a Twitter client like