An Update on “Web Scraping with PHP”

Several people have asked me the same question recently, so I decided to take a blog post to provide an answer. The question is, “When will ‘Web Scraping with PHP’ be available in print?” Answering requires a bit of background to paint a full picture of where things are now.

I turned in the manuscript for the book, which had already gone through technical editing, back in April 2009. That’s right, 14 months ago. After going through proofreading and layout, having a cover designed, getting an ISBN, and so forth, the digital version of the book finally went up for sale about three months ago.

At that point, I asked if print copies would be available at php|tek. The publisher sounded unsure, but told me that they would see what they could do. I was later told that the book had gone through printing and that arrangements had been made for a box to be shipped to the hotel on the Monday prior to the conference starting. It didn’t arrive at any point before or during the conference.

After the conference was over (now about two months ago), I was told that the box had finally arrived at the hotel the following Wednesday. I’m not sure how this was known, because the box was never recovered from the hotel to be shipped elsewhere.

The publisher then resolved to request another box from the printer. This box would have to be shipped to someone to proof the printed copies for quality assurance, then the books would go on sale. I was told about this resolution in early June, about five or six weeks ago. I’ve been in touch with the person responsible for proofing the books; they haven’t been received yet.

I have no direct contact with the printer, so all I’ve been able to do is e-mail the publisher repeatedly (six times and counting) since then asking for an update. Through an instant messenger conversation, I was told that a call was supposed to be placed to the printer early last week to determine what the status of things was. So far as I know, that call was never made, or at least I was never told of the result.

I’m as frustrated with this situation as anyone. I want the book to be available to people who want to read it in the format they prefer. If you are in this audience, please contact the publisher to voice your opinion. You can do so by using this contact form on their web site or by sending them a tweet on Twitter. Thank you all very much for your continued support.

Webscrapers Mailing List

Daniel Stenberg, one of the primary authors of the libcurl library on which the PHP cURL extension is based, was kind enough to comment on and clarify a recent blog post of mine regarding web scraping using PHP and cURL. He later sent me a tweet to invite me to a new mailing list for web scraping enthusiasts just before tweeting a public invitation. In addition to the mailing list itself, the web site also has links to books (including my book) and popular tools related to the subject. I think this is awesome and I encourage anyone with an interest in web scraping, professional or recreational, to join.

Ledger and Building It From Source on Ubuntu 10.04

So I recently started looking around for finance software that would run on Ubuntu and quickly found reasons to dislike suggested options. Then I found Ledger. Wow, did it seem awesome by comparison. So, I added the Ubuntu PPA (see the “Platform binaries” section of this wiki page), installed it, created a data file for my finances, and ran the ledger CLI executable on it.

Then I ran into a problem that appeared to be a bug: in a transaction with multiple postings and only one with a null amount, I was receiving the error, “Error: Only one posting with null amount allowed per transaction.” Checking the Google Group didn’t reveal any other reports of the issue, nor did searching the Bugzilla database.

So, I hopped onto the #ledger IRC channel on Freenode, which is the network I tend to frequent anyway. Within minutes, I was able to have the lead developer on the project confirm that the issue appeared to be a bug and politely request that I file a bug report for it, which I did.

I was also able to consult the README-1ST file for instructions on how to do a custom build from source, which I intended to use to ensure that the bug hadn’t already been fixed in the git repository. The only thing that this file lacked was a list of dependencies, but I was able to locate those through trial and error with the build tool and thought I’d post them here for anyone else looking to build ledger from source on Ubuntu 10.04.

sudo apt-get install libboost-dev libboost-date-time-dev libboost-filesystem-dev libboost-iostreams-dev libboost-regex-dev libgmp3-dev libmpfr-dev texinfo

Once you’ve executed the command above from the shell, you should be able to run the command below from the README-1ST file to create your build. The executable will be created in the root of the source tree and named “ledger.”

./acprep update

To create a debug build, which I did to be able to submit debugging output related to my issue, issue this command following the one above.

./acprep debug make

Update: As it turns out, the issue was not a bug, just a small formatting issue with my data file. However, the lead developer of ledger still plans on looking into making the issue more obvious in ledger’s output.

Update #2: It seems the ledger build tool dependencies command supports Ubuntu, CentOS, and OS X. The way the statement was positioned in the README-1ST file, I assumed that support was limited to OS X. So, rather than going through the lengthy process I did to install dependencies on Ubuntu, you can just do this.

./acprep dependencies

Gotcha on Scraping .NET Applications with PHP and cURL

Obligatory pitch: Many other useful tidbits like this can be yours by purchasing my book, php|architect’s Guide to Web Scraping with PHP.

I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I’d share. In this case, I was using the cURL extension, but the tip isn’t necessarily specific to that. One thing my script did was submit a POST request to simulate a form submission. The code looked something like the sample below.

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://...',
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => array(
        'field1' => 'value1',
        // ...
    ),
    // ...
));

The issue I ran into had to do with a behavior of the CURLOPT_POSTFIELDS setting that’s easy to overlook. This is a segment of its description from the PHP manual page for the curl_setopt() function.

If value is an array, the Content-Type header will be set to multipart/form-data.

If the form being submitted is not set to have an enctype attribute value of multipart/form-data in the form’s markup, .NET returns a 500-level HTTP response with no further information on what causes the error (for security purposes). This presumably happens because it’s expecting one value for the Content-Type request header and getting another.

Setting CURLOPT_HEADER and CURLOPT_VERBOSE to true helped to reveal that this was the issue. The fix is pretty simple: instead of passing the array itself for CURLOPT_POSTFIELDS, pass the result of wrapping it in a call to the http_build_query() function (see its PHP manual page). This converts it to a properly formatted query string, which causes cURL to use the default Content-Type header value of application/x-www-form-urlencoded instead.
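As a minimal sketch of that fix, the snippet below encodes a hypothetical field array before handing it to cURL. The field names, values, and URL are placeholders rather than anything from the original script, and the request itself is deliberately not sent.

```php
<?php
// Hypothetical form fields; the names and values are placeholders.
$fields = array(
    'field1' => 'value1',
    'field2' => 'value 2',
);

// Encode the array as a query string up front. Because cURL now receives
// a string rather than an array, it does not switch to multipart/form-data.
$body = http_build_query($fields);

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://localhost/', // placeholder URL
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $body,
    CURLOPT_RETURNTRANSFER => true,
    // Debugging aids: include response headers in the returned output
    // and log the exchange to STDERR.
    CURLOPT_HEADER => true,
    CURLOPT_VERBOSE => true,
));

// curl_exec($ch) would send the request; it is omitted here because the
// URL is only a placeholder.
echo $body, "\n";
```

Inspecting the encoded body before sending it is a cheap way to confirm that a string, not an array, is what cURL will receive.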

Tools like Firebug can help you to examine requests made by a browser. Together with these settings for cURL, you can modify your script’s requests to match those of your browser as closely as possible, making gotchas like this less likely to trip you up.

Why I Write

Someone I know recently sent me a question that I found interesting.

“I’m… exploring why I continue to pursue the insanity that is writing, and I want to get some views from people who write in other disciplines. Got any insight to share on why you wrote your book?”

I’m unfortunately still seeing delays in the print edition of my book being published. My apologies to those of you who have been asking after it; trust me when I say that I’m doing everything I can at this point to make it happen.

Unpleasantries aside, I decided to take a blog post to answer this question. I’ve actually written on this subject in the past with respect to technical publishing in particular, if you’d like more background on that.

As far as my personal reasons go, they certainly didn’t relate to money. Technical publishing may not be as saturated a market as mainstream fiction, but it’s also not as lucrative for authors.

Its relatively limited audience also eliminates fame as a reason, at least outside of that audience. A book may complement an existing reputation, but it’s more rare to establish a substantial level of notoriety through being published. Respect within that community — colleagues, peers, and prospective employers — is a more feasible goal.

That leads me to my personal main reason for writing my book: credentials. My knowledge and skills were vetted by a publisher respected within the industry for the quality of the books they publish. While it may not be directly monetary, that respect has value and there are few ways for an individual to attain it. Publishing a book is one such method.

Lastly, I felt I had something to share with an audience with whom I had a common interest. The topic of my book may be a bit niche, but prospective readers are all the more likely to be fervent about studying or otherwise pursuing it. Readers of science fiction and technical books have this trait in common. If I ever end up publishing a fiction piece, it will likely be in the former genre.

It’s one thing to publish a blog post or an article in a professional magazine, but a book signifies a higher level of commitment, dedication, and perseverance. If writing is insane, it’s in the same boat with getting married and going to college. I doubt there are enough padded rooms and straitjackets in the world for all its college students, married couples, and writers.

New SPL Features in PHP 5.3

Note: I’ve written on this topic before, but thought the subject warranted further, more detailed discussion and a more comprehensive and up-to-date set of benchmarks. Hence, this post and this presentation. Enjoy.

The SPL, or Standard PHP Library, is an often overlooked extension in the PHP core. It first came on the scene in PHP 5 and a variety of iterators constituted the majority of its initial offerings. Though the iterator offerings were expanded in PHP 5.3, the particularly interesting additions to the SPL were several specialized data structure classes, the foundational concepts for which originate in the field of computer science. In this post, I will provide an overview of these new classes and explain why and when they should be used.

Arrays

While PHP has several data types, the ones that likely see the most frequent and varied use are strings and arrays. They are the proverbial duct tape and WD-40 of PHP, respectively. Like arrays, SPL data structure classes are used to store composite (i.e. non-scalar) data.

Now, that’s not to say that every instance of an array in existing codebases should be replaced with an SPL container object. There are cases where it’s appropriate to use one over the other. Knowing the difference requires an understanding of how arrays work.

Within the C code that makes up the PHP interpreter, arrays are implemented as a data structure called a hash table or hash map. When a value contained within an array is referenced by its index, PHP uses a hashing function to convert that index into a hash that identifies the location of the corresponding value within the array.

This hash map implementation enables arrays to store an arbitrary number of elements and provide access to all of those elements simultaneously using either numeric or string keys. Arrays are extremely fast for the capabilities they provide and are an excellent general purpose data structure.

Fixed Arrays

In contrast to arrays, SplFixedArray functions more like C arrays or Java arrays than PHP arrays. The maximum number of elements that it may contain is specified upon instantiation. While it is possible to change it later via the setSize() method, this negates the performance advantages of using it: because its size is fixed, it doesn’t need to use a hashing function to resolve the position of elements within the array. It makes sense to use fixed arrays when the number of elements to be stored is known in advance and the elements only need to be accessed by sequential position.

SplFixedArray implements the Iterator, ArrayAccess, and Countable interfaces. Iterator allows it to be iterated using a foreach loop. ArrayAccess provides access to its elements using array syntax where elements are referred to using integer positions beginning at 0 as with enumerated arrays. Countable enables an instance to be passed to the count() function like an array.

Aside from the inability to use it in place of arrays with array functions, instances of SplFixedArray function just like arrays for all intents and purposes. It’s even possible to convert them to and from arrays using the toArray() and fromArray() methods respectively. However, it generally makes more sense to use SplFixedArray exclusively for each individual use case.
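A short sketch of the behavior described above may help. The values are arbitrary and the variable names are only illustrative.

```php
<?php
// The size is given up front; all three slots start out as null.
$fixed = new SplFixedArray(3);
$fixed[0] = 'a';
$fixed[1] = 'b';
$fixed[2] = 'c';

// count() and foreach work as they do for arrays.
$size = count($fixed);
$values = array();
foreach ($fixed as $index => $value) {
    $values[$index] = $value;
}

// Writing past the fixed size throws instead of growing the container.
$overflowed = false;
try {
    $fixed[3] = 'd';
} catch (RuntimeException $e) {
    $overflowed = true;
}

// Conversion to and from plain arrays.
$asArray = $fixed->toArray();
$fromArray = SplFixedArray::fromArray(array(10, 20, 30));
```

The exception on an out-of-range index is the practical difference from a PHP array, which would simply add a new element.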

Lists

In computer science, a list is defined as an ordered collection of values. A linked list is a data structure in which each element in the list includes a reference to one or both of the elements on either side of it within the list. The term “doubly-linked list” is used to refer to the latter case. In the SPL, this takes the form of the class SplDoublyLinkedList.

Like SplFixedArray, SplDoublyLinkedList also implements the Iterator, ArrayAccess, and Countable interfaces. In addition to the methods that come with these interface implementations, elements can be added to or removed from the start or end of the list using its push(), pop(), shift() and unshift() methods, which correspond to the array_push(), array_pop(), array_shift(), and array_unshift() functions respectively. Unfortunately, as of PHP 5.3.2, there’s no way to insert an element anywhere in the list other than at the beginning or the end. A feature request has been filed for this. Add a comment or vote to show support for its addition.

The elements at the start and end of the list are accessible via its bottom() and top() methods respectively, which correspond to the reset() and end() functions. Like SplFixedArray, elements can also be accessed arbitrarily by positional index using the array syntax granted by ArrayAccess. It makes sense to use lists when the number of elements to be stored is not known in advance and the elements only need to be accessed by sequential position.
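The following sketch exercises the list operations just described, using arbitrary string values.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('b');    // append to the end
$list->push('c');
$list->unshift('a'); // prepend to the beginning

// bottom() peeks at the first node and top() at the last;
// neither removes the element.
$first = $list->bottom();
$last = $list->top();

// ArrayAccess allows lookups by position.
$second = $list[1];

// Default iteration runs from the beginning to the end.
$order = array();
foreach ($list as $value) {
    $order[] = $value;
}

// pop() removes from the end, shift() from the beginning.
$popped = $list->pop();
$shifted = $list->shift();
$remaining = count($list);
```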

Stacks

Stacks are similar to lists with two major differences. First, elements can only be added to the top of the stack. Second, an element can only be accessed by taking it off the top of the stack. Because of these differences, the stack is often referred to as a Last-In-First-Out or LIFO data structure. SplStack is the SPL stack implementation.

SplStack is a bit removed from the traditional definition of a stack. It extends SplDoublyLinkedList and inherits its abilities, some of which don’t really apply to stacks. In order to enforce its restriction on how elements are accessed, SplStack overrides the setIteratorMode() method of its parent class and implements its own to prevent modification of the iteration direction. Both methods allow elements to be retained or removed as they are iterated.

Use of stacks makes sense when the number of elements to be stored is not known in advance and the only element that must be accessible is the last one stored. However, as of PHP 5.3.2, the performance of SplStack leaves something to be desired. Benchmarks included later in this post provide an objective illustration of this, though the cause of the behavior remains unknown.
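Here is a small sketch of SplStack in use, including what happens when calling code tries to reverse the iteration direction. The values are arbitrary.

```php
<?php
$stack = new SplStack();
$stack->push('first');
$stack->push('second');
$stack->push('third');

// top() peeks at the most recently pushed element without removing it.
$top = $stack->top();

// Iteration is last-in, first-out.
$order = array();
foreach ($stack as $value) {
    $order[] = $value;
}

$popped = $stack->pop();
$left = count($stack);

// Asking for FIFO iteration is rejected, since it would change the
// iteration direction of the stack.
$rejected = false;
try {
    $stack->setIteratorMode(SplDoublyLinkedList::IT_MODE_FIFO);
} catch (RuntimeException $e) {
    $rejected = true;
}
```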

Queues

Queues are also similar to lists, again with two major differences. First, elements can only be added (or “enqueued”) to the end of the queue. Second, an element can only be accessed by removing (or “dequeueing”) it from the beginning of the queue. For these reasons the queue is referred to as a First-In-First-Out or FIFO data structure. The SplQueue class implements this data structure in the SPL.

SplQueue follows suit with SplStack in extending SplDoublyLinkedList. Just as SplStack inherits some operations of questionable applicability as a result, so too does SplQueue. Likewise, it overrides setIteratorMode() with its own version to restrict how elements are accessed. Use of queues makes sense when the number of elements to be stored is not known in advance and the only element that must be accessible is the remaining element that was stored earliest.

One minor difference between SplQueue and SplStack is that the former contains two method aliases named after conceptual queue operations: dequeue() aliases SplDoublyLinkedList::shift() and enqueue() aliases SplDoublyLinkedList::push(). This makes sense because while push() and pop() share similar applicability to conceptual stack operations, they are already present in its parent class.

Despite their common ancestry, SplQueue appears to have better performance than SplStack as of PHP 5.3.2. Benchmarks included later in this post review this in more detail.
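For comparison with the stack, a minimal SplQueue sketch using the enqueue() and dequeue() aliases might look like this. The job names are made up for illustration.

```php
<?php
$queue = new SplQueue();
$queue->enqueue('job 1');
$queue->enqueue('job 2');
$queue->enqueue('job 3');

// Iteration is first-in, first-out.
$order = array();
foreach ($queue as $value) {
    $order[] = $value;
}

// dequeue() removes the element that has been waiting longest.
$next = $queue->dequeue();
$waiting = count($queue);
```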

Heaps

Up to this point, the data structures discussed have resembled lists insofar as they contain elements in the order in which they were added. By contrast, when an element is added to a heap, a comparison function is used to compare the new element to other elements already in the heap and the element is placed appropriately within the heap based on that function’s return value. The beauty of heaps is that their underlying algorithm does this with minimal element comparisons, so it’s extremely efficient. Using heaps makes sense when the number of elements to be stored is not known in advance and elements must be accessed in an order based on how they compare to each other.

SplHeap is an abstract class used to create a heap by extending it and providing a comparison function in the form of its compare() method. Only the root element of a heap, the one yielding the highest comparison function return value, may be accessed or removed from the heap at any given time. This is done using the extract() method of SplHeap. SplHeap implements the Iterator and Countable interfaces but, because only the root element can be extracted, it does not implement the ArrayAccess interface like the previously discussed data structure classes.

In addition to the abstract SplHeap class, two concrete implementations are also included in the SPL, namely SplMinHeap and SplMaxHeap. The compare() method of SplMinHeap returns a value such that the smallest element in the heap is the root element. Likewise, the compare() method of SplMaxHeap returns a value such that the largest element in the heap is the root element.

At first glance, using a subclass of SplHeap may seem equivalent to calling sort() or a similar function on an array and accessing the elements in sequence. This is indeed the case if all elements are added to the array prior to it being sorted. However, situations such as elements arriving over time or inadequate memory to store all elements simultaneously may preclude this approach. Use of arrays in such situations would require repeated resorting of the entire array as new elements are added, which is inefficient. This is why using the corresponding heap class makes a lot more sense in that situation than repeated calls to sort(), min() or max(). Additionally, SplHeap can be used to implement the heapsort algorithm, which has better worst case performance versus the quicksort algorithm implementation used by arrays.
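The sketch below shows both cases: SplMinHeap receiving elements over time, and a custom SplHeap subclass. LongestFirstHeap is a hypothetical class invented for this illustration, and the int return type on compare() assumes PHP 7 or later; it was not available in PHP 5.3.

```php
<?php
// Elements arrive over time; the heap keeps the smallest at the root.
$heap = new SplMinHeap();
foreach (array(5, 1, 3) as $number) {
    $heap->insert($number);
}
$smallest = $heap->extract();

// A later arrival is placed correctly without re-sorting everything.
$heap->insert(2);
$rest = array();
while (!$heap->isEmpty()) {
    $rest[] = $heap->extract();
}

// Hypothetical custom heap: the longest string becomes the root.
class LongestFirstHeap extends SplHeap
{
    // A positive return value means $value1 ranks above $value2.
    protected function compare($value1, $value2): int
    {
        return strlen($value1) - strlen($value2);
    }
}

$words = new LongestFirstHeap();
$words->insert('aa');
$words->insert('a');
$words->insert('aaaa');
$longest = $words->extract();
```

Draining a heap with repeated extract() calls, as in the while loop, is in effect a heapsort of whatever elements it holds.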

Priority Queues

Priority queues are somewhat similar to heaps. In fact, while it doesn’t extend SplHeap, SplPriorityQueue does make use of a heap structure internally to implement its functionality. The difference is that the insert() method of SplPriorityQueue accepts both a value and an associated priority, removing the need to use an array or object to store both of these and define an appropriate comparison function in an SplHeap instance. Elements with the highest priority, like those in SplMaxHeap with the highest value, are the ones that come out first when extract() is called. Note that elements with equal priority are returned in no particular order.

For reasons similar to those of SplHeap, SplPriorityQueue implements both Iterator and Countable interfaces and does not implement the ArrayAccess interface. Because it stores a value and priority per element, SplPriorityQueue includes a setExtractFlags() method that modifies the behavior of extract() to return the stored value, the stored priority, or an array containing both. Priorities are not bound to a particular data type: strings, integers, or even composite data types can be used. SplPriorityQueue can be extended and its compare() method overridden to customize the comparison logic.

It makes sense to use a priority queue when the number of elements to be stored is not known in advance and elements must be accessed in an order based on how a value associated with each element (versus the element value itself) compares to the same associated values of other elements.
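A brief sketch of SplPriorityQueue with integer priorities and each of the extract flags follows. The task names and priorities are arbitrary.

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert('write tests', 5);
$queue->insert('fix outage', 10);
$queue->insert('tidy desk', 1);

// By default, extract() returns only the stored value.
$first = $queue->extract();

// EXTR_BOTH returns an array with 'data' and 'priority' keys.
$queue->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
$second = $queue->extract();

// EXTR_PRIORITY returns only the stored priority.
$queue->setExtractFlags(SplPriorityQueue::EXTR_PRIORITY);
$third = $queue->extract();
```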

Sets and Composite Hash Maps

SplObjectStorage combines some of the properties of two different data structures. First, it provides the same functionality of a hash table that a normal array has, but without its associated inability to use objects as keys unless the spl_object_hash() function is used. In other words, it implements a composite hash map. Second, it can be used as a set to store objects as data without a meaningful corresponding key or concept of sequential order.

Its attach() method accepts an object key and the data to associate with it and its detach() method allows data to be removed using its associated object key. To use the object as a set, simply exclude the $data parameter for attach() as it’s optional. The set operations implemented by SplObjectStorage all have array function counterparts. For example, the addAll() method and array_merge() function both correspond to the union set operation. The difference operation is available using the removeAll() method and array_diff() function and its variants. The contains() method and in_array() function both implement the element_of operation. Sadly, only arrays have an implementation of the intersection operation in the form of array_intersect() and its variants. Tobias Schlitt has a more in-depth analysis of this data structure that includes implementations of the set operations lacking in the SPL itself.

Like some of the other data structures in the SPL, SplObjectStorage implements the Iterator, Countable, and ArrayAccess interfaces. Oddly, it also implements the Traversable interface (which is limited to internally defined classes and negates the need for implementation of the Iterator interface) and the Serializable interface (and it is the only SPL data structure class to do so).

Using this class makes sense when data must be stored using composite keys or the ability to access data using set operations is more important than accessing data in a specific order.
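Both uses can be sketched with a few plain stdClass objects standing in for real domain objects. The variable names are only illustrative.

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

// Used as a map: an object key with associated data.
$map = new SplObjectStorage();
$map->attach($a, 'data for a');
$lookup = $map[$a];
$map->detach($a);
$mapSize = count($map);

// Used as sets: attach() without the optional data argument.
$set1 = new SplObjectStorage();
$set1->attach($a);
$set1->attach($b);

$set2 = new SplObjectStorage();
$set2->attach($b);
$set2->attach($c);

// Union: add the members of both sets to a fresh storage.
$union = new SplObjectStorage();
$union->addAll($set1);
$union->addAll($set2);

// Difference: start from set1, then remove anything found in set2.
$difference = new SplObjectStorage();
$difference->addAll($set1);
$difference->removeAll($set2);

// Membership test.
$hasA = $difference->contains($a);
$hasB = $difference->contains($b);
```

Building the union and difference in fresh storage objects leaves the original sets untouched, since addAll() and removeAll() modify the object they are called on.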

Benchmarks

Standard disclaimer: There are lies, damned lies, and benchmarks. YMMV.

Platform

Process

Code used is located in this GitHub repository.

  1. Modify constant declarations at the top of runner.php as appropriate (50 executions per test were used to get the results below), then execute it from the command line. It will in turn execute each of the scripts in the tests directory, measuring execution time and memory usage. Results will be recorded in results/raw.csv.
  2. To generate graphs, run graphs.php. This uses the Graph component from the ezComponents library. Resulting images will be written to the results directory in PNG format.

Results

[Graphs omitted. For each of SplFixedArray, SplDoublyLinkedList, SplStack, SplQueue, SplMinHeap, SplPriorityQueue, and SplObjectStorage, the original results showed executions per second and memory usage, with code for an array version and an SPL version.]

Other Data Structures

If you have an interest in other data structure implementations for PHP outside of SPL offerings, check out the bloomy PECL extension, which is an implementation of a bloom filter created by Andrei Zmievski.

“Web Scraping with PHP” Now Available!

What I’m announcing in this blog post has been in the works since early 2008 when I first pitched the idea. It was rejected by several major publishers who basically said the same thing: the idea was in too small of a niche or simply wasn’t marketable. php|architect Press respectfully disagreed with them and decided to publish what is now a book written by me that you can purchase.

It’s currently only available in PDF format due to a delay with the printer; a dead tree version should become available within the next few weeks. To my knowledge, there are plans to offer the paper and PDF bundle as has been done in the past with their other books.

Many of you reading this post probably have a personal to-do list of goals that you want to accomplish within your lifetime. Becoming the published author of a book has been an item on my own list for some time, one that seeing this accomplishment through to its completion has helped me to cross out. I think anyone who has achieved a similar victory can relate to its significance, if only to oneself.

I do of course encourage you to purchase the book. I have no naïve notions that this will result in any substantial monetary return. Even if it did, that was not my reason for writing the book. I did it because I have knowledge that I believe is worth knowing and sharing with you. There were a number of people who contributed to this and I encourage you to read about them in the pages of the book that credit them.

It is also worth restating here that I have many family members, friends, and colleagues who helped to make this possible. There are too many to name, but I would like to thank each and every one of you from the bottom of my heart. I consider this a milestone in my life and my only hope is that it has as profound an effect on your life as it has on my own.

Leaving K-fx2

There are times in life when things don’t go to plan. You may start a new job and then, after some time in the position, come to find that it’s just not a good fit for you. Regretfully, that’s been a recent experience of mine: I’ve decided to leave my position at K-fx2. I wish my coworkers well; they have my thanks for the experiences I had in my time there. If you are a PHP developer; live within the Lafayette, Baton Rouge, or New Orleans areas; and are looking for work, consider joining them.

As for what’s next for me, I’ve accepted a position with Synacor as a Senior Engineer on their Content Management Platform team. I will be traveling to their offices in Buffalo, NY for orientation on April 26th. The team seems excited about me coming on board and I’m looking forward to meeting them in person. If you live near the area and would like to see me while I’m in town, just let me know.

Models in Zend Framework

A question that frequently comes up in my interactions with other developers about Zend Framework is how to approach designing models. There’s a small collection of resources and advice that I generally give on the subject, so I thought I’d write up a blog post to give people an easy place to access it all.

More than one way to skin a cat

First, there is no one “correct” way to design a model. If there were, the framework would probably have an actual model component. It doesn’t, and as Bill Karwin, former Project Manager for Zend Framework, has said, the reason for this is that designing the model is your job. There are pros and cons to any approach. It’s all about finding a method that works for you, is appropriate for the situation at hand, and mitigates difficulty in long-term application maintenance.

The Model section of the Quick Start Guide and this blog post by Michelangelo van Dam include examples of a Data Mapper approach. Zend_Db uses a Table Data Gateway and Row Data Gateway approach. Doctrine, a popular ORM library that is gaining traction in the ZF community, uses an Active Record approach. And available approaches don’t stop there. As a general rule of thumb when designing models and components in general, I recommend favoring composition over inheritance. You’ll get a better sense of what I mean by that later in the post.

Defining the model

The Wikipedia article on the MVC architectural pattern isn’t all-encompassing, but isn’t a bad place to start either. In particular, it drives home a few important points about the model that you should bear in mind.

The model is a “domain-specific representation of the data upon which the application operates.”

In a nutshell, the model handles data: storing it, retrieving it, filtering and validating it, and providing access to it.

The model contains “domain logic” that “adds meaning to raw data.”

In other words, it handles the conversion from raw data in a data source to semantically meaningful PHP objects and back again.

“MVC does not specifically mention the data access layer because it is understood to be underneath or encapsulated by the model.”

The shorter version: $model != $database. This is one point that trips a lot of people up. The model and your database are not congruent, synonymous, or in any way equivalent. Yes, 99% of the time, your model will use a database for its data source. However, models can be more complex than that: they can serve as clients to web services, limit access to data using an ACL, access data caching resources like memcached or APC, and so forth.

Designing the model

“So which approach do you use?”

Generally, the answer is none of the above. My personal preference is to keep data, in the form of plain old PHP arrays and objects, separate from logic to handle that data. Many common PHP tasks result in data already being present in either of these forms such as in its superglobal arrays, so it seems natural to just take it in the form in which it’s provided.

I define a model class that composes some other object to access the data I need, generally a Zend_Db_Adapter instance. Any methods of that model class return data using scalar types or classes that PHP supports natively. What’s great about this is that it’s fairly easy to convert data to and from these forms using type juggling regardless of the data’s origin.

“But wait, how can I encapsulate the data source within my model if I need a dependency like a Zend_Db_Adapter instance for it to be able to interact with that data source?”

This is another major question that people tend to ask. If any code calling your model first has to handle injecting its dependencies, that muddies up separation of concerns because calling code then must have some knowledge of how your model operates internally. This is a problem because, if the data source of the model needs to change in the future, all calling code needs to change as well versus only model code. There are a few ways to approach this problem in Zend Framework.

The first method involves storing your dependencies in Zend_Registry. Declare accessor methods in the model for dependencies that retrieve them from the registry only if they are not explicitly injected from calling code. This bypasses the need for dependency injection from application code, thus preserving separation of concerns, but still allows injection to be performed for unit testing purposes.
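To make the accessor idea concrete, here is a rough sketch. ExampleRegistry is a hypothetical stand-in for Zend_Registry so that the snippet runs without the framework, and UserModel, setDb(), getDb(), and the 'db' key are likewise names invented for illustration; plain stdClass objects take the place of a real Zend_Db_Adapter instance.

```php
<?php
// Hypothetical stand-in for a registry such as Zend_Registry.
class ExampleRegistry
{
    private static $entries = array();

    public static function set($name, $value)
    {
        self::$entries[$name] = $value;
    }

    public static function get($name)
    {
        return self::$entries[$name];
    }
}

// Hypothetical model that composes its data source.
class UserModel
{
    private $db;

    // Explicit injection, mainly for unit tests.
    public function setDb($db)
    {
        $this->db = $db;
        return $this;
    }

    // Falls back to the registry only when nothing was injected.
    public function getDb()
    {
        if ($this->db === null) {
            $this->db = ExampleRegistry::get('db');
        }
        return $this->db;
    }
}

// Bootstrap code registers the dependency once.
$appDb = new stdClass();
ExampleRegistry::set('db', $appDb);

// Application code never touches the dependency.
$model = new UserModel();
$usedByApp = $model->getDb();

// A unit test injects its own replacement instead.
$fakeDb = new stdClass();
$testModel = new UserModel();
$usedByTest = $testModel->setDb($fakeDb)->getDb();
```

The point of the design is that only the bootstrap and the model know where the data source comes from, so changing it later should not ripple out to calling code.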

The second method is a variation of the first and is specific to the case of Zend_Db_Adapter instances. This approach involves setting your adapter as the default adapter to use for Zend_Db_Table instances in lieu of using the registry to store it. Note that this doesn’t require actually using Zend_Db_Table in order to work. This can be set from the bootstrap or application configuration file, then retrieved using the getDefaultAdapter() method of the Zend_Db_Table_Abstract class. Again, don’t forget your accessor methods so dependencies can be injected from unit tests.

Another method might be to use a service locator as a dependency in models. This puts a layer of abstraction between models and their dependencies for controllers that have to deal with both. To my knowledge, the closest thing to an implementation of this in ZF is the Zend_Application_Bootstrap classes. From a controller, the bootstrap instance can be accessed as in the code example below.

$bootstrap = $this->getInvokeArg('bootstrap');

Where to go from here

Service layers are another frequent topic related to models. Those aside, I mainly suggest picking a simple model that you can prototype and trying several approaches to see which you prefer. If you’ve got some ZF experience under your belt, I’d be interested in hearing about your own modeling approaches and experiences and encourage you to leave a comment on this post.

Ada Lovelace Day and Amazing Grace

So, if you hadn’t already heard, today is Ada Lovelace Day. If you aren’t familiar with it, it’s an internationally observed event during which participants use blogs, podcasts, videos, and all other forms of internet media to celebrate the achievements of women in the fields of technology and science. Read more about the event and its namesake or take a look at this timeline of major female figures in computing from its beginnings with Ada Lovelace to present day.

Many people choose a friend or colleague who’s helped them to excel in the field. I myself have a number I could name, but with this being the first year I’m participating, I chose instead to veer from the beaten path and write this blog post about someone I’ve admired since I began serious study of computer science in high school. The very first computer science course I took began with a unit on the history of the field. Among the other Big Names included in that unit was that of Rear Admiral Dr. Grace Hopper, also sometimes referred to as “Amazing Grace.” And did she ever live up to that name.

The first reason I admire Grace is that she was no stranger to failure or perseverance. When she applied to Vassar College at the age of 16, she was rejected because her test scores in Latin did not meet admission requirements. She persisted and was admitted the following year, going on to earn a bachelor’s degree in mathematics and physics from Vassar College and a master’s degree from Yale. She would eventually return to Vassar to share her knowledge as an associate professor of mathematics.

While my own academic achievements are a far cry from hers, I relate to this quality because it took a large amount of persistence for me to complete my own degree, partly due to my admittedly lacking abilities in mathematics as compared to the requirements of the curriculum under which I graduated. I struggled and had to retake several classes due to not meeting grade requirements, but I persevered and earned the degree that hangs on my wall today.

The second reason I admire Grace is the magnitude of her aspirations. In a time period when not all colleges in the country accepted women, women were mainly relegated to “lace-collar jobs” in the workforce, and the right to suffrage for women had not yet been won, Grace chose to pursue her education in a field that to this day is still predominantly occupied by men. Not only did she participate in the field, she excelled in it, contributing to technological breakthroughs that literally became the stuff of legend and the foundation for the technology that we enjoy today. Hers is truly a story for the history books, one of defying stereotypes and overcoming adversities of society to achieve something spectacular.

The final reason that I admire and even envy Grace is her contributions to the innovations of her era. In 1944, during her service in the US Navy Reserve, she served on the programming staff for the Harvard Mark I, the first large-scale automatic digital computer in the country, and coauthored papers on it and its two successors. In 1949, she became senior mathematician for the team that developed the UNIVAC I, the first commercially available computer in the country. The work she did between 1950 and 1980 resulted in the first compiler, an accomplishment to which many professional software developers today owe their livelihood. In the 1970s, she pioneered standards for testing computer components and systems for which administration would later be assumed by the National Institute of Standards and Technology. She was right there in the thick of the industry’s beginnings, making contributions that would echo in the decades to come.

Sadly, Grace passed away six years before I came to know the significance of her accomplishments to my future career and the technological state of the entire world. She died on January 1, 1992, and was laid to rest with full military honors in Arlington National Cemetery. I regret never having had the chance to shake her hand and tell her in person all that you’ve read here just now. So Grace, I salute and thank you for the immense impact that your life and service have had on the planet you left behind. No matter where technology may take our race in the generations to come, I sincerely hope that they carry your memory with them.