Zend Framework and Remember The Milk

I’ve posted a few times on Twitter related to my latest project and a few people have already asked me about it, so I figured it was worth a blog post.

My first project for the Zend Framework was Zend_Service_Simpy, a service module providing a lightweight wrapper around the API for the Simpy social bookmarking service.

My latest project is another service module for the Zend Framework. This time, though, it’s for the Remember The Milk API. RTM is basically a TODO list on serious steroids. It’s the Swiss Army Knife of task management. It allows you to manage multiple lists of tasks. You can add them easily from a variety of mediums, tag them, prioritize them, set deadlines for them, have them repeat, get reminders for them, tie them to physical real world locations, and share them. RTM offers great support for integration with Google applications including Google Calendar, iGoogle, and Gmail (plus offline access powered by Google Gears). They’re also very big into supporting mobile devices, including those running on Windows Mobile as well as the iPhone.

If you like, you can check out my original proposal for this module. I can already say that the API will end up changing a little, but the proposal is good enough to give you a general idea of what the capabilities of the finished service module will be. I only actively started implementation recently and things are progressing at a fairly rapid pace. I still have unit tests and documentation to handle, but hopefully there’s a shot at seeing it moved to core within the next two releases of the framework.

Google Reader and Yahoo! Pipes

I ran into a situation recently that I thought I’d share. I use Google Reader to manage the feeds that I read regularly. PHPDeveloper.org is among my favorite news syndication web sites. However, some of its posts, in particular those dealing with job posts or additions to CakePHP’s Bakery, aren’t interesting to me.

Eventually, I came to the conclusion that I could wrap the feed in a Yahoo! Pipe in order to filter out the uninteresting information. (I know, using Google and Yahoo products together might seem anything from ironic to downright unholy to some.)

Unfortunately, doing so meant that I had to remove the original PHPDeveloper.org feed from Google Reader and add the new pipe-wrapped feed in its place. Because (as best I can tell) certain things are tracked per feed rather than per URL (old items) or per item (read statuses), this meant losing all information specific to the old feed.

Granted, I only had to do this once, but I wish it had occurred to me earlier. Google Reader may have search capability (which took forever to be included), but that’s not the same as being able to have content filtering automatically handled for me whenever I view the contents of a feed.

So my line of thought continued. It would be nice if there was an easy way to maintain the user experience of adding feeds through my preferred browser, Mozilla Firefox, but to have new feeds be automatically wrapped in a Yahoo! Pipe “behind the scenes.” This would allow me to go back and manipulate feed content later if I saw a repeating pattern in specific content that didn’t interest me.

Another unfortunate trait of this situation is that Yahoo! Pipes doesn’t currently offer a web service API, or it might make implementing my idea significantly easier. While the AJAX interface exposes server interaction logic, it’s obfuscated to the point where it makes reverse engineering attempts infeasible. It’s unfortunate, because I think a marriage of the features of each of these services would make the result all the more useful for their users.

More Oracle and Java Woes

Today I continued the trek toward completing the project described in my last entry. Though I don’t think I ran into as many issues today as I did in the past week or so of working on the project, today certainly had its fair share.

First up was a rather interesting exception being thrown by a JDBC operation, namely “java.sql.SQLException: SQL string is not Query.” This is apparently intended to be JDBC’s way of explaining that PreparedStatement.executeQuery() doesn’t work for DML operations. To execute one of those, you have to use either execute() or executeUpdate(). Thankfully, a forum thread was able to point me in the right direction on that one.
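To make the distinction concrete, here is a minimal sketch of running a DML statement through a PreparedStatement. The table name, column names, and values are made up for illustration, and the JDBC URL is whatever your own environment uses; only the choice of executeUpdate() over executeQuery() is the point.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DmlSketch {
    // Hypothetical table and columns, used only for illustration.
    static final String UPDATE_SQL =
            "UPDATE documents SET status = ? WHERE name = ?";

    // Runs an UPDATE and returns the number of rows it affected.
    static int markStatus(Connection connection, String name, String status)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPDATE_SQL)) {
            statement.setString(1, status);
            statement.setString(2, name);
            // DML yields an update count rather than a ResultSet, so
            // executeUpdate() (or execute()) is used instead of executeQuery().
            return statement.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: java DmlSketch <jdbc-url>");
            return;
        }
        // Assumes a suitable JDBC driver is on the classpath.
        try (Connection connection = DriverManager.getConnection(args[0])) {
            int rows = markStatus(connection, "report.xls", "converted");
            System.out.println(rows + " row(s) updated");
        }
    }
}
```

executeUpdate() hands back the affected row count, which is usually all you want from an INSERT, UPDATE, or DELETE anyway.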

Next on the list, if Oracle JDeveloper 10.1.3.3.0 tells you “The WAR file is already up to date,” don’t believe it! I don’t know what logic it’s using to decide whether or not the class files constituting a WAR file are out-of-date, but there are definitely some cases where it’s flawed. I spent the better part of the morning trying to figure out why everything from undeploying and redeploying the EAR file to bouncing the OAS installation was still giving me illogical output. I didn’t realize at the time that the stale WAR file was relevant to the problem, but it certainly proved to be in the end! I tried searching for a bug report on this, but came up empty, so maybe it’s just me.

Last but not least, I take issue with the language used in the mod_plsql User’s Guide to describe its process of file upload handling. Though it never explicitly states this, it seems to imply that the internal handling of performing an INSERT operation to place data for an uploaded file into the document table takes place in a separate transaction from that of the action procedure that gets executed afterward.

You have to go to the PL/SQL User’s Guide to read why this is not the case. To sum it up, a transaction can span multiple procedures. A procedure being executed as a data cartridge operates within a transaction that is implicitly committed when that procedure terminates so long as no uncaught exceptions are raised. However, until that point, the effects of any DML operations executed are only visible to the procedure. This includes the INSERT operation performed by mod_plsql on the document table. What this effectively means is that nothing other than the procedure can see that the inserted record exists unless the procedure does an explicit COMMIT.

If you read my last post, you know that I was calling a servlet from the data cartridge. You can probably imagine the amount of aggravation this caused me when I ran my servlet locally without issue, had to backtrack to figure out where the servlet was failing when it was deployed, and then found out that a single COMMIT statement at the beginning of my data cartridge procedure made things work as expected. So, yay for lacking Oracle documentation.

I did get the servlet working, though. It can now pull data from the database, convert it from Excel binary to CSV format, and put the converted data back into the database. So, the Clean Content API, while not specifically designed for the purpose for which I’m using it, is at least a somewhat capable solution. That basically sums up my day, folks. I’ll be back on Monday to do it again.

Extracting Data from Excel with Oracle Clean Content

I got assigned an interesting project at work recently. It involved receiving a file upload via PL/SQL. This in itself is relatively trivial and easy to accomplish when running data cartridges on Oracle Application Server via mod_plsql. What was more remarkable about the nature of the task was that the uploaded file was intended to be a Microsoft Excel binary file containing a single worksheet. Unfortunately, PL/SQL isn’t so diverse in its available native packages that it has readily available functionality to easily handle this situation.

Luckily, my boss had recently visited the annual Oracle OpenWorld conference and while there learned of a new technology of theirs that could help: the Outside In Clean Content API. I’m uncertain as to whether this product came under Oracle’s branding as the result of a merger, buyout, partnership, or what have you. After poking around the net, I saw that it has thus far received very little coverage, presumably because it was a relatively new release.

Clean Content’s primary purpose is to “identify and remove sensitive, confidential or proprietary metadata and hidden information from Microsoft Office documents.” Of course, to be able to accomplish this, it needs to be capable of extracting said data from these document formats. As a side feature of sorts, they expose this functionality in their API, which is available in the form of C++, C#, and Java libraries.

Originally, when I began work on the project, the requirements stated that the uploaded file would be in CSV format. The format requirement was changed later, after I had developed a prototype capable of handling a CSV file. To adapt my existing work to this new requirement, I developed a Java servlet to supplement it, which the data cartridge would call using UTL_HTTP.REQUEST.

This servlet received the name of an uploaded Excel file, used JDBC to pull the binary data from the local database, used the Clean Content API to convert it to CSV format, and used JDBC again to put the converted data back into the same table. It didn’t end up amounting to much in the way of LOC, but it did require some learning on my part.

First off, the Clean Content API is structured in a SAX-like fashion. The best resources to learn it are actually both included in the free download: the Developer’s Guide and the JavaDoc API documentation. Examples in the former show how to restrict the API to analysis only (i.e. not modifying the document data), provide in-memory data to the API (via a ByteBuffer), and specify a handler class to intercept events. You may have to peruse several examples to find all this out, but it’s all there if you take the time to read through it (and selectively skip all the parts having to do with document manipulation).

Your handler class has to extend the BaseElementHandler or GenericElementHandler class in the API. I recommend the latter during development, as its start() method can help in the debugging process by showing you what data is being extracted.

The startTextCell() method will indicate when the parser is within a spreadsheet cell containing textual data. However, the TextCellElement it receives contains only coordinate information, not the value of the cell. (Quick note: the coordinate system is 0-based, meaning that the coordinates of the first cell of the spreadsheet are 0, 0.) To actually capture the text, you have to use the text() method. This is a little confusing, but the reason is that it’s possible to encounter textual metadata outside of the spreadsheet cells. A simple class flag property can be used so you know when you are or aren’t within spreadsheet cells when this event occurs.
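The flag idea can be sketched roughly as follows. To be clear, this is a stand-in class and not the real Clean Content API: it does not extend BaseElementHandler, its startTextCell() takes plain row and column numbers where the real method receives a TextCellElement, and endTextCell() is an assumed counterpart that I am using only to show where the flag would be cleared.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a handler class; the method signatures here are
// simplified assumptions, not those of the actual API.
public class CellTextCollector {
    private boolean insideTextCell = false; // the flag described above
    private int row;
    private int column;
    private final List<String> cells = new ArrayList<>();

    // Called when the parser enters a text cell (0-based coordinates).
    public void startTextCell(int row, int column) {
        this.insideTextCell = true;
        this.row = row;
        this.column = column;
    }

    // Called for any text, whether it belongs to a cell or to metadata.
    public void text(String value) {
        if (insideTextCell) {
            cells.add(row + "," + column + "=" + value);
        }
    }

    // Assumed counterpart event that marks the end of the cell.
    public void endTextCell() {
        insideTextCell = false;
    }

    public List<String> cells() {
        return cells;
    }

    public static void main(String[] args) {
        CellTextCollector collector = new CellTextCollector();
        collector.text("Author: someone"); // metadata outside a cell, ignored
        collector.startTextCell(0, 0);
        collector.text("Name");
        collector.endTextCell();
        System.out.println(collector.cells()); // prints [0,0=Name]
    }
}
```

Text that arrives while the flag is off is simply dropped, which is what keeps document metadata out of the CSV output.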

The startDataCell() method indicates when numeric data is encountered. Something worth mentioning here is that the Excel binary format houses dates as integers. To convert such a number back to its equivalent date, take the date 12/30/1899 and add that number of days to it using GregorianCalendar.add(). (The starting point falls two days before 1/1/1900 because Excel counts 1/1/1900 as day 1 and also counts a nonexistent 2/29/1900.) An example of this is 39,085, which corresponds to 1/3/2007. You can format this further by passing the return value of GregorianCalendar.getTime() to SimpleDateFormat.format().
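Here is a small sketch of that conversion using the classes mentioned above. It assumes the number follows Excel's default 1900 date system, in which day 1 is 1/1/1900 and a nonexistent 2/29/1900 is counted, so the arithmetic starts from 12/30/1899 to land on the right date for anything from March 1900 onward. The class and method names are my own.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class ExcelDateSketch {
    // Converts an Excel serial day number (1900 date system) to M/d/yyyy.
    static String serialToDate(int serial) {
        // Start two days before 1/1/1900: one day because serial 1 is
        // 1/1/1900 itself, one more for Excel's phantom 2/29/1900.
        GregorianCalendar calendar =
                new GregorianCalendar(1899, Calendar.DECEMBER, 30);
        calendar.add(Calendar.DAY_OF_MONTH, serial);
        SimpleDateFormat format = new SimpleDateFormat("M/d/yyyy", Locale.US);
        return format.format(calendar.getTime());
    }

    public static void main(String[] args) {
        System.out.println(serialToDate(39085)); // prints 1/3/2007
    }
}
```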

One oddity I ran into during development that was unrelated to the Clean Content API was with the JDBC library. I executed a SELECT query, got back a ResultSet object, and then attempted to call ResultSet.getBytes() to place the value of a BLOB column into a byte array. This was so I could pass that to ByteBuffer.wrap() to be used with the Clean Content API later. However, the returned byte array always came back severely truncated judging by its length and the fact that the Clean Content API could not determine the document type based on it. I wasn’t able to get around to examining the content byte by byte to determine the cause of this, but I did find a solution: ResultSet.getBlob() returns a Blob object and Blob.getBytes() returns the needed (complete) byte array. Apparently Oracle condones this method of obtaining the value, so rather than beat myself up trying to figure out the weirdness that is this situation, I followed the well-beaten path.
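For reference, the workaround boils down to something like the sketch below. The helper name is mine, and the demo uses SerialBlob from the JDK as a stand-in for the Blob you would normally get from ResultSet.getBlob(), so it runs without a database.

```java
import java.nio.ByteBuffer;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

public class BlobReadSketch {
    // Reads the full contents of a Blob and wraps them in a ByteBuffer.
    static ByteBuffer toBuffer(Blob blob) throws SQLException {
        // Blob positions are 1-based; the cast assumes the content
        // fits in an int-sized array.
        byte[] bytes = blob.getBytes(1, (int) blob.length());
        return ByteBuffer.wrap(bytes);
    }

    public static void main(String[] args) throws SQLException {
        // Stand-in for resultSet.getBlob(...) from a real query.
        Blob blob = new SerialBlob(new byte[] {10, 20, 30, 40});
        ByteBuffer buffer = toBuffer(blob);
        System.out.println(buffer.remaining() + " bytes wrapped"); // prints 4 bytes wrapped
    }
}
```

For very large documents, streaming via Blob.getBinaryStream() would avoid holding everything in memory, but a single byte array is what ByteBuffer.wrap() needs here.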

Beyond troubleshooting these oddities, along with relearning how to write servlets and learning how to test them in Oracle JDeveloper and deploy them using Oracle Enterprise Manager (and running into this issue in the process), the process of implementing these project requirements was pretty straightforward. Hope my learning experiences end up helping someone else out there. I’m sure there are other existing solutions that could have been applied here, but if nothing else, it showed that there’s more than one way to skin this cat.

Web Scraping Article Published

Just a quick post to announce (albeit a little late) the December 2007 issue of php|architect, which includes my article on web scraping. Please buy a copy, give it a read, and feel free to post comments on the forum thread for the article. I’d love to hear some reader feedback!

You may have noticed that I’ve added a new page for publications. This will become the home for any content I produce that gains any sort of recognition, be it a podcast, article, book review, presentation slides, or what have you. Anytime anything new goes there, I’ll try to make a point to write a post about it.

The Acme of Skill

OK, I know I promised a post on how NULL in Oracle scares me, but I think I’ll save that for another day. For the moment, I’ve had something else on my mind recently. Someone I know is apparently of the opinion that PHP is “on the way out.” I have to vehemently disagree with this, and not just because PHP is my language of preference.

For starters, there are major corporations that are actively using PHP. Yahoo, current employer of Sara Golemon, is a great example. Facebook, a social networking site whose advertising program threatens Google Adsense enough that they created the Open Social initiative and brought in other companies in order to compete, is another.

While usage of PHP took a slight dip about two years ago, probably due in part to the hype around, growing popularity of, and advances in other technologies like .NET and Ruby, as well as the low adoption rate of PHP 5, its use is back on the rise. The performance improvements and addition of new OOP features are only making PHP a better, more well-rounded solution for the enterprise.

Major corporations are finally getting away from fearing competition from the open source community and are starting to embrace it. Oracle has collaborated with Zend to enable them to produce Zend Core for Oracle. The Oracle Technology Network web site has a dedicated section for PHP Developers as well as a manual and a cookbook. Oracle develops the OCI8 PHP extension and has made fairly recent updates to it to support database resident connection pooling, fast application notification, and other notable 11g features. (Check out Chris Jones’ blog for more info on that project.) Oracle is also beginning to release a substantial number of projects, particularly developer tools, as open source software.

Oracle isn’t the only one, either. Microsoft has even started getting into the game. A FastCGI add-on is now available for IIS 6, Microsoft’s web server. FastCGI is frequently used when Apache is not being run or running PHP as an Apache module is not an option for other reasons, such as shared hosting services that want to support both PHP 4 and PHP 5 on the same machine, so this is quite the boon for Microsoft shops. Microsoft is getting involved in the production of a new PHP database driver for SQL Server 2005. I can say on personal authority that multiple Microsoft representatives were present at ZendCon 2007 and made a presentation to the conference attendees on that very subject.

So this all seriously begs this question: why are these corporations, some of which have been specifically shown to be opposed to open source, now trying to play nice? Keep your friends close and your enemies closer, anyone? If you can’t beat them, join them? OK, enough cliche anecdotes. I think I’ve made my point here. PHP isn’t going anywhere and it’s certainly not “on the way out.” These companies are putting significant time and energy into supporting integration with their products by open source software and I don’t think they’d make that investment if PHP’s overall outlook was limited to the short-term.

I don’t believe it was ever the specific intention of the open source movement to compete with large companies and their proprietary products, but merely to fill a gap in software needs perceived by the consumer. As such, I find a particular quote by Sun Tzu, author of “The Art of War,” to be appropriate here: “For to win one hundred victories in one hundred battles is not the acme of skill. To subdue the enemy without fighting is the acme of skill.”

Oracle Gotchas

Let me start by saying that, as a relational database management system, I like Oracle. It’s full-featured, mature, and very scalable. However, there are a few small areas where I have to wonder what the developers were thinking. I’ve been working with Oracle for just over a year now. While I’m no self-proclaimed authority on the subject, I thought I’d document my thoughts and see if anyone else shared my mind or had found annoyances in other areas. There may be good design decisions behind these situations, but from a usability perspective, they’re blemishes that make Oracle slightly less than pristine.

Different rules for SQL and PL/SQL when using duplicate placeholders in dynamic SQL

Why is this? You don’t have to specify variable bindings multiple times for the same variable in PL/SQL, but you do in SQL. Why not just remove duplication from both cases?

SQL doesn’t support functions returning BOOLEAN values

See the Note under the RETURN Clause section on the CREATE FUNCTION page. (A rather obscure place to make the note, I might add.) Granted, this is an easily remedied situation by using wrapper functions to return equivalent values using other data types, but it shouldn’t have to be remedied at all. It discourages good practices in selecting return data types for functions and, by proxy, encourages needless complexity. Boolean values can obviously be conceptually understood or we wouldn’t have conditional expressions. Would it really be so hard to support implicit or explicit use of boolean return values as well?

PL/SQL doesn’t support the DECODE function

The DECODE function is essentially the functional equivalent of a CASE expression. Granted, CASE statements are more easily readable, but at the cost of brevity. There are instances where use of a full-blown CASE statement seems unwarranted and DECODE would be a perfect fit, but can’t be used. It makes the claim of “tight integration with SQL” seem a little less accurate.

Rules for handling NULL are F.U.B.A.R.

This one probably deserves a whole post by itself, which I’ll probably get around to writing in the near future. Suffice it to say, if you’ve worked with Oracle before, you probably know what I’m talking about.

No Exclusive OR Operator

See for yourself. The basic operators (AND, OR, and NOT) are there. NAND and NOR are derived easily enough by simply using NOT to negate AND and OR respectively. Lack of an XOR operator requires either writing a userland function to do the job or replicating at least one condition in your SQL query at least once. There are other options like using the bitand function, but they lack the level of readability that an XOR expression would have. I honestly can’t imagine that it would be so difficult to implement, either. The only difference between it and the existing OR operator is that the XOR operator returns false instead of true when both operands evaluate to true.

No LIMIT clause support

Several other databases including MySQL, PostgreSQL, and SQLite have it, but commercial databases like Oracle, Microsoft SQL Server, and IBM DB2 don’t. It’s hard to understand why, because it’s really not so complex a feature. It can even optimize runtime in cases where the desired subset starts at the beginning of the result set. Granted, the same effect can be accomplished for each of these, but the syntax is convoluted and unintuitive. For Oracle in particular, a subquery and use of a dynamically determined pseudocolumn ROWNUM is required. If the original query has a GROUP BY clause, then two subquery layers are required. Whether the underlying engine would process it the same way or not, the simple addition of this clause would make the feature easier and more intuitive to implement for developers.

Wrap Up

Despite these small deficiencies, Oracle is an excellent database. The recent additions of Oracle XE and Oracle SQL Developer also make it very easy to get started as a developer. I recommend the SQL Reference and PL/SQL Reference as beginning references. See the Oracle Technology Network for further references.

Article for php|architect

One of the things that has kept me away from my blog for the past few weeks is an article I’ve been working on for php|architect magazine. It should be included in the December 2007 issue and is entitled “Web Scraping.” So, if the topic interests you, keep an eye out for it. If you aren’t sure if the topic interests you, you can check out my episode on the Zend Developer Zone PHP Abstract podcast for a brief high-level description. I’ll probably post about this again once the issue comes out, but I thought I’d give a heads up to anyone out there that might buy issues of the magazine on an issue-by-issue basis.

PHP Abstract Episode 22: Screen Scraping

Check out the latest PHP Abstract podcast (episode 22) from Dev Zone. I’m the guest speaker! The podcast is on web scraping, a practice in which I have (unfortunately) become somewhat proficient. Leave a comment on Dev Zone or on this entry and let me know what you think!

There and Back Again – A Conference Tale

Every time I’ve tried to sit down to write this blog entry over the past few days, I always find it difficult to summarize the events on which it’s based. The time period had its up and down points, but overall, I think it was an excellent first conference experience.

I was finally able to meet a number of people I’d been speaking with for over a year in the #phpc channel on the Freenode network. Here they are, in no particular order: Cal Evans, Ligaya Turmelle, Elizabeth Smith, Elizabeth Naramore, Ben Ramsey, Brian DeShong, Derick Rethans, Mike Lively, Josh Eichorn, Patrick Reilly, Sara Golemon, Chris Cornutt, Jay Pipes, Sebastian Bergmann, Maggie Nelson, and Curt Zirzow. (And if I left anyone out, just let me know and I’ll add you to the list. Chalk it up to the alcohol.)

I also had the pleasure of meeting several of the speakers that I hadn’t spoken with at length at any point before the conference: Keith Casey, Wez Furlong, Chris Shiflett, Ilia Alshanetsky, Marcus Börger, Christopher Jones, and Terry Chay.

And last, but certainly not least, I made a few new friends. I believe some of them will soon join the ranks of the #phpc regulars. Among them are Christian Flickinger, Jeff Sica, Michelangelo van Dam, and Jonathan Peck. I look forward to future conversations with them and, with any luck, to seeing them at future conferences.

I think that about covers the people, so onto the talks.

Monday

  • Marcus Börger, Sara Golemon, and Wez Furlong gave an excellent tutorial entitled “Extending and Embedding PHP.” I think most of the second half was over my head, but I did learn a lot and do plan to put it to use. I’ve already got an idea for a project to start me out: wrapping the libircclient library in a PECL extension. Just have to read over the examples provided during the tutorial and brush up on my C.

Tuesday

  • Terry Chay’s talk entitled “The Internet is an Ogre: Finding Art in the Software Architecture” presented interesting viewpoints and was quite entertaining in the process. How many F-bombs it contained is a topic still up for debate, but regardless there was plenty of blood, sweat, and swear for all to enjoy.
  • Ben Ramsey gave an excellent talk entitled “Give Your Site a Boost with Memcached” and came across as a very knowledgeable and capable speaker. I look forward to hearing his future talks.
  • Ilia Alshanetsky gave a very informative talk entitled “State of PHP Security” on new measures being taken to ensure security in future PHP releases. I had a brief discussion with him afterward about the retention of the open_basedir configuration directive in PHP 6 and was pleasantly surprised to find that our opinions of the matter appear to be in agreement.
  • Maggie Nelson gave a nice iChat-style (read teleconference) Unconference talk entitled “You Don’t Need a DBA.” I think I knew more of the material beforehand than not, but it was a very well-presented talk and I was glad to have a little interaction with Maggie being as I hadn’t had a chance to meet her “in person” before that point.

I unfortunately had to miss the talks in the 4-5 PM time slot, including Elizabeth Smith’s Unconference talk on Building PHP on Windows (thanks to her and Jeff Sica for recording it so I could listen to it later), in order to take my ZCE exam. Thankfully, though, I passed and now hold the distinction of being the first and only Zend Certified Engineer in the state of Louisiana. (It didn’t really come as a big surprise to me, Louisiana being as technologically progressive as it is, but still… how often do you get to make a claim like that?) If it’s any consolation to Elizabeth, I think I’ve gotten through the part of her tutorial on setting up a build environment on my laptop’s Windows XP install and just need to get around to trying to actually build with it.

Wednesday

  • Elizabeth Smith gave an awesome Unconference talk entitled “PHP on the Desktop” using PHP-GTK. She also made mention of a few other GUI library bindings such as Qt, Winbinder, and her own WinUi project. It gathered quite a crowd and I hope she considers submitting a conference paper for it next year.
  • Shahar Evron presented a talk entitled “Content Indexing with Zend_Search_Lucene.” I was a bit disappointed with this talk, as I was expecting more advanced concepts than those I was already familiar with. I’ll be interested to see if a daemon-style solution is ever developed that is built on Zend_Search_Lucene.
  • Joel Spolsky gave a keynote speech entitled “Great Software.” While I’m not certain that I agree with his views, I must commend him for giving an entertaining presentation that seemingly managed to poke fun at every large corporation with a presence in the room.
  • Joe Stagner gave a talk entitled “PHP Diversity – PHP Applications in a Heterogenous IT Environment.” For some strange reason, my mind is coming up blank for this talk. I’ll probably listen to the audio recording on DevZone when it’s posted and say, “Oh yeah… !”
  • Chris Shiflett presented a talk entitled “Security 2.0.” It was a good in-depth view of some security concepts that I already have a vague familiarity with. And he included mention of the Little Bobby Tables strip from xkcd. What’s not to like?!
  • Christopher Jones gave an informative talk entitled “Performance Tuning for PHP with Oracle Databases” where he reviewed various approaches to minimizing client-server communications and gave good PHP-specific examples. I’m hoping Chris will at some point become available to speak at the Baton Rouge Oracle User’s Group. Hopefully he’ll have slides from his talk up on his web site shortly.

Following this was the Yahoo Night Club event, where I apparently made for a good show with my mad karaoke skills. While I’m grateful to Yahoo for sponsoring the event (in particular the open bar, which I unfortunately took more advantage of than I probably should have), I think I would have preferred more karaoke to the comedian and magician. Oh well, such is life.

After things appeared to die down there, I somehow managed to wander over to the Knuckles bar despite my inebriated state. At some point, I apparently passed out at the bar. Special thanks to the friends that ensured I got back to my room safely and apologies to anyone inconvenienced by mishaps resulting from my alcohol intake level.

Thursday

I was greeted the next morning with what Jeff Sica termed “the hangover of the Gods”, but surprisingly managed to make it through the day without any deja vu from the morning after my bachelor party.

  • Cory Doctorow gave an excellent keynote entitled “Stay Free! How Open Source Affects Culture.” He lived up to his reputation as a proponent of the open source movement and came across as a very intelligent and energetic speaker.
  • Hank Janssen and John Bocharov gave a nice presentation detailing the efforts that Microsoft has been making to ease the difficulties involved in developing in PHP on Windows. Watching Elizabeth Smith heckle their API examples was especially fun. I look forward to seeing what develops.
  • And last, but certainly not least, David Sklar gave an excellent case study-style presentation on API design based on his experiences at Ning. I wasn’t aware (though, after some thought, not really surprised either) that their API uses Atom.

I actually helped to develop the Twitter page on DevZone, and though some difficulties with Twitter made it less functional than we’d hoped, doing the work and collaborating with Cal Evans and Matthew Weier O’Phinney was a great experience. And hey, I even made it into a shot in the closing update on the ZendCon web site.

I’d like to send a special shout-out and thank you to Cal Evans. ZendCon would not have been the monumental success that it was without his immense efforts and dedication before, during, and after the conference. And even in the midst of it, he was still around to greet everyone and introduce people to each other. ZendCon 2007 was an experience that I will carry with me always, and for that I owe him thanks.

I’m already being encouraged to look at attending other conferences in the future. Depending on the availability of funds and vacation time, I may try to go to the MySQL Conference next April in Santa Clara. Depending on how things go between now and then, I may even submit a conference paper on a use case study of MySQL by a non-profit organization based on my past experiences with the Acadiana Educational Endowment. Who knows? I haven’t even fully recuperated from ZendCon and already I’m thinking about doing the conference thing again; does that make me a masochist? Time will tell, I suppose.