Archive for the ‘Uncategorized’ Category.

EAV Modeling – Square Peg in a Round Hole?

So I got the June 2008 issue of php|architect (or volume 7 issue 6 for those of you who track it that way) in recently. Right off, I found the cover article on EAV modeling to be of interest seeing as my current employment is in the medical IT industry and I’d never heard of this technique for storing data. I actually more or less knew what it was, but had never put a name to the face so to speak.

The mental image that came to me when reading about this approach to data modeling was taking the traditional relational table and turning it on its head. Despite what the Wikipedia article on the topic might tout early on, there are disadvantages to using the EAV approach. EAV actually has to circumvent, work around, or reimplement features that most mainstream database servers today provide “for free” to the traditional relational counterparts of EAV in order to get equal functionality. These include native data type validation and data type-specific operations without explicit typecasting (if you’re not separating EAV values by data type), row-level referential integrity, and schema metadata. EAV also adds a dimension of complexity to query construction in an era where storage is becoming cheaper and database technologies are evolving. It may work, but I don’t foresee it scaling very well for larger systems. In short, it seems an attempt to force a square peg into the round hole that are traditional relational database systems.

In a MySQL world, there are alternative approaches for deploying DDL modifications. One is to implement master-slave replication to propagate DDL modifications and load balancing to maintain uptime as changes are propagated from server to server. Another is to use MySQL Proxy to direct queries to servers hosting unmodified schemas and queue DML operations in the binary log while DDL modifications are made. Once DDL is complete, the server goes into “read only” mode while queued DML operations are applied and incoming DML operations received during that time are blocked until the queue is empty. (This may be a potential point of improvement.)

Outside of MySQL, there are document-focused database systems such as Apache Lucene and its current .NET and PHP ports as well as Apache CouchDB. While some of these are still a little early in development, I see them as being more ideal for applications demanding more fluid data storage and hope that the development of similar solutions continues.

Output Filters in Zend_View

A feature of Zend Framework MVC that isn’t currently very well documented is output filters. They’re mentioned in passing in the Zend_View documentation, but not reviewed in detail anywhere in the Reference Guide as of version 1.5.2. I was curious enough about how to implement markup minification that I decided to trace through the Zend_View source code in attempt to discern how output filters actually worked. As it turns out, it’s actually pretty simple.

First, you need to get a reference to the current Zend_View instance. If you’re using the Zend_Layout MVC integration, you can get this by calling $this->_helper->layout->getLayoutInstance within your Zend_Controller_Action class to get the current Zend_Layout instance and then getView on that to get your Zend_View instance. Otherwise, the Zend_View instance is available via the view property of Zend_Controller_Action instance.

Next, call addFilterPath or setFilterPath on your Zend_View instance from your Zend_Controller_Action class. Pass in a path to the directory to contain your output filter classes and a naming prefix that all of your output filter classes will use. I’m not sure why the class prefix defaults to “Zend_View_Filter_” since no such classes exist. In my opinion, it would have made more sense to derive the prefix based on the provided directory path. Anyway, create the directory you’ve specified if it doesn’t already exist and create a new class file within that directory. In my case, I named the directory Vendor/View/Filter, the file Minify.php, and the class contained in the file Vendor_View_Filter_Minify.

Within this class, you must implement at least one method, filter. This method should accept a single parameter, which will be a string containing the view ouput to be filtered, and should return the filtered version of that string. Optionally, if your filter requires access to the related Zend_View instance, you can also declare a setView method that accepts the Zend_View instance as its only parameter and it will automatically be passed in when your output filter class is instantiated. Within setView, you can store the Zend_View instance in an instance property of the output filter class so it can be referred to later in the filter method.

Once you’ve finished your output filter class, you need to explicitly add it to the output filters in use from your Zend_Controller_Action class. You can use addFilter or setFilter for this. Pass in the name of your output filter class without the class prefix. In my case, I passed in “Minify.” At this point, the filter should be used when rendering your page. I poked around in the DOM and Tidy PHP extension documentation, but couldn’t find a feature for markup minification, so I ended up using the PCRE extension to do the job. Below is the final source code for my output filter class.

Vendor/View/Filter/Minify.php

class Vendor_View_Filter_Minify
{
    public function filter($string)
    {
        return preg_replace(
            array('/>\s+/', '/\s+</', '/[\x0A\x0D]+/'),
            array('>', '<', ' '),
            $string
        );
    }
}

Vendor/Controller.php

class Vendor_Controller extends Zend_Controller_Action
{
    public init()
    {
        $this->_helper->layout->getLayoutInstance()->getView()
            ->addFilterPath('Vendor/View/Filter', 'Vendor_View_Filter_')
            ->addFilter('Minify');
    }
}

Oracle XE 10gR2 on Kubuntu 64-bit

So I started poking around for instructions on installing Oracle XE 10gR2 on my Kubuntu Hardy 64-bit installation recently. I came across this article from Oracle, which seemed like exactly what I wanted. Unfortunately, it assumes that the intended host operating system is 32-bit, which causes issues if you try to install XE through apt as the Oracle article suggests. After following these instructions, I immediately received this error:

W: Failed to fetch http://oss.oracle.com/debian/dists/unstable/Release \ Unable to find expected entry main/binary-amd64/Packages in \ Meta-index file (malformed Release file?)

After that, any apt command issued (related or not) produced this error.

E: The package oracle-xe needs to be reinstalled, but I can't find an \ archive for it.

And finally, when I resorted to using this excellent guide instead, I ran into this problem because of the earlier failed installation.

dpkg: regarding oracle-xe-universal_10.2.0.1-1.1_i386.deb containing \ oracle-xe-universal: oracle-xe-universal conflicts with oracle-xe oracle-xe (version 10.2.0.1-1.0) is present and broken due to failed \ removal or installation.dpkg: error processing oracle-xe-universal_10.2.0.1-1.1_i386.deb \ (--install): conflicting packages - not installing oracle-xe-universalErrors were encountered while processing: oracle-xe-universal_10.2.0.1-1.1_i386.deb

Luckily, I was able to find a solution to purge the failed installation from the system.

dpkg --remove --force-remove-reinstreq oracle-xe-universal

One thing that the Oracle article was useful for was creating a swap file large enough for Oracle to use, as the issue described in the article with not having enough swap space did arise when I initially tried the installation. Beyond that, the non-Oracle guide mentioned earlier worked like a charm and I now have a working XE installation on my system.

DomQuery

Ever since I started working with the jQuery JavaScript library, I’ve loved it. It offers the power to do a lot with only a little code and makes features offered by the JavaScript DOM implementation much easier to access. My interest in web scraping prompted me to consider creating an equivalent of sorts for PHP.

This obviously doesn’t include some features specific to the client-side or any that require evaluating CSS, but it does include many for extracting data from a valid XML or HTML document. I’ve posted my initial work on the concept in an GitHub repository. The code there is commented with docblocks and includes unit tests with over 99% code coverage. Comments and suggestions are welcome.

PHP 5.2 and SQL Server 2000 on Windows XP

Recently at work, I was tasked with finding a method to retrieve data from a third-party SQL Server 2000-based system into our own MySQL-based application. It’s worth noting that both system are behind their own firewalls and I was trying to bear security in mind as I did this.

First, I had to actually get into the system with the SQL Server database. For this, I used Citrix GoToAssist. The company behind the software makes their money off of hosting the servers it uses and selling access to them as a service. The client software uses HTTPS for security and to get around firewall configurations. While their software is nice, I’d be interested to see if anyone knows of any OSS equivalents of it.

Once I had access to the system, I needed something akin to phpMyAdmin or Oracle SQL Developer in order to see what resources were available on the SQL Server itself. For that, I found a local copy of EMS SQL Manager already on the system. This wasn’t the most full-fledged product I’d seen for this type of purpose, but it was free, included everything I needed, and did the job in a pinch.

After doing some reconnaisance on the database structure using the information that had been given to me by the third-party vendor, I downloaded the ZIP archive containing the standard build of PHP 5.2 for Windows. Side note: one of the things I like about PHP on Windows is that, for CLI purposes, it only takes decompressing the ZIP archive, tweaking the configuration file to get the settings and extensions you want, and executing the binary to get it up and running.

With my SQL Server experience being dated by about six years, I started throwing PHP code at the system to see what would stick. I noticed that the server was already set up to accept trusted connections, and being that I was running the script on the local system, this made it likely that authentication wouldn’t present any issues, or so I thought.

I created my PDO instance like so:

$db = new PDO('mssql:localhost;dbname=...');

And then attempted to call a particular stored procedure that I’d been told would have some of the data I was looking to extract. I was surprised to get this in response when calling PDO::errorInfo().

Array
(
[0] => HY000
[1] => 10007
[2] => Could not find stored procedure ... [10007] (severity 5)
[EXEC ...]
[3] => -1
[4] => 5
)

Now I had just been in EMS SQL Manager and seen the stored procedure myself, so I knew it was there. I tried using the sa account, but that didn’t seem to work either. After some digging, I found that I had to create an account for the current Windows user on the current network domain in order to make PHP capable of seeing the stored procedure when using a trusted connection. Once I’d created the account and given it access to execute the specific stored procedure I was trying to call, I tried again and PDO::errorInfo then gave me this.

Array
(
[0] => HY000
[1] => 10007
[2] => Unicode data in a Unicode-only collation or ntext data
cannot be sent to clients using DB-Library (such as ISQL) or
ODBC version 3.7 or earlier. [10007] (severity 5) [EXEC ...]
[3] => -1
[4] => 5
)

Apparently there are issues with the MS SQL stand-alone PHP extension and SQL Servers using unicode collation. Some more digging turned up that the only way to get around this was to use ODBC. Once I did that, I found that I was no longer getting an error when trying to call the stored procedure.

$db = new PDO('odbc:Driver={SQL Server};Server=localhost;
Database=...;Trusted_Connection=yes;');

Though I knew of stored procedures conceptually, my previous experience with SQL Server had never included using them. As such, I wasn’t familiar with the syntax and came across something rather strange while trying to troubleshoot it: if the arguments to a stored procedure are surrounded by parentheses in the  query calling that stored procedure, the code silently fails and returns no results.

// $stmt = $db->prepare('EXEC ...(?, ?)'); // Fails silently
$stmt = $db->prepare('EXEC ... ?, ?'); // Works as expected

Last on the list of “interesting things I experienced” while on this little trek wasn’t related to the database, but to what was happening when I attempted to push data from the SQL Server machine to our LAMP machine. I was using the PHP serialize() function to encode the data into a string, then using the streams HTTPS wrapper to send it via a POST request to a PHP file on the LAMP machine. This meant I wouldn’t have to poke a hole in the firewall on either side since both were already open to web traffic. When attempting to run the code for this, I got an error on the client side:

PHP Warning:  file_get_contents(): SSL: An existing connection 
was forcibly closed by the remote host.
PHP Warning: file_get_contents(https://...): failed to open
stream: HTTP request failed!

The cause for this wasn’t immediately apparent. Upon checking Apache error logs, I came across this:

PHP Fatal error:  Allowed memory size of 16777216 bytes 
exhausted (tried to allocate 3381676 bytes)

The actual issue had to do with how much data I was trying to send; it was exceeding the value specified in the memory_limit configuration setting, which as it turned out was set to the recommended value for a significantly older version of PHP. After updating it to a more current recommended value and bouncing Apache, all was well.

Hope the details of my experiences prove helpful to someone else. Thanks to those who provided assistance along the way.

Phergie on C7Y Yet Again

The guys at C7Y really seem to like Phergie. The articles about the Phergie project on C7Y got a brief shout-out in the P3 Podcast for 4/17/08. Thanks for the plug guys! The rest of the podcast was really awesome too. I definitely recommend checking it out.

Phergie on C7Y Again

Part two of the two-part article I wrote for C7Y on experiences gleaned from developing the PHP 5 IRC bot Phergie has been posted. Feel free to leave comments in the article’s forum.

Interesting Bug in the HTTP Streams Wrapper

Streams are quite possibly one of the coolest things about PHP. They’re a feature of the core and allow you to do some basic things that might otherwise require a separate extension, which may or may not be available if you’re in a shared hosting environment. Among these things is acting as an HTTP client, which you can do using the HTTP streams wrapper. See Example #1 on that page for a code sample showing how to submit a POST request.

I wrote a small script a while back that’s gained a surprising amount of popularity thanks to a plug from the site that it posts to. The current incarnation of the script uses the cURL extension to send a POST request to paste2.org, the response from which it then parses for the URL corresponding to the code that was originally sent. When I learned that this could be done with streams, I attempted to implement it in that fashion, but ran into strange issues where I would get 404 or 500-level HTTP errors rather than the response I was expecting.

After some digging, it turns out that this is a bug in the 5.2.x branch. The issue has to do with how headers are arranged by the underlying C code. As a result, explicitly specifying a Content-Type header for the operation will result in failure. However, not explicitly specifying the Content-Type header value results in a Notice being output and the correct header value being used automatically, which coincidentally causes the operation to succeed.

The bug has been fixed in the 5.3 and 6 branches and is expected to be fixed in 5.2.6 as well. Hope this workaround proves helpful to anyone who runs into a similar issue.

Watch Your Include Path

It’s pretty rare that I encounter a bug in the software I run that hampers my ability to work or my server environment’s ability to function normally. However, I encountered one last week that has taken me and several Rackspace support technicians nearly a week to figure out, namely PHP bug #43677.

The bug is in at least PHP 5.2.5, if not in previous releases in the 5.2.x branch. For the moment, we’ve downgraded the PHP installation to RHEL version 5.1.6, which I’m told includes backports of relevant bug fixes from the 5.2.x branch, to see if that doesn’t stabilize the situation.

So, if it seems that PHP starts to “forget” your include_path setting, your issue may be with PHP and not Apache as I initially suspected since the include_path setting was being set via an Apache configuration file. Hope this saves someone else time and grief.

IE6 Compatibility Testing via Virtualization

In my last entry, I noted that I was testing my use of SWFUpload in both IE6 and IE7. You may wonder how I managed this. Google has plenty of information on “hack” methods to get both versions of IE to coexist on a single XP installation.

If you have the machine power for it, though, there is a method that is actually supported by Microsoft to accomplish this. It employs virtualization which is becoming increasingly popular in the computing world. It is useful for two particular applications: to make server environment installations independent of the host operating system and hardware and to allow multiple operating systems to coexist on the same hardware without the need for partitioning storage devices.

In 2003, Microsoft bought out a company called Connectix which specialized in virtualization software, one of their main products being Virtual PC. Microsoft subsequently released a rebranded version of VPC as a free download.

Once the need to test applications on IE6 and IE7 for cross-compatibility was realized, Microsoft also began sporadically releasing a freely downloadable series of up-to-date Windows XP images with IE6 pre-installed and expiration dates after which the images would no longer function. The latest image is set to expire in early June 2008.

With Microsoft pushing adoption of Windows Vista and IE7, it’s uncertain as to whether or not Microsoft will continue releasing these VPC images. However, it does appear that Microsoft plans to continue developing Virtual PC. Therefore, if you have an extra Windows XP license to spare, this arrangement is a nice and relatively lightweight solution to making IE6 and IE7 available for testing on the same machine.