Accessing Web Resources with PHP

Matthew Turland

The World Wide Web

A visualization of Facebook friendships forming a global map

Source

"Web 1.0" - 1990s

Yahoo.com circa 1999

"Web 2.0" - 2000s

Facebook circa 2011

The Semantic Web - 2010s?

Class linkages in open datasets
Diagram illustrating various microformats

Building Blocks

Technologies comprising the Semantic Web

The Semantic Web is a Long Way Off

A desert road leading off into the distance

REST and SOA Adoption is Increasing

A row of mobile phones from various vendors

Web Scraping is Still Used

php|architect's Guide to Web Scraping with PHP

http://phparch.com </shameless_plug>

Humble Beginnings...

Lafayette911.org

http://lafayette911.org

... Can Lead To Great Things

Listing of addresses ordered by highest traffic incident count
Pie chart showing frequency of various types of traffic incidents
Source

Quality Assurance is Critical

A Nabaztag with flashing lights and a tilted ear

"Nabaztag" pronounced "na-poss-talk" or "na-boss-tug"

Got Skills?

Cat and skateboard in mid-air with the caption 'MAD SKILLZ i haz them'

HTTP

Drawing highlighting HTTP as a common protocol between browsers and servers

Know Enough To Use Client Libraries

A number and variety of hammers on a table

Also Know Low-Level Details

A microscope

Anatomy of a Request
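A request has four parts: a request line (method, target, HTTP version), header lines, a blank line, and an optional body. Every line ends with CRLF. The sketch below is a minimal illustration, not the talk's own code. It assembles a request by hand using a hypothetical `build_request()` helper and assumes HTTP/1.1, which requires a Host header.

```php
<?php
/**
 * Hypothetical helper: assemble a raw HTTP/1.1 request message.
 * Request line, then header lines, then an empty line, then the body.
 */
function build_request($method, $host, $target, array $headers = array(), $body = '')
{
    // Request line: method, request target, protocol version.
    $lines = array("$method $target HTTP/1.1", "Host: $host");
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }
    if ($body !== '') {
        // Content-Length tells the server how many body bytes to read.
        $lines[] = 'Content-Length: ' . strlen($body);
    }
    // A blank line (CRLF CRLF) separates the headers from the body.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

$request = build_request('GET', 'example.com', '/index.html', array('Accept' => 'text/html'));
echo $request;
```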

URI Versus URL

Venn diagram indicating the relationship between URIs and URLs

Anatomy of a URL

http://user:pass@domain.com:8080/file.ext?query=&var=value
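PHP's built-in `parse_url()` splits the example URL above into its components, and `parse_str()` decodes the query string into an array. A short sketch:

```php
<?php
// Break the example URL into scheme, credentials, host, port, path, and query.
$url = 'http://user:pass@domain.com:8080/file.ext?query=&var=value';
$parts = parse_url($url);

// parse_url() leaves the query string as-is; parse_str() decodes it into an array.
parse_str($parts['query'], $query);

print_r($parts);
print_r($query);
```

The port comes back as an integer. The empty `query=` parameter decodes to an empty string.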

Anatomy of a Response

Status Codes

GitHub 404 page
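A response begins with a status line carrying the status code, such as 404 for the page above. Headers, a blank line, and the body follow. Below is a minimal parser sketch. It assumes the complete response is already in a string, and `parse_response()` is a hypothetical name. Repeated headers such as Set-Cookie keep only their last value in this simplified version.

```php
<?php
/**
 * Hypothetical helper: split a raw HTTP response into status line parts,
 * headers, and body. Assumes every header line contains a colon and that
 * headers are not folded across lines.
 */
function parse_response($raw)
{
    // Headers and body are separated by the first blank line (CRLF CRLF).
    list($head, $body) = explode("\r\n\r\n", $raw, 2) + array(1 => '');
    $lines = explode("\r\n", $head);

    // Status line: HTTP-version SP status-code SP reason-phrase
    list($version, $code, $reason) = explode(' ', array_shift($lines), 3) + array(2 => '');

    $headers = array();
    foreach ($lines as $line) {
        list($name, $value) = explode(':', $line, 2);
        // Header names are case-insensitive, so store them lowercased.
        // A repeated header overwrites the earlier value here.
        $headers[strtolower(trim($name))] = trim($value);
    }

    return array(
        'version' => $version,
        'code'    => (int) $code,
        'reason'  => $reason,
        'headers' => $headers,
        'body'    => $body,
    );
}

$raw = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\nContent-Length: 9\r\n\r\nNot Found";
$response = parse_response($raw);
echo $response['code'], ' ', $response['reason'], "\n";
```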

GET Requests

Devices used to consume e-books and audiobooks

HEAD Requests

Head of Buddha sculpture

POST Requests

Captain Jean-Luc Picard from Star Trek: The Next Generation issuing his trademark phrase, "Make it so."

PUT Requests

A woman painting over wallpaper

DELETE Requests

Strong Bad facing a computer screen that says 'DELETED!!'

Headers

Custom license plate that says "METADATA"

Cookies

Come to the Dark Side - WE HAVE COOKIES!

Cookies Example
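Servers set cookies with Set-Cookie response headers, and clients send them back in a single Cookie request header. The hypothetical helpers below sketch the simplest possible cookie jar. They deliberately ignore the Path, Domain, Expires, and Secure attributes that a real client has to honor.

```php
<?php
/**
 * Hypothetical helper: record the name/value pair from the value of a
 * Set-Cookie header. Attributes after the first ";" are ignored.
 */
function store_cookie(array $jar, $setCookieHeader)
{
    // The "name=value" pair comes before any ";"-separated attributes.
    $pair = explode(';', $setCookieHeader, 2);
    list($name, $value) = explode('=', trim($pair[0]), 2) + array(1 => '');
    $jar[$name] = $value;
    return $jar;
}

/** Hypothetical helper: build the Cookie request header from the jar. */
function cookie_header(array $jar)
{
    $pairs = array();
    foreach ($jar as $name => $value) {
        $pairs[] = "$name=$value";
    }
    // All cookies go into one header, separated by "; ".
    return 'Cookie: ' . implode('; ', $pairs);
}

$jar = array();
$jar = store_cookie($jar, 'session=abc123; Path=/; HttpOnly');
$jar = store_cookie($jar, 'theme=dark');
echo cookie_header($jar), "\n"; // Cookie: session=abc123; theme=dark
```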

Redirection

Blowfish in a sports car from the Torchwood episode "Kiss Kiss, Bang Bang"

Redirection Example
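With PHP's http:// stream wrapper, two context options control redirects: `follow_location` and `max_redirects`. The sketch below uses a placeholder URL. The request itself is commented out so the snippet runs without network access.

```php
<?php
// Make redirect handling explicit and cap the number of hops.
$context = stream_context_create(array(
    'http' => array(
        'follow_location' => 1, // set to 0 to receive 3xx responses and handle Location yourself
        'max_redirects'   => 5,
    ),
));

// Passing the context applies these options to the request:
// $html = file_get_contents('http://example.com/old-page', false, $context);
```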

Referring URLs

Screenshot of referring sites from Google Analytics

Referring URL Example
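The referring URL travels in the Referer request header, a historical misspelling that the protocol keeps. One way to send it with the stream wrapper is the `header` context option. Both URLs below are placeholders.

```php
<?php
// Extra request headers go in the "header" option, each terminated by CRLF.
$context = stream_context_create(array(
    'http' => array(
        'header' => "Referer: http://example.com/search?q=php\r\n",
    ),
));

// $html = file_get_contents('http://example.com/results', false, $context);
```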

Persistent Connections

Diagram showing the difference between multiple versus persistent connections

Persistent Connections Example

Content Caching

A calculator displaying a number

Content Caching - Initial Request

Content Caching - Later Requests
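Caching works by saving validators from the initial response, an ETag and/or a Last-Modified value. Later requests send them back. If the resource is unchanged, the server can answer 304 Not Modified with no body, and the cached copy is reused. Below is a sketch of building those conditional headers. The `conditional_headers()` name and the `$cached` array layout are assumptions.

```php
<?php
/**
 * Hypothetical helper: turn validators saved from an earlier response into
 * conditional request headers. $cached may hold 'etag' (string, as received)
 * and 'last_modified' (Unix timestamp).
 */
function conditional_headers(array $cached)
{
    $headers = array();
    if (isset($cached['etag'])) {
        // The ETag is echoed back verbatim, including its quotes.
        $headers[] = 'If-None-Match: ' . $cached['etag'];
    }
    if (isset($cached['last_modified'])) {
        // HTTP dates are always expressed in GMT.
        $headers[] = 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s', $cached['last_modified']) . ' GMT';
    }
    return $headers;
}

$headers = conditional_headers(array('etag' => '"abc123"', 'last_modified' => 0));
print_r($headers);
```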

User Agents

Agent Smith and his clones from The Matrix trilogy

User Agent Example
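The stream wrapper's `user_agent` context option sets the User-Agent header for a request. Without it, PHP falls back to the `user_agent` ini setting. The agent string and URL below are made up.

```php
<?php
// Identify the client with a custom User-Agent string.
$context = stream_context_create(array(
    'http' => array(
        'user_agent' => 'MyScraper/1.0 (+http://example.com/bot)',
    ),
));

// $html = file_get_contents('http://example.com/', false, $context);
```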

Authentication

Hand pressed onto a biometric identification panel

Authentication - Initial Request

Authentication - Later Request
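With HTTP Basic authentication, an initial request without credentials typically gets a 401 Unauthorized response carrying a WWW-Authenticate header. Later requests then include an Authorization header. Below is a sketch of building that header with a hypothetical `basic_auth_header()` helper. Basic auth only base64-encodes the credentials and does not encrypt them, so it belongs on HTTPS.

```php
<?php
/** Hypothetical helper: build an HTTP Basic Authorization header. */
function basic_auth_header($user, $pass)
{
    // Credentials are joined with ":" and base64-encoded (not encrypted).
    return 'Authorization: Basic ' . base64_encode("$user:$pass");
}

echo basic_auth_header('user', 'pass'), "\n"; // Authorization: Basic dXNlcjpwYXNz
```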

Know Your Libraries

A corner of the stacks of a library

Sockets

A socket wrench

Sockets Example

Streams

Photo of the Ghostbusters with the caption "Don't Cross the Streams. Good advice is timeless."

Streams Example
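Below is a sketch of sending a form-encoded POST through the http:// stream wrapper with a stream context. The URL and field names are placeholders. The request is commented out so the snippet runs without network access. When `content` is set, the wrapper adds the Content-Length header itself.

```php
<?php
// Encode form fields; spaces become "+" and "&" inside a value becomes "%26".
$body = http_build_query(array('name' => 'Matthew Turland', 'q' => 'a&b'));

$context = stream_context_create(array(
    'http' => array(
        'method'        => 'POST',
        'header'        => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content'       => $body,
        'ignore_errors' => true, // also return the body of 4xx/5xx responses
    ),
));

// $response = file_get_contents('http://example.com/form', false, $context);
// After the call, $http_response_header holds the status line and response headers.
```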

cURL

cURL project logo

cURL Example

rolling-curl Example

Source

pecl_http

PECL project logo

pecl_http Example

PEAR

PEAR project logo

PEAR Example

HTTP Tips

A clothes washer/dryer set
  1. Minimize and cache DNS lookups
    • cURL and pecl_http handle this natively
    • For other extensions, install a local DNS cache like nscd or dnsmasq if possible
    • Extend userland code to add DNS caching using gethostbyname() (e.g., ZF-5144)
  2. Use batch jobs, request pooling, parallel processes, and client-side caching where available and appropriate
  3. Plan for failure
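The first and last tips can be sketched in userland PHP. Both helpers are hypothetical. `cached_lookup()` memoizes `gethostbyname()` results for the life of the process. `fetch_with_retry()` retries a failing request with exponential backoff. Both take callables, so they can be exercised without network access.

```php
<?php
/**
 * Hypothetical helper: memoize hostname lookups per process.
 * Note: gethostbyname() returns the unchanged hostname on failure,
 * so failed lookups are cached too.
 */
function cached_lookup($host, $resolve = 'gethostbyname')
{
    static $cache = array();
    if (!isset($cache[$host])) {
        $cache[$host] = call_user_func($resolve, $host);
    }
    return $cache[$host];
}

/**
 * Hypothetical helper: call $fetch until it returns something other than
 * false or the attempts run out, doubling the pause between tries.
 */
function fetch_with_retry(callable $fetch, $maxAttempts = 3, $delayMs = 100)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = call_user_func($fetch);
        if ($result !== false) {
            return $result;
        }
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000);
            $delayMs *= 2; // exponential backoff
        }
    }
    return false; // the caller decides how to degrade gracefully
}

// Example use with a real request:
// $html = fetch_with_retry(function () { return @file_get_contents('http://example.com/'); });
```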

HTTP Resources

Tidy

Tidy Example

Configuration

XML Extensions

libxml Extension

libxml PHP manual section
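Real-world HTML is often malformed. Calling `libxml_use_internal_errors(true)` makes libxml collect its parser errors instead of emitting a PHP warning for each one. Below is a minimal sketch with deliberately sloppy markup. Which errors get reported varies with the libxml version.

```php
<?php
$html = '<html><body><p>Unclosed paragraph<div>Stray <b>bold</div></body></html>';

// Collect parser errors internally; remember the previous setting.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach (libxml_get_errors() as $error) {
    // Each LibXMLError carries a message, line, and column.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

libxml_clear_errors();                // empty the error buffer
libxml_use_internal_errors($previous); // restore the earlier setting
```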

XML/Tree Concepts

XPath

SimpleXML

SimpleXML Example
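Below is a minimal SimpleXML sketch over an inline document whose feed structure is invented. Elements are read as properties, attributes with array syntax, and values are cast to strings.

```php
<?php
$xml = <<<XML
<?xml version="1.0"?>
<feed>
  <entry id="1"><title>First post</title></entry>
  <entry id="2"><title>Second post</title></entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

$titles = array();
foreach ($feed->entry as $entry) {
    // SimpleXML returns SimpleXMLElement objects, so cast values to string.
    $titles[(string) $entry['id']] = (string) $entry->title;
}
print_r($titles);

// SimpleXML can also run XPath queries; the result is an array of elements.
$second = $feed->xpath('//entry[@id="2"]/title');
echo (string) $second[0], "\n"; // Second post
```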

DOM

DOM Example
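Below is a DOM sketch that loads an HTML fragment and queries it with `DOMXPath`. The markup and the `nav` id are invented for illustration.

```php
<?php
$html = '<html><body>
  <ul id="nav">
    <li><a href="/home">Home</a></li>
    <li><a href="/about" class="current">About</a></li>
  </ul>
</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$links = array();
// Select every <a> with an href attribute inside the element whose id is "nav".
foreach ($xpath->query('//*[@id="nav"]//a[@href]') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}
print_r($links);
```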

XMLReader

XMLReader Example
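XMLReader is a pull parser. It moves through the document one node at a time instead of building the whole tree in memory, which suits large files. Below is a sketch that collects attributes from the `<entry>` elements of an invented document.

```php
<?php
$xml = '<feed><entry id="1"><title>First</title></entry>'
     . '<entry id="2"><title>Second</title></entry></feed>';

$reader = new XMLReader();
$reader->XML($xml); // for a file or URL, use $reader->open($uri) instead

$ids = array();
while ($reader->read()) {
    // React only to opening <entry> tags.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
        $ids[] = $reader->getAttribute('id');
    }
}
$reader->close();
print_r($ids);
```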

CSS Selectors
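`DOMXPath` evaluates XPath, not CSS selectors. CSS selector libraries generally translate selectors into XPath. Below is a hand translation of `ul#nav a.current`, assuming a simple class match. A class selector has to match one word within a space-separated `class` attribute, hence the `concat`/`normalize-space` idiom.

```php
<?php
// CSS: ul#nav a.current  ->  equivalent XPath
$cssAsXpath = '//ul[@id="nav"]//a[contains(concat(" ", normalize-space(@class), " "), " current ")]';

$html = '<ul id="nav"><li><a href="/home" class="link">Home</a></li>'
      . '<li><a href="/about" class="link current">About</a></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$matches = array();
foreach ($xpath->query($cssAsXpath) as $node) {
    $matches[] = $node->getAttribute('href');
}
print_r($matches); // only "/about" carries the "current" class
```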

Regular Expressions
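Regular expressions can pull simple values out of markup, but they break easily when attribute order, quoting, or nesting changes. Below is a sketch that extracts link targets and text, accepting either quote style around the `href` value.

```php
<?php
$html = '<a href="/home">Home</a> <a class="x" href=\'/about\'>About</a>';

// Group 1 captures the opening quote, \1 requires the same closing quote,
// group 2 is the href value, group 3 the link text (non-greedy, /s spans newlines).
preg_match_all('/<a\s[^>]*href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is', $html, $matches, PREG_SET_ORDER);

$links = array();
foreach ($matches as $m) {
    $links[$m[2]] = $m[3];
}
print_r($links);
```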

Who I Work For

Synacor Logo

Synacor Headquarters - Buffalo, NY

That's All, Folks