@elazar @elazar@phpc.social #longhornphp
Longhorn PHP Conference logo

Automating the Web

Hello!

  • My name is Matt
  • It's nice to see you
  • Thank you for coming

Forrest Gump waving

TL;DR

  • Web scraping (extracting data from web sites)
  • Programmatic interaction with web sites
  • Consuming web services / APIs
  • Acceptance testing

Forrest Gump from the film of the same name running out of a gate and down a road

There Will Be Slides

matthewturland.com/presentations

joind.in/talk/5cf41

Michael Scott in The Office repeatedly saying 'PowerPoint.'

Go Ahead, Hashtag It

  • Feel free to live-tweet/toot!
  • Hashtag: #longhornphp
  • Feel free to @ me

Deadpool instructing Negasonic Teenage Warhead to finish typing a tweet into her phone

The Book Was Better

The book cover for 'Web Scraping with PHP, 2nd Edition'

phparch.com

Composer

Packages: find, install, and autoload them

Composer logo

Obligatory Disclaimer

  • Some uses of this content may be of unclear legality or not legal at all
  • This content is intended to be strictly educational, not prescriptive or legal advice
  • If in doubt, consult a lawyer (i.e. not me)

Dr. Ian Malcolm from the film Jurassic Park saying, 'Yeah, yeah, but your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should.'

Storytime

Grandpa from the film 'The Princess Bride' beginning to read

This is Lafayette

A map of Louisiana with Lafayette Parish marked in red

This is Lafayette 911

A screenshot of lafayette911.org

lafayette911.org

This is Ray

Raymond Camden

raymondcamden.com

Ray Created a Viewer

Screenshot of Ray's 911 viewer mapping incidents using Google Maps

Blog Post circa 2010

Six Months Later...

From The Simpsons -- Homer: "No way! When?" Ned Flanders: "Six months ago."

... He Had a Lot of Data

Visualizations of 911 statistics gathered by Ray

Blog Post circa 2010

Can We Recreate It?

Darkwing Duck saying 'Let's get dangerous.'

Use the Source, Luke

Luke Skywalker in Return of the Jedi saying, 'I warn you not to underestimate my power.'

Chrome / Firefox: Right-click > View Page Source

Into the Fire & Frames

Guitar Hero gameplay of Through the Fire and Flames

Into the Fire & Frames

A screenshot of lafayette911.org with no incidents listed

Where's the Data?

Data from Star Trek: The Next Generation walking offscreen

Some Possible Sources

Learn the DevTools

  • Chrome: View > Developer > Developer Tools
  • Firefox: Tools > Browser Tools > Web Developer Tools

Luke Danes from Gilmore Girls saying, 'With my tools.'

Check for XHRs

  1. Chrome / Firefox: DevTools > Network tab
  2. Chrome: Fetch/XHR filter, Firefox: XHR filter
  3. Chrome / Firefox: click request in table

Network tab in Firefox Developer Tools showing XHRs made by lafayette911.org

Inspect the Request

Firefox Developer Tools showing XHRs made by lafayette911.org

Inspect the Request


POST /L911/Service2.svc/getTrafficIncidents HTTP/2
Host: apps.lafayettela.gov

...

Maurice Moss from The IT Crowd saying, 'I can't turn down a friend request from my mum.'

Inspect the Request

  • POST = method or operation
  • /L911/Service2.svc/getTrafficIncidents = Uniform Resource Identifier (URI)
  • HTTP/2 = client protocol and version
  • Host = header name
  • apps.lafayettela.gov = header value
  • ... = request body
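
The parts listed above can be pulled apart with a few string functions. This is a minimal sketch for illustration: parseRequestHead() is a hypothetical helper, and it only handles the request line and headers as DevTools displays them.

```php
<?php
declare(strict_types=1);

/**
 * Splits the head of a raw HTTP request (request line plus headers) into
 * its parts. parseRequestHead() is a hypothetical helper for illustration.
 */
function parseRequestHead(string $head): array
{
    $lines = preg_split('/\r\n|\n/', trim($head));

    // Request line: method, URI, and protocol/version separated by spaces.
    [$method, $uri, $protocol] = explode(' ', array_shift($lines), 3);

    $headers = [];
    foreach ($lines as $line) {
        if ($line === '') {
            break; // a blank line ends the headers; the body follows
        }
        // Header line: name, a colon, then the value.
        [$name, $value] = explode(':', $line, 2);
        $headers[trim($name)] = trim($value);
    }

    return compact('method', 'uri', 'protocol', 'headers');
}

$parts = parseRequestHead(
    "POST /L911/Service2.svc/getTrafficIncidents HTTP/2\nHost: apps.lafayettela.gov\n"
);
```

Here $parts['method'] holds the method, $parts['uri'] the URI, $parts['protocol'] the protocol and version, and $parts['headers'] maps header names to values.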

Inspect the Response

Firefox Developer Tools showing an XHR response

Inspect the Response

Malcolm Reynolds from Firefly attempting to speak and failing

Inspect the Response

  • HTTP/2 = server protocol and version
  • 200 = status code
  • OK = status description
  • cache-control = header name
  • private = header value
  • {"d":"..."} = response body

HTTP

Mimic the Request

Streams / Filesystem
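
One way to mimic the request with streams and the filesystem functions is sketched below. The function names are hypothetical, the Content-Type header is an assumption, and the request body is elided on the slide, so it stays a parameter here. Copy the real headers and body from DevTools.

```php
<?php
declare(strict_types=1);

/**
 * Builds HTTP stream context options for a POST request.
 * The default Content-Type is an assumption, not taken from the slides.
 */
function buildPostContextOptions(string $body, string $contentType = 'application/json'): array
{
    return [
        'http' => [
            'method' => 'POST',
            'header' => [
                'Content-Type: ' . $contentType,
                'Content-Length: ' . strlen($body),
            ],
            'content' => $body,
            'ignore_errors' => true, // return the body even on 4xx/5xx
            'timeout' => 10,         // seconds
        ],
    ];
}

/**
 * Sends the request with file_get_contents() and returns the raw body.
 */
function postWithStreams(string $url, string $body): string
{
    $context = stream_context_create(buildPostContextOptions($body));
    $response = file_get_contents($url, false, $context);
    if ($response === false) {
        throw new RuntimeException('Request failed: ' . $url);
    }

    return $response;
}

// Usage (performs a network request, so it is left commented out):
// $json = postWithStreams(
//     'https://apps.lafayettela.gov/L911/Service2.svc/getTrafficIncidents',
//     $requestBody // the body shown in DevTools
// );
```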

Text Fu

"Programmers manipulate text the same way woodworkers shape wood."
From "The Pragmatic Programmer: Your Journey to Mastery, 20th Anniversary Edition" by David Thomas and Andrew Hunt

Neo from the film The Matrix saying, 'I know kung fu.'

Extract the Markup

JSON
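
The response body has the shape {"d":"..."}, so the markup can be pulled out of the "d" key with the JSON extension. A minimal sketch: extractMarkup() is a hypothetical name and the sample body is made up, since the real markup is elided on the slide.

```php
<?php
declare(strict_types=1);

/**
 * Returns the markup string stored under the "d" key of the JSON body.
 */
function extractMarkup(string $json): string
{
    // JSON_THROW_ON_ERROR raises JsonException instead of returning null.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data) || !isset($data['d']) || !is_string($data['d'])) {
        throw new UnexpectedValueException('Response has no string "d" key');
    }

    return $data['d'];
}

// Made-up sample body for illustration only.
$markup = extractMarkup('{"d":"<table><tr><td>Sample<\/td><\/tr><\/table>"}');
```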

Apu from The Simpsons saying, 'Look at the outrageous markup.'

Inspect the Markup

Handle Malformations

Ruby Rhod in the film The Fifth Element saying, 'I think mine is broken. Why I gotta get the broke one?'

Handle Malformations

  1. Install tidy extension
  2. Optionally, configure its options
  3. Parse markup
  4. Verify that malformations don't cause data loss
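
The steps above might look something like the following. This is a sketch under assumptions: the function names are hypothetical, the configuration values are illustrative choices rather than the ones used in the talk, and the data-loss check is only a rough comparison of text content.

```php
<?php
declare(strict_types=1);

/**
 * Repairs malformed markup with the tidy extension.
 */
function repairMarkup(string $markup): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not installed');
    }

    // Illustrative options; adjust them to the markup being repaired.
    $config = [
        'output-html' => true,
        'show-body-only' => true, // skip the generated <html>/<head> wrapper
        'wrap' => 0,              // do not hard-wrap long lines
    ];

    $tidy = tidy_parse_string($markup, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}

/**
 * Rough data-loss check: the text content, ignoring tags and whitespace,
 * should be the same before and after the repair.
 */
function textSurvivesRepair(string $before, string $after): bool
{
    $normalize = static fn (string $html): string => preg_replace(
        '/\s+/',
        '',
        html_entity_decode(strip_tags($html))
    ) ?? '';

    return $normalize($before) === $normalize($after);
}

// Usage (requires the tidy extension):
// $fixed = repairMarkup('<table><tr><td>100 Main St');
// var_dump(textSurvivesRepair('<table><tr><td>100 Main St', $fixed));
```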

Glen Matthews (AKA Janitor) in the TV series Scrubs making a 'represent' gesture with the caption 'The Clean Up Crew'

Parse the Markup

DOM / libxml
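
A minimal sketch of parsing with the DOM extension follows. extractCells() is a hypothetical helper, and the table-based sample markup is an assumption about what the service returns.

```php
<?php
declare(strict_types=1);

/**
 * Returns the trimmed text of every <td> cell in an HTML fragment.
 */
function extractCells(string $markup): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $cell) {
        $cells[] = trim($cell->textContent);
    }

    return $cells;
}

// Made-up sample markup for illustration only.
$cells = extractCells(
    '<table><tr><td> 100 Main St </td><td>Stalled Vehicle</td></tr></table>'
);
```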

Parse the Markup

For complex queries: XPath + DOMXPath
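
As a sketch of a more complex query, the example below selects the second cell of every data row. queryText() is a hypothetical helper, and the table layout and XPath expression are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Runs an XPath expression against an HTML fragment and returns the
 * trimmed text of each matching node.
 */
function queryText(string $markup, string $expression): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        throw new InvalidArgumentException('Invalid XPath expression: ' . $expression);
    }

    $results = [];
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }

    return $results;
}

// Rows that contain <td> cells (skipping the header row), second cell only.
$causes = queryText(
    '<table><tr><th>Address</th><th>Cause</th></tr>'
    . '<tr><td>100 Main St</td><td>Stalled Vehicle</td></tr>'
    . '<tr><td>200 Oak Ave</td><td>Vehicle Accident</td></tr></table>',
    '//tr[td]/td[2]'
);
```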

Parse the Markup

To use CSS selectors: symfony/css-selector

MDN: Tutorial, Reference

CSS Selectors

A comic strip by Julia Evans on CSS selectors

Trim the Address

String / Multibyte String
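
A minimal sketch of cleaning up an address with the string and multibyte string functions. normalizeAddress() is a hypothetical helper, and the title-casing step is an illustrative choice, not something the slides prescribe.

```php
<?php
declare(strict_types=1);

/**
 * Trims an address, collapses inner whitespace, and title-cases it.
 */
function normalizeAddress(string $raw): string
{
    // trim() does not strip non-breaking spaces (U+00A0), so convert them first.
    $text = str_replace("\u{00A0}", ' ', $raw);

    // Collapse runs of whitespace; the u modifier treats the input as UTF-8.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return mb_convert_case(trim($text), MB_CASE_TITLE, 'UTF-8');
}

// Made-up sample value for illustration only.
$address = normalizeAddress("  100  MAIN ST\u{00A0}\n");
```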

Parse the Address

PCRE
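
A sketch of splitting an address with a regular expression. The "number street, city" format is an assumption made for illustration (the real format is not shown on the slides), and parseAddress() is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Splits an address like "100 MAIN ST, LAFAYETTE" into named parts.
 * Returns null when the address does not start with a house number.
 */
function parseAddress(string $address): ?array
{
    // number: leading digits; street: everything up to an optional comma;
    // city: whatever follows the comma, if present.
    $pattern = '/^(?<number>\d+)\s+(?<street>[^,]+?)\s*(?:,\s*(?<city>.+))?$/u';

    if (preg_match($pattern, $address, $matches) !== 1) {
        return null;
    }

    return [
        'number' => $matches['number'],
        'street' => $matches['street'],
        'city' => $matches['city'] ?? null, // absent when there is no comma
    ];
}

// Made-up sample value for illustration only.
$parsed = parseAddress('100 MAIN ST, LAFAYETTE');
```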

Regular Expressions

Pattern Syntax / Modifiers

Regular Expressions

xkcd comic strip entitled 'Regular Expressions'

Parse the Date

Date and Time
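
A sketch of parsing a timestamp with the Date and Time extension. The input format is an assumption (the real one is not shown on the slides) and parseReportedAt() is a hypothetical helper. The time zone is America/Chicago because Lafayette, LA is in US Central Time.

```php
<?php
declare(strict_types=1);

/**
 * Parses a timestamp such as "04/20/2024 03:15:00 PM" as Central Time.
 */
function parseReportedAt(string $text): DateTimeImmutable
{
    $timezone = new DateTimeZone('America/Chicago');

    // "!" resets unparsed fields to zero instead of using the current time.
    $date = DateTimeImmutable::createFromFormat('!m/d/Y h:i:s A', $text, $timezone);
    if ($date === false) {
        throw new InvalidArgumentException('Unrecognized date: ' . $text);
    }

    return $date;
}

// Made-up sample value for illustration only.
$reportedAt = parseReportedAt('04/20/2024 03:15:00 PM');
```

From there, $reportedAt->format('c') gives an ISO 8601 string suitable for storage.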

Thank This Guy

Derick Rethans

We Did It!

Borat saying 'Great Success'

Now What?

  • Store data in JSON files or a database
  • Put it behind a web server or API
  • Add a Google Maps frontend
  • Profit!

An underpants gnome from South Park asking, 'Hey, what's phase two?!'

Lafayette Traffic

Google Maps showing Lafayette, LA with traffic incidents marked

Google Maps showing Lafayette, LA with a traffic incident overlay

Lafayette Traffic showing a live city traffic camera feed

Google Play / GitHub: Android, Data

So What?

Gregory House from the TV series House, MD shrugging with the caption 'and?'

On JIRA, Briefly

Screenshot of a project board in JIRA

Standards

The strip of the XKCD comic entitled 'Standards'

cURL

A man participating in the sport of curling

Copy as cURL

  • Chrome: DevTools > Network tab > right-click request > Copy > Copy as cURL
  • Firefox: DevTools > Network tab > right-click request > Copy Value > Copy as cURL

Chris Pratt feigning a bicep curl with a beer

Recreate Requests

  1. Install frizz925/curl-parser
  2. Provide copied cURL command to parser

Recreate Requests

Extract request data

Recreate Requests

Get request object

Send a Request

  1. Install guzzlehttp/guzzle
  2. Create the client
  3. Configure options and send the request

Using Responses

Debug Requests

  1. Install alexkart/curl-builder
  2. Download cURL

A girl blowing a hair curl out of her face

Debug Requests

Convert request to server request

Debug Requests

Convert server request to cURL command

Debug Requests

Tweak cURL command as needed

  • Add the -v flag to get more verbose output
  • If the request uses the POST method and has no body, add the -X flag with POST as its argument

Debug Requests

Run the cURL command

Repeat Requests

A repeater from the game Plants vs. Zombies

Repeat Requests

  • Chrome: DevTools > Network tab > right-click in request pane > Save all as HAR with content
  • Firefox: DevTools > Network tab > right-click in request pane > Save All as HAR
  • HAR Analyzer
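
A saved HAR file is JSON, so its requests can be listed with the JSON extension before replaying them. This is a minimal sketch: listHarRequests() is a hypothetical helper and the sample HAR content is made up, though the log, entries, request, and postData keys follow the HAR format.

```php
<?php
declare(strict_types=1);

/**
 * Returns the method, URL, and body of each request recorded in a HAR file.
 */
function listHarRequests(string $harJson): array
{
    $har = json_decode($harJson, true, 512, JSON_THROW_ON_ERROR);

    $requests = [];
    foreach ($har['log']['entries'] ?? [] as $entry) {
        $request = $entry['request'];
        $requests[] = [
            'method' => $request['method'],
            'url' => $request['url'],
            // postData is only present for requests that carry a body.
            'body' => $request['postData']['text'] ?? null,
        ];
    }

    return $requests;
}

// Made-up HAR content for illustration; real files come from DevTools.
$sampleHar = <<<'JSON'
{"log": {"version": "1.2", "entries": [
  {"request": {"method": "GET", "url": "https://example.com/"}},
  {"request": {"method": "POST", "url": "https://example.com/api",
               "postData": {"mimeType": "application/json", "text": "{}"}}}
]}}
JSON;

$requests = listHarRequests($sampleHar);
```

For a file saved from the browser, pass file_get_contents() of its path instead of the sample string.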

Bring Out the Big Guns

Korben Dallas in the film The Fifth Element pointing a gun and laughing

Automate the Browser

Automate the Browser

symfony/dom-crawler

This is Fine

Meme from Adult Swim of a cartoon dog sitting in a house on fire saying, 'This is fine.'

WebDriver Flag

Dr. Sheldon Cooper from The Big Bang Theory introducing his program Fun with Flags

Other WebDriver Flag

Former US president Barack Obama failing to dunk a large cookie into a glass of milk before saying, 'Thanks Obama.'

Headless Mode

Quick Demo

Debugging Tips

Free Project Idea

  • PsySH
  • Add integration with symfony/panther
  • e.g. custom commands to interactively fetch pages, filter and interact with elements, etc.

Fry from the TV series Futurama saying, 'Shut up and take my money!'

Other Resources

Leonardo DiCaprio in a tuxedo smiling and raising a glass toward the viewer

That's All, Folks

Outro for Looney Tunes with Porky Pig saying, 'That's all, folks.'