Serverless Forum Keyword Monitoring with PHP
I recently came across a post entitled "How To Monitor a Forum for Keywords Using Python and AWS Lambda" in my Twitter feed. While there are a number of things I like about Python as a language, the Python package management system isn't one of them. PHP is my lingua franca and Composer has ruined every other language package management system for me.
I read the post and thought, "I wonder how difficult it would be to do this in PHP?" My friend Rob Allen gave a talk on serverless PHP applications at the Midwest PHP 2020 conference recently. I recently published the second edition of my book on using PHP for web scraping. So, I rolled my sleeves up and got to it.
Prerequisites
As with the original post, we'll use the Serverless framework, but in conjunction with the Bref framework that Rob mentions in his talk.
To fetch and parse HTML from the forum, we'll use the Symfony HttpClient, BrowserKit, and CssSelector libraries. (If you have my book, you can find more information on these in Chapter 14.)
I personally like keeping my host operating system minimal and tidy, so I run as many things as I feasibly can using Docker containers. The easiest way I've found to do that with Bref is to use the same Docker image used for its development.
First, we'll install the dependencies mentioned above using Composer.
docker run --rm -it -v $(pwd):/var/task bref/dev-env \
composer require bref/bref symfony/http-client \
symfony/browser-kit symfony/css-selector
Next, we'll initialize the project to get our serverless configuration and initial boilerplate code, using defaults when prompted.
docker run --rm -it -v $(pwd):/var/task bref/dev-env \
php ./vendor/bin/bref init
Opening the generated serverless.yml
file, we see that Bref defaults to using the layer for PHP 7.3. We'll change it to use 7.4 instead, which is what its development Docker image uses, and update the function name while we're at it.
functions:
indiehackers: # Change this to an appropriate function name
handler: index.php
description: ''
layers:
- ${bref:layer.php-74} # Change 73 on this line to 74
Keyword Monitoring
As in the original post, we're going to target Indie Hackers, specifically post titles on the landing page.
To do this, we'll open the generated index.php
file and edit it to have these contents.
<?php declare(strict_types=1);
require __DIR__ . '/vendor/autoload.php';
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;
return function () {
$base_url = 'https://www.indiehackers.com';
$browser = new HttpBrowser(HttpClient::create());
$browser->request('GET', $base_url);
$crawler = $browser->getCrawler();
$links = $crawler->filter('a.feed-item__title-link');
$filtered = new CallbackFilterIterator(
$links->getIterator(),
function ($link) {
return stripos($link->textContent, 'design') !== false;
}
);
$formatted = array_map(
function ($link) use ($base_url) {
return $base_url . $link->getAttribute('href');
},
array_values(iterator_to_array($filtered))
);
return $formatted;
};
Using the aforementioned Symfony component libraries, we do the following.
- Fetch the landing page of the target web site.
- Parse the page for feed links.
- Filter those links for the phrase "design."
- Extract the link addresses and format them to be absolute URLs.
- Return the URLs.
If we wanted to integrate this with Slack, it would be simple to do so with the maknz/slack library. While it is no longer being maintained at the time of this writing, it works and makes sending Slack messages via an incoming webhook very simple.
The following code segment can be placed before the return
statement in the above function.
<?php
$client = new Maknz\Slack\Client(WEBHOOK_URL);
foreach ($formatted as $link) {
$client->send($link);
}
Deployment and Invocation
To have this function run once a day, open serverless.yml
and modify the block you changed earlier to include the events
block shown below.
functions:
indiehackers:
handler: index.php
description: ''
layers:
- ${bref:layer.php-74}
events: # Add this block
- schedule: rate(1 day)
To deploy this, create AWS keys and run the following, replacing KEY
and SECRET
with the key and secret values for your IAM user respectively.
docker run --rm -it -v $(pwd):/var/task \
--env AWS_ACCESS_KEY_ID=KEY \
--env AWS_SECRET_ACCESS_KEY=SECRET \
bref/dev-env serverless deploy
Finally, to invoke the function manually:
docker run --rm -it -v $(pwd):/var/task \
--env AWS_ACCESS_KEY_ID=KEY \
--env AWS_SECRET_ACCESS_KEY=SECRET \
bref/dev-env serverless invoke -f indiehackers
Here's an example of this function's output.
[
"https://www.indiehackers.com/post/advice-recommendations-for-improving-design-and-ux-of-dashboard-778bfdbe52"
]
Conclusion
The amount of effort involved in this seemed at least on par with what's required on the Python side, minus any potential package management system shenanigans. Overall, I was pleased with the results.
I hope you found this post useful. Thanks for reading!