Centralized Logging with Monolog, Logstash, and Elasticsearch

Published in Logging, Elasticsearch on Jun 18, 2014

This was originally posted to engineering.blopboard.com which is now defunct.

It wasn't very long ago that Blopboard went into public beta. Once the application was in production, our team needed to be able to monitor what its components — which are spread across multiple servers — were doing in order to debug issues. I want to share how I went about setting up centralized logging with hope of helping someone in a similar situation.

I took some inspiration for this from a blog post by Scott Mattocks and a presentation by Jeremy Cook.

What We Wanted

We needed to be able to collect all our logs in one place for easy access. We wanted to be able to search the logs. We wanted to be able to visualize our log data. We wanted to use open source projects if they were available.

Architecture

After some research we decided to use Logstash backed by Elasticsearch with its Kibana frontend.

Application components log to the filesystems of their respective servers, ensuring no data is lost if the logging infrastructure is temporarily unavailable.

Separate processes on those servers monitor the log files and send new messages written to them over TCP to our Logstash server. Logstash then parses the messages and indexes them on an Elasticsearch instance.

Application Logging

We were already using Monolog to handle our application logging, which handily comes with a Logstash-compatible formatter. Below is an example of how we added that to our existing logger configuration.

<?php
use Monolog\Logger;
use Monolog\Formatter\LogstashFormatter;
use Monolog\Handler\StreamHandler;

// stream resource or URI, e.g. file path
// $stream = ...;

// logging level from Logger constants, e.g. Logger::DEBUG
// $level = ...;

// Logstash "channel" field value, identifies the codebase area,
// application subsystem, integration point, etc.
// $channel = ...;

// Logstash "type" field value, identifies the application
// $type = ...;

// Logstash "source" field value, identifies the server
// $source = ...;

$logger = new Logger($channel);
$handler = new StreamHandler($stream, $level);
$formatter = new LogstashFormatter($type, $source);
$handler->setFormatter($formatter);
$logger->pushHandler($handler);

Pushing Messages to Logstash

The next step is to get the log data from the filesystem to Logstash. To do this, we have a daemon shell script that sends the data to Logstash via TCP. We use Chef to deploy this script to all machines with log files that require monitoring and a Monit configuration file to start the shell script and keep it running on those machines.

The Chef recipe ensures the shell script destination exists, copies the shell script file and Monit configuration (in case either has changed) to their respective paths, reloads monit's configuration, and restarts the Monit service associated with the shell script.

directory "/var/lib/blopboard" do
    user "root"
    group "root"
    mode 00644
    recursive true
    action :create
end

cookbook_file "/var/lib/blopboard/logstash.sh" do
    source "logstash.sh"
    owner "root"
    group "root"
    mode 00744
    action :create
end

cookbook_file "/etc/monit/conf.d/logstash.monitrc" do
    source "logstash.monitrc"
    owner "root"
    group "root"
    mode 00744
    action :create
end

bash "monit_start_service" do
    user "root"
    code <<-EOH
monit reload
monit restart logstash
EOH
end

The Monit configuration specifies how to start and stop the shell script, which uses a file to store the PID of the shell script process so that Monit can determine on each cycle whether or not that process is still running.

check process logstash with pidfile /var/run/logstash.pid
    every 1 cycles
    start program = "/bin/bash /var/lib/blopboard/logstash.sh start"
    stop program = "/bin/bash /var/lib/blopboard/logstash.sh stop"

The shell script contains a list of log file paths that may exist on any of our servers. It passes the paths of those that exist to the tail command, which monitors those files for new messages. Those messages are then piped to the nc command, which acts as a TCP client to send the messages to Logstash.

#!/bin/bash
#logstash.sh
LOGSTASH_HOST="logstash.yourhost.com"
LOGSTASH_PORT="9301"
BLOPBOARD_LOGS="/var/log/blopboard"
PID_FILE="/var/run/logstash.pid"
LOGS=(
		"$BLOPBOARD_LOGS/app.log"
		# ...
	)

# Determine which log files to monitor
args=""
for log in ${LOGS[@]}; do
	if [ -f $log ]; then
		args+=" $log"
	fi
done

# Either start or stop the command
if [ "$args" != "" ]; then
    tailcmd="tail -q -n 0 -f$args"
    cmd="$tailcmd | nc $LOGSTASH_HOST $LOGSTASH_PORT"
    case "$1" in
        'start')
            echo "$$" > $PID_FILE
            bash -c "$cmd"
            ;;
        'stop')
            kill -9 `cat $PID_FILE`
            pgrep -f "$tailcmd" | xargs kill -9
            ;;
    esac
fi

Installing Logstash

The virtual machine running Logstash needs ports opened for a few things:

A web server to serve up the files for Kibana; I used port 8080 for this
Logstash to listen for messages from other machines; 9301 is the default port for this on our machine

Other related ports include 9200 (Elasticsearch REST API) and 9300 (Elasticsearch listener).

Some Linux distributions run behind in their Logstash package versions. Luckily, the Logstash project offers packages for the latest stable build, and they even bundle Elasticsearch and Kibana so you don't have to install them separately. Installing it on Debian-based distributions is pretty straightforward:

wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
echo -e "\ndeb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install logstash

Most files are installed under /opt/logstash. You can use dpkg-query -L logstash to get a full listing of the installed files.

Installing Elasticsearch

While Logstash does come with an embedded Elasticsearch installation that can run when you start Logstash, you may want to maintain a separate installation of Elasticsearch for other uses. Elasticsearch also has stable package releases available. Here's how to install it on a Debian-based distribution:

wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
echo -e "\ndeb http://packages.elasticsearch.org/elasticsearch/1.1/debian stable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install elasticsearch
sudo update-rc.d elasticsearch defaults 95 10
sudo /etc/init.d/elasticsearch start

Installing Nginx

While the Logstash package does include Kibana, it does not include a web server to serve it up. For this, I chose Nginx. Below are commands to install and configure it. You'll want to change the SERVER_NAME and SERVER_PORT to suit your needs.

SERVER_NAME="logstash.yourserver.com"
SERVER_PORT="8080"
sudo apt-get install nginx
sudo tee /etc/nginx/sites-available/$SERVER_NAME >/dev/null <<EOL
server {
  listen $SERVER_PORT;
  root /opt/logstash/vendor/kibana;
  index index.html;
  server_name $SERVER_NAME;
  location / {
    try_files \$uri \$uri/ /index.html;
  }
}
EOL
sudo ln -s /etc/nginx/sites-available/$SERVER_NAME /etc/nginx/sites-enabled/$SERVER_NAME
sudo service nginx start

Configuring and Running Logstash

Now that all the components used by Logstash are installed, Logstash itself must be configured to use them. Here's how to create the Logstash configuration file:

sudo tee /etc/logstash/conf.d/tcp-to-elasticsearch >/dev/null <<EOL
input {
  tcp {
    port => 9301
    codec => "json_lines"
  }
}
output {
  elasticsearch {
    embedded => true
  }
}

This specifies that the TCP input plugin and Elasticsearch output plugin are used. The same port is specified for the TCP plugin as was specified in the shell script shown earlier. The TCP plugin uses the json_lines codec because Monolog sends log messages as newline-delimited JSON objects.

It's explicitly specified that the embedded Elasticsearch instance should be used in this example. You can omit this line to have it use a separate instance running on the same machine, or specify hosts for one or more external Elasticsearch servers.

Finally, to run Logstash with the configuration file that we've added:

# Add --configtest flag to test configuration
sudo /opt/logstash/bin/logstash \
	--verbose -l /var/log/logstash/tcp-to-elasticsearch.log \
	-f /etc/logstash/conf.d/tcp-to-elasticsearch &

Using Kibana

You can now verify that Logstash is running by viewing the default Logstash dashboard in Kibana. To do this, open a web browser and browse to host and port on which Nginx was configured. To issue a query in Kibana, use the Lucene query syntax.

Beyond that, you can click icons at the top right of Kibana as well as on the top right of each widget and at the top left of each row of widgets to add and configure dashboards to suit your needs and save configuration for them to your Elasticsearch instance.

Epilogue

I hope you've enjoyed this walk-through of our logging infrastructure and that it proves useful to you. Feel free to leave related questions and discussion in the comments. Thanks for reading!