Saturday, August 30, 2008

JMX for Scripts : Monitoring and Scheduling

Last week, I described a simple JMX setup to manage remotely running shell scripts. Using the HTML Adapter supplied in the JMX toolkit, we were able to provide a nice web-based front end that allowed us to start and stop the scripts, as well as observe its logs and whether it is running or not. The setup so far is functional, but not overly convenient. For example, instead of having to point your browser to the MBean page (and remember, there could be multiple machines, each running a bunch of scripts that you have to manage) once every day or hour, you may want the server to notify you (maybe send you an email or an SMS) if there is anything amiss. So over the last week, I added this sort of functionality to my JMX server, which I describe in this post.

Fortunately, JMX provides these sort of features right out of the box, in the form of Monitor and Timer objects, so the work is mainly to choose the right component for the job, then hook these components up together correctly. The diagram below shows the components I use for each of my Script adapter MBeans in my toy example:

Looks impressive, doesn't it? Well, essentially, all it is saying is that we have two Monitors and one Timer attached to each Script Adapter. The two Monitors periodically poll the Script Adapter and send out notifications into the MBean server's event pool. These are picked up by the NotificationFilters connected with each ScriptAdapter, and passed through to the Listener object, which handles the notification. The Timer is slightly different, it just sends events on a schedule, which gets picked up by the Notification Filter, and passed through to the configured Listener.

Please note: The code for all the classes is provided towards the end of the article. I have shown snippets of code where it is more informative than describing it in English, but because of the high degree of reuse, there is a corresponding number of forward references, which may be hard for a reader to reconcile if I provided full code inline.

Status Monitoring

This monitor is a StringMonitor which calls getStatus() on the ScriptAdapter every minute and looks for an exact match to the the String "ERROR". If it finds it, then it sends out an email. Here is the snippet of code from ScriptAgent.java which does this (we provide the full source below).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
    // add string monitor checking for getStatus() != "ERROR"
    StringMonitor statusMonitor = new StringMonitor();
    statusMonitor.addObservedObject(script);
    statusMonitor.setObservedAttribute("Status");
    statusMonitor.setGranularityPeriod(60000L); // check every minute
    statusMonitor.setStringToCompare("ERROR");  // look for errors...
    statusMonitor.setNotifyMatch(true);         // ...and trigger notifications on match
    ObjectName statusMonitorObjectName =
      new ObjectName("monitor:type=Status,script=" + getScriptName(script));
    server.registerMBean(statusMonitor, statusMonitorObjectName);
    statusMonitor.start();
    // ...and associated listener object to send an email once that happens
    server.addNotificationListener(statusMonitorObjectName,
      new EmailNotifierListener(script),
      new ScriptNotificationFilter(script),
      statusMonitorObjectName);

The first few lines are just instantiating the StringMonitor, linking it up to the ScriptAdapter and the getStatus(), then setting its properties before we register and start it. Linking this up to the listener and filter is done in the server.addNotificationListener() call.

To test this monitor, start up the server in a terminal. Switch to either the MBean server and notice that both scripts are running fine. Then on another terminal, navigate to the /tmp directory and create an empty .err file. Under Unix, this would be something like this:

1
2
3
sujit@sirocco:~$ cd /tmp
sujit@sirocco:/tmp$ touch count_sheep_Dolly.err
sujit@sirocco:/tmp$ 

Back on the terminal where your MBean server was started up, you should see the following output from the MBean server. It may not be instantaneous, since the monitor is run on a schedule, but it should be within a few seconds.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
...
EmailNotifierListener: sending email...
From: scriptmanager@clones-r-us.com
To: dr_evil@clones-r-us.com
Date: Wed Aug 27 08:31:56 GMT-08:00 2008
Subject: Alarm for [script:name=count_sheep_Dolly:Status]
--
Script [script:name=count_sheep_Dolly] reported ERROR.
--
...

Log size Monitoring

Our next monitor is a Log size monitor which checks the size of the log size every 30 seconds to make sure it is growing. To do this, we need to use a GaugeMonitor in difference mode. The code snippet to do this is shown below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
    // add gauge monitor to determine if process hung, ie if log files don't grow
    GaugeMonitor logsizeMonitor = new GaugeMonitor();
    logsizeMonitor.addObservedObject(script);
    logsizeMonitor.setObservedAttribute("LogFilesize");
    logsizeMonitor.setDifferenceMode(true);           // check diffs(logsize)
    logsizeMonitor.setThresholds(Long.MAX_VALUE, 1L); // notify if less than 1 (0)
    logsizeMonitor.setNotifyLow(true);
    logsizeMonitor.setNotifyHigh(false);
    logsizeMonitor.setGranularityPeriod(30000L);      // every 30 seconds
    ObjectName logsizeMonitorObjectName =
      new ObjectName("monitor:type=Logsize,script=" + getScriptName(script));
    server.registerMBean(logsizeMonitor, logsizeMonitorObjectName);
    logsizeMonitor.start();
    // ...and the associated listener object to send an email once this happens
    server.addNotificationListener(logsizeMonitorObjectName,
      new EmailNotifierListener(script),
      new ScriptNotificationFilter(script),
      logsizeMonitorObjectName);

The configuration is similar to the one above, first we instantiate the GaugeMonitor, then we configure it to observe ScriptAdapter.LogfileSize, then we configure the GaugeMonitor's behavior. Difference mode is selected because we want to make sure that the difference between consecutive readings of ScriptAdapter.LogfileSize is greater than 0. As before, the last line sets up the listener and filter objects.

To test this monitor, start up the server in a terminal and make sure that the scripts are running. Then on another terminal, kill one of the scripts, like so:

1
2
3
4
5
sujit@sirocco:~$ cd /tmp
sujit@sirocco:/tmp$ cat count_sheep_Dolly.pid
3714
sujit@sirocco:/tmp$ kill -9 3714
sujit@sirocco:/tmp$ 

On the terminal window where the MBean server has been started, you should see the trace of the email being sent after a short while.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
...
EmailNotifierListener: sending email...
From: scriptmanager@clones-r-us.com
To: dr_evil@clones-r-us.com
Date: Wed Aug 27 08:28:18 GMT-08:00 2008
Subject: Alarm for [script:name=count_sheep_Dolly:LogFilesize]
--
Script [script:name=count_sheep_Dolly] is HUNG.
--
...

Heartbeat Timer

We also have a Heartbeat mechanism for our scripts using a JMX Timer Monitor bean. This is a Timer that sends a notification every minute. This notification is picked up (via the NotificationFilter) by the CprListener, which sends a start() call to the ScriptAdapter. The ScriptAdapter.start() method is designed to check if the script is running, and only pass the command through if the script is not running. Here is the code snippet to set it up:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
    String scriptName = getScriptName(script);
    Timer heartbeatTimer = new Timer();
    heartbeatTimer.addNotification("heartbeat", scriptName,
      null, new Date(), 60000L); // every minute
    ObjectName heartbeatTimerObjectName =
      new ObjectName("timer:type=heartbeat,script=" + scriptName);
    server.registerMBean(heartbeatTimer, heartbeatTimerObjectName);
    heartbeatTimer.start();
    server.addNotificationListener(heartbeatTimerObjectName,
      new CprListener(script),
      new ScriptNotificationFilter(script),
      heartbeatTimerObjectName);

In this case, notice that the notification has a type called "heartbeat", which is what the Listener object will use to determine if it should handle the query. To test this, as before, start the MBean server on a terminal, then just wait until the Timer has a chance to kick in. You should see the following on the MBean Server console.

1
2
3
4
5
6
...
Performing CPR on script:name=count_sheep_Dolly
19:11:14: Attempted start, but script: count_sheep_Dolly already started
Performing CPR on script:name=count_sheep_Polly
19:11:14: Attempted start, but script: count_sheep_Polly already started
...

Timers can also be used with non-daemon scripts, to schedule them to run at a particular time of day.

Automatic script startup

One other thing I did was to start my scripts automatically via the JMX agent. Without JMX, if you have a bunch of application daemons running on a machine, you would probably create individual start/stop scripts for them in your /etc/init.d directory. With a JMX server managing your beans, you just start the JMX server using a start script in your /etc/init.d directory, and let it start the scripts that are registered to it. As an interesting side effect, this will also make you more popular with your Unix system administrator(s), since your popularity is computed as an inverse of the number of times your application daemons crash between midnight and 2am.

All I had to do for this was to add in a start() call in the init() method, and kill() calls in the destroy() method of my ScriptAgent class, and add a shutdown hook in the main() method to ensure that destroy() gets called on server shutdown. You will see this in action as you start and stop your MBean server.

1
2
3
4
5
6
7
8
...
[INFO] [exec:java]
08:52:31: Starting script: count_sheep_Dolly
08:52:32: Starting script: count_sheep_Polly
...
^C08:58:49: Killing script: count_sheep_Dolly
08:58:49: Killing script: count_sheep_Polly
sujit@sirocco:~$ 

The code

ScriptAdapterMBean.java

Hasn't changed a whole lot, but we did add in a new method, so it may be just good to provide the newest code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
// Source: src/main/java/com/mycompany/myapp/ScriptAdapterMBean.java
package com.mycompany.myapp;

public interface ScriptAdapterMBean {

  // read-only properties
  public boolean isRunning();
  public String getStatus();
  public String getLogs();
  public long getRunningTime();
  public long getLogFilesize();
  
  // operations
  public void start();
  public void kill();
}

ScriptAdapter.java

The only thing thats changed since last week is the implementation of the getLogFilesize() which is a new method in the interface.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
// Source: src/main/java/com/mycompany/myapp/ScriptAdapter.java
package com.mycompany.myapp;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.time.DateFormatUtils;
import org.springframework.jmx.JmxException;

public class ScriptAdapter implements ScriptAdapterMBean {

  private static final String LOG_DIR = "/tmp";
  private static final String ERR_DIR = "/tmp";
  private static final String PID_DIR = "/tmp";
  
  private String path;
  private String args;

  private String adapterName;
  
  private String pidfileName;
  private String logfileName;
  private String errfileName;
  
  private Process process;
  
  public ScriptAdapter(String path, String[] args) {
    this.path = path;
    this.args = StringUtils.join(args, " ");
    this.adapterName = StringUtils.join(new String[] {
      FilenameUtils.getBaseName(path),
      (args.length == 0 ? "" : "_"),
      StringUtils.join(args, "_")
    });
    this.pidfileName = FilenameUtils.concat(PID_DIR, adapterName + ".pid");
    this.logfileName = FilenameUtils.concat(LOG_DIR, adapterName + ".log");
    this.errfileName = FilenameUtils.concat(ERR_DIR, adapterName + ".err");
  }

  /**
   * Not part of the MBean so it will not be available as a manageable attribute.
   * @return the computed adapter name.
   */
  public String getAdapterName() {
    return adapterName;
  }
  
  /**
   * Checks for existence of the PID file. Uses naming conventions
   * to locate the correct pid file.
   */
  public boolean isRunning() {
    File pidfile = new File(pidfileName);
    return (pidfile.exists());
  }

  /**
   * If isRunning, then status == RUNNING.
   * If isRunning and .err file exists, then status == ERROR
   * If !isRunning and .err file does not exist, then status == READY 
   */
  public String getStatus() {
    File errorfile = new File(errfileName);
    if (errorfile.exists()) {
      return "ERROR";
    } else {
      if (isRunning()) {
        return "RUNNING";
      } else {
        return "READY";
      }
    }
  }

  public String getLogs() {
    if ("ERROR".equals(getStatus())) {
      File errorfile = new File(errfileName);
      try {
        return FileUtils.readFileToString(errorfile, "UTF-8");
      } catch (IOException e) {
        throw new JmxException("IOException getting error file", e);
      }
    } else {
      try {
        Process tailProcess = Runtime.getRuntime().exec(
          StringUtils.join(new String[] {"/usr/bin/tail", "-10", logfileName}, " "));
        tailProcess.waitFor();
        BufferedReader console = new BufferedReader(
          new InputStreamReader(tailProcess.getInputStream()));
        StringBuilder consoleBuffer = new StringBuilder();
        String line;
        while ((line = console.readLine()) != null) {
          consoleBuffer.append(line).append("\n");
        }
        console.close();
        tailProcess.destroy();
        return consoleBuffer.toString();
      } catch (IOException e) {
        e.printStackTrace();
        throw new JmxException("IOException getting log file", e);
      } catch (InterruptedException e) {
        throw new JmxException("Tail interrupted", e);
      }
    }
  }

  /**
   * Returns the difference between the PID file creation and the current
   * system time.
   */
  public long getRunningTime() {
    if (isRunning()) {
      File pidfile = new File(pidfileName);
      return System.currentTimeMillis() - pidfile.lastModified();
    } else {
      return 0L;
    }
  }

  /**
   * Returns the current size of the log file in bytes.
   * @return the current size of the log file.
   */
  public long getLogFilesize() {
    File logfile = new File(logfileName);
    return logfile.length();
  }

  public void start() {
    try {
      if (! isRunning()) {
        log("Starting script: " + adapterName);
        process = Runtime.getRuntime().exec(
          StringUtils.join(new String[] {path, args}, " "));
        // we don't wait for it to complete, just start it
      } else {
        log("Attempted start, but script: " + adapterName + " already started");
      }
    } catch (IOException e) {
      throw new JmxException("IOException starting process", e);
    }
  }

  public void kill() {
    if (isRunning()) {
      File pidfile = new File(pidfileName);
      log("Killing script: " + adapterName);
      try {
        String pid = FileUtils.readFileToString(pidfile, "UTF-8");
        Runtime.getRuntime().exec(StringUtils.join(new String[] {
          "/usr/bin/kill", "-9", pid}, " "));
        if (process != null) {
          // remove hanging references
          process.destroy();
        }
        pidfile.delete();
      } catch (IOException e) {
        throw new JmxException("IOException killing process", e);
      }
    }
  }
  
  private void log(String message) {
    System.out.println(DateFormatUtils.ISO_TIME_NO_T_FORMAT.format(new Date()) + 
      ": " + message);
  }
}

ScriptAgent.java

This class has had some pretty huge changes since last week. For each script adapter, we call the registerMonitors() and registerTimer() which addes the 2 monitors and 1 timer to each script adapter. Each of these monitors are linked to the appropriate Listener in these two extra methods. The init() method starts up the scripts on MBean server startup, and there is a new destroy() method which will kills the scripts on server shutdown.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
// Source: src/main/java/com/mycompany/myapp/ScriptAgent.java
package com.mycompany.myapp;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.monitor.CounterMonitor;
import javax.management.monitor.GaugeMonitor;
import javax.management.monitor.StringMonitor;
import javax.management.timer.Timer;

import com.sun.jdmk.comm.HtmlAdaptorServer;

public class ScriptAgent {

  // the port number of the HTTP Server Adapter
  private final static int DEFAULT_AGENT_PORT = 8081;
  
  private MBeanServer server;
  private List<ScriptAdapter> scriptAdapters = new ArrayList<ScriptAdapter>();
  
  public ScriptAgent() {
    super();
  }
  
  public void addScriptAdapter(ScriptAdapter scriptAdapter) {
    this.scriptAdapters.add(scriptAdapter);
  }

  protected void init() throws Exception {
    server = MBeanServerFactory.createMBeanServer();
    // load all script adapters
    for (ScriptAdapter scriptAdapter : scriptAdapters) {
      ObjectName script = new ObjectName(
          "script:name=" + scriptAdapter.getAdapterName());
      server.registerMBean(scriptAdapter, script);
      // each script will have a monitor to check if its running and
      // if its log file size is growing
      registerMonitors(script);
      // each script will have a timer to attempt restart every 1m
      registerTimer(script);
      // start all script services
      scriptAdapter.start();
    }
    // load HTML adapter
    HtmlAdaptorServer adaptor = new HtmlAdaptorServer();
    adaptor.setPort(DEFAULT_AGENT_PORT);
    server.registerMBean(adaptor, new ObjectName("adapter:protocol=HTTP"));
    // start 'er up!
    adaptor.start();
  }
  
  protected void destroy() throws Exception {
    for (ScriptAdapter scriptAdapter : scriptAdapters) {
      // stop all script services
      scriptAdapter.kill();
    }
  }
  
  private void registerMonitors(ObjectName script) throws Exception {
    // add string monitor checking for getStatus() != "ERROR"
    StringMonitor statusMonitor = new StringMonitor();
    statusMonitor.addObservedObject(script);
    statusMonitor.setObservedAttribute("Status");
    statusMonitor.setGranularityPeriod(60000L); // check every minute
    statusMonitor.setStringToCompare("ERROR");  // look for errors...
    statusMonitor.setNotifyMatch(true);         // ...and trigger notifications on match
    ObjectName statusMonitorObjectName = 
      new ObjectName("monitor:type=Status,script=" + getScriptName(script)); 
    server.registerMBean(statusMonitor, statusMonitorObjectName);
    statusMonitor.start();
    // ...and associated listener object to send an email once that happens
    server.addNotificationListener(statusMonitorObjectName, 
      new EmailNotifierListener(script), 
      new ScriptNotificationFilter(script), 
      statusMonitorObjectName);
    
    // add gauge monitor to determine if process hung, ie if log files don't grow
    GaugeMonitor logsizeMonitor = new GaugeMonitor();
    logsizeMonitor.addObservedObject(script);
    logsizeMonitor.setObservedAttribute("LogFilesize");
    logsizeMonitor.setDifferenceMode(true);           // check diff(logsize)
    logsizeMonitor.setThresholds(Long.MAX_VALUE, 1L); // notify if <1 (0)
    logsizeMonitor.setNotifyLow(true);
    logsizeMonitor.setNotifyHigh(false);
    logsizeMonitor.setGranularityPeriod(30000L);      // every 30 seconds
    ObjectName logsizeMonitorObjectName = 
      new ObjectName("monitor:type=Logsize,script=" + getScriptName(script));
    server.registerMBean(logsizeMonitor, logsizeMonitorObjectName);
    logsizeMonitor.start();
    // ...and the associated listener object to send an email once this happens
    server.addNotificationListener(logsizeMonitorObjectName, 
      new EmailNotifierListener(script), 
      new ScriptNotificationFilter(script), 
      logsizeMonitorObjectName);
  }

  private void registerTimer(ObjectName script) throws Exception {
    String scriptName = getScriptName(script);
    Timer heartbeatTimer = new Timer();
    heartbeatTimer.addNotification("heartbeat", scriptName, 
      null, new Date(), 60000L); // every minute
    ObjectName heartbeatTimerObjectName = 
      new ObjectName("timer:type=heartbeat,script=" + scriptName);
    server.registerMBean(heartbeatTimer, heartbeatTimerObjectName);
    heartbeatTimer.start();
    server.addNotificationListener(heartbeatTimerObjectName, 
      new CprListener(script), 
      new ScriptNotificationFilter(script), 
      heartbeatTimerObjectName);
  }

  private String getScriptName(ObjectName script) {
    String canonical = script.getCanonicalName();
    return canonical.substring("script:name=".length());
  }

  public static void main(String[] args) {
    final ScriptAgent agent = new ScriptAgent();
    Runtime.getRuntime().addShutdownHook(new Thread() {
      public void run() {
        try {
          agent.destroy();
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    });
    try {
      agent.addScriptAdapter(new ScriptAdapter(
        "/home/sujit/src/rscript/src/main/sh/count_sheep.sh", new String[] {"Dolly"}));
      agent.addScriptAdapter(new ScriptAdapter(
        "/home/sujit/src/rscript/src/main/sh/count_sheep.sh", new String[] {"Polly"}));
      agent.init();
    } catch (Exception e) {
      e.printStackTrace(System.err);
    }
  }
}

ScriptNotificationFilter.java

The ScriptNotificationFilter is an optional item that makes sure that events raised for a script are handled by the listener that it configured for that script.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
// Source: src/main/java/com/mycompany/myapp/ScriptNotificationFilter.java
package com.mycompany.myapp;

import javax.management.Notification;
import javax.management.NotificationFilter;
import javax.management.ObjectName;
import javax.management.monitor.MonitorNotification;
import javax.management.timer.TimerNotification;

public class ScriptNotificationFilter implements NotificationFilter {

  private static final long serialVersionUID = 6299049832726848968L;

  private String scriptName;
  
  public ScriptNotificationFilter(ObjectName objectName) {
    super();
    this.scriptName = objectName.getCanonicalName();
  }
  
  /**
   * Is the notification meant for this script?
   */
  public boolean isNotificationEnabled(Notification notification) {
    if (notification instanceof MonitorNotification) {
      MonitorNotification monitorNotification = (MonitorNotification) notification;
      String observedObjectName = 
        monitorNotification.getObservedObject().getCanonicalName();
      return scriptName.equals(observedObjectName);
    } else if (notification instanceof TimerNotification) {
      TimerNotification timerNotification = (TimerNotification) notification;
      return scriptName.substring("script:name=".length()).equals(
        timerNotification.getMessage());
    } else {
      // unknown notification type
      return false;
    }
  }
}

EmailNotificationFilter.java

This listener is paired with the two Monitors. It composes and sends emails to the address configured for the script. The actual emails are shown in the tests above.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
// Source: src/main/java/com/mycompany/myapp/EmailNotificationFilter.java
package com.mycompany.myapp;

import java.util.Date;
import java.util.Map;

import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;
import javax.management.monitor.MonitorNotification;

import org.apache.commons.lang.ArrayUtils;

public class EmailNotifierListener implements NotificationListener {

  // TODO: this should probably be a configuration item
  @SuppressWarnings("unchecked")
  private static final Map<String,String> ALARM_EMAILS = 
    ArrayUtils.toMap(new String[][] {
      new String[] {
        "script:name=count_sheep_Dolly", "dr_evil@clones-r-us.com"
      },
      new String[] {
        "script:name=count_sheep_Polly", "pinky_n_brain@clones-r-us.com"
      }
  });
  private String scriptName;
  
  public EmailNotifierListener(ObjectName objectName) {
    this.scriptName = objectName.getCanonicalName();
  }
  
  public void handleNotification(Notification notification, Object handback) {
    if (notification instanceof MonitorNotification) {
      MonitorNotification monitorNotification = (MonitorNotification) notification;
      String observedAttribute = monitorNotification.getObservedAttribute();
      String observedScriptName = monitorNotification.getObservedObject().getCanonicalName(); 
      String emailTo = ALARM_EMAILS.get(observedScriptName);
      String emailSubject = "Alarm for [" + observedScriptName + ":" + 
        monitorNotification.getObservedAttribute() + "]";
      StringBuilder emailBody = new StringBuilder();
      if (observedAttribute.equals("Status")) {
        emailBody.append("Script [").
          append(observedScriptName).
          append("] reported ERROR.");
      } else if (observedAttribute.equals("LogFilesize")) {
        emailBody.append("Script [").
          append(observedScriptName).
          append("] is HUNG.");
      } else {
        // nothing to do for now, place holder for future notifications
      }
      sendEmail(emailTo, emailSubject, emailBody);
    }
  }

  private void sendEmail(String emailTo, String emailSubject, StringBuilder emailBody) {
    System.out.println("EmailNotifierListener: sending email...");
    System.out.println("From: scriptmanager@clones-r-us.com");
    System.out.println("To: " + emailTo);
    System.out.println("Date: " + new Date());
    System.out.println("Subject: " + emailSubject);
    System.out.println("--");
    System.out.println(emailBody.toString());
    System.out.println("--");
  }
}

CprListener.java

This listener is paired with the Heartbeat Timer and attempts to administer CPR to a downed script in order to restart it automatically.

Saturday, August 23, 2008

Managing remote scripts with JMX

Around eight years ago, three of us found ourselves maintaining a fairly large multi-language distributed system. Originally, each component was maintained by their original authors, so it was relatively easy for them to ensure that things were good with that component, but with the development phase ending, most of them moved off to other jobs. While we knew our way around most of these systems, we did not have the bandwidth to check on each of them regularly. So our manager came up with the idea of a central monitoring application, which we built and used with great success. Monitoring scripts written in Perl were deployed on the servers where the actual components ran, and a Perl client queried them both via cron and on-demand from a console, and reported the results. In retrospect, I think this was my first experience with management applications.

My next job was with a web publishing house, and although I did not actually work on any of the backend systems, my middleware components would interact with some of them, and without exception, all of them had embedded HTTP (mostly Jetty) servers reporting status of the application in real-time. This was my second experience with management applications.

More recently, I have been thinking about a problem closer to home. Our backend systems are composed of shell scripts calling Java and C/C++ components, and the only way to see what is happening with them is to actually log into the machines and look at their logs, for which, as developers, we have no access. Since most scripts notify on errors, there is the possibility that a script never ran, and developers will only find out about this when they see the side effects (or lack of it) in their application.

I had read about JMX in the past and had even tinkered with it, but never found a compelling reason to use it - that is, until now. The solution I came up with is a hybrid of the two approaches described above. Like the first approach, the MBean relies on standard operating system commands to do the monitoring, and like the second, I use a HTTP server to display the monitoring information, although I use the JMX tool suite's built-in HTTP Adapter rather than build a custom one.

Here is my toy script that I want to set up JMX instrumentation on. It's a script to count the number of animal clones. Depending on the argument provided to it, it whips out a cloned sheep (Dolly) or a parrot (Polly) every 30 seconds and logs a count.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/bin/bash
# Check if already running, if so exit
PID_FILE=/tmp/count_sheep_$1.pid
LOG_FILE=/tmp/count_sheep_$1.log
ERR_FILE=/tmp/count_sheep_$1.err
if [ -e $PID_FILE ]; then
  currdate=`date --rfc-3339=seconds`
  echo "[$currdate]: Clone counter already running, cannot start" >gt;>gt; $ERR_FILE
  exit -1
fi
# Start processing
trap `rm -f $PID_FILE` INT TERM EXIT
echo $$ >gt; $PID_FILE
rm -f $LOG_FILE
rm -f $ERR_FILE
if [ $# -eq 0 ]; then
  sheep_name='Dolly'
else
  sheep_name=$1
fi
i=0
while (true); do
  currdate=`date --rfc-3339=seconds`
  i=`expr $i + 1`
  echo "[$currdate]: Hello $sheep_name $i" >gt;>gt; $LOG_FILE
  sleep 1
done
# Clean up
rm $PID_FILE
trap - INT TERM EXIT

Notice that I am depending on certain coding and naming conventions which should be fairly easy to enforce in a corporate environment. If not, there is very likely to be other conventions that you can take advantage of to make your MBeans more reusable across different script types. The required conventions are as follows:

  • Scripts should always write their PID into a .pid file at a specific location on startup and remove it on normal termination.
  • Scripts should write progress information into a .log file at a specific location.
  • Scripts should write error information into a .err file at a specific location only if there is an error.

Our MBean to monitor and control scripts is a standard MBean, and its interface is shown below. Notice that we expose the operations "start" and "kill" (it runs in an infinite loop), and a bunch of read-only properties (signified by getXXX methods). In order to start the script, we pass in the path name of the script and its arguments via a constructor in its implementation. This is for security, so people cannot start up "/bin/rm -rf *" via JMX for example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Source: src/main/java/com/mycompany/myapp/ScriptAdapterMBean.java
package com.mycompany.myapp;

public interface ScriptAdapterMBean {

  // read-only attributes
  public boolean isRunning();
  public String getStatus();
  public String getLogs();
  public long getRunningTime();
  
  // operations
  public void start();
  public void kill();
}

The corresponding implementation is shown below. As you can see, for each of the exposed operations and attributes, I spawn a Java Process and run operating system commands through it, similar to what a human user would do if he could log on to the script's host machine.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
// Source: src/main/java/com/mycompany/myapp/ScriptAdapter.java
package com.mycompany.myapp;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.lang.StringUtils;
import org.springframework.jmx.JmxException;

public class ScriptAdapter implements ScriptAdapterMBean {

  private static final String LOG_DIR = "/tmp";
  private static final String ERR_DIR = "/tmp";
  private static final String PID_DIR = "/tmp";
  
  private String path;
  private String args;

  private String adapterName;
  
  private String pidfileName;
  private String logfileName;
  private String errfileName;
  
  private Process process;
  
  /**
   * Ctor to build a ScriptAdapter MBean implementation. Path and args
   * are injected via ctor for security.
   * @param path the full path to the script.
   * @param args the arguments to the script.
   */
  public ScriptAdapter(String path, String[] args) {
    this.path = path;
    this.args = StringUtils.join(args, " ");
    this.adapterName = StringUtils.join(new String[] {
      FilenameUtils.getBaseName(path),
      (args.length == 0 ? "" : "_"),
      StringUtils.join(args, "_")
    });
    this.pidfileName = FilenameUtils.concat(PID_DIR, adapterName + ".pid");
    this.logfileName = FilenameUtils.concat(LOG_DIR, adapterName + ".log");
    this.errfileName = FilenameUtils.concat(ERR_DIR, adapterName + ".err");
  }

  /**
   * Needed by Agent to build the ObjectName reference. Not part of the MBean
   * so it will not be available as a manageable attribute.
   * @return the computed adapter name.
   */
  public String getAdapterName() {
    return adapterName;
  }
  
  /**
   * Checks for existence of the PID file. Uses naming conventions
   * to locate the correct pid file.
   */
  public boolean isRunning() {
    File pidfile = new File(pidfileName);
    return (pidfile.exists());
  }

  /**
   * If isRunning, then status == RUNNING.
   * If isRunning and .err file exists, then status == ERROR
   * If !isRunning and .err file does not exist, then status == READY 
   */
  public String getStatus() {
    File errorfile = new File(errfileName);
    if (errorfile.exists()) {
      return "ERROR";
    } else {
      if (isRunning()) {
        return "RUNNING";
      } else {
        return "READY";
      }
    }
  }

  /**
   * Runs a tail -10 logfile on the log file to get the latest logs at
   * this point in time.
   * @return the last 10 lines (or fewer) of the log file.
   */
  public String getLogs() {
    if ("ERROR".equals(getStatus())) {
      File errorfile = new File(errfileName);
      try {
        return FileUtils.readFileToString(errorfile, "UTF-8");
      } catch (IOException e) {
        throw new JmxException("IOException getting error file", e);
      }
    } else {
      try {
        Process tailProcess = Runtime.getRuntime().exec(
          StringUtils.join(new String[] {
          "/usr/bin/tail", "-10", logfileName}, " "));
        tailProcess.waitFor();
        BufferedReader console = new BufferedReader(
          new InputStreamReader(tailProcess.getInputStream()));
        StringBuilder consoleBuffer = new StringBuilder();
        String line;
        while ((line = console.readLine()) != null) {
          consoleBuffer.append(line).append("\n");
        }
        console.close();
        tailProcess.destroy();
        return consoleBuffer.toString();
      } catch (IOException e) {
        e.printStackTrace();
        throw new JmxException("IOException getting log file", e);
      } catch (InterruptedException e) {
        throw new JmxException("Tail interrupted", e);
      }
    }
  }

  /**
   * Returns the difference between the PID file creation and the current
   * system time.
   */
  public long getRunningTime() {
    if (isRunning()) {
      File pidfile = new File(pidfileName);
      return System.currentTimeMillis() - pidfile.lastModified();
    } else {
      return 0L;
    }
  }

  /**
   * Operation to start the script.
   */
  public void start() {
    try {
      if (! isRunning()) {
        process = Runtime.getRuntime().exec(
          StringUtils.join(new String[] {path, args}, " "));
        // we don't wait for it to complete, just start it
      }
    } catch (IOException e) {
      throw new JmxException("IOException starting process", e);
    }
  }

  /**
   * Operation to stop the script using a kill -9 PID call, using the
   * PID from the .pid file.
   */
  public void kill() {
    if (isRunning()) {
      File pidfile = new File(pidfileName);
      try {
        String pid = FileUtils.readFileToString(pidfile, "UTF-8");
        Runtime.getRuntime().exec(StringUtils.join(new String[] {
          "/usr/bin/kill", "-9", pid}, " "));
        if (process != null) {
          // remove hanging references
          process.destroy();
        }
        pidfile.delete();
      } catch (IOException e) {
        throw new JmxException("IOException killing process", e);
      }
    }
  }
}

Finally, we build an MBean server agent that will load up one or more instances of our MBean. In order to expose the HTTP interface, we also load in the HtmlAdaptorServer and set it to listen on port 8081.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
// Source: src/main/java/com/mycompany/myapp/ScriptAgent.java
package com.mycompany.myapp;

import java.util.ArrayList;
import java.util.List;

import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

import com.sun.jdmk.comm.HtmlAdaptorServer;

public class ScriptAgent {

  // the port number of the HTTP Server Adapter
  private final static int DEFAULT_AGENT_PORT = 8081;
  
  private MBeanServer server;
  private List<ScriptAdapter> scriptAdapters = 
    new ArrayList<ScriptAdapter>();
  
  public ScriptAgent() {
    super();
  }
  
  public void addScriptAdapter(ScriptAdapter scriptAdapter) {
    this.scriptAdapters.add(scriptAdapter);
  }

  protected void init() throws Exception {
    server = MBeanServerFactory.createMBeanServer();
    for (ScriptAdapter scriptAdapter : scriptAdapters) {
      server.registerMBean(scriptAdapter, new ObjectName(
        "script:name=" + scriptAdapter.getAdapterName()));
    }
    HtmlAdaptorServer adaptor = new HtmlAdaptorServer();
    adaptor.setPort(DEFAULT_AGENT_PORT);
    server.registerMBean(adaptor, new ObjectName("adapter:protocol=HTTP"));
    adaptor.start();
  }
  
  public static void main(String[] args) {
    try {
      ScriptAgent agent = new ScriptAgent();
      agent.addScriptAdapter(new ScriptAdapter(
        "/home/sujit/src/rscript/src/main/sh/count_sheep.sh", 
        new String[] {"Dolly"}));
      agent.addScriptAdapter(new ScriptAdapter(
        "/home/sujit/src/rscript/src/main/sh/count_sheep.sh", 
        new String[] {"Polly"}));
      agent.init();
    } catch (Exception e) {
      e.printStackTrace(System.err);
    }
  }
}

Next we compile our code and start this agent up on the command line using Maven2, like so:

1
2
sujit@sirocco:~$ mvn exec:java \
  -Dexec.mainClass=com.mycompany.myapp.ScriptAgent

And point our browser to http://localhost:8081. In my little experiment, everything is on the same machine, but in a real-world setup, I would be pointing my browser to a server agent listening on a remote machine.

And now, on to the mandatory screenshots, to walk you through a typical flow using the MBean HTML adapter.

The Agent View - we have loaded two instances of the ScriptAdapter MBean, one to manage the script for cloning sheep and one for cloning parrots.
The MBean view for the sheep cloning management bean. I've rolled it down a bit so you can see both the kill and start operation buttons. Notice the status attribute "READY". Right now, logs are empty, since there are no logs in the file system.
Clicking on the [Start] button says "start was successful". Lets go back to the MBean view by clicking the Back to MBean View link.
The MBean view shows that the script is running. Notice the status attribute "RUNNING" and the non-zero value of RunningTime. If you reload, you will see the logs and the running time refreshing.

Once we have enough of sheep-cloning, we can kill the process by hitting the [KILL] button. This will produce the "Kill Successful" page.
Back at the MBean view, we notice that the status is READY (to run), and the isRunning value is false. We also notice the logs from the last run in there, just in case one needs to do some post-mortem. If you don't like this, then you can remove the .log file in the script or change the MBean code to check for isRunning() before showing the logs.

For the longest time, JMX appeared to me to be useful for tool and middleware vendors, but somewhat pointless for application developers, so I never really pursued it. The only time I have seen it being used by application developers in web application development is to build in backdoors to the application to turn on and off verbose logging for a short time, for example.

This application seems to be a good use of JMX. Obviously there is much more to JMX than what I have done so far, but I plan to experiment with some more features of JMX and describe it in future posts.