Saturday, June 23, 2007

PyLucene: Python scripting for Lucene

I started learning Python about 3 years ago, and since then I have been trying to adapt it for all my scripting needs. Since I mostly do Java programming, I am not exactly what you would call a hardcore Python programmer. I find myself using Python mostly for database reporting, converting files of data from one format to another, etc. There have been times in the past when I would have to report on a Lucene index, or do some post-processing on an existing index to inject special one-off values into an index created by our index building pipeline, but my approach had been to simply write a Java program to do this. Since I dislike running Java programs from the command prompt (mainly because I have to write a shell script that sets the CLASSPATH), I end up writing a JUnit unit test to run the code. A lot of work, I know, but that's what I had to work with then.

I had read about PyLucene in the Lucene in Action book, but hadn't had the opportunity to actually download it and take it for a spin. This opportunity came up recently, and I am happy to report that installing and working with PyLucene was relatively painless and quite rewarding. In this post, I explain how I installed PyLucene on my Linux box and show two little scripts that I converted over from Java. From what I have seen, PyLucene has a strong following, but unlike me, these guys actually use PyLucene to build full-fledged applications, not just little one-off scripts. Hopefully, once you see how simple it is, you will be encouraged to use it, even if you use a language such as Java or C# for mainline development.

PyLucene installation (Fedora Core 4 Linux)

The installation is relatively straightforward, but the instructions are not very explicit. I was trying to install on a box running Fedora Core 4 Linux, and there is no RPM package. Neither is there a package that can be installed by the standard "configure, make, make install" procedure. Seeing no pre-built packages for my distribution, I initially attempted to install from source, but ran into strange prompts that I could not answer, so I tried downloading the Unix binary distribution instead. I ended up copying the files from the binary distribution to my filesystem according to the README file included in this distribution.

sujit@sirocco:~/PyLucene-2.0$ ls
CHANGES  CREDITS  python  README  samples  test
sujit@sirocco:~/PyLucene-2.0$ cd python
sujit@sirocco:~/PyLucene-2.0/python$ ls
PyLucene.py  _PyLucene.so  security
sujit@sirocco:~/PyLucene-2.0/python$ cp -R * /usr/lib/python-2.4/site-packages

Basically, I copied all the files under the python subdirectory of the downloaded binary distribution to my Python site-packages directory. That was the end of the installation.

To test this module, I decided to port the two Java programs I had written to do the simple index reporting and post-processing I spoke of earlier. Not only did they end up taking fewer lines of code to write, they are also at the right level of abstraction, since these things really deserve to be scripts. I also ended up setting up the groundwork to be able to build quick and dirty scripts to access and modify Lucene databases, just like I have for databases.

Script to report on crawled URLs in an Index

The script below just opens up an index whose directory is supplied on the command line, and returns a pipe-delimited report (which currently goes to stdout) of title and url. This can be useful for testing, since you will now know what kind of search term to enter for these indexes to come back with results. It can also be useful for verifying that we crawled the sites we were supposed to crawl.

#!/usr/bin/python
# Takes an index directory from the command line and produces a pipe
# delimited report of title and URL from the index.
import sys
import string
from PyLucene import IndexSearcher, StandardAnalyzer, FSDirectory

def usage():
  print " ".join([sys.argv[0], "/path/to/index/to/read"])
  sys.exit(-1)

def main():
  if (len(sys.argv) != 2):
    usage()
  path = sys.argv[1]
  dir = FSDirectory.getDirectory(path, False)
  searcher = IndexSearcher(dir)
  analyzer = StandardAnalyzer()
  numdocs = int(searcher.maxDoc())
  print "#-docs:", numdocs
  # Lucene document ids start at 0, so begin there to include the first doc.
  for i in range(0, numdocs):
    doc = searcher.doc(i)
    title = doc.get("title")
    url = doc.get("url")
    print "|".join([title.encode('ascii', 'replace'), url])
  searcher.close()

if __name__ == "__main__":
  main()
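
The row formatting in the script above can be tried without an index at hand. The following is a minimal sketch in plain Python that does not import PyLucene: the list of dicts is a hypothetical stand-in for the documents a searcher would return, and format_row and report are names made up for this illustration. It also shows one way to cope with a document that has no stored title, which is an assumption about the data rather than something the original script handles.

```python
# Hypothetical stand-in for documents read from an index: each dict maps a
# stored field name to its value. This is a fixture, not a PyLucene API.
SAMPLE_DOCS = [
    {"title": u"Caf\u00e9 menu", "url": "http://example.com/cafe"},
    {"title": None, "url": "http://example.com/untitled"},
    {"title": u"Plain title", "url": "http://example.com/plain"},
]


def format_row(title, url):
    """Return one pipe-delimited, ASCII-only report line."""
    # A field that was not stored comes back as None; report it as empty.
    safe_title = (title or u"").encode("ascii", "replace").decode("ascii")
    safe_url = url or u""
    return u"|".join([safe_title, safe_url])


def report(docs):
    """Format every document, starting from id 0 as Lucene numbers them."""
    return [format_row(docs[i].get("title"), docs[i].get("url"))
            for i in range(len(docs))]


rows = report(SAMPLE_DOCS)
```

Non-ASCII characters in the title are replaced with a question mark, which matches the encode('ascii', 'replace') call in the script.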

Script to inject additional precomputed data

This script takes a pre-built index as input and injects an additional field into some of the records depending on the URL. This can be useful if you set up your url field to be stored but not tokenized, so you may want to post-process the index to match the URLs against one or more patterns and add in another facet field which you can then query on. In this case, the facet is set up as Index.UN_TOKENIZED, so our application code will have to specify the exact facet it's looking for.

#!/usr/bin/python
# Reads the index in the specified source directory and writes a transformed
# copy of it to the specified target directory. In this case, it looks
# at the URL and adds in a facet field.
import sys
import string
from PyLucene import IndexSearcher, IndexWriter, StandardAnalyzer, FSDirectory, Field

def usage():
  print " ".join([sys.argv[0], "/path/to/index/source", "/path/to/index/target"])
  sys.exit(-1)

def main():
  if (len(sys.argv) != 3):
    usage()
  srcPath = sys.argv[1]
  destPath = sys.argv[2]
  srcDir = FSDirectory.getDirectory(srcPath, False)
  destDir = FSDirectory.getDirectory(destPath, True)
  analyzer = StandardAnalyzer()
  searcher = IndexSearcher(srcDir)
  writer = IndexWriter(destDir, analyzer, True)
  numdocs = int(searcher.maxDoc())
  # Lucene document ids start at 0, so begin there to copy the first doc too.
  for i in range(0, numdocs):
    doc = searcher.doc(i)
    title = doc.get("title")
    url = doc.get("url")
    if (url.find("pattern1") > -1):
      doc.add(Field("facet", "pattern1", Field.Store.YES, Field.Index.UN_TOKENIZED))
    writer.addDocument(doc)
  searcher.close()
  writer.optimize()
  writer.close()

if __name__ == "__main__":
  main()
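
The script above checks a single hard-coded pattern. Matching against several patterns, as the description suggests, could look something like the sketch below. It is plain Python with no PyLucene dependency; facet_for, PATTERNS and the sample URLs are hypothetical names and values chosen for illustration, and "first match wins" is an assumption about the desired behavior.

```python
def facet_for(url, patterns):
    """Return the first pattern that occurs in url, or None if none match."""
    for pattern in patterns:
        if pattern in url:
            return pattern
    return None


# Hypothetical patterns and URLs, used only to exercise the helper.
PATTERNS = ["/blog/", "/docs/"]
SAMPLE_URLS = [
    "http://example.com/blog/2007/06/pylucene.html",
    "http://example.com/docs/install.html",
    "http://example.com/about.html",
]

facets = [facet_for(url, PATTERNS) for url in SAMPLE_URLS]
```

In the script, the facet field would then be added only when the helper returns something other than None, using the same Field(...) call as before with the returned pattern as the value.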

In both cases, the code should look familiar if you have worked with Lucene before. It is really the same Java classes wrapped up to be accessible through Python, so the only difference is the more compact Pythonic syntax. The one caveat is that PyLucene uses Lucene 1.4, whereas most Lucene shops are probably up at 2.0 or 2.1 (if you want to be on the bleeding edge). However, for one-off scripts, the version difference should not matter most of the time, unless you are trying to use one of the newer features in your Python code.

Adding your own Analyzer to Luke

On a kind of related note, I was able to add Analyzers to my Luke application. I know support exists for this, and most Lucene programmers probably know how to do this already, but since there are no clear instructions on how to do this, I figured I'd write it up here. It's not hard once you know how. The standard shell script invocation for Luke is:

#!/bin/bash
java -jar $HOME/bin/lukeall-0.7.jar

I was experimenting with the Lucene based spell checker described in the Java.net: Did You Mean: Lucene? article, and I wanted to use the SubwordAnalyzer within Luke. Luke comes with a pretty comprehensive set of Analyzer implementations, but this one was not one of them. So I changed the script above to include the jar file that contained this class, along with its dependencies (such as commons-lang, commons-io, etc), and changed the java call to use -cp instead. Here is my new script to call Luke.

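#!/bin/bash
M2_REPO=$HOME/.m2/repository
# Build the classpath one entry at a time. A backslash continuation followed
# by an indented line would split the assignment at the whitespace.
CLASSPATH=$HOME/projects/spellcheck/target/spellcheck-1.0-SNAPSHOT.jar
CLASSPATH=$CLASSPATH:$M2_REPO/log4j/log4j/1.2.12/log4j-1.2.12.jar
CLASSPATH=$CLASSPATH:$M2_REPO/commons-io/commons-io/1.2/commons-io-1.2.jar
CLASSPATH=$CLASSPATH:$M2_REPO/commons-lang/commons-lang/2.2/commons-lang-2.2.jar
java -cp "$HOME/bin/lukeall-0.7.jar:$CLASSPATH" org.getopt.luke.Luke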

And now I can use the SubwordAnalyzer from within Luke to query an index which used this analyzer to build an index out of a list of English words.

Saturday, June 16, 2007

A Spring/Tiles Example Web Application on Maven2

We were building a new web application recently, and being in a position to influence the development, I promptly chose to use the Maven2 web application directory structure. Spring is now our web application framework of choice, so that was a given. We have also had very good results with Tiles in our legacy web application, thanks to the efforts of one of my colleagues, so we also wanted to use Tiles here. I had set up a web application at my previous job to work with Spring and Tiles about 3 years ago, but the details were hazy, and the Tiles integration in our legacy application uses some proprietary components, so I decided to figure this out afresh for this project. Surprisingly, there does not seem to be much documentation about Spring/Tiles integration on the web, but I was able to build an example by piecing together information from various sources, which I describe here. Hopefully, the information will help someone in a similar situation.

We start off with a standard Spring application, with the web.xml containing a reference to the Spring DispatcherServlet, as shown below. The web.xml file lives in src/main/webapp/WEB-INF

<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/j2ee"  
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee 
      http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd" version="2.4">

  <servlet>
    <servlet-name>myapp</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>

  <servlet-mapping>
    <servlet-name>myapp</servlet-name>
    <url-pattern>*.html</url-pattern>
  </servlet-mapping>

</web-app>

The DispatcherServlet declared in the web.xml above is named myapp, so by Spring convention it loads its bean definitions from myapp-servlet.xml. This file also lives in src/main/webapp/WEB-INF and is shown below. It sets up the TilesConfigurer with the location of the tiles configuration file (tiles-def.xml), sets up the Tiles view resolver, and specifies the URL mappings to the respective Spring controllers. It also imports non-web bean definitions from the applicationContext.xml in src/main/resources. This is a personal preference, since I like to be able to unit test the non-web components using JUnit, and breaking this up into a separate configuration file makes this easier.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans 
       http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
       http://www.springframework.org/schema/util 
       http://www.springframework.org/schema/util/spring-util-2.0.xsd">

  <import resource="classpath:applicationContext.xml" />

  <bean id="tilesConfigurer" 
      class="org.springframework.web.servlet.view.tiles.TilesConfigurer">
    <property name="definitions">
      <list>
        <value>/WEB-INF/tiles-def.xml</value>
      </list>
    </property>
  </bean>

  <bean id="viewResolver" 
      class="org.springframework.web.servlet.view.InternalResourceViewResolver">
    <property name="requestContextAttribute" value="requestContext"/>
    <property name="viewClass" 
        value="org.springframework.web.servlet.view.tiles.TilesView"/>
  </bean>

  <!-- URL Mappings -->
  <bean id="urlMapping" 
      class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
    <property name="alwaysUseFullPath" value="true"/>
    <property name="mappings">
      <props>
        <prop key="/example.html">exampleController</prop>
      </props>
    </property>
  </bean>
  
</beans>

We then define the various tiles. Our example layout contains 4 tiles, one each for static content for the header and footer, one for the left navigation toolbar, and one for the main body of the page. The tiles-def.xml lives in src/main/webapp/WEB-INF and is shown below:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE tiles-definitions PUBLIC
       "-//Apache Software Foundation//DTD Tiles Configuration 1.1//EN"
       "http://jakarta.apache.org/struts/dtds/tiles-config_1_1.dtd">
<tiles-definitions>

  <!-- Components -->
  <definition name="head-tile" path="/example/tiles/head.jsp"/>
  <definition name="left-nav-tile" path="/example/tiles/leftnav.jsp"/>
  <definition name="body-tile" path="/example/tiles/body.jsp"/>
  <definition name="foot-tile" path="/example/tiles/foot.jsp"/>

  <!-- Pages -->

  <!-- Example -->
  <definition name="example" path="/example/example.jsp">
    <put name="head-position" value="head-tile"/>
    <put name="left-nav-position" value="left-nav-tile"/>
    <put name="body-position" value="body-tile"/>
    <put name="foot-position" value="foot-tile"/>
  </definition>

</tiles-definitions>

Currently the only dynamic component is the main body, which uses the "who" parameter to fill out the "Hello ${who}" header. All others are static. The example.jsp page appears below, followed by the different tiles. The locations are in the path attribute in the definitions in the tiles-def.xml file. The root is at src/main/webapp, so /example is actually src/main/webapp/example.

<%-- src/main/webapp/example/example.jsp --%>
<%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%>
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<%@ taglib uri="http://struts.apache.org/tags-tiles" prefix="tiles" %>
<html>
  <head><title>Example Page</title></head>
  <body>
    <table cellspacing="0" cellpadding="0" border="0">
      <tr>
        <td colspan="2">
          <tiles:insert attribute="head-position"/>
        </td>
      </tr>
      <tr>
        <td width="25%">
          <tiles:insert attribute="left-nav-position"/>
        </td>
        <td width="75%">
          <tiles:insert attribute="body-position"/>
        </td>
      </tr>
      <tr>
        <td colspan="2">
          <tiles:insert attribute="foot-position"/>
        </td>
      </tr>
    </table>
  </body>
</html>

I realize that using table tags to lay out pages is kind of frowned upon nowadays, but if you have been reading my posts, you will realize that I am not exactly a UI guru. So please bear with me, and mentally replace the table tags with the appropriate CSS magic that is less offensive. The tiles are quite simple, and they are shown below:

<!-- src/main/webapp/example/tiles/head.jsp -->
<h1>Da Korporate Header go here</h1>

<!-- src/main/webapp/example/tiles/leftnav.jsp -->
<ol>
  <li>Foo</li>
  <li>Bar</li>
</ol>

<!-- src/main/webapp/example/tiles/body.jsp -->
<h2>Hello ${who}</h2>
... body text filler ...

<!-- src/main/webapp/example/tiles/foot.jsp -->
<h1>Da Korporate Footer go here</h1>

So in effect, tiles have given us the ability to reuse JSP snippets on different pages. It is quite likely that the header and footer tiles, and perhaps the left nav tile, will be used across the entire application. So for each new page, we only need to create a new tile for the body element.

The controller that backs this is referenced as exampleController in the myapp-servlet.xml and defined more fully in the applicationContext.xml file in src/main/resources. This file currently contains only the controller definition, but could be used to declare beans that the controller(s) depend on as well.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans 
       http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
       http://www.springframework.org/schema/util 
       http://www.springframework.org/schema/util/spring-util-2.0.xsd">

  <!-- ExampleController -->
  <bean id="exampleController" class="com.mycompany.myapp.example.ExampleController">
    <property name="viewName" value="example"/>
  </bean>

</beans>

The actual Java code is quite simple. All it does is pick up the parameter "who" from the URL, and pass it through to the view as a ModelAndView attribute. The java tree is rooted at src/main/java, in case you did not already know.

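package com.mycompany.myapp.example;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.web.bind.ServletRequestUtils;
import org.springframework.web.servlet.ModelAndView;
import org.springframework.web.servlet.mvc.ParameterizableViewController;

public class ExampleController extends ParameterizableViewController {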

  public ModelAndView handleRequestInternal(HttpServletRequest request, 
      HttpServletResponse response) throws Exception {
    String who = ServletRequestUtils.getStringParameter(request, "who");
    ModelAndView mav = new ModelAndView();
    mav.addObject("who", (who == null ? "NULL" : who));
    mav.setViewName(getViewName());
    return mav;
  }

}

Start up the web application from the command line with "mvn jetty6:run" and hit the URL: http://localhost:8081/myapp/example.html?who=Sujit to see the following page:

We can also make certain tiles "smarter", in the sense that the Java logic backing these components need not be supplied by the main Spring controller, but can be specified separately. This can be useful when designing widgets for your web pages, which need to do significant processing on the request parameters before rendering the output. We could also refactor the logic out to some kind of service and have the main controller make a single call into it to get the renderable data, but this still means that we have to remember to pull data for each component in every new page controller we write. Specifying a controller for a tile is done in the controllerClass attribute in the tiles definitions.

For our example, we will make the left nav component smart. Depending on the value of the parameter "type" in the URL, it will display different lists. This requires specifying the controllerClass in the tiles definition for left-nav-tile in tiles-def.xml.

  <definition name="left-nav-tile" path="/example/tiles/leftnav.jsp" 
      controllerClass="com.mycompany.myapp.example.LeftNavController"/>

The controller is not a Spring controller, but a Tiles Controller. The code for that is shown below:

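package com.mycompany.myapp.example;

import javax.servlet.ServletContext;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.tiles.ComponentContext;
import org.apache.struts.tiles.ControllerSupport;

public class LeftNavController extends ControllerSupport {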

  private String[][] menuItems = {
    new String[] {"Foo", "Bar"},
    new String[] {"London", "New York", "San Francisco", "Brussels"},
    new String[] {"Engineering", "Finance", "Marketing"}
  };
  
  public void execute(ComponentContext tileContext, 
      HttpServletRequest request, 
      HttpServletResponse response, 
      ServletContext servletContext) throws Exception {
    // decide what kind of menu to show based on parameter "type"
    String menuTypeStr = (String) tileContext.getAttribute("type");
    int menuType = 0;
    try {
      menuType = Integer.parseInt(menuTypeStr);
    } catch (NumberFormatException e) {
      // missing or non-numeric "type": keep the default menu (0)
    }
    if (menuType < 0 || menuType > (menuItems.length - 1)) {
      menuType = 0;
    }
    String[] selectedMenuItem = menuItems[menuType];
    request.setAttribute("menu", selectedMenuItem);
  }
}
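
The fallback behavior of the controller above is easy to check in isolation. Here is a rough Python sketch of the same selection logic, not part of the web application: select_menu is a hypothetical name, and the menu data is copied from the Java class. Note that Python's int() is slightly more lenient than Integer.parseInt (it accepts surrounding whitespace, for instance), so this mirrors the intent rather than every edge case.

```python
MENU_ITEMS = [
    ["Foo", "Bar"],
    ["London", "New York", "San Francisco", "Brussels"],
    ["Engineering", "Finance", "Marketing"],
]


def select_menu(type_param):
    """Pick a menu by index, falling back to menu 0 on any bad input."""
    try:
        menu_type = int(type_param)
    except (TypeError, ValueError):
        # None (parameter absent) or a non-numeric string
        menu_type = 0
    if menu_type < 0 or menu_type >= len(MENU_ITEMS):
        # numeric but outside the available menus
        menu_type = 0
    return MENU_ITEMS[menu_type]


default_menu = select_menu(None)
city_menu = select_menu("1")
out_of_range_menu = select_menu("7")
```

A missing parameter, a non-numeric value and an out-of-range index all end up showing the first menu.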

One small wrinkle. The type parameter has to be injected into the tile context for the left-nav tile. This is done by setting it in the layout example.jsp file from the request using a tiles:put element, like so:

<%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%>
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<%@ taglib uri="http://struts.apache.org/tags-tiles" prefix="tiles" %>
<html>
  ...
        <td width="25%">
          <tiles:insert attribute="left-nav-position">
            <tiles:put name="type" value="${param.type}"/>
          </tiles:insert>
        </td>
  ...
</html>

The corresponding tile leftnav.jsp in src/main/webapp/example/tiles also has to be modified to show the "menu" object we just pushed into the context using the LeftNavController. Here it is:

<%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%>
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<ol>
  <c:forEach var="menuItem" items="${menu}">
    <li>${menuItem}</li>
  </c:forEach>
</ol>

Now, hitting the application with http://localhost:8081/myapp/example.html?who=Sujit&type=1 will produce the following page, which shows us that the type parameter is being correctly interpreted.

So this is it. The whole thing is not terribly complicated, but requires you to mess with a lot of XML files. The good news is that working with Tiles can significantly speed up your web application development and make it easier, as well as enforce a uniform look and feel to your web application. And once you set it up, and developers get used to the process of adding components and layouts, it will just become second nature and you will wonder how you worked without Tiles.

Saturday, June 09, 2007

Scaling images with Java

As a newly minted manager, I am finding myself doing things over the past couple of weeks that I would not normally do otherwise. One such thing was to cut images out of a PDF document, paste them into the GIMP and make scaled images suitable for use on web pages. Why would I do such a thing, you ask? Well, the project was running late, and even though I hadn't been part of the project up until now, I was now responsible for its delivery. One of the things that needed to get done was this, and someone had to do it, so I did.

I am usually a great believer in writing tools, since the time spent writing tools is usually recovered many times over in terms of productivity and morale gains. However, I misjudged the scope of the work, thinking that there might be just a few images which needed to be handled this way. Also, since I don't do too much image-processing at work or at home, I did not know how quickly I could build the tool. Anyway, at the time, it did not seem like a good idea to spend time building the tool. In retrospect, I realize I was wrong.

The GIMP has multiple scriptable interfaces where you can write scripts to automate its behavior. There is the Scheme based script-fu, the Perl based perl-fu, and the Python based python-fu. I did find script-fu and python-fu based solutions on the Internet which I could have adapted. However, the person who was doing the image work was using Adobe Photoshop on Windows, and he was unfamiliar with the GIMP. So a GIMP based solution would not have been optimal in my case. I ultimately settled on building a Java based solution, since the rest of our content generation pipeline is Java-based. The code I came up with is loosely based on the code I found in the Real's Howto site.

We needed code that would take a directory full of JPEG files, scale each one into three sizes - thumbnail, medium and large - and dump them in an output location. The thumbnail will be used as a clickable image in sidebars, and will measure 60 pixels on its long side. The medium image will be used for inline display on the web page, and will measure 250 pixels on its long side. The large image will appear on its own image page when the thumbnail is clicked, and will measure 450 pixels on its long side. We will specify the source directory where the input JPEG files are located, and an output directory. The code will create three subdirectories - thumbnails, medium and large - and put the scaled images in the correct location.

Here is the code to do this. As you can see, there are two generate() methods. The parameter-less generate() method can be used to batch process a directory full of JPEG files, while the one that takes a file name can be used to process a single file. The one-argument generate() method is also where all the image processing happens. There are two setters which can be set using IoC from a Spring container (my choice), but which can also be set explicitly within code, as shown in the calling code example below.

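import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;
import org.apache.log4j.Logger;

/**
 * Takes an input image JPEG file, and produces 3 scaled versions of this
 * image: thumbnail, medium and large.
 * @author Sujit Pal
 */
public class ScaledImageGenerator {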

  private static final Logger LOGGER = Logger.getLogger(ScaledImageGenerator.class);
  
  private String sourceDir;
  private String targetDir;
  
  private enum ImageSize {
    THUMBNAIL(60), 
    MEDIUM(250), 
    LARGE(450);
    
    int longSide;
    
    ImageSize(int longSide) {
      this.longSide = longSide;
    }
  };
  
  public ScaledImageGenerator() {
    super();
  }
  
  public void setSourceDir(String sourceDir) {
    this.sourceDir = sourceDir;
  }
  
  public void setTargetDir(String targetDir) {
    this.targetDir = targetDir;
  }

  /**
   * Process all JPEG files in the directory specified by sourceDir and
   * drop the different versions of the target images in the target directory.
   * This is a wrapper over the single file generate call.
   * @throws IOException if one is thrown.
   */
  public void generate() throws IOException {
    setUpDirectories();
    File sourceDirectory = new File(sourceDir);
    String[] jpegInputFiles = sourceDirectory.list(new FilenameFilter() {
      public boolean accept(File dir, String name) {
        return name.endsWith(".jpg");
      }
    });
    int totalFiles = jpegInputFiles.length;
    int i = 0;
    for (String jpegInputFile : jpegInputFiles) {
      i++;
      LOGGER.info("Processing file(" + i + "/" + totalFiles + "):" + jpegInputFile);
      generate(jpegInputFile);
    }
  }
  
  /**
   * Generates 3 images for a single input JPEG image.
   * @param imageFileName the name of the source JPEG image file.
   * @throws IOException if the image cannot be read or written.
   */
  public void generate(String imageFileName) throws IOException {
    BufferedImage sourceImage = ImageIO.read(
      new File(FilenameUtils.concat(sourceDir, imageFileName)));
    int srcWidth = sourceImage.getWidth();
    int srcHeight = sourceImage.getHeight();
    for (ImageSize imageSize : ImageSize.values()) {
      double longSideForSource = (double) Math.max(srcWidth, srcHeight);
      double longSideForDest = (double) imageSize.longSide;
      double multiplier = longSideForDest / longSideForSource;
      int destWidth = (int) (srcWidth * multiplier);
      int destHeight = (int) (srcHeight * multiplier);
      BufferedImage destImage = new BufferedImage(destWidth, destHeight, 
        BufferedImage.TYPE_INT_RGB); 
      Graphics2D graphics = destImage.createGraphics();
      AffineTransform affineTransform = 
        AffineTransform.getScaleInstance(multiplier, multiplier);
      graphics.drawRenderedImage(sourceImage, affineTransform);
      graphics.dispose(); // release the resources held by the Graphics2D
      ImageIO.write(destImage, "JPG", new File(FilenameUtils.concat(
        getImageTargetDir(imageSize), imageFileName)));
    }
  }

  /**
   * Clean up target directories from previous run, if any, and create fresh
   * subdirectories under the target directory for the current run.
   * @throws IOException if a directory cannot be deleted or created.
   */
  private void setUpDirectories() throws IOException {
    for (ImageSize imageSize : ImageSize.values()) {
      String imageTargetDir = getImageTargetDir(imageSize);
      FileUtils.deleteDirectory(new File(imageTargetDir));
      FileUtils.forceMkdir(new File(imageTargetDir));
    }
  }
  
  /**
   * Returns the target directory for the scaled image, given the ImageSize attribute.
   * Uses the targetDir setting that is injected in via the container.
   * @param imageSize the ImageSize.
   * @return the name of the target directory.
   */
  private String getImageTargetDir(ImageSize imageSize) {
    String imageTargetDir = null;
    switch (imageSize) {
    case THUMBNAIL:
      imageTargetDir = FilenameUtils.concat(targetDir, "thumbnails");
      break;
    case MEDIUM:
      imageTargetDir = FilenameUtils.concat(targetDir, "medium");
      break;
    case LARGE:
      imageTargetDir = FilenameUtils.concat(targetDir, "large");
      break;
    }
    return imageTargetDir;
  }
}
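
The dimension arithmetic in the one-argument generate() method can be checked on its own. Below is a small sketch of just that calculation, written in Python for brevity; it is not a port of the class. The function name scaled_size and the 2048x1024 source size are assumptions made for this example, and the three long-side values come from the ImageSize enum above.

```python
def scaled_size(src_width, src_height, long_side):
    """Scale a width/height pair so the longer side becomes long_side."""
    multiplier = float(long_side) / max(src_width, src_height)
    # int() truncates toward zero, like the (int) casts in the Java code.
    return int(src_width * multiplier), int(src_height * multiplier)


# Target long sides, keyed by the subdirectory each size is written to.
SIZES = {"thumbnails": 60, "medium": 250, "large": 450}

# Hypothetical landscape source image of 2048x1024 pixels.
results = dict((name, scaled_size(2048, 1024, side))
               for name, side in SIZES.items())

# The same calculation for a portrait image keeps the height as the long side.
portrait_thumbnail = scaled_size(1024, 2048, 60)
```

Because the multiplier is derived from the longer side, the aspect ratio is preserved regardless of whether the source is landscape or portrait.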

The example below illustrates calling code that works against a single file. This is pulled from my JUnit test. You will need to manually create the target directory, and set up three subdirectories - thumbnails, medium and large within it.

    ScaledImageGenerator generator = new ScaledImageGenerator();
    generator.setSourceDir("/path/to/source/directory");
    generator.setTargetDir("/path/to/target/directory");
    generator.generate("s_seagull.jpg");

To test this code, I used a royalty-free image from FreeDigitalPhotos.net. You can see the original image here. The output images from the code described above are shown below:

Thumbnail version
Medium Inline version
Large full size version

So there you have it. If the image paths need to be recorded in a database, this can be added quite simply in the code above as well. It's too late for this code to be of any use to the current project, but hopefully, having this as part of our codebase will help us in a similar project down the line.

Sunday, June 03, 2007

Restoring Windows XP on a Toshiba Satellite running Fedora

Last week I posted an article describing my experiences converting my new Fujitsu Lifebook from Microsoft Vista to Ubuntu Linux. This week I talk about going the other way for my old Toshiba Satellite laptop, restoring it to Windows XP using the Toshiba OEM Restore disk. I need to do this because the laptop will now be used by my (elementary) school-going son, who uses Windows XP on the family desktop computer and at school, so we don't want him getting confused switching back and forth between Linux and Windows. There are also a lot of games and educational software which run only on Windows and Mac OSX. I suppose I could get him to run these over Wine, but when I last tried it a couple of years ago, I could not get sound to work, so I did not even try this time. Finally, according to my wife, one Linux snob is one too many for any given household, and we don't want to argue with that :-).

Just sticking the OEM Restore CD in and restarting the machine should theoretically have been all that I needed to do, but if that was all it took, I wouldn't be writing this up. What I got instead, at the end of the restore process, was a machine that, when switched on, came up with an empty black screen, with the word "GRUB" in the top left corner.

Searching on the net, I found instances similar to this situation, where the machine responds with "Grub error 17". The problem was that GRUB (The Grand Unified Boot Loader that most Linux distributions use for booting) is written on the Master Boot Record (or MBR). Windows also needs to boot using information from the MBR. But apparently, the restore process cannot write to the MBR if it is already occupied by GRUB.

The solution is to clean up the MBR somehow. I found a rather vitriolic but quite detailed article from a person who made the switch from Windows to Linux, and was describing his frustrations and switch back after an extremely unhappy few weeks with Linux. The article describes the problems with "Grub error 17", and mentions the Ultimate Boot CD (aka UBCD), which solved the problem.

The Ultimate Boot CD is a free collection of PC system software tools, which can be used to boot up a PC. It reminds me of my system software tool collection (an overflowing case of 5.25" floppy disks) when I worked in software support many years ago, but it is of course much more comprehensive. It's available as an ISO, and I was able to burn a CD-RW from my Ubuntu laptop by downloading the ISO to my Desktop, right clicking the file and selecting "Write to CD", then "Ok" to actually burn the image to the CD.

I then booted the laptop with the UBCD, selected Filesystem Tools, then Partition Tools, then XFDISK (Extended FDISK). I then removed the one single partition and exited. Restoring from the OEM Restore CD and restarting sent me back to the original GRUB prompt, although it did tell me that it was doing a quick format of the C: drive.

I tried again, rebooting from the UBCD, selecting Filesystem Tools, then Partition Tools, then MBR Tools. A menu comes up, and I selected "Wipe MBR". Restoring from the OEM Restore CD and restarting worked this time. I booted into the original Windows XP operating system that the machine came with.

Anyway, that was all I had this week. While it did not take that long to write, it did take me a while to come to this solution. Having destroyed a couple of CD-Rs before on my Fedora system, I was not sure I would be able to burn the UBCD disk right, but ultimately I found an Ubuntu tutorial which described the process. I am also relatively clueless about Windows and DOS, having stopped using both regularly quite some time ago, so it was quite a learning experience. My biggest concern was to not destroy the disk by using a wrong command, so I was extra careful, which takes more time. But I guess things turned out OK in the end.