In my last post, I mentioned that I presented at the Demystifying Deep Learning and Artificial Intelligence event at Oakland. My talk was about using Transfer Learning from and Fine tuning a Deep Convolutional Network (DCNN) trained on ImageNet to classify images in a different domain. The domain I chose was the images of the retina to detect varying stages of Diabetic Retinopathy (DR). The images came from the Diabetic Retinopathy competition on Kaggle.
In order to demonstrate the ideas mentioned in the presentation, I trained a few simple networks with a sample (1,000/35,000) of the data provided. My results were nowhere close to the competition winner, who achieved a Kappa score of 0.85 (a metric indicating agreement of predictions with labels), which is better than human performance (0.83 between a General Physicial and an Opthalmologist and 0.72 between an Optometrist and an Opthalmologist according to this forum post). Although my best model did achieve a Kappa score of 0.75 on my validation set, which puts me at around the 25-26 position on the public leaderboard.
The competition winner Benjamin Graham (min-pooling) posted his a description of his algorithm after the competition. One of the things he did was to preprocess the images so they had more uniformity in terms of brightness and shape. This made sense, since the images vary quite a bit along these dimensions, as you can see below.
I have been recently playing around with OpenCV, so I figured it would be interesting to apply some of these techniques to preprocess the images so they were more similar to each other. This post describes what I did.
I first tried to standardize on the size. As you can see, some images are more rectangular, with more empty space on the left and right, and some are more square. In fact, if you group loosely by aspect ratio, it turns out that there are three major size groups.
My first attempt at standardization was to find the edge of the circle representing the retina, then crop on the vertical tangent to the edge. I ended up not using this approach, but I include it here because I think it is interesting and maybe if I had more time and patience I might have figured out a way to use this approach instead of what I did.
The code to do so is shown below. The image is first read in as a grayscale image and converted to a matrix, then vertical and horizontal Sobel filters are applied to extract edges. Finally, we find the edge farthest from the center (approximated by the vertical center of the image) and crop vertically along this.
import cv2 def compute_edges(image): image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) image = cv2.GaussianBlur(image, (11, 11), 0) sobel_x = cv2.Sobel(image, cv2.CV_64F, 1, 0) sobel_x = np.uint8(np.absolute(sobel_x)) sobel_y = cv2.Sobel(image, cv2.CV_64F, 0, 1) sobel_y = np.uint8(np.absolute(sobel_y)) edged = cv2.bitwise_or(sobel_x, sobel_y) return edged def crop_image_to_edge(edged, threshold=10, margin=0.2): # find edge along center and crop mid_y = edged.shape // 2 notblack_x = np.where(edged[mid_y, :] >= threshold) if notblack_x.shape == 0: lb_x = 0 ub_x = edged.shape else: lb_x = notblack_x ub_x = notblack_x[-1] if lb_x > margin * edged.shape: lb_x = 0 if (edged.shape - ub_x) > margin * edged.shape: ub_x = edged.shape mid_x = edged.shape // 2 notblack_y = np.where(edged[:, mid_x] >= threshold) if notblack_y.shape == 0: lb_y = 0 ub_y = edged.shape else: lb_y = notblack_y ub_y = notblack_y[-1] if lb_y > margin * edged.shape: lb_y = 0 if (edged.shape - ub_y) > margin * edged.shape: ub_y = edged.shape cropped = edged[lb_y:ub_y, lb_x:ub_x, :] return cropped image = cv2.imread(image_name) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # left image edged = compute_edges(gray) # middle image cropped = crop_image_to_edge(gray) # right image
Although in this (and lots of other) cases, this gave me good results, but it failed on some where the edge could not be detected because the image was so dark. Also, as you can see from the histogram on the left below, aspect ratios of the original uncropped images had two distinct clusters, but after the cropping operation, the distribution is all over the place. Our objective was to standardize on the aspect ratio after the cropping operation, the kind of scenario shown on the histogram on the right.
The approach I came up with was to eyeball the aspect ratios. Most of them were around 1.3 and 1.5, so I decided based on some manual cropping that the best aspect ratio is around 1.2. The resulting histogram of aspect ratios is the one on the right above.
def crop_image_to_aspect(image, tar=1.2): # load image image_bw = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) # compute aspect ratio h, w = image_bw.shape, image_bw.shape sar = h / w if h > w else w / h if sar < tar: return image else: k = 0.5 * (1.0 - (tar / sar)) if h > w: lb = int(k * h) ub = h - lb cropped = image[lb:ub, :, :] else: lb = int(k * w) ub = w - lb cropped = image[:, lb:ub, :] return cropped cropped = crop_image_to_aspect(image)
This is what the random sample of 9 retina images looks like after the cropping operation.
Next I tried looking at standardizing the brightnesses. Benjamin Graham's report suggests just subtracting the mean pixel value from each RGB channel, but I decided to do something a little fancier. First I converted each image to the HSV (Hue, Saturation, Value) color space and computed the mean value of V across all images in my sample. The value of V is a measure of the brightness of the image. I then computed the mean V per image. I then added the global V mean and subtracted the local V mean from each V, and converted it back to RGB.
def brighten_image_hsv(image, global_mean_v): image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV) h, s, v = cv2.split(image_hsv) mean_v = int(np.mean(v)) v = v - mean_v + global_mean_v image_hsv = cv2.merge((h, s, v)) image_bright = cv2.cvtColor(image_hsv, cv2.COLOR_HSV2RGB) return image_bright vs =  for image_dir, image_name in get_next_image_loc(DATA_DIR): image = cv2.imread(os.path.join(DATA_DIR, image_dir, image_name)) image_hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) h, s, v = cv2.split(image_hsv) vs.append(np.mean(v)) global_mean_v = int(np.mean(np.array(vs))) brightened = brighten_image_hsv(resized, global_mean_v)
As expected, this mean centering operation converts a somewhat skewed distribution of brightnesses to a more balanced one.
After mean centering the brightness values and converting back to RGB, our sample of 9 retina images looks like this. The resulting images are not as clean as the examples shown in the winner's competition report, where he mean centered directly on RGB. But the brightness does look roughly equal now.
In order to mean center by RGB, we compute the global mean of R, G and B channels across all the images, then subtract the individual R, G, and B channel means from the image. Code to do this is shown below:
def brighten_image_rgb(image, global_mean_rgb): r, g, b = cv2.split(image) m = np.array([np.mean(r), np.mean(g), np.mean(b)]) brightened = image + global_mean_v - m return brightened mean_rgbs =  for image_dir, image_name in get_next_image_loc(DATA_DIR): image = cv2.imread(os.path.join(DATA_DIR, image_dir, image_name)) image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) r, g, b = cv2.split(image_rgb) mean_rgbs.append(np.array([np.mean(r), np.mean(g), np.mean(b)])) global_mean_rgbs = np.mean(mean_rgbs, axis=0) brightened = brighten_image_rgb(resized, global_mean_rgbs)
The set of sample images, brightened by RGB channel, looks like this:
Sadly, the preprocessing does not actually translate to higher accuracy or Kappa scores. In fact, resizing and brightening the image using HSV results in a Kappa score of 0.68. Resizing and brightening using RGB results in a Kappa score of 0.61. Kappa score without pre-processing was 0.75. So preprocessing images actually had a negative effect in my case. However, knowing how to do this is good to know for the future, so I think it was time well spent.
The entire code for preprocessing the sample images, as well as printing a random sample of 9 images at each step, is available here in my project on GitHub.