Scanning Lecture Notes – Separating Colors

Continuing my journey to perfect my scanned lecture notes, I’ll be reviewing my efforts to find a good way to threshold scanned notes to black and white. I’ve spent several days experimenting with this stuff, and I think I’ve managed to improve on the basic methods used.

In the process of experimenting, I’ve come up with what I think are the 3 main hurdles in scanning notes (or text in general) to black and white.

  1. Bleeding. When using both sides of the paper, the ink might “bleed” through to the other side. Even if the ink doesn’t actually pass through, it might still be visible as a kind of shadow when scanning, just like when you hold a piece of paper in front of a light and are able to make out the text on the other side.
  2. Non-black ink. Photocopying blue ink is notoriously messy, and scanning it to b&w poses similar challenges.
  3. Skipping. This is an artifact that is sometimes introduced when writing with a ballpoint pen. It's a result of inconsistent ink flow and is rarer with more liquid inks, such as rollerballs or fountain pens.

Those issues can be visualized in the first three images. These images are the originals I’ve tested the various methods with. The other images are results of the various methods explained in this post and should convey the difference between them.

The basic way to convert the images to black and white would be to threshold them. A trivial improvement is to use -monochrome, which sets the threshold automatically. We can also disable dithering using the +dither flag. As can be seen, with dithering the ink bleeding produces undesirable noise, while without it the skipping is very pronounced and the non-black ink looks worse.

$ convert originals/XXX.png -monochrome XXX_mc.png
$ convert originals/XXX.png +dither -monochrome XXX_mcd.png

A somewhat better way is to use two-color quantization and only then convert to b&w. More information can be found in the ImageMagick quantization documentation.

$ convert originals/XXX.png +dither -colors 2 -colorspace gray -normalize XXX_2q.png

This slightly improved the non-black ink situation, but had no effect at all on the images with black ink, which is actually expected.

This led me to think that my problems are local; i.e., the threshold should be computed locally, as I want a different threshold around images compared to blank areas. My searches landed me on Fred's great comparison of threshold methods, and finally on his localthresh script. The script expects the foreground to be bright, which can be fixed by passing -n yes to negate the image. The script supports a couple of local-thresholding methods; I used the third, "local mean and mean absolute deviation," which worked best for me. The radius should be a bit larger than the biggest feature (e.g., line) you're trying to detect. I used a radius of 50 pixels, which is around 2 mm for a 600 DPI scan. Too small a radius resulted in "outlining" of the text, while too big a radius had little effect. The bias was chosen by trial and error: the bigger the bias, the better it cleans up the "bleeding" but the worse it performs regarding the "skipping." A smaller bias makes the bleeding more pronounced but almost fixes the skipping.

$ ./localthresh -n yes -m 3 -r 50 -b 20 originals/XXX.png XXX_local50_20.png

That, in turn, led me to try removing the "bleeding" before thresholding the text. Doing so allows a "darker" threshold without emphasizing the "bleeding," since it is already gone at that stage. I experimented quite a lot until I came up with the following process: use -statistic and then a threshold to create a mask that contains only the actual ink, with a bit of padding. Composing the mask with the original image, either via screen or via divide followed by negation, results in a copy of the original image with the "bleeding" removed. This allows the use of an aggressive threshold to emphasize the text.

Using a small geometry for the -statistic step better removes the "bleeding," but might also remove fine lines. The first two attempts use the Nonpeak statistic, which does a slightly better job of removing bleeding but seems more sensitive to the geometry. The last example uses the minimum statistic; it handles the "skipping" better but seems to introduce a bit more noise adjacent to the text. The threshold values were found by trial and error.

$ convert originals/XXX.png \( +clone -statistic Nonpeak 10x10 -threshold 90% \) -compose screen -composite -threshold 85% XXX_blur.png
$ convert originals/XXX.png \( +clone -statistic Nonpeak 5x5 -threshold 90% \) -compose screen -composite -threshold 85% XXX_blur2.png
$ convert originals/XXX.png \( +clone -statistic minimum 10x10 -negate -threshold 20% \) -compose Divide_Src -composite -threshold 85% XXX_blur4.png
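Once a method is chosen, it's natural to run it over the whole set of scans. A minimal sketch of that, using the minimum-statistic pipeline from the last command (the cleaned/ output directory is my own naming, not from the post, and ImageMagick's convert is assumed to be installed):

```shell
# Hypothetical batch run: apply the minimum-statistic variant to every
# PNG scan in originals/, writing results to cleaned/.
mkdir -p cleaned
for f in originals/*.png; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    name=$(basename "$f" .png)
    convert "$f" \( +clone -statistic minimum 10x10 -negate -threshold 20% \) \
        -compose Divide_Src -composite -threshold 85% "cleaned/${name}.png"
done
```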

Conclusions

This sums up several days of experimenting with various methods, more time than I expected to invest. The images below should give you a general idea of how each method mentioned fares. I've also tried many variations and other methods, but the results weren't interesting enough to report.

If you're using black ink, I think the last method fares best. It also performed pretty well on non-black ink. However, if you used non-black ink and know that you don't have skipping or bleeding to fix, I would go with the two-color quantization.

If you know of other methods to perform this task, don’t hesitate to comment, as I would be interested to read.
