Continuing my journey to perfect my scanned lecture notes, I’ll be reviewing my efforts to find a good way to threshold scanned notes to black and white. I’ve spent several days experimenting with this stuff, and I think I’ve managed to improve on the basic methods used.
In the process of experimenting, I’ve come up with what I think are the 3 main hurdles in scanning notes (or text in general) to black and white.
- Bleeding. When using both sides of the paper, the ink might “bleed” through to the other side. Even if the ink doesn’t actually pass through, it might still be visible as a kind of shadow when scanning, just like when you hold a piece of paper in front of a light and you’re able to make out the text on the other side.
- Non-black ink. Photocopying blue ink is notoriously messy. Scanning it to b&w also poses challenges.
- Skipping. This is an artifact that is sometimes introduced when writing with a ballpoint pen. It’s a result of inconsistent ink flow, and is rarer with more liquid inks such as rollerballs or fountain pens.
These issues can be seen in the first three images. These images are the originals I’ve tested the various methods with. The other images are the results of the various methods explained in this post, and should convey the difference between them.
The basic way to convert the images to black and white is to threshold them. A trivial improvement is achieved by using -monochrome, which sets the threshold automatically. We can also disable dithering using the +dither flag. As can be seen, with dithering the ink bleeding produces undesirable noise, while without it the skipping is very pronounced and the non-black ink looks worse.
$ convert originals/XXX.png -monochrome XXX_mc.png
$ convert originals/XXX.png +dither -monochrome XXX_mcd.png
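To make the limitation of a single threshold concrete, here is a minimal Python sketch. It is not what -monochrome does internally; the gray values and the two cutoffs are made up for illustration. It only shows that when a skipped stroke is as light as the bleed-through, no single cutoff can keep one and drop the other.

```python
def global_threshold(pixels, cutoff):
    """Return 0 (black) for values below cutoff, 255 (white) otherwise."""
    return [0 if p < cutoff else 255 for p in pixels]

# Hypothetical gray values (0 = black, 255 = white) along one scan line:
# paper, bleed-through shadow, solid ink, a "skipped" part of a stroke, paper.
row = [240, 200, 40, 210, 240]

# A strict cutoff removes the bleed, but the skipped ink is lost as well.
strict = global_threshold(row, 128)

# A lenient cutoff keeps the skipped ink, but the bleed turns black too.
lenient = global_threshold(row, 220)
```

With these sample values the strict cutoff keeps only the solid ink, while the lenient one blackens the bleed together with the skipped stroke.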
A somewhat better way would be to use a two-color quantization and only then convert the result to b&w. More information can be found in the ImageMagick quantization documentation.
$ convert originals/XXX.png +dither -colors 2 -colorspace gray -normalize XXX_2q.png
This slightly improved the non-black ink situation, but had no effect at all on the images with black ink (which is actually expected).
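The idea of reducing to two colors before going to b&w can be sketched in Python as follows. This is only an illustration: ImageMagick uses its own quantization algorithm, and the plain 2-means clustering below is a stand-in I chose for brevity. The function name and the sample RGB values are hypothetical.

```python
def two_color_quantize(pixels, iterations=10):
    """Split RGB pixels into a dark and a light cluster with plain 2-means.

    Returns 0 (dark cluster) or 255 (light cluster) for each pixel.
    """
    # Start from the darkest and brightest pixels so the result is deterministic.
    dark = min(pixels, key=sum)
    light = max(pixels, key=sum)

    def dist2(a, b):
        # Squared Euclidean distance in RGB space.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def mean(group, fallback):
        # Per-channel average; keep the old center if the group is empty.
        if not group:
            return fallback
        n = len(group)
        return tuple(sum(channel) / n for channel in zip(*group))

    for _ in range(iterations):
        dark_group = [p for p in pixels if dist2(p, dark) <= dist2(p, light)]
        light_group = [p for p in pixels if dist2(p, dark) > dist2(p, light)]
        dark, light = mean(dark_group, dark), mean(light_group, light)

    return [0 if dist2(p, dark) <= dist2(p, light) else 255 for p in pixels]

# Hypothetical samples: paper, light blue ink, darker blue ink, paper.
samples = [(245, 245, 240), (110, 140, 235), (60, 80, 200), (250, 250, 245)]
result = two_color_quantize(samples)
```

Because the split is made in color space rather than on gray values, both blue samples end up in the dark cluster, which fits the improvement seen for non-black ink.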
This led me to think that my problems are local, i.e. the threshold should be computed locally, as I want a different threshold around images than in blank areas. My searches landed me at Fred’s great comparison of threshold methods, and finally at his localthresh script. The script expects the foreground to be bright, which can be fixed using the -n yes parameter to negate the image. There are a couple of supported methods for doing the local thresholding. I’ve used the third, “local mean and mean absolute deviation”, which was the method that worked best. The radius should be larger than the biggest feature (e.g. line) you’re trying to detect. I’ve used a radius of 50, which is around 2mm for a 600 DPI scan. Too small a radius resulted in “outlining” of the text, while too big a radius had little effect. The bias was chosen by trial and error: the bigger the bias, the better it cleans up the “bleeding” and the worse it performs on the “skipping”. A smaller bias makes the bleeding more pronounced, but almost fixes the skipping.
./localthresh -n yes -m 3 -r 50 -b 20 originals/XXX.png XXX_local50_20.png
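For intuition, here is a rough Python sketch of thresholding with a local mean and mean absolute deviation (MAD). It is not the localthresh script: the exact way the script combines mean, deviation and bias is not reproduced here. The formula `mean - k * mad - bias`, the weight `k`, and the sample values are assumptions of mine.

```python
def local_threshold(image, radius, bias, k=0.5):
    """Threshold a 2D list of gray values (0..255) using local statistics.

    A pixel becomes ink (0) when it is darker than
    local_mean - k * local_mad - bias, computed over a square window.
    The formula is an illustrative assumption, not localthresh's own.
    """
    height, width = len(image), len(image[0])
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the window around (x, y), clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            mad = sum(abs(v - mean) for v in window) / len(window)
            if image[y][x] < mean - k * mad - bias:
                out[y][x] = 0
    return out

# Hypothetical scan line: paper (240), bleed-through (200), real ink (40).
line = [[240, 240, 200, 200, 240, 240, 40, 40, 240, 240]]

big_bias = local_threshold(line, radius=2, bias=20)[0]
small_bias = local_threshold(line, radius=2, bias=5)[0]
```

In this toy example the larger bias drops the bleed-through while keeping the ink, and the smaller bias keeps both, which mirrors the trade-off described above. The brute-force window loop is far too slow for a real 600 DPI scan; it is only meant to show the idea.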
This led me to think that I should attempt to remove the “bleeding” before trying to threshold the text. This would allow a “darker” threshold without emphasizing the “bleeding”, as it will be gone at this stage. I’ve experimented quite a lot with this until I came up with the following process: using -statistic and then thresholding to create a mask that contains only the actual ink (with a bit of padding). Composing the mask with the original image, either via screen or via divide followed by negating, results in a copy of the original image with the “bleeding” removed. This allows an aggressive threshold to be used to emphasize the text.
Using a small geometry for the -statistic part better removes the “bleeding” but might also remove fine lines. The first two attempts use the Nonpeak statistic, which does a slightly better job of removing bleeding but seems to be more sensitive to the geometry. The last example of this kind uses the minimum statistic, which better handles the “skipping” but seems to introduce a bit more noise adjacent to the text. The threshold values were found by trial and error.
$ convert originals/XXX.png \( +clone -statistic Nonpeak 10x10 -threshold 90% \) -compose screen -composite -threshold 85% XXX_blur.png
$ convert originals/XXX.png \( +clone -statistic Nonpeak 5x5 -threshold 90% \) -compose screen -composite -threshold 85% XXX_blur2.png
$ convert originals/XXX.png \( +clone -statistic minimum 10x10 -negate -threshold 20% \) -compose Divide_Src -composite -threshold 85% XXX_blur4.png
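The mask-then-screen idea can be sketched in a few lines of Python. This is a simplified, one-dimensional illustration and not a reimplementation of the commands above: it uses a local minimum as the statistic (Nonpeak is not reproduced), and the sample values, the radius and the mask cutoff of 0.5 are assumptions picked to make the effect visible.

```python
def min_filter(row, radius):
    """Local minimum over a sliding window (1D for brevity)."""
    n = len(row)
    return [min(row[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]

def threshold(row, cutoff):
    """Values below cutoff become 0.0 (black), the rest 1.0 (white)."""
    return [0.0 if v < cutoff else 1.0 for v in row]

def screen(a, b):
    """Screen blend: 1 - (1 - a) * (1 - b), for values in 0..1."""
    return [1 - (1 - x) * (1 - y) for x, y in zip(a, b)]

def remove_bleed_then_threshold(row, radius, mask_cutoff, final_cutoff):
    # Mask: black near real ink (padded by the minimum filter), white elsewhere.
    mask = threshold(min_filter(row, radius), mask_cutoff)
    # Screening with white forces a pixel to white; with black it keeps the original.
    cleaned = screen(row, mask)
    # With the bleed gone, an aggressive final threshold is safe.
    return threshold(cleaned, final_cutoff)

# Hypothetical values in 0..1: paper, bleed, paper, paper, ink, skipped ink, ink, paper, paper.
row = [0.95, 0.80, 0.95, 0.95, 0.15, 0.82, 0.15, 0.95, 0.95]

plain = threshold(row, 0.85)
cleaned = remove_bleed_then_threshold(row, radius=1, mask_cutoff=0.5, final_cutoff=0.85)
```

In this example the plain 85% threshold turns the bleed black, while the masked version drops it and still fills in the skipped part of the stroke, because that part sits inside the padded ink mask.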
This sums up several days (I got into it more than I expected) of experimenting with various methods. The images below should give you a general idea of how each method mentioned fares. I’ve also tried many variations and other methods, but the results weren’t interesting enough to report.
If you’re using black ink, I think the last method fares the best. It also performed pretty well on the non-black ink. However, if you used non-black ink but know that you don’t have skipping or bleeding to fix, I would go with the two-color quantization.
If you know of other methods to perform this task, don’t hesitate to comment, as I would be interested to read about them.