
Lossless JPEG rotation

JPEG is a lossy format, and naive rotation (decoding, rotating, and re-encoding) results in a loss of quality. JPEG does allow some lossless operations, such as rotation by 90 degrees and flipping, on the basic blocks (MCUs) that comprise the image. It also allows rearranging those blocks. Using these lossless operations, it is possible to perform a lossless JPEG rotation. To do so, the rotated image must meet some basic criteria, such as having dimensions that are a multiple of the MCU size (usually 16×16).
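The size criterion is easy to check up front. The following is a minimal Python sketch, assuming the usual 16×16 MCU mentioned above; the function name `fits_mcu_grid` and the sample dimensions are made up for illustration, and other lossless-rotation criteria are not covered.

```python
def fits_mcu_grid(width, height, mcu_width=16, mcu_height=16):
    """Return True if both dimensions are whole multiples of the MCU size.

    The 16x16 default is an assumption; the actual MCU size depends on the
    chroma subsampling used by the particular JPEG file.
    """
    return width % mcu_width == 0 and height % mcu_height == 0


# Hypothetical photo sizes, used only to exercise the check.
print(fits_mcu_grid(4000, 3008))
print(fits_mcu_grid(4000, 3000))
```

If the check fails, a program has to either trim the partial blocks at the edge or fall back to lossy re-encoding.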

Not all programs perform a lossless JPEG rotation, so it is useful to know which ones do. I checked a few commonly used programs to see whether they indeed perform lossless rotation. The testing procedure was:

  1. Start with the original JPEG photo.
  2. Rotate it once to the right using each program.
  3. Rotate a copy of the rotated photo back to the left using the same program.
  4. Compare the result with the original using ImageMagick (compare -metric ae).


Gnome’s Image Viewer (3.14.1) is lossless.
Digikam (4.4.0) is lossless; however, rotating with Digikam’s Image Editor is lossy.
Shotwell (0.20.1) does lossy rotation.

Default PBKDF2 Iteration Count for Encrypted Keys Generated by OpenSSL

When generating keys with openssl you have the option to encrypt them. This is done by specifying a cipher algorithm, for example:

openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -aes-128-cbc -out key.pem 

generates a 2048-bit RSA key and encrypts it with AES in CBC mode. OpenSSL will prompt you to provide a pass-phrase for the encryption. It is important to understand how that pass-phrase/password will be used to derive a key for the AES encryption. The whole encryption scheme is defined by something called PBES2 [1], which in turn uses PBKDF2. The main factor in the computational cost of PBKDF2 is the number of hash iterations used.

OpenSSL doesn’t have an option in its command-line utilities to control the number of iterations. However, the standard allows that number to vary pretty much arbitrarily, so it is stored as part of the ASN.1 representation of the generated encrypted key:

$ openssl asn1parse -i -in key.pem  | head
    0:d=0  hl=4 l=1311 cons: SEQUENCE          
    4:d=1  hl=2 l=  73 cons:  SEQUENCE          
    6:d=2  hl=2 l=   9 prim:   OBJECT            :PBES2
   17:d=2  hl=2 l=  60 cons:   SEQUENCE          
   19:d=3  hl=2 l=  27 cons:    SEQUENCE          
   21:d=4  hl=2 l=   9 prim:     OBJECT            :PBKDF2
   32:d=4  hl=2 l=  14 cons:     SEQUENCE          
   34:d=5  hl=2 l=   8 prim:      OCTET STRING      [HEX DUMP]:F3098873E5AB1A81
   44:d=5  hl=2 l=   2 prim:      INTEGER           :0800
   48:d=3  hl=2 l=  29 cons:    SEQUENCE       

The line saying INTEGER :0800 states the number of iterations used (in hex notation) for the generated key.pem. It means that, at least for OpenSSL 1.0.1, the default number of iterations is 0x800 = 2048. This number is relatively low by modern standards [2].
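To get a feel for what those parameters mean, here is a minimal Python sketch that feeds the salt and iteration count from the dump above into PBKDF2 using the standard hashlib module. The pass-phrase is hypothetical. HMAC-SHA1 is assumed as the PRF, since the PBKDF2 parameters in the dump hold only a salt and an iteration count, and the standard's default applies when no PRF is listed. The 16-byte output length matches an AES-128 key. This only illustrates the derivation step, it does not decrypt key.pem.

```python
import hashlib

# Values read from the asn1parse output above.
iterations = int("0800", 16)                # the INTEGER field, 0x800
salt = bytes.fromhex("F3098873E5AB1A81")    # the OCTET STRING field

# Hypothetical pass-phrase, for illustration only.
passphrase = b"correct horse battery staple"


def derive_key(passphrase, salt, iterations, key_len=16):
    """Derive key_len bytes with PBKDF2, assuming HMAC-SHA1 as the PRF."""
    return hashlib.pbkdf2_hmac("sha1", passphrase, salt, iterations, dklen=key_len)


key = derive_key(passphrase, salt, iterations)
print(iterations)   # the iteration count in decimal
print(len(key))     # number of derived key bytes
```

An attacker guessing pass-phrases has to repeat this derivation for every guess, which is why a higher iteration count makes brute-forcing more expensive.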

  1. As the name suggests, there is also PBES1, which is now obsolete. The main difference is that PBES1 only allowed DES and RC2 to be used as ciphers. See RFC 2898 for more details.
  2. Apple uses 10,000 iterations for iTunes passwords, and LastPass defaults to 5,000.

wxWidgets 2.8 to 3.0 Migration: Converting wxString to Numbers

wxWidgets provides a set of utility methods to convert a wxString to various integer types, such as ToLong(). While the documentation for those functions remained roughly the same between wxWidgets 2.8 and 3.0, the implementation did change. In wxWidgets 2.8, if the string was empty, using any of the number conversion functions would result in the value 0. In wxWidgets 3.0 it’s different, as can be learned from the following comment in wxstring.cpp:

// notice that we return false without modifying the output parameter at all if
// nothing could be parsed but we do modify it and return false then if we did
// parse something successfully but not the entire string

This means that if you relied on ToLong() to store 0 in the long pointed to by its argument when given an empty string, in wxWidgets 3.0 that variable is left untouched, so you will get an uninitialized value there unless you initialized it yourself.

I also noticed, when comparing the code of wxString in 2.8 and 3.0, that in 3.0 they implemented the integer conversion functions using C macros, while in 2.8 they used templates. I wonder why it was changed, as it looks more like a regression to me.

Restricting SSH Access to rsync

Passphrase-less SSH keys allow one to automate remote tasks by not requiring user intervention to enter a passphrase to decrypt the key. While this is convenient, it poses a security risk, as the plain key can be used by anyone who gets hold of it to access the remote server. To this end, the developers of SSH made it possible to restrict, via .ssh/authorized_keys, the commands that can be executed with specific keys. This works great for simple commands, but as using rsync requires executing remote commands with different arguments on the remote end, depending on the invocation on the local machine, it gets quite complicated to properly restrict it via .ssh/authorized_keys.

Luckily, the developers of rsync foresaw this problem and wrote a script called rrsync (for restricted rsync) specifically to make it easier to restrict keys to be used only for rsync via .ssh/authorized_keys. If you have rsync installed, rrsync should have been distributed alongside it. On Debian/Ubuntu machines it can be found under /usr/share/doc/rsync/scripts/rrsync.gz. If you can’t find it there, you can download the script directly from here. On the remote machine, copy the script, unpacking if needed, and make it executable:

user@remote:~$ gunzip /usr/share/doc/rsync/scripts/rrsync.gz -c > ~/bin/rrsync
user@remote:~$ chmod +x ~/bin/rrsync

On the local machine, create a new SSH key and leave the passphrase empty (this will allow you to automate the rsync via cron). Copy the public key to the remote server.

user@local:~$ ssh-keygen -f ~/.ssh/id_remote_backup -C "Automated remote backup"
user@local:~$ scp ~/.ssh/ user@remote:~/

Once the public key is on the remote server, edit ~/.ssh/authorized_keys and append the public key.

user@remote:~$ vim ~/.ssh/authorized_keys

(Vim tip: Use :r! cat followed by the name of the public key file to insert its contents directly on a new line.) Now prepend the following to the newly added line:

command="$HOME/bin/rrsync -ro ~/backups/",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding

The command="..." option restricts that public key to executing the given command and disallows all others. The other no-* options further restrict what can be done with that particular public key. As the SSH daemon will not start the default shell when the server is accessed using this public key, the $PATH environment variable will be pretty empty (similar to cron), hence you should specify the full path to the rrsync script. The two arguments to rrsync are -ro, which makes the directory read-only (drop it if you want to upload stuff to the remote directory), and the path to the directory you want to enable remote access to (in my example ~/backups/).

The result should look something like:

command="$HOME/bin/rrsync -ro ~/backups/",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding ssh-rsa AAA...vp Automated remote backup
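If you need to prepare such entries for several keys, assembling the line can be scripted. The following is a rough Python sketch that only builds the string shown above. The function name `restricted_entry` and its parameters are made up for this illustration, and it does not touch ~/.ssh/authorized_keys itself.

```python
# The no-* options used in the example entry above.
RESTRICTIONS = (
    "no-agent-forwarding",
    "no-port-forwarding",
    "no-pty",
    "no-user-rc",
    "no-X11-forwarding",
)


def restricted_entry(public_key, rrsync_path, directory, read_only=True):
    """Build an authorized_keys line that limits a key to rrsync on one directory.

    public_key is the full contents of the .pub file (type, key data, comment).
    """
    args = ["-ro"] if read_only else []
    command = " ".join([rrsync_path, *args, directory])
    options = ",".join(['command="%s"' % command, *RESTRICTIONS])
    return "%s %s" % (options, public_key.strip())


# The key data is abbreviated here exactly as in the example above.
print(restricted_entry("ssh-rsa AAA...vp Automated remote backup",
                       "$HOME/bin/rrsync", "~/backups/"))
```

Passing read_only=False drops the -ro argument, which corresponds to allowing uploads to the remote directory.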

After saving the file, you should be able to rsync files from the remote server to the local machine without being prompted for a password.

user@local:~$ rsync -e "ssh -i $HOME/.ssh/id_remote_backup" -av user@remote: etc2/

Two things should be noted:

  1. You need to specify the passphrase-less key in the rsync command (the -e "ssh -i $HOME/.ssh/id_remote_backup" part).
  2. The remote directory is always relative to the directory given to rrsync in the ~/.ssh/authorized_keys file.

Quickly Exiting Insert-Mode in Vim

Changing from normal mode to insert mode is usually quick. The other direction is more cumbersome. You either have to reach for the escape key, or use Ctrl-[ (which I never got used to).

After seeing a blog post suggesting to map jk to exit insert mode, I was inspired to create my own mapping. I chose kj because it’s faster to type, as typing inwards is faster than outwards (you can check for yourself by tapping with your fingers on your desk). To use it, add the following to your .vimrc:

:inoremap kj <ESC>

Now, whenever you are in insert mode, quickly typing kj will exit insert mode. It will introduce a short pause after typing k, but this is only a visual one, so it doesn’t actually slow you down. kj is one of the rarest bigrams in English, so you’ll almost never have to actually type it inside a text, but if you do, just wait a bit after typing k to type the j.
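If you want to check how often a candidate mapping such as kj shows up in the kind of text you actually write, a rough count is easy to get. Here is a minimal Python sketch. The function name `bigram_counts` and the sample sentence are made up for illustration, and in practice you would feed it your own documents.

```python
from collections import Counter


def bigram_counts(text):
    """Count letter pairs inside words, ignoring case and non-letter characters."""
    counts = Counter()
    for word in text.lower().split():
        # Drop punctuation and digits so only letters form pairs.
        letters = "".join(ch for ch in word if ch.isalpha())
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    return counts


# Hypothetical sample text; replace it with your own writing.
sample = "The quick brown fox jumps over the lazy dog while thinking about Vim."
counts = bigram_counts(sample)
print(counts["kj"], counts["jk"], counts["th"])
```

Pairs are only counted within a word, so a k at the end of one word followed by a j at the start of the next is not counted.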

After writing this post, I came across a Vim Wiki page listing all kinds of ways to avoid the escape key.

I’ve recently published my vimrc. Take a look, it might give you ideas for other neat tricks.

Binary Downloads are Back at GitHub

Eight months after dropping support for binary downloads, GitHub has re-enabled them and now calls them Releases. It’s a welcome move, and in my opinion a vital one, as offering binary releases is crucial for any project in a compiled language that targets end-users (as opposed to developers). Plainly put, when a user wants to download and use some software, he doesn’t want to mess with compilation issues and dependencies. (Unless, of course, he is a Gentoo user, and then he’s probably more of a developer than a regular user.)

The new GitHub releases have a nice feature which allows you, actually requires you, to tag your release in version control. That’s something I haven’t seen in other project hosting services, and it looks really positive. However, they still lack a basic feature SourceForge has had for years – download stats. It’s really nice to be able to know how many people downloaded each release of your project. Even a plain download counter will do; you don’t need the full-blown download stats SourceForge has. I really hope that GitHub will implement this.

Mozilla Persona

Today I came across Mozilla Persona. It’s a Single-Sign-On (SSO) system that is similar to OpenID. While it looks like there is no need for yet another SSO, it does have some promising features compared to OpenID, and especially compared to OpenID as provided by “big players” like Google and Facebook (actually, Facebook doesn’t provide OpenID but the similarly working Facebook Connect).

The main benefit is privacy. The first kind of privacy is related to the provider. In OpenID, the provider knows exactly where you’ve logged in to. For example, if I want to use my Google account as an OpenID to sign into a gardening forum, Google will know that I’ve signed up there, and they will get notified every time I sign in. Persona, on the other hand, seems to sidestep this issue. After registering with a Persona provider (Mozilla offers one), the provider gives the user a cryptographically signed certificate which he can present to sites he signs in to. The site can verify the validity of the certificate without telling the provider which user it wishes to validate.

Another aspect of privacy provided by Persona is the ease of creating alter egos (and thus keeping our anonymity on the net). Facebook and other OpenID-like providers require extensive personal information and have real-name policies (violating them can result in a blocked account). Persona, by allowing you to register with any email address (think about Mailinator), allows you to create these anonymous personas. It also gives you more control over the kind of profile information it shares with sites.

There is one last remaining issue which still concerns me. If you use an OpenID provider, such as Google, and it decides to block your account, then you lose access to all those places you authenticated to using that account. This can be worked around by setting up your own OpenID provider, but that’s not simple. I’m not sure if Persona offers an easier way around it.

Overall, Persona looks very promising as an alternative to OpenID. If anyone has real experience with it, I would love to hear about it.

GitHub Stops Offering Binary Downloads

Only a few months ago, almost anyone would swear by GitHub and curse SourceForge. GitHub was (and probably still is) the fastest growing and by now the largest code repository, while SourceForge was the overthrown king. SourceForge looks like an archaic service despite some major facelifts, while GitHub is the cool kid on the block. Recently, GitHub showed us why SourceForge is still relevant for the open-source community.

Back in December, GitHub dropped their support for downloading files from outside the code repository. They say they believe that code should be distributed directly from the git repository. This is probably fine for projects written in dynamic languages (such as Python, Ruby, JavaScript) where no binary distribution is expected. However, this seems to me like a blow to any GitHub-hosted C/C++ project. No one expects lay users to compile projects directly from source; it’s a hassle for most people except developers (and possibly Gentoo users :-)).

It might be a good move on the GitHub team’s part, as they promote themselves as a developer collaboration tool, and most of their projects are indeed in dynamic languages (see the top languages statistics). The GitHub team offers two solutions in their post: uploading files to Amazon S3 and switching to SourceForge. I’ve also read at least a few people recommending putting binary releases in the git repository (a bad idea).

Overall, I think this move by GitHub just turned SourceForge into the best code repository (for compiled code) once again.

Scanning Lecture Notes – Separating Colors

Continuing my journey to perfect my scanned lecture notes, I’ll be reviewing my efforts to find a good way to threshold scanned notes to black and white. I’ve spent several days experimenting with this stuff, and I think I’ve managed to improve on the basic methods used.

In the process of experimenting, I’ve come up with what I think are the 3 main hurdles in scanning notes (or text in general) to black and white.

  1. Bleeding. When using both sides of the paper, the ink might “bleed” through to the other side. Even if the ink doesn’t actually pass through, it might still be visible as a kind of shadow when scanning, just like when you hold a piece of paper in front of a light and you’re able to make out the text on the other side.
  2. Non-black ink. Photocopying blue ink is notoriously messy. Scanning it to b&w also poses challenges.
  3. Skipping. This is an artifact that is sometimes introduced when writing with a ballpoint pen. It’s a result of inconsistent ink flow, and is rarer with more liquid inks such as those in rollerballs or fountain pens.

These issues can be seen in the first three images. These images are the originals I’ve tested the various methods with. The other images are the results of the various methods explained in this post, and should convey the differences between them.