A few days ago I had the idea to build a true random number generator based on noise gathered from a cheap microphone attached to my computer. Tests showed that when sampling the microphone, the least significant bit behaves fairly randomly. This led me to think it might be a good source of entropy for a true random number generator.
The basic design is to gather the noise from the microphone and then apply a process that makes it more uniform and refines its randomness. After some design iterations I settled on a process built around a hash function. Each iteration fills one input block of the hash function with the least significant bits of the microphone samples, applies the hash, and outputs the current digest. Assuming the hash function is uniform, this outputs uniformly distributed blocks of bits. Furthermore, because the previous state of the hash function influences the next digest computation, the process accumulates entropy, which can smooth out occasional less random blocks. And because for all commonly used hash functions the block size is much larger than the digest size, the output cannot reveal much about the current state, or any future or past state. This holds even if someone could find all pre-images of the hash function, as the number of possible states would be too large.
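The entropy accumulation described above can be seen directly with hashlib: because the running hash keeps internal state between calls to update(), even two identical raw blocks produce different digests, and each digest depends on everything fed in so far.

```python
import hashlib

# The running hash keeps internal state between update() calls, so two
# identical input blocks still yield different digests: each digest
# depends on all blocks fed in so far, not just the latest one.
h = hashlib.md5()
block = b"\x00" * h.block_size
h.update(block)
first = h.digest()
h.update(block)
second = h.digest()
assert first != second
# The second digest equals a fresh hash over both blocks concatenated:
assert second == hashlib.md5(block * 2).digest()
```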
I’ve built a Python proof of concept (using md5 as a hash function) suitable for Linux.
import hashlib
import struct

class GRandom:
    def __init__(self):
        # /dev/dsp is the OSS audio device; each read returns raw samples.
        self.audio = open("/dev/dsp", "rb")
        self.hash = hashlib.md5()

    def get_raw_block(self):
        # Read 8 samples per output bit: one hash input block needs
        # block_size bytes, i.e. block_size*8 least significant bits.
        buffer = self.audio.read(self.hash.block_size * 8)
        bytes = struct.unpack("%iB" % len(buffer), buffer)
        longs = []
        for i in range(self.hash.block_size / 4):
            # Pack the LSBs of 32 consecutive samples into one 32-bit word.
            temp = 0
            for b in bytes[i * 32:(i + 1) * 32]:
                temp = (temp << 1) ^ (b & 1)
            longs.append(temp)
        return struct.pack("%iI" % len(longs), *longs)

    def get_block(self):
        # Fold the raw block into the running hash and emit the digest.
        self.hash.update(self.get_raw_block())
        return self.hash.digest()
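For a quick look at the pipeline without a sound card, here is a Python 3 sketch of the same LSB-extraction step with a pluggable byte source; on Linux the source would be open("/dev/dsp", "rb").read, while os.urandom stands in here purely so the sketch runs anywhere (it is not a true entropy source in the sense of this post).

```python
import hashlib
import struct
import os

def lsb_block(read, block_size):
    # Pack the LSBs of block_size*8 input bytes into block_size bytes,
    # 32 sample LSBs per 32-bit word, mirroring get_raw_block() above.
    buf = read(block_size * 8)
    words = []
    for i in range(block_size // 4):
        temp = 0
        for b in buf[i * 32:(i + 1) * 32]:
            temp = (temp << 1) ^ (b & 1)
        words.append(temp)
    return struct.pack("%iI" % len(words), *words)

# Stand-in source; replace with open("/dev/dsp", "rb").read on Linux/OSS.
read = os.urandom
h = hashlib.md5()
for _ in range(4):
    h.update(lsb_block(read, h.block_size))
    print(h.hexdigest())
```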
The number of generated bits per second is given by (sample rate) × (digest size) / (block size). So for the default 8 kHz sampling rate and MD5 we get a theoretical speed of 2000 b/s. SHA-family hashes have a higher digest-to-block-size ratio and may therefore yield higher speeds. Another source of speedup could be raising the microphone's sample rate, but setting it too high may hurt the entropy. The code could also gain considerable performance from a port to C/C++, as it does both bit manipulation and hash computation. Even so, the Python implementation's speed allows it to be used for many cases where true randomness is required, such as generating passwords.
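The throughput formula is easy to check from hashlib's own block_size and digest_size attributes; this small helper evaluates it for a few hashes (MD5 comes out at the 2000 b/s figure above):

```python
import hashlib

SAMPLE_RATE = 8000  # Hz; one sample, and thus one LSB, per tick

def bits_per_second(name):
    # rate * (digest size) / (block size), both sizes taken in bits.
    h = hashlib.new(name)
    return SAMPLE_RATE * (h.digest_size * 8) // (h.block_size * 8)

print(bits_per_second("md5"))     # 2000 b/s, as in the text
print(bits_per_second("sha512"))  # 4000 b/s: 512-bit digest, 1024-bit block
```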
Update 2010-08-07: I've corrected the speed calculation.