Preventing Directory Traversal in Python

Consider the following use case:

import os

PREFIX = '/home/user/files/'
full_path = os.path.join(PREFIX, filepath)
with open(full_path, 'rb') as f:
    data = f.read()

Assuming that filepath is user-controlled, a malicious user might attempt a directory traversal (for example, by setting filepath to ../../../etc/passwd). How can we make sure that filepath cannot traverse “above” our prefix? There are of course numerous ways to sanitize input against directory traversal. The easiest way (that I came up with) to do so in Python is:

filepath = os.path.normpath('/' + filepath).lstrip('/')

It works because it turns the path into an absolute path, normalizes it and makes it relative again. As one cannot traverse above /, it effectively ensures that the filepath cannot go outside of PREFIX.
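
To make the idea easier to try out, here is a minimal sketch that wraps the one-liner in a helper. The name safe_join is hypothetical (it is not part of the standard library), and the sketch uses posixpath, which is what os.path resolves to on POSIX systems, so that it behaves the same on any platform.

```python
import posixpath  # on POSIX systems, os.path is this module

PREFIX = '/home/user/files/'


def safe_join(prefix, filepath):
    """Hypothetical helper: join a user-supplied filepath under prefix."""
    # Anchor the path at the root, collapse '.' and '..' components,
    # then strip every leading slash to make the result relative again.
    relative = posixpath.normpath('/' + filepath).lstrip('/')
    return posixpath.join(prefix, relative)


print(safe_join(PREFIX, 'docs/report.txt'))      # ordinary relative path
print(safe_join(PREFIX, '../../../etc/passwd'))  # traversal attempt
print(safe_join(PREFIX, '/etc/passwd'))          # absolute path
```

All three results stay under PREFIX. Note that normpath works purely on the string, so this sketch does not account for symbolic links that may exist inside PREFIX.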

Post updated: see the comments below for explanation of the changes.

6 thoughts on “Preventing Directory Traversal in Python”

  1. Hi, I tried your solution on Python 2.7.3.
    It almost works, but if filepath begins with '/', the problem appears again.
    Is there another solution? Thanks.

    For example:
    import os
    PREFIX = '/home/user/files/'
    filepath = "/etc/passwd"
    filepath = os.path.normpath('/' + filepath)[1:]
    full_path = os.path.join(PREFIX, filepath)
    print(full_path)
    full_path will be "/etc/passwd"

  2. Thanks for pointing it out. In the use case I had in mind for this snippet, absolute paths weren’t a problem, but I should have tested it better anyway. Using .lstrip('/') instead of [1:] fixes the issue (I’ve updated the post as well). The reason it fails is quite surprising (in my opinion).

    Explanation: The bug stemmed from two behaviors, one in normpath and the other in os.path.join. It turns out that when normpath (or abspath) gets an absolute path starting with a single slash or with three or more slashes, the result has a single leading slash. However, if the input has exactly two leading slashes, the output retains them. This behavior conforms to an obscure passage in the POSIX standard (last paragraph):

    A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.

    As a result, Python leaves the two slashes intact, which is rather unexpected (as this bug report may attest).

    Because the two leading slashes are retained, the result of the string slicing is still an absolute path. Here comes another possibly unexpected behavior (albeit a well-documented one this time): if one of the arguments to os.path.join is an absolute path, the function discards all preceding arguments. In our case it discards the prefix path, causing the bug.
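
The two behaviors described in the explanation can be reproduced with a short script. This is only an illustrative sketch: the variable names are made up for the demonstration, and posixpath is used in place of os.path so that the POSIX rules apply regardless of the platform the script runs on.

```python
import posixpath  # on POSIX systems, os.path is this module

PREFIX = '/home/user/files/'

# normpath keeps exactly two leading slashes but collapses three or more.
two_slashes = posixpath.normpath('//etc/passwd')
three_slashes = posixpath.normpath('///etc/passwd')

# Slicing off one character leaves an absolute path behind,
# whereas lstrip('/') removes every leading slash.
sliced = posixpath.normpath('/' + '/etc/passwd')[1:]
stripped = posixpath.normpath('/' + '/etc/passwd').lstrip('/')

# join discards PREFIX when a later argument is absolute.
joined_sliced = posixpath.join(PREFIX, sliced)
joined_stripped = posixpath.join(PREFIX, stripped)

print(two_slashes, three_slashes)
print(joined_sliced)
print(joined_stripped)
```

Only the lstrip('/') variant keeps the final path under PREFIX; the sliced variant ends up pointing at /etc/passwd.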
