Consider the following use case:
import os

PREFIX = '/home/user/files/'
full_path = os.path.join(PREFIX, filepath)
with open(full_path, 'rb') as f:
    ...
If filepath is user-controlled, a malicious user might attempt a directory traversal (for example, by setting it to ../../../etc/passwd). How can we make sure that filepath cannot traverse “above” our prefix? There are of course numerous solutions for sanitizing input against directory traversal. The easiest way (that I came up with) to do so in Python is:
filepath = os.path.normpath('/' + filepath).lstrip('/')
It works because it turns the path into an absolute path, normalizes it and then makes it relative again. Since one cannot traverse above /, this effectively ensures that filepath cannot go outside of PREFIX.
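Below is a minimal sketch that wraps the one-liner in a reusable function and shows what it changes. The helper name safe_join is hypothetical. The sketch calls posixpath directly, which is the module that os.path refers to on POSIX systems, so the output is the same on any platform. It assumes POSIX-style paths. Windows separators behave differently, as the comments below discuss.

```python
import posixpath

# PREFIX is the example directory from the post.
PREFIX = '/home/user/files/'


def safe_join(prefix, filepath):
    """Join a user-supplied path onto prefix without escaping it.

    safe_join is a hypothetical helper name used for illustration.
    """
    # Anchor the path at '/', then normalize it. '..' components cannot
    # climb above the root, so they are dropped there. Finally, strip every
    # leading slash so the result is relative again.
    cleaned = posixpath.normpath('/' + filepath).lstrip('/')
    return posixpath.join(prefix, cleaned)


# Without sanitizing, the '..' components escape PREFIX once normalized:
print(posixpath.normpath(posixpath.join(PREFIX, '../../../etc/passwd')))
# With sanitizing, the same input stays inside PREFIX:
print(safe_join(PREFIX, '../../../etc/passwd'))
# Ordinary subdirectories are preserved:
print(safe_join(PREFIX, 'docs/report.txt'))
```

The first call prints /etc/passwd. The sanitized calls print /home/user/files/etc/passwd and /home/user/files/docs/report.txt.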
Post updated: see the comments below for an explanation of the changes.
6 thoughts on “Preventing Directory Traversal in Python”
Hi, I tried your solution on Python 2.7.3.
It almost works, but if filepath begins with '/', the problem comes back.
Is there another solution? Thanks.
PREFIX = '/home/user/files/'
filepath = '/etc/passwd'
filepath = os.path.normpath('/' + filepath)[1:]
full_path = os.path.join(PREFIX, filepath)
full_path will be “/etc/passwd”
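The reported failure can be reproduced in a few lines. This sketch also compares the [1:] slice with lstrip('/'). As before, it calls posixpath directly (the module os.path refers to on POSIX systems) so that it behaves the same on every platform.

```python
import posixpath

PREFIX = '/home/user/files/'
filepath = '/etc/passwd'

# '/' + '/etc/passwd' yields '//etc/passwd', and normpath keeps both slashes.
normalized = posixpath.normpath('/' + filepath)

# Slicing removes only one slash, so the path is still absolute,
# and join() then discards PREFIX entirely.
sliced = normalized[1:]
print(posixpath.join(PREFIX, sliced))

# lstrip('/') removes every leading slash and leaves a relative path.
stripped = normalized.lstrip('/')
print(posixpath.join(PREFIX, stripped))
```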
Thanks for pointing it out. In the use case I had in mind for this snippet, absolute paths weren’t a problem, but I should have tested it better anyway. Using lstrip('/') instead of [1:] fixes the issue (I’ve updated the post as well). The reason why it fails is quite surprising (in my opinion).
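Explanation: The problem stems from two separate behaviors, one in normpath and the other in os.path.join. It turns out that when normpath (and therefore abspath) gets an absolute path starting with a single slash or with three or more slashes, the result has a single leading slash. However, if the input has exactly two leading slashes, the output retains them. This behavior conforms to an obscure passage in the POSIX standard (the last paragraph of its Pathname Resolution section), which says that a pathname beginning with two successive slashes may be interpreted in an implementation-defined manner, while more than two leading slashes shall be treated as a single slash.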
As a result, Python leaves the two slashes intact, which is kind of unexpected (as this bug report may attest).
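The leading-slash rule is easy to see by normalizing the same path with one to four leading slashes. This sketch calls posixpath.normpath, which is what os.path.normpath is on POSIX systems.

```python
import posixpath

# normpath collapses one or three-plus leading slashes to a single slash,
# but keeps exactly two, because POSIX leaves '//' implementation-defined.
samples = ['/etc/passwd', '//etc/passwd', '///etc/passwd', '////etc/passwd']
for path in samples:
    print(repr(path), '->', repr(posixpath.normpath(path)))
```

Only the '//etc/passwd' input keeps its double slash in the output.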
Because of the two leading slashes, the result is still an absolute path after the string slicing. Here comes another possibly unexpected behavior (albeit a well-documented one this time): if one of the arguments to os.path.join is an absolute path, the function discards all preceding arguments. Thus, in our case it discards the prefix path, causing the bug.
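The documented os.path.join behavior can be checked directly. This sketch uses posixpath.join, which is the POSIX implementation behind os.path.join.

```python
import posixpath

PREFIX = '/home/user/files/'

# A relative component is appended to the prefix as expected.
appended = posixpath.join(PREFIX, 'etc/passwd')
# An absolute component makes join() discard everything before it.
replaced = posixpath.join(PREFIX, '/etc/passwd')
# The reset happens at the last absolute argument, wherever it appears.
reset = posixpath.join('/a', 'b', '/c', 'd')
print(appended, replaced, reset)
```

This prints /home/user/files/etc/passwd /etc/passwd /c/d.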
Windows version: https://ideone.com/ErrOkF. At least it breaks directory traversal with a full path name. Curiously, the output in IPython is 'c:\c:\qwe'.
Yes, replace '/' with r'\/' or os.sep.
What’s wrong with os.path.basename? Thanks.
Because that doesn’t let you have any directories at all in the user-provided path.
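A short sketch makes the trade-off concrete. basename blocks the '..' trick, but it also discards any legitimate subdirectory. It uses posixpath.basename, the POSIX implementation of os.path.basename. The sample paths are made up for illustration.

```python
import posixpath

# basename() keeps only the final path component, so the '..' parts vanish.
attack = posixpath.basename('../../../etc/passwd')
# The same happens to a legitimate subdirectory the user meant to address.
nested = posixpath.basename('docs/2023/report.txt')
print(attack, nested)
```

This prints passwd report.txt. The docs/2023/ part of the second path is lost, which is why the normpath approach from the post is preferable when subdirectories are allowed.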