I’ve been using Python to write various bots and crawlers for a long time. A few days ago I needed to write a simple bot to remove some 400+ spam pages in Sikumuna, so I took an old script of mine (from 2006) and set out to modify it. The script used ClientForm, a Python module that lets you easily parse and fill in HTML forms. I quickly found that ClientForm is now deprecated in favor of mechanize. At first I was somewhat put off by the change, as ClientForm was pretty easy to use and mechanize’s documentation could use some improvement. However, I quickly changed my mind about mechanize. Its basic interface is a simple browser object that literally allows you to browse using Python. It takes care of handling cookies and the like, and it has form-filling abilities similar to ClientForm’s, but this time they are integrated into the browser object.
For my own future reference, and as another code example to supplement mechanize’s sparse documentation, here is the gist of the simple bot I wrote:
        self.browser = mechanize.Browser()
        self.browser.set_handle_robots(False)

    def login(self):
        self.browser.open(self.login_url)
        self.browser.select_form(name="userlogin")
        self.browser["wpName"] = self.username
        self.browser["wpPassword"] = self.password
        res = self.browser.submit()

    def find_pages(self, prefix):
        self.browser.open(self.find_pages_url)
        self.browser.select_form(nr=0)
        self.browser["from"] = prefix
        res = self.browser.submit()
        data = res.read()
        link_regex = re.compile('<td><a href="([^"]*)"[^<]*</a></td>')
        return link_regex.findall(data)

    def delete_page(self, page_url):
        self.browser.open(page_url + "&action=delete")
        if "Kindle" not in self.browser.title():
            print self.browser.title()
            if raw_input("Confirm: ") != "y":
                return
        self.browser.select_form(nr=0)
        self.browser["wpReason"] = "Spam"
        self.browser.submit()

    def run(self, prefix):
        self.login()
        pages = self.find_pages(prefix)
        print "Found %d page" % len(pages)
        for i,page in enumerate(pages):
            print "Deleting", i
            self.delete_page(page)
This isn’t a complete code example, as the rest of the code is just mundane, but you can clearly see how simple it is to use mechanize.
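As an aside, the link extraction in find_pages can be tried on its own, without a browser or a network connection. The following is a minimal sketch in Python 3 syntax that applies the same regular expression to a made-up HTML fragment. The names extract_links and SAMPLE_HTML and the sample markup are assumptions for illustration only, not part of the original bot. Note that under Python 3 the response body would probably arrive as bytes and need decoding before a pattern like this could be applied.

```python
import re

# Same pattern as in find_pages: a table cell that holds a single link.
LINK_REGEX = re.compile(r'<td><a href="([^"]*)"[^<]*</a></td>')


def extract_links(html):
    """Return the href value of every <td><a ...>...</a></td> cell in html."""
    return LINK_REGEX.findall(html)


# Hypothetical markup, invented for this example.
SAMPLE_HTML = (
    '<table>'
    '<tr><td><a href="/index.php?title=Spam1" title="Spam1">Spam1</a></td></tr>'
    '<tr><td><a href="/index.php?title=Spam2" title="Spam2">Spam2</a></td></tr>'
    '</table>'
)

links = extract_links(SAMPLE_HTML)
```

A regular expression is good enough here because the listing page has a very regular layout; for messier pages an HTML parser would likely be more robust.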
The interesting parts are:
- Initializing the browser object using mechanize.Browser()
- Opening pages: browser.open(self.login_url)
- Selecting forms:
  browser.select_form(name="userlogin") (selecting a form by its name)
  browser.select_form(nr=0) (selecting a form by its sequential number in the page)
- Filling forms is done by assigning values to the form fields on the browser object:
  browser["wpName"] = self.username
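The select, fill and submit steps repeat in every method of the bot, so they could be folded into one small helper. This is only a sketch of one possible refactoring, not code from the original script: the function name fill_and_submit and its parameters are hypothetical, and it assumes the object passed in behaves like mechanize.Browser (select_form with name= or nr=, item assignment for fields, and submit()) and has already opened the page.

```python
def fill_and_submit(browser, fields, form_name=None, form_index=0):
    """Select a form on the current page, fill in its fields, and submit it.

    browser is assumed to expose the mechanize.Browser calls used in the
    bot: select_form(name=...) or select_form(nr=...), browser[field] = value,
    and submit().
    """
    if form_name is not None:
        # Select the form by its name attribute.
        browser.select_form(name=form_name)
    else:
        # Fall back to the form's sequential position on the page.
        browser.select_form(nr=form_index)

    # Assigning to browser[field] sets the value of that form control.
    for field, value in fields.items():
        browser[field] = value

    return browser.submit()
```

With such a helper, the login step would reduce to opening the login page and calling fill_and_submit(self.browser, {"wpName": self.username, "wpPassword": self.password}, form_name="userlogin").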
6 thoughts on “mechanize – Writing Bots in Python Made Simple”
Hey there, You have performed an excellent job. I will definitely digg it and in my view recommend to my friends. I am sure they will be benefited from this site.
I think you did not post the complete code.
Maybe, some of the code in the start is missing.
It’s intentional, this code snippet has everything needed to understand it, except some initialization code of the class which contains some confidential things like usernames and passwords.
Ervin, that’s not the complete code, but it is very useful code for writing a spider.
Could you possibly post the whole code without the confidential things, like putting username=username, password=password, or something like that?
Thank you Guy!
Was using Ruby, then discovered the dependency challenge and the antivirus warnings.
My interest is to create automation bots for both web and desktop automation.
I have looked into Rad Studio C++ and/or Delphi, Visual Studio and Qt.
Which of these three would you, yourself prefer to create automation bots to incorporate Python?
IronPython can be included within Visual Studio, and Qt can also incorporate Python, but I am not sure about Rad Studio Berlin 10.