mechanize – Writing Bots in Python Made Simple

I’ve been using Python to write various bots and crawlers for a long time. A few days ago I needed to write a simple bot to remove some 400+ spam pages in Sikumuna, so I took an old script of mine (from 2006) in order to modify it. The script used ClientForm, a Python module that allows you to easily parse and fill HTML forms. I quickly found that ClientForm is now deprecated in favor of mechanize. In the beginning I was somewhat taken aback by the change, as ClientForm was pretty easy to use, and mechanize’s documentation could use some improvement. However, I quickly changed my mind about mechanize. The basic interface for mechanize is a simple browser object that literally allows you to browse using Python. It takes care of handling cookies and such, and it has form-filling abilities similar to ClientForm’s, but this time they are integrated into the browser object.

For future reference for myself, and as another code example to supplement mechanize’s sparse documentation, I’m giving below the gist of the simple bot I wrote:

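import re
import mechanize

        # From the bot class's __init__ (the rest of the initialization is omitted):
        self.browser = mechanize.Browser()
        self.browser.set_handle_robots(False)  # do not fetch or obey robots.txt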
 
def login(self):
    self.browser.open(self.login_url)
    self.browser.select_form(name="userlogin")
    self.browser["wpName"] = self.username
    self.browser["wpPassword"] = self.password
    res = self.browser.submit()
 
def find_pages(self, prefix):
    self.browser.open(self.find_pages_url)
    self.browser.select_form(nr=0)
    self.browser["from"] = prefix
    res = self.browser.submit()
 
    data = res.read()
    link_regex = re.compile('<td><a href="([^"]*)"[^<]*</a></td>')
    return link_regex.findall(data)
 
def delete_page(self, page_url):
    self.browser.open(page_url + "&action=delete")
    # Ask for confirmation when the page title lacks the expected "Kindle" keyword.
    if "Kindle" not in self.browser.title():
        print self.browser.title()
        if raw_input("Confirm: ") != "y":
            return
    self.browser.select_form(nr=0)
    self.browser["wpReason"] = "Spam"
    self.browser.submit()
 
def run(self, prefix):
    self.login()
    pages = self.find_pages(prefix)
    print "Found %d pages" % len(pages)
    for i,page in enumerate(pages):
        print "Deleting", i
        self.delete_page(page)

This isn’t a complete code example, as the rest of the code is mundane, but it clearly shows how simple mechanize is to use.
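The link extraction in find_pages can be tried on its own, without a live site. Below is a minimal sketch in Python 3 that uses the same regular expression; the sample HTML and the extract_links name are made up for illustration and are not taken from the real wiki.

```python
import re

# Same pattern as in find_pages: a table cell that holds exactly one link.
LINK_REGEX = re.compile(r'<td><a href="([^"]*)"[^<]*</a></td>')


def extract_links(html):
    """Return the href of every <td><a ...>...</a></td> cell found in html."""
    return LINK_REGEX.findall(html)


# Made-up sample of the kind of listing page the bot scans.
SAMPLE_HTML = (
    '<table>'
    '<tr><td><a href="/index.php?title=Kindle_deals" title="Kindle deals">'
    'Kindle deals</a></td></tr>'
    '<tr><td><a href="/index.php?title=Kindle_offers" title="Kindle offers">'
    'Kindle offers</a></td></tr>'
    '</table>'
)

links = extract_links(SAMPLE_HTML)
print(links)
```

Note that the pattern only matches cells whose link text contains no nested tags, since [^<]* stops at the first "<". That was enough for the listing page here, but a real HTML parser would be more robust.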

The interesting parts are:

  • Initializing the browser object using mechanize.Browser()
  • Opening pages: browser.open(url)
  • Selecting forms: browser.select_form(name="userlogin") selects a form by its name, and browser.select_form(nr=0) selects a form by its sequential number in the page.
  • Filling forms is done by assigning values to the form fields on the browser object: browser["wpName"] = self.username
  • Submitting: browser.submit()
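The steps above can be put together in one small helper. This is only a sketch in Python 3, assuming a recent mechanize release that runs on Python 3: FormBot, fill_and_submit, make_mechanize_browser and RecordingBrowser are hypothetical names that are not part of the original bot, and the URL and credentials are placeholders. RecordingBrowser is a stand-in that logs calls, so the flow can be tried without mechanize or network access.

```python
class FormBot(object):
    """Drives a browser-like object through open / select / fill / submit."""

    def __init__(self, browser):
        self.browser = browser

    def fill_and_submit(self, url, fields, form_name=None, form_index=0):
        self.browser.open(url)
        # Select the form by name when one is given, otherwise by position.
        if form_name is not None:
            self.browser.select_form(name=form_name)
        else:
            self.browser.select_form(nr=form_index)
        # Fields are filled by item assignment on the browser object.
        for field_name, value in fields:
            self.browser[field_name] = value
        return self.browser.submit()


def make_mechanize_browser():
    # Imported here so the rest of the sketch runs without mechanize installed.
    import mechanize
    browser = mechanize.Browser()
    browser.set_handle_robots(False)  # same setting as in the bot above
    return browser


class RecordingBrowser(object):
    """Hypothetical stand-in that records calls instead of using the network."""

    def __init__(self):
        self.calls = []

    def open(self, url):
        self.calls.append(("open", url))

    def select_form(self, name=None, nr=None):
        self.calls.append(("select_form", name, nr))

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))

    def submit(self):
        self.calls.append(("submit",))
        return "submitted"


fake = RecordingBrowser()
bot = FormBot(fake)
result = bot.fill_and_submit(
    "http://example.com/login",  # placeholder URL
    [("wpName", "username"), ("wpPassword", "password")],  # placeholder values
    form_name="userlogin",
)
for call in fake.calls:
    print(call)
```

To run it against a real site, pass make_mechanize_browser() to FormBot instead of RecordingBrowser().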

6 thoughts on “mechanize – Writing Bots in Python Made Simple”

  1. ielts

    Hey there, You have performed an excellent job. I will definitely digg it and in my view recommend to my friends. I am sure they will be benefited from this site.

  2. Ervin

    Hi,

    I think you did not post the complete code.
    Maybe, some of the code in the start is missing.

  3. Guy Post author

    It’s intentional, this code snippet has everything needed to understand it, except some initialization code of the class which contains some confidential things like usernames and passwords.

  4. joe

    Could you possibly post the whole code without the confidential things, like putting username=username, password=password, or something like that?

  5. Jon

    Thank you Guy!

    Was using Ruby, then discovered the dependency challenge and the antivirus warnings.

    My interest is to create automation bots for both web and desktop automation.

    I have looked into Rad Studio C++ and/or Delphi, Visual Studio and Qt.

    Which of these three would you, yourself prefer to create automation bots to incorporate Python?

    IronPython can be included within Visual Studio, and Qt can also incorporate Python, but am not sure about Rad Studio Berlin 10.
