econpy.org

Python for Economists

Introduction to HTTP Requests in Python

An HTTP request is a message that a user (typically through a browser or a script) sends to a web server to ask for a resource. The resource could be a file or a short message (such as the contents of HTTP headers). The most familiar example of requesting a file over HTTP is typing a URL into a browser's address bar. An example of a short message is the status code, which the server sends back in response to each request. The status code is the server's way of telling the user whether it was able to find the requested resource. If you're unfamiliar with status codes, start by memorizing the two most common: 200 means the server successfully located the requested resource, and 404 means the server could not find it.
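
To make the two status codes concrete, here is a minimal sketch, assuming Python 3 (where httplib is named http.client). It does not contact a real website: the DemoHandler class, the fetch_status helper and the paths are hypothetical names I made up, and a throwaway server on 127.0.0.1 stands in for the web server so the example runs offline.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves one page at "/" and answers 404 elsewhere."""

    def do_GET(self):
        if self.path == "/":
            body = b"<html><body>Hello</body></html>"
            self.send_response(200)  # resource found
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # resource not found

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_status(host, port, path):
    """Send a GET request and return the (status code, reason phrase) pair."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body before closing
        return response.status, response.reason
    finally:
        conn.close()


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

found = fetch_status("127.0.0.1", port, "/")
missing = fetch_status("127.0.0.1", port, "/no-such-page")
print(found)    # (200, 'OK')
print(missing)  # (404, 'Not Found')

server.shutdown()
server.server_close()
```

The same fetch_status call should work against a real host name on port 80, but what comes back then depends on that server.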

NOTE: HTTP (Hypertext Transfer Protocol) is the language that users and web servers use to communicate with each other. Typically, the user requests a file or a short message and the server sends back a response.


In this tutorial, I'll introduce four ways to retrieve the source code of a web page. The first (next) page of this tutorial uses the httplib module from the Python standard library which, as you'll see, is the most tedious method of the four. The second page uses urllib2, which is also in the Python standard library and is marginally easier to work with than httplib. The third page looks at twill, a third-party Python module built on top of mechanize. Twill is fundamentally different from the other three methods because it mimics a browser that you program to perform tasks such as following links, filling in forms, pressing buttons and various other browser-level interactions. Finally, the fourth page covers the requests module, which I encourage you to learn to love; you'll see why as you work through the first three examples.
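
As a rough preview of what "retrieving the source code" looks like in practice, here is a minimal sketch, assuming Python 3, where the functionality of urllib2 lives in urllib.request. The PAGE content, the PageHandler class and the get_source helper are hypothetical, and a local server on 127.0.0.1 stands in for a real website so the example runs offline.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page source served by the stand-in server.
PAGE = b"<html><body><h1>Demo page</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns PAGE for every GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def get_source(url):
    """Return (status code, page source as text) for the given URL."""
    # An empty ProxyHandler bypasses any system proxy, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

status, source = get_source(url)
print(status)
print(source)

server.shutdown()
server.server_close()
```

Each of the four modules covered on the following pages performs this same basic job: send a request, then read the response body as text.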