Python for Economists

Code Snippets

Here you'll find snippets of Python code for various data-processing tasks. Below each snippet of code is an IPython %loadpy magic function that can be used like this:

In [1]: %loadpy

The %loadpy magic function accepts a URL or a path to a local .py script and loads the contents of the script into your IPython terminal without executing it. This is handy when you want to make a quick edit to a remote or local Python script before running it. (In newer IPython releases the magic is called %load, and %loadpy is kept as an alias.)
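
To make the behaviour concrete, here is a rough sketch of the same idea in plain Python: fetch the text of a script from a URL or a local path and hand it back as a string without running it. The helper name load_source is hypothetical, and this is not how IPython implements the magic; it only mirrors the description above.

```python
import os
from urllib.request import urlopen


def load_source(target):
    """Return the text of a script given a URL or a local path (never executes it)."""
    # Assumption: anything starting with http:// or https:// is treated as a URL.
    if target.startswith(('http://', 'https://')):
        with urlopen(target) as response:
            return response.read().decode('utf-8')
    # Otherwise assume a local file path.
    with open(os.path.expanduser(target), encoding='utf-8') as handle:
        return handle.read()
```

The returned string can be inspected or edited before anything is executed, which is the point of loading a script this way.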

Generally Useful Functions

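Create a list containing the paths of all files in a directory (dir_name) and, if sub_dir is True, its subdirectories. Any extra arguments are file extensions (without the dot) used to filter the results.
import os

def dir_list(dir_name, sub_dir, *args):
    file_list = []
    for file in os.listdir(dir_name):
        dirfile = os.path.join(dir_name, file)
        if os.path.isfile(dirfile):
            if len(args) == 0:
                # No extensions given: keep every file.
                file_list.append(dirfile)
            elif os.path.splitext(dirfile)[1][1:] in args:
                # Keep only files whose extension (without the dot) was requested.
                file_list.append(dirfile)
        elif os.path.isdir(dirfile) and sub_dir:
            # Recurse into subdirectories when sub_dir is True.
            file_list += dir_list(dirfile, sub_dir, *args)
    return file_list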

Merge all files in a list of files (file_list) into a single file (output_file).
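def mergefiles(file_list, output_file):
    # Concatenate the contents of every file in file_list into output_file.
    with open(output_file, 'w') as f:
        for file in file_list:
            print('Writing file: %s' % file)
            with open(file) as infile:
                f.write(infile.read())
    print('File created: %s' % output_file)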

It is often useful to combine the previous two functions: take the path to a directory as input and produce a single file by merging every file in that directory (and, if sub_dir is True, every file in its subdirectories). To do so, pass a call to dir_list as the file_list argument of mergefiles. For example, the following command merges every .txt file in /home/user/Desktop/Data (not including files in subdirectories) and saves the merged content to a file called outputFile.txt in the working directory:

In [2]: mergefiles(dir_list('/home/user/Desktop/Data', False, 'txt'), 'outputFile.txt')
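
If you combine the two steps often, they can be folded into one function. The following is a minimal sketch, assuming os.walk is an acceptable substitute for the recursive listing; the name merge_dir and its return value are illustrative and not part of the snippets above.

```python
import os


def merge_dir(dir_name, output_file, sub_dir=False, *exts):
    """Merge files under dir_name into output_file; return the merged paths."""
    merged = []
    out_path = os.path.abspath(output_file)
    with open(output_file, 'w') as out:
        for root, dirs, files in os.walk(dir_name):
            if not sub_dir:
                dirs[:] = []  # Clearing dirs in place stops os.walk from descending.
            dirs.sort()  # Visit subdirectories in a predictable order.
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.abspath(path) == out_path:
                    continue  # Never merge the output file into itself.
                if exts and os.path.splitext(name)[1][1:] not in exts:
                    continue  # Extension filter, same convention as dir_list.
                with open(path) as infile:
                    out.write(infile.read())
                merged.append(path)
    return merged
```

Under these assumptions, merge_dir('/home/user/Desktop/Data', 'outputFile.txt', False, 'txt') would play the same role as the command above.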

Remove all HTML tags from a string.
def striptags(raw_html):
    tag = [False]
    def checkit(i):
        if tag[0]:
            tag[0] = (i != '>')
            return False
        elif i == '<':
            tag[0] = True
            return False
        return True
    return ''.join(i for i in raw_html if checkit(i))
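
The character scanner above treats every '<' as the start of a tag, so text that contains a literal '<' loses content up to the next '>'. As an alternative, here is a minimal sketch built on the standard library's html.parser module; the names TextExtractor and striptags_parser are illustrative. Note that, unlike striptags, it also decodes character references such as &amp;.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are never passed here.
        self.parts.append(data)


def striptags_parser(raw_html):
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()  # Flush any text still buffered by the parser.
    return ''.join(parser.parts)
```

The text inside script and style elements is still returned, so this is a sketch for simple markup rather than a complete cleaner.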

Unique lines (order preserving).
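def uniquify(myList, idfun=None):
    # idfun maps each item to the key used to decide whether it is a duplicate.
    if idfun is None:
        def idfun(x): return x
    seen,result = {},[]
    for item in myList:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)  # keep the first occurrence, in original order
    return result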
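
The idfun argument is easiest to understand through an example. The sketch below carries its own compact copy of the function so that it runs on its own; the sample list and the choice of str.lower as the key function are made up for illustration.

```python
def uniquify(myList, idfun=None):
    # Compact copy of the order-preserving de-duplication, using a set for lookups.
    if idfun is None:
        def idfun(x): return x
    seen, result = set(), []
    for item in myList:
        marker = idfun(item)
        if marker in seen:
            continue
        seen.add(marker)
        result.append(item)
    return result


lines = ['Apple', 'banana', 'apple', 'Banana', 'cherry']

# Default: items are compared as-is, so all five strings are distinct.
exact = uniquify(lines)

# With idfun=str.lower, items that differ only in case count as duplicates
# and the first spelling encountered is kept.
caseless = uniquify(lines, idfun=str.lower)
print(caseless)  # ['Apple', 'banana', 'cherry']
```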

Return only the digits within a string using a lambda function.
def onlyDigits(myStr):
    # Join the filtered characters so a string is returned in Python 2 and Python 3.
    return ''.join(filter(lambda x: x.isdigit(), myStr))

Other Tricks and Tools

Change your user-agent using the requests module.
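import requests
# Start from the library's default headers (r.config was removed in requests 1.0).
newHeader = requests.utils.default_headers()
oldUA = newHeader['User-Agent']
newHeader['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv: Gecko/20071127 Firefox/'
# Put the target URL between the quotes; the request is sent with the modified headers.
r = requests.get("", headers=newHeader)
print("OLD: %s" % oldUA)
print("NEW: %s" % r.request.headers['User-Agent'])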
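
If requests is not installed, the same idea can be sketched with the standard library's urllib.request instead. This swaps in a different module from the snippet above; the helper name build_request, the URL and the user-agent string are all placeholders chosen for illustration. Building the Request object does not contact the network.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Create a request object that carries a custom User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


# Placeholder URL and user-agent, not taken from the snippet above.
req = build_request('http://example.com/', 'my-research-bot/0.1')

# urllib stores header names capitalised as 'User-agent'.
print(req.get_header('User-agent'))  # my-research-bot/0.1
```

Passing req to urllib.request.urlopen would then send the request with that header.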

Scrape data from multiple websites using regular expressions.
import re
import requests
URLs = ['', '']  # add the URLs to scrape
reStr = '<title>(.*?)</title>'
for url in URLs:
    # .text is the decoded page body, so the str pattern can be applied to it.
    page = requests.get(url).text
    print(re.findall(reStr, page))
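
The pattern above only matches a lowercase title tag that sits on a single line. The sketch below shows one way to relax that, assuming you want case-insensitive matching and titles that may span lines. It runs on in-memory sample pages rather than live URLs; the sample HTML and the name extract_titles are made up for illustration.

```python
import re

# IGNORECASE matches <TITLE> as well as <title>; DOTALL lets '.' span newlines.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def extract_titles(pages):
    """Map each page name to the list of title strings found in its HTML."""
    return {
        name: [title.strip() for title in TITLE_RE.findall(html)]
        for name, html in pages.items()
    }


# Made-up sample pages standing in for downloaded content.
sample_pages = {
    'page_a': '<html><head><title>First Page</title></head></html>',
    'page_b': '<HTML><HEAD><TITLE>\n  Second Page\n</TITLE></HEAD></HTML>',
}
titles = extract_titles(sample_pages)
print(titles)  # {'page_a': ['First Page'], 'page_b': ['Second Page']}
```

In a real run, each value in the dictionary would be the .text of a downloaded page.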