econpy.org

Python for Economists

Code Snippets

Here you'll find snippets of Python code for various data processing tasks. Below each snippet is an IPython %loadpy magic command that can be used like this:

In [1]: %loadpy http://econpy.pythonanywhere.com/scripts/foo.py

The %loadpy magic function accepts a URL or a path to a local .py script and loads the contents of the script into your IPython terminal without executing it. This is handy when you want to make a quick edit to a remote or local Python script before running it. In newer IPython releases the same magic is also available as %load.

Generally Useful Functions

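Create a list containing the paths of all files in a directory (dir_name) and, if sub_dir is True, in its subdirectories. Any extra arguments are treated as file extensions to keep, written without the leading dot (for example 'txt').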
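import os

def dir_list(dir_name, sub_dir, *args):
    # Collect the paths of files in dir_name; *args optionally lists extensions to keep.
    file_list = []
    for name in os.listdir(dir_name):
        dirfile = os.path.join(dir_name, name)
        if os.path.isfile(dirfile):
            # Keep every file when no extensions are given, otherwise match the extension.
            if len(args) == 0 or os.path.splitext(dirfile)[1][1:] in args:
                file_list.append(dirfile)
        elif os.path.isdir(dirfile) and sub_dir:
            # Recurse into subdirectories only when sub_dir is True.
            file_list += dir_list(dirfile, sub_dir, *args)
    return file_list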
%loadpy http://econpy.pythonanywhere.com/scripts/dir_list.py

Merge all files in a list of files (file_list) into a single file (output_file).
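def mergefiles(file_list, output_file):
    # Concatenate the contents of every file in file_list into output_file.
    with open(output_file, 'w') as out:
        for path in file_list:
            print('Writing file: %s' % path)
            # The with block closes each input file once it has been copied.
            with open(path) as src:
                out.write(src.read())
    print('File created: %s' % output_file)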
%loadpy http://econpy.pythonanywhere.com/scripts/mergefiles.py

It's often useful to combine the previous two scripts: that is, to have a function that takes the path to a directory as its input and produces a single file by merging every file in that directory (and every file in its subdirectories if sub_dir is True). To do so, pass the result of a dir_list call as the file_list argument of mergefiles. For example, the following command merges every .txt file in /home/user/Desktop/Data (not including files in subdirectories) and saves the merged content to a file called outputFile.txt in the working directory:

In [2]: mergefiles(dir_list('/home/user/Desktop/Data', False, 'txt'), 'outputFile.txt')
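
The same combination could also be wrapped in a single helper. The following is only a minimal sketch, not one of the site's scripts: the name merge_dir and its signature are hypothetical, and it uses os.walk from the standard library in place of dir_list.

```python
import os


def merge_dir(dir_name, output_file, sub_dir=False, *exts):
    """Hypothetical helper: merge files under dir_name into output_file."""
    out_path = os.path.abspath(output_file)
    paths = []
    for root, dirs, files in os.walk(dir_name):
        if sub_dir:
            dirs.sort()   # visit subdirectories in a predictable order
        else:
            del dirs[:]   # emptying dirs stops os.walk from descending
        for name in sorted(files):
            path = os.path.join(root, name)
            # Skip an output file left over from an earlier run.
            if os.path.abspath(path) == out_path:
                continue
            # With no extensions given, keep every file.
            if not exts or os.path.splitext(name)[1][1:] in exts:
                paths.append(path)
    # Copy the selected files into the output file, one after another.
    with open(output_file, 'w') as out:
        for path in paths:
            with open(path) as src:
                out.write(src.read())
    return paths
```

Under these assumptions, merge_dir('/home/user/Desktop/Data', 'outputFile.txt', False, 'txt') would do the same job as the command above and return the list of files it merged.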

Remove all HTML tags from a string.
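def striptags(raw_html):
    # tag[0] is True while the scan is inside a <...> tag; a one-item list
    # lets the nested function update it.
    tag = [False]
    def checkit(i):
        if tag[0]:
            # Inside a tag: stay inside until the closing '>' is seen.
            tag[0] = (i != '>')
            return False
        elif i == '<':
            tag[0] = True
            return False
        return True
    # Keep only the characters that fall outside tags.
    return ''.join(i for i in raw_html if checkit(i))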
%loadpy http://econpy.pythonanywhere.com/scripts/striptags.py
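
The character-by-character scan treats every '>' as the end of a tag and leaves entities such as &amp; untouched. As an alternative, here is a rough sketch built on the standard library's HTML parser. It assumes Python 3, and the names TextExtractor and striptags_parser are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper class: collects the text found between tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside tags.
        self.parts.append(data)


def striptags_parser(raw_html):
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


print(striptags_parser('<p>Hello <b>world</b> &amp; more</p>'))
```

Note that this sketch keeps the text inside script and style elements as well, so it is a starting point, not a complete HTML-to-text converter.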

Unique lines (order preserving): remove duplicates from a list while keeping the first occurrence of each item.
def uniquify(myList, idfun=None):
    # idfun maps each item to the key used to detect duplicates (default: the item itself).
    if idfun is None:
        def idfun(x): return x
    seen, result = set(), []
    for item in myList:
        marker = idfun(item)
        if marker in seen:
            continue
        seen.add(marker)
        result.append(item)
    return result
%loadpy http://econpy.pythonanywhere.com/scripts/uniquify.py
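
On Python 3.7 and later, where dictionaries keep insertion order, the same idea can be sketched with a plain dict. The function name uniquify_dict and its key parameter are hypothetical, chosen only for this illustration.

```python
def uniquify_dict(items, key=None):
    """Hypothetical variant of uniquify that relies on dict insertion order."""
    first_seen = {}
    for item in items:
        marker = item if key is None else key(item)
        # setdefault stores the item only the first time its marker appears.
        first_seen.setdefault(marker, item)
    return list(first_seen.values())


print(uniquify_dict([3, 1, 3, 2, 1]))
print(uniquify_dict(['a', 'B', 'A', 'b'], key=str.lower))
```

When no key function is needed, list(dict.fromkeys(items)) gives the same result in one expression.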

Return only the digits within a string using a lambda function.
def onlyDigits(myStr):
    # On Python 3 filter returns an iterator, so join the kept characters into a string.
    return ''.join(filter(lambda x: x.isdigit(), myStr))
%loadpy http://econpy.pythonanywhere.com/scripts/onlydigits.py
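
A regular expression offers another way to do this. The sketch below uses a hypothetical name, only_ascii_digits, and differs slightly from the lambda version: str.isdigit also accepts some non-ASCII digit characters (such as a superscript two), while this pattern keeps the characters 0-9 only.

```python
import re


def only_ascii_digits(text):
    """Hypothetical variant: remove every character that is not 0-9."""
    return re.sub(r'[^0-9]', '', text)


print(only_ascii_digits('Tel: (555) 123-4567'))
```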

Other Tricks and Tools

Change your user-agent using the requests module.
import requests
# Assumes requests 1.0 or later, where r.config is no longer available;
# the default headers live on a Session object instead.
session = requests.Session()
oldUA = session.headers['User-Agent']
session.headers['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
r = session.get("http://econpy.pythonanywhere.com/ex/cpu.html")
print("OLD: %s" % oldUA)
print("NEW: %s" % r.request.headers['User-Agent'])
%loadpy http://econpy.pythonanywhere.com/scripts/useragent.py
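
If the requests module is not installed, a custom user-agent can also be set with the standard library. This is a minimal sketch assuming Python 3; the helper name build_request is hypothetical, and the request is only built here, not sent.

```python
from urllib.request import Request

FIREFOX_UA = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
              'Gecko/20071127 Firefox/2.0.0.11')


def build_request(url, user_agent=FIREFOX_UA):
    """Hypothetical helper: build (but do not send) a request with a custom User-Agent."""
    return Request(url, headers={'User-Agent': user_agent})


req = build_request('http://econpy.pythonanywhere.com/ex/cpu.html')
# urllib stores header names capitalized, so the key is 'User-agent'.
print(req.get_header('User-agent'))
```

Passing req to urllib.request.urlopen would then send the request with that header.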

Scrape data from multiple websites using regular expressions.
import re
import requests

URLs = ['http://econpy.pythonanywhere.com/ex/001.html', 'http://econpy.pythonanywhere.com/ex/cpu.html']
reStr = r'<title>(.*?)</title>'
for url in URLs:
    # .text is the decoded page body, so the string pattern can be matched against it.
    page = requests.get(url).text
    print(re.findall(reStr, page))
%loadpy http://econpy.pythonanywhere.com/scripts/regexmultiplepages.py
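
The pattern in the snippet is case-sensitive, and '.' does not match newlines, so a title written as <TITLE> or spread over several lines would be missed. The sketch below relaxes both restrictions and works on a string, so it can be tried without a network connection. The names TITLE_RE and extract_titles are hypothetical.

```python
import re

# IGNORECASE accepts <TITLE>; DOTALL lets '.' match newlines inside the title.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def extract_titles(html):
    """Hypothetical helper: return the stripped text of every title element."""
    return [match.strip() for match in TITLE_RE.findall(html)]


sample = '<html><head><TITLE>\n  CPU prices\n</TITLE></head><body></body></html>'
print(extract_titles(sample))
```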