 +====== Basic Language Features ======
 +===== "Everything is an Object" =====
 +And by everything they really mean //everything//. For example, it is legal to do the following:
 +
 +<code python>
 +#define a function (which actually creates a function object)
 +def function():
 +    function.count += 1 # count is an attribute of the function object (defined below)
 +    print "​Function called %d times" % function.count
 +
 +function.count=0 #add an attribute to the function ob
 +function() ​ # --> Function called 1 times
 +function() ​ # --> Function called 2 times
 +function() ​ # --> Function called 3 times
 +</​code>​
 +Yes that is legal python code. Swallow it.
 +
 +===== Global Variables =====
 +If a variable name is read in Python, all namespaces are searched in order until a variable with that name is found. However, if a variable is assigned to and it does not exist in the local namespace, it is **created** there, thus shadowing a global variable with the same name.
 +
 +Note that:
 +  * this only applies to //​assignment//​
 +  * Modifying a mutable global variable (e.g. appending to a list) is possible without declaring the variable as global (see the sketch after the next example)
 +
 +To reassign a global variable in a function it needs to be declared as **global** before it is used:
 +
 +<code python>
 +myGlobalVar = 23
 +def test():
 +   ​global myGlobalVar
 +   ​myGlobalVar=42
 +</​code>​
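 +A minimal sketch of the second bullet above (names are illustrative): mutating a global list works without the global statement, because no name is reassigned.
 +
 +<code python>
 +myGlobalList = []
 +def append_item():
 +    myGlobalList.append(42) # mutates the existing object, no "global" needed
 +
 +append_item()
 +print myGlobalList # --> [42]
 +</code>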
 +==== Some Global Weirdness ====
 +It's also interesting to note that (because of the above):
 +
 +<code python>
 +x = 3
 +def test():
 +  print x
 +
 +#​-->​Works
 +
 +
 +x=3
 +def test():
 +  print x
 +  x = 7
 +
 +# --> raises an UnboundLocalError (local variable 'x' referenced before assignment) when test() is called
 +</​code>​
 +Therefore, if you want to reassign a global variable it's best to simply put the "global myVar" line at the start of the function.
 +
 +===== Modules =====
 +Modules can be nested in a package structure similar to Java, e.g.:
 +
 +<​code>​
 +src
 +|-- main.py
 +`-- test
 +    |-- __init__.py
 +    `-- muh
 +        |-- __init__.py
 +        `-- MyMuh.py
 +</​code>​
 +The //__init__.py// files need to exist in every package directory, although they are allowed to be empty.
 +
 +In order to import such a nested module the root of the directory structure (in this example the directory src) must be included in PYTHONPATH:
 +
 +in main.py you have to write
 +
 +<code python>
 +import test.muh.MyMuh as muhModule ​
 +muhModule.doMuh()
 +</​code>​
 +===== Namespaces =====
 +(from [[http://www.diveintopython.org/html_processing/locals_and_globals.html|diveintopython.org]]:)
 +
 +Namespace search order:
 +  - local namespace - specific to the current function or class method
 +  - global namespace - specific to the current module
 +  - built-in namespace - global to all modules
 +
 +The function ''locals()'' returns a **copy** of the local namespace, while ''globals()'' returns the **actual** global namespace (modifying the globals() dict changes the module's variables, modifying the locals() copy inside a function does not).
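 +A small sketch illustrating the difference:
 +
 +<code python>
 +x = 1
 +def test():
 +    y = 2
 +    locals()["y"] = 99   # modifies only a copy, y stays 2
 +    globals()["x"] = 42  # modifies the real module namespace
 +    print y, x           # --> 2 42
 +
 +test()
 +print x # --> 42
 +</code>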
 +
 +===== Sequences =====
 +==== Sorting a Sequence ====
 +<code python>
 +l=[3,​2,​15,​3,​2,​1,​4]
 +lSorted = sorted(l)
 +</​code>​
 +Default sorting order is ascending
 +
 +==== Descending Sort ====
 +<code python>
 +l = [3,​2,​15,​3,​2,​1,​4]
 +lReverseSorted = sorted(l, reverse=True)
 +</​code>​
 +==== Sorting a list by a certain element of a list item ====
 +Having a sequence consisting of tuples(or sequences) like 
 +
 +<code python>
 +mylist=((1,'​one'​),​(0,'​zero'​),​(4,'​four'​))
 +</​code> ​
 +
 +you can easily sort them by one element by using the itemgetter function:
 +
 +<code python>
 +from operator import itemgetter
 +mysorted = sorted(mylist,​key=itemgetter(0))
 +</​code>​
 +(''itemgetter(0)'' sorts by the first element of each tuple; use ''itemgetter(1)'' for the second, and so on.)
 +
 +If the list item is a class, then a lambda function has to be used:
 +
 +<code python>
 +mySorted = sorted(mylist,​ key = lambda element: element.myKeyAttribute)
 +</​code>​
 +==== Sorting a list of lists by length ====
 +<code python>
 +tmp = [[1,2,3],[3],[6,7]]
 +sorted(tmp, key=len)
 +</​code>​
 +==== Sorting values in a Dict ====
 +<code python>
 +import operator
 +items = sorted(my_dict.items(),​ key=operator.itemgetter(1)) # sort by value, itemgetter(0) to sort by key
 +</​code>​
 +==== List intersection/​union/​difference ====
 +Union A∪B
 +
 +<code python>
 +union=A+filter(lambda x:x not in A,B)
 +</​code>​
 +Intersection A∩B
 +
 +<code python>
 +intersection=filter(lambda x:x in A,B)
 +</​code>​
 +Difference A\B
 +
 +<code python>
 +difference=filter(lambda x:x not in B,A)
 +</​code>​
 +Symmetrical difference AΔB
 +
 +<code python>
 +symdifference=filter(lambda x:x not in B,​A)+filter(lambda x:x not in A,B)
 +</​code>​
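 +If the order of the elements does not matter (and duplicates may be dropped), Python's built-in set type does the same thing more directly:
 +
 +<code python>
 +A = [1, 2, 3]
 +B = [2, 3, 4]
 +union = set(A) | set(B)         # set([1, 2, 3, 4])
 +intersection = set(A) & set(B)  # set([2, 3])
 +difference = set(A) - set(B)    # set([1])
 +symdifference = set(A) ^ set(B) # set([1, 4])
 +</code>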
 +==== List comprehension ====
 +An easy way to define lists
 +
 +<code python>
 +noprimes = [j for i in range(2, 8) for j in range(i*2, 50, i)]
 +primes = [x for x in range(2, 50) if x not in noprimes]
 +</​code>​
 +==== Subtracting two lists ====
 +To element-wise subtract two lists from each other
 +
 +<code python>
 +# define lists
 +x=[5,6,7]
 +y=[3,4,5]
 +# subtract them element-wise
 +import operator
 +map(operator.sub,​ x, y)
 +>>>​ [2,2,2]
 +</​code>​
 +==== Flattening lists ====
 +<code python>
 +l = [[1,2,3], [4,5,6], [7], [8,9]]
 +[item for sublist in l for item in sublist]
 +</​code>​
 +
 +==== Iterating ====
 +
 +The docs of [[https://​docs.python.org/​2/​library/​itertools.html#​recipes|itertools]] have lots of useful recipes.
 +
 +Favourites:
 +
 +<code python>
 +from itertools import tee, izip
 +
 +def pairwise(iterable):​
 +    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
 +    a, b = tee(iterable)
 +    next(b, None)
 +    return izip(a, b)
 +    ​
 +l = range(4)
 +for i in pairwise(l):​
 +    print i
 +</​code>​
 +
 +===== Dict comprehension =====
 +<code python>
 +dict([(i, chr(65+i)) for i in range(4)])
 +</​code>​
 +===== Merging Sequences =====
 +<code python>
 +def merge(*input):​
 +    return reduce(list.__add__, input, list())
 +</​code>​
 +Using:
 +
 +<code python>
 +a=[1,3,4]
 +b=[5,6]
 +c=["a","b","c"]
 +merge(a,​b,​c) # [1, 3, 4, 5, 6, '​a',​ '​b',​ '​c'​]
 +</​code>​
 +===== Tuples =====
 +==== "​Modifying"​ a Tuple ====
 +Tuples are immutable and can not be modified. In order to change one element in a tuple, a new tuple has to be constructed. For example to add 36 to the 4th element of the tuple:
 +
 +<code python>
 +t=(1,​2,​3,​4,​5,​6,​7,​8,​9) ​
 +t1 = (t[:3] + (t[3]+36,) + t[4:])
 +</​code>​
 +
 +Or simply create a list from the tuple
 +
 +<code python>
 +t=(1,​2,​3,​4,​5,​6,​7,​8,​9) ​
 +t1 = list (t) 
 +t1[3] += 36
 +</​code>​
 +===== Lambda functions =====
 +Lambda functions are inline functions with a simplified syntax. They are best used for functionality which is not really reusable in other parts of the code. See [[http://​www.diveintopython.org/​power_of_introspection/​lambda_functions.html|diveintopython.org]] for more.
 +
 +<code python>
 +g = lambda x: x*2
 +# g(3) will return 6
 +</​code>​
 +===== and or =====
 +Those two keywords behave like boolean operators but they return one of the values they compare. Evaluation is from left to right. <nowiki>0, '', [], (), {}, and None are false in a boolean context; everything else is true.</nowiki> As can be expected, for ''and'': if every value is true, the last one is returned; if any value is false, the first false one is returned. For ''or'' it is the other way round.
 +
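 +A few examples:
 +
 +<code python>
 +print 1 and "a"        # --> a       (all values true, the last one is returned)
 +print 0 and "a"        # --> 0       (first false value)
 +print "" or "default"  # --> default (first true value, or the last value if all are false)
 +</code>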
 +===== Ternary Operator =====
 +In **Python 2.5** there is a built in ternary operator (if?​then:​else ):
 +
 +<code python>
 +x = 12 if (y>0) else -12
 +</​code>​
 +
 +
 +==== Ternary Operator pre Python 2.5 ====
 +In **older Python versions** and/or can be used to emulate the ternary operator
 +
 +<code python>
 +1 and "​first"​ or "​second"​ # returns first
 +</​code>​
 +
 +However since ""​ is false in a boolean context
 +<code python>
 +1 and ""​ or "​second"​ # returns second
 +</​code>​
 +so **be careful**!
 +
 +A safe way to emulate the ternary operator if?​then:​else is
 +
 +<code python>
 +result =  ((c>3) and [a] or [b])[0] # if c>3 return a else return b
 +</​code>​
 +See [[http://diveintopython.org/power_of_introspection/and_or.html|diveintopython.org]] for more.
 +
 +===== Reloading modules =====
 +<code python>
 +import qgisfleettools as q
 +q.doStuff()
 +reload(q)
 +</​code>​
 +===== String Formatting =====
 +
 +==== Recommended approach ====
 +
 +Use [[http://​docs.python.org/​2/​library/​stdtypes.html#​str.format|str.format()]]. Fields can be replaced by index or by name:
 +
 +<code python>
 +print '{1} and {0}'​.format('​spam',​ '​eggs'​) # by index
 +print 'This {food} is {adjective}.'​.format(food='​spam',​ adjective='​absolutely horrible'​) #  by name
 +print 'The story of {0}, {1}, and {other}.'​.format('​Bill',​ '​Manfred',​ other='​Georg'​) # both
 +</​code>​
 +==== Old approach: % operator ====
 +**By position:**
 +
 +<code python>
 +" %s , %d , %f " ​ % (a_string,​an_int,​ a_float)
 +</​code>​
 +**By name:**
 +
 +<code python>
 +" %(name1)s %(name2)d ​ %(name1)s" ​ % {"​name1":​value1,​ "​name2":​value2} ​ #note that name1 is used several times in the template!
 +</​code>​
 +**The "locals" trick:** The function ''locals()'' creates a dict of all variables in the current namespace. This can be used for easier "by-name" variable substitution:
 +
 +<code python>
 +y=7
 +print ("​value =  %(y)s" % locals() )
 +</​code>​
 +==== Formatting Numbers ====
 +<code python>
 +i = 4
 +"​%d"​ % (i,)  # --> "​4"​
 +"​%4d" ​ % (i,)  # --> " ​  ​4"​
 +"​%04d"​ % (i,)  # --> "​0004"​
 +j = 1.3
 +"%.2f" % (j)  # --> "1.30"
 +</​code>​
 +==== Templates ====
 +String templates provide a simpler way to do string substitutions. Instead of the normal "%"-based substitutions, Templates support "$"-based substitution:
 +
 +<code python>
 +from string import Template
 +s = Template('​$who likes $what'​)
 +s.substitute(who='​tim',​ what='​kung pao')
 +</​code>​
 +see http://​www.python.org/​doc/​2.5.2/​lib/​node40.html for details
 +
 +==== Rot 13 ====
 +<code python>
 +"​muh"​.encode('​rot13'​)
 +</​code>​
 +
 +
 +===== Classes =====
 +==== Automatically set parameters of initializer as member variables ====
 +This may go against the zen of python (explicit is better than implicit) but it is extremely convenient when dealing with a lot of arguments in the initializer
 +
 +<code python>
 +class MyClassWithLongInitializer:​
 +    def __init__(self, a,b,c,d,e,f,g, x, y=244):
 +        params = locals()
 +        del params["self"] # locals() also contains "self", which we don't want as an attribute
 +        self.__dict__.update(params) # instead of self.a=a; self.b=b; self.c=c ...
 +</​code>​
 +
 +==== Taking care of boilerplate code ====
 +
 +[[https://​glyph.twistedmatrix.com/​2016/​08/​attrs.html|attrs]] helps with boilerplate code a lot, e.g. initialisation,​ string representation,​ comparison.
 +
 +<code python>
 +import attr
 +@attr.s
 +class Point3D(object):​
 +    x = attr.ib()
 +    y = attr.ib()
 +    z = attr.ib()
 +</​code>​
 +====== CGI Scripting ======
 +===== Content Type =====
 +Is set simply by printing the corresponding information at the start of the script:
 +
 +<code python>
 +import cgi
 +print "​Content-Type:​ text/​html\n"​ # or "​Content-Type:​ image/​png\n" ​ or somesuch
 +</​code>​
 +===== Get Request Parameters =====
 +<code python>
 +import cgi
 +sectionId = cgi.FieldStorage()['​sectionid'​].value
 +</​code>​
 +see http://​aspn.activestate.com/​ASPN/​Cookbook/​Python/​Recipe/​81547 for a script to create a dict from the fieldStorage
 +
 +===== Show errors in output page =====
 +The cgitb module provides a special exception handler for Python scripts. (Its name is a bit misleading. It was originally designed to display extensive traceback information in HTML for CGI scripts. It was later generalized to also display this information in plain text.) After this module is activated, if an uncaught exception occurs, a detailed, formatted report will be displayed. The report includes a traceback showing excerpts of the source code for each level, as well as the values of the arguments and local variables to currently running functions, to help you debug the problem. ( text shamelessly stolen from http://​docs.python.org/​lib/​module-cgitb.html ​ )
 +
 +<code python>
 +import cgitb
 +cgitb.enable()
 +</​code>​
 +====== Commandline Parameters ======
 +===== Simple =====
 +<code python>
 +import sys
 +print sys.argv[1]
 +</​code>​
 +===== Advanced =====
 +I have found several builtin modules to deal with commandline parameters. The most flexible and object oriented seems to be OptionParser.
 +
 +==== Option Parser ====
 +<code python>
 +from optparse import OptionParser
 +parser= OptionParser("​usage:​ %prog [options] INPUT_FILE"​)
 +parser.add_option("​-f",​ "​--file",​ dest="​infile",​ help="​input file")
 +parser.add_option("-d", "--direction", choices=["0","1"], dest="direction", help="direction of the road")
 +parser.add_option("​-v",​ "​--verbose", ​ dest="​verbosity",​ action="​count",​default=0,​ help="​Increase Verbosity of debugging output: -v -vv -vvv ")
 +parser.add_option("​-s",​ "​--show-invalid-lines", ​ dest="​showInvalidLines",​ action="​store_true",​default=False,​ help="​shows..."​)
 +(options,​args) = parser.parse_args("​test -v --file out.txt"​.split())
 +if (options.showInvalidLines):​
 +  print "​Showing Invalid Lines"
 +if (options.verbosity > 1):
 +  print "Very Verbose"​
 +</​code>​
 +
 +However, optparse is deprecated since Python 2.7; the (very similar) replacement is argparse: [[http://docs.python.org/2/library/argparse.html#module-argparse]]
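 +A minimal argparse sketch of the same idea (the option names here are just illustrative):
 +
 +<code python>
 +import argparse
 +parser = argparse.ArgumentParser(description="process an input file")
 +parser.add_argument("input_file", help="input file")
 +parser.add_argument("-v", "--verbose", action="count", default=0, help="increase verbosity: -v -vv")
 +parser.add_argument("-s", "--show-invalid-lines", dest="show_invalid_lines", action="store_true", default=False)
 +args = parser.parse_args("-v out.txt".split())
 +print args.input_file, args.verbose
 +</code>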
 +
 +====== Compressed Data ======
 +
 +===== Gzip =====
 +
 +Reading
 +
 +<code python>
 +import gzip
 +
 +f = gzip.open('​file.txt.gz'​)
 +file_content = f.read()
 +f.close()
 +</​code>​
 +
 +
 +Writing
 +<code python>
 +import gzip
 +
 +f_out = gzip.open('​file.txt.gz',​ '​wb'​)
 +f_out.write(data)
 +f_out.close()
 +
 +</​code>​
 +
 +====== Config Files ======
 +There appear to be several ways to read in config files. This example uses the ConfigParser class.
 +
 +===== Config File Layout =====
 +Here is an example config file
 +
 +<​code>​
 +[DEFAULT]
 +text:"​Das ist ein text"
 +
 +[General]
 +muh:"​Die Kuh macht muh"
 +times:3
 +</​code>​
 +The DEFAULT section is special: its values act as fallbacks for every other section. The names of the other sections can be whatever you like.
 +
 +===== Reading the config file =====
 +<code python>
 +from ConfigParser import ConfigParser
 +config = ConfigParser()
 +config.read("​test.cfg"​)
 +print config.get("​General","​muh"​)
 +config.getint("​General",​ "​times"​)
 +print config.get("​General",​ "​text"​) # text not defined in General section, but a DEFAULT definition exists
 +myBool = config.getboolean("​General",​ "​myBoolean"​)
 +</​code>​
 +===== Writing a config file =====
 +<code python>
 +from ConfigParser import ConfigParser
 +config = ConfigParser()
 +config.add_section("​Test"​)
 +config.set("​Test",​ "​Muh",​ 123)
 +
 +with open('​example.cfg',​ '​wb'​) as configfile:
 +    config.write(configfile)
 +
 +
 +</​code>​
 +====== Cryptography ======
 +===== Hashes =====
 +<code python>
 +import hashlib
 +m = hashlib.sha512()
 +m.update("​text"​)
 +m.update("​more text")
 +print m.hexdigest()
 +</​code>​
 +See http://​www.python.org/​doc/​current/​lib/​module-hashlib.html for details and a list of available hash algorithms.
 +
 +====== CSV Files ======
 +===== Reading CSV Files =====
 +A CSV file with a header can be read like so:
 +
 +<code python>
 +import csv
 +for row in csv.DictReader(file(file_name),​ delimiter=";"​):​
 +     print row["​id"​]
 +
 +</​code>​
 +
 +
 +===== Writing CSV Files =====
 +CSVs with headers can also be written with the csv library:
 +
 +<code python>
 +import csv
 +
 +with open('​names.csv',​ '​w'​) as csvfile:
 +    fieldnames = ['​first_name',​ '​last_name'​]
 +    writer = csv.DictWriter(csvfile,​ fieldnames=fieldnames)
 +
 +    writer.writeheader()
 +    writer.writerow({'​first_name':​ '​Baked',​ '​last_name':​ '​Beans'​})
 +    writer.writerow({'​first_name':​ '​Lovely',​ '​last_name':​ '​Spam'​})
 +    writer.writerow({'​first_name':​ '​Wonderful',​ '​last_name':​ '​Spam'​})
 +</​code>​
 +
 +
 +====== Database ======
 +
 +===== Server-side cursors =====
 +
 +[[http://​en.wikipedia.org/​wiki/​Cursor_%28databases%29|Database cursors]] are a construct to traverse over records. ​
 +
 +When data is selected, it usually gets transferred to the client process first - the cursor is on the **client side**. ​
 +
 +For large result sets this poses an obvious problem if the client lacks the required resources. Some drivers support **server-side cursors**. Using those, the client can control how much data it wants to receive at once, thus being able to handle even very large datasets. ​
 +For recipes see
 +  * [[python_cookbook#​postgresql_server-side_cursor| PostgreSQL]]
 +
 +===== Postgres =====
 +<code bash>
 +sudo apt-get install python-psycopg2
 +</​code>​
 +==== Selecting ====
 +<code python>
 +import psycopg2
 +conn = psycopg2.connect(host="​localhost",​ database="​mydb",​ user="​soma",​ password="​xxx"​)
 +cursor = conn.cursor()
 +cursor.execute("​select * from timeseries limit 10")
 +for row in  cursor:
 +  print row
 +</​code>​
 +
 +Using column names to index rows: 
 +
 +<code python>
 +import psycopg2
 +import psycopg2.extras
 +conn = psycopg2.connect(host="​localhost",​ database="​mydb",​ user="​soma",​ password="​xxx"​)
 +dict_cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
 +dict_cur.execute("​SELECT a,b,c FROM table"​)
 +for r in dict_cur:
 +  print r["​a"​]
 +</​code>​
 +
 +  * see http://​initd.org/​psycopg/​docs/​extras.html
 +
 +
 +==== PostgreSQL server-side cursor ====
 +
 +
 +In order to use server-side cursors with PostgreSQL/​psycopg2 one just needs to use psycopg2'​s **named cursor** and set an adequate [[http://​initd.org/​psycopg/​docs/​cursor.html#​fetch|itersize]]
 +
 +<code python>
 +cursor = conn.cursor(name='​a_cursor'​)
 +cursor.itersize = 10000
 +
 +for row in cursor:
 +    pass
 +</​code>​
 +
 +For details see e.g. [[http://​initd.org/​psycopg/​docs/​usage.html#​server-side-cursors|psycopg2'​s server-side doc on cursors]] and the [[http://​www.postgresql.org/​docs/​current/​static/​sql-declare.html| PostgreSQL cursor doc]]
 +
 +==== Batch insert ====
 +<code python>
 +import psycopg2
 +conn = psycopg2.connect(host="​localhost",​ database="​mydb",​ user="​soma",​ password="​xxx"​)
 +cursor = conn.cursor()
 +INSERT = "​INSERT INTO mytable (a,b,c) values (%s,​%s,​%s)"​
 +data = [(12,'​meas',​14),​(14,'​est',​11)]
 +cursor.executemany(INSERT,​ data)
 +</​code>​
 +
 +==== Select into numpy Array ====
 +
 +<code python>
 +import numpy
 +import psycopg2 as pgdb
 +
 +...
 +cursor = conn.cursor()
 +cursor.execute('​SELECT a, b FROM demo')
 +result = numpy.fromiter((tuple (row) for row in cursor), dtype=[('​a',​float),​ ('​b',​ float)], count = cursor.rowcount)
 +
 +</​code>​
 +
 +
 +I had some trouble getting this to work with datetime. Here is a workaround (not efficient, but working)
 +
 +
 +<code python>
 +import dateutil.parser
 +import numpy
 +
 +...
 +
 +cursor.execute('​SELECT time, value FROM table'​) # query
 +
 +data = [(dateutil.parser.parse(time), value) for (time, value) in cursor] # convert time column to datetime objects
 +result = numpy.array(data,​ dtype=[('​time',​object),​ ('​value',​ float)]) # convert to numpy array
 +
 +</​code>​
 +
 +
 +
 +==== Inserting ====
 +<code python>
 +import psycopg2
 +conn = psycopg2.connect(host="​localhost",​ database="​mydb",​ user="​soma",​ password="​xxx"​)
 +cursor = conn.cursor()
 +try:
 +  cursor.execute( "​INSERT INTO timeseries_valuetype (name, unit) VALUES ( %(name)s, %(unit)s )", {"​name":"​mph",​ "​unit":"​mp/​h"​});​
 +  conn.commit()
 +except psycopg2.DatabaseError, details:
 +  print "Got a DatabaseError,​ details are: " + str(details) ​
 +  conn.rollback()
 +</​code>​
 +Note: with this style of placeholder (''%(name)s'') the parameter is a dictionary that maps the placeholder names to values. The ''s'' indicates type string.
 +
 +==== Copy ====
 +The psycopg2 driver offers superior batch inserting performance using the copy_from method:
 +
 +<code python>
 +import psycopg2
 +import cStringIO
 +
 +# create data to copy (basically a csv file)
 +data = cStringIO.StringIO() # use cStringIO for best performance
 +data.write(u"​1,​1000\n"​)
 +data.write(u"​2,​2000\n"​)
 +
 +data.seek(0) # jump to beginning of "​file"​
 +
 +conn = psycopg2.connect(host="localhost", database="mydb", user="soma", password="xxx")
 +cursor = conn.cursor()
 +cursor.copy_from(data, 'my_table', sep=",") # copy data to my_table
 +conn.commit()
 +
 +</​code>​
 +
 +
 +==== Tutorials ====
 +http://​homepages.inf.ed.ac.uk/​s9808248/​ad/​tutorial7.php
 +
 +===== SQLite =====
 +<code bash>
 +sudo apt-get install ​ python-pysqlite2 sqlite3
 +</​code>​
 +**Note:** python-pysqlite2 creates sqlite3 databases. SQLite2 and SQLite3 are not compatible!
 +
 +==== Creating a DB Connection ====
 +<code python>
 +from pysqlite2 import dbapi2 as sqlite
 +connection = sqlite.connect("​test.db"​) # or use :memory: to create an in-memory database
 +cursor = connection.cursor()
 +</​code>​
 +==== Attaching a second DB ====
 +<code python>
 +cursor.execute("​attach '/​var/​fleet/​output/​trips.db'​ as trips"​)
 +</​code>​
 +==== Selecting ====
 +<code python>
 +cursor.execute("​SELECT * from names"​)
 +print cursor.fetchall()
 +#
 +# or
 +#
 +for row in cursor:
 +  print row[0]
 +connection.close()
 +</​code>​
 +==== Inserting ====
 +<code python>
 +cursor.execute ('​CREATE TABLE names (id INTEGER PRIMARY KEY, name VARCHAR(50),​ email VARCHAR(50))'​) ​
 +cursor.execute('​INSERT INTO names VALUES (null, "John Doe", " jdoe@jdoe.zz "​)'​)
 +print cursor.lastrowid
 +connection.commit()
 +</​code>​
 +Batch insert
 +
 +<code python>
 +cursor.execute( '​create table test(roadid INTEGER, name TEXT)'​)
 +values = [(1,'​one'​),​(2,'​two'​)] # anything iterable will do
 +cursor.executemany("​INSERT INTO test (roadid, name) VALUES (?,?) ", values)
 +</​code>​
 +==== Creating user defined functions ====
 +see: [[http://koeritz.com/docs/python-pysqlite2/usage-guide.html#native-database-engine-features-and-extensions-beyond-the-python-db-api|PySqlite usage guide]]
 +
 +==== Aggregate Function ====
 +<code python>
 +from pysqlite2 import dbapi2 as sqlite
 +
 +class MySum:
 +  def __init__(self):​
 +    self.count = 0
 +  def step(self, value):
 +    self.count += value
 +  def finalize(self):​
 +    return self.count
 +
 +con = sqlite.connect(":​memory:"​)
 +con.create_aggregate("​mysum",​ 1, MySum)
 +cur = con.cursor()
 +cur.execute("​create table test(i)"​)
 +cur.execute("​insert into test(i) values (1)")
 +cur.execute("​insert into test(i) values (2)")
 +cur.execute("​select mysum(i) from test")
 +print cur.fetchone()[0]
 +</​code>​
 +==== load_extension ====
 +According to the [[http://code.google.com/p/xenia/wiki/SpatialLite|documentation]] it is necessary to explicitly enable load_extension:
 +
 +<code python>
 +DB = sqlite.connect( '​./​html_content.db'​ )
 +DB.enable_load_extension(True)
 +DB.execute( "​SELECT load_extension('/​usr/​lib/​libspatialite.so.2.0.3'​)"​ )
 +</​code>​
 +===== MSSQL =====
 +The best way I found so far is to use [[http://pymssql.sourceforge.net/|pymssql]]. Unfortunately there is currently no Ubuntu package for pymssql so it needs to be installed by hand. Luckily this is quite trivial:
 +
 +==== Install pymssql ====
 +Start by installing the required dependencies
 +
 +<​code>​
 +sudo aptitude install python2.5-dev freetds-dev
 +</​code>​
 +Download the latest pymssql release from [[https://sourceforge.net/project/showfiles.php?group_id=40059|pymssql]] at SourceForge and install it:
 +
 +<​code>​
 +tar -xvzf pymssql-0.8.0.tar.gz
 +cd pymssql-0.8.0
 +python setup.py install
 +</​code>​
 +==== Using pymssql ====
 +A short usage example stolen from [[http://john.parnefjord.se/node/43|here]]:
 +
 +<code python>
 +
 +import _mssql
 +mssql=_mssql.connect('​mssql.server.com','​databaseuser','​password'​)
 +mssql.select_db('​Northwind'​)
 +query="​select firstname,​lastname,​birthdate from dbo.Employees;"​
 +if mssql.query(query):​
 +  rows=mssql.fetch_array()
 +  rowNumbers = rows[0][1]
 +  print "​Number of rows fetched: " + str(rowNumbers)
 +  for row in rows:
 +    for i in range(rowNumbers):​
 +      print str(i) + "​\t"​ + row[2][i][0] + "​\t"​ + row[2][i][1] + "​\t"​ + str(row[2][i][2])
 +else:
 +  print mssql.errmsg()
 +  print mssql.stdmsg()
 +mssql.close()
 +</​code>​
 +===== MYSQL =====
 +A good introduction to mysql-python can be found at http://​mysql-python.sourceforge.net/​MySQLdb.html#​mysqldb
 +
 +==== Install ====
 +<​code>​
 +sudo apt-get install python-mysqldb
 +</​code>​
 +==== Select ====
 +<code python>
 +import MySQLdb
 +connection = MySQLdb.connect("​10.101.21.25",​ "​user","​pass","​database"​)
 +cur = connection.cursor()
 +cur.execute("SELECT * from timeseries")
 +result = cur.fetchall()
 +for r in result:
 +  print r
 +cur.close()
 +connection.close()
 +</​code>​
 +
 +==== Insert ====
 +<code python>
 +import MySQLdb
 +connection = MySQLdb.connect("​10.101.21.25",​ "​user","​pass","​database"​)
 +cur = connection.cursor()
 +cur.execute("""​ INSERT INTO timeseries(day,​roadid,​laneid,​speed) values ('​2008-01-01',​ 1,​1,​60.0) ​ """​) ​
 +connection.commit()
 +</​code>​
 +
 +==== Batch Insert ====
 +<code python>
 +
 +c.executemany( """​INSERT INTO breakfast (name, spam, eggs, sausage, price) VALUES (%s, %s, %s, %s, %s)""",​[
 +  ("Spam and Sausage Lover'​s Plate",​ 5, 1, 8, 7.95 ),
 +  ("Not So Much Spam Plate",​ 3, 2, 0, 3.95 ),
 +  ("​Don'​t Wany ANY SPAM! Plate",​ 0, 4, 3, 5.95 )
 +] )
 +</​code>​
 +Notes:
 +  * there is a mix of types in the values array (strings, ints, floats) but we still only use %s in the format string (otherwise you will get an error!)
 +  * executemany() tries to throw the whole values array at MySQL at once. If you try to insert many thousand records, this may //exceed MySQL's standard buffer size// and will give you an exception:
 +
 +<​code>​_mysql_exceptions.OperationalError:​ (1153, "Got a packet bigger than '​max_allowed_packet'​ bytes"​) </​code>​
 +
 +To prevent this you need to manually split your value-list into smaller batches like this:
 +
 +<code python>
 +batch_size=20000 # you might have to experiment to find optimal batch_size for your data
 +while values: # repeat until all records in values have been inserted
 +  batch, values = values[:batch_size], values[batch_size:] # split values into the current batch and the remaining records
 +  cur.executemany("INSERT INTO timeseries(day,roadid,laneid,intrvl,speed,stddev,count) VALUES (%s,%s,%s,%s, %s, %s, %s)", batch ) # insert current batch
 +</​code>​
 +====== Date and time ======
 +There are (at least) two separate ways to deal with dates: ​  
 +  * '​time'​ - represents a timestamp as a tuple of at least 9 values and may be deprecated by now. (see [[http://​pleac.sourceforge.net/​pleac_python/​datesandtimes.html | here ]] )   
 +  * '​datetime'​ - represents a timestamp as a datetime object ( [[http://​docs.python.org/​lib/​datetime-datetime.html|documentation]] )
 +
 +===== Get Current Date/Time =====
 +<code python>
 +import datetime
 +current_time = datetime.datetime.now()
 +</​code>​
 +===== Parse dates =====
 +The third-party dateutil library can be used to parse many common date formats:
 +
 +<code python>
 +import dateutil.parser
 +dateutil.parser.parse("​2011-05-18 12:​30:​00"​)
 +</​code>​
 +
 +
 +
 +If you need to parse a custom format, use the strptime function of the datetime library ([[http://​docs.python.org/​library/​datetime.html#​strftime-strptime-behavior|check here for directives]] and their meanings):
 +
 +<code python>
 +import datetime
 +d = datetime.datetime.strptime("​20071031T235958","​%Y%m%dT%H%M%S"​)
 +year = d.year # access fields of datetime
 +</​code>​
 +
 +The other option is to use the time library (which seems a little less intuitive):
 +<code python>
 +import time
 +year,month,day = time.strptime("20071003","%Y%m%d")[0:3] # values returned as tuple of ints
 +print year,month,day
 +h,m,s = time.strptime("04:12:02", "%H:%M:%S")[3:6]
 +</​code>​
 +
 +To handle ISO 8601 timestamps like 20071031T235958 and get a datetime object:
 +<code python>
 +datetime.datetime.strptime("​20071031T235958","​%Y%m%dT%H%M%S"​)
 +</​code>​
 +
 +Get a Unix timestamp from a Python datetime (assuming the datetime represents UTC):
 +<code python>
 +import calendar
 +calendar.timegm(dt.utctimetuple()) # dt is a datetime.datetime object
 +</​code>​
 +
 +Get Python time from Unix timestamp:
 +<code python>
 +time.gmtime(unixtimestamp)
 +</​code>​
 +===== Date formatting =====
 +Print datetime object
 +
 +<code python>
 +dat= datetime.datetime.strptime('​2008-04-21 11:​00:​00',​ '​%Y-%m-%d %H:​%M:​%S'​)
 +dat.strftime('​%Y%m%dT%H%M%S'​)
 +</​code>​
 +To print a time object prettily
 +
 +<code python>
 +time.strftime("​%Y-%m-%d %H:​%M:​%S",​timeStamp)
 +</​code>​
 +===== Manipulating Dates =====
 +<code python>
 +import datetime
 +d = datetime.datetime(2007,​11,​21,​12,​33)
 +d += datetime.timedelta(days=4,​hours=2,​minutes=40,​seconds=20,​milliseconds=300);​
 +</​code>​
 +<code python>
 +x = datetime.datetime.now()
 +x = x.replace(minute=20) # replace() returns a new datetime, it does not modify x in place
 +</​code>​
 +===== Difference of 2 Dates =====
 +Getting the difference between 2 datetime-objects is easy:
 +
 +<code python>
 +testTime1=datetime.datetime(2001,​1,​1,​0,​0,​0)
 +testTime2=datetime.datetime(2009,​4,​23,​12,​34,​45)
 +difference=testTime2-testTime1
 +print difference #the effect of these few lines: 3034 days, 12:34:45
 +</​code>​
 +But you can also split this difference into weeks, days, hours, minutes, ...
 +
 +<code python>
 +weeks, days = divmod(difference.days,​ 7)
 +minutes, seconds = divmod(difference.seconds,​ 60)
 +hours, minutes = divmod(minutes,​ 60)
 +</​code>​
 +===== Timezones =====
 +As far as I know datetime.strptime ignores the timezone information,​ and **always creates naive (timezone-unaware) datetimes**. Often this is not what you want. In order to attach a timezone to a naive timestamp, and then convert it to a local time use the following:
 +
 +<code bash>
 +sudo aptitude install python-tz
 +</​code>​
 +==== Attach Timezone to a "​dumb"​ timestamp ====
 +<code python>
 +import pytz
 +import datetime
 +
 +ts = datetime.datetime.now() # create a dumb timestamp
 +tz_vienna = pytz.timezone("​Europe/​Vienna"​)
 +localized_ts = tz_vienna.localize(ts) # this just assumes that the timestamp is "​right"​ and attaches the timezone. But it DOES correctly handle daylight savings time
 +</​code>​
 +==== Convert a Time ====
 +<code python>
 +
 +import datetime
 +import pytz
 +
 +# create naive timestamp
 +naive_time = datetime.datetime.strptime("​24.11.11 12:​46:​25",​ "​%d.%m.%y %H:​%M:​%S"​)
 +
 +# attach timezone
 +tz_vienna = pytz.timezone("​Europe/​Vienna"​)
 +local_time =  tz_vienna.localize(naive_time)
 +
 +# convert to utc
 +utc_time = local_time.astimezone(pytz.utc)
 +</​code>​
 +====== External Commands ======
 +To simply run an external command.
 +
 +<code python>
 +import os
 +exitValue = os.system("​ls"​)
 +</​code>​
 +Note that this returns the command's **exit code shifted by 8 bits**! (don't ask... read http://blog.tsul.net/2008/04/ossystem-and-its-return-value.html)
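 +If you do want to decode that return value, something like this should work on Unix:
 +
 +<code python>
 +import os
 +status = os.system("ls")
 +exit_code = os.WEXITSTATUS(status) # equivalent to status >> 8
 +print exit_code
 +</code>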
 +
 +If you just want the program's exit code it is probably easier to do
 +
 +<code python>
 +import subprocess
 +exitCode = subprocess.call(["ls","-a"])
 +</​code>​
 +
 +Passing a list with pieces of commandline is often not very handy. Using call like so
 +
 +<code python>
 +import subprocess
 +exitCode = subprocess.call("​ls -a", shell=True)
 +</​code>​
 +
 +means (surprise!): ​
 +
 +> "the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want access to other shell features such as filename wildcards, shell pipes and environment variable expansion." ​
 +
 +It also means a **SECURITY HAZARD if the input to the command comes from untrusted sources!** (from http://docs.python.org/dev/library/subprocess.html#frequently-used-arguments )
 +So use with care.
 +
 +
 +To execute an external command and **get the output** use
 +
 +<code python>
 +import commands
 +output = commands.getoutput("​grep '​muh'​ input.txt"​)
 +</​code>​
 +====== File handling ======
 +===== Reading =====
 +==== Reading file line per line ====
 +(from [[http://www.yukoncollege.yk.ca/~ttopper/COMP118/rCheatSheet.html|here]]; remember that **line is a string** even if it looks like a number)
 +
 +<code python>
 +infile = open( infilename, "​r"​ )
 +for line in infile:
 +  # Do stuff with line. # e.g. num = int( line )
 +infile.close()
 +</​code>​
 +==== Filter input lines in finite time ====
 +Reading in a whole file and filtering it in Python afterwards can be rather slow. For now this recipe shells out to grep, because I have not found a faster way to do the speed optimization otherwise. Giving grep the -s flag prevents grep's error messages from ending up in our lines list, so we can assume that we either get the expected lines or none at all.
 +
 +<code python>
 +import commands
 +lines =  commands.getoutput("​grep -s " + myregex + " " + myfile).strip().split("​\n"​) # we have a list of the lines that matched the regular expressions now
 +for line in lines:
 +  if(len(line)>​0):​
 +    #do something
 +</​code>​
 +===== Writing =====
 +<code python>
 +f = file("​test.txt","​w"​)
 +f.write("​muh"​)
 +f.close()
 +</​code>​
 +The mode string can be "w" (write) or "a" (append); if binary data needs to be written use "wb" or "ab".
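 +For example, appending a line to an existing file:
 +
 +<code python>
 +f = file("test.txt", "a") # "a" opens the file for appending
 +f.write("more\n")
 +f.close()
 +</code>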
 +
 +===== Files with Umlauts =====
 +One way to handle nasty Umlauts: (check [[ http://​docs.python.org/​library/​codecs.html | here ]] to find the correct codec)
 +
 +<code python>
 +import codecs
 +inputfile = codecs.open("​something.csv",​ "​r","​latin1"​)
 +for i, line in enumerate(inputfile):​
 +  print line
 +inputfile.close()
 +
 +outputfile = codecs.open("​somethingelse.csv","​w","​latin1"​)
 +outputfile.write("​Mäh,​ öh, blüb!"​)
 +outputfile.close()
 +</​code>​
 +
 +Another way:
 +
 +<code python>
 +fileencoding = "​iso-8859-1"​
 +for raw in file("​Taxistandplaetze(Sektoren).csv"​):​
 +  print raw.decode(fileencoding)
 +</​code>​
 +
 +
 +Convert string to another encoding: ​
 +
 +<code python>
 +import codecs
 +
 +inputfile = codecs.open("​something.csv",​ "​r","​latin1"​)
 +line = inputfile.next()
 +ascii_line = line.encode("​ascii","​ignore"​) ​
 +# if a character can not be encoded in this encoding python would normally raise an exception
 +# '​ignore'​ tells the encoder to ignore such errors. other options are '​replace'​ , '​xmlcharrefreplace',​ '​backslashreplace'​
 +# see http://​docs.python.org/​library/​codecs.html
 +</​code>​
 +
 +
 +===== Line number support =====
 +(Found on a mailing list. The built-in ''enumerate()'' only accepts a ''start'' argument from Python 2.6 on; the snippet below redefines it for older versions and uses it to print line numbers while reading a file.)
 +
 +<code python>
 +
 +from itertools import izip, count
 +
 +def enumerate(iterable,​ start=0):
 +  return izip(count(start),​ iterable) # redefine enumerate
 +
 +for i, line in enumerate(infile):​
 +  print "line number: " + str(i) + ": " + line.rstrip()
 +</​code>​
 +===== Replace lines in-line =====
 +<code python>
 +import fileinput
 +for line in fileinput.input(onefilename,​ inplace=1):
 +  print line.replace(old,new), # trailing comma: the line already contains its newline
 +</​code>​
 +===== Checking if a File/​Directory Exists =====
 +<code python>
 +import os
 +os.path.exists("/​path/​to/​some/​where"​)
 +</​code>​
 +
 +===== Get filename from absolute path =====
 +<code python>
 +import os
 +
 +path,​filename = os.path.split(absolute_path)
 +</​code>​
 +
 +===== Creating Directories =====
 +<code python>
 +import os
 +if not os.path.isdir(dir):​
 +  os.makedirs(dir) # creates all non-existent directories in the path of dir
 +</​code>​
 +===== Working with the Working-Directory =====
 +(yes, it's a bad pun, I know)
 +
 +<code python>
 +import os
 +current_work_dir = os.getcwd() ​ #get current working dir
 +os.chdir(new_work_dir)          # change working dir
 +</​code>​
 +If your script needs to change the working directory I would strongly suggest not to rely on relative paths. Use os.getcwd() to store the script'​s base path and construct absolute paths for every os.chdir() you are going to do!
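 +A small sketch of that pattern (paths are illustrative):
 +
 +<code python>
 +import os
 +
 +base_dir = os.getcwd() # remember where we started
 +os.chdir("/tmp")       # work somewhere else
 +
 +# later: build absolute paths from the stored base instead of relying on "."
 +config_path = os.path.join(base_dir, "config.ini")
 +</code>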
 +
 +===== Deleting a Directory =====
 +Deleting an empty directory can be done with os.rmdir() but most of the time your directory will not be empty. If you want to do the equivalent of an //rm -rf dir_to_delete//​ use
 +
 +<code python>
 +import shutil
 +shutil.rmtree("​dir_to_delete"​)
 +</​code>​
 +===== Listing Directory Contents =====
 +If you want to use wildcards try:
 +
 +<code python>
 +import glob
 +dir_contents = glob.glob("/​home/​soma/​*.txt"​)
 +</​code>​
 +An alternative would be to use os.listdir. The following script lists all files in the current directory (it filters out subdirectories)
 +
 +<code python>
 +import os
 +dirContents = os.listdir("​."​)
 +for aFile in (c for c in dirContents if not os.path.isdir(c)):​
 +  print "a file: %s" % aFile
 +</​code>​
 +===== Temporary Files/​Directories =====
 +<code python>
 +import tempfile
 +tempfile.mkstemp() # create temporary file
 +tempfile.mkdtemp("​.tmp",​ "​backup_"​) #create a temp directory with prefix and suffix. eg: /​tmp/​backup_rt5JWA.tmp
 +</​code>​
 +====== JSON-RPC ======
 +For a JSON-RPC library take a look at http://json-rpc.org/ . But be warned: right now its ServiceProxy does not set the content type and might not work with some servers.
 +
 +===== Client =====
 +Using the python json library it is very easy to create a simple ServiceProxy client (code mostly stolen from json-rpc.org but added content-type. I am posting this here because the client is so simple that installing an extra library might be overkill).
 +
 +<code python>
 +# Code mostly from http://​json-rpc.org/​
 +
 +import urllib2
 +import json
 +
 +# Define helper classes
 +
 +class JSONRPCException(Exception):​
 +    def __init__(self,​ rpcError):
 +        Exception.__init__(self)
 +        self.error = rpcError
 +
 +class ServiceProxy(object):​
 +    def __init__(self,​ serviceURL, serviceName=None):​
 +        self.__serviceURL = serviceURL
 +        self.__serviceName = serviceName
 +
 +    def __getattr__(self,​ name):
 +        if self.__serviceName != None:
 +            name = "​%s.%s"​ % (self.__serviceName,​ name)
 +        return ServiceProxy(self.__serviceURL,​ name)
 +
 +    def __call__(self,​ *args):
 +         ​postdata = json.dumps({"​method":​ self.__serviceName,​ '​params':​ args, '​id':'​jsonrpc'​})
 +
 +         req = urllib2.Request(url=self.__serviceURL,​ data=postdata)
 +         ​req.add_header("​Content-Type",​ "​application/​json"​)
 +         ​respdata = urllib2.urlopen(req).read()
 +
 +         resp = json.loads(respdata)
 +         if resp['​error'​] != None:
 +             raise JSONRPCException(resp['​error'​])
 +         else:
 +             ​return resp['​result'​]
 +
 +
 +# Usage Example
 +
 +sp = ServiceProxy('​http://​some-jsonrpc-service.com/​service'​)
 +result ​ = sp.someMethod(1,"​two",​3)
 +</​code>​
 +====== Logging ======
 +===== Logging to Console =====
 +<code python>
 +# Initialize the Logger
 +import logging
 +logging.basicConfig(level=logging.DEBUG,​ format="​%(asctime)-15s\t%(name)-5s\t%(levelname)-8s\t%(message)s"​)
 +logging.info("​Logging Initialized"​)
 +</​code>​
 +===== Logging to File =====
 +<code python>
 +import logging
 +logging.basicConfig(filename="​mylog.log",​ level=20, format="​%(asctime)-15s %(levelname)s\t(%(filename)s:​%(lineno)d) -  %(message)s"​)
 +logging.info("​Logging Initialized."​)
 +</​code>​
 +see also:
 +
 +  * [[http://​docs.python.org/​lib/​module-logging.html|Logger Documentation]]
 +  * [[http://​docs.python.org/​release/​2.5/​lib/​node422.html|Python 2 output Formatting]]
 +  * [[https://​docs.python.org/​3/​howto/​logging-cookbook.html|Python 3 logging cookbook]]
 +
 +====== Mathematics ======
 +===== Rounding =====
 +<code python>
 +import math
 +print math.floor(2.4)#​ --> 2.0
 +print math.ceil(2.4) #  --> 3.0
 +print round(2.4)     # --> 2.0
 +print int(2.4)       # --> 2
 +print round(2.6)     # --> 3.0
 +print int(2.6)       # --> 2  (conversion to int **truncates** the decimal part, the round function rounds!)
 +# rounding errors
 +round(26.9403314917,​2) # --> 26.940000000000001
 +myvar = round(26.9403314917,​2)
 +print myvar            # --> gives 26.94, which is a good enough estimation in most cases
 +</​code>​
 +===== int to bin =====
 +From [[http://​www.daniweb.com/​code/​snippet285.html|here]]:​
 +
 +<code python>
 +count=24
 +n=286476
 +print  ""​.join([str((n >> y) & 1) for y in range(count-1,​ -1, -1)])
 +</​code>​
 +===== Transpose of a matrix =====
 +Entries are stored row by row in list of lists (or tuples)
 +
 +<code python>
 +>>> x = [[1, 2, 3], [4, 5, 6]]
 +>>> zip(*x)
 +[(1, 4), (2, 5), (3, 6)]
 +</​code>​
 +====== Machine Learning ======
 +Interesting libraries
 +  * http://​mlpy.sourceforge.net/​
 +  * http://​www.pymvpa.org/​
 +  * http://​scikit-learn.sourceforge.net/ ​
 +
 +===== k-means clustering =====
 +**Attention:​** Make sure your data does not contain NANs!
 +
 +<code python>
 +import pylab
 +import numpy
 +import scipy.cluster.vq
 +
 +# the data to cluster
 +timeseries = numpy.array([numpy.random.rand(10) for i in range(10)])
 +
 +# find 5 centroids
 +centroids, x = scipy.cluster.vq.kmeans(timeseries,​ 5)
 +
 +# assign each timeseries to a centroid
 +idx,_ = scipy.cluster.vq.vq(timeseries,​centroids)
 +
 +# plot centroids and corresponding timeseries
 +pylab.figure()
 +for i in range(5):
 +    pylab.subplot(511+i) # subplot numbering is 1-based: 511..515
 +    cluster = i
 +    pylab.plot(timeseries[idx==cluster].T,​ color='​b'​)
 +    pylab.plot(centroids[cluster].T,​ color='​r'​)
 +
 +</​code>​
 +
 +
 +===== Hierarchical clustering =====
 +**Attention:​** Make sure your data does not contain NANs!
 +
 +<code python>
 +import pylab
 +import numpy
 +import scipy.cluster.hierarchy as hcluster
 +
 +timeseries = numpy.array([numpy.random.rand(10) for i in range(50)])
 +
 +# get a cluster-id for every timeseries
 +idx = hcluster.fclusterdata(timeseries,​ 4.0, criterion='​maxclust',​ method='​complete'​)
 +
 +# calculate the centroid for every cluster
 +centroids={}
 +for i in set(idx):
 +    centroid = pylab.mean(timeseries[idx==i],​0)
 +    centroids[i] = centroid
 +
 +# plot centroids and corresponding timeseries
 +fig = pylab.figure()
 +for i in range(0, len(centroids)):​
 +    subplot = len(centroids)*100 + 10 + i+1
 +    pylab.subplot(subplot)
 +    cluster = i+1
 +    pylab.plot(centroids[cluster],​ color='​r',​ linewidth=2)
 +    pylab.plot(timeseries[idx==cluster].T,​ color='​b'​)
 +    pylab.axis(ymin=0)
 +</​code>​
 +
 +
 +===== PCA =====
 +<code python>
 +import pylab
 +import numpy
 +import mlpy
 +
 +# 2 dimensional data
 +data = numpy.array([[1,​1],​ [2,2.3], [3, 3.4]])
 +pylab.scatter(data[:,​0],​ data[:,1], label="​orig_data"​)
 +
 +pca = mlpy.PCA()
 +pca.learn(data) ​
 +
 +# plot principal components
 +coef = pca.coeff() # column1=pc1 , column2=pc2
 +pylab.plot([0,​coef[0,​0]] ,[0, coef[0,1]], '​-r',​ label="​first PC" )
 +pylab.plot([0,​coef[1,​0]] ,[0, coef[1,1]], '​-b',​ label="​second PC")
 +
 +# dimensionality reduction / reconstruction
 +z = pca.transform(data,​k=1) # reduce data to the first principal component
 +rec = pca.transform_inv(z) ​ # reconstruct the original data from the first principal component
 +
 +# plot reconstructed data
 +pylab.plot(rec[:,​0],​ rec[:,1], '​+r',​ label="​reconstructed data")
 +pylab.legend()
 +
 +</​code>​
 +
 +===== Mixture Models =====
 +Try [[PyMix]]
 +
 +
 +
 +
 +
 +====== Networking ======
 +===== Simple HTTP Download =====
 +<code python>
 +import urllib
 +url = urllib.urlopen("​http://​10.101.21.115:​8080/​display/​servlet/​graph?​chart=avg&​day=20061018&​sectionid=0303-0304"​)
 +f = file("​out.png",​ "​w"​)
 +f.write(url.read()) ​
 +f.close()
 +</​code>​
 +===== Very simple HTTP Servers =====
 +With [[http://​docs.python.org/​2/​library/​simplehttpserver.html|SimpleHTTPServer]] contents of the current directory can be served via HTTP.
 +
 +<code bash>
 +# serve current directory @ port 8000
 +python -m SimpleHTTPServer
 +</​code>​
 +
 +The port can be supplied as the first argument. For ports < 1024 root privileges are required (which is not recommended for security reasons; see the discussions on StackOverflow).
 +
 +<code bash>
 +sudo python -m SimpleHTTPServer 80
 +</​code>​
 +===== Simple HTTP Servers =====
 +<code python>
 +from BaseHTTPServer import BaseHTTPRequestHandler,​ HTTPServer
 +from urlparse import urlparse, parse_qs
 +
 + 
 +class MyServer(BaseHTTPRequestHandler):​
 +  ​
 +  def do_GET(self):​
 +    params = parse_qs(urlparse(self.path).query) # get request parameters ​   ​
 +    self.send_message("​hello " + params['​name'​][0])
 +
 +  def do_POST(self):​
 +    content_len = int(self.headers.getheader('​content-length'​))
 +    post_body = self.rfile.read(content_len)
 +    ​
 +    self.send_message("​Ok. Got Message " + post_body)
 +    ​
 +  def send_message(self,​ message):
 +    self.send_response(200,​ '​OK'​)
 +    self.send_header('​Content-type',​ '​text/​html'​)
 +    self.end_headers()
 +    ​
 +    self.wfile.write(message)
 +     
 +  @staticmethod
 +  def serve_forever(port):​
 +    HTTPServer(('',​ port), MyServer).serve_forever()
 + 
 +if __name__ == "​__main__":​
 +  MyServer.serve_forever(8080)
 +
 +</​code>​
 +
 +see http://​www.doughellmann.com/​PyMOTW/​BaseHTTPServer/​index.html for a lot of interesting remarks
 +
 +===== Socket Server =====
 +<code python>
 +
 +import SocketServer
 +
 +class EchoRequestHandler(SocketServer.BaseRequestHandler ):
 +    def setup(self):​
 +        print self.client_address,​ '​connected!'​
 +        self.request.send('​hi ' + str(self.client_address) + '​\n'​)
 +
 +    def handle(self):​
 +        data = '​dummy'​
 +        while data:
 +            data = self.request.recv(1024)
 +            self.request.send(data)
 +            if data.strip() == '​bye':​
 +                return
 +
 +    def finish(self):​
 +        print self.client_address,​ '​disconnected!'​
 +        self.request.send('​bye ' + str(self.client_address) + '​\n'​)
 +
 +#server host is a tuple ('​host',​ port)
 +server = SocketServer.ThreadingTCPServer(('',​ 50008), EchoRequestHandler)
 +server.serve_forever()
 +</​code>​
 +===== Using FTP =====
 +
 +==== FTP Download ====
 +
 +
 +<code python>
 +
 +from ftplib import FTP 
 +
 +ftp = FTP()
 +ftp.connect("​127.0.0.1",​ port=21)
 +ftp.login("​username",​ "​password"​) ​
 + 
 +ftp.cwd('​dir/​subdir/​subdir'​)
 + 
 +local_file = open("​local_file.xml",​ '​wb'​)
 +cmd = 'RETR ' + "​file.xml"​
 +ftp.retrbinary(cmd,​ local_file.write)
 +
 +local_file.close()
 +ftp.quit()
 +
 +</​code>​
 +
 +
 +==== FTP Upload ====
 +
 +<code python>
 +
 +from ftplib import FTP 
 +
 +ftp = FTP("​ftp.theftp.org"​)
 +ftp.login("​username",​ "​password"​) ​
 +
 +ftp.cwd('​subdirectory/​subdirectory'​)
 +
 +f = open(filepath,​ '​rb'​)
 +cmd = 'STOR ' + fileName ​
 +ftp.storbinary(cmd,​ f)
 +
 +f.close()
 +ftp.quit()
 +</​code>​
 +====== Numpy ======
 +After having to reduce the memory footprint for one of my scripts I realized how **much more efficient** numpy arrays are compared to the default python lists. If you have large arrays of data you should strongly consider using numpy.
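 +A quick and rough way to see the difference (the exact numbers are implementation dependent, so treat them as illustrative):
 +
 +<code python>
 +import sys
 +import numpy
 +
 +n = 1000000
 +python_list = [float(i) for i in range(n)]
 +numpy_array = numpy.arange(n, dtype=numpy.float64)
 +
 +# the list stores pointers to separate float objects, the array stores raw doubles
 +print sys.getsizeof(python_list) + n * sys.getsizeof(0.0) # rough estimate for the list
 +print numpy_array.nbytes                                   # 8 bytes per element
 +</code>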
 +
 +
 +===== External Documentation =====
 +  * http://​docs.scipy.org/​doc/​
 +  * http://​www.scipy.org/​Tentative_NumPy_Tutorial
 +
 +===== Creating Arrays =====
 +<code python>
 +import numpy
 +arr =  numpy.zeros((96,​4)) # create two dimensional array with values initialized to 0
 +</​code>​
 +===== Additional views on arrays =====
 +
 +Use views on arrays to add new functionality,​ reshape arrays, etc. - all without copying the actual data
 +
 +<code python>
 +import numpy as np
 +
 +x = np.array([(1,​ 2),(3,4)], dtype=[('​a',​ np.int8), ('​b',​ np.int8)]) # access via z["​a"​]
 +z = x.view(np.recarray) ​
 +print z.a # recarrays allow access with attributes
 +</​code>​
 +===== Read a CSV File into a numpy ndarray =====
 +
 +Easy way with [[http://​docs.scipy.org/​doc/​numpy/​reference/​generated/​numpy.genfromtxt.html|numpy.genfromtxt]]
 +
 +<code python>
 +import numpy
 +data= numpy.genfromtxt("​data.csv",​ delimiter=";",​ names=True)
 +
 +#accessing data
 +column_x = data["​x"​] # access a column
 +row1 = data[1] # access a row
 +row1_x = data[1]["​x"​] # access column "​x"​ of row 1
 +
 +</​code>​
 +
 +Another, more verbose, method ( TODO: remove )
 +
 +<code python>
 +import numpy
 +
 +lines = file("​cv_in.txt"​).readlines()
 +rows = len(lines)
 +cols = len(lines[0].strip().split("​\t"​))
 +values = numpy.zeros( (rows,​cols))
 +
 +count = 0
 +for line in lines:
 +  values[count,​] = [ float(x) for x in line.strip().split("​\t"​) ]
 +  count += 1
 +</​code>​
 +====== Optimisation ======
 +
 +===== Cython =====
 +
 +A nice introduction can be found here http://​www.perrygeo.net/​wordpress/?​p=116
 +
 +===== Pandas =====
 +[[Pandas]]
 +
 +====== Parallel Computing ======
 +===== Threading vs. Multiprocessing =====
 +Python supports threads through the "threading" module. An important note however: CPython currently has something called a //Global Interpreter Lock// (http://en.wikipedia.org/wiki/Global_Interpreter_Lock). In short this means that only one thread at a time is allowed to execute Python bytecode. As a consequence this highly limits the concurrency of a single process with multiple threads.
 +
 +What this means is that you will **not gain performance** from running a python numbercrunching application with multiple threads! You will only gain performance if your threads are strongly IO bound!
 +
 +Starting with Python 2.6 the multiprocessing module is made available. This module circumvents the GIL by using subprocesses and is the preferred option to parallelize python calculations (see http://​docs.python.org/​library/​multiprocessing.html )
 +
 +===== Multiprocessing =====
 +If you want true parallelism (without GIL restrictions) use the multiprocessing library. Here is a simple example using queues and Worker Processes:
 +
 +<code python>
 +import multiprocessing as mp
 +import Queue  # for catching Queue.Empty
 +
 +import random
 +import time
 +
 +
 +class Result(object):​
 +    def __init__(self,​ v):
 +        self.v = v
 +
 +    def value(self):​
 +        return self.v
 +
 +
 +def iterate(queue):​
 +    """​helper to iterate over a queue until it is empty"""​
 +    while True:
 +        try:
 +            yield queue.get_nowait()
 +        except Queue.Empty:​
 +            break
 +
 +
 +class Worker(mp.Process):​
 +    def __init__(self,​ i, qin, qout):
 +        super(Worker,​ self).__init__()
 +        self.id = i
 +        self.qin = qin
 +        self.qout = qout
 +
 +    def run(self):
 +        for data in iterate(self.qin):​
 +            print "​worker %d has data: %d" % (self.id, data)
 +            time.sleep(random.randint(0,​ 1))
 +            self.qout.put(Result(data * 2))
 +
 +        print "​Ending Worker %d" % self.id
 +
 +
 +
 +mgr = mp.Manager() ​ # creating Queues without a Manager will lead to strange behaviour
 +
 +q_in = mgr.Queue()
 +q_out = mgr.Queue()
 +
 +# create data
 +for i in range(250):
 +    q_in.put(i)
 +
 +# create workers
 +workers = []
 +for i in range(20):
 +    w = Worker(i, q_in, q_out)
 +    workers.append(w)
 +    w.start()
 +
 +# wait for the workers to finish
 +for w in workers:
 +    w.join()
 +
 +# process results
 +while not q_out.empty():​
 +    result =  q_out.get()
 +    print result.value()
 +
 +</​code>​
 +
 +
 +
 +===== Threading =====
 +
 +==== Create a Worker Class ====
 +<code python>
 +
 +import threading
 +
 +class Worker(threading.Thread):​
 +  def run(self):
 +    doSomeWork()
 +
 +for i in range(3):
 +   ​worker = Worker()
 +   ​worker.start()
 +</​code>​
 +==== Execute a method as a thread ====
 +<code python>
 +
 +from threading import Thread
 +
 +def worker_method():
 +  doSomething()
 +
 +t = Thread(target=worker_method)
 +t.start()
 +</​code>​
 +===== Queues =====
 +Queues can be used to pass data around in a thread-safe manner. See http://​www.python.org/​doc/​2.5.2/​lib/​QueueObjects.html for details.
 +
 +<code python>
 +
 +from threading import Thread
 +from Queue import Queue
 +
 +def worker():
 +  while True:
 +    item = q.get() ​ # this call BLOCKS if the queue is empty.
 +                    # Use get_nowait() ​ if you would rather like an Exception.
 +    do_work(item)
 +    q.task_done() ​ # THIS is important!
 +
 +q = Queue()
 +for i in range(num_worker_threads):
 +  t = Thread(target=worker)
 +  t.setDaemon(True)
 +  t.start()
 +
 +for item in items_to_process:
 +  q.put(item)
 +
 +q.join()       # blocks until all items are processed
 +</​code>​
 +**Note** the **q.task_done()** call in the worker. This tells the queue that processing of one item has finished. If your workers for some reason don't call this (e.g. because of an exception) then the call q.join() will NEVER UNBLOCK!
 +
 +===== External Documentation =====
 +  * http://docs.python.org/library/threading.html
 +
 +
 +
 +====== Pylab recipes ======
 +[[PylabRecipes]]
 +
 +====== Regular Expressions ======
 +===== Find out if something does (not) match =====
 +<code python>
 +import re
 +pattern = re.compile("​^\d"​)
 +match = pattern.match("​my text")
 +if match:
 +  print "line starts with number"​
 +else:
 +  print "line does not start with number"​
 +</​code>​
 +===== Split a String =====
 +To split a string by a simple delimiter just use string.split(). For a more complex splitting operation:
 +
 +<code python>
 +import re
 +s = "a 1 and 2 and 3 and 4"
 +a = re.split("​\d",​ s) # every number is a delimiter
 +</​code>​
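 +For comparison, the simple-delimiter case mentioned above:
 +
 +<code python>
 +s = "a,b,c"
 +parts = s.split(",") # --> ['a', 'b', 'c']
 +</code>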
 +===== Extract Data Using Subgroups =====
 +==== Single Match ====
 +Either use a compiled pattern:
 +
 +<code python>
 +import re
 +p = re.compile("​x=(\d+?​).*?​y=(\d+?​)"​)
 +match = p.search("​blah x=3 y=4 and ")
 +(x,y) = match.groups()
 +print x,y
 +</​code>​
 +Or use the package function:
 +
 +<code python>
 +import re
 +matches= re.search("​x=(\d+?​)[ ]*?​y=(\d+?​)[ ]","​blah x=3 y=4 and ")
 +(x,y) = matches.groups()
 +print x,y
 +</​code>​
 +==== Multiple Matches ====
 +Either use a **compiled pattern**:
 +
 +<code python>
 +
 +import re
 +p = re.compile("​x=(\d+?​).*?​y=(\d+?​)"​)
 +mItr = p.finditer("​x=3 y=4 and then x=5, y=7  and x=8, y=9") # return iterator<​tuple>​
 +for m in mItr :
 +  (x,y) = m.groups()
 +  #OR: matches = p.findall("​x=3 y=4 and then x=5, y=7  and x=8, y=9") # return array of tuples
 +</​code>​
 +Or use the **package function**
 +
 +<code python>
 +import re
 +matches = re.findall("​x=(\d+?​).*?​y=(\d+?​)","​x=3 y=4 and then x=5, y=7  and x=8, y=9") # or re.finditer
 +</​code>​
 +==== Selecting Groups From a Match ====
 +The .group() method can take several group numbers (or group names) and then returns just those groups as a tuple, e.g.
 +
 +<code python>
 +(x,y) = match.group(1, 4) # only get the first and fourth group from the match
 +</​code>​
 +====== Sending Mail ======
 +<code python>
 +import smtplib
 +msg="​Subject:​ Subject \n\nBlablablablabla"​
 +smtp = smtplib.SMTP("​localhost"​)
 +smtp.sendmail("root", receiver, msg)
 +smtp.quit()
 +</​code>​
 +====== Scipy ======
 +===== Installing =====
 +<code bash>
 +sudo aptitude install python-scipy
 +</​code>​
 +===== Interpolation =====
 +Interpolation is the process of using a set of known data values of a function to determine missing values of that function. Scipy provides a lot of functionality for this. See [[http://www.cs.mun.ca/~rod/2500/notes/interpolation/interpolation.html|here]].
 +
 +Simple Example:
 +
 +<code python>
 +
 +import numpy
 +import scipy.interpolate
 +
 +orig_data = numpy.array([1,​2,​3,​0,​5,​0,​5,​0,​7]) # data to smooth
 +
 +# find x,y positions between which to interpolate
 +x_data = [i for i in range(len(orig_data)) if orig_data[i] > 0] # indices where orig_data is valid
 +y_data = orig_data[x_data] ​ # the valid data in orig_data at the corresponding indices
 +spline = scipy.interpolate.splrep(x_data,​y_data,​s=200) # calculate the spline
 +smoothed_data = scipy.interpolate.splev(range(len(orig_data)), spline) # evaluate the spline over the complete index range
 +
 +# optional: plot data
 +import pylab
 +pylab.plot(orig_data)
 +pylab.plot(smoothed_data)
 +</​code>​
 +
 +===== Regression =====
 +
 +==== Linear Regression ====
 +<code python>
 +import numpy
 +import scipy
 +
 +# data to fit
 +x = numpy.arange(0,​9)
 +y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]
 +
 +# calculate regression parameters ( polyfit returns [slope, intercept], i.e. y_fitted = a*x + b )
 +[a,b] = scipy.polyfit(x,​y,​1)
 +
 +# calculate fit
 +y_fitted = scipy.polyval([a,​b],​x)
 +</​code>​
 +
 +Ordinary Linear Least Squares Fit using mlpy
 +<code python>
 +import numpy
 +import pylab
 +import mlpy
 +
 +#  data to learn
 +x = numpy.random.normal(1,​ 5, 50)
 +x = x.reshape(-1,​1) # need to transform x, the features of each datapoint must be in one row
 +y = numpy.random.normal(2,​2,​ 50)
 +
 +# Ordinary least squares fit
 +ols = mlpy.OLS()
 +ols.learn(x,​y)
 +
 +# predict data using the learned regression
 +x1 = numpy.arange(-20,​20,​0.5).reshape(-1,​1) # features of one input-point must be in a row
 +y1 = ols.pred(x1)
 +
 +# plot
 +pylab.scatter(x,​y)
 +pylab.plot(x1,​y1)
 +</​code>​
 +
 +===== Smoothing Data =====
 +A smoothing function that does exactly the same as the Matlab function "​smooth"​ (from http://​www.scipy.org/​Cookbook/​SignalSmooth )
 +
 +<code python>
 +import numpy
 +
 +def smooth(x,window_len=5,window='flat'):
 +    if x.ndim != 1:
 +        raise ValueError, "​smooth only accepts 1 dimension arrays."​
 +    if x.size < window_len:
 +        raise ValueError, "Input vector needs to be bigger than window size."
 +    if window_len<​3:​
 +        return x
 +    if not window in ['​flat',​ '​hanning',​ '​hamming',​ '​bartlett',​ '​blackman'​]:​
 +        raise ValueError, "​Window has to be one of '​flat',​ '​hanning',​ '​hamming',​ '​bartlett',​ '​blackman'"​
 +
 +    s=numpy.r_[2*x[0]-x[window_len-1::​-1],​x,​2*x[-1]-x[-1:​-window_len:​-1]]
 +    if window == '​flat':​ #moving average
 +        w=numpy.ones(window_len,'​d'​)
 +    else:
 +        w=eval('​numpy.'​+window+'​(window_len)'​)
 +
 +    y=numpy.convolve(w/​w.sum(),​s,​mode='​same'​)
 +    return y[window_len:​-window_len+1]
 +</​code>​
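 +
 +Example usage of the smooth() function defined above (the test signal is made up for illustration):
 +
 +<code python>
 +import numpy
 +import pylab
 +
 +noisy = numpy.sin(numpy.linspace(0, 4*numpy.pi, 200)) + numpy.random.normal(0, 0.3, 200)
 +smoothed = smooth(noisy, window_len=11, window='hanning') # same length as the input
 +
 +pylab.plot(noisy)
 +pylab.plot(smoothed)
 +</code>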
 +===== Spline Polynoms =====
 +<code python>
 +import numpy
 +import scipy.signal
 +y = numpy.array([41.621207814814809,​ 42.328298238095236,​ 45.881729878787887,​ 43.800834224999996])
 +y_smoothed = scipy.signal.cspline1d(y)
 +</​code>​
 +
 +===== t-test =====
 +
 +<code python>
 +from scipy import stats
 +import numpy
 +import statistics
 +
 +# http://​www.biostathandbook.com/​onesamplettest.html
 +data = [120.6, 116.4,​117.2,​118.1,​114.1,​116.9,​113.3,​121.1,​116.9,​117.0]
 +
 +m = sum(data)/​len(data)
 +
 +null_hypothesis = 120
 +
 +t_value, p_value = stats.ttest_1samp(data,​ null_hypothesis)
 +
 +print(statistics.stdev(data))
 +print(numpy.std(data,​ ddof=1))
 +
 +print(t_value,​ p_value)
 +</​code>​
 +====== Statistical Functions ======
 +===== Pylab =====
 +Pylab provides a number of simple statistical functions:
 +
 +  * pylab.mean
 +  * pylab.median
 +  * pylab.var
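 +
 +A quick example (values chosen only for illustration):
 +
 +<code python>
 +import pylab
 +data = [1, 2, 3, 4, 100]
 +print pylab.mean(data)   # 22.0
 +print pylab.median(data) # 3.0
 +print pylab.var(data)    # 1522.0 (population variance, ddof=0)
 +</code>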
 +
 +===== Python-Statlib =====
 +The google project [[http://code.google.com/p/python-statlib/|Python-Statlib]] is the most complete statistical library for python I have found so far.
 +
 +==== Installation ====
 +Unfortunately there is currently no ubuntu package, so you will have to download the latest .tar.gz from http://​code.google.com/​p/​python-statlib/​downloads/​list and install it by extracting it and running:
 +
 +<code bash>
 +sudo python setup.py install
 +</​code>​
 +==== Example ====
 +<code python>
 +from statlib import stats
 +mean = stats.mean([1,​2,​3,​4,​5])
 +</​code>​
 +A complete list of the supported statistical functions can be found at http://​code.google.com/​p/​python-statlib/​wiki/​StatsDoc
 +
 +
 +==== Running Median ====
 +Normally it is advisable to use pylab.median, but if you have so many values that they no longer fit into memory, there is a trick that can give you a rough estimate of the median:
 +
 +<code python>
 +
 +def running_median(v,​step_size=0.01):​
 +    """​ Estimate median from v. Warning: this will be inaccurate unless there are MANY values in v! """​
 +    median = v[0]
 +    for i in range(len(v)):​
 +        inc  =  step_size if v[i] > median else -step_size
 +        median += inc
 +    return median
 +
 +
 +# test accuracy of the approach
 +import numpy
 +import pylab
 +
 +errors=[]
 +for i in range(500):
 +    print i
 +    v = numpy.random.uniform(0,​100,​size=50000)
 +    errors.append(running_median(v) - pylab.median(v))
 +
 +pylab.hist(errors,​bins=50)
 +
 +</​code>​
 +
 +(found in comments [[http://​programmingpraxis.com/​2012/​05/​29/​streaming-median/​|here]] - damn I hate paywalls. I'd love to read the piglet tracking paper!)
 +
 +====== Testing ======
 +
 +===== pytest =====
 +
 +https://​docs.pytest.org/​en/​latest/​example/​
 +
 +https://​docs.pytest.org/​en/​latest/​goodpractices.html#​choosing-a-test-layout-import-rules
 +
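 +A minimal sketch of a pytest test module (the file name and functions below are only examples; pytest collects files named test_*.py and functions named test_*):
 +
 +<code python>
 +# test_sample.py
 +def add(a, b):
 +    return a + b
 +
 +def test_add():
 +    assert add(2, 3) == 5
 +
 +def test_add_negative():
 +    assert add(-1, 1) == 0
 +</code>
 +
 +Run it by calling pytest from the project root.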
 +
 +===== unittest =====
 +
 +  * Python 2 https://​docs.python.org/​2.7/​library/​unittest.html
 +  * python 3 https://​docs.python.org/​3.4/​library/​unittest.html
 +  * http://​agiletesting.blogspot.com/​2005/​01/​python-unit-testing-part-1-unittest.html the nutshell
 +
 +==== Basics ====
 +<code python>
 +import unittest
 +
 +class CalculationTest(unittest.TestCase):
 +    # run before each test
 +    def setUp(self):
 +        self.x = 23
 +
 +    # tests = methods whose name starts with 'test', executed in order of their function name
 +    def testGetArithAvg(self):
 +        self.assertEquals(23, self.x)
 +
 +# execute all tests
 +if __name__ == '__main__':
 +    unittest.main()
 +</​code>​
 +
 +==== classwide Setup and teardown ====
 +
 +There are two class methods that are called before/​after tests in an individual class run. 
 +**setUpClass** and **tearDownClass** are called with the class as the only argument and must be decorated as a classmethod():​
 +
 +<code python>
 +import unittest
 +
 +class MyTest(unittest.case.TestCase):​
 +
 +    @classmethod
 +    def setUpClass(cls):​
 +        cls.msg = "I am upset!" ​        
 +
 +    ​
 +
 +    def test_hello(self):​
 +        print self.msg
 +        ​
 +    @classmethod
 +    def tearDownClass(cls):​
 +        cls.msg = None
 +
 +if __name__ == '​__main__':​
 +    unittest.main()
 +       
 +</​code> ​     ​
 +
 +==== Test for exceptions ====
 +
 +Since [[http://​docs.python.org/​2/​library/​unittest.html#​unittest.TestCase.assertRaises|python 2.7]] this is best done by using the context manager returned by unittest.assertRaises()
 +
 +<code python>
 +with self.assertRaises(SomeException):​
 +    test_something_that_raises_exception()
 +</​code>​
 +
 +For older python versions see [[http://​stackoverflow.com/​questions/​6103825/​how-to-properly-use-unit-testings-assertraises-with-nonetype-objects|this SO question]].
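 +
 +A minimal self-contained sketch (the function under test is made up for illustration):
 +
 +<code python>
 +import unittest
 +
 +def parse_positive_int(s):
 +    value = int(s)
 +    if value <= 0:
 +        raise ValueError("expected a positive integer")
 +    return value
 +
 +class ParseTest(unittest.TestCase):
 +    def test_rejects_negative(self):
 +        with self.assertRaises(ValueError):
 +            parse_positive_int("-5")
 +
 +if __name__ == '__main__':
 +    unittest.main()
 +</code>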
 +
 + 
 +====== Web Services ======
 +There are several python frameworks available.
 +
 +===== ZSI =====
 +ZSI is a framework that supports both webservice servers and clients. It provides wsdl2python as well as dynamic webservice calls via a ServiceProxy. See http://pywebsvcs.sourceforge.net/holger.pdf
 +
 +<code bash>
 +sudo aptitude install python-zsi
 +</​code>​
 +===== Accessing a Webservice with ZSI =====
 +<code python>
 +import sys
 +from ZSI.ServiceProxy import ServiceProxy
 +wsdlUrl='​http://​www2.meteomedia.at/​wetter_verkehr/​weather_data.php?​wsdl'​
 +service = ServiceProxy(wsdlUrl,​ tracefile=sys.stdout)
 +service.getLastCalculationTime()
 +</​code>​
 +==== ServiceProxy and caching ====
 +A very important note: ZSI ServiceProxy creates a cache where it puts all the python classes generated from a WSDL **and does not refresh that cache** for you. So if you are wondering why changes made to a WSDL are not reflected in the Python classes your client uses, have a look at ~/.zsi_service_proxy_dir (ZSI 2.1) or ./.service_proxy_dir (ZSI 2.0) and clean up!
 +
 +It might also be useful to **explicitly** control which directory is used by:
 +
 +<code python>
 +service = ServiceProxy(wsdlUrl,​cachedir='/​tmp/​zsi_test/',​ tracefile=sys.stdout)
 +</​code>​
 +==== Datetimes ====
 +Datetimes are tricky (see http://​pywebsvcs.sourceforge.net/​zsi.html#​SECTION007600000000000000000 and http://​pywebsvcs.sourceforge.net/​cookbook.pdf for details): basically ZSI does **not** expect a datetime to be a string in the standard xs:datetime format, or a python datetime. Instead it expects a python timetuple in UTC, which means timezones are not supported.
 +
 +<code python>
 +import datetime
 +dt = datetime.datetime.utcnow().timetuple() # valid parameter for a webservice request (ZSI expects the time tuple in UTC)
 +</​code>​
 +===== Soappy =====
 +**Soappy is DEPRECATED and should no longer be used**
 +
 +<code bash>
 +sudo aptitude install python-soappy
 +</​code>​
 +See http://​www.ebi.ac.uk/​Tools/​webservices/​tutorials/​python for tutorials
 +
 +===== Accessing a Webservice with Soappy =====
 +<code python>
 +from SOAPpy import WSDL
 +wsdlUrl = '​http://​1.2.3.4:​8000/​dynamicroutermodule?​wsdl'​
 +service =  WSDL.Proxy(wsdlUrl)
 +request={}
 +request["fromRoadId"] = 10400000586647
 +request["fromRoadDirection"] = 1
 +request["toRoadId"] = 10400002879088
 +request["toRoadDirection"] = 1
 +request["routingType"] = 0
 +route = service.getRoute(arg0=request)
 +</​code>​
 +===== Writing a report file =====
 +
 +==== docx ====
 +
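 +One option is the third-party python-docx package (an assumption -- it is not named in the notes above; install it with pip install python-docx). A minimal sketch:
 +
 +<code python>
 +from docx import Document
 +
 +doc = Document()
 +doc.add_heading('Report', level=1)
 +doc.add_paragraph('Some result text.')
 +doc.add_picture('plot.png') # images are supported; the file name is only an example
 +doc.save('report.docx')
 +</code>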
 +
 +==== rtf ====
 +
 +pyth comes with conversion tools but offers no image support.
 +
 +pyrtf-ng is another option.
 +
 +
 +====== XML Processing ======
 +
 +===== ElementTree =====
 +
 +<code python>
 +import xml.etree.ElementTree
 +
 +#
 +# Parse
 +#
 +
 +# parse xml from file
 +root = xml.etree.ElementTree.parse(filename).getroot() # parse() yields an ElementTree object so we need to explicitly call getroot()
 +
 +# parse xml from string
 +root = xml.etree.ElementTree.fromstring(xml_str) # fromstring() directly yields the root element
 +
 +
 +
 +#
 +# Search
 +#
 +
 +# find a tag via xpath
 +gisroute = root.find("​GisRes/​GisRoute"​)
 +
 +# find multiple tags via xpath
 +connections = root.findall("​ConnectionList/​Connection"​)
 +
 +#
 +# Access
 +#
 +
 +# access attributes
 +route_id = gisroute.attrib["​id"​]
 +
 +# access text
 +txt = gisroute.text
 +
 +
 +</​code>​
 +===== DOM =====
 +==== Parsing a Document ====
 +<code python>
 +import xml.dom.minidom
 +from xml.dom.minidom import Node
 +doc = xml.dom.minidom.parse("​maps.xml"​)
 +for node in doc.getElementsByTagName("​Placemark"​):​
 +    pass # do something with each <Placemark> node
 +</​code>​
 +==== Getting an Attribute ====
 +//Extract the  "​muh"​ from  <node att="​muh"​ /> //
 +
 +<code python>
 +    att = node.getAttribute("​att"​)
 +</​code>​
 +==== Getting text from content of node ====
 +//Extract the text "MUH" from  <parent>MUH</parent> //
 +
 +<code python>
 +for node in parent.childNodes:​
 +    if node.nodeType == Node.TEXT_NODE:​
 +        print node.data
 +</​code>​
 +==== Creating a XML doc ====
 +<code python>
 +from xml.dom.minidom import Document
 +
 +# Create the minidom document
 +doc = Document()
 +
 +# Create the <wml> base element
 +wml = doc.createElement("​wml"​)
 +doc.appendChild(wml)
 +
 +# Create the main <​card>​ element
 +maincard = doc.createElement("​card"​)
 +maincard.setAttribute("​id",​ "​main"​)
 +wml.appendChild(maincard)
 +
 +# Create a <p> element
 +paragraph1 = doc.createElement("​p"​)
 +maincard.appendChild(paragraph1)
 +
 +# Give the <p> element some text
 +ptext = doc.createTextNode("​This is a test!"​)
 +paragraph1.appendChild(ptext)
 +
 +# save
 +out=file('​out.xml','​w'​)
 +doc.writexml(out)
 +out.close()
 +</​code>​
 +(example from http://​www.postneo.com/​projects/​pyxml/​)
 +
 +===== SAX =====
 +//see http://​wiki.python.org/​moin/​Sax //
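 +
 +A minimal sketch of a SAX content handler (the file and tag name reuse the "Placemark" example from the DOM section above):
 +
 +<code python>
 +import xml.sax
 +
 +class PlacemarkHandler(xml.sax.ContentHandler):
 +    def startElement(self, name, attrs):
 +        if name == "Placemark":
 +            print attrs.get("id") # attribute access while streaming through the document
 +
 +xml.sax.parse("maps.xml", PlacemarkHandler())
 +</code>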
 +
 +====== Design Patterns ======
 +===== Singleton =====
 +<code python>
 +class Singleton:
 +    # shared-state ("Borg") variant: all instances are separate objects but share one __dict__
 +    __shared_state = {}
 +    def __init__(self):
 +        self.__dict__ = self.__shared_state
 +
 +s1 = Singleton()
 +s1.x = 1
 +s2 = Singleton()
 +s2.x # 1
 +</​code>​
 +===== Iterator =====
 +<code python>
 +from datetime import datetime, timedelta
 +
 +def datetimeIterator(from_date=datetime.now(), to_date=None, delta=timedelta(days=1)):
 +    while to_date is None or from_date <= to_date:
 +        yield from_date
 +        from_date = from_date + delta
 +
 +for d in datetimeIterator(datetime.strptime("20090101","%Y%m%d"), datetime.strptime("20090610","%Y%m%d")):
 +    print datetime.strptime("20090104","%Y%m%d") == d
 +</​code>​
  