Python

1 Basics

1.1 Data Types

1.1.1 Identifiers and Keywords

Table 1: Python's Keywords
and continue except global lambda pass while
as def False if None raise with
assert del finally import nonlocal return yield
break elif for in not True class
else from is or try    

1.1.2 Boolean

Booleans are either True or False. They can be treated as numbers: True is 1 and False is 0.

None is false. Zero values are false. Empty list is false. Empty tuple is false. Empty set is false. Empty dictionary is false.
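
The truth rules above can be checked directly. This is a minimal sketch; the empty string is added to the list as one more false value, and the variable names are chosen only for illustration.

```python
# Each of these values is treated as false in a Boolean context.
falsy_values = [None, 0, 0.0, [], (), set(), {}, ""]
all_falsy = all(not bool(v) for v in falsy_values)
print(all_falsy)                  # True

# Booleans behave like the integers 1 and 0 in arithmetic.
total = True + True + False
print(total)                      # 2
print(sum([True, False, True]))   # 2
```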

1.1.3 Numbers

Numbers can be integers, floats, fractions or even complex numbers.

int(float("2.2")) # 2
11/2 # 5.5
11//2 # 5
-11//2 # -6
11 ** 2 # 121
11 % 2 # 1
hex(32) # '0x20'
bin(32) # '0b100000'

1.1.4 Lists

Lists are ordered sequences of values.

[x**2 for x in range(5) if x%2] # [1, 9]

alist = ["as", "12", 12]

atuple = tuple(alist) # ("as", "12", 12)
aset = set(alist) # {"as", "12", 12}

# Slicing
print(alist[-1]) # 12
print(alist[-1:-2]) # []
print(alist[-2:-1]) # ["12"]

# Adding
alist = alist + [True] # ["as", "12", 12, True]
alist.append(3.0) # ["as", "12", 12, True, 3.0]
alist.insert(0, 3.0) # [3.0, "as", "12", 12, True, 3.0]

# Searching
alist.count(3.0) # 2
12 in alist # True
alist.index(12) # 3

# Removing
del alist[1] # [3.0, "12", 12, True, 3.0]
alist.remove(3.0) # ["12", 12, True, 3.0]
alist.pop() # ["12", 12, True]
alist.pop(1) # ["12", True]

1.1.5 Tuples

Tuples are ordered, immutable sequences of values.

# t = (i for i in range(5)) creates a generator, not a tuple
atuple = ("as", "12", 12)

alist = list(atuple) # ["as", "12", 12]
aset = set(atuple) # {"as", "12", 12}

# Slicing
print(atuple[-1]) # 12
print(atuple[-1:-2]) # ()
print(atuple[-2:-1]) # ("12", )

A named tuple behaves just like a plain tuple, but it has the ability to refer to items in the tuple by name as well.

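import collections

Sale = collections.namedtuple("Sale", "Product Customer Price")
# or Sale = collections.namedtuple("Sale", ("Product", "Customer", "Price"))

s = Sale("Apple", "Howard", 2.6)
# print(s["Customer"]) # TypeError: a named tuple cannot be indexed by a string
print(s.Product) # Apple
print(s[1]) # Howard
# s.Product = "Other" # AttributeError: the fields are read-only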

1.1.6 Sets

There are two built-in set types: the mutable set type and the immutable frozenset.

Sets are unordered bags of values, and always contain unique items. A set can contain values of any immutable data type. Only hashable objects may be added to a set. Hashable objects are objects that have a __hash__() method and can be compared using __eq__(). All the built-in immutable data types, such as float, frozenset, int, str and tuple, can be added to sets. The built-in mutable data types, such as dict, list and set, are not hashable and cannot be added to sets.

Sets support the standard comparison operators (<, <=, ==, !=, >=, >), which test for subset and superset relationships.
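
The comparison operators and the hashability rule can be illustrated with a short sketch. The set contents and variable names are arbitrary examples.

```python
small = {1, 2}
big = {1, 2, 3}

is_proper_subset = small < big       # True: every item of small is in big, and they differ
is_superset = big >= small           # True
are_equal = small == {2, 1}          # True: order does not matter

# A frozenset is hashable, so it can be stored inside another set
# or used as a dictionary key; a plain set cannot.
nested = {frozenset(small), frozenset(big)}
try:
    {small}                          # a set containing a mutable set
    set_in_set_allowed = True
except TypeError:
    set_in_set_allowed = False

print(is_proper_subset, is_superset, are_equal, set_in_set_allowed)
# True True True False
```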

1.1.7 set

The empty set must be created using set(), because empty braces {} create an empty dict.

#t = {[1,2]} Error: unhashable type: 'list'

{x+2 for x in range(5) if x%2} # {3, 5}

aset = {"as", "12", 12}

alist = list(aset) # ["as", "12", 12] (item order is arbitrary)
atuple = tuple(aset) # ("as", "12", 12) (item order is arbitrary)

# Adding
aset.add(True) # {True, "as", "12", 12}
aset.update({12, 3})  #  {True, 3, 12, "as", "12"}
aset.update([5,7]) # {True, 3, 5, 7, 12, 'as', '12'}

# Searching
5 in aset # True

# Removing
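aset.discard(5) # {True, 3, 7, 12, 'as', '12'}; no error if 5 is absent
aset.remove(7) # {True, 3, 12, 'as', '12'}; KeyError if 7 is absent
aset.pop() # removes and returns an arbitrary item, e.g. leaving {True, 3, 12, 'as'}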

# Set operations (each returns a new set; aset is not modified)
bset = {3, "e3"}
bset.issubset(aset) # False
aset.intersection(bset) # {3}
aset.difference(bset) # {True, 12, 'as'}
aset.symmetric_difference(bset) # {True, 'e3', 12, 'as'}
aset.union(bset) # {True, 'e3', 3, 12, 'as'}

1.1.7.1 frozenset

1.1.8 Dictionaries

A dictionary is a collection of key-value pairs. Since Python 3.7, dictionaries preserve insertion order.

Only hashable objects may be used as dictionary keys, so values of immutable data types (numbers, str, tuple, frozenset) qualify, while lists, sets and dicts do not.

Dictionaries support the equality comparison operators (== and !=), which are applied item by item.

#md = {t:len(t) for t in [(1,2), "asdf"]} # output {(1,2):2, "asdf":4}
adict = dict([("server","db.diveintopython3.org"), ('database','mysql')])
adict = dict(server="db.diveintopython3.org", database='mysql')
adict = {"server":"db.diveintopython3.org", 'database':'mysql'}

adict['server'] # "db.diveintopython3.org"

for k in adict:
  print(k)  # print each key

for v in adict.values():
  print(v)  # print each value

for item in adict.items():
  print(item)  # print each key-value pair as a tuple

for key, value in adict.items():
  print("({0},{1})".format(key, value)) # print each pair formatted as (key,value)

adict['database'] = 'oracle' # {"server":"db.diveintopython3.org", 'database':'oracle'}

adict['user'] = 'howard' # {"server":"db.diveintopython3.org", 'database':'oracle', "user":"howard"}

1.1.9 None

None is a special constant in Python. It is not the same as False, 0, or the empty string, although it evaluates as false in a Boolean context. Comparing None with == to anything other than None always returns False.

1.1.10 Strings

Strings are immutable sequences of Unicode characters.

s = '''qqqqq or
1111'''

s.splitlines() # ['qqqqq or', '1111']
s.lower().count('q') # 5
s.split("or") # ['qqqqq ', '\n1111']
s[4:7] # "q o"
s[7:] # 'r\n1111'

"{0.__class__}".format(1) # "<class 'int'>"
"{var.__class__}".format(var=1) # "<class 'int'>"

h = {"s1": "11", "s2": 22} # keys must be strings (not tuples) to be unpacked with **
"{s1} {s2}".format(**h) # '11 22'

1.2 Control Flow

if boolean_expression1:
  suite1
elif boolean_expression2:
  suite2
else:
  suite3

expression1 if boolean_expression1 else expression2

while boolean_expression:
  suite1
else:
  # optional. suite2 runs only if the loop finishes normally;
  # it is skipped if the loop exits via break, return, or an exception.
  suite2

for expression in iterable:
  suite1
else:
  # optional. suite2 runs only if the loop finishes normally;
  # it is skipped if the loop exits via break, return, or an exception.
  suite2
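
The else clause of a loop is easiest to see in a search. The sketch below uses a hypothetical helper, first_negative, that is not part of these notes.

```python
def first_negative(values):
    """Return a message about the first negative number in values."""
    for v in values:
        if v < 0:
            result = "found {0}".format(v)
            break                          # leaving via break skips the else suite
    else:
        result = "no negative value"       # runs only if the loop was not broken
    return result

print(first_negative([3, -1, 4]))   # found -1
print(first_negative([3, 1, 4]))    # no negative value
```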

1.2.1 Exception Handling

try:
  suite1
except exceptionType as variable:
  # exceptionType can be a tuple of several exception types; "as variable" is optional
  suite2
else:
  # optional. suite3 runs only if suite1 raised no exception
  suite3
finally:
  # optional. suite4 always runs, even if an earlier suite executed a return statement
  suite4

class exceptionType1(Exception): # the base class can be Exception or one of its subclasses
  pass

try:
  raise exceptionType1("customed")
except exceptionType1 as v1:
  # v1 keeps its real class even if it is caught as Exception. Output:
  #   <class '__main__.exceptionType1'>(customed)
  print("%s(%s)" % (v1.__class__, v1))
  try:
    raise Exception("raise again") from v1 # chain the new exception to v1
  except Exception as v2:
    # Output of the two prints below:
    #   <class 'Exception'>(raise again) caused by:
    #           <class '__main__.exceptionType1'>(customed)
    print("%s(%s) caused by:" % (v2.__class__, v2))
    v3 = v2.__cause__ # the exception given after "from"
    print("\t%s(%s)" % (v3.__class__, v3))

1.2.2 Context Manager

A context manager allows us to simplify code by ensuring that certain operations are performed before and after a particular block is executed. It defines the methods __enter__() and __exit__().

with expression as variable: # the return value of __enter__() is assigned to variable
  suite

with expression1 as variable1, expression2 as variable2: # several context managers in one with statement; available since Python 3.1
  suite

__enter__() is called automatically when the with statement is entered, and its return value is bound to the as variable.

__exit__(exc_type, exc_val, exc_tb) is called automatically when execution leaves the body of the with statement. If it returns True, the with statement suppresses the exception; otherwise the exception continues to propagate after the with statement. The parameters are the exception type, value and traceback information if an exception occurred in the body of the with statement; otherwise they are all None.
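
The protocol can be sketched with a small class. Recorder is a hypothetical context manager written only for this illustration; it records the order of calls and suppresses one exception type.

```python
class Recorder:
    """Hypothetical context manager that logs when its block starts and ends."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                      # bound to the "as" variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the body finished without an exception
        self.events.append("exit: {0}".format(exc_type.__name__ if exc_type else None))
        return exc_type is ZeroDivisionError   # suppress only ZeroDivisionError

with Recorder() as rec:
    rec.events.append("body")
    1 / 0                                # raised here, suppressed by __exit__

print(rec.events)   # ['enter', 'body', 'exit: ZeroDivisionError']
```

Because __exit__() returns True only for ZeroDivisionError, any other exception raised in the body would still propagate.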

2 Functions

Python has four kinds of functions: global functions, local functions (nested functions), lambda functions, and methods.

Every function returns a value; it returns None if the function does not execute a return statement.

Function arguments can have default values. Arguments can be passed in any order by using named (keyword) arguments. As soon as you use a named argument, all arguments to the right of it must be named arguments, too. We can use the sequence unpacking operator * to supply positional arguments, and the mapping unpacking operator ** to supply keyword arguments.
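
The argument-passing rules can be tried on a small function. describe and its parameters are hypothetical names used only for this sketch.

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Hypothetical function with two default values."""
    return "{0}, {1}{2}".format(greeting, name, punctuation)

a = describe("Ann")                                       # defaults used
b = describe("Ann", punctuation="?")                      # named argument skips greeting
c = describe(punctuation=".", name="Ann", greeting="Hi")  # any order when named

args = ("Bob", "Welcome")
d = describe(*args)        # * unpacks a sequence into positional arguments

kwargs = {"name": "Cy", "greeting": "Bye"}
e = describe(**kwargs)     # ** unpacks a mapping into keyword arguments

print(a, b, c, d, e, sep="\n")
```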

It is best not to use global variables except as constants. If a function has to assign to a global variable, declare it with the global statement.

def outer():
    def inner1(): # local function
        print("inner1: ", s1)

    def inner2(): # local function
        # nonlocal makes the assignment below update s2 in outer
        # instead of creating a new local variable
        nonlocal s2
        s2 = "str2 from inner"
        print("inner2: ", s2)

    if True:
        s1 = "str1 from outer"
        s2 = "str2 from outer"

    inner1()
    inner2()
    print("outer:  ", s2) # prints "str2 from inner"

outer()

# A lambda function cannot contain branches or loops (although conditional expressions are allowed),
# and cannot contain a return (or yield) statement.
s = lambda x, y: {"min":x, "max":y} if x<y else {"min":y, "max":x}
s(3,2) # {'min': 2, 'max': 3}

2.1 Generator

Generators are a simple form of iterator and provide a means of lazy evaluation. A generator function is a function that contains a yield expression. A generator expression is syntactically almost identical to a list comprehension, the difference being that it is enclosed in parentheses rather than brackets.

h = {1:1, 3:3, 2:2}

#g = ((key, h[key]) for key in sorted(h)) # It is not a tuple, it is a generator

def generator(d):
    for key in sorted(d):
        rcv = yield key, d[key]
        print("rcv: ", rcv)

g1 = generator(h)
for i in g1: # calls next(g1) repeatedly until StopIteration is raised
    print(i)
    print()
print("############")

g2 = generator(h)
for i in range(3):
    print(next(g2)) # next(g2) calls g2.__next__()
    print()
print("############")

g3 = generator(h)
print(g3.send(None)) 
for i in range(2):
    print()
    print(g3.send(i))

Generator's methods

  • generator.__next__()

    Called by the built-in next(). It starts the execution of a generator or resumes it, and returns the value of the next yield expression or raises StopIteration.

  • generator.send(v)

    It resumes the execution and makes v the result of the yield expression at which the generator was suspended, then returns the value of the next yield expression. It can start a generator only when called with None as the argument. It may raise StopIteration.

2.2 Partial Function

Partial function application creates a new function from an existing function and some arguments. The new function does what the original function did, but with some arguments fixed so that callers do not have to pass them.

import functools

seasons = ("Spring", "Summer", "Fall", "Winter")
print(list(enumerate(seasons)))

enumerate1 = functools.partial(enumerate, start=1)
print(list(enumerate1(seasons)))

2.3 Function Decorator

A decorator is a function that takes a function or method as its sole argument and returns a new function or method that incorporates the decorated function or method with some additional functionality added.

import functools

def decorator_maker_with_arguments(s):
    print("decorator_maker_with_arguments: ", s)

    def my_decorate(func):
        print("my_decorate ", s)

        @functools.wraps(func) # give wrapper the name and docstring of func
        def wrapper(x):
            print("Before %s(%s)"%(func.__name__, x))
            result = func(x)
            print("After %s(%s)"%(func.__name__, x))
            return result # pass the result of func back to the caller

        return wrapper

    return my_decorate


@decorator_maker_with_arguments("arguments")
def lazy_func(x):
    print("lazy_func(%s)"%x)

# Without the @ line above, the call below would be written as
# decorator_maker_with_arguments("arguments")(lazy_func)("test")
lazy_func("test")

2.4 Dynamic Code Execution

To create a function dynamically, we can use the built-in exec(object[, globals[, locals]]). object can be either a string or a code object. The return value of exec() is None. When a globals dictionary is passed, the executed code can see only the names in that dictionary (plus the built-ins), so any imported modules, functions or variables it needs must be put there explicitly. The reference to the generated function is added to the locals argument of exec(); if no locals is provided, globals is used as locals, too.

exec() can handle any amount of code, whereas eval() evaluates a single expression.

import math

def outer(oarg):
  code = '''
def inner(iarg): 
  print("oarg: %d, iarg: %d"%(oarg, iarg))
  return math.pi * iarg * oarg 
  '''

  ctxt = {}
  ctxt["math"] = math
  ctxt["oarg"] = oarg
  exec(code, ctxt)
  inner = ctxt["inner"]
  print(inner(3))

outer(2)
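
The difference between eval() and exec() can be seen side by side. This is a minimal sketch; the dictionary name namespace and the variables x, y and z are arbitrary.

```python
namespace = {"x": 4}

# eval() evaluates one expression and returns its value.
value = eval("x * 2 + 1", namespace)
print(value)                                      # 9

# exec() runs statements; it returns None and stores new names in the dictionary.
returned = exec("y = x ** 2\nz = y + 1", namespace)
print(returned, namespace["y"], namespace["z"])   # None 16 17
```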

3 Classes

Everything in Python is an object; classes are objects, too.

Class names do not have to match module names. It is recommended to use an uppercase letter as the first letter of custom modules and custom classes.

All classes are derived directly or indirectly from the ultimate base class, object.

Python does not provide overloading or access control. Attributes whose names begin with two leading underscores are protected against unintentional access, so they can be considered private. (Actually, they are stored under a mangled name of the form _classname__attribute and can still be accessed through it.)
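
Name mangling can be observed directly. Account is a hypothetical class used only for this sketch.

```python
class Account:
    """Hypothetical class with one 'private' attribute."""
    def __init__(self, balance):
        self.__balance = balance        # stored as _Account__balance

    def balance(self):
        return self.__balance           # mangled automatically inside the class

acct = Account(100)
try:
    acct.__balance                      # no mangling outside the class body
    direct_access = True
except AttributeError:
    direct_access = False

print(direct_access)                    # False
print(acct._Account__balance)           # 100
print(sorted(vars(acct)))               # ['_Account__balance']
```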

3.1 methods

Class methods are set up by using the built-in classmethod function as a decorator. Their first argument is added by Python and is the class on which the method was called. You do not have to put a decorator before the definition of __new__(): Python treats it specially as a static method that receives the class as its first argument.

Static methods are set up by using the built-in staticmethod function as a decorator; they have no first argument added automatically by Python.

Instance methods have their first argument added by Python; it is the instance the method was called on.

All three kinds of methods also receive any other arguments we pass to them.
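
The three kinds of methods can be compared in one class. Shape and Square are hypothetical classes written for this sketch.

```python
class Shape:
    count = 0

    def __init__(self, name):
        self.name = name
        Shape.count += 1

    def describe(self, prefix):          # instance method: self is added by Python
        return "{0} {1}".format(prefix, self.name)

    @classmethod
    def created(cls):                    # class method: cls is added by Python
        return "{0} made {1}".format(cls.__name__, cls.count)

    @staticmethod
    def area_of_square(side):            # static method: nothing is added
        return side * side

class Square(Shape):
    pass

sq = Square("unit")
print(sq.describe("a"))          # a unit
print(Square.created())          # Square made 1
print(sq.created())              # Square made 1 (cls is still the class)
print(Shape.area_of_square(3))   # 9
```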

3.1.1 Special Methods

3.1.1.1 __new__() and __init__()

When an object is created, first __new__() is called to create the instance, then __init__() is called to initialize it.

The __init__() method is called immediately after an instance of the class is created. As with other instance methods, the first argument of __init__() is always a reference to the current instance of the class and, by convention, is named self.
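
The call order can be made visible by recording it. Tracked and the calls list are hypothetical names used only in this sketch.

```python
calls = []

class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        return super().__new__(cls)      # object.__new__ creates the instance

    def __init__(self, value):
        calls.append("__init__")
        self.value = value

t = Tracked(7)
print(calls, t.value)    # ['__new__', '__init__'] 7
```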

3.1.1.2 __iter__() and __next__()

class Fib:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib>self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a+self.b
        return fib

An iterable is an object whose class defines an __iter__() method that returns an iterator, which is an object that implements a __next__() method. In most cases __iter__() returns self, because the class that implements __iter__() also implements its own __next__().

The __next__() method is called whenever someone calls next() on an iterator; it raises a StopIteration exception when the iteration is exhausted. A for loop catches this exception and exits the loop cleanly.

def power(values):
    for v in values:
        print("power %d"%v)
        yield v

def adder(values):
    for v in values:
        print("adder %d"%v)
        if v%2==0:
            yield v+3
        else:
            yield v+2

es = [1, 2, 4, 7]
rs = adder(power(es))
for r in rs:
    print(r)

result

power 1
adder 1
3
power 2
adder 2
5
power 4
adder 4
7
power 7
adder 7
9

3.1.1.3 __str__() and __repr__()

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return "{0.__class__.__name__}({0.x}, {0.y})".format(self)

class Circle(Point):
    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)
        self.radius = radius

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return "{0.__class__.__name__}({0.radius}, {0.x}, {0.y})".format(self)


c = Circle(5,3,4)
print(c)

if c.__module__ == "__main__":
  d = eval(repr(c)) # eval("Circle(5,3,4)")
else:
  d = eval(c.__module__+"."+repr(c))

print("id of {0} is {1}\nid of {2} is {3}".format(c, hex(id(c)), d, hex(id(d))))

''' output:
Circle(5, 3, 4)
id of Circle(5, 3, 4) is 0x28124a8
id of Circle(5, 3, 4) is 0x2812550
'''

To call the base class version of a method inside a reimplemented method, we can use super().

The methods __str__() and __repr__() are called by the built-in functions str() and repr() respectively. The result of repr() is intended to be passed to eval() to produce an object equivalent to the one repr() was called on.

3.1.1.4 __eq__() and other comparisons

By default, all instances of custom classes are hashable, so they can be used as dictionary keys and stored in sets. But if we reimplement __eq__() without also defining __hash__(), instances are no longer hashable.
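
The effect of __eq__() on hashability can be checked with three small classes. Plain, WithEq and WithEqAndHash are hypothetical names for this sketch.

```python
class Plain:
    pass

class WithEq:
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        return isinstance(other, WithEq) and self.v == other.v

class WithEqAndHash(WithEq):
    def __hash__(self):                  # restore hashability, consistent with __eq__
        return hash(self.v)

plain_ok = len({Plain(), Plain()}) == 2  # default hash is based on identity

try:
    {WithEq(1)}
    eq_only_ok = True
except TypeError:                        # unhashable type: 'WithEq'
    eq_only_ok = False

both = {WithEqAndHash(1), WithEqAndHash(1)}   # equal objects collapse to one item
print(plain_ok, eq_only_ok, len(both))        # True False 1
```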

class A:
    def __eq__(self, other):
        print("A __eq__ called: %r == %r"%(self, other))
        return self.va == other # you can try to return NotImplemented, True, etc

class B:
    def __eq__(self, other):
        print("B __eq__ called: %r == %r"%(self, other))
        return self.vb == other # you can try to return NotImplemented, True, etc

a = A()
a.va = 3 # an int does not know how to compare itself to a B instance

b = B()
b.vb = 4

print(a==b)

In Python 3, a == b tries the following:

  • if type(b) is a subclass of type(a) and has overridden __eq__, then b.__eq__(a) is tried first
  • otherwise a.__eq__(b) is tried (every class has __eq__, inherited from object if it is not overridden)
  • if that returns NotImplemented, the reflected call b.__eq__(a) is tried
  • if both return NotImplemented, the final fallback is an identity comparison, that is, the same as a is b
  • Python 2 had an extra step that looked for __cmp__ before the final fallback; __cmp__ is not used in Python 3

If any of the special methods return NotImplemented, it acts as though the method did not exist.

To provide the complete set of comparisons (<, <=, ==, !=, >, >=), it is necessary to implement at least three of them: <, <= and ==. Python derives != from ==, and uses the reflected < and <= for > and >=.
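
As an alternative to writing three methods by hand, the standard library's functools.total_ordering class decorator fills in the missing comparisons from __eq__() and one ordering method. The sketch below assumes a hypothetical Version class ordered by a single number.

```python
import functools

@functools.total_ordering
class Version:
    """Hypothetical class ordered by a single number."""
    def __init__(self, n):
        self.n = n
    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.n == other.n
    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.n < other.n

v1, v2 = Version(1), Version(2)
results = (v1 < v2, v1 <= v2, v1 == v2, v1 != v2, v1 > v2, v1 >= v2)
print(results)   # (True, True, False, True, False, False)
```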

3.2 variables

class Lazy:
    rule = "DefaultClassVariable" # a class variable: created inside the class definition but outside any method definition
    def __init__(self):
        #self.rule = "DefaultInstanceVariable" # create an instance variable
        print("\t", self.rule) # if no instance variable, refer to class variable

a = Lazy()
b = Lazy()
print()

print(a.rule) # if no instance variable, refer to class variable
print(b.rule) # 
print(a.__class__.rule)
print(b.__class__.rule)
print()

a.rule = "InstanceVariable" # create an instance variable just for a
print(a.rule)
print(b.rule)
print(a.__class__.rule)
print(b.__class__.rule)
print()

a.__class__.rule = "ClassVariable" # explicitly refer to the class variable
print(a.rule)
print(b.rule)
print(a.__class__.rule)
print(b.__class__.rule)

3.3 __slots__

__slots__ is a class attribute, and __dict__ is an instance attribute.

Because each instance has a __dict__ attribute, you can add new attributes with any name you want to an instance's namespace. __slots__ prevents the automatic creation of __dict__ and __weakref__, which saves memory, and it also limits the set of attribute names that are allowed in instances of the class.

If the base class uses __slots__, the subclass must declare __slots__ as well (even an empty one), or the memory saving will be lost. If the base class has no __slots__, declaring __slots__ in the subclass has little effect, because the __dict__ attribute of the base class is always accessible.

You can add "__dict__" to __slots__ to enable assignment of new attributes not listed in __slots__.

class Point:
    __slots__ = ("x", "y") 
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def __str__(self):
        return ".x: %d, .y: %d"%(self.x, self.y)

p = Point(1,2)
#print(p.__dict__) # it has no __dict__ attribute because of __slots__
print(p)
p.y = 4 # you can change the value of attributes
del(p.y); p.y = 7 # you can remove the attribute declared in __slots__, and add it later
print(p)
#p.z = 5 # you cannot add other attributes because of __slots__

3.4 attribute access

special methods:

  • __delattr__(self, name)

    del x.n, deletes object x's n attribute

  • __getattr__(self, name)

    v = x.n, returns the value of object x's n attribute if it is not found directly

  • __setattr__(self, name, value)

    x.n = v, set object x's n attribute's value to v

useProperty = False

class Image:
    def __init__(self, width, height):
        self.__width = width  # self.__setattr__("_{classname}__width", width) is called
        self.__height = height

    if useProperty:
        @property
        def width(self):
            return self.__width

        @property
        def height(self):
            return self.__height
    else:
        def __getattr__(self, name):
            classname = self.__class__.__name__
            if name in frozenset({"width", "height"}):
                return self.__dict__["_{classname}__{name}".format(**locals())]
            raise AttributeError("'{classname}' object has no attribute '{name}'".format(**locals()))

        def __setattr__(self, name, value):
            classname = self.__class__.__name__
            if name in frozenset({"width", "height"}):
                raise AttributeError("the attribute '{name}' of {classname} object is immutable".format(**locals()))
            elif name.startswith("_%s"%classname):
                self.__dict__[name] = value

if __name__ == '__main__':
    img = Image(20, 30)
    print("w: %s, h: %s"%(img.width, img.height))

#    img.width = 40 # AttributeError: the attribute 'width' of Image object is immutable

    img._Image__width = 40 
    print("w: %s, h: %s"%(img.width, img.height))

3.5 property

The property class implements __get__ and __set__, so it is a data descriptor. Its __get__ is called in __getattribute__.

class Circle(Point):
    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)
        self.__radius = radius

    @property # property() takes the radius function as its getter and returns a property instance
    def radius(self):
        return self.__radius

    @radius.setter # radius.setter returns a copy of the property with this function as its setter
    def radius(self, radius):
        assert radius>0, "radius must be positive"
        self.__radius = radius

c = Circle(6, 3, 4)
print(c.radius)
#print(c.radius()) # TypeError: 'int' object is not callable
#c.radius = 0 # AssertionError: radius must be positive
c.radius = 9
print(c.radius)
#del c.radius # AttributeError: the property has no deleter

To make an attribute (radius) into a readable or writable property, it is best to store the value in a private attribute (__radius) and expose it through the property.

The property() function takes up to four arguments: a getter function, a setter function, a deleter function and a docstring. It returns an object of class property, which has the methods getter, setter and deleter to change the getter, setter and deleter functions respectively.

Using @property is the same as calling property() with just one argument, the getter function.
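
The four-argument form of property() can be sketched as follows. Thermostat and its helper methods are hypothetical names chosen for this illustration.

```python
class Thermostat:
    """Hypothetical class that builds a property with an explicit property() call."""
    def __init__(self, target):
        self.__target = target

    def _get_target(self):
        return self.__target

    def _set_target(self, value):
        if value <= 0:
            raise ValueError("target must be positive")
        self.__target = value

    def _del_target(self):
        self.__target = None

    # getter, setter, deleter, docstring
    target = property(_get_target, _set_target, _del_target, "Target temperature")

t = Thermostat(20)
t.target = 22                      # calls _set_target
print(t.target)                    # 22
del t.target                       # calls _del_target
print(t.target)                    # None
print(Thermostat.target.__doc__)   # Target temperature
```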

3.6 Descriptors

A descriptor is an object assigned to a class attribute (celsius) of a class (Temperature). The object is an instance of a class (Celsius) that defines a __get__ method and, optionally, __set__ and __delete__ methods. These methods are invoked automatically when the attribute (celsius) is accessed; that is, access to the attribute is overridden by __get__, __set__ and __delete__.

class Celsius:
    '''
    In __get__, owner is the class Temperature, and instance is None
    if the attribute (celsius) is accessed from the class (Temperature).
    The return value of __get__ is given to the code that requests the value of the attribute.
    '''
    def __get__(self, instance, owner):
        if instance is None: # accessed as Temperature.celsius
            return self
        return 5 * (instance.fahrenheit - 32) / 9
    def __set__(self, instance, value): # it should not return anything
        instance.fahrenheit = 32 + 9 * value / 5

class Temperature:
    def __init__(self, v):
        self.fahrenheit = v
        # To add an instance attribute named celsius, use self.__dict__["celsius"] = 5, because self.celsius = 5 calls Celsius.__set__
    celsius = Celsius()

t = Temperature(212)
print(t.celsius)
t.celsius = 0
print(t.fahrenheit)

If an object defines both __get__ and __set__, it is called a data descriptor. Descriptors that define only __get__ are called non-data descriptors.

Descriptors are the mechanism behind properties, methods, static methods, class methods, and super().
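
One practical consequence of the data/non-data distinction is that an instance attribute can shadow a non-data descriptor. The sketch below uses this to compute a value once; Lazy and Report are hypothetical classes, not part of these notes.

```python
class Lazy:
    """Hypothetical non-data descriptor: defines only __get__."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __get__(self, instance, owner):
        if instance is None:
            return self
        self.calls += 1
        value = self.func(instance)
        # Store the result in the instance dictionary; because Lazy has no
        # __set__, the instance attribute now takes priority over the descriptor.
        instance.__dict__[self.func.__name__] = value
        return value

class Report:
    def __init__(self, rows):
        self.rows = rows

    @Lazy
    def total(self):
        return sum(self.rows)

r = Report([1, 2, 3])
first, second = r.total, r.total
print(first, second, Report.__dict__["total"].calls)   # 6 6 1
```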

3.7 Multiple Inheritance

Multiple inheritance can generally be avoided by using single inheritance and setting a metaclass if we want to support an additional API.

3.8 MRO

MRO stands for Method(attribute) Resolution Order.

Here is the general procedure for access to attribute a of instance i, where C is the class of i.

  • Execute __getattribute__() of the instance; it either returns the attribute value or raises AttributeError.
    • return C.__dict__['a'].__get__(i, C) if C.__dict__ contains 'a' and it is a data descriptor
    • return i.__dict__['a'] if i.__dict__ contains 'a'
    • return C.__dict__['a'].__get__(i, C) if C.__dict__ contains 'a' and it is a non-data descriptor, or C.__dict__['a'] itself if it is not a descriptor
    • the class lookups above also search the base classes of C, in MRO order
  • Execute C.__getattr__(i, 'a'), if C defines __getattr__(), when __getattribute__() raises AttributeError.

class Child():
    def __getattr__(self, name):
        if name == "foo":
            return "Fifth"
        raise AttributeError(name)

def getattribute1(self, name):
    if name == "foo":
        return "First"
    return object.__getattribute__(self, name)

def getattribute2(self, name):
    if name == "foo":
        raise AttributeError("as")
    return object.__getattribute__(self, name)

bar = Child()

Child.foo = "Fourth"
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo) #print(Child.__dict__['foo'])

bar.foo = "Third"
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo) #print(bar.__dict__['foo'])

Child.foo = property(lambda self: "Second") # it is a descriptor
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo) #print(Child.__dict__['foo'].__get__(bar, Child))

Child.__getattribute__ = getattribute1
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo) 

Child.__getattribute__ = getattribute2
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo)

3.9 Class Decorator

Just as we can create decorators for functions and methods, we can also create decorators for entire classes. A class decorator takes a class object (the result of the class statement) and returns a modified version of the class it decorates.

def delegate(attribute_name, method_names): 
    print("delegate(%s,%s)"%(attribute_name, method_names)) 
    def decorator(cls):
        print("decorate(%s)"%cls.__name__)
        nonlocal attribute_name # without this statement, it raises UnboundLocalError: attribute_name referenced before assignment
        if attribute_name.startswith("__"):
            attribute_name = "_"+cls.__name__+attribute_name
        for name in method_names:
            print("%s.%s(self, *args, **kwargs)"%(cls.__name__, name))
            setattr(cls, name, 
                    eval("lambda self, *args, **kwargs: self.{0}.{1}(*args, **kwargs)".format(attribute_name, name)))
        return cls
    return decorator

@delegate("__list", ("pop", "append", "__getitem__", "__delitem__", "__iter__", "__reversed__", "__len__", "__str__"))
class SortedList:
    def __init__(self):
        self.__list = []

print(SortedList.__dict__) # delegate invoked when SortedList defined
s = SortedList()
s.append(5)
print(len(s))

3.10 Abstract Base Class(ABC)

The purpose of an ABC is to define an interface, not to create instances.

An ABC has at least one abstract method or property. Abstract methods can be defined

  • with no implementation (their suite is pass, or raise NotImplementedError()), or
  • with an actual implementation that can be invoked from subclasses.

Classes derived from an ABC can be instantiated only if they reimplement all the abstract methods and abstract properties they have inherited.

All ABCs must have a metaclass of abc.ABCMeta (from the abc module) or one of its subclasses.

import abc

class Appliance(metaclass=abc.ABCMeta): # an ABC requires abc.ABCMeta or one of its subclasses as metaclass
    @abc.abstractmethod # make __init__() an abstract method
    def __init__(self, model, price):
        self.__model = model
        self.price = price # goes through the price property, so set_price() initializes the private __price

    def get_price(self):
        return self.__price

    def set_price(self, price):
        self.__price = price

    # make an abstract readable/writable property
    # (abc.abstractproperty is deprecated since Python 3.3)
    price = abc.abstractproperty(get_price, set_price)

    @property
    def model(self): # model is not abstract, so a concrete subclass may reimplement it but does not have to
        return self.__model

class Cooker(Appliance):
    def __init__(self, model, price, fuel):
        super().__init__(model, price)
        self.fuel = fuel

    price = property(lambda self: super().price, lambda self, price: super().set_price(price))

cooker = Cooker("module", 1.2, "oil")
print("model: %s, price: %f, fuel: %s"%(cooker.model, cooker.price, cooker.fuel))
cooker.price = 2.4
print("model: %s, price: %f, fuel: %s"%(cooker.model, cooker.price, cooker.fuel))
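
Since Python 3.3 an abstract property is usually written by stacking @property on @abc.abstractmethod, and since Python 3.4 abc.ABC can be used as a base class instead of naming the metaclass. The sketch below shows that style; Device and Kettle are hypothetical classes, not a rewrite of the example above.

```python
import abc

class Device(abc.ABC):                   # abc.ABC uses ABCMeta as its metaclass
    @property
    @abc.abstractmethod
    def price(self):
        """Abstract read-only property."""

    @abc.abstractmethod
    def describe(self):
        return "device"                  # implementation available to subclasses

class Kettle(Device):
    @property
    def price(self):
        return 25.0

    def describe(self):
        return super().describe() + ": kettle"

try:
    Device()
    abstract_instantiated = True
except TypeError:                        # abstract methods are not implemented
    abstract_instantiated = False

k = Kettle()
print(abstract_instantiated, k.price, k.describe())   # False 25.0 device: kettle
```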

3.11 Metaclasses

Classes are objects, so you can

  • assign a class to a variable
  • copy it
  • add attributes to it
  • pass it as a function argument
  • create it dynamically (even inside a function)
    def make_class(class_name):
      class C:
          def print_class_name(self):
              print(class_name)
      C.__name__ = class_name
      return C
    
    C1, C2 = [make_class(c) for c in ("C1", "C2")]
    c1, c2 = C1(), C2()
    c1.print_class_name()
    

We can create a new class by calling a metaclass explicitly, as in the example below. The class type is a metaclass, and all metaclasses must inherit from it or from one of its subclasses.

def greet(self, who):
    print(self.greeting, who)

Person = type('Person', (object,), {'greet': greet, 'greeting': 'Hello'}) # type(classname, baseclasses, attributes)

jonathan = Person()
jonathan.greet('Readers') # output: Hello Readers

The metaclass can also be called implicitly when a class statement is executed. In that case the metaclass is determined by the metaclass keyword argument of the class statement or, failing that, by the base classes of the class-to-be (metaclasses are inherited). Python 2 used a __metaclass__ class attribute or a __metaclass__ global variable instead.

Why must all metaclasses inherit from type or one of its subclasses?

  • type(object) is type
  • class object is ultimate base class
  • metaclass of a class-to-be must be subclass of the metaclass of its base class

Methods defined on the metaclass behave like class methods of the class-to-be: they can be invoked on the class, but not on an instance of the class. That is different from normal class methods, which can be called from either a class or its instances.
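
The difference between a metaclass method and an ordinary class method can be checked with a short sketch. Meta and Widget are hypothetical names.

```python
class Meta(type):
    def registry_name(cls):              # a method of the metaclass
        return "registered:" + cls.__name__

class Widget(metaclass=Meta):
    @classmethod
    def kind(cls):                       # an ordinary class method
        return "kind:" + cls.__name__

w = Widget()
print(Widget.registry_name())            # registered:Widget
print(Widget.kind(), w.kind())           # kind:Widget kind:Widget
try:
    w.registry_name()                    # instances do not see metaclass methods
    visible_on_instance = True
except AttributeError:
    visible_on_instance = False
print(visible_on_instance)               # False
```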

A metaclass can be used to change the classes that use it. If the change involves the name, base classes, or dictionary of the class being created (e.g., __slots__), then we need to reimplement the metaclass's __new__(); but for other changes, such as adding methods or data attributes, reimplementing __init__() is sufficient.

class Field(object):
    def __init__(self, ftype):
        self.ftype = ftype

    def is_valid(self, value):
        return isinstance(value, self.ftype)

class EnforcerMeta(type):
    def __init__(cls, name, bases, ns):
        cls._fields = {}
        for key, value in ns.items():
            if isinstance(value, Field):
                cls._fields[key] = value

class Enforcer(metaclass=EnforcerMeta):
    def __setattr__(self, key, value):
        # Validate declared fields; every attribute is then stored normally
        if key in self._fields:
            if not self._fields[key].is_valid(value):
                raise TypeError('Invalid type for field')
        super().__setattr__(key, value)

class Person(Enforcer):
    name = Field(str)
    age = Field(int)

p = Person()
p.name = "Howard"
p.age = 30
p.name = "Hou"
print(p.name, p.age) # Hou 30
Person.name = "123" # class-level assignment is not checked; only instance attributes are
print(p.name, p.age) # Hou 30
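The Enforcer example reimplements __init__() on the metaclass. The complementary case, where the class dictionary has to change before the class exists, might look like the sketch below. SlotsMeta, the fields attribute and created_by are hypothetical names used only for illustration.

```python
class SlotsMeta(type):
    """Hypothetical metaclass that turns a 'fields' tuple into __slots__."""

    def __new__(mcls, name, bases, ns):
        # __slots__ must be in the namespace before the class object is built,
        # so this change has to happen in __new__(), not in __init__().
        ns = dict(ns)
        ns.setdefault("__slots__", tuple(ns.pop("fields", ())))
        return super().__new__(mcls, name, bases, ns)

    def __init__(cls, name, bases, ns):
        # Adding a plain data attribute works after creation, so __init__() suffices.
        super().__init__(name, bases, ns)
        cls.created_by = "SlotsMeta"

class Point(metaclass=SlotsMeta):
    fields = ("x", "y")

p = Point()
p.x, p.y = 1, 2
print(Point.__slots__, Point.created_by)   # ('x', 'y') SlotsMeta

try:
    p.z = 3                                # not declared in fields
except AttributeError as exc:
    print("AttributeError:", exc)
```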

Class decorators and metaclasses have quite a bit in common. In fact, anything that can be done with a class decorator can be done using a metaclass. Metaclasses are capable of more since they are run before the class is created, rather than after, which is the case with decorators.
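As a rough illustration of the overlap, the sketch below registers classes once with a decorator and once with a metaclass. The names registry, register and RegisterMeta are hypothetical. One visible difference is that the metaclass is inherited and runs again for subclasses, while the decorator only touches the class it is written above.

```python
registry = []

def register(cls):
    # Class decorator: receives the finished class object
    registry.append(cls.__name__)
    return cls

class RegisterMeta(type):
    # Metaclass: takes part in creating the class
    def __init__(cls, name, bases, ns):
        super().__init__(name, bases, ns)
        registry.append(name)

@register
class Plugin:
    pass

class SubPlugin(Plugin):            # decorator is not applied again
    pass

class MetaPlugin(metaclass=RegisterMeta):
    pass

class SubMetaPlugin(MetaPlugin):    # metaclass is inherited and runs again
    pass

print(registry)   # ['Plugin', 'MetaPlugin', 'SubMetaPlugin']
```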

4 Packages and Modules

4.1 Package

A package is simply a directory that contains a set of modules and a file called __init__.py. The __init__.py file can be blank, or it can define a list named __all__ with the names of the modules in the directory that are imported when from package import * is used.
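The effect of __all__ can be tried out with a throwaway package. This is only a sketch: the package name demo_pkg and its modules alpha and beta are made up, and the package is written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")      # hypothetical package name
os.mkdir(pkg_dir)

files = {
    "__init__.py": "__all__ = ['alpha']\n",   # only alpha is exported by *
    "alpha.py": "NAME = 'alpha'\n",
    "beta.py": "NAME = 'beta'\n",
}
for fname, body in files.items():
    with open(os.path.join(pkg_dir, fname), "w", encoding="utf8") as fh:
        fh.write(body)

sys.path.insert(0, root)                      # make the package importable
importlib.invalidate_caches()

namespace = {}
exec("from demo_pkg import *", namespace)     # star-import into a scratch namespace
print("alpha" in namespace, "beta" in namespace)   # True False
print(namespace["alpha"].NAME)                      # alpha
```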

4.2 Module

A Python module, in general, is a .py file. Not all modules have associated .py files, for example some built-in modules and modules written in other languages. A module can contain as many class definitions as we like.

We can use import to import packages or the modules in a package. It is recommended to import standard library modules first, then third-party library modules, and finally our own modules.

import os
filename = "/tmp/notes.txt" # hypothetical path used by the examples below
print(os.path.basename(filename)) # safe fully qualified access

import os.path as path
print(path.basename(filename)) # risk of name collision with path

from os import path
print(path.basename(filename)) # risk of name collision with path

# * means everything that is not private, or all objects named in global __all__ variable if __all__ is provided
from os.path import *
print(basename(filename)) # risk of many name collisions

When you import a module, Python searches the directories listed in sys.path. sys.path is an ordinary list, so you can modify it with the standard list methods.

Modules are objects, and have a built-in attribute __name__. If you import a module, __name__ is the module's name: its filename without the directory path or file extension (prefixed with the package name for a module inside a package). If you run the module directly, __name__ is "__main__".
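Both behaviours can be observed with one small file. The sketch below writes a hypothetical module whoami_demo.py to a temporary directory, runs it directly in a child interpreter, and then imports it after appending the directory to sys.path.

```python
import importlib
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()
path = os.path.join(root, "whoami_demo.py")   # hypothetical module name
with open(path, "w", encoding="utf8") as fh:
    fh.write("WHO = __name__\n"
             "if __name__ == '__main__':\n"
             "    print('run directly as', __name__)\n")

# Case 1: run the file directly, so __name__ is '__main__'
direct = subprocess.run([sys.executable, path],
                        capture_output=True, text=True).stdout.strip()
print(direct)            # run directly as __main__

# Case 2: import it; sys.path is a plain list, so append() is enough
sys.path.append(root)
importlib.invalidate_caches()
module = importlib.import_module("whoami_demo")
print(module.WHO)        # whoami_demo
```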

4.2.1 Dynamically Importing Modules

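import sys

# Read the source of t.py and execute it inside a freshly created module object
with open("t.py", "r", encoding="utf8") as fh:
    code = fh.read()

m = type(sys)("tpy") # type(sys) is the module type
exec(code, m.__dict__)
sys.modules["tpy"] = m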

if hasattr(m, "printHello"):
    print(m.printHello.__class__)
    m.printHello()
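The standard library's importlib offers a way to do the same thing without calling exec() by hand. The following is a sketch under the assumption that t.py defines printHello(); here a stand-in t.py is written to a temporary directory so that the example is self-contained.

```python
import importlib.util
import os
import sys
import tempfile

# Stand-in for t.py (hypothetical content) so the sketch can run on its own
path = os.path.join(tempfile.mkdtemp(), "t.py")
with open(path, "w", encoding="utf8") as fh:
    fh.write("def printHello():\n    return 'Hello'\n")

spec = importlib.util.spec_from_file_location("tpy", path)  # describe where the module lives
module = importlib.util.module_from_spec(spec)              # create an empty module object
sys.modules["tpy"] = module                                 # register it under the chosen name
spec.loader.exec_module(module)                             # run the file's code inside it

if hasattr(module, "printHello"):
    print(module.printHello())   # Hello
```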

5 Regular Expressions

5.1 Special Symbols and Characters

Notation     Description                                                        Example Regex
rel1|rel2    Match regular expression rel1 or rel2                              foo|bar
.            Match any character except \n                                      b.b
[…]          Match any single character from character class                    [aeiou]
[x-y]        Match any single character in the range from x to y                [0-9]
[^…]         Match any single character not in character class                  [^aeiou], [^0-9]
(…)          Match enclosed regex and save as subgroup                          ([0-9]{3})?
*            Match 0 or more occurrences of preceding regex                     [A-Za-z0-9]*
+            Match 1 or more occurrences of preceding regex                     [a-z]+\.com
?            Match 0 or 1 occurrences of preceding regex                        goo?
{N}          Match N occurrences of preceding regex                             [0-9]{3}
{M,N}        Match from M to N occurrences of preceding regex                   [0-9]{5,9}
(*|+|?|{})?  'Non-greedy' versions of above occurrence/repetition symbols       .*?[a-z]
^            Match start of string                                              ^Dear
$            Match end of string                                                /bin/*sh$
\d           Same as [0-9] (\D is inverse of \d: [^0-9])                        data\d.txt
\w           Same as [A-Za-z0-9_] (\W is inverse of \w)                         [a-z_]\w+
\s           Whitespace character, same as [ \n\t\r\v\f] (\S is inverse of \s)  of\sthe
\c           Match special character c                                          \., \\, \*
\b           Match any word boundary (\B is inverse of \b)                      \bthe\b
\N           Match saved group N                                                price: \16

Further notations are listed below. Of these, only (?P<name>…) creates a group for matches; none of the others do.

  • '(?iLmsux)', embed one or more special 'flags' parameters (such as IGNORECASE, LOCALE, MULTILINE) within the regex itself.
  • '(?:…)', non-capturing version of regular parentheses. The substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
  • '(?P<name>…)', the substring matched by the group is accessible via the symbolic group name name.
  • '(?P=name)', a backreference to a named group; it matches whatever text was matched by the earlier group named name.
  • '(?#…)', a comment; the contents of the parentheses are simply ignored.
  • '(?=…)', lookahead assertion.

    Example, Isaac(?=Asimov) will match Isaac only if it is followed by Asimov.

  • '(?!…)' Negative lookahead assertion.

    Example, Isaac(?!Asimov) will match Isaac only if it is not followed by Asimov.

  • '(?<=…)', positive lookbehind assertion.

    Example, (?<=abc)def will find a match in abcdef, since it will look back 3 characters and check if the contained pattern matches.

  • '(?<!…)', negative lookbehind assertion.
  • '(?(id/name)yes-pattern|no-pattern)', match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn't. no-pattern is optional.

    Example, ^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) will match <user@host.com.cn> and user@host.com.cn, but not <user@host.com.cn or user@host.com.cn>.
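The examples given in the list above can be run as written. The sketch below simply feeds them to Python's re module; the sample strings other than those quoted above are made up for illustration.

```python
import re

# Lookahead: the asserted text is checked but not consumed
print(re.search(r"Isaac(?=Asimov)", "IsaacAsimov").group())   # Isaac
print(re.search(r"Isaac(?=Asimov)", "IsaacNewton"))           # None
print(re.search(r"Isaac(?!Asimov)", "IsaacNewton").group())   # Isaac

# Lookbehind: def matches only when preceded (or not preceded) by abc
print(re.search(r"(?<=abc)def", "abcdef").span())             # (3, 6)
print(re.search(r"(?<!abc)def", "abcdef"))                    # None

# Named group plus backreference: find a doubled word
print(re.search(r"(?P<word>\w+) (?P=word)", "it is is fine").group())   # is is

# Conditional: require '>' only if the optional '<' (group 1) matched
email = re.compile(r"^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
candidates = ["<user@host.com.cn>", "user@host.com.cn",
              "<user@host.com.cn", "user@host.com.cn>"]
results = [bool(email.match(c)) for c in candidates]
print(results)   # [True, True, False, False]
```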

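Most of these special symbols also work in Ruby regular expressions, though some Python-specific extensions such as (?P<name>…) are spelled differently there.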

5.2 re Module in Python

  • re module function only
    • compile(pattern, flags=0)

      compile regex pattern with optional flags, and return a regex object.

  • re module functions and regex object methods
    • match(pattern, string, flags=0)

      try to match pattern to string with optional flags; return match object on success, None on failure.

    • search(pattern, string, flags=0)

      search for first occurrence of pattern within string with optional flags; return match object on success, None on failure.

    • findall(pattern, string, flags=0)

      look up all occurrences of pattern in string; return a list of matches

    • finditer(pattern, string, flags=0)

      same as findall except returns an iterator instead of a list. for each match, the iterator returns a match object.

    • split(pattern, string, maxsplit=0)

      split string into a list of substrings at each occurrence of pattern, splitting at most maxsplit times (0 means no limit).

    • sub(pattern, repl, string, count=0)

      replace occurrences of pattern in string with repl; all occurrences are replaced unless count is provided.

  • common match object methods
    • group(num=0)

      return entire match, or specific subgroup num

    • groups(default=None)

      return all matching subgroup in a tuple(empty if there aren't any)

    • groupdict(default=None)

      return dict containing all matching named subgroups with the names as the keys(empty if there aren't any)

  • common module attributes
    • re.I, re.IGNORECASE

      Case-insensitive matching

    • re.S, re.DOTALL

      . should match all characters including \n

    • re.L, re.LOCALE

      Matches via \w, \W, \b, \B, \s, \S depend on the locale

    • re.M, re.MULTILINE

      cause ^ and $ to match the beginning and end of each line in the target string rather than strictly the beginning and end of the entire string.

    • re.X, re.VERBOSE

      All whitespace plus #(and all text after it on a single line) are ignored unless in a character class or backslash-escaped.

Example

import re

astring = " howard@google.com.cn Gorman@baidu.net"

pattern = r"(?P<Name>\w+)@(?P<Company>\w+)(?:\.\w+)+" # raw string keeps the backslashes intact
flags = re.IGNORECASE

print("re.findall():", re.findall(pattern, astring, flags)) 

mt = re.search(pattern, astring, flags)
s = "\nre.search():%s\n"%mt.__class__
s += "\tmt.groups():%s, mt.group(1):%s"%(mt.groups(), mt.group(1))
print(s)

'''
The difference between re.search() and re.match():
re.search() tries to match anywhere in astring,
re.match() only matches at the start of astring
'''
print("\nre.match():", re.match(pattern, astring, flags)) 

m1 = re.finditer(pattern, astring, flags)
print("\nre.finditer():", m1)
for i, m in enumerate(m1):
    # i (0, then 1) doubles as a group number: 0 is the whole match, 1 is the Name group
    s1 = "\tm.__class__:%s \n"%m.__class__
    s1 += "\t\tm.start({0}): {1}, m.end({0}): {2}\n".format(i, m.start(i), m.end(i))
    s1 += "\t\tm.groupdict():%s\n"%m.groupdict()
    s1 += "\t\tm.groups():{0}, \n".format(m.groups()) # "%s"%tuple does not work
    s1 += "\t\tm.lastindex:%d\n"%m.lastindex # the number of the highest capturing group
    s1 += "\t\tm.group(Name):%s, m.group(Company):%s\n"%(m.group("Name"), m.group("Company"))
    s1 += m.expand("\t\t\\g<Name> work for \\g<Company>")
    print(s1)

rx = re.compile(pattern, flags)
s2  = "\nrx=re.compile():%s\n"%rx.__class__
s2 += "\trx.pattern:%s"%rx.pattern
s2 += ", rx.flags:%s"%rx.flags
s2 += ", rx.groupindex:%s\n"%rx.groupindex
'''
rx.match() has to start at index 1 of astring (skipping the leading space),
otherwise it returns None
'''
s2 += "\trx.match(astring, 1):%s"%rx.match(astring, 1).__class__
print(s2)

'''
Both re and rx provide findall(), finditer(), match() and search(),
and each function means the same thing in re and in rx; only the parameters differ.
In re, the parameters are pattern, astring and flags, as shown above.
In rx, the parameters are astring, the start index and the end index within astring.
'''

output:

re.findall(): [('howard', 'google'), ('Gorman', 'baidu')]

re.search():<class '_sre.SRE_Match'>
	mt.groups():('howard', 'google'), mt.group(1):howard

re.match(): None

re.finditer(): <callable_iterator object at 0x00000000027E6E48>
	m.__class__:<class '_sre.SRE_Match'> 
		m.start(0): 1, m.end(0): 21
		m.groupdict():{'Name': 'howard', 'Company': 'google'}
		m.groups():('howard', 'google'), 
		m.lastindex:2
		m.group(Name):howard, m.group(Company):google
		howard work for google
	m.__class__:<class '_sre.SRE_Match'> 
		m.start(1): 22, m.end(1): 28
		m.groupdict():{'Name': 'Gorman', 'Company': 'baidu'}
		m.groups():('Gorman', 'baidu'), 
		m.lastindex:2
		m.group(Name):Gorman, m.group(Company):baidu
		Gorman work for baidu

rx=re.compile():<class '_sre.SRE_Pattern'>
	rx.pattern:(?P<Name>\w+)@(?P<Company>\w+)(?:\.\w+)+, rx.flags:34, rx.groupindex:{'Name': 1, 'Company': 2}
	rx.match(astring, 1):<class '_sre.SRE_Match'>
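The example above does not exercise split() and sub(). A minimal sketch of both follows; the sample strings are made up for illustration.

```python
import re

text = "one, two;three   four"

# split(): cut the string at every run of commas, semicolons or whitespace
print(re.split(r"[,;\s]+", text))               # ['one', 'two', 'three', 'four']
print(re.split(r"[,;\s]+", text, maxsplit=1))   # ['one', 'two;three   four']

# sub(): replace every match, or only the first count matches
print(re.sub(r"\s+", " ", "a   b \t c"))        # a b c
print(re.sub(r"\d", "#", "a1b2c3", count=2))    # a#b#c3

# The replacement may refer to named groups with \g<name>
swapped = re.sub(r"(?P<Name>\w+)@(?P<Company>\w+)",
                 r"\g<Company>:\g<Name>", "howard@google")
print(swapped)                                  # google:howard
```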

6 Networking

The primary module is the socket module.

import socket

tcpSock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # TCP socket
udpSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # UDP socket
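To show a TCP socket in use, here is a minimal sketch of a one-shot echo exchange on the loopback interface. The helper echo_once() and the upper-casing reply are assumptions made for the illustration, and the sketch relies on the short message arriving in a single recv() call, which is typical on loopback.

```python
import socket
import threading

def echo_once(server_sock):
    # Accept a single client, read one message and send it back upper-cased
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0 lets the OS pick a free port
server.listen(1)
host, port = server.getsockname()

worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((host, port))
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)   # b'HELLO'
```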

7 Multithread

The threading and queue modules are used for multithreaded programming with Python. With the queue module (named Queue in Python 2), users can create a queue data structure that can be shared across multiple threads.

When the main thread finishes, the interpreter waits for all non-daemon child threads to finish before exiting. If you do not need a child thread to finish before the program exits, set childthread.daemon = True before starting it. The daemon flag is inherited by threads created from that thread.
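The daemon flag and its inheritance can be inspected without running any long-lived thread. A small sketch, with hypothetical helper names idle() and parent():

```python
import threading

def idle():
    pass

worker = threading.Thread(target=idle, daemon=True)   # flag set before start()
print(worker.daemon)                     # True
print(threading.main_thread().daemon)    # False: the main thread is never a daemon

inherited = []

def parent():
    # No daemon argument: the new thread copies the flag of the creating thread
    child = threading.Thread(target=idle)
    inherited.append(child.daemon)

p = threading.Thread(target=parent, daemon=True)
p.start()
p.join()
print(inherited)                         # [True]
```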

import threading
from time import sleep, ctime
from queue import Queue
from random import randint

class MyThread(threading.Thread):
    def __init__(self, func, args, name = ''):
        super().__init__()
        self.name = name
        self.func = func
        self.args = args

    def getResult(self):
        return self.res

    # override run() defined in threading.Thread, called by start() of threading.Thread
    def run(self):  
        print("starting %s at: %s"%(self.name, ctime()))
        self.res = self.func(*self.args)
        print("%s finished at: %s"%(self.name, ctime()))

def writeQ(queue):
    print("producing object for Q ...")
    queue.put('xxx', block=True) # wait for a free slot if the queue is full
    print('size now ', queue.qsize())

def readQ(queue):
    val = queue.get(block=True) # wait until an item is available
    print('consumed object from Q ..., size now ', queue.qsize())

def writer(queue, loops):
    for i in range(loops):
        writeQ(queue)
        sleep(randint(1, 3))

def reader(queue, loops):
    for i in range(loops):
        readQ(queue)
        sleep(randint(2, 5))

funcs = [writer, reader]
nfuncs = range(len(funcs))

def main():
    nloops = randint(2, 5)
    q = Queue(32)
    threads = []

    for i in nfuncs:
        t = MyThread(funcs[i], (q, nloops), funcs[i].__name__)
        #t = threading.Thread(target=funcs[i], args=(q, nloops)) # no default kwargs parameter here
        threads.append(t)

    for i in nfuncs:
        threads[i].start()

    for i in nfuncs:
        threads[i].join()

if __name__ == '__main__':
    main()

threading.Thread

  • __init__(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

    target can be any callable object; args and kwargs are passed to target as its arguments. group is reserved for future use and must be None.

  • start() begin thread execution
  • run()

    defines the thread's functionality; called by start(), usually overridden in a subclass.

  • join(timeout=None)

    suspend the calling thread until the started thread terminates or the optional timeout elapses

  • isDaemon() return True if the thread is daemonic (deprecated since Python 3.10 in favor of the daemon attribute)
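The methods listed above can also be used without subclassing Thread. A minimal sketch, where work() and results are hypothetical names introduced for the illustration:

```python
import threading

results = {}

def work(name, times=1):
    # Each thread writes to its own key, so no lock is needed here
    results[name] = name * times

threads = [
    threading.Thread(target=work, args=("a",), kwargs={"times": 3}),
    threading.Thread(target=work, args=("b",)),        # times falls back to 1
]

for t in threads:
    t.start()          # start() arranges for run() to call target(*args, **kwargs)
for t in threads:
    t.join()           # wait until each thread has terminated

print(sorted(results.items()))               # [('a', 'aaa'), ('b', 'b')]
print(any(t.is_alive() for t in threads))    # False
```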

Last Updated 2016-01-28 Thu 17:00.

Created by Howard Hou with Emacs 24.5.1 (Org mode 8.2.10)