Python
Table of Contents
- 1. Basics
- 2. Functions
- 3. Classes
- 4. Packages and Modules
- 5. regular expressions
- 6. Networking
- 7. Multithread
1 Basics
1.1 Data Types
1.1.1 Identifiers and Keywords
and | continue | except | global | lambda | pass | while |
as | def | False | if | None | raise | with |
assert | del | finally | import | nonlocal | return | yield |
break | elif | for | in | not | True | class |
else | from | is | or | try |
1.1.2 Boolean
Booleans are either True or False. They can be treated as numbers: True is 1 and False is 0.
None is false. Zero values are false. An empty list, tuple, set, or dictionary is false.
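The truthiness rules above can be checked directly. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
# Every value below is treated as false in a boolean context.
falsy_values = [None, False, 0, 0.0, [], (), set(), {}]
all_falsy = not any(bool(v) for v in falsy_values)
print(all_falsy)            # True

# Booleans take part in arithmetic as the integers 1 and 0.
flags = [True, False, True, True]
true_count = sum(flags)
print(true_count)           # 3
print(True + True)          # 2
```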
1.1.3 Numbers
Numbers can be integers, floats, fractions or even complex numbers.
int(float("2.2"))  # 2
11/2               # 5.5
11//2              # 5
-11//2             # -6
11 ** 2            # 121
11 % 2             # 1
hex(32)            # '0x20'
bin(32)            # '0b100000'
1.1.4 Lists
Lists are ordered sequences of values.
[x**2 for x in range(5) if x%2]  # [1, 9]
alist = ["as", "12", 12]
atuple = tuple(alist)  # ("as", "12", 12)
aset = set(alist)      # {"as", "12", 12}
# Slicing
print(alist[-1])     # 12
print(alist[-1:-2])  # []
print(alist[-2:-1])  # ["12"]
# Adding
alist = alist + [True]  # ["as", "12", 12, True]
alist.append(3.0)       # ["as", "12", 12, True, 3.0]
alist.insert(0, 3.0)    # [3.0, "as", "12", 12, True, 3.0]
# Searching
alist.count(3.0)  # 2
12 in alist       # True
alist.index(12)   # 3
# Removing
del alist[1]       # [3.0, "12", 12, True, 3.0]
alist.remove(3.0)  # ["12", 12, True, 3.0]
alist.pop()        # ["12", 12, True]
alist.pop(1)       # ["12", True]
1.1.5 Tuples
Tuples are ordered, immutable sequences of values.
# t = (i for i in range(5)) returns a generator instead of a tuple
atuple = ("as", "12", 12)
alist = list(atuple)  # ["as", "12", 12]
aset = set(atuple)    # {"as", "12", 12}
# Slicing
print(atuple[-1])     # 12
print(atuple[-1:-2])  # ()
print(atuple[-2:-1])  # ("12",)
A named tuple behaves just like a plain tuple, but it has the ability to refer to items in the tuple by name as well.
import collections
Sale = collections.namedtuple("Sale", "Product Customer Price")
# or Sale = collections.namedtuple("Sale", ("Product", "Customer", "Price"))
s = Sale("Apple", "Howard", 2.6)
# print(s["Customer"])  # Error: items are indexed by position or accessed by name, not by string key
print(s.Product)        # Apple
# s.Product = "Other"   # Error: a named tuple is immutable
1.1.6 Sets
There are two built-in set types: the mutable set type and the immutable frozenset.
Sets are unordered bags of values, and always contain unique items.
A set can contain values of any immutable datatype.
Only hashable objects may be added to a set. Hashable objects are objects that have a __hash__() method and can be compared using __eq__(). All the built-in immutable data types, such as float, frozenset, int, str and tuple, can be added to sets (a tuple only if all of its items are hashable). The built-in mutable data types, such as dict, list and set, are not hashable and cannot be added to sets.
Sets support the standard comparison operators (<, <=, ==, !=, >=, >).
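As a minimal sketch of the hashability rule and of set comparisons (the names members and list_was_added are illustrative only):

```python
# Immutable built-ins are hashable, so they can be set members.
members = {1, 2.5, "text", (1, 2), frozenset({3, 4})}
print(len(members))                 # 5

# A list is mutable and unhashable, so adding it raises TypeError.
try:
    members.add([1, 2])
    list_was_added = True
except TypeError:
    list_was_added = False
print(list_was_added)               # False

# For sets, <= tests "is a subset" and < tests "is a proper subset".
subset_ok = {1, 2} <= {1, 2, 3}
proper_ok = {1, 2} < {1, 2}
print(subset_ok, proper_ok)         # True False
```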
1.1.7 set
The empty set must be created using set(); empty braces {} create an empty dict.
# t = {[1,2]}  # Error: unhashable type: 'list'
{x+2 for x in range(5) if x%2}  # {3, 5}
aset = {"as", "12", 12}
alist = list(aset)    # ["as", "12", 12] (in arbitrary order)
atuple = tuple(aset)  # ("as", "12", 12) (in arbitrary order)
# Adding
aset.add(True)        # {True, "as", "12", 12}
aset.update({12, 3})  # {True, 3, 12, "as", "12"}
aset.update([5,7])    # {True, 3, 5, 7, 12, 'as', '12'}
# Searching
5 in aset             # True
# Removing
aset.discard(5)       # {True, 3, 7, 12, 'as', '12'}
aset.remove(7)        # {True, 3, 12, 'as', '12'}
aset.pop()            # removes an arbitrary item; the results below assume it was '12': {True, 3, 12, 'as'}
# Set operations
bset = {3, "e3"}
bset.issubset(aset)               # False
aset.intersection(bset)           # {3}; aset is not updated
aset.difference(bset)             # {True, 12, 'as'}; aset is not updated
aset.symmetric_difference(bset)   # {True, 'e3', 12, 'as'}; aset is not updated
aset.union(bset)                  # {True, 'e3', 3, 12, 'as'}; aset is not updated
1.1.7.1 frozenset
1.1.8 Dictionaries
A dictionary is a collection of key-value pairs. Since Python 3.7 it preserves insertion order; earlier versions made no ordering guarantee.
Only hashable objects may be used as dictionary keys, so the immutable data types (numbers, str, tuple, frozenset) qualify.
Dictionaries support the equality comparison operators (== and !=), which are applied item by item.
# md = {t: len(t) for t in [(1,2), "asdf"]}  # {(1,2): 2, "asdf": 4}
adict = dict([("server","db.diveintopython3.org"), ('database','mysql')])
adict = dict(server="db.diveintopython3.org", database='mysql')
adict = {"server":"db.diveintopython3.org", 'database':'mysql'}
adict['server']  # "db.diveintopython3.org"
for k in adict:
    print(k)     # print each key
for v in adict.values():
    print(v)     # print each value
for item in adict.items():
    print(item)  # print each key-value pair as a tuple
for key, value in adict.items():
    print("({0},{1})".format(key, value))  # print each key-value pair, formatted
adict['database'] = 'oracle'  # {"server":"db.diveintopython3.org", 'database':'oracle'}
adict['user'] = 'howard'      # {"server":"db.diveintopython3.org", 'database':'oracle', "user":"howard"}
1.1.9 None
None is a special constant in Python; it is not False, not 0, and not the empty string. Comparing None for equality to anything other than None returns False.
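A short sketch of how None compares. The helper find_user is hypothetical and exists only to produce a None result.

```python
# None is equal to nothing but itself.
print(None == False)    # False
print(None == 0)        # False
print(None == "")       # False
print(None == None)     # True

# It is still treated as false in a boolean context.
print(bool(None))       # False

def find_user(name, users):
    """Hypothetical helper: return the stored value, or None when absent."""
    return users.get(name)

result = find_user("bob", {"alice": 1})
# Identity comparison (is / is not) is the idiomatic test for None.
missing = result is None
print(missing)          # True
```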
1.1.10 Strings
Strings are immutable sequences of Unicode characters.
s = "qqqqq or\n1111"  # equivalent to a triple-quoted literal with a line break after "or"
s.splitlines()        # ['qqqqq or', '1111']
s.lower().count('q')  # 5
s.split("or")         # ['qqqqq ', '\n1111']
s[4:7]                # "q o"
s[7:]                 # 'r\n1111'
"{0.__class__}".format(1)          # "<class 'int'>"
"{var.__class__}".format(var=1)    # "<class 'int'>"
h = {"s1": "11", "s2": 22}         # keys must be strings, they cannot be tuples
"{s1} {s2}".format(**h)            # '11 22'
1.2 Control Flow
if boolean_expression1:
    suite1
elif boolean_expression2:
    suite2
else:
    suite3

expression1 if boolean_expression1 else expression2

while boolean_expression:
    suite1
else:  # suite2 is skipped if the loop does not terminate normally (break statement, return statement or exception)
    suite2

for expression in iterable:
    suite1
else:  # suite2 is skipped if the loop does not terminate normally (break statement, return statement or exception)
    suite2
1.2.1 Exception Handling
try:
    suite1
except exceptionType as variable:  # exceptionType can be a tuple of several exception types; "as variable" is optional
    suite2
else:  # optional; suite3 runs if suite1 raised no exception
    suite3
finally:  # optional; suite4 always runs, even if a return statement ran before it
    suite4

class exceptionType1(Exception):  # the base class can be Exception or one of its subclasses
    pass
try:
    raise exceptionType1("customed")
except exceptionType1 as v1:
    # v1.__class__ is <class '__main__.exceptionType1'> even if the except clause names the base type Exception
    print("%s(%s)"%(v1.__class__, v1))  # <class '__main__.exceptionType1'>(customed)
    try:
        raise Exception("raise again") from v1
    except Exception as v2:
        '''
        output from the 2 prints below
        <class 'Exception'>(raise again) caused by:
            <class '__main__.exceptionType1'>(customed)
        '''
        print("%s(%s) caused by:"%(v2.__class__, v2))
        v3 = v2.__cause__
        print("\t%s(%s)"%(v3.__class__, v3))
1.2.2 Context Manager
A context manager allows us to simplify code by ensuring that certain operations are performed before and after a particular block is executed. It defines the methods __enter__() and __exit__().
with expression as variable:  # the return value of __enter__() is assigned to variable
    suite
with expression1 as variable1, expression2 as variable2:  # available since Python 3.1
    suite
__enter__() is called automatically when the with statement is entered, and its return value is bound to the as variable.
__exit__(exc_type, exc_val, exc_tb) is called automatically when execution leaves the body of the with statement. Returning True causes the with statement to suppress the exception; otherwise the exception continues to propagate after the with statement. The parameters are the exception type, value and traceback information if an exception occurred in the body of the with statement; otherwise they are None.
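The protocol described above can be sketched with a small class. Recorder is a hypothetical name used only for this illustration; it records when each method runs and lets the caller choose whether exceptions are suppressed.

```python
class Recorder:
    """Hypothetical context manager that logs its own lifecycle."""
    def __init__(self, suppress):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                     # bound to the "as" variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the body finished without an exception.
        label = exc_type.__name__ if exc_type else "None"
        self.events.append("exit:" + label)
        return self.suppress            # True suppresses the exception

with Recorder(suppress=True) as r1:
    raise ValueError("swallowed")
print(r1.events)                        # ['enter', 'exit:ValueError']

propagated = False
try:
    with Recorder(suppress=False) as r2:
        raise KeyError("not swallowed")
except KeyError:
    propagated = True
print(propagated)                       # True
```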
2 Functions
Four kinds of functions: global functions, local functions(nested functions), lambda functions, and methods.
All functions return a value; a function returns None if it does not execute a return statement.
Python allows function arguments to have default values. Arguments can be specified in any order by using named arguments. As soon as you have a named argument, all arguments to the right of it need to be named arguments, too. We can use the sequence unpacking operator * to supply positional arguments, and we can unpack a mapping with the mapping unpacking operator ** to supply keyword arguments.
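A brief sketch of these calling conventions follows. The function connect and its parameters are hypothetical and only serve to show how arguments are bound.

```python
def connect(host, port=5432, timeout=10):
    """Hypothetical function used only to show argument passing."""
    return (host, port, timeout)

# Default values fill in omitted arguments.
print(connect("db"))                          # ('db', 5432, 10)

# Named (keyword) arguments may be given in any order.
print(connect(timeout=3, host="db"))          # ('db', 5432, 3)

# * unpacks a sequence into positional arguments.
positional = ("db", 6000)
print(connect(*positional))                   # ('db', 6000, 10)

# ** unpacks a mapping into keyword arguments.
options = {"port": 7000, "timeout": 1}
print(connect("db", **options))               # ('db', 7000, 1)
```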
It is best not to use global variables except as constants; if you have to rebind one inside a function, use the global statement.
def outer():
    def inner1():  # local function
        print("inner1: ", s1)
    def inner2():  # local function
        # nonlocal makes the assignment below update s2 in outer
        # instead of creating a new local variable
        nonlocal s2
        s2 = "str2 from inner"
        print("inner2: ", s2)
    if True:
        s1 = "str1 from outer"
        s2 = "str2 from outer"
    inner1()
    inner2()
    print("outer: ", s2)
'''
The lambda function can not contain branches or loops (although conditional
expressions are allowed), and can not have a return (or yield) statement.
'''
s = lambda x, y: {"min":x, "max":y} if x<y else {"min":y, "max":x}
s(3,2)  # {'min': 2, 'max': 3}
2.1 Generator
Generators are a simple form of iterator that provides lazy evaluation. A generator expression is syntactically almost identical to a list comprehension, the difference being that it is enclosed in parentheses rather than brackets. A generator function is a function whose body contains yield.
h = {1:1, 3:3, 2:2}
# g = ((key, h[key]) for key in sorted(h))  # it is not a tuple, it is a generator
def generator(d):
    for key in sorted(d):
        rcv = yield key, d[key]
        print("rcv: ", rcv)
g1 = generator(h)
for i in g1:  # repeatedly calls next(g1) until StopIteration
    print(i)
print()
print("############")
g2 = generator(h)
for i in range(3):
    print(next(g2))  # g2.__next__() is called
print()
print("############")
g3 = generator(h)
print(g3.send(None))
for i in range(2):
    print()
    print(g3.send(i))
Generator's methods
- generator.__next__() (called by the built-in next()): starts the execution of a generator or resumes it; returns the value of the next yield expression or raises StopIteration.
- generator.send(v): can start a generator when called with None as the argument, and can resume its execution. The argument v becomes the result of the yield expression at which the generator was paused, and send() returns the value of the next yield expression. It may raise StopIteration.
2.2 Partial Function
It is the creation of a function from an existing function and some arguments to produce a new function that does what the original function did, but with some arguments fixed so that callers do not have to pass them.
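import functools
seasons = ("Spring", "Summer", "Fall", "Winter")
print(list(enumerate(seasons)))   # [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
enumerate1 = functools.partial(enumerate, start=1)
print(list(enumerate1(seasons)))  # [(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]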
2.3 Function Decorator
A decorator is a function that takes a function or method as its sole argument and returns a new function or method that incorporates the decorated function or method with some additional functionality added.
import functools
def decorator_maker_with_arguments(s):
    print("decorator_maker_with_arguments: ", s)
    def my_decorate(func):
        print("my_decorate ", s)
        @functools.wraps(func)  # make wrapper carry the name and docstring of func
        def wrapper(x):
            print("Before %s(%s)"%(func.__name__, x))
            func(x)
            print("After %s(%s)"%(func.__name__, x))
        return wrapper
    return my_decorate
@decorator_maker_with_arguments("arguments")
def lazy_func(x):
    print("lazy_func(%s)"%x)
# without the @decorator_maker_with_arguments("arguments") line, the call below would be written as
# decorator_maker_with_arguments("arguments")(lazy_func)("test")
lazy_func("test")
2.4 Dynamic Code Execution
To create a function dynamically, we can use the built-in exec(object[, globals[, locals]]).
object can be either a string or a code object. The return value of exec is None. When a globals dictionary is passed, the executed code sees only the names in that dictionary (plus the built-ins), not the imported modules, functions, variables, or other objects in the scope of the exec call, so anything the code needs must be placed in the dictionary. A reference to the generated function is added to the locals argument of exec; if no locals is provided, globals is used as locals too.
exec can handle any amount of code, whereas eval evaluates a single expression.
import math
def outer(oarg):
    code = '''
def inner(iarg):
    print("oarg: %d, iarg: %d"%(oarg, iarg))
    return math.pi * iarg * oarg
'''
    ctxt = {}
    ctxt["math"] = math  # names the generated code needs are passed in explicitly
    ctxt["oarg"] = oarg
    exec(code, ctxt)
    inner = ctxt["inner"]  # the generated function is stored in the same dictionary
    print(inner(3))
outer(2)
3 Classes
Everything in Python is an object, classes are objects, too.
Class names do not have to match module names. It is recommended to use an uppercase letter as the first letter of custom modules and custom classes.
All classes are derived directly or indirectly from the ultimate base class object.
Python does not provide overloading or access control. Attributes whose names begin with two leading underscores are protected against unintentional access, so they can be considered private (actually, they are stored under a mangled name of the form _classname__attribute and can still be accessed).
3.1 methods
Class methods are set up by using the built-in classmethod function as a decorator. You do not have to put a decorator before the definition of __new__(), because Python special-cases it. Class methods have their first argument added by Python, and it is the class the method was called on.
Static methods are set up by using the built-in staticmethod function as a decorator; they have no first argument added automatically by Python.
Instance methods have their first argument added by Python, and it is the instance the method was called on.
All three kinds of methods also receive any other arguments we pass to them.
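The three kinds of methods can be compared side by side. This is a minimal sketch; the class Counter and its members are hypothetical.

```python
class Counter:
    created = 0                         # class variable shared by all instances

    def __init__(self, start):
        self.value = start
        Counter.created += 1

    def increment(self, step=1):        # instance method: first argument is the instance
        self.value += step
        return self.value

    @classmethod
    def from_text(cls, text):           # class method: first argument is the class
        return cls(int(text))

    @staticmethod
    def is_valid(text):                 # static method: no automatic first argument
        return text.isdigit()

c = Counter.from_text("41")
print(c.increment())                    # 42
print(Counter.is_valid("4x"))           # False
print(c.is_valid("7"))                  # True (also callable through an instance)
print(Counter.created)                  # 1
```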
3.1.1 Special Methods
3.1.1.1 __new__() and __init__()
When an object is created, first __new__() is called to create it, then __init__() is called to initialize it.
The __init__() method is called immediately after an instance of the class is created. As with other instance methods, the first argument of __init__() is always a reference to the current instance of the class and, by convention, is named self.
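The calling order can be observed by recording each call. In this sketch the class Tracked and the list calls are hypothetical names.

```python
calls = []

class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        return super().__new__(cls)     # create the bare instance

    def __init__(self, name):
        calls.append("__init__")
        self.name = name                # self is the instance returned by __new__

t = Tracked("demo")
print(calls)                            # ['__new__', '__init__']
print(t.name)                           # demo
```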
3.1.1.2 __iter__() and __next__()
class Fib:
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        self.a = 0
        self.b = 1
        return self
    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a+self.b
        return fib
An iterator is just a class that defines an __iter__() method, which returns an object that implements a __next__() method. In most cases __iter__() returns self, since the class that implements __iter__() also implements its own __next__().
__next__() is called whenever someone calls next() on an iterator; it raises StopIteration when the iteration is exhausted. A for loop exits cleanly when it sees that exception.
def power(values):
    for v in values:
        print("power %d"%v)
        yield v
def adder(values):
    for v in values:
        print("adder %d"%v)
        if v%2==0:
            yield v+3
        else:
            yield v+2
es = [1, 2, 4, 7]
rs = adder(power(es))
for r in rs:
    print(r)
result
power 1
adder 1
3
power 2
adder 2
5
power 4
adder 4
7
power 7
adder 7
9
3.1.1.3 __str__() and __repr__()
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __str__(self):
        return self.__repr__()
    def __repr__(self):
        return "{0.__class__.__name__}({0.x}, {0.y})".format(self)
class Circle(Point):
    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)
        self.radius = radius
    def __str__(self):
        return self.__repr__()
    def __repr__(self):
        return "{0.__class__.__name__}({0.radius}, {0.x}, {0.y})".format(self)
c = Circle(5,3,4)
print(c)
if c.__module__ == "__main__":
    d = eval(repr(c))  # eval("Circle(5, 3, 4)")
else:
    d = eval(c.__module__+"."+repr(c))
print("id of {0} is {1}\nid of {2} is {3}".format(c, hex(id(c)), d, hex(id(d))))
'''
output (the ids differ from run to run):
Circle(5, 3, 4)
id of Circle(5, 3, 4) is 0x28124a8
id of Circle(5, 3, 4) is 0x2812550
'''
To call the base version of a method inside a reimplemented method, we can use super().
The methods __str__() and __repr__() are called by the built-in functions str() and repr() respectively.
The result of repr() is intended to be passed to eval() to produce an object equivalent to the one repr() was called on.
3.1.1.4 __eq__() and other comparisons
By default, all instances of custom classes are hashable, so they can be used as dictionary keys and stored in sets.
But if we reimplement __eq__() without also defining __hash__(), instances are no longer hashable.
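The effect on hashability can be sketched as follows. The classes Plain, OnlyEq and EqAndHash are hypothetical; the last one shows the usual remedy of defining __hash__() alongside __eq__().

```python
class Plain:
    pass

class OnlyEq:
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.v == other.v

class EqAndHash:
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        return isinstance(other, EqAndHash) and self.v == other.v
    def __hash__(self):
        return hash(self.v)             # equal objects must hash equally

plain_count = len({Plain(), Plain()})
print(plain_count)                      # 2: default hashing is per object

try:
    {OnlyEq(1)}
    only_eq_hashable = True
except TypeError:                       # unhashable type: 'OnlyEq'
    only_eq_hashable = False
print(only_eq_hashable)                 # False

merged_count = len({EqAndHash(1), EqAndHash(1)})
print(merged_count)                     # 1: equal and hashable, so deduplicated
```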
class A:
    def __eq__(self, other):
        print("A __eq__ called: %r == %r"%(self, other))
        return self.va == other  # you can try to return NotImplemented, True, etc
class B:
    def __eq__(self, other):
        print("B __eq__ called: %r == %r"%(self, other))
        return self.vb == other  # you can try to return NotImplemented, True, etc
a = A()
a.va = 3  # it is an int, which does not know how to compare itself to a B
b = B()
b.vb = 4
print(a==b)
When evaluating a == b, Python tries the following:
- if type(b) is a new-style class, and type(b) is a subclass of type(a), and type(b) has overridden __eq__, then the result is b.__eq__(a)
- if type(a) has overridden __eq__ (that is, type(a).__eq__ is not object.__eq__), then the result is a.__eq__(b)
- if type(b) has overridden __eq__, then the result is b.__eq__(a)
- if none of the above are the case, it repeats the process looking for __cmp__; if it exists, the objects are equal if it returns zero (this step applies to Python 2 only, since Python 3 has no __cmp__)
- as a final fallback, it calls object.__eq__(a, b), which is True if a and b are the same object, that is, the same as a is b
If any of the special methods returns NotImplemented, Python acts as though the method did not exist.
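To provide the complete set of comparisons (<, <=, ==, !=, >, >=), it is necessary to implement at least three of them: <, <= and ==. Python can supply > from <, >= from <= (by swapping the operands), and != from == (by negating the result).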
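A minimal sketch of a class that implements only the three methods named above. Version is a hypothetical class; the remaining operators work because Python falls back to the reflected method on the other operand, or negates __eq__().

```python
class Version:
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key == other.key

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key < other.key

    def __le__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.key <= other.key

old, new = Version(1, 2), Version(1, 10)
not_equal = old != new          # derived by negating __eq__
greater = new > old             # falls back to old.__lt__(new)
greater_equal = new >= old      # falls back to old.__le__(new)
print(not_equal, greater, greater_equal)    # True True True
```

The standard library's functools.total_ordering class decorator is an alternative that fills in the missing ordering methods from __eq__() and one of the others.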
3.2 variables
class Lazy:
    rule = "DefaultClassVariable"  # a class variable is created inside the class definition but outside the method definitions
    def __init__(self):
        # self.rule = "DefaultInstanceVariable"  # would create an instance variable
        print("\t", self.rule)  # if there is no instance variable, this refers to the class variable
a = Lazy()
b = Lazy()
print()
print(a.rule)  # no instance variable, so the class variable is used
print(b.rule)
print(a.__class__.rule)
print(b.__class__.rule)
print()
a.rule = "InstanceVariable"  # create an instance variable just for a
print(a.rule)
print(b.rule)
print(a.__class__.rule)
print(b.__class__.rule)
print()
a.__class__.rule = "ClassVariable"  # explicitly refer to the class variable
print(a.rule)
print(b.rule)
print(a.__class__.rule)
print(b.__class__.rule)
3.3 __slots__
__slots__ is a class attribute, and __dict__ is an instance attribute.
Because of the instance's __dict__ attribute, you can add new attributes to an instance's namespace with any name you want.
__slots__ prevents the automatic creation of __dict__ and __weakref__, which saves memory; it also limits the set of attribute names that are allowed in instances of the class.
If the base class uses __slots__, the subclass must declare __slots__ too, even if empty, or the memory saving will be lost.
If the base class has no __slots__, declaring __slots__ in a subclass is meaningless because the __dict__ attribute of the base class is always accessible.
You can add __dict__ to __slots__ to enable assignment of new attributes not listed in __slots__.
class Point:
    __slots__ = ("x", "y")
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def __str__(self):
        return ".x: %d, .y: %d"%(self.x, self.y)
p = Point(1,2)
# print(p.__dict__)  # it has no __dict__ attribute because of __slots__
print(p)
p.y = 4              # you can change the value of attributes
del p.y; p.y = 7     # you can remove an attribute declared in __slots__ and add it back later
print(p)
# p.z = 5            # you cannot add other attributes because of __slots__
3.4 attribute access
special methods:
- __delattr__(self, name): del x.n, deletes object x's n attribute
- __getattr__(self, name): v = x.n, returns the value of object x's n attribute if it is not found directly
- __setattr__(self, name, value): x.n = v, sets object x's n attribute's value to v
useProperty = False
class Image:
    def __init__(self, width, height):
        self.__width = width  # self.__setattr__("_{classname}__width", width) is called
        self.__height = height
    if useProperty:
        @property
        def width(self):
            return self.__width
        @property
        def height(self):
            return self.__height
    else:
        def __getattr__(self, name):
            classname = self.__class__.__name__
            if name in frozenset({"width", "height"}):
                return self.__dict__["_{classname}__{name}".format(**locals())]
            raise AttributeError("'{classname}' object has no attribute '{name}'".format(**locals()))
        def __setattr__(self, name, value):
            classname = self.__class__.__name__
            if name in frozenset({"width", "height"}):
                raise AttributeError("the attribute '{name}' of {classname} object is immutable".format(**locals()))
            elif name.startswith("_%s"%classname):
                self.__dict__[name] = value
if __name__ == '__main__':
    img = Image(20, 30)
    print("w: %s, h: %s"%(img.width, img.height))
    # img.width = 40  # AttributeError: the attribute 'width' of Image object is immutable
    img._Image__width = 40
    print("w: %s, h: %s"%(img.width, img.height))
3.5 property
The property class implements __get__ and __set__, so it is a data descriptor. Its __get__ is called in __getattribute__.
class Circle(Point):
    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)
        self.__radius = radius
    @property  # property() takes the radius function as its getter argument and returns a property instance
    def radius(self):
        return self.__radius
    @radius.setter  # radius is now an object of class property
    def radius(self, radius):
        assert radius>0, "radius must be positive"
        self.__radius = radius
c = Circle(6, 3, 4)
print(c.radius)
# print(c.radius())  # Error: 'int' object is not callable
# c.radius = 0       # Error: radius must be positive
c.radius = 9
print(c.radius)
# del c.radius       # Error: the property has no deleter
To make an attribute (radius) into a readable (and writable) property, it is better to store the value in a private attribute (__radius) and access it through the property.
The property() function takes up to four arguments: a getter function, a setter function, a deleter function and a docstring. It returns an object of class property, which has the methods getter, setter and deleter to change the getter function, setter function and deleter function respectively.
Using @property is the same as calling property() with just one argument, the getter function.
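The function form and the decorator form can be shown together. This is a sketch only; the class Account and its attribute names are hypothetical.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance

    def _get_balance(self):
        return self._balance

    def _set_balance(self, value):
        if value < 0:
            raise ValueError("balance must not be negative")
        self._balance = value

    # Function form: property(getter, setter, deleter, docstring)
    balance = property(_get_balance, _set_balance, None, "Current balance.")

    # Decorator form: property() called with the getter only (read-only)
    @property
    def doubled(self):
        return self._balance * 2

acct = Account(10)
acct.balance = 25                       # runs _set_balance
print(acct.balance)                     # 25
print(acct.doubled)                     # 50
print(type(Account.__dict__["balance"]).__name__)   # property
print(Account.balance.__doc__)          # Current balance.
```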
3.6 Descriptors
A descriptor is an object that is assigned as a class attribute (celsius) of a class (Temperature). The object is an instance of a class (Celsius) that defines a __get__ method, and optionally __set__ and __delete__ methods. These methods are invoked automatically upon access to the attribute (celsius); that is, access to the attribute is overridden by the methods __get__, __set__ and __delete__.
class Celsius:
    '''
    the owner is class Temperature.
    the instance is None if the attribute (celsius) is accessed from the class (Temperature).
    its return value is given to the code that requests the value of the attribute.
    '''
    def __get__(self, instance, owner):
        return 5 * (instance.fahrenheit - 32) / 9
    def __set__(self, instance, value):  # it should not return anything
        instance.fahrenheit = 32 + 9 * value / 5
class Temperature:
    def __init__(self, v):
        self.fahrenheit = v
        # To add an instance attribute celsius, use self.__dict__["celsius"] = 5,
        # because self.celsius = 5 calls __set__ of Celsius
    celsius = Celsius()
t = Temperature(212)
print(t.celsius)     # 100.0
t.celsius = 0
print(t.fahrenheit)  # 32.0
If an object defines both __get__ and __set__, it is called a data descriptor. Descriptors that define only __get__ are called non-data descriptors.
Descriptors are the mechanism behind properties, methods, static methods, class methods, and super().
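The practical difference between the two kinds shows up when the instance dictionary holds an entry of the same name. In this sketch the classes NonData, Data and Host are hypothetical.

```python
class NonData:
    def __get__(self, instance, owner):
        return "from non-data descriptor"

class Data:
    def __get__(self, instance, owner):
        return "from data descriptor"
    def __set__(self, instance, value):
        raise AttributeError("read-only")

class Host:
    plain = NonData()
    guarded = Data()

h = Host()
# Write to the instance dictionary directly, bypassing __set__.
h.__dict__["plain"] = "from instance dict"
h.__dict__["guarded"] = "from instance dict"

print(h.plain)      # from instance dict (instance entry beats a non-data descriptor)
print(h.guarded)    # from data descriptor (data descriptor beats the instance entry)
```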
3.7 Multiple Inheritance
Multiple inheritance can generally be avoided by using single inheritance and setting a metaclass if we want to support an additional API.
3.8 MRO
MRO stands for Method(attribute) Resolution Order.
Here is the general procedure for access to attribute a of instance i, where C is the class of i.
- Execute __getattribute__() of the instance; it either returns the attribute value or raises AttributeError.
  - return C.__dict__['a'].__get__(i, C) if C.__dict__ contains 'a' that is a data descriptor
  - return i.__dict__['a'] if i contains 'a'
  - return C.__dict__['a'] if C.__dict__ contains 'a' that is not a data descriptor (calling its __get__ if it is a non-data descriptor)
  - invoke __getattribute__ of the base class
- Execute __getattr__() of the instance if __getattribute__() raises AttributeError.
class Child():
    def __getattr__(self, name):
        if name == "foo":
            return "Fifth"
def getattribute1(self, name):
    if name == "foo":
        return "First"
    return object.__getattribute__(self, name)
def getattribute2(self, name):
    if name == "foo":
        raise AttributeError("as")
    return object.__getattribute__(self, name)
bar = Child()
Child.foo = "Fourth"
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo)  # Fourth, same as Child.__dict__['foo']
bar.foo = "Third"
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo)  # Third, same as bar.__dict__['foo']
Child.foo = property(lambda self: "Second")  # it is a data descriptor
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo)  # Second, same as Child.__dict__['foo'].__get__(bar, Child)
Child.__getattribute__ = getattribute1
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo)  # First
Child.__getattribute__ = getattribute2
print(bar.__class__.__dict__, bar.__dict__)
print(bar.foo)  # Fifth, __getattr__ runs after AttributeError
3.9 Class Decorator
Just as we can create decorators for functions and methods, we can also create decorators for entire classes. A class decorator takes a class object (the result of the class statement) and returns a modified version of the class it decorates.
def delegate(attribute_name, method_names):
    print("delegate(%s,%s)"%(attribute_name, method_names))
    def decorator(cls):
        print("decorate(%s)"%cls.__name__)
        nonlocal attribute_name  # without this statement, it raises UnboundLocalError: attribute_name referenced before assignment
        if attribute_name.startswith("__"):
            attribute_name = "_"+cls.__name__+attribute_name
        for name in method_names:
            print("%s.%s(self, *args, **kwargs)"%(cls.__name__, name))
            setattr(cls, name, eval("lambda self, *args, **kwargs: self.{0}.{1}(*args, **kwargs)".format(attribute_name, name)))
        return cls
    return decorator
@delegate("__list", ("pop", "append", "__getitem__", "__delitem__", "__iter__", "__reversed__", "__len__", "__str__"))
class SortedList:
    def __init__(self):
        self.__list = []
print(SortedList.__dict__)  # delegate was invoked when SortedList was defined
s = SortedList()
s.append(5)
print(len(s))
3.10 Abstract Base Class(ABC)
The purpose of an abstract base class is to define an interface, not to create instances.
It has at least one abstract method or property. Abstract methods can be defined
- with no implementation (their suite is pass, or raise NotImplementedError()), or
- with an actual implementation that can be invoked from subclasses.
Classes derived from an ABC can be instantiated only if they reimplement all the abstract methods and abstract properties they have inherited.
All ABCs must have a metaclass of abc.ABCMeta (from the abc module), or of one of its subclasses.
import abc
class Appliance(metaclass=abc.ABCMeta):  # an ABC must use abc.ABCMeta or one of its subclasses
    @abc.abstractmethod  # make __init__() an abstract method
    def __init__(self, model, price):
        self.__module = model
        self.price = price  # set_price() is called (through the price property) to initialise the private __price
    def get_price(self):
        return self.__price
    def set_price(self, price):
        self.__price = price
    price = abc.abstractproperty(get_price, set_price)  # an abstract readable/writable property (abstractproperty is deprecated since Python 3.3)
    @property
    def model(self):  # not abstract, so concrete subclasses need not reimplement it, although they may
        return self.__module
class Cooker(Appliance):
    def __init__(self, model, price, fuel):
        super().__init__(model, price)
        self.fuel = fuel
    price = property(lambda self: super().price, lambda self, price: super().set_price(price))
cooker = Cooker("module", 1.2, "oil")
print("model: %s, price: %f, fuel: %s"%(cooker.model, cooker.price, cooker.fuel))
cooker.price = 2.4
print("model: %s, price: %f, fuel: %s"%(cooker.model, cooker.price, cooker.fuel))
3.11 Metaclasses
Classes are objects, so you can
- assign them to a variable
- copy them
- add attributes to them
- pass them as function parameters
- create them dynamically (even in a function)
def make_class(class_name):
    class C:
        def print_class_name(self):
            print(class_name)
    C.__name__ = class_name
    return C
C1, C2 = [make_class(c) for c in ("C1", "C2")]
c1, c2 = C1(), C2()
c1.print_class_name()  # C1
A new class can be created by calling a metaclass explicitly, as in the example below. The class type is a metaclass, and all metaclasses must inherit from it or from one of its subclasses.
def greet(self, who):
    print(self.greeting, who)
Person = type('Person', (object,), {'greet': greet, 'greeting': 'Hello'})  # type(classname, baseclasses, attributes)
jonathan = Person()
jonathan.greet('Readers')  # output: Hello Readers
The metaclass can also be called implicitly when a class statement is executed. The metaclass is determined by looking at the base classes of the class-to-be (metaclasses are inherited) and, in Python 3, at the metaclass keyword argument of the class statement; Python 2 looked at the __metaclass__ attribute of the class-to-be or the __metaclass__ global variable instead.
Why must all metaclasses inherit from type or one of its subclasses?
- type(object) is type
- class object is the ultimate base class
- the metaclass of a class-to-be must be a subclass of the metaclass of its base class
If the metaclass of a class-to-be is determined by its base class (instead of the __metaclass__ attribute), then methods defined on the metaclass become class methods of the class-to-be. They can be invoked by the class-to-be but not by its instances, which is different from normal class methods, which can be called from either a class or its instances.
A metaclass can be used to change the classes that use it. If the change involves the name, base classes, or dictionary of the class being created (e.g., __slots__), then we need to reimplement the metaclass's __new__(); but for other changes, such as adding methods or data attributes, reimplementing __init__() is sufficient.
class Field(object):
    def __init__(self, ftype):
        self.ftype = ftype
    def is_valid(self, value):
        return isinstance(value, self.ftype)
class EnforcerMeta(type):
    def __init__(cls, name, bases, ns):
        cls._fields = {}
        for key, value in ns.items():
            if isinstance(value, Field):
                cls._fields[key] = value
class Enforcer(metaclass=EnforcerMeta):
    def __setattr__(self, key, value):
        if key in self._fields:
            if not self._fields[key].is_valid(value):
                raise TypeError('Invalid type for field')
        super().__setattr__(key, value)
class Person(Enforcer):
    name = Field(str)
    age = Field(int)
p = Person()
p.name = "Howard"
p.age = 30
p.name = "Hou"
print(p.name, p.age)
Person.name="123"
print(p.name, p.age)
Class decorators and metaclasses have quite a bit in common. In fact, anything that can be done with a class decorator can be done using a metaclass. Metaclasses are capable of more since they are run before the class is created, rather than after, which is the case with decorators.
4 Packages and Modules
4.1 Package
A package is simply a directory that contains a set of modules and a file called __init__.py. __init__.py can be blank, or it can contain a list (named __all__) of the modules in the directory that will be imported when from package import * is used.
4.2 Module
A Python module, in general, is a .py file. Not all modules have associated .py files, for example some built-in modules and modules written in other languages. Modules can contain as many class definitions as we like.
We can use import to import a package or the modules in a package. It is recommended to import standard library modules first, then third-party library modules, and finally our own modules.
filename = "/tmp/t.py"  # hypothetical path, used only for illustration
import os
print(os.path.basename(filename))  # safe fully qualified access
import os.path as path
print(path.basename(filename))     # risk of name collision with path
from os import path
print(path.basename(filename))     # risk of name collision with path
# * means everything that is not private, or all objects named in the global __all__ variable if __all__ is provided
from os.path import *
print(basename(filename))          # risk of many name collisions
When you import a module, Python searches the directories listed in sys.path. sys.path is just a list, so you can modify it with the standard list methods.
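A small sketch of treating sys.path as an ordinary list; the directory name is a made-up placeholder and does not need to exist.

```python
import sys

extra_dir = "/opt/myproject/lib"   # hypothetical directory

print(type(sys.path))              # <class 'list'>

if extra_dir not in sys.path:
    sys.path.append(extra_dir)     # appended entries are searched last
print(sys.path[-1])                # /opt/myproject/lib

sys.path.remove(extra_dir)         # undo the change
```

Using sys.path.insert(0, ...) instead of append would make the new directory take priority over the existing entries.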
Modules are objects and have a built-in attribute __name__. If you import the module, __name__ is the module's filename, without the directory path or file extension. If you run the module directly, __name__ is __main__.
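The usual way to rely on this is the script guard sketched below; main is just an illustrative function name.

```python
import os


def main():
    return "running as a script"


print(os.__name__)   # os  (name of an imported module)
print(__name__)      # __main__ when this file is executed directly

# The guarded block runs only when the file is executed, not when imported.
if __name__ == "__main__":
    print(main())
```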
4.2.1 Dynamically Importing Modules
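import sys

# Read the source of t.py, then execute it inside a fresh module object.
with open("t.py", "r", encoding="utf8") as fh:
    code = fh.read()

m = type(sys)("tpy")            # type(sys) is the module type
exec(code, m.__dict__)
sys.modules["tpy"] = m          # register it so "import tpy" finds it

if hasattr(m, "printHello"):
    print(m.printHello.__class__)
    m.printHello()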
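As an alternative to exec, the standard importlib.util helpers can load a module from a file path. This is a different technique from the one above, shown as a sketch; the module source is written to a temporary file here so the block is self-contained.

```python
import importlib.util
import os
import tempfile

# Write a tiny module to disk (stands in for t.py).
src = 'def printHello():\n    return "Hello"\n'
path = os.path.join(tempfile.mkdtemp(), "t.py")
with open(path, "w", encoding="utf8") as fh:
    fh.write(src)

# Build a spec from the file, create the module, then execute its code.
spec = importlib.util.spec_from_file_location("tpy", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

if hasattr(module, "printHello"):
    print(module.printHello())   # Hello
```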
5 regular expressions
5.1 Special Symbols and Characters
Notation | Description | Example Regex |
---|---|---|
rel1|rel2 | Match regular expression rel1 or rel2 | foo|bar |
. | Match any character except \n | b.b |
[…] | Match any single character from character class | [aeiou] |
[x-y] | Match any single character in the range from x to y | [0-9] |
[^…] | Match any single character not in the character class | [^aeiou], [^0-9] |
(…) | Match enclosed regex and save as subgroup | ([0-9]{3})? |
* | Match 0 or more occurrences of preceding regex | [A-Za-z0-9]* |
+ | Match 1 or more occurrences of preceding regex | [a-z]+\.com |
? | Match 0 or 1 occurrences of preceding regex | goo? |
{N} | Match N occurrences of preceding regex | [0-9]{3} |
{M,N} | Match from M to N occurrences of preceding regex | [0-9]{5,9} |
(*|+|?|{})? | 'non-greedy' versions of above occurrence/repetition symbols | .*?[a-z] |
^ | Match start of string | ^Dear |
$ | Match end of string | /bin/*sh$ |
\d | same as [0-9] (\D is inverse of \d: [^0-9]) | data\d.txt |
\w | same as [A-Za-z0-9_] (\W is inverse of \w) | [a-z_]\w+ |
\s | whitespace character, same as [ \n\t\r\v\f] (\S is inverse of \s) | of\sthe |
\c | Match special character c | \., \\, \* |
\b | Match any word boundary (\B is inverse of \b) | \bthe\b |
\N | Match saved group N | price: \16 |
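A few of the symbols from the table, tried out with the re module. The sample strings are invented for illustration.

```python
import re

# Greedy vs non-greedy repetition.
html = "<b>bold</b><i>italic</i>"
print(re.findall(r"<.*>", html))    # ['<b>bold</b><i>italic</i>']
print(re.findall(r"<.*?>", html))   # ['<b>', '</b>', '<i>', '</i>']

# Character range with {N} repetition.
print(re.findall(r"[0-9]{3}", "tel 555-0199"))        # ['555', '019']

# \b word boundary: matches "the" but not "other" or "then".
print(re.findall(r"\bthe\b", "the other then the"))   # ['the', 'the']

# \1 refers back to whatever group 1 matched (a doubled word here).
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group())              # is is
```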
Other extension notations are listed below. Of these, only (?P<name>) creates a group for matches; none of the others do.
- '(?iLmsux)', embed one or more special flags (for example i for IGNORECASE, L for LOCALE, m for MULTILINE) within the regex itself.
- '(?:…)', Non-capturing version of regular parentheses. The substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
- '(?P<name>)', the substring matched by the group is accessible via the symbolic group name name.
- '(?P=name)', A backreference to a named group; it matches whatever text was matched by the earlier group named name.
- '(?#…)', A comment; the contents of the parentheses are simply ignored.
- '(?=…)', Positive lookahead assertion.
Example, Isaac(?=Asimov) will match Isaac only if it is followed by Asimov.
- '(?!…)', Negative lookahead assertion.
Example, Isaac(?!Asimov) will match Isaac only if it is not followed by Asimov.
- '(?<=…)', Positive lookbehind assertion.
Example, (?<=abc)def will find a match in abcdef, since it will look back 3 characters and check whether the contained pattern matches.
- '(?<!…)', Negative lookbehind assertion.
- '(?(id/name)yes-pattern|no-pattern)', match with yes-pattern if the group with the given id or name exists, and with no-pattern if it doesn't. no-pattern is optional.
Example, ^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) will match <user@host.com.cn> and user@host.com.cn, but neither <user@host.com.cn nor user@host.com.cn>.
Most of these special characters are available in Ruby as well, although some notations differ (for example, Ruby writes a named group as (?<name>…)).
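The extension notations listed above can be checked in Python with a short sketch like the following. The test strings are the ones from the examples, plus a made-up doubled-word sentence for the named backreference.

```python
import re

# Positive and negative lookahead.
print(bool(re.search(r"Isaac(?=Asimov)", "IsaacAsimov")))  # True
print(bool(re.search(r"Isaac(?!Asimov)", "IsaacAsimov")))  # False

# Positive lookbehind: only "def" is part of the match.
print(re.search(r"(?<=abc)def", "abcdef").group())         # def

# A non-capturing group does not show up in groups().
print(re.match(r"(?:ab)+(c)", "ababc").groups())           # ('c',)

# Named group plus a backreference to it.
named = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world")
print(named.group("word"))                                 # hello

# Inline flag instead of passing re.IGNORECASE.
print(re.findall(r"(?i)python", "Python PYTHON"))          # ['Python', 'PYTHON']

# Conditional pattern: the closing > is required only if < was matched.
email = re.compile(r"^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
checks = {
    s: bool(email.match(s))
    for s in ("<user@host.com.cn>", "user@host.com.cn",
              "<user@host.com.cn", "user@host.com.cn>")
}
print(checks)
```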
5.2 re Module in Python
- re module function only
- compile(pattern, flags=0)
compile regex pattern with optional flags, and return a regex object.
- re module functions and regex object methods
- match(pattern, string, flags=0)
try to match pattern to string with optional flags; return match object on success, None on failure.
- search(pattern, string, flags=0)
search for first occurrence of pattern within string with optional flags; return match object on success, None on failure.
- findall(pattern, string, flags=0)
look up all occurrences of pattern in string; return a list of matches
- finditer(pattern, string, flags=0)
same as findall except that it returns an iterator instead of a list; for each match, the iterator yields a match object.
- split(pattern, string, maxsplit=0)
split string into a list at each occurrence of pattern and return the list of pieces, splitting at most maxsplit times (0 means no limit).
- sub(pattern, repl, string, count=0)
replace occurrences of pattern in string with repl; all occurrences are replaced unless count is provided.
- common match object methods
- group(num=0)
return entire match, or specific subgroup num
- groups(default=None)
return all matching subgroups in a tuple (empty if there aren't any)
- groupdict(default=None)
return dict containing all matching named subgroups with the names as the keys(empty if there aren't any)
- common module attributes
- re.I, re.IGNORECASE
Case-insensitive matching
- re.S, re.DOTALL
. should match all characters including \n
- re.L, re.LOCALE
Matching via \w, \W, \b, \B, \s, \S depends on the locale
- re.M, re.MULTILINE
cause ^ and $ to match the beginning and end of each line in the target string rather than strictly the beginning and end of the entire string.
- re.X, re.VERBOSE
All whitespace plus #(and all text after it on a single line) are ignored unless in a character class or backslash-escaped.
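Before the longer example, here is a brief sketch of split(), sub() and three of the flags listed above, using made-up input strings.

```python
import re

text = "alpha, beta;gamma"
print(re.split(r"[,;]\s*", text))               # ['alpha', 'beta', 'gamma']
print(re.split(r"[,;]\s*", text, maxsplit=1))   # ['alpha', 'beta;gamma']

print(re.sub(r"\d+", "#", "a1b22c333"))           # a#b#c#
print(re.sub(r"\d+", "#", "a1b22c333", count=2))  # a#b#c333

lines = "first\nsecond"
print(re.findall(r"^\w+", lines))               # ['first']
print(re.findall(r"^\w+", lines, re.M))         # ['first', 'second']

print(re.search(r"t.s", lines))                 # None, . does not match \n
print(re.search(r"t.s", lines, re.S) is not None)   # True

# re.X allows whitespace and comments inside the pattern.
date = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
""", re.X)
print(date.match("2024-05").groupdict())        # {'year': '2024', 'month': '05'}
```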
Example
import re

astring = " howard@google.com.cn Gorman@baidu.net"
pattern = r"(?P<Name>\w+)@(?P<Company>\w+)(?:\.\w+)+"
flags = re.IGNORECASE

print("re.findall():", re.findall(pattern, astring, flags))

mt = re.search(pattern, astring, flags)
s = "\nre.search():%s\n"%mt.__class__
s += "\tmt.groups():%s, mt.group(1):%s"%(mt.groups(), mt.group(1))
print(s)

# The difference between re.search() and re.match():
# re.search() tries to match anywhere in astring,
# re.match() only matches at the start of astring.
print("\nre.match():", re.match(pattern, astring, flags))

m1 = re.finditer(pattern, astring, flags)
print("\nre.finditer():", m1)
for i, m in enumerate(m1):
    s1 = "\tm.__class__:%s \n"%m.__class__
    # i is used as a group number here: group 0 is the whole match, group 1 is Name
    s1 += "\t\tm.start({0}): {1}, m.end({0}): {2}\n".format(i, m.start(i), m.end(i))
    s1 += "\t\tm.groupdict():%s\n"%m.groupdict()
    s1 += "\t\tm.groups():{0}, \n".format(m.groups())  # "%s"%tuple does not work
    s1 += "\t\tm.lastindex:%d\n"%m.lastindex  # the number of the highest capturing group
    s1 += "\t\tm.group(Name):%s, m.group(Company):%s\n"%(m.group("Name"), m.group("Company"))
    s1 += m.expand(r"\t\t\g<Name> work for \g<Company>")
    print(s1)

rx = re.compile(pattern, flags)
s2 = "\nrx=re.compile():%s\n"%rx.__class__
s2 += "\trx.pattern:%s"%rx.pattern
s2 += ", rx.flags:%s"%rx.flags
s2 += ", rx.groupindex:%s\n"%rx.groupindex
# The match has to start from index 1 of astring (index 0 is a space),
# otherwise rx.match() returns None.
s2 += "\trx.match(astring, 1):%s"%rx.match(astring, 1).__class__
print(s2)

# Both re and rx provide findall(), finditer(), match() and search(),
# and each function means the same thing in re and in rx.
# In re, the parameters are pattern, astring and flags, as shown above.
# In rx, the parameters are astring, the start index and the end index in astring.
output:
re.findall(): [('howard', 'google'), ('Gorman', 'baidu')]

re.search():<class '_sre.SRE_Match'>
    mt.groups():('howard', 'google'), mt.group(1):howard

re.match(): None

re.finditer(): <callable_iterator object at 0x00000000027E6E48>
    m.__class__:<class '_sre.SRE_Match'>
        m.start(0): 1, m.end(0): 21
        m.groupdict():{'Name': 'howard', 'Company': 'google'}
        m.groups():('howard', 'google'),
        m.lastindex:2
        m.group(Name):howard, m.group(Company):google
        howard work for google
    m.__class__:<class '_sre.SRE_Match'>
        m.start(1): 22, m.end(1): 28
        m.groupdict():{'Name': 'Gorman', 'Company': 'baidu'}
        m.groups():('Gorman', 'baidu'),
        m.lastindex:2
        m.group(Name):Gorman, m.group(Company):baidu
        Gorman work for baidu

rx=re.compile():<class '_sre.SRE_Pattern'>
    rx.pattern:(?P<Name>\w+)@(?P<Company>\w+)(?:\.\w+)+, rx.flags:34, rx.groupindex:{'Name': 1, 'Company': 2}
    rx.match(astring, 1):<class '_sre.SRE_Match'>
6 Networking
The primary module is the socket module.
import socket

tcpSock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # TCP socket
udpSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # UDP socket
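To show a TCP socket in use, here is a minimal echo sketch that runs entirely on the local machine. The helper name echo_once is made up, and binding to port 0 (letting the OS pick a free port) is a choice made for this example.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever it receives."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


# Server side: bind to the loopback interface on an OS-chosen port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# Client side: connect, send a message and read the echo.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((host, port))
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)   # b'hello'
```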
7 Multithread
The threading and queue modules are used for multithreaded programming with Python. With the queue module, users can create a queue data structure that can be shared across multiple threads.
When the main thread finishes, threading lets the important child threads finish first before the program exits. If you do not care whether a child thread has finished when the main thread exits, set childthread.daemon = True. A thread inherits the daemon flag from the thread that creates it.
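The daemon behaviour described above can be observed with a small sketch; the function names background and important are invented for illustration.

```python
import threading
import time

ready = threading.Event()
inherited = []


def background():
    # A thread created here inherits the daemon flag of its creator.
    child = threading.Thread(target=time.sleep, args=(0,))
    inherited.append(child.daemon)
    ready.set()
    while True:               # never finishes on its own
        time.sleep(0.1)


def important():
    time.sleep(0.2)


d = threading.Thread(target=background, daemon=True)
w = threading.Thread(target=important)   # daemon flag taken from the creating thread
d.start()
w.start()

ready.wait()
w.join()                      # wait only for the thread we care about

print(d.daemon)               # True
print(inherited)              # [True]
print(d.is_alive())           # True, but it will not block interpreter exit
```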
import threading
from time import sleep, ctime
from queue import Queue
from random import randint

class MyThread(threading.Thread):
    def __init__(self, func, args, name = ''):
        super().__init__()
        self.name = name
        self.func = func
        self.args = args

    def getResult(self):
        return self.res

    # override run() defined in threading.Thread, called by start() of threading.Thread
    def run(self):
        print("starting %s at: %s"%(self.name, ctime()))
        self.res = self.func(*self.args)
        print("%s finished at: %s"%(self.name, ctime()))

def writeQ(queue):
    print("producing object for Q ...")
    queue.put('xxx', 1)
    print('size now ', queue.qsize())

def readQ(queue):
    val = queue.get(1)
    print('consumed object from Q ..., size now ', queue.qsize())

def writer(queue, loops):
    for i in range(loops):
        writeQ(queue)
        sleep(randint(1, 3))

def reader(queue, loops):
    for i in range(loops):
        readQ(queue)
        sleep(randint(2, 5))

funcs = [writer, reader]
nfuncs = range(len(funcs))

def main():
    nloops = randint(2, 5)
    q = Queue(32)
    threads = []
    for i in nfuncs:
        t = MyThread(funcs[i], (q, nloops), funcs[i].__name__)
        #t = threading.Thread(target=funcs[i], args=(q, nloops)) # no default kwargs parameter here
        threads.append(t)
    for i in nfuncs:
        threads[i].start()
    for i in nfuncs:
        threads[i].join()

if __name__ == '__main__':
    main()
threading.Thread
- __init__(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)
target can be any callable object; args and kwargs are passed to target as its arguments; group is unimplemented.
- start() begin thread execution
- run()
defines the thread's functionality; called by start() and usually overridden in a subclass.
- join(timeout=None)
block the calling thread until this thread terminates or the optional timeout elapses
- isDaemon() return True if the thread is daemonic (newer code reads the daemon attribute instead)
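A short sketch of using threading.Thread directly with target, args and kwargs, and of join() with a timeout. The function work and the label strings are made up for the example.

```python
import threading
import time

results = {}


def work(label, delay=0.0):
    time.sleep(delay)
    results[label] = threading.current_thread().name


t = threading.Thread(target=work, name="worker-1",
                     args=("job",), kwargs={"delay": 0.3})
t.start()

t.join(timeout=0.05)            # returns after the timeout; thread still running
still_running = t.is_alive()

t.join()                        # now wait until it really finishes
print(still_running, t.is_alive(), results)
# True False {'job': 'worker-1'}
```

Because join() returns None whether or not the timeout expired, is_alive() is the way to tell the two cases apart.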