Python Study Notes, Day 9 (code snippets)

author author     2022-11-29     140


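Processes and Threads

What is a thread?

A thread is the smallest unit of execution that the operating system can schedule. It lives inside a process and is the unit that actually does the work within that process. A thread is a single sequential flow of control within a process; one process can run multiple threads concurrently, each carrying out a different task.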

A thread is an execution context, which is all the information a CPU needs to execute a stream of instructions.

Suppose you're reading a book, and you want to take a break right now, but you want to be able to come back and resume reading from the exact point where you stopped. One way to achieve that is by jotting down the page number, line number, and word number. So your execution context for reading a book is these 3 numbers.

If you have a roommate, and she's using the same technique, she can take the book while you're not using it, and resume reading from where she stopped. Then you can take it back, and resume it from where you were.

Threads work in the same way. A CPU is giving you the illusion that it's doing multiple computations at the same time. It does that by spending a bit of time on each computation. It can do that because it has an execution context for each computation. Just like you can share a book with your friend, many tasks can share a CPU.

On a more technical level, an execution context (therefore a thread) consists of the values of the CPU's registers.

Last: threads are different from processes. A thread is a context of execution, while a process is a bunch of resources associated with a computation. A process can have one or many threads.

Clarification: the resources associated with a process include memory pages (all the threads in a process have the same view of the memory), file descriptors (e.g., open sockets), and security credentials (e.g., the ID of the user who started the process).

What is a process?

An executing instance of a program is called a process.

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.

Differences between processes and threads

  1. Threads share the address space of the process that created them; processes have their own address space.
  2. Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
  3. Threads can directly communicate with other threads of their process; processes must use interprocess communication to communicate with sibling processes.
  4. New threads are easily created; new processes require duplication of the parent process.
  5. Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
  6. Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
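The first two points can be seen directly in Python. The following is only a minimal sketch (the names `shared` and `worker` are made up for illustration): several threads append to one list that lives in the memory of their common process, with no copying and no interprocess communication.

```python
import threading

shared = []  # lives in the process's memory, visible to every thread


def worker(tag):
    # Each thread writes straight into the same list object.
    shared.append(tag)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = sorted(shared)  # sorted because the start order is not guaranteed
print(result)  # [0, 1, 2, 3]
```

Separate processes would each see their own copy of `shared`, which is why they need a pipe, socket, or queue to exchange data.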

Python GIL(Global Interpreter Lock)

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

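The gist of the above: no matter how many threads you start and no matter how many CPUs you have, CPython calmly lets only one thread execute Python bytecode at any given moment. Can that even be called multithreading? Don't jump to conclusions just yet; hear the explanation out first.

First, be clear that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the interpreter, CPython. Think of C++: it is a language (syntax) standard that can be compiled to executable code by different compilers, such as GCC, Intel C++, and Visual C++. Python is the same: the same code can run on different Python implementations such as CPython, PyPy, and Jython. Jython, for example, has no GIL. However, because CPython is the default Python implementation in most environments, many people equate CPython with Python and assume the GIL is a defect of the language itself. So to repeat: the GIL is not a feature of the Python language, and Python does not have to depend on it.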
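To check which implementation a script is actually running on, a small sketch using the standard `sys` module may help. It assumes Python 3.3 or later; the variable names are illustrative only.

```python
import sys

# Name of the running implementation, for example 'cpython' or 'pypy'.
impl = sys.implementation.name

# On CPython 3.2+ this is how long (in seconds) a thread may keep running
# before the interpreter asks it to give other threads a turn.
interval = sys.getswitchinterval()

print("implementation:", impl)
print("thread switch interval: %.3f s" % interval)
```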

This presentation thoroughly dissects the GIL's effect on Python multithreading and is highly recommended: http://www.dabeaz.com/python/UnderstandingGIL.pdf

 

The Python threading module

There are two ways to use a thread, as follows:

Direct invocation

import threading
import time


def sayhi(num):  # the function each thread will run
    print("running on number:%s" % num)
    time.sleep(3)


if __name__ == '__main__':
    t1 = threading.Thread(target=sayhi, args=(1,))  # create a thread instance
    t2 = threading.Thread(target=sayhi, args=(2,))  # create another thread instance

    t1.start()  # start the thread
    t2.start()  # start the other thread

    print(t1.name)  # get the thread name (getName() is deprecated)
    print(t2.name)

Invocation by subclassing

import threading
import time


class MyThread(threading.Thread):
    def __init__(self, num):
        super().__init__()
        self.num = num

    def run(self):  # the code each thread will run
        print("running on number:%s" % self.num)
        time.sleep(3)


if __name__ == '__main__':
    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()

Join & Daemon

Some threads do background tasks, like sending keepalive packets, or performing periodic garbage collection, or whatever. These are only useful when the main program is running, and it's okay to kill them off once the other, non-daemon, threads have exited.

Without daemon threads, you'd have to keep track of them, and tell them to exit, before your program can completely quit. By setting them as daemon threads, you can let them run and forget about them, and when your program quits, any daemon threads are killed automatically.

# -*- coding: utf-8 -*-
__author__ = 'Alex Li'

import time
import threading


def run(n):
    print('[%s]------running----\n' % n)
    time.sleep(2)
    print('--done--')


def main():
    for i in range(5):
        t = threading.Thread(target=run, args=[i, ])
        t.start()
        t.join(1)  # wait at most 1 second for this worker
        print('starting thread', t.name)


m = threading.Thread(target=main, args=[])
# Mark m as a daemon thread. When the program's main thread exits, m is
# stopped too, and so are the threads m started (they inherit the daemon
# flag), whether or not they have finished their work.
m.daemon = True  # replaces the deprecated m.setDaemon(True)
m.start()
m.join(timeout=2)
print("---main thread done----")

  

Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
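The graceful alternative mentioned in the note could look roughly like this. It is a sketch, not part of the original notes: `stop_requested`, `keepalive`, and `ticks` are hypothetical names, and the timings are arbitrary.

```python
import threading
import time

stop_requested = threading.Event()
ticks = []


def keepalive():
    # Event.wait() doubles as an interruptible sleep: it returns True as
    # soon as the flag is set, and False when the timeout expires.
    while not stop_requested.wait(timeout=0.01):
        ticks.append("tick")
    # Reached only on a clean stop: the place to close files, commit
    # transactions, and so on.
    ticks.append("cleanup done")


worker = threading.Thread(target=keepalive)  # a normal, non-daemon thread
worker.start()

time.sleep(0.05)      # the main program does its own work here
stop_requested.set()  # ask the worker to finish
worker.join()         # and wait until it has cleaned up

stopped_cleanly = not worker.is_alive()
print(stopped_cleanly, ticks[-1])
```

Because the worker is not a daemon, the interpreter will not exit until it has run its cleanup code.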

  

 

 

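Thread locks (mutex)

A process can start multiple threads, and those threads share the memory space of their parent process, which means every thread can access the same data. So what happens if two threads try to modify the same data at the same time?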

import time
import threading


def addNum():
    global num  # every thread works on this global variable
    print('--get num:', num)
    time.sleep(1)
    num -= 1  # decrement the shared variable


num = 100  # the shared variable
thread_list = []
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list:  # wait for all threads to finish
    t.join()

print('final num:', num)

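Normally the final value of num should be 0, but if you run this several times on Python 2.7 you will find that the printed result is not always 0. Why do the results differ from run to run? Suppose threads A and B both want to decrement num. Because the two threads run concurrently, both may read the initial value num=100 and hand it to the CPU. Thread A computes 99, and thread B also computes 99; when both write their result back to num, it ends up as 99 instead of 98. The fix is simple: before a thread modifies shared data, it takes a lock on that data so that nobody else can modify it halfway through. Other threads that want to modify the data must wait until the lock has been released.

*Note: on Python 3.x this example almost always prints the correct result. That is not because a lock is added automatically. Since Python 3.2 the interpreter switches threads on a time interval (5 ms by default) rather than after a fixed number of bytecode instructions, so the race rarely shows up here. The decrement is still not guaranteed to be atomic, so the lock is still needed.

Version with a lock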

import time
import threading


def addNum():
    global num  # every thread works on this global variable
    print('--get num:', num)
    time.sleep(1)
    with lock:  # take the lock before modifying; it is released on leaving the block
        num -= 1  # decrement the shared variable


num = 100  # the shared variable
thread_list = []
lock = threading.Lock()  # create one global lock
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list:  # wait for all threads to finish
    t.join()

print('final num:', num)

 

GIL VS Lock 

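Sharp readers may ask: you said earlier that Python already has the GIL to guarantee that only one thread executes at a time, so why is a lock still needed here? Note that the lock here is a user-level lock and has nothing to do with the GIL. The GIL only guarantees that one thread executes bytecode at any instant; a thread can still be switched out between reading num and writing the new value back, so the whole read-modify-write sequence needs its own lock. The figure below, together with the in-class explanation, makes this clear.

[Figure: GIL vs. user-level lock]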
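The lost update can be reproduced deterministically on any Python version by forcing the unlucky interleaving. The sketch below is an illustration, not the original lecture code: a `threading.Barrier` stands in for a thread switch that happens between the read and the write, and all names are made up.

```python
import threading

N = 8
barrier = threading.Barrier(N)
lock = threading.Lock()
unsafe_num = N
safe_num = N


def unsafe_decrement():
    global unsafe_num
    tmp = unsafe_num      # step 1: read the shared value
    barrier.wait()        # every thread reads before any thread writes
    unsafe_num = tmp - 1  # step 2: write back a result based on a stale read


def safe_decrement():
    global safe_num
    with lock:            # read and write happen as one indivisible unit
        tmp = safe_num
        safe_num = tmp - 1


def run_all(target):
    threads = [threading.Thread(target=target) for _ in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_all(unsafe_decrement)
run_all(safe_decrement)
print(unsafe_num, safe_num)  # 7 0
```

All eight unlocked threads read 8 and write 7, so seven decrements are lost even though the GIL never let two of them run at the same instant.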

 

  

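RLock (reentrant lock)

Put simply, it is a big lock that can contain nested locks: the thread that already holds an RLock can acquire it again, so code holding the outer lock can call functions that take the same lock.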

import threading


def run1():
    global num
    print("grab the first part data")
    with lock:  # acquired again by the thread that already holds it in run3()
        num += 1
    return num


def run2():
    global num2
    print("grab the second part data")
    with lock:
        num2 += 1
    return num2


def run3():
    with lock:  # outer acquisition; run1() and run2() take the same RLock again
        res = run1()
        print('--------between run1 and run2-----')
        res2 = run2()
    print(res, res2)


if __name__ == '__main__':
    num, num2 = 0, 0
    lock = threading.RLock()
    for i in range(10):
        t = threading.Thread(target=run3)
        t.start()

    while threading.active_count() != 1:  # poll until only the main thread is left
        print(threading.active_count())
    else:
        print('----all threads done---')
        print(num, num2)
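What would happen with a plain Lock in the example above can be checked without deadlocking by asking for the lock in non-blocking mode. This is a minimal sketch with illustrative variable names.

```python
import threading

plain = threading.Lock()
plain.acquire()
# A second acquire by the same thread cannot succeed; with blocking=True
# it would hang forever, so ask without blocking.
plain_again = plain.acquire(blocking=False)
plain.release()

reentrant = threading.RLock()
reentrant.acquire()
# The owning thread may take an RLock again; each acquire needs a release.
reentrant_again = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()

print(plain_again, reentrant_again)  # False True
```

So run3() calling run1() while holding a plain Lock would block on itself, which is exactly the case RLock is designed for.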

  

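Semaphore

A mutex lets only one thread modify data at a time, whereas a Semaphore lets a fixed number of threads do so at once. For example, if a restroom has 3 stalls, at most 3 people can use it at the same time; everyone else has to wait until someone comes out.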

import threading
import time


def run(n):
    with semaphore:  # blocks while 5 threads are already inside
        time.sleep(1)
        print("run the thread: %s\n" % n)


if __name__ == '__main__':
    num = 0
    semaphore = threading.BoundedSemaphore(5)  # allow at most 5 threads to run at once
    for i in range(20):
        t = threading.Thread(target=run, args=(i,))
        t.start()

    while threading.active_count() != 1:
        pass  # print(threading.active_count())
    else:
        print('----all threads done---')
        print(num)
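To confirm that the limit really holds, one can count how many threads are inside the guarded section at once. The sketch below reuses the 3-stall analogy; `LIMIT`, `visit`, `inside`, and `peak` are hypothetical names, and the sleep time is arbitrary.

```python
import threading
import time

LIMIT = 3
stalls = threading.BoundedSemaphore(LIMIT)
counter_lock = threading.Lock()  # protects the two counters below
inside = 0
peak = 0


def visit():
    global inside, peak
    with stalls:  # at most LIMIT threads get past this line at once
        with counter_lock:
            inside += 1
            peak = max(peak, inside)
        time.sleep(0.02)  # occupy the stall for a moment
        with counter_lock:
            inside -= 1


threads = [threading.Thread(target=visit) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)
```

However the ten threads are scheduled, `peak` never exceeds `LIMIT`.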

 

Events

An event is a simple synchronization object; the event represents an internal flag, and threads can wait for the flag to be set, or set or clear the flag themselves.

event = threading.Event()

# a client thread can wait for the flag to be set
event.wait()

# a server thread can set or reset it
event.set()
event.clear()

If the flag is set, the wait method doesn't do anything.
If the flag is cleared, wait will block until it becomes set again.
Any number of threads may wait for the same event.
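These rules can be tried out in a few lines. The following is a sketch with made-up names (`waiter`, `results`); it relies on `wait()` returning False when its timeout expires and True once the flag is set.

```python
import threading

event = threading.Event()

# Flag is clear: wait() blocks, here only until the short timeout expires.
before = event.wait(timeout=0.01)

results = []


def waiter(name):
    event.wait()  # blocks until some other thread calls event.set()
    results.append(name)


threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

event.set()  # releases every waiting thread at once
for t in threads:
    t.join()

after = event.wait(timeout=0.01)  # flag is set: returns immediately
print(before, after, sorted(results))  # False True [0, 1, 2]
```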

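Events let two or more threads interact. Below is a traffic-light example: one thread acts as the traffic light and several threads act as cars, and the cars follow the rule of stopping on red and going on green.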

import threading
import time
import random


def light():
    if not event.is_set():
        event.set()  # green light: wait() will not block
    count = 0
    while True:
        if count < 10:
            print('\033[42;1m--green light on---\033[0m')
        elif count < 13:
            print('\033[43;1m--yellow light on---\033[0m')
        elif count < 20:
            if event.is_set():
                event.clear()
            print('\033[41;1m--red light on---\033[0m')
        else:
            count = 0
            event.set()  # turn the green light back on
        time.sleep(1)
        count += 1


def car(n):
    while True:
        time.sleep(random.randrange(10))
        if event.is_set():  # green light
            print("car [%s] is running.." % n)
        else:
            print("car [%s] is waiting for the red light.." % n)


if __name__ == '__main__':
    event = threading.Event()
    Light = threading.Thread(target=light)
    Light.start()
    for i in range(3):
        t = threading.Thread(target=car, args=(i,))
        t.start()

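Here is another example of using an Event. Employees have to swipe a card to enter the company door. One thread plays the door and several threads play employees: an employee who finds the door closed swipes a card, the door opens, and the employee can walk through.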
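One possible way to model this scenario is sketched below. It is not the author's original code, which is not reproduced here; the two events, the function names, and the employee names are all assumptions made for illustration.

```python
import threading

card_swiped = threading.Event()  # set by an employee who swipes a card
door_open = threading.Event()    # set by the door thread
passed = []


def door():
    card_swiped.wait()  # the door stays closed until a card is swiped
    door_open.set()


def employee(name):
    if not door_open.is_set():
        card_swiped.set()  # door is closed, so swipe the card
    door_open.wait()       # wait for the door to open
    passed.append(name)    # walk through


threads = [threading.Thread(target=door)]
threads += [threading.Thread(target=employee, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(passed))  # ['alice', 'bob']
```

An employee who arrives after the door is already open skips the swipe and passes straight through.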


 

  

  

The queue module

queue is especially useful in threaded programming when information must be exchanged safely between multiple threads.

class queue.Queue(maxsize=0) # first in, first out
class queue.LifoQueue(maxsize=0) # last in, first out
class queue.PriorityQueue(maxsize=0) # queue whose entries can be stored with a priority

Constructor for a priority queue. maxsize is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite.

The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form: (priority_number, data).
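The three queue classes differ only in retrieval order, which a short sketch makes visible. The (priority_number, data) tuples and the helper `drain` are illustrative.

```python
import queue

fifo = queue.Queue()
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()

# Put the same (priority_number, data) tuples into each queue.
for entry in [(2, "b"), (1, "a"), (3, "c")]:
    fifo.put(entry)
    lifo.put(entry)
    prio.put(entry)


def drain(q):
    """Take everything out of q and return just the data part, in order."""
    out = []
    while not q.empty():
        out.append(q.get()[1])
    return out


fifo_order = drain(fifo)  # insertion order
lifo_order = drain(lifo)  # reverse insertion order
prio_order = drain(prio)  # lowest priority_number first

print(fifo_order)  # ['b', 'a', 'c']
print(lifo_order)  # ['c', 'a', 'b']
print(prio_order)  # ['a', 'b', 'c']
```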

exception queue.Empty

Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.

exception queue.Full

Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.

Queue.qsize()
Queue.empty() #return True if empty  
Queue.full() # return True if full 
Queue.put(item, block=True, timeout=None)

Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Queue.put_nowait(item)

Equivalent to put(item, False).

Queue.get(block=True, timeout=None)

Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Queue.get_nowait()

Equivalent to get(False).
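The blocking and non-blocking variants of put() and get() can be exercised on a queue with room for a single item. This is a sketch; the result variables exist only so the outcome can be printed.

```python
import queue

q = queue.Queue(maxsize=1)
q.put_nowait("first")  # fits: the queue now holds its one allowed item

try:
    q.put_nowait("second")  # no free slot, so this raises immediately
    put_result = "stored"
except queue.Full:
    put_result = "Full"

try:
    q.put("second", timeout=0.01)  # waits up to 10 ms, then gives up
    timed_put = "stored"
except queue.Full:
    timed_put = "Full"

item = q.get_nowait()  # 'first'

try:
    q.get_nowait()  # nothing left, so this raises immediately
    get_result = "got item"
except queue.Empty:
    get_result = "Empty"

print(put_result, timed_put, item, get_result)  # Full Full first Empty
```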

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Queue.task_done()

Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.

If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).

Raises a ValueError if called more times than there were items placed in the queue.

Queue.join()

Blocks until every item in the queue has been retrieved and processed.
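How task_done() and join() work together with a daemon consumer thread can be sketched as follows. The names `tasks`, `done`, and `worker` are illustrative, and squaring a number stands in for real work.

```python
import queue
import threading

tasks = queue.Queue()
done = []


def worker():
    while True:
        n = tasks.get()    # blocks until a task is available
        done.append(n * n)
        tasks.task_done()  # one task_done() for each get()


# A daemon consumer: it will not keep the program alive once the work is done.
threading.Thread(target=worker, daemon=True).start()

for n in range(5):
    tasks.put(n)

tasks.join()  # returns only after task_done() was called for all 5 items
print(done)   # [0, 1, 4, 9, 16]

try:
    tasks.task_done()  # more calls than items were put
    extra_call = "accepted"
except ValueError:
    extra_call = "ValueError"
print(extra_call)  # ValueError
```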

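The producer-consumer model

Using the producer-consumer pattern in concurrent programming solves many common concurrency problems. The pattern raises a program's overall data-processing throughput by balancing the working capacity of producer threads and consumer threads.

Why use the producer-consumer pattern

In the world of threads, a producer is a thread that produces data and a consumer is a thread that consumes data. In multithreaded development, if the producer is fast and the consumer is slow, the producer has to wait for the consumer to finish before it can produce more data. By the same token, if the consumer is faster than the producer, the consumer has to wait for the producer. The producer-consumer pattern was introduced to solve this problem.

What is the producer-consumer pattern

The producer-consumer pattern uses a container to remove the tight coupling between producer and consumer. The two do not talk to each other directly; they communicate through a blocking queue. After producing data, the producer does not wait for the consumer to process it but simply drops it into the blocking queue; the consumer does not ask the producer for data but takes it straight from the queue. The blocking queue acts as a buffer that balances the processing capacity of producers and consumers.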

 

Let's look at a most basic producer-consumer example.

import threading
import queue


def producer():
    for i in range(10):
        q.put("bone %s" % i)

    print("waiting for all the bones to be taken...")
    q.join()
    print("all the bones have been taken...")


def consumer(n):
    while True:
        try:
            # Wait briefly instead of checking q.qsize(), which can read 0
            # before the producer has put anything.
            bone = q.get(timeout=1)
        except queue.Empty:
            break
        print("%s got" % n, bone)
        q.task_done()  # tell the queue this task is finished


q = queue.Queue()

p = threading.Thread(target=producer)
p.start()

consumer("陈荣华")

 

 

import time
import random
import queue
import threading

q = queue.Queue()


def Producer(name):
    count = 0
    while count < 20:
        time.sleep(random.randrange(3))
        q.put(count)
        print('Producer %s has produced %s baozi..' % (name, count))
        count += 1


def Consumer(name):
    count = 0
    while count < 20:
        time.sleep(random.randrange(4))
        if not q.empty():
            data = q.get()
            print(data)
            print('\033[32;1mConsumer %s has eaten %s baozi...\033[0m' % (name, data))
        else:
            print("-----no baozi anymore----")
        count += 1


p1 = threading.Thread(target=Producer, args=('A',))
c1 = threading.Thread(target=Consumer, args=('B',))
p1.start()
c1.start()
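The buffering role of the blocking queue is easier to see with a bounded queue. The variant below is a sketch, not one of the original examples: the queue holds at most 3 items, so a fast producer is paused by put() until the consumer catches up, and a sentinel object (here called `DONE`, a made-up name) tells the consumer when to stop.

```python
import queue
import threading

buffer = queue.Queue(maxsize=3)  # the producer can run at most 3 items ahead
DONE = object()                  # sentinel marking the end of production
eaten = []


def producer():
    for i in range(10):
        buffer.put(i)  # blocks while the buffer is full
    buffer.put(DONE)


def consumer():
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is DONE:
            break
        eaten.append(item)


p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()

print(eaten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Neither side polls or sleeps; the queue itself makes whichever thread is ahead wait for the other.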

  

 
