[Python] The Problems with the Python GIL

2014-12-05 ephrain

I recently had to present at a study group, so I read up a bit on the Python GIL.

If you're interested, start with these references:

– python wiki: GlobalInterpreterLock

– Inside the python GIL (pdf), video

– Understanding the python GIL (pdf), video

– Interactive GIL Visualization

A quick summary:

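What is the GIL?

GIL stands for Global Interpreter Lock.

It is a lock in the CPython implementation: a thread must hold the GIL before it can execute Python bytecode,

but it lets go of the GIL while it is doing I/O.

In addition, each thread releases the GIL and reacquires it after running a default of 100 ticks,

so that other threads get a chance to take it.

(The tick-based check is the Python 2 behavior; CPython 3.2 and later use a time-based switch interval instead, 5 ms by default.)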
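
To look at this interval on your own machine, here is a minimal sketch. It assumes CPython 3.2 or later, where the setting is exposed as a switch interval in seconds (on Python 2 the closest equivalent is sys.getcheckinterval()). The helper name with_switch_interval is made up for this example.

```python
import sys


def with_switch_interval(seconds, func):
    """Run func() under a temporary switch interval, then restore the old one."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(old)


# How long a thread may run before it is asked to give up the GIL.
print("current switch interval: %.1f ms" % (sys.getswitchinterval() * 1000))

# Temporarily use a 10 ms interval and read it back from inside the helper.
inside = with_switch_interval(0.01, sys.getswitchinterval)
print("inside the helper:       %.1f ms" % (inside * 1000))
```

A longer interval means fewer forced hand-offs but slower reaction from waiting threads; whether changing it helps depends entirely on the workload.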

Why does the GIL exist?

Because some of CPython's internal data structures are not thread-safe,

so a single global lock is used to protect them.
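
One well-known example of such internal state is the per-object reference count (my addition for illustration; the notes above don't name a specific structure). Every new reference is a read-modify-write on that counter, which is the kind of update that would need protection if threads ran bytecode in parallel. A rough sketch, assuming CPython; count_extra_references is a hypothetical helper name:

```python
import sys


def count_extra_references(obj, holders):
    """Return how much obj's reference count grows when `holders` lists point at it."""
    before = sys.getrefcount(obj)
    keep = [[obj] for _ in range(holders)]  # each inner list holds one reference
    after = sys.getrefcount(obj)
    del keep  # drop the extra references again
    return after - before


marker = object()
print("extra references:", count_extra_references(marker, 3))
```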

Does the GIL affect other Python interpreters?

The GIL described here is part of CPython's implementation. Jython and IronPython have no GIL; PyPy, however, does have its own GIL.
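
Since the answer depends on which interpreter is running your code, a script can check at runtime. A small sketch using the standard platform module; interpreter_summary is just a name picked for this example:

```python
import platform
import sys


def interpreter_summary():
    """Return (implementation name, 'major.minor' version) of the running interpreter."""
    # Typical values are 'CPython', 'PyPy', 'Jython' and 'IronPython'.
    impl = platform.python_implementation()
    version = "%d.%d" % sys.version_info[:2]
    return impl, version


impl, version = interpreter_summary()
print("running on %s %s" % (impl, version))
```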

What problems does the GIL cause?

This is a bit hard to answer, because it causes quite a few 😛

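1. The most obvious one: a multi-threaded program cannot take full advantage of multiple CPUs.

The expectation is that each CPU runs a different thread at the same time, so the overall run time shrinks.

But every thread must hold the GIL to execute Python bytecode,

so in practice only one thread executes Python bytecode at any given moment,

and the extra CPUs do nothing to speed up bytecode execution…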

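2. Even a single-threaded program on a single CPU pays a price.

Every 100 ticks the GIL is released and reacquired,

and although no other thread is competing for it, the release/acquire work is pure overhead.

(Running a single-threaded program on multiple CPUs is similar.)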

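3. On a single CPU, a multi-threaded program gets less efficient as the number of threads grows.

While one thread is holding the GIL, the OS may wake up a different thread to run,

but that thread has no GIL, so all it can do is wait,

and the thread that holds the GIL cannot continue until the OS schedules it again.

So the more threads there are, the longer the time spent waiting for the GIL and waiting to be scheduled by the OS.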

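4. On multiple CPUs, a multi-threaded program has it even worse.

Each CPU may pick a thread to run, but those threads cannot get the GIL and can only wait, wasting CPU resources.

(The wasted CPU time could otherwise have gone to other programs.)

Even when the thread holding the GIL releases it at the 100-tick mark, several threads race for it at once

(truly at once, because several CPUs are running the contenders simultaneously).

Releasing the GIL sends a signal to the other threads, but they need time to wake up,

so the thread that just released the GIL will very likely grab it again,

which stretches the response time of the other threads.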

5. Signals can only be handled by the main thread,

so if the main thread keeps failing to get the GIL, signal handling is delayed…

This is why a multi-threaded Python program often won't stop when you press Ctrl-C.
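
The "main thread only" rule in point 5 is easy to see for yourself. In the sketch below (Python 3; the function names are made up for this example), a worker thread tries to register a SIGINT handler, and CPython refuses with a ValueError:

```python
import signal
import threading


def try_install_handler(outcome):
    """Try to register a SIGINT handler from the current thread and record what happens."""
    try:
        # In a non-main thread this raises before any handler is changed.
        signal.signal(signal.SIGINT, signal.default_int_handler)
        outcome.append("ok")
    except ValueError as exc:
        outcome.append("ValueError: %s" % exc)


def install_from_worker_thread():
    """Run try_install_handler in a worker thread and return its recorded outcome."""
    outcome = []
    worker = threading.Thread(target=try_install_handler, args=(outcome,))
    worker.start()
    worker.join()
    return outcome[0]


print(install_from_worker_thread())
```

Because only the main thread runs Python-level signal handlers, anything that keeps the main thread from running bytecode also keeps Ctrl-C from being noticed.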

I wrote a small test program to compare the GIL's effect:

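# Python 2 script (uses xrange and the thread module).
# Arguments: <number of CPU-bound threads> <number of I/O-bound threads>
import subprocess
import sys
import thread
import threading
import time

# The total amount of work is fixed and split evenly across the threads of each kind.
cpu_loop_count = 200000000
io_loop_count  = 10000000

num_cpu_thread = int(sys.argv[1])
num_io_thread  = int(sys.argv[2])

def do_cpu():
    # CPU-bound work: a pure-Python counting loop.
    total = 0
    for i in xrange(cpu_loop_count/num_cpu_thread):
        total += 1

def do_io():
    # I/O-bound work: write one character at a time to a per-thread file.
    fname = "tmpgil_%s" % (thread.get_ident())
    with open(fname, "w") as f:
        for i in xrange(io_loop_count/num_io_thread):
            f.write("a")

time_begin = time.time()

# Start all threads, then wait for every one of them to finish.
t_list = []
for i in xrange(num_cpu_thread):
    t = threading.Thread(target=do_cpu)
    t.start()
    t_list.append(t)
for i in xrange(num_io_thread):
    t = threading.Thread(target=do_io)
    t.start()
    t_list.append(t)
for t in t_list:
    t.join()

time_end = time.time()

# sysctl hw.activecpu (macOS) reports the number of CPU cores currently active.
print("%d CPU threads and %d I/O threads cost %s secs with %s CPU cores." % (
    num_cpu_thread,
    num_io_thread,
    time_end-time_begin,
    subprocess.check_output("sysctl hw.activecpu | awk '{print $2}'", shell=True).strip()))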

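This test program was run on a Mac, using sysctl to read the number of CPUs currently available.

For how to set the number of CPUs, see the post "[Mac] Limiting the number of usable CPU cores on a Mac".

The results: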

1 CPU threads and 0 I/O threads cost 10.7932708263 secs with 1 CPU cores.
2 CPU threads and 0 I/O threads cost 10.2277369499 secs with 1 CPU cores.
4 CPU threads and 0 I/O threads cost 11.7571058273 secs with 1 CPU cores.
8 CPU threads and 0 I/O threads cost 9.43700098991 secs with 1 CPU cores.
1 CPU threads and 0 I/O threads cost 10.1425180435 secs with 2 CPU cores.
2 CPU threads and 0 I/O threads cost 12.5565118790 secs with 2 CPU cores.
4 CPU threads and 0 I/O threads cost 14.1571121216 secs with 2 CPU cores.
8 CPU threads and 0 I/O threads cost 15.8341510296 secs with 2 CPU cores.
1 CPU threads and 0 I/O threads cost 9.59016990662 secs with 4 CPU cores.
2 CPU threads and 0 I/O threads cost 11.9481139183 secs with 4 CPU cores.
4 CPU threads and 0 I/O threads cost 17.4996449947 secs with 4 CPU cores.
8 CPU threads and 0 I/O threads cost 17.5706129074 secs with 4 CPU cores.
0 CPU threads and 1 I/O threads cost 4.47482180595 secs with 1 CPU cores.
0 CPU threads and 2 I/O threads cost 8.97815394402 secs with 1 CPU cores.
0 CPU threads and 4 I/O threads cost 7.83446407318 secs with 1 CPU cores.
0 CPU threads and 8 I/O threads cost 6.58127403259 secs with 1 CPU cores.
0 CPU threads and 1 I/O threads cost 4.86090803146 secs with 2 CPU cores.
0 CPU threads and 2 I/O threads cost 6.73209619522 secs with 2 CPU cores.
0 CPU threads and 4 I/O threads cost 8.55393910408 secs with 2 CPU cores.
0 CPU threads and 8 I/O threads cost 8.81372404099 secs with 2 CPU cores.
0 CPU threads and 1 I/O threads cost 3.53081011772 secs with 4 CPU cores.
0 CPU threads and 2 I/O threads cost 15.7843191624 secs with 4 CPU cores.
0 CPU threads and 4 I/O threads cost 46.6944420338 secs with 4 CPU cores.
0 CPU threads and 8 I/O threads cost 40.7415630817 secs with 4 CPU cores.

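Roughly speaking, on a single CPU the number of CPU-bound threads makes little difference.

On multiple CPUs, the more CPU-bound threads there are, the slower the run tends to be.

I/O-bound threads seem to suffer most at 4 CPUs, where going from 1 thread to 4 or 8 threads raised the run time from about 3.5 seconds to over 40 seconds…

These numbers of course depend on the test data, the method, and the system;

they just give a feel for how the GIL affects a program.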
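
If you want to repeat the CPU-bound part of the comparison on a current interpreter, here is a scaled-down sketch for Python 3. It is not the script that produced the numbers above: the function names and the iteration count are my own choices, and it compares threads against running the same slices one after another in a single thread.

```python
import threading
import time


def count(n):
    """CPU-bound work: a pure-Python counting loop."""
    total = 0
    for _ in range(n):
        total += 1
    return total


def timed_sequential(n, parts):
    """Run `parts` slices of the work one after another; return (iterations, seconds)."""
    start = time.perf_counter()
    results = [count(n // parts) for _ in range(parts)]
    return sum(results), time.perf_counter() - start


def timed_threaded(n, parts):
    """Run `parts` slices of the work in separate threads; return (iterations, seconds)."""
    results = [0] * parts

    def worker(index):
        results[index] = count(n // parts)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parts)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), time.perf_counter() - start


if __name__ == "__main__":
    n = 2000000
    print("sequential: %d iterations in %.3f s" % timed_sequential(n, 4))
    print("threaded:   %d iterations in %.3f s" % timed_threaded(n, 4))
```

On an interpreter with a GIL, don't expect the threaded run to be faster; how much slower it gets (if at all) will vary with the Python version, the OS, and the number of cores.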

For now, the workaround for the GIL is to spawn new processes

and reduce the number of threads inside each process.
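
As a rough sketch of that workaround (Python 3): the snippet below splits a counting job across separate interpreter processes, each with its own GIL, by launching them with the subprocess module. The worker source string and the function name are made up for this example; in real code the standard multiprocessing module packages the same idea more conveniently.

```python
import subprocess
import sys

# Source code of a tiny worker: count up to the number given on the command line.
WORKER = (
    "import sys; n = int(sys.argv[1]); total = 0\n"
    "for _ in range(n): total += 1\n"
    "print(total)"
)


def count_in_processes(total_iterations, num_processes):
    """Split the loop across separate Python processes and return each one's count."""
    per_process = total_iterations // num_processes
    # Start every worker first so they run concurrently, then collect the output.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(per_process)],
            stdout=subprocess.PIPE,
        )
        for _ in range(num_processes)
    ]
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print(count_in_processes(4000000, 4))
```

Each process pays its own start-up cost and shares no memory with the others, so this only pays off when the work per process is large compared to that overhead.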

EPH 的程式日記 (EPH's Programming Diary)

Notes on the bits and pieces of a programming life
