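[Python] Debugging crashes caused by C libraries called from Python
Our project is written in Python, but it also calls into C libraries from Python.
When it occasionally crashes, how do we track down the problem?
Let's walk through it with a few small programs~
Suppose we have a tiny library, libtest.so, that provides a single function, init().
Here is test.h: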
#ifndef _TEST_H_
#define _TEST_H_

extern "C" {
int init();
}

#endif
The corresponding test.cpp is below. init() calls _init_impl(),
and _init_impl() prints the string "init" and then crashes at the strcpy() call, which writes through a NULL pointer:
#include "test.h"
#include <stdio.h>
#include <string.h>

int _init_impl()
{
    printf("init\n");

    // Make it crash
    char* p = NULL;
    strcpy(p, "lksdjflksdjflskjdflskdjflsdkjflskdjflskdjflskdjflskdjflsdkfjlsdjf");

    printf("init end\n");
    return 0;
}

int init()
{
    return _init_impl();
}
Compile it into a shared library to produce libtest.so~
The relevant g++ options are described in g++(1) – Linux man page:
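-g3: include extra debug information (for example, macro definitions)
-ggdb: produce debug information in the format gdb prefers
-O3: optimize more aggressively
-rdynamic: add all symbols to the dynamic symbol table, not only the used ones
-shared: produce a shared object which can then be linked with other objects to form an executable
-fPIC: emit position-independent code, which a shared library needs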
g++ -g3 -ggdb -O3 -rdynamic -shared -fPIC test.cpp -o libtest.so
Now let's write a Python program (loadtest.py) that calls init() from this library:
import ctypes

def call_init():
    dll.init()

dll = ctypes.CDLL("libtest.so")
call_init()
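Run it and it crashes right away,
leaving a core dump file (core.11601 in this example)~
If no core dump is produced, the option is probably not enabled (for example, check the current limit with `ulimit -c`). A sample run: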
testuser@localhost ~ $ LD_LIBRARY_PATH=. python loadtest.py
init
Segmentation fault (core dumped)
testuser@localhost ~ $ ll core*
-rw-------. 1 testuser testuser 2953216 Jan 27 00:46 core.11601
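If you would rather check the core dump setting from inside the Python program itself, the standard `resource` module exposes the same limit that `ulimit -c` shows. The sketch below is only an illustration: the helper names `core_dumps_enabled` and `enable_core_dumps` are made up for this post, and where the core file ends up (and what it is called) still depends on how the system is configured.

```python
import resource


def core_dumps_enabled():
    """Return True if the soft core-file size limit is non-zero."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # RLIM_INFINITY is a non-zero value, so "unlimited" also counts as enabled
    return soft != 0


def enable_core_dumps():
    """Raise the soft limit up to the hard limit and return the new limits.

    Raising the soft limit to the hard limit needs no extra privileges.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print("before:", core_dumps_enabled())
    enable_core_dumps()
    print("after: ", core_dumps_enabled())
```

The change only applies to the current process and its children, so it has to run before the C library gets a chance to crash. If the hard limit itself is 0, the soft limit stays at 0 and an administrator has to raise it.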
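Now for the main topic (that took a while...): how do we find the problem from this core dump?
References: DebuggingWithGdb, Python Developer's Guide: gdb support
1. Open the core dump with gdb
The first step, of course, is to open the core dump in gdb and take a look...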
testuser@localhost ~ $ gdb $(which python) core.11601
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
......
Core was generated by `python loadtest.py'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fcde1b5e68e in _init_impl () at test.cpp:11
11          strcpy(p, "lksdjflksdjflskjdflskdjflsdkjflskdjflskdjflskdjflskdjflsdkfjlsdjf");
Missing separate debuginfos, use: debuginfo-install python-2.6.6-52.el6.x86_64
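gdb helpfully points at the strcpy() line in test.cpp as the cause of the crash,
so case closed... Wait, this is of course the best-case scenario.
If the stack is even slightly corrupted, gdb may not be able to point to the location this clearly~
We also want to know which line of the Python program triggered the crash...
After all, in a complex program the clues from the C side alone are sometimes not enough...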
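As a side note, on Python 3.3 and later (not the Python 2.6 used in this post) the standard `faulthandler` module can print the Python traceback at the moment of the segfault, without a core dump. The sketch below is an assumption-laden illustration: it uses `ctypes.string_at(0)` as a stand-in for the crashing `dll.init()` call so that it runs without libtest.so, and the helper name `run_with_faulthandler` is made up for this example.

```python
import subprocess
import sys

# Child script: reading from address 0 through ctypes crashes inside C code,
# standing in for the dll.init() call from loadtest.py
CRASHING_SCRIPT = "import ctypes\nctypes.string_at(0)\n"


def run_with_faulthandler(script):
    """Run script in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). The crash happens in the child process,
    so the caller survives and can inspect the report.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, report = run_with_faulthandler(CRASHING_SCRIPT)
    print("exit code:", code)
    print(report)
```

The report only shows the Python side of the stack, so it complements the gdb session rather than replacing it: gdb is still what tells you where in the C code things went wrong.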
2. Install the Python debug symbols
gdb has already told us how to install the Python debug symbols
in the Missing separate debuginfos line, so just run it:
sudo debuginfo-install python-2.6.6-52.el6.x86_64
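After installing, run gdb again. This time it may report other missing debug symbols;
install whatever it says is missing~
Sometimes a package simply cannot be found, and we can ignore it for now, for example: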
Could not find debuginfo for main pkg: libffi-3.0.5-3.2.el6.x86_64
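When gdb keeps asking for more packages, picking the names out by hand gets tedious. Here is a minimal sketch that pulls them out of saved gdb output. It assumes the hint always has the same wording as the line shown earlier, with package names separated by spaces; the function name `missing_debuginfo_packages` and the second package in the sample are made up for illustration.

```python
import re

# Matches the hint line gdb printed in step 1
HINT = re.compile(r"Missing separate debuginfos, use: debuginfo-install (.+)")


def missing_debuginfo_packages(gdb_output):
    """Collect package names from gdb's 'Missing separate debuginfos' hints."""
    packages = []
    for line in gdb_output.splitlines():
        match = HINT.search(line)
        if not match:
            continue
        for name in match.group(1).split():
            if name not in packages:  # keep first occurrence, preserve order
                packages.append(name)
    return packages


if __name__ == "__main__":
    sample = (
        "Program terminated with signal 11, Segmentation fault.\n"
        "Missing separate debuginfos, use: debuginfo-install "
        "python-2.6.6-52.el6.x86_64 hypothetical-pkg-1.0.x86_64\n"
    )
    for name in missing_debuginfo_packages(sample):
        print("sudo debuginfo-install", name)
```

The script only prints the commands; whether each package can actually be found is still up to the repositories, as the libffi message above shows.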
3. Debug with gdb's Python commands
Once the debug symbols are installed, start gdb again and run bt.
You should see a more detailed call stack, for example:
(gdb) bt
#0 0x00007ff0602b368e in _init_impl () at test.cpp:11
#1 0x00007ff060703dac in ffi_call_unix64 () from /usr/lib64/libffi.so.5
#2 0x00007ff060703b34 in ffi_call () from /usr/lib64/libffi.so.5
#3 0x00007ff060917074 in _call_function_pointer (pProc=0x7ff0602b3720 <init()>, argtuple=<unknown at remote 0x7fff91bf0ed0>, flags=4353, argtypes=<value optimized out>, restype=<_ctypes.SimpleType at remote 0xb08be0>, checker=0x0) at /usr/src/debug/Python-2.6.6/Modules/_ctypes/callproc.c:816
#4 _CallProc (pProc=0x7ff0602b3720 <init()>, argtuple=<unknown at remote 0x7fff91bf0ed0>, flags=4353, argtypes=<value optimized out>, restype=<_ctypes.SimpleType at remote 0xb08be0>, checker=0x0) at /usr/src/debug/Python-2.6.6/Modules/_ctypes/callproc.c:1163
#5 0x00007ff0609103a2 in CFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0) at /usr/src/debug/Python-2.6.6/Modules/_ctypes/_ctypes.c:3860
#6 0x000000366c443c63 in PyObject_Call (func=<_FuncPtr(__name__='init') at remote 0x7ff060bfeae0>, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2492
#7 0x000000366c4d4f74 in do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4012
#8 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3817
#9 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2453
#10 0x000000366c4d7657 in PyEval_EvalCodeEx (co=0x7ff060b35f30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3044
#11 0x000000366c4d7732 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>) at Python/ceval.c:545
#12 0x000000366c4f1bac in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals={'__builtins__': <module at remote 0x7ff060bee868>, '__file__': 'loadtest.py', '__package__': None, 'ctypes': <module at remote 0x7ff060b50328>, '__name__': '__main__', 'dll': <CDLL(_FuncPtr=<_ctypes.CFuncPtrType at remote 0xafe060>, init=<_FuncPtr(__name__='init') at remote 0x7ff060bfeae0>, _handle=11452160, _name='libtest.so') at remote 0x7ff060b47dd0>, '__doc__': None}, locals={'__builtins__': <module at remote 0x7ff060bee868>, '__file__': 'loadtest.py', '__package__': None, 'ctypes': <module at remote 0x7ff060b50328>, '__name__': '__main__', 'dll': <CDLL(_FuncPtr=<_ctypes.CFuncPtrType at remote 0xafe060>, init=<_FuncPtr(__name__='init') at remote 0x7ff060bfeae0>, _handle=11452160, _name='libtest.so') at remote 0x7ff060b47dd0>, '__doc__': None}, flags=<value optimized out>, arena=<value optimized out>) at Python/pythonrun.c:1358
#13 0x000000366c4f1c80 in PyRun_FileExFlags (fp=0xae7340, filename=0x7fff91bf266a "loadtest.py", start=<value optimized out>, globals={'__builtins__': <module at remote 0x7ff060bee868>, '__file__': 'loadtest.py', '__package__': None, 'ctypes': <module at remote 0x7ff060b50328>, '__name__': '__main__', 'dll': <CDLL(_FuncPtr=<_ctypes.CFuncPtrType at remote 0xafe060>, init=<_FuncPtr(__name__='init') at remote 0x7ff060bfeae0>, _handle=11452160, _name='libtest.so') at remote 0x7ff060b47dd0>, '__doc__': None}, locals={'__builtins__': <module at remote 0x7ff060bee868>, '__file__': 'loadtest.py', '__package__': None, 'ctypes': <module at remote 0x7ff060b50328>, '__name__': '__main__', 'dll': <CDLL(_FuncPtr=<_ctypes.CFuncPtrType at remote 0xafe060>, init=<_FuncPtr(__name__='init') at remote 0x7ff060bfeae0>, _handle=11452160, _name='libtest.so') at remote 0x7ff060b47dd0>, '__doc__': None}, closeit=1, flags=0x7fff91bf1530) at Python/pythonrun.c:1344
#14 0x000000366c4f316c in PyRun_SimpleFileExFlags (fp=0xae7340, filename=0x7fff91bf266a "loadtest.py", closeit=1, flags=0x7fff91bf1530) at Python/pythonrun.c:948
#15 0x000000366c4ff8a2 in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:618
#16 0x000000365fc1ed5d in __libc_start_main (main=0x400710 <main>, argc=2, ubp_av=0x7fff91bf1658, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fff91bf1648) at libc-start.c:226
#17 0x0000000000400649 in _start ()
Type py and press <TAB> to see the Python-related commands:
(gdb) py
py-bt      py-down    py-list    py-locals  py-print   py-up      python
To see the corresponding Python call stack, run py-bt~
In the example below, loadtest.py called the function call_init(),
so we can start from there and combine it with the C call stack from bt to track down the problem~
(gdb) py-bt
#9 file 'loadtest.py', in 'call_init'
#12 file 'loadtest.py', in '<module>'
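If you look at core dumps often, the two backtraces can be collected in one go with gdb's batch mode. This is a minimal sketch, assuming gdb is on the PATH and the py-bt command is available (that is, the Python debug symbols from step 2 are installed). The helper names `build_gdb_command` and `collect_backtraces` are made up for this example.

```python
import subprocess
import sys


def build_gdb_command(core_path, python_exe=sys.executable):
    """Build a gdb command line that prints the C and Python backtraces.

    python_exe should be the interpreter that produced the core dump.
    """
    return [
        "gdb", "-batch",   # run the commands below, then exit
        "-ex", "bt",       # C call stack
        "-ex", "py-bt",    # Python call stack
        python_exe, core_path,
    ]


def collect_backtraces(core_path, python_exe=sys.executable):
    """Run gdb on the core dump and return everything it printed."""
    proc = subprocess.run(
        build_gdb_command(core_path, python_exe),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.stdout


if __name__ == "__main__":
    # Usage: python collect_bt.py core.11601
    print(collect_backtraces(sys.argv[1]))
```

Keeping the command construction in its own function makes it easy to add more `-ex` commands later (py-list or py-locals, for example) without touching the code that runs gdb.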