[Python] Writing a binary file with struct

2016-04-01 ephrain

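Today, while writing a C++ unit-testing program, I needed data from a pcap file.

We normally handle pcap files with a Python module called dpkt.

I don't know how to do the same in C++, and I wasn't planning to look it up,

since all I needed was the raw packet data inside the pcap...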

The simpler approach is to write a small Python program that reads the pcap with dpkt

and then writes out a file that the C++ program can parse easily.

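The data I need for each packet is its timestamp, its data length, and the data itself,

so I defined each record as (4-byte timestamp), (4-byte data length), (data).

The C++ program then only has to read 8 bytes first to get the timestamp and the length,

and it knows how many bytes of data to read next.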

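But on the Python side, how do I write the data out in the binary layout defined above?

After a bit of searching, the answer turned out to be the struct module, which I'm not very familiar with.

struct.pack() converts Python values into binary data according to a format you specify.

Here is an example: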

struct.pack("=ii%ds" % (buf_len), int(ts), buf_len, buf)

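In the example above, the first argument to struct.pack() is a format string.

"=" means native byte order with standard sizes (and no alignment padding).

The benefit is that an integer always has the standard size of 4 bytes

and does not change with the operating system or between x86 and x64. The byte order is still that of the machine writing the file, which is fine here because the C++ test reads the file on the same kind of machine.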
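To make the effect of "=" concrete, here is a minimal sketch using struct.calcsize(). The "bi" format is only an illustration and is not used by the article's file layout; the size reported for "@" (native mode) depends on the platform, so it is printed rather than assumed.

```python
import struct

# "=" selects native byte order with standard sizes and no padding,
# so two "i" fields always occupy exactly 8 bytes.
header_size = struct.calcsize("=ii")

# With "=", a 1-byte field followed by a 4-byte int is packed tightly.
packed_size = struct.calcsize("=bi")

# With "@" (the default mode), native alignment applies, so padding
# bytes may be inserted and the result is platform dependent.
native_size = struct.calcsize("@bi")

print(header_size, packed_size, native_size)
```

If the file ever had to move between machines with different endianness, "<" (little-endian, standard sizes) would be the more explicit choice.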

The two i characters that follow stand for two integers; the values we pass for them are int(ts) and buf_len.

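The last part of the format string is actually substituted into something like 3s, depending on buf_len.

For example, if buf_len is 5, it becomes 5s,

so the Python variable buf is treated as a string of length 5.

Note that the 5 in 5s cannot be omitted; without it, the field is treated as a string of length 1...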
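A small sketch of how the count in front of s behaves, assuming Python 3 (where the value must be a bytes object). The variable names are only for illustration.

```python
import struct

data = b"abcde"

# Without a count, "s" means a 1-byte string: only the first byte is kept.
one = struct.pack("s", data)

# With the count filled in from the data length, all five bytes are written.
five = struct.pack("%ds" % len(data), data)

# A count larger than the data pads the field with null bytes.
padded = struct.pack("8s", data)

print(one, five, padded)
```

So the count is a field width: shorter data is padded with null bytes, and longer data is silently truncated.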

Also note that struct.pack() does not append a NULL terminator when it packs a string.

See this example (Python 3 on a little-endian machine; buf has to be a bytes object):

>>> import struct
>>> buf = b"abcde"
>>> buf_len = len(buf)
>>> ts = 0x01020304
>>> struct.pack("=ii%ds" % (buf_len), int(ts), buf_len, buf)
b'\x04\x03\x02\x01\x05\x00\x00\x00abcde'
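For completeness, the reverse direction can be sketched with struct.unpack_from(): read the fixed 8-byte header first, then slice out as many bytes as the header says. This sketch uses "<" (explicit little-endian) instead of "=" so the bytes are the same on any machine; that substitution is an assumption for the sake of a reproducible example.

```python
import struct

# Build one record: timestamp, data length, data.
record = struct.pack("<ii5s", 0x01020304, 5, b"abcde")

HEADER_FORMAT = "<ii"  # timestamp and length, little-endian (assumed here)
header_size = struct.calcsize(HEADER_FORMAT)

# Decode the header, then take exactly buf_len bytes after it.
ts, buf_len = struct.unpack_from(HEADER_FORMAT, record, 0)
payload = record[header_size:header_size + buf_len]

print(hex(ts), buf_len, payload)
```

Since no terminator is stored, the length field is the only thing that tells the reader where the data ends.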

Putting it all together, the conversion program looks like this.

Give it a pcap file and it writes out a file for the C++ program to parse:

#!/usr/bin/env python
import dpkt
import struct
import sys

with open(sys.argv[1], "rb") as pcap_in:
    try:
        pcap = dpkt.pcap.Reader(pcap_in)
        with open(sys.argv[1] + "_forUT", "wb") as pcap_out:
            for ts, buf in pcap:
                buf_len = len(buf)
                # Record: 4-byte timestamp, 4-byte length, then the raw packet bytes
                pcap_out.write(struct.pack("=ii%ds" % (buf_len), int(ts), buf_len, buf))
    except Exception as e:
        print("Exception: %s" % e)
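As a way to check the output without touching C++, here is a rough sketch of a writer and a matching reader for the same record layout. It is not the article's program: write_records, read_records, and the sample packets are hypothetical, dpkt is left out so the sketch runs on its own, and "<" is assumed for the byte order. It uses a precompiled struct.Struct for the header and writes the data separately instead of building a new format string per packet.

```python
import io
import struct

HEADER = struct.Struct("<ii")  # timestamp, data length (little-endian assumed)


def write_records(out, packets):
    """Write (timestamp, data) pairs as header + raw data records."""
    for ts, buf in packets:
        out.write(HEADER.pack(int(ts), len(buf)))
        out.write(buf)


def read_records(inp):
    """Yield (timestamp, data) pairs until the stream is exhausted."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) != HEADER.size:
            raise ValueError("truncated header")
        ts, buf_len = HEADER.unpack(header)
        buf = inp.read(buf_len)
        if len(buf) != buf_len:
            raise ValueError("truncated data")
        yield ts, buf


# Made-up packets standing in for what a pcap reader would produce.
packets = [(1459468800.25, b"\x00\x01\x02"), (1459468801.75, b"hello")]
stream = io.BytesIO()
write_records(stream, packets)
stream.seek(0)
decoded = list(read_records(stream))
print(decoded)
```

The fractional part of each timestamp is dropped by int(ts), just as in the conversion program above.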

On the C++ side, the unit-testing (Google Test) program can read the file with the following code:

pf = fopen(pcap_forUT, "rb");
ASSERT_TRUE(pf != NULL);
while (true)
{
    int nTimestamp = 0, nBufLen = 0;
    // Ensure the integer size is the same as in the Python struct format
    ASSERT_EQ(4u, sizeof(nTimestamp));
    ASSERT_EQ(4u, sizeof(nBufLen));
    // Read the integer timestamp; reading 0 bytes means end of file
    size_t nRead = fread(&nTimestamp, 1, sizeof(nTimestamp), pf);
    if (nRead == 0)
    {
        break;
    }
    ASSERT_EQ(sizeof(nTimestamp), nRead);
    // Read the integer data length
    ASSERT_EQ(sizeof(nBufLen), fread(&nBufLen, 1, sizeof(nBufLen), pf));
    ASSERT_GE(nBufLen, 0);
    // Allocate a buffer for the data
    char* pBuffer = reinterpret_cast<char*>(malloc(nBufLen));
    ASSERT_TRUE(pBuffer != NULL);
    ASSERT_EQ(static_cast<size_t>(nBufLen), fread(pBuffer, 1, nBufLen, pf));
    // Do some operations...
    // Free the buffer
    free(pBuffer);
}
fclose(pf);

Reference: stackoverflow: Python how to write to a binary file?


