[Linux] Using tcpdump to Inspect the User-Agent in HTTP Traffic

2016-08-10 ephrain

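Today a colleague told me that a test program in our project was misbehaving, and when I looked at it, it seemed pretty strange.

When the program used pycurl to connect to Bahamut (巴哈姆特) at http://www.gamer.com.tw, the site returned HTTP 403 Forbidden, yet connecting with a browser or with curl gave a normal HTTP 200 OK…

I couldn't spot the cause right away, so I decided to use tcpdump to look at the data we were actually sending to the site and compare it with the traffic from a working curl request.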

First, a quick nslookup: www.gamer.com.tw is a CNAME for web115.gamer.com.tw, which resolves to two IPs, 104.20.25.138 and 104.20.24.138:

testuser@localhost ~ # nslookup www.gamer.com.tw
Server:		8.8.4.4
Address:	8.8.4.4#53
Non-authoritative answer:
www.gamer.com.tw	canonical name = web115.gamer.com.tw.
Name:	web115.gamer.com.tw
Address: 104.20.25.138
Name:	web115.gamer.com.tw
Address: 104.20.24.138
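If you want to pull those addresses out of nslookup's output in a script (for example, to build the capture filter from them), a minimal Python sketch could look like this. It assumes the text layout shown above: the resolver's own `Address:` line (with its `#53` port suffix) comes before any `Name:` line, and each answer `Address:` line follows a `Name:` line. The function name `parse_nslookup_answers` is hypothetical, and in a real script `socket.getaddrinfo` would be a sturdier way to resolve the name than scraping text.

```python
def parse_nslookup_answers(output: str) -> list[str]:
    """Return the answer IPs from nslookup's plain-text output.

    Assumption: answer "Address:" lines come after a "Name:" line, while the
    resolver's own "Address:" line (e.g. 8.8.4.4#53) comes before any
    "Name:" line, so it is skipped.
    """
    addresses = []
    seen_name = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            seen_name = True
        elif seen_name and line.startswith("Address:"):
            # "Address: 104.20.25.138" -> "104.20.25.138"
            addresses.append(line.split(":", 1)[1].strip())
    return addresses


# The nslookup output shown above.
SAMPLE = """Server:\t\t8.8.4.4
Address:\t8.8.4.4#53
Non-authoritative answer:
www.gamer.com.tw\tcanonical name = web115.gamer.com.tw.
Name:\tweb115.gamer.com.tw
Address: 104.20.25.138
Name:\tweb115.gamer.com.tw
Address: 104.20.24.138
"""

print(parse_nslookup_answers(SAMPLE))  # ['104.20.25.138', '104.20.24.138']
```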

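Next, use tcpdump to capture the packets exchanged with www.gamer.com.tw:

– -vvv prints more detail about each packet

– -i eth0 captures on the eth0 interface

– -s 0 makes tcpdump capture whole packets instead of truncating them at a short snapshot length

– -A prints each packet's contents as ASCII text (handy for reading HTTP)

– the host filter matches either 104.20.24.138 or 104.20.25.138 (host A or B is pcap-filter shorthand for host A or host B):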

tcpdump -vvv -i eth0 -s 0 -A host 104.20.24.138 or 104.20.25.138
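The same command can be assembled programmatically, which helps when the list of IPs comes from a lookup like the one above. This is a minimal Python sketch; the helper name `tcpdump_argv` is hypothetical. It spells out `host` before every address, which pcap-filter treats the same as the `host A or B` shorthand used above. tcpdump joins its trailing arguments into one filter expression, so passing the filter as a single argument is fine.

```python
import shlex


def tcpdump_argv(interface: str, hosts: list[str]) -> list[str]:
    """Build the tcpdump command line used above for an arbitrary set of hosts."""
    if not hosts:
        raise ValueError("at least one host is required")
    # "host A or host B", equivalent to pcap-filter's shorthand "host A or B".
    bpf_filter = " or ".join(f"host {h}" for h in hosts)
    return [
        "tcpdump",
        "-vvv",           # extra-verbose packet decoding
        "-i", interface,  # interface to capture on
        "-s", "0",        # snaplen 0: capture whole packets
        "-A",             # print packet contents as ASCII
        bpf_filter,
    ]


argv = tcpdump_argv("eth0", ["104.20.24.138", "104.20.25.138"])
print(shlex.join(argv))
# tcpdump -vvv -i eth0 -s 0 -A 'host 104.20.24.138 or host 104.20.25.138'
```

To actually start a capture, the list can be passed to `subprocess.run(argv)`; tcpdump normally needs root privileges (or the equivalent capabilities) to open the interface.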

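Then run the failing pycurl test program.

The first three packets are the TCP three-way handshake (SYN, SYN-ACK, ACK). The fourth packet is the first HTTP packet, and it carries the HTTP request headers.

Only a few headers are sent, but you can see that the User-Agent is set to PycURL/7.19.3.1 libcurl/7.42.1 OpenSSL/1.0.1e zlib/1.2.7 c-ares/1.10.0 libidn/1.28 libssh2/1.4.3: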

22:31:22.208987 IP (tos 0x0, ttl 64, id 61003, offset 0, flags [DF], proto TCP (6), length 60)
172.22.2.18.58937 > 104.20.25.138.http: Flags [S], cksum 0x2ff5 (incorrect -> 0x9a85), seq 3947994309, win 29200, options [mss 1460,sackOK,TS val 712274574 ecr 0,nop,wscale 7], length 0
E..<.K@.@.......h....9.P.Q........r./..........
*tr.........
22:31:22.210994 IP (tos 0x0, ttl 56, id 0, offset 0, flags [DF], proto TCP (6), length 52)
104.20.25.138.http > 172.22.2.18.58937: Flags [S.], cksum 0x8bea (correct), seq 1185844204, ack 3947994310, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
E..4..@.8...h........P.9F....Q....r................
22:31:22.211053 IP (tos 0x0, ttl 64, id 61004, offset 0, flags [DF], proto TCP (6), length 40)
172.22.2.18.58937 > 104.20.25.138.http: Flags [.], cksum 0x2fe1 (incorrect -> 0x3deb), seq 1, ack 1, win 229, length 0
E..(.L@.@.......h....9.P.Q..F...P.../...
22:31:22.211269 IP (tos 0x0, ttl 64, id 61005, offset 0, flags [DF], proto TCP (6), length 229)
172.22.2.18.58937 > 104.20.25.138.http: Flags [P.], cksum 0x309e (incorrect -> 0x90cd), seq 1:190, ack 1, win 229, length 189
E....M@.@.......h....9.P.Q..F...P...0...GET / HTTP/1.1
Host: www.gamer.com.tw
User-Agent: PycURL/7.19.3.1 libcurl/7.42.1 OpenSSL/1.0.1e zlib/1.2.7 c-ares/1.10.0 libidn/1.28 libssh2/1.4.3
Accept: */*
Connection: Keep-Alive
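The numbers in this capture can be cross-checked by hand. The SYN consumes one sequence number, so the server's SYN-ACK acknowledges the client's initial sequence number plus one; from then on tcpdump prints sequence numbers relative to each side's initial value, which is why the ACK shows `seq 1, ack 1`. The request packet's `seq 1:190` is the half-open byte range [1, 190), matching `length 189`, and its IP total length of 229 is a 20-byte IP header plus a 20-byte TCP header plus the 189-byte payload (assuming no IP or TCP options on that packet, as the capture lists none). A small Python sketch of that arithmetic, with hypothetical helper names:

```python
MOD = 2 ** 32  # TCP sequence numbers wrap around at 2^32

# Values copied from the capture above.
CLIENT_ISN = 3947994309  # seq of the client's SYN (packet 1)
SYNACK_ACK = 3947994310  # ack field of the server's SYN-ACK (packet 2)


def relative_seq(absolute: int, isn: int) -> int:
    """Sequence number relative to the initial one, as tcpdump prints it."""
    return (absolute - isn) % MOD


def segment_len(start: int, end: int) -> int:
    """Length of tcpdump's half-open 'seq start:end' range."""
    return end - start


def ipv4_total_length(payload: int, ip_hdr: int = 20, tcp_hdr: int = 20) -> int:
    """IPv4 total length = IP header + TCP header + payload (no options by default)."""
    return ip_hdr + tcp_hdr + payload


# The SYN consumes one sequence number, so the SYN-ACK acks ISN + 1 ...
assert SYNACK_ACK == (CLIENT_ISN + 1) % MOD
# ... which tcpdump then shows as relative "ack 1" / "seq 1".
assert relative_seq(SYNACK_ACK, CLIENT_ISN) == 1
# HTTP request packet: "seq 1:190 ... length 189" and IP "length 229".
assert segment_len(1, 190) == 189
assert ipv4_total_length(189) == 229
# The SYN itself: no payload, 20 bytes of TCP options (mss, sackOK, TS, nop, wscale).
assert ipv4_total_length(0, tcp_hdr=20 + 20) == 60
```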

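Next, run the working curl command against the site…

Leaving out the three-way handshake packets and looking only at the packet with the HTTP headers, it is almost identical to the pycurl one. Apart from the Connection: Keep-Alive header that pycurl adds, the only difference is that the User-Agent is curl/7.42.1: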

21:43:47.619697 IP (tos 0x0, ttl 64, id 36467, offset 0, flags [DF], proto TCP (6), length 120)
172.22.2.18.55991 > 104.20.24.138.http: Flags [P.], cksum 0x2f31 (incorrect -> 0x4807), seq 1:81, ack 1, win 229, length 80
E..x.s@.@.}F....h......Pa.m.
...P.../1..GET / HTTP/1.1
Host: www.gamer.com.tw
User-Agent: curl/7.42.1
Accept: */*
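Comparing two header blocks by eye works here, but the comparison can also be scripted. The minimal Python sketch below parses the request text that follows the binary IP/TCP prefix in each `-A` dump and reports which headers differ. The two payload strings are reconstructed from the captures assuming CRLF line endings; their lengths, 189 and 80 bytes, match the `length` values tcpdump reports. The helper names `parse_request_headers` and `diff_headers` are hypothetical.

```python
def parse_request_headers(request: str) -> dict[str, str]:
    """Parse the header lines of a plain-text HTTP/1.x request.

    `request` must start at the request line ("GET / HTTP/1.1"); cut off the
    binary IP/TCP prefix that tcpdump -A prints before it. Header names are
    lower-cased because HTTP header names are case-insensitive.
    """
    headers = {}
    for line in request.splitlines()[1:]:  # skip the request line
        if not line:  # an empty line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(a: dict[str, str], b: dict[str, str]) -> dict:
    """Map each header whose value differs to (value_in_a, value_in_b); None = absent."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }


PYCURL_REQ = (
    "GET / HTTP/1.1\r\n"
    "Host: www.gamer.com.tw\r\n"
    "User-Agent: PycURL/7.19.3.1 libcurl/7.42.1 OpenSSL/1.0.1e zlib/1.2.7 "
    "c-ares/1.10.0 libidn/1.28 libssh2/1.4.3\r\n"
    "Accept: */*\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
CURL_REQ = (
    "GET / HTTP/1.1\r\n"
    "Host: www.gamer.com.tw\r\n"
    "User-Agent: curl/7.42.1\r\n"
    "Accept: */*\r\n"
    "\r\n"
)

diff = diff_headers(parse_request_headers(PYCURL_REQ), parse_request_headers(CURL_REQ))
for name, (pycurl_value, curl_value) in diff.items():
    print(f"{name}: {pycurl_value!r} vs {curl_value!r}")
```

Only `connection` and `user-agent` show up in the diff, which is what narrows the search down.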

Hmm… if the HTTP headers are nearly identical, yet one request gets 403 Forbidden and the other 200 OK, the prime suspect is the different User-Agent.

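I tried curl -A "PycURL" to set the User-Agent to PycURL and connected to the site again…

Bingo! Now curl gets 403 Forbidden as well.

It looks like Bahamut blocks the distinctive PycURL User-Agent string. My guess is that people have used pycurl to build automated bots that hit Bahamut, so the site ended up blocking that string outright.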
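For the test program itself, one possible workaround is to send a different User-Agent; in pycurl that is set with `c.setopt(pycurl.USERAGENT, ...)`. To repeat the `curl -A` experiment without pycurl installed, here is a minimal sketch using Python's standard `urllib.request` instead (a swapped-in library, not what the original test used). The helper names are hypothetical, and the site's behaviour today may differ from what was observed in 2016.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (without sending) a GET request with an explicit User-Agent,
    the urllib counterpart of `curl -A "<user_agent>" <url>`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including 4xx/5xx."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; a 403 Forbidden ends up here.
        return err.code
```

Calling `fetch_status("http://www.gamer.com.tw/", "PycURL")` needs network access and shows whether the site still rejects that string. Note that urllib stores header names capitalized, so the header is read back with `req.get_header("User-agent")`.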

In fact, the HTTP body returned by the site also contains the phrase "has banned your access based on your browser's signature", which confirms that the request was blocked because of its User-Agent:

<div class="cf-error-details cf-error-1010">
<h1>Access denied</h1>
<p>The owner of this website (www.gamer.com.tw) has banned your access based on your browser's signature (2d04442861b4463e-ua47).</p>
<ul class="cferror_details">
<li>Ray ID: 2d04442861b4463e</li>
<li>Timestamp: 2016-08-10 14:51:29 UTC</li>
<li>Your IP address: 11.22.33.44</li>
<li class="XXX_no_wrap_overflow_hidden">Requested URL: www.gamer.com.tw/ </li>
<li>Error reference number: 1010</li>
<li>Server ID: FL_80F43</li>
<li>User-Agent: PycURL</li>
</ul>
</div>
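The class names (such as `cf-error-1010`) and the Ray ID suggest this is a Cloudflare error page, so a test could check for it explicitly instead of only seeing a 403. A minimal Python sketch, assuming the page keeps the markup shown above; `parse_block_page` is a hypothetical helper, and its regular expressions only match this layout:

```python
import re
from typing import Optional


def parse_block_page(body: str) -> Optional[dict]:
    """Extract the error code and echoed User-Agent from an error page like the one above.

    Returns None when no "Error reference number" is found. Assumes the
    <li>...</li> markup shown above; other layouts will not match.
    """
    code = re.search(r"Error reference number:\s*(\d+)", body)
    if code is None:
        return None
    ua = re.search(r"<li>User-Agent:\s*(.*?)\s*</li>", body)
    return {
        "code": int(code.group(1)),
        "user_agent": ua.group(1) if ua else None,
        # The wording that pointed at the User-Agent in the first place.
        "signature_ban": "based on your browser's signature" in body,
    }


SAMPLE_BODY = """
<p>The owner of this website (www.gamer.com.tw) has banned your access based on your browser's signature (2d04442861b4463e-ua47).</p>
<li>Error reference number: 1010</li>
<li>User-Agent: PycURL</li>
"""

print(parse_block_page(SAMPLE_BODY))
# {'code': 1010, 'user_agent': 'PycURL', 'signature_ban': True}
```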

tcpdump was the hero this time, and I'm glad I learned a bit more about how to use it.

Reference: A tcpdump Tutorial and Primer with Examples


EPH's Programming Diary (EPH 的程式日記)

Recording the little things of a programming life
