http分段多线程下载 & 断点续传
HTTP 请求头 Range
请求资源的部分内容(不包括响应头的大小),单位是byte,即字节,从0开始.
如果服务器能够正常响应的话,服务器会返回 206 Partial Content 的状态码及说明.
如果不能处理这种Range的话,就会返回整个资源以及响应状态码为 200 OK .(这个要注意,要分段下载时,要先判断这个)
比如:类似下面的
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
➜ /tmp curl -H "Range: bytes=0-10" http://127.0.0.1:8180/bg-upper.png -v
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8180 (#0)
> GET /bg-upper.png HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8180
> Accept: */*
> Range: bytes=0-10
>
< HTTP/1.1 206 Partial Content
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Accept-Ranges: bytes
< ETag: W/"3103-1435633968000"
< Last-Modified: Tue, 30 Jun 2015 03:12:48 GMT
< Content-Range: bytes 0-10/3103
< Content-Type: image/png
< Content-Length: 11
< Date: Tue, 29 Dec 2015 09:18:36 GMT
<
�PNG
* Connection #0 to host 127.0.0.1 left intact
响应头就是 HTTP/1.1 206 Partial Content
Range 请求头格式
1
Range: bytes=start-end
例如:
1
2
3
Range: bytes=10- :第10个字节及最后个字节的数据
Range: bytes=40-100 :第40个字节到第100个字节之间的数据.
注意,这个表示[start,end],即是包含请求头的start及end字节的,所以,下一个请求,应该是上一个请求的[end+1, nextEnd]
响应头
Content-Range
1
Content-Range: bytes 0-10/3103
这个表示,服务器响应了前(0-10)个字节的数据,该资源一共有(3103)个字节大小。
Content-Type
1
Content-Type: image/png
表示这个资源的类型
Content-Length
1
Content-Length: 11
表示这次服务器响应了11个字节的数据(0-10)
Last-Modified
1
Last-Modified: Tue, 30 Jun 2015 03:12:48 GMT
表示资源最近修改的时间(分段下载时要注意这个东西,因为如果修改了,分段下载可能就要重新下载了)
ETag
1
2
ETag: W/"3103-1435633968000"
ETag: "35c8f02e-58fe6c003da43"
这个响应头表示资源版本的标识符,通常是消息摘要(类似MD5一样)(分段下载时要注意这个东西,或者缓存控制也要注意这个东西)
注意,每种服务器对生成ETag的算法不同,这个要特别注意 如果使用分布式缓存,要特别要保证每台服务器生成的ETag算法是一致的.
缓存的过期,要同时结合(ETag + Last-Modified)这两个响应头来判断.
强ETag
只要实体发生任何改变,都会改变ETag值.如:
ETag: "1234234234"
弱ETag
它在前面会有个 W/ ,如:
ETag: W/"12342423"
分段下载
利用这个特点,我们可以使用分段下载(多线程下载,分布式下载)
思想:先请求一个 HEAD 方法的请求,获取总文件大小:
HEAD 请求
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
➜ /tmp curl -X HEAD http://127.0.0.1:8180/bg-upper.png -v
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8180 (#0)
> HEAD /bg-upper.png HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8180
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Accept-Ranges: bytes
< ETag: W/"3103-1435633968000"
< Last-Modified: Tue, 30 Jun 2015 03:12:48 GMT
< Content-Type: image/png
< Content-Length: 3103
< Date: Tue, 29 Dec 2015 10:16:16 GMT
<
* transfer closed with 3103 bytes remaining to read
* Closing connection 0
curl: (18) transfer closed with 3103 bytes remaining to read
➜ /tmp
那个响应头的 Content-Length 就是总字节大小了(3103)字节.
多线程下载
假设分2条线程
线程1 下载
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
3103 / 2 = 1551
➜ /tmp curl -H "Range: bytes=0-1551" http://127.0.0.1:8180/bg-upper.png -v -o 0-1151.png
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 127.0.0.1 (127.0.0.1) port 8180 (#0)
> GET /bg-upper.png HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8180
> Accept: */*
> Range: bytes=0-1551
>
< HTTP/1.1 206 Partial Content
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Accept-Ranges: bytes
< ETag: W/"3103-1435633968000"
< Last-Modified: Tue, 30 Jun 2015 03:12:48 GMT
< Content-Range: bytes 0-1551/3103
< Content-Type: image/png
< Content-Length: 1552
< Date: Tue, 29 Dec 2015 10:19:43 GMT
<
{ [data not shown]
100 1552 100 1552 0 0 1376k 0 --:--:-- --:--:-- --:--:-- 1515k
* Connection #0 to host 127.0.0.1 left intact
➜ /tmp
这样子,线程1就下载了(0-1551)字节的数据了.
线程2 下载
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
➜ /tmp curl -H "Range: bytes=1552-3103" http://127.0.0.1:8180/bg-upper.png -v -o 1552-end.png
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 127.0.0.1 (127.0.0.1) port 8180 (#0)
> GET /bg-upper.png HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8180
> Accept: */*
> Range: bytes=1552
>
< HTTP/1.1 416 Requested Range Not Satisfiable
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Accept-Ranges: bytes
< Content-Range: bytes */3103
< Content-Type: text/html;charset=utf-8
< Content-Language: en
< Content-Length: 954
< Date: Tue, 29 Dec 2015 10:26:18 GMT
<
{ [data not shown]
100 954 100 954 0 0 457k 0 --:--:-- --:--:-- --:--:-- 931k
* Connection #0 to host 127.0.0.1 left intact
➜ /tmp
合并cat 0-1151.png 1552-end.png > filename.png
这样子就OK了.
HTTP 请求头注意
根据HTTP规范,HTTP的消息头部的字段名,是不区分大小写的.
1
2
3
4
5
6
7
8
9
3.2. Header Fields
Each header field consists of a case-insensitive field name followed
by a colon (“:”), optional leading whitespace, the field value, and
optional trailing whitespace.
RFC7230