HTTP/1.1
HTTP/1.1 was originally defined in RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1, which was later obsoleted by Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing and its companion documents. The HTTP/1.1 specification described here therefore consists of:
- RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
- RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
- RFC 7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
- RFC 7233: Hypertext Transfer Protocol (HTTP/1.1): Range Requests
- RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching
- RFC 7235: Hypertext Transfer Protocol (HTTP/1.1): Authentication
Overview
Client request:

GET /hello.txt HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi

Server response:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain

Hello World! My payload includes a trailing CRLF.
HTTP message format
HTTP-message = start-line
               *( header-field CRLF )
               CRLF
               [ message-body ]
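A parser first reads the start-line and the header fields, then decides from the header fields whether a message body is present. The body is then read according to the framing the header fields indicate: a fixed number of octets when Content-Length is given, or chunk by chunk when the chunked transfer coding is used.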
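The reading order described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete parser: the function name `split_message` is made up for this example, the whole message is assumed to be in memory already, and only Content-Length framing is handled (chunked bodies need a separate decoder).

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.1 message into (start_line, headers, body).

    Hypothetical helper for illustration; handles Content-Length framing only.
    """
    # The header section ends at the first empty line (CRLF CRLF).
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is not terminated by an empty line")

    lines = head.split(b"\r\n")
    start_line = lines[0].decode("ascii")

    headers = []
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        if not colon:
            raise ValueError("malformed header field")
        # Field names are case-insensitive; OWS around the value is dropped.
        headers.append((name.decode("ascii").lower(),
                        value.strip(b" \t").decode("latin-1")))

    # Decide from the header fields how many body octets to read.
    length = 0
    for name, value in headers:
        if name == "content-length":
            length = int(value)
    if len(rest) < length:
        raise ValueError("body is shorter than Content-Length")
    return start_line, headers, rest[:length]
```

Anything after the declared length is left unread, since on a persistent connection it would belong to the next message.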
start-line
start-line = request-line / status-line
request-line = method SP request-target SP HTTP-version CRLF
status-line = HTTP-version SP status-code SP reason-phrase CRLF
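Notes:
- HTTP does not place a predefined limit on the length of a request-line. A server that receives a method longer than any it implements should respond with 501 (Not Implemented).
- It is recommended that all senders and recipients support request-line lengths of at least 8000 octets.
- A server that receives a request-target longer than any URI it is willing to parse must respond with 414 (URI Too Long).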
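A server-side check of these rules might look like the sketch below. The method set, the `MAX_TARGET` limit, and the function name `check_request_line` are all assumptions chosen for illustration; a real server picks its own limits. The input is the request-line without its trailing CRLF.

```python
# Hypothetical limits for an imaginary server; not values from any specification.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS"}
MAX_TARGET = 4096  # longest request-target (in octets) this server will parse


def check_request_line(line: bytes):
    """Return (error_status, parsed).

    error_status is None on success; parsed is (method, target, version)
    on success and None on failure.
    """
    parts = line.split(b" ")
    if len(parts) != 3:
        return 400, None  # not "method SP request-target SP HTTP-version"
    method, target, version = (p.decode("ascii", "replace") for p in parts)

    # An unimplemented method, including one longer than any the server
    # implements, gets 501 (Not Implemented).
    if method not in KNOWN_METHODS:
        return 501, None
    # A request-target longer than the server is willing to parse gets 414.
    if len(parts[1]) > MAX_TARGET:
        return 414, None
    if not version.startswith("HTTP/"):
        return 400, None
    return None, (method, target, version)
```

For example, `check_request_line(b"GET /hello.txt HTTP/1.1")` succeeds, while an unknown method or an oversized target is rejected with the matching status code.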
header-field
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
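A sender normally must not generate multiple header fields with the same field name, with the following exceptions:
- If the field value is defined as a comma-separated list, the values may be sent either as several field-name: field-value lines or as a single line with the values separated by commas.
- Set-Cookie often appears multiple times and cannot be combined into a single field, so it is treated as a special case.
In practice, field values usually contain only US-ASCII characters.
HTTP does not place a predefined limit on the length of a header field or of the header section as a whole; in practice, implementations apply limits that depend on the semantics of each field. A server that receives request header fields larger than it is willing to process must respond with an appropriate 4xx status code rather than ignore them, because ignoring them would make the server more vulnerable to request smuggling attacks.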
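The repetition rules can be illustrated with a small sketch that merges repeated fields into one comma-separated value while keeping Set-Cookie lines separate. The function name `combine_fields` is hypothetical, and the sketch assumes every repeated field other than Set-Cookie is list-valued, which a real implementation would have to verify per field.

```python
def combine_fields(fields):
    """Merge repeated header fields, preserving received order.

    fields: list of (name, value) pairs as received.
    Returns (combined, cookies): a dict of lower-cased name -> value, and
    the Set-Cookie values as a separate list because they cannot be joined.
    """
    combined = {}
    cookies = []
    for name, value in fields:
        key = name.lower()  # field names are case-insensitive
        if key == "set-cookie":
            cookies.append(value)
        elif key in combined:
            # Assumption: the field is defined as a comma-separated list.
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined, cookies
```

Keeping the received order matters, because for some list-valued fields the order of the values is significant.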
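HTTP body
A message body is optional. Whether one is present is determined by the following rules:
- In a request, the presence of a body is signaled by a Content-Length or Transfer-Encoding header field.
- In a response, it depends on the method of the request and on the response status code. A response to a HEAD request never includes a body; a 2xx response to a CONNECT request has no body either; 1xx (Informational), 204 (No Content), and 304 (Not Modified) responses also have no body. All other responses include a body, although it may be of zero length.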
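The response-side rule translates almost directly into a predicate. The sketch below is illustrative only; the function name `response_has_body` is an assumption, not part of any library.

```python
def response_has_body(request_method: str, status: int) -> bool:
    """Return True if a response to request_method with this status carries a body."""
    if request_method == "HEAD":
        return False  # responses to HEAD never include a body
    if request_method == "CONNECT" and 200 <= status < 300:
        return False  # successful CONNECT has no body
    if 100 <= status < 200 or status in (204, 304):
        return False  # 1xx, 204 (No Content), 304 (Not Modified)
    return True       # every other response has a body, possibly of length 0
```

A client needs to remember the method of the request it sent, since the same status code (for example 200) implies a body for GET but not for HEAD.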
Transfer-Encoding
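Transfer-Encoding lists the transfer codings that have been applied to the message body, in the order in which they were applied. For example,
Transfer-Encoding: gzip, chunked
indicates that the body was compressed with gzip and then framed with the chunked coding.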
transfer-coding = "chunked"
                / "compress"
                / "deflate"
                / "gzip"
                / transfer-extension
transfer-extension = token *( OWS ";" OWS transfer-parameter )
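As a rough illustration of the grammar above, the following sketch splits a Transfer-Encoding field value into its codings and their parameters. The name `parse_transfer_encoding` is hypothetical, and the sketch deliberately ignores quoted-string parameter values, which could themselves contain commas or semicolons.

```python
def parse_transfer_encoding(value: str):
    """Parse a Transfer-Encoding value into a list of (coding, params).

    Simplified sketch: parameters are returned as raw "name=value" strings,
    and quoted strings containing "," or ";" are not supported.
    """
    codings = []
    for item in value.split(","):
        item = item.strip(" \t")
        if not item:
            continue  # tolerate empty list elements
        name, *params = [part.strip(" \t") for part in item.split(";")]
        # Transfer-coding names are case-insensitive.
        codings.append((name.lower(), params))
    return codings
```

Because the codings are listed in the order they were applied, a recipient would undo them in reverse: for `gzip, chunked` it first removes the chunked framing and then decompresses.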
Content-Length
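When no Transfer-Encoding header field is present, Content-Length gives the length of the message body as a decimal number of octets. Note that a sender must not send Content-Length in a message that contains Transfer-Encoding.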
Content-Length: 3495
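How a recipient might choose the framing of a request body from these two header fields is sketched below. The function name `request_body_framing` and the return convention are assumptions for illustration. If both fields arrive anyway, this sketch lets Transfer-Encoding take precedence, following RFC 7230 section 3.3.3; a stricter server could reject such a request outright.

```python
import re


def request_body_framing(headers: dict):
    """Decide how to read a request body.

    headers: dict of lower-cased field name -> field value.
    Returns ("chunked", None), ("length", n) or ("none", 0).
    Raises ValueError where a server would answer 400 (Bad Request).
    """
    te = headers.get("transfer-encoding")
    cl = headers.get("content-length")

    if te is not None:
        codings = [c.strip().lower() for c in te.split(",") if c.strip()]
        # In a request, chunked must be the final coding so the end is detectable.
        if codings and codings[-1] == "chunked":
            return ("chunked", None)
        raise ValueError("Transfer-Encoding in a request must end with chunked")

    if cl is not None:
        # Content-Length is 1*DIGIT; reject signs, spaces, and non-ASCII digits.
        if not re.fullmatch(r"[0-9]+", cl):
            raise ValueError("invalid Content-Length")
        return ("length", int(cl))

    # Neither field present: the request has no body.
    return ("none", 0)
```

With the example header above, `request_body_framing({"content-length": "3495"})` yields `("length", 3495)`.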
Chunked Transfer Coding
Chunked enables content streams of unknown size to be transferred as a sequence of length-delimited buffers, which enables the sender to retain connection persistence and the recipient to know when it has received the entire message.
chunked-body = *chunk
               last-chunk
               trailer-part
               CRLF
chunk = chunk-size [ chunk-ext ] CRLF
        chunk-data CRLF
chunk-size = 1*HEXDIG
last-chunk = 1*("0") [ chunk-ext ] CRLF
chunk-data = 1*OCTET ; a sequence of chunk-size octets
The chunk-size field is a string of hex digits indicating the size of the chunk-data in octets. The chunked transfer coding is complete when a chunk with a chunk-size of zero is received, possibly followed by a trailer, and finally terminated by an empty line.
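A decoder for this format can be sketched as follows. It is a minimal illustration that assumes the entire chunked body is already in memory; the function name `decode_chunked` is made up for this example, chunk extensions are skipped rather than interpreted, and trailer lines are returned undecoded into fields.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(data: bytes):
    """Decode a complete chunked body. Returns (payload, trailer_lines)."""
    pos = 0
    payload = bytearray()
    while True:
        eol = data.index(b"\r\n", pos)  # end of the chunk-size line
        # Drop any chunk-ext that follows the size after ";".
        size_field = data[pos:eol].split(b";", 1)[0].strip(b" \t")
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            raise ValueError("invalid chunk-size")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk reached
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        payload += chunk
        pos += size + 2

    # trailer-part: zero or more header lines, terminated by an empty line.
    trailers = []
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break
        trailers.append(line.decode("latin-1"))
    return bytes(payload), trailers
```

For instance, the body `4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n` decodes to the payload `Wikipedia` with no trailer lines. A streaming implementation would read the size line, then exactly that many octets, and repeat, instead of indexing into one buffer.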