HTTP/1.1

From WHY42

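HTTP/1.1 was originally defined in RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1), which was later obsoleted by RFC 7230 (Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing) and its companion documents. The HTTP/1.1 specification described here therefore consists of: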

  • RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
  • RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
  • RFC 7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
  • RFC 7233: Hypertext Transfer Protocol (HTTP/1.1): Range Requests
  • RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching
  • RFC 7235: Hypertext Transfer Protocol (HTTP/1.1): Authentication

Overview

Client request:

GET /hello.txt HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi

Server response:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain

Hello World! My payload includes a trailing CRLF.

HTTP message format

HTTP-message   = start-line
                 *( header-field CRLF )
                 CRLF
                 [ message-body ]

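A recipient first reads the start-line and the header fields, then uses the header fields to decide whether a body follows. When a body length is given (for example by Content-Length), the body is read as exactly that many bytes.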
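The reading order described above can be sketched as follows. This is a minimal illustration, not a full parser: the function name `split_message` is hypothetical, the whole message is assumed to be in memory already, and only Content-Length framing is handled (chunked bodies are ignored here).

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.1 message into (start_line, headers, body).

    Sketch only: assumes the complete message is already in `raw` and that
    the body, if any, is delimited by Content-Length.
    """
    # The header section ends at the first empty line (CRLF CRLF).
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete message: no empty line after the headers")

    lines = head.decode("latin-1").split("\r\n")
    start_line = lines[0]

    # Collect header fields; names are case-insensitive, so lowercase them.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.lower()] = value.strip(" \t")

    # Read exactly Content-Length bytes of body (zero if the field is absent).
    length = int(headers.get("content-length", "0"))
    body = rest[:length]
    if len(body) < length:
        raise ValueError("body is shorter than Content-Length")
    return start_line, headers, body
```

Bytes after the declared length are left out of the returned body; on a persistent connection they would belong to the next message.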

start-line

start-line     = request-line / status-line
request-line   = method SP request-target SP HTTP-version CRLF
status-line    = HTTP-version SP status-code SP reason-phrase CRLF

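Notes:

  • HTTP does not place a predefined limit on the length of a request-line. A server that receives a method longer than any it implements should respond with 501 (Not Implemented).
  • HTTP recommends that senders and recipients support request-lines of at least 8000 octets.
  • A server that receives a request-target longer than any URI it is willing to parse must respond with 414 (URI Too Long).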
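As a rough sketch of how a server might apply these rules, the function below splits a request-line and picks an error status. The names `check_request_line`, `SUPPORTED_METHODS`, and `MAX_TARGET_LENGTH` are hypothetical, the method set is just an example, and the line is assumed to have had its trailing CRLF removed already.

```python
# Hypothetical configuration for this sketch; a real server chooses its own.
SUPPORTED_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS"}
MAX_TARGET_LENGTH = 8000  # octets; an assumed limit, not mandated by HTTP


def check_request_line(line: str):
    """Return (error_status, fields).

    error_status is None on success, otherwise the status code to send.
    fields is (method, request_target, http_version) on success, else None.
    """
    # request-line = method SP request-target SP HTTP-version
    parts = line.split(" ")
    if len(parts) != 3 or not all(parts):
        return 400, None  # malformed request-line
    method, target, version = parts

    # Unknown (including over-long) methods: 501 Not Implemented.
    if method not in SUPPORTED_METHODS:
        return 501, None

    # Request-target longer than the server is willing to parse: 414.
    if len(target) > MAX_TARGET_LENGTH:
        return 414, None

    return None, (method, target, version)
```

For the request in the overview, `check_request_line("GET /hello.txt HTTP/1.1")` yields no error and the three fields.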

header-field

header-field   = field-name ":" OWS field-value OWS

field-name     = token
field-value    = *( field-content / obs-fold )
field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar    = VCHAR / obs-text

obs-fold       = CRLF 1*( SP / HTAB )
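To make the grammar concrete, here is a small sketch that parses one header line (without its CRLF) into a name and a value. The helper name `parse_header_field` is hypothetical, obs-fold is not handled, and the field-value characters are not validated. The token character set is the `tchar` set from RFC 7230.

```python
import string

# tchar from RFC 7230: the characters allowed in a token (and so a field-name).
TOKEN_CHARS = set("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def parse_header_field(line: str):
    """Parse 'field-name ":" OWS field-value OWS' into (name, value)."""
    name, sep, value = line.partition(":")
    # The field-name must be a non-empty token. Because SP is not a token
    # character, whitespace between the name and the colon is rejected too.
    if not sep or not name or any(c not in TOKEN_CHARS for c in name):
        raise ValueError("invalid field-name")
    # OWS (optional SP / HTAB) around the value is not part of the value.
    return name, value.strip(" \t")
```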

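A header field should normally not appear more than once in a message, with some exceptions:

  • If the field is defined as a comma-separated list of values, it may be sent either as several fields with the same name or as a single field with the values separated by commas.
  • Set-Cookie often appears multiple times in a response and cannot be combined into a single field, so it is treated as a special case.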
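The repetition rules above might be applied like this when collecting parsed fields. This is a sketch under assumptions: the helper name `combine_fields` is hypothetical, and it combines every repeated field with commas without checking whether that field is actually defined as a list.

```python
def combine_fields(fields):
    """Merge repeated header fields from a list of (name, value) pairs.

    Returns (combined, set_cookies): repeated fields are joined with ", "
    in order of appearance, while Set-Cookie values are kept separate
    because they cannot be safely combined.
    """
    combined = {}
    set_cookies = []
    for name, value in fields:
        key = name.lower()  # field names are case-insensitive
        if key == "set-cookie":
            set_cookies.append(value)
        elif key in combined:
            combined[key] += ", " + value
        else:
            combined[key] = value
    return combined, set_cookies
```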

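A field-value normally contains only characters from the US-ASCII character set.

HTTP does not place a predefined limit on the length of a header field; in practice, implementations apply limits that depend on the meaning of each field. A server that receives a request header field, or set of fields, larger than it is willing to process must respond with a 4xx error rather than ignore it, to guard against request smuggling attacks.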

HTTP body

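A message body is optional. Whether a message has one is determined by the following rules:

  • In a request, the presence of a body is signaled by a Content-Length or Transfer-Encoding header field.
  • In a response, it depends on the request method and the response status code. A response to a HEAD request has no body; a 2xx response to a CONNECT request has no body; 1xx (Informational), 204 (No Content), and 304 (Not Modified) responses have no body either. All other responses have a body, even if its length is zero.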
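The response rules above translate almost directly into a predicate. The function name `response_has_body` is hypothetical; the sketch assumes the caller knows which request method the response answers.

```python
def response_has_body(request_method: str, status: int) -> bool:
    """Decide whether a response carries a message body (possibly empty)."""
    # Responses to HEAD never include a body.
    if request_method == "HEAD":
        return False
    # A successful (2xx) response to CONNECT switches to tunnel mode.
    if request_method == "CONNECT" and 200 <= status < 300:
        return False
    # 1xx, 204 and 304 responses never include a body.
    if 100 <= status < 200 or status in (204, 304):
        return False
    # Everything else has a body, even if its length is zero.
    return True
```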

Transfer-Encoding

The Transfer-Encoding header field lists the transfer codings that have been applied to the message body, for example:

Transfer-Encoding: gzip, chunked

indicates that the body was compressed with gzip and then chunked.

transfer-coding    = "chunked"
                   / "compress"
                   / "deflate"
                   / "gzip"
                   / transfer-extension
transfer-extension = token *( OWS ";" OWS transfer-parameter )
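To illustrate what "gzip, chunked" means for a sender, the sketch below applies the codings in the listed order: compress first, then split the result into chunks. The function names and the default chunk size are hypothetical, and no chunk extensions or trailers are produced.

```python
import gzip


def chunk_encode(data: bytes, chunk_size: int = 4096) -> bytes:
    """Wrap data in the chunked transfer coding (no extensions, no trailers)."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        out += b"%x\r\n" % len(piece)  # chunk-size in hex, then CRLF
        out += piece + b"\r\n"         # chunk-data, then CRLF
    out += b"0\r\n\r\n"                # last-chunk, empty trailer, final CRLF
    return bytes(out)


def encode_gzip_chunked(body: bytes, chunk_size: int = 4096) -> bytes:
    """Apply 'Transfer-Encoding: gzip, chunked': gzip first, chunked last."""
    return chunk_encode(gzip.compress(body), chunk_size)
```

A recipient undoes the codings in reverse order: it removes the chunked framing first and then decompresses the result.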

Content-Length

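If there is no Transfer-Encoding header field, the Content-Length header field can give the length of the message body. Note that a sender must not send both in the same message.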

Content-Length: 3495
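A recipient of a request could choose between the two framing mechanisms roughly as follows. This is a strict sketch under assumptions: the name `body_framing` is hypothetical, header names are assumed to be lowercased already, and a message carrying both fields is rejected outright rather than repaired.

```python
def body_framing(headers: dict):
    """Return ("chunked", None), ("length", n) or ("none", 0) for a request."""
    te = headers.get("transfer-encoding")
    cl = headers.get("content-length")

    # The two fields must not appear together; treat that as an error here.
    if te is not None and cl is not None:
        raise ValueError("both Transfer-Encoding and Content-Length present")

    if te is not None:
        codings = [c.strip().lower() for c in te.split(",")]
        # In a request, chunked has to be the final transfer coding.
        if codings[-1] != "chunked":
            raise ValueError("chunked is not the final transfer coding")
        return "chunked", None

    if cl is not None:
        # Content-Length = 1*DIGIT; reject signs, spaces and non-ASCII digits.
        if not (cl.isascii() and cl.isdigit()):
            raise ValueError("invalid Content-Length")
        return "length", int(cl)

    # Neither field: the request has no body.
    return "none", 0
```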

Chunked Transfer Coding

Chunked enables content streams of unknown size to be transferred as a sequence of length-delimited buffers, which enables the sender to retain connection persistence and the recipient to know when it has received the entire message.

chunked-body   = *chunk
                 last-chunk
                 trailer-part
                 CRLF
chunk          = chunk-size [ chunk-ext ] CRLF
                 chunk-data CRLF
chunk-size     = 1*HEXDIG
last-chunk     = 1*("0") [ chunk-ext ] CRLF
chunk-data     = 1*OCTET ; a sequence of chunk-size octets

The chunk-size field is a string of hex digits indicating the size of the chunk-data in octets. The chunked transfer coding is complete when a chunk with a chunk-size of zero is received, possibly followed by a trailer, and finally terminated by an empty line.
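The decoding procedure described above can be sketched as a loop over chunks. The function name `decode_chunked` is hypothetical; the sketch assumes the whole chunked body is already in memory, ignores chunk extensions, and returns trailer lines unparsed.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(data: bytes):
    """Decode a chunked body; return (payload, trailer_lines)."""
    payload = bytearray()
    pos = 0
    while True:
        # Read the chunk-size line; index() raises ValueError if CRLF is missing.
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";", 1)[0].strip()  # drop chunk-ext
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            raise ValueError("invalid chunk-size")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk reached

        # Read exactly `size` octets of chunk-data followed by CRLF.
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        payload += chunk
        pos += size + 2

    # After the last-chunk: optional trailer lines, then an empty line.
    trailers = []
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break
        trailers.append(line.decode("latin-1"))
    return bytes(payload), trailers
```

For example, the input `4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n` decodes to the payload `Wikipedia` with no trailers.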