Almost every HTTP request contains header fields. They are sent by the browser, or by an application that communicates with a server using HTTP (Hypertext Transfer Protocol).
Virtually none of them are actually required (the exception is the Host field in HTTP/1.1). With so many optional headers, the sheer number of possibilities can be overwhelming to newcomers.
HTTP was developed by Tim Berners-Lee as a way to communicate with web servers, and that's still by far the biggest use for HTTP, but nowadays many other applications, for example mobile apps, also communicate with servers using HTTP. As the uses for HTTP have grown exponentially since its inception in the early 1990s, the number of headers has become unmanageable. But in reality, most requests use quite a small set of common header fields and this page describes the ones you're likely to encounter in everyday requests.
There are many places around the web (such as the Wikipedia page) which define all the header fields, but this guide exists to address the most common ones in slightly more depth.
HTTP Header Format
Each HTTP header is defined as a key-value string where the key is the header name, followed by a colon and then the header value text.
Header-Name: header value text
The format of the header value text is freeform and its interpretation is specific to the particular header.
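To make the format concrete, here is a minimal sketch of splitting a single header line into its name and value. The function name parse_header_line is made up for this illustration, and real HTTP parsers handle many more edge cases than this does.

```python
def parse_header_line(line: str) -> tuple[str, str]:
    """Split one 'Header-Name: header value text' line into (name, value)."""
    # Split at the FIRST colon only: the value itself may contain colons,
    # as in "Host: example.com:8080".
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        raise ValueError(f"not a valid header line: {line!r}")
    # Whitespace around the value is not significant, so trim it.
    return name.strip(), value.strip()


name, value = parse_header_line("Host: example.com:8080")
print(name, "->", value)
```

Everything after the first colon is treated as opaque text here, because interpreting it is the job of whatever code understands that particular header.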
The HTTP Host request header specifies the domain of the server it is communicating with, and, optionally, a colon followed by the port number. If no port number is specified then the default port for the requested service is assumed.
The Host header is mandatory in HTTP/1.1 requests. If it is omitted, the server will respond with a 400 (Bad Request) status.
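As a rough illustration of the port rule, the sketch below derives a Host value from a URL and leaves the port off when it is the default for the scheme. The helper name host_header_for and the DEFAULT_PORTS table are assumptions made for this example, and it does not attempt to handle IPv6 literal addresses.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch knows about (an assumption
# for the example; other services have their own defaults).
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url: str) -> str:
    """Return the value a client would plausibly send in the Host header."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Omit the port when none is given or when it matches the scheme default.
    if parts.port is None or parts.port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{parts.port}"


print("Host:", host_header_for("http://example.com:8080/index.html"))
```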
Referer tells the server where the requested URL came from. It will almost always be another URL, or else empty for a direct request (for example, the requester typed the URL into a browser address bar). The most common scenario is that the requester followed a link from another web page (including a search engine result), and the Referer is the URL to the original page. This header can be used in various ways by the server, including analytics, affiliate tracking and even blocking access to resources.
You may have noticed that Referer is spelt incorrectly. It should be "referrer". This is possibly the most famous example of how a simple incorrect spelling can persist in code and protocols for decades after the original error. There are now almost certainly far more incorrect instances of this spelling that exist in the world than correct ones, such is the ubiquity of the web!
User-Agent identifies the requesting system. It is a string composed of a sequence of so-called product tokens with optional comments. It is quite hard to try to guess the format of the string: at first glance it looks almost freeform, but it does have a specified structure as defined in the IETF documentation.
The general form of each product token is:-
product/version [(optional-comments)] ...
where optional-comments is a sequence of semicolon-separated tokens. These tokens are pretty much freeform and are used to define hardware characteristics such as device model, CPU, etc, but it really is fairly arbitrary.
A typical browser UA string follows the broad pattern:-
Mozilla/version (system information...) extensions...
In reality, you might notice that virtually every browser's User-Agent string starts with the token Mozilla/5.0. This is because in the mid-1990s only the Mozilla browser (the original code name of Netscape Navigator) could handle certain more advanced HTML tags, so servers had to check the requesting browser's identity before serving this richer content. Then other browsers, most notably early versions of Internet Explorer, copied those features, but in order to get served the same content they had to spoof their identity to the server and basically pretend to be Mozilla. All subsequent browsers ended up behaving the same way, and so it became the de facto standard User-Agent prefix for all browsers.
This is why browser UA strings are so long and unwieldy today. The example below shows the UA for Google Chrome on a Mac: it claims to be Mozilla, a bit like Gecko, but also Chrome, and also Safari!
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
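The product/version (comments) structure can be picked apart with a regular expression. The following is only a sketch: the name parse_user_agent and the pattern are this example's own, nested parentheses inside comments are not handled, and real-world UA strings are messy enough that dedicated libraries exist for the job.

```python
import re

# One product token: name, optional "/version", optional "(comment; comment)".
PRODUCT_RE = re.compile(r"([^\s/()]+)(?:/([^\s()]+))?(?:\s*\(([^)]*)\))?")


def parse_user_agent(ua: str) -> list[tuple[str, str, list[str]]]:
    """Return a list of (product, version, comments) tuples."""
    tokens = []
    for match in PRODUCT_RE.finditer(ua):
        product, version, comment = match.groups()
        # Comments are semicolon-separated and otherwise freeform.
        comments = [c.strip() for c in comment.split(";")] if comment else []
        tokens.append((product, version or "", comments))
    return tokens


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36")
for product, version, comments in parse_user_agent(ua):
    print(product, version, comments)
```

Run against the Chrome example above, this yields four product tokens: Mozilla, AppleWebKit, Chrome and Safari, with the system information attached to the first and "KHTML, like Gecko" attached to the second.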
The Accept header is how a client (browser or application) tells the server what kind of content it can accept in the HTTP response. The content types are comma-separated, and take the form type/subtype such as text/html, application/json or audio/mpeg.
Asterisks can be used as wildcards in place of either type or subtype. For example, text/* means accept any kind of text response, and */* means accept any content type at all.
Additionally, each content type can have an optional quality parameter, for instance q=0.5 where the number is in the range 0.0 (lowest) to 1.0 (highest). Importantly, if omitted then the quality is assumed to be 1.0.
The quality factor determines the client's order of preference among content types. The server should respect this order when deciding which type of content to serve in the response.
A list of common media types can be found here.
Accept: text/html, text/plain; q=0.6, */*; q=0.1
Accept: application/graphql, application/json; q=0.8, application/xml; q=0.7
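Here is a minimal sketch of how a server might read an Accept header and rank the content types by quality. The function name parse_accept is invented for this example. It ignores parameters other than q and ranks by quality alone, whereas a full implementation would also prefer more specific types over wildcards.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_type, quality) pairs, highest quality first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0  # the default when no q parameter is given
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                quality = float(val)  # a malformed value raises ValueError
        entries.append((media_type, quality))
    # sorted() is stable, so equal-quality entries keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


for media_type, quality in parse_accept("text/html, text/plain; q=0.6, */*; q=0.1"):
    print(media_type, quality)
```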
Accept-Encoding defines what type of content encoding (usually a compression algorithm) the client can accept in the response body. Its format is similar to the Accept header, i.e. a comma-separated list of encodings, with an optional quality parameter associated with each one. If omitted, the quality is assumed to be 1.0. Again, wildcards can be used, so * means accept all encodings.
Browsers typically send this header with every request, and the server must respect it. That does not mean the server must use the client's preferred encoding, however. Sometimes the preferred compression algorithm will not produce a smaller payload (for instance, if the data is already compressed), and at other times the server may not have the computational resources to perform it. If no suitable algorithm is found, the server should fall back to identity, the default encoding, which means no encoding at all. The only exception to this rule is when the client explicitly prohibits it by specifying identity;q=0 or *;q=0.
After choosing an encoding based on the client's specified preferences and the server's capabilities, the chosen encoding will be returned in the Content-Encoding response header.
Commonly Used Values
Quite a number of different algorithms have been used over the years. Some are out of date or only partially supported. Here are the most common ones you'll find in use today.
- gzip: Lempel-Ziv (LZ77) compression algorithm with 32-bit CRC. It is the most commonly used and supported algorithm.
- deflate: Despite its name, this is also a compression algorithm, also based on LZ77. In theory it's slightly faster than gzip, but it has historically been badly implemented in browsers and servers, so in practice it is less preferred.
- br: The relatively new Brotli algorithm developed at Google. A lossless compression algorithm also based on LZ77 but with some of the original optimizations of the deflate algorithm. It is not yet supported on all browsers and servers, however.
- identity: No encoding. The default encoding that must always be supported unless the client explicitly disables it by specifying identity;q=0 or *;q=0.
Accept-Encoding: gzip, deflate
Accept-Encoding: br, gzip;q=0.9, deflate;q=0.8, *;q=0.1
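The negotiation described in this section, including the identity fallback, might be sketched as follows. The names parse_qvalues and choose_encoding are hypothetical, as is the idea of passing the server's capabilities as a simple list. Returning None stands in for "nothing acceptable", and what the server does in that case is left to the caller.

```python
from typing import Optional


def parse_qvalues(header: str) -> dict[str, float]:
    """Map each listed encoding (lowercased) to its quality value."""
    prefs = {}
    for item in header.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        key, _, val = params.partition("=")
        prefs[name] = float(val) if key.strip().lower() == "q" else 1.0
    return prefs


def choose_encoding(accept_encoding: str, supported: list[str]) -> Optional[str]:
    """Pick the supported encoding the client likes best, else fall back."""
    prefs = parse_qvalues(accept_encoding)
    wildcard = prefs.get("*")

    best, best_q = None, 0.0
    for enc in supported:  # on ties, the server's own ordering wins
        q = prefs.get(enc, wildcard if wildcard is not None else 0.0)
        if q > best_q:
            best, best_q = enc, q
    if best is not None:
        return best

    # identity is acceptable unless excluded by identity;q=0 or *;q=0.
    identity_q = prefs.get("identity", wildcard if wildcard is not None else 1.0)
    return "identity" if identity_q > 0 else None


print(choose_encoding("br, gzip;q=0.9, deflate;q=0.8, *;q=0.1", ["gzip", "br"]))
```

Whatever this returns is the value the server would then report back in the Content-Encoding response header.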
The HTTP Accept-Language header tells the server the client's preferred natural language. Like the other Accept headers it is a comma-separated list of language specifiers, with an optional quality parameter associated with each one. If omitted, the quality is assumed to be 1.0. Again, wildcards can be used, so * means accept all languages.
Each language specifier is a pair of symbols separated by a hyphen. The first symbol is a two-letter language code as defined in ISO 639, and the second is a two-letter country code as defined in ISO 3166. For example, en-US means United States English, en-GB means British English, and es-MX means Mexican Spanish. The country codes often appear in uppercase because that's how the standard defines them, but the server should accept them case-insensitively, so en-us, en-gb and es-mx are equally valid.
The second part of the language specifier may be omitted if regional language variations are not required. For example, a specifier of just en covers all English variants.
The server's negotiation of which language to serve the content in is much like with the other Accept headers. The user's preferences are prioritised based on the quality parameters, and the server must respond with the best match it can.
The Accept-Language header is important in other ways too. Many websites and applications will use the information to infer a user's locale, which can affect not only the displayed language but many different settings including currency, units of measurement, paper sizes, and so on.
Accept-Language: en-GB, en-US, en;q=0.9
Accept-Language: de-AT, de-DE;q=0.9, en;q=0.5
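As a sketch of how a server could pick a language from this header, the example below walks the client's preferences in quality order and returns the first available language that matches. The function name best_language and the list of available languages are assumptions for illustration. The matching is deliberately simple: a bare en matches any en-XX variant, but en-GB does not match en-US.

```python
from typing import Optional


def best_language(header: str, available: list[str]) -> Optional[str]:
    """Return the first available language matching the client's preferences."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()  # language tags are case-insensitive
        if not tag:
            continue
        key, _, val = params.partition("=")
        quality = float(val) if key.strip().lower() == "q" else 1.0
        prefs.append((tag, quality))
    # Stable sort: equal-quality tags keep the order the client listed them in.
    prefs.sort(key=lambda pref: pref[1], reverse=True)

    for tag, quality in prefs:
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        for lang in available:
            candidate = lang.lower()
            if tag == "*" or candidate == tag or candidate.startswith(tag + "-"):
                return lang
    return None


print(best_language("de-AT, de-DE;q=0.9, en;q=0.5", ["en-US", "de-DE"]))
```

When nothing matches, this sketch returns None. A real site would typically fall back to its default language at that point.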
The HTTP Authorization header will only be covered briefly here. It is extremely important for any website or application that requires authorization of the user before allowing access to resources. The header simply specifies the authorization scheme and any associated data or token, and carries that data as a payload. It is up to the client and server to negotiate the authorization using whatever protocols the server requires - the HTTP protocol itself is not actually part of the authorization process.
Commonly Used Authorization Schemes
The scheme is user-defined because any scheme can be implemented, but here are some of the most common ones you'll find in use today.
- Basic: A base-64 encoded username/password pair. Base-64 is an encoding, not encryption, so the credentials are effectively sent in cleartext. This scheme only has any security when sent over HTTPS connections, and even then the credentials are exposed if that connection is intercepted in a man-in-the-middle attack.
- Bearer: A very common scheme in use today. The bearer token may be an OAuth 2.0 token or a JWT (JSON Web Token). When properly implemented, this is far more secure than the Basic authorization scheme.
- Digest: In this scheme a digest (or hash) value is created from a predetermined combination of the username, password, and some information from the server, including random (or "nonce") values. The digest is sent to the server instead of ever sending passwords in cleartext.
Authorization: Basic ZmFsa2VuOmpvc2h1YTU=
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzY290Y2guaW8iLCJleHAiOjEzMDA4MTkzODAsIm5hbWUiOiJDaHJpcyBTZXZpbGxlamEiLCJhZG1pbiI6dHJ1ZX0.03f329983b86f7d9a9f5fef85305880101d5e302afafa20154d094b229f75773
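To show how little protection the Basic scheme offers on its own, here is a short sketch that splits an Authorization value into scheme and credentials and then decodes a Basic payload. The function names parse_authorization and decode_basic are made up for this example, and UTF-8 is assumed for the decoded bytes.

```python
import base64


def parse_authorization(value: str) -> tuple[str, str]:
    """Split an Authorization header value into (scheme, credentials)."""
    scheme, _, credentials = value.strip().partition(" ")
    return scheme, credentials.strip()


def decode_basic(credentials: str) -> tuple[str, str]:
    """Decode Basic credentials into (username, password)."""
    decoded = base64.b64decode(credentials).decode("utf-8")
    # Only the first colon separates the pair; passwords may contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("Basic credentials must contain a colon")
    return username, password


scheme, credentials = parse_authorization("Basic ZmFsa2VuOmpvc2h1YTU=")
if scheme.lower() == "basic":
    print(decode_basic(credentials))
```

No key or secret is needed to reverse the encoding, which is why Basic credentials should only ever travel over an encrypted connection.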