Be friendly to HTTP Conditional Requests.
A conditional HTTP request allows you to fetch new data using HTTP GET without unnecessarily downloading data you already have.
This article focuses on data exchange (sometimes called PULL), where a consumer initiates an HTTP GET request to a publisher’s URL and either receives new data or gets a response indicating that there is nothing new since the last request.
Let’s start with an analogy: collecting the latest newspaper at a newsstand.
Speedy Honzales News
Imagine that your local newsstand sells a newspaper called “Speedy Honzales News” (abbreviated as Speedy), which provides the freshest information you care about. To stay truly fresh, a new issue can be released whenever the publisher decides there is something worth publishing.
Let’s think about how to arrange things so that you have this fresh news at home as soon as possible. There are several scenarios:
Scenario “Paper Collector”
As soon as you finish reading the current issue at home, you go to the newsstand, say “one copy of Speedy,” get a copy, take it home, sit in your chair, put on your glasses, and start reading.
You often share the news with your wife, but you sometimes find that you have already read that issue, maybe even several times.
But you always get up again, go to the newsstand, and say your “one copy of Speedy.”
Scenario “Anything new since today at 08:35:15?”
Because you are tired of buying what you already have at home, you go to the newsstand and say: If there is a new issue of Speedy released since 08:35:15 this morning, give it to me.
The news vendor either nods and hands you the new issue or says “nothing yet.” In either case, you head back home, and the pile of duplicate issues at home stops growing.
Scenario “I already have the one with the striped cat”
Maybe you don’t remember times, but you have a visual memory and remember the picture drawn in the bottom right corner, which is different each time.
You go to the newsstand and say: One copy of Speedy, but I already have the one with the striped cat. The news vendor looks at the picture in the latest issue and either hands over an issue with a different picture or says “I only have the one with the striped cat” and you go home empty-handed.
Scenario “The clueless vendor”
Maybe you have an excellent memory for dates and times and also for pictures. But no matter what you say, the vendor always hands you a copy, regardless of your request. Either they are lazy, or they can’t distinguish the times or pictures themselves.
You start to feel like a “paper collector.”
HTTP GET Requests
It’s time to touch real HTTP.
If you want to try it yourself, here are instructions for setting up your own local web server.
Before using real HTTP requests, we need to prepare or find a web server serving some content. Let’s start in the simplest way, using Python with its standard http.server module.
Let’s create a data directory data-dir and put into it the data.json file which we are going to serve later on.
mkdir data-dir
cd data-dir
echo '{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}' > data.json
From within the data-dir directory, start a simple Python HTTP server on port 8080:
python3 -m http.server 8080
HTTPie (http CLI) as a nice curl alternative
Throughout this article we use the http CLI, which provides a nicer user experience than curl.
The HTTPie project provides Desktop and Terminal versions. We use the terminal one.
Installation is supported by many package managers (pip, brew, apt, choco, port, snap, yum, …).
The CLI signature is as simple as: http [METHOD] URL [REQUEST_ITEM ...].
The default GET method can be omitted.
To specify a request header, put it after the URL in the form http http://localhost:8080 "header-name: header-value"
To print request and response headers: http --print Hh http://localhost:8080
For more, see HTTPie CLI Usage
Simple HTTP GET (“paper-collector”)
Now use the http command to get the content.
http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:18:36 GMT
Last-Modified: Thu, 31 Oct 2024 19:15:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Repeat the HTTP GET request as before:
http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:21:31 GMT
Last-Modified: Thu, 31 Oct 2024 19:15:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Carefully examine the headers and note that:
- almost all the headers are identical
- only the Date: header changes; it reflects the time on the web server at the moment of our request
- the Last-Modified: header stays unchanged and uses a somewhat old-fashioned format: Thu, 31 Oct 2024 19:15:47 GMT
Check the modification time of the file in our file system (forcing the English language and the UTC time zone):
TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl staff 166 Oct 31 19:15 data.json
You will find that the file modification time and the Last-Modified value are identical.
Many web servers (including the primitive one we use here) derive the Last-Modified time from the file modification time.
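As a rough illustration of that derivation, here is a minimal Python sketch that turns a file modification time into the HTTP date format seen in the Last-Modified header. The function names http_date and last_modified_for are made up for this example; real servers have their own implementations.

```python
import os
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an HTTP date such as 'Thu, 31 Oct 2024 19:15:47 GMT'."""
    # usegmt=True produces the literal "GMT" suffix used in HTTP headers
    return formatdate(timestamp, usegmt=True)


def last_modified_for(path: str) -> str:
    """Return a Last-Modified style value derived from the file modification time."""
    return http_date(os.stat(path).st_mtime)
```

Calling last_modified_for("data.json") in the data-dir directory should give the same string the server reports in Last-Modified.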
Let’s try to change the modification time:
touch data.json
Check the modification time:
TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl staff 166 Oct 31 19:30 data.json
Finally, try the HTTP GET again:
http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:32:07 GMT
Last-Modified: Thu, 31 Oct 2024 19:30:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Finally, we use the option --print Hhb to instruct http to print the request headers in addition to the response headers and body.
http --print Hhb http://localhost:8080/data.json
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:36:10 GMT
Last-Modified: Thu, 31 Oct 2024 19:30:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Here we see the default request headers sent by http:
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
If-Modified-Since (“Anything new since …?”)
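Keep the server running, copy the literal value of the Last-Modified header, and use it as the value of the If-Modified-Since request header. (The run below sends 19:36:10, the Date value of the previous response, rather than the Last-Modified value 19:30:35; any time at or after Last-Modified leads to the same 304 result.)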
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT
HTTP/1.0 304 Not Modified
Date: Thu, 31 Oct 2024 19:38:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
We can observe the following changes:
- there is a new If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT request header
- the HTTP status code changed from 200 OK to 304 Not Modified
- the body of the response is empty
Congratulations: we issued a conditional HTTP request using the If-Modified-Since header, carrying a timestamp from the previous response. Because the content has not changed since that time, we got an “empty” response with HTTP status code 304 Not Modified.
You may repeat the same request a few times; as long as the file stays unchanged, the server gives the same response (only the Date header changes).
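The same exchange can be done from code. The following is a minimal consumer sketch using only the Python standard library; the function name fetch_if_modified and its return convention are assumptions made for this example. Note that urllib reports a 304 response as an HTTPError, so the sketch catches it.

```python
import urllib.error
import urllib.request


def fetch_if_modified(url, last_modified=None):
    """Return (body, last_modified); body is None when the server answers 304."""
    request = urllib.request.Request(url)
    if last_modified:
        # Send back the Last-Modified value remembered from the previous response
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing new: keep using the validator we already have
            return None, last_modified
        raise
```

A consumer would call fetch_if_modified("http://localhost:8080/data.json") once without a timestamp, store the returned Last-Modified value, and pass it to every following call.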
Now let’s change the file modification time by touching the file:
touch data.json
Check the modification time:
TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl staff 166 Oct 31 19:44 data.json
Repeat the previous HTTP request:
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT
User-Agent: HTTPie/3.2.2
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:46:47 GMT
Last-Modified: Thu, 31 Oct 2024 19:44:17 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Notice that although the response body is still the same, the value of Last-Modified has changed; this is why the conditional HTTP GET returned the body with HTTP status code 200 OK.
Of course, a typical update of our data changes the served content and the file modification time at the same time.
Let’s try a new If-Modified-Since request with the latest value of the Last-Modified header:
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT
User-Agent: HTTPie/3.2.2
HTTP/1.0 304 Not Modified
Date: Thu, 31 Oct 2024 19:51:20 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
All as expected: no changes, no response body.
Now use your editor to change the content of the served file and repeat the previous conditional request:
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT
User-Agent: HTTPie/3.2.2
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:53:12 GMT
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:52:05.123Z",
"records": [
{
"alpha/v2": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
To summarize the If-Modified-Since conditional HTTP request:
- When a request carries in its If-Modified-Since header the same value as the Last-Modified header of an ordinary HTTP GET response, the server responds with HTTP status code 304 Not Modified and an empty response body.
- Many web servers derive the value of Last-Modified from the file modification time, but for dynamically generated responses the code can generate the Last-Modified header dynamically too.
- The format of the If-Modified-Since and Last-Modified headers is a bit old-fashioned, but this is how it goes. An ISO 8601 formatted datetime would not work well.
- Values of Last-Modified have a precision of whole seconds. For content updated more than once a second, the If-Modified-Since conditional request would miss some changes.
- If someone changed the served content but preserved the Last-Modified time, the conditional request would not notice the change.
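The points about the date format and the whole-second precision can be illustrated with a small server-side check. This is only a sketch under the assumption that the server compares the file modification time, truncated to whole seconds, with the parsed header value; the function name not_modified_since is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified_since(mtime: float, if_modified_since: str) -> bool:
    """Return True when a 304 Not Modified answer would be appropriate."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Not an HTTP date (for example an ISO 8601 string): ignore the condition
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates carry whole seconds only, so sub-second changes are invisible
    modified = datetime.fromtimestamp(mtime, timezone.utc).replace(microsecond=0)
    return modified <= since
```

With this logic, a file changed at 19:44:17.9 is still reported as not modified to a consumer holding 19:44:17, which is exactly the limitation named in the summary.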
If-None-Match (“I already have the one with the striped cat”)
The previously set up web server (python3 -m http.server) does not support ETag headers by default.
If you want to try it yourself, here are instructions for setting up your own local NGINX web server using docker and docker-compose.yml.
These instructions assume you have docker installed on your machine.
Let’s create a data directory data-dir and put into it the data.json file which we are going to serve later on. Note that this time we stay in the project root without changing into data-dir:
mkdir data-dir
echo '{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}' > data-dir/data.json
Create docker-compose.yml:
echo '
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
    volumes:
      - ./data-dir:/usr/share/nginx/html:ro
    restart: always
' > docker-compose.yml
Finally, start the web server using the docker compose command:
docker compose up
[+] Running 1/0
✔ Container conditional-web-1 Created 0.0s
Attaching to web-1
web-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
web-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
web-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
web-1 | 10-listen-on-ipv6-by-default.sh: info: IPv6 listen already enabled
web-1 | /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
web-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
web-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
web-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
web-1 | 2024/10/31 21:38:40 [notice] 1#1: using the "epoll" event method
web-1 | 2024/10/31 21:38:40 [notice] 1#1: nginx/1.27.2
web-1 | 2024/10/31 21:38:40 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14)
web-1 | 2024/10/31 21:38:40 [notice] 1#1: OS: Linux 6.6.32-linuxkit
web-1 | 2024/10/31 21:38:40 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker processes
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 22
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 23
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 24
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 25
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 26
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 27
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 28
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 29
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 30
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 31
Note that the first run downloads the nginx docker image, which can take a few seconds.
The command above starts the NGINX server (serving the content of the data-dir folder) and keeps running until stopped by Ctrl-C. Until then, it logs the requests sent to it.
The If-None-Match conditional HTTP request builds on a unique signature of the content, provided in the ETag response header.
The ETag value is a kind of content fingerprint (NGINX uses last-modified time + content length, Apache uses the same + inode, but values such as a serial number of the generated content, SHA-256 etc. may be used too).
Let’s use a plain HTTP GET request to make sure the ETag header is present.
http http://localhost:8080/data.json
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 166
Content-Type: application/json
Date: Thu, 31 Oct 2024 21:44:20 GMT
ETag: "6723e003-a6"
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: nginx/1.27.2
{
"published": "2024-10-31T18:52:05.123Z",
"records": [
{
"alpha/v2": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Comparing the headers to what we saw before with python3 -m http.server 8080, we see:
- the number of headers is a bit higher: Accept-Ranges, which we can ignore, and ETag, which is essential for us
- Last-Modified is present and works the same way
- the ETag header is present
- the value of ETag here has a starting and an ending quote; these are part of the string value, which often causes confusion. We will have to escape them carefully in CLI calls.
Let’s try to use the If-None-Match request header to form a conditional HTTP request. The header uses the value from the ETag response header.
Notice that we escape the " characters to keep them in.
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723e003-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723e003-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 304 Not Modified
Connection: keep-alive
Date: Thu, 31 Oct 2024 21:53:57 GMT
ETag: "6723e003-a6"
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: nginx/1.27.2
Carefully examine the headers and note that:
- almost all the headers are identical to those of the plain HTTP GET request
- the ETag header stays the same and is identical to the value of the If-None-Match request header
- the HTTP status code is 304 Not Modified
- the response body is empty
- the Date and Last-Modified: headers behave the same way as in the If-Modified-Since case described earlier
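In code, a consumer does not need any escaping: it stores the ETag value exactly as received (quotes included) and sends it back unchanged. Below is a minimal sketch with the Python standard library; the class name ConditionalFetcher is invented for this example.

```python
import urllib.error
import urllib.request


class ConditionalFetcher:
    """Remembers the last ETag per URL and sends it back as If-None-Match."""

    def __init__(self):
        self._etags = {}  # url -> ETag value exactly as received, quotes included

    def fetch(self, url):
        """Return the new body, or None when the server answers 304 Not Modified."""
        request = urllib.request.Request(url)
        etag = self._etags.get(url)
        if etag is not None:
            request.add_header("If-None-Match", etag)
        try:
            with urllib.request.urlopen(request) as response:
                new_etag = response.headers.get("ETag")
                if new_etag is not None:
                    self._etags[url] = new_etag
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 304:
                return None  # urllib reports 304 as an error; here it means "nothing new"
            raise
```

The first fetch() call returns the body; repeated calls return None until the server starts reporting a different ETag.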
Now touch the data-dir/data.json file to change its modification time:
touch data-dir/data.json
and try the request again:
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723e003-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723e003-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 166
Content-Type: application/json
Date: Thu, 31 Oct 2024 22:04:54 GMT
ETag: "6723ff03-a6"
Last-Modified: Thu, 31 Oct 2024 22:04:51 GMT
Server: nginx/1.27.2
{
"published": "2024-10-31T18:52:05.123Z",
"records": [
{
"alpha/v2": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
I was surprised to get the response body returned with HTTP status code 200 OK.
Carefully comparing the values of the If-None-Match request header and the ETag response header shows some similarity, but the values differ.
It turns out that NGINX uses an algorithm for ETag which takes into account not only the content length but also the file modification time.
Note that this behaviour does not harm the usage of conditional HTTP requests with If-None-Match.
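The two ETag values seen so far can be reproduced from the Last-Modified time and the Content-Length: both are written as hexadecimal numbers joined by a dash. The sketch below is derived only from the values observed in this article, not from NGINX documentation, and the function names are made up.

```python
import os


def nginx_style_etag(mtime_seconds: int, size_bytes: int) -> str:
    """Build an ETag shaped like the ones observed above: "<hex mtime>-<hex size>"."""
    return f'"{mtime_seconds:x}-{size_bytes:x}"'


def decode_nginx_style_etag(etag: str):
    """Split such an ETag back into (mtime_seconds, size_bytes)."""
    mtime_hex, size_hex = etag.strip('"').split("-")
    return int(mtime_hex, 16), int(size_hex, 16)


def nginx_style_etag_for(path: str) -> str:
    """Compute the same kind of value for a local file."""
    info = os.stat(path)
    return nginx_style_etag(int(info.st_mtime), info.st_size)
```

Decoding both values shows the size part (a6, i.e. 166 bytes) unchanged, while the time part moved from 19:52:35 to 22:04:51, which is why touching the file alone was enough to change the ETag.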
An updated conditional request (with the current ETag value) works as expected:
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723ff03-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723ff03-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 304 Not Modified
Connection: keep-alive
Date: Thu, 31 Oct 2024 22:12:39 GMT
ETag: "6723ff03-a6"
Last-Modified: Thu, 31 Oct 2024 22:04:51 GMT
Server: nginx/1.27.2
We may try editing the content of data.json and see whether the repeated conditional request provides the updated content:
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723ff03-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723ff03-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 165
Content-Type: application/json
Date: Thu, 31 Oct 2024 22:14:49 GMT
ETag: "67240153-a5"
Last-Modified: Thu, 31 Oct 2024 22:14:43 GMT
Server: nginx/1.27.2
{
"published": "2024-10-31T21:14:08.123Z",
"records": [
{
"gama/v1": "megabytes of data"
},
{
"beta/v5": "megabytes of data"
}
]
}
All works as expected.
To summarize the If-None-Match conditional HTTP request:
- When a request carries in its If-None-Match header the same value as the ETag header of an ordinary HTTP GET response, the server responds with HTTP status code 304 Not Modified and an empty response body.
- Many web servers assign the value of ETag automatically, but for dynamically generated responses the code can generate the value of the ETag header dynamically too.
- The ETag value often (but not necessarily) includes a leading and a trailing " as part of the header value. Be sure to include them properly in your request header value.
- Values of ETag are more likely to detect multiple modifications happening within one second. Anyway, if you want to support such a scenario, do test it with your real web server.
- If your web server or application generates the value of ETag as a real fingerprint (e.g. using MD5, SHA-1 or SHA-256), the conditional request using the If-None-Match header works well in a scenario where scheduled processes create the content repeatedly (thus changing the modification time) but the content often ends up being identical.
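A content-based fingerprint of the kind mentioned in the last point might look like the following sketch. It assumes the application has the whole response body in memory and uses SHA-256; the function name content_etag is illustrative only.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Return a quoted ETag value derived only from the content bytes."""
    digest = hashlib.sha256(body).hexdigest()
    # The surrounding quotes are part of the header value
    return f'"{digest}"'
```

Because the value depends only on the bytes served, regenerating identical content at a later time yields the same ETag, and consumers keep receiving 304 Not Modified.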
Combined If-Modified-Since and If-None-Match
You may wonder what happens if you use both request headers.
RFC 7232, section 3.3, states:
A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field; the condition in If-None-Match is considered to be a more accurate replacement for the condition in If-Modified-Since, and the two are only combined for the sake of interoperating with older intermediaries that might not implement If-None-Match.
So according to this, a request combining both headers is evaluated the same way as one with If-None-Match alone.
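A server-side evaluation following this precedence rule could be sketched as below. This is a simplified illustration, not a complete implementation: the function name is_not_modified is made up, request headers are passed as a plain dict, and the list of entity tags is split naively on commas.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    """Drop the weak prefix so that W/"x" and "x" compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(headers, current_etag, last_modified):
    """Decide whether 304 applies; last_modified is a timezone-aware datetime."""
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match is present, so If-Modified-Since is ignored entirely
        tags = [t.strip() for t in if_none_match.split(",")]
        return "*" in tags or _opaque(current_etag) in {_opaque(t) for t in tags}
    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since is None:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparsable date: treat the request as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return last_modified.replace(microsecond=0) <= since
```

With a stale ETag in If-None-Match, the function answers False even when If-Modified-Since alone would have produced a 304.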
Support/Use Conditional HTTP Requests - be cool
Nowadays, the PULL exchange pattern is very popular due to the simplicity of its implementation on both the publisher and the consumer side.
The difference in numbers
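A typical DATEX II SituationPublication snapshot might have a size of 1.5 MB (in gzipped form).
If a consumer wants to get updates as soon as possible, a frequency of one request every 5 seconds (thus 12 requests a minute) is common. The comparison below corresponds to content that changes about once a minute, so 11 of the 12 conditional requests come back empty.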
| | unconditional | conditional |
|---|---|---|
| requests/min | 12 | 12 |
| empty rq/min | 0 | 11 |
| download/min | 18 MB | 1.5 MB |
| download/day | 25.92 GB | 2.16 GB |
| download/% | 100% | 8.3% |
| download/ratio | 12 | 1 |
Apart from saving bandwidth, there is an even more important aspect: the processing power needed.
- the consumer might attempt to process the fetched content 12 times a minute, or only once
- a publisher dynamically generating the response for each request (not a recommended practice, but seen in some cases) will quickly run out of resources
If this is combined with allowing the content to be fetched in uncompressed (not gzipped) form, the download size typically gets 10 times higher:
| | unconditional | conditional |
|---|---|---|
| requests/min | 12 | 12 |
| empty rq/min | 0 | 11 |
| download/min | 180 MB | 15 MB |
| download/day | 259.2 GB | 21.6 GB |
These numbers must be multiplied by the number of consumers to see the real demand on connectivity.
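The arithmetic behind the tables can be replayed with a few lines of Python. The sketch assumes, as the tables do, one content update per minute and 1 GB = 1000 MB; the function name and parameters are illustrative.

```python
def daily_download_gb(snapshot_mb, requests_per_min, updates_per_min,
                      conditional, consumers=1):
    """Estimate the daily download volume in GB for a polling consumer."""
    if conditional:
        # Only requests that hit a new version transfer a body
        transfers_per_min = min(requests_per_min, updates_per_min)
    else:
        # Every request downloads the full snapshot
        transfers_per_min = requests_per_min
    minutes_per_day = 60 * 24
    return snapshot_mb * transfers_per_min * minutes_per_day * consumers / 1000


unconditional = daily_download_gb(1.5, 12, 1, conditional=False)
conditional = daily_download_gb(1.5, 12, 1, conditional=True)
print(f"unconditional: {unconditional:.2f} GB/day, conditional: {conditional:.2f} GB/day")
```

Passing consumers=100 scales both figures by the number of consumers, as described above.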
Implementing conditional requests
The first prerequisite for implementing conditional requests is to understand them. This blog entry tries to help you with that.
The implementation on the publisher side is mostly simple. As shown in the examples above, even the default NGINX configuration serves content conditionally out of the box. Most web servers can do the same with relatively simple configuration.
The implementation on the consumer side is also relatively simple, but the challenge is to find the motivation for it. Once the consumer understands the advantages of conditional requests (e.g. 12 times fewer resources needed for processing), the motivation is found.
Publishers might also force consumers to use conditional requests, e.g. by introducing rate limiting for unconditional requests (or, even better, by counting the amount of data downloaded and setting limits on that).