Be friendly to HTTP Conditional Requests.
A conditional HTTP request allows you to fetch new data using HTTP GET without unnecessarily downloading data you already have.
This article focuses on data exchange (sometimes called PULL), where a consumer initiates an HTTP GET request to a publisher’s URL and either receives new data or gets a response indicating that there is nothing new since the last request.
Let’s start with an analogy: collecting the latest newspaper at a newsstand.
Speedy Honzales News
Imagine that your local newsstand sells a newspaper called “Speedy Honzales News” (abbreviated as Speedy), which provides the freshest information you care about. To stay truly fresh, a new issue can be released at any time, whenever the publisher decides there is something worth publishing.
Let’s think about how to arrange things so that you have this fresh news at home as soon as possible. There are several scenarios:
Scenario “Paper Collector”
As soon as you finish reading the current issue at home, you go to the newsstand, say “one copy of Speedy,” get a copy, take it home, sit in your chair, put on your glasses, and start reading.
You often share the news with your wife, but you sometimes find that you have already read that issue, maybe even several times.
But you always get up again, go to the newsstand, and say your “one copy of Speedy.”
Scenario “Anything new since today at 08:35:15?”
Because you are tired of buying what you already have at home, you go to the newsstand and say: If there is a new issue of Speedy released since 08:35:15 this morning, give it to me.
The news vendor either nods and hands you the new issue or says “nothing yet.” In either case, you head back home, and the pile of duplicate issues at home stops growing.
Scenario “I already have the one with the striped cat”
Maybe you don’t remember times, but you have a visual memory and remember the picture drawn in the bottom right corner, which is different each time.
You go to the newsstand and say: One copy of Speedy, but I already have the one with the striped cat. The news vendor looks at the picture in the latest issue and either hands over an issue with a different picture or says “I only have the one with the striped cat” and you go home empty-handed.
Scenario “The clueless vendor”
Maybe you have an excellent memory for dates and times and also for pictures. But no matter what you say, the vendor always hands you a copy, regardless of your request. Either they are lazy, or they can’t distinguish the times or pictures themselves.
You start to feel like a “paper collector.”
HTTP GET Requests
It’s time to touch real HTTP.
If you want to try it yourself, here are instructions for setting up your own local web server.
Before using real HTTP requests, we need to prepare or find a web server serving some content. Let’s start in the simplest way, using Python with its standard http.server module.
Let’s create a data directory data-dir and put data.json in it, which we are going to serve later on.
mkdir data-dir
cd data-dir
echo '{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}' > data.json
From within the data-dir directory, start a simple Python HTTP server on port 8080:
python3 -m http.server 8080
HTTPie (http CLI) as a nice curl alternative
Throughout this article we use the http CLI, which provides a nice user experience (compared to curl).
The HTTPie project provides Desktop and Terminal versions. We use the terminal one.
Installation is supported by many package managers (pip, brew, apt, choco, port, snap, yum, …).
The CLI signature is as simple as: http [METHOD] URL [REQUEST_ITEM ...]
The default GET method can be omitted.
To specify a request header, put it after the URL, in the form http http://localhost:8080 "header-name: header-value"
To print request and response headers: http --print Hh http://localhost:8080
For more, see HTTPie CLI Usage
Simple HTTP GET (“paper-collector”)
Now use the http command to get the content:
http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:18:36 GMT
Last-Modified: Thu, 31 Oct 2024 19:15:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Repeat the HTTP GET request as before:
http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:21:31 GMT
Last-Modified: Thu, 31 Oct 2024 19:15:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Carefully examine the headers and find that:
- almost all the headers are identical
- only the Date: header changes; it reflects the time on the web server at the moment of our request
- the Last-Modified: header stays unchanged and uses the somewhat old-fashioned format Thu, 31 Oct 2024 19:15:47 GMT
Check the modification time of the file in our file system (forcing the English language and the UTC time zone):
TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl staff 166 Oct 31 19:15 data.json
You will find that the file modification time and the Last-Modified value match.
Many web servers (including the primitive one we use here) derive the Last-Modified time from the file modification time.
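As a minimal sketch of how such a header value can be derived, the Python snippet below formats a Unix timestamp (for example a file's modification time) as an HTTP date with the standard library. The helper names http_date and http_date_for_file are hypothetical; this illustrates the idea and is not the actual code of http.server.

```python
import os
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an HTTP date (always expressed in GMT)."""
    return formatdate(timestamp, usegmt=True)


def http_date_for_file(path: str) -> str:
    """Hypothetical helper: a Last-Modified value derived from a file's mtime."""
    return http_date(os.stat(path).st_mtime)


if __name__ == "__main__":
    # 1730402147 is the Unix timestamp of 2024-10-31 19:15:47 UTC.
    print(http_date(1730402147))  # Thu, 31 Oct 2024 19:15:47 GMT
```

Calling http_date_for_file("data.json") in the data-dir directory should give the same string the server reports in Last-Modified.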
Let’s try to change it:
touch data.json
check the modification time:
TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl staff 166 Oct 31 19:30 data.json
and finally try the HTTP GET again:
http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:32:07 GMT
Last-Modified: Thu, 31 Oct 2024 19:30:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Finally, we use the option --print Hhb to instruct http to print not only the response headers and body, but also the request headers.
http --print Hhb http://localhost:8080/data.json
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:36:10 GMT
Last-Modified: Thu, 31 Oct 2024 19:30:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Here we see the default http request headers:
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
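If-Modified-Since (“Anything new since …?”)
Keep the server running, copy the literal value of the Last-Modified header and use it as the value of the If-Modified-Since request header. (The request below happens to use a slightly later timestamp, 19:36:10, taken from the Date header of the previous response; with this server, any time at or after Last-Modified gives the same result.)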
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT
HTTP/1.0 304 Not Modified
Date: Thu, 31 Oct 2024 19:38:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
We can observe the following changes:
- there is a new If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT request header
- the HTTP status code changed from 200 OK to 304 Not Modified
- the body of the response is empty
Congratulations: we managed to issue a conditional HTTP request using the If-Modified-Since header bearing a timestamp from a previous response (normally the Last-Modified value), and as the content has not changed since, we got an “empty” response with HTTP status code 304 Not Modified.
You may repeat the same request a few times; as long as the file stays unchanged, the server returns the same response (only the Date header will change).
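The same exchange can be sketched from a consumer's point of view in Python. This is a minimal illustration using only the standard library and plain http:// URLs; the function name fetch_if_modified and its return convention (body is None on 304) are assumptions made for this example.

```python
import http.client
from urllib.parse import urlsplit


def fetch_if_modified(url, last_modified=None):
    """GET url, sending If-Modified-Since when a previous value is known.

    Returns (body, last_modified); body is None on 304 Not Modified.
    Only plain http:// URLs are handled in this sketch.
    """
    parts = urlsplit(url)
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    connection = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=10
    )
    try:
        connection.request("GET", parts.path or "/", headers=headers)
        response = connection.getresponse()
        body = response.read()
        if response.status == 304:
            # Nothing new: keep the validator we already have.
            return None, last_modified
        if response.status != 200:
            raise RuntimeError(f"unexpected HTTP status {response.status}")
        return body, response.getheader("Last-Modified")
    finally:
        connection.close()


if __name__ == "__main__":
    url = "http://localhost:8080/data.json"
    body, stamp = fetch_if_modified(url)            # first, unconditional request
    again, stamp = fetch_if_modified(url, stamp)    # conditional follow-up
    print("changed" if again is not None else "not modified")
```

The consumer only has to remember the last Last-Modified value between requests; everything else stays an ordinary GET.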
Now let’s change the modification time of the file by touching it:
touch data.json
check the modification time:
TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl staff 166 Oct 31 19:44 data.json
and repeat the previous HTTP request:
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT
User-Agent: HTTPie/3.2.2
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:46:47 GMT
Last-Modified: Thu, 31 Oct 2024 19:44:17 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Notice that although the response body is still the same, the value of Last-Modified has changed, and this is why the conditional HTTP GET returned the body with HTTP status code 200 OK.
Of course, a typical update of our data changes the served content and the file modification time at the same time.
Let’s try a new If-Modified-Since request with the latest value of the Last-Modified header:
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT
User-Agent: HTTPie/3.2.2
HTTP/1.0 304 Not Modified
Date: Thu, 31 Oct 2024 19:51:20 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
All as expected: no changes, no response body.
Now use your editor to change the content of the served file, and repeat the previous conditional request:
http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT
User-Agent: HTTPie/3.2.2
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:53:12 GMT
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
{
"published": "2024-10-31T18:52:05.123Z",
"records": [
{
"alpha/v2": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
To summarize the If-Modified-Since conditional HTTP request:
- When the request declares in the If-Modified-Since header the same value as the Last-Modified value of an ordinary HTTP GET response, the server responds with HTTP status code 304 Not Modified and the response body is empty.
- Many web servers derive the value of Last-Modified from the file modification time, but code that generates a response dynamically can generate the Last-Modified header as well.
- The format of the If-Modified-Since and Last-Modified headers is a bit old-fashioned, but this is how it goes. An ISO 8601 formatted datetime would not work well.
- Values of Last-Modified have a precision of whole seconds, so for content updated more than once a second the If-Modified-Since conditional request would miss some changes.
- If one changed the served content but preserved the Last-Modified time, the conditional request would not notice the change.
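The whole-second precision can be illustrated with a small server-side check. The sketch below is an assumption about how such a comparison might be written (the function name not_modified_since is hypothetical); it is not taken from any particular web server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified_since(if_modified_since: str, mtime: float) -> bool:
    """Return True when answering 304 Not Modified would be appropriate.

    mtime is the resource modification time as a Unix timestamp. It is
    truncated to whole seconds because HTTP dates carry no fractions.
    """
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparsable header value: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    modified = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return modified <= since


if __name__ == "__main__":
    stamp = "Thu, 31 Oct 2024 19:44:17 GMT"   # 1730403857 as a Unix timestamp
    print(not_modified_since(stamp, 1730403857.0))  # unchanged file
    print(not_modified_since(stamp, 1730403857.9))  # changed within the same second
    print(not_modified_since(stamp, 1730403858.0))  # changed one second later
```

The second call shows the blind spot: a change made 0.9 s after the remembered timestamp still falls into the same whole second and is reported as not modified.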
If-None-Match (“I already have the one with the striped cat”)
The previously set up web server (using python -m http.server) does not support ETag headers by default.
If you want to try it yourself, here are instructions for setting up your own local NGINX web server using docker and docker-compose.yml.
These instructions assume you have docker installed on your machine.
Let’s create a data directory data-dir and put data.json in it, which we are going to serve later on. Note that this time we stay in the project root without changing into data-dir:
mkdir data-dir
echo '{
"published": "2024-10-31T18:06:02.123Z",
"records": [
{
"alpha/v1": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}' > data-dir/data.json
Create docker-compose.yml:
echo '
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
    volumes:
      - ./data-dir:/usr/share/nginx/html:ro
    restart: always
' > docker-compose.yml
Finally, start the web server using the docker compose command:
docker compose up
[+] Running 1/0
✔ Container conditional-web-1 Created 0.0s
Attaching to web-1
web-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
web-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
web-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
web-1 | 10-listen-on-ipv6-by-default.sh: info: IPv6 listen already enabled
web-1 | /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
web-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
web-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
web-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
web-1 | 2024/10/31 21:38:40 [notice] 1#1: using the "epoll" event method
web-1 | 2024/10/31 21:38:40 [notice] 1#1: nginx/1.27.2
web-1 | 2024/10/31 21:38:40 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14)
web-1 | 2024/10/31 21:38:40 [notice] 1#1: OS: Linux 6.6.32-linuxkit
web-1 | 2024/10/31 21:38:40 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker processes
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 22
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 23
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 24
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 25
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 26
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 27
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 28
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 29
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 30
web-1 | 2024/10/31 21:38:40 [notice] 1#1: start worker process 31
Note that the first run downloads the nginx docker image, which can take a few seconds.
The command above starts the NGINX server (serving the content of the data-dir folder) and runs until stopped by Ctrl-C. Until then, it reports the requests sent to it.
The If-None-Match conditional HTTP request builds on a unique signature of the content, provided by means of the ETag response header.
The ETag value is a sort of content fingerprint (NGINX uses last-modified time + content length, Apache uses the same + inode, but values such as a serial number of the generated content, SHA-256 etc. may be used too).
Let’s use a plain HTTP GET request to make sure the ETag header is present.
http http://localhost:8080/data.json
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 166
Content-Type: application/json
Date: Thu, 31 Oct 2024 21:44:20 GMT
ETag: "6723e003-a6"
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: nginx/1.27.2
{
"published": "2024-10-31T18:52:05.123Z",
"records": [
{
"alpha/v2": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
Comparing the headers to what we have seen before with python3 -m http.server 8080, we see:
- the number of headers is a bit higher: Accept-Ranges, which we can ignore, and ETag, which is essential for us
- Last-Modified is present and works the same way
- the ETag header is present
- the value of ETag here has a starting and an ending quote; these are part of the string value, which often causes confusion. We will have to escape them carefully in CLI calls.
Let’s try to use the If-None-Match request header to form a conditional HTTP request. The header takes the value from the ETag response header.
Notice that we escape the " characters to keep them in the header value.
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723e003-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723e003-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 304 Not Modified
Connection: keep-alive
Date: Thu, 31 Oct 2024 21:53:57 GMT
ETag: "6723e003-a6"
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: nginx/1.27.2
Carefully examine the headers and find that:
- almost all the headers are identical to those of the plain HTTP GET request
- the ETag header stays the same and is identical to the value of the If-None-Match request header
- the HTTP status code is 304 Not Modified
- the response body is empty
- the Date and Last-Modified: headers behave the same way as in the If-Modified-Since case described earlier.
Now touch the data-dir/data.json file to change its modification time:
touch data-dir/data.json
and try the request:
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723e003-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723e003-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 166
Content-Type: application/json
Date: Thu, 31 Oct 2024 22:04:54 GMT
ETag: "6723ff03-a6"
Last-Modified: Thu, 31 Oct 2024 22:04:51 GMT
Server: nginx/1.27.2
{
"published": "2024-10-31T18:52:05.123Z",
"records": [
{
"alpha/v2": "megabytes of data"
},
{
"beta/v1": "megabytes of data"
}
]
}
I was surprised to get the response body returned with HTTP status code 200 OK.
Carefully examining the values of the If-None-Match request header and the ETag response header shows some similarity, but the values differ.
It turns out that NGINX uses an algorithm for ETag which takes into account not only the content length, but also the file modification time.
Note that this behaviour does not harm the use of conditional HTTP requests with If-None-Match.
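The ETag values seen above are consistent with a simple pattern: the modification time in hexadecimal, a dash, and the content length in hexadecimal. The Python sketch below rebuilds them from the Last-Modified and Content-Length values of the responses in this article. The function name nginx_style_etag is hypothetical, and the format is inferred from the observed values, not taken from NGINX documentation.

```python
from email.utils import parsedate_to_datetime


def nginx_style_etag(last_modified: str, content_length: int) -> str:
    """Rebuild an ETag of the observed form "<hex mtime>-<hex length>"."""
    mtime = int(parsedate_to_datetime(last_modified).timestamp())
    # The double quotes are part of the ETag value itself.
    return '"%x-%x"' % (mtime, content_length)


if __name__ == "__main__":
    # Values taken from the three 200 OK responses of the NGINX server.
    print(nginx_style_etag("Thu, 31 Oct 2024 19:52:35 GMT", 166))  # "6723e003-a6"
    print(nginx_style_etag("Thu, 31 Oct 2024 22:04:51 GMT", 166))  # "6723ff03-a6"
    print(nginx_style_etag("Thu, 31 Oct 2024 22:14:43 GMT", 165))  # "67240153-a5"
```

This explains why touch alone changed the ETag: the length part (a6) stayed the same while the time part moved from 6723e003 to 6723ff03.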
An updated conditional request (with the current ETag value) works as expected:
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723ff03-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723ff03-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 304 Not Modified
Connection: keep-alive
Date: Thu, 31 Oct 2024 22:12:39 GMT
ETag: "6723ff03-a6"
Last-Modified: Thu, 31 Oct 2024 22:04:51 GMT
Server: nginx/1.27.2
We may try editing the content of data.json and see whether a repeated conditional request provides the updated content:
http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723ff03-a6\""
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723ff03-a6"
User-Agent: HTTPie/3.2.2
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 165
Content-Type: application/json
Date: Thu, 31 Oct 2024 22:14:49 GMT
ETag: "67240153-a5"
Last-Modified: Thu, 31 Oct 2024 22:14:43 GMT
Server: nginx/1.27.2
{
"published": "2024-10-31T21:14:08.123Z",
"records": [
{
"gama/v1": "megabytes of data"
},
{
"beta/v5": "megabytes of data"
}
]
}
All works as expected.
To summarize the If-None-Match conditional HTTP request:
- When the request declares in the If-None-Match request header the same value as the ETag value of an ordinary HTTP GET response, the server responds with HTTP status code 304 Not Modified and the response body is empty.
- Many web servers assign the value of ETag automatically, but code that generates a response dynamically can generate the ETag value as well.
- The format of ETag often (but not necessarily) includes a leading and a trailing " as part of the header value. Be sure to include them properly in your request header value.
- Values of ETag are more likely to detect multiple modifications happening within one second. Still, if you want to support such a scenario, test it with your real web server.
- If your web server or application generates the ETag value as a real fingerprint (e.g. using MD5, SHA-1 or SHA-256), a conditional request using the If-None-Match header works well in a scenario where scheduled processes recreate the content repeatedly (thus changing the modification time) but the content often ends up identical.
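The last point can be sketched in a few lines of Python. The function content_etag below is a hypothetical helper that computes the ETag from the content alone with SHA-256; a real publisher might choose a different hash or a shorter value.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Return a quoted ETag computed from the content only."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


if __name__ == "__main__":
    first_run = b'{"records": [{"alpha/v1": "megabytes of data"}]}'
    second_run = b'{"records": [{"alpha/v1": "megabytes of data"}]}'  # regenerated, identical
    third_run = b'{"records": [{"alpha/v2": "megabytes of data"}]}'   # really changed

    print(content_etag(first_run) == content_etag(second_run))  # same ETag
    print(content_etag(first_run) == content_etag(third_run))   # different ETag
```

Because the modification time plays no role here, regenerating identical content keeps the ETag stable and consumers keep receiving 304 Not Modified.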
Combined If-Modified-Since and If-None-Match
You may wonder what would happen if you use both request headers.
RFC 7232, section 3.3, states:
A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field; the condition in If-None-Match is considered to be a more accurate replacement for the condition in If-Modified-Since, and the two are only combined for the sake of interoperating with older intermediaries that might not implement If-None-Match.
So according to this, a combination of both request headers is identical to If-None-Match alone.
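The precedence rule quoted above can be sketched as a small decision function. This is a simplified illustration under stated assumptions: header names are looked up exactly as written, the If-None-Match list is split on commas, and the names etag_matches and conditional_status are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison of an If-None-Match value against the current ETag."""
    if if_none_match.strip() == "*":
        return True

    def strip_weak(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [strip_weak(tag) for tag in if_none_match.split(",")]
    return strip_weak(etag) in candidates


def conditional_status(request_headers: dict, etag: str, mtime: float) -> int:
    """Return 304 or 200 for a GET, giving If-None-Match precedence."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-Modified-Since is ignored whenever If-None-Match is present.
        return 304 if etag_matches(if_none_match, etag) else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparsable date: treat the request as unconditional
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        modified = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
        return 304 if modified <= since else 200

    return 200
```

For example, a request carrying an outdated ETag together with an up-to-date If-Modified-Since value still gets 200, because only the ETag comparison is evaluated.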
Support/Use Conditional HTTP Requests - be cool
Nowadays, the PULL exchange pattern is very popular due to the simplicity of implementation on both the publisher and the consumer side.
The difference in numbers
A typical DATEX II SituationPublication snapshot might have a size of 1.5 MB (in gzipped form).
If a consumer wants to get updates as soon as possible, a frequency of one request every 5 seconds (thus 12 requests a minute) is common.
| | unconditional | conditional |
|---|---|---|
| requests/min | 12 | 12 |
| empty rq/min | 0 | 11 |
| download/min | 18 MB | 1.5 MB |
| download/day | 25.92 GB | 2.16 GB |
| download/% | 100% | 8.3% |
| download/ratio | 12 | 1 |
Apart from saving bandwidth, there is an even more important aspect, the processing power needed:
- a consumer might attempt to process the fetched content 12 times a minute, or only once.
- a publisher dynamically generating the response for each request (not a recommended practice, but seen in some cases) will quickly run out of resources.
If this is combined with allowing content to be fetched in uncompressed (not gzipped) form, the download size typically gets 10 times higher:
| | unconditional | conditional |
|---|---|---|
| requests/min | 12 | 12 |
| empty rq/min | 0 | 11 |
| download/min | 180 MB | 15 MB |
| download/day | 259.2 GB | 21.6 GB |
These numbers must be multiplied by the number of consumers to see the real demand on connectivity.
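The arithmetic behind both tables can be reproduced with a short Python sketch. It assumes, as the tables imply, that the content changes once a minute (11 of 12 requests come back empty) and that 1 GB equals 1000 MB; the function name download_mb_per_day is hypothetical.

```python
def download_mb_per_day(snapshot_mb, requests_per_min, updates_per_min=None):
    """Daily download volume in MB for one consumer.

    updates_per_min=None models unconditional requests (every request
    downloads the full snapshot); otherwise only changed content is sent.
    """
    if updates_per_min is None:
        downloads_per_min = requests_per_min
    else:
        downloads_per_min = min(requests_per_min, updates_per_min)
    return snapshot_mb * downloads_per_min * 60 * 24


if __name__ == "__main__":
    for label, size_mb in (("gzipped", 1.5), ("uncompressed", 15)):
        unconditional = download_mb_per_day(size_mb, 12)
        conditional = download_mb_per_day(size_mb, 12, updates_per_min=1)
        print(label, unconditional / 1000, "GB vs", conditional / 1000, "GB per day")
```

Multiplying the result by the number of consumers gives the total load on the publisher's connectivity.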
Implementing conditional requests
The first prerequisite for implementing conditional requests is to understand them. This blog entry tries to help you with that.
Implementation on the publisher side is mostly relatively simple. As shown in the examples above, even the default NGINX configuration serves content conditionally out of the box. Most web servers can do the same with relatively simple configuration.
Implementation on the consumer side is also relatively simple, but the challenge is to find the motivation for it. Once the consumer understands the advantages of conditional requests (e.g. 12 times fewer resources needed for processing), the motivation is found.
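On the consumer side, the extra work is mostly bookkeeping: remember the validators from the last 200 OK response, send them with the next request, and process the body only when something new arrives. The sketch below shows that bookkeeping without any networking; the function names and the shape of the validators dictionary are assumptions made for illustration.

```python
def conditional_headers(validators: dict) -> dict:
    """Build request headers from the validators remembered so far."""
    headers = {}
    if validators.get("etag"):
        headers["If-None-Match"] = validators["etag"]
    if validators.get("last_modified"):
        headers["If-Modified-Since"] = validators["last_modified"]
    return headers


def handle_response(status, response_headers, body, validators, process):
    """Process the body only on 200 OK; return the validators to remember."""
    if status == 304:
        return validators  # nothing new, nothing to process
    if status == 200:
        process(body)
        return {
            "etag": response_headers.get("ETag"),
            "last_modified": response_headers.get("Last-Modified"),
        }
    raise RuntimeError(f"unexpected HTTP status {status}")
```

With one update a minute and a request every 5 seconds, process would run once a minute instead of 12 times.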
Publishers might also force consumers to use conditional requests, e.g. by introducing rate limiting for unconditional requests (or, even better, by counting the amount of data downloaded and setting limits on that).