Conditionally Friendly

Why being friendly to conditional HTTP requests is cool

protocol
Author

Jan Vlčinský

Published

October 31, 2024

Be friendly to HTTP Conditional Requests.

A conditional HTTP request allows you to fetch new data using HTTP GET without unnecessarily downloading data you already have.

This article focuses on data exchange (sometimes called PULL), where a consumer initiates an HTTP GET request to a publisher’s URL and either receives new data or gets a response indicating that there is nothing new since the last request.

Let’s start with an analogy: collecting the latest newspaper at a newsstand.

Speedy Honzales News

Imagine that your local newsstand sells newspapers called “Speedy Honzales News” (abbreviated as Speedy), which provide the freshest information you care about. To stay truly fresh, a new issue can be released at any time when the publisher decides there is something worth publishing.

Let’s think about how to arrange things so that you have this fresh news at home as soon as possible. There are several scenarios:

Scenario “Paper Collector”

As soon as you finish reading the current issue at home, you go to the newsstand, say “one copy of Speedy,” get a copy, take it home, sit in your chair, put on your glasses, and start reading.

You often share the news with your wife, but you sometimes find that you have already read that issue, maybe even several times.

But you always get up again, go to the newsstand, and say your “one copy of Speedy.”

Scenario “Anything new since today at 08:35:15?”

Because you are tired of buying what you already have at home, you go to the newsstand and say: If there is a new issue of Speedy released since 08:35:15 this morning, give it to me.

The news vendor either nods and hands you the new issue or says “nothing yet.” In any case, you head back home, and duplicate issues stop piling up there.

Scenario “I already have the one with the striped cat”

Maybe you don’t remember times, but you have a visual memory and remember the picture drawn in the bottom right corner, which is different each time.

You go to the newsstand and say: One copy of Speedy, but I already have the one with the striped cat. The news vendor looks at the picture in the latest issue and either hands over an issue with a different picture or says “I only have the one with the striped cat” and you go home empty-handed.

Scenario “The clueless vendor”

Maybe you have an excellent memory for dates and times and also for pictures. But no matter what you say, the vendor always hands you a copy, regardless of your request. Either they are lazy, or they can’t distinguish the times or pictures themselves.

You start to feel like a “paper collector.”

HTTP GET Requests

It’s time to touch real HTTP.

If you want to try it yourself, here are instructions for setting up your own local web server.

Before using real HTTP requests, we need to prepare or find a web server serving some content. Let’s start in the simplest way, using Python with its standard http.server module.

Let’s create a data directory data-dir and put data.json, which we are going to serve later on, into it.

mkdir data-dir
cd data-dir
echo '{
  "published": "2024-10-31T18:06:02.123Z",
  "records": [
    {
      "alpha/v1": "megabytes of data"
    },
    {
      "beta/v1": "megabytes of data"
    }
  ]
}' > data.json

While in the data-dir directory, start a simple Python HTTP server on port 8080:

python3 -m http.server 8080

Throughout this article we use the http CLI (HTTPie), which provides a nice user experience (compared to curl).

The HTTPie project provides a desktop and a terminal version. We use the terminal one.

Installation is supported by many package managers (pip, brew, apt, choco, port, snap, yum, …).

The CLI signature is as simple as: http [METHOD] URL [REQUEST_ITEM ...].

The default GET method can be omitted.

To specify a request header, put it after the URL in the form http http://localhost:8080 "header-name: header-value"

To print request and response headers: http --print Hh http://localhost:8080

For more, see HTTPie CLI Usage

Simple HTTP GET (“paper-collector”)

Now use the http command to get the content.

http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:18:36 GMT
Last-Modified: Thu, 31 Oct 2024 19:15:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

{
    "published": "2024-10-31T18:06:02.123Z",
    "records": [
        {
            "alpha/v1": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}

Repeat the HTTP GET request as before:

http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:21:31 GMT
Last-Modified: Thu, 31 Oct 2024 19:15:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

{
    "published": "2024-10-31T18:06:02.123Z",
    "records": [
        {
            "alpha/v1": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}
Important

Carefully examine the headers and notice that:

  • almost all the headers are identical
  • only the Date: header changes; it reflects the time on the web server at the moment of our request
  • the Last-Modified: header stays unchanged and uses a somewhat old-fashioned format: Thu, 31 Oct 2024 19:15:47 GMT

Check the modification time of the file in our file system (forcing English language and the UTC time zone):

TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl  staff  166 Oct 31 19:15 data.json

You will find that the file modification time and the Last-Modified value are identical.

Many web servers (including the primitive one we use here) derive the Last-Modified time from the file modification time.
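How a server might turn a file modification time into a Last-Modified value can be sketched in Python. This is only an illustration under our own assumptions, not the actual http.server code; the function name last_modified_header is made up for this example.

```python
import email.utils
import os


def last_modified_header(path):
    """Format a file's modification time as an HTTP date for Last-Modified."""
    mtime = os.stat(path).st_mtime
    # HTTP dates carry whole seconds only and are always expressed in GMT.
    return email.utils.formatdate(int(mtime), usegmt=True)
```

Calling last_modified_header("data.json") returns a string in the same format as the header above, for example Thu, 31 Oct 2024 19:15:47 GMT.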

Let’s try to change it:

touch data.json

check the modification time:

TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl  staff  166 Oct 31 19:30 data.json

and finally try again the HTTP GET

http http://localhost:8080/data.json
HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:32:07 GMT
Last-Modified: Thu, 31 Oct 2024 19:30:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

{
    "published": "2024-10-31T18:06:02.123Z",
    "records": [
        {
            "alpha/v1": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}

Finally, we will use the option --print Hhb to instruct http to print the request headers (H) in addition to the response headers (h) and the response body (b).

http --print Hhb http://localhost:8080/data.json
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2

HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:36:10 GMT
Last-Modified: Thu, 31 Oct 2024 19:30:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

{
    "published": "2024-10-31T18:06:02.123Z",
    "records": [
        {
            "alpha/v1": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}

Here we can see the default request headers sent by http:

Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2

If-Modified-Since (“Anything new since …?”)

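Keep the server running, copy the literal value of the Last-Modified header, and use it as the value of the If-Modified-Since request header. (The recorded call below happens to use the Date value 19:36:10 rather than the Last-Modified value 19:30:35; any timestamp at or after Last-Modified gives the same result.)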

http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT"
GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
User-Agent: HTTPie/3.2.2
If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT

HTTP/1.0 304 Not Modified
Date: Thu, 31 Oct 2024 19:38:47 GMT
Server: SimpleHTTP/0.6 Python/3.13.0
Important

We can observe the following changes:

  • the request carries a new If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT header
  • the HTTP status code changed from 200 OK to 304 Not Modified
  • the body of the response is empty

Congratulations: we issued a conditional HTTP request using the If-Modified-Since header, carrying a timestamp taken from the previous response (normally the value of Last-Modified), and because the content had not changed since then, we got an “empty” response with HTTP status code 304 Not Modified.

You may repeat the same request a few times; as long as the file stays unchanged, the server will give the same response (only the Date header will change).
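The same conditional request can be issued from code. Below is a minimal sketch using only the Python standard library; the function name get_if_modified_since and its urlopen parameter (a hook that lets you try the function without a network) are our own inventions, not part of any library.

```python
import urllib.error
import urllib.request


def get_if_modified_since(url, last_modified=None, urlopen=urllib.request.urlopen):
    """Fetch url and return (status, headers, body); body is None on 304."""
    request = urllib.request.Request(url)
    if last_modified:
        # Echo the Last-Modified value back exactly as the server sent it.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urlopen(request) as response:
            return response.status, response.headers, response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # urllib raises for 304, but for us it simply means "nothing new".
            return 304, error.headers, None
        raise
```

A consumer would keep headers["Last-Modified"] from a 200 response and pass it as last_modified on the next call, for example get_if_modified_since("http://localhost:8080/data.json", "Thu, 31 Oct 2024 19:30:35 GMT").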

Now let’s change the modification time of the file by touching it:

touch data.json

check the modification time:

TZ=UTC LANG=C ls -l data.json
-rw-r--r--@ 1 javl  staff  166 Oct 31 19:44 data.json

and repeat the previous HTTP request:

http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT"

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:36:10 GMT
User-Agent: HTTPie/3.2.2

HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:46:47 GMT
Last-Modified: Thu, 31 Oct 2024 19:44:17 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

{
    "published": "2024-10-31T18:06:02.123Z",
    "records": [
        {
            "alpha/v1": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}

Notice that although the response body is still the same, the value of Last-Modified has changed, and this is why the conditional HTTP GET returned the body with HTTP status code 200 OK.

Of course, a typical update of our data changes the served content and the file modification time at the same time.

Let’s try a new If-Modified-Since request with the latest value of the Last-Modified header:

http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT"

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT
User-Agent: HTTPie/3.2.2

HTTP/1.0 304 Not Modified
Date: Thu, 31 Oct 2024 19:51:20 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

All as expected: no changes, no response body.

Now use your editor to change the content of the served file,

and repeat the previous conditional request:

http --print Hhb http://localhost:8080/data.json "If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT"

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-Modified-Since: Thu, 31 Oct 2024 19:44:17 GMT
User-Agent: HTTPie/3.2.2

HTTP/1.0 200 OK
Content-Length: 166
Content-type: application/json
Date: Thu, 31 Oct 2024 19:53:12 GMT
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: SimpleHTTP/0.6 Python/3.13.0

{
    "published": "2024-10-31T18:52:05.123Z",
    "records": [
        {
            "alpha/v2": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}
Important

To summarize the If-Modified-Since conditional HTTP request:

  • When the request declares in the If-Modified-Since header the Last-Modified value from an earlier ordinary HTTP GET response and the content has not been modified since, the server responds with HTTP status code 304 Not Modified and an empty response body.
  • Many web servers derive the value of Last-Modified from the file modification time, but code that generates a response dynamically can generate the Last-Modified header as well.
  • The format of the If-Modified-Since and Last-Modified headers is a bit old-fashioned, but this is how it goes. An ISO 8601 formatted datetime would not work.
  • Values of Last-Modified have a precision of whole seconds. For content updated more than once a second, the If-Modified-Since conditional request would miss some changes.
  • If someone changed the served content but preserved the Last-Modified time, the conditional request would not notice the change.
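The publisher side of this check can be sketched in a few lines of Python. This is a simplified illustration, not the code of any particular web server; the function name is_not_modified is made up, and treating a date without a time zone as UTC is our own assumption.

```python
import datetime
import email.utils


def is_not_modified(if_modified_since, last_modified):
    """Return True when 304 Not Modified is the appropriate answer.

    if_modified_since: raw request header value, or None when absent
    last_modified: timezone-aware datetime of the last content change
    """
    if not if_modified_since:
        return False
    try:
        since = email.utils.parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable date (ISO 8601, for example) is ignored: send the body.
        return False
    if since.tzinfo is None:
        # Assumption: a date without a zone is taken as UTC.
        since = since.replace(tzinfo=datetime.timezone.utc)
    # HTTP dates carry whole seconds only, so drop microseconds before comparing.
    return last_modified.replace(microsecond=0) <= since
```

The last line also shows why sub-second updates can be missed: two versions published within the same second are indistinguishable once the microseconds are dropped.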

If-None-Match (“I already have the one with the striped cat”)

The web server we set up previously (using python3 -m http.server) does not support ETag headers by default.

If you want to try it yourself, here are instructions for setting up your own local NGINX web server using docker and docker-compose.yml.

These instructions assume you have docker installed on your machine.

Let’s create a data directory data-dir and put data.json, which we are going to serve later on, into it. Note that this time we stay in the project root and do not change into data-dir.

mkdir data-dir
echo '{
  "published": "2024-10-31T18:06:02.123Z",
  "records": [
    {
      "alpha/v1": "megabytes of data"
    },
    {
      "beta/v1": "megabytes of data"
    }
  ]
}' > data-dir/data.json

Create docker-compose.yml:


echo '
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
    volumes:
      - ./data-dir:/usr/share/nginx/html:ro
    restart: always
' > docker-compose.yml

Finally start the web server using docker compose command:

docker compose up
[+] Running 1/0
  Container conditional-web-1  Created                                                                                                                                                                                                       0.0s
Attaching to web-1
web-1  | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
web-1  | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
web-1  | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
web-1  | 10-listen-on-ipv6-by-default.sh: info: IPv6 listen already enabled
web-1  | /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
web-1  | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
web-1  | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
web-1  | /docker-entrypoint.sh: Configuration complete; ready for start up
web-1  | 2024/10/31 21:38:40 [notice] 1#1: using the "epoll" event method
web-1  | 2024/10/31 21:38:40 [notice] 1#1: nginx/1.27.2
web-1  | 2024/10/31 21:38:40 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14)
web-1  | 2024/10/31 21:38:40 [notice] 1#1: OS: Linux 6.6.32-linuxkit
web-1  | 2024/10/31 21:38:40 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker processes
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 22
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 23
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 24
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 25
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 26
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 27
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 28
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 29
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 30
web-1  | 2024/10/31 21:38:40 [notice] 1#1: start worker process 31

Note that the first run will download the nginx docker image, which can take a few seconds.

The command above starts the NGINX server (serving the content of the data-dir folder), which runs until stopped by Ctrl-C. Until then, it logs the requests sent to it.

Important

The If-None-Match conditional HTTP request builds on a unique signature of the content, provided by means of the ETag response header.

The ETag value is some sort of content fingerprint (NGINX uses the last-modified time plus the content length, Apache uses the same and can also include the inode, but other values such as a serial number of the generated content or a SHA-256 digest may be used too).

Let’s use a plain HTTP GET request to make sure the ETag header is present.

http http://localhost:8080/data.json

HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 166
Content-Type: application/json
Date: Thu, 31 Oct 2024 21:44:20 GMT
ETag: "6723e003-a6"
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: nginx/1.27.2

{
    "published": "2024-10-31T18:52:05.123Z",
    "records": [
        {
            "alpha/v2": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}

Comparing the headers to what we have seen before with python3 -m http.server 8080, we see:

  • there are a few more headers: Accept-Ranges and Connection, which we can ignore, and ETag, which is essential for us
  • Last-Modified is present and works the same way
  • The value of ETag here has a starting and an ending quote. These are part of the string value, which often causes confusion. We will have to escape them carefully in CLI calls.

Let’s try to use the If-None-Match request header to form a conditional HTTP request. The header uses the value from the ETag response header.

Notice that we are escaping the " characters to keep them in.

http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723e003-a6\""

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723e003-a6"
User-Agent: HTTPie/3.2.2

HTTP/1.1 304 Not Modified
Connection: keep-alive
Date: Thu, 31 Oct 2024 21:53:57 GMT
ETag: "6723e003-a6"
Last-Modified: Thu, 31 Oct 2024 19:52:35 GMT
Server: nginx/1.27.2
Important

Carefully examine the headers and notice that:

  • most of the headers are the same as for the plain HTTP GET request
  • the ETag header stays the same and is identical to the value of the If-None-Match request header
  • the HTTP status code is 304 Not Modified
  • the response body is empty
  • the Date and Last-Modified: headers behave the same way as in the If-Modified-Since case described earlier

Now touch the data-dir/data.json file to change the file modification time:

touch data-dir/data.json

and try the request:

http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723e003-a6\""

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723e003-a6"
User-Agent: HTTPie/3.2.2

HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 166
Content-Type: application/json
Date: Thu, 31 Oct 2024 22:04:54 GMT
ETag: "6723ff03-a6"
Last-Modified: Thu, 31 Oct 2024 22:04:51 GMT
Server: nginx/1.27.2

{
    "published": "2024-10-31T18:52:05.123Z",
    "records": [
        {
            "alpha/v2": "megabytes of data"
        },
        {
            "beta/v1": "megabytes of data"
        }
    ]
}

I was surprised to get the response body returned with HTTP status code 200 OK.

Carefully comparing the If-None-Match request header with the ETag response header shows that the values are similar, but they differ.

It turns out that NGINX computes the ETag from the content length and also from the file modification time.

Note that this behaviour does not harm the usage of conditional HTTP requests with If-None-Match.
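The ETag values in these transcripts are consistent with the pattern "hexadecimal modification time, dash, hexadecimal content length". This is an observation drawn from the responses above, not a contract you should rely on, and ETags are meant to be treated as opaque. Purely as a curiosity, the following Python sketch decodes them (decode_nginx_etag is a made-up helper name):

```python
import datetime


def decode_nginx_etag(etag):
    """Split an ETag such as "6723e003-a6" into (modification time, length)."""
    mtime_hex, length_hex = etag.strip('"').split("-")
    modified = datetime.datetime.fromtimestamp(
        int(mtime_hex, 16), datetime.timezone.utc
    )
    return modified, int(length_hex, 16)


modified, length = decode_nginx_etag('"6723e003-a6"')
print(modified.isoformat(), length)  # 2024-10-31T19:52:35+00:00 166
```

The decoded time and length match the Last-Modified and Content-Length headers of the first NGINX response, which explains why touching the file alone was enough to change the ETag.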

An updated conditional request (with the current ETag value) works as expected:

http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723ff03-a6\""

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723ff03-a6"
User-Agent: HTTPie/3.2.2

HTTP/1.1 304 Not Modified
Connection: keep-alive
Date: Thu, 31 Oct 2024 22:12:39 GMT
ETag: "6723ff03-a6"
Last-Modified: Thu, 31 Oct 2024 22:04:51 GMT
Server: nginx/1.27.2

We can now edit the content of data.json and see whether a repeated conditional request provides the updated content:

http --print Hhb http://localhost:8080/data.json "If-None-Match: \"6723ff03-a6\""

GET /data.json HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:8080
If-None-Match: "6723ff03-a6"
User-Agent: HTTPie/3.2.2

HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: keep-alive
Content-Length: 165
Content-Type: application/json
Date: Thu, 31 Oct 2024 22:14:49 GMT
ETag: "67240153-a5"
Last-Modified: Thu, 31 Oct 2024 22:14:43 GMT
Server: nginx/1.27.2

{
    "published": "2024-10-31T21:14:08.123Z",
    "records": [
        {
            "gama/v1": "megabytes of data"
        },
        {
            "beta/v5": "megabytes of data"
        }
    ]
}

All works as expected.

Important

To summarize the If-None-Match conditional HTTP request:

  • When the request declares in the If-None-Match header the ETag value from an earlier ordinary HTTP GET response and the ETag has not changed since, the server responds with HTTP status code 304 Not Modified and an empty response body.
  • Many web servers assign the value of the ETag automatically, but code that generates a response dynamically can generate the value of the ETag header as well.
  • The ETag format often (but not necessarily) includes a leading and a trailing " as part of the header value. Be sure to include them properly in your request header value.
  • ETag values are more likely to detect multiple modifications happening within one second. Anyway, if you want to support such a scenario, do test it with your real web server.
  • If your web server or application generates the ETag value as a real fingerprint (e.g. using MD5, SHA-1 or SHA-256), a conditional request using the If-None-Match header works well in scenarios where scheduled processes create the content repeatedly (thus changing the modification time) but the content often ends up being identical.
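A fingerprint-based ETag as mentioned in the last point can be sketched as follows. This is a minimal illustration assuming the whole response body is available in memory; the function name content_etag is made up for this example.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Build a quoted ETag from a SHA-256 digest of the response body."""
    digest = hashlib.sha256(body).hexdigest()
    # The double quotes are part of the ETag value itself.
    return f'"{digest}"'
```

With such an ETag, a scheduled job that regenerates byte-identical content does not invalidate what consumers already have, because the digest depends only on the bytes and not on the modification time.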

Combined If-Modified-Since and If-None-Match

You may wonder what happens if you use both request headers.

RFC 7232, section 3.3, states:

A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field; the condition in If-None-Match is considered to be a more accurate replacement for the condition in If-Modified-Since, and the two are only combined for the sake of interoperating with older intermediaries that might not implement If-None-Match.

So according to this, a request combining both headers behaves the same as one with If-None-Match alone.
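The precedence rule can be sketched in Python as follows. This is a simplified illustration, not a complete implementation of the RFC: the function name conditional_status is made up, header names are looked up with exact spelling, and the list of ETags is split naively on commas.

```python
import datetime
import email.utils


def conditional_status(request_headers, etag, last_modified):
    """Return 304 or 200 for a conditional GET.

    request_headers: dict with exact header names as keys (a simplification)
    etag: current ETag value including its quotes
    last_modified: timezone-aware datetime of the last change
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match is present, so If-Modified-Since is not even looked at.
        offered = {tag.strip().removeprefix("W/") for tag in if_none_match.split(",")}
        return 304 if "*" in offered or etag.removeprefix("W/") in offered else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = email.utils.parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # an invalid date is ignored
        if since.tzinfo is None:
            since = since.replace(tzinfo=datetime.timezone.utc)
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

Note that a stale ETag combined with a recent If-Modified-Since date yields 200: the date is never consulted once If-None-Match is present.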

Support/Use Conditional HTTP Requests - be cool

Nowadays, the PULL exchange pattern is very popular due to the simplicity of its implementation on both the publisher and the consumer side.

The difference in numbers

A typical DATEX II SituationPublication snapshot might have a size of 1.5 MB (in gzipped form).

If a consumer wants to get updates as soon as possible, a frequency of one request every 5 seconds (thus 12 requests a minute) is common. The numbers below assume that new content is published once a minute, so 11 of the 12 conditional requests return an empty response.

                 unconditional   conditional
requests/min     12              12
empty rq/min     0               11
download/min     18 MB           1.5 MB
download/day     25.92 GB        2.16 GB
download/%       100%            8.3%
download/ratio   12              1

Apart from saving bandwidth, there is an even more important aspect, the processing power needed:

  • a consumer might attempt to process the fetched content 12 times a minute, or only once
  • a publisher dynamically generating the response for each request (not a recommended practice, but seen in some cases) will quickly run out of resources

If this is combined with allowing content to be fetched in uncompressed (not gzipped) form, the download size typically gets 10 times higher:

                 unconditional   conditional
requests/min     12              12
empty rq/min     0               11
download/min     180 MB          15 MB
download/day     259.2 GB        21.6 GB

These numbers must be multiplied by the number of consumers to see the real demand on connectivity.
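The arithmetic behind the tables is simple enough to reproduce in a few lines of Python. The function name daily_download_gb and the figure of 50 consumers are arbitrary choices for this illustration.

```python
def daily_download_gb(snapshot_mb, downloads_per_minute, consumers=1):
    """Daily download volume in GB (1 GB = 1000 MB, as in the tables above)."""
    minutes_per_day = 24 * 60
    return snapshot_mb * downloads_per_minute * minutes_per_day * consumers / 1000


# Gzipped 1.5 MB snapshot, polled 12 times a minute, changing once a minute.
print(round(daily_download_gb(1.5, 12), 2))  # 25.92 (unconditional)
print(round(daily_download_gb(1.5, 1), 2))   # 2.16 (conditional)
# The unconditional case for 50 consumers (an arbitrary example number).
print(round(daily_download_gb(1.5, 12, consumers=50), 2))  # 1296.0
```

With conditional requests only the one download per minute that carries new content counts; the 11 empty responses contribute practically nothing.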

Implementing conditional requests

The first prerequisite for implementing conditional requests is to understand them. This blog entry tries to help you with that.

The implementation at the publisher is mostly relatively simple. As the examples above show, even the default NGINX configuration serves the content conditionally out of the box. Most web servers can do the same with relatively simple configuration.

The implementation at the consumer is also relatively simple, but the challenge is to find the motivation for it. Once the consumer understands the advantages of conditional requests (e.g. 12 times fewer resources needed for processing), the motivation is found.
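On the consumer side, most of the work is remembering the validators from the last successful response and sending them back with the next request. A minimal sketch of that bookkeeping, independent of any particular HTTP client, might look like this (the class name FeedState and its methods are made up for this illustration):

```python
class FeedState:
    """Remembers the validators of the last successfully fetched version."""

    def __init__(self):
        self.etag = None
        self.last_modified = None

    def request_headers(self):
        """Headers that make the next GET request conditional."""
        headers = {}
        if self.etag:
            # Sent back verbatim, including the surrounding quotes.
            headers["If-None-Match"] = self.etag
        if self.last_modified:
            headers["If-Modified-Since"] = self.last_modified
        return headers

    def update(self, status, response_headers):
        """Store validators from a 200 response; return True if there is new content."""
        if status == 200:
            self.etag = response_headers.get("ETag")
            self.last_modified = response_headers.get("Last-Modified")
            return True
        # 304 (or any error) leaves the remembered validators untouched.
        return False
```

The consumer adds request_headers() to every poll and processes the body only when update() returns True.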

Publishers might also force consumers to use conditional requests, e.g. by introducing rate limiting for unconditional requests (or, even better, by counting the amount of data downloaded and setting limits on that).