středa 8. srpna 2018

Cache-friendly secure connection for websites

While commenting Eric Meyer's article about issues that HTTPS bring to Africans, I found that this should be probably also posted as an article. I am discussing how to allow better caching while keeping a reasonable level of security brought by HTTPS.

We still need secure connection, even for public static sites

I still believe we should have a secure communication everywhere; I am not 100% sure if this should be the current HTTPS.

We need secure connection even for public static sites. The reason #1 is not encryption, it is authentication. We do not want infected routers / people with Wi-Fi Pineapple / malicious to ISPs / etc. to modify webpages we see. Without some kind of secure connection, they could for example inject some cryptominers or advertisments or malware. They could also modify the content of static pages to instruct people to do something dangerous, e.g., modify recommended amount of some chemicals.

Do we always want TLS?

The way we secure our communication does not have to be today's HTTPS, though. Encryption is needed just sometimes. On public static sites, it can kind of obscure what are you looking at (e.g., attacker sees you are looking at Wikipedia, but it is not clear what page), but traffic volume analysis can often distinguish between specific pages.

How to make it better?

Let's look at some options to make it better. There will be some tradeoffs to privacy, but we will not let attackers to affect traffic in an arbitrary way, as plain HTTP would allow. Thus, we would not make the user more prone to downgrade attacks than with today's HTTPS. Our main point is allowing the caches doing their jobs, maybe a better one than with the current state of the art HTTP caches can do.

Mixed content secured by SRI

First, we could sometimes achieve a reasonable level of security even with a plain HTTP. We could have loaded some images, stylesheets and even scripts over a plain HTTP, provided they are protected by subresource integrity (SRI). I have wondered why browsers consider even SRI-protected resources as a mixed content. They are protected against modification and they do not necessarily contain anything sensitive. I don't much need to hide the fact I am downloading jQuery 1.8.1… (Today, such change in browsers can be a bit more complex if it has to be compatible with older browsers with a more strict mixed content policy. It would ideally bring something like allowplain atributte, allowing usage of plain HTTP instead of HTTPS.)

Shared cache based on hashes

With SRI, we could go a bit further. Where explicitly approved by some extra header, the browser could just match the hash for caching purposes, even if it has not ever downloaded the specific URL. As a result, we would not needlessly download dozens of exactly same copies of jQuery or Bootstrap. We could download it just once and then use the cache. While this could serve as some minor side channel that reveals information what files are already in your cache, explicit approval through some header can make it a non-issue.

Serving signed responses from caching proxy

We could also have some caches of some signed (but probably unencrypted) data. This however goes with some privacy tradeoff and new protocol to implement, but it does not give up data authentication. A cache server could return some data with expiration time and signature, even without contacting the upstream server. This is quite more complex, but still technically feasible. We cannot use TLS at this point, because TLS serves for transport layer, which we would like to intercept. The handshake could however start as a standard TLS handshake and continue with a different protocol:

Client: ClientHello, I am trying to connect through TLS to host example.com, there are my capabilities (ciphersuites). I am able to use caching proxy instead of standard TLS.
Caching proxy: Hey, I have some content for this server cached. See my non-expired approval from the server, signed by the private key of certificate holder. I am allowed to serve you some of the requests. Plus there is the OCSP response, so you know the server's certificate is not revoked. You see, the private key holder indicates there is nothing sensitive in the URL, you can send it to me.
Client: OK, there is the full URL: htttps://example.com/contact
Caching proxy: OK, there are the data authenticated by the server.

If client or server does not support such a feature, either just because it is not implemented or because they don't want this for a reason, no other party can force the communication to go this way instead of standard TLS.

  • Website owner agreement is needed: If the proxy does not have a signed and non-expired approval, it cannot force the client to reveal the full URL.
  • If the browser chooses not to use this way (e.g., because of user's decision), it can insist on a standard TLS handshake.
  • Standard TLS handshake can ne required for some blacklisted URLs (e.g., /api/*), POST requests or if some specific cookie is present. Those exceptions could be described in the initial approval.

Cache-friendly version?

I am, however, generally against making special cache-friendly sites, similar to past “wap” or “mobile” versions. If they have a different URL, it gets tricky to handle links. When I click a link from elsewhere, it does not necessarily point to the version I want. Also, force website owner not to use HSTS, which is probably not what we want.

Challenges

  • UX issues: Maybe just some users will want such tradeoff, while some others will not. How to allow both of them making an informed decision?
  • None of those suggestions is enough reviewed by others. Furthermore, description of signed caches is quite vague to properly review, because I have prefered to be concise. While I have some security and crypto background, I don't think this should be implemented without any review.
  • This would require multiple parties to implement it. All the ideas require some change in browser and the website. The last one also requires important modification of the webserver and proxy. But incentives to implement this can be quite low for most people with fast Internet connection. On the other hand, the SRI enhancements are not so hard (i.e., they are much easier than extending HTTPS to some TLS alternative) and can be useful even in Europe / America on mobile connections, despite there is no proxy that can speed up loading.
  • Any change in browsers is likely irelevant for people with Windows XP or something similar. On the other hand, they could be welcome anyway if their usage don't break anything.