HTML5 introduces new methods for enabling a web site or web application to function without a network connection. When you’re working on a mobile connection and your signal drops, or you just have no connection to the internet for whatever reason, having some level of access is better than nothing. In this article, we’ll look at how the application cache can store resources to be used by the browser when it’s offline, granting your users partial access to your web site or application.
The application cache is controlled by a plain text file called a manifest, which contains a list of resources to be stored for use when there is no network connectivity. The manifest can also define the conditions for caching, such as which resources should never be cached and what to show users when they follow a link to an uncached page.
If the user goes offline but has visited the site while online, the cached resources will be loaded so the user can still view the site in a limited form. By carefully considering the contents of your manifest file, you can offer a suitable web experience to a disconnected user.
The manifest file
Let’s start with an example of a full manifest file. (Don’t worry, I’ll explain it all in detail!)
CACHE MANIFEST
# This is a comment
CACHE:
/css/screen.css
/css/offline.css
/js/screen.js
/img/logo.png
http://example.com/css/styles.css
FALLBACK:
/ /offline.html
NETWORK:
*
Each directive is placed on a new line, with comments prefixed by a hash (#). The first line, CACHE MANIFEST, tells the browser that this is a manifest file. The uppercased lines with trailing colons are section headings.
There are three different sections in a manifest file:
- CACHE: A list of explicit URLs to request and store
- FALLBACK: What to do when an offline user attempts to access an uncached file
- NETWORK: Which resources are available only while online
Each section serves a specific purpose that you must understand in order to successfully and effectively cache your resources.
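To make the structure concrete, here is a minimal sketch of how a manifest could be split into its three sections. The parseManifest function is a hypothetical helper written for this article, not a browser API, and it deliberately ignores details such as resolving relative URLs; real browsers follow a stricter parsing algorithm.

```javascript
// Simplified sketch: parseManifest is a hypothetical helper, not a browser API.
function parseManifest(text) {
  const lines = text.split(/\r?\n/).map((line) => line.trim());
  if (lines[0] !== "CACHE MANIFEST") {
    throw new Error("A manifest must start with CACHE MANIFEST");
  }

  const sections = { CACHE: [], FALLBACK: [], NETWORK: [] };
  let current = "CACHE"; // entries before any heading belong to CACHE

  for (const line of lines.slice(1)) {
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments

    const heading = /^(CACHE|FALLBACK|NETWORK):$/.exec(line);
    if (heading) {
      current = heading[1];
    } else if (current === "FALLBACK") {
      // Two values per line: the prefix to match, then the replacement resource.
      const [match, fallback] = line.split(/\s+/);
      sections.FALLBACK.push({ match, fallback });
    } else {
      sections[current].push(line);
    }
  }
  return sections;
}

// The full example manifest from the start of this section.
const manifest = [
  "CACHE MANIFEST",
  "# This is a comment",
  "CACHE:",
  "/css/screen.css",
  "/css/offline.css",
  "/js/screen.js",
  "/img/logo.png",
  "http://example.com/css/styles.css",
  "FALLBACK:",
  "/ /offline.html",
  "NETWORK:",
  "*",
].join("\n");

const parsed = parseManifest(manifest);
console.log(parsed.CACHE.length); // 5
console.log(parsed.NETWORK);
```

Running this against the example manifest yields five CACHE entries, one FALLBACK pair, and the single NETWORK wildcard.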
CACHE
The CACHE section is considered the default: if no section heading has been defined, the browser will assume this is the CACHE section. Beneath this heading, you can list URIs to resources you want the browser to download and cache for offline use, including URIs hosted externally.
CACHE MANIFEST
/css/screen.css
/css/offline.css
/js/screen.js
/img/logo.png
http://example.com/css/widget.css
In this example, I’ve omitted the CACHE: heading to take advantage of the default behaviour. I have provided the browser with four paths relative to the root of the domain plus one external resource. When the browser downloads the cache manifest file, it will read these five resources, fetch them over HTTP, and store them for later use.
Every single resource that you want to cache explicitly should be listed here, right down to the last image. The browser is not aware of a resource unless you provide the full path to it. This means you can’t use wildcards. If you list /images/* as a resource, the browser will request that literal URI, just as if you had typed it into your address bar.
But don’t run off and shove URIs for every single page on your site into your manifest! When a user visits a page that points to the manifest file, that page will also be cached. This means that if you want to allow users access to pages they’ve already viewed, just make those pages point to the manifest file and the browser will cache them appropriately.
Now let’s tell the browser what to do with uncached resources.
FALLBACK
The FALLBACK section tells the browser what to serve when the user tries to access an uncached resource while offline. Because of this, it looks a bit different than CACHE and NETWORK. It contains two values per line, separated by a space. The first value is the request URI to match, and the second is the resource sent upon matching. It caches the resource on the right for offline use, so this should be an explicit path.
Lost? Take a look at this example:
CACHE MANIFEST
FALLBACK:
/status.html /offline.html
On the line below the FALLBACK: heading, we have the URI “/status.html” followed by a second URI, “/offline.html”. We’re telling the browser that when an offline user requests a URI matching “/status.html”, it should instead serve the cached file “offline.html”.
However, the FALLBACK section can be far more powerful:
CACHE MANIFEST
FALLBACK:
/ /offline.html
In this example, I’ve dropped “status.html” and simply provided “/” as the request URI to match. Now when an offline user requests a resource that matches “/”, they will be served “offline.html” in its place. So if the user clicked on a link for “/status.html”, “/about.html”, or even “/my/nested/resource.html”, the browser would match the “/” at the start and serve up “offline.html”. Since I’ve used the root path, every uncached resource under this domain will point to “offline.html”.
Errata 23 June 2011: this article has been corrected, as you can’t use a wildcard with the FALLBACK or NETWORK namespaces, though you can use the asterisk symbol under NETWORK, where it’s a special flag indicating that all URLs should be whitelisted.
Unlike the CACHE section, both the FALLBACK and NETWORK namespaces support a prefix rule that aids their URI matching. For example, if a request is made while offline for any asset under the /images/avatars/ directory and that asset is unavailable, the browser can serve up an alternative.
CACHE MANIFEST
FALLBACK:
/images/avatars/ /offline_avatar.png
The line under the FALLBACK: heading tells the browser to serve “/offline_avatar.png” in place of any uncached user-uploaded avatar under “/images/avatars/”.
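The prefix matching described above can be sketched in a few lines. The resolveFallback helper is hypothetical, and so is the avatar filename. Combining the avatar rule with the catch-all “/” rule in one list is my own illustration. I’ve also assumed that when several prefixes match, the longest one wins, so treat this as a rough model rather than the browser’s exact behaviour.

```javascript
// Hypothetical helper: picks the fallback resource for an uncached path.
// Assumption: the longest matching prefix takes priority.
function resolveFallback(path, fallbackEntries) {
  let best = null;
  for (const entry of fallbackEntries) {
    const isPrefix = path.startsWith(entry.match);
    if (isPrefix && (best === null || entry.match.length > best.match.length)) {
      best = entry;
    }
  }
  return best ? best.fallback : null; // null means no fallback applies
}

const fallbackEntries = [
  { match: "/images/avatars/", fallback: "/offline_avatar.png" },
  { match: "/", fallback: "/offline.html" },
];

// "/images/avatars/remy.png" is an invented path used only for illustration.
console.log(resolveFallback("/images/avatars/remy.png", fallbackEntries)); // "/offline_avatar.png"
console.log(resolveFallback("/my/nested/resource.html", fallbackEntries)); // "/offline.html"
```

With both rules present, avatar requests get the placeholder image while every other uncached path falls through to “/offline.html”.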
Remember when I said that any document referencing the manifest will also be cached? Well, you can use this to your advantage! You can cache each page the user visits while online so that they will have access to that page while offline. Then anything they didn’t view will be caught by the FALLBACK section. This keeps you from explicitly stating you want all your pages cached, and, more importantly, avoids the huge performance penalty of serving all the resources you want cached every time someone first visits your site.
NETWORK
Finally, we have the NETWORK section, used to tell the browser explicitly which resources are only available while online. Typically, this section contains just the asterisk (*) symbol, meaning all resources that are not cached will require a connection. Alternatively, we can whitelist specific URL prefixes, such as the path to all the avatars.
CACHE MANIFEST
NETWORK:
*
You can explicitly define resources not to cache by providing a list of URIs, essentially a whitelist of online-only assets.
CACHE MANIFEST
NETWORK:
register.php
login.php
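As a rough illustration of how the whitelist is consulted, the sketch below checks a path against a list of NETWORK entries. The requiresNetwork helper is hypothetical, and I’ve assumed the entries have already been resolved to root-relative paths (so “register.php” becomes “/register.php”).

```javascript
// Hypothetical helper: is this path covered by the NETWORK whitelist?
// Assumption: entries are already resolved to root-relative paths.
function requiresNetwork(path, networkEntries) {
  return networkEntries.some(
    (entry) => entry === "*" || path.startsWith(entry) // wildcard flag or prefix match
  );
}

const onlineOnly = ["/register.php", "/login.php"];

console.log(requiresNetwork("/login.php", onlineOnly));  // true
console.log(requiresNetwork("/about.html", onlineOnly)); // false
console.log(requiresNetwork("/about.html", ["*"]));      // true
```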
Serving the manifest
You can reference a manifest file on a web page by adding the manifest attribute to your opening <html> tag. The browser will only cache pages that include this attribute (in addition to those specified in the manifest itself, though in that instance, the user would have to visit a page including the manifest in order for the browser to be aware of it).
<!DOCTYPE html>
<html lang="en" manifest="/offline.appcache">
<!-- your HTML document -->
</html>
The linked file should also be served with a MIME type of text/cache-manifest. If you’re using Apache as your web server, add this to your .htaccess file:
AddType text/cache-manifest .appcache
And there you have it! Supporting browsers will retrieve the manifest file and cache each item on the list for offline use. Won’t your parents be proud?
Triggering a cache refresh
Once a cache has been successfully downloaded, the browser will retain those assets until either the user clears the cache or you trigger an update. Triggering an update with your manifest file requires that the contents of that file change, not just the assets themselves.
Updating the assets on your server will not trigger a cache update. You must modify the manifest file.
If you’re adding or removing resources completely, you’ll have to edit your manifest file anyway. But what if you’re just amending an already cached stylesheet?
This is where comments come in handy. Just throw in a simple version number comment that you change when you want to trigger an update:
CACHE MANIFEST
# Version 9
CACHE:
/css/screen.css
The next time you want to trigger a cache refresh, just increment the version number. When the user next visits the online version of a page including this manifest, the browser will re-download the manifest file, notice the change, download the listed assets, and purge the existing cache.
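If you generate your manifest as part of a build step, the version comment can be stamped in automatically. Here is a minimal sketch; buildManifest and its option names are hypothetical, and where the version number comes from (a counter, a timestamp, a content hash) is up to you.

```javascript
// Hypothetical helper: assembles manifest text with a version comment.
// Changing `version` changes the file's contents, which is what prompts
// the browser to refresh its cache.
function buildManifest({ version, cache = [], fallback = [], network = ["*"] }) {
  const lines = ["CACHE MANIFEST", `# Version ${version}`];
  if (cache.length > 0) {
    lines.push("CACHE:", ...cache);
  }
  if (fallback.length > 0) {
    // Each fallback is a [prefix, replacement] pair written on one line.
    lines.push("FALLBACK:", ...fallback.map(([match, resource]) => `${match} ${resource}`));
  }
  if (network.length > 0) {
    lines.push("NETWORK:", ...network);
  }
  return lines.join("\n") + "\n";
}

const v9 = buildManifest({ version: 9, cache: ["/css/screen.css"] });
const v10 = buildManifest({ version: 10, cache: ["/css/screen.css"] });

console.log(v9);
console.log(v9 === v10); // false
```

The two outputs list the same assets but differ in the comment line, which is enough for the browser to treat the manifest as changed.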
Browser bug: Firefox caches the manifest file itself and will not update it even if the manifest has changed on the server. With some server config wizardry, you can tell browsers that the cache of the manifest file is instantly invalidated and should be requested from the server every time it’s referenced. Add this to your .htaccess to put Firefox in its place:
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType text/cache-manifest "access plus 0 seconds"
</IfModule>
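If you’re not on Apache, the same two ideas (the right MIME type, plus telling browsers not to hold on to the manifest) come down to a pair of response headers. The sketch below is one way to express that; manifestHeaders is a hypothetical helper, and using Cache-Control: max-age=0 as a stand-in for the “access plus 0 seconds” rule is my assumption, so check it against your own server’s behaviour.

```javascript
// Hypothetical helper: response headers for manifest files.
// Returns null for any path that isn't a .appcache manifest.
function manifestHeaders(pathname) {
  if (!pathname.endsWith(".appcache")) return null;
  return {
    "Content-Type": "text/cache-manifest", // the MIME type browsers expect
    "Cache-Control": "max-age=0",          // assumption: forces revalidation on each request
  };
}

console.log(manifestHeaders("/offline.appcache"));
console.log(manifestHeaders("/css/screen.css")); // null
```

In a Node server, for instance, the returned object could be passed along when writing the response for the manifest URL.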
Conclusion
The application cache is a powerful beast, and to tame it you need to be clear on what’s involved. Give thought to your CACHE, FALLBACK, and NETWORK sections to provide a suitable offline experience to your users.
In a future article, we’ll show you how to use the applicationCache JavaScript object to manipulate the cache. Until then, this should be enough to get you started on the path to offline web content.
You can see a live demo using the application cache over on Doctor Remy’s HTML5 Demos. Happy caching!
Go offline with application cache originally appeared on HTML5 Doctor on January 25, 2011.