Service Workers at Scale, Part I: CDNs and Multiple Origins

By Erik Jung

Published on March 30th, 2016

We recently got the opportunity to develop a service worker for use on Smashing Magazine, which we’ll write about in more detail soon. This is the first of a multi-part article that will examine some of the solutions we arrived at and lessons we learned.

One of the challenges we’ve encountered while developing this worker is strategic caching for external resources. Most of the service worker examples we’ve seen deal with requests for local resources on the same domain. But pages served by complex sites often need to fetch (and cache) assets from external domains as well. Things can get complicated when these various domains need special handling, so we need a manageable way to structure our service worker to evaluate requests for both local and external resources.

The host of our service worker happens to be a WordPress site with assets requested from both the local server and CDNs. We need to handle fetch events differently depending on which domain or subdirectory they relate to. Some CDN requests need to be routed to cache functions, and some local requests (such as those for pages within the admin area) need to be ignored altogether. To achieve this behavior, we need some way to designate rules for different URL origins.

One approach uses regular expressions. We need to match the URLs of fetch event requests against a set of base URLs, and using RegExp for this makes sense. In practice, this method works well initially, but maintenance becomes more of a concern as the patterns get more complex.

Take for example an entry to match “content pages”:

const CACHE_PATTERNS = {
  // ...
  contentPage: new RegExp(
    `^${self.location.origin}/((?!wp-admin).*)$`
  ),
  // ...
};

This will match any local URLs except for ones with wp-admin following the first slash. It’s not a complicated pattern as-is, but what about when we need to add another subdirectory exception? Is there another approach that will cater more to maintainability and comprehension?
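To make the maintenance concern concrete, here is a rough sketch of the same pattern after a second exclusion is added. The `example.com` origin and the sample paths are assumptions for illustration only; in the worker, the origin would come from `self.location.origin`.

```javascript
// Hypothetical origin standing in for self.location.origin.
const origin = 'https://example.com';

// The content page pattern with a second excluded subdirectory.
const contentPage = new RegExp(
  `^${origin}/((?!wp-admin|search-results).*)$`
);

const matches = [
  '/about/',
  '/wp-admin/options.php',
  '/search-results/?q=cache',
  '/wp-administrator-guide/',
].map((path) => contentPage.test(origin + path));
// matches is [true, false, false, false]: the last path is excluded too,
// because the lookahead only checks that the path starts with "wp-admin".
```

Each new exception lengthens the lookahead, and the prefix behavior shown by the last sample path is easy to overlook when reading the pattern.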

Let’s trade the great power and responsibility of regular expressions for a more explicit way to classify URLs. Using a Map to represent the base URLs whose fetch events we want to handle, we can “flag” some entries as subdirectories to ignore:

const CACHE_FLAGS = {
  ignore: -1
};

const URL_MAP = new Map([
  ['/'],
  ['/wp-admin/', CACHE_FLAGS.ignore],
  ['/search-results/', CACHE_FLAGS.ignore],
  ['http://1.gravatar.com/'],
  ['https://media-mediatemple.netdata-ssl.com/wp-content/'],
]);

It’s more verbose, but it provides a clear interface for maintaining the list of base URLs we want to act on.

From this core list, we can derive more specific ones. Each of the derived lists consists of URL instances:

const URLS_TO_IGNORE = [];
const URLS_TO_HANDLE = [];

URL_MAP.forEach(function (flag, baseUrl) {
  const url = new URL(baseUrl, self.location);
  if (flag === CACHE_FLAGS.ignore) {
    URLS_TO_IGNORE.push(url);
  } else {
    URLS_TO_HANDLE.push(url);
  }
});
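Two details of the loop above are easy to miss: `Map.prototype.forEach` passes the value before the key, and the `URL` constructor only uses its base argument when the first argument is relative. A small sketch, assuming a hypothetical worker location of `https://example.com/sw.js`:

```javascript
// Hypothetical worker location; in a real worker this would be self.location.
const workerLocation = 'https://example.com/sw.js';

const sample = new Map([
  ['/wp-admin/', -1],          // flagged entry: value is -1
  ['http://1.gravatar.com/'],  // unflagged entry: value is undefined
]);

const resolved = [];
// Map.prototype.forEach passes (value, key), hence (flag, baseUrl).
sample.forEach((flag, baseUrl) => {
  const url = new URL(baseUrl, workerLocation);
  resolved.push({ href: url.href, flag });
});
// resolved[0] is { href: 'https://example.com/wp-admin/', flag: -1 }
// resolved[1] is { href: 'http://1.gravatar.com/', flag: undefined }
```

The relative entry is resolved against the worker's origin, while the absolute CDN entry keeps its own origin and ignores the base.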

With our subjects of interest stored as URL instances, we can then use properties like origin and pathname in our logic:


function isIgnoredUrl (url) {
  // URL minus the query string and hash
  const urlBase = `${url.origin + url.pathname}/`;
  const isMatch = ({ origin, pathname }) =>
    urlBase.startsWith(origin + pathname);

  return URLS_TO_IGNORE.some(isMatch) // flagged "ignore"
    || !URLS_TO_HANDLE.some(isMatch); // altogether unlisted
}
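To see how the helper classifies individual requests, the following sketch reproduces the same logic outside a worker. It assumes a hypothetical site at `https://example.com/` and a shortened list of entries; the names `BASE`, `toIgnore`, `toHandle`, and `isIgnored` are illustrative only.

```javascript
// Hypothetical origin standing in for self.location in a worker.
const BASE = 'https://example.com/';
const IGNORE = -1;

const entries = [
  ['/'],
  ['/wp-admin/', IGNORE],
  ['http://1.gravatar.com/'],
];

const toIgnore = [];
const toHandle = [];
for (const [baseUrl, flag] of entries) {
  (flag === IGNORE ? toIgnore : toHandle).push(new URL(baseUrl, BASE));
}

function isIgnored(href) {
  const url = new URL(href);
  // The trailing slash lets "/wp-admin" match the "/wp-admin/" entry.
  const urlBase = `${url.origin + url.pathname}/`;
  const isMatch = ({ origin, pathname }) =>
    urlBase.startsWith(origin + pathname);
  return toIgnore.some(isMatch) || !toHandle.some(isMatch);
}

isIgnored('https://example.com/2016/03/post/?utm=1'); // false: under "/"
isIgnored('https://example.com/wp-admin/edit.php');   // true: flagged
isIgnored('https://example.com/wp-admin');            // true: flagged
isIgnored('https://fonts.example.net/a.woff');        // true: unlisted origin
isIgnored('http://1.gravatar.com/avatar/abc');        // false: listed CDN
```

Note that the ignore list is checked first, so a flagged subdirectory wins even though the broader `/` entry also matches it.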

With our URL classifications and helper functions in place, the fetch event handler can use them to decide if it needs to intercept a request:


self.addEventListener('fetch', function (event) {
  const request = event.request;
  const url = new URL(request.url);
  const isGet = request.method === 'GET';

  // Requests we don't care to handle
  if (!isGet || isIgnoredUrl(url)) {
    event.respondWith(fetch(request));
    return;
  }

  // Requests we do care to handle
  event.respondWith(
    // Additional cache and/or fetch strategy
  );
});
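The strategy passed to `respondWith` is left open above. As one possible illustration, and not necessarily the approach used on this project, here is a minimal cache-first sketch. The cache name `static-v1` and the function name `cacheFirst` are assumptions, and the extra parameters exist only so the function can be exercised outside a worker.

```javascript
// Hypothetical cache name; not taken from the article.
const CACHE_NAME = 'static-v1';

// Cache-first: answer from the cache when possible, otherwise fetch
// from the network and store a copy for next time.
// `cacheStorage` and `fetchFn` default to the worker globals.
async function cacheFirst(request, cacheStorage = caches, fetchFn = fetch) {
  const cache = await cacheStorage.open(CACHE_NAME);
  const cached = await cache.match(request);
  if (cached) {
    return cached;
  }

  const response = await fetchFn(request);
  // Only successful responses are stored. Opaque (no-cors) responses
  // report ok === false, so this sketch does not cache them.
  if (response.ok) {
    await cache.put(request, response.clone());
  }
  return response;
}
```

Inside the handler, this could be wired up as `event.respondWith(cacheFirst(request))`. Whether to cache opaque responses from CDNs is a separate decision that this sketch deliberately avoids.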

In part two, we’ll take a look at another unique challenge this project presented: serving URL-aware offline fallbacks.