Analysing initial load bundle sizes using Lighthouse

Even with fast internet speeds, one key lever for improving initial page load times of websites or web apps is reducing the amount of data transmitted over the wire. While you can optimize all kinds of assets, like styles and images, I want to focus on the size of the JavaScript bundles. Code size affects not only the download duration, but also parsing and JavaScript execution. This can be measured as Total Blocking Time (TBT) or First Input Delay (FID), metrics suggested by Google as part of the Web Vitals to measure page performance.

Often you end up with more code in the bundles than your app needs on initial load. Some common cases are:

  • Parts of your app that aren’t used on initial load and would be better loaded lazily.
  • Parts that you expect to be lazy loaded but are still part of the initial load because lazy loading is not implemented correctly (see the sketch after this list).
  • Tree shaking is not performed, and unused modules are included.
  • Backend-only modules that are included accidentally, especially in codebases that contain both client and server code.
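
To make the second case concrete, here is a minimal sketch, with a hypothetical HeavyEditor module, of how a single stray static import silently defeats lazy loading:

```js
// Anti-pattern: this static import pulls HeavyEditor into the initial
// bundle, so the bundler can no longer split it into a separate chunk.
// (HeavyEditor and its path are hypothetical, for illustration.)
import { HeavyEditor } from './HeavyEditor';

// A harmless-looking direct reference like this is enough to keep the
// whole module in the entry chunk.
export function isEditor(value) {
  return value instanceof HeavyEditor;
}

export async function openEditor() {
  // On its own, this dynamic import would create a separate chunk that
  // is only downloaded when openEditor() is called. The static import
  // above defeats that: the code ships in the initial bundle anyway.
  const mod = await import('./HeavyEditor');
  return new mod.HeavyEditor();
}
```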

From a code perspective, these issues are often not easy to spot, so you need tools to debug which modules are actually downloaded. Luckily, there are several tools available. Tools like Webpack Bundle Analyzer or the @next/bundle-analyzer package for Next.js require changes to the Webpack config inside the build process. This can be cumbersome, especially if you quickly want to verify performance in your production deployment.
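
For example, enabling @next/bundle-analyzer means wrapping your Next.js config, roughly like this:

```js
// next.config.js
// Wires the analyzer into the Webpack build; it only runs when the
// ANALYZE environment variable is set.
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  enabled: process.env.ANALYZE === 'true',
});

module.exports = withBundleAnalyzer({
  // ...your existing Next.js config...
});
```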

Recently Lighthouse — our favorite auditing tool for websites — got an update that introduced a way to display loaded bundles:

If you just want to check your production deployment, Lighthouse has you covered. Once you run an audit, it displays the loaded bundles in the performance section. Clicking View Treemap opens a new tab with a viewer for the bundles. Here you can see which modules are loaded initially and start your analysis. By blending in coverage data collected during the audit, it also displays unused modules and how much of each module is actually used. This makes it easy to spot candidates for lazy loading.

Lighthouse "View Treemap"
Use 'View Treemap' in Lighthouse to open the bundle explorer.

You get the best results if you include source maps in the build; Lighthouse even has an audit for that. I noticed that some source map types don’t work (like eval-cheap-module-source-map from Webpack), so make sure you generate full source maps.
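
If you build with Webpack, that means choosing a devtool value that emits standalone map files, for example:

```js
// webpack.config.js
module.exports = {
  mode: 'production',
  // 'source-map' emits full, standalone .map files that Lighthouse can
  // consume; eval-style values like 'eval-cheap-module-source-map'
  // embed the maps in eval() calls and won't work here.
  devtool: 'source-map',
  // ...rest of your config...
};
```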

Applying it to Backstage

Let’s apply this to a real-world example. Backstage has a lot of plugins that aren’t used on initial load, so the bundle size can become quite large. The folks at Backstage recommend lazy loading for plugins. As mentioned above, lazy loading is the solution, but it’s easy to make mistakes that keep your code from actually being lazy loaded.

I started by measuring the baseline: the Total Blocking Time on initial load of the example Backstage app is 930 ms, which leaves plenty of room for improvement. The initial download size is 8.1 MiB, so let’s dive deeper into the analysis.

Treemap before optimization
Lighthouse Treemap before optimization. Note the modules swagger-ui-react and asyncapi/react-component.

In the Lighthouse treemap you can see that Swagger UI, as well as the AsyncAPI and GraphQL packages, are part of the initial bundles. With a bit of background knowledge, you know that these packages are used in the API Docs plugin, but you could also find them with a quick search over the codebase. Getting rid of them promises a potential saving of roughly 25% in code size.

The good news is that the API Docs plugin already uses lazy loading, yet it doesn’t seem to take effect. Due to the way custom API widgets are registered, the components are referenced in the API provided by the plugin. In Backstage, plugin APIs are loaded on initial load, and therefore our components and all their dependencies are included as well. So, the trick is to refactor the API widgets to load lazily by using React.lazy internally. The GraphQL widget already uses lazy loading, but it still references big packages and CSS directly, which bloats the initial bundle; the same refactoring helped to exclude these modules, too.
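
Here is a minimal sketch of that refactoring (names and props are hypothetical, not the actual Backstage API): the plugin still registers a widget component eagerly, but the component itself is just a thin React.lazy wrapper.

```tsx
import React, { Suspense } from 'react';

// The heavy implementation (which drags in packages like
// swagger-ui-react) lives in its own module and is only fetched when
// the widget actually renders.
const LazySwaggerUiWidget = React.lazy(() =>
  import('./SwaggerUiWidget').then(m => ({ default: m.SwaggerUiWidget })),
);

// This thin wrapper is what gets registered with the plugin API.
export const SwaggerUiWidget = (props: { definition: string }) => (
  <Suspense fallback={<span>Loading API definition…</span>}>
    <LazySwaggerUiWidget {...props} />
  </Suspense>
);
```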

This was quite easy. It becomes more complicated with packages that aren’t referenced directly in your codebase. Take react-virtualized as an example: a quick search doesn’t reveal any uses of the package, but it’s still included in the initial bundle, so it’s probably an indirect dependency. Running yarn why react-virtualized shows which packages pull it in; in this case it’s react-lazylog. It often also helps to look at the source code of those packages to understand how, and in which situations, the indirect dependencies are used. react-lazylog is used in multiple places in Backstage, some of which already lazy-load it, but there are still some direct uses. Using lazy loading everywhere solves this problem. Keep in mind that lazy loading is no silver bullet: it just moves the cost to a later point in time and can make the app feel slow during use. You have to experiment and decide where it provides a benefit and where it doesn’t.
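
Converting one of the remaining direct uses looks roughly like this (a sketch; the LogViewer component and its props are hypothetical):

```tsx
import React, { Suspense } from 'react';

// Instead of a direct import that pulls react-lazylog (and, through it,
// react-virtualized) into the initial bundle:
//
//   import { LazyLog } from 'react-lazylog';
//
// defer loading until the log viewer is actually rendered. React.lazy
// expects a default export, so the named export is mapped onto it.
const LazyLog = React.lazy(() =>
  import('react-lazylog').then(m => ({ default: m.LazyLog })),
);

// The chunk containing react-lazylog is only fetched once this
// component mounts.
export const LogViewer = (props: { url: string }) => (
  <Suspense fallback={<span>Loading logs…</span>}>
    <LazyLog url={props.url} />
  </Suspense>
);
```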

It’s time to retest our app with Lighthouse and compare the results with our baseline. The Total Blocking Time dropped from 930 ms to 580 ms, nice! The modules we tried to exclude are all gone, and the initial bundle size decreased by 30% 🎉. Let’s create a pull request for it.

Treemap after optimization
Lighthouse Treemap after applying the optimizations; all target modules are gone from the initial load.

Looking at the treemap, there is still some potential left for a future pull request. Excluding everything related to syntax highlighting in code blocks could save us another 30–40%, which would be a big win.

The treemap feature in Lighthouse is pretty useful, but there is still room for improvement. In the future, the view could show the dependencies between modules to make it easier to analyze why a module is loaded; for now, that requires either knowledge of the codebase or some research.

Doing this kind of optimization once is of limited value. Careless changes to the code can bring the problem back, and new packages can introduce new problems. How can we make sure we don’t run into regressions? Probably the best way is to include Lighthouse in your CI pipeline, running it on every commit. It even supports setting budgets to check against, so you get notified once your initial bundle size grows in a way you don’t expect. I’ve had good experiences with the Lighthouse CI Action for GitHub Actions, which includes this process in your CI, so you can preserve your improvements and stay safe from slow loading times.
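
As a starting point, a budget.json like the following fails the audit once the script size exceeds the limit. Sizes are in kilobytes, and the numbers here are placeholders you would tune to your own app:

```json
[
  {
    "path": "/*",
    "resourceSizes": [
      { "resourceType": "script", "budget": 300 },
      { "resourceType": "total", "budget": 1000 }
    ]
  }
]
```

The Lighthouse CI Action can pick this file up via its budgetPath input (check the action’s documentation for the exact setup), so the budget is enforced on every commit.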