Images, Fonts, and Third-Party Scripts: LCP and CLS Killers

Images, fonts, and third-party scripts are the three categories responsible for the most field LCP and CLS regressions. They interact: a 12KB GTM tag firing synchronously in <head> blocks the parser for 200ms, which means your hero image (already the LCP candidate) now starts loading at least 200ms later. On slow connections the knock-on delay is far larger: an LCP of 1.8s in the lab can become 4.2s in the field. And because CrUX aggregates a rolling 28-day window, the regression only shows up fully in the data weeks later.

What this covers: fetchpriority and <link rel="preload"> for LCP images, WebP vs AVIF trade-offs, srcset/sizes resolution selection, font-display values and what each costs, font subsetting, and the facade pattern that eliminates third-party script cost entirely until user interaction.

Images and LCP: Why Your Hero Image Is Almost Always the Culprit

Largest Contentful Paint (LCP) measures when the largest visible element in the viewport finishes rendering. In most marketing sites, e-commerce pages, and landing pages, that element is an image: specifically the hero image above the fold.

Browsers load images eagerly by default, but templates, CMS plugins, and component libraries commonly put loading="lazy" on every image, which defers the fetch until the image is near the viewport. For images below the fold, this is exactly right. For the LCP image, it is catastrophic.

<!-- Wrong: browser waits until layout is done to decide whether to fetch -->
<img src="/hero.jpg" alt="Hero" loading="lazy" />

<!-- Right: fetch immediately, highest network priority -->
<img
  src="/hero.webp"
  alt="Hero"
  loading="eager"
  fetchpriority="high"
  width="1200"
  height="630"
/>

fetchpriority="high" is the piece most developers miss. Even with loading="eager", the browser may start the image at a low network priority and only raise it once layout shows the image is in the viewport (Chrome, for example, works this way). fetchpriority="high" skips that wait and tells the browser: this resource competes with critical CSS, treat it accordingly. It cannot fix late discovery, though: if the <img> is created by a JS-rendered component, the browser can’t request it until that script has run.

The most aggressive technique is a <link rel="preload"> in the document head. This fires before the HTML parser even reaches the <img> tag:

<head>
  <link
    rel="preload"
    as="image"
    href="/hero.webp"
    fetchpriority="high"
    imagesrcset="/hero-400.webp 400w, /hero-800.webp 800w, /hero-1200.webp 1200w"
    imagesizes="(max-width: 600px) 100vw, 1200px"
  />
</head>

Note the imagesrcset and imagesizes attributes on the <link>: these must mirror your <img> element’s srcset and sizes exactly, so the browser preloads the resolution it will actually use. If they differ, the preloaded file can go unused and the image is fetched a second time.
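One way to keep the preload and the <img> from drifting apart is to generate both from a single candidate list. The sketch below is a hypothetical build-time helper (buildSrcset, buildHeroMarkup, and the /hero-*.webp file names are illustrative assumptions, not part of any library):

```javascript
// Hypothetical helper, not part of any library. Values are interpolated
// without escaping, so it assumes trusted, build-time inputs.
function buildSrcset(basePath, widths, ext) {
  return widths.map((w) => `${basePath}-${w}.${ext} ${w}w`).join(", ");
}

function buildHeroMarkup({ basePath, widths, ext, sizes, alt, width, height }) {
  const srcset = buildSrcset(basePath, widths, ext);
  // The largest candidate doubles as the plain src/href fallback.
  const fallback = `${basePath}-${Math.max(...widths)}.${ext}`;
  const preload =
    `<link rel="preload" as="image" href="${fallback}" fetchpriority="high" ` +
    `imagesrcset="${srcset}" imagesizes="${sizes}" />`;
  const img =
    `<img src="${fallback}" srcset="${srcset}" sizes="${sizes}" alt="${alt}" ` +
    `width="${width}" height="${height}" loading="eager" fetchpriority="high" />`;
  return { preload, img, srcset };
}

const hero = buildHeroMarkup({
  basePath: "/hero",
  widths: [400, 800, 1200],
  ext: "webp",
  sizes: "(max-width: 600px) 100vw, 1200px",
  alt: "Hero",
  width: 1200,
  height: 630,
});
console.log(hero.preload);
console.log(hero.img);
```

Because both strings come from the same srcset and sizes values, a change to the candidate list cannot update one tag and miss the other.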

Image Formats: WebP vs AVIF and When to Use Each

Format	Compression vs JPEG	Browser Support	Encoding Speed	Best For
JPEG	Baseline	Universal	Fast	Legacy fallback
WebP	~30% smaller	97%+ (all modern)	Fast	General use, default choice
AVIF	~50% smaller	~90% (Chrome, Firefox, Safari 16+)	Slow	High-quality images where encoding time is acceptable
PNG	Lossless	Universal	Fast	Transparency, screenshots

WebP is the pragmatic default for 2026. Browser support is effectively universal, encoding is fast enough for CI pipelines, and the ~30% size reduction over JPEG is consistent across photo content. If you’re only serving one format, it should be WebP.

AVIF gets you another 20-25% on top of WebP for photographic content. The tradeoff is encoding time: an AVIF encode can be 10-20x slower than WebP for the same image. In practice this means pre-generating AVIF at build time or through a CDN image optimization service (Cloudinary, Imgix, Vercel Image Optimization all support it). Don’t attempt AVIF on-demand at request time on a small server.

Browser support for AVIF is around 90%; Safari added it in version 16. For the remaining ~10%, fall back to WebP or JPEG with <picture>.

`srcset` and `sizes`: How the Browser Picks the Right Image

The srcset attribute provides a list of candidate images at different widths. The sizes attribute tells the browser how wide the image will be rendered at different viewport widths. The browser uses both to pick the optimal image for the current viewport and device pixel ratio.

<img
  src="/hero-800.webp"
  srcset="
    /hero-400.webp  400w,
    /hero-800.webp  800w,
    /hero-1200.webp 1200w,
    /hero-2400.webp 2400w
  "
  sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 1200px"
  alt="Hero"
  width="1200"
  height="630"
  loading="eager"
  fetchpriority="high"
/>

On a 375px-wide iPhone with a 3x retina display, the browser sees: the image will render at 375px (because max-width: 768px → 100vw), and the DPR is 3, so it needs an image that’s at least 1125px wide. It picks /hero-1200.webp. Without srcset, every device gets the single file named in src; if that were the full 2400px image, this phone would download twice the width, roughly 4x the pixels, that it needs.

The src attribute is the fallback for browsers that don’t understand srcset, which at this point is essentially nothing. But always include it.
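The selection described above can be approximated in a few lines. This is a simplified model, not the browser's actual algorithm: it only understands (max-width: Npx) conditions and px/vw lengths, and real browsers are free to choose differently (for example, reusing a larger image that is already cached). The function names resolveSizes and pickCandidate are hypothetical.

```javascript
// Resolve the `sizes` attribute to a CSS pixel width for a given viewport.
// Simplified: supports only "(max-width: Npx) <length>" entries and a final
// bare length, with lengths in px or vw.
function resolveSizes(sizes, viewportWidth) {
  for (const entry of sizes.split(",").map((s) => s.trim())) {
    const match = entry.match(/^\(max-width:\s*(\d+)px\)\s+(.+)$/);
    if (match && viewportWidth > Number(match[1])) continue; // condition fails
    const length = match ? match[2] : entry;
    const value = parseFloat(length);
    return length.endsWith("vw") ? (viewportWidth * value) / 100 : value;
  }
  return viewportWidth; // when nothing matches, the default is 100vw
}

// Pick the smallest candidate that covers rendered width x DPR,
// falling back to the largest candidate available.
function pickCandidate(srcset, sizes, viewportWidth, dpr) {
  const needed = resolveSizes(sizes, viewportWidth) * dpr;
  const candidates = srcset
    .split(",")
    .map((c) => c.trim().split(/\s+/))
    .map(([url, w]) => ({ url, width: parseInt(w, 10) }))
    .sort((a, b) => a.width - b.width);
  return candidates.find((c) => c.width >= needed) || candidates[candidates.length - 1];
}

const srcset =
  "/hero-400.webp 400w, /hero-800.webp 800w, /hero-1200.webp 1200w, /hero-2400.webp 2400w";
const sizes = "(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 1200px";

console.log(pickCandidate(srcset, sizes, 375, 3).url); // /hero-1200.webp
console.log(pickCandidate(srcset, sizes, 1000, 1).url); // /hero-800.webp
console.log(pickCandidate(srcset, sizes, 1440, 2).url); // /hero-2400.webp
```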

CLS From Images: Width, Height, and the `aspect-ratio` Trick

Cumulative Layout Shift (CLS) from images has one root cause: the browser doesn’t know the image dimensions before it loads, so it allocates zero height for it. When the image loads, the layout shifts, pushing down content that was already visible.

The fix is declaring dimensions:

<!-- Without dimensions: layout shift guaranteed -->
<img src="/photo.webp" alt="Product photo" />

<!-- With explicit dimensions: browser reserves space -->
<img src="/photo.webp" alt="Product photo" width="800" height="600" />

The browser uses the width and height attributes to calculate the intrinsic aspect ratio and reserves the right amount of space before the image loads. You don’t need to make the image exactly 800x600: the attributes communicate the ratio, and CSS can resize the image freely.

For responsive images where you want CSS to control the rendered size:

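img {
  width: 100%;
  height: auto;
}

/* Fallback only for images that lack width/height attributes.
   An unscoped aspect-ratio rule would override the ratio the
   browser derives from those attributes. */
img:not([width]):not([height]) {
  aspect-ratio: auto 4 / 3; /* "auto" lets the natural ratio win once the image loads */
}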

The aspect-ratio CSS property is the backup when you genuinely don’t know the image dimensions ahead of time, but width and height attributes are still the preferred mechanism because they’re understood by the browser before CSS parses.
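To make the space reservation concrete, here is the arithmetic the browser effectively performs once it knows the ratio from the attributes and the rendered width from CSS. The function name reservedHeight is a hypothetical label for illustration.

```javascript
// Height the browser can reserve before the image loads, given the
// width/height attributes and the CSS-rendered width.
function reservedHeight(attrWidth, attrHeight, renderedWidth) {
  return (renderedWidth * attrHeight) / attrWidth;
}

console.log(reservedHeight(800, 600, 400)); // 300
console.log(reservedHeight(1200, 630, 375)); // 196.875
```

The attributes say 800x600, but a 400px-wide column still gets a correctly sized 300px-tall box, which is why the attributes only need to carry the right ratio.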

The `<picture>` Element: Art Direction vs Format Selection

The <picture> element serves two distinct purposes that are easy to conflate.

Format selection, serving different formats with a fallback:

<picture>
  <source srcset="/hero.avif" type="image/avif" />
  <source srcset="/hero.webp" type="image/webp" />
  <img src="/hero.jpg" alt="Hero" width="1200" height="630" />
</picture>

The browser picks the first <source> it supports. If it supports AVIF, it uses that. Otherwise WebP. Otherwise JPEG. The <img> tag is always required as the final fallback and carries the alt, width, height, and any loading attributes.

Art direction, serving different crops for different viewports:

<picture>
  <source
    media="(max-width: 768px)"
    srcset="/hero-portrait-400.webp 400w, /hero-portrait-800.webp 800w"
    sizes="100vw"
  />
  <source
    srcset="/hero-landscape-800.webp 800w, /hero-landscape-1200.webp 1200w"
    sizes="(max-width: 1200px) 50vw, 1200px"
  />
  <img src="/hero-landscape-800.webp" alt="Hero" width="1200" height="630" />
</picture>

On mobile, you serve a portrait-cropped version where the subject fills the frame. On desktop, the wide landscape version. This is different from responsive sizing: it’s serving a different image, not a different resolution of the same image.

You can combine both: use media for art direction and type for format selection within the same <picture> element.
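A sketch of what combining the two looks like, written as a hypothetical markup generator (buildPicture and the /hero-portrait and /hero-landscape base names are assumptions for illustration). Each crop gets one <source> per format, and because the browser takes the first <source> whose media and type both match, crops are listed in priority order with the most efficient format first.

```javascript
// Hypothetical generator: one <source> per (crop, format) pair.
// Inputs are interpolated without escaping, so assume trusted values.
function buildPicture({ crops, formats, fallback }) {
  const sources = [];
  for (const crop of crops) {
    for (const format of formats) {
      const mediaAttr = crop.media ? ` media="${crop.media}"` : "";
      sources.push(
        `  <source${mediaAttr} type="image/${format}" srcset="${crop.base}.${format}" />`
      );
    }
  }
  const img =
    `  <img src="${fallback.src}" alt="${fallback.alt}" ` +
    `width="${fallback.width}" height="${fallback.height}" />`;
  return ["<picture>", ...sources, img, "</picture>"].join("\n");
}

const pictureHtml = buildPicture({
  crops: [
    { media: "(max-width: 768px)", base: "/hero-portrait" },
    { base: "/hero-landscape" }, // default crop, no media condition
  ],
  formats: ["avif", "webp"], // most efficient first
  fallback: { src: "/hero-landscape.jpg", alt: "Hero", width: 1200, height: 630 },
});
console.log(pictureHtml);
```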

Font Loading and CLS: The FOUT/FOIT Trade-off

Web fonts are one of the most misunderstood sources of CLS and layout instability. There are two phenomena:

FOIT (Flash of Invisible Text): text is invisible until the font loads. The browser reserves the space but shows nothing. Nothing shifts while the text is hidden, but users stare at blank space where the content should be.
FOUT (Flash of Unstyled Text): text renders immediately in a fallback font, then swaps to the web font when it loads. Users see content immediately, but the swap can cause a layout shift if the two fonts’ metrics differ.

The font-display descriptor controls this behavior:

`font-display` value	Block period	Swap period	Best for
`auto`	Browser decides	Browser decides	Don’t use
`block`	~3s	Unlimited	Icon fonts that must not show fallback
`swap`	Minimal	Unlimited	Body text; always shows text
`fallback`	~100ms	~3s	Balance of FOUT and FOIT
`optional`	~100ms	None	Non-essential decorative fonts

For body text, font-display: swap is the right call: users see text immediately, and the font swap happens quickly enough that most users don’t perceive it. For hero/headline fonts where the metrics difference between your web font and system font would cause a large CLS, font-display: fallback is better: it gives the font a short window to load before the fallback appears, and if the font still hasn’t arrived by the end of the swap period, the page stays on the fallback for that page view.

font-display: optional is underused. It tells the browser: if the font is already cached, use it; otherwise, don’t bother for this page view. For non-critical decorative fonts, this is ideal: no CLS, no FOIT, and cached users get the full experience.
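The table and the paragraphs above can be condensed into a small timeline model. This is a sketch under stated assumptions: the period lengths are the approximate values from the table (with the "minimal" block period of swap treated as 0ms), actual durations are up to each browser, and fontDisplayOutcome is a hypothetical name.

```javascript
// Approximate periods in milliseconds, taken from the table above.
// Assumption: swap's "minimal" block period is modelled as 0.
const PERIODS_MS = {
  block: { block: 3000, swap: Infinity },
  swap: { block: 0, swap: Infinity },
  fallback: { block: 100, swap: 3000 },
  optional: { block: 100, swap: 0 },
};

// Given a font-display value and how long the font takes to load,
// report how long text stays invisible, which font ends up on screen,
// and whether a visible fallback-to-web-font swap happens.
function fontDisplayOutcome(display, fontLoadMs) {
  const periods = PERIODS_MS[display];
  if (!periods) throw new Error(`Unsupported font-display value: ${display}`);
  const invisibleMs = Math.min(fontLoadMs, periods.block);
  if (fontLoadMs <= periods.block) {
    return { invisibleMs, finalFont: "web", swapped: false };
  }
  if (fontLoadMs <= periods.block + periods.swap) {
    return { invisibleMs, finalFont: "web", swapped: true };
  }
  return { invisibleMs, finalFont: "fallback", swapped: false };
}

// A font that takes 400ms to arrive:
console.log(fontDisplayOutcome("swap", 400)); // visible at once, swaps later
console.log(fontDisplayOutcome("fallback", 400)); // 100ms invisible, then swaps
console.log(fontDisplayOutcome("optional", 400)); // 100ms invisible, stays on fallback
```

The swapped flag marks the cases where a layout shift is possible; optional never sets it, which is the reason it is attractive for decorative fonts.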

`preconnect` and `preload` for Fonts

If you’re loading fonts from Google Fonts or another CDN, preconnect eliminates the DNS + TCP + TLS handshake latency:

<head>
  <!-- For Google Fonts: preconnect to both origins -->
  <link rel="preconnect" href="https://fonts.googleapis.com" />
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />

  <!-- The actual font stylesheet -->
  <link
    href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap"
    rel="stylesheet"
  />
</head>

The crossorigin attribute on the second preconnect is critical: font files are fetched with CORS headers, so the connection must also be established as a CORS connection. Missing this means the preconnect establishes a non-CORS connection that can’t be reused for the actual font fetch.

For self-hosted fonts, preload is more aggressive:

<link
  rel="preload"
  href="/fonts/inter-v13-latin-regular.woff2"
  as="font"
  type="font/woff2"
  crossorigin
/>

Preload the specific font file that renders above-fold text, usually the regular weight of your body font. Preloading every weight and style defeats the purpose and wastes bandwidth on weights that may not even render on the current page.

System Font Stacks: The Right Answer More Often Than You Think

Before reaching for a web font, ask whether a system font stack meets the design requirement. In 2026, San Francisco (what -apple-system resolves to), Segoe UI, and Roboto are genuinely good fonts that ship with the OS:

body {
  font-family:
    -apple-system,
    BlinkMacSystemFont,
    "Segoe UI",
    Roboto,
    Oxygen,
    Ubuntu,
    sans-serif;
}

No font file downloads. No FOUT. No CLS. Zero font-related LCP impact. For internal tools, dashboards, documentation sites, and developer-facing products, system fonts are often indistinguishable from a web font in practice and eliminate an entire category of performance problems.

If your brand absolutely requires a specific typeface, then use web fonts. But default to system fonts and add custom fonts when there’s a genuine design need, not as a reflexive choice.

Subsetting Fonts: Removing Glyphs You Don’t Need

A full Inter or Roboto font file can be 150-300KB. An English-language site actually renders maybe 250 of the glyphs in that file. Subsetting removes the unused glyphs and can reduce font size by 60-80%.

Using pyftsubset from the fonttools package:

pip install fonttools brotli

# Subset to Latin characters only
pyftsubset inter-regular.woff2 \
  --output-file=inter-regular-latin.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD"

Google Fonts does this for you automatically: its stylesheets split each family into per-script files tagged with unicode-range, so the browser downloads only the subsets the page actually uses. It’s one of the real performance benefits of using their CDN. If you’re self-hosting, run subsetting as part of your build pipeline.

For sites with only ASCII content, you can subset even more aggressively, often getting font files down to 15-25KB.
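For the aggressive case, the character list can be derived from the content itself. The sketch below (unicodeRangesFor is a hypothetical helper) collects the code points a string uses and compresses them into the same U+XXXX-YYYY notation used in the --unicodes value above. It assumes you can gather all rendered text at build time; dynamic or user-generated content would need a wider range.

```javascript
// Turn the characters actually used in `text` into a compact,
// comma-separated list of Unicode ranges.
function unicodeRangesFor(text) {
  // Array.from iterates by code point, so astral characters stay intact.
  const codePoints = [...new Set(Array.from(text, (ch) => ch.codePointAt(0)))].sort(
    (a, b) => a - b
  );
  const ranges = [];
  for (const cp of codePoints) {
    const last = ranges[ranges.length - 1];
    if (last && cp === last[1] + 1) {
      last[1] = cp; // extend the current run
    } else {
      ranges.push([cp, cp]); // start a new run
    }
  }
  const hex = (n) => n.toString(16).toUpperCase().padStart(4, "0");
  return ranges
    .map(([start, end]) => (start === end ? `U+${hex(start)}` : `U+${hex(start)}-${hex(end)}`))
    .join(",");
}

console.log(unicodeRangesFor("abc €")); // U+0020,U+0061-0063,U+20AC
```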

Third-Party Scripts: How They Block the Main Thread

Every third-party script you add to a page comes with a real performance cost. The third-party-web dataset (maintained by Patrick Hulce) aggregates real-world data on how long third-party scripts block the main thread. Some numbers from the dataset:

Third Party	Median Blocking Time
Google Tag Manager	~70ms
Intercom chat widget	~130ms
Hotjar	~90ms
Facebook Pixel	~50ms
Drift chat	~140ms

These are medians: individual sites see much worse. And these costs are additive. If you add GTM + Intercom + Hotjar, you’re looking at roughly 290ms (70 + 130 + 90) of main thread blocking before the user can interact with anything.

The problem is compounded by Tag Manager: GTM itself blocks for 70ms, then it fires additional tags that each have their own cost. A poorly governed GTM container can fire 15-20 tags synchronously, each making additional network requests, each running arbitrary JavaScript.
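Since the costs add up, a simple budget check can make the trade-off visible when someone proposes a new tag. This sketch hard-codes the approximate medians from the table above; the names MEDIAN_BLOCKING_MS and estimateBlockingMs are hypothetical, and the result is a rough lower bound rather than a measurement of your site.

```javascript
// Approximate median main-thread blocking times (ms) from the table above.
const MEDIAN_BLOCKING_MS = {
  "Google Tag Manager": 70,
  "Intercom chat widget": 130,
  Hotjar: 90,
  "Facebook Pixel": 50,
  "Drift chat": 140,
};

// Sum the medians for a set of scripts; unknown names are an error so a
// new, unmeasured script cannot slip through silently.
function estimateBlockingMs(scriptNames) {
  return scriptNames.reduce((total, name) => {
    if (!(name in MEDIAN_BLOCKING_MS)) {
      throw new Error(`No blocking-time data for: ${name}`);
    }
    return total + MEDIAN_BLOCKING_MS[name];
  }, 0);
}

console.log(estimateBlockingMs(["Google Tag Manager", "Intercom chat widget", "Hotjar"])); // 290
```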

Deferring Third-Party Scripts

The simplest fix for most third-party scripts is using async or defer:

<!-- Blocks HTML parsing: never do this for third parties -->
<script src="https://analytics.example.com/script.js"></script>

<!-- async: downloads in parallel, runs as soon as it arrives (can interrupt parsing); use for independent scripts like analytics -->
<script async src="https://analytics.example.com/script.js"></script>

<!-- defer: downloads in parallel, runs in document order after parsing finishes; use this for non-critical scripts -->
<script defer src="https://analytics.example.com/script.js"></script>

For GTM specifically, the container snippet they provide uses async by default, but the tags fired inside GTM often don’t. The real lever is auditing your GTM container and ensuring tags fire on events (scroll, interaction, DOMContentLoaded) rather than firing immediately on page load.

Moving analytics server-side is the most impactful option for privacy-focused teams and those who need accurate data without affecting client performance. Server-side GTM routes events through a server you control, which then forwards them to each vendor, instead of every vendor’s script running in the browser. Setup is more complex, but it can remove most vendor analytics code from the client.
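For scripts you load yourself rather than through GTM, the "fire on interaction instead of on page load" idea can be sketched as a small helper. runOnFirstInteraction is a hypothetical name, and the choice of events is an assumption to tune per site.

```javascript
// Run `callback` once, on whichever of the given events fires first,
// then remove all listeners.
function runOnFirstInteraction(target, eventNames, callback) {
  let done = false;
  const handler = () => {
    if (done) return;
    done = true;
    for (const name of eventNames) target.removeEventListener(name, handler);
    callback();
  };
  for (const name of eventNames) {
    target.addEventListener(name, handler, { passive: true });
  }
}

// Browser usage (the analytics URL is a placeholder):
//
// runOnFirstInteraction(window, ["scroll", "pointerdown", "keydown"], () => {
//   const script = document.createElement("script");
//   script.src = "https://analytics.example.com/script.js";
//   script.async = true;
//   document.head.appendChild(script);
// });
```

The trade-off is that visitors who never interact are never tracked by that script, so this suits session recording and chat tooling better than page-view counting.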

The Facade Pattern for Heavy Embeds

[Figure: the facade pattern for third-party embeds, where a lightweight placeholder stands in until user interaction loads the full widget.]

Facades are lightweight placeholders that replace heavy third-party embeds until user interaction. The most common example is YouTube embeds:

<!-- Heavy (HTML): embeds the full YouTube player iframe on page load (~500KB) -->
<iframe
  src="https://www.youtube.com/embed/dQw4w9WgXcQ"
  width="560"
  height="315"
></iframe>

// Facade (React): shows a thumbnail and play button, loads the real embed on click
import React from "react";

function YouTubeFacade({ videoId, title }) {
  const [loaded, setLoaded] = React.useState(false);

  if (loaded) {
    return (
      <iframe
        src={`https://www.youtube.com/embed/${videoId}?autoplay=1`}
        width="560"
        height="315"
        title={title}
        allow="autoplay"
      />
    );
  }

  return (
    <div
      style={{ position: "relative", cursor: "pointer", aspectRatio: "16/9" }}
      onClick={() => setLoaded(true)}
    >
      <img
        src={`https://i.ytimg.com/vi/${videoId}/hqdefault.jpg`}
        alt={title}
        width="560"
        height="315"
        loading="lazy"
      />
      <button
        style={{
          position: "absolute",
          top: "50%",
          left: "50%",
          transform: "translate(-50%, -50%)",
        }}
        aria-label={`Play ${title}`}
      >
        ▶
      </button>
    </div>
  );
}

The lite-youtube-embed web component is a production-ready version of this pattern: a single small script that renders a YouTube thumbnail and loads the real embed on click. It saves ~500KB of JavaScript that would otherwise execute on page load.

The same pattern applies to Google Maps (use a static map image as placeholder), Intercom and Drift (use a custom “Chat” button that loads the SDK on click), and Calendly embeds (open in a modal on demand).
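For the chat-widget case, the core of the facade is a loader that runs at most once no matter how many times the button is clicked. A minimal sketch, assuming the vendor SDK can be loaded by a function you supply (createLazyLoader and the SDK URL in the usage comment are hypothetical, not any vendor's API):

```javascript
// Wrap a loader so it runs on first call only; later calls reuse the
// same promise. If loading fails, the cache is cleared so the next
// click can retry.
function createLazyLoader(load) {
  let pending = null;
  return function ensureLoaded() {
    if (!pending) {
      pending = new Promise((resolve) => resolve(load())).catch((err) => {
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Browser usage (placeholder URL and element id):
//
// const ensureChat = createLazyLoader(
//   () =>
//     new Promise((resolve, reject) => {
//       const script = document.createElement("script");
//       script.src = "https://chat.example.com/sdk.js";
//       script.onload = resolve;
//       script.onerror = reject;
//       document.head.appendChild(script);
//     })
// );
// document.querySelector("#chat-button").addEventListener("click", ensureChat);
```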

Lighthouse’s Facade audit specifically flags third-party embeds that have known lightweight alternatives and quantifies the time savings. It’s one of the most actionable audits in the report.

Putting It Together: A Pre-Launch Checklist

Every page I ship now goes through a quick mental checklist before it hits production:

LCP image has loading="eager", fetchpriority="high", explicit width and height, and is preloaded in <head>
LCP image is served in WebP (AVIF if the CDN supports it), with JPEG fallback via <picture>
All images have srcset and sizes for responsive resolution selection
No image is missing width and height attributes
Web fonts use font-display: swap or fallback; preconnect to font CDN is in <head>
Font files are subsetted to the character ranges actually used
GTM container is audited; no tags fire synchronously on page load
YouTube/maps/chat embeds use a facade
third-party-web data checked for any new scripts added in the last sprint
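The image items on this list are mechanical enough to check in code. As a rough sketch, a function like the hypothetical auditLcpImage below takes an image's attributes as a plain object and returns the checklist items it fails; wiring it to real markup (a template test, a crawler) is left as an assumption.

```javascript
// Check an LCP image's attributes against the image items in the checklist.
// `attrs` is a plain object of attribute names to values.
function auditLcpImage(attrs) {
  const problems = [];
  if (attrs.loading === "lazy") {
    problems.push('loading="lazy" on the LCP image');
  }
  if (attrs.fetchpriority !== "high") {
    problems.push('missing fetchpriority="high"');
  }
  if (!attrs.width || !attrs.height) {
    problems.push("missing width and/or height");
  }
  if (!attrs.srcset || !attrs.sizes) {
    problems.push("missing srcset and/or sizes");
  }
  return problems;
}

console.log(auditLcpImage({ src: "/hero.jpg", alt: "Hero", loading: "lazy" }));
console.log(
  auditLcpImage({
    src: "/hero-800.webp",
    srcset: "/hero-400.webp 400w, /hero-800.webp 800w",
    sizes: "100vw",
    width: "1200",
    height: "630",
    loading: "eager",
    fetchpriority: "high",
  })
); // []
```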

The facade pattern, combined with deferring GTM tags to DOMContentLoaded, is consistently the highest-leverage fix for third-party script LCP impact: the marketing team’s tags still fire correctly, just after the user has already seen the page.