The Performance Obsession
I spent three days shaving 12 milliseconds off a page load. I already know what you’re thinking. And you’re right. But also, I don’t care, because those 12 milliseconds were wrong and now they’re not.
This site runs a full WebGL cosmic background with custom shaders, a particle system, constellation text effects rendered on the GPU, flying icons tracing bezier paths across your screen, and an FPS monitor that adapts visual quality in real-time. All of it running at 60fps. On your phone, too, if your phone cooperates. No promises.
Here’s how.
The Draw Call Problem
If you’ve never thought about draw calls, congratulations on your mental health. For the rest of us: every time you ask the GPU to render something, there’s overhead. The CPU has to set up state, bind buffers, configure the pipeline, and then say “go.” Each one of those round trips is a draw call, and they’re expensive. Not because the GPU is slow at drawing, but because the communication between CPU and GPU is the bottleneck.
The naive approach to rendering 24 flying icons (6 icons with 4 trail copies each) is 24 draw calls per frame. At 60fps, that’s 1,440 draw calls per second just for decorative floating icons. Add in the cosmic background, the constellation text, the particle system, and you’re looking at a frame budget that’s already bleeding. Great start, right?
The solution is GPU instancing. Instead of saying “draw this quad” 24 times, you say “here are 24 sets of per-instance data, draw them all at once.” You pack the instance-specific attributes (bezier control points, trail indices, icon types, path progress) into buffer attributes that the vertex shader reads per instance. One draw call. Same visual result. The GPU doesn’t care how many instances you throw at it; it’s the CPU-to-GPU communication that kills you, and instancing eliminates almost all of it.
In my flying icons system, each instance carries 12 floats of data: a 2D offset, four X bezier control points, four Y bezier control points, a path progress value, and an icon type index. The vertex shader evaluates the cubic bezier at the current time, positions the quad, and the fragment shader draws the icon shape procedurally. No textures, no sprite sheets. Pure math.
The result is that rendering 24 animated icons with glow trails costs roughly the same as rendering one. I find this deeply satisfying and I will not be taking questions.
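If you’ve never wired up instancing, here’s roughly what that setup looks like, sketched with three.js’s InstancedBufferGeometry. The attribute names, the quad scale, and the stub fragment shader are all illustrative; the real shaders do considerably more.

```ts
import * as THREE from 'three';

const COUNT = 24; // 6 icons x 4 trail copies, all in one draw call

// Every instance shares one quad; everything instance-specific lives in attributes.
const quad = new THREE.PlaneGeometry(1, 1);
const geometry = new THREE.InstancedBufferGeometry();
geometry.setIndex(quad.getIndex());
geometry.setAttribute('position', quad.getAttribute('position'));
geometry.setAttribute('uv', quad.getAttribute('uv'));
geometry.instanceCount = COUNT;

// 12 floats per instance: 2 offset + 4 bezier X + 4 bezier Y + progress + icon type.
const offsets  = new Float32Array(COUNT * 2);
const bezierX  = new Float32Array(COUNT * 4);
const bezierY  = new Float32Array(COUNT * 4);
const progress = new Float32Array(COUNT);
const iconType = new Float32Array(COUNT);
// ...fill each instance's control points, progress, and type here...

geometry.setAttribute('aOffset',   new THREE.InstancedBufferAttribute(offsets, 2));
geometry.setAttribute('aBezierX',  new THREE.InstancedBufferAttribute(bezierX, 4));
geometry.setAttribute('aBezierY',  new THREE.InstancedBufferAttribute(bezierY, 4));
geometry.setAttribute('aProgress', new THREE.InstancedBufferAttribute(progress, 1));
geometry.setAttribute('aIconType', new THREE.InstancedBufferAttribute(iconType, 1));

const material = new THREE.ShaderMaterial({
  transparent: true,
  vertexShader: /* glsl */ `
    attribute vec2 aOffset;
    attribute vec4 aBezierX;
    attribute vec4 aBezierY;
    attribute float aProgress;
    attribute float aIconType;
    varying float vIconType;

    // Cubic bezier: (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3
    float bezier(vec4 p, float t) {
      float u = 1.0 - t;
      return u*u*u*p.x + 3.0*u*u*t*p.y + 3.0*u*t*t*p.z + t*t*t*p.w;
    }

    void main() {
      vIconType = aIconType;
      vec2 pathPos = vec2(bezier(aBezierX, aProgress), bezier(aBezierY, aProgress));
      vec3 pos = position * 0.05 + vec3(pathPos + aOffset, 0.0); // 0.05 = arbitrary quad size
      gl_Position = projectionMatrix * modelViewMatrix * vec4(pos, 1.0);
    }
  `,
  fragmentShader: /* glsl */ `
    varying float vIconType;
    void main() {
      // Stub: the real fragment shader draws each icon shape procedurally from vIconType.
      gl_FragColor = vec4(0.6, 0.8, 1.0, 0.8);
    }
  `,
});

// One mesh, one material, one draw call for all 24 instances.
const icons = new THREE.Mesh(geometry, material);
// scene.add(icons);
```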
requestAnimationFrame or Die
There’s a special circle of developer hell reserved for people who use setInterval for animation. I’ve been there. I’ve sinned. I learned.
setInterval(render, 16) looks like it should give you 60fps. It won’t. setInterval doesn’t synchronize with the browser’s repaint cycle. It fires based on the event loop timer, which means your render call might land in the middle of a frame, or right after the browser already composited, or during garbage collection. You get jank. Frames pile up. The browser’s compositor is fighting your timer, and nobody wins.
requestAnimationFrame exists specifically to solve this. It tells the browser “call me right before you paint the next frame.” The browser batches your callback with its own rendering work, which means your animation is synchronized with the display refresh rate. On a 60Hz monitor, you get called 60 times per second. On a 120Hz monitor, 120 times. On a 240Hz monitor… you see where this is going.
But here’s the thing people miss: requestAnimationFrame also stops when the tab is in the background. setInterval doesn’t. So if someone opens your site in a tab and forgets about it, setInterval keeps firing, keeps eating CPU, keeps spinning the fan. requestAnimationFrame just… waits. This matters more than the synchronization, honestly. Being a good citizen of someone’s battery is basic decency. Your laptop fans will thank me.
My render loop is a single requestAnimationFrame chain that drives the entire experience. The cosmic background shader, the flying icons, the constellation text, the particle system. One loop, one timestamp, one frame.
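Stripped to its skeleton, that loop is just this (the subsystem list is a placeholder):

```ts
type Updatable = { update(now: number, dt: number): void };

// Placeholder subsystems: background shader, flying icons, constellation text, particles.
const subsystems: Updatable[] = [];

let last = performance.now();

function frame(now: number) {
  const dt = (now - last) / 1000; // seconds since the previous frame
  last = now;

  // One loop, one timestamp, every subsystem.
  for (const s of subsystems) s.update(now, dt);

  requestAnimationFrame(frame); // re-schedule; the browser pauses this in background tabs
}

requestAnimationFrame(frame);
```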
Adaptive Quality: The FPS Monitor
Here’s where things get opinionated. Buckle up.
I built an FPS monitoring system that doesn’t just display your framerate. It actively adjusts the visual quality of the site based on what your hardware can sustain. It starts by detecting your display’s refresh rate, which it does by counting requestAnimationFrame callbacks over one second. If your monitor runs at 120Hz, the target is 120fps, not 60. If you’re on a 240Hz display, the target is 240. The system has six tiers: Super Ultra (480Hz, for the truly unhinged), Ultra (240Hz), High (120Hz), Medium (60Hz), Low (30Hz), and Potato.
Yes, Potato. I called it Potato mode. I’m not apologizing. 😂
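The refresh rate detection is simple enough to sketch: count callbacks for a second, then snap to the nearest tier target. Something along these lines:

```ts
// Estimate the display's refresh rate by counting rAF callbacks over one second,
// then snap to the nearest tier target.
function detectRefreshRate(): Promise<number> {
  return new Promise((resolve) => {
    const targets = [30, 60, 120, 240, 480];
    const start = performance.now();
    let frames = 0;

    function tick(now: number) {
      frames++;
      if (now - start >= 1000) {
        const fps = (frames * 1000) / (now - start);
        resolve(targets.reduce((a, b) => (Math.abs(b - fps) < Math.abs(a - fps) ? b : a)));
      } else {
        requestAnimationFrame(tick);
      }
    }

    requestAnimationFrame(tick);
  });
}
```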
The monitor samples FPS every 500 milliseconds and compares against the current tier’s target. If your framerate stays below 90% of the target for two consecutive samples, the system drops to a lower tier. Tier changes trigger callbacks that adjust particle counts, disable post-processing effects, simplify shaders, or, in the case of Potato mode, shut off the decorative effects entirely. Sorry, your GPU tried its best.
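The tier-dropping logic is roughly this, with the real bookkeeping simplified:

```ts
type Tier = 'super-ultra' | 'ultra' | 'high' | 'medium' | 'low' | 'potato';

const ORDER: Tier[] = ['super-ultra', 'ultra', 'high', 'medium', 'low', 'potato'];
const TARGET_FPS: Record<Tier, number> = {
  'super-ultra': 480, ultra: 240, high: 120, medium: 60, low: 30, potato: 0, // potato is the floor
};

let currentTier: Tier = 'medium';
let badSamples = 0;
let framesThisWindow = 0;

// Call this once per rendered frame from the rAF loop.
export function countFrame() {
  framesThisWindow++;
}

// Call this every 500 ms; onTierChange fires when the monitor steps down a tier.
export function sample(onTierChange: (tier: Tier) => void) {
  const fps = framesThisWindow * 2; // 500 ms window -> frames per second
  framesThisWindow = 0;

  const target = TARGET_FPS[currentTier];
  if (target > 0 && fps < target * 0.9) {
    badSamples++;
    if (badSamples >= 2) {
      // Two consecutive samples below 90% of target: step down one tier.
      currentTier = ORDER[Math.min(ORDER.indexOf(currentTier) + 1, ORDER.length - 1)];
      onTierChange(currentTier);
      badSamples = 0;
    }
  } else {
    badSamples = 0;
  }
}
```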
There’s a three-second warmup period after page load where the monitor ignores drops. Page initialization is always janky. JavaScript is parsing, shaders are compiling, textures are uploading. If I penalized startup performance, every visitor would start in Potato mode and the whole system would be pointless. That would be extremely funny but also extremely bad.
The monitor also detects device capabilities up front. Low memory (under 4GB, via navigator.deviceMemory) starts you at Low tier. Mobile user agents start at Medium. Desktop with a high refresh rate display gets the full treatment. It’s progressive enhancement, but for visual fidelity instead of functionality.
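A sketch of that starting-tier decision, reusing the Tier type from above (navigator.deviceMemory only exists in Chromium-based browsers, hence the defensive handling):

```ts
// Pick a starting tier from device capabilities before any FPS samples exist.
function pickStartingTier(displayHz: number): Tier {
  const memory = (navigator as { deviceMemory?: number }).deviceMemory;
  const isMobile = /Android|iPhone|iPad|Mobile/i.test(navigator.userAgent);

  if (memory !== undefined && memory < 4) return 'low'; // under 4GB of RAM
  if (isMobile) return 'medium';                        // phones start conservative
  if (displayHz >= 240) return 'ultra';                 // high-refresh desktop gets the full treatment
  if (displayHz >= 120) return 'high';
  return 'medium';
}
```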
You can see the monitor in the corner of the page. Click it and it expands into a draggable panel with a real-time FPS graph. The graph draws on a canvas element, color-coded by tier: green for Medium, cyan for High, purple for Ultra, red for Potato. I spent an unreasonable amount of time making the graph look nice. I regret nothing.
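The graph itself is ordinary 2D canvas work. A simplified version, with hex values standing in for the real palette:

```ts
// Redraw the FPS history onto the panel's canvas, stroked in the current tier's color.
const TIER_COLORS: Record<string, string> = {
  ultra: '#a855f7',  // purple
  high: '#22d3ee',   // cyan
  medium: '#22c55e', // green
  potato: '#ef4444', // red
};

function drawGraph(ctx: CanvasRenderingContext2D, samples: number[], tier: string, maxFps: number) {
  const { width, height } = ctx.canvas;
  ctx.clearRect(0, 0, width, height);
  ctx.strokeStyle = TIER_COLORS[tier] ?? '#ffffff';
  ctx.beginPath();
  samples.forEach((fps, i) => {
    const x = samples.length > 1 ? (i / (samples.length - 1)) * width : 0;
    const y = height - (Math.min(fps, maxFps) / maxFps) * height;
    if (i === 0) ctx.moveTo(x, y);
    else ctx.lineTo(x, y);
  });
  ctx.stroke();
}
```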
IntersectionObserver: Don’t Animate the Invisible
This one seems obvious in hindsight but I learned it the hard way. If a WebGL canvas is scrolled off-screen, why are you still rendering to it?!
The post navigation on this site uses an IntersectionObserver to track whether the navigation element is visible in the viewport. When it scrolls out of view, the sticky behavior activates. But the same principle applies to any animated element: if it’s not visible, skip the work.
For the cosmic background, this is less relevant since it’s a fixed, full-viewport canvas. But for elements that live in the document flow (constellation text effects on headings, animated cards, anything that scrolls), an IntersectionObserver with a small root margin lets you pause animation when the element is off-screen and resume when it comes back. Zero visual difference to the user. Significant difference to the CPU.
The API is almost embarrassingly simple. Create an observer, give it a callback, observe your elements. The browser tells you when they enter or leave the viewport. No scroll event listeners, no getBoundingClientRect() calls on every frame, no layout thrashing. The browser’s internal compositor already knows what’s visible; IntersectionObserver just exposes that information. It’s free real estate.
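A minimal version of the pattern, with the root margin and the paused flag as illustrative choices:

```ts
// Pause work for anything that scrolls out of view; resume when it comes back.
const paused = new WeakMap<Element, boolean>();

const observer = new IntersectionObserver(
  (entries) => {
    for (const entry of entries) {
      // The render loop checks this flag and skips off-screen elements.
      paused.set(entry.target, !entry.isIntersecting);
    }
  },
  { rootMargin: '100px' } // start slightly before the element actually enters the viewport
);

export function observeAnimated(el: Element) {
  paused.set(el, false);
  observer.observe(el);
}

export function isPaused(el: Element): boolean {
  return paused.get(el) ?? false;
}
```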
Lazy-Loading: Not Everything Needs to Exist at Startup
The chat widget on this site is a full RAG pipeline with embedding models, streaming inference, and a multi-thousand-line UI. It’s also something that maybe 10% of visitors will ever interact with. Probably fewer, let’s be honest. Loading it on page init would be insane.
So it’s lazy-loaded. The main entry point does a dynamic import() after the page is interactive:
```ts
import('@/components/chat-widget/chat-widget').then(({ initChatWidget }) => {
  initChatWidget();
});
```
The browser doesn’t fetch the chat widget’s JavaScript, or any of its dependencies, until this import runs. The initial bundle stays small. First paint stays fast. The chat widget materializes when you need it, not before. ✨
The same principle applies to the WebGL initialization. The experience boots in phases: Phase 1 is the background shader and the render loop, because that’s what’s immediately visible. Phase 2, on the next animation frame, initializes the FPS monitor, 3D cards, view transitions, and pointer event handlers. Phase 3, one more frame later, brings up the constellation text and flying icons. Each phase gets a full frame to complete its work before the next one starts. No single frame is overloaded with initialization.
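The scheduling itself fits in a few lines. Here’s a generic sketch of the idea, with the phase bodies left as comments:

```ts
// Give each initialization phase its own animation frame so no single frame
// absorbs the whole cost. The phase contents mirror the description above.
function initInPhases(phases: Array<() => void>) {
  const [current, ...rest] = phases;
  current?.();
  if (rest.length > 0) requestAnimationFrame(() => initInPhases(rest));
}

initInPhases([
  () => { /* phase 1: background shader + render loop */ },
  () => { /* phase 2: FPS monitor, 3D cards, view transitions, pointer handlers */ },
  () => { /* phase 3: constellation text, flying icons */ },
]);
```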
This staggering is invisible to the user. The background appears instantly. Everything else fades in over the next 30-50 milliseconds. But in profiling, the difference between “initialize everything synchronously” and “stagger across three frames” is the difference between a 200ms main-thread block and three 15ms blocks. The browser stays responsive. Interactions don’t lag. The loading skeleton doesn’t hang. Chef’s kiss.
Cache Busting with Content Hashes
Here’s a fun one. You ship a JavaScript bundle. The user’s browser caches it. You fix a bug and deploy. The user’s browser serves the old cached version. The bug persists. The user blames you. 😤
The fix is content-hashing your output files. Instead of main.js, you emit main-a3f8b2c1.js. The hash is derived from the file’s contents, so when the code changes, the hash changes, the filename changes, and the browser treats it as a new resource. When the code doesn’t change, the hash stays the same, and the browser’s cache is perfectly valid. You get aggressive caching and instant cache invalidation in one mechanism.
Vite does this out of the box, as long as you don’t fight its defaults. Every JS chunk gets a content hash in its filename. The HTML template references the hashed filenames. Deploy, and returning visitors get the new code immediately while their browser still caches the unchanged vendor chunks. It’s elegant, it’s automatic, and it solves a problem that has plagued web development since browsers learned to cache. Which was approximately five minutes after they learned to load JavaScript.
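If you want the mechanism spelled out, this vite.config.ts just mirrors what Vite already does by default; you don’t need it unless you’re overriding the output names:

```ts
// vite.config.ts — this mirrors Vite's default production naming; [hash] is
// derived from each chunk's contents, so unchanged chunks keep their filenames.
import { defineConfig } from 'vite';

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        entryFileNames: 'assets/[name]-[hash].js',
        chunkFileNames: 'assets/[name]-[hash].js',
        assetFileNames: 'assets/[name]-[hash][extname]',
      },
    },
  },
});
```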
When Do You Stop?
This is the part where I’m supposed to say something wise about diminishing returns. About how users can’t perceive the difference between 3ms and 8ms of JavaScript execution. About how the real performance gain is closing the profiler and shipping the thing.
And that’s true. Users genuinely cannot tell the difference between a 40ms page load and a 55ms page load. The research on perceptual thresholds is clear: under 100ms feels instant, under 300ms feels responsive, over 1000ms and you’ve lost them. Everything in between is vibes.
But here’s what the “just ship it” crowd misses: performance work compounds. The instancing technique I figured out for flying icons? I reused it for the constellation text. The adaptive quality system catches problems I’d never notice in testing because I develop on a machine with a 3090 in it and my phone is from this year. The lazy-loading pattern keeps the initial bundle small as I add features instead of watching it bloat.
Every optimization I’ve shipped has made the next feature cheaper to add. That’s not diminishing returns. That’s infrastructure.
Do I still spend too long staring at flame graphs? Yes. Have I caught myself profiling the profiler? I refuse to answer that question. But the site loads fast on a five-year-old Android phone with spotty reception, and every WebGL effect degrades gracefully instead of crashing the tab, and the FPS counter stays green on hardware I’ve never tested.
I call that obsession productive. My therapist might disagree. The site runs at 60fps either way. 🚀