One Read Replica Is Not Enough

Last time I argued that internal apps should read from a replica, with credentials scoped only to the replica. That post quietly assumed one replica. What I didn’t say out loud is what happens when the internal apps multiply – and they always do.

I’ve shipped production workloads on all three Azure SQL DB tiers: General Purpose, Business Critical, and Hyperscale. Each move was driven by a specific constraint, and the tier-to-tier story is also the one-replica-to-many-replicas story. Worth writing down.

   BUSINESS CRITICAL                          HYPERSCALE
   ─────────────────                          ──────────

   ┌────────────────┐                         ┌────────────────┐
   │ Primary        │                         │ Primary        │
   │ writes + reads │                         │ writes         │
   └───────┬────────┘                         └───────┬────────┘
           │                                          │
           │ ApplicationIntent=ReadOnly               │ shared page servers
           ▼                                          │
   ┌────────────────┐               ┌─────────────────┼─────────────────┐
   │ HA secondary   │               ▼                 ▼                 ▼
   │ (readable)     │         ┌──────────┐      ┌──────────┐      ┌──────────┐
   │                │         │ bi-rep   │      │ ops-rep  │      │ ml-rep   │
   │ all internal   │         │ 16 vCore │      │ 2 vCore  │      │ 8 vCore  │
   │ reads land     │         │ BI login │      │ admin    │      │ ML login │
   │ on one node    │         └──────────┘      └──────────┘      └──────────┘
   └────────────────┘

   One reader, shared.                One named replica per workload class.

General Purpose: No Free Reader

General Purpose is the entry tier – vCore-based, compute and storage decoupled, no built-in readable secondary. If you want a read replica, you provision an active geo-replica, which is paid, usually lives in a different region, and is positioned by Microsoft as a DR feature that happens to be readable. It works, but you’re paying full freight for a second copy of the database to get one read endpoint.

For low-traffic apps – an internal admin tool, a back-office form used twice a day, a small dashboard – reading from the primary is fine. The point of GP is that you didn’t need replication. If you’ve outgrown that, you’ve outgrown the tier.

Business Critical: The Reader Is Free, and It’s Great

Business Critical bundles an Always On availability group: a writable primary plus replicas held in sync as failover targets. One of those replicas is exposed as a readable secondary. Add ApplicationIntent=ReadOnly to your connection string, and your reads land on the secondary instead of the primary. Sub-second lag in practice, local SSD-backed, and – this is the part I like – zero extra licensing. You’re already paying for the HA replica. It doubles as your reader for free.

This is the perfect home for the pattern from the previous post. The internal app’s connection string includes ApplicationIntent=ReadOnly and points at the listener name. Its database login is read-only, and read intent keeps its sessions on the readable secondary. The credential boundary holds as long as that connection string does. Internal reads never touch the primary. This is the cheap, correct version of the architecture, and for one or two internal workloads it’s the right answer.
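A minimal sketch of that read-intent connection string, assuming an ODBC-style format. The server, database, and user names are placeholders, and build_conn_str is a hypothetical helper rather than anything a driver ships with. ApplicationIntent=ReadOnly is the one keyword the routing depends on.

```python
# Hypothetical helper: assembles an ODBC-style connection string.
# Server, database, and user names are placeholders; the password is
# deliberately left out and would come from a secret store.

def build_conn_str(server: str, database: str, user: str, read_only: bool) -> str:
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Uid": user,
        "Encrypt": "yes",
    }
    if read_only:
        # Asks the service to route this session to the readable secondary.
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Run this on an open session to see where it landed:
# READ_ONLY on a readable secondary, READ_WRITE on the primary.
CHECK_ROLE_SQL = "SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability');"

reporting = build_conn_str(
    "myserver.database.windows.net", "appdb", "reporting_app", read_only=True
)
```

Running CHECK_ROLE_SQL once at startup is a cheap way to catch a connection string that lost its read intent somewhere between environments.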

The load-bearing limit, stated plainly: there is one readable secondary. Every connection that asks for read intent lands on the same node.

The Wall: One Reader Isn’t Enough

The compound-growth argument from the previous post applies at this layer exactly. Reporting jobs, BI dashboards, ML feature pulls, finance exports, ops tooling – they accumulate. By the time you have four or five of them, they’re all pointed at the same read-intent connection, which means they’re all running on the same secondary, sharing its memory, its plan cache, and its CPU.

The failures aren’t dramatic. They’re irritating. A BI query and a reporting batch run against the same secondary at the same time. Their plans clash. Parameter sniffing on one workload pessimizes plans for the other. The secondary’s CPU pegs at 80% and one of them times out. None of this affects the primary – which is good – but the secondary is dying, and the workloads on it are stepping on each other.

The realization that pushes you off Business Critical usually isn’t “primary is dying.” It’s “the secondary is dying, and I can’t isolate the workloads I put on it.” That’s the moment you start looking at Hyperscale.

Hyperscale Named Replicas

Hyperscale’s read story is structurally different. Storage is decoupled into page servers shared across compute. On top of that storage, you can attach multiple readable compute endpoints. Two flavors exist: HA replicas (failover targets, like BC’s secondary) and named replicas – the feature that actually matters here.

A named replica is its own compute endpoint sitting on the same shared storage. Concretely:

Each named replica has its own connection string. You don’t use ApplicationIntent=ReadOnly – you connect to the replica by name, as a database on its logical server: Server=myserver.database.windows.net with Database=bi-replica (or whatever you named it). No application-side intent juggling, no listener routing surprises. The BI tool gets the BI replica’s connection string and that’s it.

Each named replica is sized independently. Analytics replica at 16 vCore. Ops replica at 2 vCore. Primary at 4 vCore. Each is billed for its own compute. Storage is shared via the page server tier, so you’re not paying for multiple copies of the data – just for the compute that serves it. Scale one up for a heavy month-end run; scale it down for the rest of the month. Other replicas are unaffected.

Each named replica has its own logins. This extends the credential-boundary argument from the previous post cleanly. The BI team’s credentials grant access to the BI replica and nothing else. The internal admin tool’s credentials grant access to the ops replica and nothing else. None of those credentials reach the primary. When you’re talking to the security team about blast radius, “the BI service can only ever talk to bi-replica” is a much shorter conversation than “the BI service has read intent against the primary and we promise it doesn’t go further.”
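The three properties above fit in one small registry. This is a sketch under assumptions: the replica names and sizes follow the diagram, the logins are made up, and I'm assuming each replica is reached as a database on a logical server. NamedReplica and conn_str_for are hypothetical names, not an Azure SDK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedReplica:
    database: str  # replica name, used as the database name when connecting
    vcores: int    # sized independently of the primary
    login: str     # illustrative login that exists for this replica only


# One entry per workload class. Sizes follow the diagram; logins are invented.
REPLICAS = {
    "bi": NamedReplica("bi-replica", 16, "bi_login"),
    "ops": NamedReplica("ops-replica", 2, "ops_admin"),
    "ml": NamedReplica("ml-replica", 8, "ml_login"),
}


def conn_str_for(workload: str, server: str = "myserver.database.windows.net") -> str:
    # An unregistered workload raises KeyError instead of quietly
    # falling back to the primary.
    replica = REPLICAS[workload]
    return (
        f"Server=tcp:{server},1433;"
        f"Database={replica.database};"
        f"Uid={replica.login};"
        "Encrypt=yes;"
    )
```

The design choice worth copying is the missing fallback: a workload with no replica of its own gets an error, not a connection to the primary.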

Honest distinction: named replicas don’t participate in HA failover. Business Critical’s reader was free because it was already paid for as the failover target. Hyperscale’s HA is handled by HA replicas; named replicas are explicit, paid read capacity that exists for read scale-out and nothing else. You are now paying for read capacity that BC bundled in for free. That’s the trade you’re making.

Trade-offs to Be Honest About

This story sells well because it’s a real fit for the right problem. Not everyone has that problem:

Hyperscale costs more; named replicas more again. Each named replica is billable compute. Storage is shared so you’re not duplicating data, but five named replicas is five compute bills sitting next to your primary. Run the numbers before you go shopping for endpoints.
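Running the numbers can be as simple as the sketch below. The per-vCore price is a made-up placeholder, not an Azure rate, and the model assumes compute cost scales linearly with vCores. The sizes are the ones used earlier in this post.

```python
HOURS_PER_MONTH = 730
PRICE_PER_VCORE_HOUR = 0.25  # placeholder value, NOT a real Azure price


def monthly_compute_cost(vcores: int,
                         price_per_vcore_hour: float = PRICE_PER_VCORE_HOUR,
                         hours: int = HOURS_PER_MONTH) -> float:
    # Assumes cost is linear in vCores; storage is shared and left out.
    return vcores * price_per_vcore_hour * hours


primary_cost = monthly_compute_cost(4)

replica_sizes = {"bi-replica": 16, "ops-replica": 2, "ml-replica": 8}
replica_costs = {name: monthly_compute_cost(v) for name, v in replica_sizes.items()}

total_cost = primary_cost + sum(replica_costs.values())
read_share = sum(replica_costs.values()) / total_cost  # fraction spent on read capacity
```

With these sizes the replicas carry 26 of the 30 vCores, so most of the compute bill is read capacity no matter what the real price turns out to be.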

Lag is real but small. Named replicas apply log records out of the same log service the primary writes to. Lag is typically sub-second but can spike under heavy primary write load. If you have an endpoint that reads back a row it wrote five milliseconds ago, point that endpoint at the primary, not at a named replica. Don’t pretend the lag isn’t there.
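One way to respect the lag is to route read-after-write traffic explicitly. The sketch below is a hypothetical in-process router, not a library API: the two-second window is an assumption and should sit comfortably above the lag you actually observe.

```python
import time


class ReadRouter:
    """Send reads to the primary for a short window after a key was written."""

    def __init__(self, staleness_window_s: float = 2.0, clock=time.monotonic):
        self.window = staleness_window_s
        self.clock = clock          # injectable so the logic can be tested
        self._last_write = {}       # key -> time of the most recent write

    def record_write(self, key: str) -> None:
        self._last_write[key] = self.clock()

    def target_for_read(self, key: str) -> str:
        written_at = self._last_write.get(key)
        if written_at is not None and self.clock() - written_at < self.window:
            return "primary"        # the replica may not have applied the log yet
        return "named-replica"


router = ReadRouter()
router.record_write("order:42")
target = router.target_for_read("order:42")  # read issued right after the write
```

This only tracks writes made by the same process. An endpoint that always needs fresh data is simpler to point at the primary outright.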

Migration from Business Critical to Hyperscale is a project. Backup compatibility differs, restore semantics differ, and there are feature gaps – in-memory OLTP, for example, isn’t supported on Hyperscale. You can’t flip a tier toggle in the portal and walk away. If you’re on BC and considering the move, plan it as a small migration, not a configuration change.

Observability differs. Query Store, DMVs, and wait stats behave a little differently on Hyperscale because the storage layer is different. Dashboards you built for BC will mostly work but need a pass. The first time you read Hyperscale wait stats you will be surprised by what’s there and what isn’t.

When to Move From Business Critical to Hyperscale

A short checklist. Business Critical is the right answer for plenty of workloads – be honest with yourself about whether you actually need what’s on the next floor.

You have more than one internal read workload that can’t tolerate sharing a node.
The workloads have different shapes (heavy analytics queries vs cheap point lookups) and one box size doesn’t suit them.
You want a credential boundary per workload – not as a code review note but as a connection-string boundary.
You’re willing to pay for explicit read capacity instead of relying on the free HA reader.

If none of those are true, Business Critical is the right answer. Don’t move because Hyperscale sounds cooler.
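If you'd rather have the checklist in a design review as something executable, here is a small sketch. ReadProfile and its field names are invented for illustration; the logic is just the four questions above, with "none true" meaning stay put.

```python
from dataclasses import dataclass


@dataclass
class ReadProfile:
    isolated_workloads: int         # read workloads that can't tolerate sharing a node
    mixed_shapes: bool              # heavy analytics alongside cheap point lookups
    per_workload_credentials: bool  # boundary wanted at the connection string
    will_pay_for_reads: bool        # budget exists for explicit read capacity


def reasons_to_move(p: ReadProfile) -> list:
    checks = [
        (p.isolated_workloads > 1, "more than one workload needs its own node"),
        (p.mixed_shapes, "one box size doesn't suit every workload"),
        (p.per_workload_credentials, "credential boundary wanted per workload"),
        (p.will_pay_for_reads, "willing to pay for explicit read capacity"),
    ]
    return [reason for applies, reason in checks if applies]


def recommend(p: ReadProfile) -> str:
    # No reason at all means Business Critical is still the right answer.
    return "evaluate Hyperscale" if reasons_to_move(p) else "stay on Business Critical"
```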
