I once worked in a microservices architecture built by an exceptionally talented team. They were smart, thoughtful, and well-read in distributed systems theory. Every architectural decision they made was defensible in isolation – backed by conference talks, Netflix blog posts, and Martin Fowler articles.

The result was a system that took new developers three months to become productive in. Not because the business domain was complex (it wasn’t), but because the infrastructure complexity dwarfed the business logic. You couldn’t trace a simple HTTP request without understanding HAProxy sidecar configs, Hystrix command wrappers, Apache Camel routing DSLs, six API versions, and a shared framework that every service was mandated to use.

Each decision was smart. The combination was crippling.

HAProxy Sidecar for mTLS

Every microservice had an HAProxy instance running as a sidecar. All east-west traffic (service-to-service) was encrypted via mutual TLS. The reasoning was zero-trust networking: even within the internal network, every connection was authenticated and encrypted.

The cost:

TLS handshake overhead on every call. Each inter-service HTTP request included a full TLS handshake (or session resumption). For services making dozens of downstream calls per request, this added measurable latency.

Certificate rotation was a recurring operational incident. Every service had its own certificate with an expiry date. When certs expired – and they did, because manual rotation doesn’t scale – the service couldn’t communicate with anything. The team built custom monitoring for cert expiry and a rotation pipeline, which itself required maintenance.

Debugging encrypted traffic was painful. You couldn’t tcpdump between services anymore. You couldn’t use a simple HTTP proxy to inspect requests. Every debugging session required either disabling mTLS temporarily (which nobody wanted to do in production) or decrypting captures with the service’s private key (which was rotated, so you needed the right key for the right time window).

The real issue: they reinvented what Istio and Linkerd provide natively. A service mesh handles mTLS transparently – automated cert rotation, traffic policies, observability dashboards, and you still get to debug using the mesh’s built-in tools. Building the same capability from HAProxy configs and custom cert management scripts is the worst of both worlds: all the operational burden with none of the tooling.

To be fair: mTLS between services is a legitimate requirement in regulated industries. PCI-DSS, SOC 2, and certain financial regulations require encrypted internal traffic. The problem wasn’t mTLS itself – it was implementing it via custom HAProxy sidecars instead of a purpose-built service mesh, and applying it to an environment where the threat model didn’t justify it.

If you’re running services in a private VPC with trusted workloads, network-level encryption (WireGuard, AWS VPC encryption, GCP’s default VM-to-VM encryption) achieves the same goal with zero per-request overhead.

Hystrix: Netflix Cargo-Culting

Every outbound HTTP call was wrapped in a Hystrix command. Circuit breaker, thread pool isolation, fallback methods, timeout configuration – the full Netflix resilience pattern applied to every downstream dependency.

Here’s the problem: Hystrix has been in maintenance mode since 2018. Netflix themselves stopped using it and moved to their internal resilience library. The last meaningful commit to the open-source project was over seven years ago. Choosing Hystrix after 2020 is choosing an abandoned library.

But the deeper problem is the cargo-culting. “Netflix uses circuit breakers, so we need circuit breakers.” Netflix runs thousands of microservices serving hundreds of millions of users, where a single slow dependency can cascade into a site-wide outage. Their scale creates problems that justify circuit breakers, bulkhead isolation, and graceful degradation.

Most companies are not Netflix. If your service has five downstream dependencies and handles a few hundred requests per second, you don’t need circuit breakers. You need:

  • Proper timeouts. Set a connection timeout (1-2 seconds) and a read timeout (5-10 seconds) on your HTTP client. If the downstream is slow, you fail fast.
  • Retries with exponential backoff. Retry once or twice on transient failures (5xx, connection reset). Don’t retry on 4xx (client errors).
  • Health checks. If a downstream is down, your orchestrator (Kubernetes) should detect it and stop routing traffic.

These are three lines of HTTP client configuration. Hystrix adds a wrapper class, a fallback method, a thread pool configuration, a metrics dashboard, and a failure threshold tuning exercise – for every single downstream call. The operational cost of configuring and maintaining Hystrix across dozens of services dwarfs the benefit it provides at non-Netflix scale.
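The timeout-and-retry setup described above can be sketched with the JDK's built-in `HttpClient` – a minimal, illustrative version (class and method names are my own, not from the original system):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ResilientClient {
    // Connection timeout set once on the client: fail fast if the host is unreachable.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    // Retry transient failures (5xx, connection reset) with exponential backoff.
    // 4xx responses are returned immediately: retrying a client error is pointless.
    static HttpResponse<String> get(String url, int maxRetries)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5)) // per-request (read) timeout
                .GET()
                .build();
        long backoffMillis = 200;
        for (int attempt = 0; ; attempt++) {
            try {
                HttpResponse<String> response =
                        CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() < 500 || attempt >= maxRetries) {
                    return response; // success, client error, or retries exhausted
                }
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e; // transient network failure: give up
            }
            Thread.sleep(backoffMillis);
            backoffMillis *= 2; // exponential backoff between attempts
        }
    }
}
```

That really is the whole resilience story for most services: two timeouts and a bounded retry loop, with no wrapper classes or dashboards to maintain.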

If you genuinely need circuit breaking – because you have an unreliable third-party dependency with no SLA, or you’ve measured cascading failure in production – use Resilience4j. It’s lightweight, actively maintained, integrates cleanly with Spring Boot, and doesn’t require wrapping every call in a command class.
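If it helps to see what a circuit breaker actually is, here is a deliberately minimal, hand-rolled sketch of the state machine – illustrative only; in practice you would use Resilience4j rather than rolling your own, and real implementations add sliding windows, half-open call limits, and metrics:

```java
import java.util.function.Supplier;

// Minimal circuit breaker: CLOSED passes calls through; after `failureThreshold`
// consecutive failures it OPENs and rejects calls until `openMillis` has elapsed,
// then allows one trial call (half-open) to probe whether the downstream recovered.
public class MiniCircuitBreaker {
    private enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    public MiniCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    public synchronized <T> T call(Supplier<T> downstream) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < openMillis) {
                throw new IllegalStateException("circuit open: failing fast");
            }
            // Open window elapsed: fall through and allow one trial call.
        }
        try {
            T result = downstream.get();
            consecutiveFailures = 0;
            state = State.CLOSED; // trial call succeeded: close the circuit
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = System.currentTimeMillis();
            }
            throw e;
        }
    }
}
```

Even this toy version carries tuning decisions (threshold, open duration) that someone must own per dependency – which is exactly the maintenance cost the text is warning about.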

But ask the question first: do you actually have a cascading failure problem? Or are you adding circuit breakers because a Netflix engineering blog post made them sound essential? “Netflix uses it” is not an architecture decision. It’s an appeal to authority from a company whose problems are not your problems.

Apache Camel Between Layers

This was the most baffling decision. Apache Camel is an Enterprise Integration Patterns (EIP) framework. It implements patterns like Content-Based Router, Message Translator, Splitter, Aggregator, and Wire Tap. It’s designed for connecting different systems – routing messages between Kafka, RabbitMQ, FTP servers, databases, REST APIs, and file systems.

The team used it for communication between layers within the same microservice. The controller layer didn’t call the service layer directly. Instead, it sent a message through a Camel route, which routed it to the service layer, which processed it and sent the result back through another Camel route to the controller.

This is like using a postal service to send a letter to someone sitting next to you.

What the code looked like:

// Instead of this:
@PostMapping("/orders")
public Order createOrder(@RequestBody OrderRequest request) {
    return orderService.create(request);
}

// They did this:
@PostMapping("/orders")
public Order createOrder(@RequestBody OrderRequest request) {
    return producerTemplate.requestBody("direct:createOrder", request, Order.class);
}

// With a Camel route:
from("direct:createOrder")
    .process(exchange -> {
        OrderRequest req = exchange.getIn().getBody(OrderRequest.class);
        Order result = orderService.create(req);
        exchange.getIn().setBody(result);
    });

The justification was “decoupling.” But decoupling between layers within a service is solved by interfaces and dependency injection – the pattern that Spring Framework has provided since 2003. If OrderController depends on OrderService (an interface), swapping the implementation is a one-line Spring configuration change. No routing framework needed.

Camel added:

  • A routing DSL that developers had to learn to understand what was essentially a method call
  • Type conversion overhead (serializing and deserializing objects through Camel’s exchange mechanism)
  • A message-passing abstraction that made stack traces useless (the actual business exception was wrapped in Camel’s exchange error handling)
  • Additional configuration, dependencies, and startup time for the Camel context

For zero benefit over orderService.create(request).
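The "decoupling" the team wanted is what an interface plus constructor injection already provides. A plain-Java sketch (class names are illustrative, and the DTOs are simplified to a single field):

```java
// Simplified request/response types, standing in for the real DTOs.
class OrderRequest {
    final String item;
    OrderRequest(String item) { this.item = item; }
}

class Order {
    final String item;
    Order(String item) { this.item = item; }
}

// The controller depends on an interface, not a concrete class.
interface OrderService {
    Order create(OrderRequest request);
}

class DefaultOrderService implements OrderService {
    public Order create(OrderRequest request) {
        return new Order(request.item);
    }
}

class OrderController {
    private final OrderService orderService;

    // Constructor injection: Spring (or any DI container) wires in the implementation.
    OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    Order createOrder(OrderRequest request) {
        return orderService.create(request); // a direct, traceable method call
    }
}
```

Swapping the implementation means constructing the controller with a different `OrderService` – one line in a Spring configuration – with stack traces that go straight from controller to service.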

API Versioning: Six Versions Deep

The services exposed REST APIs with URL-based versioning: /v1/orders, /v2/orders, all the way to /v6/orders. Six live versions. All in production. All serving traffic.

Nobody could tell you what the differences were between v3 and v4. The changelog, if it existed, was a Confluence page that hadn’t been updated since v2. New developers had to read the source code of each version’s controller to understand what changed – and the controllers shared service layer code with version-specific branches scattered through the business logic:

public Order processOrder(OrderRequest request, int apiVersion) {
    Order order = new Order();
    order.setItems(request.getItems());

    if (apiVersion >= 3) {
        order.setShippingMethod(request.getShippingMethod());
    }
    if (apiVersion >= 5) {
        order.setGiftWrap(request.getGiftWrap());
    }
    if (apiVersion < 4) {
        order.setLegacyTaxCalculation(true);
    }
    // ... more version branches
    return order;
}

The business logic was contaminated with version checks. Tests had to cover every version permutation. Dead code accumulated because nobody knew if a client was still using v1 – and nobody wanted to break a client by removing it.

Versioning should be a compatibility contract, not a changelog. A new version should only be created for breaking changes – removing a field, changing a data type, restructuring the response. Adding a new optional field is not a breaking change. It does not warrant a new version.

The practical approach:

  • Support at most two versions: current and previous. When v3 ships, deprecate v1 with a sunset date and remove it.
  • Version bumps for breaking changes only. Adding an optional field? Add it to the current version. Renaming a field? New version.
  • Document the breaking changes. If you can’t articulate what broke between v3 and v4, you didn’t need v4.
  • Monitor version usage. If no client is calling v2, remove it. Don’t let dead versions accumulate out of fear.

If you have six live API versions, you don’t have versioning. You have six different APIs that share a database. The maintenance cost grows linearly with each version, and the team’s ability to reason about the system degrades with every version branch in the business logic.
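One way to keep version checks out of the business logic – a hedged sketch, with illustrative names and deliberately simplified types – is to map each supported API version into one complete internal model at the controller boundary, so the service layer never sees a version number:

```java
// Version-free internal model: the service layer never branches on an API version.
class Order {
    String shippingMethod;
    boolean giftWrap;
    boolean legacyTaxCalculation;
}

class OrderService {
    Order process(Order order) {
        // Business logic operates on one complete model. No version branches.
        return order;
    }
}

// Each live API version owns a mapper that fills in defaults for fields it predates.
class V3OrderMapper {
    Order toOrder(String shippingMethod) {
        Order o = new Order();
        o.shippingMethod = shippingMethod;
        o.giftWrap = false;            // v3 predates gift wrap: default it
        o.legacyTaxCalculation = true; // pre-v4 clients get the legacy calculation
        return o;
    }
}

class V6OrderMapper {
    Order toOrder(String shippingMethod, boolean giftWrap) {
        Order o = new Order();
        o.shippingMethod = shippingMethod;
        o.giftWrap = giftWrap;
        o.legacyTaxCalculation = false;
        return o;
    }
}
```

Retiring a version then means deleting one mapper and its controller, not hunting `apiVersion` branches through the business logic – and with only two live versions, there are only two mappers to maintain.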

The Common Framework Trap

A “core platform team” built a common framework that every microservice was required to use. It included:

  • A mandated base Docker image (specific OS, JDK version, and agent binaries)
  • A custom Spring Boot starter with opinionated defaults for logging, metrics, tracing, and health checks
  • Shared libraries for HTTP clients, database access, message queue consumers, and authentication
  • A mandated project structure and build configuration

The intent was consistency: every service looks the same, uses the same libraries, follows the same patterns. In practice:

The framework didn’t cater to all use cases. A service that only consumed Kafka messages and wrote to a database still had to include the HTTP server components, the REST client libraries, and the authentication middleware. The framework was designed for the common case and every service paid for features it didn’t use – in startup time, memory, and dependency surface.

Teams couldn’t choose their own dependencies. If the framework used Apache HttpClient 4.x for HTTP calls, you couldn’t use OkHttp or the built-in Java 11 HttpClient – even if they were better for your use case. The framework’s choices became your constraints.

Teams couldn’t upgrade independently. When the framework released a new major version, every service had to upgrade simultaneously. A framework change that benefited Team A’s use case could break Team B’s customizations. The upgrade became a coordinated, multi-sprint effort across all teams – exactly the kind of big-bang deployment that microservices were supposed to eliminate.

Teams worked around the framework. When the framework couldn’t do what a team needed, they used reflection to override internal behavior, extended framework classes in fragile ways, or added parallel implementations alongside the framework’s versions. The workarounds were harder to maintain than if the team had just written their own code from scratch.

The framework became a bottleneck. Feature requests piled up on the core team. Product teams were blocked waiting for framework changes. The core team prioritized based on their own roadmap, not product urgency. A team that needed a simple change to the HTTP client configuration waited three sprints for the core team to review, approve, and release it.

This is the DRY principle taken to a destructive extreme. Don’t Repeat Yourself is good advice within a single codebase. Applied across microservices, it creates exactly the coupling that microservices were designed to eliminate.

Two services with similar-but-not-identical code are better than two services coupled through a shared library. Duplication is cheaper than the wrong abstraction. When you share code between services, you share deployment schedules, upgrade timelines, and failure modes. You lose the ability to deploy, scale, and evolve each service independently – which is the entire point of microservices.

Amazon’s “two-pizza team” model works because each team owns their full stack. A mandated common framework violates this by centralizing infrastructure decisions in a core team that doesn’t feel the pain of the product teams they serve.

What a healthy platform team provides instead:

  • Templates, not mandates. A starter template that teams can fork and own. They start with a consistent base but are free to diverge.
  • Libraries, not frameworks. Small, focused libraries (a logging adapter, a tracing helper) that teams opt into, not a monolithic framework that teams can’t escape.
  • Golden paths, not golden cages. Documented recommendations (“we suggest using Resilience4j for circuit breaking”) rather than enforced constraints (“you must use our circuit breaker wrapper”).

Banning Squash Merge

The team banned squash merges in all repositories. Every individual commit in a pull request was preserved in the main branch. The reasoning: granular history enables git bisect, cherry-picking, and per-commit attribution.

This only works if the team writes clean, atomic commits:

feat: add payment validation endpoint
feat: add Stripe webhook handler
fix: handle duplicate webhook deliveries
test: add payment flow integration tests

This history tells a story. Each commit is a coherent, reviewable unit of work. git bisect can pinpoint exactly which commit introduced a regression.

In reality, most pull requests contain:

wip
fix
fix again
address review comments
actually fix
oops forgot file
fix lint
fix tests
please work

Preserving this in the main branch provides no value. git bisect on a “wip” commit tells you nothing. The main branch history becomes noise.

For teams whose developers write work-in-progress commits – which is most of them – squash merge produces a cleaner, more useful history: one commit per PR with a descriptive message summarizing the change.

Banning squash merge in this context was another manifestation of the same culture: optimize for theoretical correctness regardless of practical benefit. The team valued process purity over the pragmatic reality that most developers do not write museum-quality commit histories.

The Compound Effect

Here is what a new developer faced when joining this team:

  1. Understand HAProxy sidecar configs to know how services communicate
  2. Learn Hystrix command patterns to understand how outbound calls are wrapped
  3. Read Apache Camel routing DSLs to trace request flow within a single service
  4. Navigate six API versions with undocumented differences to understand the current behavior
  5. Learn the common framework’s opinions, overrides, and workarounds to modify any infrastructure behavior
  6. Read the full commit history (unsquashed) to understand why a piece of code exists

Each of these is a learning curve. Combined, they created a system where the infrastructure was more complex than the business domain. The actual business logic – creating orders, processing payments, managing inventory – was straightforward. But it was buried under layers of integration frameworks, resilience patterns, versioning branches, and framework abstractions.

The team was not incompetent. They were over-informed. They had read every distributed systems paper, watched every Netflix tech talk, and attended every conference. They applied best practices from organizations operating at scales they would never reach, solving problems they did not have.

The Simplest Thing That Works

The best architecture is not the one with the most patterns. It’s the simplest one that meets the actual requirements:

What they did → what would have been sufficient:

  • HAProxy sidecar for mTLS → VPC-level encryption (or Istio if mTLS was actually required)
  • Hystrix circuit breakers on every call → HTTP client timeouts + retries with backoff
  • Apache Camel between layers → service.doThing(data) – a method call
  • Six live API versions → two versions max, deprecate aggressively
  • Mandated common framework → starter template + opt-in libraries
  • Squash merge ban → allow squash merge, encourage clean commit messages

None of these simplifications would have reduced the system’s reliability, security, or scalability. They would have reduced onboarding time from three months to three weeks, made debugging a matter of reading code instead of reading Camel routes and HAProxy configs, and let teams move at their own pace instead of waiting for the core team’s release cycle.

Complexity is not a sign of sophistication. It’s a cost. Every layer of abstraction, every framework, every pattern must justify its existence against the question: does this solve a problem we actually have, or a problem we read about in a blog post?