[{"content":"Go is getting popular. I understand why. It compiles fast, produces small binaries, has excellent concurrency primitives, and powers some of the most important infrastructure tools in the cloud-native ecosystem \u0026ndash; Docker, Kubernetes, Terraform, Prometheus.\nBut popularity is not the same as suitability. Most software is not infrastructure tooling. Most software is business applications: REST APIs, CRUD services, data pipelines, web backends. For these workloads, Go\u0026rsquo;s developer experience is meaningfully worse than Spring Boot\u0026rsquo;s. Not slightly worse. Dramatically worse.\nThis is not a \u0026ldquo;Java vs Go performance benchmark\u0026rdquo; article. Go is faster in many scenarios. That is not the point. The point is that for the vast majority of applications, performance is not the bottleneck \u0026ndash; developer productivity and code readability are. And on those dimensions, Go loses badly.\nError Handling: Where\u0026rsquo;s the Business Logic? Consider a service method that creates an order. It needs to validate the input, check inventory, charge payment, and persist the order.\nSpring Boot:\n@Transactional public Order createOrder(OrderRequest request) { var customer = customerService.findById(request.getCustomerId()); var inventory = inventoryService.reserve(request.getItems()); var payment = paymentService.charge(customer, request.getTotal()); return orderRepository.save(new Order(customer, inventory, payment)); } Four lines of business logic. Every line does something meaningful. If any step fails, the exception propagates, the transaction rolls back, and the global exception handler returns an appropriate HTTP response. 
The developer writes only the happy path.\nGo:\nfunc (s *OrderService) CreateOrder(ctx context.Context, req OrderRequest) (*Order, error) { customer, err := s.customerService.FindByID(ctx, req.CustomerID) if err != nil { return nil, fmt.Errorf(\u0026#34;find customer: %w\u0026#34;, err) } inventory, err := s.inventoryService.Reserve(ctx, req.Items) if err != nil { return nil, fmt.Errorf(\u0026#34;reserve inventory: %w\u0026#34;, err) } payment, err := s.paymentService.Charge(ctx, customer, req.Total) if err != nil { return nil, fmt.Errorf(\u0026#34;charge payment: %w\u0026#34;, err) } order, err := s.orderRepo.Save(ctx, \u0026amp;Order{ Customer: customer, Inventory: inventory, Payment: payment, }) if err != nil { return nil, fmt.Errorf(\u0026#34;save order: %w\u0026#34;, err) } return order, nil } Same logic. But now the business logic is buried inside twelve lines of error handling boilerplate. The if err != nil pattern repeats four times. Each repetition adds nothing \u0026ndash; it just wraps and forwards the error. The actual business operations \u0026ndash; find, reserve, charge, save \u0026ndash; are the same four calls, but you need a microscope to find them between the error checks.\nGo developers will argue this makes error flow \u0026ldquo;explicit.\u0026rdquo; Yes, it does. It also makes the code twice as long and half as readable. Explicitness is a virtue when it reveals something the reader didn\u0026rsquo;t know. Repeating if err != nil { return nil, fmt.Errorf(...) } four times reveals nothing \u0026ndash; it\u0026rsquo;s pure ceremony.\nAnd notice: the Go version has no transaction management. Adding that requires either a manual tx.Begin() / tx.Commit() / tx.Rollback() ceremony with its own error handling, or a callback-based helper. 
The Java version gets it with a single @Transactional annotation.\nControllers: 15 Lines vs 40 Lines Spring Boot:\n@RestController @RequestMapping(\u0026#34;/api/orders\u0026#34;) public class OrderController { @PostMapping @ResponseStatus(HttpStatus.CREATED) public Order create(@Valid @RequestBody OrderRequest request) { return orderService.createOrder(request); } @GetMapping(\u0026#34;/{id}\u0026#34;) public Order get(@PathVariable Long id) { return orderService.findById(id); } @GetMapping public Page\u0026lt;Order\u0026gt; list(@PageableDefault(size = 20) Pageable pageable) { return orderService.findAll(pageable); } } Three endpoints. Input validation (@Valid), path variable extraction (@PathVariable), pagination (Pageable), HTTP status codes (@ResponseStatus), content negotiation (JSON serialization) \u0026ndash; all handled by annotations. The controller is a declaration of intent, not an implementation of HTTP mechanics.\nGo (Gin):\nfunc (h *OrderHandler) Register(r *gin.Engine) { g := r.Group(\u0026#34;/api/orders\u0026#34;) g.POST(\u0026#34;\u0026#34;, h.Create) g.GET(\u0026#34;/:id\u0026#34;, h.Get) g.GET(\u0026#34;\u0026#34;, h.List) } func (h *OrderHandler) Create(c *gin.Context) { var req OrderRequest if err := c.ShouldBindJSON(\u0026amp;req); err != nil { c.JSON(http.StatusBadRequest, gin.H{\u0026#34;error\u0026#34;: err.Error()}) return } if err := validate.Struct(req); err != nil { c.JSON(http.StatusUnprocessableEntity, gin.H{\u0026#34;error\u0026#34;: err.Error()}) return } order, err := h.service.CreateOrder(c.Request.Context(), req) if err != nil { c.JSON(http.StatusInternalServerError, gin.H{\u0026#34;error\u0026#34;: err.Error()}) return } c.JSON(http.StatusCreated, order) } func (h *OrderHandler) Get(c *gin.Context) { id, err := strconv.ParseInt(c.Param(\u0026#34;id\u0026#34;), 10, 64) if err != nil { c.JSON(http.StatusBadRequest, gin.H{\u0026#34;error\u0026#34;: \u0026#34;invalid id\u0026#34;}) return } order, err := 
h.service.FindByID(c.Request.Context(), id) if err != nil { c.JSON(http.StatusInternalServerError, gin.H{\u0026#34;error\u0026#34;: err.Error()}) return } c.JSON(http.StatusOK, order) } func (h *OrderHandler) List(c *gin.Context) { page, _ := strconv.Atoi(c.DefaultQuery(\u0026#34;page\u0026#34;, \u0026#34;0\u0026#34;)) size, _ := strconv.Atoi(c.DefaultQuery(\u0026#34;size\u0026#34;, \u0026#34;20\u0026#34;)) orders, total, err := h.service.FindAll(c.Request.Context(), page, size) if err != nil { c.JSON(http.StatusInternalServerError, gin.H{\u0026#34;error\u0026#34;: err.Error()}) return } c.JSON(http.StatusOK, gin.H{ \u0026#34;content\u0026#34;: orders, \u0026#34;totalElements\u0026#34;: total, \u0026#34;page\u0026#34;: page, \u0026#34;size\u0026#34;: size, }) } Same three endpoints. Forty-plus lines vs fifteen. The Go version manually handles: JSON binding, validation invocation, error responses, path parameter parsing, type conversion, pagination parameter extraction, and response envelope construction. Every one of these is boilerplate that Spring Boot handles declaratively.\nThe Go developer might say: \u0026ldquo;But I can see exactly what\u0026rsquo;s happening!\u0026rdquo; Yes. You can also see exactly what\u0026rsquo;s happening when you write assembly. That doesn\u0026rsquo;t make it productive.\nThe Annotation System: Java\u0026rsquo;s Secret Weapon Here\u0026rsquo;s the question Go developers rarely ask: why can Java do this and Go can\u0026rsquo;t?\nIt\u0026rsquo;s not about language age or ecosystem size. It\u0026rsquo;s a fundamental architectural difference in how the two languages execute code.\nHow Java Annotations Actually Work When you write @Transactional on a method, nothing happens at compile time. The annotation is metadata \u0026ndash; it\u0026rsquo;s stored in the class file\u0026rsquo;s bytecode but has zero effect on execution by itself. 
The magic happens at runtime, through a chain of mechanisms that Go\u0026rsquo;s architecture fundamentally cannot support:\nStep 1: Component Scanning. When Spring Boot starts, it scans the classpath for classes annotated with @Component, @Service, @Controller, etc. This uses Java\u0026rsquo;s reflection API \u0026ndash; the ability to inspect class structure, annotations, methods, and fields at runtime.\nStep 2: Dynamic Proxy Generation. For any bean that has annotations requiring cross-cutting behavior (@Transactional, @Cacheable, @Async, @Retryable), Spring creates a proxy class at runtime. This uses one of two mechanisms:\nJDK Dynamic Proxy: For interface-based beans. Java\u0026rsquo;s java.lang.reflect.Proxy creates a new class at runtime that implements the same interface but intercepts every method call. CGLIB Proxy: For class-based beans. The CGLIB library generates a subclass of your class at runtime using bytecode generation. This subclass overrides your methods, adding the transactional/caching/retry behavior before and after your actual code runs. This is only possible because Java has a classloader \u0026ndash; a runtime component that loads and defines classes dynamically. The JVM can create new classes that didn\u0026rsquo;t exist at compile time and load them into the running application.\nStep 3: AOP Interception. The generated proxy wraps your method with \u0026ldquo;advice\u0026rdquo; \u0026ndash; before advice, after advice, around advice. For @Transactional, the around advice is:\n1. Get a database connection from the connection pool 2. Begin transaction 3. Call your actual method 4. If no exception: commit 5. If exception: rollback 6. Return connection to pool Your code never sees any of this. You write the business logic. The framework handles the infrastructure.\nWhy Go Cannot Do This Go compiles to a static binary. There is no classloader. There is no runtime type creation. There is no bytecode generation. 
When the Go compiler finishes, the binary contains every type that will ever exist in the program. You cannot create new types at runtime.\nThis means:\nNo dynamic proxies. You cannot generate a wrapper class around a struct at runtime. If you want to add transaction management to a method, you must write the wrapping code yourself \u0026ndash; either explicitly, or via a higher-order function that still requires manual invocation at every call site. No annotation-driven behavior. Go has struct tags (e.g., `json:\u0026quot;name\u0026quot;`), which look similar to annotations but are fundamentally limited. They\u0026rsquo;re only readable via the reflect package, and the Go community strongly discourages reflection in production code. No AOP. Aspect-oriented programming is structurally impossible without dynamic dispatch or runtime code generation. Go has neither. Go\u0026rsquo;s alternative is go generate \u0026ndash; a tool that runs code generators at build time to produce source code. But generated source code is not the same as runtime behavior. You have to commit the generated code, maintain it, and regenerate it when the source changes. It\u0026rsquo;s a poor substitute for dynamic proxies.\nThe AI Argument Kills the Debugging Concern The traditional counter-argument against Java\u0026rsquo;s annotation model is: \u0026ldquo;It\u0026rsquo;s too magical. When @Transactional doesn\u0026rsquo;t work because you called the method from within the same class (proxy bypass), it\u0026rsquo;s impossible to debug.\u0026rdquo;\nThis was a legitimate concern in 2015. It is not a legitimate concern in 2026.\nAI code assistants understand Spring\u0026rsquo;s proxy model completely. 
Ask any AI: \u0026ldquo;Why isn\u0026rsquo;t my @Transactional annotation working when I call the method from the same class?\u0026rdquo; You\u0026rsquo;ll get the exact explanation (self-invocation bypasses the proxy because the call goes through this, not the proxy reference) and the fix (inject the bean into itself, use AopContext.currentProxy(), or extract the method to a separate service) in seconds.\nThe \u0026ldquo;annotation magic is hard to debug\u0026rdquo; problem has been solved by AI. The \u0026ldquo;Go error handling is verbose\u0026rdquo; problem has not been solved by AI \u0026ndash; because it\u0026rsquo;s structural. AI can write the boilerplate for you, but you still have to read it during code review, and the business logic is still buried.\nException Handling: The Global Safety Net Spring Boot:\n@ControllerAdvice public class GlobalExceptionHandler { @ExceptionHandler(NotFoundException.class) @ResponseStatus(HttpStatus.NOT_FOUND) public ErrorResponse handleNotFound(NotFoundException e) { return new ErrorResponse(\u0026#34;NOT_FOUND\u0026#34;, e.getMessage()); } @ExceptionHandler(ValidationException.class) @ResponseStatus(HttpStatus.UNPROCESSABLE_ENTITY) public ErrorResponse handleValidation(ValidationException e) { return new ErrorResponse(\u0026#34;VALIDATION_ERROR\u0026#34;, e.getMessage()); } @ExceptionHandler(Exception.class) @ResponseStatus(HttpStatus.INTERNAL_SERVER_ERROR) public ErrorResponse handleAll(Exception e) { log.error(\u0026#34;Unhandled exception\u0026#34;, e); return new ErrorResponse(\u0026#34;INTERNAL_ERROR\u0026#34;, \u0026#34;Something went wrong\u0026#34;); } } One class. Handles every error type across the entire application. Every controller, every service, every repository \u0026ndash; if an exception escapes, this handler catches it and converts it to a proper HTTP response. 
The catch-all at the bottom ensures that no unhandled exception ever leaks a stack trace to the client.\nControllers are clean because they don\u0026rsquo;t handle errors. Services are clean because they throw meaningful exceptions. The mapping from exception to HTTP response happens in one place.\nGo:\nThere is no equivalent. Every handler must manually map errors to responses:\nfunc (h *OrderHandler) Get(c *gin.Context) { // ... parse id, handle parse error ... order, err := h.service.FindByID(ctx, id) if err != nil { if errors.Is(err, ErrNotFound) { c.JSON(404, gin.H{\u0026#34;error\u0026#34;: \u0026#34;not found\u0026#34;}) return } if errors.Is(err, ErrForbidden) { c.JSON(403, gin.H{\u0026#34;error\u0026#34;: \u0026#34;forbidden\u0026#34;}) return } c.JSON(500, gin.H{\u0026#34;error\u0026#34;: \u0026#34;internal error\u0026#34;}) return } c.JSON(200, order) } This error-to-HTTP mapping is repeated in every handler. You can extract it into a helper function, but you still have to call that helper in every handler. There is no central place where all errors are caught and converted. Every handler is responsible for its own error mapping \u0026ndash; and if one handler forgets, the error leaks as a raw 500 with no structured body.\nGo has panic and recover, which behave like exceptions. But the Go community explicitly discourages using them for error handling. They are reserved for truly unrecoverable situations (programmer bugs, not business errors). So Go chooses to not use the one mechanism that could give it exception-like behavior, and then requires every function to manually propagate errors instead.\nTesting: Spock vs For Loops Spock (Groovy):\ndef \u0026#34;calculates order total with tax\u0026#34;() { expect: orderService.calculateTotal(items, taxRate) == expectedTotal where: items | taxRate | expectedTotal [item(10.00), item(20.00)] | 0.10 | 33.00 [item(100.00)] | 0.20 | 120.00 [] | 0.10 | 0.00 [item(9.99), item(0.01)] | 0.0 | 10.00 } A data table. 
Each row is a test case. The test name, the inputs, and the expected outputs are all visible in one place. Adding a new test case is adding a new row \u0026ndash; zero boilerplate.\nJUnit 5:\n@ParameterizedTest @CsvSource({ \u0026#34;100.00, 0.10, 110.00\u0026#34;, \u0026#34;200.00, 0.20, 240.00\u0026#34;, \u0026#34;0.00, 0.10, 0.00\u0026#34; }) void calculatesTotal(BigDecimal amount, BigDecimal tax, BigDecimal expected) { assertEquals(expected, service.calculateTotal(amount, tax)); } Still concise. The @CsvSource annotation provides the test data inline. No loops.\nGo:\nfunc TestCalculateTotal(t *testing.T) { tests := []struct { name string items []Item taxRate float64 expected float64 }{ { name: \u0026#34;multiple items with tax\u0026#34;, items: []Item{newItem(10.00), newItem(20.00)}, taxRate: 0.10, expected: 33.00, }, { name: \u0026#34;single item with tax\u0026#34;, items: []Item{newItem(100.00)}, taxRate: 0.20, expected: 120.00, }, { name: \u0026#34;empty items\u0026#34;, items: []Item{}, taxRate: 0.10, expected: 0.00, }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { result := service.CalculateTotal(tt.items, tt.taxRate) assert.Equal(t, tt.expected, result) }) } } Functionally identical. But structurally bloated. The struct definition, the slice literal, the for loop, the t.Run wrapper \u0026ndash; it\u0026rsquo;s all ceremony. Adding a test case means adding a new struct literal with field names repeated for every case. 
Compare that to Spock\u0026rsquo;s \u0026ldquo;add a row to the table.\u0026rdquo;\nAnd then there\u0026rsquo;s mocking:\nMockito (Java):\n@MockBean private PaymentService paymentService; @Test void chargesCustomer() { when(paymentService.charge(any(), eq(100.00))).thenReturn(receipt); orderService.createOrder(request); verify(paymentService).charge(customer, 100.00); } Go:\n// First, define a mock struct that implements the interface type mockPaymentService struct { chargeFunc func(ctx context.Context, customer *Customer, amount float64) (*Receipt, error) } func (m *mockPaymentService) Charge(ctx context.Context, c *Customer, amount float64) (*Receipt, error) { return m.chargeFunc(ctx, c, amount) } // Then, in the test: func TestChargesCustomer(t *testing.T) { mock := \u0026amp;mockPaymentService{ chargeFunc: func(ctx context.Context, c *Customer, amount float64) (*Receipt, error) { assert.Equal(t, 100.00, amount) return \u0026amp;Receipt{}, nil }, } service := NewOrderService(mock) _, err := service.CreateOrder(context.Background(), request) assert.NoError(t, err) } In Java, @MockBean and Mockito handle mock creation, injection, stubbing, and verification in three lines. In Go, you define a custom mock struct for every interface, implement every method, and wire it manually. Tools like mockgen and testify/mock help, but they don\u0026rsquo;t approach Mockito\u0026rsquo;s ergonomics.\nSpring Boot\u0026rsquo;s @MockBean is particularly powerful: it replaces a real bean in the Spring context with a mock, including all its transitive dependencies. In Go, you manually construct the entire dependency graph in every test.\nPointers: Solving Problems That Don\u0026rsquo;t Exist Java developers sometimes hear that \u0026ldquo;Java has no pointers.\u0026rdquo; This is wrong. Java has pointers everywhere \u0026ndash; they\u0026rsquo;re called references. 
NullPointerException is literally the most common runtime error in Java.\nThe difference is that Java hides pointer mechanics. You never write *customer to dereference or \u0026amp;order to take an address. You never decide whether a struct should be passed by value (copied) or by reference (shared). The JVM handles this transparently.\nGo exposes it:\nfunc (s *OrderService) Process(order *Order) error { customer := order.Customer // value copy or pointer? depends on Customer\u0026#39;s type s.updateStatus(\u0026amp;order.Status) // explicit address-of result := *order.Result // explicit dereference // ... } For systems programming \u0026ndash; network protocols, memory-mapped I/O, custom allocators \u0026ndash; pointer control matters. For a REST API that reads from a database and returns JSON, it\u0026rsquo;s cognitive overhead with no payoff.\nThe standard Go developer\u0026rsquo;s response is: \u0026ldquo;But value semantics prevent aliasing bugs!\u0026rdquo; True. In Go, passing a struct by value means the callee can\u0026rsquo;t mutate the caller\u0026rsquo;s copy. This prevents a class of bugs. But in business application code, how often is unintended aliasing actually the bug? Compared to how often the * and \u0026amp; syntax trips up junior developers or clutters code review diffs, the tradeoff is not worth it.\nFor 99% of applications, the milliseconds saved by Go\u0026rsquo;s memory model are irrelevant. The hours lost to pointer-related confusion are not.\nGo\u0026rsquo;s Missed Market Window Go 1.0 shipped in March 2012. At that time, Java was in a genuinely weak period. Java 7 (2011) was underwhelming. Java 6 had been the standard for years with minimal evolution. Spring Framework (pre-Boot) required mountains of XML configuration. Setting up a Spring project meant wrestling with applicationContext.xml, web.xml, dispatcher-servlet.xml, and a dozen Maven dependencies. 
It was painful.\nIf Go had offered a compelling web framework with strong library support in 2012-2013, it might have captured a significant share of the backend market.\nBut then two things happened in 2014:\nJava 8 shipped with lambdas, streams, and the Optional type \u0026ndash; the most significant language evolution since generics. Spring Boot 1.0 launched, eliminating all the XML configuration pain with convention-over-configuration and auto-configuration. Suddenly, starting a new Java web service went from \u0026ldquo;configure 15 XML files\u0026rdquo; to @SpringBootApplication and a main method. The pain point that Go was positioned to solve had been solved by Java itself.\nGo found its niche where it genuinely excels: infrastructure tooling. Docker (2013), Kubernetes (2014), Terraform (2014), Prometheus (2012) \u0026ndash; all written in Go. For CLI tools, system daemons, and network services that need small binaries, fast startup, and low memory, Go is the right choice. But the enterprise application market that Go might have captured went back to Java.\nWhere Go Genuinely Wins Intellectual honesty requires acknowledging Go\u0026rsquo;s real strengths:\nGoroutines and channels. Go\u0026rsquo;s concurrency model is genuinely elegant. Spawning a goroutine (go func() { ... }()) is simpler than any threading model Java has offered historically. Channels provide typed, safe communication between goroutines. Java 21\u0026rsquo;s virtual threads have narrowed this gap significantly, but Go\u0026rsquo;s concurrency was there a decade earlier.\nBinary size and startup. A Go binary is typically 10-20MB and starts in \u0026lt;100ms. A Spring Boot application needs a 200MB+ JVM and takes 2-10 seconds to start. For CLI tools, serverless functions, and container-dense environments, this matters.\nMemory footprint. A Go service can run in 10-30MB of RAM. A Spring Boot service rarely drops below 200MB. When you\u0026rsquo;re running dozens of microservices, this adds up. 
For infrastructure at scale, Go\u0026rsquo;s memory efficiency translates to real cost savings.\nSimplicity of deployment. One binary. Copy it to the server. Run it. No JVM installation, no classpath, no dependency hell. This is genuinely pleasant.\nConclusion Go is a well-designed language for a specific domain: infrastructure tooling, CLI applications, and performance-critical network services. In that domain, its strengths \u0026ndash; fast compilation, small binaries, excellent concurrency, low memory \u0026ndash; are real and meaningful.\nBut most software is not infrastructure tooling. Most software is business applications: REST APIs, CRUD services, workflow engines, data processors. For these, Spring Boot offers:\nDeclarative code that reads like intent, not implementation Annotation-driven behavior powered by dynamic proxies and runtime reflection \u0026ndash; machinery Go cannot replicate Global exception handling that keeps controllers and services clean Mature testing frameworks with expressive assertions, data-driven tests, and DI-aware mocking Automatic transaction management that is invisible to the developer The old argument against Java \u0026ndash; \u0026ldquo;annotations are magic, hard to debug\u0026rdquo; \u0026ndash; died with AI code generation. In 2026, any developer can ask an AI to explain exactly what @Transactional does under the hood, why a proxy bypass happened, or how AOP ordering works. The debugging cost of annotation-driven development has collapsed to near-zero.\nGo\u0026rsquo;s verbosity problem, by contrast, is structural. AI can write the if err != nil boilerplate for you, but you still have to read it during code review. The business logic is still buried. The controllers are still four times longer. The tests still require manual mock structs and for loops.\nGo is the right tool for building the next Kubernetes. 
Spring Boot is the right tool for building the application that runs on it.\n","permalink":"https://blogs.joshuaantony.com/posts/go-developer-experience-problem/","summary":"Go is a fine language for infrastructure tooling. But for the 90% of software that is business applications, Spring Boot\u0026rsquo;s developer experience is dramatically superior \u0026ndash; and AI-assisted development has eliminated the last argument against Java\u0026rsquo;s annotation-driven model.","title":"Go's Developer Experience Problem: Why Spring Boot Still Wins for Business Applications"},{"content":"Running your own photo gallery might sound like overkill when Google Photos exists, but there are compelling reasons to self-host: full ownership of your data, no subscription fees, no AI training on your family photos, and the satisfaction of building something yourself.\nIn this post, I\u0026rsquo;ll walk through the architecture of a self-hosted photo gallery I built on a 3-node Kubernetes cluster, serving over 10,000 photos and videos from a NAS, protected by enterprise-grade authentication, and accessible from anywhere in the world.\nThe Stack Immich \u0026ndash; A Google Photos alternative with face detection, smart search, timeline view, and mobile apps Authentik \u0026ndash; An open-source identity provider handling OAuth2/OIDC, user management, and TOTP MFA Cloudflare Tunnel \u0026ndash; Zero-trust ingress with no open ports on the home network UGreen NAS \u0026ndash; Network-attached storage serving photos via NFS Kubernetes \u0026ndash; Orchestrating everything across 3 mini-PCs Architecture The design follows a simple principle: keep all data on the home network, expose nothing directly to the internet, and make authentication mandatory.\nInternet → Cloudflare Edge (TLS + CDN) → Cloudflare Tunnel → K8s Cluster → NAS Users access photos.example.com which hits Cloudflare\u0026rsquo;s edge network. 
Cloudflare routes the request through an encrypted tunnel to a cloudflared pod running inside the Kubernetes cluster. The pod forwards traffic to either the Immich server or Authentik, depending on the hostname.\nThe key insight is that the tunnel is outbound-only. The cloudflared pod initiates a QUIC connection to Cloudflare \u0026ndash; no ports are opened on the home router. The NAS sits entirely on the local network, accessible only by Kubernetes pods via NFS.\nWhy Not Just Use Nginx Ingress? For a setup with only two services (Immich and Authentik), a full ingress controller is unnecessary overhead. Cloudflare Tunnel\u0026rsquo;s built-in routing handles hostname-based routing natively. Each service gets a public hostname mapped to an internal Kubernetes service:\nphotos.example.com → immich-server.photo-gallery.svc:2283 auth.example.com → authentik-server.photo-gallery.svc:80 This eliminates an entire component from the stack.\nAuthentication Flow Every user must authenticate through Authentik before accessing photos. The login flow:\nUser opens the photo gallery URL Clicks \u0026ldquo;Login with Authentik\u0026rdquo; (the only login option \u0026ndash; Immich\u0026rsquo;s built-in password login is disabled) Enters username and password at the Authentik login page If first login: forced to change their temporary password If TOTP not enrolled: shown a QR code to scan with Google Authenticator Enters the 6-digit TOTP code Redirected back to Immich with an OIDC token Immich auto-creates their account on first login Only the admin can create user accounts. There is no self-registration. This means the only people who can access the gallery are family members who were explicitly invited.\nNFS for Photo Storage The UGreen NAS exports its photos folder via NFSv3 to the Kubernetes cluster. Immich mounts this as a read-only external library at /mnt/nas/Photos. 
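In Kubernetes terms, that mount is a PersistentVolume bound read-only into the Immich pod. A sketch of what the NFS PV/PVC pair might look like; the server address, export path, and capacity are placeholders, not my actual values:

```yaml
# Illustrative NFS PersistentVolume for the external library.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-photos
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadOnlyMany
  mountOptions:
    - nfsvers=3   # match the NFS version the NAS actually exports
    - ro
  nfs:
    server: 192.168.1.50
    path: /volume1/Photos
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nas-photos
  namespace: photo-gallery
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Ti
```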
This means:\nImmich can index and display all photos but cannot modify the originals New photos added to the NAS are picked up on the next library scan The NAS remains the single source of truth for media files Immich also has a separate writable volume for its own data \u0026ndash; thumbnails, encoded videos, ML model cache, and any photos uploaded directly through the Immich app.\nThe Cloudflare CDN Bonus An unexpected benefit of using Cloudflare Tunnel: static assets like photos get cached at Cloudflare\u0026rsquo;s edge. After the first request, subsequent views of the same photo are served from Cloudflare\u0026rsquo;s CDN rather than traversing the tunnel to the home server. This makes the gallery surprisingly fast for family members accessing it from different countries.\nResource Usage The entire stack runs comfortably on three mini-PCs with 16 cores and 28-32GB RAM each. Current utilization is roughly 3% CPU and 3% RAM across the cluster, leaving ample room for additional workloads.\nKey Takeaways Cloudflare Tunnel eliminates the need for port forwarding \u0026ndash; zero attack surface on the home network Authentik provides enterprise-grade auth for free \u0026ndash; OIDC, MFA, user management with an admin panel Immich is a legitimate Google Photos replacement \u0026ndash; face detection, search, mobile apps, and external library support NFSv3 works reliably for media serving \u0026ndash; just make sure to test the NFS version your NAS supports before configuring Kubernetes PVs Keep secrets out of git \u0026ndash; use Kubernetes Secrets and .gitignore from day one ","permalink":"https://blogs.joshuaantony.com/posts/self-hosted-photo-gallery-kubernetes/","summary":"How I built a Google Photos alternative running on a 3-node Kubernetes cluster at home, protected by MFA authentication, and exposed securely via Cloudflare Tunnel with zero open ports.","title":"Building a Self-Hosted Photo Gallery on Kubernetes with Immich, Authentik, and Cloudflare 
Tunnel"},{"content":"The promise of SaaS identity is compelling: hand off authentication to a managed service, never patch an identity server again, inherit SOC 2 and ISO 27001 compliance, and get global availability for free. Entra ID, Okta, Ping Cloud, Auth0 \u0026ndash; pick one, integrate via OIDC, and move on to building your product.\nFor 80% of enterprises, this works. Employees log in to internal apps. Standard OAuth2 authorization code flow. MFA via Microsoft Authenticator or Okta Verify. Token issued, resource accessed, session ends. The spec covers it. The SaaS IDP handles it. Everyone is happy.\nThen there\u0026rsquo;s the other 20%.\nThe Customization Cliff The OAuth 2.0 and OpenID Connect specifications define a handful of grant types (authorization code, client credentials, device code, refresh token) and a few extension points (custom scopes, claims, token exchange via RFC 8693). These cover the standard cases well.\nBut real-world identity requirements routinely exceed the spec:\nStep-up authentication \u0026ndash; require MFA mid-session for a sensitive operation (transferring money, changing account settings), not just at login Progressive profiling \u0026ndash; collect more user data across multiple sessions rather than requiring a complete registration upfront Risk-based / adaptive authentication \u0026ndash; change the auth flow based on device fingerprint, geolocation, behavioral signals, or threat intelligence Account linking \u0026ndash; merge a social login identity with an existing enterprise account Anonymous-to-authenticated session promotion \u0026ndash; user browses anonymously, adds items to a cart, then registers or logs in without losing their session state Multi-tenant flows \u0026ndash; different authentication requirements per tenant within the same application Custom MFA channels \u0026ndash; WhatsApp OTP, hardware tokens, biometric verification via external providers ForgeRock\u0026rsquo;s Authentication Trees (now called 
Journeys in PingOne Advanced Identity Cloud) were built for exactly this. Each node in a tree was a decision point: authenticate with password, then check device fingerprint, then branch to SMS MFA if the device is new, or skip MFA if the device is trusted and the IP is known. Nodes could call external services, set session attributes, transform tokens, redirect to external IDPs, or execute arbitrary server-side logic. You could build flows that the OAuth spec never contemplated.\nThe question is: can SaaS IDPs match this?\n| IDP | Customization Level | Mechanism | Arbitrary Flow Logic? |\n| --- | --- | --- | --- |\n| ForgeRock (on-prem/PingOne AIC) | Very High | Authentication Trees/Journeys \u0026ndash; visual flow builder with custom JS/Java nodes | Yes \u0026ndash; each node can branch, call external APIs, transform data |\n| Ping DaVinci | High | Visual flow builder with connectors \u0026ndash; similar concept to ForgeRock trees | Yes \u0026ndash; connectors to external services, conditional branching |\n| Auth0 | Moderate-High | Actions \u0026ndash; arbitrary Node.js code at specific pipeline stages (login, registration, token exchange) | Partially \u0026ndash; code runs at fixed trigger points, not arbitrary flow positions |\n| Okta | Moderate | Inline Hooks + Event Hooks \u0026ndash; call external APIs at specific pipeline stages | Partially \u0026ndash; hooks at fixed points, can modify tokens/claims but can\u0026rsquo;t restructure the flow |\n| Entra ID | Low-Moderate | Conditional Access policies (powerful but opinionated), B2C Custom Policies (XML, notoriously painful) | No \u0026ndash; policies are rule-based, not flow-based. B2C Custom Policies exist but are so complex that most teams avoid them |\nThe gap between ForgeRock/DaVinci and Entra is enormous. Auth0 and Ping DaVinci are reasonable compromises \u0026ndash; SaaS with meaningful customization. 
But Entra, which is what most enterprises default to because they’re already Microsoft shops, is where the cliff is steepest.
The Entra Paradox: Buy SaaS, Build Custom Anyway
Here is a pattern that repeats across enterprises:
1. Company adopts Entra ID as the corporate IDP. Employees authenticate to internal apps. Works perfectly.
2. Company launches a consumer-facing application. Needs anonymous browsing, social login, progressive registration, risk-based MFA, custom branding per locale.
3. Entra B2C is evaluated. Custom Policies are XML-based, poorly documented, and debugging them is an exercise in reading Azure log tables with 30-second propagation delays. The team burns two sprints trying to build a custom registration flow and gives up.
4. The team builds a custom authentication layer – a Spring Boot / Node.js service that handles the consumer auth flows, issues its own session tokens, and calls Entra only for employee-facing SSO.
5. The company now runs two identity systems: Entra for workforce, custom code for consumers. The custom layer has its own user store, its own session management, and its own token logic.
This defeats the entire purpose of adopting a SaaS IDP. The “custom IDP in front of Entra” is essentially a poor man’s ForgeRock – except ForgeRock gave you audited, security-reviewed building blocks, and your custom Spring Boot service gives you code written by developers who are not identity specialists.
The Security Disaster of DIY Auth
This is the part that keeps me up at night. Most software engineers are not identity engineers. Identity and access management is a specialization with its own threat models, attack vectors, and subtle failure modes. When teams are forced to build custom auth layers – because their SaaS IDP can’t handle the use case – they make predictable, dangerous mistakes:
Tokens in localStorage.
The custom layer issues JWTs and the frontend stores them in localStorage. This is XSS-vulnerable \u0026ndash; any injected script can read the token and exfiltrate it. HttpOnly cookies exist for exactly this reason, but the developer didn\u0026rsquo;t know that because they\u0026rsquo;re not an identity engineer.\nRolling own JWT validation. The custom service validates JWTs by checking the signature and expiry. It doesn\u0026rsquo;t validate the aud (audience) claim, the iss (issuer) claim, or the azp (authorized party). An attacker with a valid token from a different application can use it here. Or worse: the validation accepts alg: none because the library defaults to permissive mode (the \u0026ldquo;algorithm confusion\u0026rdquo; attack that has breached multiple production systems).\nPredictable password reset tokens. The custom auth layer generates reset tokens using UUID.randomUUID() or, in the worst cases, sequential integers or timestamps. An attacker can enumerate or predict tokens and reset any user\u0026rsquo;s password.\nMissing CSRF protection. The login endpoint accepts POST requests without CSRF tokens. An attacker crafts a page that auto-submits a login form with the attacker\u0026rsquo;s credentials, logging the victim into the attacker\u0026rsquo;s account (login CSRF), then capturing anything the victim types.\nSession fixation. The custom layer reuses the same session ID before and after authentication. An attacker sets a known session ID in the victim\u0026rsquo;s browser, waits for the victim to log in, then hijacks the now-authenticated session.\nInsecure \u0026ldquo;remember me.\u0026rdquo; The custom layer implements persistent login by storing the user ID in an unsigned cookie. Changing the user ID in the cookie logs you in as a different user.\nLogging tokens and secrets. The custom layer logs the full HTTP request for debugging, including Authorization headers with bearer tokens. 
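Of the mistakes above, the predictable reset token has the most mechanical fix: draw the token from a CSPRNG with enough entropy that enumeration is hopeless. A minimal sketch using only the JDK (the class and method names are invented for illustration, not taken from any real codebase):

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for unguessable, URL-safe reset tokens.
final class ResetTokens {
    private static final SecureRandom RNG = new SecureRandom();

    // 32 random bytes = 256 bits of entropy, far beyond brute-force reach.
    // Base64url without padding keeps the token safe to embed in a link.
    static String newToken() {
        byte[] bytes = new byte[32];
        RNG.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

Entropy alone is not the whole story: the reset endpoint still needs rate limiting, and tokens should be single-use with a short expiry.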
The log aggregator now contains valid access tokens that anyone with log access can use.\nThese are not hypothetical. Every one of these has been found in production systems built by competent backend engineers who were simply not trained in identity security.\nThe irony is sharp. Entra\u0026rsquo;s inflexibility is the direct cause of these custom layers. The company bought Entra to avoid building identity infrastructure. Entra couldn\u0026rsquo;t handle the use case. The team built custom identity code. The custom code has security vulnerabilities that Entra \u0026ndash; or ForgeRock, or Auth0 \u0026ndash; would never have.\nForgeRock\u0026rsquo;s Authentication Trees, Auth0\u0026rsquo;s Actions, and Ping DaVinci\u0026rsquo;s Connectors are secure, composable building blocks. They let you customize the flow without writing raw auth code. The security of each building block is reviewed, tested, and maintained by identity specialists. Your custom Spring Boot auth service has none of these properties.\nDIY auth on top of Entra gives you the worst of both worlds: vendor lock-in for workforce identity AND security risk for consumer identity.\nAnonymous Sessions: The IDP Gap A common e-commerce flow: a user browses your site without logging in, adds items to a cart, and eventually registers or logs in to complete the purchase. The application needs to track the anonymous session and merge it with the authenticated identity on login.\nEntra ID requires users to exist in the directory before issuing any token. There is no concept of an anonymous or transient session in the IDP. 
The user must register, be provisioned, and exist as a directory object before Entra will authenticate them.\nForgeRock could handle this: create a transient session with a limited-scope token (no directory entry required), track the user\u0026rsquo;s activity, and on registration, promote the transient session to a full authenticated session and link the accumulated state.\nThe counter-argument: is an anonymous session an IDP concern or an application concern? A purist would say the application should manage anonymous state (a server-side session, a cookie with a cart ID) and only involve the IDP when the user actually authenticates. The IDP handles identity; the application handles pre-identity state.\nIn practice, the separation is messy. The anonymous session needs some form of token or identifier. The application builds its own mini session management system \u0026ndash; cookies, server-side state, expiry logic. When the user logs in via the IDP, the application must link the anonymous session to the authenticated user. Now you have two parallel session systems: the IDP\u0026rsquo;s OIDC tokens and the application\u0026rsquo;s anonymous session cookies. 
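The linking step is where the two session systems meet, and it is small enough to sketch. A toy version of the promotion logic, with in-memory maps standing in for the real cart and user stores (all names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch: merge an anonymous cart into the authenticated user's
// cart at login, then retire the anonymous ID so it cannot be replayed
// (leaving it valid would be a session-fixation-style risk).
final class CartPromotion {
    static final Map<String, List<String>> anonymousCarts = new ConcurrentHashMap<>();
    static final Map<String, List<String>> userCarts = new ConcurrentHashMap<>();

    // Called once the IDP has authenticated the user.
    static void promote(String anonymousId, String userId) {
        List<String> items = anonymousCarts.remove(anonymousId); // single use
        if (items == null) return;                               // nothing accumulated anonymously
        userCarts.computeIfAbsent(userId, k -> new ArrayList<>()).addAll(items);
    }
}
```

The real linking problem also involves audit history, saved state beyond the cart, and conflict rules (what if both carts hold the same item?); the sketch only shows the shape of the handoff.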
The integration point between them is custom code – and we’ve already discussed what happens when non-identity engineers write custom auth code.
When OAuth2 Is Overkill
Here is a question that almost nobody asks: do you actually need OAuth2?
The full OAuth2 + OIDC flow for a web application:
1. User clicks “Log in”
2. Redirect to authorization endpoint with client_id, redirect_uri, scope, state, code_challenge (PKCE)
3. User authenticates at the IDP
4. IDP redirects back with an authorization code
5. Backend exchanges the code for tokens (ID token + access token + refresh token)
6. Backend validates the ID token (signature, issuer, audience, expiry, nonce)
7. Backend stores the access token (for API calls) and refresh token (for renewal)
8. Client receives a session cookie or the tokens directly
9. On token expiry, the refresh token is used to get new tokens
10. On logout, tokens are revoked at the IDP
Now consider a traditional server-rendered web application with a single backend. No microservices. No third-party API delegation. The backend serves HTML and handles all business logic.
The same flow with a server-side session:
1. User submits username and password
2. Backend validates credentials against its database (or LDAP, or delegates to an IDP via a simple redirect)
3. Backend creates a server-side session (stored in Redis, a database, or memory)
4. Backend sets a Set-Cookie: sessionId=abc123; HttpOnly; Secure; SameSite=Strict header
5. Every subsequent request includes the cookie automatically
6. Backend looks up the session by ID, gets the user context
7. On logout, the session is deleted server-side – instant revocation
No token rotation. No JWT validation. No refresh logic. No PKCE. No token storage decisions. The session lives on the server. The cookie is a pointer. Revocation is instant – delete the session row.
No waiting for token expiry, no revocation endpoint, no token introspection.

| Dimension | OAuth2 + JWT | Server-Side Session |
|---|---|---|
| Revocation | Delayed (until token expires) unless you add token introspection or a blacklist | Instant (delete session from store) |
| Token size | JWT can be 1-3KB+ (claims, signature) – sent on every request | Session ID is ~32 bytes |
| Client complexity | Must handle token storage, refresh logic, silent renewal | Zero – browser handles cookies automatically |
| Validation cost | Cryptographic signature verification on every request | Session store lookup (Redis: <1ms) |
| Distributed systems | Stateless – any server can validate the token independently | Requires shared session store (Redis, database) |
| Third-party API access | Access token can be forwarded to other services | Not applicable without additional mechanism |
| Mobile apps | Tokens work natively | Session cookies work but less idiomatically |

When OAuth2 is necessary:
- Microservice architectures where multiple backends need to validate the user’s identity independently
- Third-party API delegation (the original OAuth2 use case – “let this app access my Google Drive”)
- Mobile and native apps where browser cookies don’t apply naturally
- SPAs calling multiple backend services on different domains
When a session cookie is sufficient:
- Server-rendered web applications (Rails, Django, Spring MVC with Thymeleaf)
- Monolithic backends that serve a single frontend
- Applications where all auth decisions happen server-side
- Internal tools and admin panels
The industry has a tendency to default to OAuth2 for everything because it’s the “modern” approach.
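The session-cookie path is small enough to sketch end to end. A toy version, assuming a shared store (a ConcurrentHashMap here; Redis or a database table in a real deployment):

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal server-side session store. The browser only ever holds the
// opaque ID (in an HttpOnly cookie); all state stays server-side.
final class Sessions {
    private static final SecureRandom RNG = new SecureRandom();
    private static final Map<String, String> store = new ConcurrentHashMap<>();

    // Login: mint an unguessable ID and remember who it belongs to.
    static String create(String userId) {
        byte[] raw = new byte[32];
        RNG.nextBytes(raw);
        String id = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        store.put(id, userId);
        return id; // goes out in a Set-Cookie: sessionId=...; HttpOnly; Secure header
    }

    // Every request: resolve cookie to user, or null if not logged in.
    static String lookup(String id) { return id == null ? null : store.get(id); }

    // Logout: revocation is a delete -- effective immediately.
    static void revoke(String id) { store.remove(id); }
}
```

Logout is a single remove() call: the instant-revocation property falls directly out of the data structure rather than out of extra protocol machinery.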
But OAuth2 was designed to solve the delegation problem – “let a third party access resources on behalf of the user.” If there is no third party and no delegation, the ceremony of authorization codes, token exchanges, and refresh rotation adds complexity without adding value. The good old JSESSIONID did the job for twenty years and still does it for applications that don’t need token delegation.
Where Each IDP Actually Fits

| Use Case | Recommended IDP | Why |
|---|---|---|
| Workforce identity (employees → internal apps) | Entra ID, Okta | Standard OIDC, Conditional Access, device compliance. This is what they’re built for. |
| Consumer identity with moderate customization | Auth0 | Actions provide meaningful customization. Good social login support. Reasonable pricing. |
| Complex enterprise + consumer flows | Ping DaVinci | Visual flow builder closest to ForgeRock’s trees. Connectors for external services. Conditional branching. |
| Maximum customization, team available to operate | ForgeRock / PingOne AIC (self-hosted or managed) | Authentication Trees/Journeys. Arbitrary logic. Full control. But you need a team to run it. |
| Small teams, homelab, open source | Keycloak, Authentik | Surprisingly capable. OIDC-compliant. Free. Good enough for most small-to-medium deployments. |
| Simple server-rendered web app | Session cookies + your framework’s auth | No IDP needed. Spring Security sessions, Django auth, Rails sessions. Add an IDP only when you need SSO or delegation. |

The critical mistake: using Entra for consumer-facing identity. Entra is an employee directory with an OIDC endpoint. It is not a consumer identity platform.
Using it as one leads to the paradox described above: buying SaaS, then building custom code because the SaaS can’t handle the use case.
Conclusion
The identity industry is stuck in an awkward middle ground:
SaaS IDPs (especially Entra) solve the 80% case – workforce SSO, standard OIDC, MFA – but hit a customization cliff for anything beyond the spec. When teams encounter that cliff, they build custom auth layers that are less secure than what the SaaS IDP replaced.
On-prem IDPs (ForgeRock, early Ping) offered maximum flexibility through composable, security-reviewed building blocks. But they required dedicated teams to operate, patch, and secure – a cost many organizations couldn’t justify.
Custom code fills the gap between the two, but most engineers are not identity engineers. The custom auth layers they build contain predictable, dangerous security vulnerabilities. The irony: the SaaS IDP’s inflexibility is the direct cause of this insecure custom code.
And underlying all of this, a simpler question that rarely gets asked: does this application actually need OAuth2? For a significant number of web applications – server-rendered, single-backend, no API delegation – the entire token ceremony adds complexity without adding value. A server-side session with an HttpOnly cookie provides the same user experience with instant revocation, zero client-side token management, and decades of battle-tested implementations.
The right approach is not to pick the most “modern” tool. It’s to evaluate honestly:
1. Do you need an IDP at all? If it’s a simple web app with its own user store and no SSO requirement, session-based auth is fine.
2. If you need an IDP, do you need customization? If standard OIDC flows cover your use case, any SaaS IDP works.
3. If you need customization, don’t pick Entra.
Pick Auth0 or Ping DaVinci \u0026ndash; SaaS with composable flow builders. If you find yourself building a custom auth layer in front of your SaaS IDP, stop and reconsider. You\u0026rsquo;ve already lost the battle the SaaS was supposed to win. Either pick a more flexible IDP or accept the constraints of the one you have. The worst outcome is both: vendor lock-in and custom security code. ","permalink":"https://blogs.joshuaantony.com/posts/identity-provider-customization-cliff/","summary":"The identity industry is stuck between SaaS IDPs that aren\u0026rsquo;t flexible enough and custom solutions that aren\u0026rsquo;t secure enough. And for a surprising number of applications, the entire OAuth2 token ceremony is overkill \u0026ndash; a session cookie would do.","title":"The Identity Provider Customization Cliff: When OAuth2 Is Overkill and SaaS IDPs Aren't Enough"},{"content":"I once worked in a microservices architecture built by an exceptionally talented team. They were smart, thoughtful, and well-read in distributed systems theory. Every architectural decision they made was defensible in isolation \u0026ndash; backed by conference talks, Netflix blog posts, and Martin Fowler articles.\nThe result was a system that took new developers three months to become productive in. Not because the business domain was complex (it wasn\u0026rsquo;t), but because the infrastructure complexity dwarfed the business logic. You couldn\u0026rsquo;t trace a simple HTTP request without understanding HAProxy sidecar configs, Hystrix command wrappers, Apache Camel routing DSLs, six API versions, and a shared framework that every service was mandated to use.\nEach decision was smart. The combination was crippling.\nHAProxy Sidecar for mTLS Every microservice had an HAProxy instance running as a sidecar. All east-west traffic (service-to-service) was encrypted via mutual TLS. 
The reasoning was zero-trust networking: even within the internal network, every connection was authenticated and encrypted.\nThe cost:\nTLS handshake overhead on every call. Each inter-service HTTP request included a full TLS handshake (or session resumption). For services making dozens of downstream calls per request, this added measurable latency.\nCertificate rotation was a recurring operational incident. Every service had its own certificate with an expiry date. When certs expired \u0026ndash; and they did, because manual rotation doesn\u0026rsquo;t scale \u0026ndash; the service couldn\u0026rsquo;t communicate with anything. The team built custom monitoring for cert expiry and a rotation pipeline, which itself required maintenance.\nDebugging encrypted traffic was painful. You couldn\u0026rsquo;t tcpdump between services anymore. You couldn\u0026rsquo;t use a simple HTTP proxy to inspect requests. Every debugging session required either disabling mTLS temporarily (which nobody wanted to do in production) or decrypting captures with the service\u0026rsquo;s private key (which was rotated, so you needed the right key for the right time window).\nThe real issue: they reinvented what Istio and Linkerd provide natively. A service mesh handles mTLS transparently \u0026ndash; automated cert rotation, traffic policies, observability dashboards, and you still get to debug using the mesh\u0026rsquo;s built-in tools. Building the same capability from HAProxy configs and custom cert management scripts is the worst of both worlds: all the operational burden with none of the tooling.\nTo be fair: mTLS between services is a legitimate requirement in regulated industries. PCI-DSS, SOC 2, and certain financial regulations require encrypted internal traffic. 
The problem wasn\u0026rsquo;t mTLS itself \u0026ndash; it was implementing it via custom HAProxy sidecars instead of a purpose-built service mesh, and applying it to an environment where the threat model didn\u0026rsquo;t justify it.\nIf you\u0026rsquo;re running services in a private VPC with trusted workloads, network-level encryption (WireGuard, AWS VPC encryption, GCP\u0026rsquo;s default VM-to-VM encryption) achieves the same goal with zero per-request overhead.\nHystrix: Netflix Cargo-Culting Every outbound HTTP call was wrapped in a Hystrix command. Circuit breaker, thread pool isolation, fallback methods, timeout configuration \u0026ndash; the full Netflix resilience pattern applied to every downstream dependency.\nHere\u0026rsquo;s the problem: Hystrix has been in maintenance mode since 2018. Netflix themselves stopped using it and moved to their internal resilience library. The last meaningful commit to the open-source project was over seven years ago. Choosing Hystrix after 2020 is choosing an abandoned library.\nBut the deeper problem is the cargo-culting. \u0026ldquo;Netflix uses circuit breakers, so we need circuit breakers.\u0026rdquo; Netflix runs thousands of microservices serving hundreds of millions of users, where a single slow dependency can cascade into a site-wide outage. Their scale creates problems that justify circuit breakers, bulkhead isolation, and graceful degradation.\nMost companies are not Netflix. If your service has five downstream dependencies and handles a few hundred requests per second, you don\u0026rsquo;t need circuit breakers. You need:\nProper timeouts. Set a connection timeout (1-2 seconds) and a read timeout (5-10 seconds) on your HTTP client. If the downstream is slow, you fail fast. Retries with exponential backoff. Retry once or twice on transient failures (5xx, connection reset). Don\u0026rsquo;t retry on 4xx (client errors). Health checks. 
If a downstream is down, your orchestrator (Kubernetes) should detect it and stop routing traffic. These are three lines of HTTP client configuration. Hystrix adds a wrapper class, a fallback method, a thread pool configuration, a metrics dashboard, and a failure threshold tuning exercise \u0026ndash; for every single downstream call. The operational cost of configuring and maintaining Hystrix across dozens of services dwarfs the benefit it provides at non-Netflix scale.\nIf you genuinely need circuit breaking \u0026ndash; because you have an unreliable third-party dependency with no SLA, or you\u0026rsquo;ve measured cascading failure in production \u0026ndash; use Resilience4j. It\u0026rsquo;s lightweight, actively maintained, integrates cleanly with Spring Boot, and doesn\u0026rsquo;t require wrapping every call in a command class.\nBut ask the question first: do you actually have a cascading failure problem? Or are you adding circuit breakers because a Netflix engineering blog post made them sound essential? \u0026ldquo;Netflix uses it\u0026rdquo; is not an architecture decision. It\u0026rsquo;s an appeal to authority from a company whose problems are not your problems.\nApache Camel Between Layers This was the most baffling decision. Apache Camel is an Enterprise Integration Patterns (EIP) framework. It implements patterns like Content-Based Router, Message Translator, Splitter, Aggregator, and Wire Tap. It\u0026rsquo;s designed for connecting different systems \u0026ndash; routing messages between Kafka, RabbitMQ, FTP servers, databases, REST APIs, and file systems.\nThe team used it for communication between layers within the same microservice. The controller layer didn\u0026rsquo;t call the service layer directly. 
Instead, it sent a message through a Camel route, which routed it to the service layer, which processed it and sent the result back through another Camel route to the controller.
This is like using a postal service to send a letter to someone sitting next to you.
What the code looked like:
// Instead of this:
@PostMapping("/orders")
public Order createOrder(@RequestBody OrderRequest request) {
    return orderService.create(request);
}

// They did this:
@PostMapping("/orders")
public Order createOrder(@RequestBody OrderRequest request) {
    return producerTemplate.requestBody("direct:createOrder", request, Order.class);
}

// With a Camel route:
from("direct:createOrder")
    .process(exchange -> {
        OrderRequest req = exchange.getIn().getBody(OrderRequest.class);
        Order result = orderService.create(req);
        exchange.getIn().setBody(result);
    });
The justification was “decoupling.” But decoupling between layers within a service is solved by interfaces and dependency injection – the pattern that Spring Framework has provided since 2003. If OrderController depends on OrderService (an interface), swapping the implementation is a one-line Spring configuration change. No routing framework needed.
Camel added:
- A routing DSL that developers had to learn to understand what was essentially a method call
- Type conversion overhead (serializing and deserializing objects through Camel’s exchange mechanism)
- A message-passing abstraction that made stack traces useless (the actual business exception was wrapped in Camel’s exchange error handling)
- Additional configuration, dependencies, and startup time for the Camel context
For zero benefit over orderService.create(request).
API Versioning: Six Versions Deep
The services exposed REST APIs with URL-based versioning: /v1/orders, /v2/orders, all the way to /v6/orders. Six live versions. All in production.
All serving traffic.
Nobody could tell you what the differences were between v3 and v4. The changelog, if it existed, was a Confluence page that hadn’t been updated since v2. New developers had to read the source code of each version’s controller to understand what changed – and the controllers shared service layer code with version-specific branches scattered through the business logic:
public Order processOrder(OrderRequest request, int apiVersion) {
    Order order = new Order();
    order.setItems(request.getItems());
    if (apiVersion >= 3) {
        order.setShippingMethod(request.getShippingMethod());
    }
    if (apiVersion >= 5) {
        order.setGiftWrap(request.getGiftWrap());
    }
    if (apiVersion < 4) {
        order.setLegacyTaxCalculation(true);
    }
    // ... more version branches
}
The business logic was contaminated with version checks. Tests had to cover every version permutation. Dead code accumulated because nobody knew if a client was still using v1 – and nobody wanted to break a client by removing it.
Versioning should be a compatibility contract, not a changelog. A new version should only be created for breaking changes – removing a field, changing a data type, restructuring the response. Adding a new optional field is not a breaking change. It does not warrant a new version.
The practical approach:
- Support at most two versions: current and previous. When v3 ships, deprecate v1 with a sunset date and remove it.
- Version bumps for breaking changes only. Adding an optional field? Add it to the current version. Renaming a field? New version.
- Document the breaking changes. If you can’t articulate what broke between v3 and v4, you didn’t need v4.
- Monitor version usage. If no client is calling v2, remove it. Don’t let dead versions accumulate out of fear.
If you have six live API versions, you don’t have versioning. You have six different APIs that share a database.
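The “two versions max” rule can even be enforced mechanically rather than by convention. A hypothetical gate a router could consult before dispatching (the version numbers and status codes are chosen for illustration):

```java
import java.util.Set;

// Hypothetical gate enforcing a "current + previous only" policy.
final class VersionGate {
    static final int CURRENT = 6;
    static final Set<Integer> SUPPORTED = Set.of(CURRENT, CURRENT - 1);

    // 200: serve it; 410: existed once, now retired; 404: never existed.
    static int statusFor(int requestedVersion) {
        if (SUPPORTED.contains(requestedVersion)) return 200;
        if (requestedVersion >= 1 && requestedVersion < CURRENT) return 410;
        return 404;
    }
}
```

A client still calling a retired version then gets an unambiguous 410 Gone instead of silently drifting behavior buried in version branches.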
The maintenance cost grows linearly with each version, and the team’s ability to reason about the system degrades with every version branch in the business logic.
The Common Framework Trap
A “core platform team” built a mandated common framework. Every microservice was required to use it. It included:
- A mandated base Docker image (specific OS, JDK version, and agent binaries)
- A custom Spring Boot starter with opinionated defaults for logging, metrics, tracing, and health checks
- Shared libraries for HTTP clients, database access, message queue consumers, and authentication
- A mandated project structure and build configuration
The intent was consistency: every service looks the same, uses the same libraries, follows the same patterns. In practice:
The framework didn’t cater to all use cases. A service that only consumed Kafka messages and wrote to a database still had to include the HTTP server components, the REST client libraries, and the authentication middleware. The framework was designed for the common case and every service paid for features it didn’t use – in startup time, memory, and dependency surface.
Teams couldn’t choose their own dependencies. If the framework used Apache HttpClient 4.x for HTTP calls, you couldn’t use OkHttp or the built-in Java 11 HttpClient – even if they were better for your use case. The framework’s choices became your constraints.
Teams couldn’t upgrade independently. When the framework released a new major version, every service had to upgrade simultaneously. A framework change that benefited Team A’s use case could break Team B’s customizations. The upgrade became a coordinated, multi-sprint effort across all teams – exactly the kind of big-bang deployment that microservices were supposed to eliminate.
Teams worked around the framework.
When the framework couldn\u0026rsquo;t do what a team needed, they used reflection to override internal behavior, extended framework classes in fragile ways, or added parallel implementations alongside the framework\u0026rsquo;s versions. The workarounds were harder to maintain than if the team had just written their own code from scratch.\nThe framework became a bottleneck. Feature requests piled up on the core team. Product teams were blocked waiting for framework changes. The core team prioritized based on their own roadmap, not product urgency. A team that needed a simple change to the HTTP client configuration waited three sprints for the core team to review, approve, and release it.\nThis is the DRY principle taken to a destructive extreme. Don\u0026rsquo;t Repeat Yourself is good advice within a single codebase. Applied across microservices, it creates exactly the coupling that microservices were designed to eliminate.\nTwo services with similar-but-not-identical code are better than two services coupled through a shared library. Duplication is cheaper than the wrong abstraction. When you share code between services, you share deployment schedules, upgrade timelines, and failure modes. You lose the ability to deploy, scale, and evolve each service independently \u0026ndash; which is the entire point of microservices.\nAmazon\u0026rsquo;s \u0026ldquo;two-pizza team\u0026rdquo; model works because each team owns their full stack. A mandated common framework violates this by centralizing infrastructure decisions in a core team that doesn\u0026rsquo;t feel the pain of the product teams they serve.\nWhat a healthy platform team provides instead:\nTemplates, not mandates. A starter template that teams can fork and own. They start with a consistent base but are free to diverge. Libraries, not frameworks. Small, focused libraries (a logging adapter, a tracing helper) that teams opt into, not a monolithic framework that teams can\u0026rsquo;t escape. 
Golden paths, not golden cages. Documented recommendations (“we suggest using Resilience4j for circuit breaking”) rather than enforced constraints (“you must use our circuit breaker wrapper”).
Banning Squash Merge
The team banned squash merges in all repositories. Every individual commit in a pull request was preserved in the main branch. The reasoning: granular history enables git bisect, cherry-picking, and per-commit attribution.
This only works if the team writes clean, atomic commits:
feat: add payment validation endpoint
feat: add Stripe webhook handler
fix: handle duplicate webhook deliveries
test: add payment flow integration tests
This history tells a story. Each commit is a coherent, reviewable unit of work. git bisect can pinpoint exactly which commit introduced a regression.
In reality, most pull requests contain:
wip
fix
fix again
address review comments
actually fix
oops forgot file
fix lint
fix tests
please work
Preserving this in the main branch provides no value. git bisect on a “wip” commit tells you nothing. The main branch history becomes noise.
For the 90% of teams that write work-in-progress commits (which is most humans), squash merge produces a cleaner, more useful history: one commit per PR with a descriptive message summarizing the change.
Banning squash merge in this context was another manifestation of the same culture: optimize for theoretical correctness regardless of practical benefit.
The team valued process purity over the pragmatic reality that most developers do not write museum-quality commit histories.
The Compound Effect
Here is what a new developer faced when joining this team:
1. Understand HAProxy sidecar configs to know how services communicate
2. Learn Hystrix command patterns to understand how outbound calls are wrapped
3. Read Apache Camel routing DSLs to trace request flow within a single service
4. Navigate six API versions with undocumented differences to understand the current behavior
5. Learn the common framework’s opinions, overrides, and workarounds to modify any infrastructure behavior
6. Read the full commit history (unsquashed) to understand why a piece of code exists
Each of these is a learning curve. Combined, they created a system where the infrastructure was more complex than the business domain. The actual business logic – creating orders, processing payments, managing inventory – was straightforward. But it was buried under layers of integration frameworks, resilience patterns, versioning branches, and framework abstractions.
The team was not incompetent. They were over-informed. They had read every distributed systems paper, watched every Netflix tech talk, and attended every conference. They applied best practices from organizations operating at scales they would never reach, solving problems they did not have.
The Simplest Thing That Works
The best architecture is not the one with the most patterns.
It\u0026rsquo;s the simplest one that meets the actual requirements:\nWhat they did What would have been sufficient HAProxy sidecar for mTLS VPC-level encryption (or Istio if mTLS was actually required) Hystrix circuit breakers on every call HTTP client timeouts + retries with backoff Apache Camel between layers service.doThing(data) \u0026ndash; a method call Six live API versions Two versions max, deprecate aggressively Mandated common framework Starter template + opt-in libraries Squash merge ban Allow squash merge, encourage clean commit messages None of these simplifications would have reduced the system\u0026rsquo;s reliability, security, or scalability. They would have reduced onboarding time from three months to three weeks, made debugging a matter of reading code instead of reading Camel routes and HAProxy configs, and let teams move at their own pace instead of waiting for the core team\u0026rsquo;s release cycle.\nComplexity is not a sign of sophistication. It\u0026rsquo;s a cost. Every layer of abstraction, every framework, every pattern must justify its existence against the question: does this solve a problem we actually have, or a problem we read about in a blog post?\n","permalink":"https://blogs.joshuaantony.com/posts/overengineering-microservices/","summary":"HAProxy sidecars for mTLS. Hystrix for circuit breaking. Apache Camel between layers within the same service. Six API versions. A mandated common framework. A squash merge ban. Each decision was defensible. 
Combined, they created a system that took three months to onboard into.","title":"Overengineering Microservices: When Smart Decisions Compound Into Complexity"},{"content":"The Anti-Pattern A disturbingly common pattern in internal APIs:\nGET /items?id=1,2 → 200 OK (2 results) GET /items?id=1,2 → 200 OK (1 result -- one ID didn\u0026#39;t exist) GET /items?id=1,2 → 200 OK (0 results) GET /items?id=999 → 200 OK (empty body) GET /items?id=abc → 200 OK (validation error buried in body) Every response is 200 OK. The only way to know what actually happened is to parse the body, inspect it, and hope the shape tells you something. This is not REST. This is \u0026ldquo;HTTP as a transport layer for mystery payloads.\u0026rdquo;\nThe Production Incident Endpoint: GET /xx?id=1,2\nWhat happened:\nClient requested two records by passing id=1,2. Server found only one record (ID 2 existed, ID 1 did not). Server returned 200 OK with a response body containing one record. Client code assumed that a 200 meant \u0026ldquo;all requested records returned successfully.\u0026rdquo; Client attempted to map the response to two objects \u0026ndash; parse error in production. Root cause: The status code lied. 200 OK means \u0026ldquo;the request has succeeded\u0026rdquo; (RFC 9110). The client had every right to trust it. The API gave no signal that the response was incomplete.\nWhat should have happened: The server should have returned a status code that communicates \u0026ldquo;I couldn\u0026rsquo;t fully satisfy your request\u0026rdquo; \u0026ndash; forcing the client to handle the partial result explicitly.\nWhy This Matters HTTP status codes are not decoration. They are a contract consumed by:\nConsumer What it does with status codes Client application code Branches on 2xx/4xx/5xx to decide success vs. 
error handling Retry logic / circuit breakers Retries on 503, backs off on 429, never retries on 400 Load balancers Marks backends unhealthy based on 5xx rates API gateways Routes, rate-limits, and caches based on status codes CDN / caching layers Caches 200 responses; never caches 500s Monitoring / alerting Fires alerts when 5xx rate exceeds threshold Logging / observability Dashboards aggregate by status code to show error rates When you return 200 for everything, every one of these systems is blind. Your monitoring shows 0% error rate while production is on fire.\nCorrect Status Codes for Common Scenarios Single-Resource Endpoints Scenario Correct Code Meaning Resource found 200 OK Here is the resource Resource not found 404 Not Found That ID does not exist Resource created 201 Created Resource created; Location header points to it Resource deleted 204 No Content Deleted successfully; no body Invalid input 400 Bad Request Malformed request (bad syntax, missing required fields) Validation failure 422 Unprocessable Entity Syntactically valid but semantically wrong Unauthorized 401 Unauthorized No valid credentials provided Forbidden 403 Forbidden Authenticated but not authorized Server error 500 Internal Server Error Something broke on our side Multi-Resource / Batch Endpoints (The Hard Part) This is where GET /xx?id=1,2 lives. You asked for multiple things. What if some succeed and some don\u0026rsquo;t?\nScenario Option A Option B All found 200 OK with all records 200 OK Some found, some missing 200 OK with partial results + explicit metadata 207 Multi-Status with per-item status None found 404 Not Found 200 OK with empty array + warning Some found, some errored 207 Multi-Status 200 OK with error details per item Recommended Patterns for Multi-ID Endpoints Pattern 1: Strict \u0026ndash; Fail the Whole Request If any ID is not found, return an error. 
Simple, safe, forces the client to deal with it.\nGET /items?id=1,2 // ID 1 not found: HTTP/1.1 404 Not Found { \u0026#34;error\u0026#34;: \u0026#34;not_found\u0026#34;, \u0026#34;message\u0026#34;: \u0026#34;The following IDs were not found: [1]\u0026#34;, \u0026#34;missing_ids\u0026#34;: [1] } Pros: Impossible to silently lose data. Client must handle it. Cons: One missing record blocks the entire request. Can be frustrating for best-effort use cases.\nPattern 2: Lenient \u0026ndash; Return What You Have, Signal What\u0026rsquo;s Missing Return available records with 200, but include metadata so the client knows the response is partial.\nGET /items?id=1,2 HTTP/1.1 200 OK { \u0026#34;requested_ids\u0026#34;: [1, 2], \u0026#34;returned_count\u0026#34;: 1, \u0026#34;missing_ids\u0026#34;: [1], \u0026#34;data\u0026#34;: [ { \u0026#34;id\u0026#34;: 2, \u0026#34;name\u0026#34;: \u0026#34;Widget B\u0026#34; } ] } The key: requested_ids, returned_count, and missing_ids make it impossible for the client to silently ignore the gap.\nPros: Partial data is usable. Missing items are explicit. Cons: Lazy clients may still ignore the metadata (but that\u0026rsquo;s their bug, not yours).\nPattern 3: Multi-Status (207) \u0026ndash; Per-Item Status Best for batch operations where each item can independently succeed or fail. Used by WebDAV, Microsoft Graph API, and others.\nGET /items?id=1,2 HTTP/1.1 207 Multi-Status { \u0026#34;results\u0026#34;: [ { \u0026#34;id\u0026#34;: 1, \u0026#34;status\u0026#34;: 404, \u0026#34;error\u0026#34;: \u0026#34;not_found\u0026#34; }, { \u0026#34;id\u0026#34;: 2, \u0026#34;status\u0026#34;: 200, \u0026#34;data\u0026#34;: { \u0026#34;id\u0026#34;: 2, \u0026#34;name\u0026#34;: \u0026#34;Widget B\u0026#34; } } ] } Pros: Maximum clarity. Each item has its own status code. Cons: More complex response structure. 
207 is less universally understood.\nPattern 4: 206 Partial Content 206 is traditionally used for range requests (byte ranges in file downloads), but some APIs repurpose it to signal \u0026ldquo;I\u0026rsquo;m returning less than you asked for.\u0026rdquo;\nGET /items?id=1,2 HTTP/1.1 206 Partial Content { \u0026#34;data\u0026#34;: [ { \u0026#34;id\u0026#34;: 2, \u0026#34;name\u0026#34;: \u0026#34;Widget B\u0026#34; } ], \u0026#34;missing_ids\u0026#34;: [1] } Pros: Status code itself signals incompleteness \u0026ndash; hard to ignore. Cons: Pedants will argue 206 is only for byte-range requests (they\u0026rsquo;re technically right per RFC 9110).\nWhat Would Have Prevented the Production Incident Any of these would have caught the problem:\nApproach How it helps Pattern 1 (404) Client gets a 404, error handling kicks in, no parse error Pattern 2 (metadata) Client checks returned_count !== requested_ids.length, handles gracefully Pattern 3 (207) Client sees per-item status, knows ID 1 was missing Client-side validation Client compares response array length to request array length (defensive coding \u0026ndash; but the API should not rely on this) The worst option is what actually happened: 200 OK with a silently incomplete body and no metadata.\nHow Mature APIs Handle This Stripe Returns 404 if a single resource is not found. Batch endpoints return arrays with individual error objects.\nGoogle APIs Uses standard HTTP codes. Batch requests return 207-style responses with per-item status.\nMicrosoft Graph Explicitly uses 207 Multi-Status for batch operations with per-item HTTP status codes.\nGitHub API Returns 404 for missing resources. Multi-item endpoints return arrays and document when results may be partial (e.g., paginated with Link headers).\nThe Argument to Your Team \u0026ldquo;But we always parse the body anyway, so what does the status code matter?\u0026rdquo;\nYou are not the only consumer. 
Load balancers, monitoring, caches, API gateways, and future clients all use status codes. They don\u0026rsquo;t parse your body.\n200 OK means the contract was fulfilled. If you return 200, you are saying \u0026ldquo;everything you asked for is here and correct.\u0026rdquo; If that\u0026rsquo;s a lie, you have broken the HTTP contract.\nSilent failures are the most expensive bugs. The production incident happened because the failure was invisible. A proper status code would have made it loud and immediate.\nDefensive client code is not a substitute. Yes, clients should validate responses. But relying on clients to compensate for a lying API is designing for failure.\nEvery major API in the industry does this correctly. Stripe, Google, AWS, GitHub, Microsoft \u0026ndash; none of them return 200 for partial failures. There\u0026rsquo;s a reason for that.\nQuick Reference 200 OK → Request fully succeeded. Response contains what was asked for. 201 Created → Resource created. Location header included. 204 No Content → Success, but nothing to return (e.g., DELETE). 206 Partial Content→ Only part of the resource is returned (range requests). 207 Multi-Status → Batch response; each item has its own status. 400 Bad Request → Malformed request syntax. 401 Unauthorized → Missing or invalid authentication. 403 Forbidden → Authenticated but not permitted. 404 Not Found → Resource does not exist. 409 Conflict → Request conflicts with current state (e.g., duplicate). 422 Unprocessable → Valid syntax but invalid semantics. 429 Too Many Req. → Rate limited. Retry-After header included. 500 Server Error → Something broke on the server side. 502 Bad Gateway → Upstream service returned invalid response. 503 Unavailable → Server temporarily unavailable (maintenance, overload). 504 Gateway Timeout→ Upstream service did not respond in time. 
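As a closing sketch, the core of the lenient Pattern 2 response is mechanical: diff the requested IDs against what the datastore actually returned, and surface the gap as explicit metadata. The following framework-free Java is illustrative only — the PartialResult helper and its shape are hypothetical, not from any particular API:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper for Pattern 2: pair the returned rows with explicit
// metadata so a partial result can never pass as a complete one.
public class PartialResult<T> {
    public final List<Long> requestedIds;
    public final List<Long> missingIds;
    public final List<T> data;

    private PartialResult(List<Long> requested, List<Long> missing, List<T> data) {
        this.requestedIds = requested;
        this.missingIds = missing;
        this.data = data;
    }

    public int returnedCount() { return data.size(); }

    public boolean isPartial() { return !missingIds.isEmpty(); }

    // found: whatever the datastore actually returned, keyed by ID
    public static <T> PartialResult<T> of(List<Long> requestedIds, Map<Long, T> found) {
        List<Long> missing = new ArrayList<>();
        List<T> data = new ArrayList<>();
        for (Long id : new LinkedHashSet<>(requestedIds)) { // dedupe, keep request order
            T row = found.get(id);
            if (row == null) missing.add(id); else data.add(row);
        }
        return new PartialResult<>(List.copyOf(requestedIds), missing, data);
    }
}
```

A controller can then branch on isPartial() to choose between 200-with-metadata and a stricter status code — the point is that the gap is computed once, server-side, and cannot be silently lost.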
","permalink":"https://blogs.joshuaantony.com/posts/stop-returning-200-ok-for-everything/","summary":"HTTP status codes are a contract consumed by clients, load balancers, monitoring, and caches. When you return 200 OK for partial failures, every one of these systems is blind.","title":"Stop Returning 200 OK for Everything"},{"content":"SAP Commerce Cloud (Hybris) offers an API layer called OCC \u0026ndash; Omni Commerce Connect. It\u0026rsquo;s marketed as a RESTful, stateless API for building headless storefronts, mobile apps, and single-page applications. It\u0026rsquo;s the API that powers Spartacus, SAP\u0026rsquo;s reference Angular storefront.\nThe API endpoints look right. POST /occ/v2/{site}/carts/{cartId}/entries to add an item to a cart. GET /occ/v2/{site}/products/{productCode} to fetch a product. Cart IDs in the URL. OAuth tokens in the header. JSON in, JSON out. Textbook REST.\nExcept it isn\u0026rsquo;t stateless. Underneath the REST surface, every OCC request creates an HTTP session, loads the cart into it, hydrates user context, site configuration, and currency settings \u0026ndash; then routes the request through the same session-dependent facade layer that was built for the old JSP Accelerator storefront.\nOCC is not a stateless API. It is a REST facade bolted onto a stateful monolith.\nWhat Actually Happens on Every Request When a client calls POST /occ/v2/mysite/carts/12345/entries, the request doesn\u0026rsquo;t go directly to a stateless controller that loads cart 12345 from the database, adds an item, saves, and returns. Instead:\nThe commerceWebServicesSessionFilter intercepts the request. This Spring filter runs before the controller. It reads the cart ID from the URL and the user token from the header.\nThe filter creates an HTTP session (or reuses one if sticky sessions are configured). It loads the cart from the database and puts it into the HTTP session \u0026ndash; session.setAttribute(\u0026quot;cart\u0026quot;, cart). 
It also sets the current user, the base site, the catalog version, the language, and the currency into session-scoped attributes.\nThe OCC controller calls the facade. CartFacade.addToCart(productCode, quantity) \u0026ndash; the same facade used by the old Accelerator JSP storefront. The facade signature takes no cart parameter. It doesn\u0026rsquo;t need one \u0026ndash; it reads the cart from the session.\nThe facade calls cartService.getSessionCart(). This method reaches into the HTTP session, retrieves the cart object, and returns it. The service adds the item, recalculates totals, and writes the modified cart back to the session.\nThe response is serialized and returned to the client. The session is discarded (in stateless mode) or kept alive (in sticky-session mode).\nThe client thinks it\u0026rsquo;s talking to a stateless API. The server is running the same session-scoped code it has always run. The commerceWebServicesSessionFilter is the glue \u0026ndash; it translates between the REST world (cart ID in URL) and the session world (cart in HttpSession).\nThe Cart: The Most Obvious Example The cart is where the session dependency is most visible. Consider the add-to-cart flow:\nWhat a stateless implementation would do:\n1. Parse cart ID from URL path 2. Load cart from database by ID 3. Add item to cart 4. Save cart to database 5. Return updated cart as JSON No session. No filter. The cart travels through the code as a function parameter.\nWhat OCC actually does:\n1. commerceWebServicesSessionFilter intercepts request 2. Filter reads cart ID from URL 3. Filter loads cart from database 4. Filter puts cart into HTTP session 5. Filter sets user/site/currency in session 6. Controller calls CartFacade.addToCart(productCode, qty) 7. CartFacade calls cartService.getSessionCart() 8. cartService reads cart from HTTP session 9. Item is added to the session cart 10. Modified cart is saved to database 11. 
Controller reads result and returns JSON Steps 2-5 exist solely to satisfy the facade\u0026rsquo;s assumption that a session cart exists. The filter does a database read, a session write, and context hydration \u0026ndash; on every single API call \u0026ndash; just to set up the environment that the facade expects.\nThe facade was written for the Accelerator storefront, where the user had a persistent browser session. The cart was loaded once at login and lived in the session for the entire shopping experience. It was never designed to be loaded from scratch on every request. OCC forces it to do exactly that.\nPromotions: Session-Dependent Calculation The promotion engine is one of the most complex subsystems in Hybris. It evaluates rules (buy X get Y free, spend $100 get 10% off, etc.) against the cart and applies discounts.\nThe promotion engine reads:\nThe session cart \u0026ndash; cartService.getSessionCart() to get the cart being evaluated The session user \u0026ndash; to determine which customer group promotions apply The session site \u0026ndash; to scope promotions to the correct storefront Cached promotion results \u0026ndash; stored in the session to avoid re-evaluation on every page load For the Accelerator storefront, this made sense. The user was in a session. The cart was in the session. Promotion results were cached in the session for the duration of the shopping experience.\nFor OCC, the commerceWebServicesSessionFilter must hydrate all of this context from scratch on every request. The promotion cache in the session is useless \u0026ndash; the session is created, used once, and discarded. 
The \u0026ldquo;optimization\u0026rdquo; of caching promotion results in the session becomes pure overhead in the OCC world: allocate session, populate cache, evaluate promotions, serialize response, discard session (and its cache).\nWorse: if the session hydration order changes (e.g., promotions are evaluated before the currency is set in the session), the promotion engine silently uses default values. The promotions might evaluate correctly for one currency and incorrectly for another \u0026ndash; depending on which session attributes the filter happened to set first.\nPricing: Silent Fallbacks from Missing Session Context Price resolution in Hybris depends on session context:\nUser price group \u0026ndash; B2B vs B2C, loyalty tier, contract pricing Currency \u0026ndash; session currency determines which price row is selected Catalog version \u0026ndash; staged vs online catalog, which price catalog is active Date/time \u0026ndash; time-based pricing (handled via session-scoped date context) The PriceService reads these from the session. It does not accept them as method parameters. When the commerceWebServicesSessionFilter sets up the session for an OCC request, it must correctly populate every one of these attributes. If it misses one \u0026ndash; if the currency isn\u0026rsquo;t set, or the user\u0026rsquo;s price group isn\u0026rsquo;t loaded \u0026ndash; the price service does not throw an error. It silently falls back to the default price row.\nThis means a misconfigured OCC filter can return wrong prices without any error, any log message, or any indication that something went wrong. The API returns 200 OK with a price that doesn\u0026rsquo;t match what the customer should see. This is the same class of problem as the \u0026ldquo;return 200 for everything\u0026rdquo; anti-pattern \u0026ndash; the system lies about success.\nIn a properly stateless design, the price service would accept currency, user group, and catalog version as explicit parameters. 
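A sketch of what that explicit-parameter price service could look like, as plain Java — the class name, the string-keyed price-row lookup, and the key format are all invented for illustration, not Hybris APIs:

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Objects;

// Hypothetical stateless price service: every piece of context the session
// used to carry implicitly is now a required method parameter.
public class StatelessPriceService {
    // "productCode|priceGroup|currency" -> unit price; stands in for price rows
    private final Map<String, BigDecimal> priceRows;

    public StatelessPriceService(Map<String, BigDecimal> priceRows) {
        this.priceRows = priceRows;
    }

    public BigDecimal resolve(String productCode, String priceGroup, String currency) {
        // Missing context fails loudly instead of falling back to a default row.
        Objects.requireNonNull(productCode, "productCode");
        Objects.requireNonNull(priceGroup, "priceGroup");
        Objects.requireNonNull(currency, "currency");
        BigDecimal price = priceRows.get(productCode + "|" + priceGroup + "|" + currency);
        if (price == null) {
            throw new IllegalStateException(
                "No price row for " + productCode + " / " + priceGroup + " / " + currency);
        }
        return price;
    }
}
```

Note the contrast with the session-based design: a caller cannot forget the currency without an immediate, visible failure, so a wrong-but-200 price is structurally impossible.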
A missing parameter would be a compile error or a runtime validation error \u0026ndash; not a silent fallback to a default.\nCheckout: The Stateful Wizard Behind REST Endpoints The checkout flow in Hybris was designed as a multi-step wizard:\nSet delivery address → saved to session cart Set delivery mode → saved to session cart Set payment info → saved to session cart Review order → read from session cart Place order → read everything from session cart, create order Each step reads the session cart, modifies it, and writes it back. The next step assumes the previous step\u0026rsquo;s changes are already in the session.\nOCC exposes these as separate REST endpoints:\nPUT /carts/{id}/addresses/delivery (step 1) PUT /carts/{id}/deliverymode (step 2) PUT /carts/{id}/paymentdetails (step 3) POST /orders (step 5) Each of these endpoints hits the commerceWebServicesSessionFilter, which creates a fresh session, loads the cart, hydrates context, calls the facade, and discards the session. The cart\u0026rsquo;s state is persisted to the database between calls, so the flow works \u0026ndash; but it works by accident of persistence, not by design.\nThe real problem: concurrency. If two requests arrive simultaneously for the same cart (e.g., set delivery address and set payment info in parallel), both requests load the cart into separate sessions, both modify their respective fields, and both save. The last save wins. One change is silently lost.\nIn the Accelerator storefront, this was never a problem \u0026ndash; the user clicked through a wizard sequentially, and the session serialized access. 
In a headless SPA making parallel API calls, it\u0026rsquo;s a data integrity risk.\nThe Scaling Problem The promise of a stateless API is horizontal scalability: any server can handle any request, no affinity required, no shared session state.\nOCC technically achieves this \u0026ndash; any server can handle any request, because the commerceWebServicesSessionFilter loads everything from the database on every request. But the cost is significant:\nPer-request overhead in OCC:\nCreate HTTP session object (memory allocation) Load cart from database (DB query) Load user context (DB query or cache lookup) Set site, catalog, currency, language in session (multiple attribute sets) Execute the actual business logic Discard the session (GC pressure) Per-request overhead in a truly stateless service:\nParse cart ID from URL Load cart from database (DB query \u0026ndash; same as above) Execute the actual business logic Steps 1, 3-4, and 6 in the OCC flow are pure overhead \u0026ndash; they exist only to satisfy the session-dependent facade layer underneath. The database query to load the cart is necessary in both designs, but OCC wraps it in session machinery that adds memory allocation, context hydration, and garbage collection for no functional benefit.\nUnder high load (Black Friday, flash sales), this overhead multiplies. Every concurrent request creates its own session object, hydrates its own context, and discards it. The garbage collector works harder. 
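Returning to the checkout flow's last-write-wins hazard: the standard stateless remedy is optimistic locking, where a save succeeds only against the version the caller originally read. A minimal in-memory sketch of that idea — VersionedCartStore is hypothetical; a real implementation would use JPA's @Version or a SQL version column in the WHERE clause:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store demonstrating optimistic locking:
// a save only succeeds if the caller read the latest version.
public class VersionedCartStore {
    public static final class Cart {
        public final long version;
        public final String deliveryAddress;
        public final String paymentInfo;
        public Cart(long version, String deliveryAddress, String paymentInfo) {
            this.version = version;
            this.deliveryAddress = deliveryAddress;
            this.paymentInfo = paymentInfo;
        }
    }

    private final Map<String, Cart> carts = new ConcurrentHashMap<>();

    public void put(String id, Cart cart) { carts.put(id, cart); }

    public Cart load(String id) { return carts.get(id); }

    // Rejects stale writes instead of silently overwriting them: the update
    // must be exactly one version ahead of what is currently stored.
    public synchronized boolean save(String id, Cart updated) {
        Cart current = carts.get(id);
        if (current == null || current.version != updated.version - 1) {
            return false; // stale read: another request saved first; reload and retry
        }
        carts.put(id, updated);
        return true;
    }
}
```

With this scheme, the parallel set-address / set-payment scenario ends with one request succeeding and the other being told to reload — a retry, not silent data loss.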
The session filter becomes a serialization point if it uses any shared state (like the session service\u0026rsquo;s internal locks).\nWhat a Properly Stateless Design Looks Like Here\u0026rsquo;s the same add-to-cart operation, designed from scratch as a stateless service:\n@RestController @RequestMapping(\u0026#34;/api/carts\u0026#34;) public class CartController { @PostMapping(\u0026#34;/{cartId}/entries\u0026#34;) @ResponseStatus(HttpStatus.CREATED) public CartResponse addToCart( @PathVariable UUID cartId, @Valid @RequestBody AddToCartRequest request, @AuthenticationPrincipal JwtUser user) { return cartService.addItem(cartId, request.getProductCode(), request.getQuantity(), user); } } @Service public class CartService { @Transactional public CartResponse addItem(UUID cartId, String productCode, int quantity, JwtUser user) { Cart cart = cartRepository.findByIdAndUserId(cartId, user.getId()) .orElseThrow(() -\u0026gt; new NotFoundException(\u0026#34;Cart not found\u0026#34;)); Product product = productRepository.findByCode(productCode) .orElseThrow(() -\u0026gt; new NotFoundException(\u0026#34;Product not found\u0026#34;)); cart.addEntry(product, quantity); BigDecimal price = priceService.resolve(product, user.getPriceGroup(), user.getCurrency()); cart.recalculate(price); cartRepository.save(cart); return CartMapper.toResponse(cart); } } Notice what\u0026rsquo;s different:\nAspect OCC Stateless Design Cart access cartService.getSessionCart() (reads from HTTP session) cartRepository.findById(cartId) (reads from DB by parameter) User context Read from session, set by filter @AuthenticationPrincipal from JWT \u0026ndash; no session Price resolution PriceService reads currency/group from session priceService.resolve(product, priceGroup, currency) \u0026ndash; explicit parameters Session object Created per request, hydrated, discarded Does not exist Filter chain commerceWebServicesSessionFilter does DB read + session hydration No filter needed Concurrency safety 
Last-write-wins on parallel requests @Transactional with optimistic locking on cart version Missing context Silent fallback to defaults Compile error (missing parameter) or validation error The stateless design has no session, no filter, no implicit context. Every dependency is an explicit parameter. Missing context is a compile error, not a silent default. Concurrency is handled by the database (optimistic locking), not by hoping requests don\u0026rsquo;t overlap.\nWhy SAP Didn\u0026rsquo;t Fix It The honest answer: rewriting the facade and service layer to be truly stateless would mean rewriting the entire Hybris commerce engine.\nThe session dependency is not limited to a few convenience methods. It\u0026rsquo;s woven through hundreds of services:\nSessionService.getAttribute() \u0026ndash; called across cart, pricing, promotions, CMS, search, catalog CartService.getSessionCart() \u0026ndash; used by every cart-related facade UserService.getCurrentUser() \u0026ndash; reads from session-scoped context BaseSiteService.getCurrentBaseSite() \u0026ndash; session attribute CatalogVersionService.getSessionCatalogVersions() \u0026ndash; session attribute I18NService.getCurrentCurrency() / getCurrentLanguage() \u0026ndash; session attributes Every one of these methods assumes a session exists and has been populated. Refactoring them to accept explicit parameters would touch thousands of classes across dozens of extensions. It would break every custom extension built by every SAP customer and partner.\nThe commerceWebServicesSessionFilter was the pragmatic shortcut: keep the stateful engine, bolt a REST skin on top, and use a filter to translate between the two worlds. It works. It scales well enough for most production loads. But it is not what it claims to be.\nKey Takeaway OCC is a leaky abstraction. The REST contract says stateless. The implementation says stateful. 
The filter is the seam where the two worlds meet, and it\u0026rsquo;s where the architecture\u0026rsquo;s compromises are most visible.\nThis matters because developers building on OCC make assumptions based on the REST contract:\n\u0026ldquo;I can make parallel requests\u0026rdquo; → You can, but concurrent writes to the same cart may lose data. \u0026ldquo;Any server can handle any request\u0026rdquo; → True, but every request pays the session hydration tax. \u0026ldquo;Prices and promotions are deterministic for the same input\u0026rdquo; → They are, unless the session context is hydrated in a different order. \u0026ldquo;Missing parameters will cause errors\u0026rdquo; → They won\u0026rsquo;t. They\u0026rsquo;ll cause silent fallbacks. The lesson is broader than Hybris: wrapping a stateful system in a REST API does not make it stateless. If the implementation depends on session state, the API inherits those dependencies \u0026ndash; regardless of what the endpoint URLs look like. True statelessness is an implementation property, not a contract property.\n","permalink":"https://blogs.joshuaantony.com/posts/hybris-occ-rest-facade-over-stateful-monolith/","summary":"SAP Commerce OCC promises a stateless REST API for headless commerce. Underneath, every request hydrates an HTTP session and routes through the same stateful facades built for the JSP storefront. The API is stateless in contract but stateful in implementation.","title":"Hybris OCC: A REST Facade Over a Stateful Monolith"},{"content":"How the Converter/Populator Pattern Works In SAP Commerce (Hybris), data flows from the persistence layer (Models) to the presentation layer (DTOs) through a Converter/Populator chain in the facade layer:\nController → Facade → Converter → [Populator1, Populator2, ...] → DTO Converter: Creates a new DTO instance, then iterates over a list of Populators. Populator: Implements Populator\u0026lt;SOURCE, TARGET\u0026gt;, copies specific fields from the Model to the DTO. 
Both are Spring beans, wired via XML or annotation config. public class ProductConverter extends AbstractPopulatingConverter\u0026lt;ProductModel, ProductData\u0026gt; { private List\u0026lt;Populator\u0026lt;ProductModel, ProductData\u0026gt;\u0026gt; populators; } public class ProductBasicPopulator implements Populator\u0026lt;ProductModel, ProductData\u0026gt; { public void populate(ProductModel source, ProductData target) { target.setName(source.getName()); target.setCode(source.getCode()); } } The design intent is extensibility: any extension can add a populator to any converter\u0026rsquo;s list via Spring config, without modifying existing code.\nThe Production Memory Incident What Happened The team enabled populator-level caching to speed up conversion. The cache lived in the JVM heap \u0026ndash; a ConcurrentHashMap keyed by source object identity, storing converted DTO fragments.\nWhy It Blew Up No eviction policy. Cached DTOs accumulated indefinitely. Cache keys held references to source Models, preventing garbage collection of the entire object graph. DTOs are not small. A fully populated ProductData can hold media, categories, prices, promotions, stock levels \u0026ndash; each with their own nested DTOs. Every product variant, every locale, every price row generated a unique cache entry. JVM heap filled up. GC pauses went from milliseconds to seconds. Eventually: OutOfMemoryError. The Real Problem The cache was a band-aid. Conversion was slow because the populator chains were deep. The chains were deep because the framework made it trivially easy to inject converters inside populators. Nobody added the cache because they wanted caching \u0026ndash; they added it because conversion was unacceptably slow, and nobody wanted to untangle the chain.\nThe cache didn\u0026rsquo;t fix the performance problem. It traded CPU pressure for memory pressure.\nThe Deep Chaining Problem Because populators are Spring beans, they can inject other converters. 
Those converters have their own populators. Those populators can inject yet more converters:\nProductConverter ├── ProductBasicPopulator ├── ProductPricePopulator │ └── PriceConverter │ ├── PriceBasicPopulator │ └── CurrencyPopulator │ └── CurrencyConverter ├── ProductCategoryPopulator │ └── CategoryConverter │ ├── CategoryBasicPopulator │ └── CategoryMediaPopulator │ └── MediaConverter ├── ProductMediaPopulator │ └── MediaConverter ├── ProductStockPopulator │ └── StockConverter └── ProductPromotionPopulator └── PromotionConverter This tree is invisible at compile time. You cannot look at ProductConverter.java and know its actual depth. The graph is assembled at runtime by the Spring container from XML scattered across dozens of extensions.\nConsequences Performance: Converting a single product can trigger hundreds of populator calls across dozens of converters. Debugging: A stack trace from a failed conversion is 40+ frames deep. Memory: Each intermediate DTO is allocated on the heap. Multiply by catalog size. Unpredictability: Adding one populator in one extension can degrade conversion performance across the entire platform. The Core Design Flaw Hybris conflated two fundamentally different concerns into one abstraction:\nConcern Example Nature Needs Dependencies? Structural mapping dto.setName(model.getName()) Mechanical field copy No Business computation Calculate tax-inclusive price for user\u0026rsquo;s locale Domain logic with service calls Yes Both are implemented as Populator\u0026lt;S,T\u0026gt;. Both are Spring beans. Both can inject anything.\nThis means:\nSimple field copies carry the overhead of Spring beans \u0026ndash; proxy creation, AOP interception, bean lifecycle management \u0026ndash; for what amounts to a getter/setter call. Complex business logic hides behind the same interface as field copying. When you see ProductPricePopulator in a populator list, nothing tells you it triggers a full price calculation engine. Chaining is unrestricted. 
There is no structural constraint preventing a populator from triggering an arbitrarily deep conversion tree. The Argument: Populators Should Not Be Spring Beans Pure Mapping Is Not a Business Component A method that copies model.getName() to dto.setName() is not a service. It has no dependencies, no state, no lifecycle, and no reason to exist in the DI container. Making it a bean adds allocation overhead, enables chaining, and prevents the JVM from inlining the call.\nStatic Methods Would Have Enforced Discipline public final class ProductMapper { private ProductMapper() {} public static ProductData toDto(ProductModel model) { ProductData dto = new ProductData(); dto.setName(model.getName()); dto.setCode(model.getCode()); dto.setDescription(model.getDescription()); return dto; } } Then:\nNo injection is possible. A static method cannot have Spring-injected dependencies. Deep chaining is impossible by construction. The mapping is visible at compile time. Open the file, read the code. No XML, no runtime assembly. Performance is trivial. No proxy, no AOP, no bean lookup. The JVM can inline the entire method. Testing is trivial. Call the static method, assert the output. No Spring context. Business Enrichment Belongs in the Facade For the cases that genuinely need services (pricing, stock, promotions), the logic should be explicit in the facade:\npublic class ProductFacade { private final PriceService priceService; private final StockService stockService; public ProductData getProduct(String code) { ProductModel model = productService.getByCode(code); // Structural mapping -- static, flat, fast ProductData dto = ProductMapper.toDto(model); // Business enrichment -- explicit, visible, debuggable dto.setPrice(priceService.calculatePrice(model, sessionContext)); dto.setStock(stockService.getStockLevel(model, warehouse)); return dto; } } Now the facade is the single place where conversion happens. Business logic is visible. 
Structural mapping is fast and unchainable.\nWhat a Better Design Looks Like For Structural Mapping: Compile-Time Code Generation Tools like MapStruct generate mapper implementations at compile time:\n@Mapper public interface ProductMapper { ProductData toDto(ProductModel model); } MapStruct generates a concrete class with direct getter/setter calls. No reflection, no proxies, no runtime cost. If a field is missing, it\u0026rsquo;s a compile error, not a runtime surprise.\nComparison Aspect Hybris Populator (Current) Static Mapper + Explicit Enrichment Field mapping cost Spring bean proxy + AOP Direct method call (inlineable) Chain depth Unbounded, runtime-determined Zero (static methods can\u0026rsquo;t chain) Visibility Requires inspecting Spring XML across extensions Read the facade method Debugging 40+ frame stack traces Flat call in facade Caching needed? Often, because chains are slow Rarely, because mapping is already fast Memory risk High (cache + deep DTO trees) Low (no cache, flat mapping) Extensibility Very high (any extension can add populators) Moderate (requires facade override) Predictability Very low Very high Why the Cache Was a Band-Aid The chain of causation:\nPopulators are beans → Beans can inject converters → Chains grow deep (nobody audits the full tree) → Conversion becomes slow → Someone adds a JVM heap cache → Cache has no eviction → OutOfMemoryError in production The cache addressed a symptom (slow conversion) of a symptom (deep chains) of the root cause (mapping utilities treated as injectable components). The actual fix is at the root: stop treating field mapping as a business component. Make it static, make it flat, make it fast enough that caching is unnecessary.\nKey Takeaway The Hybris Populator framework optimized for extensibility at the cost of predictability. By making every mapper a Spring bean, it enabled any extension to modify any conversion \u0026ndash; but it also made the conversion graph invisible, unbounded, and slow. 
The populator cache was an attempt to paper over the performance cost, which then created its own crisis (heap exhaustion).\nThe lesson: not everything that transforms data is a service. Structural mapping is a compile-time concern. Business enrichment is a runtime concern. Conflating the two into one abstraction, and putting it all in the DI container, creates a system that is easy to extend but impossible to reason about.\n","permalink":"https://blogs.joshuaantony.com/posts/hybris-populator-framework-design-flaws/","summary":"The SAP Commerce Converter/Populator pattern optimized for extensibility at the cost of predictability. Deep chaining, invisible runtime graphs, and JVM heap exhaustion are the consequences.","title":"Hybris Populator Framework: Design Flaws and Memory Pitfalls"},{"content":"What Katalon Was Built For Katalon Studio launched in 2015 as an all-in-one test automation platform built on top of Selenium and Appium. Its target audience was clear: QA teams who didn\u0026rsquo;t write code.\nCore value proposition:\nRecord and playback \u0026ndash; click through the app, Katalon generates the test Visual test builder \u0026ndash; drag-and-drop keywords instead of writing code Low-code abstraction \u0026ndash; Groovy-based keywords hidden behind a GUI All-in-one \u0026ndash; web, mobile, API, desktop testing in a single desktop IDE This was a genuine gap in the market. In 2015-2019, QA was a separate discipline, Selenium was powerful but raw, there was no Playwright, Cypress was new and limited, and AI code generation did not exist.\nKatalon was that bridge. It served its purpose. That era is over.\nWhy It Doesn\u0026rsquo;t Make Sense Anymore 1. Testing Has Shifted Left The industry has moved from \u0026ldquo;QA writes tests after development\u0026rdquo; to \u0026ldquo;developers own tests as part of development.\u0026rdquo; The people writing tests today are developers and SDETs who are fluent in TypeScript, Java, or Python. 
They don\u0026rsquo;t need a GUI abstraction layer \u0026ndash; they need a fast, scriptable, debuggable framework.\n2. Open-Source DX Has Leapfrogged Katalon Capability Katalon (2016) Open Source (2026) Test generation Record and playback Playwright codegen (records to real code) Auto-wait Manual waits / keyword-based Playwright auto-wait built in Cross-browser Via Selenium (flaky) Playwright: Chromium, Firefox, WebKit natively Debugging Katalon IDE debugger Playwright Trace Viewer, Cypress time-travel Parallel execution Katalon Runtime Engine (licensed) Playwright: built-in, free Reporting Katalon TestOps (licensed) Playwright HTML report, Allure, free API testing Built-in (basic) REST Assured, Karate, Hurl (purpose-built, free) Mobile testing Appium wrapper Appium direct (same thing, no abstraction tax) 3. AI Has Made the Abstraction Redundant Katalon\u0026rsquo;s core value was: \u0026ldquo;QAs don\u0026rsquo;t need to write code.\u0026rdquo;\nIn 2026, AI code assistants generate Playwright tests, REST Assured tests, and Cypress tests from natural language descriptions. The person who couldn\u0026rsquo;t write code in 2016 can now describe what they want and get working test code instantly.\nKatalon\u0026rsquo;s abstraction was a bridge between \u0026ldquo;can\u0026rsquo;t code\u0026rdquo; and \u0026ldquo;needs automation.\u0026rdquo; AI is a better bridge. 
It produces real, portable, version-controllable code in standard frameworks \u0026ndash; not proprietary Groovy scripts locked inside a desktop IDE.\nThe Cost Problem Katalon Pricing (2026) Plan Cost What you get Free $0 Core features, local execution only Create $84/user/month AI optimization, self-healing Expand $168/user/month Runtime Engine for CI, TestCloud Enterprise Custom pricing SSO, audit logs, private cloud Katalon Runtime Engine \u0026ndash; required to run tests in CI/CD \u0026ndash; is not included in the free tier.\nOpen-Source Cost Tool Cost CI execution Reporting Playwright $0 Free Free HTML report, Trace Viewer Cypress $0 (core) Free Free (dashboard paid, optional) REST Assured $0 Free Free (Surefire/Allure) Karate $0 Free Free built-in You are paying $10,000+/year for capabilities that are free in open source.\nThe Lock-In Problem Proprietary Test Format Katalon tests are stored in .tc (test case) files \u0026ndash; a proprietary XML format tied to Katalon\u0026rsquo;s object repository and keyword system. These are not portable:\nYou cannot run a .tc file outside Katalon Studio You cannot import a .tc file into Playwright, Cypress, or any other framework Migrating away from Katalon means rewriting every test from scratch Groovy-Only Scripting When you do write code in Katalon, it\u0026rsquo;s Groovy \u0026ndash; a language that your frontend team (TypeScript/React) doesn\u0026rsquo;t know, your backend team (Java/Spring Boot) could use but won\u0026rsquo;t, and has a shrinking ecosystem compared to TypeScript or Python.\nObject Repository Lock-In Katalon stores element locators in a centralized object repository (XML files). This is a proprietary abstraction over standard CSS/XPath selectors. 
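As a rough illustration of the extraction work, migration tooling tends to look like the sketch below. The XML layout here is simplified and hypothetical — Katalon's real object-repository schema differs — but the shape of the job is the same: pull out every name/selector pair, then rebuild each one by hand as a page-object property in the target framework.

```python
# Hypothetical object-repository extraction (simplified XML; NOT Katalon's real schema).
import xml.etree.ElementTree as ET

repo_xml = """
<ObjectRepository>
  <Element name="loginButton" selector="#login-btn"/>
  <Element name="userField" selector="input[name='user']"/>
</ObjectRepository>
"""

root = ET.fromstring(repo_xml)
# Flatten the repository into plain name -> CSS selector pairs.
locators = {el.get("name"): el.get("selector") for el in root.iter("Element")}
print(locators)

# Each extracted selector must then be rewritten as a page-object locator
# in the target framework (e.g. a Playwright page.locator(...) call).
```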
Migrating means extracting every locator and rebuilding page objects.\nWhat to Use Instead For E2E / UI Testing Playwright (recommended for new projects):\nFree, open source (Microsoft-backed) TypeScript, JavaScript, Python, Java, .NET Cross-browser (Chromium, Firefox, WebKit) Built-in auto-wait, codegen, trace viewer, parallel execution Massive community, rapid development Cypress (if already invested):\nFree core, open source Excellent developer experience, time-travel debugger Chromium-focused For API Testing REST Assured (Java stacks), Karate (JVM, DSL preference), Hurl (language-agnostic, plain-text HTTP tests).\nFor Mobile Testing Appium directly \u0026ndash; Katalon wraps Appium anyway. Use it without the abstraction tax.\nComparison Table Dimension Katalon Studio Playwright Cypress Cost $0 (limited) / $84-168/user/month $0 $0 (core) CI execution cost Requires paid license Free Free Languages Groovy only TS, JS, Python, Java, .NET JS, TS Test format Proprietary .tc files Standard code files Standard code files Version control Awkward (XML + Groovy + object repo) Native (code is the test) Native PR review Difficult (proprietary format) Standard code review Standard Cross-browser Via Selenium Chromium, Firefox, WebKit Chromium Parallel execution Paid feature Built-in, free Built-in, free AI code generation Katalon\u0026rsquo;s built-in AI (paid) Any AI assistant Any AI assistant Vendor lock-in High None None Migration path out Rewrite everything Copy files anywhere Copy files anywhere The Uncomfortable Question \u0026ldquo;We\u0026rsquo;ve invested in Katalon. We have hundreds of tests. Migration is expensive.\u0026rdquo;\nThis is a sunk cost argument. Every month you continue paying Katalon licenses is money spent on a tool that open source provides for free, tests written in a format that locks you in deeper, and skills built in Groovy/Katalon that don\u0026rsquo;t transfer.\nThe migration cost is real. 
But it is a one-time cost that eliminates a recurring cost. The longer you wait, the more expensive the eventual migration becomes.\nKey Takeaway Katalon solved a real problem for its era. That problem has been solved more effectively by two forces:\nOpen-source frameworks (Playwright, Cypress) that are now easier to use than Katalon, with better debugging, better CI integration, and zero cost. AI code generation that eliminates the \u0026ldquo;can\u0026rsquo;t write code\u0026rdquo; barrier entirely. Katalon is not bad software. It is software whose problem no longer exists. The right move is to stop accumulating debt in a proprietary platform and invest in the open-source tools that the rest of the industry has already adopted.\n","permalink":"https://blogs.joshuaantony.com/posts/katalon-is-obsolete/","summary":"Katalon solved a real problem for its era: giving non-coding QA teams a path to automation. That problem has been solved more effectively by open-source frameworks and AI code generation.","title":"Katalon Is Obsolete: Open Source Has Won"},{"content":"The Promise Module Federation (Webpack 5) promised the dream of micro-frontends:\nIndependent deployment \u0026ndash; each team ships on their own schedule Technology agnosticism \u0026ndash; use whatever framework you want Runtime composition \u0026ndash; load remote components dynamically at runtime Shared dependencies \u0026ndash; avoid duplicate React, UI kits, state management The pitch: \u0026ldquo;Build once, deploy independently, compose at runtime.\u0026rdquo;\nThe Reality In practice, with 40+ MFEs federated into a single shell application:\nAll MFEs must use the same framework version (locked to the shell\u0026rsquo;s version) All MFEs must use the same React version (runtime singleton) Shared libraries must be version-compatible across all MFEs Deploying one MFE can break every other MFE due to shared dependency mutations Independent deployment is theoretically possible, practically terrifying 
What you actually have is a distributed monolith \u0026ndash; all the operational complexity of microservices with all the coupling of a monolith.\nThe Version Coupling Evidence Across 40+ MFEs in a real production system, version drift is both inevitable and catastrophic:\nDependency Shell MFE A MFE B MFE C MFE D Framework (Next.js) ^12.3.5 12.3.4 ^12.3.4 \u0026ldquo;latest\u0026rdquo; 12.3.4 MF Plugin 5.12.7 5.11.5 5.12.7 5.12.7 5.12.7 HTTP Client (axios) ^1.12.2 1.11.0 1.11.0 ^0.24.0 ^0.23.0 UI Kit ^6.0.41 ^6.0.41 ^6.0.28 6.0.41 6.0.41 Notice: one MFE has next: \u0026quot;latest\u0026quot; in its package.json. Any npm install can pull a completely different framework version. This drift happens because 40+ teams cannot coordinate dependency updates simultaneously. They shouldn\u0026rsquo;t have to. But Module Federation forces them to.\nStuck on an Old Framework The entire platform is locked to Next.js 12 (released October 2021). That\u0026rsquo;s three major versions behind. Why?\nAll 40+ MFEs must upgrade simultaneously. If the shell runs Next.js 14 and an MFE runs Next.js 12, the runtime behavior is undefined. The MF plugin lags behind the framework. @module-federation/nextjs-mf must be updated to support each new Next.js version. The framework is dropping MF support. Next.js 15 moved to Turbopack, which does not support Webpack plugins. The \u0026ldquo;independent deployment\u0026rdquo; promise becomes ironic: you can independently deploy MFEs, but you cannot independently upgrade them.\nThe Shared Singleton Trap When singleton: true, Module Federation ensures only one copy of a library is loaded at runtime. The \u0026ldquo;winning\u0026rdquo; version is determined by negotiation (typically the highest compatible version).\nProblem: If MFE A ships with state-management@2.0.2 and MFE B ships with state-management@2.1.0, the runtime picks one. If there are breaking changes between 2.0.2 and 2.1.0, one MFE crashes. 
You don\u0026rsquo;t find out until production.\nProblem: The winning version can change based on MFE load order. Deploy MFE B first? Version 2.1.0 wins. Deploy MFE A first? Version 2.0.2 wins. Same code, different behavior, depending on deployment timing.\nThe UI Kit SSR Style Collision (Production Incident) This is the most insidious problem. It is intermittent, route-dependent, and extremely difficult to diagnose.\nHow SSR Works in This Architecture The MFEs do not return HTML. They expose JavaScript components via remoteEntry.js. All compute happens in the shell:\nThe shell\u0026rsquo;s Node.js process loads each MFE\u0026rsquo;s remoteEntry.js via Module Federation\u0026rsquo;s runtime. The shell imports the MFE components. The shell calls ReactDOM.renderToString() on all MFE components in its own process. All MFEs render in the same Node.js process, the same webpack runtime, and the same React instance. What Happens with the UI Kit When the shell loads multiple remoteEntry.js files with different UI Kit versions, multiple versions emit CSS with the same class names but different styles:\n/* UI Kit 6.0.28 (loaded by Header MFE) */ .btn-primary { padding: 8px 16px; border-radius: 4px; background: #0066cc; } /* UI Kit 6.0.41 (loaded by Footer MFE) */ .btn-primary { padding: 10px 20px; border-radius: 8px; background: #0052a3; } Both versions\u0026rsquo; CSS ends up in the same HTML document. CSS cascade: last one wins. Whichever version\u0026rsquo;s CSS is emitted last overrides all other components on the page.\nWhy It\u0026rsquo;s Route-Dependent Different routes load different MFE combinations. Each route loads a different set of remoteEntry.js files. 
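A toy simulation makes the route dependence concrete. This is illustrative only — not webpack's actual negotiation code — and the MFE names and versions are assumptions, but the "highest loaded version wins" rule is enough to show the effect:

```python
# Toy "highest version wins" singleton negotiation per route (illustrative only;
# not webpack's real algorithm). Versions are (major, minor, patch) tuples.
ui_kit_version = {
    "header": (6, 0, 28),   # Header MFE ships UI Kit 6.0.28
    "footer": (6, 0, 41),   # Footer MFE ships UI Kit 6.0.41
    "search": (6, 0, 35),   # hypothetical third MFE
}

routes = {
    "/home":   ["header", "footer"],
    "/search": ["header", "search"],
}

# Per route, the highest UI Kit version among the loaded remotes "wins" the cascade.
winners = {route: max(ui_kit_version[m] for m in mfes)
           for route, mfes in routes.items()}

for route, version in winners.items():
    print(route, "-> UI Kit %d.%d.%d" % version)
# /home -> UI Kit 6.0.41
# /search -> UI Kit 6.0.35
```

Same shell, same code, different winning stylesheet per route — purely because each route loads a different subset of remotes.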
The webpack runtime negotiation produces a different \u0026ldquo;winning\u0026rdquo; UI Kit version per route.\nSame application, different routes, different UI Kit version wins, different visual output.\nModule Federation explicitly does not handle CSS isolation:\n\u0026ldquo;Module Federation does not directly handle CSS style isolation because shared dependencies can conflict.\u0026rdquo;\nThis is not a bug. It is a fundamental architectural limitation.\nConfiguration Explosion With 40+ MFEs, the shell must know the URL of every remote. Each MFE has two endpoints (SSR server + client-side chunk URL), often per environment:\n80+ env vars that must be correct per environment Zero type safety across MFE boundaries \u0026ndash; every prop is unknown One wrong URL = silent failure or runtime crash Adding an MFE requires updating the shell\u0026rsquo;s config, env vars, and typings The Alternative: Vertical Split Instead of runtime composition (Module Federation), use build-time isolation with route-level splitting.\n┌──────────────┐ │ Reverse Proxy│ Browser ────\u0026gt;│ (Nginx/CDN) │ └──────┬───────┘ │ Route-based routing │ ┌─────────────────┼──────────────────┐ │ │ │ ▼ ▼ ▼ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ /orders/* │ │ /catalog/* │ │ /account/* │ │ Next.js 14 │ │ Next.js 15 │ │ Next.js 14 │ │ React 18 │ │ React 19 │ │ React 18 │ │ UI Kit 7.0 │ │ UI Kit 6.5 │ │ UI Kit 7.0 │ │ Own deploy │ │ Own deploy │ │ Own deploy │ └──────────────┘ └──────────────┘ └──────────────┘ Each MFE owns a route prefix. The reverse proxy routes by path \u0026ndash; no shell application. Each MFE is a complete, independent application \u0026ndash; own framework version, own React, own UI kit, own deployment. Shared UI is distributed via npm packages (build-time), not runtime sharing. Navigation between MFEs is standard HTML links \u0026ndash; full page transitions. 
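The dispatch layer in this design is deliberately boring. A sketch of the proxy's routing rule (route prefixes and upstream names are illustrative, taken from the diagram above):

```python
# Longest-prefix route dispatch: the entire "shell" logic of a vertical split.
ROUTES = {
    "/orders":  "orders-app:3001",   # Next.js 14 deployment
    "/catalog": "catalog-app:3002",  # Next.js 15 deployment
    "/account": "account-app:3003",  # Next.js 14 deployment
}

def upstream_for(path: str) -> str:
    # Pick the longest matching prefix so "/orders/123" routes to the orders app.
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        return "default-app:3000"    # fallback upstream
    return ROUTES[max(matches, key=len)]

print(upstream_for("/orders/123"))   # orders-app:3001
print(upstream_for("/catalog"))      # catalog-app:3002
```

In production this is one `location` block per MFE in Nginx (or one CDN routing rule); adding an MFE means adding one line, not reconfiguring a shell.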
Comparison Table Dimension Module Federation Vertical Split Deployment independence Theoretical; blocked by shared deps Real; each app deploys independently Framework version freedom All MFEs must match the shell Each MFE chooses its own CSS isolation None; styles leak across boundaries Complete; each page is separate Type safety across boundaries None (unknown props) N/A; no cross-MFE imports Configuration overhead 2N env vars, N type declarations One proxy rule per MFE Failure blast radius One bad deploy can break entire shell Only affects its routes Framework upgrade path All MFEs simultaneously One MFE at a time User experience (navigation) SPA-like transitions Full page load on MFE boundary The One Trade-Off Vertical split means full page loads when navigating between MFEs. But users navigate between major sections far less frequently than they interact within a section. Modern browsers make full page transitions fast (\u0026lt; 300ms with prefetching). The operational cost of maintaining seamless transitions across 40+ federated MFEs vastly outweighs the UX benefit.\nMigration Path Phase 1: Identify standalone MFEs (full-page features) and redirect their routes through the proxy to independent deployments. Phase 2: Extract shared components (header, footer, nav) into an npm package. Phase 3: Retire the shell. It becomes a thin proxy with routing rules. Key Takeaway Module Federation was adopted to enable team independence, but it created a system where no team can deploy, upgrade, or even change a UI kit version without coordinating with every other team. That is the opposite of independence.\nVertical split is less technically elegant. Full page loads between sections feel less \u0026ldquo;modern.\u0026rdquo; But it delivers the thing that actually matters: each team ships on their own schedule, with their own framework version, without breaking anyone else. 
That is what micro-frontends were supposed to be.\n","permalink":"https://blogs.joshuaantony.com/posts/module-federation-is-coupling-disguised-as-micro-frontends/","summary":"With 40+ MFEs federated into a single shell, Module Federation creates a distributed monolith \u0026ndash; all the operational complexity of microservices with all the coupling of a monolith.","title":"Module Federation Is Coupling Disguised as Micro-Frontends"},{"content":"The Argument \u0026ldquo;We already use Playwright for frontend E2E tests. Let\u0026rsquo;s reuse it for backend API testing too, so we don\u0026rsquo;t introduce another tool.\u0026rdquo;\nThis sounds pragmatic. It is not. It confuses tool consolidation with engineering discipline.\nContext Stack: React frontend + Spring Boot backend. Repos: Separate repositories for frontend and backend. Existing coverage: @SpringBootTest + MockMvc for component/integration tests (already done). The question: For black-box API tests against a running backend instance, should we use Playwright\u0026rsquo;s APIRequestContext or a dedicated API testing tool? The \u0026ldquo;Reuse\u0026rdquo; Fallacy The argument for Playwright rests on reuse: \u0026ldquo;We already have it, so using it again is free.\u0026rdquo;\nThis only holds when the frontend E2E tests and backend API tests share the same repo, same CI pipeline, same test suite, and same runtime. Our frontend and backend are in separate repositories.\nUsing Playwright in the backend repo means:\nWhat you think you\u0026rsquo;re getting What you\u0026rsquo;re actually getting Reuse of an existing tool A second, independent installation of Playwright Shared test infrastructure Two separate setups that share nothing Familiarity Backend developers writing TypeScript tests for a Java application What It Actually Costs 1. A Foreign Ecosystem in the Backend Repo The backend is a Spring Boot / Gradle project. 
Introducing Playwright means:\npackage.json in a Java repo node_modules/ alongside src/main/java/ TypeScript test files alongside JUnit tests npm as a second dependency manager alongside Gradle Node.js in every developer\u0026rsquo;s local environment and CI This is dual-stack maintenance for a single project.\n2. Assertions Designed for UI, Not APIs Compare how API assertions look in practice:\nPlaywright (TypeScript):\nconst response = await request.get(\u0026#39;/items?id=1,2\u0026#39;); expect(response.status()).toBe(207); const body = await response.json(); expect(body.results).toHaveLength(2); expect(body.results[0].status).toBe(200); expect(body.results[0].data.name).toBe(\u0026#39;Widget A\u0026#39;); // No JSON schema validation without extra libraries REST Assured (Java):\ngiven() .queryParam(\u0026#34;id\u0026#34;, \u0026#34;1,2\u0026#34;) .when() .get(\u0026#34;/items\u0026#34;) .then() .statusCode(207) .body(\u0026#34;results.size()\u0026#34;, equalTo(2)) .body(\u0026#34;results[0].status\u0026#34;, equalTo(200)) .body(\u0026#34;results[0].data.name\u0026#34;, equalTo(\u0026#34;Widget A\u0026#34;)) .body(matchesJsonSchemaInClasspath(\u0026#34;items-response-schema.json\u0026#34;)); Karate (DSL):\nGiven url baseUrl + \u0026#39;/items\u0026#39; And param id = \u0026#39;1,2\u0026#39; When method get Then status 207 And match response.results == \u0026#39;#[2]\u0026#39; And match response.results[0].status == 200 And match response.results[0].data.name == \u0026#39;Widget A\u0026#39; Hurl (plain text):\nGET {{base_url}}/items?id=1,2 HTTP 207 [Asserts] jsonpath \u0026#34;$.results\u0026#34; count == 2 jsonpath \u0026#34;$.results[0].status\u0026#34; == 200 jsonpath \u0026#34;$.results[0].data.name\u0026#34; == \u0026#34;Widget A\u0026#34; REST Assured, Karate, and Hurl were built for this. Playwright was not. 
The difference compounds as the test suite grows.\nWhen Tests Go Beyond HTTP Our black-box tests also:\nQuery the database to verify state changes Check Redis keys for cache population/invalidation Verify WireMock stub invocations for downstream API calls Verification Available in Java? Available in Node.js? Database state JDBC / JPA (already in project) Requires pg/mysql2 Redis keys Jedis / Lettuce (already in project) Requires ioredis WireMock stubs WireMock Java API (already in project) HTTP API only Message queues Spring AMQP / Kafka client (already in project) Requires amqplib/kafkajs Using Playwright means maintaining a second set of infrastructure clients in a second language for the same databases and services your Java backend already connects to.\nThe AI-Assisted Development Reality \u0026ldquo;But Java is verbose and slow to write.\u0026rdquo;\nThis was true three years ago. AI writes the test code now. Whether the test is in Java or TypeScript, an AI assistant generates it in seconds.\nWhat AI does not eliminate:\nConcern AI helps? Still matters? Writing test code Yes No longer the bottleneck Java boilerplate Yes No longer the bottleneck Node.js runtime in Java CI pipeline No Yes Second set of infrastructure clients No Yes Debugging Node.js stack traces in a Java project No Yes Unified test reporting (JUnit/Surefire) No Yes AI eliminates the authoring argument. 
It does not eliminate the operational argument.\nRight Tool for the Job Test type Recommended tool Why Pure HTTP smoke tests Hurl or Karate Scripting feel, lightweight, no compilation Integration verification (DB/Redis/WireMock) REST Assured + JUnit Same language, same clients, same pipeline Frontend E2E with API setup Playwright Right tool in the frontend repo When Playwright API Testing IS Valid Scenario Why Playwright fits API setup/teardown within E2E tests Share auth cookies between browser and API calls Frontend-perspective contract checks Test that the API returns what the React app expects Same-repo monolith One test runner, one CI step Quick smoke tests in E2E suite \u0026ldquo;Is the login endpoint up?\u0026rdquo; before browser tests None of these apply to \u0026ldquo;comprehensive black-box API testing of a Spring Boot backend in a separate repo.\u0026rdquo;\nComparison Table Dimension Playwright REST Assured Karate Hurl Language TypeScript Java Karate DSL (JVM) Plain text Native to Spring Boot repo? No Yes Yes Yes JSON Schema validation Requires Ajv Built-in Built-in No CI overhead in Java repo Node.js + npm install Zero Zero Single binary DB/Redis/WireMock access Separate Node.js clients Existing Java clients Java interop No Purpose-built for API testing No (side feature) Yes Yes Yes The Bottom Line Three arguments were made for Playwright. All three collapse:\n\u0026ldquo;Reuse\u0026rdquo; \u0026ndash; separate repos mean no reuse. You\u0026rsquo;re independently installing a Node.js tool in a Java project. \u0026ldquo;Lightweight mode\u0026rdquo; \u0026ndash; skipping browser downloads removes binary bloat, but not the ecosystem mismatch. \u0026ldquo;Java is too verbose\u0026rdquo; \u0026ndash; AI generates REST Assured tests as easily as Playwright tests. The authoring cost is equal. The operational cost is not. 
Use the right tool for the job, not the same tool for every job.\n","permalink":"https://blogs.joshuaantony.com/posts/playwright-is-not-a-backend-api-testing-tool/","summary":"Using Playwright for backend API testing in a Spring Boot repo is not reuse \u0026ndash; it\u0026rsquo;s dual-stack maintenance. The right tool depends on what you\u0026rsquo;re verifying, not what you already have installed.","title":"Playwright Is Not a Backend API Testing Tool"},{"content":"What is a Tensor? A multi-dimensional array with two superpowers: GPU acceleration and automatic gradient tracking.\nRank Name Shape Example 0D Scalar () torch.tensor(3.14) 1D Vector (N,) torch.tensor([1, 2, 3]) 2D Matrix (M, N) torch.randn(3, 4) 3D+ Tensor (B, M, N) torch.randn(8, 3, 224, 224) (batch of images) GPU Support = CUDA Under the Hood PyTorch → CUDA kernels → cuBLAS/cuDNN → NVIDIA GPU\nx = torch.randn(1000, 1000, device=\u0026#39;cuda\u0026#39;) # lives on GPU y = x @ x # matrix multiply on GPU For large matrices (10,000x10,000+), GPU matmul can be 50-100x faster than CPU. At smaller sizes (1,000x1,000), the speedup is more modest (5-20x) because data transfer overhead is significant relative to the compute.\nNever mix devices in one operation:\nx.cpu() @ y.cuda() # → RuntimeError in PyTorch 2.x (silent copy in older versions) x.to(y.device) @ y # correct way Autograd = Automatic Backpropagation You only write the forward pass. PyTorch computes all gradients automatically using the chain rule.\nx = torch.tensor(2.0, requires_grad=True) w = torch.tensor(3.0, requires_grad=True) b = torch.tensor(1.0, requires_grad=True) y = w * x + b # forward: 3*2 + 1 = 7 loss = (y - 10) ** 2 # (7 - 10)^2 = 9 loss.backward() # computes ALL gradients print(w.grad) # → tensor(-12.) dloss/dw = 2(y-10) * x = 2(-3)(2) = -12 print(b.grad) # → tensor(-6.) 
dloss/db = 2(y-10) * 1 = 2(-3)(1) = -6 This is automatic differentiation \u0026ndash; PyTorch builds a computation graph during the forward pass and walks it backwards to compute gradients.\nWhere Are Intermediate Values Stored? During training, PyTorch saves intermediate results for the backward pass. Understanding where they live is critical for managing GPU memory:\nWhat Where Size Activations (forward pass outputs saved for backward) GPU memory (same device as the tensor) Large \u0026ndash; this is the main VRAM consumer Computation graph nodes (grad_fn objects) CPU memory Tiny (just metadata and pointers) Saved tensors referenced by graph nodes GPU memory Large \u0026ndash; these are the activations above Gradients (.grad) Same device as the parameter Same size as parameters out = x @ w # 5000×5000 result saved on GPU for backward out.mean().backward() # uses saved tensor to compute gradients, then frees it This is why large models consume 20-100+ GB VRAM during training \u0026ndash; the forward pass must save activations at every layer for the backward pass to use.\nOne Training Step = One Full Backpropagation for batch in dataloader: optimizer.zero_grad() # clear previous gradients pred = model(batch) # forward pass loss = criterion(pred, target) loss.backward() # backward pass (compute all gradients) optimizer.step() # update parameters Real models perform hundreds of thousands to millions of these steps:\nModel Parameters Training tokens Approx. training steps BERT-base 110M 3.3B (BooksCorpus + Wikipedia) ~1M (900k @ seq_len 128, then 100k @ seq_len 512) LLaMA 7B 7B 1T ~250k LLaMA 2 7B 7B 2T ~500k LLaMA 3.1 405B 405B 15.6T ~500k-1.5M (exact count not published; large batch sizes reduce step count) Note: Step count depends on batch size. Larger batch sizes mean fewer steps for the same number of tokens. 
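The step counts above are just the token budget divided by tokens per optimizer step. The arithmetic can be sketched directly (token counts are from the table; the per-step batch sizes in tokens are illustrative assumptions, not official published configs):

```python
# steps = total training tokens / tokens per optimizer step (batch size in tokens).
# Batch sizes below are illustrative assumptions for the arithmetic.
runs = {
    "LLaMA 7B":       (1.0e12,  4.0e6),   # 1T tokens, ~4M-token batches
    "LLaMA 2 7B":     (2.0e12,  4.0e6),   # 2T tokens
    "LLaMA 3.1 405B": (15.6e12, 16.0e6),  # 15.6T tokens, very large batches
}

steps = {name: tokens / batch_tokens
         for name, (tokens, batch_tokens) in runs.items()}

for name, n in steps.items():
    print(f"{name}: ~{n:,.0f} steps")
# LLaMA 7B: ~250,000 steps
# LLaMA 2 7B: ~500,000 steps
# LLaMA 3.1 405B: ~975,000 steps
```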
LLaMA 3.1 405B uses massive batch sizes (up to 16M tokens per step), which is why the step count is lower than you might expect for 15.6T tokens.\nEach step involves one complete backpropagation through the entire network \u0026ndash; computing gradients for every parameter simultaneously.\nWhy PyTorch Won Dynamic computation graph \u0026ndash; write normal Python, debug with normal tools. No graph compilation step (unlike TensorFlow 1.x). Autograd computes perfect gradients automatically \u0026ndash; no manual derivatives, ever. Full GPU acceleration with a clean API. Dominant in research \u0026ndash; the vast majority of ML papers and large model training runs use PyTorch (with JAX gaining ground at Google/DeepMind). Quick Memory Demo Copy-paste this to see GPU memory behavior during forward and backward passes:\nimport torch x = torch.randn(5000, 5000, device=\u0026#39;cuda\u0026#39;) w = torch.randn(5000, 5000, device=\u0026#39;cuda\u0026#39;, requires_grad=True) print(f\u0026#34;{torch.cuda.memory_allocated() / 1e9:.2f} GB\u0026#34;) # ~0.20 GB (two 5000x5000 tensors) out = x @ w # matmul result: 5000x5000 = 100MB, saved for backward out = out.relu() # relu output: another ~100MB saved print(f\u0026#34;{torch.cuda.memory_allocated() / 1e9:.2f} GB\u0026#34;) # ~0.40 GB (+200MB from intermediates) out.mean().backward() # uses saved tensors, then frees them print(f\u0026#34;{torch.cuda.memory_allocated() / 1e9:.2f} GB\u0026#34;) # back down (intermediates freed) The ~200MB spike during forward pass is autograd saving the matmul result and relu output for the backward pass. After backward() completes, these intermediates are freed.\nKey Takeaways Tensors are arrays that can live on GPU and track gradients. Autograd builds a computation graph during forward pass and walks it backwards for gradients \u0026ndash; you never write derivative math. GPU memory during training is dominated by saved activations (intermediates), not model weights. 
Training is millions of forward-backward-update cycles, each computing gradients for every parameter. PyTorch won because it lets you write normal Python while handling the hard parts (differentiation, GPU dispatch) automatically. ","permalink":"https://blogs.joshuaantony.com/posts/pytorch-essentials-cheat-sheet/","summary":"A dense, correct reference covering tensors, GPU acceleration, autograd, backpropagation, and training loops. Everything you need to understand how PyTorch trains models.","title":"PyTorch Essentials Cheat Sheet: From Zero to Backpropagation"},{"content":"This is a condensed reference covering the core concepts behind large language models and the transformer architecture, inspired by Andrej Karpathy\u0026rsquo;s \u0026ldquo;Neural Networks: Zero to Hero\u0026rdquo; series \u0026ndash; particularly \u0026ldquo;The spelled-out intro to language modeling: building makemore\u0026rdquo; and \u0026ldquo;Let\u0026rsquo;s build GPT: from scratch, in code, spelled out.\u0026rdquo;\nLanguage Modeling = Advanced Auto-Complete The entire magic of large language models comes from one simple task: predict the next token in a sequence.\nIf you have the sequence [\u0026quot;The\u0026quot;, \u0026quot;cat\u0026quot;, \u0026quot;is\u0026quot;], the model learns to predict \u0026quot;sleeping\u0026quot; as the most likely next word.\nWhy this works: the internet gives us infinite free training data. Every webpage is a sequence where the next word is already known. We turn unsupervised text into a supervised learning problem for free.\nWords aren\u0026rsquo;t independent. \u0026quot;I enjoyed reading a ___\u0026quot; \u0026ndash; \u0026quot;book\u0026quot; is far more likely than \u0026quot;thermometer\u0026quot;. The model learns these relationships by seeing billions of examples.\nTokenization: Breaking Text into Pieces Models don\u0026rsquo;t understand text \u0026ndash; they understand numbers. 
Tokenization splits text into pieces called tokens, each mapped to an integer ID.\nThree approaches, from simple to practical:\nMethod | Example: \u0026quot;The cat\u0026quot; | Tradeoff\nCharacter-level | \u0026quot;T\u0026quot;, \u0026quot;h\u0026quot;, \u0026quot;e\u0026quot;, \u0026quot; \u0026quot;, \u0026quot;c\u0026quot;, \u0026quot;a\u0026quot;, \u0026quot;t\u0026quot; | Tiny vocabulary, but sequences are long and slow to process\nWord-level | \u0026quot;The\u0026quot;, \u0026quot;cat\u0026quot; | Clean, but can\u0026rsquo;t handle new/rare words\nSubword-level (BPE) | \u0026quot;The\u0026quot;, \u0026quot; cat\u0026quot; | Best balance \u0026ndash; common words stay whole, rare words split into learnable pieces\nModern LLMs (GPT, LLaMA, Claude) all use subword tokenization. OpenAI\u0026rsquo;s tiktoken and Google\u0026rsquo;s SentencePiece are the two dominant implementations. A typical vocabulary is 32,000-100,000 tokens.\nEmbeddings: Turning Tokens into Vectors Each token ID maps to a learned vector \u0026ndash; a list of numbers (e.g., 768 dimensions for BERT-base, 4,096 for LLaMA-7B) that encodes meaning.\n# Conceptual example (dimensions simplified)\n\u0026#34;cat\u0026#34; → [0.23, -0.45, 0.12, ..., 0.89] # 768 numbers\n\u0026#34;dog\u0026#34; → [0.21, -0.41, 0.14, ..., 0.85] # similar direction (similar meaning)\n\u0026#34;car\u0026#34; → [0.67, 0.12, -0.34, ..., -0.21] # very different direction\nThese vectors are learned during training. The model discovers that words appearing in similar contexts should have similar vectors. Nobody hand-codes these relationships \u0026ndash; they emerge from the next-token prediction objective.\nThe embedding matrix is a simple lookup table: vocab_size × embedding_dim. For GPT-2, that\u0026rsquo;s 50,257 tokens × 768 dimensions = ~38.6M parameters just for the embedding layer.\nThe Transformer Architecture The transformer processes tokens through a stack of identical layers. 
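The pipeline sketched so far (embed tokens, run them through a stack of identical layers, read out representations) can be wired up as a runnable toy in plain Python. All sizes and names here are hypothetical, and block() is a stand-in for the per-layer math described next:

```python
import random

random.seed(0)
VOCAB, D_MODEL, N_LAYERS = 10, 4, 3   # toy sizes, not a real model

# Embedding tables as plain lists: one row per token ID / position.
tok_emb = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(VOCAB)]
pos_emb = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(8)]

def block(x):
    # Placeholder for one transformer layer (attention + feed-forward);
    # here it just passes the representation through unchanged.
    return [row[:] for row in x]

def transformer(token_ids):
    # Token embedding + positional embedding, added element-wise.
    x = [[t + p for t, p in zip(tok_emb[tok], pos_emb[pos])]
         for pos, tok in enumerate(token_ids)]
    for _ in range(N_LAYERS):          # the stack of identical layers
        x = block(x)
    return x                           # one d_model vector per input token

out = transformer([3, 1, 4])
print(len(out), len(out[0]))           # seq_len and d_model of the output
```

The real model differs only in what block() does; the overall wiring (embed, loop over identical layers, read out) is exactly this shape.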
Each layer has two sub-components: self-attention (tokens exchange information) and a feed-forward network (each token processes its information independently). Critical details that are often glossed over: residual connections, layer normalization, causal masking, and multi-head attention.\nPositional Encoding Attention operates on sets, not sequences \u0026ndash; it has no built-in sense of order. \u0026quot;cat ate mouse\u0026quot; and \u0026quot;mouse ate cat\u0026quot; would produce identical attention patterns without positional information.\nThe solution: inject a positional signal. The original transformer added fixed sine/cosine functions at different frequencies to each embedding. GPT-2 and GPT-3 use learned positional embeddings \u0026ndash; a second embedding table indexed by position, added element-wise to the token embedding. LLaMA and most recent models use rotary position embeddings (RoPE), which encode position by rotating the query and key vectors inside attention.\nSelf-Attention: The Core Mechanism Self-attention lets each token look at every other token in the sequence and decide how much information to gather from each.\nFor each token, three vectors are computed from its embedding:\nQuery (Q): \u0026ldquo;What am I looking for?\u0026rdquo;\nKey (K): \u0026ldquo;What do I contain?\u0026rdquo;\nValue (V): \u0026ldquo;What information do I provide?\u0026rdquo;\nAll three are produced by multiplying the token\u0026rsquo;s embedding by learned weight matrices: Q = x @ W_Q, K = x @ W_K, V = x @ W_V.\nThe attention computation:\ndef self_attention(x, W_Q, W_K, W_V, mask=None):\n    Q = x @ W_Q # (seq_len, d_k)\n    K = x @ W_K # (seq_len, d_k)\n    V = x @ W_V # (seq_len, d_v)\n    scores = Q @ K.T # (seq_len, seq_len) -- pairwise similarity\n    scores = scores / sqrt(d_k) # scale to prevent extreme softmax values\n    if mask is not None:\n        scores = scores.masked_fill(mask == 0, float(\u0026#39;-inf\u0026#39;)) # causal mask\n    weights = softmax(scores, dim=-1) # normalize to probabilities\n    output = weights @ V # weighted combination of values\n    return output\nWhy divide by sqrt(d_k)? 
The dot product of two random vectors with d_k dimensions has variance proportional to d_k. Without scaling, large dimensions produce large dot products, which push softmax into regions with near-zero gradients. Dividing by sqrt(d_k) normalizes the variance back to ~1, keeping softmax in a useful range.\nCausal Masking: The Critical Detail for GPT-Style Models This is the part most simplified explanations skip, and it\u0026rsquo;s fundamental to how autoregressive language models work.\nIn a GPT-style model, when predicting the next token at position i, the model must not see tokens at positions i+1, i+2, ... \u0026ndash; those are the future that hasn\u0026rsquo;t been generated yet.\nThis is enforced with a causal mask \u0026ndash; a lower-triangular matrix that blocks attention to future positions:\nToken: \u0026#34;The\u0026#34; \u0026#34;cat\u0026#34; \u0026#34;is\u0026#34; \u0026#34;sleeping\u0026#34;\n\u0026#34;The\u0026#34; ✓ ✗ ✗ ✗\n\u0026#34;cat\u0026#34; ✓ ✓ ✗ ✗\n\u0026#34;is\u0026#34; ✓ ✓ ✓ ✗\n\u0026#34;sleeping\u0026#34; ✓ ✓ ✓ ✓\n✓ = can attend, ✗ = masked (set to -inf before softmax, so the weight becomes 0).\nThis means:\n\u0026quot;The\u0026quot; can only attend to itself\n\u0026quot;cat\u0026quot; can attend to \u0026quot;The\u0026quot; and \u0026quot;cat\u0026quot;\n\u0026quot;sleeping\u0026quot; can attend to all previous tokens\nWithout this mask, the model could \u0026ldquo;cheat\u0026rdquo; during training by looking at the answer it\u0026rsquo;s supposed to predict. The mask forces each position to predict the next token using only the past \u0026ndash; which is exactly how generation works at inference time.\nNote: BERT does not use causal masking \u0026ndash; it uses bidirectional attention (every token sees every other token). This is why BERT is good for understanding tasks but cannot generate text autoregressively. 
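The mask-then-softmax step can be checked numerically in a few lines of plain Python. This is a toy 4x4 example with uniform raw scores and a hand-written softmax, just to verify the masking behavior:

```python
import math

def softmax(row):
    m = max(row)                                 # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

seq_len = 4
scores = [[1.0] * seq_len for _ in range(seq_len)]  # pretend raw Q.K scores

# Causal mask: position i may attend to positions <= i only.
masked = [[scores[i][j] if j <= i else float("-inf") for j in range(seq_len)]
          for i in range(seq_len)]

weights = [softmax(row) for row in masked]
for row in weights:
    print([round(w, 2) for w in row])
```

Row i spreads its weight uniformly over positions 0..i and gives exactly zero weight to every future position, matching the lower-triangular pattern in the table above.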
GPT uses causal masking, which is why it can generate text token by token.\nMulti-Head Attention Transformers don\u0026rsquo;t run one attention operation \u0026ndash; they run multiple attention heads in parallel, each with its own Q, K, V weight matrices.\nHead 1: might learn syntactic relationships (\u0026#34;subject → verb\u0026#34;)\nHead 2: might learn coreference (\u0026#34;he\u0026#34; → \u0026#34;John\u0026#34;)\nHead 3: might learn positional patterns (\u0026#34;next word\u0026#34; proximity)\n...\nHead 12: might learn semantic similarity\nEach head operates on a smaller dimension: if the model dimension is 768 and there are 12 heads, each head works with 768/12 = 64 dimensions. The outputs of all heads are concatenated and projected back to the full dimension:\n# Multi-head attention (simplified)\nheads = [attention(x, W_Q[i], W_K[i], W_V[i]) for i in range(n_heads)]\nconcat = torch.cat(heads, dim=-1) # (seq_len, n_heads * d_head) = (seq_len, d_model)\noutput = concat @ W_O # project back to d_model\nGPT-2 uses 12 heads. GPT-3 uses 96 heads. LLaMA-7B uses 32 heads. The total computation is roughly the same as a single large attention, but multi-head allows the model to attend to different types of relationships simultaneously.\nResidual Connections and Layer Normalization Two mechanisms that are often omitted from explanations but are critical for training deep networks:\nResidual connections (skip connections): The output of each sub-layer (attention, feed-forward) is added to its input, not used as a replacement:\nx = x + self_attention(layer_norm(x)) # attention with residual\nx = x + feed_forward(layer_norm(x)) # FFN with residual\nWithout residual connections, gradients vanish in deep networks (GPT-3 has 96 layers). The skip connection provides a direct gradient path from the output back to early layers.\nLayer normalization: Normalizes the activations to have zero mean and unit variance at each layer. 
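Layer normalization is simple enough to verify by hand. A minimal plain-Python sketch (omitting the learned scale and shift parameters that real implementations add on top):

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one activation vector to zero mean and unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [2.0, 4.0, 6.0, 8.0]
y = layer_norm(x)
print([round(v, 3) for v in y])

# Check the invariant: mean ~0, variance ~1 after normalization.
mean = sum(y) / len(y)
var = sum((v - mean) ** 2 for v in y) / len(y)
print(round(mean, 6), round(var, 3))
```

Whatever scale the incoming activations have, the output always lands in the same well-behaved range, which is what keeps deep stacks trainable.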
Modern transformers use pre-norm (normalize before attention/FFN) rather than the original paper\u0026rsquo;s post-norm, because pre-norm is more stable during training.\nFeed-Forward Network After attention blends information across tokens, a feed-forward network processes each token independently:\ndef feed_forward(x):\n    return relu(x @ W_1 + b_1) @ W_2 + b_2\nThe hidden dimension is typically 4x the model dimension (e.g., 768 → 3072 for GPT-2). This is where the model does per-token \u0026ldquo;thinking\u0026rdquo; \u0026ndash; transforming the attention-blended representation into a richer one. Recent models use SwiGLU or GeGLU activations instead of ReLU.\nOutput: Predicting the Next Token The final layer projects each token\u0026rsquo;s representation to the vocabulary size and applies softmax:\nlogits = x @ W_vocab # (seq_len, vocab_size), e.g., (1024, 50257)\nprobs = softmax(logits, dim=-1)\nnext_token = sample(probs) # greedy (argmax) or random (temperature sampling)\nDuring training, the loss is cross-entropy between the predicted probability distribution and the actual next token at every position in the sequence.\nTraining vs. Inference Training\nFeed a sequence of tokens: [\u0026quot;The\u0026quot;, \u0026quot;cat\u0026quot;, \u0026quot;is\u0026quot;, \u0026quot;sleeping\u0026quot;]\nAt each position, the model predicts the next token (using causal masking)\nCompare predictions to actual tokens using cross-entropy loss\nBackpropagate gradients through the entire network\nUpdate all weights (Q, K, V, embeddings, feed-forward, etc.) 
Repeat for trillions of tokens over weeks on GPU clusters\nAll positions in the sequence are trained simultaneously (this is why transformers are faster than RNNs \u0026ndash; parallel training, not sequential).\nInference (Generation)\nStart with a prompt: \u0026quot;The cat is\u0026quot;\nRun a forward pass through the transformer\nSample the next token from the output distribution: \u0026quot;sleeping\u0026quot;\nAppend to the sequence: \u0026quot;The cat is sleeping\u0026quot;\nRepeat from step 2\nThis is why ChatGPT appears to type one word at a time \u0026ndash; it is literally generating one token per forward pass, iteratively.\nScaling: Why Bigger Models Work Better The transformer architecture scales predictably:\nDimension | Small (GPT-2) | Medium (LLaMA-2-7B) | Large (GPT-4-class)\nParameters | 1.5B | 7B | ~200B+ (estimated)\nTraining tokens | 40B | 2T | 10T+\nContext length | 1,024 | 4,096 | 128,000+\nTraining compute | ~$50k | ~$1M | ~$100M+\nAs models scale, they develop capabilities that weren\u0026rsquo;t explicitly programmed \u0026ndash; coherent essay writing, code generation, mathematical reasoning, multilingual translation. Whether these represent genuinely \u0026ldquo;emergent\u0026rdquo; discontinuities or smooth capability curves that cross usefulness thresholds is an active area of research (Schaeffer et al., 2023 argue the latter).\nWhat\u0026rsquo;s not debated: the simple \u0026ldquo;predict next token\u0026rdquo; objective, when applied at sufficient scale with enough data, produces remarkably capable systems.\nWhy Transformers Won Before transformers (2017), the dominant sequence models were RNNs and LSTMs. They processed tokens sequentially \u0026ndash; token 1, then token 2, then token 3. This meant:\nTraining couldn\u0026rsquo;t be parallelized across sequence positions\nLong-range dependencies were hard to learn (information had to survive through every intermediate step)\nTraining was slow\nTransformers process all tokens in parallel via attention. 
Every token directly attends to every other token (within the causal mask). Long-range dependencies are just as easy to learn as short-range ones. Training parallelizes perfectly across sequence positions and across GPUs.\nThe result: transformers scale to sequences and model sizes that were completely impractical with RNNs.\nWhere to Go Deeper Andrej Karpathy\u0026rsquo;s \u0026ldquo;Neural Networks: Zero to Hero\u0026rdquo; series builds these concepts from scratch in Python:\nmicrograd \u0026ndash; backpropagation engine from scratch\nmakemore (bigram) \u0026ndash; character-level language model, the \u0026ldquo;spelled-out intro to language modeling\u0026rdquo;\nmakemore (MLP) \u0026ndash; multi-layer perceptron language model\nmakemore (activations/BatchNorm) \u0026ndash; training dynamics and normalization\nmakemore (backprop) \u0026ndash; manual backpropagation through the network\nLet\u0026rsquo;s build GPT \u0026ndash; full transformer from scratch, the \u0026ldquo;spelled-out\u0026rdquo; walkthrough\nLet\u0026rsquo;s build the GPT Tokenizer \u0026ndash; BPE tokenization from scratch\nThe code is available at github.com/karpathy/nanoGPT and github.com/karpathy/minGPT. The best way to learn: code along with him, line by line.\n","permalink":"https://blogs.joshuaantony.com/posts/intro-to-language-modeling-and-transformers/","summary":"A dense walkthrough of how large language models work \u0026ndash; from next-token prediction to tokenization, embeddings, self-attention with causal masking, multi-head attention, and the full transformer architecture. Based on Andrej Karpathy\u0026rsquo;s teaching approach.","title":"The Spelled-Out Intro to Language Modeling and Transformers"}]