I had an interesting experience during an application migration a while back. The application used a no-sql data store and and event streaming platform, and an unforseen hiccup resulted in the event stream not being able to make the migration smoothly - we'd wind up starting up the "after" version of this application with nothing at all in the event stream, though we'd have all the no-sql data from thr prior application.
My initial reaction was panic. There's no way this could stand. But in examining the problem, not only was it going to be genuinely difficult to port the live event stream, it turned out our application didn't really care too much as long as we throttled down traffic before migrating state. That was the moment I came to really understand state vs. events, and how an event streaming application (or service) is fundamentally different than a traditional state-based application.
Retrain your brain
We've been trained to build applications with a focus on state and how we store it. This can make it hard to transition to an architecture of services where things are in motion -- events. There will always be circumstances in which at-rest state is appropriate and necessary -- reporting, for instance. But when analyzing software requirements, look for places where state-bias causes you to think about dynamic systems as if you're looking at static snapshots.
A shopping cart / order scenario is typical of this sort of state flattening. Consider the simple order shown here - an order with three line items. Ignoring how this order came to be formed, much information is potentially lost. It's possible this customer considered other items before settling on the items in this cart, and by considering only the state at the end of check-out, much value is lost.
This alternative stream yields the same final shopping cart, but in this case, we can see additional / alternative items that were abandoned in the cart. This view is more aligned to what you'd expect in an event stream like Kafka vs. a state store (SQL or no-SQL).
In this example, it's likely we'd want to understand why items (a) and (b) were removed - perhaps because of availability or undesirable shipping terms. We'd also want to understand whether the new items (2) and (3) were added as replacements or alternatives to (a) and (b). Regardless, the absence of the events in the first place prevents any such awareness, let alone analysis.
As a developer, you might relate to an example that hits closer to home. Recall the largest source code base you've worked on -- ideally one with several years of history. Now zip it up -- minus the .git folder, and remove the hosted repository. That static source snapshot certainly has value, but consider the information you've lost. You no longer are able to see who contributed which changes or when. Assuming you've tied commits to work items, you've lost that traceability, too. The zip file is a perfect representation of state, but without being able to see the events that created that state, you're in a very diminished capacity.
Hybrid approaches
Of course, many systems combine state and events, and the focus on one or the other could be visualized on a continuum. On the left, storing only events, you'd have something that looks like a pure event-sourcing design1. If and when you need to see current state, the only way to get that is to replay all the events in the stream for the period that interests you. On the right, of course, you have only current state, with no record of how that state came to be.
In practice, most services and almost all applications will be somewhere between the extremes of this continuum. Even a pure event-sourcing design will store state in projections for use in queries and/or snapshots to mark waypoints in state. Even more likely will be a mix of services with relative emphasis differing from service to service:
There's no right answer here, and the best design for many applications will employ a hybrid approach like this to capure dynamic behavior where it's appropriate without losing speed advantages typically seen in flattened-state data stores. When considering these alternatives, though, I'd encourage you to keep your historical bias in mind and try not to lose event data when it may be important.
- For more on event sourcing, see Martin Fowler's excellent intro as well as this excellent intro on EventStore.com. ↩︎
One Reply to “State vs. Events”