UX Lessons from Visual Studio

Commonly, when people think "user experience", they think about screen designs. Apps, web pages, Figma -- all that stuff. The shift from UI design to UX alone is a nod to this practice being more than skin-deep, but I still think a lot of the deeper behavioral aspects of user satisfaction are lost on many -- especially people who have only a casual interest in UI/UX.

I believe there are clues all around us, though, that we can take advantage of if we pay attention. I'll begin with the premise / reminder that real usability is completely invisible: you only notice usability through the absence of impediments to it. In short, a system is usable when it does exactly what a user expects, even if they can't articulate those expectations.

One of the best examples of invisible usability can be found in Visual Studio, which has been a market-leading IDE for decades now. Built by developers for developers, it gets a lot of things right, and even though you're probably not building a tool for developers, some of these lessons likely apply to you, too.

Starting Fresh

Right off the bat, Visual Studio shows its usability by setting users up to be successful when working with a new project. Pick any template you like in Visual Studio, create a new project from it, and it will build successfully. While it might not seem like a big deal, it makes a lot of difference for a new user learning one of these technologies. The ability to start from a solid foundation gives that user confidence to build on that new solution.

What's the lesson? Be aware of the "getting started" experience for your users. Anything you can do to get a user started quickly and successfully helps them begin using your software with confidence. When users get started with confidence and gain momentum, they're more likely to persevere if they get stuck later, too.

Save Always

Another subtle way Visual Studio helps is one you probably never gave a second thought. No matter what state your working files are in, you'll never mess one up so badly that Visual Studio won't let you save your work.

A more elegant take on this is to save automatically, so if something goes really wrong, you have a recovered version of your work (which Visual Studio obviously does). Note that automatically saving can actually introduce some small usability hiccups of its own, but that's a problem for another day.

What's the lesson? Document-based applications like Visual Studio and office apps (spreadsheets, docs, and the like) typically consider this sort of behavior table stakes these days, and it's not likely you're building one of those -- but watch out for behaviors that prevent a user from saving work-in-progress. In a UI, you may run into this when a page has a bunch of required fields, and as the field count goes up (and maybe the page count, too), expect your users' frustration level to go up, too.

It's actually a lot more likely that you'll run into this problem in your APIs. For example, here's a simple method I used in a post about fluent validation:

        public ActionResult CreateDealer(Dealer dealer)
        {
            Guard.Against.AgainstExpression<int>(id => id <= 0, dealer.Id, "Do not supply ID for new entity."); 

            var dvalidator = new DealerValidator();
            var result = dvalidator.Validate(dealer);
            if (result.IsValid)
            {
                _sampleData.Add(dealer);  // SaveChangesAsync() when working with an actual data store
                return CreatedAtAction(nameof(CreateDealer), new { id = dealer.Id }, dealer);
            }
            else
            {
                return BadRequest(result);
            }
        }

While this method effectively validates inputs, the results returned to the user aren't especially helpful, and as an object like this grows, it's more likely that a consumer will have partial inputs that don't pass every validation rule. Rather than preventing the save altogether, consider structuring an object like this so that as many values as possible are optional for saving -- even if they're needed in order to have a "valid" object.

In this simple example, we can create a result type that adds a ValidationResult property alongside the model. We'll return this instead of the "naked" object to show the validation results next to the model. The result object is preferable to embedding a ValidationResult in the model because it keeps the request intact -- there's no reason for the ValidationResult to be part of the CRUD requests.

    public class ModelValidationResult<M, V> where V : AbstractValidator<M>, new()
    {
        public ModelValidationResult(M model)
        {
            this.Model = model; 
        }
        public M Model { get; private set; }
        public FluentValidation.Results.ValidationResult ValidationResult { get; set; }
        public bool Validate()
        {
            V validator = new V();
            this.ValidationResult = validator.Validate(this.Model);
            return ValidationResult.IsValid;
        }
    }

With validation wrapped up in ModelValidationResult, the controller method also becomes a bit simpler:

         public ActionResult CreateDealer(Dealer dealer)
        {
            Guard.Against.AgainstExpression<int>(id => id <= 0, dealer.Id, "Do not supply ID for new entity.");

            ModelValidationResult<Dealer, DealerValidator> res = new ModelValidationResult<Dealer, DealerValidator>(dealer);
            res.Validate();
            _sampleData.Add(dealer);  // SaveChangesAsync() when working with an actual data store
            return CreatedAtAction(nameof(CreateDealer), new { id = dealer.Id }, res);
        }

Note that I left the guard clause in place here -- there will still be some validations that really need to prevent saving bad data, but now that these changes are in place, it's possible to save an object with incomplete data.

This technique won't work in all cases, but consider it if you can support saving objects with a minimal set of data and check for a fully-populated object later.

And more!

If you view Visual Studio with an eye to borrowing UX techniques, you'll see more lessons like these -- the Dynamic Validation post I covered earlier is one. Since you're probably not building an IDE or even a tool for developers, you'll need to interpret some of the techniques liberally, but I assure you the lessons are there. You may also see some negative usability examples -- in fact, sometimes these are easier to spot because they get in our way and draw our attention.

If you're interested in source code for this example, you can find it on GitHub: https://github.com/dlambert-personal/CarDealer/tree/Guard-vs-validate

Expected and Actual

I'm a fan of simple hooks to summarize complex ideas, and one of my favorite go-to's is simple, memorable and versatile:

Expected == Actual

Not much to look at by itself, but I've gotten a lot of mileage out of this in some useful contexts. You probably recognize one of the most obvious applications immediately: testing / QA. We're well-trained in applying expected == actual in testing, and most bug-reporting contexts will define a bug as a scenario in which actual behavior doesn't match expected behavior.

Even in this simplest application, all the value here is in how we exercise the framework, and I promise if you really work this equation, you'll have a better handle on bugs immediately. Of the two sides of this equation, "expected" usually seems easier, but it's almost always more difficult because most of our expectations are unspoken. If I had a dime for every bug report I've seen where "actual" was well-documented but "expected" existed only in the mind of the reporter, for instance, I'd be writing this from a beach in Hawaii.

Of course, that's where the exercise begins to get interesting. "What were you expecting?" is a great place to start here. If you can trace expectations to real documented requirements, it's a short conversation (perhaps followed by another conversation to see why that scenario was missed in automated testing). It's not unheard of, though, for these bugs to occur in scenarios that weren't explicitly called out in requirements. If this is the case, be sure to examine "expected" to see if it's reasonable for most users to expect the same thing under those conditions, which is a great segue to my next favorite application of expected == actual.

I'm of the opinion that expected == actual is also a pretty good foundation for understanding usability. I touched on this recently in a post about API usability, and it holds true generally, I believe. In that post, I referenced Steve Krug's Don't Make Me Think, which is both an excellent book and another great oversimplification of UX. Both of these are built on the premise that whenever an application does what a user expects it to do, they don't have to think about it, and they consider it usable.

Mountains have been written about how to accomplish this, of course, and I'm not going to improve on the books that talk about those techniques, but as I pointed out with respect to API usability, consistent behavior goes a long way. Even devices or applications we consider highly usable have a short learning curve, but once a user is invested in that curve, it's tremendously helpful to leverage that learning consistently -- no surprises!

A final consideration that applies in both these areas - when you find a case where "expected" really isn't well-defined yet, and can't be easily derived from other use cases (consistency), this is a golden opportunity. Rather than just taking the reporter's version of "expected" at face value, ask why they expected that (when possible). You probably won't be able to invest in a full "nine-whys" exercise, but keep that idea in mind. The more you understand about how those expectations formed, the more closely you can emulate that thinking as you project other requirements.

Besides these examples, where else might you be able to apply "expected vs. actual"?

REST is for nouns

It's hard to believe that REST is over twenty years old now, and although the RESTful style of designing APIs isn't specifically limited to JSON payloads, the style has become associated with the shift from SOAP/XML to lighter-weight payloads better suited to web applications.

The near ubiquity of REST at this point can allow us to slip into some designs that I don't believe are particularly well-suited for this style of API. In this article, I'm going to briefly examine scenarios that work well in a RESTful style -- and why -- and for scenarios that aren't well-suited for REST, we'll look at some alternatives.

A closer look at REST

Most software developers will recognize a RESTful API, but for many, this is an "I'll know it when I see it" recognition. There's no shortage of definitions on the web for REST APIs, and I'd encourage you to browse a few to see what commonalities emerge. Among the most common traits are statelessness, cacheability, and most importantly, a resource-based interface.

The "resource-based interface" is key here - this is the bedrock of the "style" of a RESTful interface. Let's work through a simple example -- say, a car dealer web service. The GET methods here are predictable given a Dealer type:

Here, we've got a list of dealers and a GET for one dealer specified by ID. This is REST at its simplest. The resource here is the Dealer, and much of the fluency of REST comes from being able to predict not only how these APIs will work, but also the remainder of the major operations we'd expect. So, here, GET /Dealer will return a list of all Dealers known to this service, and GET /Dealer/{id} looks for just one.

Without even looking at the API spec, we know what most of the remaining operations should be for Dealers:

  • Add - we expect to be able to add new dealers with a POST and a dealer model.
  • Edit - we expect to PUT to /Dealer/{id} with a dealer model to edit that entity.
  • Delete - not all APIs will support this, but if this one does, it would be a DELETE action at /Dealer/{id}.
  • Patch - again, where it's supported, it allows updates by specifying only fields that have changed, and again, we'd expect to find it at /Dealer/{id} (all four operations are sketched below).
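Sticking with the hypothetical controller sketched above, those operations might be declared like this:

        // Add -- POST /Dealer creates a new dealer from the request body
        [HttpPost]
        public ActionResult<Dealer> CreateDealer(Dealer dealer)
        {
            _sampleData.Add(dealer);
            return CreatedAtAction(nameof(GetDealer), new { id = dealer.Id }, dealer);
        }

        // Edit -- PUT /Dealer/{id} replaces the dealer identified by {id}
        [HttpPut("{id}")]
        public ActionResult<Dealer> UpdateDealer(int id, Dealer dealer)
        {
            var index = _sampleData.FindIndex(d => d.Id == id);
            if (index < 0)
                return NotFound();
            _sampleData[index] = dealer;
            return dealer;
        }

        // Delete -- DELETE /Dealer/{id} removes the dealer identified by {id}
        [HttpDelete("{id}")]
        public ActionResult DeleteDealer(int id)
        {
            if (_sampleData.RemoveAll(d => d.Id == id) == 0)
                return NotFound();
            return NoContent();
        }

        // Patch -- PATCH /Dealer/{id} would accept only the changed fields
        // (typically via a JsonPatchDocument<Dealer>; omitted in this sketch)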

A big theme contributing to the predictability of REST (which is one of the major appeals) is that you (the designer / developer) determine the resource - Dealer in this case - and the main operations shown here are largely implied. You design the noun, and REST supplies the verbs (they're even commonly referred to as HTTP verbs). Another way to look at these is to consider the resource as an object that's normally at rest (aka REST). Changes to this resource happen only when an HTTP verb acts on the resource.

The tendency for these resources to change only when impacted by an HTTP verb via this API also contributes to the effectiveness of caching in a RESTful API, as you'll recall from the definitions referenced earlier.

Less universal, but very helpful in my opinion, is for the resource to behave atomically. Admittedly, the example we're using here is ultra-simplified, but the closer we stay to these guidelines when acting on a Dealer, the easier it is for API consumers to predict how the API will behave (APIs have usability, too!). These guidelines are based on the premise that REST is a web-native design, and developers will expect the API to behave as they'd expect other HTTP resources to behave.

  • Operations are deterministic and cohesive. Ideally, each operation should stand on its own. Domain objects manipulated with these methods will certainly be subject to validation -- as in the URL validation shown here. This is a local problem with the specific parameters sent on this call, and most API developers should have no trouble sorting out what they need to do to make this call work.
  • Avoid "bell-ringing". A REST operation should be done when the status is returned to the caller. There will be cases where a changed resource will kick off a workflow, but monitor these carefully. If I kick off a workflow because I changed a resource and I don't care about the workflow, it may still be ok, but watch out for cases where a caller kicks off a workflow that they care about -- this may not be suitable for a simple REST-syntax method.
  • Permissions apply to the whole resource. Imagine that the combination of resource and verb is all the information you've got to determine whether you're authorizing the operation. A user in a given role should ideally be able to create / edit / delete a resource or not.

Complications arise

As API operations become more complex, it's common to see use cases that bend the ideals of those simple REST operations. Many of these nuanced use cases can be shoehorned into a REST-like syntax. This is quite common for simple variations on verbs like Search or modifications or overloads of Add/POST or Edit/PUT. Watch for factors like these - they signal you're moving into API scenarios that require some special attention, and may be clues that you're not really working with RESTful methods anymore:

  • Methods refer to a verb other than normal REST verbs. Some of these (ex: search) can be very compatible with a RESTful syntax and style. If you start running into bespoke verbs, consider switching to an RPC-like call. This can still live in the same API and use JSON for transport -- it's just going to be more verb-forward than noun-forward. These calls are invaluable as your API becomes more complex, but be aware that you give up some of the automatic usability of noun-based REST because these bespoke verbs are unique to your use case and application!
  • Nouns begin to become complex. Consider a trivially-simple car dealer model:
        public class Dealer
        {
            public int Id { get; set; }
            public string? Name { get; set; }
            public string? Description { get; set; }
            public string? Website { get; set; }
        }

This ultra-simple model works great in a RESTful API, but it clearly doesn't have enough detail to be useful. As soon as you add more detail / complexity -- in this case, an array of brand affiliations -- the model becomes more difficult to use in the API. This example works great in the domain model, but with the addition of one new BrandAffiliation collection, the JSON structure becomes much more difficult for an API user / developer:

  {
    "id": 3,
    "name": "Capital City Acura",
    "description": "desc",
    "website": null,
    "brandAffiliation": [
      {
        "parent": {
          "name": "Honda",
          "id": 3
        },
        "name": "Acura",
        "id": 1
      }
    ]
  }
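For reference, the extended model behind that payload might look something like this -- BrandAffiliation is a hypothetical type inferred from the JSON above:

    public class BrandAffiliation
    {
        public int Id { get; set; }
        public string? Name { get; set; }
        public BrandAffiliation? Parent { get; set; }
    }

    public class Dealer
    {
        // ...the existing Id / Name / Description / Website properties...
        public List<BrandAffiliation>? BrandAffiliation { get; set; }
    }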

A friendlier approach for the API user would likely be a nested REST structure that permits a GET or POST like /dealer/{dealerid}/parent/{parentid}. If this is starting to look like an OData style, that's great - that's a direction we'll explore more in the future!
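As a sketch, and assuming the hypothetical types above, that nested resource might be exposed like this:

        // GET /dealer/{dealerId}/parent/{parentId} -- one brand affiliation, addressed as a nested resource
        [HttpGet("/dealer/{dealerId}/parent/{parentId}")]
        public ActionResult<BrandAffiliation> GetAffiliation(int dealerId, int parentId)
        {
            // look up the affiliation for this dealer; details omitted in this sketch
            throw new NotImplementedException();
        }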

Beyond Simple REST

In some cases, complex or complicated scenarios may be better served by an RPC-style method. These can live in a predominantly RESTful API - nothing about them is incompatible with the API specifications governing REST, but I think it's helpful to be aware when you're moving away from a pure REST model. Amazon has published a good summary of REST vs. RPC, and it's a great place to start in considering these styles. For our purposes, it's important to recognize that "RPC-like" in style does not imply an actual RPC API!

As use cases evolve beyond that simple REST/OData-like style we looked at earlier, here are some specific places where "simple REST" may not have enough gas in the tank:

  • Verb-centric operations -- APIs where nouns participate, but operations are more about what's happening with those objects. These can be good places to explore those RPC-like calls (see the sketch after this list).
  • Operations tying multiple objects or object types together (likely in a non-hierarchical way). In some cases, GraphQL APIs can fall in this category.
  • Anything that begins to take on state-machine characteristics -- I consider this a special form of verb-centric.
  • Events. Depending on your use case and how deep you're diving into the event pool, these could wind up looking RPC-like, or you may be interested in something like AsyncAPI.
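Here's a rough sketch of what such a verb-forward, RPC-like call might look like -- the operation name and request type are purely illustrative:

        // POST /dealer/{id}/transfer-inventory -- the verb, not the resource, is the focus
        [HttpPost("/dealer/{id}/transfer-inventory")]
        public ActionResult TransferInventory(int id, TransferInventoryRequest request)
        {
            // kick off the transfer workflow; the payload and behavior are specific to this use case
            throw new NotImplementedException();
        }

        public class TransferInventoryRequest
        {
            public int TargetDealerId { get; set; }
            public string? VIN { get; set; }
        }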

In all these cases, stay aware of the discoverability and predictability of your API. In many cases, it can be worth shoehorning an operation into a noun-centric REST call for the sake of consistency. I'll explore some of these nuances in future posts, including places where CQRS meets events and API styles.

In the meantime, try paying attention to nouns & verbs in your API and see if that helps guide some decisions about API style.

APIs have usability, too!

Usability has a long history in software. In fact, as I sat down to pen these words, I googled "history of usability ux" and turned up some scholarly articles going back over 100 years. Too far. In software, you can't go wrong starting with Apple, which puts the origin of UX in the mid 90's. Better.

But for much of this time, we've tuned in to software usability as experienced through our user interfaces by end-users. Today, there's more to usability than user interface design, and I'd like to broaden the discussion a bit.

Usability? What usability?

When we think about great user experiences, we typically don't recognize them as great until we start comparing them to lesser experiences. I believe this is a big clue into how we can apply UX more universally. I really think a lot of usability boils down to doing what a user is expecting you to do... so when you do it, most users will never even notice.

Think about it - when's the last time you gave a passing thought to usability for an application that was already behaving the way you wanted? There are scores of great books on how to achieve usability (I love Steve Krug's Don't Make Me Think, and Don Norman's The Design Of Everyday Things), but I really believe if you can manage to do what the user is expecting you to do, you're typically in pretty good shape.

The changing landscape of applications

Next, let's look at changes in applications and application design. You can probably see where this is going. Whereas once the only users of our applications were end-users operating via a Windows or web interface, cloud-native applications based on microservice architecture rely heavily on APIs to orchestrate, integrate and extend functionality. In many cases, APIs are the interfaces for our services, developers are our users, and the user experience (UX) is found in the ease of use of those APIs.

And what is it about an API we'd consider more usable? As with the generalized case above, I think the ultimate yardstick is whether the API behaves the way a developer would expect. I believe the popularity of RESTful APIs, for instance, isn't just because JSON is easy to work with -- it's because a well-written RESTful API is discoverable and predictable.

Note that discoverable isn't the same as documented - even correctly documented, which never happens. Discoverable starts with tools like Swagger that expose live documentation of API methods and objects, but it connects with predictable in an important way: as developers engage with your API and discover how some of it works, consistent behavior and naming creates predictability. When these two factors are combined, they reinforce one another and create an upward spiral for developers in which learning is rewarded and also helps future productivity. And yes - this exact relationship is part of understanding usability in a visual / UX context, as well.

Watch for more posts on API conventions and style soon, and watch for the ways these ideas support one another and ultimately contribute to Krug's tagline: Don't make me think!

A few words on CapEx and OpEx

I first learned about CapEx and OpEx about five years ago, in the sense that I knew they were accounting terms. It's been an ongoing journey for me to learn why they keep coming up in software development. These terms popped up on my radar again recently in a conversation on the excellent ProKanban community Slack channel¹.

The conversation prompt was an open-ended "what's up with these?", and I hopped in with some thoughts, summarizing my understanding accrued over the years.

Basic Definitions

First, the easy stuff. These accounting terms do have some good definitions², and that's a great place to start. Capital Expenditures (note "expenditure" vs. "expense") are considered long-term purchases or investments, whereas Operating Expenses are short-term expenses. The key bits in these definitions are the association of CapEx with an asset account vs. the association of OpEx with an expense.

In software development, most of us are used to both terms originating from what we perceive to be the same activity (developing the software), but when that time is accounted for, it's treated very differently. Let's start with CapEx.

It's important to understand that the accounting definition of CapEx predates its use in tracking software development expenses (also true of OpEx). The original intent or application of CapEx was to account for the purchase of an expensive asset (truck, building, machine) which would then be depreciated over time to reflect its diminishing worth. For software purchased as a one-time license fee, this analogy makes some sense, and starting in the mid 80's, this model was extended to software built in-house. In this scenario, the asset is accrued, or built up in the period prior to "go-live" and then depreciated afterwards. I have several issues with how this relates to modern development practices, but we'll come back to those. In accounting terms, there is an asset that grows as the software is built, and it begins to be tracked as an expense after that go-live event.

In the case of OpEx, the expense and the accounting are considered short-term outlays, tracked against current-period performance. This is a simpler accounting concept, probably more aligned with our experience in development -- we spend time doing a task, and that time is paid for and shows up as an expense in that period. Note that even in the case where CapEx is used to account for the development of a software asset, there will still be expenses associated with running that software, and these costs will be treated as OpEx. A further note on OpEx -- if you've encountered FinOps, note that this field is primarily associated with OpEx.

Accounting deep-dive

For a better understanding of CapEx, I think some examples might be helpful. Be forewarned: I'm going to do a T-chart-based deep dive here, so skip ahead if you'd like, but if you're in a position where you need to understand why we're tracking CapEx and OpEx, a basic understanding of these concepts is appropriate. I'm also taking some liberties by combining accounts that are likely tracked separately, in the interest of simplicity. Let's start with that truck purchase example - a simple purchase of a truck (asset) using cash (asset).
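Sketched as a simple journal entry (the $50,000 price is purely illustrative):

    Truck (asset)       debit    $50,000
    Cash (asset)        credit   $50,000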

Here, we're exchanging one asset for another in a transaction that affects the balance sheet, but does nothing to an income statement. As the truck ages, depreciation is recorded - this reduces the net asset value of the truck (as you'd expect, given age & mileage) and we also record an expense that will show up on the income statement.
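Assuming straight-line depreciation of that illustrative $50,000 truck over five years ($50,000 / 5 = $10,000 per year), each year's entry is a sketch like this:

    Depreciation expense                 debit    $10,000
    Accumulated depreciation (truck)     credit   $10,000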

Incidentally, just as software incurs OpEx while it's being used, the truck will, too. Not shown in this example are those additional operating expenses like gas, maintenance, and insurance. Pivoting this example to software, we'd expect to see the Capital represented by the software accumulated as an asset during construction. During this period, the increase in assets represents an increase in equity, but this is likely balanced by the expenses occurring in this same period (payroll, contract payments). Once the software is considered operational, the asset begins to incur depreciation as seen with any other hard asset.

Impact of CapEx

So, what's the point of recording software development in this way? First, this method allows financial accounting statements to reflect the intent of the business when development occurs for a while prior to that software providing any real business value. Consider that OpEx is supposed to represent the cost of goods (or services) sold in the current accounting period. If software is built but not being used yet, it's really not fair to count that expenditure against income generated in that period.

I also believe this method of accounting provides an ability to smooth business activities over time -- both the accumulation of the asset value of the software and the discounting of that asset through depreciation occur over a period of time, and both of these streams of accounting entries occur in a fairly predictable way. In a company that's reporting financial results (and especially one that issues guidance about what's expected), this method helps management communicate expected transactions with high confidence that the actual transactions showing up on income statements will match those expectations. That's important in any company, but vital in a publicly-traded company, where surprises are typically punished.

Less clear to me is whether firms take any liberties capitalizing some software vs. others for purposes of steering asset values, as these balance sheet and income statement accounts have an impact on measures commonly attributed to company performance³. My understanding of GAAP suggests that an accounting policy shouldn't really vary from one expenditure or project to the next (such that one project is capitalized and the next isn't), but I'd love to hear from practitioners who could walk through particulars.

Flaws in the model

I promised I'd get to some gripes on CapEx and OpEx. Here they are. The elephant in the room, which will surprise nobody in software, is the time and cognitive load involved in tracking time that's supposed to be capitalized separately from time that's not. Best case, a team or developer works only on CapEx-based or OpEx-based projects in a given period, so all the time worked in that period can be reported as one or the other, which reduces reporting overhead. Worst case, you need to ask developers to understand, from one task to the next, whether that task is considered CapEx-related or OpEx-related. This sort of reporting bugs me no end, because it really detracts from the brainpower being brought to bear on building software.

The other big problem I have with the CapEx model for software is the mental model it suggests. In rare cases, software may have been a static one-time expenditure, and accounting for this like it's a filing cabinet or a cement truck may have made sense. In an agile development world, however -- especially one in which we expect to ship an MVP product and add onto it while we operate it -- this model becomes horribly messy, at best, and downright harmful at worst. I strongly suspect that the consensus among developers aligns with the idea that the juice isn't worth the squeeze in this case, but again, I'd love to hear from accountants who could speak to the financial reporting benefits of CapEx for software.


  1. ProKanban community website, twitter/x, and slack links. ↩︎
  2. Some additional good explorations of these terms can be found on Investopedia and Wallstreetprep. ↩︎
  3. These links are good follow-on reading to learn about key accounting KPIs and metrics. Growth in assets can also be seen as impacting a company's future returns. Finally, this thread on Reddit touches on financial reporting and tax implications of these practices. ↩︎

State vs. Events

I had an interesting experience during an application migration a while back. The application used a no-sql data store and an event streaming platform, and an unforeseen hiccup meant the event stream couldn't make the migration smoothly - we'd wind up starting up the "after" version of this application with nothing at all in the event stream, though we'd have all the no-sql data from the prior application.

My initial reaction was panic. There's no way this could stand. But in examining the problem, not only was it going to be genuinely difficult to port the live event stream, it turned out our application didn't really care too much as long as we throttled down traffic before migrating state. That was the moment I came to really understand state vs. events, and how an event streaming application (or service) is fundamentally different than a traditional state-based application.

Retrain your brain

We've been trained to build applications with a focus on state and how we store it. This can make it hard to transition to an architecture of services where things are in motion -- events. There will always be circumstances in which at-rest state is appropriate and necessary -- reporting, for instance. But when analyzing software requirements, look for places where state-bias causes you to think about dynamic systems as if you're looking at static snapshots.

A shopping cart / order scenario is typical of this sort of state flattening. Consider the simple order shown here - an order with three line items. Ignoring how this order came to be formed, much information is potentially lost. It's possible this customer considered other items before settling on the items in this cart, and by considering only the state at the end of check-out, much value is lost.
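To make that concrete, here's a sketch of the same cart viewed as an event stream, using hypothetical event types and item labels:

    public record ItemAdded(string Item);
    public record ItemRemoved(string Item);
    public record CheckedOut();

    public static class CartHistory
    {
        // The flattened state shows only the final cart: items 1, 2, and 3.
        // The event stream also preserves the items that came and went along the way.
        public static readonly object[] Events =
        {
            new ItemAdded("item-1"),
            new ItemAdded("item-a"),      // later removed -- invisible in the final state
            new ItemAdded("item-2"),
            new ItemRemoved("item-a"),
            new ItemAdded("item-b"),      // also abandoned before checkout
            new ItemRemoved("item-b"),
            new ItemAdded("item-3"),
            new CheckedOut()
        };
    }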

This alternative stream yields the same final shopping cart, but in this case, we can see additional / alternative items that were abandoned in the cart. This view is more aligned to what you'd expect in an event stream like Kafka vs. a state store (SQL or no-SQL).

In this example, it's likely we'd want to understand why items (a) and (b) were removed - perhaps because of availability or undesirable shipping terms. We'd also want to understand whether the new items (2) and (3) were added as replacements or alternatives to (a) and (b). Regardless, the absence of the events in the first place prevents any such awareness, let alone analysis.

As a developer, you might relate to an example that hits closer to home. Recall the largest source code base you've worked on -- ideally one with several years of history. Now zip it up -- minus the .git folder -- and remove the hosted repository. That static source snapshot certainly has value, but consider the information you've lost. You're no longer able to see who contributed which changes, or when. Assuming you've tied commits to work items, you've lost that traceability, too. The zip file is a perfect representation of state, but without being able to see the events that created that state, you're operating in a very diminished capacity.

Hybrid approaches

Of course, many systems combine state and events, and the focus on one or the other can be visualized on a continuum. On the left, storing only events, you'd have something that looks like a pure event-sourcing design¹. If and when you need to see current state, the only way to get it is to replay all the events in the stream for the period that interests you. On the right, of course, you have only current state, with no record of how that state came to be.

In practice, most services and almost all applications will be somewhere between the extremes of this continuum. Even a pure event-sourcing design will store state in projections for use in queries and/or snapshots to mark waypoints in state. Even more likely will be a mix of services with relative emphasis differing from service to service.

There's no right answer here, and the best design for many applications will employ a hybrid approach like this to capture dynamic behavior where it's appropriate without losing the speed advantages typically seen in flattened-state data stores. When considering these alternatives, though, I'd encourage you to keep your historical bias in mind and try not to lose event data when it may be important.


  1. For more on event sourcing, see Martin Fowler's excellent intro as well as this introduction on EventStore.com. ↩︎

Dynamic Validation using FluentValidation

Validation in a C# class is an area where configuration / customization pressures can sneak into an application. Simple scenarios are typically easily-served with some built-in features, but as complexity builds, you may be tempted to start building some hard-coded if-then blocks that contribute to technical debt. In this article, I'll look at one technique to combat that hard-coding.

Simple Validation

Consider an application tracking cars on a dealer's lot. For this example, the car model is super-simple -- just make, model, and VIN:

    public class Vehicle
    {
        public string Make { get; set; }
        public string Model { get; set; }
        public string VIN { get; set; }
    }

Microsoft has a data annotation library that can handle very simple validations just by annotating properties, like this:

        [Required]
        public string Make { get; set; }

When used with the data annotations validation context, this is enough to tell us a model isn't considered valid:

            var v = new Vehicle.Vehicle();
            Assert.IsNotNull(v);
            ValidationContext context = new ValidationContext(v);
            List<ValidationResult> validationResults = new List<ValidationResult>();
            bool valid = Validator.TryValidateObject(v, context, validationResults, true);
            Assert.IsFalse(valid);

This is not, however, especially flexible. Data annotation-based validations become more difficult under circumstances like:

  • Multiple values / properties participate in a single rule.
  • Sometimes a validation is considered a problem, and sometimes it's not.
  • You want to customize the validation message (beyond simple resx-based resources).

Introducing FluentValidation

Switching this to use FluentValidation, we pick up a class to do validation now, and the syntax for executing the validation itself isn't much different so far.

    public class VehicleValidator:AbstractValidator<Vehicle>
    {
        public VehicleValidator()
        {
            RuleFor(vehicle => vehicle.Make).NotNull();
        }
    }
...
            // testing
            var v = new Vehicle.Vehicle();
            var validator = new VehicleValidator();
            var result = validator.Validate(v);

By itself, this change doesn't make much difference, but because we've switched to FluentValidation, a few new use cases are a bit easier. Forewarned - I'm using some contrived examples here that are super-simplified for clarity.

Contrived Example 1 - search model

Again, this example isn't what you'd use at scale, but let's say you'd like to use the Vehicle model under two different scenarios with different sets of validation rules. When adding a new vehicle to the lot, for instance, you'd want to be sure you're filling in fields required to properly identify a real car -- make, model, VIN (and presumably a boatload of other fields, too). This looks similar to the example above where we required only Make. The same model could be used as a request in a search method, but validation would be very different - in this case, we don't expect all the fields to be present, but we very well might want to ask for at least one of the fields to be present.

    public class SearchVehicleValidator : AbstractValidator<Vehicle>
    {
        public SearchVehicleValidator()
        {
            // each field is required only when the other two are missing,
            // so any one populated field satisfies the search
            RuleFor(v => v.Make).NotNull()
              .When(v => string.IsNullOrEmpty(v.Model) && string.IsNullOrEmpty(v.VIN))
              .WithMessage("At least one search criteria is required(1)");

            RuleFor(v => v.Model).NotNull()
              .When(v => string.IsNullOrEmpty(v.Make) && string.IsNullOrEmpty(v.VIN))
              .WithMessage("At least one search criteria is required(2)");

            RuleFor(v => v.VIN).NotNull()
              .When(v => string.IsNullOrEmpty(v.Make) && string.IsNullOrEmpty(v.Model))
              .WithMessage("At least one search criteria is required(3)");
        }
    }

In FluentValidation, this syntax isn't too bad with just three properties, but as your model increases in size, you'd want a more elegant solution. In any event, we can now validate the same class with different validation rules, depending on how we intend to use it:

            var v = new Vehicle.Vehicle() { Make = "Yugo"};
            var svalidator = new SearchVehicleValidator();
            var result = svalidator.Validate(v);
            Assert.IsTrue(result.IsValid);  // one non-null field here is sufficient to be valid 

            var lvalidator = new ListingVehicleValidator();
            result = lvalidator.Validate(v);
            Assert.IsFalse(result.IsValid);  // all fields must now be valid

Contrived Example #2:

Buckle up - we're going to shift into high gear on the contrived scale. For this example, let's say you're deploying this solution to several customers, and they have different rules about handling car purchases -- some expect to see a customer pre-qualified for credit prior to finalizing a purchase price, and some don't. Again, in this case, the example isn't as important as the technique, so please withhold judgement on the former. In addition to extending the Vehicle model to become a VehicleQuote model:

    public class VehicleQuote: Vehicle
    {
        public decimal? PurchasePrice { get; set; }
        public bool? CreditApproved { get; set; }
    }

We're also going to validate slightly differently. Here, note the context we're passing in and the dynamic severity for the credit approved rule:

    public class QuoteVehicleValidator : AbstractValidator<VehicleQuote>
    {
        VehicleContext cx;
        public QuoteVehicleValidator(VehicleContext cx)
        {
            this.cx = cx;
            RuleFor(vehicle => vehicle.Make).NotNull();
            RuleFor(vehicle => vehicle.Model).NotNull();
            RuleFor(vehicle => vehicle.VIN).NotNull();
            // dynamic rule -- severity comes from the context passed in
            RuleFor(vehicle => vehicle.CreditApproved).Must(c => c == true).WithSeverity(cx.CreditApprovedSeverity);
        }
    }

This lets us pass in an indicator for whether we consider credit approved to be an error, warning, or even an info message. The tests evaluate "valid" differently here, as any message hit at all -- even one with an Info severity -- will evaluate IsValid as false.

        var vq = new Vehicle.VehicleQuote() { Make = "Yugo", Model = "Yugo", VIN = "some vin" };
        var cx = new VehicleContext() { CreditApprovedSeverity = FluentValidation.Severity.Error };
        var qvalidator = new QuoteVehicleValidator(cx);
        var result = qvalidator.Validate(vq);
        Assert.IsTrue(result.Errors.Any(e => e.Severity == FluentValidation.Severity.Error));  // non-approved is an error here.

        cx.CreditApprovedSeverity = FluentValidation.Severity.Info;
        qvalidator = new QuoteVehicleValidator(cx);
        result = qvalidator.Validate(vq);
        Assert.IsFalse(result.Errors.Any(e => e.Severity == FluentValidation.Severity.Error));  // non-approval is only informational now -- no errors

Note that the same technique can be used for validation messages -- you could drive these from a ResX file or make them entirely dynamic if you need to create different experiences under different circumstances.

Wrapping it up

The examples here are clearly light in depth, but they should show some of the ways FluentValidation can create dynamic behavior in validation without incurring the technical debt of custom code or a labyrinth of if-thens. These techniques are simple and easy to test, as well.

You can find the code for this article at https://github.com/dlambert-personal/dynamic-validation

Customer Lifecycle Architecture

If you've gone through any "intro to cloud development" presentations, you've seen the argument for cloud architecture where they talk about the need for infrastructure to scale up and down. Maybe you run a coffee shop bracing for the onslaught of pumpkin-spice-everything, or maybe you're just in a 9-5 office where activity dries up after hours. In any event, this scale-up / scale-down traffic shaping is well-known to most technologists at this point.

But, there's another type of activity shaping I'd like to explore. In this case, we're not looking at the aggregate effect of thousands of PSL-crazed customers -- we're going to look at the lifecycle of a single customer.

Consider these two types of customers -- both very different from one another. The first profile is what you'd expect to see for a customer with low acquisition cost and low setup activity -- an application like Meta might have a profile like this, especially when you expect users to keep consuming at a steady rate.

Compare this to a customer with high acquisition cost and high setup activity -- this could be a customer with a long sales lead time or one with a lot of setup work, configuration, training, or the like. Typically, these are customers with a higher per-unit revenue model to support this work. Examples of a profile like this could be sales of high-dollar durable goods, financial instruments like loans, or insurance policies, and so on.

So, What?

Why, exactly, would we care about a profile like this? I believe this sort of model facilitates some important conversations about how a company is allocating software spend relative to customer activity, costs, and revenue, and I also think an understanding of this profile can help us understand the architectural needs of services supporting this lifecycle.

Note that this view of customer activity has some resemblance to a discipline called Customer Lifecycle Management (CLM). Often associated with Customer Relationship Management (CRM), CLM is typically used to understand activities associated with acquiring and maintaining a customer using five distinct steps: reach, acquisition, conversion, retention and loyalty.

Rather than just working to understand what's happening with your customer across the time axis, consider some alternative views. We can map costs and revenue per customer in a view like this, for example. Until recently, the cost to acquire a customer could be known only on an aggregate basis - computed from total costs across all customers and allocated as an estimate. But as FinOps practices improve in fidelity and adoption, information like this could become a reality.

FinOps promises to give us the tools to track the investment in software you've purchased, SaaS tools, and of course, custom tools and workflows you've developed. Unless these tools are extremely mature in your organization, you'll likely need to allocate costs to compute these values. Also note that these costs are typically separated on an income statement - those occurring pre-sales tend to show up as Cost of Goods Sold (COGS), while those after the sale are operating expenses (OpEx), and if you intend to allocate COGS, these will need to be interpolated, as none of the customers you don't close generate any revenue at all.

Less commonly considered is the architectural connection to a view like this. You might see a pattern like this if you're writing an insurance policy and then billing for it periodically.

Not only do these periods reflect different software needs, they also reflect different opportunities to learn about your customer. Writing a new insurance policy is critically important to an insurance company -- careful consideration is made to be sure the risk of the policy is worth the premium to be collected. For that reason, an insurance company will invest a lot of time and energy in this process, and much will be learned about the customer here.

On the other hand, the periodic billing cycle should be relatively uneventful for customer and carrier alike, and less useful information is found here -- at least when the process goes smoothly.

I believe the high-activity / high-learning area also suggests a high-touch approach to architecture. Specifically, I think this area is likely where highly-tailored software is appropriate for most enterprises, and it also likely yields opportunities for data capture and process improvement. Pay attention to data gathered here and take care not to discard it if possible. Considering these activity profiles temporally may also lead us to identify separation among services, so in this case, the origination / underwriting service(s) are very likely different from the services we'd use after that customer is onboarded.

What's next?

I'm early in exploring this idea, but I expect to use this as one signal to influence architectural decisions. I expect that an activity-over-time view of customers will likely help focus conversations about technologies, custom vs. purchased software, types of services and interfaces, and so on.

At a minimum, I think this idea is an important part of modeling services to match activity levels, as not all customers hit the same peaks at the same time.

When is an event not an event?

Events, event busses, message busses and the like are ubiquitous in any modern microservice architecture. Used correctly, events support scalability and allow services to evolve independently. Used poorly, though, events create a slow, messy monolith with none of those benefits. Getting this part of your architecture right is crucial.

The big, business-facing events are easiest to visualize and understand - an "order placed" event needs no introduction. This is the sort of event that keeps a microservice architecture flexible and decoupled. The order system, when it raises this event, knows nothing about any consumers that might be listening, nor what they might do with the event.

    sequenceDiagram
        participant o as Order Origination
        participant b as Event Bus
        participant u as Unknown Consumer
        o ->> b : Order Placed
        b ->> u : Watching for Orders

Also note that this abstract event, having no details, also doesn't have a ton of value, nor does it have much potential to create conflicts if it changes. This is the "hello, world" of events, and like "hello, world", it won't be too valuable until it grows up, but it represents an event in the truest sense -- it documents an action that has completed.

A more complete event will carry more information, but as you design events, try to keep this simple model in mind so you don't wind up with unintended coupling system-to-system when events become linked. This coupling sneaks into designs and becomes technical debt -- in many cases before you realize you've accrued it!
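For illustration, a fuller order-placed event might carry facts like these -- and only facts about the completed action, nothing about who consumes it (the field names here are hypothetical):

    public record OrderLine(string Sku, int Quantity, decimal UnitPrice);

    public record OrderPlaced(
        Guid OrderId,
        DateTimeOffset PlacedAt,
        string CustomerId,
        IReadOnlyList<OrderLine> Lines);

    // Note what's absent: nothing like "EmailToSend" or "WarehouseToNotify" -- fields like those
    // would couple the producer to specific consumers and turn the event into a disguised command.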

The coupling of services or components is never as black & white as the "hello, world" order placed event - it's a continuum representing how much each service knows about the other(s) involved in a conversation. The characterizations below are a rough tool to help understand this continuum.

When I think about the characterization of events in terms of coupling, the less coupling you see from domain-to-domain, the better (in general).

Characteristics of low coupling:

  • Events produced with no knowledge of how or where they will be consumed. Note that literally not knowing where an event is used can become a big headache if you need to update / upgrade the event; typically, you'd like to see that responsibility fall to the event transport rather than the producing domain, though.
  • A single event may be consumed by more than one external domain. In fact, the converse is a sign of high coupling.

Characteristics of high coupling:

  • Events consumed by only one external domain. I consider this a message bus-style event - you're still getting some of the benefits of separating domains and asynchronous processing, but the domains are no longer completely separate -- at least one knows about the other.
  • Events that listen for responses. I consider these to be rpc-style events, and in this case, both ends of the conversation know about the other and depend to some extent on the other system. There are places where this pattern must be used, but take care to apply it sparingly!

If you see highly-coupled events - especially rpc-style events, consider why that coupling exists, and whether it's appropriate. Typically, anything you can do to ease that coupling will allow domains to move more freely with respect to one another - one of the major reasons you're building microservices in the first place.


A little more on events: State vs Events

“Hello world” is always beautiful

Every programming language you've ever learned began with "Hello world".

"Hello World" is typically a couple lines long, and it's always simple and easy to understand. You probably moved on to something like a to-do list after that -- also enticingly simple and easy.

Now, stop for a moment and consider the last bit of production code you touched. Not so beautiful, right?

There are a couple of takeaways from this juxtaposition. First, any language / framework looks great in that introductory scope. Not only is a simple use case a godsend for showing how elegant a tool is, but the authors of any tool will pick a domain / scenario that suits it when they create that demo app.

The second takeaway is a little more reflective. Remember the joy of learning that new language, the freedom you felt looking at the untarnished simple code, and the optimism you felt when you started to extend it. Now, flip back to that production code again. What can you do to get some of that simplicity and usability back to your production code? Certainly, it'll never look like "hello, world", but I think it's worth a moment's reflection every now and again to see if you can get just a little closer.