Customer Lifecycle Architecture

If you've gone through any "intro to cloud development" presentations, you've seen the argument for cloud architecture where they talk about the needs of infrastructure scaling up and down. Maybe you run a coffee shop bracing for the onslaught of pumpkin-spice-everything, or maybe you're just in a 9-5 office where activity dries up after hours. In any event, this scale-up / scale-down traffic shaping is well-known to most technologists at this point.

But, there's another type of activity shaping I'd like to explore. In this case, we're not looking at the aggregate effect of thousands of PSL-crazed customers -- we're going to look at the lifecycle of a single customer.

Consider these two types of customers -- both very different from one another. The first profile is what you'd expect to see for a customer with low acquisition cost and low setup activity -- an application like Meta might have a profile like this, especially when you expect users to keep consuming at a steady rate.

Compare this to a customer with high acquisition cost and high setup activity -- this could be a customer with a long sales lead time or one with a lot of setup work, configuration, training, or the like. Typically, these are customers with a higher per-unit revenue model to support this work. Examples of a profile like this could be sales of high-dollar durable goods, financial instruments like loans, or insurance policies, and so on.

So, What?

Why, exactly, would we care about a profile like this? I believe this sort of model facilitates some important conversations about how a company is allocating software spend relative to customer activity, costs, and revenue, and I also think an understanding of this profile can help us understand the architectural needs of services supporting this lifecycle.

Note that this view of customer activity has some resemblance to a discipline called Customer Lifecycle Management (CLM). Often associated with Customer Relationship Management (CRM), CLM is typically used to understand activities associated with acquiring and maintaining a customer using five distinct steps: reach, acquisition, conversion, retention and loyalty.

Rather than just working to understand what's happening with your customer across the time axis, consider some alternative views. We can map costs and revenue per customer in a view like this, for example. Until recently, the cost to acquire a customer could be known only on an aggregate basis - computed based on total costs across all customers and allocated as an estimate. But as FinOps practices improve in fidelity and adoption, information like this could become a reality:

FinOps promises to give us the tools to track the investment in software you've purchased, SaaS tools, and of course, custom tools and workflows you've developed. Unless these tools are extremely mature in your organization, you'll likely need to allocate costs to compute these values. Also note that these costs are typically separated on an income statement - those occurring pre-sales tend to show up as Cost of Goods Sold (COGS), while those after the sale are operating expenses (OpEx). If you intend to allocate COGS, these will need to be interpolated, as none of the customers you don't close generate any revenue at all.
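To make the allocation concrete, here is a minimal sketch in Python. Every name and number in it is invented for illustration (`allocate_acquisition_cost`, the spend, the prospect counts); the only point it makes is that pre-sales spend on prospects who never close has to be carried by the customers who do.

```python
def allocate_acquisition_cost(total_presales_spend, prospects, closed):
    """Spread pre-sales spend over the customers who actually closed.

    Hypothetical helper: real FinOps tooling would work from tagged,
    per-customer cost data rather than a single aggregate number.
    """
    if prospects <= 0 or closed <= 0:
        raise ValueError("need at least one prospect and one closed customer")
    if closed > prospects:
        raise ValueError("cannot close more customers than prospects")
    return {
        # What was spent, on average, on every prospect we worked.
        "spend_per_prospect": total_presales_spend / prospects,
        # What each closed customer has to carry, including the
        # spend on prospects who never generated revenue.
        "cost_per_closed_customer": total_presales_spend / closed,
    }


# Made-up figures: 120,000 of pre-sales spend, 400 prospects, 50 closed.
result = allocate_acquisition_cost(120_000.0, prospects=400, closed=50)
print(result)
```

With these made-up figures, the average spend per prospect is 300, but each closed customer carries 2,400 of acquisition cost, because the spend on the 350 prospects who didn't close is allocated to the 50 who did.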

Less commonly considered is the architectural connection to a view like this. You might see a pattern like this if you're writing an insurance policy and then billing for it periodically:

Not only do these periods reflect different software needs, they also reflect different opportunities to learn about your customer. Writing a new insurance policy is critically important to an insurance company -- careful consideration is made to be sure the risk of the policy is worth the premium to be collected. For that reason, an insurance company will invest a lot of time and energy in this process, and much will be learned about the customer here:

On the other hand, the periodic billing cycle should be relatively uneventful for customer and carrier alike, and less useful information is found here -- at least when the process goes smoothly.

I believe the high-activity / high-learning area also suggests a high-touch approach to architecture. Specifically, I think this area is likely where highly-tailored software is appropriate for most enterprises, and it also likely yields opportunities for data capture and process improvement. Pay attention to data gathered here and take care not to discard it if possible. Considering these activity profiles temporally may also lead us to identify separation among services, so in this case, the origination / underwriting service(s) are very likely different from the services we'd use after that customer is onboarded:

What's next?

I'm early in exploring this idea, but I expect to use this as one signal to influence architectural decisions. I expect that an activity-over-time view of customers will likely help focus conversations about technologies, custom vs. purchased software, types of services and interfaces, and so on.

At a minimum, I think this idea is an important part of modeling services to match activity levels, as not all customers hit the same peaks at the same time.
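One way to see this is to add up per-customer activity curves that start at different times. The sketch below is purely illustrative: the `PROFILE` values are made-up activity units for a high-setup customer (a burst during onboarding, then a quiet steady state), and `aggregate_activity` is a hypothetical helper, not part of any real tool.

```python
# Hypothetical activity units per month for one high-setup customer:
# heavy onboarding work first, then quiet periodic billing.
PROFILE = [10, 8, 2, 1, 1, 1]


def aggregate_activity(start_months, profile, horizon):
    """Sum one activity profile per customer, offset by each start month."""
    total = [0] * horizon
    for start in start_months:
        for offset, units in enumerate(profile):
            month = start + offset
            if month < horizon:  # ignore activity past the planning horizon
                total[month] += units
    return total


# Three customers onboarded two months apart vs. all in the same month.
staggered = aggregate_activity([0, 2, 4], PROFILE, horizon=8)
simultaneous = aggregate_activity([0, 0, 0], PROFILE, horizon=8)

print("staggered peak:", max(staggered))
print("simultaneous peak:", max(simultaneous))
```

Under these assumptions the same three customers produce a peak of 13 units when their onboarding is staggered and 30 when it coincides, which is the kind of difference a service supporting origination would need to be sized for.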

When is an event not an event?

Events, event busses, message busses and the like are ubiquitous in any modern microservice architecture. Used correctly, events support scalability and allow services to evolve independently. Used poorly, though, events create a slow, messy monolith with none of those benefits. Getting this part of your architecture right is crucial.

The big, business-facing events are easiest to visualize and understand - an "order placed" event needs no introduction. This is the sort of event that keeps a microservice architecture flexible and decoupled. The order system, when it raises this event, knows nothing about any consumers that might be listening, nor what they might do with the event.

sequenceDiagram
    participant o as Order Origination
    participant b as Event Bus
    participant u as Unknown Consumer
    o ->> b : Order Placed
    b ->> u : Watching for Orders

Also note that this abstract event, having no details, also doesn't have a ton of value, nor does it have much potential to create conflicts if it changes. This is the "hello, world" of events, and like "hello, world", it won't be too valuable until it grows up, but it represents an event in the truest sense -- it documents an action that has completed.
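As a rough illustration of that "hello, world" event, here is a minimal in-process sketch in Python. The `EventBus` class, the `order.placed` event name, and the payload are all assumptions made for the example; a real system would put a broker or managed event service in this role. What it shows is that the publisher only knows the bus, never the consumers.

```python
from collections import defaultdict


class EventBus:
    """Toy in-memory bus: producers publish by event type, consumers subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer never sees this list; it only hands the event to the bus.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


bus = EventBus()

# An "unknown consumer" registers interest without the order system knowing.
received = []
bus.subscribe("order.placed", received.append)

# The order system announces something that has already happened.
bus.publish("order.placed", {"order_id": "A-100"})

# Publishing an event nobody listens for is not an error.
bus.publish("order.cancelled", {"order_id": "A-100"})

print(received)
```

Note that the event is named in the past tense and that publishing works the same whether zero, one, or many consumers are subscribed.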

A more complete event will carry more information, but as you design events, try to keep this simple model in mind so you don't wind up with unintended coupling system-to-system when events become linked. This coupling sneaks into designs and becomes technical debt -- in many cases before you realize you've accrued it!

The coupling of services or components is never as black & white as the "hello, world" order placed event - it's a continuum representing how much each service knows about the other(s) involved in a conversation. The characterizations below are a rough tool to help understand this continuum.

When I think about the characterization of events in terms of coupling, the less coupling you see from domain-to-domain, the better (in general).

Characteristics of low coupling:

  • Events produced with no knowledge of how or where they will be consumed. Note that literally not knowing where an event is used can become a big headache if you need to update / upgrade the event. Typically, you'd like to see that responsibility fall to the event transport rather than the producing domain, though.
  • A single event may be consumed by more than one external domain. In fact, the converse is a sign of high coupling.

Characteristics of high coupling:

  • Events consumed by only one external domain. I consider this a message bus-style event - you're still getting some of the benefits of separating domains and asynchronous processing, but the domains are no longer completely separate -- at least one knows about the other.
  • Events that listen for responses. I consider these to be RPC-style events, and in this case, both ends of the conversation know about the other and depend to some extent on the other system. There are places where this pattern must be used, but take care to apply it sparingly!
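The characterizations above can be turned into a rough review aid. The following Python sketch is only that: the `EventUsage` fields, the `coupling_style` function, and the label strings are my own shorthand for the categories described here, and the inputs would have to come from whatever inventory of events and consumers you already keep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventUsage:
    name: str
    consumer_domains: int  # number of external domains consuming the event
    awaits_reply: bool     # does the producer listen for a response?


def coupling_style(usage):
    """Place an event on the coupling continuum described above."""
    if usage.awaits_reply:
        # Both ends know about, and depend on, each other.
        return "rpc-style (high coupling)"
    if usage.consumer_domains == 1:
        # Still asynchronous, but at least one domain knows about the other.
        return "message-bus-style (high coupling)"
    # Zero or many consumers: the producer does not know who is listening.
    return "broadcast event (low coupling)"


# Hypothetical inventory entries.
events = [
    EventUsage("order.placed", consumer_domains=3, awaits_reply=False),
    EventUsage("invoice.requested", consumer_domains=1, awaits_reply=False),
    EventUsage("credit.check", consumer_domains=1, awaits_reply=True),
]

for event in events:
    print(f"{event.name}: {coupling_style(event)}")
```

The thresholds are deliberately crude; the value is in prompting the question of why a given event lands where it does.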

If you see highly-coupled events - especially RPC-style events - consider why that coupling exists, and whether it's appropriate. Typically, anything you can do to ease that coupling will allow domains to move more freely with respect to one another - one of the major reasons you're building microservices in the first place.

“Hello world” is always beautiful

Every programming language you've ever learned began with "Hello world".

"Hello World" is typically a couple lines long, and it's always simple and easy to understand. You probably moved on to something like a to-do list after that -- also enticingly simple and easy.

Now, stop for a moment and consider the last bit of production code you touched. Not so beautiful, right?

There are a couple takeaways from this juxtaposition. First, any language / framework looks great in that introductory scope. Not only is a simple use case a godsend for showing how elegant a tool is, but the people behind any tool will also pick a domain / scenario that suits it when they create that demo app.

The second takeaway is a little more reflective. Remember the joy of learning that new language, the freedom you felt looking at the untarnished simple code, and the optimism you felt when you started to extend it. Now, flip back to that production code again. What can you do to get some of that simplicity and usability back to your production code? Certainly, it'll never look like "hello, world", but I think it's worth a moment's reflection every now and again to see if you can get just a little closer.

Ten years of Nvidia

I haven't been writing actively for a couple years, and having a breath to catch up, I'm working on remediating that absence. I just took a quick scan through drafts I'd started, and I tripped over one from 2013 (yes, ten years ago). The draft was really an email I'd sent pasted into the editor to be picked up later. At the time, it may have been somewhat interesting, but in hindsight, I think it may be more impactful. Here's the email - the original context was me trying to convince my ex-wife that a game development class my son was interested in wasn't a waste of time:

I sort of hinted at some reasons I thought the "intro to gaming" class [my son] was looking at next semester might be more useful than it sounds, but I probably wasn't very clear about it.

Here's an announcement from this morning from a graphics card mfg:

If you actually look at the picture of the card, you'll notice there's nowhere to plug in a monitor, because this graphics card isn't really a graphics card in the same sense that we think of them.

As these cards have become crazy-specialized & powerful in the context of their original purpose, their insane number-crunching capabilities haven't been lost on people beyond game developers.  In the same way that you see these cards supporting physics engines in games, they're now becoming more and more commonly used in scientific and engineering applications because this same type of computing can be used for simulations and other scientific uses.

As I mentioned to [my son] when he was working in Matlab, the type of programming needed for this sort of processing is very different from the traditionally-linear do-one-thing-then-do-the-next-thing style of programming used for "normal" CPUs -- by its nature, it has to be asynchronous and massively parallel in order to take advantage of the type of computing resources offered by graphics-type processors.

Cutting to the chase, even though "game programming" probably sounds like a recreational activity, I think there's a decent chance that some of the skills touched on in the class might translate reasonably well into engineering applications -- even if he doesn't necessarily see that during the class.

Damn. Right on the money. Ten years on, Nvidia rules AI. Why? It's the chips and the programming model. All the reasons I cited to give game programming a chance have come to pass, and for what it's worth, the kid wound up using Nvidia GPUs to build neural networks in grad school.

Game programming, indeed.

To SaaS or not to SaaS

I just saw an announcement from 37signals about a “new” initiative to sell old-fashioned purchased, installed-on-premise software. Is this a legitimate reversal of the seemingly inevitable march to SaaS software, or merely another option for customers?

My first reaction: this is a breath of fresh air. Personally, I’ve been miffed seeing software I’d purchased when I wanted and upgraded when I wanted move to a subscription model. It messed with my sense of self-determination. But when I took five minutes to work out the math, the subscription model worked out to just about what I’d been spending on purchases and upgrades. As an individual consumer, that stigma was all in my head.

But 37signals sells enterprise software, and for enterprise software, there are some pretty interesting implications if the pendulum starts to swing back. I’m interested to see how some of these turn out.

One of the largest implications of the SaaS model for companies is the way these purchases impact accounting. While purchased software is typically treated as a capital investment that’s amortized with expenses incurred over the life of the software, the SaaS model typically shows directly as expenses. I’m curious whether this has factored into the shift back to purchase-once software.

Another defining characteristic of SaaS is its continual delivery model. Agile methodologies in general favor frequent deployment, and a true CI/CD model can see this happen very frequently — much more frequently than any enterprise would want to install on-prem software. So, an on-prem delivery model implies going back to distributing releases and hot-fixes that will be installed by customers. Anyone who’s ever supported that model will tell you it’s no piece of cake.

Finally, the announcement from 37signals triggered a mental model that I believe exists in a lot of business decision-makers. The model is rooted in that same CapEx accounting model, and suggests that software can be bought, installed, and left to run as you would treat a filing cabinet. While this model may be appropriate for some applications that change very infrequently, I think this idea can prove harmful when applied to software that needs to grow and change with a business. There’s a nuanced view of the care and feeding and evolution of an application that can be lost in the filing cabinet model, and I’m frankly nervous to see that model perpetuated.

I’ll be watching this development from 37signals as it’s rolled out. They’ve always been thought leaders in the industry, and I’m curious to see if this is the beginning of a pendulum swing back toward a self-hosted model.

An Unhelpful Exception

Last year, my son introduced me to a podcast called “Black Box Down”.  Each episode is a discussion about an air disaster of some sort, and in the vast majority of cases, the eventual BOOM is preceded by a long sequence of small problems, dealt with poorly, or over-reacted-to until recovery is impossible.

Exceptions are like that.

I’ve been working on getting DataStax / Cassandra installed on a machine running on Hyper-V, and although it’s been a little while since I experienced the joy of Java stack traces, it’s coming back to me in a hurry.  I just worked through one that turned out to be rooted in my Java version.  It turns out when the installation requirements indicated Java 8 or better, they left out, “but not too much better”.  After a bit of googling, I found a link suggesting I might need to roll back my Java version, and that did, in fact, make things better.  The process to diagnose has left a bit to be desired, though - starting with the epic stack trace.

What might have been better than this?  How about an assertion on startup to flag the Java version as a problem?  Once upon a time, this software was using a contemporary version of Java, and it would never have occurred to any reasonable developer to put an assertion like this in place, but at this point, it would be really helpful.  Do I have any reasonable expectation that this sort of assertion would make it into the source code?  Nope.  I fully expect that anyone expert enough in this package to contribute to it is well aware of the Java version required, so there’s just no benefit to them to include an assertion like this.  Next problem: packages built on packages built on packages, which make it pretty hard to pin down who should really be responsible for calling out the version dependency.
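To show what such a startup assertion could look like, here is a small sketch in Python rather than a patch to the actual package. The supported range (`SUPPORTED_MAJOR_VERSIONS`) and both function names are hypothetical; I don't know the real supported range for this software. The parsing relies only on the well-known convention that Java 8 and earlier report versions like `1.8.0_292`, while Java 9 and later report `11.0.2`, `17.0.1`, and so on.

```python
import re

# Hypothetical: pretend this build was only ever tested on Java 8.
SUPPORTED_MAJOR_VERSIONS = range(8, 9)


def java_major_version(version_string):
    """Return the major version: '1.8.0_292' -> 8, '17.0.1' -> 17."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version_string!r}")
    first = int(match.group(1))
    # Legacy scheme (Java 8 and earlier): the major version is the second field.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def check_java_version(version_string):
    """Fail fast with a readable message instead of a deep stack trace."""
    major = java_major_version(version_string)
    if major not in SUPPORTED_MAJOR_VERSIONS:
        raise RuntimeError(
            f"Java {major} detected ({version_string}), but this build was "
            f"only tested with Java 8. Install a supported version and retry."
        )
    return major


print(check_java_version("1.8.0_292"))
```

A check like this costs a few lines at startup and turns a long diagnosis into a one-line message that names the actual problem.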

You may be thinking at this point that an IaC or container solution might be helpful, and I agree.  In this particular case, one of the alternatives at my disposal was an Oracle VirtualBox image that was already set up.  I opted out of that because I’ve already set up Hyper-V and Docker Desktop, and I’m concerned about having too many virtualization platforms sparring with one another.  The “bare metal” setup wasn’t ideal right out of the gates, but the silver lining in this case is that I’m learning a ton.

My biggest takeaway from this experience is going to be to “embrace the suck” – paying attention to the frustration and impact on productivity so I can watch for problems like this in software I’m producing.  As always, putting yourself in your users’ shoes will pay dividends, I believe.

Team Topologies and Conway’s Law

Most of the reading I've done lately has been combinatorial in nature; that is, each book I pick up seems to reference another book I feel compelled to add to my reading list. My latest, Team Topologies, has been no different. This book has a pretty dry title but a really compelling pitch: it aims to help companies finally sort their org charts.

The book is structured with a "choose your own adventure" feel with an introductory section and a series of deeper dives that can apply more or less to different scenarios. In this way, a reader can skip straight to a section that's likely to apply to a problem they're experiencing. In working through the first couple chapters, though, I was already experiencing "aha" moments.

A central source of inspiration for the ideas in Team Topologies is Conway's Law, which features prominently in the foundational chapters. Conway's Law suggests that organizations build systems that mimic their communication channels, so Team Topologies authors Matthew Skelton and Manuel Pais believe that the human factors of organizational design and the technical factors of systems design are inextricably intertwined. In digesting this book, I found myself reflecting on scenarios that bear out this premise.

If there's a vulnerability in this book at all, it's that intertwining of organization and technical design. Strictly speaking, it's not a problem with the book at all; rather, it's an inescapable implication of Conway's Law. To the extent you buy into that relationship, it means that no software organization problem is merely technical and no organizational problem can be fixed by sliding names around on an org chart. Ostensibly, this may make change seem harder. In reality, I believe it points out that any change you believed was easy was never really a change at all.

Author Allan Kelly has done some writing on Conway's Law, as well, and I believe his assertion here carries deep implications:

More than ever I believe that someone who claims to be an Architect needs both technical and social skills, they need to understand people and work within the social framework. They also need a remit that is broader than pure technology -- they need to have a say in organizational structures and personnel issues, i.e. they need to be a manager too.

Perhaps this connection of technology and organization helps explain the tremendous challenges found in digital transformation. I'm reminded of a conference speech I attended several years ago. The speaker was a CTO / founder of a red-hot technical startup, and he was sharing his secrets to organizational agility. "Start from scratch and do only agile things," to paraphrase, but only slightly. Looking around the room at the senior leaders from very large corporations in attendance, I saw almost exclusively despair. While a neat summation of this leader's agile journey, that approach can't help effect change in the slightest.

Does Team Topologies help this problem? Indirectly, I believe it can. Overwhelmingly, just highlighting the deep connection between people, software, and flow is a huge insight. Lots of other books and presentations kick in to address flow, but the background and earlier research documented by Skelton and Pais in the early chapters of the book helped create a deeper sense of "why" for me. The lightbulb moment may be trite, but it's appropriate for me in this case. I found the book really eye-opening, and it's earned a spot on my short list of books for any business or technical leader.

Why we Architect for Experiences

Change is afoot once again. We've heard a ton about 5G for the last couple years, and we're just now starting to see this technology emerge in the market. Plenty of pundits have speculated that 5G is going to have the same magnitude of impact as the introduction of the web or the birth of mobile computing. Last week, we saw Facebook launch a major rebranding. The Metaverse is coming, powered in part by that same 5G network.

What's the connection to 5G? Edge computing and experiences. In this article from Ericsson, Peter Linder cites 5G as a key enabler for AR and VR without the use of a tethered PC. Few of us can imagine how our business applications will take advantage of experiences like this, but recall that just a few years ago, we'd have had a hard time imagining the rich mobile applications we take for granted today.

These new experiences aren't going to take the place of the experiences we've got today -- they're going to add to them. If you're not already seeing the same explosion in channels, that's coming, too. Partners, new brands, bundled products -- all these channels demand access to the capabilities you've got, packaged up in new ways.

There's no way to support this explosion without careful separation of experiences from capabilities. You should never see the mechanisms by which a command is carried out implemented in the same place your customer experiences it!

Spotify has some great experiences enabled by this sort of separation. I was streaming from my desktop PC this week through my "good" speakers (better than my laptop!), but I also had a Spotify window open on my laptop. I just happened to see the play list sync'ed to the laptop screen, and when I clicked next there, my session on my desktop followed right along. Although this is an ultra-simple example, it's evidence that Spotify is using its interfaces to send Commands to a back-end service -- nothing about the session that's streaming is connected to the commands at all -- otherwise, the audio source would have changed!
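The separation described here can be sketched in a few lines of Python. To be clear, this is not how Spotify is built; the `PlaybackSession` and `ClientExperience` classes, the device names, and the track list are all invented to illustrate the idea that an experience sends commands and renders state while the capability owns the session.

```python
class PlaybackSession:
    """Capability: owns the queue, the position, and where audio comes out."""

    def __init__(self, output_device, queue):
        if not queue:
            raise ValueError("queue must contain at least one track")
        self.output_device = output_device
        self.queue = list(queue)
        self.position = 0

    def handle(self, command):
        # Commands change session state; they never change the audio source.
        if command == "next":
            self.position = min(self.position + 1, len(self.queue) - 1)
        elif command == "previous":
            self.position = max(self.position - 1, 0)
        else:
            raise ValueError(f"unknown command: {command}")

    @property
    def now_playing(self):
        return self.queue[self.position]


class ClientExperience:
    """Experience: sends commands and renders state; plays no audio itself."""

    def __init__(self, name, session):
        self.name = name
        self.session = session

    def press_next(self):
        self.session.handle("next")

    def render(self):
        return f"[{self.name}] {self.session.now_playing} on {self.session.output_device}"


session = PlaybackSession("desktop speakers", ["Track A", "Track B", "Track C"])
desktop = ClientExperience("desktop", session)
laptop = ClientExperience("laptop", session)

laptop.press_next()      # click "next" on the laptop...
print(desktop.render())  # ...and the desktop view follows along
print(laptop.render())
```

Because neither client holds playback state, adding a third experience (a phone, a car display, a partner channel) means adding another thin client rather than another copy of the capability.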

At this point, a little faith may be required in order to see a need like this in your enterprise, but for those with preparation on their side, I believe a myriad of new experiences will be possible in a few short years.

A whole lotta chickens

You may be familiar with the old joke about the chicken & pig's relative contributions to breakfast -- the chicken, of course, being involved by virtue of providing the eggs, and the pig being committed vis-a-vis the bacon.  The origin of this saying is now lost to antiquity, but it has been adopted as an illustration of dedication by sports personalities and business coaches because it manages to capture these relative levels of engagement succinctly and powerfully.

I found myself reaching for the chicken & pig business fable this week in the context of Product Ownership.  We've got a back-office product that's just not getting a lot of love from the business -- lots of people who want to provide input, but nobody who's interested in taking ownership.

This isn't unexpected or unreasonable.  Product Management as a professional discipline is still largely nascent.  On top of that, initiatives that raise the top line are always an easier sell than back-office cost-center programs.  For these reasons, I'm not sure that accounting, billing, document-management and other cross-cutting infrastructure-like programs are likely to lead the way in agile adoption or digital transformation, but transform they must -- eventually.

HTML is dead! Long live HTML!

While catching up on newsletters from CodeProject, I came upon an interesting article talking about the "why" of JavaScript UI's -- not the typical "how".

Why JavaScript is Eating HTML

In "Why JavaScript is Eating HTML", Mike Turley walks through the "classic" static HTML for structure + CSS for appearance + JavaScript for behavior example, and then examines how this application evolves as JavaScript begins to control the application more deeply by interacting directly with the DOM.

Reflecting on this article, we've done this sort of thing before.  Going all the way back to CICS to run terminal applications on mainframes, we've separated UI structure from behavior.  Microsoft Access had its forms, which propagated to Visual Basic, and eventually to .Net, WPF, XAML, and so on.  Static is easy, and frankly, it works pretty well most of the time, but as UI behavioral needs become more sophisticated, these static structures are ill-equipped to handle those needs.

So, I'm skeptical these techniques are going to put HTML out of business anytime soon, but in a dynamic application, they make a boatload of sense.