“Follow the moon” architecture

Cloud computing has gained a lot of momentum this year.  We're hearing about more new platforms all the time, and all the big players are working hard to carve out a chunk of this space.  Cloud computing originally promised us unlimited scalability at a lower cost than we could achieve ourselves, but I'm starting to see cloud technology promoted as a "green" technology, too.

According to an article on ecoinsite.com, cloud vendors with worldwide networks could choose to steer traffic to data centers where it's dark (thus, cooler) to cut cooling costs.  Since this typically corresponds to off-peak electricity rates, researchers from MIT and Carnegie Mellon University believe that a strategy like this could cut energy costs by 40%.

Clearly, this is cause for great celebration, but how ready are our systems for "follow the moon" computing?

One of the tricky bits that crossed my mind was increased latency.  As important as processing speed is, latency can be even more important to a user's web experience.  Most of the "speed up your app" talks and articles I've seen in the last year or so stress the importance of moving static resource files to some sort of Content Delivery / Distribution Network (CDN).  In addition to offloading HTTP requests, CDNs improve your application's speed by caching copies of your static files all around the globe so that wherever your users are, there's a copy of each file somewhere nearby.

"Follow the moon" is going to take us in exactly the opposite direction (at least for dynamic content).  While we may still serve static content from a CDN, we're now going to locate our processing in an off-peak area to serve our peak-period users.

While this might not seem like a big problem (given that we routinely access web sites from around the globe right now), I believe the added latency is going to adversely affect most contemporary web architectures.

A quick, "back of the napkin" calculation can give us a rough idea of the sort of latency we're talking about.  The circumference of the earth is around 25,000 miles.  Given the speed of light, which is the fastest we could hope to communicate from here to there, we're looking at a communication time of at least 12,500 / 186,000 = 0.067 seconds (67 ms each way), for a round trip of about 134 ms.
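The napkin math above is easy to sketch directly.  This uses the same rough figures as the text (25,000-mile circumference, light at 186,000 miles per second), so it's a lower bound, not a real-world measurement:

```python
# Back-of-the-napkin latency estimate for a trip halfway around the globe.
EARTH_CIRCUMFERENCE_MILES = 25_000
SPEED_OF_LIGHT_MPS = 186_000  # miles per second, in a vacuum

one_way_miles = EARTH_CIRCUMFERENCE_MILES / 2          # 12,500 miles
one_way_seconds = one_way_miles / SPEED_OF_LIGHT_MPS   # ~0.067 s
round_trip_ms = 2 * one_way_seconds * 1000             # ~134 ms

print(f"One way: {one_way_seconds * 1000:.0f} ms, round trip: {round_trip_ms:.0f} ms")
```

Real traffic travels through fiber and routers rather than a vacuum, so actual numbers will always come in higher than this floor.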

Taking a couple of quick, random measurements shows that we're not too far off.  I pinged http://www.auda.org.au/ in 233ms and http://bc.whirlpool.net.au/ in 248ms, which shows the additional overhead incurred in all the intermediate routers.

If your application is "chatty", you're going to notice this sort of delay.  The AJAX-style asynchronous UI favored in modern web apps will cushion the user a bit by not becoming totally unresponsive during these calls, but on the other hand, these UIs tend to generate a lot of HTTP requests as the various UI elements update and refresh, and I believe that the overwhelming majority of UIs are going to show a significant slowdown.
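To see why chattiness hurts, here's a hypothetical sketch.  The request count and the 250 ms round trip are made-up numbers for illustration (the round trip roughly matches the Australia pings above); the point is that sequential calls pay the full round trip every time, while parallel calls pay it roughly once:

```python
ROUND_TRIP_MS = 250   # roughly what the Australia pings showed
REQUESTS = 20         # a busy AJAX page can easily issue this many

sequential_ms = REQUESTS * ROUND_TRIP_MS   # each call waits for the previous one
parallel_ms = ROUND_TRIP_MS                # best case: all requests in flight at once

print(f"Sequential: {sequential_ms} ms, parallel (best case): {parallel_ms} ms")
```

Twenty sequential round trips add up to a five-second page, which is exactly the kind of slowdown users will notice.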

Although increased latency means that you may have a hard time moving to "follow the moon" on some applications, there are steps you can take that will prepare your architecture so it's able to withstand these changes.

Partition your application to give yourself the greatest deployment flexibility.  If you can find the large chunks of work in your app and encapsulate them such that you can call them and go away until processing is done, then these partitions are excellent choices to be deployed to a "follow the moon" cloud.

Finally, when assembling components and partitions into an application, use messaging to tie the pieces together.  Messaging is well-supported on most cloud platforms, and its asynchronous nature minimizes the effect of network latency.  When you send a message off to be processed, you don't necessarily care that it's going to take an extra quarter of a second to get there and back, as long as you know when it's done.
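As a minimal sketch of the idea, here's the fire-and-forget pattern using Python's standard-library queue as a stand-in for a cloud messaging service (a real deployment would use the platform's own messaging API, and the work itself is just a placeholder):

```python
import queue
import threading

# Stand-in for a cloud message queue; a real app would use the
# platform's messaging service instead of an in-process queue.
work_queue = queue.Queue()
results = {}

def worker():
    """Process messages wherever (and whenever) capacity is cheap."""
    while True:
        job_id, payload = work_queue.get()
        if job_id is None:                   # sentinel: shut down
            work_queue.task_done()
            break
        results[job_id] = payload.upper()    # the "large chunk of work"
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# Fire and forget: the sender doesn't block while processing happens far away.
work_queue.put((1, "resize this image"))
work_queue.put((2, "transcode this video"))
work_queue.put((None, None))

work_queue.join()   # in a real system, a completion message would notify you
print(results)
```

The sender never waits on an individual round trip; the extra quarter-second of latency disappears into the time the work takes anyway.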

These changes will take some time to sink in, but we're going to see more and more cloud-influenced architectures in the coming years, so you'd do well to get ready.

The role of QA – it’s not just testing anymore

In most development organizations, Quality Assurance is treated as an under-appreciated necessity.  In organizations that develop software while not considering themselves "development organizations", it's quite possible that you won't even find a QA group.  QA, it seems, just can't get no respect.

Yet QA, if it's executed well, can give your organization the confidence to move boldly and quickly without fear, because QA done right is all about controlling the fear of the unknown.

Too often, QA is viewed as a mechanical exercise.  It's all about writing test plans and clicking through applications.  But this view is short-sighted, and it misses the context that makes a great QA team a strategic partner in a development shop.  In order to really create an effective QA organization, I believe it's crucial to keep an eye on the big picture.  Software is inherently unreliable, and your job is to reduce uncertainty.

I know, this is blasphemy, especially coming from a software developer, and yes, there once was a time when software was entirely predictable and deterministic.  When I began programming, it could fairly be said that doing the same thing twice in a row would yield the same results.  This is no longer the case.  The incredible list of factors that contribute to variations in outcome ("it works on my machine") grows longer every day.  If you weren't already twitching, multi-core CPUs are working hard to make each trip through the execution pipeline a new and exciting journey (CPU performance is no longer exactly deterministic).

Testing, once the cornerstone of QA, is now only anecdotally interesting.  My only advice on testing is to try to grasp the sheer size of the total space of possible testing scenarios so that you come to realize that you can't possibly test every combination of function point, hardware, and software. This will help you focus on picking the scenarios you test strategically. Automate where you can, and use your brightest testers to look for weakness in the product.
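A quick sketch makes the point.  These counts are made up, and deliberately modest, yet the combinations explode immediately:

```python
# Hypothetical, modest counts for a mid-sized product.
function_points = 200
hardware_configs = 25
software_configs = 40   # OS versions, browsers, patch levels, ...

total_scenarios = function_points * hardware_configs * software_configs
print(f"{total_scenarios:,} combinations")   # and that's before test data varies
```

Two hundred thousand scenarios, before you even vary the input data. Exhaustive testing is off the table, so strategic selection is the only game there is.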

Beyond traditional testing, true QA requires that you find ways to make the product better.  Obviously, you can write feature requests, but think bigger than that, too. Build testability into the application so that your team is better equipped to reduce uncertainty.   Use active verification, logging crosschecks, and so on to provide tangible evidence that your system is really doing what you think it's supposed to do.
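One minimal sketch of a logging crosscheck: record what the system intends to do and what it actually did, then compare the two after a run.  Everything here (the order IDs, the submit/process functions) is hypothetical and illustrative, not part of any particular framework:

```python
# Hypothetical crosscheck: compare intended work against completed work.
submitted = []
completed = []

def submit(order_id):
    submitted.append(order_id)     # logged at the entry point

def process(order_id):
    completed.append(order_id)     # logged where the work actually happens

for oid in [101, 102, 103]:
    submit(oid)
    process(oid)

# Active verification: tangible evidence the system did what we think it did.
missing = set(submitted) - set(completed)
assert not missing, f"Orders lost in processing: {missing}"
print("Crosscheck passed: all submitted orders were processed.")
```

In a real system the two logs would live in files or a database, but the principle is the same: the check runs continuously, not just when a tester happens to look.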

Testing is passive, and it's becoming less effective all the time.  Quality Assurance is active, so get involved early and make sure that the testing you do actually means something.

Deborah Kurata blogging on OO basics

Deborah Kurata is well-known to us old-timers who used to read her excellent articles on OO techniques years ago.  She's been flying under the radar for a while, but she's recently started blogging (great news)!

Over the last couple of weeks, Deborah has been doing something of a quick primer on OO for her readers, based on some of her earlier writings.  While I expect most readers here already consider themselves OO experts, I really encourage everyone to read through this series.

If you're new to OO, these will be a great way to improve your skills.  If you're experienced in OO, though, I'd like you to consider something else.

Notice the simplicity and elegance in Deborah's descriptions of objects.  Compare this to the big, framework-based, computer-generated, uber-designs we deal with every day.  There's a reason every developer prefers green-field development to brown-field development, and there's a reason why we see a sensation like Ruby on Rails every few years.

The simplicity and beauty of starting over is liberating.  Starting a brand-new object hierarchy with only the "real" properties of those objects brings a clarity to our design that we rarely see in Enterprise applications.

I urge you to experience this feeling of simplicity from time to time, and remember what it feels like.  When you start looking at frameworks, tools, standards, and all the other trappings of grown-up development, consider how these things move us away from the simplicity of Deborah's definitions.  If you can implement frameworks and tools that stay out of your way and leave the purity of real objects visible to the developer, please give some thought to how that's going to help connect your developers to your customers (you know -- the folks who pay for you to build this stuff in the first place).

Now, go read those articles!

1&1 – Unlimited bandwidth hosting

Web host 1&1 just announced that they're lifting the bandwidth caps on all their hosting plans.  The previous caps were high enough that they rarely affected most customers, but anyone who was ever "slash-dotted" will appreciate this move.

As you may have noticed, I use 1&1 to host this site, and I also run a couple of club sites on them.  Although I saw some reliability problems with 1&1 a couple years ago, I have to say they've been pretty good since then.  I use mon.itor.us to watch uptime on all these sites, and I haven't seen any major issues in a long time.

If you need a host, include these guys in your eval list.

End resume-driven design

How many times have you seen the latest technology injected into a project, used for the duration of a feature or release, and then left to wither?  I've seen it more times than I'd like, and it's got to stop.

Don't get me wrong.   I love new technology -- a lot.  I love learning how it works, and I love to figure out where it's appropriately used.  I also know that you can't just throw new technology into a software project without a plan, and yet I see it happen over and over.

Last week, I saw someone try to shove Entity Framework into a project on the sly, as if nobody was going to notice.  Chaos ensued when this plan blew up, and repairs are now being made.  The bungle itself was alarming, but I was even more disturbed to reflect on how many checks and balances were already blown before the right people learned what was going on, and why it was a bad idea.

This is a failure on multiple levels.

First, developers themselves should know better than this.  The reason EF was chosen in this case was nominally because it was supposed to help the team deliver a feature more quickly.  As developers, we've all seen this argument fail dozens of times, and yet we fail to learn our lesson.  New technology certainly improves our craft over time, but the first time we use any new tool, we run into teething problems.  If we grab a brand-new, shiny box of tech goodness off the shelf and honestly think that it's going to work perfectly the first time we plug it in, we should be hauled out back and bludgeoned.

Next failure: architectural guidance.  In this case, there exist architectural standards that cover data access, so at first glance, it would appear that this is an open and shut case.  In practice, though, the standards are very poorly socialized, and they're badly out of date.  In short, they have the appearance of not being relevant, so it's easy for developers to discount them.  Architectural standards need to be living and breathing, and they need to evolve and grow so that designs can adopt new technologies at a measured pace.  To do less than this is to erect a static roadblock in front of developers.  The developers will drive around it.

Finally, management allowed this failure in a couple of ways.  Early in this process, a dysfunctional conversation occurred.  Management needed a feature by such-and-such date.  Development thought this wasn't nearly long enough.  Wringing of hands and gnashing of teeth ensued, and eventually, the developers capitulated, claiming that we could make the date, but only by departing from our normal development standards and using this new tech tool instead.  Some form of this conversation has been responsible for more software disasters than I could count.

No matter how much time we put into defining our processes, no matter how many years of experience we draw upon, and no matter how many times it's proven that shortcuts kill, we keep getting suckered into them.

Personally, I draw a couple of conclusions from this.  First, we just need to have a little more personal integrity and discipline.  That's sort of a cheap shot, but it's true.  The second conclusion, though, is more of a reminder to us as an industry: if we're so desperate that we'll take long shots like this, despite the odds, then the state of the industry must be pretty bad, indeed.  As an industry, we need to acknowledge that we're causing this sort of reaction, and we need to find a way to be more productive, more reliably.

But not by throwing out the process just when we need it most.

Feet on the ground

Here's your free management tip for the day: get out of your chair and go see what's happening on the floor.

Every summer, I go to Boy Scout summer camp with my son.  Although this passes for vacation, it invariably ends up being a management clinic.  You might think you see where I'm going with this: chasing 30 Scouts around is just like chasing developers, right?

But that's not actually what I had in mind.  My real job at summer camp is to remove obstacles for the kids.  They're at camp to earn badges, and it's amazing how many trivial problems pop up and totally confound either the kids or the camp counselors, who are also kids.  Left to their own devices, both the Scouts and the counselors can be expected to churn on these problems, waiting for them to fix themselves.

I've seen counselors, for instance, find themselves without some tool or supply they need to teach a class, and they'll just wait for the tool fairy to drop by and bestow upon them a brand-new left-handed clipboard, or box of triple-gusseted ziplock bags, or whatever they're missing.  In the meantime, Scouts sit idle because they can't get their work done, and badges are not earned.

Other problems occur as well: sometimes there are disruptive students; sometimes the counselors don't know the material they're supposed to teach; sometimes kids aren't paying attention to work they're supposed to do outside of "class" time.  In all these cases, though, if you're sitting back at camp asking kids how they're doing, they'll all tell you things are great -- right up to the end of the week when they don't get their badges.

Now, it's true that some of these problems would eventually sort themselves out, but probably not in time for most of these kids to catch back up again.  I consistently find that if I spend time checking out classes myself, I can spot problems and take them to someone who can help, and we can get small issues turned around before they become large issues.

What's the lesson here for a software manager?

First, I'm not suggesting that you hover behind someone's desk so you can swat them when they stop coding.  I'm talking about finding the real obstacles to effective development and addressing them.

You might find that your developers need better equipment.  You might find that the requirements they're working from are woefully inadequate.  You might find that servers they depend on are slow or frequently down.

But don't limit your observations to developers only.  Check out the help desk.  This is one of the best sources for information about how your software is really performing.  It's one thing to read about a problem in a help desk ticket, but it's another thing altogether to observe your customers' emotional reaction to problems with your software.  Similarly, watch your software in use by real customers, or by your operations team, and you'll gain a new appreciation for real-world usability and performance.

When there's something wrong with your software, you'll probably learn about it eventually through your TPS reports, but you can keep an awful lot of small problems from becoming big problems by getting out there and looking around.

Apple is really pushing its luck

Apple has been the undisputed darling of electronics marketing since the introduction of the iPod.  Everything they touch turns to gold, and they've built the mystique of the Apple brand into a legendary golden goose.

But it's been a tough week for Apple.

People have been grumbling about the arbitrary and seemingly random approval process for apps on the iPhone, but the applesauce really hit the fan this week when Apple pulled apps that work with Google Voice:

There's No App For That

This has set off a small firestorm among developers.

Is this the beginning of the end for Apple?  Probably not.  This isn't the first time Apple has made some consumers mad, but there are only so many times they can pull stunts like this before it starts to catch up with them.

Backwards compatibility can kill you

"Release early, release often."  This is the Web 2.0 mantra, and it's also a major guiding principle behind agile development processes.

In product development, conventional wisdom has it that first-to-market or first-mover advantage is hugely important.  But for software products, this can kill you by painting your product into a corner from which it never recovers.  Just about every software product is burdened with backward compatibility issues, and for many products, compatibility is paramount as customers create files with these products, integrate them into their environments, and come to depend on the software acting the way it does.

With its very first release, then, a software company can find itself wearing an anchor.  Let's look at some examples.

Microsoft Windows

Windows is the definitive example of a product that's become hamstrung by backward compatibility.  Compatibility was one of the things that killed Vista.  It turns out that you can only slap so many coats of paint over a dry-rotted wall before you need to rebuild the frame.  Even security (another noted weakness) would be better if Microsoft had really had the option to start over.

In the meantime, Apple was able to come out with a fresh operating system, and they're now cleaning Microsoft's clock in high-end systems.

Smartphones

Let's look at another example: smartphones. Palm and Blackberry were the first big players in this market, and Windows Mobile came along after that. Since those early days, all three of these OSes have remained essentially the same, and that's now causing them to look very, very dated. The iPhone, and to a lesser degree, Google's Android, are eating their lunch.

Palm's situation had gotten so dire, in fact, that many predicted their demise. Out of this desperation, though, Palm launched the Pre, which (surprise) is based on a brand-new OS. This isn't a coincidence: Palm simply wouldn't have been able to get to the Pre by migrating Palm OS incrementally, taking care not to ruffle any feathers along the way.

Lessons Learned

These disruptive changes are risky. Again, it's not surprising that it took a near-collapse on Palm's part to force them to roll the big dice on the Pre. This is a bet-the-company move for Palm, and they couldn't find the stones to make it until they had little to lose by failing. This is not to say that I blame them; on the contrary, it's an acknowledgment of how truly bold it is when a company innovates like this while it's still on top.

So what's the lesson here?

Simple. When you make platform decisions, understand that you're going to be sticking with them for a while. Your designs and decisions have the potential to stick around and haunt you for a very long time. But when you do determine that it's time for a change, you need to be able to cut the chains and move on.

It's never too late to learn. Microsoft is demonstrating that they've learned this lesson (to some extent) in Windows 7. They've just announced that they're going to include a virtualized, licensed copy of Windows XP so that users can run old software in that old OS. Not only will this make customers happy (because they can keep their old apps), it also sets the stage for Microsoft to finally kick some of that old OS compatibility code to the curb. Good riddance.

What's the albatross around your neck? What's it going to take for you to finally get rid of it?

My development fabric is unraveled

I'm a few days into working with the Azure July CTP, using Steve Marx's excellent PDC presentation as a bit of a primer.  I'm following along with Steve's presentation, and it was working just fine for a while.  I had a working Azure app, using an MVC front-end, and I was reading and writing images as blobs from the local development-version storage pools.

Then, the next thing I knew, I was busted:

Role instances did not start within the time allowed.

I shut down the Development Fabric, as instructed, and even restarted my machine, to no avail whatsoever.  I started Googling this error, and found a handful of other people who've seen this error, too.  I even found a bug logged on Microsoft's Connect site, but no solution has presented itself.  There's an event (3006: Parser error) logged every time I try to start the app, but that's the sum total of the clues I've got to go on with this one.

So far, I've tried tearing out all the stuff I've added to the project since it was last working (which didn't help at all), I've tried blowing away and recreating the storage pools (no joy), and I've tried creating a brand-new Azure project (which worked).  Thus, I'm forced to conclude that something caused this particular project to be irreparably hosed, but I've still got no idea what caused the problem.

This could really slow me down...

Azure isn’t supposed to do this!

I'm doing some research on Azure (finally playing with the bits), and in the process, spending some time cruising Steve Marx's blog.  Steve presented some Azure stuff at PDC earlier this year, and there's some great stuff on his blog if you're working with Azure for the first time.  One of the things you'll learn if you watch his PDC stuff is that his blog is, itself, built on Azure and running in the cloud, so it should be a showcase for all of Azure's scalability claims.

Thus, it was with great surprise that I clicked a link and saw this Azure app apparently taking a dirt nap:

This webpage is not available.

The webpage at http://blog.smarx.com/?ct=1!8!c21hcng-/1!48!MjUyMTc1NjA1Nzk4NDI2NzA1MiBhdC1sYXN0LS1zcGFtLQ-- might be temporarily down or it may have moved permanently to a new web address.

I refreshed a couple of times, to no avail, and finally tried going back to the home page, which worked fine.  I'm still not sure exactly what went wrong, but it would appear that the god-awful token that was used to track my navigation got lost in the cloud somewhere.

The lesson here?  For production apps, you're still going to need to build your Azure apps defensively, and make sure that customer-facing hiccups are handled in a user-friendly fashion.  As a user, I don't know (or care) if this error was an Azure failure or a failure of the app that's hosted on Azure.  This isn't a dreadful error when I'm browsing a blog, but it could have been if I'd been paying bills or making an online purchase.