Data Smells

Developers are familiar with "code smells" -- the little signs you see upon superficial examination of code that lead you to fear deeper pathological problems.  Over time, many developers become pretty good at spotting these signs, and volumes have been written about how to address these problems once they're detected.

But code smells aren't the only signs that problems are lurking in your system.  Most systems with even moderately complex data models can hide all sorts of problems in their data.


A good system, of course, will be coded defensively, such that it can tolerate, or maybe even fix bad data.  This is feasible and practical in small to mid-sized systems, but it becomes increasingly difficult as systems become larger and more complicated.  In all but trivially-small applications, bad data is a very real problem.

Like bad code, bad data is sometimes bad in very subtle ways.  Database constraints can (and should) be used to prevent obvious problems with things like unique IDs, foreign-key references, required values, and so on.  That's a minimal requirement, but it won't help you deal with data that violates complex business rules (for example: an order must have an associated invoice if its status is "placed").
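A constraint can't easily express a rule like that, but a periodic audit query can find violations after the fact.  Here's a minimal sketch in Python using SQLite (the schema, table names, and data are hypothetical, just to illustrate the shape of the audit):

```python
import sqlite3

# In-memory database with a hypothetical orders/invoices schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                           order_id INTEGER REFERENCES orders(id));

    INSERT INTO orders   VALUES (1, 'placed'), (2, 'draft'), (3, 'placed');
    INSERT INTO invoices VALUES (100, 1);  -- order 3 is 'placed' but has no invoice
""")

# Audit query: placed orders that violate the "must have an invoice" rule.
violations = conn.execute("""
    SELECT o.id
    FROM orders o
    LEFT JOIN invoices i ON i.order_id = o.id
    WHERE o.status = 'placed' AND i.id IS NULL
""").fetchall()

print(violations)  # order 3 has no invoice
```

Run something like this on a schedule and you'll hear about rule violations before your users do, instead of discovering them while diagnosing an unrelated bug.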

Typically, you'll find examples of data rule violations when you're diagnosing errors, or maybe when you're doing reporting or data analysis.  When an instance of bad data is discovered, you've really got only two ways to deal with the problem: fix the code, or fix the data.

Fixing the code is often our first reaction, since we're (generally) more comfortable working in code than data in the first place.  We'll often go to the source of the error, and we'll change the code to tolerate this particular class of bad data, but it's important for us to ask ourselves if this is truly a fix for the problem:

  • Just because we fix the code in one place, how can we know we won't blow up somewhere else because of the same bad data?
  • If there's really a business rule governing this data, how are we helping by tolerating violations to those rules?
  • Are the business rules governing this data known at all?  (I know this sounds silly, but it's a valid question more often than you might think.)
  • How did the bad data get into the system to begin with (is there a real bug upstream that's allowing bad data to be created)?

So in some cases, at least part of the solution is to fix the bad data.  Again, there are some questions you should ask about your system before you dive headlong into SQL:

  • Are there other instances of this data corruption?  How many?
  • What are the circumstances of the problem?  Is there a way to predict the scope or context of the problem?  Perhaps the context can lead you to the source of the data corruption.
  • Can the data be fixed at all?  Sometimes, the damage is irreversible, and repairs can be quite difficult.

As you may have gathered by now, the sooner these issues can be nipped in the bud, the better off you'll be.  I'll cover some strategies to help you with this in a future post.


Windows 2008 Server Licensing == FAIL

I just saw a blog post from Bill Sempf describing a book he'd written for Microsoft to help them explain licensing for Windows Server 2008.  At first, I read right past a key metric, but I doubled back and read it again -- the book is 86 pages long.

Eighty-six pages?? Really???

Now, don't get me wrong.  I have every faith that Bill's done a fine job of documenting the licensing requirements in the simplest fashion possible.  I don't mean to bash the book; I mean to bash the licensing requirements.

Do you think there's a chance that the real problem here isn't the fact that nobody liked reading licensing whitepapers?  Maybe the real problem is that the licensing model takes 86 pages to explain in a "Dummies" book.  How long is the "Licensing Unleashed" book going to be??

I'm not exactly sure how you fill up 86 pages with licensing guidelines, but I have to guess you're going to see chapters like this:

  • What's it going to take to put you in a new OS today?
  • If you have to ask, you can't afford it.
  • Feeding Ballmer's Ellison-envy since 1998.
  • If you think the licensing rules are complicated, you should see our commission calculations.
  • This is going to hurt me more than it hurts you.
  • Hang on a second while I go talk to our General Manager.
  • How much did you say your budget was again?

Enjoy the read, though.

Best. Logger. Ever.

Logging is one of those "system" components that always seems to either be left out or way over-engineered (glares at Microsoft's Enterprise Application Blocks). Today, I'd like to introduce you to a logging framework that's everything it needs to be and nothing it doesn't.

The .Net Logging Framework from The Object Guy is powerful enough to handle any of your logging needs, but simple and painless to use.  Here's a relatively complicated example -- we're going to log to three logging sources to demonstrate how easy it is to set up.  In most cases, of course, you'll log to only one or two sources:

/* first instantiate some basic loggers */
Logger consoleLogger = TextWriterLogger.NewConsoleLogger();
Logger fileLogger = new FileLogger("unit_test_results.log");
Logger socketLogger = new SerialSocketLogger("localhost", 12345);

/* now instantiate a CompositeLogger */
CompositeLogger logger = new CompositeLogger();

/* add the basic loggers to the CompositeLogger */
logger.AddLogger("console", consoleLogger);
logger.AddLogger("file", fileLogger);
logger.AddLogger("socket", socketLogger);

/* now all logs to logger will automatically be sent
to the contained loggers as well */

/* logging is a one-liner */
logger.LogDebug("Logging initialized.");

When you download this logger, you'll get all the source code, including a socket reader to catch the logs thrown by the socketLogger in the example above.  Extending the logger is a piece of cake, too, so you could build yourself a WCF Logger, for instance, in no time flat.
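The extension point is just a subclass that knows how to write a record, and the composite fans each record out to everything registered with it.  Here's a sketch of that same pattern in Python -- the class and method names are my own, not The Object Guy's actual API:

```python
class Logger:
    """Base class: subclasses override write() with their destination."""
    def write(self, message: str) -> None:
        raise NotImplementedError

    def log_debug(self, message: str) -> None:
        self.write(f"DEBUG: {message}")


class ListLogger(Logger):
    """Stands in for a console/file/socket logger; captures records in memory."""
    def __init__(self):
        self.records = []

    def write(self, message: str) -> None:
        self.records.append(message)


class CompositeLogger(Logger):
    """Fans each record out to every registered logger."""
    def __init__(self):
        self._loggers = {}

    def add_logger(self, name: str, logger: Logger) -> None:
        self._loggers[name] = logger

    def write(self, message: str) -> None:
        for logger in self._loggers.values():
            logger.write(message)


console, file_like = ListLogger(), ListLogger()
logger = CompositeLogger()
logger.add_logger("console", console)
logger.add_logger("file", file_like)
logger.log_debug("Logging initialized.")
print(console.records)  # both loggers received the record
```

The nice part of this design is that callers only ever see the base `Logger` interface, so swapping one destination for three (or for a custom one, like the WCF logger mentioned above) never touches the calling code.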

You'll note the lack of config file-driven settings in the example above -- this is purely intentional.  You can decide if you want to make any of these settings configurable, and do so in the format you're comfortable with, so you don't need to try to get your config files to conform to whatever format your logger insists on using.  This small simplification can be a big time-saver for simple apps, debugging / test harness apps, and so on.

NDepend review

Introduction

There's no question that Visual Studio is a class-leading tool for building large applications.  The IDE is incredibly helpful to coders, and the .Net framework lends itself to managing dependencies among components and classes in large applications.  In addition, Visual Studio is designed to be extended by third-party tools that can make it even better.  NDepend is one of these tools; its purpose is to analyze large applications and expose information that's typically hidden deep inside your code.

Installation and getting started

The NDepend web site shows some great screen shots with all manner of graphs and charts and reports, so naturally, you want to see that stuff for your code, too, right?  Good news: installation is a piece of cake.  Just unzip into a directory, add the license file, and you're ready to start your first analysis.

When you start NDepend, you're greeted by a start screen reminiscent of Visual Studio's.  Create a new project, point it at a Visual Studio solution file, and let NDepend do its thing.  Zero to more graphs than you can shake a stick at in about four minutes.


Kobe outrage – why?

Over the last couple days, I've seen a small firestorm erupt over Microsoft's Kobe MVC sample project.  First out of the gates was Karl Seguin, with a rant about all sorts of coding ills found in the source.  Then, today, I saw another great post that breaks down some specific problems with duplicate lines and cyclomatic complexity.

My first reaction was to share in the shock and horror of it all -- after all, how could Microsoft issue a sample app so obviously riddled with problems?  But as I thought about it some more, the real issue became obvious:

Real-world apps deal with this sort of coding all the time.

You see, very few apps evolve in environments that are conducive to crafting museum-quality code throughout the entire application.  Very few software teams, in fact, are comprised entirely of developers who are capable of crafting this sort of code.  Our development teams have imperfections, and so do our systems.

Maybe Microsoft should have been held to a higher standard because (a) they're Microsoft, and (b) this is a sample application where the primary deliverable is the source code itself.  That's valid, but what's more interesting to me is that this brand-spanking-new framework is so damned easy to mangle and abuse!

As architects and managers, one of the most valuable long-term effects we have on the software we develop is to leverage good ideas across our teams and our code.  It's not sufficient that we can dream up the perfect architecture and the perfect solution if it stays in our heads.  The real measure of effectiveness is how well we can project these good ideas across the entire breadth of code in all of our systems, and this is where our current tools fall short.

Developers love C# because you can do anything with it.  It's a fabulously-flexible tool.  Architects try to impose order by mandating the use of standards, frameworks, and constraints.  It's a perpetual tug of war.  We're supposed to be working together like an orchestra, but in most cases, we're still a bunch of soloists who just happen to be on the same stage.

In Kobe's case, the mere fact that people are opening up this code and recoiling in shock because the code doesn't look the way they expected is proof that we're not where we need to be  -- on many levels.


Software maintenance – are you feeling locked-in?

In the early days of shrink-wrapped PC software, I used to buy a software title and not expect to pay anything more for that software until I decided to upgrade it. I might upgrade when the publisher released a new title, or I might skip a release -- it was up to me to decide.


Pretty soon, though, software companies discovered that it was expensive to staff a help desk to support customers, and then they started to discover that it was painful to have customers working with software that was several versions out-of-date. The solution: software subscriptions and maintenance plans.  Enterprise software companies had already been doing this for years; it lets the software company generate revenue from support areas, and smooths their revenue stream (so it's not clustered around new releases).

Although enterprise customers had been paying maintenance for years, consumers have been wary of subscriptions.  They want to know what they're getting for their money, and they want to be able to decide when to upgrade.

In a recent blog entry (Software maintenance pricing - Fair or out of control?), Scott Lowe shows that this sentiment affects the enterprise customer, too.  Especially in these times of constrained budgets, enterprises aren't too excited about big increases in software maintenance prices without a whole lot of additional perceived value.

What's the difference between maintenance and a subscription plan?
Although these terms are frequently used interchangeably, maintenance typically entitles you to bug-fix releases only, while a subscription plan should provide feature releases, too. In either case, make sure you read and understand the license agreement so you're not surprised later.

If you're a software developer, you need to understand that you can't get away with milking your existing customers just because they decided to buy your software years ago.  This should go without saying, but if you sell subscription-based support, make sure you provide upgrades that are worth the cost your customers are paying.  In a recent Joel on Software thread, a developer sounded off against Lowe's article, but he completely missed Lowe's point:  the prices of software maintenance are going up, but value isn't.

If you're a customer, you hope your vendors are committed to providing value for your support dollars, but this won't always be true.  If you've ever felt locked into a vendor, you know this is no fun at all.  When you're faced with a vendor who's got you over a barrel, you can feel like your organization is being held for ransom, and you're powerless to extract yourself.

As a manager or an architect, part of your job is to manage vendor risk. I've got some thoughts on this, and I'll share them in another post.

Let me know what you're thinking - what are you doing to manage vendor lock-in in your organization?


Reboot needed – really??


It just happened to me again.  I came into work and was greeted by the cold, grey screen that told me my PC had rebooted.  Windows Update had struck again.

I've got a PC that hosts my development environment in a VM, so every time this happens, not only do I need to restart my VM (which takes for.ev.er...), I then need to wait for my VM to install the same damned update, and then reboot itself (which takes for.ev.er...).  This process alone typically eats a good hour or so.

Last night, though, I was running SQL scripts -- one in my VM, and another on another PC in my cube that I use just for SQL Server.  These were long-running scripts to do data migration to support testing and bug fix verifications that really need to be done asap.  Both machines were dispatched by Windows Update in the middle of the scripts, and both scripts had to be restarted.

This time, I'm losing an entire day because of Windows Update.

I've only got one question: Why the hell does Windows need to reboot every single time it installs any kind of update?

I've used Ubuntu for months on end, and I've seen it install all sorts of updates; rarely did one require a reboot.


In terms of operating system enhancements, there's nothing I've wanted this badly since plug & play.  I'm really trying to understand how the Windows product planners keep missing this.   I'm picturing the product planning meeting: all the great minds gathered around the conference room table, and a list of enhancements up on the board.  Somewhere up there, between WinFS and Internet Explorer 14 (the one that finally supports all the W3C standards), there's my bullet point:  Windows Updates without reboots.

"Nope.  Gonna have to pass on that this time.  We need another new 3-D task switcher, so 'Updates without reboots' is just going to have to push to the next release."

Really??

I don't have the foggiest idea how many engineers are on the Windows team, but it's difficult to imagine that there isn't a spare rocket scientist somewhere who could banish this problem to the scrap heap of forgotten PC memories, right there next to QEMM where it belongs.

Back in the 90's, I used to work with a guy who started running one of the early Linux distros, and he'd brag that his Linux box had been up for six months straight, or something like that.  That's fifteen years ago, folks.

Is it possible that Microsoft hasn't fixed this problem because it realizes that Windows still needs to be rebooted every once in a while even after all these years of trying to get it right?

Wouldn't that be sad?


Table variables in SQL Server

Working in SQL requires a mind-shift from procedural thinking to set-based thinking.  Although you can get SQL to operate one record at a time, it's usually slower, and almost always painful to force SQL to work like that.

One of the implications of set-based operations is that traditional single-value variables can't quite handle many processing needs -- you need a collection-like variable, and traditionally, that's meant a temporary table, or temp table.  Temp table implementation is pretty similar across database engines, but it's also slightly cumbersome.  In many cases, SQL Server's table variables are a better solution.

Although Microsoft introduced table variables with SQL Server 2000, I hadn't had occasion to try them out until recently.  Like cursors, temporary tables have usually been just enough of a pain that I avoided them until I really needed one.  Here's the classic temp table pattern:


CREATE TABLE #mystuff
(
id INT,
stuff VARCHAR(50)
)

-- do stuff with your table

DROP TABLE #mystuff

The thing that always messed me up when using temp tables was the DROP TABLE at the end.  When developing or debugging, or when working with ad-hoc SQL, I frequently missed the DROP, which showed up shortly afterwards when I tried to CREATE a table that already existed.

Table variables eliminate the need for the DROP, because they are reclaimed automatically as soon as they go out of scope:


DECLARE @mystuff TABLE
(
id INT,
stuff VARCHAR(50)
)

-- do stuff with your table

Other advantages: for small tables, table variables are more efficient than "real" temp tables, and they can be used in user-defined functions (unlike temp tables).  Like temp tables, you can create constraints, identity columns, and defaults -- plenty of power to help you work through set-based problems.

For more on Table variables, see the following articles:

Table Variables In T-SQL

Should I use a #temp table or a @table variable?


New syncing options for WinMo phones


Microsoft and Google have each announced syncing tools for Windows Mobile phones recently, but based on what I'm seeing, I'm sticking with a service you've probably never heard of.

Microsoft announced "My Phone" last week, and today announced that it will be available for free.  At present, it's in limited beta, but I'd expect it to be unleashed on willing participants pretty soon.  This service looks to be pretty limited, though -- it looks to be a great way to back up your phone, but not too much beyond that.

Today, though, Google announced plans to support syncing for contacts and calendars to a bunch of phones, including Windows Mobile.  Woot!  A closer look at the fine print, though, confirms that this (like all new Google features) is a beta feature, and may not be completely baked yet.  Two-way sync, in particular, seems to be iffy for Windows Mobile devices.


So who needs these services, anyway?  WinMo phones already sync to Outlook just fine, don't they?

Personally, I want something like this because I don't use Outlook anymore.

A few months ago, in a fit of Vista-inspired disgust, I loaded Ubuntu on my home desktop, and vowed never to be stuck on a single desktop OS again.  I'd been using Outlook, which was the only way I'd been able to sync contacts and calendars to my Windows Mobile phone, and so began a long quest for a wireless sync option.  I was amazed at how difficult this actually turned out to be, but I ended up using a service that I've been pretty happy with.

Continue reading "New syncing options for WinMo phones"

Countdown to release, according to Microsoft


A while ago, I pointed out a Microsoft development team that was doing a great job of giving us glimpses inside the sausage factory.  In that instance, the Windows Home Server team was showing us how bugs are managed late in the release cycle.

Now, Microsoft is opening up again -- on a larger scale this time.  A blogger on the Windows 7 team has written a really informative post on the process the Win 7 team expects to use to move from Beta to General Availability.

If you ship commercial software, you need to understand the terms used in this article.  You also need to make sure your developers and your boss understand them.  These milestones are the dates that drive your release, and it's critical that everyone in your company shares an understanding of what these milestones mean.

If you develop in-house software, you may not use these terms, but you should still understand the concepts.  The same idea scales down, too: smaller projects will simply have fewer public milestones.

Once you've decided on how you're going to define these milestones for your organization, keep track of projected and actual dates for a couple of releases.  Now, as you plan future releases, you've got some valuable data to help you: all things being equal, you'd expect to spend similar amounts of time in each stage of the release for similarly-sized features.  Use these ratios as a sanity check against your project plan; if your ratios are way off, you'd better be able to explain why.
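As a rough sketch of that sanity check (the stage names and day counts here are invented for illustration): compute each stage's share of total release time from your past releases, then compare the new plan's shares against that baseline.

```python
# Historical stage durations (in days) from two past releases -- invented numbers.
history = [
    {"beta": 60, "rc": 30, "ga_prep": 10},
    {"beta": 50, "rc": 28, "ga_prep": 12},
]

def stage_ratios(durations):
    """Each stage's share of total release time."""
    total = sum(durations.values())
    return {stage: days / total for stage, days in durations.items()}

# Baseline: average share per stage across past releases.
baseline = {
    stage: sum(stage_ratios(release)[stage] for release in history) / len(history)
    for stage in history[0]
}

# Planned durations for the next release -- also invented.
plan = {"beta": 30, "rc": 40, "ga_prep": 10}
planned = stage_ratios(plan)

# Flag stages whose planned share deviates sharply from history.
for stage in plan:
    if abs(planned[stage] - baseline[stage]) > 0.15:
        print(f"{stage}: planned {planned[stage]:.0%} vs. typical {baseline[stage]:.0%}")
```

With these numbers, the beta and RC stages get flagged -- the plan spends a much smaller share of the release in beta than past releases did.  A flag doesn't mean the plan is wrong, but it does mean you'd better be able to explain the difference.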

Note that if you change the meaning of these terms every time you use them, they instantly become useless for estimates and measurements, and quickly become useless as a means of communication, so pick a quantifiable definition you like, and stick to it!
