Too Defensive

The topic of this post sorts reminds me of the whole DbC mess that paralyzed our group for years. Wow, the fallout from some key bad decsions! Well, this example is much more harmless, but I thought it was interesting.

A tester discovered that the SQM code wasn’t achieving the correct code coverage numbers. The target is normally 85% and the SQM code was hovering somewhere around 60%. Why was this?

Because there was a lot of code that looked like this:

if( FAILED( sqmInitStatus_ ) )
{
   HWSW::Trace( L"SQMSession::Initialize:: SQM temporary file path error: %s", pSqmFileLocation );
   return sqmInitStatus_;
}

Normally, getting inside an “if” statement like this is quite easy from a unit test. You simply have to write a test that forces that error condition. Then, testing your error handler is a simple matter. The only problem was that the author used a traditional method for validating string arguments at the beginning of functions. These traditional validation methods are fine for string validation when you’re using functions like strcat and strcpy. But, if you’re using the safe string functions (StringCch*), then is it overkill?

 if ( pAppVersion == NULL ||
 ( ::wcslen( pAppVersion ) == 0 ) || 
 ( ::wcslen( pAppVersion ) > MAX_VERSION ) ||

If these any of these conditions fail, the function will return an INVALID_PARAMETER return value. It’s therefore basically impossible for the StringCch* functions to fail, which makes getting inside those if statements impossible.

So, what should be done?

My opinion is to modernize this sort of code. The code does not need so much argument validation at the beginning because the StringCch* functions manage a lot of this. Also, writing unit tests early could have helped catch this sort of problem.

Code Reviews

History:

After some time, I started getting the idea that the evolved code review process may not be so good. Basically, team members use SD Pack and email. Everyone on the team gets the email and everyone sends the review back to the author. The author then pastes all the responses back into another email, comments on each response, and then sends the email. It suffices to say: Not a great use of time (the author has to do a lot of reading, synchronizing email directions with real code), and encourages superficial comments (everyone spends the same amount of time on general areas).

Action:

I started brainstorming ways to improve code reviews. At first I was just thinking of having more sit down paper code reviews. Fortunately, and very serendipitously, an EE training class on code reviews came to Shenzhen. It was taught by a principle developer on loan here in China from Redmond. He’s a solid guy, awesome dev, and one of the best teachers I’ve seen here.

The training was exactly what we were looking for. The great thing was that it actually demonstrated an easy way to measure how useful your code reviews are. We brainstormed some more on this topic, I gave three presentations to the team on the various ideas, we had one quick sample code inspection, and have derived a streamlined process that we’re trying out here at MACH.

My goal: Make the process easy and rewarding.

Process:

The training focused mostly on “code inspections”. We’ve taken some of the ideas and simplified it:

For example, code inspections identifies roles like: Gatekeeper, Moderator, Scribe. We’ve rolled many of these together. We still see value in a moderator (someone other than the author who runs the meeting), but that person can also act as the scribe (writing down bug statistics). The Gatekeeper (the one who ensures the code is ready to be reviewed) can be performed by each author (using an optimized and more visible version of the style guide, there should be no style issues).

There are simple but effective statistics that we will take too:

1. Counting the total number of issues.

2. Of the total issues, which ones are major? Major issues should be labeled and counted. Right now, we’re defining “major” as: Any issue that causes a crash, leaks a resource, hangs the application, affects performance, serious thread issues, or maintainability and design problems. If we know our team’s bug fix rate (we are calculating this now), then we can calculate if the code review was a good use of time. Also, as the percentage of major issues rise (without the overall of quantity of Major issues rising), we’re getting better at Gatekeeping and style.

3. Of the total issues, which ones overlap? Meaning, how many people found the same issue? This helps us to judge the effectiveness of the review. If similar issues, especially major issues, are found, the we have higher confidence in the effectiveness of the code review at finding bugs.

There are two other documents that fell out of this exercise:

1. Improved Style Guide: This idea came from the realization that most of our comments were high level style issues. So, how do we fix it? The first thought was to have a more useful style guide. So, I took the initiative to cut 80% out of the style guide so that it only covered style (not coding best practices or project settings)! We want to refer to it frequently and make changes as needed. One of the main goals though is to keep it short.

2. Code Review Checklist: This idea came from the training class. The team brainstormed and created this list. We haven’t made much use of this yet, but the idea is it’s a reminder of what’s important to review and, more importantly, if we need someone to focus deeply in a particular area, the checklist helps with this.

Of course, this process would not be not required for all code reviews, but only for the biggest, most important reviews. Walk-in peer reviews are still useful for quick reviews, SD Pack and email reviews are still useful in small sections of existing code that you’ve already touched, etc. Part of my job here will be to help the team members understand just when the appropriate review is necessary.

LINQ – .NET Language-Integrated Query for Relational Data

LINQ is basically a translation layer between the class paradigm for programming languages and rows in a relational database. Instead of the programming languages using SQL to query the database, they use instances of their classes and LINQ together. LINQ does all the SQL interaction.

When I first started reading about it, it reminded me of “Ruby on Rails”, specifically the “Convention over Configuration” paradigm.

For example, look at this C# class:

[Table(Name="Customers")]
public class Customer
{
public string CustomerID;
public string City;
}

And now this quote from the MSDN help: “The Table attribute has a Name property that you can use to specify the exact name of the database table. If no Name property is supplied, LINQ to SQL will assume the database table has the same name as the class.”

Except, there’s one little problem. Does the class really need to have the same name as the Table, or does the class need to have the same name as the Table minus the “s”?

Hopefully, I’ll remember to update this post when I learn the answer.

Oh, but look at this code!

[Table(Name="Customers")]
 public class Customer
 {
 [Column(IsPrimaryKey=true)]
 public string CustomerID;
 [Column]
 public string City;
 }

Now, there’s a “Column” keyword that tags the data member as a column in a database table. According to the same MSDN Help, “As with the Table attribute, you only need to supply information in the Column attribute if it differs from what can be deduced from your field or property declaration.”

This is “Convention over Configuration” at work.

Code Review Process

If I were to answer the question, “What are the elements of a worthwhile code review?”, to a IT reporter in need, I would answer something like this:

A proper code review is a valuable tool to any software team. The longer that bugs go undiscovered, the more costly it is to fix them. For example, if a developer works out a design on a whiteboard, updates and improvements can be made very cheaply, just by erasing part of the software diagram and starting again. After the product has shipped, an undiscovered bug becomes very expensive to fix, usually through a laborious patching process. A code review is a way to reduce costs by eliminating bugs early. Code reviews occur early enough to catch bugs before they have an opportunity to become very expensive to fix.

But, in order for the code review to be useful, it should be conducted in a responsible manner by a mature team with a good leader. All developers have certainly experienced code review horror stories or at least ones that qualify as wastes of time: Reviewers come to the meeting unprepared and begin reading the code line-by-line, or, reviewers focus on silly issues like “oh, this ‘for’ statement should have a space after the opening parenthesis.”

There are ways to eliminate these sorts of time-waster. There are even ways to measure if your code reviews are worth the time spent. After all, if you’re spending more time in your code review than you could have been fixing bugs, then your code reviews are not useful. Of course, this is rare.

First of all, to eliminate the silly code review comments like spaces or capitalization, you need a good style guide. A style guide describes how the team prefers to comment, how the team prefers to name classes, variables, and methods, and how to format. It should not attempt to convey software best practices, like how best to deal with memory that can’t be allocated, or tracking threads, and so on. It should only focus on style. The style guide should be short enough so that it can be easily grasped by everyone on the team and not cover every nauseating detail so that team members still have a sense of freedom to write the code the way they see fit. After all, writing code can be very personal.

Developers are then held accountable that they will have no style issues in their code. If the team has established good style guidelines which are readily accessible and used frequently, this ought to be easy.

To ensure that everyone prepares in advance, you can create some fun rules. For example, we have simple rule that says, “If you have no red marks on your code print-out’s you’re not allowed into the review. Red marks are the ticket!”

To help the team focus on important issues, a code review checklist is useful. The team can create this together, again keeping it very short so that it’s easy to refer to, and then use it to coordinate focusing on specific areas. Too often, poorly run code reviews has everyone focus on the same superficial issues. A code review checklist can help encourage a smaller group to focus deeply on a specific area, another group to focus on a different area, and so on. This helps the code review achieve depth.

It’s also a good idea to have someone other than the author to run the code review meeting. Software developers tend to be a proud lot and an independent moderator can keep the discussion from getting too personal. A good moderator is someone good at running meetings, keeping people on track and focused on important issues.

Finally, you can calculate just how effective your code reviews are by having two pieces of information: The average bug fix rate for a team member (usually ranges between 1 and 2 per day), and how many major issues you uncovered in your code review. Assume you found five major issues in your last code review and your team’s major bug fix rate is one per day. If there was no code review, it would have cost approximately five days to fix those bugs (the number of bugs times how long it takes to fix a major issue). If your code review had five participants, each prepared for one hour, and the review meeting lasted one hour, the total time is 10 hours. That’s well less than five days, so your code review was worthwhile!

Our team in China is just now adopting these changes and we’re excited about it. We especially enjoy being able to measure the effectiveness our our reviews.

Simple and Confusing Language

In the US, when someone asks, “We don’t have any bugs in our code, do we?”, we answer “No” (OK, don’t get too caught up here in the fact that all software has bugs somewhere.  That “no” is the normal answer, of course, if we’re feeling good about or code.  It’s generally well-designed, written, and tested.)

But, in China, for the exact question, the answer would be “Yes”, which means the exact same thing as a westerner answering “No”.

Translation:

In Chinese, the “Yes” basically means, “Yes, that is correct. We do not have any bugs in our code.”

In the US, the “No” means, “No, there are no bugs in our code.”

See how easy it is for us to misunderstand each other?!? This happened to a tester and me just the other day.

There are three cultural issues that we are establishing here at MACH:

  • Culture of Learning
  • Culture of Curiosity
  • Culture of Awareness

The MACH Side Projects Initiative helps to establish the Culture of Curiosity. Another way, that I started trying was to simply talk about other technologies to team members when we’re at lunch, or after work, when we go to a movie, and so on. The idea is to simply set an example for showing excitement for learning. The most recent example was having a lunch conversation with Gavin about my investigation of Ruby on Rails and how its design paradigms gives me some ideas on how we can improve our team’s design skills. I suppose this all goes back to the famous Gandhi quote: (paraphrasing) “You must be the change you want to see in the world.”

The Most Valuable Skill

I was thinking about this the other day… If there was one skill that members of the team could have to ensure their success, what would it be?

Of course, technical ability, communication skills, blah blah, those are all very important, but I’m thinking more basic.

If I had to pick one, I’d say it’s the ability to read English. Some on our team are quite good at this, others are not so good and it shows. Their ability to absorb larges amounts of important information is limited thus making them less able to quickly understand a technology, a spec, or, yes, even an email. Figuring out the most simple new task can be stopped in it’s tracks because a lack of access to deep information.

Learning English isn’t easy and though everyone at MACH is quite good at the basics of English, they are still living in China. As soon as they go home, they’re back in a world of Chinese. So, it takes much longer to get good enough at English, especially reading, where MACH Team members can absorb at the rate of native speakers.

Consider technical information. MSDN isn’t available in Chinese. In fact, there is only a trickle of technical information on the web in Chinese. Imagine having to research an obsure UAC bug when your English reading skills are sub par. Understanding what is needed to begin solving the bug will take a magnified amount of time. Often, the developer can never solve the bug simply because of a lack of deep access to information.

I’ve also been told by one of our developers that reading technical magazine articles is frustrating because it takes so long just to read and comprehend a few pages. I never would have thought this before I came to MACH, but then I started learning Chinese…and I still can’t read a newspaper.

What to do?

One idea: encourage team members to purchase and read technical books in Chinese. Though many books are reputed to have bad translations, this is at least a start. Reading books in Chinese and applying the learning immediately will help forge a stronger base to learn some of the more arcane details.  And of course, keep up your English training.  How about this:  Read the book in Chinese, and then read it again in English.  Not only do you get the benefits of repetition, but the reader can also more easily start connecting the meanings of technical vocabulary.

Managment vs. Leadership

From an exhibit in Leading Change by John P. Kotter

Management

Planning and budgeting: establishing detailed steps and timetables for achieving needed results, then allocating the resources necessary to make it happen

Organizing and staffing: establishing some structure for accomplishing plan requirements, staffing that structure with individuals, delegating responsibility and authority for carrying out the plan, providing policies and procedures to help guide people, and creating methods or systems to monitor implementation.

Controlling and problem solving: monitoring results, identifying deviations from plan, then plannng an dorganizing to solve these problems

Produces a degree of predictability and order and has the potential to consistently produce the short-term results expected by various stakeholders (e.g., for customers, always being on time; for stockholders, being on budget)

Leadership

Establishing direction: developing a vision of th efuture – often the distant future – and strategies for producing the changes needed to achieve that vision

Aligning people: communicating direction in te words and deeds to all those whose cooperation may be needed so as to influence the creation of teams and coalitions that understand the vision and strategies and that accept their validity

Motivating and inspiring: energizing people to overcome major political, bureaucratic, and resource barriers to change by satisfying basic, but often unfulfilled, human needs

Produces change; often to a dramatic degree, and has the potential to produce extremely useful change (e.g., new products that customers want, new approaches to labor relations that help make a firm more competitive)

Confusing Monday

I noticed simple communication mistakes and inefficiencies today. Here are the ones that I can remember. Now, imagine this multiplied over every day and over email from the US.

Email instructons: “From the dpgcmd project, take the files MediaCommands.h and .cpp and copy them to your new component” were misunderstood. I’m assuming this is an English skill problem. It could also be a way to avoid having to admit that there isn’t enough time to complete the problem!

Email instuctions: “Stop using try/catch in GetInstallTypeFromCmdLine(), GetInstallData(), and LaunchUnattendedInstall()” weren’t understood and required face-to-face followup. English/Cultural issue.

“OEM Unattended Setup must not show any UI” was interpreted to include not even showing a crash dialog! Cultural difference.

Alphabeticlly-organized lists are slowly searched. Whenever we scan a list of files together, the difference in our comparative search time is dramatic – a few seconds difference. English skill problem.

An interesting quote from a colleague here at MACH: “If Americans are much more open about communication in the US, then why are there so many miscommunications still?”

Shuffling Devs

I have a three recent examples of Extreme Programming failure, specifically “Collective Code Ownership”. Some of these examples are serious.

Collective Code Ownership means that everyone is responsible for all the code. Any developer may change any part of the code. An advantage claimed for collective ownership is that it speeds up the development process because if an error occurs in the code any programmer may fix it*.

Thankfully, we’ve abandoned most of the Extreme Programming Principles, but some of the attitudes persist. One example is our attitude towards developer resources. Except for a few key developers, it’s considered “boring” to leave someone working on the same codebase. Teams are often in firefight mode and need help. So, we shuffle developer resources from project to project, especially with the MACH Team, who often play a “sweeper” role.

The effect of this practice is that context, the important background and “whys”, are lost to the followup team being shuffled into a new project. As a result, the implementation isn’t satisfactory and the original team is perplexed with the follow-up team’s implementation, the code gets messier, or, worse, needs to be rewritten. A team that has to pick up where another team left off does not understand the decisions that were made, the history of the project, and the challenges.

Even when team leaders recognize the need to communicate the requisite context, the background description is often so complex, or communicated so poorly, that it’s too hard to grasp. This is multiplied across oceans and languages. It’s analogous to learning a foreign language by reading a book rather than doing.

How can that particular problem be eliminated? Answer: Stop the high frequency of shuffling. “That project is too boring” is not an excuse. It’s up to the leader to make it fun and challenging. I don’t think that any of our products are boring. There are all sorts of fun opportunties. A good leader will find them.

Here are my examples:

OEM Setup – Here’s an example where the MACH Team got lucky. The team was prepared to write a bunch of new (but easy) code to support the OEM setup scenarios when Robert, a WAI developer on 2.0, pointed out that the new code was not needed because the HWSW SDK Web Update APIs supported local downloads. In retrospect, I wish I would have recognized this and pointed the team to Patricia, who wrote the webupdate APIs, but I don’t think I was deep enough in the trenches to see it. Thankfully, a chance review by Robert saved us.

ITP Macro Feature – The Macro feature team was successful in delivering the feature for ITP, but it could have gone more smoothly with less architectural violations and hacks. This project also featured a huge amount of background and context needed, like three weeks worth, before the team could even start real work! My two examples:

Dependencies on the IP codebase violated the NextGen rules and make the component less portable.

We had to hack the code to hide the “Macro command…” and “Quick Turn” commands in ITP. This is easily solvable with the current architecture, but the team was not aware. When it was discovered, it was far too late to turn back.

ENDA – This was a serious lack of communication that resulted in a recreation of an existing architecture. We basically paid a lot more for a less robust and flexible solution. Having the right people involved would have avoided the problem.

I can’t stop now because here’s a fourth example happening right now as I write this entry. A developer in Redmond, who has reviewed MACH’s proposal for OEM’s Auto Update, lacks the background context of the original Auto Update. So, questions are naturally asked. Three MACH developers have now spent nearly two hours (and counting) discussing how to respond to the questions. The MACH Team only recently learned the context themselves. So, you have one developer, twice removed from the original implementation, spinning up a six-hour conversation (three developers times two hours…and counting) to help explain the context. In this problem, the answer is simple: Firmly state that you are remaining cosistent with the original implementation and stick with your design.

I’m not stepping in and to suggesting them what to say. This is their opportunity to learn.

Fires Across the Ocean

The perception: A Redmond development team in gets in to trouble and needs help with fixing bugs. The MACH SW Team has functioned as a bug fixing house before, so we follow with tradition and send the bugs across the ocean. The MACH Team needs to ramp up on a new area and manage a high level of communication overhead (mostly in email). The MACH Team spends much more time than a team in Redmond would to understand what is needed.

There are two cultural influences on this issue that I see:

Mindfulness of authority – The MACH Team is generally more receptive to requests coming from authority figures (the Redmond Team).

Fear of mistakes – The MACH Team is generally more worried about making mistakes than their counterparts are in the US.

The result: A huge amount of wasted time. When large emails come across the ocean describing a task, every single word is scrutinized. Even if not every word is understood, team members are often reluctant to ask for clarification. The problem continues: more confusing and log emails and more time wasted reading emails. Even small suggestions in emails can send the team out on a wild goose chase. A simple suggestion of “You might want to look at this solution” can often be interpreted as “You should understand everything about this option because it might be a solution”. The reason? The cultural influences listed above.

What’s the perception on the Redmond side? My guess: “The MACH SW Team is slow! How can it take 3-4 weeks to fix such a bug?” A bad reputation ensues. Who’s fault is it? I personally don’t blame a single group. I see MACH making efforts to understand through study and training. But, I do not see such efforts in Redmond. Is this their fault? No. So, let’s make people in Redmond aware that communication improvement is a two-way street.

True story: Two team members on the MACH HWSW Team spent over three weeks each on two bugs. My question: How much did this help the team? Was it worth the effort to send these bugs to MACH? How can having two guys occupied for over 6 weeks of man-time on two bugs be valuable in firefight mode?

Six weeks on two bugs is a gross abuse of Pareto’s principle. Enter fear of conviction. Which choice would you pick:

Use your past experience and understand that throwing a bug over the ocean could occupy a magnified amount of someone else’s time. Instead of doing that, you find creative ways to eliminate the problem. Maybe that unit test isn’t so well-conceived, so you take a chance (with good payoff chances) and bend the rule of “always have unit tests” and cut the unit test in favor of a manual test case. You save the remote team lots of time, but get no credit for it because there’s not tangible “time saving” bottom line to credit you with.

Throw the bug over the ocean content that you’re following the rules. It’s now someone else’s problem that they can’t get the bug done in time; you’ve absolved yourself of responsibility.

In our environment, rational folks pick #2 and I don’t blame them, but it’s ridiculously ineffiecient. I see it over here. What is our goal with the MACH Team? To absolve our worry and responsibility or to have a productive team. I know the answer is the latter. To achieve that, the engagement of the MACH SW Team must be improved.