Archive for the ‘Reference Material’ Category

Troubleshooting Techniques: Final Thoughts

Wednesday, December 19th, 2007

A great bullet point to be able to place on your performance evaluation reads something like this:

“Led the team of people that solved the Tractor Beam escalation, which was costing the company $X per day in schedule slippage, and communicated progress to a wide variety of upper management personnel throughout the troubleshooting process.”

Something like this speaks for itself and shows you to be a valuable commodity as someone who can save the day, setting you apart from the people you are being compared with for rewards. Demonstrating a knack for troubleshooting tends to help you develop a reputation. The more successful you are at it, the more often you will be asked to get involved in escalations like this one.

While that does come with its pressures, it also brings its rewards, not the least of which is a strong statement to make to your boss when it is time to pass out raises. It can lead to something more immediate too, like being first in line for personal equipment upgrades. Regardless, performing well under these kinds of circumstances demonstrates you to be a knowledgeable and dependable employee who truly defines grace under pressure.

Troubleshooting Techniques: Step 5 - It’s fixed, now what?

Wednesday, December 19th, 2007

The problem is solved and now you are done, right? Wrong. Now that the crisis has been resolved, there are a few things you should do next. First, communicate that the problem is solved to the interested parties you sent your status updates to earlier in the process and anybody else who might be interested. Again, the rule here is to err on the side of communicating too much rather than too little. Summarize the solution as much as possible, keeping in mind that not everyone will have your point of view or understand the complexity of the problem (or its solution) to the degree you do. Also, be sure to give credit to everyone involved, which is crucially important as you build and maintain relationships. Even if it was your brilliant deduction that uncovered the core problem, someone else probably helped you capture the information upon which you drew your conclusions. This is yet another place where you can make a friend or an enemy based on your behavior. An example is shown in Figure 2.



Figure 2: A sample troubleshooting resolution email

Next, there may have been some process problem that led to the faulty behavior. This could mean incorrect instructions as mentioned previously, but it could also uncover a design, test, or manufacturing flaw of some kind. Look at the point in your product development processes where the problem should have been caught and fixed and suggest any changes that might prevent it from occurring again on subsequent projects.

For your own purposes, make sure to record the incident and include as many specifics as you can. Should this problem, or a similar one, occur in the future it is handy to have the details at your disposal before you forget about them. Having a good desktop searching tool, like Google Desktop Search, installed on your personal computer can make it easier to find these recordings than exclusively relying upon some manual organization of documents you design.

Finally, you have a decision to make. Now that you are the hero of the day because you solved the problem that everybody was worried about, what else do you do with the information? One option is to hoard it, ensuring that you will get the call the next time something similar comes along and giving yourself a chance to save the day again. The downside is that you may get called every time something similar comes along, which puts a great deal of pressure on you and distracts you from the other tasks you are supposed to be working on. The other way to handle it is to teach someone else how to solve the problem you just fixed. This might be the person who called you with the problem to begin with, so that they can help themselves next time, or it might be a coworker with similar technical knowledge. This gets you out of dealing with similar troubleshooting situations in the future, but it also limits your hero opportunities. This is another area where you have to decide for yourself, with experience over time as your guide.

Troubleshooting Techniques: Step 4 - What isn’t broken versus what is?

Wednesday, December 19th, 2007

At this point, you have eliminated the easy solutions. Set up instructions have been verified and the item in question is in a known state. Because the problem appears to occur in multiple, similar instances, environmental issues have been eliminated as a root cause. Nothing has changed recently that is affecting the results either. These first three steps will catch most problems, but there will be times where you need to dig deeper and your knowledge of the product in question, access to information, and your creativity all become bigger factors in finding a solution.

Before you take that step though, it is a good idea to pause and report status to those who might be interested. This may seem a bit bizarre since you do not have a solution yet and may not appear to even be close to one. But, realize that you have demonstrated what is not the problem by performing the first three steps. That in itself is progress and can signal to people in management or other positions of power that this is likely not a simple problem that will easily be solved. Communicate to them the steps that you have completed thus far, what preliminary conclusions you can draw from those steps, and describe what you are about to try next. For a lot of people, just knowing that work is being done on the problem is considered progress. Explaining to them that you have narrowed the scope of the problem by eliminating the simple solutions helps them understand the potential complexity of the situation. Figure 1 shows a good example of such a communication.



Figure 1: A sample troubleshooting status email message

Tracing the logical flow of the device is the next step. Whatever kind of processing is involved, your item likely has one or more forms of input and a similar set of outputs. When stimulated a certain way, the item is expected to react accordingly but does not. With your knowledge of the system, give it a known input and trace its logical flow, measuring intermediate results along the way. The idea, at first, is a process of elimination. There are more things that aren’t related to the problem than there are things that are related to it. Isolate those things that are working from those things that are broken.

For example, suppose your device is a toaster. The first step in its operation is to plug it in. When you do that, can you verify that the unit is receiving electricity, perhaps through some indicator light? If so, then see if a piece of bread will fit correctly in the slots. Assuming that works, will the lever depress and drop the bread in the slots properly? Does that cause the coils to heat up? Does adjusting the darkness knob alter the amount of time that the bread stays in the slots and the coils stay hot as you might expect? When the timer is up, does the bread pop up out of the slots and do the coils cool down? These are the basic steps of toaster operation and walking through them, verifying the expected results along the way, gives a more granular look at the problem. You can discover what is working and what is not, allowing you to focus your efforts on a lower level of investigation.
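For a device whose behavior can be checked programmatically, the same walk-through can be written down as an ordered list of checks. The following is only a minimal sketch: the stage names mirror the toaster steps above, and the check functions are hypothetical stand-ins (simple lambdas with a simulated fault) for whatever measurements your own device allows.

```python
from typing import Callable, List, Optional, Tuple

# Each stage is a (description, check) pair; the check returns True when
# the stage behaves as expected. These checks are hypothetical stand-ins.
Stage = Tuple[str, Callable[[], bool]]


def first_failing_stage(stages: List[Stage]) -> Optional[str]:
    """Run the checks in order and return the first stage that fails."""
    for name, check in stages:
        if not check():
            return name
    return None  # every stage passed


toaster_stages: List[Stage] = [
    ("unit receives electricity", lambda: True),
    ("bread fits in the slots", lambda: True),
    ("lever depresses and drops the bread", lambda: True),
    ("coils heat up", lambda: False),  # simulated fault for illustration
    ("darkness knob alters the toasting time", lambda: True),
    ("bread pops up and coils cool down", lambda: True),
]

failing = first_failing_stage(toaster_stages)
print(failing)
```

Everything before the reported stage has been verified as working, so the investigation can concentrate on that stage and the ones after it.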

Depending upon your specific situation, a similar approach is to start by taking your device apart completely. Slowly put it back together in functional layers, adding pieces and verifying desired results as you go. As you approach the complexity of the entire finished product, you will gain confidence in the working order of the underlying functionality and reduce the set of potential problems to the more elaborate use cases.

Returning to the toaster example, imagine the components detached from one another and strewn on a workbench. First, you connect the coils and the power supply together and plug in the cord. This set up lets you test the basic functionality of a toaster: getting the coils hot enough to partially char bread products. Now add the adjustment knob that dictates how hot the coils get and for how long. Having already established a functional baseline of the coils heating at all, this adds another level of complexity to the system. Then add the lever mechanism that lowers and pops up the bread, which is essentially the final piece that forms the finished product. Similar to the step by step approach, this strategy attempts to isolate pieces of functionality but does so in a slightly different way. The goal is the same, though: find what is working and what is not.
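The layered rebuild can be sketched the same way. In this illustration, the component names follow the toaster example, while `verify` is a hypothetical function that stands in for whatever test you run on the partially assembled device; here it simulates a fault introduced by the adjustment knob.

```python
from typing import Callable, List, Optional, Tuple


def assemble_and_verify(
    components: List[str],
    verify: Callable[[List[str]], bool],
) -> Tuple[Optional[str], List[str]]:
    """Add components one at a time, verifying after each addition.

    Returns the component whose addition first made verification fail
    (or None if everything passed) plus the assembly at that point.
    """
    assembled: List[str] = []
    for part in components:
        assembled.append(part)
        if not verify(assembled):
            return part, assembled
    return None, assembled


# Hypothetical verification: pretend the adjustment knob is the faulty part.
def simulated_verify(assembled: List[str]) -> bool:
    return "adjustment knob" not in assembled


layers = ["coils and power supply", "adjustment knob", "lever mechanism"]
culprit, state = assemble_and_verify(layers, simulated_verify)
print(culprit, state)
```

Because each layer is verified before the next one is added, a failure points at the most recently added piece or at its interaction with the pieces already in place.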

With the scope now narrowed as much as possible, this is the point at which each troubleshooting situation is unique and any formulaic approaches no longer help much. Again, this is where your knowledge of the system and your creativity become assets in trying to figure out what is going wrong and why. Ask yourself some of the following questions:

  • Is there a pattern to the incorrect results given different inputs?
  • Is there any relationship between the sub-steps involved in the flows that are working correctly and the ones that are not working correctly? If so, how is the processing between the two situations different that causes one to succeed and the other to fail?
  • In the search for a potential workaround, can one of the successful flows be altered to approximate the desired output of the broken flow?
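The first question above, whether the incorrect results follow a pattern, can sometimes be answered by tabulating your observations. This is a rough sketch under assumed data: the observation records and their field names (`setting`, `bread`, `ok`) are invented for illustration and are not taken from any real test log.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def failure_rate_by(attribute: str, observations: List[dict]) -> Dict[Hashable, float]:
    """Group observations by one input attribute and compute the failure rate."""
    counts = defaultdict(lambda: [0, 0])  # value -> [failures, total]
    for obs in observations:
        bucket = counts[obs[attribute]]
        bucket[1] += 1
        if not obs["ok"]:
            bucket[0] += 1
    return {value: fails / total for value, (fails, total) in counts.items()}


# Hypothetical observations from repeated toaster runs.
observations = [
    {"setting": 1, "bread": "white", "ok": True},
    {"setting": 5, "bread": "white", "ok": False},
    {"setting": 5, "bread": "rye", "ok": False},
    {"setting": 1, "bread": "rye", "ok": True},
]

by_setting = failure_rate_by("setting", observations)
by_bread = failure_rate_by("bread", observations)
print(by_setting, by_bread)
```

In this made-up data every failure occurs at the highest darkness setting, while the type of bread makes no difference, which suggests looking at the timer or knob rather than the slots.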

Once you reach this stage of troubleshooting an incident, it is difficult to say where the investigation will take you or how long it will take. You may reach a point of diminishing returns, though, where you have invested so much time in finding a solution that it makes more sense to simply use a different device or start from scratch another way. Only you can determine this for certain, and knowing when to do so comes with experience.

Troubleshooting Techniques: Step 3 - Something changed recently, what is it?

Wednesday, December 19th, 2007

The classic customer support call starts with the person reporting the problem claiming, “I didn’t change anything and it suddenly quit working!” Digging for details may reveal that this statement conveniently ignores a power failure, reinstallation of the operating system on a personal computer, a daylight saving time changeover, or the spilling of large quantities of carbonated beverages on the equipment.

Once you have determined whether or not the problem is environmental, the next key is to determine what is different from the last time the device functioned correctly. Approach this like you would if you were helping someone find their car keys, wallet, or eyeglasses. Have them go back to the last time everything was working and conceptually backtrack through the events that have happened since. This exercise will reveal the seemingly insignificant changes in either the device or the environment that could have a big impact on the outcome.
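When the state of the device can be captured as a set of named settings, the backtracking exercise can be partly automated by comparing a snapshot from the last known good state with the current one. The sketch below assumes such snapshots exist as plain dictionaries; the setting names and values are hypothetical.

```python
MISSING = "<absent>"  # placeholder for a setting present in only one snapshot


def changed_settings(last_good: dict, current: dict) -> dict:
    """Return {setting: (before, after)} for every setting that differs."""
    changes = {}
    for key in sorted(set(last_good) | set(current)):
        before = last_good.get(key, MISSING)
        after = current.get(key, MISSING)
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical snapshots taken before and after the problem appeared.
last_good = {"os_version": "10.4", "utc_offset_hours": -6, "firmware": "2.1"}
current = {"os_version": "10.5", "utc_offset_hours": -5, "firmware": "2.1",
           "new_driver": "installed"}

changes = changed_settings(last_good, current)
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

A comparison like this only covers what the snapshots record, so it complements the conversation with the person reporting the problem rather than replacing it.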

Troubleshooting Techniques: Step 2 - Is this happening here or is it happening everywhere?

Wednesday, December 19th, 2007

Any law enforcement officer can tell you that there are two kinds of traffic violations: speeding and everything else. That is, more people get cited for speeding than for all the other types of violations combined. Something similar can be said for troubleshooting. There are two kinds of problems that are generally resolved by troubleshooting: environmental problems and everything else.

The problem that has been brought to your attention may have nothing to do with the device itself, but instead have everything to do with the environment in which it is operating. For example, voltages supplied from electrical outlets vary in different locations throughout the world. In the United States, most plugs have three prongs (two flat, one round) whereas in Germany there are only two (both round). Often, adapters that transform voltages are required to make a device work properly in a particular country. The device itself is the same, but the environment in which it runs is quite different. This is a simple example, but things like this are often the root cause.

After verifying the set up instructions, it is wise to try to duplicate the problem being experienced in an environment other than the one where it is being reported. If the problem is repeatable in a different environment, you know the root cause is related to the device itself. Otherwise, you have just learned that environmental factors are involved. This narrows the scope of possibilities significantly.
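The reasoning in this step amounts to a small decision rule, sketched below. The environment names and the labels returned are illustrative assumptions; `results` simply records whether the problem was reproduced in each place you tried.

```python
from typing import Dict


def classify_root_cause(results: Dict[str, bool], reported_env: str) -> str:
    """Classify the likely root cause from reproduction attempts.

    results maps an environment name to True if the problem was
    reproduced there and False if the device worked correctly.
    """
    if not results.get(reported_env, False):
        return "not reproduced where reported"
    elsewhere = [seen for env, seen in results.items() if env != reported_env]
    if not elsewhere:
        return "unknown: try a second environment"
    if any(elsewhere):
        return "device"  # the problem follows the device
    return "environment"  # the problem stays behind with the environment


# Hypothetical outcomes of two reproduction attempts.
follows_device = classify_root_cause({"customer site": True, "our lab": True}, "customer site")
stays_behind = classify_root_cause({"customer site": True, "our lab": False}, "customer site")
print(follows_device, stays_behind)
```

Either answer is useful, because each one removes a large class of possible causes from the investigation.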

Troubleshooting Techniques: Step 1 - Were all the instructions followed correctly?

Wednesday, December 19th, 2007

Things that are obvious to you are a complete mystery to others who possess a different perspective of the world. Logic would dictate that before escalating a problem such that someone like you is involved in its solution, somebody would have double-checked that the set up instructions were followed correctly. To assume this, unfortunately, is giving others a bit too much credit.

Whatever it is that is broken, it is highly likely that instructions for using it or for installing it were provided to whoever is now having a problem with it. Too often, people either do not read, do not follow, or do not understand directions given to them when setting up a product for use. In situations as simple as inserting batteries backwards in some handheld electronic device, everyone has had this experience. By walking through those instructions yourself and by verifying the correct completion of each of the steps, you establish a baseline for the present state of the device in question that you can rely on for more complex steps later on.

The most important aspect of this step is to use the exact instructions you or your team provided to the person with the problem, rather than relying on your own expertise when verifying the set up. This will help you discover any errors in the instructions themselves. Perhaps they are ambiguous or make assumptions about their readers that they should not. Regardless, a large percentage of problems turn out to be caused by a product that was not initialized correctly. As simple as it sounds, verifying the instructions given to the person experiencing the problem is always a good place to start.

Troubleshooting Techniques: Knowledge, Access, and Creativity

Wednesday, December 19th, 2007

Before specific steps can be discussed, there are a few aspects of a successful troubleshooting session that need to be in place. As demonstrated in the anecdote, the most fundamental aspect of troubleshooting is knowledge. You, or collectively the team of people working on the issue, need to completely understand the components, their interactions, and the sequencing of the equipment that is failing. Otherwise, the solution may go undiscovered because the root cause lies in a gap in the knowledge of the people involved. If you do not have all the knowledge you need, make sure those who do are ready to help you assess the situation.

Just as important is access to the environment where the error is occurring. In order to verify the various execution steps of whatever is broken, you will need to be able to make measurements at multiple points in the processing. If you have to rely on an intermediary to interpret those results and relay them to you, information can get lost in translation, leading you to draw incorrect conclusions.

An example of this is the children’s party game “Telephone”. Line up a number of people side by side. Have the first person in the line whisper a phrase to the second person, who then tries to repeat it to the third person, and so on. By the time the last person tries to repeat the phrase for the group, it has changed dramatically. Each person in the line filtered what they heard and interpreted it for the next person. This illustrates that information gets lost, or changed entirely, when even well-intentioned intermediaries are involved. While direct access is not always possible, it is the best way to guarantee reliable data throughout the troubleshooting process.

Finally, your device is supposed to work a certain way and for some reason it does not. In some cases the reasons for this will be obvious, but creativity comes into play when factors not previously considered are influencing the outcome. Based on your knowledge of the situation and on your more general domain expertise, you may need to apply creative thinking in order to “see” aspects affecting your result that you may not have thought of in earlier phases of your project. When stumped, try to think “outside the box” and reconsider things you may have assumed away previously.

Knowledge. Access. Creativity. Taking these three elements into consideration, the remainder of this chapter takes you through steps that will help you uncover solutions to most troubleshooting problems and what you should do with the results.

Troubleshooting Techniques: Wrath of Khan Anecdote

Wednesday, December 19th, 2007

Star Trek II: The Wrath of Khan is generally considered to be the best film in the series. The first movie was little more than a knee-jerk reaction by Paramount Pictures to the success of Star Wars. It featured kooky light blue jumpsuit uniforms and thinly combined two recycled plots from the original series. Before it really got started, the first film almost grounded the entire franchise, but The Wrath of Khan changed that. It fixed many of the visually distracting aspects of its predecessor and told an original story that was an extension of a popular episode from the television run, resurrecting a classic villain in the process. You may already have known all this, and even if you did not, you are probably asking yourself, “What does this have to do with troubleshooting?” A lot, it turns out.

As the film starts, Khan - a genetically engineered evil genius portrayed by Ricardo Montalbán - escapes his exile on a dead planet, blaming our hero, James T. Kirk (William Shatner, of course), for the passing of his wife while they were marooned. Seeking revenge, Khan commandeers the starship Reliant, a member of the Federation fleet that Kirk serves. He then sets a trap by sending a distress signal, knowing Kirk and the Enterprise will respond. Thinking the Reliant to be a friendly ship, Kirk is completely surprised when he is attacked, and only when the Enterprise is seemingly disabled does Khan reveal himself. He then demands that Kirk hand over the plans for the Genesis Device, a terraforming apparatus that can be utilized as a weapon of mass destruction. With the lives of his crew in his hands and facing a brilliant madman, what does Kirk do? He applies classic troubleshooting techniques.

At this critical point, Kirk realizes the three elements that are critical to any troubleshooting situation. The first is knowledge. He possesses an understanding of how starships work that Khan does not, which he can use to his advantage. Next is access. Based on his credentials as an Admiral, Kirk has security clearance that others do not that gives him a wider array of choices with which to remedy his predicament. The final element is creativity. Taking into account the first two elements, Kirk comes up with a solution that is out of the norm.

If you are a Star Trek fan, you are aware of what happened next. Kirk knows that each ship in Starfleet has a security access prefix code that controls who can issue instructions to that ship’s computer system. He uses his heightened status within Starfleet to obtain this code for Reliant. Finally, after obtaining the code, he uses it to send Reliant instructions to lower its shields, enabling Kirk and the Enterprise to launch a successful surprise counterattack. This forces Reliant to retreat and allows Kirk to escape for the second half of the movie. In the Director’s Edition of The Wrath of Khan, Kirk later says, “We’re alive only because I knew something about these ships that he didn’t.” Knowledge, access, and creativity are all fundamental pieces of troubleshooting demonstrated nicely in this example.

Troubleshooting Techniques: Introduction

Wednesday, December 19th, 2007

When something breaks, somebody needs to fix it, and when they do, gratitude is typically sent in that person’s direction. The very nature of engineering is to create new things. During this process, the creations are inevitably passed around to different people during design, prototyping, testing, and manufacturing phases. As products progress through these steps (or whatever sequence is used in your specific situation), it is rare that everything goes smoothly.

As such, something will break along the way, somebody will receive accolades for fixing it, and those kudos will appear on that person’s performance evaluation. It might as well be you. Few things enhance a work reputation like being the person who fixes broken stuff. This series examines the essential attributes of successful troubleshooting and provides a common set of steps to increase the odds of finding a solution as quickly as possible.

PowerPoint Tactics: Part 8 - Final Thoughts

Wednesday, November 29th, 2006

This series of posts described basic tactics for PowerPoint slides, among the most common forms of business communication. The organization and tone of a presentation play a critical role in how your thoughts are conveyed to others. Other aspects, such as knowing your time and audience, listening to input from others, and being sensitive to cultural boundaries, all contribute to your ability to get others to understand (and hopefully get behind) your ideas.

These techniques form the foundation that leads to achievements you can place on your performance evaluation. Being able to interact with customers (who buy your products), partners (who may be suppliers or retailers that help your company build or sell your products), and upper management (who pay your salary) is an accomplishment that distinguishes you from your peers. You will never get the chance, though, if you cannot clearly and concisely communicate your vision to others.

If you stammer through an oral presentation that goes off on wild tangents and nobody understands what your point was, or if you are the only one who grasps the concepts in your design documentation, you will never have the chance to get yourself in front of anyone significant beyond your own project group. Conversely, if you can present well organized information in both oral and written forms while accepting input from others and clearly articulating key points, you will stand out. Your manager will think of you as someone who is comfortable explaining things to others, and this will result in the highly sought after interaction opportunities with customers, partners, and upper management that lead to a great performance evaluation.