Top to bottom troubleshooting: Part 1

In this post I will discuss the techniques that I have found to be the most tried-and-true methods for fixing broken Windows machines.  Granted, there are a million and one ways that things can go wrong (as we all know), but this approach fixes about 9 out of 10 computers that make their way to me.  For the other 1 in 10, I haven’t discovered a bulletproof technique, as each of those issues is usually something entirely unique; they may lead to future posts.

The first and absolute most important step in the troubleshooting process is ensuring that the hardware of a system is functioning properly.  The reason is simple: oftentimes a machine displays wacky symptoms due to bad hardware, and you end up chasing the symptom rather than fixing the core problem.  I have seen it (and learned it painfully myself) too many times before: a piece of hardware is faulty and causes the computer to fail at different points, which makes it impossible to isolate problems effectively and causes many headaches in the troubleshooting process.

Physical Inspection

The easiest and often most overlooked technique for fixing an issue quickly is to simply crack open the case and check for symptoms.  You would be surprised how filthy a neglected case can become over time, so vacuuming the case and blowing it out with compressed air is just as important as any other step in the process, laptops included.

After cleaning out the case, inspect the internal components for physical damage.  So many times I have seen leaky capacitors cause sporadic issues on otherwise perfectly running machines.  At this point you should also check that the fans are running (especially on graphics cards), the CPU heat sink is clean, the components are wired correctly, etc.

Power Supply

This step can be tricky, and is something that you just seem to get a feeling for over time (it certainly doesn’t hurt to try at any stage, but it can sometimes be more work than is necessary).  I have been seeing fewer and fewer bad power supplies recently, so I don’t know if the manufacturing quality has gone up or if I have just had good luck.

There are power supply testers that can be purchased, but I will usually just grab a known good power supply off the shelf and hook it up to test whether the original power supply is bad or in the process of failing.  Simple enough.  This issue is pretty straightforward: either it works or it doesn’t.  Green good.  If the power supply doesn’t work then you won’t be able to test anything else.

Hard Drive (doesn’t apply to SSDs)

This is the stage where I see the majority of problems.  It is vital to ensure the hard drive is healthy and working properly.  The most battle-tested and reliable tool in my bag for hardware troubleshooting is “Drive Fitness Test” from Hitachi.

Essentially, this tool scans the hard drive for bad sectors and also tests its S.M.A.R.T. features.  It is simple in function but is so often overlooked in the process.  Another tool for testing drives is “MHDD”.  This tool is VERY comprehensive in its analysis of the drive but unfortunately lacks good documentation (it was made by some mad scientist Russian dude), so there is somewhat of a learning curve to it.
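For a quick look at the S.M.A.R.T. side of things from a running system, here is a minimal sketch in Python.  It assumes the attribute table printed by `smartctl -A` from smartmontools, which is not one of the tools discussed above.  The watch list, the function name `find_suspect_attributes` and the sample text are all made up for illustration and do not come from a real drive.

```python
# Attributes that commonly point at failing sectors (an assumed watch list).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def find_suspect_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is nonzero."""
    suspects = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # the header row and blank lines are skipped by this check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            suspects[name] = int(raw)
    return suspects


# Illustrative sample only, laid out like the `smartctl -A` attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(find_suspect_attributes(SAMPLE))
```

A nonzero count on any of these attributes is a hint to run a full surface scan with a dedicated tool, not a verdict on its own.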

Memory

Another important step to consider is testing the memory modules of a misbehaving machine.  Although memory is the least common cause of failures, it is an important step in the process because skipping it may come back to bite you later, the same way a leaky cap or some other simple, overlooked step can.  The go-to tool for testing RAM is called “Memtest86”.  It is found on most Linux distributions these days, so if you have an old disc lying around you are ready to rock.
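To give a feel for what a memory tester does, here is a toy sketch in Python of the write-then-verify idea: fill a block with a known bit pattern, read it back, and report any offsets that differ.  This is not how Memtest works internally, and a script like this cannot test physical RAM from inside an operating system.  The names `pattern_check` and `StuckBitBuffer` and the choice of patterns are assumptions made for illustration.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, two alternating-bit patterns


def pattern_check(buffer, patterns=PATTERNS):
    """Write each pattern across the buffer, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples; an empty list means
    no errors were seen.
    """
    errors = []
    for pattern in patterns:
        for i in range(len(buffer)):
            buffer[i] = pattern
        for i in range(len(buffer)):
            if buffer[i] != pattern:
                errors.append((i, pattern, buffer[i]))
    return errors


class StuckBitBuffer:
    """Simulated faulty module: the lowest bit of one cell is stuck at 1."""

    def __init__(self, size, bad_offset):
        self._cells = [0] * size
        self._bad_offset = bad_offset

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, index):
        return self._cells[index]

    def __setitem__(self, index, value):
        if index == self._bad_offset:
            value |= 0x01  # the stuck bit corrupts whatever is written here
        self._cells[index] = value


if __name__ == "__main__":
    print(pattern_check(bytearray(64)))          # healthy buffer
    print(pattern_check(StuckBitBuffer(16, 3)))  # simulated bad cell at offset 3
```

Using several patterns matters because a stuck bit only shows up when the pattern tries to set it to the opposite value; here the fault is invisible to 0xFF and 0x55 but caught by 0x00 and 0xAA.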

Conclusion

These are some of the fundamentals that I have painstakingly learned the hard way over the past 5 years.  There are many, many other tools for testing out faulty pieces of hardware.  Even with so many options for testing tools, I keep coming back to these basics time and time again.  They have really become a foundation in my troubleshooting techniques and just seem to get the job done.

So to recap, here is the general order in which things should be checked when testing for broken hardware:

  • Physical Inspection
  • Power Supply
  • Hard Drive
  • Memory
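If it helps to see the order written down as a procedure, here is a small sketch in Python.  The function `run_checks` and the placeholder checks are hypothetical; in practice each stage is a manual step or a run of one of the tools above.  Stopping at the first failure is my own assumption for the sketch, on the reasoning that a bad component should be dealt with before trusting results further down the list.

```python
def run_checks(checks):
    """Run (name, check) pairs in order and stop at the first failure.

    Each check is a callable returning True when that stage looks healthy.
    Returns the name of the first failing stage, or None if every stage passed.
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Placeholder results standing in for real inspections and tool runs.
HARDWARE_CHECKS = [
    ("Physical Inspection", lambda: True),
    ("Power Supply", lambda: True),
    ("Hard Drive", lambda: False),  # pretend the drive scan found bad sectors
    ("Memory", lambda: True),
]

if __name__ == "__main__":
    print(run_checks(HARDWARE_CHECKS))
```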

Next Up: Infection Removal

Josh Reichardt

Josh is the creator of this blog, a system administrator and a contributor to other technology communities such as /r/sysadmin and Ops School. You can also find him on Twitter and Facebook.