I deal daily with customers and the problems they encounter using Microsoft technologies and products, and I help them troubleshoot and resolve those problems. At the beginning of a new support call (after I pick the case up from the incoming queue and put it in my wipbin, as my personal queue is called in our CRM software) I call the customer to discuss the problem and get a detailed overview: the faulting application, the environment where it is running (or is supposed to run), the error they get, the circumstances under which they get it (load peaks, specific operations going on at that time on the server, any behavior reproducible from one of the clients…), side effects (high memory, high CPU, slow performance…), and how they get rid of it, at least temporarily (recycling the application pool, iisreset, recompiling…). Quite often I end up asking every customer almost the same questions to build a picture of the situation, and once that picture is clear in my mind I can put together an action plan and start digging for the bad guy.
Needless to say, those first steps are very important to start off on the right foot: a misunderstanding at this stage could send our work in the wrong direction, and we could easily end up looking for the problem in the wrong place, distracted by facts which have nothing to do with the problem we need to solve. As you can imagine, this brings frustration (for both the customer and me) and wastes precious time while the users are still affected by the problem… not a nice situation!
Sometimes, even after what I considered a satisfactory first call with a customer, when I thought I had everything I needed to start working on the problem, I found that something very important was still missing… If a customer tells me “My application is crashing randomly” I tend to trust them, and I suggest an action plan suited to a process crash (for example, capturing a crash dump with adplus). Well… the problem is that sometimes the worker process is not really crashing, and what the customer refers to as a crash is some other kind of exception, one which affects the application and of course gives the users a bad experience, but which needs a different troubleshooting approach… I learnt this lesson the hard way: a customer and I once spent 2-3 days desperately trying to understand why we were not able to capture a dump, only to discover that we were using the same term with different meanings… (for the record: now I’m always very clear with customers about what really is a crash and what is not!).
So, to clarify: we have a real crash when, looking at your ASP.NET worker process (aspnet_wp.exe under Windows 2000/XP, w3wp.exe under Windows 2003) at the time the problem occurs, you see that it gets a new PID (process ID); this means that the process has somehow been recycled (either by IIS because of a configuration setting, or because of a critical error which terminated the process). Looking at your System event log you’ll also see a message similar to “a process serving application pool xxxx terminated unexpectedly”.
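To make the PID check concrete, here is a minimal sketch in Python that takes two snapshots of the worker process PIDs and compares them. It is only an illustration: the function names are made up for this post, and it assumes the usual column order of `tasklist /FO CSV /NH` output (image name first, PID second), so verify that on your own server before relying on it.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Return the set of PIDs found in `tasklist /FO CSV /NH` output.

    Assumption: the PID is the second CSV column. Rows that do not
    match (such as the "INFO: No tasks..." message) are skipped.
    """
    pids = set()
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2 and row[1].strip().isdigit():
            pids.add(int(row[1]))
    return pids


def worker_pids(image_name="w3wp.exe"):
    """Snapshot the PIDs of the worker process (Windows only).

    Use "aspnet_wp.exe" as image_name on Windows 2000/XP.
    """
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tasklist_csv(out)


def compare_snapshots(before, after):
    """Report which PIDs disappeared and which are new between snapshots."""
    return {"gone": before - after, "new": after - before}
```

Take one snapshot with `worker_pids()` while the application is healthy and another right after the problem shows up. A PID listed under "gone" with a different one under "new" tells you the process was replaced; if the PIDs are unchanged, you are probably not looking at a crash.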
Other types of problems/exceptions require a different approach (sometimes we can use adplus with the -crash switch to get information about first chance exceptions, but that’s a different story). If the application is somehow blocked and stops responding to new incoming HTTP requests, you have a hang/deadlock (most likely because all of your threads are busy doing other things rather than serving new requests…). In this case, even if you get an exception or an ugly error message (“Server too busy” is quite a bad one), you don’t have a crash, and we’ll probably need to capture a hang dump to start looking inside the process…
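The distinction between a crash and a hang can be summed up as a small decision function. This is only a sketch of the reasoning in this post, not a tool we use in support: the function name, its three yes/no inputs, and the wording of the suggestions are all invented for illustration.

```python
def classify_symptom(pid_changed, termination_logged, responding):
    """Map three observations to a rough first diagnosis.

    pid_changed:        the worker process got a new PID when the problem occurred
    termination_logged: the System event log shows the application pool's
                        process "terminated unexpectedly"
    responding:         the application still answers new HTTP requests
    """
    if pid_changed and termination_logged:
        return "crash: capture a crash dump"
    if pid_changed:
        # New PID without the event log message: the process may have been
        # recycled because of a configuration setting rather than an error.
        return "recycle: check the recycling configuration"
    if not responding:
        return "hang: capture a hang dump"
    return "other: needs a different troubleshooting approach"
```

For example, a customer who sees “Server too busy” while the PID stays the same would pass `pid_changed=False` and `responding=False`, which lands on the hang branch and not the crash one.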
Hope this makes sense. If you ever need to raise a support call, be prepared to give as much detail as possible to help us understand your problem, so that we can suggest the most appropriate action plan to address the issue.
Cheers
Carlo