So, for the last several months we have been troubleshooting this issue. There was no pattern to when servers would freeze. At any given time, any of the four servers we have in production could freeze. There was also no consistent user base on the server that would freeze (the only consistency being that they weren't too happy when it happened). After bringing in several of the consultants who helped set up this environment initially, we took our case to Citrix. Several log and memory dump files later, they came to the conclusion that Internet Explorer was causing our servers to lock up. Naturally, I then presented all of this information to Microsoft support. Upon further analysis, they discovered that we were experiencing a bug that has been resolved by a hotfix:
http://support.microsoft.com/kb/976674
Basically, the hotfix resolves an issue that occurs when Windows Server 2008 or Windows Vista is under heavy load and there is a lot of network share access going on. Well, in our case, user profiles live on a network share, Outlook PST files are out on a network share, the users' other file shares are network shares, and the list goes on. Since applying this hotfix (a little over two weeks ago) we have not experienced any freezes. Good news for everyone.
If anyone is interested in the detailed symptoms:
- User sessions (terminal services/Citrix) would become completely unresponsive
- The server would become unresponsive even at the console level
- The server would respond to pings
- Apparently anything already in memory at the time of the freeze would continue to function, but as soon as you tried to access something else, the session would freeze
- The only workaround when this occurred was to hard reboot the server
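
One practical consequence of the symptoms above: because a frozen server kept answering pings, a ping-only monitor would not have flagged it. Below is a minimal Python sketch of a deeper check that tries to list a directory on a share and treats a hang or an error as "unresponsive". It is illustrative only and is not what was used to diagnose this problem; the function names, the timeout values, and the example share path are all assumptions.

```python
import os
import threading


def probe(action, timeout_s=5.0):
    """Run action() in a daemon thread.

    Returns True only if action() finishes within timeout_s seconds
    without raising OSError. A call that hangs counts as a failure.
    """
    result = {"ok": False}

    def worker():
        try:
            action()
            result["ok"] = True
        except OSError:
            # Covers missing paths, permission errors, network errors.
            result["ok"] = False

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)  # give up waiting after timeout_s seconds
    return (not t.is_alive()) and result["ok"]


def share_is_responsive(path, timeout_s=5.0):
    """Check that a directory listing on `path` completes in time."""
    return probe(lambda: os.listdir(path), timeout_s)


if __name__ == "__main__":
    # Hypothetical UNC path; substitute a share hosted on the server to watch.
    share = r"\\server01\monitor$"
    print("responsive" if share_is_responsive(share, 5.0) else "UNRESPONSIVE")
```

Run from a separate monitoring machine, a check like this exercises file access on the server rather than just its network stack. The worker is a daemon thread so that a listing which never returns does not keep the monitoring script from exiting.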