Hyper-V backups and spurious entries in the plug and play database
For several months, i’ve had a problem on a Hyper-V host described WS08 and the black screen of waiting. Basically, the machine boots up, hangs 50 minutes being completely unresponsive, and then goes on working perfectly for weeks.
The problem was resolved (temporarily) by deleting shadow copies, but it still exists. As i’ve had time this weekend to investigate this closely, i’m pretty sure that i found the root cause of the problem, but i have no solution yet. Remember, this is all just a theory i cooked up – i’m putting this information out there in case anyone else has a similar problem.
My theory is that this is related to Plug & Play manager running enumeration of devices left by the Hyper-V VSS writer backup.
On the affected machine, the C:\windows\system32\config\SYSTEM file is around 170 MB. Using dureg, i could boil this down to two registry keys:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
Which are about 6 megabytes each, when looking at them using dureg:
C:\Users\z-l.beeler\Desktop>dureg.exe /lm “SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk”
Size of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk: 6575468
Since this machine has been operational since about a year, with daily backups (BE12.5), it is much more pronounced here than on other machines. The Virtual Disk being part of the backup procedure is visible in the System log – it produces errors during the backup and Microsoft even has a KB article on the issue KB958669.
The eventlog on the affected machine looks like this:
18:02 The quota minifilter driver completed rescanning directories under quota management on volume “\Device\HarddiskVolume3 (G:)”. All quota information is up-to-date.
18:48 The Plug and Play service entered the running state.
Which for me further indicates that there is some kind of issue with the Plug and Play service. Unfortunately, the machine is not reachable remotely during the issue, but my guess would be that the Plug and Play service is hung in a “Starting” state, causing the lockup issue because of kernel interactions.
Unfortunately, i don’t have enough information and i’m not sure if deleting random registry keys is a good approach on this. I’ve posted on MCSEboard.de and the TechNet Forums – in the hope of getting valuable feedback from other long-term Hyper-V users.
Update: I don’t have a solution yet, but i’ve received a few insights. Thanks to zahni from MCSEBoard.de i got a link to KB959476, which doesn’t match my specific issue, but definitively goes into the right direction.
I’ve also found the Device Remover software, which gives me a clear graphic representation of the issue – over 9500 devices on the affected server. It even offers a removal function, but i don’t want to risk using this tool on a production server.
I’ve also opened a case with Microsoft PSS, in hope of getting an official solution to this issue soon.
Update 2:Removing the devices cut down the number of devices to about 300. I did this after Microsoft PSS recommended me to remove them. As i assumed, this resolved the issue during boot-up hang. Unfortunately, even after installing WS08 SP2, the machines still creates new virtual hard drives when running backup. I will try to get this resolved completely.
