Archive for May 2009

Hyper-V backups and spurious entries in the plug and play database

For several months, i’ve had a problem on a Hyper-V host, described in WS08 and the black screen of waiting. Basically, the machine boots up, hangs for about 50 minutes while being completely unresponsive, and then goes on working perfectly for weeks.

The problem was resolved (temporarily) by deleting shadow copies, but it still exists. As i’ve had time this weekend to investigate this closely, i’m pretty sure that i found the root cause of the problem, but i have no solution yet. Remember, this is all just a theory i cooked up – i’m putting this information out there in case anyone else has a similar problem.

My theory is that this is related to the Plug & Play manager enumerating devices left behind by backups that use the Hyper-V VSS writer.

On the affected machine, the C:\windows\system32\config\SYSTEM file is around 170 MB. Using dureg, i could boil this down to two registry keys:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b}

Which are about 6 megabytes each, when looking at them using dureg:

C:\Users\z-l.beeler\Desktop>dureg.exe /lm "SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk"
Size of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk: 6575468

Since this machine has been operational for about a year, with daily backups (BE12.5), the problem is much more pronounced here than on other machines. The Virtual Disk being part of the backup procedure is visible in the System log – it produces errors during the backup, and Microsoft even has a KB article on the issue (KB958669).

The eventlog on the affected machine looks like this:
18:02 The quota minifilter driver completed rescanning directories under quota management on volume “\Device\HarddiskVolume3 (G:)”. All quota information is up-to-date.
18:48 The Plug and Play service entered the running state.

Which for me further indicates that there is some kind of issue with the Plug and Play service. Unfortunately, the machine is not reachable remotely during the issue, but my guess would be that the Plug and Play service is hung in a “Starting” state, causing the lockup issue because of kernel interactions.

Unfortunately, i don’t have enough information, and i’m not sure that deleting random registry keys is a good approach to this. I’ve posted on MCSEboard.de and the TechNet Forums, in the hope of getting valuable feedback from other long-term Hyper-V users.

Update: I don’t have a solution yet, but i’ve received a few insights. Thanks to zahni from MCSEBoard.de i got a link to KB959476, which doesn’t match my specific issue, but definitely points in the right direction.

I’ve also found the Device Remover software, which gives me a clear graphic representation of the issue – over 9500 devices on the affected server. It even offers a removal function, but i don’t want to risk using this tool on a production server.

I’ve also opened a case with Microsoft PSS, in the hope of getting an official solution to this issue soon.

Update 2: Removing the devices cut down the number of devices to about 300. I did this after Microsoft PSS recommended removing them. As i assumed, this resolved the boot-up hang. Unfortunately, even after installing WS08 SP2, the machine still creates new virtual hard drives when running a backup. I will try to get this resolved completely.

Windows Server 2008 SP2 and the crashing Network Policy Server

Since SP2 was released on April 30th, i’ve installed it on a few non-critical machines.

One of these runs our TS Gateway Server and our NPS Server for Wireless LAN authentication.

Unfortunately, since the SP2 installation, the NPS service has started crashing, taking several other services with it.

The error message is as follows:

Faulting application svchost.exe_IAS, version 6.0.6001.18000, time stamp 0x47919291,
faulting module msvcrt.dll, version 7.0.6002.18005, time stamp 0x49e04189, exception code 0xc0000005,
fault offset 0x0000000000001467, process id 0x1444, application start time 0x01c9d570f76f56bc.
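The numeric fields in that message can be decoded by hand. The following is only a rough sketch and not part of the original troubleshooting: exception code 0xc0000005 is the NTSTATUS value for an access violation, and the application start time appears to be a Windows FILETIME, meaning 100-nanosecond ticks since 1601-01-01 UTC. The helper names and the tiny lookup table are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Small, deliberately incomplete lookup of well-known NTSTATUS codes.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}

# FILETIME counts 100-nanosecond ticks from this moment.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def describe_exception(code: int) -> str:
    """Return a symbolic name for an exception code, if we know one."""
    return NTSTATUS_NAMES.get(code, f"unknown (0x{code:08x})")


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME tick count to an aware UTC datetime."""
    # Ten ticks per microsecond; integer math avoids float rounding.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


if __name__ == "__main__":
    # Values copied from the error message above.
    print(describe_exception(0xC0000005))
    print(filetime_to_datetime(0x01C9D570F76F56BC))
```

Under that FILETIME assumption, the start time decodes to a timestamp in May 2009, which fits the period right after the SP2 installation.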

I’ve found one other reference to this issue on the TechNet Forums.

I’ve uninstalled SP2 and delayed SP2 deployment until this has been resolved.

Don’t buy ZyXEL equipment

I’ve had my share of experiences with ZyXEL equipment, like the ZyWALL vs. Exchange post i did a few years ago.

But today i experienced the gravest issue with their equipment so far, one that critically impacted a customer’s business.

The customer has two sites – an HQ with an SBS 2008 and a branch office with two Lenovo SFF machines running Windows Vista Business. Both sites are using 20/2 VDSL lines from Swisscom, with ZyXEL P-2802HWL routers.

There is an IPsec VPN configured between these two sites. This has been working fine since January.

Now, about a month ago a telecom service company installed VoIP telephones in the branch office, and enabled QoS on both ZyXEL routers.

Since then, Outlook was unable to synchronize correctly with the SBS server. Unfortunately, the customer’s personnel aren’t that technically savvy, so they weren’t able to tell that they had a problem – because smaller e-mails synchronized successfully, but larger ones failed. This led to very inconsistent states of the OST files, with some mails there and some mails not there.

When i arrived at the branch office i didn’t have a single clue what the issue might be. At first i suspected an Outlook problem, so i deleted the OST file. But from there on, nothing happened – Outlook wasn’t able to download anything.

Next, i tried to copy a 50kbyte Excel file from a share to the local computer. This worked. So i tried a 2 megabyte Word file. This failed about halfway through, with Explorer just hanging there and doing nothing. From that point on, i suspected a network issue, but the fact that copying a 50kbyte file worked and a 2 megabyte file didn’t was very odd.

Using Outlook with Outlook Anywhere also worked (when the VPN tunnel was downed).

Whenever i’m confronted with strange network problems, i suspect MTU issues (an MTU issue was the first “real” network problem i solved, back on my first ADSL line – it took me weeks to find a simple fix). ping -l 5000 CUSTSBS01 worked. ping -l 15000 CUSTSBS01 worked, too. So i thought it wasn’t an MTU issue.
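One caveat on that test: without the -f (don't fragment) flag, Windows ping lets the IP layer split a large echo request into fragments, so a successful ping -l 5000 says little about the path MTU. Below is a minimal sketch of the arithmetic, assuming plain IPv4 without options (20-byte header) and an 8-byte ICMP header. The function names are illustrative only.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -f -l <size>` payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fragments_needed(payload: int, mtu: int) -> int:
    """IP fragments used for an echo request when fragmentation is allowed."""
    data = payload + ICMP_HEADER  # the ICMP header is part of the IP payload
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-data // per_fragment)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 1300):
        print(mtu, max_ping_payload(mtu))
    print(fragments_needed(5000, 1500))
```

With these numbers, something like ping -f -l 1472 CUSTSBS01 (stepping the size down if it fails) would probe the path MTU more directly than ping -l 5000, which is simply sent as several fragments.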

Disabling QoS on the ZyXEL router fixed the issue, but made the phones unusable while Outlook was filling its OST files.

So i ran through the usual check points – tcp checksum offloading, chimney, receive window autotuning, reboots, etc. Nothing helped. In the end i was just changing network settings at will, without any effect.

Out of reasonable ideas, i changed the MTU to 1300. That fixed it – with QoS enabled and the NIC MTU of the two machines set to 1300, everything was working as it should. File transfers worked, Outlook worked, phones worked.
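The post does not say how the MTU was changed. On Windows Vista, one common way is netsh interface ipv4 set subinterface. The sketch below only builds that command line so it can be reviewed before running it. The interface name "Local Area Connection" is an assumption (the Vista default), and the helper name is made up.

```python
def netsh_set_mtu(interface: str, mtu: int, persistent: bool = True) -> str:
    """Build (but do not run) a netsh command that sets the IPv4 MTU."""
    if mtu < 68:
        # 68 bytes is the smallest datagram every IPv4 link must carry.
        raise ValueError("MTU is below the IPv4 minimum of 68 bytes")
    store = "persistent" if persistent else "active"
    return (
        f'netsh interface ipv4 set subinterface "{interface}" '
        f"mtu={mtu} store={store}"
    )


if __name__ == "__main__":
    # Interface name is hypothetical; list the real ones with
    # `netsh interface ipv4 show subinterfaces`.
    print(netsh_set_mtu("Local Area Connection", 1300))
```

The command needs an elevated prompt, and store=persistent keeps the setting across reboots.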

Don’t buy ZyXEL.

Two weeks on Windows 7 RC

Since the 30th of April, Windows 7 RC is available. I’ve been using Windows 7 for quite some time, but that usually doesn’t tell us much about end user experience with Windows 7.

At work, we’ve decided to move several people with a strong technical background over to Windows 7 x64 (if they want, of course), in order to drive internal testing, gather usage data and generally raise awareness among the whole personnel at the company and also our customers.

By now, i’ve migrated 8 laptops to Windows 7 RC – people are working with them in production and using them for their everyday work. Of course, in case we run into real trouble with Windows 7, we still have a few spare laptops that run Windows Vista SP2 x32.

The migration from Windows Vista to Windows 7 has gone without any major issues, and far more smoothly than a move from XP to Windows 7 would. Most of this can probably be attributed to the fact that all the applications we use internally are compatible with Windows Vista, and that we have gained a lot of experience with the new deployment model and tools available since Windows Vista.

Still, we ran into a few smaller problems that are mostly unresolved as of yet, but do not have any major impact.

We use Lenovo T60, R61, T61, T500, W500 and R500 laptops. All of these have been running Windows Vista SP1 x32 with BitLocker enabled in TPM+PIN mode. We installed Windows 7 using Clean (Custom), without formatting the hard drive first – this required us to suspend BitLocker protection in Windows Vista before running setup. Two devices were reformatted – at the wish of the person using them.

I also upgraded all laptops to 4GB of RAM – which now can actually be used. For example, my W500 with Vista x32 only saw 2.25GB of the 4GB RAM (not a typo – just over 2GB).

My biggest issue was that BitLocker on Windows 7 didn’t properly back up its BitLocker key and TPM information to Active Directory. This is a major issue, as i now had to manually back up the BitLocker keys to a secure network share. I didn’t find much about this on the Web; i suspect that not many people use this functionality, and there’s almost no documentation available about Windows 7 BitLocker. As the workaround of saving the key works just as well, i can live with this.

The fingerprint reader installed on all those Thinkpads has a driver available, but the different drivers have different issues (most of them just crash when using them). I didn’t try installing the Lenovo tools. We don’t use the fingerprint readers, so that’s a non-issue for me, but if you do, this might require some investigation.

Switchable graphics on the W500 and T500 doesn’t work. Also, the Intel GMA adapter seems to be a lot slower than it was under Windows Vista – so i switched these devices to the internal ATI graphics card. No issues with that, except higher power usage.

WSUS does not contain Windows 7 updates – which makes perfect sense. I created a new WMI filter and a GPO to ensure that Windows 7 got updates directly from Microsoft.
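The filter itself isn't shown here. A WQL query often used for this kind of GPO filter is SELECT * FROM Win32_OperatingSystem WHERE Version LIKE "6.1%" AND ProductType = "1". The sketch below mimics that query in Python to show which machines it would match; it is an assumption about the filter, not the one actually deployed.

```python
# WQL as it would be pasted into a GPMC WMI filter (root\CIMv2 namespace).
WQL_WINDOWS7_CLIENT = (
    "SELECT * FROM Win32_OperatingSystem "
    'WHERE Version LIKE "6.1%" AND ProductType = "1"'
)


def is_windows7_client(version: str, product_type: int) -> bool:
    """Mimic the WQL filter: kernel version 6.1.x on a workstation SKU."""
    # ProductType: 1 = workstation, 2 = domain controller, 3 = server.
    return version.startswith("6.1") and product_type == 1


if __name__ == "__main__":
    samples = [
        ("6.1.7100", 1),  # Windows 7 RC
        ("6.0.6002", 1),  # Windows Vista SP2
        ("6.1.7600", 3),  # Windows Server 2008 R2
    ]
    for version, product_type in samples:
        print(version, product_type, is_windows7_client(version, product_type))
```

The ProductType check matters because Windows Server 2008 R2 shares the 6.1 version number with Windows 7.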

After installing Windows 7 on the devices, all hardware including UMTS modems worked perfectly. Intel AMT doesn’t have Windows 7 drivers yet, but we don’t use that either.

I migrated user data using USMT Hardlink Migration, for which i created a nice batch file using the idea from this feature walkthrough.
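The batch file isn't reproduced here, so the following is only a sketch of what a hard-link migration generally involves: scanstate.exe run with /hardlink /nocompress before setup, and loadstate.exe with the same switches afterwards. The store path C:\USMTStore and the helper names are hypothetical. MigApp.xml and MigUser.xml are the rule files that ship with USMT.

```python
from typing import List

# Standard migration rule files that ship with USMT.
MIGRATION_XML = ["MigApp.xml", "MigUser.xml"]


def scanstate_args(store: str) -> List[str]:
    """Arguments for capturing user state into a hard-link store."""
    # /o overwrites an existing store; /hardlink requires /nocompress.
    args = ["scanstate.exe", store, "/hardlink", "/nocompress", "/o"]
    args += [f"/i:{name}" for name in MIGRATION_XML]
    return args


def loadstate_args(store: str) -> List[str]:
    """Arguments for restoring user state from the same store."""
    args = ["loadstate.exe", store, "/hardlink", "/nocompress"]
    args += [f"/i:{name}" for name in MIGRATION_XML]
    return args


if __name__ == "__main__":
    store = r"C:\USMTStore"  # hypothetical path on the volume being migrated
    print(" ".join(scanstate_args(store)))  # run on the old OS before setup
    print(" ".join(loadstate_args(store)))  # run on Windows 7 after setup
```

Because a hard-link store only links to the existing files instead of copying them, it has to live on the same volume as the data, which is why this approach fits an install without formatting the drive.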

I’ll keep you up to date – there’s one more machine considered for migration next week, and after a few weeks i’ll have proper feedback from the power users at my office. I’ll even try to persuade our head of sales and our CEO to try Windows 7, just for the heck of it.

Exam 70-680: TS Windows 7, Configuring

This morning i attended the Beta for Exam 70-680 – i was one of the lucky few that got a seat in this beta.

I already did 70-270 (Windows XP) and 70-620 (Windows Vista) two years ago, and the Vista exam was far too easy for my taste. It took me about 20 minutes, and i walked out with a score of about 900. That’s not good – questions that are too easy will just devalue the certification.

With this in mind, i expected 70-680 to get Microsoft back on track, and they did. The exam has much better and much more difficult questions than 70-620. Not questions which require you to memorize stuff, but questions which require you to understand the subject matter.

As usual for beta exams, there were no simulations, VM tasks or anything else except multiple choice questions. I can understand why that’s the case (they probably want to use the final version for that), but i’m still not entirely happy with this as it is.

One thing that was new in this exam is that you get a questionnaire that asks you to judge your own knowledge level on Windows 7. Several fields are presented, in which you have to choose between very high, high, mediocre, low and very low skills – another question asks how much experience you already have with Windows 7 (with options such as “Over a year”).

I think that’s a good idea – most exam betas are open now, which means that many less-skilled people will also attend them. As long as those are truthful, this can actually help to improve the exam.

Unfortunately, i had great difficulty figuring out my personal baseline. I opted to choose either High or Mediocre for most answers, but was that correct? What does high mean? What does mediocre mean? What’s my knowledge level?

It might make sense to ask questions which are more task oriented – whether you have already done task X, and whether you think you’re proficient at doing task X.

The exam content was pretty much what was in the official docs – there’s a lot more focus on using group policies (local ones in this case), and also a few more detailed networking questions regarding subnetting, in both IPv4 and IPv6.

General list of things i’ve seen:

  • New features: BranchCache, DirectAccess and VPN (not overly technical – if you got it to work once, you can answer these)
  • Bitlocker – not overly many questions
  • Setup – the USB stick install gets featured more
  • USMT gets a lot more focus and also Windows EasyTransfer
  • Imaging, Deployment, VHDs

Officially, i’ll find out whether i passed the exam in 8 weeks, so probably in about 4 real months ;)