Archive for April 2007

Buying tape drives for small businesses

Backups are very important, and the media and technology used for them are even more important. While disk-to-disk is the best form of backup for home users, tapes still make a lot of sense in companies, because they make it much easier to get data off-site for disaster recovery.

There are numerous tape technologies available for x86 servers. Choosing a tape drive, however, is as easy as it gets: the more expensive it is, the better it is.

I only recommend one type of tape to customers: LTO. LTO tapes and drives are among the fastest and most reliable on the market. They are more expensive than cheaper alternatives like VXA drives, but they are trouble-free, which can't be said of VXA drives.

LTO2 (200GB) half-height external drives can be had from Tandberg for about 2’500 CHF. Bought directly from HP or IBM, they are a bit more expensive, at about 3’000-3’500 CHF. Do not buy internal tape drives that don't come from your server manufacturer, as this could cause trouble down the road.

LTO3 drives are a bit more expensive, but hold 400GB per tape instead of 200GB. If that's still not enough, you should consider a small tape library. LTO2 libraries with 8 tapes can be had from about 8’000 CHF, which is quite a bargain.

Remember, you can't extend the capacity of your tape drive unless you have a library. So if you were going to buy an LTO2 drive but need more than 200GB of storage, buy LTO3 instead. If you think you need more than 400GB of storage, buy an LTO3 tape library.
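The rule of thumb above can be written down as a small sizing helper. It picks the first option whose native (uncompressed) capacity holds one full backup without swapping tapes. The per-cartridge capacities are the figures quoted in this post. The option list, the 8-slot LTO3 library and the function names are illustrative assumptions, not a product catalogue.

```python
import math

# Native (uncompressed) capacity per cartridge in GB, as quoted in this post.
NATIVE_GB = {"LTO2": 200, "LTO3": 400}

# Illustrative options, smallest first: (name, generation, cartridges usable without swapping).
# The 8-slot library size is an assumption; adjust it to what you can actually buy.
OPTIONS = [
    ("LTO2 drive", "LTO2", 1),
    ("LTO3 drive", "LTO3", 1),
    ("LTO3 library (8 slots)", "LTO3", 8),
]

def pick_tape_option(full_backup_gb: float) -> str:
    """Return the first option that fits one full backup without changing tapes."""
    for name, generation, slots in OPTIONS:
        if full_backup_gb <= NATIVE_GB[generation] * slots:
            return name
    raise ValueError("even the largest option is too small; consider a bigger library")

def cartridges_needed(full_backup_gb: float, generation: str) -> int:
    """Number of cartridges one full backup occupies at native capacity."""
    return math.ceil(full_backup_gb / NATIVE_GB[generation])

print(pick_tape_option(150))           # LTO2 drive
print(pick_tape_option(350))           # LTO3 drive
print(pick_tape_option(900))           # LTO3 library (8 slots)
print(cartridges_needed(900, "LTO3"))  # 3
```

Plan with the data volume you expect in a few years, not today's. The whole point of the advice above is that a single drive cannot grow with you.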

I've had experiences with DLT (usually too small), VXA (unreliable) and 4mm tapes (unreliable). What I've never worked with are Sony's AIT tape drives. I would be interested to hear about experiences with those drives.

Recovering deleted items in Outlook 2003/2007 has some pitfalls

Recovering deleted items in Outlook 2003/2007 works great.

However, some users reported that not all deleted items were visible in the “Recover Deleted Items” dialog. The reason was that they hadn't simply deleted the item. Instead, they had used Shift+Delete to delete it immediately, bypassing the Deleted Items folder.

In this case, the deleted item doesn't show up when you select the Deleted Items folder and then “Recover Deleted Items”. Instead, you must select the Inbox and then choose “Recover Deleted Items”. This is not as obvious as it should be.

Choosing the correct QTIMZON value in Europe

If you’re using i5/OS V5R3 or V5R4, you can use time zones. If you aren’t using V5R3 or V5R4 yet, you should upgrade ASAP as support for V5R2 and earlier has been retired.

The system value QTIMZON specifies the time zone to use, but which of the following four is the right one?

QP0100CET
QP0100CET2
QP0100CET3
QP0100CET4

You can't really tell from the names, but CET4 is the right one: it changes the clock at 3 in the morning. CET2 changes the clock at 2 in the morning, and CET doesn't observe daylight saving time at all.
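To check what the system should do on the changeover nights, it helps to know the EU rule itself. Summer time starts on the last Sunday in March and ends on the last Sunday in October, both at 01:00 UTC. The sketch below uses only the Python standard library to compute those instants and print the local wall-clock time on either side of each change. It knows nothing about the QP0100CET* objects themselves, so treat it purely as a reference to compare against what your system actually does.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Fixed-offset zones for winter time (CET) and summer time (CEST).
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")

def last_sunday(year: int, month: int) -> int:
    """Day of month of the last Sunday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    weekday = calendar.weekday(year, month, last_day)  # Monday=0 ... Sunday=6
    return last_day - (weekday - calendar.SUNDAY) % 7

def eu_dst_transitions(year: int):
    """EU rule: summer time starts/ends at 01:00 UTC on the last Sunday of March/October."""
    start = datetime(year, 3, last_sunday(year, 3), 1, 0, tzinfo=timezone.utc)
    end = datetime(year, 10, last_sunday(year, 10), 1, 0, tzinfo=timezone.utc)
    return start, end

start, end = eu_dst_transitions(2007)
# Local wall-clock time at the moment each change takes effect: before -> after.
print("spring:", start.astimezone(CET).strftime("%Y-%m-%d %H:%M %Z"),
      "->", start.astimezone(CEST).strftime("%H:%M %Z"))
print("autumn:", end.astimezone(CEST).strftime("%Y-%m-%d %H:%M %Z"),
      "->", end.astimezone(CET).strftime("%H:%M %Z"))
```

For 2007 this prints `spring: 2007-03-25 02:00 CET -> 03:00 CEST` and `autumn: 2007-10-28 03:00 CEST -> 02:00 CET`.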

So where is this documented? I couldn't find it. I used CET2 at first, which seemed to work, but then an IBM technician explained the differences between the CET values to me.

Windows XP supports 4GB of RAM, period.

Many people say that Windows XP doesn't fully support 4GB of RAM. That's not true: by using PAE, Windows XP supports exactly 4GB of usable RAM.

If you can't use all 4GB of RAM even with PAE enabled, you have bought hardware that doesn't support 4GB of RAM. There's a KB entry that details some of the problems. If you want to use the full 4GB, buy better hardware.

Redundant equipment has to be monitored

While this might sound obvious, I've seen it go wrong more than once.

The problem with redundant equipment is that nobody notices when it fails. Granted, that is the whole point of redundancy, but if nobody replaces the failed component, you've just lost that redundancy.

Better servers with multiple fans, power supplies, etc. usually offer integrated diagnostics with an audible alert, which is usually enough for a small business (running MOM on your only server is of limited use). Smaller machines without redundant PSUs or fans, however, usually don't have embedded diagnostics. They won't make any audible alert when a disk in a RAID set fails.

On IBM servers with ServeRAID adapters, you can install the ServeRAID management program from the ServeRAID application CD (not the drivers CD; there are two separate CDs). The ServeRAID management program is backward compatible with almost all ServeRAID controllers, as long as you have the IBM driver installed. (For the 7e and similar controllers there is also an Adaptec driver, which works fine, but ServeRAID management doesn't recognize it.)

ServeRAID management can be configured to send mails automatically in case of a disk failure.
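ServeRAID management does the mailing for you. For hardware without such a tool, the same idea can be sketched generically: remember the last known state of the array and send a mail only when it gets worse. The state names ("OK", "DEGRADED", "FAILED"), the addresses and the SMTP host below are hypothetical placeholders. How you obtain the current state depends entirely on your controller's own tools.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional

# Hypothetical state names; map your controller tool's output onto these.
SEVERITY = {"OK": 0, "DEGRADED": 1, "FAILED": 2}

def build_alert(host: str, previous: str, current: str,
                sender: str, recipient: str) -> Optional[EmailMessage]:
    """Return an alert mail if the array state got worse, otherwise None."""
    if SEVERITY[current] <= SEVERITY[previous]:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"RAID on {host}: {previous} -> {current}"
    msg.set_content(
        f"The RAID array on {host} changed from {previous} to {current}.\n"
        "Redundancy may be lost; replace the failed component as soon as possible.\n"
    )
    return msg

def send_alert(msg: EmailMessage, smtp_host: str = "mail.example.local") -> None:
    """Deliver the mail through an internal SMTP relay (hypothetical host name)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Run from a scheduled task, something like `msg = build_alert("server1", last_state, state, "raid@example.local", "admin@example.local")` followed by `if msg: send_alert(msg)` is enough. A recovery (DEGRADED back to OK) deliberately sends nothing here. Whether you want a mail for that too is a matter of taste.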

Booting PXElinux from RIS

You already have RIS/WDS set up for network installations of Windows, but now you have some Linux machines that you want to set up using your existing Windows Server?

It’s quite easy to do:

Place the following text in a .sif file in X:\RemoteInstall\Setup\German\Tools\PXELinux\i386\templates

[OSChooser]
Description = "PXElinux"
Help = "Linux Network Boot Loader - Currently loads a Debian 4.0 Installer"
LaunchFile = "Setup\German\Tools\PXElinux\i386\templates\pxelinux.0"
Version = "1.00"
ImageType=Flat

You can get all the necessary Debian files from netboot.tar.gz. You generally don't need to edit any of the files in it; just extract them into the templates directory.
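If you would rather script the extraction than use an archiver, here is a minimal Python sketch. It assumes netboot.tar.gz has already been downloaded. It also assumes pxelinux.0 ends up at the top level of the templates directory, which is where the LaunchFile line above points. The function name is made up for this example.

```python
import tarfile
from pathlib import Path

def extract_netboot(archive: Path, templates_dir: Path) -> list:
    """Unpack netboot.tar.gz into the RIS templates directory; return the member names."""
    templates_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject absolute paths and ".." entries while extracting.
            tar.extractall(path=templates_dir, filter="data")
        else:
            tar.extractall(path=templates_dir)
    # The LaunchFile entry in the template expects pxelinux.0 in this directory.
    if not (templates_dir / "pxelinux.0").is_file():
        raise FileNotFoundError(f"pxelinux.0 not found in {templates_dir}")
    return names
```

A call would look like `extract_netboot(Path("netboot.tar.gz"), Path(r"X:\RemoteInstall\Setup\German\Tools\PXELinux\i386\templates"))`. The archive may contain symbolic links. How those land on an NTFS volume depends on your Python version and privileges, so check afterwards that pxelinux.0 is a real file.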

Virtual Server 2005 R2, Windows Server 2003 and Broadcom NetXtreme II cards

Interesting issues with Microsoft’s Virtual Server.

A new IBM x3650 with two Broadcom NetXtreme II cards had been running Windows Server 2003 flawlessly for a few months.

After installing Virtual Server 2005, all hell broke loose. Some machines could still contact the server, others couldn't. It looked as if something was horribly broken, and at first I had no idea how something like this could happen.

After searching the web, I found a few references to this and similar problems with newer NICs and Virtual Server.

The Broadcom NetXtreme II cards seem to have a specific problem with Virtual Server 2005 related to IPMI. Broadcom offers a fix:

IPMI disabling tool [Mirror]

Applying it causes only a short network interruption; no restart is necessary.

But there are other problems with modern network cards and Virtual Server 2005 (and possibly with VMware's products, but I haven't checked).

There's a KB entry that describes disabling checksum and segmentation offloading when using Virtual Server 2005.

Windows 98 and vmm.vxd

I had a rather interesting case today: a private customer brought in an old machine running Windows 98, which his son still used.

The customer told me that he had tried to install a driver for accessing USB sticks, and after that the machine wouldn't boot anymore.

The error message displayed was:

Configuration Manager cannot load because one of the following files is either not present or has an invalid version number:
VMM.VXD, SHELL.VXD, VTD.VXD, VXDLDR.VXD, VPICD.VXD.

Try Running SETUP again.
Press Any Key to Continue.

The machine wasn’t able to boot in safe mode either.

First, I looked for the files named in the message using an old boot floppy. I found vmm.vxd in C:\windows\system\vmm32, but none of the other files.

That seemed strange to me, since installing a driver shouldn't delete these files. I searched the web and found KB186771, but that article covers Windows 95.

I then set up a clean install of Windows 98 in a VM (using the free Microsoft Virtual PC) as a reference machine. It turned out that vmm.vxd is not present in a default Windows 98 install.

The next step was obvious: rename vmm.vxd to something else and try booting again. The result was rather interesting, as the machine now simply hung during boot without displaying any error message. While this wasn't what I had hoped for, it was better than nothing.

I tried booting the machine in safe mode, and this time it worked. I tried uninstalling the customer's USB driver via Add/Remove Programs in the Control Panel, but that didn't help.

Next, I looked for files that looked like USB drivers and found a file named uhci*. Then I looked for files with a timestamp similar to that file's, and there were about 7 of them. I moved them to another directory, and the machine finally booted.
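If the disk is attached to another machine, the hunt for files with matching timestamps is easy to script. This sketch lists every file below a directory whose modification time lies within a window around a reference file. The two-minute default window is an assumption, because an installer may take a while to copy its files. The function name is made up for this example.

```python
from pathlib import Path

def files_near_timestamp(directory, reference, window_seconds=120):
    """List files below `directory` modified within `window_seconds` of `reference`."""
    ref_mtime = Path(reference).stat().st_mtime
    matches = []
    for path in Path(directory).rglob("*"):
        # Compare modification times; the reference file itself is included in the result.
        if path.is_file() and abs(path.stat().st_mtime - ref_mtime) <= window_seconds:
            matches.append(path)
    return sorted(matches)
```

Pass the Windows directory of the affected disk as `directory` and the uhci* file you found as `reference`. Review the list by hand before moving anything, since unrelated files can share a timestamp by coincidence.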

As a final step, I found proper Windows 98 USB mass storage drivers, installed them and tested them.

Everything was working fine again. The whole procedure took me about 1.5 hours, much of it because I no longer had a Windows 98 VM lying around. This was one of the first Windows 98 systems I'd seen in ages.

Many customers don't seem to care that Windows 98 has reached end of support. Yet using a supported operating system usually means a faster and more cost-efficient solution to any problem you might have with your hardware.

Looking up SRC codes

What EventIDs are for Microsoft, SRCs (System Reference Codes) are for IBM. But while Microsoft usually provides some explanatory text with its EventIDs, IBM does very little within the operating system to explain an SRC.

If the system doesn't even boot, you're stuck with a dot-matrix display showing a number. There are resources like EventID.Net for looking up the causes of EventIDs, but finding useful information on IBM's SRCs is more convoluted.

On newer System i5 hardware, there are two Infocenters you have to consult to find out what an SRC means: the i5/OS Infocenter V5R4 and the Hardware Infocenter. Both websites use language negotiation with your browser. It therefore makes sense to configure your browser to request only the English pages, because IBM's translations are abysmal.

There are two primary links for a list of SRC codes:

Hardware SRC list
i5/OS SRC list

Another problem with looking up SRCs is that they are split values. Under certain circumstances, the first 4 characters identify the adapter and the last 4 characters the error code itself. This makes it very hard to find useful information on the internet.

The Hardware SRC list is the one you will probably need more often, as i5/OS rarely has problems so severe that it cannot display the information on screen.

The SRCs you will see most often usually start with a letter:

x is usually a number.

  • Axxx - I always remember this as the “Action” SRC. It usually means that an easily fixable error occurred (e.g. unable to find the console).
  • Bxxx - “Broblem” SRC. B usually indicates a fatal failure of a hardware part.
  • Cxxx - Startup SRC. Seen while the system starts up.
  • Dxxx - Shutdown SRC. Seen while the system shuts down or restarts.

The second character of the SRC usually has a special meaning as well. Typical codes during system startup are “C200”, “C600” and “C900”.

x is usually a letter, and nn can be a %, but is usually 00.

  • x1nn - Service processor
  • x2nn - Service processor
  • x3nn - Service processor/Firmware
  • x6nn - LIC IPL
  • x9nn - i5/OS IPL

Usually, once you've reached C600 during startup, the machine will start for sure and there are no inherent hardware problems. Some common SRCs like C6004301 (Applying LIC PTFs) can take a very long time, on older systems even several hours.
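The rules of thumb above can be collected into a small lookup helper. The sketch below only encodes what this post describes. It splits an 8-character SRC into its two halves and interprets the first and second characters. It is not IBM's SRC list, and the dictionary and function names are made up for this example. Treat its output as a hint about where to look in the Infocenters, not as a diagnosis.

```python
# Meaning of the first character, as summarized in this post (a heuristic, not IBM's list).
TYPE_BY_LETTER = {
    "A": "Action: usually an easily fixable error",
    "B": "Problem: usually a fatal hardware failure",
    "C": "Startup",
    "D": "Shutdown/restart",
}

# Meaning of the second character, as summarized in this post.
PHASE_BY_DIGIT = {
    "1": "Service processor",
    "2": "Service processor",
    "3": "Service processor/Firmware",
    "6": "LIC IPL",
    "9": "i5/OS IPL",
}

def describe_src(src: str) -> dict:
    """Split an 8-character SRC into halves and apply the rules of thumb above."""
    src = src.strip().upper()
    if len(src) != 8:
        raise ValueError(f"expected an 8-character SRC, got {src!r}")
    return {
        "first_half": src[:4],   # sometimes identifies the adapter
        "second_half": src[4:],  # sometimes the actual error code
        "type": TYPE_BY_LETTER.get(src[0], "unknown"),
        "phase": PHASE_BY_DIGIT.get(src[1], "unknown"),
    }

print(describe_src("C6004301"))
```

For the example above it prints `{'first_half': 'C600', 'second_half': '4301', 'type': 'Startup', 'phase': 'LIC IPL'}`. Searching for both halves separately often works better than searching for the whole code.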

Checking Windows replication

As soon as you have multiple Windows servers, or even multiple sites, you will use replication to keep the same data available locally and redundantly.

However, the tools for checking successful replication are not always obvious, or even integrated into the rest of the system.

For Active Directory, the tool is called repadmin. This tool is part of the Support Tools, located on the Windows Server 2003 CD (or CD1, for R2) in \SUPPORT\TOOLS\SUPTOOLS.MSI.

repadmin can do a lot; the official documentation can tell you much more than I can.

The repadmin option I use most often is /showrepl. It shows you when a DC last synchronized with its replication partners and, if errors occurred, what they were.
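For a scheduled check, the /showrepl output can be scanned for failures. This sketch assumes English-language output in which a failed attempt is reported on a line containing the word "failed". Verify that against the output on your own DCs before relying on it. The function names are made up for this example.

```python
import subprocess

def run_showrepl() -> str:
    """Run `repadmin /showrepl` on the local DC and return its text output."""
    result = subprocess.run(["repadmin", "/showrepl"],
                            capture_output=True, text=True, check=False)
    return result.stdout

def failed_lines(showrepl_output: str) -> list:
    """Return the lines that report a failed replication attempt.

    Assumes English output where failures contain the word "failed"."""
    return [line.strip() for line in showrepl_output.splitlines()
            if "failed" in line.lower()]
```

On a DC with the Support Tools installed, `problems = failed_lines(run_showrepl())` yields an empty list when everything replicated. Anything else is worth a closer look with the full output.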

For DFS-R, there are multiple options. The dfsmgmt.msc console provides a graphical view of the replication status as a nice, executive-friendly report. You can also use the dfsrdiag command-line tool to query specific DFS-R parameters.
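One useful DFS-R parameter to monitor is the backlog between two members. The sketch below wraps `dfsrdiag backlog` and extracts the number of pending files. Two things are assumptions to confirm with `dfsrdiag backlog /?` and a manual run on your own servers. The first is the switch names (/rgname, /rfname, /sendingmember, /receivingmember). The second is that the output reports either "Backlog File Count: N" or "No Backlog". The function names are made up for this example.

```python
import re
import subprocess
from typing import Optional

def parse_backlog(output: str) -> Optional[int]:
    """Extract the backlog file count; None means the output was not recognized."""
    match = re.search(r"Backlog File Count:\s*(\d+)", output)
    if match:
        return int(match.group(1))
    if "No Backlog" in output:
        return 0
    return None  # unexpected output: inspect it manually

def backlog_count(rg_name: str, rf_name: str, sending: str, receiving: str) -> Optional[int]:
    """Run dfsrdiag backlog for one replicated folder and one member pair."""
    output = subprocess.run(
        ["dfsrdiag", "backlog", f"/rgname:{rg_name}", f"/rfname:{rf_name}",
         f"/sendingmember:{sending}", f"/receivingmember:{receiving}"],
        capture_output=True, text=True, check=False).stdout
    return parse_backlog(output)
```

A backlog that keeps growing between two runs is a better alarm signal than any single non-zero value. Files in transit are normal right after large changes.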