Archive for the ‘Servers’ Category.

USB floppy drives during Windows Server 2003 R2 setup

So you’ve bought a new rack server, like the IBM System x3250. But your boss or your customer was too cheap to buy an RSA II card. And now you need to install Windows Server 2003.

This is usually the part where the fun begins. Newer servers do not have a floppy drive, but besides RIS or remastering the CD, a floppy drive is the only way to load drivers during Windows Server 2003 setup.

Getting a USB floppy drive is no big deal. You connect it to the machine, it boots, you press F6, select the storage adapter driver, format your hard disks, and then setup asks for the floppy again and again. Bummer.

The problem is that the first part of setup (loading the mass storage driver) is not handled by Windows, but by the BIOS’s floppy emulation. The later part, after formatting the hard drive, is handled by Windows itself, and some USB floppy drives are not recognized by its built-in USB storage drivers.

In my case, I had an Iomega USB floppy with a built-in card reader (don’t ask). I used Device Manager to find out the vendor and product ID of this USB floppy.

I opened the txtsetup.oem supplied with my mass storage driver, and modified the hardware ID section of that driver.

I added the following line, directly after the entry for the SCSI adapter itself:
id = "USB\VID_08BD&PID_1100", "usbstor"

I had no idea if this would work at all, but it did.
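As a rough sketch of where that line ends up, the edited section might look like the fragment below. The section name, the PCI ID and the service name are placeholders and not values from the original file; only the USB line is the one quoted above.

```ini
; txtsetup.oem (fragment) - <driver-id>, the PCI ID and <service> are placeholders
[HardwareIds.scsi.<driver-id>]
id = "PCI\VEN_xxxx&DEV_xxxx", "<service>"
; added line: lets text-mode setup bind usbstor to the Iomega USB floppy
id = "USB\VID_08BD&PID_1100", "usbstor"
```

For a different USB floppy, the VID and PID values would have to be replaced with the ones shown in Device Manager.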

For your reference, I’ve included my txtsetup.oem, which works with Iomega USB floppy drives.

SERR/PERR errors on IBM’s System x3650

After updating all the firmware on a newly delivered IBM System x3650, I installed Windows Server 2003 R2. The machine worked fine, but crashed mysteriously about 3 hours into operation, logging a RAID failure in the RSA.

Looking further through the RSA error logs, I found this error occurring multiple times:

Unknown SERR/PERR detected on PCI bus Chassis#=NA Slot#=0 Bus#=0 Dev.ID=0x25e3 Vend.ID=0x8086 Status=0x0 DevFun#=0xff

I called IBM support, and they told me that I should power cycle the machine after a firmware update. I did that and then continued to set up the machine. It’s been working flawlessly under heavy load for the past 3 months.

I’m going to remember this for the future – after a firmware upgrade on a server, do a power cycle.

Disk IO performance is dependent on the number of disk arms

If you already know where I’m going with this after reading the subject, you can stop reading now.

As hard disks get bigger and bigger, servers in small business environments are usually set up with too few disk arms to satisfy performance needs.

The problem is quite simple – a standard 36GB 2.5″ SAS hard drive can read data at rate x, and can do y IOPS (I/O operations per second).

A standard 72GB 2.5″ SAS hard drive can read data at a rate only slightly above x, and can still do about y IOPS.

As you can see, disks get bigger, but they do not really get faster. If you need more IOPS, you need more disks.

If you have a legacy system with a considerable number of disk arms (more than 10), each at 4 or 8GB capacity, and migrate this setup to a new system with a RAID1 over two 147GB disks, you will get _WORSE_ performance than on the old system.
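To put rough numbers on this, here is a minimal sketch that simply multiplies the number of arms by a per-disk figure. The value of 150 IOPS per disk is an assumed round number for illustration, not a measurement, and the sketch ignores RAID write penalties and caches.

```shell
#!/bin/sh
# Rough aggregate random-I/O estimate: number of arms x IOPS per arm.
# IOPS_PER_DISK is an assumed round figure, not a measured value.
IOPS_PER_DISK=150

# aggregate_iops ARMS: print the combined IOPS of ARMS disks.
aggregate_iops() {
    echo $(( $1 * IOPS_PER_DISK ))
}

echo "legacy system, 12 arms: $(aggregate_iops 12) IOPS"
echo "new RAID1, 2 arms:      $(aggregate_iops 2) IOPS"
```

Even if the new disks were somewhat faster per arm than assumed here, two arms would not come close to twelve.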

And if we look at consumer hard drives, with 750GB or 1TB per disk, the performance per gigabyte gets even worse.

This is usually not a problem in more professional environments where systems are purchased by requirements, but in Small Businesses systems are usually purchased by the amount of money that is around for them.

Never forget about the need for disk arms.

Buying tape drives for small businesses

Backups are very important, and the media and technology used for them are even more important. While Disk to Disk is the best form of Backups for home users, tapes still make a lot of sense in companies, because they make it a lot easier to get something off-site for disaster recovery.

However, the tape technologies available for x86 servers are numerous. On the other hand, choosing tape drives is as easy as it gets: the more expensive they are, the better they are.

I only recommend one type of tape to customers – LTO. LTO tapes and drives are among the fastest and most reliable on the market. They are more expensive than alternatives like the VXA drives, but they are trouble free, which can’t be said about VXA drives.

LTO2 (200GB) Half-Height external drives can be had for about 2’500CHF from Tandberg. Buying them directly from HP/IBM, they are a bit more expensive, about 3000-3500CHF. Do not buy internal tape drives when they’re not from your server manufacturer, as this could cause trouble down the road.

LTO3 drives are a bit more expensive, but pack 400GB instead of 200GB. If that’s still not enough, you should consider purchasing a small tape library – LTO2 libraries with 8 tapes can be had from about 8000CHF, which is quite a bargain.

Remember, you can’t extend the capacity of your tape drive, unless you have a library. So if you are looking at an LTO2 drive, but need more than 200GB of storage, you should buy LTO3 instead. If you think you need more than 400GB of storage, buy an LTO3 tape library.
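The rule of thumb above fits into a few lines of shell. This is only a sketch of that rule, using the 200GB and 400GB capacities mentioned in this post; the function name is made up for the example.

```shell
#!/bin/sh
# pick_tape NEEDED_GB: suggest a tape solution following the rule of thumb
# (LTO2 up to 200GB, LTO3 up to 400GB, a library beyond that).
pick_tape() {
    if [ "$1" -le 200 ]; then
        echo "LTO2 drive"
    elif [ "$1" -le 400 ]; then
        echo "LTO3 drive"
    else
        echo "LTO3 tape library"
    fi
}

pick_tape 350
```

Leave some headroom when estimating the required capacity, because backup sets tend to grow over the lifetime of the drive.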

I’ve had experiences with DLT (which are usually too small), VXA (unreliable), and 4mm tapes (unreliable). What I’ve never worked with are Sony’s AIT tape drives – I would be interested to hear some experiences with those drives.

Redundant equipment has to be monitored

While this might sound pretty obvious, I’ve seen it go wrong more than once.

The problem with redundant equipment is that nobody notices when it fails. Okay, this is pretty much the point of having redundant equipment, but if nobody replaces the failed component, you’ve just lost that redundancy.

Better servers with multiple fans, power supplies, etc. usually offer integrated diagnostics with an audible alert, which is usually enough for a small business (running MOM on your only server has limited usefulness). But smaller machines, lacking any redundant PSU/fans, usually don’t have embedded diagnostics. These won’t sound any audible alert when a disk in a RAID set fails.

On IBM servers with ServeRAID adapters, you can install the ServeRAID management program from the ServeRAID application CD (not the drivers CD, there are two of them). The ServeRAID management program is backward compatible with almost all ServeRAID controllers, as long as you have the IBM driver installed (for the 7e or similar controllers, there is also an Adaptec driver which works fine, but ServeRAID management doesn’t recognize it).

ServeRAID management can be configured to send e-mail automatically in case of a disk failure.

Virtual Server 2005 R2, Windows Server 2003 and Broadcom NetExtreme II cards

Interesting issues with Microsoft’s Virtual Server.

A new IBM x3650 with two Broadcom NetExtreme II cards had been running Windows Server 2003 flawlessly for a few months.

After installing Virtual Server 2005, everything went haywire. Some machines were still able to contact the server, some were not. It looked like something was horribly broken, and at first I had no idea why something like this could happen.

After searching the web, I found a few references to this and similar problems with newer NICs and Virtual Server.

The Broadcom NetExtreme II cards seem to have a particular problem with Virtual Server 2005, related to IPMI. There is a fix available from Broadcom:

IPMI disabling tool [Mirror]

Applying it causes just a short network interruption; no restart is necessary.

But there are other problems with modern network cards and Virtual Server 2005 (and possibly VMware’s offerings, but I don’t know about that).

There’s a KB entry which talks about disabling checksum/segmentation offloading when using Virtual Server 2005.

Creating simple graphs using rrdtool

rrdtool graph for temperatures
As discussed in the previous post, you can gather temperature data from RSA II or iLO cards using SNMP quite easily.

While the data itself can be good enough to make a decision, executives in a company always like nice diagrams. So my first try was to load the CSV-like datafile generated using said script into Excel, and make a diagram out of it. But Excel is restricted to 255 parameters per axis, which was severely limiting.

I’ve been using Cacti for quite some time, but wasn’t willing to deploy it here because we’re mostly a Windows shop, and my plan was to integrate the Linux boxes into Operations Manager 2007. Cacti uses Tobi Oetiker’s rrdtool to create the graphs.

Creating graphs using rrdtool is quite easy, actually. I wrote a simple script that handled this:

makerrd

Creates the appropriate rrd file. Replace the Unix timestamp as appropriate. The last value on the RRA line is the number of rows (values) kept in the data file.

#!/bin/sh
rrdtool create test.rrd           \
           --start 1176465000     \
           -s 300                 \
           DS:temp:GAUGE:600:U:U  \
           RRA:AVERAGE:0:1:5000
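How much history does that give you? A small sketch of the arithmetic, assuming one primary data point per row as in the RRA line above (the function name is made up for the example):

```shell
#!/bin/sh
# rra_retention_days STEP_SECONDS ROWS: whole days of history one RRA holds,
# assuming one primary data point per row (RRA:AVERAGE:0:1:<rows>).
rra_retention_days() {
    echo $(( $1 * $2 / 86400 ))
}

rra_retention_days 300 5000
```

With a 300 second step and 5000 rows, this works out to a bit more than 17 days before the oldest values are overwritten.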

inputrrd

Loads the data from the simple CSV-like file into the RRD file. The more elegant approach would be to load the data directly from SNMP into the rrd database, but I’m no programmer.

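#!/bin/zsh
# Each line of "machine" looks like: <unix timestamp>;<temperature>
while IFS=';' read -r timestamp temp ; do
        # strip the decimal part of the temperature
        temp=$(echo "$temp" | sed 's/\..*//')
        rrdtool update test.rrd "${timestamp}:${temp}"
        if [ $? != 0 ] ; then
                echo "rrdtool failed" >&2
        fi
done < machine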
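The more direct route mentioned above, straight from SNMP into the RRD, could look roughly like this. It is a sketch, not a tested setup: the host name and OID are copied from the SNMP polling script elsewhere on this page, the reply format is inferred from the sed expressions used there, and the function names are made up.

```shell
#!/bin/sh
# Sketch: poll the temperature via SNMP and feed it straight into the RRD.
# HOST and OID are taken from the polling script on this page; adjust them.
HOST=machine.rsa.int.dataline.ch
OID=SNMPv2-SMI::enterprises.2.3.51.1.2.1.5.1.0
RRD=test.rrd

# clean_temp: turn a raw reply such as "23.00 Centigrade" (with quotes)
# into a bare integer like 23. The reply format is an assumption.
clean_temp() {
    sed 's/"//g;s/Centigrade//;s/ //g;s/\..*//'
}

# poll_once: fetch one reading and store it with the current time (N = now).
poll_once() {
    temp=$(snmpget -Onqv -c public -v 1 "$HOST" "$OID" | clean_temp)
    [ -n "$temp" ] && rrdtool update "$RRD" "N:${temp}"
}
```

Calling poll_once every 5 minutes, for example from cron, matches the 300 second step of the rrd file created above.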

makegraph

Creates a graph from the data in the rrd file. The HRULE lines create lines for error margins. In this case 35C and 30C.

#!/bin/sh
rrdtool graph temp.png                       \
        --start 1176465385 --end `date +%s`  \
        DEF:mytemp=test.rrd:temp:AVERAGE     \
        LINE2:mytemp#0000FF                  \
        HRULE:35#FF0000                      \
        HRULE:30#FFA500

See the created graph to the right. Of course, rrdtool has many more options and can create much nicer graphs.

Do I need AC?

Another SMB topic, as most enterprises are obviously capable of doing this by the book.

Summer seems to be starting, with the days here getting warmer and warmer. A particular problem that seems to crop up every summer is servers shutting down or failing due to excessive temperatures. The tolerance of these machines to high temperatures is actually quite low, even with redundant fans installed.

Most Small Businesses actually don’t follow any kind of strategy when choosing a place for servers, and usually try to ignore the AC problem – this works quite well when the new systems are installed during cold times.

While it might be possible to operate a server room without AC, this only works in rather rare circumstances:

  • No windows, or very small windows
  • Room is only in direct sunlight for a short part of the day
  • A very small number of machines installed in the room (one or two)

So, in general you will need an AC. But what are acceptable temperatures in a server room? The ideal would be 22C during the entire year. But it’s possible to run a server in a somewhat hotter environment. These specs usually depend on the server itself. Keep in mind that there is other temperature sensitive equipment in the room as well – tape drives, UPSs, etc.

Start reviewing the spec sheets of your server to see what is acceptable. Here is an example for an IBM System x3650:

  • Air temperature:
    • Server on:
      10° to 35°C (50.0° to 95.0°F); altitude: 0 to 914.4 m (3000 ft). Decrease system temperature by 0.75°C for every 1000-foot increase in altitude.
    • Server off:
      10° to 43°C (50.0° to 109.4°F); maximum altitude: 2133 m (7000 ft)

As you can see, the maximum temperature during operation is 35C. With outside temperatures reaching this level during summer, an AC is almost always necessary. A UPS like the Powerware 9125 is specified to work from 0C to 40C. This is a bit more generous than the x3650, but it’s still easy to get up to 40C with several servers in a room.

To figure out whether you need an AC, the best way is to monitor your server. If you are using IBM Director or HP Insight Manager, these tools already have this functionality integrated. I personally don’t like these two products (and they’re usually oversized for fewer than 10 servers). If you have an iLO or RSA II card in your server, you can use SNMP to get the temperature, write it to a file, and create a graph from this later.

I wrote a quick and dirty script to do this. It runs on Linux, but the same would be easy to implement in PowerShell or VB.

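#!/bin/sh
# Append "<unix timestamp>;<temperature>" to the data file every 5 minutes.
while true ; do
        printf '%s;' "$(date +%s)" >> ~/tempmon/machine
        # -Onqv prints only the value; sed strips quotes, the unit and spaces
        snmpget -Onqv -c public -v 1 \
                machine.rsa.int.dataline.ch \
                SNMPv2-SMI::enterprises.2.3.51.1.2.1.5.1.0 |
                sed 's/"//g;s/Centigrade//;s/ //g' >> ~/tempmon/machine
        sleep 300
done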

Ugly? Yes. But it works fine. You can later load this “CSV” into Excel, and create appropriate graphs from the data. And get management to buy the AC before your servers die a fiery death. If you want to monitor this long term, you could integrate the appropriate values into cacti quite easily.
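If you just want a quick answer without Excel, a few lines of awk over the same file will do. A minimal sketch, assuming the file has the "timestamp;temperature" layout written by the loop above; the function names are made up for the example.

```shell
#!/bin/sh
# max_temp FILE: print the highest temperature in a "timestamp;temp" file.
max_temp() {
    awk -F';' 'NF >= 2 && (!seen || $2 + 0 > max) { max = $2 + 0; seen = 1 }
               END { if (seen) print max }' "$1"
}

# count_above FILE LIMIT: number of samples hotter than LIMIT degrees.
count_above() {
    awk -F';' -v limit="$2" '$2 + 0 > limit + 0 { n++ } END { print n + 0 }' "$1"
}
```

With one sample every 5 minutes, the result of count_above divided by 12 gives the approximate number of hours spent above the limit.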

A sidenote about ACs from my personal experience: the same points as for servers apply – you get what you pay for. Buying self-install ACs from Fust, MediaMarkt or some other chain of that kind won’t do you much good. Get a decent, two component AC, and have it installed by a professional. This also avoids damage to the building. Also, let a professional size the system: provide him with the maximum heat output of your servers (measured in BTU per hour), and then double that value just to be sure. Network managed ACs are usually not available at prices acceptable to a small business.
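If your spec sheets only list power draw in watts, the conversion is simple: one watt corresponds to roughly 3.412 BTU per hour. The sketch below applies that factor together with the "double it" rule from above; the 1200 W figure is an assumed example load, and the function name is made up.

```shell
#!/bin/sh
# ac_btu_per_hour WATTS [SAFETY_FACTOR]: suggested AC capacity in BTU/h.
# 1 W is roughly 3.412 BTU/h; the safety factor defaults to 2 ("double it").
ac_btu_per_hour() {
    awk -v w="$1" -v f="${2:-2}" 'BEGIN { printf "%.0f\n", w * 3.412 * f }'
}

# Example: three servers drawing an assumed 400 W each.
ac_btu_per_hour 1200
```

Remember to include the UPS, switches and any other equipment in the room when adding up the watts.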

Sizing memory for Windows Server Systems in Small Businesses

Sizing memory is easy – but I’ve seen many people run into trouble with it, buying more RAM than they can use, and lots of other trouble.

The amount of memory you can use is limited by several factors:

  • Hardware
  • Operating System
  • Application

When buying hardware from a distributor (and not preconfigured systems directly from the manufacturer like enterprises), you usually get a base memory of 1GB, in the form of 2x 512MB.

For smaller 1U machines with only 4 memory slots, the most economic configuration is thus 3GB (the base memory plus an additional 2x 1GB option), which, as we will see later, is supported by all OS/application combinations.

For bigger machines, with 8 or 12 memory slots, you can get a lot more RAM. 2x1GB is still somewhat cheaper than 2x2GB. At this point, OS and applications become a factor.

Microsoft has set the following limits for its operating systems:

  • Windows Server 2003 for Small Business: 4 GB
  • Windows Server 2003 Standard Edition: 4 GB
  • Windows Server 2003 Enterprise Edition: 64 GB
  • Windows Server 2003 Standard Edition x64: 32 GB
  • Windows Server 2003 Enterprise Edition x64: 1 TB

Source: KB889654
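For quick checks when speccing a machine, the list above can be put into a small lookup. This is only a sketch; the edition keywords (sbs, std, ent, std-x64, ent-x64) and the function names are made up for the example, and the numbers are the ones from the list.

```shell
#!/bin/sh
# max_ram_gb EDITION: physical memory limit in GB, per the list above.
# The edition keywords are invented for this sketch.
max_ram_gb() {
    case "$1" in
        sbs|std)  echo 4 ;;
        ent)      echo 64 ;;
        std-x64)  echo 32 ;;
        ent-x64)  echo 1024 ;;
        *)        echo "unknown edition: $1" >&2; return 1 ;;
    esac
}

# usable_ram_gb EDITION INSTALLED_GB: how much of the installed RAM the OS uses.
usable_ram_gb() {
    limit=$(max_ram_gb "$1") || return 1
    if [ "$2" -gt "$limit" ]; then echo "$limit"; else echo "$2"; fi
}
```

For example, `usable_ram_gb std 8` prints 4, which is exactly the "more RAM than they can use" case.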

These are the constraints set by Microsoft. However, there are additional constraints on the maximum amount of memory, imposed by the architecture itself.

Even to use the full 4GB on a single server, you might need to enable PAE by setting the /PAE flag in the boot.ini file. This is necessary because part of the address space below 4GB is reserved for the PCI bus and similar equipment in your server, which pushes some of the RAM above the 4GB boundary. Note that /PAE is supported on SBS and Standard Edition, though you can’t use more than 4GB of physical RAM.
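For illustration, a boot.ini entry with the switch added might look like the fragment below. The ARC path and the description are examples only and will differ on your machine; the only relevant part is the /PAE switch at the end of the line.

```ini
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Standard" /fastdetect /PAE
```

The change takes effect after the next reboot.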

Accessing more than 4GB of memory on 32bit Platforms requires certain tricks, especially if you want to access more than 4GB of RAM in a single process.

This leads us to the next set of problems – application support.

For example, a single Exchange server running with 64GB of memory will not really make use of it – sure, the excess memory can be used as a disk cache, but store.exe won’t be able to use more than 4GB of memory.

There are certain applications that support AWE, most notably Microsoft SQL Server. AWE allows a single process to access more than 4GB of memory, using even more tricks. These usually slow down performance a lot.

So, having more than 4GB of RAM only makes sense if the application you are running consists of multiple, independent processes. There are other considerations such as kernel memory, which mostly come into play in a terminal server environment (which I have no experience with).

For Small Business Server, 4GB is the maximum, and in my opinion also the minimum. An SBS server with only 1GB of memory will be very, very slow and swapping constantly. With 2GB, it will probably work fine. With 4GB, you have the maximum amount of memory supported, and the server will probably need it, giving you an extra speed boost. The most economic way to get there is usually 4x 512MB plus 2x 1GB.

If you think you need more than 4GB of memory, keep in mind that 32bit Enterprise Edition is very, very expensive and can lead to other performance problems when using AWE. So, if you think you need more than 4GB, go for 64bit. Exchange 2007 even requires 64bit; in contrast, Exchange 2003 doesn’t support 64bit at all.

Remember that 32bit Standard Edition supports 4GB, but the 64bit Standard Edition supports 32GB.

In closing, it’s not that difficult if you can wrap your head around all these limitations. Here are my general sizing rules:

  • Windows Server 2003 for Small Business – get 4GB, you will need them
  • Windows Server 2003 Standard – get 3GB if that’s enough, or 4GB if you need the extra Gigabyte
  • Windows Server 2003 Standard x64 – Decide how much you need according to application. 8GB is a good starting point
  • Windows Server 2003 Enterprise – Don’t consider it for a Small Business, too expensive and too much hassle

Hope this braindump helps someone. I won’t be writing on Friday and Monday, because I don’t have to work then.

IBM System x3250

A few days ago, I got my hands on my first IBM System x3250. The x3250 isn’t a middle class server like the x3650, it’s IBM’s low end rack server. You will see the difference in the pictures – there’s also a large pricing difference. This machine was to serve as a router/firewall/VPN concentrator, and thus doesn’t place any great demands on the hardware. The OS installed was Debian GNU/Linux 3.1, which has its own set of problems.

The x3250 doesn’t have Light Path diagnostics, hot pluggable fans, or even hot pluggable hard disks. You can order it with 2.5″ hot-plug (HP) SAS disks though, but that makes it a lot more expensive (to the point where an x3550 might be the better choice).

Here’s the configuration ordered:

  • System x3250 Xeon 1.83GHz DC, with 2×512MB Base Memory, 3.5″ Simple Swap SATA
  • 2x 80GB SS-SATA

Unpacking and opening

IBM System x3250 Package Contents
This machine came packaged nicely in a big box, secured on a wooden pallet. It contained the usual low cost rack mount kit, without the ability to slide the machine halfway out of the rack; there was no cable tray, and no rails. It’s hard to see in the picture, because of the missing frame of reference, but the x3250 is very, very short. It would probably fit in a telco rack.

The disk blanks fit nicely (they have to – you don’t remove them, even if there are disks installed). Interestingly, this machine still has PS/2 inputs, and the case is the same as that of its predecessor, the xSeries 306m. Of course now with Intel Xeon DC. Even though it is a baseline model, you still have the ability to install an IBM RSA II card for remote maintenance. It also has a dedicated slot for installing a SAS/SATA RAID controller, allowing you to do real hardware RAID without losing a precious PCI-E slot.

Interiors

IBM System x3250 Fans
As the pictures clearly show, this machine belongs to another price class than the x3650. While all cables are nicely tied together, and nothing is flying around, it’s still different from a middle class machine. The fans aren’t hot pluggable, neither are the disks. You can only install 4 DIMMs in total.

There’s an interesting heat pipe attached to the CPU, which I haven’t seen before – not even the x3650 has a heat pipe. Documentation, however, is still top notch. The included documentation on the inside of the upper lid is very detailed, and contains all the information you probably need.

Installing options

A simple swap SATA disk for an x3250
The cheapest x3250 has so-called Simple Swap SATA disks. You can install and replace them while the server is mounted in the rack, but they aren’t hot pluggable. You don’t need any tools for this task either. I think this was solved much better than HP’s approach in their baseline machines – they use screws, and you need to remove the machine from the rack.

Installing the SS-SATA disks is easy – just remove the filler panel, and insert the disk till it clicks. Then place the filler panel back into the server. Removing the disks is a breeze too, just pull on the blue latches attached to the disk.

Booting the server

IBM System x3250 System Diagram
The baseline x3250 doesn’t have a hardware RAID controller, just a standard Intel AHCI SATA controller, which is well supported on Linux. And by Linux, I mean “not Debian”. The current stable release of Debian doesn’t support AHCI SATA. This isn’t such a big problem, because you can install the OS using IDE emulation, build or install a newer kernel, and then switch the system to AHCI SATA mode.

However, this proved to be much more of a problem than I initially thought. Linux was able to recognize the disks, but after configuring the software RAID, the machine became really, really slow. Like 386 16MHz slow. The RAID was rebuilding in the background, at about 2MB per minute. While the installation was crawling along very, very slowly, I built a proper kernel on another machine.

After the install finished, I quickly installed the new kernel and booted the machine in AHCI mode – thanks to Linux software RAID autodetection, there was no need to reconfigure anything. The RAID finished rebuilding at 50MB/s, which I found much more acceptable – no slowdowns either.
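The rebuild speed of a Linux software RAID can be read from /proc/mdstat. A minimal sketch that pulls out just the speed figure; the function name is made up, and grep -o is assumed to be available (it is in GNU grep).

```shell
#!/bin/sh
# rebuild_speed [FILE]: print the resync/recovery speed entries from mdstat.
# FILE defaults to /proc/mdstat; passing another file is handy for testing.
rebuild_speed() {
    grep -o 'speed=[0-9]*K/sec' "${1:-/proc/mdstat}"
}
```

The function prints nothing and returns a non-zero status when no array is currently rebuilding.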

Summary

The x3250 is a cheap baseline model, and it shows. But I still think it trumps the alternative models from HP and Dell, while being similar in pricing.

Also, the obligatory plug to DATALINE AG which sells this server and other IBM System x or System i servers.