Archive for February 2007

Load testing for webservers

Before moving anything important into production, you should test if it works.

While this sounds easy enough, it isn’t. In this specific case, we will talk about load tests for webservers. Web serving is usually a non-issue when you serve static .html pages without big downloads, but things get much more interesting when you’re using some kind of automatically generated content. A blog is usually a simplified content management system without that many features. But there are bigger CMSes, which might, for example, generate pictures automatically. They might also have search built in, and search is always non-cacheable.

Things get even more interesting when we’re talking about completely interactive, mostly non-cacheable sites like discussion portals. But I don’t have any experience with those.

Load testing isn’t difficult, but the important thing is to use the right numbers. Getting them is much more difficult than doing the testing itself.

When we’re talking about smaller projects, you usually have to gather these numbers on your own – more about this below. When we’re talking about bigger projects, you can use the numbers used to size the system (You did size the system, right?).

Your first step would usually be to gather the necessary data for normal load. If you’re a startup, and don’t have any data for a normal load, then estimate. Then use three times your estimate.

But what should you estimate? What kind of numbers do we need?

  • Number of concurrent users
  • Delay between hits

You can usually calculate these numbers using your previous statistics. Never use your hits statistic, always use your visitor statistic. Why? Because the hits will change together with your page layout.

Another important part is the time a visitor spends on your site during a visit, and how many page hits they generate in this time. If you’re not using frames, then you can reuse the page hits of your old site for your new site. You will need to calculate the number of hits (as opposed to page hits) for the new site, though.

There’s no simple hands-on formula that gives you valid values for your load testing software. See if the numbers “feel” right. For a small company, the number of concurrent visitors can easily be below ten. For sites like bluewin.ch it can be much, much higher.
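Even without a simple formula, a first guess can be computed from visitor statistics before checking whether it feels right. The sketch below assumes that visits are spread evenly over the measured hour, which real traffic never quite is; the function and parameter names are made up for illustration.

```python
import math


def estimate_load(visits_per_hour, avg_visit_seconds, page_hits_per_visit):
    """Rough estimate of concurrent users and delay between page hits.

    Assumes visits are spread evenly across the hour, so treat the
    result as a starting point rather than a measured value.
    """
    # Average number of visitors on the site at the same moment:
    # arrival rate (visits per second) times time spent per visit.
    concurrent = visits_per_hour * avg_visit_seconds / 3600.0
    # Average pause of a single visitor between two page hits.
    delay = avg_visit_seconds / page_hits_per_visit
    return math.ceil(concurrent), delay


# Example: 120 visits in the busiest hour, 3 minutes per visit, 6 pages each.
users, delay = estimate_load(120, 180, 6)
print(users, delay)
```

If the input numbers are themselves estimates, as for a startup, the advice above still applies: multiply the resulting user count by three.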

You will also need software to generate the necessary load. I’ve worked with Siege in the past, and it’s a very cool, simple suite of tools. I’ve usually used Siege together with Sproxy, which lets me gather a list of URLs to test.

You can configure your web browser to use Sproxy as its proxy, and then gather an appropriate URL list for Siege easily. It supports both GET and POST requests, which means it will work with everything you can throw at it. When creating a URLs.txt, make sure to include pages which use heavy processing power on the backends (like searches) multiple times to simulate a more realistic load. For the same reason, you shouldn’t filter out images or static pages from your URLs.txt.
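The weighting of heavy pages can be done by hand, or with a few lines of script. This is only a sketch: the example.com URLs, the "search" marker and the weight of 3 are assumptions for illustration, and the output is simply one URL per line.

```python
def build_url_list(urls, heavy_markers=("search",), heavy_weight=3):
    """Return the lines for a URL file, one URL per line.

    URLs containing one of heavy_markers are repeated heavy_weight times
    so that expensive pages are requested more often. Everything else,
    including images and static pages, is kept exactly once.
    """
    lines = []
    for url in urls:
        is_heavy = any(marker in url for marker in heavy_markers)
        lines.extend([url] * (heavy_weight if is_heavy else 1))
    return lines


# Hypothetical URLs as they might have been captured through the proxy.
captured = [
    "http://www.example.com/index.html",
    "http://www.example.com/logo.png",
    "http://www.example.com/search?q=load+testing",
]
for line in build_url_list(captured):
    print(line)
```

Redirect the output into URLs.txt once the mix looks plausible.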

Then you can run siege. At first, you should run with low concurrency and a delay of 1 second. This is just to make sure that everything works as it is supposed to. You can also use this run to set up your monitors and testing stuff.

  • Monitor the operating system – usually your statistics system (I use Cacti; MoM and other tools will work fine, too) won’t gather everything. When you’re using Linux, have one window open with vmstat 1 and one with top. On Windows, I would recommend Process Explorer. If you’re running into limits, it’s important to see where your system reaches its limit. Processes can be IO-bound (DB, filesystem, network), CPU-bound or application-bound (the last is the only one not easily fixable using money).
  • Monitor your web server. Apache has a nice server-status handler. I don’t know about IIS, but I’m sure it has equivalent tools.
  • Monitor your back end. This one depends a lot. For the standard SMB/SOHO setup, the backend usually is a MySQL database. SHOW STATUS is your friend, though you will want to have a look at the manual to be able to interpret these values. For bigger CMS, this will be completely different. Trust the experts you’ve hired to monitor it.
  • Monitor your load balancer. This one is also different depending on the solution you use.

Note that you should monitor the operating system on all systems affected. Start the test slow, and increase the load at a steady rate. Siege includes a tool called bombardment which does the increasing on its own.
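If you would rather script the ramp yourself, the steps can be generated like this. It is a sketch under assumptions: the step size, the 5 minute duration per step and the urls.txt file name are made up, and the commands are only printed, not executed. The siege options used are -c (concurrent users), -d (delay), -t (duration) and -f (URL file).

```python
def ramp_steps(expected_max, step, factor=2):
    """Concurrency levels from `step` up to factor * expected_max."""
    target = expected_max * factor
    levels = list(range(step, target, step))
    levels.append(target)  # always finish exactly on the target load
    return levels


def siege_command(concurrency, delay_seconds=1, minutes=5, url_file="urls.txt"):
    """Build one siege invocation as an argument list."""
    return ["siege", "-c", str(concurrency), "-d", str(delay_seconds),
            "-t", "%dM" % minutes, "-f", url_file]


# Ramp from 10 users up to twice an expected maximum of 20 users.
for level in ramp_steps(expected_max=20, step=10):
    print(" ".join(siege_command(level)))
```

The default factor of 2 means the last step runs at twice the expected maximum load.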

When you’ve successfully passed a load test at twice the expected maximum load, you should make a backup of everything. Start defining sane resource limits for all processes and servers you’re using. Verify again with the expected maximum load. The results should not change.

Now run siege with the -b option, to run a benchmark. You will probably have to up the resource limits for the shell your siege process is running in, because the defaults don’t suffice for many, many virtual users. Wait for the first component to get killed by resource limits.

If nothing gets killed by resource limits, you either have insane hardware, or you’re not trying hard enough. If a machine runs into its hardware limits before the resource limits you have defined, it may crash (thus the backup). If you’re using a multi-tier infrastructure, make sure to also run load tests behind your caching frontend servers. This is necessary to ensure proper resource limits on your backend machines. Otherwise, a bug, runaway process or something similar can cause one of your backend machines to crash.

Changing MX records

I’ve seen many people lose mail when changing MX records, for example when switching from POP3 retrieval to direct SMTP delivery.

It’s rather simple to change MX records correctly, though there are some administrative hassles involved when you don’t have your own DNS servers. But the procedure can still be used; if POP3 isn’t involved, just ignore everything about POP3.

If you control both mail exchangers, the whole thing is extremely easy:

  • Change your TTL to a small value. 5 minutes is a good start. Which TTL to change depends on what you’re changing: the TTL of the MX record itself if you change the name, or the TTL of the A record the MX record points to if you’re changing the IP address and the MX keeps the same name.
  • Wait until your old TTL has expired. Note that with a default TTL of 86400, this means you will have to wait a day.
  • Make sure the new MX is running, then change the MX record (or the A record’s IP address). If you’re switching from MX to MX, now is the time to add an SMTP route.
  • Wait 5 minutes, then retrieve your mail from POP3 one last time. This step makes sure that no mail is lost.
  • Change your TTL back to a sane value. If you’re a low-traffic site like an SMB, I recommend a TTL of 1 hour (3600s). This makes future changes easier.
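The waiting times in this procedure are easy to get wrong, so here is a small sketch that computes them. The function name and the example date are made up; it assumes you change the record at the earliest safe moment, and you need to shift the second time accordingly if you change it later.

```python
from datetime import datetime, timedelta


def mx_change_schedule(ttl_lowered_at, old_ttl_seconds, new_ttl_seconds=300):
    """Return the earliest safe times for the remaining steps.

    ttl_lowered_at: when the lowered TTL was published in DNS.
    old_ttl_seconds: the TTL that was in effect before lowering it.
    new_ttl_seconds: the temporary short TTL (5 minutes by default).
    """
    # Caches may keep serving the old record for one full old TTL.
    change_record_at = ttl_lowered_at + timedelta(seconds=old_ttl_seconds)
    # After the switch, caches may keep the old target for one short TTL.
    final_pop3_fetch_at = change_record_at + timedelta(seconds=new_ttl_seconds)
    return change_record_at, final_pop3_fetch_at


lowered = datetime(2007, 2, 1, 9, 0)
switch, last_fetch = mx_change_schedule(lowered, old_ttl_seconds=86400)
print("change record at:", switch)
print("last POP3 fetch at:", last_fetch)
```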

Using your Windows Mobile Phone for internet access from Windows Vista

I’ve had a Swisscom-branded HTC MTeoR, sold as the Mobile Assistant XPA 1405, for about half a year. It features UMTS connectivity, a really nice display, and all the usual functionality.

I never bothered to deal with the software provided by Swisscom, and set up the connections manually. There are a few things that are more complicated than they were with my previous mobile – a Nokia 6310.

For communication with the mobile, I’ve always used Bluetooth. It’s the easiest option to set up, and doesn’t require extra software. While the Windows Mobile Device Center is quite an improvement, I don’t like extra software in general, and Bluetooth handles this nicely.

Setting up Bluetooth is rather easy: just enable your Bluetooth chipset (either a physical switch on your laptop, or FN-F5 on ThinkPads), then create a partnership with your mobile device. An appropriate standard modem should be created automatically.

[Screenshot: Windows Mobile Modem Configuration for GPRS]

And here are the important parts: You will need to modify the modem definition in order to use your cellular connection successfully. I do not really know why this stuff doesn’t work out of the box.

Add the following init string to the modem definition:
+cgdcont=1,"IP","gprs.swisscom.ch"

[Screenshot: GPRS settings on Windows Mobile]

The last value is the name of the GPRS access point. If you’re not using Swisscom, you can find this value in the Settings Dialog, under Connections, GPRS. Make sure to choose the right configuration, because MMS uses a GPRS connection too. But they’re usually marked quite clearly.
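For other carriers, only the access point name in the init string changes. A tiny sketch that builds the string for a given access point follows; the function name is made up. The first two values (context number 1 and type "IP") are taken unchanged from the string above.

```python
def gprs_init_string(apn, context_id=1, pdp_type="IP"):
    """Build the extra modem init string for a given GPRS access point."""
    if '"' in apn:
        # The access point name is wrapped in double quotes in the string.
        raise ValueError("APN must not contain double quotes")
    return '+cgdcont=%d,"%s","%s"' % (context_id, pdp_type, apn)


print(gprs_init_string("gprs.swisscom.ch"))
```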

[Screenshot: Windows Vista GPRS Connection Configuration]

The next thing you will need to do is to create an appropriate Dial-Up configuration in Windows. Just choose the correct Modem, and enter *99# as the number. Also, Vista won’t accept empty usernames for dialup connections. Use gprs as both user and password, or ask your phone carrier what you should use.

[Screenshot: Windows Mobile Wireless Modem Application]

Last but not least, you will need to launch the “Wireless Modem” application on your handset, in order to enable your PC to connect. Note that this wasn’t always that reliable for me. I never had to reboot my mobile phone or anything like that, but from time to time I had to cycle the wireless modem emulation off and on in order to be able to connect correctly.

Why you should use SMTP to receive mails

Many SMB setups I’ve seen still use the included POP3 Connector to receive mail. Microsoft’s POP3 Connector is included in Small Business Server, but not in regular Exchange installations (this was different with Exchange 2000). That’s actually a good thing, because it means non-SBS customers can’t fall back on POP3.

The POP3 connector included with SBS 2003 sucks – it can only poll at 15-minute intervals, which makes for unnecessary delays. It’s nasty to debug, and setting up new accounts means doing the work twice.

SMTP has a few more requirements than POP3, which is why it is sometimes not used:

  • A static IP Address (Dyndns is okay for Home setups)
  • A rudimentary knowledge of DNS
  • A port forwarding from your router

When sending out mail, you should still use your ISP’s smarthost. There are several reasons for this:

  • Sending out big files to multiple addresses will take ages from a standard broadband connection
  • Even static IP ranges might be blacklisted and filtered as spam

On the other hand, you will lose a bit of control when using your ISP’s smarthost. It’s best to make your own judgment.

More power outages at Layer One

Layer One has failed me again.

Power is down again for one row of racks, hosting the DATALINE machines – Sylon’s machines are, obviously, not affected. The outage started at 20:48, or so says Nagios. The Layer One people are already working on the issue, but I do wonder how they managed to make it this unreliable. Three consecutive power failures in a day and a half.

UPDATE:

Power is back at 22:10. Now back to repairing MyISAM tables.

UPDATE:

Power went down for all racks at about 23:00. There was a short time with power available at approximately 23:02, and some machines completed their power-on cycle, but it wasn’t long enough for Nagios to notice. It’s back up now at 00:45.

This makes it 5 consecutive power failures.

Nine’s status ticket offers some more insight into this problem. I had the Init7 on-call service (Pikett) on the phone twice (at the first and the second major interruption), but they told me they didn’t know what happened.

At 23:00 we detected a general outage of the Letzigraben site. Technicians were on site in less than 10 minutes and found a complete power failure in all rooms, both TIC and LayerOne.

According to initial findings, a power strip in the LayerOne room caused a short circuit, which led to a fault in the UPS system. This in turn led to a shutdown of the entire power supply.

At 00:32 the power supply was brought back into operation. All systems were checked successfully. Should a service unexpectedly not work, please contact us.

We apologize for the inconvenience. The data center operator TIC has promised a statement for Friday, 23.02.07. As soon as we receive it, we will communicate further details via this platform.

Update 23.02.2007 04:39: In the LayerOne room, a 64A fuse for an entire power rail failed in the course of 22.02.2007. A technician then tried to switch the rail back on without taking any precautions. The first time this was successful; after the fuse tripped a second time in the evening, the technician again tried to switch the fuse back on. Unfortunately, the technician did not know that such rails (all rails of this type in the TIC room were replaced under pressure from Nine.ch) can scorch through. When it was switched back on, the rail was already so badly scorched that this caused a short circuit, which was caught by the next larger fuse in front of the UPS. This severed the connection between the UPS and the diesel generator, which is why the power failed for 2 seconds at 23:00 and the building switched to UPS operation. Why this 2-second outage occurs is not yet clear. After about 6 minutes of UPS runtime the batteries were empty and the whole building was offline. It was clear to Nine.ch very quickly what the problem was. Unfortunately, the on-call service of Swisscom/Simag took nearly 1h20 to arrive on site (he had come by taxi in an intoxicated state) so that he could grant access to the main distribution board in order to switch the fuse back on!

Today, 23.02.2007, a meeting of the building’s tenants is taking place in order to improve this unsatisfactory situation as quickly as possible.

UPDATE:

Nine edited their ticket and removed the references to the drunk technician from Swisscom/Simag.

Service Agent fails to create a configuration with java CPD0A35

CPD0A35 - Java exception not handled for user-defined category 255.
TCP266B - Schnittstelle nicht gefunden (German for “interface not found”)

Ever received these messages when trying to create a service agent configuration? There’s no fix for this problem. You can install ESA PTFs for as long as you want, it won’t solve the problem. I had a PMR open because of this issue, and IBM told me that there’s no fix. I had already found the workaround on my own.

Just use iSeries Navigator to create the ESA connection. You can find it in TCP/IP, Outgoing Connections, Create Universal Connection. Just remember, iNav needs a lot of host servers to be running. You might need to start them to use the “Create Universal Connection” Wizard (STRHOSTSVR).

Another power outage at Layer One

At about 14:00 today, I received an SMS from our Nagios instance, telling me that our switch at Layer One was down. I investigated quickly, and found that other machines located at Layer One were down too.

A quick call to our bandwidth provider Init7 confirmed that there was another power outage. If I counted right, this is the third power outage in the ~2 years that Sylon has been located at Layer One. It happened because of power maintenance on another floor. Both power circuits went down, because they were plugged into the same UPS. That makes perfect sense.

Fortunately, there were few problems. Unfortunately, one of the problems we had required attention on site. One of the new DL360 G5s decided to stay asleep. iLO wasn’t responding either. On site, I just power-cycled the machine, and it worked fine. Strange.

There were a few (obvious) problems with customers still using MyISAM tables for data storage – and where myisamchk didn’t fix it, I just restored the most recent backup made with bontmia.

UPDATE:
Power failed again last night for about 20 minutes. It didn’t affect all racks (DATALINE’s machines went down, but Sylon’s didn’t). They didn’t know why this happened.

Backup Exec and the Exchange database logfiles

Imagine this: You’ve set up a new machine running Exchange 2003, installed Backup Exec with AOFO, and started running your backups. Everything works fine.

A few days later, the client calls you, telling you that Exchange is no longer working. You connect to the machine through VPN, and see that the log partition (you do have a log partition, don’t you?) has run full. You have set up your backups properly, but the log files don’t get deleted.

[Screenshot: Backup Exec - Advanced Open File Option]

When using Backup Exec with the Advanced Open File Option (AOFO, snapshots), BE might not delete the backed-up Exchange database logfiles.

The culprit here is the “Process logical volumes for backup one at a time” option.

When you activate this option, BE will no longer delete your Exchange logfiles. I do not really understand why this is the case, but I was able to get it to work fine without this option, by shutting down some services before the snapshots are taken.

Windows Home Server – First Impressions

This Saturday, I received a mail from Microsoft Connect, telling me about my invite ID for the Windows Home Server Public Beta. I already had a spare Shuttle XPC lying around, onto which I installed WHS.

The setup routine of WHS still very much reminds you that this is a Beta version, and nothing completely ready for production. It looks more like an automated deployment of Windows Server 2003 than a new product based upon it. But since WHS is primarily an OEM product, these details don’t matter that much. The initial installer uses Windows PE 2.0, which makes it a lot easier to load custom storage drivers (something which was a real hassle with 2003 and XP, since most machines don’t come with a floppy drive anymore).

After the installer was done, I shut the machine down and placed it next to my TV, headless. I booted it again, and installed the WHS connector software on my workstation, running Windows Vista. The installer went without a hitch, and the Home Server console felt quite snappy. It’s also a bit sparse – there aren’t that many options you can choose from.

This brings me to my next point:

What is Windows Home Server

Windows Home Server is, more or less, a NAS appliance for home users. You can save files on it, you can back up your PCs automatically to it, and you can access your files remotely.

And that’s it. WHS is not a smaller version of Windows Small Business Server – it doesn’t offer Exchange, Active Directory, etc. It is not meant to be used by IT professionals at home – it is meant for non-IT people who want central file storage, fully integrated backup functionality, and remote access capability.

Using Windows Home Server

After installing, there were already several default shares for music, video, etc. Creating a new user also creates a share for that user. WHS forces users to use UNC names instead of drive letters to access their shares. This is an improvement, since drive letters are evil. Especially in companies, where you should be using DFS anyway.

There’s a completely automated backup solution, which is the main selling and distinguishing point of WHS. The computers will wake up from sleep automatically at night, make a backup, and go back to sleep. It will be interesting to see how this is going to be handled in more common households, where PCs are usually switched off with a power strip (maybe not in the US, but definitely in most European households).

The other point is remote access. Remote access allows you to access all your stuff on your WHS through a web interface, which sucks. Really. It is slow, and it’s also cumbersome. There’s much room for improvement there, and by improvement I mean “rewrite”. You can also use your WHS as an RDP gateway to access your machines at home through RDP, which seems intelligent until you realise that XP Home, Vista Home Basic and Vista Home Premium don’t ship with RDP.

WHS also allows you to update your passwords on all computers at once, through a password updater utility. WHS isn’t a domain, but this comes somewhat close to what a domain offers.

My opinion

WHS is a product for my parents. Not a product for me. This is important to realize: I’ve heard many people moan that WHS doesn’t include a shared calendar, Exchange, Active Directory, etc. If you want all this (complicated) stuff at home, go for Windows Small Business Server. WHS is an appliance, not a complete server solution. The main reason for this decision was probably to avoid stealing customers from Live.com’s online mail and calendaring service.

WHS itself offers a unique storage management feature, which allows certain folders to be automatically duplicated on multiple hard drives. In my opinion, this is one of the two killer features (the other is backup). Because mirroring can be adjusted on a per-share basis, you can choose to mirror all your important files, but not your extensive collection of completely legal DVD backups. WHS also allows easy replacement of hard drives (though not of the system drive).

The remote access seems to be very unfinished. A self-signed SSL certificate is used to access your WHS, which will get consumers into the habit of accepting invalid certificates – which makes all this wonderful SSL verification stuff completely worthless. Maybe Microsoft will deliver a solution to this problem, though I really doubt that this is going to be easy. It would probably take a complicated deal with one of the big certification companies.

WHS will be an excellent alternative to all the SOHO NAS appliances on the market – most of which really suck. WHS doesn’t suck, it just seems unfinished (which makes sense, considering it’s in beta). I hope Microsoft irons out all those kinks before the release. I will probably buy one for my parents when it’s released.

EuroForm 100 IPDS modules for HP printers

EuroForm offers IPDS modules for HP printers. One of our customers bought one of them directly – we hadn’t tested them before, so I didn’t have much of a clue how to get them working. The customer’s technician had even preinstalled it in the printer, so there wasn’t much work for me to do anyway.

I configured an appropriate *PSFCFG and *DEVD, according to the documentation provided by EuroForm.

Printing worked. Overlays worked. But I could only access trays 1-3 (numbered 2-4 on the printer itself). So I opened the WebConfig GUI of the printer:

[Screenshots: HP LaserJet 4700 config page; EuroForm 100 IPDS module config website]

The printer recognized all 5 trays (yes, they were not yet configured for different paper types, but that isn’t a problem). But the IPDS module didn’t. I called the hotline of the Swiss reseller Now Consulting.

While waiting for the callback, I played around with the options. And yes – the Envelope Tray and the Multifunction Tray mentioned in the second screenshot corresponded to trays 5 and 6 of the printer. The problem was solved. Now that was easy.

About three hours later (~18:00), when I was already at home, one of the Now Consulting guys called me and told me to fax all the information (screenshots were not acceptable). Oh well. Luckily the problem was already solved.