
OpenVPN on Windows works surprisingly well

I’ve been using OpenVPN for a few years on Linux to establish site-to-site VPNs. It has never let me down, and I was always able to get the configuration working the way I wanted without much effort or fiddling. Another nice feature of OpenVPN is that it can work its way through almost any firewall, which is especially useful when internet access is restricted.

A few days ago, I found myself needing to get a site-to-site VPN up as quickly as possible, behind a restrictive firewall. I started with the obvious route and found a few resources on OpenVPN on the net.

One of them is the OpenVPN GUI, which is mostly aimed at roadwarrior scenarios. The Windows installation notes and the Windows section in the howto are quite sparse. As such, my expectations weren’t high.

Installing OpenVPN creates a virtual Ethernet adapter backed by the TAP driver (which is not signed). The install went fine, and the configuration was the same as on Linux.

The Windows installer automatically installs a service that is disabled by default. When started, it launches OpenVPN for every *.ovpn file in %ProgramFiles%\OpenVPN\config. Simple but effective. Logs are written to %ProgramFiles%\OpenVPN\log.

After creating an appropriate configuration, I put it into the config directory, started the service, and everything just worked. Right out of the box. Without tinkering. Without error messages. It just worked.

The application clearly shows its Linux/Unix origins, but it works nicely. Windows administrators who have never worked with a Unix-like operating system might be put off by it. I would still suggest that everyone take a look at OpenVPN for low-cost VPN improvisations.

Graphs are not only for managers

“Fancy graphs are only for managers.”

I’ve heard this one more than once, and it just isn’t true. While hard numbers are good for many things, they are usually not adequate for assessing network connections.

Especially now that VPN connections span multiple ISPs, companies and so on, there is a need for a less subjective view of the quality of these links. Luckily, there are many open source options for graphical network monitoring.

For WAN connections, the most important tool is SmokePing. With Cacti, you can graph almost anything: SNMP support is built in, and you can also use scripts. I’ve used many scripts that run commands over SSH with public key authentication, so that even sensitive statistics can be transferred over the network safely.
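To illustrate the SSH approach, here is a minimal sketch of a Cacti data input script. It assumes a remote Linux host reachable as the hypothetical monitor@server.example, with a dedicated key at the hypothetical path ~/.ssh/cacti_key, and uses /proc/loadavg as the example statistic. Cacti's script input method reads whitespace-separated name:value pairs from standard output, which is what format_loadavg produces.

```shell
#!/bin/sh
# Hypothetical Cacti data input helpers: fetch a statistic over SSH
# (public key authentication, no password prompt) and print it in
# Cacti's "name:value name:value" output format.

# Turn the contents of /proc/loadavg ("0.15 0.10 0.05 1/123 4567")
# into "load1:0.15 load5:0.10 load15:0.05".
format_loadavg() {
    echo "$1" | awk '{ printf "load1:%s load5:%s load15:%s\n", $1, $2, $3 }'
}

# Assumption: the key path and the user@host argument are placeholders.
# BatchMode=yes makes ssh fail instead of prompting if key auth breaks.
fetch_loadavg() {
    ssh -o BatchMode=yes -i "$HOME/.ssh/cacti_key" "$1" cat /proc/loadavg
}
```

Cacti would then call something like format_loadavg "$(fetch_loadavg monitor@server.example)", with the output fields load1, load5 and load15 mapped in the data input method.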

Booting PXElinux from RIS

You already have RIS/WDS set up for network installations of Windows, but now you have some Linux machines that you want to set up using your existing Windows Server?

It’s quite easy to do:

Save the following text as a .sif template file in X:\RemoteInstall\Setup\German\Tools\PXELinux\i386\templates:

[OSChooser]
Description = "PXElinux"
Help = "Linux Network Boot Loader - Currently loads a Debian 4.0 Installer"
LaunchFile = "Setup\German\Tools\PXElinux\i386\templates\pxelinux.0"
Version = "1.00"
ImageType=Flat

You can get all the necessary Debian files from netboot.tar.gz. You generally do not need to edit any of them; just extract them into the templates directory.

Creating simple graphs using rrdtool

rrdtool graph for temperatures
As discussed in the previous post, you can gather temperature data from RSA II or iLO cards using SNMP quite easily.

While the data itself can be good enough to make a decision, executives always like nice diagrams. So my first attempt was to load the CSV-like data file generated by that script into Excel and make a chart from it. But Excel is restricted to 255 parameters per axis, which was severely limiting.

I’ve been using Cacti for quite some time, but I wasn’t willing to deploy it here, because we’re mostly a Windows shop and my plan was to integrate the Linux boxes into Operations Manager 2007. Cacti uses Tobi Oetiker’s rrdtool to create its graphs.

Creating graphs with rrdtool is actually quite easy. I wrote three simple scripts to handle this:

makerrd

Creates the RRD file. Replace the Unix timestamp given to --start as appropriate. The last value on the RRA line is the number of rows (values) kept in the archive.

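#!/bin/sh
# One GAUGE data source "temp", updated every 300 s (heartbeat 600 s, no
# min/max), and one AVERAGE archive that keeps 5000 single-step rows.
rrdtool create test.rrd           \
           --start 1176465000     \
           -s 300                 \
           DS:temp:GAUGE:600:U:U  \
           RRA:AVERAGE:0:1:5000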
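How much history an archive keeps follows directly from these numbers: step × steps per row × rows. The following sketch does that arithmetic in shell; the function names are made up for illustration.

```shell
#!/bin/sh
# History kept by one RRA: step (seconds) * steps per row * rows.
# Arguments mirror makerrd: -s 300 and RRA:AVERAGE:0:1:5000 -> 300 1 5000.
rra_retention_seconds() {
    step=$1; steps_per_row=$2; rows=$3
    echo $(( step * steps_per_row * rows ))
}

# Same value in whole days (integer division, rounds down).
rra_retention_days() {
    echo $(( $(rra_retention_seconds "$1" "$2" "$3") / 86400 ))
}

# Rows needed to keep a given number of days at a given resolution.
rra_rows_for_days() {
    step=$1; steps_per_row=$2; days=$3
    echo $(( days * 86400 / (step * steps_per_row) ))
}
```

For the makerrd values (300, 1, 5000) this gives 1,500,000 seconds, or just over 17 days; after that, the oldest values are overwritten. Keeping 31 days at five-minute resolution would need 8928 rows.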

inputrrd

Loads the data from the simple CSV-like file into the RRD file. The more elegant approach would be to load the data directly from SNMP into the RRD database, but I’m no programmer.

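#!/bin/zsh
# Each line of the file "machine" looks like "timestamp;temperature".
while IFS=';' read -r timestamp temp ; do
        temp=$(echo "$temp" | sed 's/\..*//')    # drop the decimal part
        if ! rrdtool update test.rrd "${timestamp}:${temp}" ; then
                echo "rrdtool failed for ${timestamp}" >&2
        fi
done < machine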
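A sketch of that more elegant approach, assuming net-snmp's snmpget is installed and the card answers SNMP v1. The host name, community and especially the OID are placeholders, since the temperature OID for RSA II or iLO cards isn't given here. The value goes straight into test.rrd with the timestamp N, which rrdtool interprets as "now".

```shell
#!/bin/sh
# Sketch: poll one temperature value via SNMP and store it directly in
# test.rrd, skipping the intermediate CSV-like file.
# Assumptions: HOST, COMMUNITY and TEMP_OID are placeholders, not real values.
HOST=${HOST:-rsa-card.example}
COMMUNITY=${COMMUNITY:-public}
TEMP_OID=${TEMP_OID:-REPLACE_WITH_TEMPERATURE_OID}

# Strip quotes and the decimal part, like the inputrrd script does.
clean_value() {
    echo "$1" | tr -d '"' | sed 's/\..*//'
}

poll_once() {
    # -Oqv prints only the value, without the OID or type prefix
    raw=$(snmpget -v1 -c "$COMMUNITY" -Oqv "$HOST" "$TEMP_OID") || return 1
    # "N" makes rrdtool use the current time as the timestamp
    rrdtool update test.rrd "N:$(clean_value "$raw")"
}
```

Calling poll_once from cron every five minutes would match the 300-second step defined in makerrd.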

makegraph

Creates a graph from the data in the RRD file. The HRULE lines draw horizontal lines at the warning thresholds, in this case 35°C (red) and 30°C (orange).

#!/bin/sh
# Plot the stored temperatures plus two threshold lines (35C red, 30C orange).
rrdtool graph temp.png                       \
        --start 1176465385 --end `date +%s`  \
        DEF:mytemp=test.rrd:temp:AVERAGE     \
        LINE2:mytemp#0000FF                  \
        HRULE:35#FF0000                      \
        HRULE:30#FFA500

See the resulting graph to the right. Of course, rrdtool has many more options and can create much nicer graphs.
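For example, a variant of makegraph with a title, a vertical axis label, legends and printed maximum and average values might look like this. It is a sketch reusing the file and data source names from the scripts above; the function name is hypothetical, while VDEF and GPRINT are standard rrdtool graph elements.

```shell
#!/bin/sh
# Variant of makegraph: title, axis label, legends and a summary line.
# Arguments: output file (default temp.png) and start time (default as above).
make_temp_graph() {
    out=${1:-temp.png}
    start=${2:-1176465385}
    rrdtool graph "$out" \
        --start "$start" --end "$(date +%s)" \
        --title "Server temperature" \
        --vertical-label "Degrees C" \
        DEF:mytemp=test.rrd:temp:AVERAGE \
        VDEF:maxtemp=mytemp,MAXIMUM \
        VDEF:avgtemp=mytemp,AVERAGE \
        "LINE2:mytemp#0000FF:Temperature" \
        "HRULE:35#FF0000:Critical (35 C)" \
        "HRULE:30#FFA500:Warning (30 C)" \
        "GPRINT:maxtemp:Maximum %4.1lf C" \
        "GPRINT:avgtemp:Average %4.1lf C"
}
```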

IBM System x3250

A few days ago, I got my hands on my first IBM System x3250. The x3250 isn’t a midrange server like the x3650; it’s IBM’s low-end rack server. You will see the difference in the pictures, and there’s also a large difference in price. This machine was to serve as a router/firewall/VPN concentrator, and thus has no special hardware demands. The OS installed was Debian GNU/Linux 3.1, which has its own set of problems.

The x3250 doesn’t have Light Path diagnostics, hot-pluggable fans, or even hot-pluggable hard disks. You can order it with 2.5″ hot-plug SAS disks, though, but that makes it a lot more expensive (to the point where an x3550 might be the better choice).

Here’s the configuration ordered:

  • System x3250, dual-core Xeon 1.83 GHz, with 2×512 MB base memory, 3.5″ Simple Swap SATA
  • 2× 80 GB SS-SATA

Unpacking and opening

IBM System x3250 Package Contents
This machine came nicely packaged in a big box, secured on a wooden pallet. It contained the usual low-cost rack mount kit: no way to pull the machine halfway out of the rack, no cable tray, and no rails. It’s hard to see in the picture because of the missing frame of reference, but the x3250 is very, very short. It would probably fit in a telco rack.

The disk blanks fit nicely (they have to: you don’t remove them, even when disks are installed). Interestingly, this machine still has PS/2 ports, and the case is the same as that of its predecessor, the xSeries 306m, now of course with a dual-core Intel Xeon. Even though it’s a baseline model, you can still install an IBM RSA II card for remote maintenance. It also has a dedicated slot for a SAS/SATA RAID controller, allowing you to do real hardware RAID without losing a precious PCI-E slot.

Interiors

IBM System x3250 Fans
As the pictures show, this machine clearly belongs to a different price class than the x3650. While all cables are neatly tied together and nothing is flying around, it’s still different from a midrange machine. The fans aren’t hot-pluggable, and neither are the disks. You can only install 4 DIMMs in total.

There’s an interesting heat pipe attached to the CPU, which I haven’t seen before; not even the x3650 has one. The documentation, however, is still top-notch. The documentation on the inside of the top cover is very detailed and contains all the information you will probably need.

Installing options

A simple swap SATA disk for an x3250
The cheapest x3250 comes with so-called Simple Swap SATA disks. You can install and replace them while the server is mounted in the rack, but they aren’t hot-pluggable. No tools are required, either. I think this is solved much better than HP’s approach in their baseline machines: they use screws, and you need to remove the machine from the rack.

Installing the SS-SATA disks is easy: just remove the filler panel and insert the disk until it clicks. Then put the filler panel back into the server. Removing the disks is a breeze too; just pull on the blue latches attached to the disk.

Booting the server

IBM System x3250 System Diagram
The baseline x3250 doesn’t have a hardware RAID controller, just a standard Intel AHCI SATA controller, which is well supported on Linux. And by Linux, I mean “not Debian”: the current stable release of Debian doesn’t support AHCI SATA. This isn’t such a big problem, because you can install the OS using IDE emulation, build or install a newer kernel, and then switch the system to AHCI SATA mode.

However, this proved to be much more of a problem than I initially thought. Linux was able to recognize the disks, but after I configured the software RAID, the machine became really, really slow. Like 16 MHz 386 slow. The RAID was rebuilding in the background at about 2 MB per minute. While the installation crawled ahead, I built a proper kernel on another machine.

After the install finished, I quickly installed the new kernel and booted the machine in AHCI mode. Thanks to Linux software RAID autodetection, there was no need to reconfigure anything. The RAID finished rebuilding at 50 MB/s, which I found much more acceptable, with no slowdowns either.
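If you want to watch a rebuild like this, the kernel reports progress and speed in /proc/mdstat. This sketch extracts the speed= field; the function names are hypothetical. The kernel's rebuild throttling limits are exposed in /proc/sys/dev/raid/speed_limit_min and speed_limit_max, in KB/s.

```shell
#!/bin/sh
# Print the current resync/recovery speed reported by Linux software RAID.
# /proc/mdstat contains lines like:
#   [==>.......]  recovery = 12.6% (9846784/78043648) finish=22.7min speed=50021K/sec
mdstat_speed() {
    # Read the given file, defaulting to /proc/mdstat
    sed -n 's/.*speed=\([0-9]*K\/sec\).*/\1/p' "${1:-/proc/mdstat}"
}

# Show the rebuild throttling limits (KB/s per device).
raid_speed_limits() {
    echo "min=$(cat /proc/sys/dev/raid/speed_limit_min) max=$(cat /proc/sys/dev/raid/speed_limit_max)"
}
```

Raising speed_limit_min only helps when throttling is the bottleneck; in the case above, switching to the AHCI driver was what made the difference.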

Summary

The x3250 is a cheap baseline model, and it shows. But I still think it trumps the alternative models from HP and Dell, while being similarly priced.

Also, the obligatory plug for DATALINE AG, which sells this server and other IBM System x and System i servers.

Load testing for webservers

Before moving anything important into production, you should test if it works.

While this sounds easy enough, it isn’t. In this case, we will talk about load testing web servers. Serving static .html pages without big downloads is usually a non-issue, but things get much more interesting when you’re using some kind of automatically generated content. A blog is usually a simplified content management system without that many features. But there are bigger CMSs, which might, for example, generate pictures automatically. They might also have built-in search, and search results are never cacheable.

Things get even more interesting with completely interactive, mostly uncacheable sites like discussion portals, but I don’t have any experience with those.

Load testing isn’t difficult, but the important thing is to use the right numbers. Getting them is much more difficult than doing the testing itself.

When we’re talking about smaller projects, you usually have to gather these numbers on your own – more about this below. When we’re talking about bigger projects, you can use the numbers used to size the system (You did size the system, right?).

Your first step would usually be to gather the necessary data for normal load. If you’re a startup, and don’t have any data for a normal load, then estimate. Then use three times your estimate.

But what should you estimate? What kind of numbers do we need?

  • Number of concurrent users
  • Delay between hits

You can usually calculate these numbers from your existing statistics. Never use your hit statistics; always use your visitor statistics. Why? Because the number of hits will change along with your page layout.

Another important part is the time a visitor spends on your site during a visit, and how many page hits they generate in that time. If you’re not using frames, you can use the page hits of your old site for your new site. You will need to calculate the number of hits (as opposed to page hits) for the new site, though.

There’s no simple hands-on formula you can use to get valid values for your load testing software. See whether the numbers “feel” right. For a small company, the number of concurrent visitors can easily be below ten. For sites like bluewin.ch, it can be much, much higher.
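As a rough starting point only, and not a replacement for the judgment call described above, two approximations can help you check whether your numbers feel right. Assuming visits arrive evenly over the busiest hour, concurrent visitors ≈ visits per hour × average visit length in seconds / 3600 (Little's law), and the delay between page hits ≈ visit length / page hits per visit. The function names are hypothetical.

```shell
#!/bin/sh
# Rough load-test sizing from visitor statistics (integer shell arithmetic).
# Assumption: visits are spread evenly over the busiest hour.

# concurrent_visitors VISITS_PER_HOUR AVG_VISIT_SECONDS
concurrent_visitors() {
    # round up so that small sites still get at least one user
    echo $(( ($1 * $2 + 3599) / 3600 ))
}

# delay_between_hits AVG_VISIT_SECONDS PAGE_HITS_PER_VISIT
delay_between_hits() {
    echo $(( $1 / $2 ))
}

# The "three times your estimate" rule for sites without real data.
with_safety_factor() {
    echo $(( $1 * 3 ))
}
```

For example, 120 visits in the busiest hour, each lasting five minutes with six page hits, gives about 10 concurrent visitors with 50 seconds between page hits; with the safety factor, you would test with 30.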

You will also need software to generate the necessary load. I’ve used Siege in the past, and it’s a very cool, simple suite of tools. I usually combine Siege with Sproxy, which lets me gather a list of URLs to test.

You can configure your web browser to use Sproxy as its proxy, and then easily gather an appropriate URL list for Siege while browsing. Both GET and POST requests are supported, so it will work with everything you can throw at it. When creating your URLs.txt, include pages that use heavy processing power on the back ends (like searches) multiple times to simulate a more realistic load. For the same reason, don’t filter out images or static pages.
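Weighting the URL list by hand gets tedious. A minimal sketch, assuming your recorded list is in a file and the expensive pages can be matched by a regular expression ("search" is used as an example pattern; the function name is hypothetical): matching lines are repeated, and everything else, including images and static pages, is kept once.

```shell
#!/bin/sh
# Build a weighted URL list for siege: every line matching PATTERN is
# repeated COUNT times, all other lines are printed once.
# Siege URL files hold one URL per line; POST lines look like
#   http://www.example.com/search.php POST q=test
# weight_urls PATTERN COUNT FILE
weight_urls() {
    pattern=$1; count=$2; file=$3
    awk -v pat="$pattern" -v n="$count" '
        $0 ~ pat { for (i = 0; i < n; i++) print; next }
        { print }
    ' "$file"
}
```

For example: weight_urls search 5 URLs.txt > URLs-weighted.txt.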

Then you can run Siege. At first, run it with low concurrency and a delay of 1 second. This is just to make sure that everything works as it is supposed to. You can also use this run to set up your monitoring and testing tools.

  • Monitor the operating system. Your statistics system (I use Cacti; MOM and other tools will work fine, too) usually won’t gather everything. On Linux, keep one window open with vmstat 1 and one with top. On Windows, I would recommend Process Explorer. If you’re running into limits, it’s important to see where your system hits them. Processes can be IO-bound (DB, filesystem, network), CPU-bound or application-bound (the last is the only one you can’t easily fix by throwing money at it).
  • Monitor your web server. Apache has a nice server-status handler. I don’t know about IIS, but I’m sure it has equivalent tools.
  • Monitor your back end. This one depends a lot. For the standard SMB/SOHO setup, the backend usually is a MySQL database. SHOW STATUS is your friend, though you will want to have a look at the manual to be able to interpret these values. For bigger CMS, this will be completely different. Trust the experts you’ve hired to monitor it.
  • Monitor your load balancer. This one is also different depending on the solution you use.
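For the web server, Apache's server-status handler also has a machine-readable variant at ?auto. The sketch below assumes mod_status is enabled and reachable at the hypothetical STATUS_URL, and prints busy and idle worker counts, which is usually where an Apache running out of workers shows up first. The function names are hypothetical.

```shell
#!/bin/sh
# Watch Apache workers during a load test via server-status?auto,
# which returns "Key: value" lines such as "BusyWorkers: 5".
STATUS_URL=${STATUS_URL:-http://localhost/server-status?auto}

# Extract BusyWorkers and IdleWorkers from ?auto output on stdin.
parse_workers() {
    awk -F': ' '
        $1 == "BusyWorkers" { busy = $2 }
        $1 == "IdleWorkers" { idle = $2 }
        END { printf "busy=%s idle=%s\n", busy, idle }
    '
}

# Poll forever, once per interval (default 1 second).
watch_workers() {
    while :; do
        curl -s "$STATUS_URL" | parse_workers
        sleep "${1:-1}"
    done
}
```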

Note that you should monitor the operating system on all affected systems. Start the test slowly and increase the load at a steady rate. Siege includes a tool called bombardment, which does the ramping up on its own.
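If you prefer to control the ramp yourself instead of using bombardment (whose arguments are not covered here), a plain loop over siege works as well. The function name and its parameters are hypothetical; -c (concurrent users), -d (delay), -t (duration, e.g. 2M), -i (random URLs from the file) and -f (URL file) are standard siege options.

```shell
#!/bin/sh
# Ramp up the load step by step: START users, plus STEP users per round,
# for ROUNDS rounds of DURATION each.
# ramp_siege START STEP ROUNDS DURATION URLFILE
ramp_siege() {
    start=$1; step=$2; rounds=$3; duration=$4; urls=$5
    users=$start
    round=1
    while [ "$round" -le "$rounds" ]; do
        echo "round $round: $users concurrent users" >&2
        siege -i -c "$users" -d 1 -t "$duration" -f "$urls" || return 1
        users=$(( users + step ))
        round=$(( round + 1 ))
    done
}
```

For example, ramp_siege 5 5 10 2M URLs.txt starts with 5 users and adds 5 per two-minute round, ending at 50.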

When you’ve successfully passed a load test at twice the expected maximum load, you should make a backup of everything. Start defining sane resource limits for all processes and servers you’re using. Verify again with the expected maximum load. The results should not change.

Now run Siege with the -b option to run a benchmark. You will probably have to raise the resource limits for the shell your Siege process runs in, because the defaults aren’t sufficient for many, many virtual users. Wait for the first component to get killed by resource limits.
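As a sketch of raising the limits only for the Siege process, assuming the open-files limit is the one that runs out first (each virtual user needs at least one socket): the subshell raises its soft limit to the hard limit and then starts the benchmark. The function name is hypothetical, and the fallback of 65536 for an unlimited hard limit is an arbitrary choice.

```shell
#!/bin/sh
# Run siege in benchmark mode with a raised open-files limit.
# The limit change stays inside the subshell and does not affect the caller.
# bench_siege USERS DURATION URLFILE
bench_siege() {
    users=$1; duration=$2; urls=$3
    (
        hard=$(ulimit -Hn)
        if [ "$hard" = unlimited ]; then
            hard=65536
        fi
        ulimit -n "$hard" 2>/dev/null || echo "could not raise open-files limit" >&2
        siege -b -c "$users" -t "$duration" -f "$urls"
    )
}
```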

If nothing gets killed by resource limits, you either have insane hardware or you’re not trying hard enough. If a machine runs into its hardware limits before the resource limits you have defined, it may crash (hence the backup). If you’re using a multi-tier infrastructure, make sure to also run load tests behind your caching front-end servers. This is necessary to ensure proper resource limits on your back-end machines. Otherwise, a bug, a runaway process or something similar can crash one of your back-end machines.

Another power outage at Layer One

At about 14:00 today, I received an SMS from our Nagios instance telling me that our switch at Layer One was down. I investigated quickly and found that other machines located at Layer One were down too.

A quick call to our bandwidth provider Init7 confirmed that there had been another power outage. If I counted right, this is the third power outage in the roughly two years that Sylon has been located at Layer One. It happened because of power maintenance on another floor. Both power circuits went down, because they were plugged into the same UPS. That makes perfect sense.

Fortunately, there were few problems. Unfortunately, one of them required attention on site: one of the new DL360 G5s decided to stay asleep, and iLO wasn’t responding either. On site, I just power-cycled the machine, and it worked fine. Strange.

There were a few (obvious) problems with customers still using MyISAM tables for data storage; where myisamchk didn’t fix them, I simply restored the most recent backup made with bontmia.

UPDATE:
Power failed again last night for about 20 minutes. It didn’t affect all racks (DATALINE’s machines went down, but Sylon’s didn’t). They didn’t know why this happened.

PPTP from Windows XP/Vista to Linux fails

mppe_decompress[1]: osize too small! (have: 1400 need: 1401)

Does this message from dmesg look familiar? Does the PPTP connection work fine with ping, until you try to send or receive a packet near the MTU?

It’s a bug in the PPTP support of older Linux kernels. You can update your kernel to the latest version and the problem will go away, or, as a temporary solution, you can manually specify a lower MTU in Windows. Note that using pppd’s options to achieve this won’t work.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NdisWan\Parameters\Protocols\0]
"ProtocolType"=dword:00000800
"PPPProtocolType"=dword:00000021
"TunnelMTU"=dword:00000514

With this simple .reg file, you can lower the Windows MTU and get working connections. This is especially useful if you can’t upgrade the kernel for one reason or another.
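The TunnelMTU value is a hexadecimal dword, so 00000514 corresponds to 1300 bytes, below the 1400 bytes the decompressor complained about. Two small shell helpers (hypothetical names) convert between the two forms if you want a different value:

```shell
#!/bin/sh
# Convert a decimal MTU into the "dword:" notation used in .reg files.
mtu_to_dword() {
    printf 'dword:%08x\n' "$1"
}

# Convert a "dword:xxxxxxxx" value back into a decimal MTU.
dword_to_mtu() {
    # strip the "dword:" prefix and read the rest as hexadecimal
    printf '%d\n' "0x${1#dword:}"
}
```

mtu_to_dword 1300 prints dword:00000514, and dword_to_mtu dword:00000514 prints 1300.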

Another reiserfs died

A few months back, the Sylon mail server crashed because of spurious reiserfs errors. The machine was already old, and we suspected problems with the storage subsystem (an HP Smart Array with 2× 36 GB and 2× 147 GB U320 disks), so we replaced the whole machine and also switched the filesystem back to ext3. Luckily, thanks to backups and a mostly working reiserfsck, no mail was lost (just a whole night).

Now the same problem happened again, but on another machine, about a year old (an IBM xSeries 306m), running Linux software RAID with two 80 GB disks on an Intel AHCI SATA controller. Completely different hardware.

ReiserFS: dm-2: warning: vs-13060: reiserfs_update_sd: stat data of object [491319 491321 0x0 SD] (nlink == 11) not found (pos 1)

I don’t really understand why these errors just pop up. The machine has ECC RAM, and the AHCI controllers are rather stable. The symptoms are the same as the ones we had on the Sylon mail server, but this time I had a lot more luck: the affected volume was just for online backup storage. I ran reiserfsck, which suggested a --rebuild-tree. Instead of going that route, I reformatted the volume with ext3.

I still have one machine running reiserfs, but I hope I won’t have any more problems with it.