PXE Boot

I administer a computational Beowulf cluster, and I was in the unfortunate position of having to rebuild the cluster's software after a recent failure of the master node’s hard drives. The hardware configuration of the cluster is reasonably simple: one master node and twenty-one slave nodes, all connected via a high-speed switch. The software configuration is similarly simple, but in this article I wish to focus on the steps I took to get PXE (Preboot eXecution Environment) running on the master node.

I decided to modernize the cluster by installing Fedora Core 4 on all of the nodes (master and slaves). The slave nodes have no CD-ROM drives, so the only way to put a fresh copy of Linux on them was to install over the network. The first step in this process was to set up a DHCP server on the master node, which requires the DHCP package. I used dhcp-3.0.2-22.FC4. I first copied the sample configuration file into place:

cp /usr/share/doc/dhcp-3.0.2/dhcpd.conf.sample /etc/dhcpd.conf

I could now edit /etc/dhcpd.conf to include the appropriate parameters. My dhcpd.conf file looked like the following.

ddns-update-style interim;
ignore client-updates;

subnet 192.168.0.0 netmask 255.255.255.0 {
   range 192.168.0.10 192.168.0.254;
   default-lease-time 86400;
   max-lease-time 86400;
   option routers 192.168.0.1;
   option domain-name-servers 192.168.0.1;
   option subnet-mask 255.255.255.0;
   option domain-name "mydomain.org";
   option time-offset -25200;   # offset from UTC in seconds; -25200 is UTC-7
   option ntp-servers 192.168.0.1;
   filename "pxelinux.0";
   next-server 192.168.0.1;
}

I reserved the 192.168.0.0/24 subnet for my private network. The most important options in the above list are filename, which names the boot file pxelinux.0, and next-server, which gives the address of the TFTP server that holds it. pxelinux.0 is the first file the slave node downloads; it is the PXELINUX boot loader, which carries out the rest of the PXE boot process. After the dhcpd.conf file was set up appropriately, I started the DHCP server with

/sbin/service dhcpd start

The next step was to set up the TFTP server, which the slave nodes use to fetch pxelinux.0. I had to install two packages for this task, tftp-server-0.40-6 and syslinux-3.08-2. The TFTP server runs under xinetd, so to activate it I edited the file /etc/xinetd.d/tftp, changing disable = yes to disable = no, and then restarted the xinetd service:

/sbin/service xinetd restart
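
The edit to /etc/xinetd.d/tftp can also be scripted. The following is a minimal sketch rather than the procedure used on the cluster: the function name enable_xinetd_service is hypothetical, and it assumes the file contains a single line of the form disable = yes.

```shell
# Flip "disable = yes" to "disable = no" in an xinetd service file.
# $1: path to the service file (assumed here: /etc/xinetd.d/tftp).
enable_xinetd_service() {
    conf=$1
    # Keep a backup copy, then rewrite the "disable" line from the backup.
    cp "$conf" "$conf.bak" &&
    sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' \
        "$conf.bak" > "$conf"
}
```

Running enable_xinetd_service /etc/xinetd.d/tftp as root, followed by the xinetd restart shown above, should have the same effect as the manual edit. The backup file is left next to the original so the change can be undone.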

In Fedora Core 4, the TFTP server’s root directory is found at /tftpboot. I created a directory named FedoraCore4 within /tftpboot, and I used rsync to copy the contents of each of the four installation CDs into this location. I next copied the files /tftpboot/FedoraCore4/images/pxeboot/vmlinuz and /tftpboot/FedoraCore4/images/pxeboot/initrd.img to /tftpboot. I also copied /tftpboot/FedoraCore4/isolinux/memtest to /tftpboot.

The next step was to copy the file /usr/lib/syslinux/pxelinux.0 to the directory /tftpboot. This is the file each slave node looks for first when it boots via PXE, as specified in the dhcpd.conf file. I then created the directory pxelinux.cfg within /tftpboot. This directory houses the configuration files that govern the boot options for the slave nodes.
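
The copy steps above can be collected into one small function. This is only a sketch under the paths named in the text; the function name stage_pxe_files and its three arguments are hypothetical, and it assumes the CD contents have already been copied into the install tree.

```shell
# Copy the boot files the slave nodes need into the TFTP root.
stage_pxe_files() {
    tree=$1          # copy of the install CDs (here /tftpboot/FedoraCore4)
    syslinux_dir=$2  # directory holding pxelinux.0 (here /usr/lib/syslinux)
    tftp_root=$3     # TFTP root directory (here /tftpboot)

    # Installer kernel and initial RAM disk.
    cp "$tree/images/pxeboot/vmlinuz" "$tree/images/pxeboot/initrd.img" "$tftp_root/" &&
    # Memory tester used as the safe default boot option.
    cp "$tree/isolinux/memtest" "$tftp_root/" &&
    # The PXELINUX boot loader named by "filename" in dhcpd.conf.
    cp "$syslinux_dir/pxelinux.0" "$tftp_root/" &&
    # Directory that will hold the per-node or default boot configuration.
    mkdir -p "$tftp_root/pxelinux.cfg"
}
```

With the layout described in this article, the call would be stage_pxe_files /tftpboot/FedoraCore4 /usr/lib/syslinux /tftpboot. The steps are chained with && so that a missing file stops the function before later steps run.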

Each slave node works through a list of configuration file names to find its assigned boot options. The first file checked is the one titled 01-xx-xx-xx-xx-xx-xx, where xx-xx-xx-xx-xx-xx is the MAC address of the slave node’s network card, written in lowercase hexadecimal. The next file checked is named by the uppercase hexadecimal representation of the slave node’s IP address. For example, if the slave node receives the IP address 192.168.0.254 from the DHCP server, the second file checked is the one titled C0A800FE. The third file checked is titled C0A800F, the fourth C0A800, and so on, dropping one hexadecimal digit at a time until the file titled C is checked. This behavior allows the system administrator to specify one configuration file for a whole subnet of the network: a file titled C0A800 applies to every slave node with a 192.168.0.x IP address.
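
To see the full search list for a particular node, the names can be generated with a few lines of shell. This is an illustrative sketch of the order described above, ending with the catch-all file named default; the function name pxe_config_candidates and the MAC address in the usage note are made up for the example.

```shell
# Print the configuration file names a slave node tries, in order.
# $1: MAC address, colon-separated   $2: dotted-quad IPv4 address
pxe_config_candidates() {
    mac=$1
    ip=$2

    # 1. "01-" followed by the MAC address, lowercase, dash-separated.
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-F' 'a-f' | tr ':' '-')"

    # 2. The IP address as eight uppercase hexadecimal digits.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")

    # 3. Drop one trailing hex digit at a time until none are left.
    while [ -n "$hex" ]; do
        printf '%s\n' "$hex"
        hex=${hex%?}
    done

    # 4. The catch-all file.
    printf 'default\n'
}
```

For example, pxe_config_candidates 00:11:22:AA:BB:CC 192.168.0.254 prints 01-00-11-22-aa-bb-cc, then C0A800FE, C0A800F, C0A800, and so on down to C, and finally default.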

The final configuration file checked is titled default. In my case, I wanted all slave nodes to share the same boot options, so I only had the one file titled default in my /tftpboot/pxelinux.cfg directory. This file had the following contents:

prompt 1
default memtest
timeout 1000

label memtest
   kernel memtest

label linux
   kernel vmlinuz
   append initrd=initrd.img ramdisk_size=8192

The default boot option was memtest. I chose this setting because I did not want the slave node launching into the installation routine if I was away from the terminal after it booted. The slave node waits for the timeout period before launching the default option; PXELINUX measures timeout in tenths of a second, so the value 1000 gives a wait of 100 seconds. To install Linux, I would type linux at the PXE boot prompt.
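
Because the timeout value is counted in tenths of a second, it is easy to get wrong when editing the file by hand. The following sketch writes the same default file from a timeout given in whole seconds; the function name write_pxe_default and its arguments are hypothetical.

```shell
# Write a pxelinux.cfg/default like the one shown above.
# $1: pxelinux.cfg directory   $2: prompt timeout in whole seconds
write_pxe_default() {
    cfg_dir=$1
    seconds=$2
    # PXELINUX counts the timeout in tenths of a second.
    tenths=$((seconds * 10))

    cat > "$cfg_dir/default" <<EOF
prompt 1
default memtest
timeout $tenths

label memtest
   kernel memtest

label linux
   kernel vmlinuz
   append initrd=initrd.img ramdisk_size=8192
EOF
}
```

Calling write_pxe_default /tftpboot/pxelinux.cfg 100 produces a file with timeout 1000, matching the configuration listed above.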

To allow installation via HTTP, I needed to configure the Apache web server to serve the installation tree. At the very end of the file /etc/httpd/conf/httpd.conf, I placed the following text:

<Directory /tftpboot/FedoraCore4>
   Options Indexes
   AllowOverride None
</Directory>
Alias /linux /tftpboot/FedoraCore4

I then restarted the Apache web server:

/sbin/service httpd restart

I could now boot the slave nodes via PXE and install Fedora Core 4 via HTTP. Once the Fedora Core 4 installer booted, I was asked the question

What type of media contains the packages to be installed?

I chose HTTP. After telling the slave node to grab its IP address via DHCP, I was asked to provide the IP address of the web server and the Fedora Core directory. After supplying the appropriate IP address and specifying a directory of /linux, as per the newly added text in the httpd.conf file on the web server, I was presented with the same install options I see when using CDs. I repeated this process to install a fresh copy of Fedora Core 4 on each slave node.