22-JAN-2011: WAN PXE Boot

Installing a remote system without DHCP support

The company I work for has a dysfunctional organizational structure. The people who run DHCP servers out at district and field sites usually don't know too much about DHCP. Additionally, getting them to make changes to their DHCP server to allow PXE booting a Solaris or Linux machine for Jumpstart or Kickstart is not something likely to occur in a timely manner (say, less than 1 month) since they: a) don't understand the request; and b) don't really care about us.

So when a request came in to Jumpstart two Solaris x86 systems at one of the districts by the end of the following week, we had two problems:

  1. We didn't have a Solaris x86 FLAR; and
  2. We didn't have any way to get the system to load the "pxegrub" bootloader from our TFTP server (since that's normally what the DHCP server tells the PXE client to do, and we had no control over the DHCP servers).

The first issue was mostly resolved due to previous work by a co-worker getting ready for support of some other Solaris x86 systems. We had a Solaris x86 Jumpstart profile, just no FLAR. He was able to produce a FLAR within a few days of install, post-install, test, clean-up, verify, rinse, repeat.

The second issue was the one I was delegated. My first thought was to use normal GRUB's "el torito" support to burn a bootable CD, rather than loading "pxegrub" from the TFTP server, and then just use the "configfile" directive within GRUB to load the "menu.lst" from the TFTP server. This did not go well. I spent days on it, and while I was able to get it to load the Solaris x86 kernel and miniroot (initial ramdisk), the installation always failed early when trying to configure the network devices.

I started reading the scripts on the miniroot to determine how it was attempting to configure the network devices, and discovered that it was expecting GRUB to store network information somewhere that the newly loaded kernel could find it. A stock GRUB doesn't do this; Sun had apparently patched it into their GRUB. No big deal. I'll just take the GRUB from an existing Solaris x86 system, burn it to a CD, and let it go.

Fail.

Fail. Sun. Fail.

The newly created Sun-patched GRUB-on-CD booted up just fine. However, the "dhcp" directive was completely broken. It would send out a DHCP_REQUEST from 0.0.0.0 to 255.255.255.255:67, and the DHCP server would respond with an address in a DHCP_REPLY message. Then the DHCP client would do... another DHCP_REQUEST, but this time from the IP address the DHCP server had just given it instead of 0.0.0.0. The DHCP server would reply with another DHCP_REPLY, but the Sun-modified GRUB didn't seem to care, and would keep asking for an address, from the address it had just been assigned, forever (or at least a day, since I let it sit there doing that for a full day wondering if it would ever actually complete whatever it was trying to do).

I gave up on that dream, and started looking for an alternate approach.

I then found gPXE, which is a PXE client. Further, it's a scriptable PXE client where the script can be embedded in the PXE client itself. Further still, it supports being booted from CD.

With "gPXE" I was able to create an ISO image that obtained an IP address via DHCP, went out to my hard-coded-in-the-binary TFTP server and loaded the "nbp.01MAC_ADDRESS" file (a.k.a., "pxegrub") that the Jumpstart "add_install_client" script created. It booted Sun's modified "pxegrub", which fetched the correct "menu.lst" file and loaded the kernel and miniroot via TFTP. Once that was done, it was able to see the network configuration structure that "pxegrub" left in place for it and was able to configure the network interfaces, mount up the NFS server, find its "sysidcfg" file and begin installing.

Hooray.


Here's the gPXE script I embedded in the ROM:

 #!gpxe
 dhcp net0
 set next-server 331.39.13.6
 set 150:string ${next-server}
 set 67:string 01${mac:hex}
 chain tftp://${next-server}/nbp.01${mac:hex}
 boot

(where 331.39.13.6 is the IP address of the Jumpstart server)

I initially did this on the "rom-o-matic" web site, but it didn't work because the "${mac:hex}" variable formats the address:

  1. With colons separating bytes of the MAC address where Jumpstart expects nothing to separate the bytes; and
  2. As lower-cased where Jumpstart expects the digits to be upper-cased.

One small (one-line) patch to the "gPXE" code later, plus updating the "rom-o-matic" PHP scripts (which also come with "gPXE") to work with PHP5, and it formats the MAC address the same way the Jumpstart server does.
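
To make the mismatch concrete, here is a rough Python sketch of the transformation the file name needs. This is not the actual patch (that was a change to the gPXE C source, not shown here); the function names "jumpstart_mac" and "nbp_filename" are made up for illustration, and the "nbp.01" prefix is taken from the file name described above.

```python
def jumpstart_mac(mac: str) -> str:
    """Strip byte separators and upper-case the hex digits of a MAC address."""
    digits = mac.replace(":", "").replace("-", "").upper()
    # A 48-bit MAC address is exactly 12 hex digits once separators are gone.
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits


def nbp_filename(mac: str) -> str:
    """Build the boot file name: "nbp.", then "01", then the formatted MAC."""
    return "nbp.01" + jumpstart_mac(mac)


# What gPXE produced by default (colons, lower case) versus what is needed.
print(nbp_filename("00:1b:21:3a:4f:c2"))  # nbp.01001B213A4FC2
```

The same helper is handy on the Jumpstart server side for checking that the file name a given client will ask for actually exists under the TFTP root.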


Here's how I added the client from the Jumpstart server:

 jumpstart# ./add_install_client -d -e 01:02:03:04:05:06 -t 331.39.13.6: \
            -p 331.39.13.6:/jumpstart/sysidcfg/testbox -c 331.39.13.6:/jumpstart \
            -b 'boot-args=- install nowin' i86pc

(Again, where "331.39.13.6" is the IP address of the Jumpstart server)