Project:Infrastructure/Developer Machines/ia64
ia64 Admin Notes
These are various notes mainly targeted at people administering Gentoo dev machines, although most things are probably generally useful. These are not general "how do I administer a Gentoo box" notes.
Machine-specific Notes
dolphin
Host: dolphin.ia64.dev.gentoo.org
HP RX2600, CD writer. Donated by HP in 2003. This machine has been powered off for the past two years to save power/cooling resources.
iLO is accessible using port 1 of console2. I used to access it using ssh armin76:dolphin-iLO@console2.gentoo.osuosl.org but console2 doesn't seem to answer now.
ttyS0 is accessible using port 2 of console2, ssh armin76:dolphin-ttyS0@console2.gentoo.osuosl.org
This machine had 4GB of RAM, 2x900MHz processors, 1x36GB HDD SCSI 80pin, 2x72GB HDD SCSI 80pin. No RAID.
This machine should still be in Gentoo's rack at OSL, on top of bender. It does not have rails.
beluga
HP RX2620, CD/DVD reader only. Donated by HP in 2012; it was previously in HP's datacenter. It is stored at OSL but not in Gentoo's rack. It was sent as-is from HP, so iLO is probably configured with the wrong parameters, and the OS will have a static IP that is wrong as well. I think it had a hardware RAID5 using 72GB HDDs; I cannot remember how many, probably 4 or 5. It had 2x 1.6GHz processors and 12GB of RAM.
It was stored in case guppy failed in the future and we had no other option.
guppy
HP RX3600. DVD/CD writer IIRC. It used to be in HP's DC but was sent to OSL when HP pulled the plug on the DC.
iLO is accessible from port 5 of console2. Once logged in you can access the remote console too.
Admin notes
Hostnames
These are the systems currently available. See the machine-specific notes for more details.
Machine Name | IP | DNS Hostnames | Console Server | Console Account |
---|---|---|---|---|
guppy | 140.211.166.179 | guppy.ia64.dev.gentoo.org | ?? | ?? |
Console Access
iLO2 is accessible over telnet and SSH from the dev.gentoo.org box (SSH needs some legacy ciphers). Ask infra@ for credentials and the IP address.
You can use this to:
- Interact with the EFI (e.g. to select recovery kernel, boot from plugged Gentoo DVD, change boot order)
- Log in directly over ttyS1 to recover
- Reboot machine
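Since SSH to the iLO needs legacy ciphers, a modern OpenSSH client has to be told to re-enable them. The sketch below only prints a candidate command line. The helper name ilo_ssh_cmd is hypothetical, and the specific algorithm names are assumptions, not taken from these notes; check what the MP actually offers with ssh -vv before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh command line for a legacy iLO/MP.
# ASSUMPTION: the key exchange, host key and cipher names below are guesses
# at what an old MP firmware offers; adjust them after inspecting `ssh -vv`.
ilo_ssh_cmd() {
    user=$1   # iLO account name (ask infra@)
    host=$2   # iLO IP address (ask infra@)
    printf 'ssh -o KexAlgorithms=+diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa -o Ciphers=+aes128-cbc %s@%s\n' "$user" "$host"
}
```

The "+" prefix appends an algorithm to the client's default list instead of replacing it, so the options are harmless if the MP also supports something newer.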
Hardware notes
List devices over MP console as: 'CM' > 'DF'
PSU status
PSU status can be checked over MP console as: 'CM' > 'PS':
Power supplies    State
-----------------------------------
Power Supply 0    Fault
Power Supply 1    Normal
Here we see that PSU-0 needs to be swapped. Tracked (and fixed) at bug #671420.
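If the 'PS' output is captured to a file (for example from a logged console session), a small filter can pick out the supplies that are not healthy. This is a minimal sketch; the function name faulty_psus is made up here, and it assumes one "Power Supply N State" entry per line.

```shell
#!/bin/sh
# Sketch: list power supplies whose state is not "Normal".
# Reads text captured from the MP console 'CM' > 'PS' command on stdin.
# ASSUMPTION: each supply is reported on its own line as
# "Power Supply <n> <State>".
faulty_psus() {
    awk '$1 == "Power" && $2 == "Supply" && $4 != "Normal" { print $3 ": " $4 }'
}
```

Usage would be something like faulty_psus < ps-output.txt; empty output means every listed supply reported Normal.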
HDD status
Disk array needs to be checked from operating system:
root # cciss_vol_status -V /dev/sda
Controller: Smart Array P600
  Board ID: 0x3225103c
  Logical drives: 0
  Running firmware: 1.52
  ROM firmware: 1.52
/dev/cciss/c0d0: (Smart Array P600) RAID 5 Volume 0 status: Using interim recovery mode.
  Failed drives:
    connector 1I box 1 bay 6 HP DH072ABAA6 3PD0YA8B00009816N8B5 HPD4
  Total of 1 failed physical drives detected on this logical drive.
  Physical drives: 7
    connector 1I box 1 bay 8 HP DG072A8B54 3LB0RFWF00007703FJ9Y HPD7 OK
    connector 1I box 1 bay 7 HP DG072A9BB7 B365P6A072YP0641 HPD0 OK
    connector 1I box 1 bay 5 HP DG072A9BB7 B365P6A074CF0641 HPD0 OK
    connector 2I box 1 bay 4 HP DG072A9BB7 B365P6A073U40641 HPD0 OK
    connector 2I box 1 bay 3 HP DG072A9BB7 B365P6A073KC0641 HPD0 OK
    connector 2I box 1 bay 2 HP DG072A9BB7 B365P6904NHC0635 HPD0 OK
    connector 2I box 1 bay 1 HP DG072A9BB7 B365P6A072RM0641 HPD0 OK
/dev/cciss/c0d0(Smart Array P600:0): Non-Volatile Cache status:
  Cache configured: Yes
  Total cache memory: 224 MiB
  Cache Ratio: 50% Read / 50% Write
  Read cache memory: 112 MiB
  Write cache memory: 112 MiB
  Write cache enabled: No
  Write cache temporarily disabled
    Temporary disable condition.
    Posted write operations have been disabled due to the fact that less than 75% of the battery packs are at the sufficient voltage level.
Here we see that HDD-6 needs to be swapped. Tracked (and fixed) at bug #671420. Leaving the error example here for posterity.
Batteries are also dead. I'm not sure how many batteries there are: one per controller or one per SAS I/O card. TODO: find out how to check those as well.
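For periodic checks it can help to reduce the verbose report to just the failed drives. The following is a rough sketch, not a tested monitoring script: the function name failed_drives is hypothetical, and it assumes the report lists each drive on its own line between "Failed drives:" and the "Total of ... failed physical drives" summary.

```shell
#!/bin/sh
# Sketch: print the failed-drive entries from `cciss_vol_status -V` output
# read on stdin.
# ASSUMPTION: one drive per line, listed after a "Failed drives:" line and
# before the "Total of N failed physical drives" summary line.
failed_drives() {
    awk '
        /Failed drives:/                     { in_failed = 1; next }
        /Total of .* failed physical drives/ { in_failed = 0 }
        in_failed && /connector/             { sub(/^[ \t]+/, ""); print }
    '
}
```

On the machine itself this could be fed as cciss_vol_status -V /dev/sda | failed_drives; empty output means no drive was reported as failed.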
Common iLO commands
- Get remote console output (ttyS1):
CO
- Get interactive console (to login and recover system on ttyS1):
CO Ctrl-E f c
- Reboot main machine:
RS
- Power cycle main machine and RAID:
PC -cycle
- Manage iLO users:
UC
- Get builtin help:
HE
Other stuff
There are a few concepts to keep in mind when using iLO:
- MP (iLO): a board separate from the main machine that accepts telnet and ssh connections, issues commands to the main machine over the BMC interface, and can do I/O on ttyS1
- BMC: an FPGA on the motherboard of the main machine that accepts commands from the MP. It can reboot the machine, report hardware parts, report health status, etc.
- main machine itself: a few ia64 CPUs, RAM and so on.
user ---<telnet>---> MP --> [ BMC <-> ia64-machine ].
Typical problems
Mysterious hangups on reboot
Sometimes the BMC hangs when the main machine reboots. It is not clear why.
You can usually still access the MP, but I have not figured out how to reboot the machine in this state without physical help. I end up asking infra/on-site staff to reboot the machine.
This makes each reboot a challenge.
Kernel Management
ia64 systems are EFI systems. guppy uses a standard grub2 efi64 setup.
To update a kernel:
- build kernel in /usr/src/linux
- install kernel as
make install && make modules_install
- boot-test new kernel over iLO by changing path to vmlinux.
- regenerate configs via
grub-mkconfig --output=/boot/grub/grub.cfg
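The steps above can be sketched as a small script. This is an illustration only: the function names run, update_kernel and regen_grub and the DRY_RUN variable are invented here. By default it prints the commands instead of executing them, and the boot test over iLO stays a manual step between the two functions.

```shell
#!/bin/sh
# Sketch of the kernel update steps. Hypothetical helper names throughout.
# Prints commands by default; set DRY_RUN=0 (as root) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Build and install the kernel and its modules from the given source tree.
update_kernel() {
    src=${1:-/usr/src/linux}
    run make -C "$src" &&
    run make -C "$src" install &&
    run make -C "$src" modules_install
}

# Regenerate the grub configuration; do this only after the new kernel
# has been boot-tested over iLO.
regen_grub() {
    run grub-mkconfig --output=/boot/grub/grub.cfg
}
```

Keeping regen_grub separate mirrors the order in the list: the new kernel is tested by hand first, and only then does it become part of the generated boot menu.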
Needed patches/configs
- bug #808405: stack canary has to be removed as it assumes that one of the stack tops is unused
- bug #808408: VM_FLUSH_RESET_PERMS has to be ignored as it sometimes breaks BPF and kernel module loading by corrupting vmalloc() state.
- hardened_usercopy=0 on the kernel command line (or set CONFIG_HARDENED_USERCOPY=n), because the linear mapping is not accounted for in the usercopy check, so buffer checks are full of false positives.
Sample Config Files
Recovery notes
iLO (CM > CO) serial console runs on ttyS1; ttyS0 is wired to the physical(?) console.
Console is configured in EFI as P Serial Acpi(HWP0002,PNP0A03,0)/Pci(1|2) Vt100+ 115200.
EFI shell
In the interactive EFI boot menu pick EFI Shell [Built-in] and run the DVD kernel:
# inspect cdrom
fs0:\> ls fs0:\efi\boot
Directory of: fs0:\efi\boot
  09/27/09 08:42p <DIR>          2,048 .
  09/27/09 08:42p <DIR>          2,048 ..
  09/27/09 08:42p                  698 elilo.conf
  09/27/09 08:42p            7,020,793 gentoo
  09/27/09 08:42p              374,212 bootia64.efi
  09/27/09 08:42p            6,092,363 gentoo.igz
  09/27/09 08:42p                  380 elilo.msg
# run kernel with custom arguments (cdrom's defaults are not very suitable)
fs0:\> fs0:\efi\boot\bootia64.efi -i gentoo.igz gentoo initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=ttyS1,115200n8
...
livecd ~ # uname -r
2.6.30-gentoo-r6
Or alternatively you can boot directly from HDD if you need non-standard arguments:
Shell> fs1:\EFI\gentoo\elilo.efi boot\vmlinuz-4.9.72-gentoo root=/dev/cciss!c0d0p3
With newer kernels (4.19+) the devices were renamed from /dev/cciss!c0d0p${N} to /dev/sda${N}:
Shell> fs1:\EFI\gentoo\elilo.efi boot\vmlinuz-4.19.86-gentoo root=/dev/sda3
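To avoid mixing up the two naming schemes when typing boot arguments by hand, the rule can be written down as a tiny helper. This is only a sketch: the function name root_arg is hypothetical, and the 4.19 cut-off is taken from the note above, not verified against kernel history.

```shell
#!/bin/sh
# Sketch: print the root= argument matching a kernel version.
# usage: root_arg KERNEL_VERSION PARTITION_NUMBER
# ASSUMPTION: kernels 4.19 and newer use /dev/sdaN, older ones use
# /dev/cciss!c0d0pN, as described in the surrounding notes.
root_arg() (
    part=$2
    IFS=.-
    set -- $1    # e.g. 4.19.86-gentoo becomes: 4 19 86 gentoo
    if [ "$1" -gt 4 ] || { [ "$1" -eq 4 ] && [ "$2" -ge 19 ]; }; then
        echo "root=/dev/sda$part"
    else
        echo "root=/dev/cciss!c0d0p$part"
    fi
)
```

The function body runs in a subshell (parentheses instead of braces), so changing IFS and the positional parameters does not leak into the calling shell.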
To get network setup just configure the addresses (see below for up-to-date setup):
# ip addr add 140.211.166.179/27 dev eth1
# ip link set up dev eth1
# ip r add default via 140.211.166.161 dev eth1
eth1 is a NIC with MAC ..:..:..:51:cf:57.
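When typing addresses by hand on a recovery console it is easy to get the gateway or prefix wrong. The sketch below checks that a gateway lies inside the subnet of the configured address; for 140.211.166.179/27 the mask is 255.255.255.224, so the subnet covers .160 to .191 and includes the gateway .161. The function names ip_to_int and same_subnet are made up for this example.

```shell
#!/bin/sh
# Sketch: verify that a gateway is in the same IPv4 subnet as ADDR/PREFIX.

# Convert a dotted-quad IPv4 address to an integer.
# Runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# usage: same_subnet ADDR/PREFIX GATEWAY
# Returns 0 if both addresses share the same network part.
same_subnet() {
    addr=${1%/*}
    prefix=${1#*/}
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$addr") & $mask )) -eq $(( $(ip_to_int "$2") & $mask )) ]
}
```

For example, same_subnet 140.211.166.179/27 140.211.166.161 && echo ok confirms the pair used above before the default route is added.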
ELILO shell
Type TAB at the ELILO boot: prompt to interrupt the boot process.
TODO: actual syntax to load initrd
Config snippets
Config snippets from the plugged-in Gentoo-2009 cdrom:
/etc/inittab
...
# TERMINALS
c1:12345:respawn:/sbin/agetty 38400 tty1 linux
c2:2345:respawn:/sbin/agetty 38400 tty2 linux
c3:2345:respawn:/sbin/agetty 38400 tty3 linux
c4:2345:respawn:/sbin/agetty 38400 tty4 linux
c5:2345:respawn:/sbin/agetty 38400 tty5 linux
c6:2345:respawn:/sbin/agetty 38400 tty6 linux
# SERIAL CONSOLES
#s0:12345:respawn:/sbin/agetty 9600 ttyS0 vt100
#s1:12345:respawn:/sbin/agetty 9600 ttyS1 vt100
...
elilo.conf
prompt
message=/efi/boot/elilo.msg
chooser=simple
timeout=50
relocatable
image=/efi/boot/gentoo
    label=gentoo
    append="initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot"
    initrd=/efi/boot/gentoo.igz
image=/efi/boot/gentoo
    label=gentoo-serial
    append="initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=tty0 console=ttyS0,9600"
    initrd=/efi/boot/gentoo.igz
image=/efi/boot/gentoo
    label=gentoo-sgi
    append="initrd=gentoo.igz root=/dev/ram0 init=/linuxrc dokeymap looptype=squashfs loop=/image.squashfs cdroot console=tty0 console=ttySG0,115200"
    initrd=/efi/boot/gentoo.igz
/etc/conf.d/net
Useful for the livecd, as DHCP does not acquire any configuration there:
config_eth1="140.211.166.179/27"
routes_eth1="default via 140.211.166.161"