Help! My VM is stuck!

Here is a guide to getting your VM out of a scrape if you think it has crashed, or know that you have made it inaccessible, for example through a firewall rule or by taking down the ethernet interface. We assume that at this point one or all of your machine's services have stopped responding, or that you have no control left from inside the machine itself.

We also assume that you have an ssh session open to your machine's admin shell: from another Linux machine you should type ssh yourmachine@yourmachine.vmadmin.bytemark.co.uk.

1) Can you ping it?

If you can type ping yourname.vm.bytemark.co.uk and get a response, it means that your kernel is still running, and there should be a simple way of getting control back.
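If you want to script this check, a minimal sketch might look like the following. The function name vm_responds is our own invention for illustration, and the host name in the comment is the same placeholder used above; substitute your own machine's name.

```shell
# Succeeds (exit status 0) if the host answers at least one of three pings,
# and fails otherwise. Output from ping itself is discarded.
vm_responds() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Example use (placeholder host name, not run here):
#   if vm_responds yourname.vm.bytemark.co.uk; then
#       echo "kernel still running"
#   else
#       echo "no reply to ping"
#   fi
```

A failed ping does not by itself prove the kernel has stopped, since a firewall rule inside the VM can also drop pings.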

You could try getting to a serial console using the console command from the admin shell. If this doesn't work you may need to try further steps.

2) Three-fingered salute

The easiest way out, if you can't see what's going on from the console, or the console doesn't work for some reason, may be a straight reboot. You can achieve this on our machines with the cad command at the admin shell, which is equivalent to pressing Ctrl+Alt+Delete on the keyboard. Give your VM a minute or two to respond. It's often instructive to watch whether it stops answering pings for a brief period, which shows that it has rebooted.

You can use the status command to find out how long the machine has been running since its last reboot; once it restarts you will see the status go from Currently: on (12345s) to Currently: on (2s) to show the time since the last reboot.
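If you have copied the status line before and after running cad, the comparison can be sketched in shell as below. This assumes the line looks exactly like the Currently: on (12345s) example above; the function names status_uptime and rebooted_between are hypothetical helpers, not part of the admin shell.

```shell
# Print the uptime in seconds from a line such as "Currently: on (12345s)".
# Prints nothing if the line does not match (for example "Currently: off").
status_uptime() {
    sed -n 's/.*Currently: on (\([0-9][0-9]*\)s).*/\1/p'
}

# Succeeds if the uptime in the second reading is lower than in the first,
# which means the VM restarted somewhere in between.
rebooted_between() {
    before=$(printf '%s\n' "$1" | status_uptime)
    after=$(printf '%s\n' "$2" | status_uptime)
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -lt "$before" ]
}

# Example use:
#   rebooted_between 'Currently: on (12345s)' 'Currently: on (2s)' && echo "rebooted"
```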

Note that in order for cad to work, it must be set up in your /etc/inittab file with the following line:

ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -h now

This tells init to halt the machine rather than reboot it when Ctrl+Alt+Delete is received. Inside the UML you should have no reason to reboot: only ever use halt, and our management system will handle the restart. See here for the reason.
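While you still have access to the VM, you can check that this entry is in place. The sketch below is only a rough test: it looks for an uncommented ctrlaltdel line that runs shutdown with -h, and the function name cad_halts is our own label for illustration.

```shell
# Succeeds if the given inittab file (default /etc/inittab) has an active,
# uncommented ctrlaltdel entry that runs shutdown with the -h (halt) flag.
cad_halts() {
    grep -Eq '^[^#:]*:[^:]*:ctrlaltdel:.*shutdown.* -h( |$)' "${1:-/etc/inittab}"
}

# Example use:
#   cad_halts || echo "ctrlaltdel is missing or does not halt"
```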

3) Emergency measures

If you cannot get a console up, cad does nothing even after a minute or two, and you have no other way of sorting the problem out, you can forcibly kill your kernel and get it to restart.

Type autoreboot off followed by rescuehalt at the admin shell. This should stop the kernel in its tracks and hopefully avoid any data loss, but some processes may be confused by the sudden disappearance of the kernel next time they start up. It is equivalent to pulling the power, though not quite as drastic, since the host kernel will have records of any disc writes that were pending.

If the rescuehalt command hangs for more than a few seconds, pressing Ctrl+C will force it to finish.

If you type status afterwards, you should see that the kernel is Currently: off and is thus no longer running. To check that it will boot up okay the next time you should boot it at the admin shell by typing rescueboot. The boot process should proceed and take you to a login prompt, or at least a state where an ssh daemon is visible. Once you have confirmed the system boots "manually", you should halt it from a root prompt and then type autoreboot on at the admin shell to resume normal operations.

If rescueboot does not give you either an ssh daemon or a login: prompt, you should try rescueboot /bin/bash, which skips most of the boot to get you an immediate root shell prompt. This will work unless you've done something really drastic to your filesystem; from here you can run fsck, edit files etc. until you are confident the system will boot. At that point you can exit the shell and try a normal rescueboot to get the kernel to start again, as above.

Note that when you have booted straight to a shell, your file system will be mounted read-only! Before you can edit files, you will need to run mount / -o rw,remount.
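To confirm whether the remount has taken effect, you can look at the mount options for / in /proc/mounts. The sketch below assumes /proc is available; in a bare /bin/bash shell it may not be, in which case mount -t proc proc /proc should provide it. The function name root_is_readonly is hypothetical.

```shell
# Succeeds if the last entry for / in the mounts file (default /proc/mounts)
# lists "ro" among its mount options. Fails if / is read-write or not listed.
root_is_readonly() {
    awk '$2 == "/" { opts = $4 }
         END { if (opts ~ /(^|,)ro(,|$)/) exit 0; exit 1 }' "${1:-/proc/mounts}"
}

# Example use:
#   if root_is_readonly; then mount / -o rw,remount; fi
```

Only the last entry for / is considered, since /proc/mounts can list the root file system more than once and the most recent mount is the one in effect.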

4a) Call for help

If all else fails and you have tried all the above, please call us on the number at the bottom of this web page and we can almost certainly sort out your system through privileged access to the host machine. Please note that we may charge for time spent performing data recovery or system reconstruction if you have deleted critical system files.

4b) Wipe it clean

If you have just been tinkering and have no critical data to preserve, you can always use the reimage command to wipe your filesystem clean and start again. We do not recommend this as a solution, but it's there if you think it's appropriate. Simply type (e.g.) reimage etch to start with a clean Debian system; you will of course be asked to confirm.

© 2006 Bytemark Hosting. All rights reserved