Help! My VM is stuck!
Here is a guide to getting your VM out of a scrape if you think it has crashed, or know that you have made it inaccessible, for example through a firewall rule or by taking down the Ethernet interface. We assume that at this point some or all of your machine's services have stopped responding, or that you have no control left from inside the machine itself.
We also assume that you have an ssh session open to your machine's admin
shell: from another Linux machine you should type ssh
yourmachine@yourmachine.vmadmin.bytemark.co.uk.
1) Can you ping it?
If you can type ping yourname.vm.bytemark.co.uk and get a
response, it means that your kernel is still running, and there should be a
simple way of getting control back.
You could try getting to a serial console using the console command from the admin shell. If this doesn't work you
may need to try further steps.
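As a rough sketch of the ping check, the function below wraps ping and prints a one-line verdict. The function name ping_report and the PING_CMD override are made up for this example (the override only exists so the function can be tried without network access); the hostname you pass in is your own.

```shell
# Report whether a host answers ping.
# PING_CMD is a hypothetical hook for this sketch; it defaults to ping.
ping_report() {
    host="$1"
    # -c 3 sends three echo requests and then exits; an exit status of 0
    # means at least one reply came back.
    if ${PING_CMD:-ping} -c 3 "$host" >/dev/null 2>&1; then
        echo "$host answers ping: kernel is still running"
    else
        echo "$host does not answer ping"
    fi
}
```

You would call it from another machine as ping_report yourname.vm.bytemark.co.uk. Bear in mind that a missing reply can also be caused by a firewall rule that drops ICMP, so silence alone does not prove the kernel has died.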
2) Three-fingered salute
The easiest way out, if you can't see what's going on from the console,
or the console doesn't work for some reason, may be a straight reboot. You
can achieve this on our machines with the cad command at the
admin shell, which is equivalent to pressing Ctrl+Alt+Delete on the
keyboard. Give your VM a minute or two to reboot; it is often instructive
to watch whether it stops responding to pings for a brief period, which
shows that it has rebooted.
You can use the status command to find out how long the
machine has been running since its last reboot; once it restarts you will
see the status go from Currently: on (12345s) to
Currently: on (2s) to show the time since the last
reboot.
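If you want to compare two status readings mechanically, something like the following could work. It is only a sketch: it assumes you have copied the relevant status lines out of the admin shell yourself, and the function names status_uptime and rebooted_between are invented for this example. The only format it relies on is the Currently: on (Ns) text quoted above.

```shell
# Print the uptime in seconds from a status line such as
# "Currently: on (12345s)". Prints nothing and fails if the line
# does not report the machine as on.
status_uptime() {
    printf '%s\n' "$1" \
        | sed -n 's/.*Currently: on (\([0-9][0-9]*\)s).*/\1/p' \
        | grep .
}

# Succeed if the uptime dropped between two readings, which is what
# you expect to see after a reboot.
rebooted_between() {
    before=$(status_uptime "$1") || return 1
    after=$(status_uptime "$2") || return 1
    [ "$after" -lt "$before" ]
}
```

For example, rebooted_between 'Currently: on (12345s)' 'Currently: on (2s)' succeeds, because the second reading is smaller than the first.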
Note that in order for cad to work, it must be set up in
your /etc/inittab file with the following line:
ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -h now
This tells init to halt rather than reboot: inside the UML you should have no reason to reboot. Only use halt, and our management system will handle reboots. See here for the reason.
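It is worth checking for this entry while the machine is still healthy. The helper below is a minimal sketch, run inside the VM, that prints any active ctrlaltdel entry from an inittab-style file; the function name ctrlaltdel_entry is made up for this example, and it assumes the usual id:runlevels:action:process layout with # starting a comment line.

```shell
# Print the active (non-comment) ctrlaltdel entry from an inittab-style
# file. The path defaults to /etc/inittab; pass another path to check
# a copy instead.
ctrlaltdel_entry() {
    # Field 1 (id) must not start a comment, field 3 must be ctrlaltdel.
    grep -E '^[^#:]*:[^:]*:ctrlaltdel:' "${1:-/etc/inittab}"
}
```

If ctrlaltdel_entry prints nothing, or prints a line whose shutdown command uses -r instead of -h, the file does not match the line given above.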
3) Emergency measures
If you cannot get a console up, cad does nothing even after
a minute or two, and you have no other way of sorting the problem out, you
can forcibly kill your kernel and get it to restart.
Type autoreboot off followed by rescuehalt at the
admin shell. This should stop the kernel in its tracks and
hopefully avoid any data loss, but some processes may be confused by the
sudden disappearance of the kernel next time they start up. It is
equivalent to pulling the power, but is not quite as drastic as all that
since the host kernel will have records of any disc writes that were
pending.
If the rescuehalt command hangs for more than a few seconds, pressing Ctrl+C will force it to finish.
If you type status afterwards, you should see that the
kernel is Currently: off and is thus no longer running. To
check that it will boot up okay the next time you should boot it at the
admin shell by typing rescueboot. The boot process should
proceed and take you to a login prompt, or at least a state where an ssh
daemon is visible. Once you have confirmed the system boots "manually", you
should halt it from a root prompt and then type
autoreboot on at the admin shell to resume normal operations.
If the rescueboot does not give you either an ssh daemon or
a login: prompt, you should try rescueboot /bin/bash which
skips most of the boot to get you an immediate root shell prompt. This will
work unless you've done something really drastic to your filesystem; from
here you can run fsck, edit files etc. until you are confident the system
will boot. At that point you can exit the shell and try a normal
rescueboot to get the kernel to start again, as above.
Note that when you have booted straight to a shell, your file system will be
read only! Before you can edit files, you will need to run mount /
-o rw,remount to make it writable.
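To see whether the remount is needed, or whether it took effect, you can look at the mount options of the root filesystem. The sketch below assumes /proc is mounted in the rescue shell, which may not be the case when most of the boot has been skipped; the function names root_mount_opts and root_is_readonly are invented for this example.

```shell
# Print the mount options of the root filesystem, read from a
# /proc/mounts-style file (fields: device, mount point, type, options).
root_mount_opts() {
    # "/" can be listed more than once; the last entry is the current one.
    awk '$2 == "/" { opts = $4 } END { if (opts != "") print opts }' \
        "${1:-/proc/mounts}"
}

# Succeed if the root filesystem is currently mounted read-only.
root_is_readonly() {
    case ",$(root_mount_opts "$1")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

A typical use would be root_is_readonly && mount / -o rw,remount, followed by a second call to root_is_readonly to confirm that the filesystem is now writable.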
4a) Call for help
If all else fails and you have tried all the above, please call us on the number at the bottom of this web page and we can almost certainly sort out your system through privileged access to the host machine. Please note that we may charge for time spent performing data recovery or system reconstruction if you have deleted critical system files.
4b) Wipe it clean
If you have just been tinkering and have no critical data to preserve,
you can always use the reimage command to wipe your filesystem
clean and start again. We do not recommend this as a solution, but it's
there if you think it's appropriate. Simply type (e.g.) reimage etch to start with a clean Debian system; you will
of course be asked to confirm.
