How to recover a Juju environment from a copy of a controller image

DISCLAIMER: This procedure is still incomplete and is not certified by Canonical!

Scenario

Our Juju controllers have become unstable and we don’t have a recent backup, but we copied the controller’s disk image before the mess started. Let’s try to recover Juju from it!

Copy image back to the hypervisor

First, let’s copy the controller’s image back to the hypervisor, under a different name.
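For example, assuming the image was saved on a backup host (the hostname and source path here are hypothetical; the target name is the one used in the rest of this procedure):

# copy the saved controller disk image back, under a new name
scp backup-host:/backups/ct1-juju2-04.qcow2 /mnt/vm/ct1-juju2-04-old.qcow2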

Change IP address and agent.conf

We need to change the IP address of the machine before starting it, or it will have the same address as the current controller.

To do so, we mount the image on a local directory with libguestfs (http://ask.xmodulo.com/mount-qcow2-disk-image-linux.html):

apt-get install libguestfs-tools
mkdir ~/localmount
guestmount -a /mnt/vm/ct1-juju2-04-old.qcow2 -m /dev/sda1 ~/localmount

Now we can edit ~/localmount/etc/network/interfaces to change the IP address.
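For example, a static stanza might look like this after the change (the interface name and addresses are placeholders for your environment):

# ~/localmount/etc/network/interfaces (excerpt)
auto eth0
iface eth0 inet static
    address 10.0.0.51
    netmask 255.255.255.0
    gateway 10.0.0.1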

Also edit ~/localmount/var/lib/juju/agents/machine-XXX/agent.conf setting the new IP address in the apiaddresses section.
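The relevant section should end up looking roughly like this (the address is a placeholder for the new IP; 17070 is the standard Juju API port):

# ~/localmount/var/lib/juju/agents/machine-XXX/agent.conf (excerpt)
apiaddresses:
- 10.0.0.51:17070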

When done, unmount the image:

guestunmount ~/localmount

Define new VM using the existing image

Using virt-manager, we now define a new VM that uses the existing image.
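Alternatively, the VM can be defined from the command line with virt-install; a sketch, where the memory size, OS variant, and bridge name are assumptions to adapt to your setup:

# define and import a VM around the existing disk image
virt-install \
  --name ct1-juju2-04-old \
  --memory 4096 --vcpus 2 \
  --disk path=/mnt/vm/ct1-juju2-04-old.qcow2,format=qcow2 \
  --import \
  --os-variant ubuntu16.04 \
  --network bridge=br0 \
  --noautoconsole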

Reconfigure mongodb

Log on to the machine. We need to reconfigure mongodb as it still points to the old controllers. Indeed, we can see that this node has been removed from the mongo replica set, as its new address is not known.

We follow the instructions here:

https://docs.mongodb.com/v3.0/tutorial/reconfigure-replica-set-with-unavailable-members/

Define this Bash function to log in to the mongo shell:

# cat mongo.sh
dialmongo() {
  # extract the machine agent name and its state password from agent.conf
  agent=$(cd /var/lib/juju/agents; echo machine-*)
  pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | awk '{ print $2 }')
  # connect to the juju database on the local mongod instance
  /usr/lib/juju/mongo3.2/bin/mongo --ssl --sslAllowInvalidCertificates -u ${agent} -p $pw localhost:37017/juju --authenticationDatabase admin
}

Source the file and log in to the mongo shell:

# . mongo.sh
# dialmongo

and in the shell type (this assumes our node is the first member, members[0]; check the output of rs.conf() first):

cfg = rs.conf()
cfg.members[0].host = "$NEW_IP_ADDRESS:37017"
cfg.members = [cfg.members[0]]
rs.reconfig(cfg, {force : true})

The node should switch to PRIMARY, as it is now the only member in the configuration.
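To verify, check the replica set status from the same shell; the single remaining member should report itself as PRIMARY:

rs.status()
// the only entry in "members" should show "stateStr" : "PRIMARY"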

Juju client setup

Install a new Juju client, then copy the following directory from the original client machine:

/root/.local/share/juju

Edit the file /root/.local/share/juju/controllers.yaml, replacing the addresses in the api-endpoints field with the IP address of the new machine.
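After the edit, the relevant entry should look roughly like this (the controller name and address are placeholders):

# /root/.local/share/juju/controllers.yaml (excerpt)
controllers:
  ct1-juju2:
    api-endpoints: ['10.0.0.51:17070']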

Now we should be able to communicate with the controller and run juju create-backup.
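For example, a sketch (the exact flags vary between Juju 2.x releases; check juju help create-backup for your version):

# back up the controller model to a local file
juju create-backup -m controller --filename juju-backup.tar.gz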

TODO: I have tried to restore the backup, but Juju complains because the machines of the original controller are still active. We should try marking them broken in MAAS, after cloning the containers.