How to restart a non-responsive Juju controller

In this scenario the Juju CLI is unresponsive: the juju status command simply hangs. We want to restart the controller, but we do not know which machine hosts it.

Follow these steps:

  1. Look into the controllers.yaml file:

    ~/.local/share/juju/controllers.yaml
    

    and note the IP addresses listed under api-endpoints (each entry has the form IP:port).

  2. Try to ping one of the IP addresses (e.g. 10.2.1.42), and note the TTL:

    $ ping -c1 10.2.1.42
    PING 10.2.1.42 (10.2.1.42) 56(84) bytes of data.
    64 bytes from 10.2.1.42: icmp_seq=1 ttl=62 time=10.7 ms
    
    --- 10.2.1.42 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 10.731/10.731/10.731/0.000 ms
    

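    The TTL in the reply is set by the answering host (Linux uses an initial TTL of 64 by default) and is decremented by one at every router it crosses. A TTL of 62 therefore means that the controller is two hops away.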

  3. In our deployment, Juju controllers usually run on virtual machines. Log in to a virtual machine hypervisor and ping the address again:

    $ ping -c1 10.2.1.42
    PING 10.2.1.42 (10.2.1.42) 56(84) bytes of data.
    64 bytes from 10.2.1.42: icmp_seq=1 ttl=64 time=0.182 ms
    
    --- 10.2.1.42 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.182/0.182/0.182/0.000 ms
    

    This time the TTL is 64, so there is no router between this hypervisor and the controller. The controller virtual machine may therefore be running on this hypervisor.

  4. Get the MAC address associated with the IP address from the ARP cache:

    $ ip neigh show | grep 10.2.1.42
    10.2.1.42 dev br-box lladdr 52:54:00:fd:42:42 DELAY
    
  5. Get the list of running virtual machines with virsh list and, for each virtual machine, run:

    $ virsh dumpxml <VM name> | grep 52:54:00:fd:42:42
    

    Repeat until the MAC address matches a virtual machine. If no VM on this hypervisor matches the MAC address, try a different hypervisor.

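  6. Once the virtual machine with the target IP address has been located, it can be restarted:

    $ virsh shutdown <VM name>
    $ virsh start <VM name>
    

    virsh shutdown only requests a graceful shutdown and returns immediately, so wait until the VM no longer appears in virsh list before starting it. If the VM does not shut down, virsh destroy <VM name> forces it off.

    Then try the juju status command on the Juju CLI again.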
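
The TTL arithmetic in step 2 and the per-VM search in step 5 can be scripted. The following is a minimal sketch, not a tested tool: the function names hops_from_ttl and find_vm_by_mac are hypothetical helpers introduced here, the initial TTL of 64 is an assumption that holds for Linux defaults, and the search assumes VM names contain no whitespace. It relies only on virsh list --name and virsh dumpxml.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Juju or libvirt.

# Estimate the number of router hops from a reply TTL.
# Assumes the replying host started at 64 unless a second argument is given.
hops_from_ttl() {
    ttl=$1
    initial=${2:-64}
    echo $((initial - ttl))
}

# Print the name of the first running VM whose domain XML contains the MAC.
# Returns non-zero if no running VM on this hypervisor matches.
find_vm_by_mac() {
    mac=$1
    for vm in $(virsh list --name); do
        # -F: match the MAC literally; -i: ignore case; -q: no output
        if virsh dumpxml "$vm" | grep -qiF "$mac"; then
            echo "$vm"
            return 0
        fi
    done
    return 1
}
```

After sourcing the file on a hypervisor, find_vm_by_mac 52:54:00:fd:42:42 should print the matching VM name, or print nothing and return a non-zero status, in which case the next hypervisor can be tried.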