Technical Notes: How to remove a Juju application in an error state
I have been using Juju for application deployment recently as part of my current job. Most of the time, I just need to run `juju deploy application` and Juju takes care of all the setup, no matter which cloud I am using. In some cases, however, Juju refuses to follow my orders, which is really frustrating.
For example, the following deployment is bound to fail, because revision 24 of the grafana charm was not updated to match changes in upstream Grafana.
$ juju deploy grafana-24
The deployment gets stuck, and `juju status` shows the following:
$ juju status
Model       Controller  Cloud/Region  Version  SLA          Timestamp
experiment  stark-kvm   stark-kvm     2.6.5    unsupported  16:53:31+09:00

App      Version  Status  Scale  Charm    Store       Rev  OS      Notes
grafana           error       1  grafana  jujucharms   24  ubuntu

Unit        Workload  Agent  Machine  Public address  Ports  Message
grafana/0*  error     idle   0        10.4.23.120            hook failed: "install"

Machine  State    DNS          Inst id    Series  AZ       Message
0        started  10.4.23.120  tidy-tick  bionic  default  Deployed
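To spot this state from a script, here is a rough sketch of a shell helper that picks out the units whose workload status is `error` from the tabular `juju status` text. The function name `failed_units` is made up for illustration, and the parsing assumes the column layout shown above, so treat it as a starting point rather than a robust parser.

```shell
# Print the names of units whose workload status is "error".
# Reads tabular `juju status` text on stdin.
# Assumption: the Unit section starts with a header line whose first
# word is "Unit" and ends at the next blank line.
failed_units() {
  awk '
    NF == 0      { in_units = 0; next }   # a blank line ends the section
    $1 == "Unit" { in_units = 1; next }   # header of the Unit section
    in_units && $2 == "error" {
      sub(/\*$/, "", $1)                  # drop the leader marker "*"
      print $1
    }
  '
}
```

It would be used as `juju status | failed_units`, which for the status above should list `grafana/0`.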
When Juju is stuck at this stage, the hook error prevents us from removing the application. Worse, if you run `juju remove-application grafana` without the `--force` or `--no-wait` flags, any subsequent command will also fail. In other words, the application refuses to be removed until you resolve its internal errors.
My colleagues suggested three ways to get out of this situation.
1. Fix the underlying problem, then run `juju resolve grafana/0`.
2. Operational hack 1: replace the failing hook script with a bash script that always returns a success status (exit code 0).
3. Operational hack 2: run `juju debug-hook grafana/0`, wait for the hook context to load, and exit immediately. This reports success to the Juju controller, which then moves on to the next action (our removal command).
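Hack 1 from the list above could look roughly like the sketch below. The helper name `stub_hook` is hypothetical, and the hook path in the comment is an assumption based on the usual Juju 2.x agent layout; check where the charm actually lives on your unit before running anything. The stand-in script uses `/bin/sh` for portability, and bash would work just as well.

```shell
# Replace a failing hook with a stand-in that always exits 0.
# Meant to be run on the unit's machine (for example after
# `juju ssh grafana/0`), most likely as root.
# Assumed hook location (not confirmed by this post):
#   /var/lib/juju/agents/unit-grafana-0/charm/hooks/install
stub_hook() {
  hook_path="$1"
  # Move the original aside instead of overwriting it, so a hook that
  # is a symlink to a shared script does not clobber its target.
  mv "$hook_path" "$hook_path.orig"
  printf '#!/bin/sh\nexit 0\n' > "$hook_path"
  chmod +x "$hook_path"
}
```

After stubbing the `install` hook this way, running `juju resolve grafana/0` from the client should retry the hook, which now succeeds, and `juju remove-application grafana` can then proceed.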
I believe that users should be able to remove an application whenever they want, no matter what problem is occurring. It seems the Juju developers thought the same, as they have triaged several related bugs. However, at the time of writing, the issue has not been fixed yet (my version, 2.6.5-bionic-amd64, still has it). Until it is fixed, the only options are to actually resolve the error or to "hack" around it.