Poetry v1.3 broke my pipeline

poetry install abruptly exited with error code 1 in Buildkite, but you cannot reproduce it on your local machine?! It took my team around half a day to track down the root cause, and I hope this post can save you some time.

Screenshot of the odd Poetry output

For context, we use Docker to build our images for the production environment, and all tests are executed inside a Docker container in Buildkite. This gives us confidence that if we hit an issue in our CI/CD pipeline, we can reproduce it locally.

Apparently we took this too far: the difference in host OS (Linux for Buildkite, macOS locally), plus other hardware disparities, can easily throw us off. This time, the culprit was the TTY.

When we run tests locally, we tend to use an interactive shell, whereas in the pipeline that is normally discouraged for the sake of performance and cost. It is also pointless there, since developers rarely connect to a build machine to give a build extra input. However, this difference can mask issues that would otherwise be exposed easily and earlier.
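To make the difference concrete, here is a minimal Python sketch of what a process sees when its output is not attached to a terminal, which is roughly the situation inside a CI job. It is not taken from Poetry or cleo; the helper name child_sees_tty is made up for illustration, and piping the output is only an approximation of how a build agent runs commands.

```python
import subprocess
import sys

# One-liner executed in a child interpreter: report whether its stdout is a terminal.
CHILD_SNIPPET = "import sys; print(sys.stdout.isatty())"


def child_sees_tty() -> bool:
    """Run a child Python process with piped stdout and return what it reports.

    Because stdout is captured through a pipe, the child has no terminal
    attached to its output, similar to a CI job that runs without a TTY.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SNIPPET],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # In an interactive shell the first line usually reports True;
    # the child with piped output reports False.
    print("this process, stdout is a TTY:", sys.stdout.isatty())
    print("child process, stdout is a TTY:", child_sees_tty())
```

Running a tool with its output piped like this on your own machine is one cheap way to get closer to what the pipeline sees before you push.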

We initially observed inconsistent build outcomes, caused by an unpinned Poetry version. Everything was fine up to Poetry v1.2.2. But once the cached Docker layer expired and the latest Poetry kicked in, poetry install broke. The failure was so abrupt that even with --verbose turned on, we got no extra insight from the stack trace.

This issue has been recorded here. Essentially, a stdlib method used in cleo behaves differently in the newer version, and the change sneaked into Poetry v1.3 without being caught by tests. Who would come up with a test case like that?

Anyway, if you hit the same issue and want to avoid it in your CI/CD pipeline, you can:

use poetry --quiet or poetry --no-ansi if you still want to stay on this Poetry version
pin Poetry to v1.2.2
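
As a rough sketch of both workarounds, the Python snippet below only builds the command lines a build script might run; it does not execute them. The function names and the PINNED_POETRY_VERSION constant are hypothetical, and installing Poetry through pip is an assumption about your setup, so adapt it to however your image installs Poetry.

```python
from __future__ import annotations

import sys

# Assumption: the last version that worked for us, as mentioned in this post.
PINNED_POETRY_VERSION = "1.2.2"


def pip_install_poetry_command(version: str = PINNED_POETRY_VERSION) -> list[str]:
    """Build a pip command that installs one exact Poetry version."""
    return [sys.executable, "-m", "pip", "install", f"poetry=={version}"]


def poetry_install_command(no_ansi: bool = True) -> list[str]:
    """Build the poetry install command, optionally with ANSI output disabled."""
    command = ["poetry", "install"]
    if no_ansi:
        command.append("--no-ansi")
    return command


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    print(" ".join(pip_install_poetry_command()))
    print(" ".join(poetry_install_command()))
```

Keeping the version in a single constant (or a Docker build argument) means an upgrade becomes a deliberate one-line change rather than something that happens whenever a cached layer expires.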

Discussion time

Given that this caught us by surprise, what would you recommend to prevent similar issues in the future? For example, do you always pin Poetry and other relevant tools in the CI/CD pipeline for consistent build outcomes, or do you advocate a fail-fast approach so that problems are spotted earlier? Leave your thoughts in the comment section because I'm keen to know your opinion!