Konubinix' opinionated web of thoughts

Gitops

Fleeting

Use a git repository and only this repository to store the state of the infrastructure1, 2, 3. Very close to IaC, but not necessarily used together. Because you already know git, you don’t need another tool to:

  • view diffs and logs on the state of the infrastructure,
  • create workflows, like pull requests, to ensure proper validation before getting into production.

gitops tools check the diff between the git state and the live state at regular intervals, resolving the “configuration drifts” by adjusting the live state to make it reflect the git state.
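
To make that loop concrete, here is a minimal sketch of what such a controller conceptually does, assuming a repository of plain kubernetes manifests (the URL and paths are made up; real tools like flux or argocd do much more, like pruning and health checks):

  # conceptual sketch of a gitops reconciliation loop, NOT a real controller
  git clone https://example.com/infra.git /tmp/infra   # made-up URL
  while true; do
      git -C /tmp/infra pull --ff-only
      # kubectl diff exits non-zero when the live state differs from git
      if ! kubectl diff -f /tmp/infra/manifests/ >/dev/null 2>&1; then
          kubectl apply -f /tmp/infra/manifests/   # fix the drift
      fi
      sleep 60
  done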

You have a code repository within which the devs can do whatever they want. Then, you have the infrastructure repository, with much more control. There can be automated PRs4 from the CI of the dev repository to the ops repository, but manual control is often preferred.

It is one implementation of devops, where an SCM is the sole interface between dev and ops5 (see the misconception of making the dev and ops know all the things of DEV + OPS). The ops set up the gitops tooling and the infra repository, and the devs commit stuff to it.

It is more a way of thinking than a product or a tool6. Chances are you will have to create your own workflow depending on what your needs are.

Using a single repository to store both the code and the infrastructure state, as is often suggested in the 1017, is to me a bad idea. With infrastructure state commits, the git history quickly becomes messy. Also, the code and the infrastructure have very different responsibilities and it seems counterintuitive to mix them. In my mind, gitops started to make sense only when I discovered that it was about having (at least) two separate git repositories: one for the code, the other for storing the state. This revelation happened quite late.

I really got interested in gitops when my company started investigating deployment strategies. We quickly realized that we would need much more control and auditing than simply running helm upgrade in the continuous automation. In particular, blue/green and canary deployments need a clear understanding of what was, is and will be put in the clusters. Rollbacks become easier as well.

Therefore, in testing, helm upgrade becomes git clone ; copy ; git commit ; git push, and putting into production is basically the same thing.
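
For instance, a testing deployment step could look like the following sketch (the repository URL, chart path and cluster layout are assumptions for the example):

  # hypothetical CI step: render the chart and commit the result,
  # instead of applying it directly with "helm upgrade"
  git clone https://example.com/infra.git          # made-up infra repo
  helm template my-app ./chart --set image.tag="$NEW_TAG" \
      > infra/clusters/testing/my-app.yaml
  cd infra
  git add clusters/testing/my-app.yaml
  git commit -m "deploy my-app $NEW_TAG to testing"
  git push   # the gitops controller applies it from there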

There are several tools implementing gitops. I was led to focus on spinnaker, argocd and flux. spinnaker is likely to be useful for very big industrial projects, but several hints nudged me to avoid it.

I first tried flux, which was a very good way to be introduced to gitops. In less than 5 minutes, I could have a working setup. It also, by default, suggests using flux to manage flux. This is a dogfooding quality I appreciate.
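
For the record, that working setup essentially boils down to one bootstrap command, which installs flux and commits its own manifests to the given repository (owner, repository and path are placeholders):

  flux bootstrap github \
      --owner=my-org \
      --repository=infra \
      --path=clusters/testing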

I played a lot with it and eventually dropped it for two reasons (written on [2025-05-16 Fri]).

The first is the lack of interface to easily understand what is happening. The power of gitops lies in its controller doing the job well, and in your resources being appropriate. A lot can go wrong between the git push and the change taking place. I could quite easily make flux hang for seemingly no reason, and I had to spend more time than I expected debugging flux itself. I tried capacitor, but it was not much more than a web view of stuff I could already comfortably consult with the flux cli.

The second, and the real no-go, is the lack of orphan detection. I was very disappointed to find out that I could add pods in the namespace of my application without flux noticing. There are a few discussions on the matter8, but that’s all.

Trying things with “flux create” before committing is a good thing (in my mind), for it provides a small feedback loop and avoids polluting the git history with “wip: trying this”, “oops, forgot that”9. Because of the lack of orphan detection, it is more than easy to forget about a “flux create” and end up with a testing cluster that works great for bad reasons.
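
For instance, one can experiment live and, once satisfied, export the very same resource as YAML to commit it (podinfo is the usual flux example; names are placeholders):

  # try the resource live, with a fast feedback loop
  flux create source git podinfo \
      --url=https://github.com/stefanprodan/podinfo --branch=master
  # once satisfied, render it instead of applying it
  flux create source git podinfo \
      --url=https://github.com/stefanprodan/podinfo --branch=master \
      --export > podinfo-source.yaml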

Then, I tried argocd. It comes with too many bells and whistles to my taste, but at least it provides something to deal with both points. Its interface is very good. I often could see at a glance what went wrong thanks to it. On the other hand, its documentation focuses a lot on the interface. I definitely prefer the documentation of flux, which (to my mind) does a better job of helping one understand gitops.

Its support for orphaned resource detection is not ideal10, but at least there is something that I can work with.
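
For reference, it is enabled per AppProject, along these lines (the project name is a placeholder, and only the relevant field is shown):

  kubectl apply -f - <<EOF
  apiVersion: argoproj.io/v1alpha1
  kind: AppProject
  metadata:
    name: my-project
    namespace: argocd
  spec:
    # warn (in the UI) about live resources not declared in git
    orphanedResources:
      warn: true
  EOF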

Like flux, it makes it easy to create argo resources (via the UI), even though the custom resources themselves are easy enough to work with anyway.

Both flux and argocd come with side projects (respectively flagger and argo rollouts) to deal with dynamic deployments, like canary. I have not tested them yet.

Retrospectively, I would advise following the same path I took: start with flux, whose documentation is very nice and simple, so that you can better understand the feeling of gitops. Even struggle a bit, trying to install broken helm charts. And when things get more serious, move to argocd, to get orphaned resource detection and the nice UI, despite the denser and a bit less intuitive documentation. If I were more fluent in gitops, I would have rather contributed to flux than looked for an alternative.

configuration drifts

remove the configuration drift

push based or pull based?

In ALL the videos I have watched about gitops, they indicated that the main source of configuration drift is when someone gets into the system and makes some changes manually.

They indicate that this explains the need for a pull based model, which restores the system automatically without waiting for the next CI trigger to run.

Yet, I think that if the ops can easily and reproducibly run the push based command, they will be nudged into using it rather than going and fixing stuff live.

To me, the incentive to modify the environment live is a symptom of a drift in the CI automation, and should be addressed instead of adding a tool to hide it.

Doing so, the person running the command still has to think about pushing each change. Right? So does that make the pull model more secure?

I don’t think so. If someone went into production to fix something manually, I think that this person would not accept seeing the fix automatically reverted 3 minutes later. In that case, it seems sensible to bet that this person would also disable the pull based automation. Thus, to be honest, we should consider that this person has the same probability of forgetting to re-enable the pull based flow as the previous person had of forgetting to push their change.

In the push based case, forgetting to push the fix won’t disable the automatic flow, while in the pull based case, someone would need to explicitly check to realize that the automation was disabled.

Also, in the push based scenario, the person doing the fix would be more likely to see the dirty git status and push each change in the future, whereas in the pull based scenario no trace of the change would remain.

In conclusion, I think that we tend to overestimate the value of the pull based model and underestimate that of the push based one. Both have real advantages, and the awesome aspects of the pull based flow should not overshadow those of the push based one.

dealing with secrets

We don’t want the secrets to lie around indefinitely in some git history, that may well get into the wrong hands in some distant future.

I could find a lot of documentation explaining how sealed secrets and sops work, but very few descriptions of use cases showing how practical it is.

argocd provides a very high level description of some pros and cons of the general methods11; flux goes a bit more into the details12.

First, I think one should ask what the source of truth for the secrets is.

Let’s first assume that it is the git repository.

In that case, some team member puts a secret in the repository, encrypting it using the public key. Then, only the cluster is able to get the secret in plain text. If a team member needs the secret for a side workflow (like accessing the database), that person should get access to the secret in the cluster.
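
With sealed secrets, for instance, that step could look like this (the secret value and file names are made up):

  # encrypt locally with the cluster's public key; only the
  # controller in the cluster can decrypt the result
  kubectl create secret generic db-password \
      --from-literal=password=s3cret \
      --dry-run=client -o yaml \
    | kubeseal --format yaml > db-password-sealed.yaml
  git add db-password-sealed.yaml && git commit -m "add db password"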

Because this is the source of truth, those secrets need some extra care not to be lost.

In case the cluster needs to be destroyed, one would need to back up the sealed-secret key secret and restore it in the new cluster. As a matter of fact, all the secrets, including the cluster ones, would be backed up if you already have a backup solution in place, like velero, so that might not be that problematic. It raises the question, though, of whether those backups should be encrypted as well, for instance using a ssss secret shared by the team or something.
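
With sealed secrets, the key pair lives in a regular secret in the controller’s namespace, so backing it up is a one-liner (assuming the default installation in kube-system):

  # save the sealing key pair; store this file very carefully
  kubectl get secret -n kube-system \
      -l sealedsecrets.bitnami.com/sealed-secrets-key \
      -o yaml > sealed-secrets-key-backup.yaml
  # in the new cluster: apply it, then restart the controller
  kubectl apply -f sealed-secrets-key-backup.yaml
  kubectl rollout restart -n kube-system deployment/sealed-secrets-controller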

Then, when moving from cluster to cluster (like testing, then staging, then prod), we have to duplicate those secrets, needing several operations, like extracting the secrets from one cluster to put them in the other one. For shared secrets, one must think of updating them in all the configurations. I don’t see any issue in automating this, but so far, I have not found any tool available to do it.
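
A hypothetical helper could re-seal each secret for the next cluster; this is only the shape I would expect such a tool to take, not an existing one:

  # hypothetical re-seal script: read plain secrets from the source
  # cluster and seal them with the target cluster's public key
  kubeseal --context staging --fetch-cert > staging-cert.pem
  for name in db-password api-token; do     # made-up secret names
      kubectl --context testing get secret "$name" -o yaml \
        | kubeseal --cert staging-cert.pem --format yaml \
        > "clusters/staging/${name}-sealed.yaml"
  done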

Now, in case one already uses a secret manager, like aws secret manager, hashicorp vault or bitwarden, the secrets should be put in the cluster via some automation mechanism.

Using the external secrets operator would be tempting, as one would only need to configure the secret access. But it feels like a threat to give a hot system access to a secret store. A lot of things could go wrong. One might prefer that the entity accessing the secret store do so outside of the part dealing with the ingress and possible attacks.
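
For reference, the external secrets operator approach looks roughly like this: a resource in the cluster that pulls the value from the store (store and key names are made up):

  kubectl apply -f - <<EOF
  apiVersion: external-secrets.io/v1beta1
  kind: ExternalSecret
  metadata:
    name: db-password
  spec:
    refreshInterval: 1h
    secretStoreRef:
      name: my-vault            # made-up SecretStore
      kind: ClusterSecretStore
    target:
      name: db-password         # kubernetes secret to create
    data:
      - secretKey: password
        remoteRef:
          key: secret/data/db   # made-up path in the store
          property: password
  EOF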

On the other hand, some automation could set up the sealed secrets and sops. This collides with one hypothesis of cryptography: encryption should not be deterministic, to avoid guessing attacks. For that reason, if the automation simply regenerates all the encrypted data, git will show some difference every time, even when nothing changed. One might want some way to tell whether a secret has actually changed, ideally using an hmac to avoid the guessing attacks. There is a discussion in sealed-secret about that suggestion but nothing tangible got out of it.

It is unclear to me, though, when this automation would take place. Its place is obvious in the testing cluster, because that cluster reflects the state of the automatic git commit made after the build of the new images. This automatic commit can also commit the new versions of the secrets. For the production cluster though, where a manual commit is needed to acknowledge the deployment, it might be easy to end up with a drift between what is in the secret store and what is in the repository. One might be puzzled to see some behavior in the cluster not coherent with what one sees in the secret store, forgetting that the source of truth of the cluster is in fact the repository.
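
A minimal sketch of that hmac idea, assuming a key shared by the automation (everything here is hypothetical; sealed-secret provides nothing like it today):

  # re-seal a secret only when its hmac changed, to keep git quiet
  old_mac=$(cat db-password.hmac 2>/dev/null)
  new_mac=$(printf '%s' "$PLAINTEXT" \
      | openssl dgst -sha256 -hmac "$HMAC_KEY" -r | cut -d' ' -f1)
  if [ "$old_mac" != "$new_mac" ]; then
      printf '%s' "$new_mac" > db-password.hmac
      # ...regenerate and commit the sealed secret here...
  fi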

Finally, one might simply want to install the secrets automatically, without using git at all. In that case, we lose a bit of the auditing promise of gitops, as something might go wrong because of a change of secret that would not be traced.

In the end, it appears to me that there is definitely no silver bullet.

My personal preference would be to use git as the single source of truth with some tooling to ease migrating secrets from cluster to cluster.

But, because I currently work in an organisation using a central secret store, I would go with the solution suggested in the sealed secret discussion: store the secrets in git automatically, with a way to identify the changes.

when already having a source of truth

https://github.com/bitnami-labs/sealed-secrets/issues/376

with sealed secrets

sealed secrets generates a (private key, public key) pair in the production cluster. You have to export the public key to encrypt the secrets in the git repository.
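
Concretely, with kubeseal (file names are arbitrary):

  # export the public key once; it can be shared freely
  kubeseal --fetch-cert > pub-cert.pem
  # later, encrypt offline, without access to the cluster
  kubeseal --cert pub-cert.pem --format yaml \
      < db-password.yaml > db-password-sealed.yaml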

Cloud Native Live: Crossplane - GitOps-based Infrastructure as Code through Kubernetes API - YouTube

  • External reference:

    talks about the fact that gitops is mostly about using git and only git as the single interface (and source of truth) for all ops related stuff.

    He uses crossplane to specify the whole infrastructure in kubernetes manifests. The stuff we could write in terraform and run manually can be declared as CRDs in kubernetes, automatically synced using argo CD from a git repository and automatically applied using crossplane.
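
    As an illustration, a crossplane managed resource might look like this (the exact apiVersion and kind depend on the installed provider; this sketch assumes an aws s3 provider with a configured providerConfig):

      kubectl apply -f - <<EOF
      apiVersion: s3.aws.upbound.io/v1beta1
      kind: Bucket
      metadata:
        name: my-infra-bucket   # made-up name
      spec:
        forProvider:
          region: eu-west-1
        providerConfigRef:
          name: default
      EOF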

with kubernetes

storing manifests in git is not enough to claim doing gitops

one repo for deploy and dev

Demonstrations show gitops with argo CD in a way that supposedly shows its advantages.

But what I see is a new automatic commit for every push in the main branch, which is less than ideal.

To me, this makes sense only when the main branch contains nothing but the commits to deploy, or if this repository is not dedicated to dev. Otherwise, the dev commits will be drowned in the noise of automatic commits.

difference between dev repo and deploy repo

Some resources make it clearer that there are two kinds of repos, with different commit habits and merge rules.

By listening to those, I understand that gitops is about having one git repository as a database for storing the desired state of the ops, because git has the characteristics that we like: auditability, fine control, immutability, easy rollback.

gitops tools are focused on looking at this git repository and applying this desired state. It is like real time infrastructure as code.

To do so, we need a separate deploy repository, because the security rules and the use of the commit history (a lot of automatic commits in the deploy repo) will not be the same at all.

Using the same repository for both will most likely pollute the dev repository with a lot of automatic commits and make it barely readable.

  1. using Git as the single source of truth for system state

    https://www.harness.io/blog/gitops

     ↩︎
  2. a DevOps best practice that relies on Git as the single source of reference and as the control mechanism for creating, updating, and deleting the system architecture

    https://www.atlassian.com/fr/git/tutorials/gitops

     ↩︎
  3. GitOps relies on the use of Git repositories as the single source of truth to deliver infrastructure as code

    https://www.redhat.com/fr/topics/devops/what-is-gitops

     ↩︎
  4. a practice consisting of using Git pull requests to automatically verify and deploy changes to the system infrastructure

    https://www.atlassian.com/fr/git/tutorials/gitops

     ↩︎
  5. takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD, and applies them to infrastructure automation

    https://about.gitlab.com/topics/gitops/

     ↩︎
  6. GitOps is not a single product, plugin, or platform. There is no one-size-fits-all answer to this question, as the best way for teams to put GitOps into practice will vary depending on the specific needs and goals of the team

    https://about.gitlab.com/topics/gitops/

     ↩︎
  7. using a dedicated GitOps repository for all team members to share configurations and code

    https://about.gitlab.com/topics/gitops/

     ↩︎
  8. https://github.com/fluxcd/flux2/discussions/4084 ↩︎

  9. https://xkcd.com/1296/ ↩︎

  10. https://github.com/argoproj/argo-cd/issues/7418 ↩︎

  11. https://argo-cd.readthedocs.io/en/stable/operator-manual/secret-management/ ↩︎

  12. https://fluxcd.io/flux/guides/sealed-secrets/#gitops-workflow and https://fluxcd.io/flux/guides/mozilla-sops/#gitops-workflow ↩︎