Enforcing AWS ElastiCache (Valkey) Best Practices with Kyverno + Crossplane + GitOps


In my previous post on managing AWS ElastiCache (Valkey) clusters with Crossplane and GitOps, I showed how we can stand up clusters entirely through
declarative YAML. One of the biggest wins of that approach is that everything becomes code.
That means we can encode our company’s best design patterns as policies and automatically apply them across every cluster—past, present, and future.

Why Kyverno?

There are many ways to validate Crossplane resources and enforce standards on them. We chose Kyverno because its policies are Kubernetes-native YAML,
they’re easy to read, and they support both soft (Audit) and hard (Enforce) modes. Our rollout strategy:

  1. Start in Audit mode to see which existing stacks violate standards without breaking deploys.
  2. Fix the drift (massage the clusters/stacks) until everything passes.
  3. Flip to Enforce to block future non-compliant changes.

If you’re new to Crossplane and want to understand how we provision Valkey clusters declaratively,
I recommend reading the step-by-step Crossplane + GitOps guide.

Design Choice: One Validation per Rule

When we first built our cluster policy, we wanted a simple way to surface a clear, actionable list of problems.
We landed on a structure where we can have multiple rules, but each rule performs exactly one validation.
If we need another validation, we create another rule. This gives us:

  • Custom error messages that read like a to-do item for developers.
  • Cleaner visualization in dashboards (see Policy Reporter below).
  • Modular maintenance—toggle, tune, or extend rules independently.
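
As a rough sketch of what extending the policy looks like under this structure, the fragment below adds one more atomic rule. The standard itself (requiring automatic failover) is a hypothetical example, not one of the rules in our policy, and it assumes the provider exposes automaticFailoverEnabled as a boolean under spec.forProvider, so check the field name and type against the CRD version you run.

```yaml
# Appended under spec.rules of the ClusterPolicy.
# The failover requirement is a hypothetical standard used for illustration.
- name: validate-automatic-failover
  match:
    any:
      - resources:
          kinds:
            - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
  validate:
    # One rule, one check, one message that reads like a to-do item.
    message: "Automatic failover must be enabled: set spec.forProvider.automaticFailoverEnabled to true"
    pattern:
      spec:
        forProvider:
          automaticFailoverEnabled: true
```

Because the rule stands alone, it can be removed or tuned later without touching the other validations.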

Visualizing Compliance with Policy Reporter

We tested a few UIs and found the Policy Reporter UI makes it easiest to review policy findings.
It’s effectively a categorized to-do list by policy, namespace, and resource. Developers (and DBAs) can quickly drill into
just the ElastiCache items and see exactly what needs to be fixed.

Helpful Policy Annotations

We use Kyverno’s annotations to improve grouping and filtering in reports:

metadata:
  name: replicationgroup-policy
  annotations:
    policies.kyverno.io/title: ReplicationGroup Policy
    policies.kyverno.io/category: ElastiCache
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: ReplicationGroup Crossplane

With a dedicated category (e.g., ElastiCache), DBAs can filter the Policy Reporter UI
to the most relevant findings for them.
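
For orientation, a finding in a Kyverno-generated report looks roughly like the sketch below. The report and resource names are hypothetical, the counts are made up, and the report's API group and exact fields depend on the Kyverno version, so treat it as an illustration of how the category and severity annotations surface in each result, not as an exact schema.

```yaml
# Illustrative report for a cluster-scoped resource; not copied from a real cluster.
apiVersion: wgpolicyk8s.io/v1alpha2
kind: ClusterPolicyReport
metadata:
  name: example-report            # generated by Kyverno in practice
scope:
  apiVersion: elasticache.aws.upbound.io/v1beta2
  kind: ReplicationGroup
  name: orders-cache              # hypothetical resource name
results:
  - policy: replicationgroup-policy
    rule: validate-encryption
    result: fail
    # Kyverno usually wraps the rule message with extra context; abbreviated here.
    message: "Encryption at rest must be enabled"
    category: ElastiCache         # taken from policies.kyverno.io/category
    severity: medium              # taken from policies.kyverno.io/severity
    source: kyverno
    scored: true
summary:
  pass: 3
  fail: 1
  warn: 0
  error: 0
  skip: 0
```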

Full Example: ClusterPolicy for Crossplane ReplicationGroup

Below is a simple, end-to-end example that validates several core standards. We start in Audit mode
to gather findings without blocking deploys. After all clusters pass, switch to Enforce to prevent non-compliant changes going forward.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: replicationgroup-policy
  annotations:
    policies.kyverno.io/title: ReplicationGroup Policy
    policies.kyverno.io/category: ElastiCache
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: ReplicationGroup Crossplane
    policies.kyverno.io/description: "Validates Crossplane ReplicationGroup resources for ElastiCache to ensure compliance with organizational standards, security requirements, and cost optimization guidelines. This includes enforcing approved instance types, encryption requirements, and other configuration standards. The resource for the policy can be found at https://github.ancestry.com/infrastructure/containers-applicationbases/tree/master/applications/kyverno/templates/deploy/policies"
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: validate-node-type
      match:
        any:
          - resources:
              kinds:
                - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
      validate:
        message: "ReplicationGroup validation failed: Instance type must be in the r6g family. Current instance type is '{{ request.object.spec.forProvider.nodeType }}'"
        pattern:
          spec:
            forProvider:
              nodeType: "cache.r6g.*"

    - name: validate-encryption
      match:
        any:
          - resources:
              kinds:
                - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
      validate:
        message: "Encryption at rest must be enabled"
        pattern:
          spec:
            forProvider:
              atRestEncryptionEnabled: true

    - name: validate-transit-encryption
      match:
        any:
          - resources:
              kinds:
                - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
      validate:
        message: "Encryption in transit must be enabled"
        pattern:
          spec:
            forProvider:
              transitEncryptionEnabled: true

    - name: validate-management-policy
      match:
        any:
          - resources:
              kinds:
                - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
      validate:
        message: "ReplicationGroup is still in Observe mode and not being managed by Crossplane"
        pattern:
          spec:
            managementPolicies: "!Observe"
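
One way to check the policy before it reaches a cluster is the Kyverno CLI's test command. The sketch below assumes the policy above is saved as replicationgroup-policy.yaml next to two sample resources in resources.yaml. All file and resource names are hypothetical, the two files are shown in one block for brevity, and the sample resources carry only the field the transit-encryption rule inspects.

```yaml
# File: kyverno-test.yaml (run "kyverno test ." from the same directory)
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: replicationgroup-policy-test
policies:
  - replicationgroup-policy.yaml   # hypothetical file holding the ClusterPolicy
resources:
  - resources.yaml                 # hypothetical file holding the samples below
results:
  # A resource with transit encryption on should pass the rule.
  - policy: replicationgroup-policy
    rule: validate-transit-encryption
    kind: ReplicationGroup
    resources:
      - compliant-cache
    result: pass
  # A resource with transit encryption off should fail it.
  - policy: replicationgroup-policy
    rule: validate-transit-encryption
    kind: ReplicationGroup
    resources:
      - plaintext-cache
    result: fail
---
# File: resources.yaml (trimmed sample resources, not deployable as-is)
apiVersion: elasticache.aws.upbound.io/v1beta2
kind: ReplicationGroup
metadata:
  name: compliant-cache
spec:
  forProvider:
    transitEncryptionEnabled: true
---
apiVersion: elasticache.aws.upbound.io/v1beta2
kind: ReplicationGroup
metadata:
  name: plaintext-cache
spec:
  forProvider:
    transitEncryptionEnabled: false
```

Adding one results entry per rule keeps the tests as atomic as the rules they cover.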

Switching from Audit to Enforce

Once your dashboards show green across the board, flip the policy to hard enforcement by changing a single field:

spec:
  validationFailureAction: Enforce
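
Newer Kyverno releases (1.13 and later) deprecate the policy-level validationFailureAction field in favor of a failureAction set inside each rule's validate block; check the release notes for the version you run. Assuming such a version, the one-validation-per-rule layout also allows a staggered rollout. The sketch below enforces one rule while another stays in Audit; which rule goes first is an arbitrary choice for illustration.

```yaml
# Fragment of the ClusterPolicy spec; assumes a Kyverno version that
# supports validate.failureAction at the rule level.
spec:
  background: true
  rules:
    - name: validate-encryption
      match:
        any:
          - resources:
              kinds:
                - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
      validate:
        failureAction: Enforce   # block non-compliant changes for this rule
        message: "Encryption at rest must be enabled"
        pattern:
          spec:
            forProvider:
              atRestEncryptionEnabled: true

    - name: validate-management-policy
      match:
        any:
          - resources:
              kinds:
                - elasticache.aws.upbound.io/v1beta2/ReplicationGroup
      validate:
        failureAction: Audit     # keep reporting only while clusters are remediated
        message: "ReplicationGroup is still in Observe mode and not being managed by Crossplane"
        pattern:
          spec:
            managementPolicies: "!Observe"
```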

Developer Experience Tips

  • Name rules by intent (validate-encryption, validate-transit-encryption, etc.) so it’s obvious what failed.
  • Write messages like tickets—tell the developer exactly what to fix and (if useful) echo the current value using variables like {{ request.object.spec.forProvider.nodeType }}.
  • Keep rules atomic (one validation per rule) for better UX in Policy Reporter and simpler maintenance.
  • Batch remediation by category (e.g., all ElastiCache issues) so DBAs and app teams can focus on what they own.
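
One caveat when echoing values: if the referenced field is absent from the resource, Kyverno may be unable to resolve the variable, and the finding can show a substitution error instead of the intended text. A hedged sketch of one way around that is a JMESPath default, shown here on the node-type message. The 'unset' placeholder is an arbitrary choice, and the behavior is worth confirming on your Kyverno version.

```yaml
# Fragment of a rule's validate block.
validate:
  # "|| 'unset'" supplies a fallback when spec.forProvider.nodeType is missing.
  message: "Node type must be in the r6g family. Current node type is '{{ request.object.spec.forProvider.nodeType || 'unset' }}'"
  pattern:
    spec:
      forProvider:
        nodeType: "cache.r6g.*"
```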

Wrap-Up

With Crossplane defining ElastiCache (Valkey) as YAML and Kyverno validating those definitions, we get a clear, automated path to standardization.
Start in Audit to learn, remediate the drift, then move to Enforce for durable guardrails—no more snowflake clusters.

If you haven’t seen how we provision these clusters in the first place, check out the companion post, How to Manage AWS Valkey Clusters with Crossplane and GitOps.
