Summary
The user supports a proposed improvement but raises concerns about the risks of YAML configurations, particularly how a faulty upload by one team could disrupt others due to configmap merging. They recommend implementing a mechanism in propeller to prevent problematic agent YAMLs from affecting the final configmap, suggesting the use of the latest known good configuration for each agent. The user advocates for webhooks or other YAML validation methods before applying configurations, emphasizing the importance of separation of concerns, decentralized management, blast radius isolation, and automated validation. They stress that automated validation is crucial, as relying on a single team for validation undermines decentralization. Additionally, they suggest using the existing config validator in Flyte and propose a systematic discovery of custom agent ConfigMaps, automated validation of these ConfigMaps, and a mechanism for aggregation and conflict resolution to ensure only valid configurations are applied without affecting others.
shuliang
Here are some potential details I am thinking about:
Discovery of Custom Agent ConfigMaps:
• Implement a mechanism that systematically discovers and collects custom agent ConfigMaps across namespaces or predefined locations. This ensures that any new custom agent configurations are properly detected and processed.
Validation of ConfigMaps:
• Before applying or loading any custom agent configuration, each ConfigMap should go through an automated validation process (via webhooks or other validation tools). If a ConfigMap fails validation, it would be excluded from being applied or loaded, without affecting other valid configurations.
• This ensures that invalid custom agent configurations do not propagate errors or make FlytePropeller unable to start.
Aggregation and Conflict Resolution:
• After validation, the valid ConfigMaps should be aggregated. This aggregation should handle potential conflicts; for example, if two agents have the same name or endpoint, a mechanism should resolve or flag the conflict.
• This ensures that only conflict-free and valid configurations are applied, maintaining isolation between configurations and preventing a bad agent config from affecting the entire Flyte setup.
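The validation and aggregation steps could look roughly like the sketch below. It is only an illustration under stated assumptions: each discovered ConfigMap is taken to be already parsed into a dict, the `name` and `endpoint` fields are hypothetical and not Flyte's actual agent schema, and "first source wins, in sorted order" is just one possible conflict policy.

```python
from dataclasses import dataclass, field


@dataclass
class AggregationResult:
    merged: dict = field(default_factory=dict)    # agent name -> accepted config
    rejected: dict = field(default_factory=dict)  # source ConfigMap -> reason


def validate_agent_config(config):
    """Return an error string, or None if the config passes the checks.

    The required fields here are assumptions made for illustration.
    """
    if not isinstance(config, dict):
        return "config is not a mapping"
    for key in ("name", "endpoint"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            return f"missing or empty field: {key}"
    return None


def aggregate(configmaps):
    """Merge valid agent configs; skip invalid or conflicting ones.

    configmaps maps a source ConfigMap name to its parsed agent config.
    A rejected source never prevents other sources from being merged.
    """
    result = AggregationResult()
    seen_endpoints = {}  # endpoint -> agent name that claimed it
    for source in sorted(configmaps):  # deterministic order: first source wins
        config = configmaps[source]
        error = validate_agent_config(config)
        if error is not None:
            result.rejected[source] = error
            continue
        name, endpoint = config["name"], config["endpoint"]
        if name in result.merged:
            result.rejected[source] = f"duplicate agent name: {name}"
            continue
        if endpoint in seen_endpoints:
            owner = seen_endpoints[endpoint]
            result.rejected[source] = f"endpoint already used by {owner}"
            continue
        result.merged[name] = config
        seen_endpoints[endpoint] = name
    return result
```

The rejected map gives each team a reason for its own failure, while the merged map is the only thing that would reach propeller.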
shuliang
> the merging happens at yaml layer
It does not need to be yaml per se, right? propeller should be able to discover the configmap objects in the cluster based on some naming convention or labels.
kumare
There is a config validator in Flyte; using that would be a good idea
rogert
and I stress automated, because even when config creation is decentralised, if a single Flyte-owning team has to validate all configs, this defeats the entire purpose of this idea
rogert
I guess we can use webhooks or any other yaml validation mechanism before applying the resulting big yaml. I think separation of concerns and decentralised management should also come with blast radius isolation and automated validation to prevent one single agent from making the propellers unable to start
kumare
That’s hard to do, the merging happens at yaml layer
rogert
I think that’s a great improvement. My only concern would be around yaml sanity. As long as there is a configmap merge operation, one single team uploading a bad yaml can break all other teams’ config, so propeller should be given a good mechanism to prevent bad agent yamls from making it to the final configmap. Ideally propeller should use the latest known good config for every single agent, and one breaking config yaml for a given agent should be ignored and shouldn’t block other teams from updating some other agent config
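A minimal sketch of the "latest known good config per agent" idea, assuming a pluggable validator. The class name, the `has_endpoint` check, and the `endpoint` field are all hypothetical and not part of propeller:

```python
def has_endpoint(config):
    """Toy validator (an assumption): a config needs a non-empty endpoint."""
    return isinstance(config, dict) and bool(config.get("endpoint"))


class LastKnownGoodStore:
    """Keeps the most recent valid config for each agent."""

    def __init__(self, validate):
        self._validate = validate
        self._good = {}  # agent name -> last config that passed validation

    def update(self, agent, config):
        """Accept config if valid; otherwise keep the previous good one.

        Returns True when the new config was accepted.
        """
        if self._validate(config):
            self._good[agent] = config
            return True
        return False

    def effective(self):
        """Configs that would actually be applied, one per agent."""
        return dict(self._good)


store = LastKnownGoodStore(has_endpoint)
store.update("agent-a", {"endpoint": "a:8000"})
store.update("agent-a", {"endpoint": ""})        # bad update, ignored
store.update("agent-b", {"endpoint": "b:8000"})  # other team is not blocked
```

Here the bad update for `agent-a` leaves its earlier config in place, and `agent-b` still gets its new config.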