Summary
The user is looking for best practices to track cloud costs per workflow, or even per workflow execution, noting that Kubernetes bin packing makes this difficult. They currently get aggregate Flyte costs by scheduling Flyte pods only on Flyte-specific nodes, but want more granularity. A responder explains that Flyte's bundled Grafana dashboards show cluster size and utilization but not cost, and that Flyte already labels Kubernetes pods with their task, workflow, and namespace. The suggested approach is to deploy a monitoring solution that collects allocated resources, then translate those allocations into cost estimates based on node prices, which requires some policy decisions because pods from unrelated workflows can share a node. The responder mentions that open-source tools may help and that cost observability is under active development in Union; both sides are open to a follow-up chat.
james.cohen
thanks for the detailed response
john657
<@U07SJCNF60Z> Flyte already labels k8s resources (pods) with the task/workflow/namespace they are associated with. From there you'd want to deploy a monitoring solution (Prometheus, Datadog, etc.) that collects allocated memory, CPU, etc. Then you need to translate those allocated values into a cost estimate using the price of each node. Attributing node costs to pods involves some decisions, since multiple pods from unrelated workflows can run on the same node. I think there are a handful of OSS tools that can help with this.
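As a rough illustration of the "collect allocated CPU per workflow" step with Prometheus, here is a minimal sketch. It assumes kube-state-metrics v2+ is installed and configured to export the Flyte pod label through `kube_pod_labels` (for example with `--metric-labels-allowlist=pods=[workflow-name]`; check the kube-state-metrics docs for your version). The label key `workflow-name` is an assumption, so confirm it with `kubectl get pod --show-labels` on one of your Flyte task pods. kube-state-metrics rewrites label names, so `workflow-name` shows up as `label_workflow_name`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: kube-state-metrics exports the Flyte pod label "workflow-name"
# via kube_pod_labels, renamed to label_workflow_name.
# Joins per-container CPU requests onto each pod's labels, then sums per workflow.
CPU_REQUESTS_BY_WORKFLOW = (
    "sum by (label_workflow_name) ("
    'kube_pod_container_resource_requests{resource="cpu"}'
    " * on (namespace, pod) group_left(label_workflow_name) kube_pod_labels)"
)


def query_prometheus(base_url, promql):
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': promql})}"
    with urlopen(url) as resp:
        return json.load(resp)


def parse_vector(response, label):
    """Turn an instant-vector response into {label_value: float}.

    Series without the label (e.g. non-Flyte pods) are grouped as "unlabeled".
    """
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    return {
        sample["metric"].get(label, "unlabeled"): float(sample["value"][1])
        for sample in response["data"]["result"]
    }
```

Usage might look like `parse_vector(query_prometheus("http://prometheus:9090", CPU_REQUESTS_BY_WORKFLOW), "label_workflow_name")`, where the Prometheus URL is a placeholder. Note that an instant query is a point-in-time snapshot of requested cores; turning it into cost still requires sampling over time and pricing the nodes.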
james.cohen
Hey <@U049GPARFSQ>, thanks for taking the time to respond. I was the one who originally asked this question. Basically, I just want to know the best practice for tracking costs across workflows. Right now we only have the aggregate cost for all Flyte usage, but we would like a bit more granularity. Happy to chat if necessary.
john657
Hey <@U06PDL7UAL9>, Flyte does ship with a set of Grafana dashboards that show metrics on overall cluster size and utilization, but not explicit cost information.
Cost observability is a feature under active development in Union. As you said, per-workflow/per-execution cost will ultimately be an estimate, but we can get closer than aggregate node uptime.
I would love to grab a few minutes with you to understand your current user flow (some other AV folks have a process of estimating cost per scene and I would like to learn how you think about this). Down for a chat?
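To show what "an estimate, but closer than aggregate node uptime" can mean in practice, here is a minimal sketch of one possible allocation policy, not Flyte's or Union's method. It assumes you already have each node's hourly price, its allocatable CPU and memory, and each pod's requests, runtime, and labels. The 50/50 split of a node's price between CPU and memory (`CPU_COST_FRACTION`) and the `workflow-name` label key are hypothetical choices you would tune.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    hourly_price: float  # e.g. on-demand price from your cloud bill (assumed input)
    cpu_cores: float  # allocatable CPU cores
    memory_gib: float  # allocatable memory in GiB


@dataclass
class PodRecord:
    node: str  # name of the node the pod ran on
    labels: dict  # pod labels, e.g. as collected by your monitoring stack
    cpu_request: float  # requested cores
    memory_request_gib: float  # requested memory in GiB
    hours: float  # wall-clock runtime on the node


# Hypothetical policy: attribute this fraction of a node's price to CPU,
# and the remainder to memory.
CPU_COST_FRACTION = 0.5


def cost_per_workflow(nodes, pods, label_key="workflow-name"):
    """Apportion node cost to pods by requested CPU/memory, then sum by a label.

    Idle capacity (resources no pod requested) is not charged to anyone here;
    spreading it proportionally across pods is another valid policy.
    """
    by_name = {n.name: n for n in nodes}
    totals = defaultdict(float)
    for pod in pods:
        node = by_name[pod.node]
        # Price of one core-hour and one GiB-hour on this particular node.
        cpu_rate = node.hourly_price * CPU_COST_FRACTION / node.cpu_cores
        mem_rate = node.hourly_price * (1 - CPU_COST_FRACTION) / node.memory_gib
        pod_cost = pod.hours * (
            pod.cpu_request * cpu_rate + pod.memory_request_gib * mem_rate
        )
        totals[pod.labels.get(label_key, "unlabeled")] += pod_cost
    return dict(totals)
```

For example, on a hypothetical 4-core, 16 GiB node priced at $0.40/hour, a pod requesting 2 cores and 8 GiB for 1 hour and a pod requesting 1 core and 4 GiB for 2 hours each come to about $0.20: one used half the node for an hour, the other a quarter of it for two hours. Charging by requests rather than measured usage is a deliberate choice here, since requests are what the scheduler reserves and what prevents other pods from being bin-packed onto that capacity.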
yisheng034
Morning. How do folks do cost tracking on the cloud provider on a per-workflow or even per-workflow-execution basis? I understand it's difficult because of Kubernetes bin packing. We've been able to get aggregate costs by having Flyte schedule pods only on Flyte-specific nodes, but wanted to hear whether more granularity is possible.