Summary
The user is new to Flyte and has workloads that include both slow-running and fast-running tasks. They are concerned that running fast-running tasks in Flyte may add unnecessary overhead, because these tasks need setup time such as loading a neural network onto a GPU. Their previous approach was an HTTP server built with FastAPI, which avoids repeating that setup. The user is considering keeping an HTTP server running and having Flyte tasks send HTTP requests to it, but feels this approach may be redundant. They are asking for the best way to handle this situation.
pim
Thanks for the pointer! That indeed looks like what we need. I think we'll evaluate Flyte first and consider Union later.
kumare
Actors are designed exactly for this use case.
kumare
<@U0805QZCYS0> you should definitely keep it within Flyte; we think keeping it simple makes it much, much better.
kumare
Would you be open to talking more about this?
kumare
<@U07VB6BDE1L> if you are open to it, we at Union have built a new feature called Actors (https://docs.union.ai/byoc/user-guide/core-concepts/actors#actors), which reuses containers, lets you pin models in memory, and can run tasks in milliseconds.
pingsutw
What kind of long-running task? If it's a long-running computation, you could just use a regular Python @task to run it in a pod.
g.m.verkes
Would you do the same for long-running computations? Would you implement a separate queue for them with something like RabbitMQ, or is there a better way that keeps it within Flyte?
pim
Yeah, it's CPU (or GPU) bound
pingsutw
Is your task CPU-bound? If so, I think it's better to run multiple HTTP servers and use an agent to dispatch requests to them.
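The "multiple HTTP servers plus a dispatcher" suggestion can be sketched as a simple round-robin client. This does not show the Flyte agent interface itself; it is only the dispatch logic an agent (or any caller) might run. Round-robin is an assumed policy, and the endpoint URLs are hypothetical. The transport is injectable so the logic can be exercised without real servers.

```python
import itertools
import json
import urllib.request
from typing import Callable, Sequence


def http_post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RoundRobinDispatcher:
    """Spread requests across several identical, already-running model servers."""

    def __init__(
        self,
        endpoints: Sequence[str],
        send: Callable[[str, dict], dict] = http_post_json,
    ) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoints)  # endlessly repeat the list
        self._send = send

    def dispatch(self, payload: dict) -> dict:
        """Send the payload to the next server in rotation."""
        endpoint = next(self._cycle)
        return self._send(endpoint, payload)


if __name__ == "__main__":
    # Offline demo: a fake transport records which server each request would hit.
    hits = []

    def fake_send(url: str, payload: dict) -> dict:
        hits.append(url)
        return {"echo": payload}

    dispatcher = RoundRobinDispatcher(
        ["http://model-a:8000/predict", "http://model-b:8000/predict"],  # hypothetical
        send=fake_send,
    )
    for i in range(3):
        dispatcher.dispatch({"x": i})
    print(hits)  # a, b, a
```

A production dispatcher would also need health checks, retries, and possibly load-aware routing; those are omitted here.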
pim
Or would you have the agent itself execute the requests?
pim
Thanks! So you'd then make a task for each HTTP endpoint and have a single agent dispatch the HTTP requests?
josh.wills
I handle these kinds of tasks using flyte agents: https://docs.flyte.org/en/latest/user_guide/flyte_agents/developing_agents.html
pim
Hi all! I'm new to Flyte. My workloads consist of slow-running tasks, for which Flyte is perfectly suited, as well as fast-running tasks. The latter, however, might need some setup time, such as loading a neural network onto a GPU for model inference. Before moving to Flyte, we had been using an HTTP server (e.g. FastAPI) to execute those tasks. Running them as Flyte tasks seems to add unnecessary overhead. What would be the best or most idiomatic way of handling this? I'm currently thinking of running an HTTP server and then having a Flyte task make an HTTP request to it. This seems a little duplicative, though. Thanks!
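The approach described in the question (a long-lived server that loads the model once, plus a thin client call that a Flyte task would make) can be sketched with Python's standard library alone. This is an illustration under assumptions: it swaps FastAPI for http.server to stay dependency-free, and the /predict route, the JSON shape, and load_model are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def load_model() -> dict:
    """Hypothetical stand-in for slow setup such as moving weights to a GPU."""
    return {"scale": 2.0}


# Loaded once when the server process starts, then shared by every request.
MODEL = load_model()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"y": payload["x"] * MODEL["scale"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


def start_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def predict_remote(url: str, x: float) -> float:
    """Client side: roughly what the body of the Flyte task would do.

    In a Flyte workflow this could be decorated with flytekit's @task; it is
    left undecorated here so the sketch runs with the standard library only.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps({"x": x}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["y"]


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print(predict_remote(f"http://127.0.0.1:{port}/predict", 3.0))  # 6.0
    server.shutdown()
    server.server_close()
```

This makes the duplication concrete: the model stays warm, but the server's lifecycle, scaling, and failures live outside Flyte and have to be managed separately.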