Summary
The user is seeking advice on using the cache in a pipeline that runs locally and then in the cloud or on another machine. They note that Flyte uses diskcache with SQLite, which is not ideal for simultaneous access across multiple machines. The user suggests potential solutions, including placing the SQLite database on an NFS share, implementing PostgreSQL support for diskcache, or replacing diskcache with a caching solution that supports PostgreSQL or another database. They are looking for better approaches to address this issue.
aleksei.grachev.tech
Thank you, Haytham
habuelfutuh
If you really want to, the cache service has an API: you can replace diskcache with a version that uses gRPC to record results in the remote cache (files will still need to be pushed remotely, though). Running in the cloud will then automatically leverage the cache.
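A rough sketch of what such a replacement could look like. Everything here is an assumption for illustration: `RemoteCacheClient` is a hypothetical stand-in for a gRPC stub to the cache service, `InMemoryClient` is a fake used only so the example runs without a server, and the key and URI are made up. None of these names are Flyte's actual API.

```python
from typing import Dict, Optional, Protocol


class RemoteCacheClient(Protocol):
    """Hypothetical stand-in for a gRPC stub to the remote cache service."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


class InMemoryClient:
    """Fake client, used only to make this sketch runnable without a server."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value


class RemoteBackedCache:
    """Cache for local runs that records results remotely instead of in SQLite."""

    def __init__(self, client: RemoteCacheClient) -> None:
        self._client = client

    def get(self, key: str) -> Optional[bytes]:
        return self._client.get(key)

    def set(self, key: str, value: bytes) -> None:
        # Output files would have to be uploaded to remote storage first;
        # the value here is assumed to be a small serialized reference to them.
        self._client.put(key, value)


# Two "runs" sharing one remote service: a local run writes, a cloud run reads.
shared = InMemoryClient()
local_run = RemoteBackedCache(shared)
local_run.set("task-v1-inputs-hash", b"s3://bucket/outputs/abc")  # made-up key and URI
cloud_run = RemoteBackedCache(shared)
```

The point of the design is that the local run and the cloud run talk to the same service, so no SQLite file has to be shared between machines.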
aleksei.grachev.tech
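Hello, everyone. I'm curious if anyone has encountered a similar use case: using the cache in a pipeline that runs locally and then in the cloud or on another machine. Flyte uses diskcache, which relies on SQLite. However, SQLite isn't ideal for simultaneous access from multiple machines and requires additional workarounds.
Here are some thoughts I've come up with:
1. Place the SQLite database on an NFS share.
2. Implement PostgreSQL support for diskcache.
3. Replace diskcache with a caching solution that supports PostgreSQL or another database.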
None of these solutions are perfect, so I would appreciate any suggestions for a better approach.
Thank you!
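To make the database-backed option mentioned above a bit more concrete, here is a minimal sketch of a key-value cache on top of a SQL table. It is not diskcache's or Flyte's implementation; `SqlCache`, the `cache` table, and the key format are hypothetical. It uses an in-memory SQLite connection only so that it runs anywhere. With PostgreSQL and a driver such as psycopg, the placeholders would be `%s` instead of `?` and the value column would be `BYTEA` instead of `BLOB`, while the `ON CONFLICT` upsert should carry over.

```python
import pickle
import sqlite3
from typing import Any, Optional


class SqlCache:
    """Tiny key-value cache on top of a SQL connection (illustrative only)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )
        self._conn.commit()

    def set(self, key: str, value: Any) -> None:
        # Upsert: insert the entry, or overwrite the value if the key exists.
        self._conn.execute(
            "INSERT INTO cache (key, value) VALUES (?, ?) "
            "ON CONFLICT (key) DO UPDATE SET value = excluded.value",
            (key, pickle.dumps(value)),
        )
        self._conn.commit()

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        row = self._conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else pickle.loads(row[0])


cache = SqlCache(sqlite3.connect(":memory:"))
cache.set("my_task:inputs-hash", {"result": 42})
cache.set("my_task:inputs-hash", {"result": 43})  # second write overwrites the first
```

Because a database server handles concurrent writers itself, this avoids depending on file locking over a network share, at the cost of running and securing that server.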