Summary
The user wants to create a FlyteFile from an Azure Storage account blob SAS URL, but Flyte treats the file path plus the SAS query string as the file name and therefore interprets the URL as a directory. They ask whether a workaround exists or whether the feature is missing. In the thread, a pull request (flyteorg/flyte#4629) is raised as a possible fix, and chris.grass suspects that the behavior stems from two Flytekit (fsspec) bugs rather than from a lack of endpoint support. He suggests involving Yee, who had a local solution but was concerned about its implications for other use cases, and believes that using workload identity with remoteData configured to set signedUrls:false should be enough to bypass the issue. He notes that his team is not in production and has a limited use case, but has workload identity working with Azure blob storage for common Flyte workflows.
srale
Thanks for the info :) I will take a look into it and get back to you if this doesn't work for our use case
chris.grass
<@U06V6CQTKL6> the long SAS has been an open issue for a long time because of some library complexities. have you tried <https://azure.github.io/azure-workload-identity/docs/introduction.html|workload identity>? feel free to reach out if you have any questions about implementation
srale
Hi all <#C05315T4K5K|> :slightly_smiling_face: We want to use Azure Storage account blob SAS url to create a FlyteFile. The problem with this, is that the FlyteFile maps the whole file path + the sas in the url as the file name. This means that Flyte sees the url as a directory and not a file. Is there a workaround for this, or is this feature missing? Thank you in advance
kumare
<@UNR3C6Y4T>
chris.grass
we aren't running in prod and have a limited use case, but we have workload identity + azure blob store working for common flyte workflows
chris.grass
using workload identity and "remoteData is configured to set signedUrls:false" should be enough to bypass the issue
david.espejo
right <@U05QG8SE2LA> Is this also a limitation even if <@U06V6CQTKL6> used Workload Identity instead of SAS tokens for storage account access?
chris.grass
We might want to pull Yee into the conversation since he was looking at the python fixes. iirc, he had a local solution but was concerned about its implications for other use cases
chris.grass
as mentioned in the flyte golang pr, i don't think the lack of support for that endpoint is the fundamental problem here. i think the two flytekit (fsspec) bugs are actually to blame for the behavior <@U06V6CQTKL6> is seeing. https://github.com/flyteorg/flyte/issues/4701 https://github.com/flyteorg/flyte/issues/4700
chris.grass
sorry, i have been out of the flyte ecosystem for a while now. expected to get back in this week or next, so this is good timing. give me a little while to catch up though
david.espejo
Not sure, for now I'm deferring to <@U05QG8SE2LA> to validate what's missing to merge
srale
Yes, it would :slightly_smiling_face: When do you think this could be available?
david.espejo
<@U06V6CQTKL6> would <https://github.com/flyteorg/flyte/pull/4629|this PR> cover what you intend to do?
srale
It seems that it interprets the query from '/tmp/flytecdmj730u/local_flytekit/ec7c7207da04cd009680ed0636b3277e/a.txt?sp=r&st=2024-09-17T14:10:24Z&se=2024-11-01T23:10:24Z&spr=https&sv=2022-11-02&sr=b&sig=A2WINhWtCSfJNJ8sdqodQJrxKoNjz%2FGfmHUln1VlDf4%3D', so just ?sp=r&st=2024-09-17T14:10:24Z&se=2024-11-01T23:10:24Z&spr=https&sv=2022-11-02&sr=b&sig=A2WINhWtCSfJNJ8sdqodQJrxKoNjz%2FGfmHUln1VlDf4%3D, as part of the file name (perhaps because it doesn't end with the file extension) and flyte assumes it's a directory
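To illustrate the symptom described above, here is a minimal sketch using only the Python standard library. It is not Flytekit's or fsspec's actual path handling; the account name, container name, and signature in the URL are placeholders. It only shows that treating the whole SAS URL as a path leaves the query string in the file name, while splitting the URL first recovers a.txt.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Hypothetical SAS URL: account, container, and signature are placeholders.
sas_url = (
    "https://myaccount.blob.core.windows.net/mycontainer/a.txt"
    "?sp=r&st=2024-09-17T14:10:24Z&se=2024-11-01T23:10:24Z"
    "&spr=https&sv=2022-11-02&sr=b&sig=PLACEHOLDER"
)

# Naive handling: the whole URL is treated as a path, so the query string
# stays attached to the last path segment and the name no longer ends in .txt.
naive_name = posixpath.basename(sas_url)
naive_ext = posixpath.splitext(naive_name)[1]

# Splitting the URL first: the path component stops at "?", so the last
# segment is the blob name and the SAS parameters are kept separately.
parts = urlsplit(sas_url)
blob_name = posixpath.basename(parts.path)
blob_ext = posixpath.splitext(blob_name)[1]
sas_params = parse_qs(parts.query)

print("naive name:", naive_name)
print("blob name: ", blob_name)
print("sr param:  ", sas_params["sr"])
```

In a SAS token, sr=b marks the signed resource as a single blob, which matches the expectation that this URL points to a file rather than a directory.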
kumare
Now I get it - yes if you are using http (signed url) it will use http protocol and that cannot download directories only files. But weirdly the path looks like a dir. this seems we should default to assuming a file and proceeding
srale
Shared Access Signature - https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview
srale
Running a simple workflow like this:
from flytekit import task, workflow, Resources
from flytekit.types.file import FlyteFile
import os

@task(requests=Resources(cpu="1", mem="1Gi"), limits=Resources(cpu="2", mem="1Gi"))
def normal_task(sas: str) -> str:
    # Wrap the SAS URL in a FlyteFile; opening it triggers the download
    new_sas = FlyteFile.from_source(sas)
    with open(new_sas, "r") as f:
        text = f.read()
    return text

@workflow
def wf(sas: str) -> str:
    normal_output = normal_task(sas=sas)
    return normal_output
kumare
We would need examples and pointers - I do not follow
kumare
Sorry what is sas?