Summary
The user's .flyteignore file is not honored in --copy auto mode when using pyflyte run --remote ...; it only takes effect in all mode. Because of the project's legacy layout, this produces a targz of ~100MB instead of ~100kB. The virtual environment folder sits next to the project code, so it gets included in the package, something the team normally handles with ignore files. The .venv/... folder is visible in the file listing when using the -v option. Moving the virtual environment would require significant changes to the existing project tooling. Additionally, the user cannot spend more time on the current PR and is unsure about the CI test failures, which seem unrelated to their changes.
bennett.h.rand
Sorry, I can't spend any more time on this PR right now, and I'm unsure why the CI is failing. (tests that are failing seem unrelated to my changes)
bennett.h.rand
According to the test I've written, it seems like my bugfix isn't working correctly in Windows. I may be able to get a proper Windows development environment running in several hours so I can debug it properly, instead of waiting for the CI to fail.
ytong
thank you. if you can construct a unit test without too much time that would be much appreciated
bennett.h.rand
ytong
thank you thank you
bennett.h.rand
Yes, I can.
ytong
got it. would you mind making a pr out of this?
bennett.h.rand
<https://github.com/BennettRand/flytekit/commit/6e556091d0a7cc88a13991d4361fe9a2e398fe8d|This change> seems to make fast-serialization work as expected in my and my coworkers' environments
bennett.h.rand
```
>>> site.getsitepackages()
['/home/bennettrand/tlaloc/python/.venv/lib/python3.10/site-packages', '/home/bennettrand/tlaloc/python/.venv/local/lib/python3.10/dist-packages', '/home/bennettrand/tlaloc/python/.venv/lib/python3/dist-packages', '/home/bennettrand/tlaloc/python/.venv/lib/python3.10/dist-packages']
>>> os.path.commonpath(site.getsitepackages())
'/home/bennettrand/tlaloc/python/.venv'
>>> os.path.commonpath(site.getsitepackages()) in set(site.getsitepackages())
False
```
bennett.h.rand
I think I've found that site.getsitepackages() is returning a list of paths whose os.path.commonpath is not itself in the site packages list, so the check if os.path.commonpath(site_packages + [mod_file]) in site_packages_set is always false.
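A minimal sketch of why that check can never pass with these paths, using the list quoted in this thread. The module file and the `is_under_any` helper are hypothetical, added only for illustration; the per-root comparison is one possible alternative and not necessarily what flytekit or the linked commit does. `posixpath` is used instead of `os.path` so the result is the same on any OS.

```python
import posixpath

# Paths copied from the site.getsitepackages() output quoted in this thread.
site_packages = [
    "/home/bennettrand/tlaloc/python/.venv/lib/python3.10/site-packages",
    "/home/bennettrand/tlaloc/python/.venv/local/lib/python3.10/dist-packages",
    "/home/bennettrand/tlaloc/python/.venv/lib/python3/dist-packages",
    "/home/bennettrand/tlaloc/python/.venv/lib/python3.10/dist-packages",
]
site_packages_set = set(site_packages)

# Hypothetical module file that lives inside the first site-packages directory.
mod_file = site_packages[0] + "/requests/__init__.py"

# The common path of all entries is the .venv root, which is not one of the
# site-packages directories, so the membership test is False.
common = posixpath.commonpath(site_packages + [mod_file])
membership_check = common in site_packages_set


def is_under_any(path, roots):
    """Hypothetical helper: True if path lies inside at least one root."""
    return any(posixpath.commonpath([path, root]) == root for root in roots)


# Comparing against each root separately does recognise the file.
per_root_check = is_under_any(mod_file, site_packages)

print(common, membership_check, per_root_check)
```

Comparing against each root on its own avoids the problem because the common path is never computed across unrelated site-packages directories.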
ytong
mind going through that logic and seeing where it’s failing for you?
ytong
but that should already be taken care of https://github.com/flyteorg/flytekit/blob/3475ddc41f2ba31d23dd072362be704d7c2470a0/flytekit/tools/script_mode.py#L212-L214
bennett.h.rand
Yeah, I'm testing things that way and by grabbing and inspecting the uploaded fast-serialized file from cloud storage.
The specific issue is that my team's virtual environment folder is stored directly alongside the project code, so that whole folder is packaged with --copy auto, typically something we've relied on ignore files to deal with. And -v makes it very obvious when a bunch of the file trees start with .venv/....
Unfortunately, all the pre-existing project tooling relies on the virtual environment being there, so moving it would be a whole project itself.
ytong
did you try verbose mode to list out all the files that it’s bundling?
ytong
this was the intended behavior yeah, it was assumed perhaps incorrectly that if you loaded a python file, that it would need to be included. because if not, presumably when you go to run the task, the container will attempt to load it again. and if it doesn’t find it it’ll fail
bennett.h.rand
Due to some legacy situations in how the project is laid out, this results in the targz being ~100MB large, as opposed to ~100kB.
bennett.h.rand
Hello! I'm using pyflyte run --remote ... and everything is working, but I notice that my .flyteignore file is not being obeyed in --copy auto mode, only all. Is this the intended behavior? The function that lists the imported files <https://github.com/flyteorg/flytekit/blob/master/flytekit/tools/script_mode.py#L116|isn't being passed the ignore_group object>.
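To illustrate the behavior I was expecting, here is a rough sketch of applying ignore patterns to a list of imported files. This is not flytekit's code: `filter_ignored`, the patterns, and the file paths are all hypothetical, and the matching is a simplified stand-in for real ignore-file semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def filter_ignored(rel_paths, patterns):
    """Hypothetical helper: drop paths matching any ignore pattern.

    A pattern matches if it matches the whole relative path or any single
    path component (so ".venv/" excludes everything under .venv).
    """
    kept = []
    for rel_path in rel_paths:
        parts = PurePosixPath(rel_path).parts
        ignored = any(
            fnmatchcase(rel_path, pattern)
            or any(fnmatchcase(part, pattern.rstrip("/")) for part in parts)
            for pattern in patterns
        )
        if not ignored:
            kept.append(rel_path)
    return kept


# Hypothetical ignore patterns and imported-file list.
patterns = [".venv/", "*.pyc"]
imported = [
    ".venv/lib/python3.10/site-packages/foo/__init__.py",
    "workflows/main.py",
    "workflows/__pycache__/main.cpython-310.pyc",
]

kept = filter_ignored(imported, patterns)
print(kept)
```

With a filter like this applied to the imported-file list, only the project's own sources would end up in the archive.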