Issue with .flyteignore in pyflyte

Summary

The user is facing an issue with the .flyteignore file not functioning in --copy auto mode while using pyflyte run --remote ..., as it only works in all mode. This results in an unexpectedly large targz file size of ~100MB instead of ~100kB due to the project's legacy layout. The user's virtual environment folder is adjacent to the project code, causing it to be included in the packaging process, which they typically manage with ignore files. The .venv/... folder is visible when using the -v option. Moving the virtual environment would require significant changes to the existing project tooling. Additionally, the user cannot spend more time on the current PR and is unsure about the unrelated CI test failures.

Status
resolved
Tags
  • 100MB
  • 100kB
  • Packaging
  • Workflow
  • flyte
  • all
  • .venv
  • File Handling Issue
  • Bug
  • pyflyte run
  • User
  • --copy auto
  • File Management
  • Developer
  • pyflyte run --remote
  • .flyteignore
  • Question
  • Developer Help
  • pyflyte
  • Support Request
  • Bug Report
Source
#ask-the-community

    bennett.h.rand

    11/27/2024

    Sorry, I can't spend any more time on this PR right now, and I'm unsure why the CI is failing. (tests that are failing seem unrelated to my changes)

    bennett.h.rand

    11/14/2024

    According to the test I've written, it seems like my bugfix isn't working correctly in Windows. I may be able to get a proper Windows development environment running in several hours so I can debug it properly, instead of waiting for the CI to fail.

    ytong

    11/12/2024

    thank you. if you can construct a unit test without too much time that would be much appreciated

    ytong

    11/11/2024

    thank you thank you

    bennett.h.rand

    11/11/2024

    Yes, I can.

    ytong

    11/11/2024

    got it. would you mind making a pr out of this?

    bennett.h.rand

    11/7/2024

    <https://github.com/BennettRand/flytekit/commit/6e556091d0a7cc88a13991d4361fe9a2e398fe8d|This change> seems to make fast-serialization work as expected in my and my coworkers' environments

    bennett.h.rand

    11/7/2024

    ```
    >>> site.getsitepackages()
    ['/home/bennettrand/tlaloc/python/.venv/lib/python3.10/site-packages', '/home/bennettrand/tlaloc/python/.venv/local/lib/python3.10/dist-packages', '/home/bennettrand/tlaloc/python/.venv/lib/python3/dist-packages', '/home/bennettrand/tlaloc/python/.venv/lib/python3.10/dist-packages']
    >>> os.path.commonpath(site.getsitepackages())
    '/home/bennettrand/tlaloc/python/.venv'
    >>> os.path.commonpath(site.getsitepackages()) in set(site.getsitepackages())
    False
    ```
    
    bennett.h.rand

    11/7/2024

    I think I've found that site.getsitepackages() returns a list of paths whose os.path.commonpath is not itself in the site-packages list, so the condition os.path.commonpath(site_packages + [mod_file]) in site_packages_set is always false.
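
The behaviour described here can be reproduced in isolation. The sketch below uses made-up paths modelled on the listing above and `posixpath` so it runs the same on any platform. The `in_any_site_packages` helper is a hypothetical per-directory check added for contrast; it is not necessarily what the linked commit does.

```python
import posixpath

# Hypothetical paths modelled on the listing in the thread.
site_packages = [
    "/proj/.venv/lib/python3.10/site-packages",
    "/proj/.venv/local/lib/python3.10/dist-packages",
    "/proj/.venv/lib/python3/dist-packages",
    "/proj/.venv/lib/python3.10/dist-packages",
]
site_packages_set = set(site_packages)
mod_file = "/proj/.venv/lib/python3.10/site-packages/requests/api.py"

# The check as described: the common prefix of all entries plus the module
# file is the venv root, which is not itself a site-packages directory.
common = posixpath.commonpath(site_packages + [mod_file])
original_check = common in site_packages_set


def in_any_site_packages(path, roots):
    """Return True if path lies inside at least one of the given roots."""
    return any(posixpath.commonpath([root, path]) == root for root in roots)


# Comparing against each directory separately distinguishes the two cases.
per_dir_check = in_any_site_packages(mod_file, site_packages)
user_code_check = in_any_site_packages("/proj/src/workflows.py", site_packages)

print(common, original_check, per_dir_check, user_code_check)
```

With several site-packages directories that only share the virtual environment root, the combined check can never succeed, which matches the `False` seen in the interpreter session.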

    ytong

    11/7/2024

    mind going through that logic and seeing where it’s failing for you?

    bennett.h.rand

    11/7/2024

    Yeah, I'm testing things that way and by grabbing and inspecting the uploaded fast-serialized file from cloud storage. The specific issue is that my team's virtual environment folder is stored directly alongside the project code, so that whole folder is packaged with --copy auto, typically something we've relied on ignore files to deal with. And -v makes it very obvious when a bunch of the file trees start with .venv/.... Unfortunately, all the pre-existing project tooling relies on the virtual environment being there, so moving it would be a whole project in itself.
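
For anyone wanting to inspect a downloaded archive the same way, a minimal sketch using only the standard library follows. It assumes the fast-serialized file is a gzipped tarball, as the thread describes it; the demo archive, file names, and sizes are invented for illustration.

```python
import io
import tarfile
from collections import Counter


def size_by_top_level(archive_bytes):
    """Sum member sizes per top-level directory of a gzipped tarball."""
    totals = Counter()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            totals[name.split("/", 1)[0]] += member.size
    return totals


def build_demo_archive():
    """Create a tiny in-memory archive with hypothetical contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, size in [("src/workflows.py", 100), (".venv/lib/big.py", 5000)]:
            info = tarfile.TarInfo(name=name)
            info.size = size
            tar.addfile(info, io.BytesIO(b"x" * size))
    return buf.getvalue()


totals = size_by_top_level(build_demo_archive())
for top, size in totals.most_common():
    print(f"{top}: {size} bytes")
```

For a real archive on disk, `tarfile.open(path, mode="r:gz")` can replace the in-memory file object; a dominant `.venv` entry would point to the problem described above.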

    ytong

    11/7/2024

    did you try verbose mode to list out all the files that it’s bundling?

    ytong

    11/7/2024

    this was the intended behavior yeah, it was assumed perhaps incorrectly that if you loaded a python file, that it would need to be included. because if not, presumably when you go to run the task, the container will attempt to load it again. and if it doesn’t find it it’ll fail
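
As a rough illustration of the idea that every loaded Python file is a candidate for bundling, the sketch below collects the source files of already-imported modules that live under a given root. It is a generic standard-library example, not flytekit's implementation; `loaded_files_under` is a hypothetical helper, and the `json` package directory stands in for a project root.

```python
import json  # any already-imported package works for the demo
import os
import sys


def loaded_files_under(root):
    """Return source files of imported modules located under root."""
    root = os.path.realpath(root)
    found = []
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if not path:
            continue  # built-in or namespace modules have no file
        path = os.path.realpath(path)
        try:
            if os.path.commonpath([root, path]) == root:
                found.append(path)
        except ValueError:
            continue  # e.g. paths on different drives on Windows
    return sorted(found)


json_dir = os.path.dirname(os.path.realpath(json.__file__))
files = loaded_files_under(json_dir)
print(len(files), "loaded files under", json_dir)
```

If the virtual environment sits inside the root passed to such a function, every imported third-party module is picked up too, which is how a `.venv` folder next to the code can end up in the archive.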

    bennett.h.rand

    11/6/2024

    Due to some legacy situations in how the project is laid out, this results in the targz being ~100MB large, as opposed to ~100kB.

    bennett.h.rand

    11/6/2024

    Hello! I'm using pyflyte run --remote ... and everything is working, but I notice that my .flyteignore file is not being obeyed in --copy auto mode, only all. Is this the intended behavior? The function that lists the imported files <https://github.com/flyteorg/flytekit/blob/master/flytekit/tools/script_mode.py#L116|isn't being passed the ignore_group object>.
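
To make the question concrete, here is a minimal sketch of what applying ignore patterns to a list of imported files could look like. The `filter_ignored` helper, the file list, and the pattern are all hypothetical, and the `fnmatch`-style matching is an assumption that may differ from how .flyteignore patterns are actually interpreted.

```python
from fnmatch import fnmatchcase


def filter_ignored(rel_paths, patterns):
    """Drop every relative path that matches at least one ignore pattern."""
    return [
        path
        for path in rel_paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Hypothetical list of imported files, relative to the project root.
imported = [
    "src/workflows.py",
    ".venv/lib/python3.10/site-packages/requests/api.py",
]
# Hypothetical ignore pattern covering the virtual environment folder.
kept = filter_ignored(imported, [".venv/*"])
print(kept)
```

Passing the ignore rules into the step that lists imported files, rather than applying them only in the all mode, would keep the virtual environment out of the archive in both modes.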