
FlyteFile S3 Upload Errors

Summary

The user is encountering errors with FlyteFile when uploading data to S3, specifically a TypeError related to task outputs and an unexpected keyword argument 'cache_regions' during Session initialization. They share a code snippet that generates sample data and returns a FlyteFile with an explicit S3 remote path. A community member suggests returning the S3 path directly, and while unsure whether the original syntax is correct, confirms that returning a string works and shares two examples of tasks and a workflow involving FlyteFile.

Status
open
Tags
    Source
    #ask-the-community

      eric901201

      10/4/2024

      wait, maybe your region is special? I've never seen this error before


      eric901201

      10/4/2024

      Do you mind pasting your error message in the thread below your question next time? It will be more readable for us, thank you :heart:


      eric901201

      10/4/2024

      Both of these examples work:

      ```python
      from flytekit import task, workflow
      from flytekit.types.file import FlyteFile


      @task(enable_deck=True, container_image=custom_image)
      def t_deck() -> FlyteFile:
          # Either form works: return the S3 path as a plain string,
          # or wrap it in a FlyteFile explicitly.
          return "s3://my-s3-bucket/example.txt"
          # return FlyteFile("s3://my-s3-bucket/example.txt")


      @task(enable_deck=True, container_image=custom_image)
      def t_read_file(file: FlyteFile) -> str:
          with open(file, "r") as f:
              return f.read()


      @workflow
      def wf() -> str:
          ff = t_deck()
          return t_read_file(file=ff)
      ```
      

      eric901201

      10/4/2024

      I'm pretty sure


      eric901201

      10/4/2024

      but returning a str works


      eric901201

      10/4/2024

      I'm not sure whether your syntax is correct


      eric901201

      10/4/2024

      try returning the S3 path directly
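
      For illustration, here is a minimal sketch of what returning the S3 path directly can look like when the task's declared output type is FlyteFile. The task name, bucket, and object key are placeholders, and the object is assumed to already exist at that URI:

      ```python
      from flytekit import task
      from flytekit.types.file import FlyteFile


      @task
      def produce_file() -> FlyteFile:
          # Because the declared return type is FlyteFile, flytekit accepts a
          # plain string pointing at an existing S3 object (placeholder URI).
          return "s3://my-s3-bucket/example.txt"
      ```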


      jielian.guo

      10/3/2024

      Hi community! I want to use FlyteFile to upload data to S3, so I put the S3 URL in the remote path, but I got errors. My task is:

      ```python
      import os
      from pathlib import Path

      import numpy as np
      import pandas as pd

      import flytekit
      from flytekit import current_context, task
      from flytekit.types.file import FlyteFile


      @task
      def fetch_and_upload_data(load_type: str) -> FlyteFile:
          timestamps = pd.date_range(start="2024-01-01", periods=96, freq='15T')

          # Sample data for site PV and temperature
          np.random.seed(0)
          pv_data = np.random.uniform(low=0, high=100, size=len(timestamps))  # PV generation in kW
          temperature_data = np.random.uniform(low=-10, high=35, size=len(timestamps))  # Temperature in Celsius

          # Create a DataFrame
          data = pd.DataFrame({
              'Timestamp': timestamps,
              'site': pv_data,
              'temperature': temperature_data
          })
          execution_id = current_context().execution_id.name

          raw_data_path_remote = f"s3://jielian-dev/{execution_id}/raw_data_{load_type}_2022-08-31T17:00:00Z_2023-03-03T11:00:00Z.csv"
          # Write to a local path first
          raw_data_path_local = Path(flytekit.current_context().working_directory) / f"{execution_id}/raw_data_{load_type}_2022-08-31T17:00:00Z_2023-03-03T11:00:00Z.csv"
          directory = Path(flytekit.current_context().working_directory) / f"{execution_id}"
          if not os.path.exists(directory):
              os.makedirs(directory)
          data.to_csv(raw_data_path_local)
          return FlyteFile(path=str(raw_data_path_local),
                           remote_path=raw_data_path_remote)
      ```
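
      For comparison, a minimal sketch of the same idea without an explicit remote_path, assuming the default flytekit behavior of uploading FlyteFile outputs to the execution's configured raw output prefix. This is a sketch under that assumption, not a confirmed fix, and the task name and CSV filename below are hypothetical:

      ```python
      from pathlib import Path

      import numpy as np
      import pandas as pd

      import flytekit
      from flytekit import task
      from flytekit.types.file import FlyteFile


      @task
      def fetch_and_upload_data_auto(load_type: str) -> FlyteFile:
          # Hypothetical variant: generate the same sample data, write it to the
          # task's working directory, and return a FlyteFile with only a local
          # path so flytekit handles the upload to its raw output location.
          timestamps = pd.date_range(start="2024-01-01", periods=96, freq="15min")
          np.random.seed(0)
          data = pd.DataFrame({
              "Timestamp": timestamps,
              "site": np.random.uniform(low=0, high=100, size=len(timestamps)),
              "temperature": np.random.uniform(low=-10, high=35, size=len(timestamps)),
          })

          local_path = Path(flytekit.current_context().working_directory) / f"raw_data_{load_type}.csv"
          data.to_csv(local_path)
          return FlyteFile(path=str(local_path))
      ```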