ValueError with StructuredDataset in Workflow

Summary

The user encountered a ValueError mentioning StructuredDataset when calling `remote.execute` on a remote workflow with a pandas DataFrame of random data as one of the inputs. They suspected the DataFrame was the cause, even though the workflow's own code also passes a DataFrame without any problem. They asked for suggestions and reported using flytekit version 1.13.5. They resolved the issue with a workaround: wrapping the DataFrame explicitly in a StructuredDataset in the inputs, as in `"df_input" : StructuredDataset(dataframe=df, file_format="")`.
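
The workaround can be sketched as a small helper, assuming the caller knows which input keys hold DataFrames. The names `wrap_dataframe_inputs`, `dataframe_keys`, and the injectable `wrap` argument are hypothetical and not part of flytekit; only the `StructuredDataset(dataframe=df, file_format="")` call comes from the thread, and the import path is the one shown in the error message.

```python
from typing import Any, Callable, Dict, Iterable


def _flyte_wrap(df: Any) -> Any:
    # Imported lazily so this module loads even where flytekit is absent.
    from flytekit.types.structured.structured_dataset import StructuredDataset

    # file_format="" mirrors the workaround reported in the thread.
    return StructuredDataset(dataframe=df, file_format="")


def wrap_dataframe_inputs(
    inputs: Dict[str, Any],
    dataframe_keys: Iterable[str],
    wrap: Callable[[Any], Any] = _flyte_wrap,
) -> Dict[str, Any]:
    """Return a copy of `inputs` with the named values wrapped.

    Hypothetical helper: `dataframe_keys` lists the inputs that hold
    DataFrames; `wrap` defaults to the StructuredDataset workaround.
    """
    wrapped = dict(inputs)  # shallow copy, the caller's dict is untouched
    for key in dataframe_keys:
        wrapped[key] = wrap(wrapped[key])
    return wrapped
```

With the inputs from the question, this would be called as `inputs = wrap_dataframe_inputs(inputs, ["df_input"])` before `remote.execute(workflow, inputs=inputs)`.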

Status
resolved
Tags
    Source
    #ask-the-community

      pavlinamitsou

      10/1/2024

      so it is solved now with the above workaround

      pavlinamitsou

      10/1/2024

      I am using version 1.13.5, but in order to make it work I had to do the following for the dataframe:

      "df_input" : StructuredDataset(dataframe=df, file_format="")

      aallasamhita

      10/1/2024

      this should work. what's your flytekit version?

      pavlinamitsou

      9/26/2024

      Hello!

      I am trying to use a remote workflow and execute it like this:

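      ```
      import numpy as np
      import pandas as pd
      from flytekit.configuration import Config
      from flytekit.remote import FlyteRemote

      # Build a small DataFrame of random data to pass as a workflow input.
      data = {
          'column1': np.random.randint(0, 100, size=10),
          'column2': np.random.rand(10),
          'column3': np.random.choice(['A', 'B', 'C', 'D'], size=10)
      }

      df = pd.DataFrame(data)

      inputs = {
          "data_source_name": "dashboards-test",
          "data_source_description": "test",
          "df_input": df
      }
      workflow_name = "workflowname"

      # Fetch the registered workflow from the remote Flyte deployment.
      remote = FlyteRemote(config=Config.auto())
      workflow = remote.fetch_workflow(
          project="project",
          domain="production",
          name=workflow_name,
      )
      execution = remote.execute(workflow, inputs=inputs)

      # Block until the execution finishes.
      execution = remote.wait(execution)
      ```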
      But it fails on this line `remote.execute(workflow, inputs=inputs)` with the error:
      
      ```
      ValueError: Error encountered while executing 'load_data':
        Failed to find a handler for <class 'flytekit.types.structured.structured_dataset.StructuredDataset'>, protocol [flyte], fmt ['']
      ```
      I am assuming that the problem is related to the dataframe, but in the workflow's code I saw that a dataframe is also being passed without any problem.
      
      Any suggestions?