Summary
The user is encountering errors with FlyteFile when uploading data to S3, specifically a TypeError related to task outputs and an unexpected keyword argument 'cache_regions' during Session initialization. The user shares a task that generates sample data and returns a FlyteFile with an S3 remote path. A responder is unsure whether that syntax is correct, suggests returning the S3 path directly, confirms that returning a string works, and shares two working return forms along with a task and a workflow that read the resulting FlyteFile.
eric901201
wait, maybe your region is special? I've never seen this error before
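An unexpected keyword argument raised while a Session is being constructed can sometimes point to storage libraries installed at incompatible versions. That is only a guess for this thread, since the full traceback was not posted. A minimal sketch for listing what is installed in the task image follows; the helper name `installed_versions` and the package list are assumptions for illustration, not something taken from the thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Hypothetical helper: map each package name to its installed version (or None)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package list is an assumption: libraries commonly involved in S3 access from Python.
candidates = ["flytekit", "fsspec", "s3fs", "aiobotocore", "botocore", "boto3"]
for name, version in installed_versions(candidates).items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the traceback would make it easier for others to reproduce the problem.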
eric901201
Would you mind pasting your error message in the thread below your question next time? It will be more readable for us, thank you :heart:
eric901201
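Both of these examples work
```
from flytekit import task, workflow
from flytekit.types.file import FlyteFile

# `custom_image` is defined elsewhere (not shown in the thread).

@task  # decorator assumed; the top of the pasted snippet was cut off
def t_deck() -> FlyteFile:
    # Example 1: return the S3 URI as a plain string
    return "s3://my-s3-bucket/example.txt"
    # Example 2: return an explicit FlyteFile instead
    # return FlyteFile("s3://my-s3-bucket/example.txt")

@task(enable_deck=True, container_image=custom_image)
def t_read_file(file: FlyteFile) -> str:
    # Opening a FlyteFile downloads it to local storage first
    with open(file, "r") as f:
        return f.read()

@workflow
def wf() -> str:
    ff = t_deck()
    return t_read_file(file=ff)
```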
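When an S3 URI is copied out of a chat message it can pick up link formatting such as surrounding angle brackets, which are not part of the path. A minimal sketch, assuming a hypothetical helper named `clean_s3_uri`, that strips such wrapping and checks the scheme and bucket before the value is returned from a task or used as a remote path:

```python
from urllib.parse import urlparse


def clean_s3_uri(uri: str) -> str:
    """Hypothetical helper: strip chat-style <...> wrapping and validate an S3 URI."""
    cleaned = uri.strip()
    if cleaned.startswith("<") and cleaned.endswith(">"):
        cleaned = cleaned[1:-1]
    parsed = urlparse(cleaned)
    # A usable S3 URI needs the s3 scheme and a non-empty bucket name
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI: {uri!r}")
    return cleaned


print(clean_s3_uri("<s3://my-s3-bucket/example.txt>"))  # s3://my-s3-bucket/example.txt
```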
eric901201
I'm pretty sure
eric901201
but returning a str works
eric901201
I'm not sure whether your syntax is correct
eric901201
try returning the S3 path directly
jielian.guo
Hi community! I want to use FlyteFile to upload data to S3, so I put the S3 URL in the remote path, but I got errors. My task is:
```
from pathlib import Path

import flytekit
import numpy as np
import pandas as pd
from flytekit import task
from flytekit.types.file import FlyteFile


@task  # decorator assumed; the top of the pasted snippet was cut off
def fetch_and_upload_data(load_type: str) -> FlyteFile:
    timestamps = pd.date_range(start="2024-01-01", periods=96, freq="15min")

    # Sample data for site PV and temperature
    np.random.seed(0)
    pv_data = np.random.uniform(low=0, high=100, size=len(timestamps))  # PV generation in kW
    temperature_data = np.random.uniform(low=-10, high=35, size=len(timestamps))  # Temperature in Celsius

    # Create a DataFrame
    data = pd.DataFrame({
        "Timestamp": timestamps,
        "site": pv_data,
        "temperature": temperature_data,
    })

    ctx = flytekit.current_context()
    execution_id = ctx.execution_id.name
    file_name = f"raw_data_{load_type}_2022-08-31T17:00:00Z_2023-03-03T11:00:00Z.csv"
    raw_data_path_remote = f"s3://jielian-dev/{execution_id}/{file_name}"

    # Write to a local path under the task's working directory
    directory = Path(ctx.working_directory) / execution_id
    directory.mkdir(parents=True, exist_ok=True)
    raw_data_path_local = directory / file_name
    data.to_csv(raw_data_path_local)

    return FlyteFile(path=str(raw_data_path_local), remote_path=raw_data_path_remote)
```
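The local-write step of the task above can be tried on its own, without Flyte or pandas, to rule out path problems before looking at the upload. This is a sketch under stated assumptions: `write_rows_locally` is a hypothetical helper, the sample row values are made up, and a temporary directory stands in for the task's working directory.

```python
import csv
import tempfile
from pathlib import Path


def write_rows_locally(working_dir, execution_id, file_name, rows):
    """Hypothetical helper: write rows as CSV under <working_dir>/<execution_id>/."""
    directory = Path(working_dir) / execution_id
    # parents=True creates missing parent folders; exist_ok=True makes reruns safe
    directory.mkdir(parents=True, exist_ok=True)
    local_path = directory / file_name
    with local_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return local_path


# Stand-in for the task's working directory; values are illustrative only
with tempfile.TemporaryDirectory() as tmp:
    rows = [["Timestamp", "site", "temperature"], ["2024-01-01 00:00:00", 54.9, 21.3]]
    path = write_rows_locally(tmp, "exec-123", "raw_data_demo.csv", rows)
    print(path.name, path.exists())
```

Using `mkdir(parents=True, exist_ok=True)` replaces the separate existence check and `os.makedirs` call in a single step.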