I faced the same issue and was able to install third-party packages as follows:
BATCH_CONFIG = {
    "pyspark_batch": {
        "main_python_file_uri": f"{BUCKET}/python/latest/{JOB}",
        # Each wheel file the job needs is listed here as its own URI.
        "python_file_uris": [f"{BUCKET}/python/latest/local_lib/requests-2.32.3-py3-none-any.whl"],
        "args": ["gs://pub/shakespeare/rose.txt", f"{BUCKET}/sample-output-data"],
    },
    "environment_config": {
        "execution_config": {
            "network_uri": f"projects/{PROJECT_ID}/global/networks/main-vpc-prd",
            "subnetwork_uri": f"https://www.googleapis.com/compute/v1/projects/{PROJECT_ID}/regions/{REGION}/subnetworks/data-prd",
            "service_account": IMPERSONATION_CHAIN,
        }
    },
}
I downloaded the Python wheel (.whl) file for the library I wanted to use, then added its URI as an item in the python_file_uris array.
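If you would rather fetch the wheel with pip than download it by hand from PyPI, a minimal sketch could look like the following. The helper names and the local_lib destination folder are my own placeholders, not part of the setup above, and the download itself needs network access.

```python
import subprocess
import sys
from typing import List


def pip_download_command(requirement: str, dest: str) -> List[str]:
    """Build a `pip download` command that fetches the wheel for one requirement."""
    return [
        sys.executable, "-m", "pip", "download",
        requirement,
        "--no-deps",            # do not fetch dependencies; each one needs its own wheel
        "--only-binary=:all:",  # insist on a wheel instead of a source archive
        "--dest", dest,
    ]


def download_wheel(requirement: str, dest: str = "local_lib") -> None:
    """Run the command built above (hypothetical helper; requires network access)."""
    subprocess.run(pip_download_command(requirement, dest), check=True)
```

Calling download_wheel("requests==2.32.3") should leave the wheel in local_lib. The file still has to be copied to the bucket path that python_file_uris points to.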
Note: With this approach you can include as many packages as you want, one array item per wheel file.
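For several packages, the array can be built from a list of wheel file names instead of being typed out by hand. This is only a sketch: wheel_uris, the gs://my-bucket value and the second wheel name are hypothetical, and the default prefix simply mirrors the path used in the config above.

```python
from typing import List


def wheel_uris(bucket: str, wheel_files: List[str],
               prefix: str = "python/latest/local_lib") -> List[str]:
    """Return one URI per wheel file stored under the given bucket prefix."""
    base = bucket.rstrip("/")
    return [f"{base}/{prefix}/{name}" for name in wheel_files]


# Placeholder bucket and file names, for illustration only.
uris = wheel_uris(
    "gs://my-bucket",
    [
        "requests-2.32.3-py3-none-any.whl",
        "another_pkg-1.0.0-py3-none-any.whl",
    ],
)
```

The resulting list can then be used as the value of "python_file_uris" in BATCH_CONFIG.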
Sources:
-> Requests whl file: https://pypi.org/project/requests/#files