How should a new user get started?

Hi,
I am really new to the Pangeo community, so my question may be silly.
When I run the example code, no matter which one, I get a similar error:

ValueError: Bad Request: https://storage.googleapis.com/download/storage/v1/b/pangeo-cnes/o/alti%2Fj3%2F.zmetadata?alt=media
User project specified in the request is invalid.

I guess I missed something (settings for Google Cloud?) before running the code, but I do not know where to find the solution.

Can someone give me instructions?

Thanks
Lei

Hi Lei,

I think I got this error before when I was trying to run example code without first signing up for a (free!) Pangeo Cloud account: Pangeo Cloud — Pangeo documentation (go to “SIGN UP” section). Once you get an account, you can more easily run the examples as Jupyter notebooks. Hope this helps :slight_smile:

Dianne

Hi, Dianne

Thanks for your reply. So a Pangeo Cloud account is required before running the examples.
I have sent an application to Pangeo Cloud, and I am waiting for a response from the Pangeo staff.

Lei

Hi @lei, welcome to the forum, and thanks for filling out the application!

I don’t believe any confirmation of account is sent automatically, but once you fill out the Google form you should be able to log in after 24 hours to whichever cluster you requested access to.

If you’re running from one of the hosted JupyterHubs you should have access to Pangeo-operated storage buckets. Can you point to the specific example you’re trying to run with a link to the repository or documentation?

Hi, @scottyhq

Thanks very much for your suggestion.

The example I tried is the satellite altimeter along track data: Along Track Altimetry Analysis — Pangeo Gallery documentation

However, I simply used JupyterLab to run the example (should I use the JupyterHub instead?), and I still got the same error message as in my first post.

As you said, I have not received a confirmation after filling out the Pangeo Cloud application.

I also got the following message after selecting the Google Cloud deployment and logging in to continue:

403 : Forbidden
Looks like you have NOT been added to the list of allowed users for this hub. Please contact the hub administrators.

Lei

Scott, it looks like the script that has been updating the cluster permissions has broken! Merge pull request #978 from Jon-Jos/patch-1 · pangeo-data/pangeo-cloud-federation@9832e62 · GitHub

This would explain why users who sign up are not being added. :man_facepalming:

OK, we’ve fixed a problem with updating members. @lei, once you confirm your membership here People · Pangeo · GitHub (you should have a ‘pending invitation’ within 24 hours of filling out the Google form) you can log into https://us-central1-b.gcp.pangeo.io.

For that particular example notebook, if you look into the catalog file, you’ll see urlpaths pointing to a Google Cloud Storage bucket, gs://pangeo-cnes, which is only accessible from the Pangeo Google JupyterHub. Note that you’ll get much better performance analyzing data in Google Cloud Storage on the Hub, because you don’t have to transfer large amounts of data over the internet to your own machine.
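To make the "look into the catalog file" step concrete, here is a rough sketch of how one might list the urlpaths in a catalog and see which bucket each one points to. The catalog excerpt and the helper names (`list_urlpaths`, `is_gcs`) are made up for illustration; only the bucket and path (`pangeo-cnes`, `alti/j3`) come from the error URL earlier in this thread. The real catalog file may well be laid out differently.

```python
import re
from urllib.parse import urlparse

# Hypothetical catalog excerpt (an assumption, not the real file).
# Only the bucket and path are taken from the error URL in this thread.
CATALOG_TEXT = """
sources:
  j3:
    args:
      urlpath: gs://pangeo-cnes/alti/j3
"""


def list_urlpaths(catalog_text):
    """Return every value that follows a 'urlpath:' key in the catalog text."""
    pattern = r"^\s*urlpath:\s*['\"]?([^'\"\s]+)"
    return re.findall(pattern, catalog_text, flags=re.MULTILINE)


def is_gcs(urlpath):
    """True if the path uses a Google Cloud Storage scheme (gs:// or gcs://)."""
    return urlparse(urlpath).scheme in ("gs", "gcs")


for path in list_urlpaths(CATALOG_TEXT):
    if is_gcs(path):
        print(path, "-> Google Cloud Storage bucket:", urlparse(path).netloc)
    else:
        print(path, "-> not on Google Cloud Storage")
```

Any path reported as living in a bucket such as `pangeo-cnes` is a hint that the notebook is meant to be run on the Hub rather than on a local JupyterLab.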

@scottyhq and @rabernat, the altimetry example works now. Thanks very much. Pangeo is really fantastic and useful.

Now I am trying to run the LLC4320 example in Pangeo, which uses a much larger dataset, so Dask is needed.

The code before the Launch Dask Cluster section runs without error. However, when I run:

from dask_kubernetes import KubeCluster
from dask.distributed import Client
cluster = KubeCluster()
cluster.adapt(minimum=1, maximum=20)
client = Client(cluster)
cluster

I see this error message:

---------------------------------------------------------------------------
ValueError: Worker pod specification not provided. See KubeCluster docstring for ways to specify workers

After googling this message, I ran:

from dask_kubernetes import KubeCluster, make_pod_spec

pod_spec = make_pod_spec(image='daskdev/dask:latest',
                         memory_limit='4G', memory_request='4G',
                         cpu_limit=1, cpu_request=1)

cluster = KubeCluster(pod_spec)

cluster.scale(10)  # specify number of workers explicitly
cluster.adapt(minimum=1, maximum=100)  # or dynamically scale based on current workload

But this does not seem to be the solution, and another error was raised:

---
ClientConnectorError: Cannot connect to host 10.12.0.1:443 ssl:default [Connect call failed ('10.12.0.1', 443)]

I know this problem is on the Dask side rather than the Pangeo side. Forgive me, I am also a beginner with Dask and KubeCluster. Perhaps the veterans here know how to fix this quickly.

Those examples are extremely old and unmaintained. We should probably take them down.

Much newer examples are here - Physical Oceanography — Pangeo Gallery documentation

These new examples show that you should use Dask Gateway instead of Dask Kubernetes.
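For anyone landing here later, the switch might look roughly like the sketch below. It is not copied from the gallery notebooks, so treat it as an assumption-laden outline: the wrapper name `start_gateway_cluster` is made up, the adaptive limits simply reuse the 1 to 20 range from the earlier post, and it assumes the hub already configures the gateway address so that `Gateway()` needs no arguments.

```python
def start_gateway_cluster(gateway=None, minimum=1, maximum=20):
    """Start an adaptive Dask cluster via Dask Gateway; return (cluster, client).

    `gateway` is optional so a preconfigured Gateway object can be passed in.
    """
    if gateway is None:
        # Imported lazily so this function can be defined even where
        # dask_gateway is not installed (for example on a laptop).
        from dask_gateway import Gateway

        # Assumption: the JupyterHub preconfigures the gateway address.
        gateway = Gateway()

    cluster = gateway.new_cluster()
    # Let the cluster scale between `minimum` and `maximum` workers on demand.
    cluster.adapt(minimum=minimum, maximum=maximum)
    # Connect a dask.distributed Client to the new cluster.
    client = cluster.get_client()
    return cluster, client
```

On the hub you would call `cluster, client = start_gateway_cluster()` in place of the `KubeCluster()` lines, and `cluster.shutdown()` when you are done so the workers are released.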

Perfect! I can load LLC4320 now. Many thanks to Pangeo and the people who built it.
Thanks @rabernat

Hello everybody,
I’m also new to the Pangeo community and I am facing the same problem as Lei in the first case.
I have already signed up at Pangeo Cloud - Pangeo documentation and confirmed my membership as well.
I logged into https://us-central1-b.gcp.pangeo.io and tried my notebook, but I got the same error…

I hope somebody can help me.

Mario

My notebook is the SWOT simulator.

@crazyearth - you should use Dask Gateway, not Dask Kubernetes. See the documentation at https://pangeo.io/cloud.html.