Needing help and clarification about Jupyter on HPC Cheyenne

I will start with a new post here - let me know if I should open a new issue on github.

Today I accessed Cheyenne and started setting everything up (I had already created a pangeo environment a year ago, so I updated it). Then I configured dask (I did not configure a password for Jupyter; I use the token instead) as explained here. I successfully launched a qsub job using the account number I have access to, and then launched Jupyter with:

jupyter lab --no-browser --ip=`hostname` --port=8888

and then tried to forward it to my browser by opening another terminal and running

ssh -N -L 8888:r8i4n0:8888

and then it hangs for a bit and then I get

channel 2: open failed: connect failed: Connection refused

Indeed, if I try to open localhost:8888 I cannot connect to the notebook.

I am sure it is some silly thing I am overlooking, but what is it?

Also, my second question: should I just try “Deploy Option 2”? What are the differences/drawbacks/limitations of one option over the other?



I think the problem is that you are ssh’ing to the wrong node. r8i4n0 is the hostname of the compute node where your notebook is running; it will vary depending on where the scheduler places your job. There are a couple of ways to get the hostname. If you already have an interactive terminal running, just type hostname on the command line. These lines from the docs you referenced give you a command that prints out the full ssh command:

echo "ssh -N -L 8888:`hostname`:8888 $"
ssh -N -L 8888:r8i4n0:8888

The fact that you are using the same hostname as the docs tells me that you probably copied that line exactly, rather than substituting the actual hostname where your job is running.

Thanks. That worked.

@chiaral - I think you know about this already, but just in case, and for others who come along here: we now have a JupyterHub set up on Cheyenne, which means you don’t need to use the ssh-tunneling options anymore:

@jukent posted this nice walkthrough describing how to access this hub:


This is so much easier! I wasn’t sure if it was ready to be used, but it seems to have worked perfectly, and it sees my local environments as well. This is AWESOME!

Thanks so much to everyone at NCAR! I am getting really excited to work on the hackathon!


I’m glad it worked for you!


Hello @jukent & @jhamman

This message is a bit OT compared to the previous topic, but still related to my quest to figure things out on Cheyenne/Jupyter.

I created a package and successfully (or so it looks) installed it in my conda environment in my home directory on Cheyenne. This package has some Fortran code that gets compiled during the installation.
I run
python setup.py build_ext
pip install -e .

Everything works; the py.test runs are in fact successful.

When I call the fortran subroutine from a terminal

python -c 'from package import fortran_call'

I get no error whatsoever; it seems to work. (And again, all the tests written in the package that exercise the many subroutines pass.)

If I open a Jupyter notebook through the JupyterHub (and I choose the same conda environment), the same call fails and I get this error:

ImportError: cannot open shared object file: No such file or directory

It seems to be an issue with JupyterHub, because I can load it correctly from the terminal.

Any suggestions of what this could be? And how I can solve it?
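One way to narrow this down (a minimal diagnostic sketch, nothing Cheyenne-specific): print the library search path the process was started with, both in a terminal python session and in a notebook cell, and compare. If the hub-spawned kernel shows a different (or empty) LD_LIBRARY_PATH, that would explain why the same import behaves differently.

```python
import os

def linker_path():
    """Return the LD_LIBRARY_PATH this process inherited at startup."""
    return os.environ.get("LD_LIBRARY_PATH", "<not set>")

print("LD_LIBRARY_PATH =", linker_path())
```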

I don’t know if it’s related, but during the build I get this warning:
UserWarning: LDFLAGS is used as is, not appended to flags already defined by numpy.distutils!

I didn’t get that before on the other machines where I successfully used and installed this package.

I found this very old issue that sounds exactly like what I am facing.
Since I cannot make the changes suggested in the comments, I tried to set LD_LIBRARY_PATH for the current environment:

import os
os.environ['LD_LIBRARY_PATH'] = '/ncar/opt/slurm/latest//lib:/glade/u/apps/ch/opt/mpt_fmods/2.19/intel/18.0.5:/glade/u/apps/ch/opt/mpt/2.19/lib:/glade/u/apps/opt/intel/2018u4/compilers_and_libraries/linux/lib/intel64:/glade/u/apps/ch/os/usr/lib64:/glade/u/apps/ch/os/usr/lib:/glade/u/apps/ch/os/lib64:/glade/u/apps/ch/os/lib'

which I got from my shell environment.

But I still get the same error, so I am not sure whether this workaround simply cannot work, or whether the issue I linked to is not the same as mine.
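For what it’s worth, one likely reason setting os.environ inside the notebook doesn’t help the notebook itself: on Linux the dynamic linker reads LD_LIBRARY_PATH once, at process startup, so changing it from within an already-running kernel doesn’t affect where subsequent imports look for shared objects (a freshly launched process, like a dask worker, does pick it up). A possible alternative, sketched below with ctypes (libm is just a stand-in for whatever library the ImportError names), is to pre-load the missing library before importing the extension:

```python
import ctypes
import ctypes.util

# Pre-load a shared library so its symbols are already available when
# the compiled extension is imported. Replace "m"/"libm.so.6" with the
# library named in your ImportError, ideally by absolute path.
libname = ctypes.util.find_library("m") or "libm.so.6"
lib = ctypes.CDLL(libname, mode=ctypes.RTLD_GLOBAL)

# then: from package import fortran_call
```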

@rabernat figured out the workaround!
I leave it here for reference. These commands need to be run in a cell before launching dask with ncar_jobqueue:

  1. Set up LD_LIBRARY_PATH (get the details from your shell environment with env | grep LD):

LD_LIBRARY_PATH = '/ncar/opt/slurm/latest//lib:/glade/u/apps/ch/opt/mpt_fmods/2.19/intel/18.0.5:/glade/u/apps/ch/opt/mpt/2.19/lib:/glade/u/apps/opt/intel/2018u4/compilers_and_libraries/linux/lib/intel64:/glade/u/apps/ch/os/usr/lib64:/glade/u/apps/ch/os/usr/lib:/glade/u/apps/ch/os/lib64:/glade/u/apps/ch/os/lib'

  2. Build a command like:
    env_command = f"export LD_LIBRARY_PATH='{LD_LIBRARY_PATH}'"

  3. Pass it to your dask workers when you use ncar_jobqueue:
    from ncar_jobqueue import NCARCluster
    cluster = NCARCluster(project='XXXXYYZZ', env_extra=[env_command])

I still cannot run the package from the Jupyter notebook itself, but the workers will see it.
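For anyone copy-pasting, the steps above condensed into a single cell (the path and project code here are placeholders; substitute your own values from env | grep LD):

```python
# Build the export command that each dask worker will run at startup.
LD_LIBRARY_PATH = "/glade/u/apps/ch/os/lib64"  # placeholder: paste your own value
env_command = f"export LD_LIBRARY_PATH='{LD_LIBRARY_PATH}'"
print(env_command)

# On Cheyenne (with ncar_jobqueue installed) you would then do:
#   from ncar_jobqueue import NCARCluster
#   cluster = NCARCluster(project="XXXXYYZZ", env_extra=[env_command])
```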

Hi @chiaral - I’m glad you’ve found a workaround for now. I’ll just leave a few notes that may be helpful pointers in the future:

  1. Make sure you compile your C/Fortran code on the system you will be running on. Cheyenne and the DAV systems are not 100% compatible.
  2. Pay close attention to environment variables like LD_LIBRARY_PATH, LDFLAGS, PATH, etc. When launching Jupyter or Dask jobs, it’s not guaranteed that you’ll end up with the same environment.
  3. Think a bit about how dynamically linked libraries will play with the rest of your Python setup. Presumably you’re going to want to share this with others, so getting the linking to work seamlessly will be important down the road.
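On point 3, a quick way to see what a compiled extension actually links against is to run ldd on the .so that the build produced (the module name below is hypothetical; use your installed extension’s __file__):

```python
import subprocess

def linked_libs(ext_path):
    """Return ldd's report of the shared objects ext_path depends on."""
    result = subprocess.run(["ldd", ext_path], capture_output=True, text=True)
    return result.stdout

# e.g., for a hypothetical f2py extension module package.fortran_mod:
#   import package.fortran_mod
#   print(linked_libs(package.fortran_mod.__file__))
```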