Introduction

This is the official documentation for using the Picasso supercomputer. If you have any problem that is not covered in this documentation or in the frequently asked questions section, please contact us at our support email address.

INDEX

In this documentation you can find the following sections:

  • 0.Hardware: description of our computational resources.
    • System overview
    • Available hardware
  • 1.Login to Picasso: how to connect to picasso.
    • SSH connection
    • Terminal
    • Important notice
  • 2.Copy files from/to Picasso: how to transfer files.
    • Downloading files from the internet
    • Copying files from your computer to Picasso and vice versa
  • 3.Software: instructions for using the software already available on the system.
    • Installed software
    • Loaded software
    • Compiling software
  • 4.File systems: description of the storage architecture.
    • Home and Scratch
    • File and space quota
    • Fast scratch filesystem (FSCRATCH)
    • Local scratch filesystem
    • Backup policy
  • 5.How to send jobs: Picasso queue system usage guide and examples.
    • Preparing to send a job
    • Modifying resources and limits
    • Asking for GPUs
    • Sample jobs generator
    • Sending a job
    • Array jobs: how to send lots of jobs
    • Monitoring queued jobs. The new online job monitor
    • Cancelling jobs
    • Using the LocalScratch filesystem
  • 6.FAQs: a compilation of frequently asked questions at the end of this documentation.
    • How many resources should I request for my jobs?
    • Error message: Connection refused/Network is unreachable
    • Error message: Remote host identification has changed
    • How do I change my password?
    • Why is my process in the login node being cancelled?
    • Why is my job queued for so much time?
    • Why does my job not get the resources I asked for?
    • I need more time for my job
    • Why has my job failed?
    • Error message: cnf <command>
    • Why has my job been cancelled?
    • Error message: Out of memory handler
    • Why can't I edit or create new files?
    • I need more space or file quota

Citations and acknowledgements

If you use our resources, you must acknowledge this service in your publications. Please add a text like this:

The authors thank the Supercomputing and Bioinnovation Center (SCBI) of the University of Malaga for their provision of computational resources and technical support (www.scbi.uma.es/site)

 

We would appreciate it if you could inform us of any publications that used our resources, and we will take the opportunity to congratulate you.


Hardware

System overview 

SCBI's supercomputing resources comprise a set of compute nodes with different characteristics. However, all those machines are unified behind a single Slurm queue system instance, so you shouldn't worry about their differences: you only have to create an SBATCH script (a text file with some special syntax).

The script is used to request the resources (time, CPUs, memory, GPUs, etc.) that your job will use. It is important to request resources that your job can actually use efficiently, as this will help it start as soon as possible. Machines with special characteristics, such as large amounts of memory or GPUs, are scarce and therefore harder to reserve.

Once you have your script written, you have to send it to the queue system. The queue system will then analyze your request and send the job to the appropriate computers. There are some examples in the chapter "How to send jobs". If you have any question, please don't hesitate to contact us at our support email address and we will do our best to help you.

Available hardware

  • [sd]: 126 x SD530 nodes: 52 cores (Intel Xeon Gold 6230R @ 2.10GHz), 192 GB of RAM. InfiniBand HDR100 network. 950 GB of localscratch disks.
  • [bl]: 24 x Bull R282-Z90 nodes: 128 cores (AMD EPYC 7H12 @ 2.6GHz), 2 TB of RAM. InfiniBand HDR200 network. 3.5 TB of localscratch disks.
  • [sr]: 156 x Lenovo SR645 nodes: 128 cores (AMD EPYC 7H12 @ 2.6GHz), 512 GB of RAM. InfiniBand HDR100 network. 900 GB of localscratch disks.
  • [exa]: 4 x DGX-A100 nodes: 8 GPUs (Nvidia A100), 1 TB of RAM. InfiniBand HDR200 network. 14 TB of localscratch.

IMPORTANT: if you plan to use a lot of files (i.e. more than 15,000) to solve a problem (for example, in AI training), you must contact us first, because it can lead to serious performance problems.


Login to Picasso

SSH connection 

Like most supercomputers, Picasso is based on a Linux distribution, in this case openSUSE Leap 15.2. Remote access is done via the SSH protocol (port 22), connecting to a login server. On this server you will find all the compilers and tools needed to prepare and send your jobs to the queue system.

To connect to the login server you will need to enter the following command on the system terminal:

ssh <username>@picasso.scbi.uma.es

EXAMPLE: If your username is myuser, you should enter:

ssh myuser@picasso.scbi.uma.es

After this, you will be prompted to enter your password. When entering it, you will see that nothing appears on the screen, but it is being registered. Press enter when you are done, and the connection should be established.

Warning: if you fail to enter the password several times, the system will block your IP. You will have to contact us at our support email address in order to get unbanned.

Tip: if your connection breaks after a short period of inactivity or because of an unstable Internet connection, you should include a keep-alive option in your connections:

ssh -o ServerAliveInterval=60 <username>@picasso.scbi.uma.es
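
Alternatively, you can make the keep-alive permanent by adding it to your SSH client configuration on your own computer. This is a minimal sketch of a ~/.ssh/config entry (the "picasso" host alias is just an example name):

Host picasso
    HostName picasso.scbi.uma.es
    User <username>
    ServerAliveInterval 60

With that entry in place, you can connect simply by typing ssh picasso.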

Terminal

If you are using Linux or Mac OS X, you don't need additional programs to connect, since you already have an SSH client in your system terminal. On Windows you can use a program like MobaXterm (https://mobaxterm.mobatek.net/) or the Linux environment available under WSL (Windows Subsystem for Linux) on Windows 10 and later.

Important notice

IMPORTANT NOTE: The login machine is not a place to execute your work. It can be used for building scripts, compiling programs, testing that a complex command you are going to use in the SBATCH script launches correctly, etc., but NOT for running real workloads. All launched programs will be automatically killed without prior notice when they use more than 10 minutes of CPU time.


Copy files from/to Picasso

At some point you will need to copy files from or to Picasso. We have to differentiate two cases:

Downloading files from the internet

If you want to DOWNLOAD a file into Picasso from a URL available on the Internet, you can download it using the wget command in the Picasso console:

wget <url>

EXAMPLE: To download a file from the url https://www.example.com/file.txt, you have to use:

wget https://www.example.com/file.txt

The wget <url> command often does not work when the file you need to download requires you to enter a password, log into an account, or perform some other action before downloading. In these cases you will need to install a plugin for your web browser. This plugin will generate the complete working wget command for you. Then you just have to paste the command in Picasso and wait for the download to finish. These are the steps you need to follow:

1. Download and install one of these plugins depending on the web browser that you use (CurlWget for Chrome, cliget for Firefox, CurlWget for Edge)

2. Start the download (in your computer) of the file you are interested in, then stop the download.

3. Now click on the plugin icon (usually in the top right corner of your web browser).

4. The complete wget command should appear. Copy the complete wget command.

5. Paste it in picasso and press enter. The download should start.

Copying files from your computer to Picasso and vice versa

If you need to COPY a file between Picasso and your computer, we need to differentiate between UNIX-based operating systems (Linux, macOS) and Windows systems.

IMPORTANT: for all operating systems there is third-party software with graphical user interfaces that can be friendlier than the console. As Picasso administrators, we neither recommend nor discourage the usage of these programs, but their correct maintenance and/or operation cannot be assured.

If you are using a UNIX-based OS such as Linux or Mac OS X, you can use the command:

scp <from_path_file> <to_destination_path>

This command can be used to copy in both directions.

To copy from picasso to your computer:

scp <user>@picasso.scbi.uma.es:<file_path_in_picasso> <file_local_destination>

To copy from your computer to picasso:

scp <local_file_path> <user>@picasso.scbi.uma.es:<file_destination_in_picasso>

EXAMPLES:

To copy a file located on your computer at /home/user/Desktop/file.txt to your home in Picasso, use the following command:

scp /home/user/Desktop/file.txt myuser@picasso.scbi.uma.es:~/

To copy a file located in Picasso at /mnt/home/users/groupname/username/file.txt to the Desktop of your computer, use the following command:

scp myuser@picasso.scbi.uma.es:~/file.txt /home/user/Desktop/

Copying complete folders:

To copy complete folders including files and any other content in a recursive way, use the -r flag in the scp command.  

scp -r <from_path_folder> <to_destination_path>

If you want to copy a lot of files, we recommend the rsync command. It is very similar to scp, but rsync can skip already transferred files, so it performs a synchronization instead of a full copy.
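
As an illustration, a typical rsync invocation to upload a folder to Picasso could look like this (the -a flag preserves permissions and timestamps, -v prints the transferred files; the paths are examples):

rsync -av /home/user/my_data/ <user>@picasso.scbi.uma.es:~/my_data/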

If you are using a Windows system, you can install third-party software like PuTTY or MobaXterm, or install the PSCP command (equivalent to scp) and use it in the same way as explained for UNIX systems, but using PSCP.

NOTE: For heavy transfers we recommend using the rsync command, since if the transfer is interrupted for any reason it will skip the already transferred files when you run it again.


Software

Installed software 

We have a wide variety of software installed and ready to use. You can browse the updated list on our website, or by executing this command on the login server:

module avail

To use any software package shown in the list, you have to add this line to your SBATCH script (or execute it on the login node if you are only testing how to use it):

module load software/version

EXAMPLE: If you want to load the 22.1 version of Ansys, you may use:

module load ansys/22.1

Note that you need to issue the module load command for the software you are going to use every time you create a new SSH session, and also inside the SBATCH file that you submit to the queue system. As a rule of thumb, it is recommended to use the latest installed version.

If you need a program that is not shown in the list, you can contact us at our support email address asking for its installation. You can also install your own programs and scripts in your home if necessary.

Loaded software

To see the packages that you have already loaded:

module list

And to unload any previously loaded package:

module unload software

EXAMPLE: If you want to unload the 22.1 version of Ansys, you may use:

module unload ansys/22.1

You can also use the following command to unload all the software that you may have loaded. (Note: if you disconnect from picasso, all the software is automatically unloaded).

module purge

Compiling software

To compile your own software you can use different compilers (GCC, Intel, PGI, ...). Each software package has its own build instructions, but normally it can be compiled with different compilers.

GCC is the default, but the Intel compiler can give your code some speedup. To compile using the Intel compiler you should load it first:

module load intel
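
For example, a minimal sketch compiling a hypothetical source file hello.c with GCC and then with the Intel compiler (the Intel compiler command name may vary between versions, e.g. icc or icx):

# default GCC build
gcc -O2 -o hello hello.c

# Intel build, after loading the module
module load intel
icc -O2 -o hello hello.c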


File systems

Picasso Filesystems 

The Picasso storage is divided into two physically independent spaces. In both of them, as a Picasso user, you will get some disk quota. The quota determines the disk limits for your user.

  • home: here you should store input data, your own developed scripts, final results, and other important data. To go to your home space you can enter one of the following commands:
    cd $HOME
    cd ~
    cd
  • fscratch: FScratch is a very high-speed storage space from which you should launch your work. You can find relevant information about using FScratch below. Be aware that fscratch is temporary storage, and old files will be deleted periodically. PLEASE, DO NOT USE IT TO STORE IMPORTANT DATA. To go to your fscratch space you can enter:
    cd $FSCRATCH

File and space quota

Apart from the space limitation in each of the two spaces (home and fscratch), there is also a file quota. While the space quota determines the limitation in terms of gigabytes written, the file quota determines the limitation in terms of the number of files written.

The quota works in two steps, that are called soft quota and hard quota:

  • Soft quota: When you exceed your quota, you will receive a warning message every time you log into Picasso, explaining that you have seven days to get back to a normal state. During this period you will keep full normal access to Picasso resources, and you can even create new files. This way, your jobs will be able to finish without incidents. If after 7 days you are still not below the quota, you will lose write permission, so no data can be written until you free up enough space or remove enough files.
  • Hard quota: When you exceed your quota by a lot, you will find a hard limit which will not allow you to write any more files, even if you were inside the 7 days of soft quota grace period.

Remember that you can reach both soft and hard quota limits because of the number of files written or because of the size of these files.

In order to check your quotas, you can run:

mmlsquota

Fast scratch filesystem (FSCRATCH)

The FScratch (Fast Scratch) filesystem should be used to speed up jobs, especially the ones that make intensive use of the storage. This space is conceived as a pseudo-volatile filesystem. This means that the data stored here will be deleted periodically and automatically, so you should not use it for storing important files.

The way that it is meant to be used is:

  1. Go to FSCRATCH by using
    cd $FSCRATCH
  2. Copy the input data to FSCRATCH by using
    cp -r $HOME/path/to/job/directory ./
  3. Enter the directory when the copy is finished.
  4. Execute your work as usual.
  5. Finally, don't forget to copy the output data back to your HOME, because data left in FSCRATCH will be deleted after some weeks (a combined sketch of these steps is shown below).
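
A combined sketch of those steps inside a job script could look like this (my_job_dir, my_command and the file names are hypothetical):

cd $FSCRATCH
cp -r $HOME/my_job_dir ./            # copy the input data to FSCRATCH
cd my_job_dir                        # enter the directory
time my_command input_file           # execute your work as usual
cp -r results $HOME/my_job_dir/      # copy the output back to HOME before it is purged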

If you are not sure how to use FScratch, please contact us at our support email address.

Local scratch filesystem

There are some nodes that have a local scratch filesystem. This local scratch is even faster than the fscratch filesystem, but it has a main disadvantage: It is only accessible from each node, so you cannot access it from the login machine, but only from inside the SBATCH script that you will send to the queue system (using the $LOCALSCRATCH environment variable).

The local scratch is very fast and can speed up some jobs substantially. It is a must when a job needs to write lots of files or make intensive use of the disk. If you think you need access to local scratch and you are not sure how to use it, please contact us at our support email address. You can also find an example in the chapter "How to send jobs".

Backup policy

IMPORTANT NOTE: It is important that you follow a responsible backup policy. We are not liable for the loss of data stored in our systems, so if your data is important, you should keep backups of it on your own machine or backup system. As a courtesy we maintain a backup of the files stored in the home directory, but we can neither guarantee it nor make backups of all the available space. You can follow these backup guidelines if you like:

  • It is a good habit to use a version control system for your own programs and scripts. Git can be a good solution (https://git-scm.com/). Version control systems help programmers keep track of all changes made to source code, scripts, etc. Every time you make a change to a file, you can save it to your version control system with a short textual description, and you can later review those changes at any time or go back to an older version. This is not a true backup, but it works as a partial one.
  • Keep different backups, from different dates. Backups are also useful if you delete or modify a file by mistake.
  • Store backups in different physical places. This way, if your main computer location suffers a disaster, you could access another copy that you have in another place.
  • Try to access your backup data periodically. You can make lots of backups, but if they are not accessible they are not useful.

How to send jobs

Our current queue system is Slurm, so any Slurm manual will give you more detailed information about these commands. This is only a quickstart guide:

Preparing to send a job

Prior to sending a job to the queue system you only have to write a small script file with a specific format. Inside the file there are some commented lines with the #SBATCH prefix; these lines are used to request the resources that your program needs to solve your desired job.

To help you in this step, we have written a script that generates a sample file that you can use as a template. In the login node you can call this script with:

gen_sbatch_file script.sh "command"

Where command is the call to the program that you want to send to the Picasso queue system. As an example, if you wanted to list all the files in the current directory, you could write:

gen_sbatch_file script.sh "ls"

This will generate a file called script.sh that contains the requested command, and a generic resource request for picasso (in terms of number of CPUs, gigabytes of RAM, and time).

Now that you are familiar with the generation script, let's show a more realistic example. If you wanted to solve a Fluent job with a journal file called journal.jou, you would enter:

gen_sbatch_file script_fluent.sh "time fluent 3ddp -g -i journal.jou"

This will generate a file called script_fluent.sh that will solve the journal.jou file and also output the time consumed for that task. Each software package has its own way of being called to solve a job, but don't panic, as you will find all the details in the individual instructions, guides or readme files of each software.

Once you have your template file generated, you can edit it with vi, pico or your preferred editor. Inside the file you can change the required resources, load the desired package and write the parameters of the command you are going to execute, as explained in the section "Modifying resources and limits". As an example, the previously generated script_fluent.sh file will need a module load ansys/22.1 line before the fluent command in order to work.
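
For illustration, after adding that line the final part of script_fluent.sh (below the #SBATCH resource lines) would look roughly like this:

module load ansys/22.1
time fluent 3ddp -g -i journal.jou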

Modifying resources and limits

The SBATCH file generated by gen_sbatch_file contains a set of lines requesting different resources. These lines are commented with #SBATCH and will be interpreted by the slurm queue system. Note: All SBATCH pragma lines should go at the head of the file, without uncommented lines above them, because any SBATCH line after the first uncommented line will be ignored by slurm.

A sample script will be similar to this one:

#!/usr/bin/env bash
# The name to show in queue lists for this job:
##SBATCH -J jobname

# Number of task:
#SBATCH --ntasks=1

# Number of cpus per task:
#SBATCH --cpus-per-task=1

# Amount of RAM needed for this job:
#SBATCH --mem=2gb

# The time the job will be running:
#SBATCH --time=10:00:00

# To use GPUs you have to request them:
##SBATCH --gres=gpu:1

# If you need nodes with special features uncomment the desired constraint line:
##SBATCH --constraint=fat
##SBATCH --constraint=cal
##SBATCH --constraint=dx
##SBATCH --constraint=bl
##SBATCH --constraint=sd
##SBATCH --constraint=exa # GPUs 

# Set output and error files
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

# Remove one comment in the following line to make an array job. Then N jobs will be launched. In each one SLURM_ARRAY_TASK_ID will take one value from 1 to 100
##SBATCH --array=1-100

# To load some software (you can show the list with 'module avail'):
module load software


# the program to execute with its parameters:
time my_command parameters

Lines with double comment symbols (##) are ignored; you can activate them by removing one of the two # characters, leaving just one. To request more CPUs modify the --cpus-per-task line, to request more time modify the --time line, and to request more RAM modify the --mem line. If you want to use or learn about other available options, check the official Slurm documentation.

IMPORTANT NOTE: resource limits have hard policies. This means that if you exceed the requested resources, your job will be killed.

ANOTHER IMPORTANT NOTE: you can evaluate the resources that a job has effectively used by running seff once it has finished. This will allow you to adjust resources for optimal utilization (your jobs will start to solve sooner). You can also use the new online job monitor for this task. In the section "Monitoring queued jobs" below, you can find more details about it.
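
For example, assuming a finished job with ID 123456 (a hypothetical value, as reported by sbatch or squeue), you would run:

seff 123456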

FOR OLD USERS: the new Slurm version has replaced the --cpus option with --cpus-per-task. You must update your scripts in order to obtain the requested resources.

Asking for GPUs

When your software is able to use GPUs for its processing, and you have configured it correctly, you just have to include this line in your SBATCH script:

#SBATCH --gres=gpu:1

If you want to use more than one GPU, just change the "1" for another number. Remember that our exa nodes have 8 GPUs each.
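
For instance, to request two GPUs on the exa nodes you could combine the GPU request with the corresponding constraint from the sample script above:

#SBATCH --gres=gpu:2
#SBATCH --constraint=exa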

Sample jobs generator

We have also written a tool that generates a complete sample job for some software packages. You can see the list of available sample jobs with this command:

gen_sample_job | grep -v NO

When you have located the desired sample job, you can generate it with the following command:

gen_sample_job <software> <output_folder>

Where <software> is the desired software without the version, and <output_folder> is the folder that will be created to contain the sh script and other needed files. For example, if you wanted to create a sample job for the software Gaussian in a new folder called gaussian_job, you should enter:

gen_sample_job gaussian gaussian_job

Sending a job

When you have a modified version of the script.sh file adapted to your needs, you are ready to send it to the queue system. To do so, you just have to enter:

sbatch script.sh

Now the job has been received by the queue system, which will look for the requested resources. Once the resources are available, the job will begin.

In the section "Monitoring queued jobs" you will learn how to monitor the state of your jobs. This way, you will know if the job is still looking for a place to be executed (queued) or if it is already running.

IMPORTANT NOTE: You will notice that if you request resources adapted to your needs (and not in excess), your job will begin much sooner, as it will find a place to be executed much more easily. Also bear in mind that if you request fewer resources than your job needs, the job will be killed as soon as the software exceeds the requested resources.

Array jobs: how to send lots of jobs

When you need to send a bunch of jobs that execute the same command over different data, you can make use of array jobs. Array jobs are a native option of Slurm, so you will find advanced information about them in Slurm's manuals. NOTE: If you are going to send an array job bigger than 1000 jobs for the first time, please contact us first.

To do this, you only need to make a few changes to the script file. First, remove one comment symbol from the --array line, leaving it with only one #:

# Leave only one comment in the following line to make an array job. Then N jobs will be launched. In each one SLURM_ARRAY_TASK_ID will take one value from 1 to 100

#SBATCH --array=1-100

Then rewrite your command line so that it uses the SLURM_ARRAY_TASK_ID environment variable to choose the appropriate input file or parameters. E.g.:

time my_command myFile_${SLURM_ARRAY_TASK_ID}

After these modifications, you have to send it to the queue system, using sbatch:

sbatch script.sh

In this example, the queue system will launch 100 jobs, replacing SLURM_ARRAY_TASK_ID with 1, 2, 3... up to 100, one value per job.
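
If your input files are not numbered consecutively, a common pattern is to map the task ID to one line of a list file. This is only a sketch; file_list.txt is a hypothetical file containing one input path per line:

INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.txt)
time my_command "$INPUT"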

Monitoring queued jobs

At any time, you can monitor the queue of jobs by issuing this command:

squeue

By default squeue shows jobs in short format (grouping array jobs together). If you need to access the long format, use the -l flag:

squeue -l
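
To list only your own jobs, you can filter by user with Slurm's standard -u flag:

squeue -u $USER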


New online job monitor

For more detailed information about your running jobs, you can access the new online job monitor. This utility will show you real-time details about the number of CPUs and amount of RAM being used, and also the GPU and VRAM usage when applicable. To use the online monitor, follow these instructions:

  1. Enter https://www.scbi.uma.es/slurm_monitor/admin/login
  2. Login with your picasso user credentials
  3. You will see a list of all of your running or finished jobs
  4. By using the first row controls you can sort the jobs by any column
  5. When using the filters on the right, only the jobs you are interested in will be shown
  6. Click on the desired job
  7. You will see more details at the top, and real time graphics below
  8. By clicking on the different items of the legend you can hide or show their corresponding graph
    • System: It can be an indicator of disk usage. If it is high you should use localscratch
    • New proc: A vertical line will show up every time a new process has been detected. It is useful when you use multiple programs in your script.
    • Reserved cores: Horizontal line with the number of cores that were requested.
    • Reserved RAM: Horizontal line with the GB of RAM that were requested.
    • RAM: RAM actually used.
    • Cores: CPU cores actually used.

Remember that if you adjust the requested resources to the ones you can actually use, your job will begin to solve sooner, as it will find a place to be executed faster. Also, more resources will be available for other Picasso users.

 

Cancelling jobs

Sometimes you will want to cancel a job that is already running or queued. To do so, you only have to take the job ID number (the first column shown in squeue) and issue this command:

scancel <JOBID>

where <JOBID> is the number of the job that will be cancelled.

To cancel only some jobs of an array job, use this format:

scancel <JOBID>_[1-50]

In that example, you would cancel jobs 1 to 50 of the array job with ID <JOBID>.
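
If you need to cancel all of your queued and running jobs at once, scancel also accepts a user filter:

scancel -u $USER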

Using the LOCALSCRATCH filesystem 

By default, you can work and create temporary files on your normal $SCRATCH filesystem, but sometimes you may need to use a program that generates thousands and thousands of temporary files very fast. That is not good citizen behaviour on shared file systems, since each small file creation makes a few requests to the shared storage in order to complete. If there are thousands of them, they can hog the system at some point.

In these extreme cases it is mandatory to use the $LOCALSCRATCH filesystem. In less extreme situations, in which the software executes large I/O operations, you can also take advantage of the speed-up that the localscratch filesystem provides. As seen in the hardware section, all machines have at least 900 GB of localscratch. For high localscratch usage, please contact us first.

The localscratch filesystem is independent on each node, and thus it is not shared between nodes and it is not accessible from the login machines. Because of that, you have to understand how to copy your data there, use it, and later retrieve the important results back to your home (all done inside the SBATCH script). Here you can find an example that could help you with this task. Feel free to contact us if you have any questions.

#!/usr/bin/env bash
# The name to show in queue lists for this job:
##SBATCH -J jobname

# Number of desired cpus:
#SBATCH --ntasks=1

# Amount of RAM needed for this job:
#SBATCH --mem=2gb

# The time the job will be running:
#SBATCH --time=10:00:00

# To use GPUs you have to request them (incompatible with ntasks):
##SBATCH --gres=gpu:1

# If you need nodes with special features uncomment the desired constraint line:
##SBATCH --constraint=bigmem
##SBATCH --constraint=cal
##SBATCH --constraint=dx
##SBATCH --constraint=bl
##SBATCH --constraint=sd

# Set output and error files
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

# Remove one comment in the following line to make an array job. Then N jobs will be launched. In each one SLURM_ARRAY_TASK_ID will take one value from 1 to 100
##SBATCH --array=1-100

# To load some software (you can show the list with 'module avail'):
module load software

# create a temp dir in localscratch
MYLOCALSCRATCH=$LOCALSCRATCH/$USER/$SLURM_JOB_ID
mkdir -p $MYLOCALSCRATCH

# execute there the program
cd $MYLOCALSCRATCH
time program1 $HOME/data/data1 > results

#copy some results back to home
cp -rp results $HOME/place_to_store_results

#remove your localscratch files (only if MYLOCALSCRATCH is set, to avoid removing the wrong path):
if cd $LOCALSCRATCH/$USER; then
    if [ -n "$MYLOCALSCRATCH" ]; then
        rm -rf --one-file-system $MYLOCALSCRATCH
    fi
fi


FAQs

This section contains answers to Frequently Asked Questions: 

How many resources should I request for my jobs?

It is important that the resources are adjusted to your needs. Too few resources will result in cancelled jobs, and too many resources will increase the time your job needs to find a place to be executed, also leaving fewer resources free for other users.

You can use the new online job monitor (explained in the "How to send jobs" section, subsection "Monitoring queued jobs") to evaluate if you are using the resources correctly.

Even if you are using all the cores you asked for, it does not necessarily mean that they are being used efficiently. Some programs do not scale well.

As an example, some programs running on 64 cores solve problems nearly twice as fast as with 32 cores. Other programs do not scale that well and may only improve the speed by 10% in such a scenario, or may even take longer with 64 cores than with 32!

If you experience this kind of problem, or if you need help adjusting the resources, prepare a job example that takes about 2 hours to finish and contact us at our support email address.

Error message: Connection refused/Network is unreachable

If you receive this error message, your IP has been blocked because of too many failed login attempts. If you have not been automatically unbanned after 30 minutes, please contact us at our support email address.

Error message: Remote host identification has changed

If you see a message containing texts such as "Remote host identification has changed" or "It is possible that someone is doing something nasty": this can happen for several reasons, but in our case we changed our host fingerprint in July 2021. If you have not connected since that date, you must remove the old key and accept the new one. To do this, read the full warning message and you will find a text that says "You can use following command to remove the offending key:", followed by the command you must run with your personal path to the known_hosts file (usually ~/.ssh/known_hosts). On a Linux system it will typically be:

ssh-keygen -R picasso.scbi.uma.es -f <yourPath>

How do I change my password?

If you already know your password and you want to change it, you can do so by using the command:

passwd

It will ask for your current password. Then the new password will be asked twice, and a success message will eventually be shown.

Why is my process in the login node being cancelled?

The login machine is not a place to execute your work. It can be used for building scripts, compiling programs, testing that a complex command you are going to use in the SBATCH script launches correctly, etc., but NOT for running real workloads. All launched programs will be automatically killed without prior notice when they use more than 10 minutes of CPU time.

Why is my job queued for so much time?

It is probably because the resources you requested are excessive. You will notice that if you request resources adapted to your needs (and not in excess), your job will begin much sooner, as it will find a place to be executed much more easily. Also bear in mind that if you request fewer resources than your job needs, the job will be killed as soon as the software exceeds the requested resources.

If you asked for GPUs for your job, you can check the currently free GPUs available issuing:

free_gpus.sh

Why does my job not get the resources I asked for?

All SBATCH pragma lines (for example #SBATCH --ntasks=8) should go at the head of the file, without uncommented lines above them, because any SBATCH line after the first uncommented line will be ignored by Slurm. This probably happened in your SBATCH file.

I need more time for my job

Users have a maximum job time limit of 3 or 7 days. Asking for more than 3 days is risky, given that the possibility of any kind of error increases with time.

If even 7 days is not enough for your job, it clearly needs some optimization (parallelism, division, change of software, overall rethink, etc.). You can contact us at our support email address if you need help with this task.

Why has my job failed?

You can take a look at the error and output log files to get clues about what happened.

Error message: cnf <command>

The “Command not found” error is shown when Picasso is unable to find the command you are trying to execute. It is commonly caused by forgetting to include a module load command before the call to the software concerned.
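
Using the same placeholder names as the sample scripts above, the fix usually consists of adding the corresponding module load line before the command (check module avail for the exact module name and version):

module load software/version   # load the module that provides the command
time my_command parameters     # now the command can be found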

Why has my job been cancelled?

If your job tried to use more resources than you requested, it will be automatically cancelled. If it was hogging the system in a way that interfered with other users' jobs, or if the resources reserved for it were dramatically underused, we will cancel it manually.

Error message: Out of memory handler

If your job tried to use more RAM than you requested, the job will be automatically cancelled. Please increase the amount of RAM requested in your SBATCH file and confirm that the line follows the format #SBATCH --mem=2gb.

Why can't I edit or create new files?

You probably exceeded your quota. Use the mmlsquota command to check it. Remember that you can reach both the soft and hard quota limits because of the number of files written or because of the size of these files. Once you have removed enough files, you will be able to create and edit files again. For more information, refer to the "File and space quota" section.

If you did not exceed your quota, then you might be trying to write in a folder, or edit a file, for which you do not have write permission.

I need more space or file quota

Remember that you have two separate locations for your files, the $HOME (where you should store input data, your own developed scripts, final results, and other important data) and the $SCRATCH (where you should launch your work).

Verify that you are not keeping any unwanted old files, and remove them if so. If you need to increase your quota because of some conda / Python package, you should know that you don't have to install the complete environment under your user. It is enough to issue module load anaconda and then pip install only the packages that are not already in the module.
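
A minimal sketch of that approach (the package name is hypothetical; pip's --user flag keeps the installation in your home directory):

module load anaconda
pip install --user some_package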

If none of the above helps, you can contact us at our support email address.

 


Additional information