Update home authored by Matthias Redies
......@@ -117,3 +117,27 @@ use nvtx
```bash
nsys profile --trace=nvtx,cuda ./exe
```
## Running on iffslurm
On iffslurm, Slurm counts logical cores (hardware threads) rather than physical cores. If you set `--cpus-per-task=64`, you get 64 logical cores, which is only 32 physical cores. To get all cores of a node, set `--cpus-per-task=128`, then set `OMP_NUM_THREADS=64` so that each physical core is used only once. For example:
```bash
#!/bin/bash
#SBATCH --job-name=job
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=128 # Number of logical CPUs (hardware threads) per task
#SBATCH --time=24:00:00 # Time limit hrs:min:sec
#SBATCH --output=slurm-%j.log # Standard output and error log
#SBATCH -p th1-2020-64
export OMP_NUM_THREADS=64
export OMP_PROC_BIND=spread
export I_MPI_PIN=enable
ulimit -c unlimited
ulimit -s unlimited
source compiler-select intel-fi
srun ~/fleur/build/fleur_MPI -trace
```
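
If you prefer not to hard-code the thread count, the same arithmetic can be done in the job script. The following is only a sketch: `physical_cores` is a hypothetical helper (not part of Slurm or FLEUR), and the value of 2 hardware threads per core is an assumption taken from the 64 logical / 32 physical ratio described above. `SLURM_CPUS_PER_TASK` is set by Slurm inside a job when `--cpus-per-task` is given; the fallback of 128 only applies when the variable is unset.

```bash
#!/bin/bash
# Hypothetical helper: convert a number of logical CPUs into physical cores.
#   $1 = logical CPUs requested via --cpus-per-task
#   $2 = hardware threads per physical core
physical_cores() {
    local logical=$1 threads_per_core=$2
    echo $(( logical / threads_per_core ))
}

# Assumption: 2 hardware threads per core on iffslurm nodes.
# Override by setting THREADS_PER_CORE before running the script.
THREADS_PER_CORE=${THREADS_PER_CORE:-2}

# Use one OpenMP thread per physical core.
# Falls back to 128 logical CPUs when run outside a Slurm job.
export OMP_NUM_THREADS=$(physical_cores "${SLURM_CPUS_PER_TASK:-128}" "$THREADS_PER_CORE")
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With `--cpus-per-task=128` this yields the same `OMP_NUM_THREADS=64` as the script above. You can check the actual threads-per-core value on a node with `lscpu`.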