+ Run the executable under the Nsight Systems profiler (`nsys`), tracing NVTX ranges and CUDA API calls:

```bash
nsys profile --trace=nvtx,cuda ./exe
```
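To give the report a fixed name and print a text summary of the trace, the command can be wrapped in a small function. This is a sketch, assuming a recent Nsight Systems release that writes `.nsys-rep` reports (older releases use `.qdrep`); the function name `profile_and_summarize` and the default report name `myprofile` are hypothetical.

```bash
#!/bin/bash
# Hypothetical wrapper around the command above: profile an executable with
# NVTX + CUDA tracing, then print nsys's summary tables for the report.
profile_and_summarize() {
  local exe=$1
  local name=${2:-myprofile}   # report base name (hypothetical default)
  if ! command -v nsys >/dev/null 2>&1; then
    echo "nsys not found in PATH" >&2
    return 1
  fi
  # -o sets the report name; recent nsys versions append .nsys-rep
  nsys profile --trace=nvtx,cuda -o "$name" "$exe" || return
  nsys stats "$name.nsys-rep"
}
```

For example, `profile_and_summarize ./exe run1` writes `run1.nsys-rep`, which can also be opened in the Nsight Systems GUI.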

## Running on iffslurm

On iffslurm, Slurm counts hardware threads (hyperthreads) as CPUs, not physical cores: `--cpus-per-task=64` gives you 64 logical CPUs, i.e. only 32 physical cores. To get all 64 physical cores, request `--cpus-per-task=128` and set `OMP_NUM_THREADS=64` so that each physical core runs exactly one thread. For example:

```bash
#!/bin/bash
#SBATCH --job-name=job
#SBATCH --nodes=1                # Run all processes on a single node
#SBATCH --ntasks=1               # Run a single task
#SBATCH --cpus-per-task=128      # Logical CPUs (hyperthreads): 128 = 64 physical cores
#SBATCH --time=24:00:00          # Time limit hrs:min:sec
#SBATCH --output=slurm-%j.log    # Standard output and error log
#SBATCH -p th1-2020-64           # Partition

export OMP_NUM_THREADS=64        # One OpenMP thread per physical core
export OMP_PLACES=cores          # One place per physical core, so no two threads share a core
export OMP_PROC_BIND=spread      # Spread the threads evenly over those places
export I_MPI_PIN=enable          # Let Intel MPI pin the process

ulimit -c unlimited              # Allow core dumps
ulimit -s unlimited              # Remove the stack size limit

source compiler-select intel-fi  # Load the compiler/MPI environment (cluster-specific)

srun ~/fleur/build/fleur_MPI -trace
```
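Instead of hard-coding 64, the thread count can be derived from the allocation inside the job script. A minimal sketch, assuming two hardware threads per physical core as described above for iffslurm; `physical_cores` is a hypothetical helper, and `OMP_DISPLAY_AFFINITY` only has an effect with an OpenMP 5.0 runtime.

```bash
#!/bin/bash
# physical_cores LOGICAL_CPUS [THREADS_PER_CORE]
# Hypothetical helper: prints how many physical cores sit behind
# LOGICAL_CPUS logical CPUs.
physical_cores() {
  local logical=$1
  local per_core=${2:-2}   # assumption: 2-way hyperthreading, as on iffslurm
  echo $(( logical / per_core ))
}

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; 128 is
# only a fallback so the snippet also runs outside a job.
export OMP_NUM_THREADS=$(physical_cores "${SLURM_CPUS_PER_TASK:-128}")
# OpenMP 5.0: the runtime prints each thread's binding at startup, which
# lets you check that every thread sits on its own physical core.
export OMP_DISPLAY_AFFINITY=true
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```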