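What is TACC launcher?

It is a simple, practical utility that lets you submit many single-threaded or multi-threaded tasks from one batch script. For a detailed description, see the official site (linked in the original post). The download is linked there as well.

How do you use TACC launcher?

The official site documents usage in detail and is the recommended reference, so I will not repeat it here. Readers less comfortable with English can run the page through a web translation tool.

In short:

- Download the tool
- Unpack it
- No compilation needed!
- Set the environment variables
- Write a joblist file that lists every task to run
- Submit it with the launcher command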
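The joblist step is easy to script once there are more than a handful of tasks. The following is only a minimal sketch: `make_joblist` is a hypothetical helper name, not part of launcher, and it assumes the format used in this post, one complete shell command per line.

```shell
#!/bin/bash
# Hypothetical helper (not part of launcher): write N one-line tasks to a job file.
# Assumption: the job file holds one complete shell command per line.
make_joblist() {
    local n="$1" out="$2"
    local i
    : > "$out"                                   # create or truncate the job file
    for ((i = 1; i <= n; i++)); do
        # One job per line; the index makes each job's output distinguishable.
        printf 'echo "hello, world %d"\n' "$i" >> "$out"
    done
}

make_joblist 12 myjoblist
wc -l < myjoblist                                # number of jobs written
```

Replacing the `printf` line with a real command template (for example one input file per job) gives a job file for an actual parameter sweep.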
TACC launcher + slurm example

Prepare the test case

Prepare a joblist file named myjoblist containing the tasks to run. To keep things simple, start with 12 hello-world lines as a test:

    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"

Write the submission script

Next, write a submission script sub.sh containing the launcher settings:

    #!/bin/bash
    export LAUNCHER_JOB_FILE=/path/to/myjoblist
    export LAUNCHER_DIR=$HOME/launcher/launcher-3.1.1
    export PATH=$LAUNCHER_DIR:$PATH
    export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins
    export LAUNCHER_RMI=SLURM
    export LAUNCHER_SCHED=interleaved
    export LAUNCHER_WORKDIR=$(pwd)
    $LAUNCHER_DIR/paramrun

Notes:
1. LAUNCHER_JOB_FILE is the path to myjoblist; change it to the actual path
2. LAUNCHER_DIR is the launcher installation path; change it to the actual path
3. The other variables do not need to change for now

Submit the script

    yhbatch -N 2 -n 6 -p debug sub.sh

Notes:
1. -N 2 requests 2 nodes
2. -n 6 requests 6 tasks, i.e. 6 CPU cores at the default of one core per task (6 in total, not 6 per node; also note that n must be divisible by N, otherwise an error is raised)
3. -p debug selects the debug partition

Check the results

A job submitted through the slurm scheduler writes to a default output file, slurm-jobid.out. Its contents:

    Launcher: Setup complete.
    ------------- SUMMARY ---------------
       Number of hosts: 2
       Working directory: $HOME/workdir/test
       Processes per host: 3
       Total processes: 6
       Total jobs: 12
       Scheduling method: interleaved
    -------------------------------------
    Launcher: Starting parallel tasks...
    Launcher: Task 1 running job 2 on cn95 (echo "hello, world")
    Launcher: Task 0 running job 1 on cn95 (echo "hello, world")
    hello, world
    hello, world
    Launcher: Task 2 running job 3 on cn95 (echo "hello, world")
    hello, world
    Launcher: Job 1 completed in 0 seconds.
    Launcher: Task 5 running job 6 on cn96 (echo "hello, world")
    Launcher: Task 4 running job 5 on cn96 (echo "hello, world")
    hello, world
    hello, world
    Launcher: Task 3 running job 4 on cn96 (echo "hello, world")
    Launcher: Job 3 completed in 0 seconds.
    hello, world
    Launcher: Job 2 completed in 0 seconds.
    Launcher: Job 6 completed in 0 seconds.
    Launcher: Job 5 completed in 0 seconds.
    Launcher: Job 4 completed in 0 seconds.
    Launcher: Task 0 running job 7 on cn95 (echo "hello, world")
    hello, world
    Launcher: Task 2 running job 9 on cn95 (echo "hello, world")
    hello, world
    Launcher: Task 1 running job 8 on cn95 (echo "hello, world")
    hello, world
    Launcher: Task 5 running job 12 on cn96 (echo "hello, world")
    hello, world
    Launcher: Task 3 running job 10 on cn96 (echo "hello, world")
    hello, world
    Launcher: Task 4 running job 11 on cn96 (echo "hello, world")
    hello, world
    Launcher: Job 7 completed in 0 seconds.
    Launcher: Job 9 completed in 0 seconds.
    Launcher: Job 8 completed in 0 seconds.
    Launcher: Job 12 completed in 0 seconds.
    Launcher: Job 10 completed in 0 seconds.
    Launcher: Job 11 completed in 0 seconds.
    Launcher: Task 0 done. Exiting.
    Launcher: Task 2 done. Exiting.
    Launcher: Task 1 done. Exiting.
    Launcher: Task 5 done. Exiting.
    Launcher: Task 3 done. Exiting.
    Launcher: Task 4 done. Exiting.
    Launcher: Done. Job exited without errors
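With a longer job file, reading the output by eye gets tedious. As a rough sketch, the check below counts the "Job N completed" lines in the output file and compares the count with the number of jobs in the job file. `check_launcher_log` is a hypothetical helper name; it assumes the log line format shown above and one job per non-empty line of the job file.

```shell
#!/bin/bash
# Hypothetical helper (not part of launcher): compare completed jobs in a
# launcher log against the number of jobs in the job file.
# Assumptions: the log uses the "Launcher: Job N completed in ..." format shown
# in this post, and every non-empty line of the job file is one job.
check_launcher_log() {
    local log="$1" joblist="$2"
    local expected completed
    expected=$(grep -c -v '^[[:space:]]*$' "$joblist")                    # non-empty job lines
    completed=$(grep -c '^Launcher: Job [0-9][0-9]* completed' "$log")    # completion messages
    echo "expected=$expected completed=$completed"
    [ "$expected" -eq "$completed" ]                                      # exit status reports the match
}
```

It would be called as `check_launcher_log slurm-jobid.out myjoblist`, with the real output file name substituted; a non-zero exit status means the two counts differ.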
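Notes on the summary block:

| Parameter | Value | Explanation |
| --- | --- | --- |
| Number of hosts | 2 | -N 2, so 2 nodes |
| Working directory | $HOME/workdir/test | The directory the job was actually submitted from |
| Processes per host | 3 | Processes on each node, computed as 6/2 = 3, so make sure n is divisible by N! |
| Total processes | 6 | -n 6, so 6 processes in total |
| Total jobs | 12 | myjoblist contains 12 lines, so there are 12 jobs |
| Scheduling method | interleaved | The scheduling method; there are 3 to choose from, see the official site |

Remarks

- In testing, the default LAUNCHER_SCHED=dynamic kept running and never finished, so it is set aside for now.
- Support for OpenMP programs? What about MPI programs? Still to be tested (time permitting).
- If a library is reported missing, add the directory containing it to LD_LIBRARY_PATH.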
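The arithmetic behind the summary can be checked before submitting. The sketch below uses two hypothetical helpers, `procs_per_host` and `task_for_job`, neither of which is part of launcher. The first applies the divisibility rule for -N and -n. The second reproduces the round-robin pattern visible in the interleaved run above (job 1 and job 7 on task 0, job 12 on task 5); that mapping is inferred from this one log, not taken from the launcher source.

```shell
#!/bin/bash
# Hypothetical helpers (not part of launcher) for checking a -N/-n combination.

# Processes per host for -N <nodes> -n <procs>; fails if procs is not divisible by nodes.
procs_per_host() {
    local nodes="$1" procs="$2"
    if (( nodes <= 0 || procs % nodes != 0 )); then
        echo "error: -n $procs is not divisible by -N $nodes" >&2
        return 1
    fi
    echo $(( procs / nodes ))
}

# Assumed interleaved mapping, inferred from the log in this post:
# job j (1-based) runs on task (j - 1) mod <total processes>.
task_for_job() {
    local job="$1" procs="$2"
    echo $(( (job - 1) % procs ))
}

procs_per_host 2 6                      # the example above: 6 / 2
for job in 1 7 12; do
    echo "job $job -> task $(task_for_job "$job" 6)"
done
```

For -N 2 -n 6 this prints 3 processes per host and maps jobs 1, 7 and 12 to tasks 0, 0 and 5, matching the log.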
[Reposted from] https://blog.csdn.net/sowhatgavin/article/details/81912515