黑马程序员 (itheima) Technical Exchange Community

Title: [Shanghai Campus] How to use TACC launcher to batch-submit serial jobs

Author: 不二晨    Time: 2018-8-22 09:43
What is TACC launcher?

It is a simple, practical tool that lets users submit multiple single-threaded or multi-threaded tasks from a single batch script.

For a detailed introduction, see the official site: (link)

Download: (link)

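How do you use TACC launcher?

I strongly recommend reading the usage instructions on the official site, which are very detailed, so I won't repeat them here. Readers who are less comfortable with English can run the page through a web translation tool.

In short: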

TACC launcher + Slurm example

Prepare the test case

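Prepare a job list file named myjoblist and write the tasks to run into it, one per line. To keep things simple, start with 12 lines of hello world as a test: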

    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"
    echo "hello, world"

Write the submission script

Next, write a submission script, sub.sh, containing the launcher commands:

    #!/bin/bash
    export LAUNCHER_JOB_FILE=/path/to/myjoblist
    export LAUNCHER_DIR=$HOME/launcher/launcher-3.1.1
    export PATH=$LAUNCHER_DIR:$PATH
    export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins
    export LAUNCHER_RMI=SLURM
    export LAUNCHER_SCHED=interleaved
    export LAUNCHER_WORKDIR=`pwd`
    $LAUNCHER_DIR/paramrun

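Notes:
1. LAUNCHER_JOB_FILE is the path to myjoblist; change it to the actual path.
2. LAUNCHER_DIR is the launcher installation path; change it to the actual path.
3. The other variables do not need to be changed for now.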
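
For a quick test, the job list does not have to be typed by hand. The following is a minimal sketch, assuming bash and a writable current directory. The loop and the use of $(pwd) to build an absolute path are illustrative choices, not something the launcher requires.

```shell
#!/bin/bash
# Generate a 12-line job list, one command per line.
JOBLIST="myjoblist"
: > "$JOBLIST"                      # create or truncate the file
for (( i = 1; i <= 12; i++ )); do
    echo 'echo "hello, world"' >> "$JOBLIST"
done

# Absolute path that could be used as LAUNCHER_JOB_FILE in sub.sh.
export LAUNCHER_JOB_FILE="$(pwd)/$JOBLIST"
echo "job list: $LAUNCHER_JOB_FILE"
```

For real work, each line would instead hold one full command, for example the same program invoked with a different input file.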

Submit the script

    yhbatch -N 2 -n 6 -p debug sub.sh

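Notes:
1. -N 2 requests 2 nodes.
2. -n 6 requests 6 processes (CPU cores) in total, not 6 per node. Also note that n must be evenly divisible by N, otherwise an error is reported.
3. -p debug selects the debug partition.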
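
Since the per-node process count is obtained as n / N, the divisibility rule can be checked before submitting. A minimal sketch, assuming bash. The function name check_layout is hypothetical, and the submit line is left commented out because yhbatch exists only on clusters that provide it (as far as I know, on a stock Slurm installation the equivalent command is sbatch).

```shell
#!/bin/bash
# check_layout NODES TASKS
# Prints the number of processes per node, or fails if TASKS is not
# a multiple of NODES.
check_layout() {
    local nodes=$1 tasks=$2
    if (( tasks % nodes != 0 )); then
        echo "error: -n $tasks is not divisible by -N $nodes" >&2
        return 1
    fi
    echo $(( tasks / nodes ))
}

NODES=2
TASKS=6
if PPN=$(check_layout "$NODES" "$TASKS"); then
    echo "Processes per host: $PPN"
    # yhbatch -N "$NODES" -n "$TASKS" -p debug sub.sh
fi
```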

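Check the results

A job submitted through the Slurm scheduler gets a default output file, slurm-jobid.out, where jobid is the job's ID. Here is what that file contains: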

    Launcher: Setup complete.
    ------------- SUMMARY ---------------
       Number of hosts:    2
       Working directory:  $HOME/workdir/test
       Processes per host: 3
       Total processes:    6
       Total jobs:         12
       Scheduling method:  interleaved
    -------------------------------------
    Launcher: Starting parallel tasks...
    Launcher: Task 1 running job 2 on cn95 (echo "hello, world")
    Launcher: Task 0 running job 1 on cn95 (echo "hello, world")
    hello, world
    hello, world
    Launcher: Task 2 running job 3 on cn95 (echo "hello, world")
    hello, world
    Launcher: Job 1 completed in 0 seconds.
    Launcher: Task 5 running job 6 on cn96 (echo "hello, world")
    Launcher: Task 4 running job 5 on cn96 (echo "hello, world")
    hello, world
    hello, world
    Launcher: Task 3 running job 4 on cn96 (echo "hello, world")
    Launcher: Job 3 completed in 0 seconds.
    hello, world
    Launcher: Job 2 completed in 0 seconds.
    Launcher: Job 6 completed in 0 seconds.
    Launcher: Job 5 completed in 0 seconds.
    Launcher: Job 4 completed in 0 seconds.
    Launcher: Task 0 running job 7 on cn95 (echo "hello, world")
    hello, world
    Launcher: Task 2 running job 9 on cn95 (echo "hello, world")
    hello, world
    Launcher: Task 1 running job 8 on cn95 (echo "hello, world")
    hello, world
    Launcher: Task 5 running job 12 on cn96 (echo "hello, world")
    hello, world
    Launcher: Task 3 running job 10 on cn96 (echo "hello, world")
    hello, world
    Launcher: Task 4 running job 11 on cn96 (echo "hello, world")
    hello, world
    Launcher: Job 7 completed in 0 seconds.
    Launcher: Job 9 completed in 0 seconds.
    Launcher: Job 8 completed in 0 seconds.
    Launcher: Job 12 completed in 0 seconds.
    Launcher: Job 10 completed in 0 seconds.
    Launcher: Job 11 completed in 0 seconds.
    Launcher: Task 0 done. Exiting.
    Launcher: Task 2 done. Exiting.
    Launcher: Task 1 done. Exiting.
    Launcher: Task 5 done. Exiting.
    Launcher: Task 3 done. Exiting.
    Launcher: Task 4 done. Exiting.
    Launcher: Done. Job exited without errors
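
A log like this can be checked mechanically rather than by eye. The sketch below assumes bash and the line formats shown in the output above. The helper name count_completed and the small sample file are made up for illustration; a real run would point at the actual slurm-jobid.out file instead.

```shell
#!/bin/bash
# count_completed FILE
# Counts lines of the form "Launcher: Job <n> completed in <t> seconds".
count_completed() {
    grep -c -E '^Launcher: Job [0-9]+ completed in [0-9]+ seconds' "$1"
}

# Small stand-in for a real output file (illustration only).
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
Launcher: Task 0 running job 1 on cn95 (echo "hello, world")
hello, world
Launcher: Job 1 completed in 0 seconds.
Launcher: Task 1 running job 2 on cn95 (echo "hello, world")
hello, world
Launcher: Job 2 completed in 0 seconds.
Launcher: Done. Job exited without errors
EOF

DONE_JOBS=$(count_completed "$SAMPLE")
echo "completed jobs: $DONE_JOBS"
if grep -q '^Launcher: Done. Job exited without errors' "$SAMPLE"; then
    echo "launcher reported a clean exit"
fi
rm -f "$SAMPLE"
```

For the 12-job run shown above, the count should equal the Total jobs value in the summary.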

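Notes on the SUMMARY fields:

Parameter            Value                Description
Number of hosts      2                    -N 2, hence 2 nodes
Working directory    $HOME/workdir/test   The directory the job was actually submitted from
Processes per host   3                    Processes per node, computed as 6 / 2 = 3, which is why n must be divisible by N
Total processes      6                    -n 6, hence 6 processes in total
Total jobs           12                   myjoblist contains 12 lines, hence 12 jobs
Scheduling method    interleaved          The scheduling method; there are three (interleaved, block, dynamic), see the official site for details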
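
The task and job numbers in the log suggest how the interleaved method hands out work: task 0 ran jobs 1 and 7, task 1 ran jobs 2 and 8, and so on, which is a round-robin assignment. The sketch below reproduces that mapping for 12 jobs and 6 tasks. It is an illustration inferred from this one log, assuming bash. It is not the launcher's own code, and the function name task_for_job is hypothetical.

```shell
#!/bin/bash
# task_for_job JOB TOTAL_TASKS
# Round-robin mapping: jobs are numbered from 1, tasks from 0.
task_for_job() {
    local job=$1 total_tasks=$2
    echo $(( (job - 1) % total_tasks ))
}

TOTAL_JOBS=12
TOTAL_TASKS=6
for (( job = 1; job <= TOTAL_JOBS; job++ )); do
    echo "job $job -> task $(task_for_job "$job" "$TOTAL_TASKS")"
done
```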


[Reposted from]        https://blog.csdn.net/sowhatgavin/article/details/81912515



Author: 不二晨    Time: 2018-8-23 17:09
Nice



