Contents
- S3aSIM
- Download
- Build
- Running S3aSIM
- HPIO
- Download
- Build
- Running HPIO
- S3D
- Download
- Build
- Running S3D
- BTIO
- Download
- Build
- Running BTIO
- Github
S3aSIM
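S3aSim is a sequence similarity search algorithm framework for testing and evaluating various I/O strategies with MPI-IO. It has been optimized with PVFS2 ROMIO hints, but it can be extended to use MPI-IO hints for other file systems. S3aSim uses a master-worker parallel programming model with database segmentation, mimicking the access pattern of mpiBLAST.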
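To make the master-worker model with database segmentation more concrete, here is a minimal single-process sketch of how fragments might be handed out. It is not S3aSim's code: the function name `assign_fragments`, the choice of rank 0 as a master that does no searching, and the round-robin order (which stands in for equally fast workers asking for more work) are all assumptions made for illustration.

```python
from collections import deque


def assign_fragments(total_fragments, num_procs):
    """Simulate a master (rank 0) handing database fragments to workers.

    Assumption: rank 0 only coordinates; ranks 1..num_procs-1 do the searching.
    """
    if num_procs < 2:
        raise ValueError("need one master and at least one worker")
    pending = deque(range(total_fragments))      # fragment ids not yet assigned
    workers = list(range(1, num_procs))          # worker ranks
    assignments = {rank: [] for rank in workers}
    while pending:
        for rank in workers:                     # each idle worker asks for work
            if not pending:
                break
            assignments[rank].append(pending.popleft())
    return assignments


if __name__ == "__main__":
    # 128 fragments over 4 processes, matching the run shown later in this section
    for rank, frags in assign_fragments(128, 4).items():
        print(f"rank {rank}: {len(frags)} fragments")
```

In the real benchmark the hand-off happens through MPI messages and depends on how fast each worker finishes, so the actual distribution can be less even than this sketch suggests.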
Download
http://users.ece.northwestern.edu/~aching/research_webpage/benchmarks/s3asim-1.00.tar.gz
Build
Adjust the MPI settings in the Makefile (for example mpi_path) to match your MPI installation.
ifeq ($(PLATFORM), LA-MPI)
mpi_path = /opt/lampi/lampi-1.5.12/gm
cc = gcc -I ${mpi_path}/include -L ${mpi_path}/lib
CFLAGS = -Wall -Wstrict-prototypes -g
else
mpi_path = /usr/local/
cc = $(mpi_path)/bin/mpicc
CFLAGS = -Wall -Wstrict-prototypes -g
endif
EXECUTABLE = s3asim-nj s3asim-test
Running S3aSIM
mpirun --np 4 s3asim-nj \
--total-fragments 128 \
--query-count 20100 \
--database-sequence-size-min 6 \
--database-sequence-size-max 45088768 \
--compute-speed 10.0 \
--io-method 0 \
--parallel-io 0 \
--output_file /mnt/gkfs/output
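When the same run has to be repeated with different settings, it can help to assemble the command line in a script. The sketch below only builds and prints the command. The helper name `build_s3asim_command` is hypothetical; the flags are the ones s3asim itself documents, including the repeatable `-H key=value` option for MPI-IO hints. ROMIO accepts `enable`, `disable`, or `automatic` for `romio_ds_write`, and `disable` is the value that switches data sieving off.

```python
import shlex


def build_s3asim_command(numprocs, options, hints):
    """Return an argv list for an s3asim run (hypothetical helper)."""
    cmd = ["mpirun", "-n", str(numprocs), "./s3asim-nj"]
    for flag, value in options.items():
        cmd += [flag, str(value)]
    for key, value in hints.items():
        cmd += ["-H", f"{key}={value}"]   # one -H per MPI-IO hint
    return cmd


cmd = build_s3asim_command(
    4,
    {"--total-fragments": 128, "--query-count": 20100,
     "--io-method": 1, "--parallel-io": 1,
     "--output_file": "/mnt/gkfs/output"},
    {"romio_ds_write": "disable", "romio_cb_write": "enable"},
)

if __name__ == "__main__":
    print(shlex.join(cmd))   # paste into a shell, or pass cmd to subprocess.run
```

Collective I/O (`--io-method 1`) is paired with `--parallel-io 1` here because the help text notes that serial I/O only supports individual I/O.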
mpirun -n <numprocs> ./s3asim-nj
There are many options that can be used with s3asim.
-h, --help display this help and exit
-f, --total-fragments number of fragments of data (default 4)
-c, --query-count number of total queries (default 2)
-q, --query-size-min min size of each query (default 10)
-Q, --query-size-max max size of each query (default 1000)
-d, --database-sequence-size-min
min size of each database-sequence (default 10)
-D, --database-sequence-size-max
max size of each database-sequence (default 1000)
-y, --query_params_file query params file (default N/A)
(-q and -Q options will be ignored)
-Y, --db_params_file db params file (default N/A)
(-d and -D options will be ignored)
-r, --result-size-min min size of each result (default 10)
-m, --result-count-min min count of each result (default 10)
-M, --result-count-max max count of each result (default 1000)
-K, --compute-speed speedup of compute time (default 1.0)
-i, --io-method note: If using serial I/O, only
individual I/O can be used. (default)
0 - individual I/O
1 - collective I/O
-p, --parallel-io 1 - use parallel I/O (default 0)
-s, --query-sync 1 - sync per query (default 0)
-o, --output_file output file (default test)
-I, --no_io 0 - default 1 - turn I/O off for testing
-a, --atomicity 0 - default 1 - turn atomicity on
-e, --end_write 1 - write all data at end
-H, --mpi-io-hint set as many MPI-IO hints as is desired
through repeated use (interface is
key=value) - for example, to turn off
data sieving for writes in ROMIO, use
"-H romio_ds_write=enable"
HPIO
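The HPIO (High-Performance I/O) benchmark is a tool for evaluating and debugging noncontiguous I/O performance in MPI-IO. It lets the user specify a variety of noncontiguous I/O access patterns and verify the output. It has been optimized with PVFS2 MPI-IO hints, but it can be extended to use MPI-IO hints for other file systems.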
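A noncontiguous access pattern in HPIO is described by three parameters: region count, region size, and region spacing. The sketch below lists the byte ranges such a pattern would touch. It assumes that "spacing" means the gap between the end of one region and the start of the next; that reading, and the function name `region_offsets`, are assumptions for illustration and are not taken from the HPIO source.

```python
def region_offsets(region_count, region_size, region_spacing):
    """Return (offset, length) pairs for a regularly strided access pattern.

    Assumption: region_spacing is the number of bytes skipped between regions,
    so consecutive regions start region_size + region_spacing bytes apart.
    """
    stride = region_size + region_spacing
    return [(i * stride, region_size) for i in range(region_count)]


if __name__ == "__main__":
    # Small example with 8-byte regions and 128 bytes skipped between them
    for offset, length in region_offsets(4, 8, 128):
        print(f"bytes {offset}..{offset + length - 1}")
```

With a spacing of 0 the regions touch each other and the pattern degenerates into one contiguous range, which is the contiguous case the benchmark compares against.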
Download
http://users.ece.northwestern.edu/~aching/research_webpage/benchmarks/hpio-1.55.tar.gz
Build
Set mpi_dir in the Makefile to point to your MPI installation.
# Please change mpi_dir to point to your MPI installation
mpi_dir = /usr/local
cc = ${mpi_dir}/bin/mpicc
CFLAGS = -Wall -Wstrict-prototypes -g
EXECUTABLE = hpio hpio-small
Running HPIO
mpirun -np 4 hpio -a 0 -b 111 -n 1001 -m 111111 -O 11 -A 1 -m 01 -o output
fakerth@fakerth-IdeaCentre-GeekPro-17IRB:~$ mpirun -np 4 hpio -a 0 -b 111 -n 1001 -m 111111 -O 11 -A 1 -m 01 -o output
Filling in parameters which have not been specified by the user...
** Initializing filename prefix to 't'
** Initializing default region count = 4096
** Initializing default region size = 8
** Initializing default region spacing = 128
** Initializing default reps = 1
** Initializing default rep max time = 600.000000
** Initializing to use 1 second wait time
** Initializing to use fsync
** Initializing to use INDIVIDUAL fsync method
** Initializing to use VECTOR pattern type
** Initializing use of partial generate file option if necessary
** Initializing to not keep created files that have been read
** Initializing to use different files per run
** Initializing cache size of 2 GBytes
** Initializing verify mode off
** Initializing cache flush off
** Initializing enable resume off
** Initializing to not estimate space
#################### Experimental Settings ####################
procs = 4
dir = (null)
output_dir = output
default: region count = 4096
default: region size = 8
default: region spacing = 128
default: test (check single mode only) = (null)
pattern datatype: = vector
rw: WRITE = 1
rw: READ = 1
bandwidth test: region_count = 1
bandwidth test: region_size = 1
bandwidth test: region_spacing = 1
bandwidth test: single mode = 0
check test: human = 0
check test: defined = 0
check test: single mode = 0
(M) Contig | (F) Contig = 1
(M) NonContig | (F) Contig = 0
(M) Contig | (F) NonContig = 0
(M) NonContig | (F) NonContig = 1
individual I/O = 0
collective I/O = 1
reps = 1
rep maximum time (seconds) = 600.000000
average method = average all
verify = none
enable fsync = 1
fsync method = individual
same file = 0
generate files = partial (before reads)
keep files = 0
enable cache = 0
cache size (MBytes) = 2048
wait time (seconds between runs) = 1.000000
check required space mode = 0
enable resume test mode = 0
atomic mode = 1
hint 0 " romio_cb_write" = enable
hint 1 " romio_cb_read" = enable
###############################################################
write | region_count | c-c | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_count
0.001 | 0.002 | 0.001 | 0.000 | 0.004 | 30.031 | 19.083 | 2048
0.001 | 0.001 | 0.005 | 0.000 | 0.007 |209.046 | 21.108 | 4096
0.001 | 0.001 | 0.005 | 0.000 | 0.007 |367.663 | 41.480 | 8192
0.000 | 0.000 | 0.006 | 0.000 | 0.007 |1288.177 | 75.734 | 16384
0.001 | 0.002 | 0.005 | 0.000 | 0.008 |509.698 |137.893 | 32768
0.001 | 0.002 | 0.005 | 0.000 | 0.008 |996.982 |280.106 | 65536
0.000 | 0.003 | 0.006 | 0.000 | 0.009 |1290.257 |464.627 | 131072
0.000 | 0.005 | 0.005 | 0.000 | 0.010 |1696.382 |831.337 | 262144
0.000 | 0.010 | 0.008 | 0.000 | 0.018 |1619.774 |902.425 | 524288
0.001 | 0.017 | 0.014 | 0.000 | 0.032 |1937.882 |1051.525 | 1048576
write | region_count | nc-nc | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_count
0.001 | 0.005 | 0.002 | 0.000 | 0.008 | 11.698 | 8.611 | 2048
0.001 | 0.007 | 0.003 | 0.000 | 0.011 | 17.232 | 12.424 | 4096
0.001 | 0.010 | 0.004 | 0.000 | 0.015 | 24.171 | 18.018 | 8192
0.001 | 0.018 | 0.008 | 0.000 | 0.027 | 28.147 | 19.743 | 16384
0.002 | 0.032 | 0.009 | 0.000 | 0.043 | 31.317 | 24.340 | 32768
0.002 | 0.063 | 0.011 | 0.000 | 0.076 | 31.573 | 26.740 | 65536
0.002 | 0.129 | 0.028 | 0.000 | 0.160 | 31.056 | 25.437 | 131072
0.006 | 0.255 | 0.050 | 0.000 | 0.311 | 31.346 | 26.228 | 262144
0.010 | 0.581 | 0.087 | 0.000 | 0.679 | 27.528 | 23.929 | 524288
0.080 | 1.484 | 0.151 | 0.000 | 1.715 | 21.557 | 19.572 | 1048576
write | region_size | c-c | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_size
0.000 | 0.000 | 0.005 | 0.000 | 0.006 |948.080 | 22.560 | 8
0.001 | 0.001 | 0.006 | 0.000 | 0.007 |327.680 | 38.724 | 16
0.000 | 0.000 | 0.006 | 0.000 | 0.006 |1199.058 | 84.004 | 32
0.001 | 0.001 | 0.005 | 0.000 | 0.007 |791.229 |150.533 | 64
0.000 | 0.002 | 0.005 | 0.000 | 0.007 |1277.194 |294.203 | 128
0.004 | 0.005 | 0.002 | 0.000 | 0.012 |730.747 |505.886 | 256
0.001 | 0.007 | 0.004 | 0.000 | 0.012 |1079.337 |689.895 | 512
0.001 | 0.016 | 0.007 | 0.000 | 0.025 |1021.770 |693.059 | 1024
0.001 | 0.034 | 0.012 | 0.000 | 0.047 |931.666 |696.909 | 2048
0.000 | 0.035 | 0.027 | 0.000 | 0.063 |1811.697 |1027.107 | 4096
write | region_size | nc-nc | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_size
0.000 | 0.004 | 0.003 | 0.000 | 0.007 | 32.619 | 19.165 | 8
0.001 | 0.007 | 0.005 | 0.000 | 0.014 | 36.824 | 20.368 | 16
0.001 | 0.011 | 0.003 | 0.000 | 0.016 | 44.075 | 34.717 | 32
0.001 | 0.006 | 0.004 | 0.000 | 0.010 |177.342 |105.653 | 64
0.001 | 0.008 | 0.002 | 0.000 | 0.011 |260.314 |198.632 | 128
0.001 | 0.007 | 0.005 | 0.000 | 0.013 |571.178 |321.877 | 256
0.000 | 0.009 | 0.004 | 0.000 | 0.014 |875.660 |611.816 | 512
0.000 | 0.028 | 0.006 | 0.000 | 0.034 |578.221 |474.694 | 1024
0.000 | 0.024 | 0.012 | 0.000 | 0.036 |1355.981 |899.305 | 2048
0.000 | 0.048 | 0.032 | 0.000 | 0.080 |1345.298 |805.102 | 4096
write | region_spacing | c-c | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_spacing
0.000 | 0.000 | 0.001 | 0.000 | 0.002 |1115.506 | 91.117 | 8
0.001 | 0.000 | 0.005 | 0.000 | 0.006 |510.008 | 23.430 | 16
0.001 | 0.000 | 0.005 | 0.000 | 0.006 |490.447 | 23.684 | 32
0.001 | 0.000 | 0.001 | 0.000 | 0.003 |260.970 | 74.399 | 64
0.012 | 0.000 | 0.001 | 0.000 | 0.013 |571.120 |102.300 | 128
0.000 | 0.000 | 0.001 | 0.000 | 0.002 |675.629 |100.728 | 256
0.002 | 0.000 | 0.005 | 0.000 | 0.007 |432.224 | 23.143 | 512
0.001 | 0.000 | 0.001 | 0.000 | 0.002 |615.362 | 76.965 | 1024
0.001 | 0.000 | 0.005 | 0.000 | 0.007 |494.145 | 23.166 | 2048
0.001 | 0.000 | 0.005 | 0.000 | 0.006 |504.123 | 23.519 | 4096
write | region_spacing | nc-nc | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_spacing
0.001 | 0.007 | 0.001 | 0.000 | 0.009 | 18.977 | 16.394 | 8
0.001 | 0.006 | 0.002 | 0.000 | 0.009 | 19.605 | 15.789 | 16
0.001 | 0.007 | 0.001 | 0.000 | 0.009 | 17.947 | 15.302 | 32
0.001 | 0.007 | 0.002 | 0.000 | 0.010 | 17.456 | 14.313 | 64
0.001 | 0.009 | 0.002 | 0.000 | 0.012 | 13.376 | 11.473 | 128
0.001 | 0.008 | 0.002 | 0.000 | 0.011 | 16.225 | 12.409 | 256
0.001 | 0.012 | 0.003 | 0.000 | 0.016 | 10.595 | 8.255 | 512
0.001 | 0.015 | 0.006 | 0.000 | 0.022 | 8.348 | 5.920 | 1024
0.001 | 0.022 | 0.016 | 0.000 | 0.039 | 5.569 | 3.292 | 2048
0.000 | 0.036 | 0.028 | 0.000 | 0.065 | 3.425 | 1.939 | 4096
read | region_count | c-c | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_count
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 15.364 | 15.342 | 2048
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 29.818 | 29.691 | 4096
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 56.585 | 56.497 | 8192
0.001 | 0.005 | 0.000 | 0.000 | 0.006 |105.416 |105.126 | 16384
0.001 | 0.005 | 0.000 | 0.000 | 0.006 |195.311 |195.120 | 32768
0.001 | 0.002 | 0.000 | 0.000 | 0.003 |813.954 |813.007 | 65536
0.001 | 0.007 | 0.000 | 0.000 | 0.008 |575.469 |574.464 | 131072
0.001 | 0.011 | 0.000 | 0.000 | 0.011 |760.682 |760.389 | 262144
0.002 | 0.017 | 0.000 | 0.000 | 0.019 |915.237 |914.913 | 524288
0.001 | 0.030 | 0.000 | 0.000 | 0.031 |1073.570 |1073.355 | 1048576
read | region_count | nc-nc | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_count
0.005 | 0.003 | 0.000 | 0.000 | 0.007 | 23.242 | 23.234 | 2048
0.000 | 0.008 | 0.000 | 0.000 | 0.009 | 14.802 | 14.644 | 4096
0.001 | 0.008 | 0.000 | 0.000 | 0.010 | 29.947 | 29.284 | 8192
0.005 | 0.013 | 0.005 | 0.000 | 0.024 | 38.058 | 27.130 | 16384
0.005 | 0.028 | 0.000 | 0.000 | 0.033 | 36.296 | 35.680 | 32768
0.006 | 0.058 | 0.000 | 0.000 | 0.064 | 34.530 | 34.517 | 65536
0.006 | 0.126 | 0.002 | 0.000 | 0.135 | 31.779 | 31.177 | 131072
0.006 | 0.245 | 0.006 | 0.000 | 0.257 | 32.652 | 31.817 | 262144
0.009 | 0.558 | 0.000 | 0.000 | 0.568 | 28.670 | 28.670 | 524288
0.019 | 1.289 | 0.010 | 0.000 | 1.317 | 24.829 | 24.647 | 1048576
read | region_size | c-c | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_size
0.000 | 0.004 | 0.000 | 0.000 | 0.005 | 29.968 | 29.932 | 8
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 59.255 | 58.836 | 16
0.007 | 0.005 | 0.000 | 0.000 | 0.012 |105.506 |105.332 | 32
0.000 | 0.001 | 0.000 | 0.000 | 0.002 |729.825 |728.810 | 64
0.001 | 0.007 | 0.000 | 0.000 | 0.008 |289.432 |289.093 | 128
0.000 | 0.005 | 0.000 | 0.000 | 0.006 |738.401 |736.909 | 256
0.000 | 0.010 | 0.000 | 0.000 | 0.010 |841.048 |809.223 | 512
0.001 | 0.017 | 0.003 | 0.000 | 0.022 |925.920 |778.092 | 1024
0.001 | 0.032 | 0.000 | 0.000 | 0.034 |987.505 |987.164 | 2048
0.001 | 0.056 | 0.000 | 0.000 | 0.057 |1138.852 |1138.712 | 4096
read | region_size | nc-nc | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_size
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 31.102 | 30.399 | 8
0.001 | 0.005 | 0.000 | 0.000 | 0.006 | 54.645 | 52.977 | 16
0.001 | 0.004 | 0.000 | 0.000 | 0.005 |127.455 |122.971 | 32
0.001 | 0.002 | 0.001 | 0.000 | 0.004 |487.823 |288.189 | 64
0.001 | 0.005 | 0.002 | 0.000 | 0.008 |370.997 |284.697 | 128
0.005 | 0.007 | 0.000 | 0.000 | 0.012 |553.174 |552.718 | 256
0.005 | 0.009 | 0.000 | 0.000 | 0.014 |852.869 |852.241 | 512
0.001 | 0.015 | 0.000 | 0.000 | 0.016 |1081.006 |1080.414 | 1024
0.005 | 0.032 | 0.000 | 0.000 | 0.037 |993.609 |993.484 | 2048
0.005 | 0.070 | 0.000 | 0.000 | 0.075 |920.466 |917.722 | 4096
read | region_spacing | c-c | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_spacing
0.000 | 0.001 | 0.000 | 0.000 | 0.001 |198.069 |192.541 | 8
0.000 | 0.000 | 0.000 | 0.000 | 0.001 |376.373 |374.224 | 16
0.001 | 0.001 | 0.000 | 0.000 | 0.002 |181.981 |172.180 | 32
0.000 | 0.004 | 0.000 | 0.000 | 0.005 | 31.275 | 31.133 | 64
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 29.691 | 29.496 | 128
0.001 | 0.001 | 0.000 | 0.000 | 0.002 |231.066 |228.050 | 256
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 28.823 | 28.802 | 512
0.001 | 0.004 | 0.000 | 0.000 | 0.005 | 29.940 | 29.911 | 1024
0.000 | 0.000 | 0.000 | 0.000 | 0.001 |416.763 |413.803 | 2048
0.000 | 0.004 | 0.000 | 0.000 | 0.005 | 28.160 | 28.071 | 4096
read | region_spacing | nc-nc | collective
----------------time (seconds)--------------|-bandwidth (MB/s)|---parameter---
open | io | sync | close | total | IO | IOsyn | region_spacing
0.001 | 0.003 | 0.000 | 0.000 | 0.004 | 41.105 | 41.066 | 8
0.000 | 0.003 | 0.000 | 0.000 | 0.004 | 40.902 | 39.949 | 16
0.001 | 0.006 | 0.000 | 0.000 | 0.006 | 22.559 | 21.811 | 32
0.000 | 0.003 | 0.000 | 0.000 | 0.004 | 35.859 | 35.142 | 64
0.000 | 0.003 | 0.000 | 0.000 | 0.004 | 38.216 | 37.380 | 128
0.000 | 0.005 | 0.000 | 0.000 | 0.005 | 27.510 | 27.015 | 256
0.001 | 0.014 | 0.000 | 0.000 | 0.016 | 8.627 | 8.543 | 512
0.001 | 0.012 | 0.000 | 0.000 | 0.012 | 10.852 | 10.849 | 1024
0.001 | 0.013 | 0.000 | 0.000 | 0.013 | 9.978 | 9.975 | 2048
0.001 | 0.019 | 0.000 | 0.000 | 0.020 | 6.538 | 6.537 | 4096
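The bandwidth columns in the output above appear to follow directly from the timing columns: total bytes moved by all processes, divided by the io time (IO) or by io plus sync time (IOsyn), expressed in MiB/s. This is an inference from the printed numbers, not something confirmed from the HPIO source. A short sketch that reproduces one row under that assumption:

```python
def bandwidth_mib_s(procs, region_count, region_size, seconds):
    """Aggregate bandwidth in MiB/s, assuming every process moves
    region_count * region_size bytes within the given time."""
    total_bytes = procs * region_count * region_size
    return total_bytes / (1024 * 1024) / seconds


if __name__ == "__main__":
    # Row "write | region_count | nc-nc", region_count = 1048576:
    # io = 1.484 s, sync = 0.151 s, reported IO = 21.557, IOsyn = 19.572
    io_bw = bandwidth_mib_s(4, 1048576, 8, 1.484)
    iosync_bw = bandwidth_mib_s(4, 1048576, 8, 1.484 + 0.151)
    print(round(io_bw, 2), round(iosync_bw, 2))
```

Rows with very short io times will not match as closely, because the printed times are rounded to milliseconds while the benchmark presumably computes bandwidth from unrounded timings.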
S3D
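S3D is a continuum-scale, first-principles direct numerical simulation code that solves the compressible governing equations of mass continuity, momentum, energy, and mass fractions of chemical species, including chemical reactions. This software benchmarks the performance of the PnetCDF method for the I/O kernel of the S3D combustion simulation code. The evaluation uses weak scaling.
Download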
https://github.com/wkliao/S3D-IO
Build
Set the mpif90 and PnetCDF variables in the Makefile:
# Please change the following variables:
# MPIF90 -- MPI Fortran compiler
# FCFLAGS -- Compile flag
# PNETCDF_DIR -- PnetCDF library installation directory
#
MPIF90 = mpif90
FCFLAGS = -Wall -g
PNETCDF_DIR = $(HOME)/PnetCDF
Running S3D
mpiexec -n 4 ./s3d_io.x 10 10 10 2 2 1 1 F .
mpiexec -l -n 4096 ./s3d_io.x 800 800 800 16 16 16 1 F /scratch1/scratchdirs/wkliao/FS_1M_96
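Both commands above are consistent with reading the first three arguments as the global grid size and the next three as the process grid, whose product equals the `-n` process count (2 x 2 x 1 = 4 and 16 x 16 x 16 = 4096). Treat that interpretation as an assumption and check it against the S3D-IO README. Under it, a quick sanity check before submitting a job could look like this (the function name `check_s3d_args` is hypothetical):

```python
def check_s3d_args(nprocs, args):
    """Validate s3d_io.x arguments under the assumed meaning:
    args[0:3] = global grid (nx, ny, nz), args[3:6] = process grid (npx, npy, npz).
    """
    nx, ny, nz, npx, npy, npz = (int(a) for a in args[:6])
    if npx * npy * npz != nprocs:
        raise ValueError(
            f"process grid {npx}x{npy}x{npz} does not match -n {nprocs}")
    # Sub-block each process would own if the global grid is split evenly
    return (nx // npx, ny // npy, nz // npz)


if __name__ == "__main__":
    print(check_s3d_args(4, "10 10 10 2 2 1 1 F .".split()))
    print(check_s3d_args(4096, "800 800 800 16 16 16 1 F /scratch".split()))
```

The last three arguments (here `1`, `F`, and an output directory) are passed through untouched because their meaning is not described in this document.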
BTIO
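BTIO presents a block-tridiagonal partitioning pattern on a three-dimensional array across a square number of MPI processes. Each process is responsible for multiple Cartesian subsets of the entire data set, and the number of subsets grows with the square root of the number of processes participating in the computation. A single global array with an unlimited dimension is created as a netCDF record variable in the output file. The array has five dimensions and is partitioned among processes only along the middle three. Each record is a subarray of the four least significant dimensions. The number of records to write and read is user adjustable. All records are written consecutively and in parallel to a shared file by appending one record after another. The array variable is stored in the file in canonical row-major order. To measure read performance, the global variable is later read back using the same data partitioning pattern. The size of the global array can also be adjusted in the input parameter file "inputbt.data".
This software benchmarks the performance of the PnetCDF and MPI-IO methods for the I/O pattern used by the NASA NAS Parallel Benchmarks (NPB) suite ( http://www.nas.nasa.gov/publications/npb.html ). The evaluation uses strong scaling.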
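Because BTIO needs a square number of MPI processes, it is worth checking the process count before launching. The sketch below does only that and returns the square root, which is the quantity the number of per-process subsets scales with according to the description above. The function name `process_grid_side` is hypothetical and is not part of BTIO.

```python
import math


def process_grid_side(nprocs):
    """Return sqrt(nprocs) if nprocs is a perfect square, else raise ValueError."""
    if nprocs < 1:
        raise ValueError("process count must be positive")
    side = math.isqrt(nprocs)          # integer square root, rounds down
    if side * side != nprocs:
        raise ValueError(f"{nprocs} is not a square number of processes")
    return side


if __name__ == "__main__":
    print(process_grid_side(1024))     # 1024 processes, as in the run commands
```

Counts such as 1000 or 2048 are rejected, so the check can sit in a job script in front of the `mpiexec` call.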
Download
https://github.com/wkliao/BTIO
Build
Set the mpif90 and PnetCDF variables in the Makefile:
# Please change the following variables:
# MPIF90 -- MPI Fortran compiler
# FCFLAGS -- Compile flag
# PNETCDF_DIR -- PnetCDF library installation directory
#
MPIF90 = mpif90
FCFLAGS = -Wall -g
PNETCDF_DIR = $(HOME)/PnetCDF
Running BTIO
mpiexec -n 1024 ./btio
mpiexec -n 1024 ./btio inputbt.data
Github
https://github.com/fakerst/application