Bryce Allen
1d779fd510
update comment
2022-10-29 13:44:38 +00:00
Bryce Allen
b2ed53adc1
switch boundary exchange / stencil direction
...
Contiguous staging vectors are required for multi-d exchange
when a non-outermost dimension is exchanged. The previous
version was exchanging y, the outermost dimension, and the
data was already contiguous.
2022-10-29 12:41:12 +00:00
Bryce Allen
936f0851c8
fixes for sycl port
2022-10-28 17:49:01 +00:00
Bryce Allen
55bb0d26d1
WIP add sycl port of stencil2d
2022-10-25 22:41:45 +00:00
Bryce Allen
e5e3ca178a
more precision when printing timings
2022-10-24 18:07:38 -04:00
Bryce Allen
124654b576
update gt daxpy example for new gt-blas handle api
2022-10-24 18:01:22 -04:00
Bryce Allen
9fb70b5169
print n_iter and n_warmup
2022-10-24 18:55:02 +00:00
Bryce Allen
37a97f24dd
add iteration loop
2022-10-24 18:43:12 +00:00
Bryce Allen
2309afb2ab
print stage_host
2022-10-24 18:12:09 +00:00
Bryce Allen
7c332265d9
optional stage via host
2022-10-24 12:15:47 -04:00
Bryce Allen
c9a375df4a
check that nmpi divides n_global
2022-10-24 11:32:49 -04:00
Bryce Allen
88e2d23c7f
fix physical boundary for rank 0, comments
2022-10-24 08:36:41 -05:00
Bryce Allen
4dc1ad4603
add clang-format conf from gtensor
2022-10-24 08:26:00 -05:00
Bryce Allen
35860709f3
fix send/recv size
2022-10-23 17:24:29 -07:00
Bryce Allen
6e98c0c5a4
add example of 2d array with noncontiguous 1d stencil
2022-10-23 23:55:42 +00:00
Bryce Allen
baff75c6b1
add timer for exchange
2022-10-23 22:22:01 +00:00
Bryce Allen
849e894109
remove unneeded syncs
2022-10-23 20:16:18 +00:00
Bryce Allen
4143c5f06f
add n_global arg, print sizes in rank 0
2022-10-23 18:51:17 +00:00
Bryce Allen
74bfc20d50
remove fmt dependency, public oneapi won't build it
2022-10-23 13:27:16 -05:00
Bryce Allen
23d882d089
add 1d stencil example
2022-10-23 13:07:29 -05:00
Bryce Allen
349837e9c7
fix mpi init/set device order
2021-07-17 14:23:50 +00:00
Bryce Allen
df5f830a26
gt and cmake fixes
2021-07-16 22:07:00 -04:00
Bryce Allen
d791b81cb6
add gt port of mpi_daxpy
2021-07-16 21:36:50 -04:00
Bryce Allen
2434b39b53
add mpigatherinplace example for reproducing pmpi wrapper bug
2020-09-02 18:42:48 -04:00
Bryce Allen
7a1d10349e
Use MPI_IN_PLACE in one of the allgathers
...
Try to reproduce the nsys segfault seen when running GENE, where
the backtrace for the segfault shows an in-place allgather.
2020-09-02 16:34:37 -04:00
Bryce Allen
cff437eace
barrier off by default
2020-08-11 15:35:17 +00:00
Bryce Allen
cd6e6f7eb5
add jlse runners, more flexible node counter
2020-08-11 15:34:46 +00:00
Bryce Allen
12d76b4a42
update ignores
2020-08-11 10:23:33 -04:00
Bryce Allen
909f8880de
add mpi barrier before allgather
2020-08-10 11:37:45 -04:00
Bryce Allen
924b721ad7
fix run script arg order in summit job script
2020-08-10 11:33:00 -04:00
Bryce Allen
02b31f0427
hacky multi-node support
...
assumes 6 procs per node
2020-08-07 18:50:39 -04:00
Bryce Allen
c32b86422f
distribute total across ranks
...
useful for testing with < 6 ranks per node
2020-08-07 18:50:39 -04:00
Bryce Allen
538c22a22f
add avg script for parsing timings in *.txt
2020-08-07 18:07:56 -04:00
Bryce Allen
37ad5e87ce
use define to switch between managed/unmanaged
2020-08-07 14:14:56 -04:00
Bryce Allen
6940ce7ceb
add mem free print, fit in 8GB gpu
2020-08-07 13:21:22 -04:00
Bryce Allen
3ebd09725e
add mpi wtime counters, fix make clean
2020-08-07 13:05:34 -04:00
Bryce Allen
3dd6045f2e
move finalize to outside profiler area
2020-08-07 13:02:07 -04:00
Bryce Allen
55af9daa9b
make: fix summit build
2020-08-06 11:13:32 -04:00
Bryce Allen
063e592dcf
fix all* cuda malloc size
2020-08-06 11:13:18 -04:00
Bryce Allen
134c933e86
use managed mem for allgather, cleanup
2020-08-06 10:11:58 -04:00
Bryce Allen
714a96d1ea
update cuda errors for cuda 11
...
deprecated API was removed
2020-08-06 07:42:59 -04:00
Bryce Allen
3e99cf443b
fix allgather recv size
2020-08-06 07:42:46 -04:00
Bryce Allen
4d504dd5b1
add versions with nvtx
2020-08-05 16:45:42 -04:00
Bryce Allen
df9a3a79a8
add env var debugging
2020-03-31 14:33:06 -04:00
Bryce Allen
74b23dff0b
initial version
2020-02-24 17:20:21 -05:00