Bryce Allen | df5f830a26 | gt and cmake fixes | 4 years ago
Bryce Allen | d791b81cb6 | add gt port of mpi_daxpy | 4 years ago
Bryce Allen | 2434b39b53 | add mpigatherinplace example for reproducing pmpi wrapper bug | 5 years ago
Bryce Allen | 7a1d10349e | Use MPI_IN_PLACE in one of the allgathers | 5 years ago
    Try to reproduce nsys segfault seen when running GENE, which
    has an in place allgather as the BT for the segfault.
Bryce Allen | cff437eace | barrier off by default | 5 years ago
Bryce Allen | cd6e6f7eb5 | add jlse runners, more flexible node counter | 5 years ago
Bryce Allen | 12d76b4a42 | update ignores | 5 years ago
Bryce Allen | 909f8880de | add mpi barrier before allgather | 5 years ago
Bryce Allen | 924b721ad7 | fix summit job script run script arg order | 5 years ago
Bryce Allen | 02b31f0427 | hacky multi-node support | 5 years ago
    assumes 6 procs per node
Bryce Allen | c32b86422f | distribute total across ranks | 5 years ago
    useful for test < 6 ranks per node
Bryce Allen | 538c22a22f | add avg script for parsing timings in *.txt | 5 years ago
Bryce Allen | 37ad5e87ce | use define to switch between managed/unmanaged | 5 years ago
Bryce Allen | 6940ce7ceb | add mem free print, fit in 8GB gpu | 5 years ago
Bryce Allen | 3ebd09725e | add mpi wtime counters, fix make clean | 5 years ago
Bryce Allen | 3dd6045f2e | move finalize to outside profiler area | 5 years ago
Bryce Allen | 55af9daa9b | make: fix summit build | 5 years ago
Bryce Allen | 063e592dcf | fix all* cuda malloc size | 5 years ago
Bryce Allen | 134c933e86 | use managed mem for allgather, cleanup | 5 years ago
Bryce Allen | 714a96d1ea | update cuda errors for 11 | 5 years ago
    deprecated API was removed
Bryce Allen | 3e99cf443b | fix allgather recv size | 5 years ago
Bryce Allen | 4d504dd5b1 | add versions with nvtx | 5 years ago
Bryce Allen | df9a3a79a8 | add env var debugging | 6 years ago
Bryce Allen | 74b23dff0b | initial version | 6 years ago