Bryce Allen
|
7a1d10349e
|
Use MPI_IN_PLACE in one of the allgathers
Try to reproduce the nsys segfault seen when running GENE; the
backtrace for that segfault includes an in-place allgather.
|
5 years ago |
Bryce Allen
|
cff437eace
|
barrier off by default
|
5 years ago |
Bryce Allen
|
cd6e6f7eb5
|
add jlse runners, more flexible node counter
|
5 years ago |
Bryce Allen
|
909f8880de
|
add mpi barrier before allgather
|
5 years ago |
Bryce Allen
|
02b31f0427
|
hacky multi-node support
assumes 6 procs per node
|
5 years ago |
Bryce Allen
|
c32b86422f
|
distribute total across ranks
useful for testing with < 6 ranks per node
|
5 years ago |
Bryce Allen
|
37ad5e87ce
|
use define to switch between managed/unmanaged
|
5 years ago |
Bryce Allen
|
6940ce7ceb
|
add mem free print, fit in 8GB gpu
|
5 years ago |
Bryce Allen
|
3ebd09725e
|
add mpi wtime counters, fix make clean
|
5 years ago |
Bryce Allen
|
3dd6045f2e
|
move finalize to outside profiler area
|
5 years ago |
Bryce Allen
|
063e592dcf
|
fix all* cuda malloc size
|
5 years ago |
Bryce Allen
|
134c933e86
|
use managed mem for allgather, cleanup
|
5 years ago |
Bryce Allen
|
3e99cf443b
|
fix allgather recv size
|
5 years ago |
Bryce Allen
|
4d504dd5b1
|
add versions with nvtx
|
5 years ago |