52 Commits (5bcf1382bace425049cb890568aaaa37d696eaa0)
 

Author | SHA1 | Message | Date
Bryce Allen | 5bcf1382ba | gt: parameterize space | 3 years ago
Bryce Allen | 4f82b8359b | sycl: drop cl namespace, debug dev ids | 3 years ago
Bryce Allen | cecb98415e | cmake: use main gtensor | 3 years ago
Bryce Allen | 3e13fee811 | fix sycl oo version, better debug | 3 years ago
Bryce Allen | d960429a83 | add new sycl version using span2d class | 3 years ago
Bryce Allen | 8e43edc21e | use milliseconds for timing | 3 years ago
Bryce Allen | 44d72ad2bb | switch exchange/stencil dim on sycl example | 3 years ago
Bryce Allen | 1d779fd510 | update comment | 3 years ago
Bryce Allen | b2ed53adc1 | switch boundary exchange / stencil direction | 3 years ago
Bryce Allen | 936f0851c8 | fixes for sycl port | 3 years ago
Bryce Allen | 55bb0d26d1 | WIP add sycl port of stencil2d | 3 years ago
Bryce Allen | e5e3ca178a | more precision when printing timings | 3 years ago
Bryce Allen | 124654b576 | update gt daxpy example for new gt-blas handle api | 3 years ago
Bryce Allen | 9fb70b5169 | print n_iter and n_warmup | 3 years ago
Bryce Allen | 37a97f24dd | add iteration loop | 3 years ago
Bryce Allen | 2309afb2ab | print stage_host | 3 years ago
Bryce Allen | 7c332265d9 | optional stage via host | 3 years ago
Bryce Allen | c9a375df4a | check that nmpi divides n_global | 3 years ago
Bryce Allen | 88e2d23c7f | fix physical boundary for rank 0, comments | 3 years ago
Bryce Allen | 4dc1ad4603 | add clang-format conf from gtensor | 3 years ago
Bryce Allen | 35860709f3 | fix send/recv size | 3 years ago
Bryce Allen | 6e98c0c5a4 | add ex 2d array with noncontiguous 1d stencil | 3 years ago
Bryce Allen | baff75c6b1 | add timer for exchange | 3 years ago
Bryce Allen | 849e894109 | remove unneeded syncs | 3 years ago
Bryce Allen | 4143c5f06f | add n_global arg, print sizes in rank 0 | 3 years ago
Bryce Allen | 74bfc20d50 | remove fmt dependency, public oneapi won't build it | 3 years ago
Bryce Allen | 23d882d089 | add 1d stencil example | 3 years ago
Bryce Allen | 349837e9c7 | fix mpi init/set device order | 4 years ago
Bryce Allen | df5f830a26 | gt and cmake fixes | 4 years ago
Bryce Allen | d791b81cb6 | add gt port of mpi_daxpy | 4 years ago
Bryce Allen | 2434b39b53 | add mpigatherinplace example for reproducing pmpi wrapper bug | 5 years ago
Bryce Allen | 7a1d10349e | Use MPI_IN_PLACE in one of the allgathers | 5 years ago
Bryce Allen | cff437eace | barrier off by default | 5 years ago
Bryce Allen | cd6e6f7eb5 | add jlse runners, more flexible node counter | 5 years ago
Bryce Allen | 12d76b4a42 | update ignores | 5 years ago
Bryce Allen | 909f8880de | add mpi barrier before allgather | 5 years ago
Bryce Allen | 924b721ad7 | fix summit job script run script arg order | 5 years ago
Bryce Allen | 02b31f0427 | hacky multi-node support | 5 years ago
Bryce Allen | c32b86422f | distribute total across ranks | 5 years ago
Bryce Allen | 538c22a22f | add avg script for parsing timings in *.txt | 5 years ago
Bryce Allen | 37ad5e87ce | use define to switch between managed/unmanaged | 5 years ago
Bryce Allen | 6940ce7ceb | add mem free print, fit in 8GB gpu | 5 years ago
Bryce Allen | 3ebd09725e | add mpi wtime counters, fix make clean | 5 years ago
Bryce Allen | 3dd6045f2e | move finialize to outside profiler area | 5 years ago
Bryce Allen | 55af9daa9b | make: fix summit build | 5 years ago
Bryce Allen | 063e592dcf | fix all* cuda malloc size | 5 years ago
Bryce Allen | 134c933e86 | use managed mem for allgather, cleanup | 5 years ago
Bryce Allen | 714a96d1ea | update cuda errors for 11 | 5 years ago
Bryce Allen | 3e99cf443b | fix allgather recv size | 5 years ago
Bryce Allen | 4d504dd5b1 | add versions with nvtx | 5 years ago