79332018

Date: 2025-01-06 04:00:49
Score: 0.5

I want to thank everyone who left helpful comments.

I believe I have answers to most of the questions I had previously, and want to write a short summary here.

To map data referenced by pointers embedded in a struct/class such as

struct Dataset {
    int len = 0;
    float *data = nullptr;
    Dataset() {}
    Dataset(int len0, float *data0) {...}
} readonlydata, readwritedata;

the following map clauses on an OpenMP target pragma work with most compilers (gcc 11 to 14, clang 16+):

map(to: readonlydata) map(to: readonlydata.data[0:readonlydata.len]) \
map(tofrom: readwritedata) map(tofrom: readwritedata.data[0:readwritedata.len])

The data pointer must be mapped separately, as an array section, for the buffer it points to to reach the device; mapping the struct alone copies only the pointer value.
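As a minimal sketch of how these clauses fit into a complete target region, the example below scales one buffer into another. The function name `scale_on_device` and the `std::vector` wrappers are assumptions made for illustration; only the `Dataset` struct and the map clauses come from the summary above. If the code is compiled without OpenMP, or no device is available, the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

struct Dataset {
    int len = 0;
    float *data = nullptr;
    Dataset() {}
    Dataset(int len0, float *data0) : len(len0), data(data0) {}
};

// Hypothetical helper: multiplies every input element by `factor`
// inside a target region and returns the result.
std::vector<float> scale_on_device(const std::vector<float> &input, float factor) {
    std::vector<float> source(input);            // mutable copy of the read-only input
    std::vector<float> result(input.size(), 0.0f);
    const int n = static_cast<int>(input.size());

    Dataset readonlydata(n, source.data());
    Dataset readwritedata(n, result.data());
    assert(readonlydata.len == readwritedata.len);

    // Each struct is mapped first, then the buffer behind its pointer is
    // mapped as an array section whose length is read at runtime.
    #pragma omp target teams distribute parallel for \
        map(to: readonlydata) map(to: readonlydata.data[0:readonlydata.len]) \
        map(tofrom: readwritedata) map(tofrom: readwritedata.data[0:readwritedata.len])
    for (int i = 0; i < n; i++) {
        readwritedata.data[i] = readonlydata.data[i] * factor;
    }
    return result;
}
```

Offloading has to be enabled at build time, typically with flags along the lines of `-fopenmp -foffload=nvptx-none` (gcc), `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda` (clang), or `-mp=gpu` (nvc++); the exact flags depend on how the toolchain was installed.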

This was mostly inspired by the "Deep-Copy" OpenACC example shared by Mat Colgrove:

https://developer.download.nvidia.com/assets/pgi-legacy-support/Deep-Copy-Support-in-OpenACC_PGI.pdf

It appears that OpenMP also accepts runtime variables (here, the len member) as the length of the mapped array section.

Based on the OpenMP 5.1 examples, another way to map such nested dynamic data is to use declare mapper(), which applies to the struct type (typedef) rather than to an individual variable:

typedef struct Dataset dataset;
#pragma omp declare mapper(dataset ds) map(ds, ds.data[0:ds.len])

Unfortunately, it appears that the declare mapper() directive is currently not supported in either gcc or nvc.
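For a compiler that does accept the directive, the sketch below shows how the mapper would shorten the target pragma to a single map(tofrom: ds). The macro `USE_DECLARE_MAPPER` and the function `double_in_place` are hypothetical names introduced here; the macro is an opt-in switch, so that without it the code falls back to the explicit two-clause mapping described earlier.

```cpp
#include <cassert>
#include <vector>

struct Dataset {
    int len = 0;
    float *data = nullptr;
};
typedef struct Dataset dataset;

#ifdef USE_DECLARE_MAPPER   // hypothetical opt-in macro, not a standard one
// Default mapper for the type: mapping a dataset also maps its buffer.
#pragma omp declare mapper(dataset ds) map(ds, ds.data[0:ds.len])
#endif

// Hypothetical helper: doubles every element of `values` in a target region.
void double_in_place(std::vector<float> &values) {
    dataset ds;
    ds.len = static_cast<int>(values.size());
    ds.data = values.data();
    const int n = ds.len;
    assert(n == 0 || ds.data != nullptr);

#ifdef USE_DECLARE_MAPPER
    // The declared mapper expands this into the struct plus its data section.
    #pragma omp target teams distribute parallel for map(tofrom: ds)
#else
    // Fallback: map the struct and the pointed-to buffer explicitly.
    #pragma omp target teams distribute parallel for \
        map(tofrom: ds) map(tofrom: ds.data[0:ds.len])
#endif
    for (int i = 0; i < n; i++) {
        ds.data[i] *= 2.0f;
    }
}
```

Both paths are meant to produce the same result; the mapper only moves the deep-copy description from every pragma to one place next to the type.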

As for gcc, clang and nvc, their OpenMP GPU offloading features are uneven in completeness and robustness, and overall buggy.

Among these three compilers, nvc is the most robust and also produces the fastest GPU code after offloading. However, it is only supported on Linux. gcc and clang can build on Mac/Windows, but both produced slow, unoptimized binaries. Of the gcc releases, gcc-12 is relatively the most stable, but its binary is also quite slow. gcc-11 can build my code, but the result does not run properly on some GPUs; gcc-13 and gcc-14 can both build it, but the binaries won't run. I have found a number of regressions that were related to those error messages.

Some commonly seen gcc error messages when building nvptx with gcc-11 to 13

As of now (January 2025), gcc's GPU offloading is still quite buggy and unoptimized. nvc is the quicker way to get the code to build and run.

Reasons:
  • RegEx Blacklisted phrase (1): I want
  • Long answer (-1):
  • Has code block (-0.5):
  • Contains question mark (0.5):
  • Self-answer (0.5):
Posted by: FangQ