79807088

Date: 2025-11-02 11:32:46
Score: 0.5

Figured it out: the problem was the work sizes.

With local_work_size set to NULL, the OpenCL implementation chooses the work-group size itself, and I think in my case it ended up iterating through the seed_ranges in a single process. If you set global_work_size to 28 (the number of cores) and local_work_size to 1, it fully utilises the CPU.

I didn't change the work_dim though.

size_t global = num_seed_ranges; // one work-item per seed range (28 in my case)
size_t local = 1;                // one work-item per work-group
error = clEnqueueNDRangeKernel(
    commands,     // command queue
    ko_part_b,    // kernel
    1,            // work_dim (unchanged)
    NULL,         // global work offset (none)
    &global,      // global work size (28, matching the number of cores)
    &local,       // local work size (1)
    0, NULL, NULL // no wait list, no completion event
);
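
To make the relationship between the two sizes concrete, here is a minimal sketch in plain C (no OpenCL calls). With a global size of 28 and a local size of 1, the launch is split into 28 work-groups of one work-item each. The helper names `work_group_count` and `round_up_global` are hypothetical, not part of the OpenCL API, and the sketch assumes the OpenCL 1.x rule that the global size must be an exact multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups a 1-D launch is split into.
   Assumes local divides global evenly (the OpenCL 1.x requirement). */
size_t work_group_count(size_t global, size_t local)
{
    assert(local > 0 && global % local == 0);
    return global / local;
}

/* Smallest multiple of local that is >= n. Hypothetical helper for
   the case where a local size larger than 1 is wanted but n is not
   a multiple of it. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

For example, `work_group_count(28, 1)` gives 28, and `round_up_global(28, 8)` gives 32. If you round the global size up like this, the kernel needs a bounds check (something like `if (get_global_id(0) >= n) return;`) so the extra work-items do nothing.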

Final Results:

Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: Richard Clubb