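Recently, while using CUDA to accelerate a computation, I found that once the amount of data gets large, the program fails with a "misaligned address" error.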

This can happen when a pointer is not aligned to the boundary the processor requires.
From the CUDA Programming Guide, section 5.3.2:
Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size).
This is what the debugger is trying to tell you: Basically, you shouldn't dereference a pointer pointing to a 32-bit value from an address not aligned at a 32-bit boundary.
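In short: the "misaligned address" error seen in large CUDA computations usually means a pointer is not aligned to the boundary the hardware requires. According to the CUDA Programming Guide, a global memory access compiles to a single instruction only when the data type is 1, 2, 4, 8, or 16 bytes in size and the data is naturally aligned. For a 32-bit value, that means its address must be a multiple of 4 bytes. To fix the problem, make sure every pointer you dereference points to a correctly aligned address; pay particular attention to pointers produced by adding a byte offset to a buffer or by casting to a wider type.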
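As a rough illustration of the "naturally aligned" rule quoted above, here is a minimal host-side C++ sketch that checks whether an address is a multiple of a given size before it is reinterpreted as a wider type. The helper names `is_aligned` and `can_access_as` are made up for this example and are not part of the CUDA API; the same arithmetic could be applied to a device pointer value on the host before launching a kernel.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if address `p` is a multiple of `alignment` bytes.
// `alignment` must be non-zero.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: true if (base + byte_offset) is suitably aligned
// to be read or written as a T.
template <typename T>
bool can_access_as(const void* base, std::size_t byte_offset) {
    const unsigned char* p =
        static_cast<const unsigned char*>(base) + byte_offset;
    return is_aligned(p, alignof(T));
}
```

For example, with a 16-byte-aligned buffer `buf`, `can_access_as<std::uint32_t>(buf, 4)` holds, while `can_access_as<std::uint32_t>(buf, 2)` does not: reading a 32-bit value at byte offset 2 is exactly the kind of access that the error message is complaining about. A typical way to end up there is slicing a byte buffer at an arbitrary offset and casting the result to `int*` or `float*`.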




