Without any library? The closest I can think of is writing PTX assembly by hand and using raw system calls to send it directly to the GPU. That would be quite an adventure.