Your question is missing some details, but assuming I've read it correctly:
I get an error
Couldn't open shared file mapping:..
when running this code, most likely because the tensor is implicitly being copied to shared memory and the second copy does not fit. There is exactly the same error if I call share_memory_()
on this tensor explicitly, for the same reason.
This is correct. You will end up with two tensors: the original one and its copy in shared memory. And as you say, the second one won't fit.
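You can actually observe the extra allocation: share_memory_() gives the tensor a new backing buffer, so its data pointer changes. A minimal sketch (with a small tensor, so both copies fit):

import torch

t = torch.empty(1000)
before = t.data_ptr()

# share_memory_() allocates a shared-memory buffer and copies the data over,
# so for a moment the old and the new storage exist side by side
t.share_memory_()
after = t.data_ptr()

print(before != after)  # should print True: the storage was reallocated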
One approach besides the file-based one could be to use multiprocessing's shared_memory, e.g.:
import torch
import numpy as np
from multiprocessing import shared_memory

tensor_shape = (1024, 1024, 512)
dtype = np.float32
num_elements = int(np.prod(tensor_shape))

# allocate a single shared-memory block big enough for the whole tensor
sh_mem = shared_memory.SharedMemory(create=True, size=num_elements * np.dtype(dtype).itemsize)

# wrap the shared buffer in a NumPy array (no allocation, no copy)
np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)

# create tensor without actually copying data; it views the same buffer
tensor = torch.from_numpy(np_array)
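The point of shared_memory is that another process can attach to the same block by name. A rough sketch of the consumer side (the name, shape and dtype below are placeholders; in practice you would pass the real values over from the creating process):

import torch
import numpy as np
from multiprocessing import shared_memory

# placeholder values: obtain the real ones (sh_mem.name, shape, dtype) from the creator
name = "my_shared_block"
tensor_shape = (1024, 1024, 512)
dtype = np.float32

existing = shared_memory.SharedMemory(name=name)  # attach to the existing block
np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=existing.buf)
tensor = torch.from_numpy(np_array)  # views the very same physical memory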
As further proof that nothing is copied, you can check the base pointer of each:
>>> print(np_array.ctypes.data)
133277195173888
>>> print(tensor.data_ptr())
133277195173888
and they should match up.
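One caveat: the block is not tied to the tensor's lifetime, so release it explicitly once you are done with all views of it:

sh_mem.close()   # each attached process closes its own handle
sh_mem.unlink()  # exactly one process (typically the creator) frees the block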