Apparently, changing the rsp subtraction from 64 to 56 makes the code work. That implies the program started with a stack that was not 16-byte aligned (rsp was 8 bytes off a 16-byte boundary). However, I thought that under the x64 ABI the stack is always 16-byte aligned at the very start of a process. Can anyone tell me from their own experience whether the stack is always 16-byte aligned, or always misaligned, at the beginning of a process on Windows? I can't find this information in Microsoft's x64 ABI documentation.
Working version:
BITS 64
DEFAULT REL
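lea rcx, [filename] ; lpFileName (filename: db "D:\Hello.txt", 0)
mov edx, 0x20000000 ; dwDesiredAccess = GENERIC_EXECUTE
mov r8d, 1 ; dwShareMode = FILE_SHARE_READ
xor r9d, r9d ; lpSecurityAttributes = NULL
sub rsp, 56 ;<--- was 64; 32 bytes of shadow space + 3 stack arguments
mov qword [rsp+32], 2 ; dwCreationDisposition = CREATE_ALWAYS
mov qword [rsp+40], 0x80 ; dwFlagsAndAttributes = FILE_ATTRIBUTE_NORMAL
mov qword [rsp+48], 0 ; hTemplateFile = NULL
call [CreateFileA]
add rsp, 56 ;<--- was 64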
wat: jmp wat
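
To make the arithmetic behind that inference explicit, here is a minimal Python sketch. It assumes only that rsp must be a multiple of 16 at the call instruction, which is what the x64 calling convention requires. The two entry addresses are made-up values, and only their remainder modulo 16 matters. The helper name is hypothetical.

```python
def aligned_before_call(rsp_at_entry: int, reserved: int) -> bool:
    """Return True if rsp is a multiple of 16 after `sub rsp, reserved`."""
    return (rsp_at_entry - reserved) % 16 == 0


# Hypothetical entry values, chosen only for their low 4 bits.
ENTRY_ALIGNED = 0x14FF00   # a multiple of 16
ENTRY_OFF_BY_8 = 0x14FF08  # 8 bytes past a multiple of 16

for name, rsp in (("aligned", ENTRY_ALIGNED), ("off by 8", ENTRY_OFF_BY_8)):
    for reserved in (56, 64):
        ok = aligned_before_call(rsp, reserved)
        print(f"entry {name}: sub rsp, {reserved} -> aligned at call: {ok}")
```

Under that assumption, subtracting 56 yields an aligned rsp only when the entry value is 8 bytes past a 16-byte boundary, and subtracting 64 yields one only when the entry value is itself a multiple of 16. That is consistent with 56 working and 64 failing here.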