Thanks, both. I had some interesting results when running this through clang with Peter's suggestions:
;;; 16-byte align "numbers"
.p2align 4, 0x0 ; previously 2
numbers: .long 1, 2, 3, 4 ; previously .word
.global _start
.p2align 2 ; reset alignment
_start:
;;; Back up x29 and x30, and move stack pointer
sub sp, sp, #32
stp x29, x30, [sp, #16]
add x29, sp, #16
;;; Load numbers, as Nate has suggested
adrp x8, numbers@page
;;; Slightly different `ldr` approach, using q0
ldr q0, [x8, numbers@pageoff]
;;; Accumulate vector
addv.4s s0, v0
;;; Move 32-bit result to 32-bit GP register
fmov w8, s0
;;; Store 64-bit register counterpart onto the stack for printing
str x8, [sp]
;;; Prime string for printing
adrp x0, format@page
add x0, x0, format@pageoff
;;; Print string
bl _printf
;;; Prepare "return 0" from "int main()"
mov w0, #0
;;; Restore x29, x30, and original stack pointer
ldp x29, x30, [sp, #16]
add sp, sp, #32
;;; "return 0"
ret
format:
.asciz "Answer: %u.\n"
At first, I received the following alignment error from the linker:
ld: 'numbers' from 'assembly.o' at 0x100003F6C not 16-byte aligned, which cannot be encoded as a target of LDR/STR in '_start'+12 from 'assembly.o'
final section layout:
__PAGEZERO addr=0x00000000, size=0x100000000, fileOffset=0x00000000
__TEXT addr=0x100000000, size=0x00004000, fileOffset=0x00000000
__text addr=0x100003f30, size=0x0000005d, fileOffset=0x00003f30
__stubs addr=0x100003f90, size=0x0000000c, fileOffset=0x00003f90
__unwind_info addr=0x100003f9c, size=0x00000060, fileOffset=0x00003f9c
__DATA_CONST addr=0x100004000, size=0x00004000, fileOffset=0x00004000
__got addr=0x100004000, size=0x00000008, fileOffset=0x00004000
__LINKEDIT addr=0x100008000, size=0x00004000, fileOffset=0x00008000
I needed the .p2align 4, 0x0 directive for numbers in order to make it work.
Interesting to see the compiler use ldr q0, ... instead of ld1 { v0.4s }, ... and addv.4s s0, v0 instead of addv s0, v0.4s.
I will need to do some more research into alignment, experiment with other instructions, and look into the choice of x8 over, say, x2 or x3 (avoiding argument registers, maybe?).
Thanks again for your help.