79299446

Date: 2024-12-21 13:07:30
Score: 2

Thanks, both. I had some interesting results when running this through clang with Peter's suggestions applied.

;;; 16-byte align "numbers"
.p2align 4, 0x0               ; previously 2
numbers: .long 1, 2, 3, 4     ; previously .word

.global _start
.p2align 2                    ; reset alignment

_start:
    ;;; Back up x29 and x30, and move stack pointer
    sub sp, sp, #32
    stp x29, x30, [sp, #16]
    add x29, sp, #16

    ;;; Get the 4 KiB page address of numbers, as Nate has suggested
    adrp x8, numbers@page

    ;;; Slightly different `ldr` approach: load all four 32-bit values into q0
    ldr q0, [x8, numbers@pageoff]

    ;;; Accumulate vector
    addv.4s s0, v0

    ;;; Move 32-bit result to 32-bit GP register
    fmov w8, s0

    ;;; Store the 64-bit register counterpart on the stack, where Apple's arm64 ABI expects variadic printf arguments
    str x8, [sp]

    ;;; Load the address of the format string into x0
    adrp x0, format@page
    add x0, x0, format@pageoff

    ;;; Print string
    bl _printf

    ;;; Prepare "return 0" from "int main()"
    mov w0, #0

    ;;; Restore x29, x30, and original stack pointer
    ldp x29, x30, [sp, #16]
    add sp, sp, #32

    ;;; "return 0"
    ret

format:
    .asciz "Answer: %u.\n"

I received the following alignment error from the linker...

ld: 'numbers' from 'assembly.o' at 0x100003F6C not 16-byte aligned, which cannot be encoded as a target of LDR/STR in '_start'+12 from 'assembly.o'
final section layout:
    __PAGEZERO           addr=0x00000000, size=0x100000000, fileOffset=0x00000000
    __TEXT               addr=0x100000000, size=0x00004000, fileOffset=0x00000000
        __text           addr=0x100003f30, size=0x0000005d, fileOffset=0x00003f30
        __stubs          addr=0x100003f90, size=0x0000000c, fileOffset=0x00003f90
        __unwind_info    addr=0x100003f9c, size=0x00000060, fileOffset=0x00003f9c
    __DATA_CONST         addr=0x100004000, size=0x00004000, fileOffset=0x00004000
        __got            addr=0x100004000, size=0x00000008, fileOffset=0x00004000
    __LINKEDIT           addr=0x100008000, size=0x00004000, fileOffset=0x00008000

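..., and needed .p2align 4, 0x0 on numbers to make it link: the ldr q0 form scales its immediate offset by 16, so the linker can only encode a page offset that is a multiple of 16.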
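
To convince myself of the arithmetic behind that error, here is a small Python sketch (not part of the assembly). It assumes the usual A64 rules: adrp/@pageoff split an address on 4 KiB boundaries, and ldr with a q register stores its offset divided by 16. The helper names p2align and ldr_q_imm12 are made up for illustration, and the "aligned" address is just the next multiple of 16, not the address from my working build.

```python
PAGE_SIZE = 4096   # adrp/@pageoff split addresses on 4 KiB boundaries
Q_REG_BYTES = 16   # a q register is 128 bits wide


def p2align(addr: int, power: int) -> int:
    """Round addr up to the next multiple of 2**power, like .p2align does."""
    alignment = 1 << power
    return (addr + alignment - 1) & ~(alignment - 1)


def ldr_q_imm12(addr: int) -> int:
    """Return the scaled immediate for `ldr qN, [xM, sym@pageoff]`.

    Raises ValueError when the page offset is not a multiple of 16,
    which is the situation the linker reported.
    """
    page_offset = addr % PAGE_SIZE
    if page_offset % Q_REG_BYTES != 0:
        raise ValueError(f"page offset {page_offset:#x} is not 16-byte aligned")
    return page_offset // Q_REG_BYTES


reported = 0x100003F6C            # address of numbers from the linker error
print(hex(reported % 16))         # remainder 0xc, so not 16-byte aligned
aligned = p2align(reported, 4)    # illustrative only, not the real fixed address
print(hex(aligned), ldr_q_imm12(aligned))
```

Calling ldr_q_imm12(reported) raises, which mirrors the "cannot be encoded as a target of LDR/STR" message: there is no immediate that represents an offset ending in 0xc for a 16-byte load.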

It was interesting to see the compiler use ldr q0, ... instead of ld1 { v0.4s }, ..., and addv.4s s0, v0 instead of addv s0, v0.4s.

I will need to do some more research into alignment, experiment with other instructions, and look into the choice of x8 over, say, x2 or x3 (avoiding argument registers, maybe?).

Thanks again for your help.

Reasons:
  • Blacklisted phrase (0.5): Thanks
  • Long answer (-1):
  • Has code block (-0.5):
  • Contains question mark (0.5):
  • Self-answer (0.5):
  • Looks like a comment (1):
  • Low reputation (1):
Posted by: ironman