The question is a little unclear, but I think the problem comes down to the points below.
Graph-Level Prediction Needed: The current setup makes node-level predictions, but the task requires a single prediction per graph.
Loss Calculation Misalignment: BCEWithLogitsLoss is applied to the output of every node; the loss should instead be computed on the "correct" node only, or on a graph-level output built by aggregating the node embeddings.
Label-Output Mismatch: Make sure the truth tensor in train() and val() matches the intended output, i.e. a label for the "correct" node only.
Rank Calculation Issue in test(): Rank only the "correct" node (or the relevant nodes), not all nodes.
Pooling Layer for Aggregation: Use a global pooling layer (e.g., global_mean_pool) to combine the node embeddings into a single graph-level embedding.
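
To make the points above concrete, here is a minimal sketch in plain PyTorch. It is not your actual model: the helper names (mean_pool, correct_node_rank), the toy tensors, and the linear head are all made up for illustration. mean_pool is a hand-written stand-in for what global_mean_pool(x, batch) from torch_geometric.nn computes, so the snippet runs without PyTorch Geometric installed. I am also assuming each graph has exactly one "correct" node.

```python
import torch

def mean_pool(x, batch, num_graphs):
    """Average node embeddings per graph (hypothetical stand-in for global_mean_pool)."""
    sums = torch.zeros(num_graphs, x.size(1), dtype=x.dtype).index_add_(0, batch, x)
    counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1).unsqueeze(1)
    return sums / counts

def correct_node_rank(scores, batch, is_correct, num_graphs):
    """1-based rank of the correct node among the nodes of its own graph.

    Assumes exactly one True entry in is_correct per graph.
    """
    ranks = []
    for g in range(num_graphs):
        mask = batch == g
        graph_scores = scores[mask]
        target = graph_scores[is_correct[mask]]  # score of the correct node
        ranks.append(int((graph_scores > target).sum().item()) + 1)
    return ranks

# Toy batch: graph 0 has 3 nodes, graph 1 has 2 nodes (made-up values).
x = torch.tensor([[1., 2.], [3., 4.], [5., 6.], [0., 2.], [4., 0.]])
batch = torch.tensor([0, 0, 0, 1, 1])
num_graphs = 2

# Graph-level embedding: one row per graph instead of one row per node.
pooled = mean_pool(x, batch, num_graphs)

# Graph-level loss: one logit and one label per graph.
torch.manual_seed(0)
head = torch.nn.Linear(2, 1)                 # hypothetical prediction head
graph_logits = head(pooled).squeeze(-1)      # shape [num_graphs]
graph_labels = torch.tensor([1., 0.])        # made-up graph labels
loss = torch.nn.BCEWithLogitsLoss()(graph_logits, graph_labels)

# Ranking: compare the correct node only against nodes of the same graph.
node_scores = torch.tensor([0.2, 1.5, -0.3, 0.9, 0.1])
is_correct = torch.tensor([False, True, False, False, True])
ranks = correct_node_rank(node_scores, batch, is_correct, num_graphs)

print(pooled)
print(ranks)
```

If you stay with node-level scores instead of pooling, the same batch vector is what lets you restrict the loss and the ranking to each graph separately, as correct_node_rank does here.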