I believe your main concern is that your reproduction produced results similar to those in the original article, yet the code might not faithfully implement the method presented in that article. Unfortunately, most of the time there is no easy way around this: you need to understand the paper thoroughly and try to reproduce the authors' full experimental setup (same data splits, same preprocessing, same hyperparameters, etc.). A sign of success would be achieving results within the reported variance, especially when papers report mean ± std over multiple runs. A quick check of that criterion might look like the sketch below.
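This is a minimal sketch with entirely hypothetical numbers: it assumes the paper reports a metric as mean ± std over several seeds, and that you have scores from a few runs of your own.

```python
import statistics

def within_reported_variance(my_scores, reported_mean, reported_std, k=2.0):
    """Check whether the mean of your runs falls within k standard
    deviations of the paper's reported mean."""
    my_mean = statistics.mean(my_scores)
    return abs(my_mean - reported_mean) <= k * reported_std

# Made-up example: suppose the paper reports accuracy 84.3 ± 0.5 over 5 seeds,
# and these are your accuracies over 3 seeds (hypothetical values).
my_runs = [83.9, 84.6, 84.1]
print(within_reported_variance(my_runs, reported_mean=84.3, reported_std=0.5))
```

Passing this check does not prove correctness, but falling far outside the reported band is a strong signal that something in your setup differs from theirs.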
Matching reported metrics (e.g., accuracy, F1, BLEU) is a strong indicator, but not a guarantee that your implementation matches theirs functionally or fairly. So what I suggest is to regenerate the curves and diagrams and compare them to the ones presented in the paper (see the sketch below). I also use an LLM such as GPT to check that my implementation matches the algorithm given in the paper; this check is not authoritative, but I do it as an extra layer of assurance.
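Here is a minimal sketch of such a comparison, with hypothetical data: `my_losses` would come from your own training log, and `paper_points` would be values read off the paper's figure (digitized by hand or with a tool such as WebPlotDigitizer).

```python
import matplotlib.pyplot as plt

# Hypothetical values: your per-epoch training loss...
my_losses = [2.31, 1.80, 1.42, 1.15, 0.97, 0.86]
# ...and a few (epoch -> loss) points digitized from the paper's figure.
paper_points = {1: 2.25, 3: 1.38, 5: 0.95}

plt.plot(range(1, len(my_losses) + 1), my_losses, label="my reproduction")
plt.scatter(list(paper_points), list(paper_points.values()),
            color="red", label="paper (digitized)")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.legend()
plt.savefig("curve_comparison.png")
```

If the shape of your curve diverges from the paper's (e.g., it converges much faster or plateaus at a different level), that often points to a mismatch in preprocessing, learning-rate schedule, or the method itself, even when the final metric happens to agree.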
Finally, consider that a paper's results may be cherry-picked or even forged, so try reproducing several papers rather than dwelling on just one.