I'm using CatBoost on a binary classification dataset, with hyperparameters determined by Optuna. Optuna reports a very high AUC (around 99%) when evaluating on my test file, but when I test the real model on the same data, the AUC is only around 60%. That is far too large a discrepancy for the same data! Something seems fishy here. The only explanation I can come up with is that Optuna determines the hyperparameters for the chosen ML model on that data, but the real ML model trained later with those Optuna-determined hyperparameters doesn't handle the data the same way.

I have tested a lot, and the AUC never goes up, whatever technique I add to the Optuna search to get better hyperparameters (sampling, SMOTE, and other approaches to handle class imbalance). So even using the much-praised Optuna, I don't gain any AUC. Why is that?

Neural networks, given the train and test data, should be able to reproduce the data's pattern at some reasonable confidence level (maybe not 100%, but also not 50 or 60%!). I read somewhere that scikit-learn isn't a capable solution on its own, and that Optuna should therefore be incorporated, built directly into the software (into the NNs themselves), so you get a predefined model whose workflow handles the data in exactly the same way every time. It may be very challenging for developers to do this, but it's the only way to get the end user to the desired end results. Recipes for data mining and machine learning are worth nothing with approaches like these; the end user simply doesn't get the end results.
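For reference, here is a minimal sketch of the kind of workflow I am describing, so the question is concrete. The file name, column names, split sizes, and parameter ranges below are placeholders, not my actual setup:

```python
# Minimal sketch, assuming a pandas DataFrame loaded from a CSV with a
# binary "target" column; names and ranges here are illustrative only.
import optuna
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")  # hypothetical input file
X, y = df.drop(columns=["target"]), df["target"]

# Three-way split: the test set is touched only once, at the very end.
X_tr, X_test, y_tr, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tr, y_tr, test_size=0.25, stratify=y_tr, random_state=42)

def objective(trial):
    params = {
        "iterations": trial.suggest_int("iterations", 200, 1000),
        "depth": trial.suggest_int("depth", 4, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
    }
    model = CatBoostClassifier(**params, eval_metric="AUC",
                               random_seed=42, verbose=0)
    # Early stopping uses the validation set, NOT the final test set.
    model.fit(X_train, y_train, eval_set=(X_val, y_val),
              early_stopping_rounds=50)
    val_pred = model.predict_proba(X_val)[:, 1]
    return roc_auc_score(y_val, val_pred)  # tune against validation AUC

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Retrain the final model with the best params under the SAME conditions,
# then evaluate exactly once on the untouched test set.
final = CatBoostClassifier(**study.best_params, eval_metric="AUC",
                           random_seed=42, verbose=0)
final.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50)
test_pred = final.predict_proba(X_test)[:, 1]
print("Validation AUC (what Optuna saw):", study.best_value)
print("Test AUC (honest estimate):", roc_auc_score(y_test, test_pred))
```

One thing I am trying to rule out: if the objective scores the same data that is later used to report the final AUC, the tuning score is naturally inflated, because Optuna picks whichever trial happens to look best on exactly that data. Keeping the test set out of the objective, as above, is the only way I know to compare the two numbers fairly.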