Generally you should standardize the data before applying PCA. PCA finds new orthogonal directions ("factors", i.e. the principal components) along which the variance is maximized, so the result depends on the units each feature is measured in. In its probabilistic interpretation, PCA also models the data as an N-dimensional Gaussian cloud of points with the same noise variance for all observed features.
Without standardization, a feature can dominate the leading components simply because of its scale. This is not so obvious for the Iris dataset, because all 4 features are measured in cm and have similar scales.
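
To make the scale effect visible, here is a minimal sketch (not taken from the linked examples) that fits PCA on the raw Iris data, on standardized data, and on a copy where I artificially convert sepal width from cm to mm. The unit change and the names `X_mm` and `dominant_feature` are my own illustration, not part of the dataset or of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # columns: sepal length, sepal width, petal length, petal width (cm)


def dominant_feature(pca):
    """Index of the feature with the largest absolute loading on the first component."""
    return int(np.abs(pca.components_[0]).argmax())


# PCA on the raw measurements (PCA centers the data but does not rescale it).
pca_raw = PCA(n_components=2).fit(X)

# PCA on standardized data: every feature has zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)
pca_std = PCA(n_components=2).fit(X_std)

# Hypothetical unit change: express sepal width (column 1) in mm instead of cm.
X_mm = X.copy()
X_mm[:, 1] *= 10
pca_mm = PCA(n_components=2).fit(X_mm)

# Standardizing the rescaled data undoes the unit change.
pca_mm_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X_mm))

print("raw          :", pca_raw.explained_variance_ratio_, dominant_feature(pca_raw))
print("standardized :", pca_std.explained_variance_ratio_, dominant_feature(pca_std))
print("width in mm  :", pca_mm.explained_variance_ratio_, dominant_feature(pca_mm))
```

On the raw data the first component is driven mostly by petal length. After the unit change, sepal width takes over, although the flowers themselves have not changed. With standardization the unit change has no effect on the explained variance ratios.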
Please see https://scikit-learn.org/stable/modules/decomposition.html#factor-analysis for some of the theory behind PCA and Factor Analysis, especially the assumptions about noise variance.
Also, https://scikit-learn.org/stable/auto_examples/decomposition/plot_varimax_fa.html seems to be exactly what you are looking for. Note that all decompositions in this example are performed on standardized data.