79699738

Date: 2025-07-13 05:04:00
Score: 0.5

You’re on the right track, and your intuition about peaks at 5–10 and 35–40 is spot on. I ran your dataset through a KDE using scipy.stats.gaussian_kde, and it picks out both peaks once the bandwidth is tighter than the default (Scott's rule).

Here's the code I used:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

# Data
a = np.array([68, 63, 20, 55, 1, 21, 55, 58, 14, 4, 40, 54, 33, 71, 36, 38, 9, 51, 89, 40, 13, 98, 46, 12, 21, 26, 40, 59, 17, 0, 5, 25, 19, 49, 91, 55, 39, 82, 57, 28, 54, 58, 65, 2, 39, 42, 65, 1, 93, 8, 26, 69, 88, 32, 15, 10, 95, 11, 2, 44, 66, 98, 18, 21, 25, 17, 41, 74, 12, 4, 33, 93, 65, 33, 25, 76, 84, 1, 63, 74, 3, 39, 9, 40, 7, 81, 55, 78, 7, 5, 99, 37, 7, 82, 54, 16, 22, 24, 23, 3])

# Fit KDE using scipy
kde = gaussian_kde(a, bw_method=0.2)
x = np.linspace(0, 100, 1000)
y = kde(x)

# Find all peaks
peaks, properties = find_peaks(y, prominence=0.0005)  # Adjust as needed

# Keep the two tallest peaks (by y value), then order them left to right
top_two_indices = peaks[np.argsort(y[peaks])[-2:]]
top_two_indices = top_two_indices[np.argsort(x[top_two_indices])]  # left to right

# Plot
plt.figure(figsize=(14, 7))
plt.plot(x, y, label='KDE', color='steelblue')
plt.fill_between(x, y, alpha=0.3)

# Annotate top 2 peaks
for i, peak in enumerate(top_two_indices, 1):
    plt.plot(x[peak], y[peak], 'ro')
    plt.text(x[peak], y[peak] + 0.0005,
             f'Peak {i}\n({x[peak]:.1f}, {y[peak]:.3f})',
             ha='center', color='red')

plt.title("Top 2 Peaks in KDE")
plt.xlabel("a")
plt.ylabel("Density")
plt.xticks(np.arange(0, 101, 5))
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()

This plots the KDE curve with the two tallest peaks marked in red and labelled with their coordinates.

A few important notes:

  1. Prominence matters: I used prominence=0.0005 in find_peaks(), which ignores tiny local bumps and keeps only the meaningful peaks. The threshold is in density units, so adjust it if the range or scale of your data changes.

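  2. Bandwidth is everything: The choice of bandwidth (bw_method=0.2 in this case) controls the smoothness of the KDE. A scalar bw_method is a factor that scipy multiplies by the sample standard deviation, not an absolute width. If it's too high, the peaks get smoothed out; too low, and you’ll get noisy fluctuations.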

  3. Automatic bandwidth selection: If you don’t want to hard-code bw_method, you can select the bandwidth by cross-validation. sklearn.model_selection.GridSearchCV with KernelDensity from sklearn.neighbors fits one model per candidate bandwidth and keeps the one with the highest held-out log-likelihood. Note that KernelDensity's bandwidth is an absolute width in data units, unlike scipy's bw_method factor.
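To see notes 1 and 2 in action, here is a minimal sketch that counts the detected peaks for several bw_method values. The helper name kde_peak_positions and the synthetic two-mode sample are assumptions made for illustration; they are not part of the question's data or of the code above.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks


def kde_peak_positions(data, bw, prominence=0.0005, n_grid=1000):
    """Return the x-positions of KDE peaks for a given scalar bw_method.

    Hypothetical helper for illustration only.
    """
    grid = np.linspace(data.min(), data.max(), n_grid)
    density = gaussian_kde(data, bw_method=bw)(grid)
    peaks, _ = find_peaks(density, prominence=prominence)
    return grid[peaks]


# Synthetic bimodal sample (an assumption, not the question's dataset)
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(10, 3, 200), rng.normal(40, 3, 200)])

# Small factors keep more local bumps; large factors merge the modes
for bw in (0.05, 0.2, 0.5, 2.0):
    positions = kde_peak_positions(sample, bw)
    print(f"bw_method={bw}: {len(positions)} peak(s) at {np.round(positions, 1)}")
```

Running the same loop on your own array is a quick way to check how sensitive the "top two peaks" are to the bandwidth before settling on a value.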

For this particular dataset, manually setting bw_method=0.2 works well and reveals the two main peaks you're after (one near 7, the other near 38). For production-level or general-purpose analysis, automatic bandwidth selection via cross-validation makes the approach more adaptive and robust.
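The cross-validation route could look roughly like the sketch below. The function name select_bandwidth, the candidate grid, and the synthetic sample are all assumptions for illustration; the grid in particular should be adapted to the scale of your data. The folds are shuffled because GridSearchCV's default K-fold split does not shuffle, which matters when the data happens to be ordered.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KernelDensity


def select_bandwidth(data, candidates, n_splits=5, seed=0):
    """Pick the candidate bandwidth with the highest held-out log-likelihood.

    Hypothetical helper for illustration only.
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(
        KernelDensity(kernel="gaussian"),
        {"bandwidth": candidates},
        cv=cv,
    )
    # scikit-learn expects a 2-D array of shape (n_samples, n_features)
    search.fit(np.asarray(data, dtype=float).reshape(-1, 1))
    return float(search.best_params_["bandwidth"])


# Synthetic bimodal sample (an assumption, not the question's dataset)
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(10, 3, 200), rng.normal(40, 3, 200)])

h = select_bandwidth(sample, np.linspace(0.5, 10, 20))

# KernelDensity's bandwidth is in data units, while scipy's scalar bw_method
# is a factor applied to the sample standard deviation, so divide to convert.
bw_factor = h / sample.std(ddof=1)
print(f"bandwidth={h:.2f}, equivalent bw_method={bw_factor:.3f}")
```

The converted bw_factor can then be passed as bw_method to gaussian_kde, so the peak-finding code above stays unchanged.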

Reasons:
  • Blacklisted phrase (1): enter image description here
  • Long answer (-1):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: Shinde Aditya