
Date: 2025-04-02 22:18:12

You are on the right path if you are already thinking about optimizing your code. I must point out, however, that writing good-quality code comes at the cost of spending a lot of time learning your tools, in this case the pandas library. This video is how I was introduced to the topic, and personally I believe it helped me a lot.

If I understand correctly, you want to: filter specific crime types, group them by month and sum the occurrences, and finally plot the monthly evolution of each crime type.

Running your code three times back to back I got execution times of 4.4346, 3.6758 and 3.9400 s -> mean 4.0168 s (not counting the time taken to load the dataset; I used time.perf_counter()). The data were taken from the NYPD database (please include your data source when posting questions).

crime_counts is what we call a pivot table: it handles what you did separately for each crime type, while also storing the results in an analysis-friendly pd.DataFrame.
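To make the pivot-table idea concrete, here is a minimal, self-contained sketch using pd.crosstab (which builds the same kind of count table) on made-up data; the column names match your dataset, but the values are invented for illustration:

```python
import pandas as pd

# Toy arrest records (values invented for illustration)
df = pd.DataFrame({
    "ARREST_MONTH": ["2024-01", "2024-01", "2024-02", "2024-02", "2024-02"],
    "OFNS_DESC": ["PETIT LARCENY", "FELONY ASSAULT", "PETIT LARCENY",
                  "PETIT LARCENY", "FELONY ASSAULT"],
})

# One row per month, one column per crime type,
# missing combinations filled with 0
crime_counts = pd.crosstab(df["ARREST_MONTH"], df["OFNS_DESC"])
print(crime_counts)
```

Each cell of crime_counts is then ready to be plotted as one line per column, which is exactly what the full code below does on the real data.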

import time

import matplotlib.pyplot as plt
import pandas as pd

t1 = time.perf_counter()
# Convert the string-based date column to datetime objects
df["ARREST_DATE"] = pd.to_datetime(df["ARREST_DATE"], format='%m/%d/%Y')
# Create a monthly-frequency period column [length = df length]
df["ARREST_MONTH"] = df["ARREST_DATE"].dt.to_period('M')  # no one's stopping you from adding new columns

# Filter the specific crime types
crime_select = ["DANGEROUS DRUGS", "ASSAULT 3 & RELATED OFFENSES", "PETIT LARCENY", "FELONY ASSAULT", "DANGEROUS WEAPONS"]
filtered = df.loc[df["OFNS_DESC"].isin(crime_select), ["ARREST_MONTH", "OFNS_DESC"]]

# Count arrests per (month, crime type) and spread crime types into columns
crime_counts = (filtered
                .groupby(["ARREST_MONTH", "OFNS_DESC"])
                .size()
                .unstack(fill_value=0))  # Converts grouped data into a DataFrame

# Plot results
crime_counts.plot(figsize=(12, 6), title="Monthly Crime Evolution")
plt.xlabel("Arrest Month")
plt.ylabel("Number of Arrests")
plt.legend(title="Crime Type")
plt.grid(True)

t2 = time.perf_counter()
print(f"Time taken to complete operations: {t2 - t1:0.4f} s")

plt.show()

The above code completed three runs in 2.5432, 2.6067 and 2.4947 s -> mean 2.5482 s, which amounts to a ~36.6% reduction in runtime (about a 1.58x speedup).

Note: Did you include the dataset loading time in your execution time measurements? I found that keeping df loaded and timing only the calculation part yields about 3.35 s for your code and 1.85 s for mine.
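If you want to keep loading out of the measurement, you can place a perf_counter() call between the two phases. A sketch, with a tiny in-memory CSV standing in for the real file (the data values are invented):

```python
import io
import time

import pandas as pd

# In-memory CSV standing in for the real NYPD export
csv_data = io.StringIO(
    "ARREST_DATE,OFNS_DESC\n"
    "01/15/2024,PETIT LARCENY\n"
    "02/03/2024,FELONY ASSAULT\n"
)

t0 = time.perf_counter()
df = pd.read_csv(csv_data)                # phase 1: loading only
t1 = time.perf_counter()
df["ARREST_DATE"] = pd.to_datetime(df["ARREST_DATE"], format="%m/%d/%Y")
counts = df["OFNS_DESC"].value_counts()   # phase 2: the actual calculations
t2 = time.perf_counter()

print(f"load: {t1 - t0:0.4f} s, compute: {t2 - t1:0.4f} s")
```

Reporting the two numbers separately makes it much easier to compare the calculation part of two solutions fairly.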

Posted by: John Skounakis