Prophet is a great time-series forecasting library, but it is known to struggle with count data, especially when the values are close to zero. I've encountered this issue frequently in my work, which ultimately led me to develop a new Prophet-inspired library: Gloria.
Gloria addresses this problem by introducing a number of new distributions beyond the normal distribution. For instance, count data can be handled using Poisson, Binomial, Negative Binomial, or Beta-Binomial distributions. The following code block showcases how I would try to treat your data, which is similar to what is shown in the Saturation tutorial:
import pandas as pd
from gloria import Gloria, cast_series_to_kind, CalendricData
# Load the data
data = pd.read_csv("headcounts.csv")
# Save the column names for later use
timestamp_name="Date"
metric_name="Headcount"
# Convert timestamp to datetime
data[timestamp_name] = pd.to_datetime(data[timestamp_name])
# Ensure metric is an unsigned integer
data[metric_name] = cast_series_to_kind(data[metric_name], "u")
# Set up the Gloria model
m = Gloria(
model="binomial",
timestamp_name=timestamp_name,
metric_name=metric_name,
sampling_period="15min",
n_changepoints = 0
)
# Create protocol for calendric data
calendric_protocol = CalendricData(country = "US")
# Add the protocol
m.add_protocol(calendric_protocol)
# Fit the model to the data
m.fit(data, capacity = 180)
# Predict
forecast = m.predict(periods=24 * 60 * 4)
# Plot the results
m.plot(forecast, show_capacity=True)
m.plot_components(forecast)
Some remarks:
With Gloria, you pick the distribution model when instantiating the Gloria object. As your count data are capped at 180, it certainly makes sense to try model="binomial". The capacity is later passed as an argument to the fit() method. If your data show overdispersion (more noise than the binomial model can account for), you can try "beta-binomial" instead.
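A rough way to judge whether overdispersion is an issue is to compare the observed variance with the variance a binomial distribution would imply, n·p·(1−p). The following is only a minimal sketch using the standard library, not part of Gloria; the function name `dispersion_ratio` and the sample values are hypothetical. It assumes the counts were taken under comparable conditions (for example the same weekday and time slot), since otherwise seasonality inflates the ratio.

```python
from statistics import mean, pvariance


def dispersion_ratio(counts, capacity):
    """Observed variance divided by the variance a binomial would imply."""
    p = mean(counts) / capacity            # estimated success probability
    binomial_var = capacity * p * (1 - p)  # n * p * (1 - p)
    return pvariance(counts) / binomial_var


# Hypothetical headcounts from the same weekday and time slot, capacity 180
counts = [88, 102, 61, 120, 95, 70, 131, 84]
ratio = dispersion_ratio(counts, capacity=180)
print(f"dispersion ratio: {ratio:.1f}")
```

A ratio close to 1 is compatible with the binomial model; a ratio well above 1 suggests trying "beta-binomial".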
Gloria is rather strict about enforcing data types. Count data cannot be floats and must be non-negative. Accordingly, they should be unsigned integers. We ensure that using the helper function cast_series_to_kind() (see the docs).
Gloria offers the possibility to handle non-daily data. Actually, you can do that with Prophet too, but its API is a bit inconvenient in this situation. For this reason, you tell the Gloria object which time grid your input data live on using sampling_period="15min".
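Assuming the periods argument of predict() counts steps on this sampling grid, the value 24 * 60 * 4 used in the example corresponds to a 60-day horizon at 15-minute resolution. A small sketch of that arithmetic with the standard library (the helper name `periods_for` is hypothetical):

```python
from datetime import timedelta

sampling_period = timedelta(minutes=15)


def periods_for(horizon, step=sampling_period):
    """Number of whole sampling steps that fit into the forecast horizon."""
    return horizon // step  # timedelta // timedelta gives an int


print(periods_for(timedelta(days=1)))   # 96 steps per day
print(periods_for(timedelta(days=60)))  # 5760 == 24 * 60 * 4
```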
You are using a flat trend. Gloria does not (yet) include that, but you can achieve similar behaviour by turning off changepoints with n_changepoints=0. Your trend function will have a nonzero intercept and growth rate, but no rate changes.
Gloria objects do not include any holidays or seasonalities by default. This design choice was made to prepare Gloria for use cases based on non-calendric data. Instead, we introduced protocols (see API documentation and Calendric Data tutorial). Protocols bundle default events and seasonalities for certain data sources. The CalendricData protocol emulates Prophet's default behaviour and adds weekly and yearly seasonalities as well as country holidays, if you specify the country of interest. Finally, you add the protocol to the model using m.add_protocol(calendric_protocol).
Your data show a drop lasting about a month around December 2023 / January 2024. My code example above is not prepared for this drop, and the fit in this range will be poor. You could model it by allowing the trend to go closer to zero. For that you would need to introduce custom changepoints, as the Modeling Trends tutorial demonstrates, but there is another way: Gloria generalizes Prophet's holiday regressor concept to more general events. You could leverage this system by introducing a Gaussian-shaped pulse event of about a month's length (the length may need to be adapted), adding it to your model using m.add_event(), and setting its anchor time to the center of the drop. The event will act as an envelope on your prediction and will be optimized so as to suppress the weekly oscillations and match the drop in the data. You can model it in the following way:
# Assumption: Gaussian is exported at the top level, like CalendricData
from gloria import Gloria, Gaussian
m = Gloria(...)
# Define the event profile
profile = Gaussian(width="30d") # one month Gaussian drop
# Add event to model
m.add_event(
name="drop",
regressor_type="SingleEvent", # choose the regressor class
profile=profile, # attach the profile
t_anchor="2024-01-01" # anchor time
)
m.fit(...)
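To get a feeling for the shape of such an event, here is a minimal standard-library sketch of a unit-height Gaussian pulse centered on the anchor time. It is not Gloria code: the function `gaussian_pulse` is hypothetical, and treating the width as the standard deviation is an assumption, so Gloria's own parameterization may differ.

```python
import math
from datetime import datetime, timedelta


def gaussian_pulse(t, t_anchor, width):
    """Unit-height Gaussian centered on t_anchor.

    Assumption: width plays the role of the standard deviation.
    """
    z = (t - t_anchor) / width  # timedelta / timedelta gives a float
    return math.exp(-0.5 * z * z)


t_anchor = datetime(2024, 1, 1)
width = timedelta(days=30)

# Pulse strength at a few offsets around the anchor
for offset_days in (-60, -30, 0, 30, 60):
    t = t_anchor + timedelta(days=offset_days)
    print(t.date(), round(gaussian_pulse(t, t_anchor, width), 3))
```

The pulse is strongest at the anchor and fades symmetrically on both sides, which is why the width may need to be adapted to the actual duration of the drop.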
Fitting your data in this way should give you a number of advantages:
Predictions no longer drop below zero.
All predictions and confidence bands are integer values, respecting the lower and upper bounds of your data.
The prediction should better match your observed weekly peaks. Prophet pulls the prediction down towards zero due to the zero-inflated data, so the prediction average fits the data at the expense of the extreme values.
Fitting the drop with an event clearly separates this vacation effect from other patterns.
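The first two points follow from the distribution itself. As an illustration only (plain standard-library Python, not Gloria's internals), the sketch below compares a symmetric normal band with binomial quantiles for a low expected headcount; the values of n and p and the helper `binomial_quantile` are hypothetical.

```python
from math import comb, sqrt


def binomial_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cdf >= q:
            return k
    return n


n, p = 180, 0.01  # capacity 180, low expected headcount (hypothetical)
mean = n * p
sd = sqrt(n * p * (1 - p))

# A symmetric normal band can extend below zero ...
normal_band = (mean - 1.96 * sd, mean + 1.96 * sd)
# ... while binomial quantiles are integers between 0 and n.
binomial_band = (binomial_quantile(0.025, n, p), binomial_quantile(0.975, n, p))

print("normal band:  ", normal_band)
print("binomial band:", binomial_band)
```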
Regards,
Benjamin