Quick Start Guide

The finnts package, commonly referred to as “Finn”, is a standardized time series forecast framework developed by Microsoft Finance. It’s the result of years of effort to perfect a centralized forecasting practice that everyone in finance could leverage. Even though it was built for finance-like forecasts, it can easily be extended to any type of time series forecast.

Finn takes years of hard work and thousands of lines of code and simplifies the forecasting process down to either an AI agent that optimizes a forecast for you or a few function calls that give you total control over the forecast Finn creates. To get the most out of Finn, check out the other vignettes in the package.

library(finnts)

browseVignettes("finnts")

The easiest way to use Finn is through the built-in AI agent. The agent learns about your data and runs experiments to create the most accurate forecast possible. Here’s how to get started with the agent.

1. Bring Data

Data used in Finn needs to follow a few requirements, called out below.

Data is tabular, formatted as a data frame, tibble, or Spark data frame.
Needs a time stamp or date column, formatted as a date and labeled “Date”. The date values need to start at the beginning of the period. For example, each date in a monthly data set should be the first day of its month; in a quarterly data set, the first day of its quarter; and so on.
Contains at least one unique label to identify one time series from another. These are sometimes referred to as “data combos” or “combo variables” in Finn. For example, a monthly forecast by country should have a column with country names to help Finn split out each country into separate time series.
No duplicate rows at the intersection of data combos and date.
Column headers should contain only letters, numbers, and underscores. They should also start with a letter, not a number. These requirements ensure that R/Python handle your data frame correctly without any errors.
External regressors are optional; they’re not required to produce a Finn forecast. To learn more about how to use them, check out the vignette on external regressors.
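The requirements above can be checked before handing data to Finn. Below is a minimal base-R sketch; check_finn_input() is a hypothetical helper, not part of finnts, and it assumes monthly data when it checks that every date falls on the first day of the month.

```r
# Hypothetical helper (not part of finnts): returns a character vector of
# problems found in the input data. Assumes monthly data.
check_finn_input <- function(df, combo_variables, date_col = "Date") {
  problems <- character(0)

  # Column names: letters, numbers, underscores, starting with a letter
  bad_names <- names(df)[!grepl("^[A-Za-z][A-Za-z0-9_]*$", names(df))]
  if (length(bad_names) > 0) {
    problems <- c(problems, paste("invalid column names:", paste(bad_names, collapse = ", ")))
  }

  date_ok <- date_col %in% names(df) && inherits(df[[date_col]], "Date")
  if (!date_ok) {
    problems <- c(problems, paste0("missing '", date_col, "' column of class Date"))
  } else {
    # Monthly data: every date should be the first day of its month
    if (any(format(df[[date_col]], "%d") != "01")) {
      problems <- c(problems, "dates not at the start of the month")
    }
    # No duplicate rows at the intersection of combo variables and date
    if (anyDuplicated(df[c(combo_variables, date_col)]) > 0) {
      problems <- c(problems, "duplicate combo/date rows")
    }
  }

  problems
}

example <- data.frame(
  id = c("A", "A", "B", "B"),
  Date = as.Date(c("2024-01-01", "2024-02-01", "2024-01-01", "2024-01-15")),
  value = c(10, 12, 7, 8)
)

print(check_finn_input(example, combo_variables = "id"))

# Flooring dates to the month start fixes the date rule but creates a
# duplicate for series "B", which would then need to be aggregated.
fixed <- example
fixed$Date <- as.Date(format(fixed$Date, "%Y-%m-01"))
print(check_finn_input(fixed, combo_variables = "id"))
```

The second check shows why the date rule and the duplicate rule interact: snapping mid-month dates to the period start can collapse two rows onto the same combo/date pair.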

A good example to use when producing your first Finn forecast is to leverage existing data examples from the timetk package. Let’s take a monthly example and trim it down to speed up the run time of your first Finn forecast.

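library(finnts)

# trim the timetk monthly example to 2013 onward and rename the date column to "Date"
hist_data <- timetk::m4_monthly %>%
  dplyr::filter(date >= as.Date("2013-01-01")) %>%
  dplyr::rename(Date = date) %>%
  dplyr::mutate(id = as.character(id)) # store the combo variable as character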

print(hist_data)
#> # A tibble: 120 × 3
#>    id    Date       value
#>    <chr> <date>     <dbl>
#>  1 M1    2013-01-01  9120
#>  2 M1    2013-02-01  8280
#>  3 M1    2013-03-01  7860
#>  4 M1    2013-04-01  7150
#>  5 M1    2013-05-01  8110
#>  6 M1    2013-06-01 10860
#>  7 M1    2013-07-01 10730
#>  8 M1    2013-08-01  9610
#>  9 M1    2013-09-01  8270
#> 10 M1    2013-10-01  9200
#> # ℹ 110 more rows

print(unique(hist_data$id))
#> [1] "M1"    "M2"    "M750"  "M1000"

The above data set contains 4 individual time series, identified using the “id” column.
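Before choosing a hist_end_date, it can help to confirm each series’ length and date range. This sketch rebuilds the same trimmed data with the native pipe (R 4.1 or later) and summarises it with dplyr; the summary column names n_obs, first_date, and last_date are illustrative.

```r
library(dplyr)

# Rebuild the trimmed example data (same steps as above)
hist_data <- timetk::m4_monthly |>
  filter(date >= as.Date("2013-01-01")) |>
  rename(Date = date) |>
  mutate(id = as.character(id))

# One row per time series: number of observations and date range
series_summary <- hist_data |>
  group_by(id) |>
  summarise(
    n_obs = n(),
    first_date = min(Date),
    last_date = max(Date),
    .groups = "drop"
  )

print(series_summary)
```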

2. Set Up Agent

Before any forecasts are created, we need to set up the agent. This is done by creating a new project with set_project_info() and then initializing the agent with set_agent_info().

To use the agent, we must first connect it to an LLM using the ellmer package.

# connect to an LLM via Azure OpenAI
driver_llm <- ellmer::chat_azure_openai(model = "gpt-4o-mini")
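ellmer’s Azure OpenAI provider needs your Azure endpoint and credentials. If you authenticate with an API key, a quick pre-flight check can catch missing settings before the call above fails. This sketch assumes the AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY environment variable names; confirm them against the ellmer documentation for your version and authentication method. missing_azure_vars() is a hypothetical helper.

```r
# Hypothetical pre-flight check (not part of finnts or ellmer). The variable
# names below are assumptions about what ellmer's Azure provider reads.
missing_azure_vars <- function(vars = c("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")) {
  # Sys.getenv() returns "" for unset variables
  vars[Sys.getenv(vars) == ""]
}

missing <- missing_azure_vars()
if (length(missing) > 0) {
  message("Set these before calling ellmer::chat_azure_openai(): ",
          paste(missing, collapse = ", "))
}
```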

Now we can use that LLM inside of our agent when setting it up.

# set up new forecast project and agent run
project <- set_project_info(
  project_name = "Demo_Project",
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month"
)

agent <- set_agent_info(
  project_info = project,
  driver_llm = driver_llm,
  input_data = hist_data,
  forecast_horizon = 12,
  hist_end_date = as.Date("2014-12-01")
)
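Together, hist_end_date and forecast_horizon determine which future months this run forecasts. Assuming the forecast starts one period after hist_end_date, the window can be listed with base R. Note that hist_data itself extends past 2014-12-01 (the update step later reuses it with hist_end_date = 2015-06-01), so hist_end_date, not a pre-filtered data set, appears to mark where history ends in this run.

```r
# Sketch (base R): future periods covered by a 12-month horizon when
# history ends on 2014-12-01, assuming the forecast starts one period later.
hist_end_date <- as.Date("2014-12-01")
forecast_horizon <- 12

# Build hist_end_date plus the next 12 month starts, then drop hist_end_date
future_dates <- seq(hist_end_date, by = "month", length.out = forecast_horizon + 1)[-1]
print(future_dates)
```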

3. Iterate Forecast

Once the agent is set up with data, we can have it automatically search for the most accurate forecast. The agent tries various combinations of feature engineering techniques, ML algorithms, and other settings to create the best forecast. We can constrain how many iterations it can try and what accuracy goal it should strive to beat. Let’s have it run at most 3 forecast iterations with an accuracy goal of a 3% weighted MAPE.

iterate_forecast(
  agent_info = agent,
  max_iter = 3,
  weighted_mape_goal = 0.03
)
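To make the 3% goal concrete, here is a worked example of one common volume-weighted MAPE: each period’s absolute percentage error is weighted by that period’s share of total actuals, which simplifies to sum(|actual - forecast|) / sum(|actual|). This is an assumption for illustration; Finn’s internal formula may differ in detail.

```r
# Sketch of a volume-weighted MAPE (assumed definition, not necessarily Finn's)
weighted_mape <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast))
  sum(abs(actual - forecast)) / sum(abs(actual))
}

actual   <- c(100, 200, 300)
forecast <- c(110, 190, 300)

wmape <- weighted_mape(actual, forecast)
print(wmape)          # 20 / 600, about 0.0333
print(wmape <= 0.03)  # FALSE: this forecast would miss a 3% goal

# Unweighted MAPE for comparison: (0.10 + 0.05 + 0) / 3 = 0.05
plain_mape <- mean(abs(actual - forecast) / abs(actual))
print(plain_mape)
```

The weighting lets large-volume periods dominate, so a 10% miss on a small period matters less than it would under a plain MAPE.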

4. Analyze Forecast Output

After the agent completes the forecast iteration, we can take a look at the final results.

forecast_output <- get_agent_forecast(agent_info = agent)

Here’s a breakdown of the major columns in the Finn forecast output.

Combo: Unique identifier of each time series. It’s a combination of combo variables defined in set_project_info().
Model_ID: Unique identifier of each model used to create the forecast. It’s a combination of the columns Model_Name, Model_Type, and Recipe_ID. Multiple models separated by “_” mean that multiple models were averaged together to create the final forecast.
Model_Name: Which model was trained to produce the forecast.
Model_Type: How the model was trained. Models can be trained on a single time series (local), multiple time series (global), or a combination of models via a simple average (simple_average).
Recipe_ID: What kind of feature engineering recipe was applied to the data before model training.
Run_Type: Separates the final future forecast (Future_Forecast) from historical back testing (Back_Test) and from validation data used for hyperparameter tuning of specific models (Validation).
Train_Test_ID: Unique ID to separate train test splits during the time series cross-validation process.
Best_Model: Simple flag that shows the most accurate Model_ID per time series.
Horizon: The forecast horizon.
Target: The historical values of the target variable defined in set_project_info().
Forecast: Forecast created by the specific Model_ID.
Prediction Intervals: 80% and 95% prediction intervals for the future forecast, created from the back testing results.
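A typical first step with this output is to keep only the best model’s future forecast per series, and to check the back-test accuracy that made it the best. The sketch below uses a small toy data frame as a stand-in for get_agent_forecast() output: only the column names and Run_Type values come from the list above, while the model names, values, and the “Yes”/“No” coding of Best_Model are assumptions.

```r
# Toy stand-in for get_agent_forecast() output (hypothetical values)
forecast_output <- data.frame(
  Combo      = "M1",
  Model_ID   = rep(c("model_a", "model_b"), times = 3),
  Run_Type   = rep(c("Back_Test", "Back_Test", "Future_Forecast"), each = 2),
  Date       = as.Date(rep(c("2014-11-01", "2014-12-01", "2015-01-01"), each = 2)),
  Target     = c(100, 100, 200, 200, NA, NA),
  Forecast   = c(104, 90, 198, 215, 150, 160),
  Best_Model = rep(c("Yes", "No"), times = 3)
)

# 1. Future forecast from the best model for each series
best_future <- subset(forecast_output, Run_Type == "Future_Forecast" & Best_Model == "Yes")
print(best_future[, c("Combo", "Date", "Forecast")])

# 2. Back-test weighted MAPE per series and model
back_test <- subset(forecast_output, Run_Type == "Back_Test")
back_test$abs_error <- abs(back_test$Target - back_test$Forecast)
acc <- aggregate(cbind(abs_error, Target) ~ Combo + Model_ID, data = back_test, FUN = sum)
acc$wmape <- acc$abs_error / acc$Target
print(acc[order(acc$Combo, acc$wmape), c("Combo", "Model_ID", "wmape")])
```

In this toy data model_a has the lower back-test error (6 / 300 versus 25 / 300), which matches its Best_Model flag.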

To see which inputs the agent used to create the optimal forecast, call get_best_agent_run(). This shows what kind of feature engineering was applied and which models were chosen to run. More information about the inputs used in each Finn run iteration can be found in the other vignettes.

agent_run_results <- get_best_agent_run(agent_info = agent, full_run_info = TRUE)

5. Update Forecast

After the agent runs iterate_forecast(), we can reuse the previously trained models to get updated forecasts quickly as new data arrives. New time series that appear in the data are automatically forecast using default local model inputs. If individual time series fail during the update process, they are also re-forecast with defaults rather than stopping the entire run. If too many series fail (more than 20% of existing series, with a minimum of 10 failed series), the run errors and directs you to use iterate_forecast() instead.
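The failure rule for updates can be read as two conditions that must both hold. The helper below is hypothetical and encodes one reading of it (at least 10 failed series AND more than 20% of existing series); the exact comparison operators finnts uses are an assumption.

```r
# Hypothetical helper: one reading of the update failure rule
update_should_error <- function(n_failed, n_existing) {
  n_failed >= 10 && n_failed / n_existing > 0.20
}

update_should_error(n_failed = 4,  n_existing = 4)    # FALSE: 100% failed, but fewer than 10
update_should_error(n_failed = 10, n_existing = 50)   # FALSE: exactly 20%, not above it
update_should_error(n_failed = 11, n_existing = 50)   # TRUE: 22% and at least 10 failed
update_should_error(n_failed = 12, n_existing = 500)  # FALSE: at least 10, but only 2.4%
```

Under this reading, a project with fewer than 10 series, such as the four-series example here, would never hit the error; any failed series would simply be re-forecast with defaults.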

# set up agent with updated data
# overwrite creates a new version of the agent, which is required when running update_forecast()
agent <- set_agent_info(
  project_info = project,
  driver_llm = driver_llm,
  input_data = hist_data,
  forecast_horizon = 6,
  hist_end_date = as.Date("2015-06-01"),
  overwrite = TRUE
)

# update forecast
update_forecast(
  agent_info = agent,
  weighted_mape_goal = 0.03
)

# get updated forecast output
updated_forecast_output <- get_agent_forecast(agent_info = agent)