Extending the data science workflow: {vetiver} and {pins}

R-Ladies Rome

Isabel Zimmerman || Posit, PBC

while you listen

  • if you would like, there is a very small code-along at the end of this talk!
  • to prepare, open up RStudio and make sure the vetiver, pins, tidymodels, and ranger packages are installed

who am I?

best practices in data science lifecycle

import pandas as pd
import numpy as np

np.random.seed(500)  # seed NumPy's global RNG so the split and the forest are reproducible
raw = pd.read_csv('https://bit.ly/3sWty5A')
df = raw[["like_count", "funny", "show_product_quickly", \
    "celebrity", "danger", "animals"]].dropna()

from sklearn import model_selection, preprocessing, ensemble, pipeline

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns = ['like_count']),
    df['like_count'],
    test_size=0.2
)
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)
rf_pipe = pipeline.Pipeline([('ordinal_encoder',oe), ('random_forest', rf)])

many of the problems I faced were different, though

if you develop models…

you can operationalize them

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

and these practices can be HARD.

what are some tasks vetiver helps with?

  • model versioning
  • model deployment
  • model monitoring

versioning

model

model_final

model_final_final

model_final_final_actually

model_final_final_actually (1)

{pins}

the pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues

  • can be pinned in a variety of locations (AWS, GCP, GitHub, Posit Connect, and more!)
  • good for sharing models, data, files, .Rds objects
  • good for pipelines where data is read and/or updated
  • bad when multiple people are writing data at once
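The idea behind a versioned board can be sketched in a few lines: one name, many versions, each identified by a creation time and a content hash instead of a growing file-name suffix. This is a toy, standard-library illustration only. `ToyBoard` and its methods are hypothetical and are not the pins API or its storage format.

```python
import hashlib
import json
from datetime import datetime, timezone


class ToyBoard:
    """Hypothetical in-memory stand-in for a versioned board (not the pins API)."""

    def __init__(self):
        self._pins = {}  # pin name -> list of version records, oldest first

    def write(self, name, obj):
        """Store obj as a new version of `name` and return its content hash."""
        payload = json.dumps(obj, sort_keys=True).encode("utf-8")
        record = {
            "created": datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
            "hash": hashlib.sha256(payload).hexdigest()[:16],
            "payload": payload,
        }
        self._pins.setdefault(name, []).append(record)
        return record["hash"]

    def versions(self, name):
        """List (created, hash) pairs for every stored version, oldest first."""
        return [(r["created"], r["hash"]) for r in self._pins[name]]

    def read(self, name, version_hash=None):
        """Return the latest version, or the one matching version_hash."""
        records = self._pins[name]
        if version_hash is None:
            record = records[-1]
        else:
            record = next(r for r in records if r["hash"] == version_hash)
        return json.loads(record["payload"])


board = ToyBoard()
first = board.write("ads", {"n_trees": 100})   # made-up model settings
second = board.write("ads", {"n_trees": 500})

print(board.versions("ads"))    # two versions under one name
print(board.read("ads"))        # latest version
print(board.read("ads", first)) # an earlier version, by hash
```

Nothing here coordinates concurrent writers, which is the same limitation the last bullet above points out.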

import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp( # can also be s3, azure, gcs, connect
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads")

import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp( # create place for models to be stored
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads") # create deployable model object
vetiver_pin_write(model_board, v)

Meta(title='ads: a pinned Pipeline object',
    description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model", 
    created='20221102T094151Z', 
    pin_hash='4db397b49e7bff0b', 
    file='ads.joblib', 
    file_size=1087, 
    type='joblib', 
    api_version=1, 
    version=VersionRaw(version='65155'), 
    name='ads', 
    user={'required_pkgs': ['vetiver', 'scikit-learn']})

import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp(
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads", ptype_data = X_train) # save an input data prototype with the model
vetiver_pin_write(model_board, v) # pin the VetiverModel, not the bare estimator
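What an input prototype buys you can be sketched without any libraries: remember the column names and types seen at training time, then check incoming rows against them before predicting. This is a toy illustration, not how vetiver implements `ptype_data`. The helper names and the example values are hypothetical; only the column names come from the data frame above.

```python
def make_prototype(sample_row):
    """Record column names and Python types from one example row."""
    return {column: type(value) for column, value in sample_row.items()}


def check_against_prototype(prototype, row):
    """Return a list of problems; an empty list means the row matches."""
    problems = []
    for column, expected in prototype.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(
                f"{column}: expected {expected.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    for column in row:
        if column not in prototype:
            problems.append(f"unexpected column: {column}")
    return problems


# Made-up training row using the predictor columns from the example data
prototype = make_prototype({
    "funny": True,
    "show_product_quickly": False,
    "celebrity": False,
    "danger": False,
    "animals": True,
})

good_row = {"funny": False, "show_product_quickly": True,
            "celebrity": True, "danger": False, "animals": False}
bad_row = {"funny": "yes", "show_product_quickly": True,
           "celebrity": True, "danger": False}

print(check_against_prototype(prototype, good_row))
print(check_against_prototype(prototype, bad_row))
```

Catching a mistyped or missing column at the API boundary is usually easier to debug than a failure inside the model.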

utilizing model cards

not only good (accurate) models, but good (responsible) models

  • summary
  • documentation
  • fairness

utilizing model cards

vetiver_pin_write(model_board, v)

utilizing model cards

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()

utilizing model cards

import vetiver

vetiver.vetiver_pin_write(model_board, v)
vetiver.model_card() # copy the model card Quarto template as a starting point

deploy your model

from vetiver import VetiverAPI

my_api = VetiverAPI(v) # wrap the deployable model in an API
my_api.run() # serve it locally
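Once the API is running, a client sends rows to it over HTTP. The sketch below only builds the request with the standard library, so it can be inspected without a server. The local address, the `/predict` path, and the list-of-records JSON body are assumptions about the defaults; check them against your running API. `build_predict_request` is a hypothetical helper, and the row values are made up.

```python
import json
import urllib.request


def build_predict_request(base_url, records):
    """Build (but do not send) a POST request carrying rows as JSON records."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",  # assumed prediction path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    "http://127.0.0.1:8000",  # assumed local address; adjust to your server
    [{"funny": True, "show_product_quickly": False,
      "celebrity": False, "danger": False, "animals": True}],
)

print(request.get_method(), request.full_url)

# To send it while the API is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```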

deploy your model

vetiver.deploy_rsconnect(
    connect_server = connect_server, 
    board = model_board, 
    pin_name = "ads", 
    version = "59869")

vetiver.prepare_docker(board=model_board, pin_name="ads")

monitoring

from datetime import timedelta
from sklearn.metrics import mean_absolute_error, r2_score

metrics = vetiver.compute_metrics(
    new_data, 
    "date", 
    timedelta(weeks = 1), 
    [mean_absolute_error, r2_score], 
    "like_count", 
    "y_pred"
    )

vetiver.pin_metrics(
    model_board, 
    metrics, 
    "metrics_pin_name", 
    overwrite = True
    )
    
vetiver.plot_metrics(metrics)
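Computing metrics by time period amounts to grouping labelled predictions into fixed windows and scoring each window separately. The sketch below does this for mean absolute error with the standard library only. It illustrates the idea and is not vetiver's implementation. `mae_by_period` is a hypothetical helper, the rows are made-up values, and how windows are aligned to the calendar is an assumption of this toy.

```python
from collections import defaultdict
from datetime import date, timedelta


def mae_by_period(rows, start, period):
    """Group (day, truth, estimate) rows into fixed periods; return MAE per period."""
    errors = defaultdict(list)
    for day, truth, estimate in rows:
        index = (day - start) // period  # which period this row falls in
        errors[start + index * period].append(abs(truth - estimate))
    return {
        period_start: sum(values) / len(values)
        for period_start, values in sorted(errors.items())
    }


# Made-up (date, like_count, y_pred) rows
rows = [
    (date(2022, 11, 1), 100, 90),
    (date(2022, 11, 3), 50, 70),
    (date(2022, 11, 9), 10, 40),
]
weekly = mae_by_period(rows, start=date(2022, 11, 1), period=timedelta(weeks=1))

for period_start, mae in weekly.items():
    print(period_start, mae)
```

A jump in a per-period metric like this is the kind of signal monitoring is meant to surface.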

Why should I be excited about vetiver?

Composability

  • Internally, with VetiverAPI and VetiverModel
  • Externally, leveraging the tools vetiver is built on

Ergonomics

  • feels good to use
  • works with the tools you like

resources