AI / Machine Learning
StaySure
Hotel Booking Cancellation Predictor
F1 Score
0.85
on the held-out test set
ROC-AUC
0.958
strong class separation
Training data
119,390
real bookings
Problem
Hotel booking cancellations cost the hospitality industry billions annually. If a hotel could predict — at booking time — which reservations are likely to cancel, they could adjust pricing, overbooking policies, and staffing accordingly.
Solution
An end-to-end ML project trained on 119,390 real hotel bookings. Given details like lead time, deposit type, booking channel, and previous cancellations, the model predicts whether a booking will be cancelled — and explains why using SHAP feature importance values. Deployed as an interactive Gradio demo on Hugging Face Spaces.
Tech stack
EDA: Jupyter · pandas · seaborn
Modeling: scikit-learn Pipelines · LogisticRegression · RandomForest · XGBoost
Hyperparameter tuning: GridSearchCV (3-fold)
Experiment tracking: MLflow (local file backend)
Explainability: SHAP summary + per-prediction force plots
Deployment: Gradio on Hugging Face Spaces
ML pipeline
Raw CSV (119,390 rows)
  │
  ▼
EDA notebook
  │  leakage check, target distribution, correlations
  ▼
sklearn Pipeline (ColumnTransformer)
  │  impute nulls → encode categoricals → scale numerics
  ▼
Model comparison (LR baseline → RF → XGBoost)
  │  GridSearchCV, logged to MLflow
  ▼
Best model (Random Forest) → test set evaluation
  │  confusion matrix, ROC-AUC, F1
  ▼
SHAP analysis → feature importance
  │
  ▼
Gradio app → Hugging Face Spaces (public demo)
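The preprocessing stage of the pipeline above (impute → encode → scale, feeding a Random Forest) can be sketched with scikit-learn's ColumnTransformer. The column names and the tiny DataFrame below are illustrative stand-ins for the real hotel-bookings schema, not the project's actual feature set:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative columns -- stand-ins for the hotel-bookings schema.
numeric_cols = ["lead_time", "previous_cancellations"]
categorical_cols = ["deposit_type", "booking_channel"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Tiny synthetic frame just to show the pipeline runs end to end.
df = pd.DataFrame({
    "lead_time": [10, 250, 30, 400],
    "previous_cancellations": [0, 2, 0, 1],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit", "Non Refund"],
    "booking_channel": ["TA/TO", "Direct", "Direct", "TA/TO"],
})
y = [0, 1, 0, 1]
model.fit(df, y)
preds = model.predict(df)
```

Because preprocessing lives inside the Pipeline, the exact same imputation, encoding, and scaling fitted on training data is applied to any new booking at inference time.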
Model results
Model                          | F1 Score | ROC-AUC
Logistic Regression (baseline) | 0.74     | 0.86
Random Forest (winner)         | 0.85     | 0.958
XGBoost                        | 0.83     | 0.95
Random Forest narrowly edged XGBoost on F1 and was chosen for its faster inference and simpler dependency surface in the Hugging Face deployment.
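The 3-fold GridSearchCV comparison can be sketched as below. The parameter grid, the synthetic data, and the model choice are illustrative assumptions, not the project's actual search space:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real search ran on the bookings features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Illustrative grid; the real search would cover more hyperparameters.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, None]},
    scoring="f1",   # F1 as the primary metric, matching the project
    cv=3,           # 3-fold, as in the project
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Each candidate's cross-validated F1 lives in `grid.cv_results_`, which is the natural payload to log as an MLflow run per model.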
Confusion matrix [figure: StaySure demo]
ROC curve [figure: StaySure ROC curve]
Selected lessons
  • EDA leakage hunt comes first. Several columns in the dataset (e.g. reservation_status) effectively encode the target — using them inflates accuracy to 100%. Removing them is the difference between a "model" and a lookup table.
  • Class imbalance ≠ broken metrics. ~37% cancellation rate is mild but not balanced. F1 was chosen over accuracy as the primary metric: a 63%-accurate model that always predicts "not cancelled" is useless.
  • Pipelines > manual preprocessing. sklearn ColumnTransformer + Pipeline guarantees test-set transformations match the training-set transformations. A whole class of "works in notebook, fails in production" bugs disappears.
  • SHAP explanations are the deploy unlock. Showing why a booking is flagged turns "the AI says no" into a tool a hotel manager would actually trust.
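The class-imbalance lesson above is easy to verify with synthetic labels at a ~37% positive rate: the degenerate always-"not cancelled" model scores 63% accuracy but an F1 of zero.

```python
from sklearn.metrics import accuracy_score, f1_score

# Synthetic labels with a ~37% positive (cancelled) rate.
y_true = [1] * 37 + [0] * 63
y_always_stay = [0] * 100  # degenerate model: always predicts "not cancelled"

acc = accuracy_score(y_true, y_always_stay)            # 0.63
f1 = f1_score(y_true, y_always_stay, zero_division=0)  # 0.0
print(acc, f1)
```

With zero positive predictions, precision is undefined and recall is zero, so F1 collapses to zero; accuracy alone would make this model look deceptively decent.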
Local setup
git clone https://github.com/neuralxjam/staysure
cd staysure
uv sync
jupyter lab notebooks/01_eda.ipynb   # start with EDA
© 2026 Jhames Andrew Macabata