AI / Machine Learning

StaySure

Hotel Booking Cancellation Predictor

F1 Score: 0.85 (on the held-out test set)
ROC-AUC: 0.958 (strong class separation)
Training Data: 119k real hotel bookings

Problem

Hotel booking cancellations cost the hospitality industry billions annually. If a hotel could predict, at booking time, which reservations are likely to cancel, it could adjust pricing, overbooking policies, and staffing accordingly.

Solution

An end-to-end ML project trained on 119,390 real hotel bookings. Given details like lead time, deposit type, booking channel, and previous cancellations, the model predicts whether a booking will be cancelled and explains why using SHAP feature importance values. It is deployed as an interactive Gradio demo on Hugging Face Spaces.

Tech Stack

EDA: Jupyter · pandas · seaborn
Modeling: scikit-learn Pipelines · LogisticRegression · RandomForest · XGBoost
Hyperparameter Tuning: GridSearchCV (3-fold)
Experiment Tracking: MLflow (local file backend)
Explainability: SHAP summary + per-prediction force plots
Deployment: Gradio on Hugging Face Spaces

ML Pipeline

Raw CSV (119,390 rows)
  │
  ▼
EDA notebook
  │  leakage check, target distribution, correlations
  ▼
sklearn Pipeline (ColumnTransformer)
  │  impute nulls → encode categoricals → scale numerics
  ▼
Model comparison (LR baseline → RF → XGBoost)
  │  GridSearchCV, logged to MLflow
  ▼
Best model (Random Forest) → test set evaluation
  │  confusion matrix, ROC-AUC, F1
  ▼
SHAP analysis → feature importance
  │
  ▼
Gradio app → Hugging Face Spaces (public demo)
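
The preprocessing and tuning stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`lead_time`, `previous_cancellations`, `deposit_type`, `distribution_channel`, `is_canceled`) and the hyperparameter grid are assumptions, and MLflow logging is left out.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the bookings CSV; all column names are assumptions.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "lead_time": rng.integers(0, 400, n).astype(float),
    "previous_cancellations": rng.integers(0, 3, n).astype(float),
    "deposit_type": rng.choice(["No Deposit", "Non Refund", "Refundable"], n),
    "distribution_channel": rng.choice(["TA/TO", "Direct", "Corporate"], n),
})
# Toy target: longer lead times cancel more often (illustrative only).
df["is_canceled"] = (df["lead_time"] + rng.normal(0, 80, n) > 250).astype(int)
# Inject some nulls so the imputation step has work to do.
df.loc[rng.choice(n, 30, replace=False), "lead_time"] = np.nan

numeric = ["lead_time", "previous_cancellations"]
categorical = ["deposit_type", "distribution_channel"]

# Impute nulls, one-hot encode categoricals, scale numerics.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["is_canceled"],
    test_size=0.2, stratify=df["is_canceled"], random_state=0,
)

# 3-fold grid search scored on F1; the grid itself is a placeholder.
search = GridSearchCV(
    model,
    param_grid={"clf__n_estimators": [50, 100], "clf__max_depth": [4, None]},
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best pipeline on the held-out split.
test_f1 = f1_score(y_test, search.predict(X_test))
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_f1, 3), round(test_auc, 3))
```

Because the preprocessor lives inside the pipeline, each cross-validation fold fits its imputers, encoder, and scaler on that fold's training portion only, so nothing from the validation rows leaks into the fitted transformations.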

Model Results

Model                            F1 Score   ROC-AUC
Logistic Regression (baseline)   0.74       0.86
Random Forest (winner ✓)         0.85       0.958
XGBoost                          0.83       0.95

Random Forest narrowly edged XGBoost on F1 (0.85 vs. 0.83) and was chosen for its faster inference and simpler dependency surface for the Hugging Face deployment.
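
For reading the ROC-AUC column: the score can be interpreted as the probability that a randomly chosen cancelled booking receives a higher predicted score than a randomly chosen non-cancelled one. The sketch below computes it by brute-force pair counting on a handful of made-up labels and scores. The function name `roc_auc` and the data are illustrative assumptions, and a real evaluation would more likely call a library routine.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 3 cancelled bookings (1) and 4 kept bookings (0).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

# 11 of the 12 positive/negative pairs are ordered correctly.
auc = roc_auc(y_true, scores)
print(auc)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but not for 119k rows. Rank-based implementations give the same number far more cheaply.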

Selected Lessons

  • The EDA leakage hunt comes first. Several columns (e.g. reservation_status) effectively encode the target; using them inflates accuracy to 100%. Removing them is the difference between a model and a lookup table.
  • Class imbalance ≠ broken metrics. A ~37% cancellation rate is a mild imbalance, but the classes are still not balanced. F1, not accuracy, is the primary metric: a model that always predicts "not cancelled" is 63% accurate and useless.
  • Pipelines > manual preprocessing. sklearn ColumnTransformer + Pipeline guarantees test-set transformations match the training-set transformations. A whole class of "works in notebook, fails in production" bugs disappears.
  • SHAP explanations are the deploy unlock. Showing why a booking is flagged turns "the AI says no" into a tool a hotel manager would actually trust.
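
The accuracy-versus-F1 point in the lessons above can be checked with a few lines of arithmetic. The sketch below builds 100 hypothetical bookings at a 37% cancellation rate and scores a model that always predicts "not cancelled". The helper `accuracy_and_f1` is an illustrative name, not part of the project.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels where 1 means 'cancelled'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives
    # in either the labels or the predictions.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# 100 hypothetical bookings: 37 cancelled, 63 kept.
y_true = [1] * 37 + [0] * 63
always_not_cancelled = [0] * 100

acc, f1 = accuracy_and_f1(y_true, always_not_cancelled)
print(acc, f1)  # 0.63 0.0
```

The baseline looks respectable on accuracy while catching zero cancellations, which is the reason for reporting F1 on the cancelled class.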