AI / Machine Learning
StaySure
Hotel Booking Cancellation Predictor
F1 Score
0.85
on the held-out test set
ROC-AUC
0.958
strong class separation
Training data
119,390
real bookings
Problem
Hotel booking cancellations cost the hospitality industry billions annually. If a hotel could predict — at booking time — which reservations are likely to cancel, they could adjust pricing, overbooking policies, and staffing accordingly.
Solution
An end-to-end ML project trained on 119,390 real hotel bookings. Given details like lead time, deposit type, booking channel, and previous cancellations, the model predicts whether a booking will be cancelled — and explains why using SHAP feature importance values. Deployed as an interactive Gradio demo on Hugging Face Spaces.
Tech stack
EDA: Jupyter · pandas · seaborn
Modeling: scikit-learn Pipelines · LogisticRegression · RandomForest · XGBoost
Hyperparameter tuning: GridSearchCV (3-fold)
Experiment tracking: MLflow (local file backend)
Explainability: SHAP summary + per-prediction force plots
Deployment: Gradio on Hugging Face Spaces
ML pipeline
Raw CSV (119,390 rows)
  │
  ▼
EDA notebook
  │  leakage check, target distribution, correlations
  ▼
sklearn Pipeline (ColumnTransformer)
  │  impute nulls → encode categoricals → scale numerics
  ▼
Model comparison (LR baseline → RF → XGBoost)
  │  GridSearchCV, logged to MLflow
  ▼
Best model (Random Forest) → test set evaluation
  │  confusion matrix, ROC-AUC, F1
  ▼
SHAP analysis → feature importance
  │
  ▼
Gradio app → Hugging Face Spaces (public demo)
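The preprocessing stage of the pipeline above (impute → encode → scale, feeding a Random Forest) can be sketched with scikit-learn's ColumnTransformer. The column names and the tiny DataFrame below are illustrative stand-ins for the real hotel-bookings schema, not the project's actual feature set:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative columns -- stand-ins for the hotel-bookings schema.
numeric_cols = ["lead_time", "previous_cancellations"]
categorical_cols = ["deposit_type", "booking_channel"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Tiny synthetic frame just to show the pipeline runs end to end.
df = pd.DataFrame({
    "lead_time": [10, 250, 30, 400],
    "previous_cancellations": [0, 2, 0, 1],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit", "Non Refund"],
    "booking_channel": ["TA/TO", "Direct", "Direct", "TA/TO"],
})
y = [0, 1, 0, 1]
model.fit(df, y)
preds = model.predict(df)
```

Because preprocessing lives inside the Pipeline, the exact same imputation, encoding, and scaling fitted on training data is applied to any new booking at inference time.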
Model results
Model                          | F1 Score | ROC-AUC
Logistic Regression (baseline) | 0.74     | 0.86
Random Forest (winner)         | 0.85     | 0.958
XGBoost                        | 0.83     | 0.95
Random Forest narrowly edged XGBoost on F1 and was chosen for its faster inference and simpler dependency surface in the Hugging Face deployment.
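The 3-fold GridSearchCV comparison can be sketched as below. The parameter grid, the synthetic data, and the model choice are illustrative assumptions, not the project's actual search space:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the real search ran on the bookings features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Illustrative grid; the real search would cover more hyperparameters.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, None]},
    scoring="f1",   # F1 as the primary metric, matching the project
    cv=3,           # 3-fold, as in the project
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Each candidate's cross-validated F1 lives in `grid.cv_results_`, which is the natural payload to log as an MLflow run per model.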
Confusion matrix [figure: StaySure demo]
ROC curve [figure: StaySure ROC curve]
Selected lessons
  • EDA leakage hunt comes first. Several columns in the dataset (e.g. reservation_status) effectively encode the target — using them inflates accuracy to 100%. Removing them is the difference between a "model" and a lookup table.
  • Class imbalance ≠ broken metrics. ~37% cancellation rate is mild but not balanced. F1 was chosen over accuracy as the primary metric: a 63%-accurate model that always predicts "not cancelled" is useless.
  • Pipelines > manual preprocessing. sklearn ColumnTransformer + Pipeline guarantees test-set transformations match the training-set transformations. A whole class of "works in notebook, fails in production" bugs disappears.
  • SHAP explanations are the deploy unlock. Showing why a booking is flagged turns "the AI says no" into a tool a hotel manager would actually trust.
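The class-imbalance lesson above is easy to verify with synthetic labels at a ~37% positive rate: the degenerate always-"not cancelled" model scores 63% accuracy but an F1 of zero.

```python
from sklearn.metrics import accuracy_score, f1_score

# Synthetic labels with a ~37% positive (cancelled) rate.
y_true = [1] * 37 + [0] * 63
y_always_stay = [0] * 100  # degenerate model: always predicts "not cancelled"

acc = accuracy_score(y_true, y_always_stay)            # 0.63
f1 = f1_score(y_true, y_always_stay, zero_division=0)  # 0.0
print(acc, f1)
```

With zero positive predictions, precision is undefined and recall is zero, so F1 collapses to zero; accuracy alone would make this model look deceptively decent.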
Local setup
git clone https://github.com/neuralxjam/staysure
cd staysure
uv sync
jupyter lab notebooks/01_eda.ipynb   # start with EDA
© 2026 Jhames Andrew Macabata