AI / Machine Learning
StaySure
Hotel Booking Cancellation Predictor

- F1 Score: 0.85 on the held-out test set
- ROC-AUC: 0.958 (strong class separation)
- Dataset: 119k real hotel bookings
Problem
Hotel booking cancellations cost the hospitality industry billions annually. If a hotel could predict at booking time which reservations are likely to be cancelled, it could adjust pricing, overbooking policy, and staffing accordingly.
Solution
An end-to-end ML project built on 119,390 real hotel bookings. Given details such as lead time, deposit type, booking channel, and previous cancellations, the model predicts whether a booking will be cancelled and explains each prediction with SHAP feature-importance values. It is deployed as an interactive Gradio demo on Hugging Face Spaces.
Tech Stack
| Stage | Tools |
|---|---|
| EDA | Jupyter · pandas · seaborn |
| Modeling | scikit-learn Pipelines · LogisticRegression · RandomForest · XGBoost |
| Hyperparameter Tuning | GridSearchCV (3-fold) |
| Experiment Tracking | MLflow (local file backend) |
| Explainability | SHAP summary + per-prediction force plots |
| Deployment | Gradio on Hugging Face Spaces |
ML Pipeline
- Raw CSV (119,390 rows)
- EDA notebook: leakage check, target distribution, correlations
- sklearn Pipeline (ColumnTransformer): impute nulls → encode categoricals → scale numerics
- Model comparison (LR baseline → RF → XGBoost): GridSearchCV, logged to MLflow
- Best model (Random Forest) → test-set evaluation: confusion matrix, ROC-AUC, F1
- SHAP analysis → feature importance
- Gradio app → Hugging Face Spaces (public demo)
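The preprocessing stage can be sketched with scikit-learn's `ColumnTransformer`. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`lead_time`, `previous_cancellations`, `deposit_type`, `distribution_channel`, `is_canceled`) are borrowed from the public hotel-bookings dataset, the imputation strategies are guesses, and the six-row frame is made up. Numeric columns are median-imputed and then scaled; categorical columns are mode-imputed and then one-hot encoded.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names, borrowed from the public hotel-bookings dataset.
NUMERIC = ["lead_time", "previous_cancellations"]
CATEGORICAL = ["deposit_type", "distribution_channel"]

# Numeric branch: fill gaps with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical branch: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, NUMERIC),
    ("cat", categorical_steps, CATEGORICAL),
])

# One estimator: fit() learns imputation, scaling and encoding from the training rows only.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Made-up toy rows, with one missing value in each branch.
df = pd.DataFrame({
    "lead_time": [5, 120, np.nan, 300, 45, 200],
    "previous_cancellations": [0, 1, 0, 2, 0, 1],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit",
                     np.nan, "No Deposit", "Non Refund"],
    "distribution_channel": ["Direct", "TA/TO", "TA/TO",
                             "TA/TO", "Corporate", "TA/TO"],
    "is_canceled": [0, 1, 0, 1, 0, 1],
})
X, y = df.drop(columns="is_canceled"), df["is_canceled"]
model.fit(X, y)

# A request with a deposit type never seen during fit.
new_booking = pd.DataFrame([{
    "lead_time": 90,
    "previous_cancellations": 0,
    "deposit_type": "Refundable",
    "distribution_channel": "Direct",
}])
proba = model.predict_proba(new_booking)[0, 1]
print(f"P(cancel) = {proba:.2f}")
```

Because `OneHotEncoder(handle_unknown="ignore")` encodes an unseen category as all zeros, the `"Refundable"` request is scored rather than raising an error, and the medians, scaling parameters and category lists learned in `model.fit` are reused unchanged at prediction time.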
Model Results
| Model | F1 Score | ROC-AUC |
|---|---|---|
| Logistic Regression (baseline) | 0.74 | 0.86 |
| Random Forest (winner ✓) | 0.85 | 0.958 |
| XGBoost | 0.83 | 0.95 |
Random Forest narrowly beat XGBoost on both F1 (0.85 vs 0.83) and ROC-AUC (0.958 vs 0.95); it was also preferred for faster inference and a simpler dependency surface for the Hugging Face deployment.
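The tuning and comparison steps can be sketched as follows. This is a dependency-light illustration, not the project's code, and the numbers it prints are unrelated to the table above: `make_classification` stands in for the bookings data (with roughly 37% positives to mirror the cancellation rate), the hyperparameter grids are hypothetical, and XGBoost is left out so that only scikit-learn is needed. Each model is tuned with 3-fold `GridSearchCV` on F1, then scored once on a held-out split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the bookings data: about 37% positive ("cancelled") class.
X, y = make_classification(
    n_samples=2_000, n_features=12, n_informative=6,
    weights=[0.63], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search spaces; the project's real grids are not stated.
candidates = {
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.1, 1.0, 10.0]}),
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [100, 200], "max_depth": [None, 10]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 3-fold cross-validation, choosing hyperparameters by F1 rather than accuracy.
    search = GridSearchCV(estimator, grid, cv=3, scoring="f1")
    search.fit(X_train, y_train)
    best = search.best_estimator_  # refit on the whole training split

    y_pred = best.predict(X_test)               # hard labels for F1
    y_score = best.predict_proba(X_test)[:, 1]  # probabilities for ROC-AUC
    results[name] = {
        "params": search.best_params_,
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(f"{name:<20} F1={results[name]['f1']:.3f}  "
          f"ROC-AUC={results[name]['roc_auc']:.3f}")

# Select the winner on the primary metric.
winner = max(results, key=lambda n: results[n]["f1"])
print("selected:", winner)
```

ROC-AUC is computed from `predict_proba` scores rather than hard labels because it measures ranking quality across all thresholds, and the test split is only touched after tuning. XGBoost's `XGBClassifier` follows the same scikit-learn estimator interface, so it could be added as a third entry; wrapping each iteration in `mlflow.start_run(run_name=name)` with `mlflow.log_params` and `mlflow.log_metric` calls would be one way to record the runs as the tech stack describes.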
Selected Lessons
- EDA leakage hunt comes first. Several columns (e.g. reservation_status) effectively encode the target; using them inflates accuracy to 100%. Removing them is the difference between a model and a lookup table.
- Class imbalance ≠ broken metrics. A ~37% cancellation rate is only mildly imbalanced, but it is enough to make accuracy misleading: a model that always predicts "not cancelled" is 63% accurate and useless. F1, not accuracy, was therefore the primary metric.
- Pipelines > manual preprocessing. An sklearn ColumnTransformer inside a Pipeline guarantees that the test set (and every live request) goes through exactly the transformations fitted on the training set. A whole class of "works in notebook, fails in production" bugs disappears.
- SHAP explanations are the deploy unlock. Showing why a booking is flagged turns "the AI says no" into a tool a hotel manager would actually trust.
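The first two lessons can each be checked in a few lines. The sketch below is illustrative only: the eight-row frame is invented (its column names mirror the public hotel-bookings dataset), and the "purity" heuristic, the share of rows whose category maps to a single target class, is an assumption of this example rather than necessarily how the project's EDA hunted for leakage. The second half reproduces the always-"not cancelled" baseline on a 63/37 split.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

# Invented rows. reservation_status is recorded after the stay, so it mirrors the target.
df = pd.DataFrame({
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit", "Non Refund",
                     "No Deposit", "No Deposit", "No Deposit", "Non Refund"],
    "reservation_status": ["Check-Out", "Canceled", "Check-Out", "Canceled",
                           "Check-Out", "Check-Out", "Canceled", "Canceled"],
    "is_canceled": [0, 1, 0, 1, 0, 0, 1, 1],
})


def purity(frame: pd.DataFrame, col: str, target: str = "is_canceled") -> float:
    """Hypothetical leakage heuristic: share of rows whose `col` value has one target class."""
    classes_per_value = frame.groupby(col)[target].transform("nunique")
    return float((classes_per_value == 1).mean())


scores = {col: purity(df, col) for col in ["deposit_type", "reservation_status"]}
print(scores)  # {'deposit_type': 0.375, 'reservation_status': 1.0}

# A purity of 1.0 is a red flag to investigate, not proof: ask whether the
# value is known at booking time. Here it is not, so the column is dropped.
features = df.drop(columns=["is_canceled", "reservation_status"])

# Majority-class baseline on a 63/37 split: always predict "not cancelled" (0).
y_true = [1] * 37 + [0] * 63
y_pred = [0] * 100
print(accuracy_score(y_true, y_pred))             # 0.63
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0
```

The heuristic only makes sense for categorical columns: a continuous or ID-like column with all-unique values would score 1.0 trivially. Note also that deposit_type is partially pure here yet is a legitimate feature, since it is known when the booking is made.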