Hotel Price Prediction Machine Learning
Hotel Price Prediction System
This project aims to predict hotel room prices on Booking.com for major cities in Saudi Arabia, with prices given in Saudi Riyal (SAR). The model is trained on real hotel listing data and considers practical factors such as the number of beds, average customer rating, total reviews, and room size. By focusing on these key predictors, the system aims to produce realistic and reliable price estimates.
The results can be valuable for both travelers and policymakers. For tourists, the model provides insight into fair pricing and helps with travel planning. For the Ministry of Tourism and other stakeholders, it supports monitoring price trends, detecting anomalies, and promoting competitive and transparent pricing across regions.
Overview
Field | Details |
---|---|
Project Name | Booking.com Hotel Price Prediction in KSA |
Language/s Used | Python |
Version (Recommended) | Python 3.8+ |
Type | Web Application (Machine Learning) |
Why this project
- Helps tourists and the government anticipate likely room prices across major cities in KSA.
- Supports price regulation and competitiveness by providing model-driven benchmarks.
- Surfaces data errors and outliers that distort market signals.
- Addresses unstable or inconsistent search results by grounding decisions in a unified dataset and model.
Data Description
The dataset was built by scraping Booking.com hotel listings and consolidating them into a structured CSV used by the app and notebooks. It contains the following fields:
- hotel_name
- location (e.g., Riyadh, Jeddah, Medina)
- room_type (e.g., Suite, Room)
- price (SAR)
- per_night (stay basis)
- beds (integer)
- rating (1–10 scale)
- rating_title (text label of the rating)
- number_of_ratings (review count)
- Size (room area in m²)
- Log_number_of_ratings (derived)
- Log_price (derived)
These variables power the regression analysis and the interactive predictions in the app.
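As a rough illustration of how the two derived columns could be computed, here is a minimal sketch. It assumes the natural logarithm (the source does not state the base) and uses a tiny made-up sample rather than the real reg22.csv data; the helper name `add_log_features` is hypothetical.

```python
import numpy as np
import pandas as pd


def add_log_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Log_price and Log_number_of_ratings added."""
    out = df.copy()
    # Assumption: natural log. Rows with zero reviews would give -inf here,
    # so they would need to be filtered or handled before this step.
    out["Log_price"] = np.log(out["price"])
    out["Log_number_of_ratings"] = np.log(out["number_of_ratings"])
    return out


# Made-up rows for illustration only; these are not real listings.
sample = pd.DataFrame(
    {
        "location": ["Riyadh", "Jeddah", "Medina"],
        "price": [450.0, 620.0, 300.0],
        "beds": [2, 3, 1],
        "rating": [8.1, 9.0, 7.4],
        "number_of_ratings": [120, 45, 980],
        "Size": [30.0, 55.0, 22.0],
    }
)

listings = add_log_features(sample)
print(listings[["price", "Log_price", "number_of_ratings", "Log_number_of_ratings"]])
```

Working on the log scale compresses the long right tail that prices and review counts typically have, which is presumably why the dataset carries these columns.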
Design and Modeling Approach
The project follows a clear regression pipeline:
- Fetch
To avoid seasonality bias, Booking.com listings were scraped iteratively and consolidated. Scraping utilities in the repository rely on lightweight tools and selectors tailored to hotel listing pages.
- Clean
The data underwent standard cleaning steps to remove duplicates, handle NaNs, normalize spacing in categorical values, and align column names. Only useful features were retained for modeling.
- Preprocessing
To place features on comparable scales and stabilize relationships with price, the following transformations are applied:
  - Feature scaling with RobustScaler and StandardScaler
  - “Gaussianizing” transforms where helpful (log and Box-Cox), plus polynomial expansions for non-linear effects
- Modeling
Multiple regressors were explored in the notebooks. The best-performing configuration used a Random Forest Regressor, evaluated with a train/test split and repeated cross-validation to check generalization. Reported results include:
  - Test set performance of about 96%
  - Mean Absolute Error (MAE) ≈ 0.1974 (on the transformed target)
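The preprocessing and modeling steps above can be sketched roughly as follows. This is not the repository's code: the data is synthetic, the feature set (beds, rating, log review count, size) and the hyperparameters are assumptions, and only RobustScaler is shown from the listed transforms.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RepeatedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in features: beds, rating, log review count, room size (m²).
X = np.column_stack(
    [
        rng.integers(1, 5, n),
        rng.uniform(5.0, 10.0, n),
        rng.uniform(0.0, 7.0, n),
        rng.uniform(15.0, 80.0, n),
    ]
)
# Synthetic log-price target with a little noise (illustrative only).
y = 4.5 + 0.2 * X[:, 0] + 0.1 * X[:, 1] + 0.01 * X[:, 3] + rng.normal(0, 0.1, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling lives inside the pipeline so each CV fold fits its own scaler.
model = make_pipeline(
    RobustScaler(),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

# Repeated K-fold: 5 folds x 2 repeats = 10 MAE estimates on the training set.
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
cv_mae = -cross_val_score(
    model, X_train, y_train, cv=cv, scoring="neg_mean_absolute_error"
)

model.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, model.predict(X_test))
test_r2 = model.score(X_test, y_test)  # R² for regressors

print(f"CV MAE: {cv_mae.mean():.3f} +/- {cv_mae.std():.3f}")
print(f"Test MAE (log scale): {test_mae:.3f}, test R²: {test_r2:.3f}")
```

Tree ensembles are largely insensitive to monotonic feature scaling, so the scaler matters more for the other regressors compared in the notebooks. Keeping it in the pipeline simply makes the comparison uniform and avoids leaking test-fold statistics into training.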
Web Application (What You Can Do)
The repository ships with a simple web interface that makes the model usable without opening a notebook:
- Interactive Inputs: Adjust core drivers—beds, number_of_ratings (reviews), rating, and optionally room size—to get an immediate predicted price in SAR.
- Instant Predictions: The interface displays the predicted room price using the trained model and the same preprocessing applied during development.
- Model Explainability with SHAP:
- Summary plot to see which features most influence pricing across the dataset
- Bar view for global importance comparisons
These visuals help policymakers and analysts justify pricing decisions and understand model behavior.
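For the instant-prediction step, the app has to turn raw inputs into model features and map the output back to SAR. The sketch below assumes the model was trained on a log-transformed price (suggested by the Log_price column and the MAE reported on the transformed target) and on the four inputs named above. The function name `predict_price_sar`, the feature order, and the stand-in model are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_price_sar(model, beds, rating, number_of_ratings, size_m2):
    """Predict a nightly price in SAR from raw user inputs.

    Assumes the model expects [beds, rating, log(reviews), size] and
    returns a natural-log price, so the output is exponentiated.
    """
    row = np.array(
        [[beds, rating, np.log(number_of_ratings), size_m2]], dtype=float
    )
    log_price = model.predict(row)[0]
    return float(np.exp(log_price))


# Stand-in model fitted on synthetic data, for demonstration only.
rng = np.random.default_rng(1)
X = np.column_stack(
    [
        rng.integers(1, 5, 200),
        rng.uniform(5.0, 10.0, 200),
        rng.uniform(0.0, 7.0, 200),
        rng.uniform(15.0, 80.0, 200),
    ]
)
y = 5.0 + 0.2 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 3] + rng.normal(0, 0.1, 200)
demo_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

price = predict_price_sar(
    demo_model, beds=2, rating=8.5, number_of_ratings=150, size_m2=35
)
print(f"Predicted price: {price:.0f} SAR per night")
```

In a Streamlit front end, the four arguments would typically come from input widgets, and the same function could be called on every change to refresh the displayed price.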
Available Features
- Streamlined Streamlit web UI for interactive price prediction in SAR
- End-to-end regression pipeline with Robust/Standard scaling and log/Box-Cox transforms
- RandomForestRegressor training and validation with repeated K-fold cross-validation
- SHAP-based feature importance (summary and bar charts) integrated into the app
- CSV dataset (reg22.csv) aligned to the fields listed above
- Exploratory notebooks for data analysis and model comparison
- Lightweight scraping utilities with a minimal requirements file for extractor tooling
- Saved model artifacts/notebooks for reproducibility and quick experimentation
- Project report and presentation files for stakeholder communication
Tools and Libraries
- Language: Python
- Scraping: requests, selector utilities
- EDA: Pandas, NumPy, Matplotlib, Seaborn
- Preprocessing/Modeling: scikit-learn, SciPy, statsmodels, pylab
- Explainability & Visualization: SHAP, Plotly Express, Missingno, Yellowbrick, Sweetviz
- Interface: Streamlit