- SQLite database with vehicle information
- Web dashboard built with Flask and Bootstrap
- Web scraping scripts for RockAuto
- CLI for queries
- Complete project documentation

Includes:
- 12 vehicle brands
- 10,923 models
- 10,919 engine specifications
- 12,075 model-year-engine combinations
Vehicle Database with RockAuto Data Integration
Project Overview
This project combines two components:
- A comprehensive vehicle database system
- A data extraction system for RockAuto.com vehicle information
Due to anti-bot measures on RockAuto.com, a manual extraction approach is recommended for collecting vehicle data.
System Components
1. Vehicle Database
- SQLite database with normalized schema
- Tables for brands, models, years, engines, and their relationships
- Python API for managing the database
- Interactive query interface
2. Data Extraction Tools
- Automated scraper (for sites without anti-bot measures)
- Manual extraction guide for RockAuto.com
- Data import functionality
Database Schema
The database consists of five main tables:
- brands: Vehicle manufacturers (Toyota, Ford, etc.)
- models: Vehicle models (Camry, F-150, etc.)
- engines: Engine specifications (2JZ-GTE, EcoBoost, etc.)
- years: Calendar years for vehicle production
- model_year_engine: Junction table linking all entities with trim levels and specifications
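As a rough illustration of how these five tables could fit together, here is a minimal sketch using Python's built-in `sqlite3` module. Only the table names come from the list above; every column name (`brand_id`, `trim_level`, and so on) is an assumption made for the example and may differ from what `sql/schema.sql` actually defines.

```python
import sqlite3

# Hypothetical schema: table names follow the list above, but all column
# names are assumptions, not the contents of sql/schema.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS brands (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS models (
    id       INTEGER PRIMARY KEY,
    brand_id INTEGER NOT NULL REFERENCES brands(id),
    name     TEXT NOT NULL,
    UNIQUE (brand_id, name)
);
CREATE TABLE IF NOT EXISTS engines (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS years (
    id   INTEGER PRIMARY KEY,
    year INTEGER NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS model_year_engine (
    id         INTEGER PRIMARY KEY,
    model_id   INTEGER NOT NULL REFERENCES models(id),
    year_id    INTEGER NOT NULL REFERENCES years(id),
    engine_id  INTEGER NOT NULL REFERENCES engines(id),
    trim_level TEXT,
    UNIQUE (model_id, year_id, engine_id, trim_level)
);
"""


def create_schema(conn):
    """Create the five tables on the given connection."""
    conn.executescript(SCHEMA)


def list_tables(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    create_schema(conn)
    print(list_tables(conn))
```

Keeping years and engines in their own tables means each value is stored once, and the junction table holds only integer references plus per-combination details such as the trim level.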
Using the System
Initial Setup
cd vehicle_database
./setup.sh
Querying the Database
python3 scripts/query_interface.py
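If you prefer to query the database directly rather than through the interactive interface, a join across the five tables might look like the sketch below. The column names are assumptions for illustration (check `sql/schema.sql` for the real ones), and the example builds a tiny in-memory database so it can be run on its own.

```python
import sqlite3


def engines_for_model(conn, brand, model):
    """Return (year, engine) pairs for one model, oldest year first."""
    sql = """
        SELECT y.year, e.name
        FROM model_year_engine AS mye
        JOIN models  AS m ON m.id = mye.model_id
        JOIN brands  AS b ON b.id = m.brand_id
        JOIN years   AS y ON y.id = mye.year_id
        JOIN engines AS e ON e.id = mye.engine_id
        WHERE b.name = ? AND m.name = ?
        ORDER BY y.year, e.name
    """
    # Parameters are bound with "?" placeholders, never string-formatted.
    return conn.execute(sql, (brand, model)).fetchall()


def demo_connection():
    """Build a small in-memory database with assumed column names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE brands  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE models  (id INTEGER PRIMARY KEY, brand_id INTEGER, name TEXT);
        CREATE TABLE engines (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE years   (id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE model_year_engine (
            model_id INTEGER, year_id INTEGER, engine_id INTEGER);
        INSERT INTO brands  VALUES (1, 'Toyota');
        INSERT INTO models  VALUES (1, 1, 'Corolla');
        INSERT INTO engines VALUES (1, '1.8L 4-Cylinder');
        INSERT INTO years   VALUES (1, 2020), (2, 2021);
        INSERT INTO model_year_engine VALUES (1, 2, 1), (1, 1, 1);
    """)
    return conn


if __name__ == "__main__":
    for year, engine in engines_for_model(demo_connection(), "Toyota", "Corolla"):
        print(year, engine)
```

To run the same query against the real data, replace `demo_connection()` with `sqlite3.connect("vehicle_database.db")` and adjust the column names to match the actual schema.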
Adding More Data Manually
import sys
sys.path.append("../vehicle_scraper")  # assumes the script runs from vehicle_database/
from manual_input import ManualDataInput
input_tool = ManualDataInput()
# Add a single vehicle
input_tool.add_vehicle_data("Toyota", "Corolla", 2021, "1.8L 4-Cylinder")
# Add multiple vehicles
vehicles = [
{"make": "Nissan", "model": "Altima", "year": 2020, "engine": "2.5L 4-Cylinder"},
{"make": "Hyundai", "model": "Elantra", "year": 2019, "engine": "2.0L 4-Cylinder"}
]
input_tool.add_multiple_vehicles(vehicles)
Manual Data Extraction from RockAuto.com
Since RockAuto has anti-bot measures, follow this process:
- Open your web browser and go to: https://www.rockauto.com
- Click on the "Catalog" link in the navigation menu
- You will see a list of vehicle manufacturers (makes)
- For each manufacturer:
  - Click on the manufacturer name
  - You'll see a page with vehicle models organized by year
  - Note down the models and years you see
- To find engine information:
  - Click on a specific model/year combination
  - You'll see parts categories for that vehicle
  - Look for the "Engine" or "Engine Mechanical" category
  - Note down the engine type/specifications
- Use the ManualDataInput class to add the collected data to your database
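Hand-copied notes are easy to mistype, so it can help to check each record before it reaches the database. The sketch below is one possible check, not part of the project: `validate_vehicle`, the `REQUIRED_KEYS` tuple, and the 1900 lower bound for years are all assumptions chosen for illustration. The record shape matches the dictionaries shown in the manual input example.

```python
from datetime import date

# Keys used by the dictionaries passed to add_multiple_vehicles.
REQUIRED_KEYS = ("make", "model", "year", "engine")


def validate_vehicle(record, first_year=1900):
    """Return a list of problems found in one record (empty list means valid).

    first_year is an arbitrary lower bound picked for this sketch.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if record.get(key) in ("", None):
            problems.append(f"missing {key}")

    year = record.get("year")
    if year not in ("", None):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(year, int) or isinstance(year, bool):
            problems.append("year must be an integer")
        elif not first_year <= year <= date.today().year + 1:
            problems.append(f"year {year} out of range")
    return problems


if __name__ == "__main__":
    notes = [
        {"make": "Toyota", "model": "Corolla", "year": 2021, "engine": "1.8L 4-Cylinder"},
        {"make": "Nissan", "model": "Altima", "year": "2020", "engine": ""},
    ]
    for record in notes:
        print(record["make"], record["model"], validate_vehicle(record))
```

Records that come back with an empty problem list can then be passed to `add_multiple_vehicles`.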
File Structure
vehicle_database/ # Main database system
├── sql/
│ └── schema.sql # Database schema
├── scripts/
│ ├── database_manager.py # Main database manager
│ ├── query_interface.py # Interactive query interface
│ └── csv_importer.py # CSV import functionality
├── data/ # Sample CSV data files
├── vehicle_database.db # SQLite database file
├── setup.sh # Setup script
├── README.md # Project documentation
└── GETTING_STARTED.md # Getting started guide
vehicle_scraper/ # Data extraction tools
├── rockauto_scraper.py # Automated scraper (for other sites)
├── rockauto_scraper_enhanced.py # Enhanced scraper
├── manual_input.py # Manual input tool
├── manual_input_simple.py # Simplified manual input
└── requirements.txt # Python dependencies
Extending the Database
To add more vehicle data:
- Collect data manually from RockAuto.com using the provided guide
- Use the ManualDataInput class to add data to the database
- Or prepare CSV files in the required format and use the CSV importer
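The required CSV format is not spelled out here, so the following is only a sketch under an assumed layout: a header row with the columns `make`, `model`, `year`, and `engine`. It uses the standard library's `csv.DictReader` to turn rows into dictionaries shaped like the ones passed to `add_multiple_vehicles`. The function name `read_vehicle_csv` is hypothetical and is not the API of `scripts/csv_importer.py`.

```python
import csv
import io


def read_vehicle_csv(handle):
    """Parse CSV rows into dicts with make, model, year, and engine keys.

    Assumes a header row naming exactly those four columns.
    """
    vehicles = []
    for row in csv.DictReader(handle):
        vehicles.append({
            "make": row["make"].strip(),
            "model": row["model"].strip(),
            "year": int(row["year"]),  # raises ValueError on a non-numeric year
            "engine": row["engine"].strip(),
        })
    return vehicles


if __name__ == "__main__":
    # An in-memory file stands in for a real CSV on disk.
    sample = io.StringIO(
        "make,model,year,engine\n"
        "Nissan,Altima,2020,2.5L 4-Cylinder\n"
        "Hyundai,Elantra,2019,2.0L 4-Cylinder\n"
    )
    for vehicle in read_vehicle_csv(sample):
        print(vehicle)
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the function; `newline=""` is what the `csv` module documentation recommends for file objects.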
Future Enhancements
- Web scraping capabilities for other automotive parts sites
- Export functionality to share data
- Advanced search and filtering options
- Data validation and cleaning tools
Troubleshooting
If you encounter issues:
- Check that Python 3.x is installed
- Ensure all required packages are installed (pip3 install -r requirements.txt)
- Verify database file permissions
- Check that the schema matches the expected structure
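Several of the checks above can be automated. The sketch below is a hypothetical helper (`check_environment` is not part of the project) that verifies the Python version, the database file's existence and permissions, and the presence of the five expected tables. It does not check installed packages, and it compares table names only, not columns.

```python
import os
import sqlite3
import sys

# Table names taken from the Database Schema section.
EXPECTED_TABLES = {"brands", "models", "engines", "years", "model_year_engine"}


def check_environment(db_path):
    """Return a list of human-readable problems; an empty list means all passed."""
    problems = []
    if sys.version_info < (3,):
        problems.append("Python 3.x is required")

    if not os.path.isfile(db_path):
        problems.append(f"database file not found: {db_path}")
        return problems  # nothing else can be checked without the file
    if not os.access(db_path, os.R_OK | os.W_OK):
        problems.append(f"database file is not readable and writable: {db_path}")

    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            ).fetchall()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        problems.append(f"cannot read database: {exc}")
        return problems

    missing = EXPECTED_TABLES - {name for (name,) in rows}
    if missing:
        problems.append("missing tables: " + ", ".join(sorted(missing)))
    return problems


if __name__ == "__main__":
    # Default path assumes the script runs from vehicle_database/.
    path = sys.argv[1] if len(sys.argv) > 1 else "vehicle_database.db"
    issues = check_environment(path)
    print("\n".join(issues) if issues else "All checks passed")
```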
The system is now ready to use. You can start by exploring the existing data through the query interface and then add more data as needed.