Big Data Analytics Course

Course Materials for Big Data Analytics - PSAU

📚 About

Comprehensive course materials for teaching Big Data Analytics using modern Python-based tools and technologies. This repository contains:

🚀 Quick Start

# Clone the repository
git clone https://github.com/chebil/BigData.git
cd BigData

# Start all services (Jupyter, Spark, PostgreSQL)
docker-compose up -d

# Access Jupyter Lab at http://localhost:8888
# Access Spark UI at http://localhost:8080
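
Once the containers are up, a quick reachability check can confirm that the two ports mentioned above are accepting connections. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` and the one-second timeout are illustrative assumptions and are not part of the repository.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports come from the Quick Start comments; the names are labels only.
    for name, port in {"Jupyter Lab": 8888, "Spark UI": 8080}.items():
        status = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

An open port only shows that something is listening; it does not prove that the service behind it has finished starting.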

Local Installation

# Create conda environment
conda env create -f environment.yml
conda activate bigdata-course

# Or use pip
pip install -r requirements.txt

# Start Jupyter Lab
jupyter lab
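
After installing, it can help to confirm that the environment resolves the libraries the labs rely on. The sketch below is one way to do that; the package list is an assumption inferred from the course outline (it is not read from `environment.yml` or `requirements.txt`), and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Hypothetical list: names inferred from the course outline,
# not taken from requirements.txt or environment.yml.
EXPECTED_PACKAGES = ["numpy", "pandas", "pyspark"]


def missing_packages(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED_PACKAGES)
    if missing:
        print(f"Missing: {missing}")
    else:
        print("All expected packages found.")
```

Adjust `EXPECTED_PACKAGES` to match whatever the repository's dependency files actually list.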

Build the Book

# Install Jupyter Book
pip install jupyter-book

# Build the book
jupyter-book build .

# Open _build/html/index.html in your browser
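
If you prefer to open the result from Python rather than by hand, something like the following sketch may work. It assumes the build output lands in `_build/html/index.html` relative to the book root, as the comment above indicates; `built_index` is a hypothetical helper name.

```python
from pathlib import Path
import webbrowser


def built_index(book_root: str = ".") -> Path:
    """Return the absolute path to _build/html/index.html under book_root."""
    return (Path(book_root) / "_build" / "html" / "index.html").resolve()


if __name__ == "__main__":
    index = built_index()
    if index.exists():
        # as_uri() needs an absolute path, which resolve() provides.
        webbrowser.open(index.as_uri())
    else:
        print(f"No build output at {index}; run 'jupyter-book build .' first.")
```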

📖 Course Structure

Part I: Foundations

  1. Introduction to Big Data - Concepts, lifecycle, data types

  2. Data Analytics Lifecycle - Six-phase approach

  3. Statistical Foundations - Python, NumPy, Pandas, visualization

Part II: Machine Learning

  1. Clustering - K-means, hierarchical, DBSCAN

  2. Association Rules - Market basket analysis, Apriori

  3. Regression - Linear, multiple, regularization

  4. Classification - Logistic regression, Naïve Bayes, decision trees

  5. Time Series - ARIMA, forecasting, Prophet

  6. Text Analytics - NLP, sentiment analysis, topic modeling

Part III: Big Data Technologies

  1. Distributed Computing - Hadoop, Spark, PySpark

  2. Advanced Topics - Deep learning, deployment, cloud platforms

🧪 Labs

| Lab    | Topic                      | Duration  |
|--------|----------------------------|-----------|
| Lab 0  | Environment Setup          | 30 min    |
| Lab 1  | Data Exploration           | 2 hours   |
| Lab 2  | Python & Pandas            | 2 hours   |
| Lab 3  | Statistics & Visualization | 3 hours   |
| Lab 4  | Clustering                 | 2.5 hours |
| Lab 5  | Association Rules          | 2 hours   |
| Lab 6  | Regression                 | 2.5 hours |
| Lab 7  | Classification             | 3 hours   |
| Lab 8  | Time Series                | 2.5 hours |
| Lab 9  | Text Analytics             | 3 hours   |
| Lab 10 | Apache Spark               | 3 hours   |
| Lab 11 | Capstone Project           | 10+ hours |

🛠️ Technologies

Core Stack:

Big Data:

Machine Learning:

NLP:

Infrastructure:

📊 Datasets

All labs use real-world datasets:

🎓 Learning Outcomes

After completing this course, students will be able to:

✅ Apply the data analytics lifecycle to real-world problems
✅ Perform exploratory data analysis using Python
✅ Implement machine learning algorithms from scratch
✅ Build and evaluate classification and regression models
✅ Process large datasets using Apache Spark
✅ Perform text analytics and sentiment analysis
✅ Deploy machine learning models
✅ Work with big data technologies

📝 Assessment

🀝 Contributing¢

Contributions are welcome! Please:

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file.

👨‍🏫 Instructor

Dr. Chebil Khalil
Department of Computer Science
Prince Sattam bin Abdulaziz University (PSAU)
Email: chebilkhalil@gmail.com

⭐ Star History¢

If you find this repository helpful, please consider giving it a star!


Built with ❤️ using Jupyter Book and MyST Markdown