Big Data Engineering and Analysis II: exercises WSE-BD-IABD-II-ćw
Module 1: Introduction to Data Engineering and Analysis
Key concepts, goals, and applications of data engineering
Stages of the data analysis process: from acquisition to presentation
Evolution of data handling: from traditional data warehouses to real-time analytics
Module 2: Building and Managing Data Warehouses
Designing the architecture of a data warehouse
ETL (Extract, Transform, Load) processes: concepts, tools, and best practices
Cloud-based data warehousing – deployment models and scalability
Module 3: Data Cleaning and Quality Assurance
Methods for identifying and correcting errors in datasets
Maintaining data quality: standards, policies, and control procedures
Real-world implementations of data quality management across sectors
Module 4: Time-Based Data Analysis
The role of time-series data in business and social sciences
Time series analysis: trend detection, seasonality, forecasting
Real-time time-based analysis: techniques and tools overview
Module 5: Real-Time Data Processing
Introduction to data streaming and its business applications
Real-time data storage using non-relational and NoSQL databases
Practical case studies of real-time data analysis in organizations
Module 6: Data Visualization and Reporting
Criteria for selecting effective data visualization tools
Designing clear, interactive, and dynamic data reports
Case studies: effective communication of analytical insights
Module 7: Practical Data Projects
Independent selection and development of data-driven projects
Data collection and analysis from various domains (industry, science, society)
Presentation and discussion of project outcomes
Module 8: Ethics and Data Security
Data privacy: protecting personal data in analytical workflows
Data security in processing and reporting environments
Ethical considerations in data analysis – case studies and best practices
Scientific discipline to which the learning outcomes relate
General university subject group
Description of student workload in ECTS
Subject level
Learning outcome code/codes
Type of subject
Preliminary Requirements
Course coordinators
Learning outcomes
Knowledge
The student understands the key concepts and stages of the data analysis process, including data collection, preparation, processing, and presentation.
The student is familiar with the structure and purpose of data warehouses and the role of cloud environments in scalable data management.
The student knows the principles and applications of data cleaning, quality assurance, and time series analysis.
The student understands the foundations of data mining, statistical modeling, and real-time data processing.
Skills
The student is able to design basic ETL workflows and apply data cleaning techniques to improve data quality.
The student can perform analytical tasks such as forecasting, clustering, and classification using selected methods.
The student is able to build clear and informative data visualizations, adapting them to different types of audiences.
The student can interpret results of data analysis critically and communicate findings effectively in written and visual formats.
Social competences
The student demonstrates awareness of ethical standards in data analysis, including data privacy and responsible use of AI technologies.
The student is able to work collaboratively in a group project and contribute to team-based data analysis.
The student shows openness to interdisciplinary perspectives and is willing to reflect on the social consequences of data-driven technologies.
Assessment criteria
The final grade for the Data Engineering and Analysis course is the sum of the points earned in three components, for a maximum of 100 points. Regular attendance and active participation in class activities are worth up to 15 points. A final test, which includes practical data analysis and visualization tasks using a sample database, is worth up to 50 points. A group essay based on a selected publication, film, or radio program on topics such as digital privacy, AI ethics, or the social impact of social media is worth up to 35 points.
The grading scale is as follows:
0 to 50 points: fail (2.0)
51 to 60 points: satisfactory (3.0)
61 to 70 points: satisfactory plus (3.5)
71 to 80 points: good (4.0)
81 to 90 points: good plus (4.5)
91 to 100 points: very good (5.0)
To pass the course, students must obtain a minimum of 51 points.
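The point scheme and grading scale above can be sketched as a short calculation. This is only an illustration, not an official grading tool: the function and variable names are hypothetical, and the sketch assumes that points are awarded as whole numbers, since the scale does not say how fractional totals are handled.

```python
# Illustrative sketch of the grading rules above; all names are hypothetical.
# Assumption: points are whole numbers (the scale lists integer bands only).

# Maximum points per component, as stated in the assessment criteria.
MAX_POINTS = {"participation": 15, "final_test": 50, "group_essay": 35}

# (lowest total in the band, grade), checked from the highest band down.
GRADE_BANDS = [(91, 5.0), (81, 4.5), (71, 4.0), (61, 3.5), (51, 3.0), (0, 2.0)]


def total_points(participation: int, final_test: int, group_essay: int) -> int:
    """Sum the three components after checking each against its maximum."""
    scores = {
        "participation": participation,
        "final_test": final_test,
        "group_essay": group_essay,
    }
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


def grade_for(points: int) -> float:
    """Map a cumulative score (0 to 100) to a grade on the scale above."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    for lowest, grade in GRADE_BANDS:
        if points >= lowest:
            return grade
    raise AssertionError("unreachable: the last band starts at 0")


def has_passed(points: int) -> bool:
    """A total of at least 51 points is needed to pass."""
    return points >= 51
```

For example, a student with 12 points for participation, 40 for the final test, and 30 for the essay has 82 points in total, which falls in the "good plus" band.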
Bibliography
Alla S., Big Data Analytics with Hadoop 3. Build highly effective analytics solutions to gain valuable insight into your big data, Birmingham 2023.
Ben-Gan I., Podstawy języka T-SQL: Microsoft SQL Server 2022 i Azure SQL Database, Warszawa 2023.
Foster D., Deep learning i modelowanie generatywne. Jak nauczyć komputer malowania, pisania, komponowania i grania, Gliwice 2021.
Franco Galeano M. I., Big Data Processing with Apache Spark. Efficiently tackle large datasets and big data analysis with Spark and Python, Birmingham 2018.
Gedeck P., Bruce P., Bruce A., Statystyka praktyczna w data science. 50 kluczowych zagadnień w językach R i Python, Gliwice 2021.
Grus J., Data science od podstaw. Analiza danych w Pythonie, Gliwice 202.
Harrison G., NoSQL, NewSQL i BigData. Bazy danych następnej generacji, Gliwice 2019.
Kleppmann M., Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Sebastopol (US) 2017.
Knight D., Pearson M., Ostrovsky E., Schacht B., Microsoft Power BI. Jak modelować i wizualizować dane oraz budować narracje cyfrowe, Gliwice 2022.
Lantz B., Machine Learning with R. Learn techniques for building and improving machine learning models, from data preparation to model tuning, evaluation, and working with big data - Fourth Edition, Birmingham 2023.
Lukavsky J., Building Big Data Pipelines with Apache Beam. Use a single programming model for both batch and stream data processing, Birmingham 2022.
Mishra S., Simplify Big Data Analytics with Amazon EMR. A beginner’s guide to learning and implementing Amazon EMR for building data analytics solutions, Birmingham 2022.
Nield T., Podstawy matematyki w data science. Algebra liniowa, rachunek prawdopodobieństwa i statystyka, Gliwice 203.
Official Microsoft educational materials, available at: https://learn.microsoft.com
Official Microsoft technical documentation, available at: https://docs.microsoft.com
Rockoff L., Język SQL. Przyjazny podręcznik, Gliwice 2022.
Russo M., Ferrari A., Kompletny przewodnik po DAX, wyd. 2 rozszerzone. Analiza biznesowa przy użyciu Microsoft Power BI, SQL Server Analysis Services i Excel, Warszawa 202.
Stevenson D., Big Data, nauka o danych i AI bez tajemnic, Gliwice 2019.
Ward B., Odsłaniamy SQL Server 2019: Klastry Big Data i uczenie maszynowe, Warszawa 202.
Warren J., Marz N., Big Data. Najlepsze praktyki budowy skalowalnych systemów obsługi danych w czasie rzeczywistym, Gliwice 2016.
Żulicki R., Data science: najseksowniejszy zawód XXI wieku w Polsce. Big data, sztuczna inteligencja i PowerPoint, Łódź 2023.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.