Programming Environment for Data Engineering (I) WSE-BD-ŚPID-I

The course “Programming Environment for Data Engineering” is designed to introduce students to the fundamental aspects of digital environments used in modern data analysis. It is aimed at students without a background in computer science, with the goal of building a solid understanding of data processing technologies—from hardware components and operating systems to computer networks and advanced cloud and distributed computing environments.

Throughout the course, students will learn the principles of IT infrastructure, including how data is stored and transmitted, the role of networking, and how data processing is organized in distributed systems. Particular emphasis is placed on understanding the architecture and core services of two major cloud platforms: Google Cloud Platform and Microsoft Azure, including data storage, batch processing, and cloud-based analytical tools.

An integral part of the course involves learning the basic syntax and applications of two key languages for data querying and analysis: Structured Query Language (SQL), used in relational databases, and Data Analysis Expressions (DAX), typically used in reporting and visualization tools such as Power BI. Students will practice building basic queries and conducting simple analyses using sample datasets.
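As an illustration of the kind of basic query practiced in this part of the course, here is a minimal sketch that runs a SQL aggregation against an in-memory SQLite database from Python. The table `sales`, its columns, and the sample rows are hypothetical and are not taken from the course materials. DAX is not shown because it runs inside tools such as Power BI.

```python
import sqlite3


def total_by_category(rows):
    """Load (category, amount) rows into a temporary table and
    return the summed amount per category, largest total first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT category, SUM(amount) AS total
        FROM sales
        GROUP BY category
        ORDER BY total DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample dataset, for illustration only.
sample_rows = [("books", 12.5), ("games", 30.0), ("books", 7.5)]
totals = total_by_category(sample_rows)
```

The same pattern (create a table, load a small dataset, run a `SELECT` with `GROUP BY`) can be repeated with any sample data.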

In the section dedicated to distributed computing, students will explore the core principles of data clusters using Apache Spark as a reference platform. They will learn the logic behind distributed data processing, the components of a Spark cluster, and how it integrates with cloud-based infrastructures.
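The logic of distributed processing can be sketched in plain Python, without Spark itself: data is split into partitions, each partition is processed independently, and the partial results are merged. This is only a single-machine illustration of the idea. The function names and the sample lines are hypothetical, and a real Spark cluster would run the per-partition step in parallel on separate worker nodes.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, n_partitions):
    """Distribute records round-robin across n_partitions lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_words(partition):
    """Per-partition step: count the words in one partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Merge step: combine two partial results into one."""
    return left + right


def word_count(records, n_partitions=2):
    partitions = split_into_partitions(records, n_partitions)
    # Here the partitions are processed one after another; a cluster
    # would hand each partition to a different worker.
    partial_results = [count_words(p) for p in partitions]
    return dict(reduce(merge_counts, partial_results, Counter()))


# Hypothetical input lines, for illustration only.
lines = ["spark cluster", "spark driver", "cluster node"]
result = word_count(lines, n_partitions=2)
```

The final counts do not depend on how many partitions are used, which is what allows the work to be spread over several machines.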

The course is practical and project-based. In addition to theoretical knowledge, students will complete exercises involving data queries, file structure exploration, information flow analysis in infrastructure, and basic computational tasks within distributed environments.

Scientific discipline to which the learning outcomes relate

information and communication technology

University-wide course group

not applicable

Description of student workload in ECTS

Student workload (5 ECTS): The course is delivered over a full academic year and includes 60 contact hours in the classroom (lectures and labs).
Participation in classes (lectures + labs): 60 hours
Independent preparation for classes: 30 hours
Completion of practical assignments and online tasks: 35 hours
Preparation for final assessment: 15 hours
Other activities (e.g., use of materials, consultations): 10 hours
Total student workload: 150 hours
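The workload figures can be checked with a short calculation. The sketch below uses only the hours and the ECTS value stated above. The hours-per-credit figure is derived arithmetic, not a number given in the syllabus, and the dictionary keys are shortened labels chosen for illustration.

```python
# Hours per activity, as listed in the workload description.
WORKLOAD_HOURS = {
    "participation in classes": 60,
    "independent preparation": 30,
    "practical assignments and online tasks": 35,
    "preparation for final assessment": 15,
    "other activities": 10,
}
ECTS_CREDITS = 5


def total_hours(workload):
    """Sum the hours of all listed activities."""
    return sum(workload.values())


def hours_per_credit(workload, credits):
    """Average number of hours of work behind one ECTS credit."""
    return total_hours(workload) / credits


total = total_hours(WORKLOAD_HOURS)
per_credit = hours_per_credit(WORKLOAD_HOURS, ECTS_CREDITS)
```

The components add up to the stated total of 150 hours, which corresponds to 30 hours of work per ECTS credit.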

Subject level

elementary

Learning outcome code/codes

In accordance with the study programme adopted by the UKSW Senate: https://monitor.uksw.edu.pl/docs/search
BDAS2_U12, BDAS2_U10, BDAS2_K01

Type of subject

obligatory

Preliminary Requirements

Students enrolling in this course are expected to have a general awareness of how modern information technologies function. Specifically, the following are required:
basic computer skills in Windows or macOS (e.g., creating and managing files and folders, using a web browser, editing text documents),
fundamental knowledge of spreadsheet tools (e.g., Excel or Google Sheets),
ability to use the university’s e-learning platform and communication tools (e.g., Microsoft Teams, Google Workspace),
willingness to learn new digital tools and openness to technical topics.

Course coordinators

Adam Bartosiewicz

Learning outcomes

Knowledge

The student understands the core components of IT infrastructure supporting data processing, including computer hardware, networks, and the principles of operating systems.
The student is familiar with the architecture and operation of cloud services (Google Cloud Platform, Microsoft Azure) for data storage and processing.
The student knows the fundamental concepts and syntax of the SQL and DAX languages for data analysis.
The student has knowledge of the operating principles of distributed systems and computing clusters (e.g., Apache Spark).
Skills

The student can apply basic SQL and DAX queries for data exploration and analysis.
The student can identify and describe components of the technological infrastructure used in data engineering.
The student can interpret fundamental data flows within cloud and distributed environments.
The student is able to perform simple project-based tasks involving data processing in different programming environments.
Social Competences

The student understands the importance of ethical and correct data processing.
The student demonstrates openness to interdisciplinary approaches in data-related work.
The student is able to collaborate effectively within a project team.

Assessment criteria

The final grade for the course “Programming Environment for Data Engineering” is based entirely on the completion of online laboratory assignments. The total number of points available is 100. In order to pass the course, a student must obtain at least 51% of the total points (50% + 1 point).

The grading scale is as follows:
0% to 50%: 2.0 (fail). The student did not meet the basic requirements of the course.
51% to 60%: 3.0 (satisfactory). The student met the minimum requirements and demonstrated basic task completion.
61% to 70%: 3.5 (satisfactory plus). The work shows partially correct solutions and a basic understanding of key concepts.
71% to 80%: 4.0 (good). The work shows generally correct task execution and a solid grasp of the material.
81% to 90%: 4.5 (good plus). The work demonstrates high quality, independence, and thoughtful engagement with the tasks.
91% to 100%: 5.0 (very good). The student completed all exercises fully and shows excellent understanding of the course content.
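The grading scale can be expressed as a small lookup. The sketch below is an illustration only, with a hypothetical function name. It assumes the score is given as a percentage of the 100 available points, and it assigns a fractional score that falls between two listed bands (for example 50.5%) to the lower band, a case the scale above does not define.

```python
# Lower bound of each passing band (in percent) and the grade it earns,
# ordered from the highest band down.
GRADE_BANDS = [
    (91, 5.0),
    (81, 4.5),
    (71, 4.0),
    (61, 3.5),
    (51, 3.0),
]


def grade_for(percent):
    """Return the final grade for a score between 0 and 100 percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for lower_bound, grade in GRADE_BANDS:
        if percent >= lower_bound:
            return grade
    # Anything below the lowest passing band is a fail.
    # Assumption: this includes fractional scores between 50 and 51.
    return 2.0
```

For example, `grade_for(50)` gives 2.0 (fail), while `grade_for(51)` gives 3.0, the lowest passing grade.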

Practical placement

Internships should develop skills in data analysis, digital tools, and IT-supported workflows. Recommended settings include public institutions, NGOs, media, education, marketing, and IT or analytics departments. Tasks may include assisting with data analysis, organizing cloud data, preparing reports, and collaborating in interdisciplinary teams.

Bibliography

Alla S., Big Data Analytics with Hadoop 3. Build highly effective analytics solutions to gain valuable insight into your big data, Birmingham 2023.
Ben-Gan I., Podstawy języka T-SQL: Microsoft SQL Server 2022 i Azure SQL Database, Warszawa 2023.
Foster D., Deep learning i modelowanie generatywne. Jak nauczyć komputer malowania, pisania, komponowania i grania, Gliwice 2021.
Franco Galeano M. I., Big Data Processing with Apache Spark. Efficiently tackle large datasets and big data analysis with Spark and Python, Birmingham 2018.
Gedeck P., Bruce P., Bruce A., Statystyka praktyczna w data science. 50 kluczowych zagadnień w językach R i Python, Gliwice 2021.
Grus J., Data science od podstaw. Analiza danych w Pythonie, Gliwice 202.
Harrison G., NoSQL, NewSQL i BigData. Bazy danych następnej generacji, Gliwice 2019.
Kleppmann M., Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Sebastopol (US) 2017.
Knight D., Pearson M., Ostrovsky E., Schacht B., Microsoft Power BI. Jak modelować i wizualizować dane oraz budować narracje cyfrowe, Gliwice 2022.
Lantz B., Machine Learning with R. Learn techniques for building and improving machine learning models, from data preparation to model tuning, evaluation, and working with big data, Fourth Edition, Birmingham 2023.
Lukavsky J., Building Big Data Pipelines with Apache Beam. Use a single programming model for both batch and stream data processing, Birmingham 2022.
Mishra S., Simplify Big Data Analytics with Amazon EMR. A beginner’s guide to learning and implementing Amazon EMR for building data analytics solutions, Birmingham 2022.
Nield T., Podstawy matematyki w data science. Algebra liniowa, rachunek prawdopodobieństwa i statystyka, Gliwice 203.
Official Microsoft learning materials, available at https://learn.microsoft.com
Official Microsoft technical documentation, available at https://docs.microsoft.com
Rockoff L., Język SQL. Przyjazny podręcznik, Gliwice 2022.
Russo M., Ferrari A., Kompletny przewodnik po DAX, wyd. 2 rozszerzone. Analiza biznesowa przy użyciu Microsoft Power BI, SQL Server Analysis Services i Excel, Warszawa 202.
Stevenson D., Big Data i nauka o danych i AI bez tajemnicy, Gliwice 2019.
Ward B., Odsłaniamy SQL Server 2019: Klastry Big Data i uczenie maszynowe, Warszawa 202.
Warren J., Marz N., Big Data. Najlepsze praktyki budowy skalowalnych systemów obsługi danych w czasie rzeczywistym, Gliwice 2016.
Żulicki R., Data science: najseksowniejszy zawód XXI wieku w Polsce. Big data, sztuczna inteligencja i PowerPoint, Ł

Additional information

Additional information (registration calendar, class instructors, locations and schedules of classes) may be available in the USOSweb system:

Description of WSE-BD-ŚPID-I in USOSweb