Data Engineering 101

Category: RiseAhead Programme - DataAhead

Module: Data Engineering 101

Module Description: This Data Engineering course equips participants with the essential skills and knowledge required to become proficient data engineers. It covers fundamental to advanced concepts and tools used in data engineering, including Python programming, SQL, ETL processes, cloud platforms, and version control systems. By the end of the course, participants will be able to design, build, and manage robust data infrastructure and pipelines.

Objectives:

Upon successful completion of the course, participants will be able to:

  1. Understand Data Engineering Fundamentals: Gain a solid understanding of the roles and responsibilities of a data engineer and how the role differs from other data-related roles. Learn about key tools and data architecture designs.
  2. Use Python for Data Engineering: Utilise Python for data manipulation, including working with libraries such as Pandas, performing exploratory data analysis, importing data from various file formats and data sources, and exporting it to a different format at a different location.
  3. Master SQL for Data Engineering: Execute basic and advanced SQL queries, design data models using a star schema, automate queries, and manage SQL databases on modern platforms such as Snowflake.
  4. Implement Version Control: Use Git for version control, manage repositories, and implement collaborative workflows in data engineering projects.
  5. Orchestrate ETL Workflows: Understand the ETL orchestration process and deploy tools such as Airflow using Docker to schedule and automate ETL jobs, based on real-world scenarios and real data.
  6. Leverage Cloud Computing: Work with cloud platforms such as Databricks and Microsoft Azure, perform data manipulation using PySpark, and set up and manage cloud data warehouses. Perform ETL orchestration using the built-in Databricks Workflows.
  7. Apply Knowledge in Real Projects: Using Azure Data Factory, integrate Python, SQL, and ETL skills in comprehensive data projects, demonstrating the ability to handle complex data engineering tasks.

Method of Delivery:

Virtual Learning (Online)
Period: 2 hours
Minimum Participants:
Maximum Participants:
Deliverables:
  • Data Engineering Mini Project
Price: RM 500 per participant
Physical Learning (Face-to-Face)
Period: 2 days
Minimum Participants:
Maximum Participants:
Deliverables:
  • Data Engineering Proof of Concept (POC)
Price: RM 2,800 per participant
Workshop
Period: 8 weeks
Minimum Participants:
Maximum Participants:
Deliverables:
  • Data Engineering Capstone Project
Price: RM 7,000 per participant
BespokeBiz - Consulting & Monitoring
Period: More than 6 months
Minimum Participants:
Maximum Participants:
Deliverables:
  • Data Infrastructure Architecture Document and Implementation Plan
  • Data Infrastructure Performance Report
Price: From RM 50,000 per company