People say you should consider value for money when buying things. However, the best value for money is getting something good for free. But do such things exist? Supposedly not, if we go by the saying, “No such thing as a free lunch.”
I claim there is a free lunch, and I’m about to prove it! I dug out 10 educational ‘free lunches’: free data engineering courses that also provide quality knowledge. True, there’s much more variety and choice if you can and want to pay tens, hundreds, or sometimes even thousands of dollars.
Many such paid courses are labeled free on other free-course lists. Paying $90 one-off or $45/month may count as free to some people. But many people don’t have that money for a ‘free’ course, despite being very willing to learn data engineering. (Also, let’s get real! Free literally means, well, free! Not ‘cheap’, not ‘very little money’, or ‘affordable’. Free!)
From what I researched, these courses really are free. Many are from edX. If you choose free access to the course, you must complete it in a certain time, usually around six months. But that should be enough to complete every course comfortably. Also, free access means you don’t get lifetime access to all the materials (they are deleted once you finish) and don’t get a certificate. Despite this, you should be able to use these courses to learn about data engineering.
Before I talk about the courses, let’s briefly overview the data engineer’s role. That way, knowing what to look for in courses will be easier.
Understanding the Role of a Data Engineer
Very simply, data engineers are in charge of making data available to data team members and other stakeholders. In doing so, they wrangle data and build and maintain data infrastructure, e.g., ETL processes, data pipelines, and data storage.
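To make the ETL idea concrete, here is a minimal sketch of an extract-transform-load pipeline using only Python’s standard library. Everything in it is an assumption made for illustration: the sample CSV, the `orders` table, and the function names are hypothetical, and a real pipeline would read from files, APIs, or production databases rather than an in-memory string.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, an API,
# or another database.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"], float(row["amount"]))
        )
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a storage table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text):
    """Chain the three steps and return the connection holding the result."""
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load(conn, transform(extract(raw_text)))
    return conn


conn = run_pipeline(RAW_CSV)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The row with the missing amount is dropped during the transform step, so only two orders reach the table. Splitting the work into three small functions mirrors how larger pipelines are usually organized, which makes each stage easier to test and replace.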
Naturally, the courses should cover all or some of those skills. Let’s take a closer look at the courses – pun intended – that will comprise your educational free lunch.
Free Data Engineering Courses
1. Data Engineering by ASU
Platform and link to the course: edX
Duration: 5 weeks at 1-9 hours/week; learn at your own pace
Description: This introductory-level course by Arizona State University focuses on working with databases in data engineering and how to interact with them using SQL. You will learn about database structure, the star schema, and joining data from multiple tables. In the final stage, you will learn how to create reports with SQL and write scripts for data processing.
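If the star schema is new to you, the sketch below shows the general idea: one central fact table that references small dimension tables, which you join to build a report. This is not taken from the course. The table names, columns, and sample rows are all hypothetical, and SQLite (via Python’s built-in `sqlite3` module) stands in for whatever database the course actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical star schema: fact_sales in the middle, two dimensions around it.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'keyboard'), (2, 'mouse');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 50.0), (2, 1, 2, 70.0), (3, 2, 2, 20.0);
""")

# Report: join the fact table to both dimensions and aggregate revenue.
report = conn.execute("""
    SELECT p.name, d.year, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()

for row in report:
    print(row)
```

The fact table stores only keys and measures, while descriptive attributes such as product names live in the dimensions. That is why the report needs one join per dimension.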
2. Python and Pandas for Data Engineering by Pragmatic AI Labs
Platform and link to the course: edX
Duration: 4 weeks at 3-6 hours/week; learn at your own pace
Description: In yet another introductory edX course, you’ll learn Python and pandas for data engineering. The introduction to Python consists of topics such as simple statements, if statements, while loops, and functions. Then, you’ll learn about data manipulation in pandas (particularly DataFrames) and its alternatives, such as NumPy, Spark, and PySpark. In the last module, you’ll learn about Python development environments and version control.
3. Scripting with Python and SQL for Data Engineering by Pragmatic AI Labs
Platform and link to the course: edX
Duration: 4 weeks at 3-6 hours/week; learn at your own pace
Description: If you want to learn SQL and Python for data engineering simultaneously, this is the course for you. You’ll use Python’s built-in data structures to manipulate data and write Python scripts for data task automation. The course also teaches you web scraping and using SQLite to store and query data in Python. Regarding SQL, you’ll learn how to import and export data from a MySQL database and how to execute MySQL queries in VSCode.
4. Cloud Data Engineering by Pragmatic AI Labs
Platform and link to the course: edX
Duration: 4 weeks at 3-6 hours/week; learn at your own pace
Description: This course will teach you data engineering in the cloud. You’ll learn about methodologies in data engineering; develop distributed systems, serverless data engineering systems, and cloud ETL pipelines; and learn about data governance. In the process, you’ll come into contact with technologies such as:
- CUDA
- Numba
- ASICs
- Colab Pro
- Colab API
- Google BigQuery
- AWS
- Databricks SQL
- Click
- Python
- Rust
This is also an introductory course with no prerequisites needed.
5. Building ETL and Data Pipelines with Bash, Airflow and Kafka by IBM
Platform and link to the course: edX
Duration: 5 weeks at 2-4 hours/week; learn at your own pace
Description: This data engineering course focuses on building ETL and data pipelines. During the course, you’ll learn what ETL and ELT processes are, create ETL processes using Bash shell scripts, build batch data pipelines with Apache Airflow, and build streaming data pipelines with Apache Kafka.
This is an introductory course to these topics but requires experience working with relational databases, SQL, and Bash shell scripting.
6. Data Warehousing and BI Analytics by IBM
Platform and link to the course: edX
Duration: 6 weeks at 2-3 hours/week; learn at your own pace
Description: This intermediate course by IBM teaches you the essentials of data warehouses, data marts, and data lakes. You will learn how to design, model, and implement data warehouses. More specifically, you will use CUBEs, ROLLUPs, materialized views, and tables. You’ll also learn about facts and dimensional modeling, data modeling with star and snowflake schemas, staging areas for data warehouses, data quality, and populating a data warehouse with data. In the third module, you’ll work on data warehouse analytics in Cognos Analytics.
The course requires experience with SQL and relational databases.
7. Apache Spark for Data Engineering and Machine Learning by IBM
Platform and link to the course: edX
Duration: 3 weeks at 2-3 hours/week; learn at your own pace
Description: Yet another intermediate course, this one focuses on Apache Spark, an important tool in data engineering. You’ll learn about Spark Structured Streaming, GraphFrames, ETL processes, and ML pipelines. In addition, you’ll learn ML fundamentals, such as regression, classification, and clustering.
The course requires foundational Apache Spark knowledge. It’s also suggested that you complete the Big Data, Hadoop and Spark Basics course by IBM.
8. DE Zoomcamp
Platform and link to the course: DataTalks.Club
Duration: 10 weeks; learn at your own pace
Description: Finally, a course from a different platform! This online bootcamp will provide you with comprehensive data engineering knowledge. It’ll teach you containerization and infrastructure, workflow orchestration, data warehousing, analytics engineering, batch processing, and streaming. You’ll be introduced to technologies such as Google Cloud Platform, Terraform, Docker, SQL, Mage, dbt, Apache Spark, and Apache Kafka.
The prerequisite for this bootcamp is a grasp of SQL basics. It’s also preferable that you have experience with Python or, failing that, some other programming language.
9. DE End-to-End Projects
Platform and link to the course: DE Academy
Duration: No info.
Description: This is a project-based course in which you’ll learn how to use AWS, Snowflake, Python, Kafka, Azure, Databricks, Airflow, and Tableau. You will analyze and transform data, migrate it, and streamline workflows.
10. Scala Programming for Data Science
Platform and link to the course: Cognitive Class AI
Duration: 20 hours; learn at your own pace
Description: This learning path consists of three courses. The first is Scala 101, which will teach you the basics of object-oriented programming, case objects & classes, collections, and idiomatic Scala. In the second course, Spark Overview for Scala Analytics, you will be introduced to Apache Spark, RDDs, DataFrames for large-scale data science, and advanced Spark topics (e.g., Hive with Spark, Spark streaming). The third course is about Scala in data science, where you will learn basic statistics and data types, how to prepare data, engineer features, fit a model, build a pipeline, and perform grid search.
Conclusion
No surprise that it’s easier when you have money – you get access to more courses that are more diverse. Yeah, it sucks not having money! But this doesn’t mean you must say goodbye to your dream of landing a data engineer role.
Good free courses are much harder to find, but there are still some that can teach you basic and more advanced data engineering. I found ten of them. Some other free resources, such as blogs or YouTube videos, can also help you reach the required level of knowledge.
If you’re industrious enough, dedicated, and persistent, I’m sure you can land a data engineering role for free.
Nate Rosidi is a data scientist who also works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.