Senior Data Engineer
Job Description
At CivicDataLab, we are building robust, automated tools for data analytics, encompassing data curation, cleansing, standardization, and sophisticated wrangling. These products will support our projects across sectors such as public finance, digital public goods, and climate change. We are also standardizing open datasets to adhere to the Data Catalog Vocabulary (DCAT) metadata standard, making them searchable and more user-friendly. To assist with this endeavor, we are seeking an experienced Data Engineer with 2+ years in the role. This role requires the candidate to be based in Delhi.
What You’ll Be Doing
- Design and develop scalable data orchestration pipelines using Prefect and Apache Airflow.
- Create and oversee data APIs responsible for collecting, managing, and analysing data from diverse public data sources.
- Standardize metadata of open datasets by ensuring compliance with the DCAT metadata standard.
- Collaborate with our partners to perform in-depth Exploratory Data Analysis (EDA) of various datasets.
- Engage in the development of database models in accordance with the specific project requirements.
- Maintain and monitor our existing open data platforms, such as Open Budgets India, Justice Hub, and Open Contracting India.
- Engage regularly with our diverse stakeholders and open-source communities to discuss and create reusable resources around use cases of public data, data engineering best practices, and guidebooks.
- Thoroughly document the data team's code, processes, and activities, including algorithms, methodologies, data transformations, and the overall workflow, ensuring clarity and comprehensiveness.
Skills You Should Bring
- 2+ years of hands-on experience working with Python and SQL.
- Understanding of message brokers such as RabbitMQ.
- Knowledge of open-source data scraping frameworks and tools such as Selenium and Scrapy.
- Experience building end-to-end ETL pipelines.
- Familiarity with building database systems.
- Knowledge of API- or stream-based data extraction processes.
- Comprehensive knowledge of Git-based workflows.
- Comprehensive knowledge of metadata standards such as DCAT.
Good to have
- An understanding of data privacy principles and practices, along with experience implementing data privacy algorithms, to help maintain the confidentiality and integrity of sensitive datasets.
- Prior experience collaborating with government or social sector research-based organizations.
- Prior experience analysing and presenting data using tools such as Apache Superset and Metabase.
- Proficiency in working with spreadsheets.
- Ability to read and write code in R and other statistical programming environments.
- Knowledge of working with geospatial data processing tools such as QGIS.
- Prior experience in actively contributing to FOSS (Free and Open-Source Software) projects.
- Familiarity with Agile methodologies and Scrum processes.