Azure Data Engineer
- Full-time
Company Description
BETSOL is a cloud-first digital transformation and data management company offering products and IT services to enterprises in over 40 countries. The BETSOL team holds several engineering patents, has been recognized with industry awards, and maintains a net promoter score twice the industry average.
BETSOL’s open source backup and recovery product line, Zmanda (Zmanda.com), delivers up to 50% savings in total cost of ownership (TCO) and best-in-class performance.
BETSOL Global IT Services (BETSOL.com) builds and supports end-to-end enterprise solutions, reducing time-to-market for its customers.
BETSOL offices are set against the vibrant backdrops of Broomfield, Colorado and Bangalore, India.
We take pride in being an employee-centric organization, offering comprehensive health insurance, competitive salaries, 401K, volunteer programs, and scholarship opportunities. Office amenities include a fitness center, cafe, and recreational facilities.
Learn more at betsol.com
Job Description
We are seeking an experienced Azure Data Engineer to join our enterprise data engineering team. This role is focused on building and maintaining modern, scalable data pipelines across our data ecosystem — including lakehouses, data warehouses, data marts, and operational data stores — while supporting the migration of legacy ETL solutions to Microsoft Fabric and Azure.
Key Responsibilities:
Data Pipeline Development
- Design and build ETL/ELT pipelines using Azure Data Factory, Microsoft Fabric Data Pipelines, Databricks, and Fabric Notebooks
- Implement medallion architecture (Bronze/Silver/Gold) in Fabric Lakehouse environments
- Develop transformation logic using T-SQL, Spark SQL, PySpark, and Dataflows Gen2
- Build and maintain dimensional models (star/snowflake schema) and Data Vault models
- Implement incremental loading patterns using CDC, watermarking, and delta detection (see the sketch after this list)
- Create reusable pipeline components, templates, and parameterized frameworks
- Optimize pipeline performance through partitioning, parallelization, and query tuning
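To illustrate the medallion and incremental-loading patterns referenced above, the following is a minimal PySpark sketch under stated assumptions: the `bronze.customers`, `silver.customers`, and `control.load_watermarks` table names and the `modified_ts` watermark column are hypothetical. Production pipelines would be parameterized and orchestrated from Fabric Data Pipelines or ADF rather than hard-coded.

```python
# Minimal sketch: watermark-based incremental load from a Bronze to a Silver Delta table.
# Table names, the control table, and the watermark column are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# 1. Read the last processed watermark from an assumed control table.
last_wm = (
    spark.table("control.load_watermarks")
    .filter(F.col("table_name") == "silver.customers")
    .agg(F.max("watermark_value"))
    .first()[0]
) or "1900-01-01"

# 2. Pull only new or changed Bronze rows (delta detection on modified_ts).
bronze_increment = spark.table("bronze.customers").filter(F.col("modified_ts") > F.lit(last_wm))

# 3. Apply light transformations and merge the increment into the Silver Delta table.
silver_ready = bronze_increment.select(
    "customer_id",
    F.trim("customer_name").alias("customer_name"),
    "modified_ts",
)
silver = DeltaTable.forName(spark, "silver.customers")
(
    silver.alias("t")
    .merge(silver_ready.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# 4. Record the new watermark for the next run (skipped if no new rows arrived).
new_wm = silver_ready.agg(F.max("modified_ts")).first()[0]
if new_wm is not None:
    spark.createDataFrame(
        [("silver.customers", str(new_wm))], ["table_name", "watermark_value"]
    ).write.mode("append").saveAsTable("control.load_watermarks")
```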
Legacy-to-Fabric Migration
- Convert legacy ETL mappings, workflows, and scheduling logic to Microsoft Fabric/ADF equivalents
- Recreate parameter files, session configurations, and orchestration patterns in Fabric
- Execute unit testing and data reconciliation to validate that migrated pipelines produce results identical to the legacy implementations
- Document conversion patterns, technical decisions, and issue resolutions
- Support parallel runs and cutover validation
Data Quality & Testing
- Build data quality checks and validation frameworks embedded within pipelines (see the sketch after this list)
- Develop automated testing strategies (unit, integration, regression) for data pipelines
- Create monitoring dashboards and alerting for pipeline failures and data anomalies
- Perform source-to-target reconciliation for both BAU and migration workloads
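As one possible shape for pipeline-embedded quality checks with logging, the sketch below assumes hypothetical `silver.customers`, `bronze.customers`, and `audit.dq_results` tables; a real framework would drive the checks from metadata rather than hard-coding them.

```python
# Minimal sketch: pipeline-embedded data quality checks with result logging.
# Table names and the specific check definitions are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("silver.customers")

checks = {
    # Primary key must be populated and unique.
    "customer_id_not_null": df.filter(F.col("customer_id").isNull()).count() == 0,
    "customer_id_unique": df.count() == df.select("customer_id").distinct().count(),
    # Simple source-to-target row count reconciliation.
    "row_count_matches_bronze": df.count() == spark.table("bronze.customers").count(),
}

results = spark.createDataFrame(
    [(name, passed) for name, passed in checks.items()],
    ["check_name", "passed"],
).withColumn("checked_at", F.current_timestamp())

# Append results to an audit table so downstream alerting can pick up failures.
results.write.mode("append").saveAsTable("audit.dq_results")

if not all(checks.values()):
    raise ValueError("Data quality checks failed; see audit.dq_results for details.")
```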
Platform Operations & Collaboration
- Monitor, troubleshoot, and optimize production pipelines
- Implement logging, error handling, and retry mechanisms (see the sketch after this list)
- Support CI/CD pipelines for data solutions using Azure DevOps and Git
- Manage environment promotions (DEV → QA → PROD) and participate in on-call rotation
- Implement security best practices: RBAC, encryption, data masking, workspace security
- Collaborate with Data Architects, Business Analysts, DevOps, and BI teams
- Maintain technical documentation: pipeline specs, data dictionaries, and runbooks
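A minimal sketch of one retry approach with exponential backoff that a Notebook activity might wrap around a flaky call; the function and settings are illustrative assumptions, and the built-in retry policies on ADF/Fabric activities would typically be the first choice.

```python
# Minimal sketch: retry with exponential backoff and logging around a pipeline step.
# The wrapped callable and retry settings are illustrative assumptions.
import logging
import time

logger = logging.getLogger("pipeline")

def run_with_retry(activity, max_attempts=3, base_delay_s=5):
    """Run a callable, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))
```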
Technical Skills:
Microsoft Fabric & Azure
- Microsoft Fabric — Lakehouse, Data Warehouse, Data Pipelines, Dataflows Gen2, Notebooks
- Azure Data Factory v2 — pipelines, linked services, integration runtimes, triggers
- Azure Synapse Analytics — Dedicated SQL Pools, Serverless SQL, Spark Pools
- Azure Data Lake Storage Gen2, OneLake, Shortcuts, and Direct Lake mode
SQL & Programming
- Expert-level T-SQL — stored procedures, complex queries, performance tuning
- Python for data processing and automation
- PySpark for large-scale data transformations
- Familiarity with JSON, XML, and REST APIs
Informatica Platform
- Development experience with Informatica PowerCenter (Designer, Workflow Manager, Workflow Monitor)
Data Platforms & Formats
- Delta Lake format and Delta table operations
- Apache Spark architecture and optimization
- Data partitioning strategies and performance tuning
- Parquet and Avro file formats
- Dimensional modeling and Data Vault concepts
DevOps & Governance
- Git version control and Azure DevOps (Repos, Pipelines)
- CI/CD implementation for data solutions
- Fabric workspace deployment pipelines
- Data lineage, metadata management, and data cataloging
- Security best practices — RBAC, encryption, masking
- Awareness of compliance standards (GDPR, HIPAA, SOC2)
Qualifications
Education & Experience
- Bachelor's degree in Computer Science, Information Technology, Engineering, or related field
- 6-8 years of hands-on experience in data engineering, ETL development, and data warehousing
- Minimum 6 months of hands-on experience with Microsoft Fabric in a live/production environment (1+ years preferred), including practical delivery experience across Dataflows Gen2, PySpark Notebooks orchestrated by Fabric Pipelines, Fabric Data Pipelines (ADF-based), Delta table joins in Notebooks, PySpark aggregations, and Notebook-based data quality checks with logging to the Bronze layer
- 1+ years of experience developing solutions on Microsoft Azure data platform
- 1+ years of hands-on experience with Informatica PowerCenter and/or IICS development
- Experience participating in or leading ETL migration projects
- Strong understanding of data warehouse concepts, dimensional modeling, and data integration patterns
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- Microsoft Certified: Fabric Analytics Engineer Associate (DP-600)