Azure Data Engineer
- Full-time
Company Description
BETSOL is a cloud-first digital transformation and data management company offering products and IT services to enterprises in over 40 countries. The BETSOL team holds several engineering patents, has been recognized with industry awards, and maintains a net promoter score twice the industry average.
BETSOL’s open source backup and recovery product line, Zmanda (Zmanda.com), delivers up to 50% savings in total cost of ownership (TCO) and best-in-class performance.
BETSOL Global IT Services (BETSOL.com) builds and supports end-to-end enterprise solutions, reducing time-to-market for its customers.
BETSOL offices are set against the vibrant backdrops of Broomfield, Colorado and Bangalore, India.
We take pride in being an employee-centric organization, offering comprehensive health insurance, competitive salaries, 401K, volunteer programs, and scholarship opportunities. Office amenities include a fitness center, cafe, and recreational facilities.
Learn more at betsol.com
Job Description
We are seeking an experienced Azure Data Engineer to join our enterprise data engineering team. This role is focused on building and maintaining modern, scalable data pipelines across our data ecosystem — including lakehouses, data warehouses, data marts, and operational data stores — while supporting the migration of legacy ETL solutions to Microsoft Fabric and Azure.
Key Responsibilities:
Data Pipeline Development
- Design and build ETL/ELT pipelines using Azure Data Factory, Microsoft Fabric Data Pipelines, Databricks, and Fabric Notebooks
- Implement medallion architecture (Bronze/Silver/Gold) in Fabric Lakehouse environments
- Develop transformation logic using T-SQL, Spark SQL, PySpark, and Dataflows Gen2
- Build and maintain dimensional models (star/snowflake schema) and Data Vault models
- Implement incremental loading patterns using CDC, watermarking, and delta detection (see the sketch after this list)
- Create reusable pipeline components, templates, and parameterized frameworks
- Optimize pipeline performance through partitioning, parallelization, and query tuning
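To illustrate the medallion and incremental-loading patterns referenced above, the following is a minimal PySpark sketch under stated assumptions: the `bronze.customers`, `silver.customers`, and `control.load_watermarks` table names and the `modified_ts` watermark column are hypothetical. Production pipelines would be parameterized and orchestrated from Fabric Data Pipelines or ADF rather than hard-coded.

```python
# Minimal sketch: watermark-based incremental load from a Bronze to a Silver Delta table.
# Table names, the control table, and the watermark column are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# 1. Read the last processed watermark from an assumed control table.
last_wm = (
    spark.table("control.load_watermarks")
    .filter(F.col("table_name") == "silver.customers")
    .agg(F.max("watermark_value"))
    .first()[0]
) or "1900-01-01"

# 2. Pull only new or changed Bronze rows (delta detection on modified_ts).
bronze_increment = spark.table("bronze.customers").filter(F.col("modified_ts") > F.lit(last_wm))

# 3. Apply light transformations and merge the increment into the Silver Delta table.
silver_ready = bronze_increment.select(
    "customer_id",
    F.trim("customer_name").alias("customer_name"),
    "modified_ts",
)
silver = DeltaTable.forName(spark, "silver.customers")
(
    silver.alias("t")
    .merge(silver_ready.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# 4. Record the new watermark for the next run (skipped if no new rows arrived).
new_wm = silver_ready.agg(F.max("modified_ts")).first()[0]
if new_wm is not None:
    spark.createDataFrame(
        [("silver.customers", str(new_wm))], ["table_name", "watermark_value"]
    ).write.mode("append").saveAsTable("control.load_watermarks")
```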
Legacy-to-Fabric Migration
- Convert legacy ETL mappings, workflows, and scheduling logic to Microsoft Fabric/ADF equivalents
- Recreate parameter files, session configurations, and orchestration patterns in Fabric
- Execute unit testing and data reconciliation to validate that migrated pipelines produce results identical to the legacy implementations
- Document conversion patterns, technical decisions, and issue resolutions
- Support parallel runs and cutover validation
Data Quality & Testing
- Build data quality checks and validation frameworks embedded within pipelines (see the sketch after this list)
- Develop automated testing strategies (unit, integration, regression) for data pipelines
- Create monitoring dashboards and alerting for pipeline failures and data anomalies
- Perform source-to-target reconciliation for both BAU and migration workloads
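As one possible shape for pipeline-embedded quality checks with logging, the sketch below assumes hypothetical `silver.customers`, `bronze.customers`, and `audit.dq_results` tables; a real framework would drive the checks from metadata rather than hard-coding them.

```python
# Minimal sketch: pipeline-embedded data quality checks with result logging.
# Table names and the specific check definitions are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("silver.customers")

checks = {
    # Primary key must be populated and unique.
    "customer_id_not_null": df.filter(F.col("customer_id").isNull()).count() == 0,
    "customer_id_unique": df.count() == df.select("customer_id").distinct().count(),
    # Simple source-to-target row count reconciliation.
    "row_count_matches_bronze": df.count() == spark.table("bronze.customers").count(),
}

results = spark.createDataFrame(
    [(name, passed) for name, passed in checks.items()],
    ["check_name", "passed"],
).withColumn("checked_at", F.current_timestamp())

# Append results to an audit table so downstream alerting can pick up failures.
results.write.mode("append").saveAsTable("audit.dq_results")

if not all(checks.values()):
    raise ValueError("Data quality checks failed; see audit.dq_results for details.")
```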
Platform Operations & Collaboration
- Monitor, troubleshoot, and optimize production pipelines
- Implement logging, error handling, and retry mechanisms (see the sketch after this list)
- Support CI/CD pipelines for data solutions using Azure DevOps and Git
- Manage environment promotions (DEV → QA → PROD) and participate in on-call rotation
- Implement security best practices: RBAC, encryption, data masking, workspace security
- Collaborate with Data Architects, Business Analysts, DevOps, and BI teams
- Maintain technical documentation: pipeline specs, data dictionaries, and runbooks
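A minimal sketch of one retry approach with exponential backoff that a Notebook activity might wrap around a flaky call; the function and settings are illustrative assumptions, and the built-in retry policies on ADF/Fabric activities would typically be the first choice.

```python
# Minimal sketch: retry with exponential backoff and logging around a pipeline step.
# The wrapped callable and retry settings are illustrative assumptions.
import logging
import time

logger = logging.getLogger("pipeline")

def run_with_retry(activity, max_attempts=3, base_delay_s=5):
    """Run a callable, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))
```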
Technical Skills:
Microsoft Fabric & Azure
- Microsoft Fabric — Lakehouse, Data Warehouse, Data Pipelines, Dataflows Gen2, Notebooks
- Azure Data Factory v2 — pipelines, linked services, integration runtimes, triggers
- Azure Synapse Analytics — Dedicated SQL Pools, Serverless SQL, Spark Pools
- Azure Data Lake Storage Gen2, OneLake, Shortcuts, and Direct Lake mode
SQL & Programming
- Expert-level T-SQL — stored procedures, complex queries, performance tuning
- Python for data processing and automation
- PySpark for large-scale data transformations
- Familiarity with JSON, XML, and REST APIs
Informatica Platform
- Development experience with Informatica PowerCenter (Designer, Workflow Manager, Workflow Monitor)
Data Platforms & Formats
- Delta Lake format and Delta table operations
- Apache Spark architecture and optimization
- Data partitioning strategies and performance tuning
- Parquet and Avro file formats
- Dimensional modeling and Data Vault concepts
DevOps & Governance
- Git version control and Azure DevOps (Repos, Pipelines)
- CI/CD implementation for data solutions
- Fabric workspace deployment pipelines
- Data lineage, metadata management, and data cataloging
- Security best practices — RBAC, encryption, masking
- Awareness of compliance standards (GDPR, HIPAA, SOC2)
Qualifications
Education & Experience
- Bachelor's degree in Computer Science, Information Technology, Engineering, or related field
- 6-8 years of hands-on experience in data engineering, ETL development, and data warehousing
- Minimum 6 months of hands-on experience with Microsoft Fabric in a live/production environment (1+ years preferred), including practical delivery experience across Dataflows Gen2, PySpark Notebooks orchestrated by Fabric Pipelines, Fabric Data Pipelines (ADF-based), Delta table joins in Notebooks, PySpark aggregations, and Notebook-based data quality checks with logging to the Bronze layer
- 1+ years of experience developing solutions on Microsoft Azure data platform
- 1+ years of hands-on experience with Informatica PowerCenter and/or IICS development
- Experience participating in or leading ETL migration projects
- Strong understanding of data warehouse concepts, dimensional modeling, and data integration patterns
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- Microsoft Certified: Fabric Analytics Engineer Associate (DP-600)