MLSE (Python/PySpark)
Locations: Noida, Uttar Pradesh, India; Gurgaon, Haryana, India; Hyderabad, Telangana, India; Bangalore, Karnataka, India; Indore, Madhya Pradesh, India
Experience: 6 to 8 years
Job Reference Number: 13024
Qualifications
6–8 years of hands-on experience with Big Data technologies: PySpark (DataFrame and SparkSQL), Hadoop, and Hive (a short PySpark sketch follows this list)
Strong experience with Python and Bash scripting
Solid understanding of SQL and data warehouse concepts
Excellent analytical, problem-solving, and research skills
Ability to think innovatively and solve problems beyond standard toolsets
Strong communication, presentation, and interpersonal skills
Hands-on experience with AWS Big Data services: IAM, Glue, EMR, Redshift, S3, Kinesis
Experience with orchestration tools such as Apache Airflow, as well as other job schedulers (an Airflow sketch also follows this list)
Experience migrating workloads from on-premises environments to the cloud, or between cloud environments
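
For context on the DataFrame and SparkSQL skills called out above, here is a minimal sketch of the two APIs side by side; the input path, column names, and aggregation are illustrative assumptions, not part of the role description.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-demo").getOrCreate()

# Hypothetical input path and schema, for illustration only.
orders = spark.read.parquet("s3://example-bucket/orders/")

# DataFrame API: total completed-order amount per customer.
totals = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)

# Equivalent SparkSQL: register a temp view and query it.
orders.createOrReplaceTempView("orders")
totals_sql = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY customer_id
""")
```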
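Likewise, a minimal Airflow sketch of the orchestration experience mentioned above, assuming Airflow 2.x; the DAG id, schedule, and spark-submit command are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline: one task that submits a Spark job.
with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="spark-submit /opt/jobs/etl_job.py",
    )
```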
Skills Required
Python, PySpark, SQL
Role & Responsibilities
Develop efficient ETL pipelines based on business requirements, adhering to development standards and best practices (a minimal pipeline sketch follows this list)
Conduct integration testing of pipelines in AWS environments
Provide time and effort estimates for development, testing, and deployment activities
Participate in peer code reviews to ensure code quality and standards compliance
Build cost-effective AWS pipelines using services such as S3, IAM, Glue, EMR, and Redshift
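
As a rough illustration of the kind of cost-conscious S3 pipeline described above, here is a sketch under assumed bucket and column names, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Hypothetical source and target buckets, for illustration only.
raw = spark.read.option("header", True).csv("s3://example-raw/events/")

# Deduplicate and derive a partition column from the event timestamp.
cleaned = (
    raw
    .dropDuplicates(["event_id"])
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_date").isNotNull())
)

# Writing date-partitioned Parquet keeps downstream S3 scans
# (e.g. Athena or Redshift Spectrum queries) small and cheap.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated/events/")
)
```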