Big Data Engineer
Our direct client, a fast-growing software and data analytics firm in Greenwich, is seeking a mid- to senior-level Big Data Engineer. You will be part of the team that designs and implements solutions integrated into the client’s Hadoop/EMR, Spark and Elasticsearch analytics systems. The ideal candidate is an experienced application developer with a concentration in data pipeline building, automation, data warehousing and data modeling. The successful candidate will be responsible for expanding and optimizing the data, analytics and data pipeline architecture, as well as optimizing data flow and building out a semantic layer to support cross-functional teams. The Data Engineer will support software developers, business analysts and data scientists on data and computational initiatives and will ensure that optimal data architectures and patterns are applied consistently across ongoing projects.
- Collaborate with the infrastructure team to ensure optimal extraction, transformation and loading of data in the current and future systems using technologies like Redshift, Hive, S3, PySpark, Elasticsearch, R, EMR.
- Generate large, complex, client-specific multi-tenant data layers that meet and exceed the demanding functional and non-functional needs of our SaaS-based web application.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing solutions for greater scalability.
- Develop analytic dashboards, using tools like Tableau and RStudio, that draw on our data pipeline metrics to provide actionable insights into operational efficiency and other key opportunities to enhance technical and business performance.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Strong organizational, oral and written skills.
- 8+ years of experience in a Data Engineer role, with at minimum a Bachelor’s degree in Computer Science.
- Strong analytic and modeling skills in working with both structured and unstructured data.
- Experience and proficiency with the following software and tools:
- Big data tools: Hadoop, Spark 2 (PySpark), Kafka, Elasticsearch, Hive.
- Expertise in Redshift or equivalent columnar databases.
- Data pipeline and workflow management tools, e.g. Hortonworks HDF, Lambda, Oozie, Airflow.
- Working knowledge of message queuing and stream processing (Amazon MQ, SQS).
- Experience with Java and/or Scala is a plus.
- Experience with RStudio, including formatted tables, plots and LaTeX, is a plus.
Job ID: 4387