Ranked #12 on Forbes’ List of 25 Fastest Growing Public Tech Companies for 2017, EPAM is committed to providing our global team of over 24,000 people with inspiring careers from day one. EPAMers lead with passion and honesty, and think creatively. Our people are the source of our success: we value collaboration, work to understand our customers’ business, and strive for the highest standards of excellence. No matter where you are located, you’ll join a dedicated, diverse community that will help you discover your fullest potential.
You are curious, persistent, logical and clever – a true techie at heart. You enjoy living by the code of your craft and developing elegant solutions for complex problems. If this sounds like you, this could be the perfect opportunity to join EPAM as a Senior Big Data Engineer. Scroll down to learn more about the position’s responsibilities and requirements.
We are building a new scrum team to create portfolio analytics data ingestion and processing pipelines from scratch on top of a Cloudera Hadoop cluster.
Develop proposals for implementation and design of scalable big data architecture;
Develop scalable production ready data integration and processing solutions;
Convert large volumes of structured and unstructured customer data;
Design, implement, and deploy high-performance, custom applications at scale on Hadoop;
Work closely with data analysts and development stakeholders to transform data operations;
Design, document, and implement data lake and data stream processing;
Support the testing, deployment, and maintenance of data processes;
Understand when to use data streams vs data lakes;
Design and implement support tools for data processes;
Benchmark systems, analyze bottlenecks and propose solutions to eliminate them;
Articulate data process designs and align fellow team members around them.
Proficient understanding of distributed computing principles;
Python (intermediate/advanced) and Spark (intermediate/advanced) are a must;
Experience managing a Hadoop cluster (Cloudera preferred), with all included services;
Proficiency with Hadoop v2, MapReduce, HDFS, Sqoop, HBase;
Experience in building stream-processing systems, using solutions such as Storm or Spark Streaming;
Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala;
Experience integrating data from multiple sources, such as Microsoft SQL Server and Oracle;
Good understanding of SQL queries, joins, stored procedures, relational schemas;
Experience in various messaging systems, such as Kafka or RabbitMQ;
Cloudera experience and FS domain knowledge are a big plus, but not required.
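To give candidates a feel for the SQL fundamentals listed above (queries, joins, relational schemas), here is a minimal sketch using Python's built-in sqlite3 module; the portfolio/position tables and column names are invented purely for illustration and are not part of the role description:

```python
# Hypothetical example: a relational schema with a one-to-many relationship
# and an aggregating inner join, in the spirit of the SQL skills listed above.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE portfolios (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (
        portfolio_id INTEGER REFERENCES portfolios(id),
        ticker TEXT,
        quantity INTEGER
    );
    INSERT INTO portfolios VALUES (1, 'growth'), (2, 'income');
    INSERT INTO positions VALUES
        (1, 'AAPL', 100), (1, 'MSFT', 50), (2, 'T', 200);
""")

# Inner join: total position quantity per portfolio.
cur.execute("""
    SELECT p.name, SUM(pos.quantity)
    FROM portfolios p
    JOIN positions pos ON pos.portfolio_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""")
rows = cur.fetchall()
print(rows)  # [('growth', 150), ('income', 200)]
conn.close()
```

The same join-and-aggregate pattern carries over directly to Hive, Impala, and Spark SQL, just at a much larger scale.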