As a member of the Tealeaves Health team in Roswell, you will be responsible for the design, development, testing, and support of a highly customizable next-generation data platform, including its data pipelines, tools, and frameworks. You will work with an existing development team to build the new platform on current big data technologies (Spark, Cloud, Cassandra), migrate the existing proprietary custom data platform, and provide production support. The current platform uses a custom-built in-house ETL process spanning Perl, Lua, Python, XML, and Bash. You will be accountable for understanding the current platform and for the design, development, implementation, and post-implementation maintenance and support of the new one. Duties include data modeling and design; developing and testing ETL jobs on Spark in Python, Scala, or Java; enhancing existing code; and building new data structures and reporting capabilities.
The ideal candidate has extensive knowledge and experience building data pipelines, ETL, and REST APIs on Spark or a similar platform using Python or Java. Ideally, this person also has knowledge of data architecture, data flows, data analytics, reporting, and business intelligence. An agile mindset, a commitment to teamwork, collaboration, and hustle, and strong communication skills are absolute requirements.
- Work in an Agile Scrum team following process guidelines and participating in team ceremonies.
- Analyze, design, and code business-related solutions, as well as core architectural changes, using an agile programming approach, resulting in software delivered on time and on budget.
- Analyze current data ingestion processes and jobs and port their logic to the new data platform.
- Acquire data from client or secondary data sources and develop data pipelines to transform, map/reduce, and analyze data based on business requirements.
- Innovate new ways of managing, transforming, and validating data
- Establish and enforce guidelines to ensure consistency, quality and completeness of data assets
- Apply quality assurance best practices to all work products
- Learn cutting-edge technologies and applications to solve problems.
- Provide production support; troubleshoot data load job failures or performance deficiencies and adjust processes as appropriate
- Performance tune data pipelines and code to enhance scalability and performance.
- Automate the data load processes using current tools and technologies.
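The transform-and-validate responsibilities above can be sketched as a minimal, engine-agnostic Python pipeline. The claim/patient schema, field names, and validation rule below are hypothetical illustrations only; a real job would express the same extract → transform → validate → load steps as Spark transformations.

```python
import csv
import io
import json

# Hypothetical raw input; real data would come from files or a source system.
RAW_CSV = """claim_id,patient_id,amount
C001,P17,125.50
C002,P18,not-a-number
C003,P17,40.00
"""

def extract(text):
    """Read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Cast fields to typed values, separating rows that fail validation."""
    good, bad = [], []
    for row in rows:
        try:
            good.append({
                "claim_id": row["claim_id"],
                "patient_id": row["patient_id"],
                "amount": float(row["amount"]),  # reject non-numeric amounts
            })
        except ValueError:
            bad.append(row)
    return good, bad

def load(rows):
    """Serialize clean rows to JSON lines (a stand-in for a warehouse write)."""
    return [json.dumps(r, sort_keys=True) for r in rows]

good, bad = transform(extract(RAW_CSV))
print(f"loaded {len(load(good))} rows, rejected {len(bad)}")
```

Keeping validation as a distinct step that routes bad records aside, rather than failing the whole load, mirrors the data quality and production support duties listed above.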
- Bachelor's degree in Computer Science, Mathematics, or a related field.
- 5+ years of work experience in a relevant field (Data Engineer, BI Engineer, DW Engineer, Software Engineer, etc.)
- Experience building ETL on a DW technology (Redshift, Oracle, Teradata, Netezza, etc.) and applying relevant data modeling best practices and design principles.
- Experience in data quality testing; adept at writing test cases and scripts, presenting and resolving data issues
- Experience in implementing distributed and scalable algorithms (Hadoop, Spark) is a plus
- Experience working in development teams using agile techniques, object-oriented development, and scripting languages is preferred
- Excellent SQL skills, with the ability to analyze complex data and design data models.
- Proficiency in a major programming language (e.g., Java) and/or a scripting language (Scala/Perl/Python)
- Experience in MapReduce with tools like Spark, Apache Hadoop MapReduce, or Amazon EMR
- Strong communication skills and ability to discuss the product with PMs and business owners
- Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy
- Proficiency working in a Linux command line environment for development: vi, git
- Proficiency manipulating data files in a Linux command line environment: awk, sed, cut, sort, grep
- Proficiency loading and extracting high volume data with MySQL, Postgres
- Understanding of RDBMS, NoSQL databases, and data modeling principles
- Demonstrated independent problem solving skills and ability to develop innovative solutions to complex analytical/data-driven problems
- Ability to express complex technical concepts effectively, both verbally and in writing
- Ability to handle multiple projects and deadlines with minimal supervision
- Demonstrated strong sense of ownership, urgency, and drive, and willingness to work in a small, startup-like environment with tight schedules and frequent priority changes.
- Excellent interpersonal skills necessary to work effectively with colleagues at various levels of the organization and across multiple locations
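The MapReduce experience called for above comes down to the map → shuffle → reduce pattern. Here is a toy, pure-Python word count illustrating that pattern; it is not tied to Spark, Hadoop, or EMR, which apply the same phases across a cluster.

```python
from collections import defaultdict
from functools import reduce

def map_phase(records):
    """Emit (key, 1) pairs -- here, one pair per word."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum each key's values to produce the final counts."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

lines = ["spark spark emr", "hadoop spark"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'spark': 3, 'emr': 1, 'hadoop': 1}
```

Because the map and reduce functions are stateless per key, the framework can run them in parallel across partitions, which is what makes the pattern scale.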
Nice-to-have skills:
- Experience with big data analytics platforms like Hadoop and Google BigQuery.
- Understanding of XML, JSON, Lua, Perl, and REST APIs
- Experience with NoSQL databases like MongoDB, Cassandra, and AWS DynamoDB
- Experience with reporting tools like Tableau and MicroStrategy.
- Experience with cloud solutions / AWS
- Experience building reports and/or data visualization
- Experience working with predictive analytics, decision models, and data mining libraries, as well as the tools used to develop them
- Experience building or administering reporting/analytics platforms
- Experience building flexible data APIs that consumers use to power other parts of the business
- Healthcare claims and patient data experience is a plus