Are you an IT Technologist passionate about the way research functions with technology? Do you love working with researchers on development as well as IT professionals in operations? If you have experience in both, we want you to join our ACCRE team as a DevOps Engineer.
The Engineer will collaborate closely with the DSI Data Science Team to enable fast-paced and independent work by the data science community with cutting-edge hardware and tools. As a member of the team, the Engineer will discuss, plan, and implement best-practice, reproducible, and sustainable solutions. For example, letting researchers quickly spin up containers for developing deep learning projects, and then share those containers with other research members, enables fast-paced and independent work with less need for direct assistance. In addition, the Engineer will have an opportunity to share skills through teaching and mentoring the next generation of Data Scientists and Data Science Engineers.
The DevOps Engineer also focuses on system development and integration and on user support for the ACCRE research community. Emphasis is placed on building middleware and system services that allow groups to scale productivity.
The DevOps Engineer has leadership and mentoring roles for the software group. The position reports to the Director of Research Computing Operations and is advised by the Technical Director.
Computing is emerging as a third paradigm for discovery, complementing theory and experiment.
The Advanced Computing Center for Research and Education (ACCRE) is being built and operated by Vanderbilt faculty. Its mission is to allow Vanderbilt researchers to define, benefit from, and explore HPC capabilities.
The center operates a 10,000+ core Linux cluster composed of multiple computer architectures and over 14 petabytes of parallel-access, fault-tolerant, distributed disk storage.
This position will also work with the Data Science Institute (DSI). There you will work with researchers to develop and administer machine learning applications and systems. DSI utilizes Nvidia DGX systems for AI workloads to accelerate data-driven research and to study the impact of big data on society.
Duties and Responsibilities
- Lead and mentor a small team of developers.
- Co-teach small workshops related to software development.
- Research, develop, implement, maintain, document, and support On-Demand based applications, ELK integrations, and custom service APIs.
- Research, develop, implement, maintain, document, and support infrastructure technology and services that facilitate the management and usage of the cluster, which includes elevated end-user cluster support.
- Develop libraries and applications to facilitate cluster and user management and assist computer system analysts in the creation of cluster-related software applications.
- Compile software for researchers using the EasyBuild framework.
- Assess software packages that could expand ACCRE’s value to users.
- Research and evaluate new technologies and concepts which could potentially further improve ACCRE’s capabilities and services.
- Provide guidance to existing and potential users on how to use ACCRE for their research projects through both one-on-one and small-group training sessions, including assistance with compiling code.
- Participate in the on-call rotation and in after-hours scheduled and unscheduled downtimes.
- Maintain familiarity with emerging techniques and technologies in research computing.
- Administer new server tools (e.g. DGX A100s).
- Develop and administer tools to monitor, analyze, and verify data to ensure system and data integrity.
- Plan, implement, and evaluate deployment of common data science tools and frameworks in collaboration with the DSI Data Science Team (e.g., MLflow, DVC, Fast.ai, huggingface.co).
- Collaborate with data scientists to creatively address the unique needs of a highly skilled user base in a reproducible and sustainable manner.
- Deliver workshops on the technology underpinnings of data science and data science engineering in collaboration with the DSI Data Science Team.
- Mentor Data Science Master's students in data science engineering practices, while leveraging their skills to further the work described above.
- Collaborate with the Data Science Team to develop approaches to moving data science solutions into production (e.g., Streamlit, Shiny, Dash) with on-premises or hosted solutions.
Qualifications
- Bachelor’s degree required, strongly preferred in computer science or computer engineering.
- A minimum of five years of experience, gained through work or school, with one or more major programming languages such as C, C++, Java, or Fortran.
- Five years of experience, gained through work or school, with one or more Unix scripting languages such as Perl, Bash, Csh, or Python, plus a working knowledge of all of these.
- Knowledge of Docker or other containers (e.g., Singularity).
- Understanding of server setup and usage (e.g., nginx, gunicorn).
- Strong ability to work independently and in a team environment and make decisions.
- Strong ability to share knowledge coherently with others and motivate and integrate peers.
- Ability to communicate to researchers the value that ACCRE provides.
- Physical ability to work with and lift hardware when needed.
- Strong programming ability and understanding of commonly used design patterns.
- Experience programming and integrating backend system services.
- Experience with Big Data software tools is a plus, including any of the following:
  - HDFS/Hadoop and its related software stack (e.g., Pig, Hive, Spark).
  - Database use and management in environments like MySQL, PostgreSQL, and NoSQL.
  - Development and/or use of data mining, machine learning, or statistical analysis software in environments like R, Python, Matlab, or Stata.
- Knowledge and experience in version control tools and configuration management tools.
Commitment to Equity, Diversity, and Inclusion
At Vanderbilt University, we are intentional about and assume accountability for fostering advancement and respect for equity, diversity, and inclusion for all students, faculty, and staff. Our commitment to diversity makes us who we are. We have created a community that celebrates differences and lets individuality thrive. As part of this commitment, we actively value diversity in our workplace and learning environments as we seek to take advantage of the rich backgrounds and abilities of everyone. The diverse voices of Vanderbilt represent an invaluable resource for the University in its efforts to fulfill its mission and strive to be an example of excellence in higher education.
Vanderbilt University is an equal opportunity, affirmative action employer. Women, minorities, people with disabilities and protected veterans are encouraged to apply.
Please note, all candidates selected for an offer of employment are subject to pre-employment background checks, which may include but are not limited to, based on the role for which they have been selected: criminal history, education verification, social media review, motor vehicle records, credit history, and professional license verification.
Vanderbilt is a community of talented and diverse staff & faculty!
- Working and growing together as a community of communities… we are One Vanderbilt.
- Providing a work environment where every staff and faculty member can be their authentic and best self, while providing the resources and opportunities to learn and grow.
- Encouraging development, collaboration, and partnership both internally and externally while fostering the value that every member of the Vanderbilt community can lead and grow regardless of title or position.
We understand you have a choice when choosing where to work and pursue a career. We understand you are unique and have a story. We want to hear it. We encourage you to apply today so that you might become a part of our story.
Vanderbilt University has made the health and safety of our students, faculty and staff and our surrounding communities a top priority. As part of that commitment, the University requires all employees to (1) participate in routine on-campus COVID-19 testing or (2) show proof of full vaccination against COVID-19.