סקירה כללית

Linux HPC Systems Engineer Academic Research Computing Infrastructure Division IT Infrastructure & Services Department Key Responsibilities
• Lead the development, programming, and management of central computing systems, including infrastructure‑level projects.
• Oversee the implementation and execution of Linux system projects, ensuring reliability and scalability.
• Manage and operate Linux‑based HPC clusters and Deep Learning GPU farms.
• Evaluate, implement, and integrate cutting‑edge HPC technologies.
• Maintain current and future scientific computing systems within the research computing clusters.
• Collaborate with hardware and software vendors to support system development and maintenance.
• Provide consultation and technical guidance to faculty members on planning, purchasing, and using scientific computing systems; prepare training materials as needed.
• Perform additional duties as assigned by the department head.
• Reporting to: Head of Academic Research Computing Infrastructure. Requirements
• Bachelor’s degree in Engineering or Natural Sciences.
• At least 5 years of relevant experience.
• Hands‑on experience with Linux servers in production environments (physical and virtual).
• Experience with at least one HPC scheduler: Slurm, PBS Pro, LSF, Kubernetes, or SGE.
• Experience installing and managing enterprise applications on Linux servers.
• Strong scripting skills in Perl, Python, and Bash.
• Experience with automation and workflow orchestration for cluster environments.
• Advantage: Experience with parallel programming and/or public cloud infrastructures.
• Advantage: Experience with Ethernet and InfiniBand networking.
• Advantage: Experience managing enterprise storage systems (e.g., Dell, NetApp, IBM).
• Advantage: Experience analyzing system performance, benchmarking compute nodes and clusters, and evaluating GPU workload performance.

דרישות המשרה

• Lead the development, programming, and management of central computing systems, including infrastructure‑level projects.
• Oversee the implementation and execution of Linux system projects, ensuring reliability and scalability.
• Manage and operate Linux‑based HPC clusters and Deep Learning GPU farms.
• Evaluate, implement, and integrate cutting‑edge HPC technologies.
• Maintain current and futu