We are looking for a Senior ML Systems Engineer to build and validate simulation infrastructure for large-scale machine learning systems. This role focuses on modelling the compute and communication behaviour of systems used for ML training and inference, and on using simulation to guide architecture, performance optimisation, and capacity planning.
The ideal candidate combines a strong systems background with hands-on experience in measurement, benchmarking, and performance analysis of modern ML systems.
Experience:
The ideal candidate will have strong experience in ML systems, distributed systems, performance engineering, computer architecture, or simulation, and hands-on experience with performance benchmarking, profiling, and measurement of ML systems.
You should have an understanding of systems used for machine learning training and inference, coupled with experience analysing compute, communication, and memory behaviour in large-scale ML systems.
Experience with distributed training concepts such as data parallelism, tensor/model parallelism, pipeline parallelism, collectives, and synchronization overheads.
Proficiency in at least one of the following is preferred: Python, C++, or Rust.
You should have strong analytical skills and the ability to connect simulation results to real system behaviour.
Qualifications:
We are looking for candidates with a Master’s degree or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field.
Essential Requirements:
Candidates MUST be eligible to live and work in the UK without ever requiring sponsorship. Copies of your visa and passport will be requested.
Candidates MUST be able to work onsite / commute to London on a hybrid basis.
Candidates MUST have experience simulating distributed ML training/inference workloads.
Candidates MUST have experience profiling distributed GPU-based ML workloads (training/inference).
Candidates MUST have experience with packet-level/discrete-event simulation using ns-3 or similar.
Salary / Benefits:
In addition to a competitive salary, my client offers a range of benefits, including hybrid and flexible working, stock options, 25 days' holiday, and relocation assistance.
Skills: ML Systems, Python, C++, Rust, Simulation, Machine Learning, GPU, PyTorch, JAX, XLA, NVLink, PCIe
© Vita CV: Registered in England and Wales (16187919).