ML Kernel Performance Engineer – Amazon – Toronto, ON


Amazon Development Centre Canada ULC is hiring an ML Kernel Performance Engineer for the AWS Neuron team in Toronto, Ontario. This role focuses on developing and optimizing high-performance compute kernels that power Amazon’s custom machine learning accelerators, Inferentia and Trainium, enabling breakthrough performance for deep learning and GenAI workloads.


As part of Annapurna Labs, you’ll work at the intersection of machine learning, hardware, and high-performance computing. You’ll collaborate across compiler, runtime, and framework teams to accelerate large-scale ML models, improve kernel-level efficiency, and help shape the future of AI acceleration technology on AWS.

About the role: ML Kernel Performance Engineer

In this position, you will design and implement optimized compute kernels tailored for the Neuron architecture, applying advanced techniques such as fusion, tiling, sharding, and scheduling. You’ll analyze performance bottlenecks using profiling tools, improve runtime efficiency, and work closely with both internal teams and customers to maximize model performance on AWS accelerators.
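As a purely illustrative sketch (not actual Neuron or NKI code), loop tiling means processing a computation in small blocks so each block's working set fits in fast on-chip memory. A minimal NumPy version of a tiled matrix multiply, with the tile size chosen arbitrarily for illustration:

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Loop-tiled matrix multiply: accumulate tile x tile blocks so each
    block's operands could fit in a fast memory level (cache/SRAM)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    # Iterate over output tiles, then over the reduction dimension in tiles.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                out[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return out
```

On a real accelerator the tile shape is chosen to match the hardware's memory hierarchy, and techniques like fusion fold adjacent operations into the same tiled loop nest to avoid round trips to slower memory.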

Beyond kernel development, you’ll contribute to compiler backend optimization, collaborate on future hardware designs, and help drive innovation in distributed ML training and inference systems. This role is ideal for engineers with strong low-level optimization skills who want to make an impact at scale in AI infrastructure.


Benefits and Salary

Amazon offers highly competitive compensation, stock options, comprehensive health and retirement benefits, and the chance to work on cutting-edge AI acceleration technologies that power AWS services worldwide.

Job Details

📌 Job Type: Regular, Full-Time

🏢 Company: Amazon Development Centre Canada ULC (AWS Neuron – Annapurna Labs)

📍 Location: Toronto, ON, Canada

🆔 Job ID: 3059983

Requirements / Skills

  • 3+ years of professional experience in software development and system optimization.
  • Strong background in kernel programming, compiler optimizations, and GPU/accelerator performance tuning.
  • Proficiency with CUDA, Triton, OpenCL, SYCL, ROCm, or similar programming models.
  • Experience with low-level optimization, memory hierarchies, and high-performance libraries.
  • Familiarity with ML frameworks such as PyTorch or TensorFlow.
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

How to Apply

If you’re ready to shape the future of AI acceleration at AWS, apply through the official Amazon careers page.


Job Summary & Tips for Applying


When applying for the ML Kernel Performance Engineer role, emphasize your expertise in low-level optimization, GPU kernel programming, and accelerator architectures. Highlight any experience with CUDA, Triton, or LLVM backends and demonstrate your ability to optimize ML workloads at scale.

Use resume keywords like ML kernel engineer, AWS Neuron, AI acceleration, GPU optimization, high-performance computing, and Toronto software engineer to increase visibility. Showcasing both technical depth and collaboration skills will position you strongly in the selection process.