Compiler Engineer, IC4 — Meta MTIA
- Leveraged LLVM, MLIR, and the Triton frontend to optimize high-level GEMM kernels across 4 generations of SPMD AI accelerators.
- Designed and implemented new compiler passes to maximize cache reuse, offload kernel logic onto specialized fixed-function units, and maximize overlap of DMA transfers with computation.
- Developed Triton syntax extensions and MLIR dialects to support computation on block-quantized datatypes that lack an upstream representation (MX, NVFP4).
- Proactively led on-call shifts: cleared thousands of test regressions, unblocked release conveyors, and implemented context-aware, AI-assisted commit bisection to root-cause failures across Meta's monorepo.
- Consistently achieved Exceeds Expectations (EE) on performance reviews.
Technologies: C++, Python, MLIR, LLVM, PyTorch, Triton