Ben Finch

Toronto, ON, Canada

Senior Compiler Engineer with 4+ years building production AI compilers (MLIR, LLVM, Triton) for AI accelerators at Meta and UntetherAI. Strong foundations spanning compiler graph transformations, highly parallel device architectures, and embedded and full-stack systems.

Technical Skills

Languages: C, C++, Python, JavaScript, Verilog, VHDL, RISC-V assembly, VBA, LabVIEW, SQL
Compilers & AI: MLIR, LLVM, Triton, ONNX, PyTorch, TensorFlow, Keras, TurnkeyML
Web: React, Node.js, Electron, HTML5, CSS3, Postgres, MariaDB, AWS
Hardware: FPGA (Altera MAX 10), TI MSP430, NI roboRIO, Onion Omega, Arduino, Raspberry Pi, Nios II, PCB design, 3D printing
Controls: Ladder Logic, Siemens STL/FBD/SFC, Siemens TIA, Rockwell FactoryTalk, Wonderware, Toyopuc

Work Experience

Compiler Engineer, IC4 — Meta MTIA

Aug 2025 – Present · Toronto, ON

Leveraged LLVM, MLIR, and the Triton frontend to optimize high-level GEMM kernels across four generations of SPMD AI accelerators.
Designed and implemented original compiler passes to maximize cache reuse, absorb kernel logic into specialized fixed-function units, and maximally overlap DMA with computation.
Provided Triton syntax extensions and MLIR dialects to support computation of block-quantized datatypes that lack an upstream representation (MX, NVFP4).
Took proactive charge of on-call shifts: cleared thousands of test regressions, unblocked release conveyors, and implemented context-aware AI commit bisection to root-cause failures across Meta's monorepo.
Consistently achieved Exceeds Expectations (EE) on performance reviews.

Technologies: C++, Python, MLIR, LLVM, PyTorch, Triton

Senior Compiler Engineer — UntetherAI

May 2022 – Jun 2025 · Toronto, ON

Promoted to Senior Jan 2025 · Initial Co-Op Internship Sep 2021 – Dec 2021

Leveraged TurnkeyML, LLVM, and MLIR across two generations of compiler SDKs. Produced compiler strategies for highly flexible and broad model support, as well as automatic kernel code generation.
Designed and developed user-facing compiler flows and APIs for fine-tuned compiler configuration and programmatic reproduction of AI models for deployment and debugging.
Created internal tooling to improve developer experience for integrated testing and live debugging of full AI models on custom accelerator hardware. Reduced triaging and bug-fixing time from weeks to days.
Implemented performant and flexible versions of non-trivial AI algorithms such as Non-Maximum Suppression (NMS) on a highly parallel at-memory compute platform.
Led the SDK and developer-infrastructure effort for bringup of new accelerator chips. Owned user-facing device targeting, full SDK device-specific feature activation from AI model ingestion through individual kernel place-and-route and bank compilation, and exhaustive validation of the full AI model library and feature test set across current, new, and future chip architecture revisions.

Technologies: C++, Python, MLIR, LLVM, TurnkeyML, PyTorch, TensorFlow

Cloud/IoT Web Developer (Intern) — blueRover

Sep 2020 – Dec 2020 · Cambridge, ON

Developed a system stability analyzer for the company's production server. The system promptly notified administrators of stability issues across web services, from app performance to cloud service status.
Developed and deployed new features and bug fixes to an IoT web app hosted on AWS.
Safely and effectively staged and brought features to production with minimal supervision.

Technologies: JavaScript, React, Node.js, MariaDB, SQL, AWS

Full-Stack Web Developer (Intern) — The Co-operators

Jan 2020 – May 2020 · Kitchener, ON

Developed features across several large React codebases for insurance fraud investigation.
Implemented API endpoints with authorization and security features for servers in sandboxed environments.
Designed high-coverage tests for both frontend and backend features.

Technologies: JavaScript, React, Node.js

Control System Designer (Intern) — Powerhouse Controls

Apr 2019 – Aug 2019 · Cambridge, ON

Designed multiple assembly-line feature subsystems for Toyota Motor Manufacturing. Features significantly increased the efficiency of car assembly.
Developed high-quality robot and automation systems using Ladder Logic and Siemens STL.

Technologies: Toyopuc, Siemens STL, Ladder Logic

Control Systems Engineering Co-op (Intern) — Langtree Controls

Jan 2018 – Dec 2018 · Sarnia, ON

Overhauled major chemical processing control systems using Rockwell and Wonderware tools, improving system robustness.
Implemented a dynamic CAD system tag database in VB to audit and update over 1,000 loop diagrams.

Technologies: VBA, Python, Wonderware

Education

University of Waterloo

Class of 2022 · Waterloo, ON

Bachelor of Applied Science, Computer Engineering

Projects

TensorBake — A flexible and user-friendly MLIR-to-FPGA compiler. Compiles CNNs to highly variadic templated Verilog kernels, enabling performant and flexible CNN implementations on FPGA accelerators.
CBoil — A faithful port of Java's Parboiled parsing library to C, with new features and higher parsing efficiency. Uses a simple PEG structure to allow compiler parsers to be built quickly and reliably.
MISC64 — A minimal instruction set architecture designed to capitalize on superscalar hardware, with compiler optimizations to run RISC-like code with minimal stalling.
Chirp — A polyphonic synthesizer made in Electron. Implements waveforms, signal filtering, and custom envelopes to create complex sounds and music with MIDI control.
CHIP-8 Emulator — A simple emulator for the CHIP-8 virtual machine, written in C. Implements timers, opcode execution, memory, stack, registers, and graphics.
Cinepi — A Raspberry Pi media player and movie server. Implements HDMI override for local viewing and video streaming over HTTP using an Express API. React frontend imitates the Netflix UI.
NodeJS Blog — A personal website built as a weekend challenge. Implements a git-to-Heroku pipeline to update blog posts. Planned extensions include a REST API to upload and display projects.
FIRST Robotics — Designed and programmed three competitive robots and their interfaces, using multiple motion sensors for autonomous actuation. The most accomplished robot won an event and competed at the world championships.
Machine Learning Samples — Various ML projects, including a CNN that classifies handwritten digits with 95% accuracy and a neural network that classifies mammography samples as malignant or benign.
MIDI Instrument — An embedded peripheral that communicates with a PC in the MIDI format. Cross-compiled in a virtual environment to the Onion Omega SoC.
WAV Player — A WAV file player on the Nios II system. Implements efficient buffering and fast data unpacking for perfect audio playback at variable speed.
X/Y Platform — An embedded stepper-motor controller that follows encoded coordinates for smooth X/Y motion. Chassis 3D-printed and actuated with an MSP430 controller using a custom PCB.