I’m currently a research assistant in the LSDS (Large-Scale Data & Systems) group at the Department of Computing, Imperial College London, where I work closely with Lluis Vilanova and his PhD student, Yaoxin Jing. I’m interested in techniques that allow CUDA applications to transparently use remote GPUs without source modification, while minimising network overhead and preserving the semantics of the CUDA runtime and driver APIs. We are currently focusing on llama.cpp in our analysis, but we are also interested in exploring other LLM inference engines, such as vLLM, as well as more traditional HPC workloads such as simulations.

Prior to working as a research assistant, I did my MEng degree in Electronic and Information Engineering (2021-2025), also at Imperial - although I often call it Computer Engineering, as I took mostly computing modules in my 3rd and 4th years. My master's thesis was on disaggregated GPU computation, supervised by Lluis Vilanova. I focused heavily on the performance of llama.cpp when transparently using remote GPUs, and through an RPC batching optimisation inspired by CUDA graphs, we were able to reduce the network overhead of remote GPU access from ~200% to ~10%. You may download a copy of my master's thesis here.

During my degree, I did software engineering internships at Revolut and Jump Trading - at Jump Trading, I primarily worked with modern C++ in high-performance live trading systems.