Projects

GPU kernels, ML systems, and low-level engineering work.

A100 MatMul Optimizer

Custom CUDA matrix-multiplication kernels optimized for A100 GPUs. Achieves near-cuBLAS throughput using shared-memory tiling and warp-level primitives.

CUDA, C++, GPU

Modular Compute Pipeline

Plugin-based ML inference pipeline with hot-swappable CUDA and CPU backends. Built around ONNX Runtime with custom C++ extension modules.

C++, ONNX, ML, Systems

Modal A100 Playground

Serverless GPU compute experiments on Modal.com. Benchmarks distributed training, custom ops, and multi-GPU communication patterns.

Python, GPU, ML

Dynamic Plugin Loader

Runtime plugin loading system in C++ using dlopen/dlsym. Supports versioned interfaces, hot-reload, and sandboxed execution.

C++, Systems
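
As a rough illustration of the dlopen/dlsym approach described for the plugin loader, here is a minimal sketch. It is not the project's code: the names `LoadResult`, `load_plugin`, and the `plugin_version` entry point are hypothetical and chosen only for this example. It assumes a POSIX system. Versioned interfaces, hot-reload, and sandboxing are not shown.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical entry-point signature, assumed for illustration only.
using PluginVersionFn = int (*)();

// Result of a load attempt: either a live handle plus entry point, or an error.
struct LoadResult {
    void* handle = nullptr;
    PluginVersionFn version = nullptr;
    std::string error;
};

// Open a shared object and resolve one symbol from it.
// "plugin_version" is an assumed symbol name, not a real convention.
inline LoadResult load_plugin(const char* path,
                              const char* symbol = "plugin_version") {
    LoadResult result;

    // RTLD_NOW resolves all symbols up front; RTLD_LOCAL keeps them private.
    result.handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!result.handle) {
        const char* msg = dlerror();
        result.error = msg ? msg : "dlopen failed";
        return result;
    }

    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(result.handle, symbol);
    if (const char* msg = dlerror()) {
        result.error = msg;  // copy the message before closing the handle
        dlclose(result.handle);
        result.handle = nullptr;
        return result;
    }

    // Converting void* to a function pointer is the usual POSIX idiom here.
    result.version = reinterpret_cast<PluginVersionFn>(sym);
    return result;
}
```

A caller would check `result.handle` before invoking `result.version()`, and call `dlclose(result.handle)` when the plugin is no longer needed. On older glibc versions the program may need to be linked with `-ldl`.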

Memory Manager

Custom user-space memory allocator implementing buddy-system and slab allocation. Built to study cache-line alignment and fragmentation reduction.

C++, Systems
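
To illustrate the arithmetic behind a buddy system, the sketch below shows how a request might be rounded to a power-of-two block and how a block's buddy can be located. It is an assumption-laden example, not the project's implementation: `kMinBlock`, `block_size_for`, `order_for`, and `buddy_of` are hypothetical names, and the 64-byte minimum block is assumed only because it matches a common cache-line size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimum block size (a common cache-line size); an assumption.
constexpr std::size_t kMinBlock = 64;

// Round a request up to the next power-of-two block size, at least kMinBlock.
// Assumes `bytes` is small enough that the doubling does not overflow.
inline std::size_t block_size_for(std::size_t bytes) {
    std::size_t size = kMinBlock;
    while (size < bytes) size <<= 1;
    return size;
}

// Order k corresponds to a block of (kMinBlock << k) bytes.
inline unsigned order_for(std::size_t bytes) {
    unsigned order = 0;
    for (std::size_t size = kMinBlock; size < bytes; size <<= 1) ++order;
    return order;
}

// Offset (relative to the arena base) of the buddy of a block.
// A block of size S starts at a multiple of S, so its buddy differs only in
// the bit with value S; XOR flips that bit.
inline std::size_t buddy_of(std::size_t offset, std::size_t block_size) {
    assert(offset % block_size == 0);  // blocks are aligned to their own size
    return offset ^ block_size;
}
```

When a block is freed, an allocator of this kind can check whether the block at `buddy_of(offset, size)` is also free and, if so, merge the two into one block of twice the size, which is how the scheme limits external fragmentation.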

Polymarket Trader

Algorithmic prediction market trading system with real-time orderbook analysis and automated position management.

Python, ML