Projects
GPU kernels, ML systems, and low-level engineering work.
A100 MatMul Optimizer
Custom CUDA kernels for matrix multiplication optimized for A100 GPUs. Achieves near-cuBLAS throughput using shared memory tiling and warp-level primitives.
Modular Compute Pipeline
Plugin-based ML inference pipeline with hot-swappable CUDA and CPU backends. Built around ONNX Runtime with custom C++ extension modules.
Modal A100 Playground
Serverless GPU compute experiments on Modal.com. Benchmarks distributed training, custom ops, and multi-GPU communication patterns.
Dynamic Plugin Loader
Runtime plugin loading system in C++ using dlopen/dlsym. Supports versioned interfaces, hot-reload, and sandboxed execution.
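As a rough illustration of the dlopen/dlsym approach with a versioned interface, here is a minimal sketch and not the project's actual code. The names `Plugin`, `can_load`, `is_compatible`, `kHostAbiVersion`, and the exported symbol `plugin_abi_version` are hypothetical, chosen only for this example. Hot-reload and sandboxing are not shown.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

// RAII wrapper around a dlopen handle (hypothetical class, for illustration).
class Plugin {
public:
    explicit Plugin(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {
        if (!handle_) {
            const char* err = dlerror();
            throw std::runtime_error(err ? err : "dlopen failed");
        }
    }
    ~Plugin() { dlclose(handle_); }

    Plugin(const Plugin&) = delete;
    Plugin& operator=(const Plugin&) = delete;

    // Looks up a symbol and casts it to the requested function-pointer type.
    template <typename Fn>
    Fn symbol(const char* name) const {
        dlerror();  // clear any stale error before the lookup
        void* sym = dlsym(handle_, name);
        if (const char* err = dlerror()) throw std::runtime_error(err);
        return reinterpret_cast<Fn>(sym);
    }

private:
    void* handle_;
};

// Assumed convention: each plugin exports `extern "C" int plugin_abi_version()`.
using AbiVersionFn = int (*)();
constexpr int kHostAbiVersion = 1;

// True when the plugin reports the ABI version the host was built against.
bool is_compatible(const Plugin& plugin) {
    return plugin.symbol<AbiVersionFn>("plugin_abi_version")() == kHostAbiVersion;
}

// True when the shared object at `path` can be opened at all.
bool can_load(const std::string& path) noexcept {
    try {
        Plugin probe(path);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

A host would typically call `can_load` or construct a `Plugin`, check `is_compatible`, and only then resolve the plugin's entry points. On older glibc versions the program may need to be linked with `-ldl`.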
Memory Manager
Custom user-space memory allocator implementing buddy-system and slab allocation. Built to study cache-line alignment and fragmentation reduction.
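To make the buddy-system idea concrete, here is a minimal sketch that tracks free blocks by offset rather than managing real memory. It is an illustration under stated assumptions, not the project's implementation: the class name `BuddyAllocator`, its methods, and the requirement that the caller pass the original size back to `deallocate` are all choices made for this example. Slab allocation and cache-line alignment are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Buddy allocator over an arena of (min_block << max_order) bytes.
// Blocks are identified by their byte offset into the arena.
class BuddyAllocator {
public:
    BuddyAllocator(std::size_t min_block, unsigned max_order)
        : min_block_(min_block), max_order_(max_order), free_(max_order + 1) {
        // The XOR buddy computation requires a power-of-two block size.
        assert(min_block != 0 && (min_block & (min_block - 1)) == 0);
        free_[max_order].insert(0);  // one free block spanning the whole arena
    }

    // Returns the offset of a block of at least `size` bytes, or nullopt.
    std::optional<std::size_t> allocate(std::size_t size) {
        const unsigned order = order_for(size);
        unsigned k = order;
        while (k <= max_order_ && free_[k].empty()) ++k;  // smallest free fit
        if (k > max_order_) return std::nullopt;

        const std::size_t off = *free_[k].begin();
        free_[k].erase(free_[k].begin());
        while (k > order) {  // split, keeping the upper half free each time
            --k;
            free_[k].insert(off + block_size(k));
        }
        return off;
    }

    // Frees a block; `size` must match the size passed to allocate().
    void deallocate(std::size_t off, std::size_t size) {
        unsigned k = order_for(size);
        while (k < max_order_) {
            const std::size_t buddy = off ^ block_size(k);
            auto it = free_[k].find(buddy);
            if (it == free_[k].end()) break;  // buddy in use: stop merging
            free_[k].erase(it);
            if (buddy < off) off = buddy;     // merged block starts at the lower half
            ++k;
        }
        free_[k].insert(off);
    }

    std::size_t free_blocks(unsigned order) const { return free_[order].size(); }

private:
    std::size_t block_size(unsigned k) const { return min_block_ << k; }

    // Smallest order whose block holds `size`; max_order_ + 1 if none does.
    unsigned order_for(std::size_t size) const {
        unsigned k = 0;
        while (k <= max_order_ && block_size(k) < size) ++k;
        return k;
    }

    std::size_t min_block_;
    unsigned max_order_;
    std::vector<std::set<std::size_t>> free_;  // free offsets per order
};
```

Rounding every request up to a power-of-two block trades internal fragmentation for cheap coalescing: a block's buddy is found with a single XOR, so freed neighbors merge back into larger blocks without scanning.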
Polymarket Trader
Algorithmic trading system for prediction markets, with real-time order book analysis and automated position management.