About
nepatapoff@gmail.com · (650) 773-3116 · San Jose, CA
Hardware and GPU engineer focused on CUDA-accelerated compute, embedded systems, and GPU-backed ML pipelines. I care about performance at the metal level: kernel optimization, memory hierarchy, and making simulation and inference run fast.
Skills
CUDA Development
CUDA Graphs · Streams · Async Execution · Dynamic Parallelism · PyCUDA
Deep Learning
PyTorch · TensorFlow · RAPIDS cuML · Triton · scikit-learn
Parallel & Multi-GPU
Multi-GPU Scaling · Batched Pipelines · Kernel Profiling · Memory Optimization
Systems & Embedded
C++ 11/14/17 · C · Mbed OS · SPI/I2C · Firmware · Arduino
MLOps & DevOps
Docker · Kubernetes · Terraform · Ray · Spark · CI/CD
Data & APIs
SQL · SQLite · Python · Pipeline Design · Scientific Computing
Experience
Hardware Engineer
05/2025 – Current · PurcellAI
- Implemented a full-stack data pipeline, from low-level serial capture on embedded devices to higher-level processing and orchestration in Python.
- Developed firmware for the Arduino Nicla Vision (Cortex-M7 MCU + onboard sensors), focusing on low-level embedded programming in C++ and Mbed OS.
- Deployed quantized models on the Nicla Vision, achieving 90% accuracy with real-time (120 ms) on-device inference for guided detection tasks.
- Implemented manual SPI communication routines to interface with onboard sensors, exploring bit-level timing, protocol handshakes, and register access.
Computational Physics Engineer
05/2024 – 03/2025 · LongShot Space · Oakland, CA · Contract
- Designed custom parallel data structures and optimized algorithms to support CUDA-accelerated 1D explicit simulation of a booster-aided light gas gun, achieving a 50× runtime reduction over CPU execution.
- Developed and optimized CUDA kernels for parallel execution, building scalable pipelines capable of running 500+ concurrent simulations on a single GPU and extending to multi-GPU deployments.
- Built GPU-accelerated ML pipelines with RAPIDS and TensorFlow, processing 100,000+ simulation outputs to streamline training, inference, and optimization workflows.
- Benchmarked and profiled CUDA kernels, improving memory throughput by ~30% under large-batch, distributed simulation workloads.
Characterization Engineer
06/2022 – 12/2022 · DragonFly Energy · Reno, NV
- Used spectroscopic tools (FTIR, Raman, QCL) to design new methods for battery degradation analysis.
- Researched and identified battery failure mechanisms, including delamination, dendrite formation, gaseous byproducts, and poor SEI formation.
Education
M.S. Computer Science
Expected 08/2025 · Arizona State University
B.S. Chemistry
05/2022 · University of Nevada, Reno