About

nepatapoff@gmail.com · (650) 773-3116 · San Jose, CA

Hardware and GPU engineer focused on CUDA-accelerated compute, embedded systems, and GPU-backed ML pipelines. I care about performance at the metal level — kernel optimization, memory hierarchy, and making simulation and inference run fast.

Skills

CUDA Development

CUDA Graphs · Streams · Async Execution · Dynamic Parallelism · PyCUDA

Deep Learning

PyTorch · TensorFlow · RAPIDS cuML · Triton · scikit-learn

Parallel & Multi-GPU

Multi-GPU Scaling · Batched Pipelines · Kernel Profiling · Memory Optimization

Systems & Embedded

C++ 11/14/17 · C · Mbed OS · SPI/I2C · Firmware · Arduino

MLOps & DevOps

Docker · Kubernetes · Terraform · Ray · Spark · CI/CD

Data & APIs

SQL · SQLite · Python · Pipeline Design · Scientific Computing

Experience

Hardware Engineer

05/2025 – Current

PurcellAI

▸Implemented a full-stack data pipeline: from low-level serial capture on embedded devices to higher-level processing and orchestration in Python.
▸Developed firmware for Arduino Nicla Vision (Cortex-M7 MCU + onboard sensors), focusing on low-level embedded programming in C++ and Mbed OS.
▸Deployed quantized models on Nicla Vision, achieving 90% accuracy with real-time (120 ms) on-device inference for guided detection tasks.
▸Implemented manual SPI communication routines to interface with onboard sensors, exploring bit-level timing, protocol handshakes, and register access.

Computational Physics Engineer

05/2024 – 03/2025

LongShot Space · Oakland, CA · Contract

▸Designed custom parallel data structures and optimized algorithms to support CUDA-accelerated 1D explicit simulation of a booster-aided light gas gun, achieving a 50× runtime reduction over CPU execution.
▸Developed and optimized CUDA kernels for parallel execution, building scalable pipelines capable of running 500+ concurrent simulations on a single GPU and extending to multi-GPU deployments.
▸Built GPU-accelerated ML pipelines with RAPIDS and TensorFlow, processing 100,000+ simulation outputs to streamline training, inference, and optimization workflows.
▸Benchmarked and profiled CUDA kernels, improving memory throughput by ~30% under large-batch, distributed simulation workloads.

Characterization Engineer

06/2022 – 12/2022

DragonFly Energy · Reno, NV

▸Used spectroscopic tools (FTIR, Raman, QCL) to design new methods for battery degradation analysis.
▸Researched and determined battery failure mechanisms including delamination, dendrite formation, gaseous byproducts, and poor SEI formation.

Education

M.S. Computer Science

08/2025

Arizona State University

B.S. Chemistry

05/2022

University of Nevada, Reno