About

nepatapoff@gmail.com · (650) 773-3116 · San Jose, CA

Hardware and GPU engineer focused on CUDA-accelerated compute, embedded systems, and GPU-backed ML pipelines. I care about performance at the metal level: kernel optimization, memory hierarchy, and making simulation and inference run fast.

Skills

CUDA Development

CUDA Graphs · Streams · Async Execution · Dynamic Parallelism · PyCUDA

Deep Learning

PyTorch · TensorFlow · RAPIDS cuML · Triton · scikit-learn

Parallel & Multi-GPU

Multi-GPU Scaling · Batched Pipelines · Kernel Profiling · Memory Optimization

Systems & Embedded

C++ 11/14/17 · C · Mbed OS · SPI/I2C · Firmware · Arduino

MLOps & DevOps

Docker · Kubernetes · Terraform · Ray · Spark · CI/CD

Data & APIs

SQL · SQLite · Python · Pipeline Design · Scientific Computing

Experience

Hardware Engineer

05/2025 – Current

PurcellAI

  • Implemented an end-to-end data pipeline, from low-level serial capture on embedded devices to higher-level processing and orchestration in Python.
  • Developed firmware for Arduino Nicla Vision (Cortex-M7 MCU + onboard sensors), focusing on low-level embedded programming in C++ and Mbed OS.
  • Deployed quantized models on Nicla Vision, achieving 90% accuracy with real-time (120 ms) on-device inference for guided detection tasks.
  • Implemented manual SPI communication routines to interface with onboard sensors, exploring bit-level timing, protocol handshakes, and register access.

Computational Physics Engineer

05/2024 – 03/2025

LongShot Space · Oakland, CA · Contract

  • Designed custom parallel data structures and optimized algorithms to support CUDA-accelerated 1D explicit simulation of a booster-aided light gas gun, achieving a 50× runtime reduction over CPU execution.
  • Developed and optimized CUDA kernels for parallel execution, building scalable pipelines capable of running 500+ concurrent simulations on a single GPU and extending to multi-GPU deployments.
  • Built GPU-accelerated ML pipelines with RAPIDS and TensorFlow, processing 100,000+ simulation outputs to streamline training, inference, and optimization workflows.
  • Benchmarked and profiled CUDA kernels, improving memory throughput by ~30% under large-batch, distributed simulation workloads.

Characterization Engineer

06/2022 – 12/2022

DragonFly Energy · Reno, NV

  • Used spectroscopic tools (FTIR, Raman, QCL) to design new methods for battery degradation analysis.
  • Investigated and identified battery failure mechanisms, including delamination, dendrite formation, gaseous byproducts, and poor SEI formation.

Education

M.S. Computer Science

Expected 08/2025

Arizona State University

B.S. Chemistry

05/2022

University of Nevada, Reno