Moliang Zhou

AI platform leader with 10 years of industry experience in engineering and applied research. Focused on foundation models, deep learning, and general platform support. Connects cutting-edge AI with pragmatic engineering to turn research into resilient, production-grade platforms while growing talent and fostering a high-standard engineering culture.

Key Achievements

Experience

2021 - Present

Cupertino, United States

Senior to Staff Machine Learning Engineer

Apple, Ad Platforms

Lead ML-platform infrastructure spanning foundation models and classic ML, delivering config-driven training and model serving as reusable libraries rather than bespoke services, cutting model time-to-production for ads ranking and relevance teams.
Architect a unified model-development platform (training engine plus pipeline) whose type-discriminator design scales LLM fine-tuning from single-digit-billion-parameter models on a handful of GPUs to 100B+ models across many GPUs on shared infrastructure, enabling teams to ship larger models without standing up bespoke stacks.
Drive cross-team Gemini fine-tuning, evaluation, and batch-inference enablement for the algorithm and relevance team, accelerating LLM-based ranking and relevance work by letting them fine-tune and batch-score frontier models at scale on the shared platform.
Build a GPU inference server on the model-serving platform reaching thousands of QPS across hundreds of replicas at single-digit-millisecond p99 latency (~300x the per-replica density of the prior gradient-boosting stack), hardened with circuit breakers and FMEA-driven failure handling to cut serving cost and latency for production ads models.
Author an agentic ML-platform assistant exposing two dozen tools over a knowledge base with sub-second retrieval, lifting code-generation accuracy for serving and training code across the platform's projects and reducing engineer ramp time.
Ship a model and workflow authoring tool with built-in complexity analysis that guides engineers to right-sized configurations, further reducing model time-to-production across teams.
Own the batch-inference platform spec across multiple execution backends, holding batch-vs-online discrepancy below 0.1% on billion-record workloads.
Lead ranking, bid, and budget recommendation products with privacy-preserving funnel design, enabling nine-figure revenue growth.
Serve as ML-infra roadmap owner and force-multiplier through hundreds of code and design reviews, cross-org standards, intern and engineer-onboarding programs, and external talks.

2017 - 2021

Boston, United States

Software Development Engineer

Amazon, Alexa AI

Led a deep learning solution from 0 to 1, featuring code-free training, easy collaboration, and serving at scale.
Improved the Alexa user experience with a 15% reduction in error rate.
Led the language model inference engine that serves Alexa at scale under critical requirements.
Decoupled the feature release process across teams and regions, reducing time-to-impact by 50%.
Improved model-building efficiency by 3x, saving $21M annually.
Improved feature-extraction efficiency by 3x, saving $1.2M annually.
