MOLIANG ZHOU
Staff Machine Learning Engineer
+1 732 526 5760 · endeavourwilliam.zhou@gmail.com · linkedin.com/in/moliangzhou
Summary
AI platform leader with 10 years of industry experience in engineering and applied research.
Focused on foundation models, deep learning, and ML platform infrastructure, connecting cutting-edge
AI with pragmatic engineering to turn research into resilient, production-grade platforms while
growing talent and fostering a high-standard engineering culture.
Key Achievements
◆ Critical Business Impact
Enable nine-figure revenue growth and new products
◆ Multiple 0-to-1 Track Records
Spearhead ML platforms end to end, from the ground up to state of the art
◆ AI-Centric, Domain-Versatile
Foundation model and deep learning focus in ads, NLP, and more
◆ Research & Innovation
Applied research publications and patents in the ML field
Experience
2021 - Present
Cupertino, United States
Senior to Staff Machine Learning Engineer
Apple, Ad Platforms
- Lead ML-platform infrastructure spanning foundation models and classic ML, delivering config-driven training and serving capabilities as reusable libraries rather than bespoke services, cutting model time-to-production for ads ranking and relevance teams.
- Architect a unified model-development platform (training engine plus pipeline) whose type-discriminator design scales LLM fine-tuning from single-digit-billion-parameter models on a handful of GPUs to 100B+ models across many GPUs on shared infrastructure, enabling teams to ship larger models without standing up bespoke stacks.
- Drive cross-team enablement of Gemini fine-tuning, evaluation, and batch inference, letting the algorithm and relevance team fine-tune and batch-score frontier models at scale on the shared platform and accelerating its LLM-based ranking and relevance work.
- Build a GPU inference server on the model-serving platform that reaches thousands of QPS across hundreds of replicas at single-digit-millisecond p99 latency (~300x the per-replica density of the prior gradient-boosting stack), cutting serving cost and latency for production ads models; harden it with circuit breakers and FMEA-driven failure handling.
- Author an agentic ML-platform assistant that exposes two dozen tools over a knowledge base with sub-second retrieval, improving code-generation accuracy for serving and training code across the platform's projects and reducing engineer ramp-up time.
- Ship a model and workflow authoring tool with built-in complexity analysis that guides engineers to right-sized configurations, further reducing model time-to-production across teams.
- Own the batch-inference platform spec across multiple execution backends, keeping batch-vs-online discrepancy below 0.1% on billion-record workloads.
- Lead ranking, bid, and budget recommendation products built on a privacy-preserving funnel design, enabling nine-figure revenue growth.
- Serve as ML-infra roadmap owner and force multiplier through hundreds of code and design reviews, cross-org standards, intern and engineer onboarding programs, and external talks.
2017 - 2021
Boston, United States
Software Development Engineer
Amazon, Alexa AI
- Led a deep learning solution from 0 to 1, featuring code-free training, easy collaboration, and serving at scale.
- Improved Alexa user experience through a 15% error-rate reduction.
- Led the language model inference engine serving Alexa at scale under critical production requirements.
- Decoupled the feature release process across teams and regions, reducing time-to-impact by 50%.
- Improved model-building efficiency 3x, saving $21M annually.
- Improved feature-extraction efficiency 3x, saving $1.2M annually.
2015 - 2017
New Brunswick, United States
Research Assistant
Rutgers, the State University of New Jersey
- Researched and developed an AI-assisted surgery monitoring system, deployed to Children’s National Medical Center.
Education
2014 - 2017
New Brunswick, United States
M.S. Electrical and Computer Engineering
Rutgers, the State University of New Jersey
GPA 4.0 / 4.0
2010 - 2014
Nanjing, China
B.Eng. Information Technology
Southeast University
Technical Skills
Programming Languages | Software Frameworks | Services
Python, Java, Rust, C/C++ | Spring, Spark, Kafka, Docker, Kubernetes | AWS, Datadog, Grafana
Agentic Frameworks | LLM Inference Frameworks | AI Productivity Tools
MCP, LangGraph, smolagents, OpenAI Agents SDK | vLLM, SGLang, TensorRT-LLM | Claude Code, Cursor
Distributed Machine Learning Frameworks & Services
Triton Inference Server, Hugging Face, Ray, DeepSpeed, PyTorch, JAX, TensorFlow, ONNX, MLflow
Selected Patents & Publications
Ranking of Content Based on Implied Relationships
US20250258883A1
2023
Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder
ACM International Conference on Multimedia
2018
Evaluation of Trace Alignment Quality and its Application in Medical Process Mining
IEEE International Conference on Healthcare Informatics
2017
Medical Workflow Modeling Using Alignment-guided State-splitting HMM
IEEE International Conference on Healthcare Informatics
2017
Progress Estimation and Phase Detection for Sequential Processes
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
2017
CAR: A Deep Learning Structure for Concurrent Activity Recognition
ACM/IEEE International Conference on Information Processing in Sensor Networks
2017
Duration-aware Alignment of Process Traces
Industrial Conference on Data Mining
2016