DeepSeek LLMs: Development Timeline, Key Events, and Team
Drawing on the technical papers published by DeepSeek, here is the timeline and dramatis personae of DeepSeek's models, releases, and innovations. The use of synthetic data, not just for pre-training but inside a reinforcement learning (RL) feedback loop, appears to be as important as the advances in attention.
***Two takeaways in accessible language. First, DeepSeek improved how its models learn and use their weights by altering the attention mechanism itself (Multi-Head Latent Attention); that is the key to the reasoning gains. Second, DeepSeek leaned on synthetic data generated and filtered by its own models in a bootstrapping loop, loosely analogous to how diffusion models (think image-generating AI) refine output from noise. That is the secret sauce, and it was made necessary by resource limitations.***
Pre-2024
The development of Transformer-based Large Language Models (LLMs) established the dominance of decoder-only Transformer architectures. These models relied on self-supervised pre-training over large text corpora to develop broad language capabilities. Techniques such as supervised fine-tuning and reward modeling were introduced to align model behavior with user intentions and instructions.
2024–2025 (Specific Dates Unspecified)
Development of DeepSeek-Coder-V2:
• An open-source Mixture-of-Experts coding model with long-context support (up to 128K tokens) was developed.
• Long-context retrieval was assessed with needle-in-a-haystack style pressure tests across context lengths, demonstrating strong retrieval across the supported window (a minimal sketch of such a test follows this list).
• Benchmarked against other open and closed-source models for code generation.
• Mathematical reasoning abilities were evaluated on benchmarks like GSM8K, MATH, AIME 2024, and Math Odyssey.
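For readers who want a concrete picture of what a long-context pressure test looks like, here is a minimal Python sketch. It assumes a hypothetical `model_generate(prompt)` stand-in rather than a real DeepSeek-Coder-V2 endpoint, and it approximates token counts with word counts; it illustrates the testing idea, not DeepSeek's evaluation harness.

```python
# Minimal sketch of a long-context "pressure test" (needle-in-a-haystack style).
# The model call is a hypothetical stand-in; word count is a crude proxy for tokens.

NEEDLE = "The magic number for the deployment key is 48151623."
QUESTION = "What is the magic number for the deployment key?"
FILLER = "This line is ordinary filler text with no useful information. "

def build_prompt(approx_tokens: int, depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) of a
    filler context roughly `approx_tokens` words long."""
    n_sentences = max(1, approx_tokens // len(FILLER.split()))
    sentences = [FILLER] * n_sentences
    sentences.insert(int(depth * n_sentences), NEEDLE + " ")
    return "".join(sentences) + "\n\n" + QUESTION

def pressure_test(context_lengths, depths, model_generate):
    """Return a {(length, depth): passed} grid of retrieval results."""
    results = {}
    for length in context_lengths:
        for depth in depths:
            answer = model_generate(build_prompt(length, depth))
            results[(length, depth)] = "48151623" in answer
    return results

if __name__ == "__main__":
    # Toy stand-in for a real model call, so the harness runs end to end.
    fake_model = lambda prompt: "The magic number is 48151623."
    grid = pressure_test([4_000, 32_000, 128_000], [0.1, 0.5, 0.9], fake_model)
    print(grid)
```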
Development of DeepSeek-R1:
• Reinforcement Learning (RL) techniques were integrated to enhance reasoning capabilities.
• Two approaches were tested: RL applied directly to a base model (DeepSeek-R1-Zero) and RL preceded by a cold-start supervised stage (DeepSeek-R1).
• Reward modeling, rejection sampling, and supervised fine-tuning were used to refine the model (see the rejection-sampling sketch after this list).
• Distillation methods transferred reasoning abilities to smaller models.
• Benchmarked against other LLMs in mathematical and reasoning tasks.
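The rejection-sampling step mentioned above can be pictured with a short sketch: sample several candidate solutions per prompt, rank them with a reward model, and keep only verified, high-scoring candidates as supervised fine-tuning data. The helper functions below (`sample_responses`, `reward_score`, `is_correct`) are toy stand-ins for illustration, not DeepSeek's actual pipeline.

```python
import random

def sample_responses(prompt: str, n: int) -> list[str]:
    """Toy stand-in for sampling n candidate solutions from the RL-tuned model."""
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def reward_score(prompt: str, response: str) -> float:
    """Toy stand-in for a learned reward model's scalar score."""
    return random.random()

def is_correct(prompt: str, response: str) -> bool:
    """Toy stand-in for a rule-based verifier (e.g. checking a final math answer)."""
    return True

def collect_sft_data(prompts, n_samples=16, keep_top=2):
    """For each prompt, sample several candidates, keep only the highest-reward
    ones that also pass verification, and return them as SFT examples."""
    dataset = []
    for prompt in prompts:
        candidates = sample_responses(prompt, n_samples)
        ranked = sorted(candidates, key=lambda c: reward_score(prompt, c), reverse=True)
        for candidate in ranked[:keep_top]:
            if is_correct(prompt, candidate):
                dataset.append({"prompt": prompt, "response": candidate})
    return dataset

if __name__ == "__main__":
    print(len(collect_sft_data(["Prove that 2 + 2 = 4."])))
```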
Development of DeepSeek-V2:
• A Mixture-of-Experts (MoE) model was developed, focusing on efficiency and performance.
• Introduced Multi-Head Latent Attention (MLA), which compresses the key-value cache to improve inference efficiency.
• New techniques such as Device-Limited Routing were employed to cap cross-device communication and improve MoE training efficiency (a routing sketch follows this list).
• Scaling laws were derived for optimal model and data scaling.
• Supervised fine-tuning and reinforcement learning were implemented to align the model with user intent.
• Benchmark evaluations demonstrated performance across diverse tasks.
• A smaller 16B-parameter variant, DeepSeek-V2-Lite, was also developed.
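To make Device-Limited Routing concrete, here is a simplified NumPy sketch under illustrative assumptions: a token's router scores are first restricted to experts living on a small number of devices, and only then is the usual top-k expert selection applied. The shapes, parameter names, and gating details are assumptions for exposition, not the published implementation.

```python
import numpy as np

def device_limited_route(scores: np.ndarray, experts_per_device: int,
                         top_k: int, max_devices: int):
    """scores: (n_experts,) router affinities for one token.
    Returns indices of the selected experts and their normalized gating weights."""
    n_experts = scores.shape[0]
    device_of = np.arange(n_experts) // experts_per_device
    n_devices = n_experts // experts_per_device

    # 1. Pick the `max_devices` devices whose best expert scores highest.
    best_per_device = np.array([scores[device_of == d].max() for d in range(n_devices)])
    allowed_devices = np.argsort(best_per_device)[-max_devices:]

    # 2. Mask out experts on other devices, then take the top-k of what remains.
    masked = np.where(np.isin(device_of, allowed_devices), scores, -np.inf)
    chosen = np.argsort(masked)[-top_k:]

    # 3. Normalize the chosen experts' scores into gating weights (softmax).
    gates = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    return chosen, gates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    token_scores = rng.normal(size=64)          # 64 routed experts, 8 per device
    experts, gates = device_limited_route(token_scores, experts_per_device=8,
                                          top_k=6, max_devices=3)
    print(experts, gates.round(3))
```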
Development of DeepSeek-V3:
• Model architecture was further refined with:
• Multi-Head Latent Attention (MLA)
• DeepSeekMoE with Auxiliary-Loss-Free Load Balancing
• Multi-Token Prediction (MTP)
• Infrastructure improvements included the DualPipe pipeline-parallelism algorithm and FP8 mixed-precision training.
• Long-context extension was incorporated using YaRN, with support for contexts up to 128K tokens.
• Extensive benchmark evaluations and ablation studies were conducted.
• Reinforcement learning was enhanced using Group Relative Policy Optimization (GRPO); the group-relative advantage computation is sketched after this list.
• DeepSeek-V3 was also used as a Generative Reward Model.
• Knowledge was distilled from DeepSeek-R1, and self-rewarding mechanisms were explored.
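The core idea of GRPO can be shown in a few lines: instead of training a separate value (critic) network, each sampled response's advantage is computed relative to the other responses in its group. The sketch below covers only this group-relative advantage; the clipped policy-ratio update and KL penalty that complete the algorithm are omitted.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: (group_size,) scalar rewards for responses to one prompt.
    Returns per-response advantages A_i = (r_i - mean(r)) / (std(r) + eps)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

if __name__ == "__main__":
    # Example: 8 sampled responses to the same prompt, rewarded 1.0 if the final
    # answer is correct and 0.0 otherwise (a simple rule-based reward).
    rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
    print(group_relative_advantages(rewards).round(3))
```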
Ongoing Research and Development
The DeepSeek team continues to explore advancements in large language models, with a strong focus on code generation, reasoning capabilities, and efficiency improvements.
Key Individuals in the DeepSeek Development Team
The following individuals have played significant roles in various stages of the DeepSeek LLM development:
DeepSeek-Coder-V2 Development Team
Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang.
DeepSeek-R1 Development Team
Hui Li, Jianzhong Guo, Jingchang Chen, Jingyang Yuan, Jinhao Tu, Junjie Qiu, Junlong Li, J.L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Hu, Kaichao You, Kaige Gao, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingxu Zhou, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R.J. Chen, R.L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S.S. Li, Shuang Zhou, Shaoqing Wu, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wen Liu, Wentao Zhang, W.L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X.Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y.K. Li, Y.Q. Wang, Y.X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxuan Liu, Yuyang Zhou, Y.X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan.
DeepSeek-V3 Development Team
Z.Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhengyan Zhang, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Ziyi Gao, Zizheng Pan, Z.F. Wu, Zhuoshu Li, Zihan Wang, Yixin Dong, Size Zheng, Yilong Zhao, Yongji Wang.
Business & Compliance Team
Bin Wang, Dongjie Ji, Leyi Xia, Miaojun Wang, Mingming Li, Peng Zhang, Shaoqing Wu, Shengfeng Ye, T. Wang, W.L. Xiao, Wei An, Xiaosha Chen, Xiaowen Sun, Xianzu Wang, Ying Tang, Yukun Zha, Yuting Yan, Zhiniu Wen, Zhen Zhang.
Data Annotation Team
Lei Xu, Tian Yuan, Yanping Huang, Yaohui Li, Yuchen Zhu, Y.X. Zhu, X.Q. Li, Xiaojin Shen, Xiaosha Chen, Xinyu Yang, Yanhong Xu, Meng Li, R.L. Jin, R.J. Chen, Xinnan Song, Xinyi Zhou, Xingkai Yu, Xiyuan Li, Yixuan Tan, Zhicheng Ma, Zhen Huang, Zhipeng Xu, Zhongyu Zhang.
The DeepSeek project represents an advanced and structured approach to LLM development, continuously refining its models with a focus on efficiency, reasoning, and long-context processing. The multi-disciplinary team of researchers, engineers, and compliance specialists ensures robust progress across different iterations of DeepSeek models. Future advancements will likely continue pushing the boundaries of AI capabilities in code generation, problem-solving, and human-aligned interaction.