GPT3.5, InstructGPT和ChatGPT的关系

2024-03-11 1383阅读

温馨提示：这篇文章已超过376天没有更新，请注意相关的内容是否还可用！

GPT-3.5

GPT-3.5 系列是一系列模型，从 2021 年第四季度开始就使用文本和代一起进行训练。以下模型属于 GPT-3.5 系列：

code-davinci-002 是一个基础模型，非常适合纯代码完成任务
text-davinci-002 是一个基于 code-davinci-002 的 InstructGPT 模型
text-davinci-003 是对 text-davinci-002 的改进

gpt-3.5-turbo-0301 是对 text-davinci-003 的改进，针对聊天进行了优化

InstructGPT

以 3 种不同方式训练的 InstructGPT 模型变体：

训练方法模型	模型名字
SFT 监督微调人类示范 davinci-instruct-beta1	davinci-instruct-beta1
FeedME 对人工编写的演示和模型样本进行监督微调，这些模型样本被人工标注者在总体质量得分上评分为 7/7	text-davinci-001, text-davinci-002, text-curie-001, text-babbage-001
PPO 使用人类比较训练的奖励模型进行强化学习	text-davinci-003

SFT 和 PPO 模型的训练与 InstructGPT 论文中的模型类似。 FeedME（“feedback made easy”的缩写）模型是通过从我们所有的模型中提取最佳完成度来训练的。我们的模型通常在训练时使用最佳可用数据集，因此使用相同训练方法的不同引擎可能会在不同数据上进行训练。

ChatGPT

ChatGPT和InstructGPT是一对姐妹模型，是在GPT-4之前发布的预热模型，有时候也被叫做GPT3.5。ChatGPT和InstructGPT在模型结构，训练方式上都完全一致，即都使用了指示学习（Instruction Learning）和人工反馈的强化学习（Reinforcement Learning from Human Feedback，RLHF）来指导模型的训练，它们不同的仅仅是采集数据的方式上有所差异。

OpenAI 官网

We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

其实GPT-3.5-turbo* 就是ChatGPT的模型的名字。

OpenAI相关研究论文

这些是我们今天在 API 中提供的研究论文中最接近的模型。请注意，并非 API 中可用的所有模型都对应于一篇论文，即使对于下面列出的模型，也可能存在细微差异，无法准确复制论文。

论文	发表时间	在论文中的模型名字	在API中模型的名字	参数数量
[2005.14165] Language Models are Few-Shot Learners	22 Jul 2020	GPT-3 175B	davinci	175B
GPT-3 6.7B	curie	6.7B
GPT-3 1B	babbage	1B
[2107.03374] Evaluating Large Language Models Trained on Code	14 Jul 2021	Codex 12B	code-cushman-0013	12B
[2201.10005] Text and Code Embeddings by Contrastive Pre-Training	14 Jan 2022	GPT-3 unsupervised cpt-text 175B	text-similarity-davinci-001	175B
GPT-3 unsupervised cpt-text 6B	text-similarity-curie-001	6B
GPT-3 unsupervised cpt-text 1.2B	No close matching model on API	1.2B
[2009.01325] Learning to summarize from human feedback	15 Feb 2022	GPT-3 6.7B pretrain	No close matching model on API	6.7B
GPT-3 2.7B pretrain	No close matching model on API	2.7B
GPT-3 1.3B pretrain	No close matching model on API	1.3B
[2203.02155] Training language models to follow instructions with human feedback	4 Mar 2022	InstructGPT-3 175B SFT	davinci-instruct-beta	175B
InstructGPT-3 175B	No close matching model on API	175B
InstructGPT-3 6B	No close matching model on API	6B
InstructGPT-3 1.3B	No close matching model on API	1.3B

其它

强化学习

通常，强化学习看起来像这样。环境会为每个动作产生奖励

InstructGPT

VPS购买请点击我

免责声明：我们致力于保护作者版权，注重分享，被刊用文章因无法核实真实出处，未能及时与作者取得联系，或有版权异议的，请联系管理员，我们会立即处理! 部分文章是来自自研大数据AI进行生成,内容摘自(百度百科,百度知道,头条百科,中国民法典,刑法,牛津词典,新华词典,汉语词典,国家院校,科普平台)等数据,内容仅供学习参考,不准确地方联系删除处理! 图片声明：本站部分配图来自人工智能系统AI生成,觅知网授权图片,PxHere摄影无版权图库和百度，360，搜狗等多加搜索引擎自动关键词搜索配图，如有侵权的图片，请第一时间联系我们，邮箱：ciyunidc@ciyunshuju.com。本站只作为美观性配图使用,无任何非法侵犯第三方意图,一切解释权归图片著作权方,本站不承担任何责任。如有恶意碰瓷者,必当奉陪到底严惩不贷!

GPT3.5, InstructGPT和ChatGPT的关系

GPT-3.5

InstructGPT

ChatGPT

OpenAI相关研究论文

其它

强化学习

InstructGPT

相关阅读

怎么把织梦的模板替换?

dedecms怎么调用特定的栏目文档?

怎么抓包一个网页?

wap怎么封装app?

目录[+]