LLM Research Directions (1): LLM Prompts--p-tuning, LoRA

07-12

1. prompt-tuning background

2. Prompt Tuning Models

2.1 2021 prefix-tuning

2.2 2021 P-tuning v1

2.3 2021 Parameter-efficient prompt tuning (PET)

2.4 2022 P-tuning v2

2.5 2019 Adapter

2.6 2021 LoRA (Low-Rank Adaptation)

2.7 2024 DoRA (Weight-Decomposed Low-Rank Adaptation)

3. LoRA Implementation

3.1 LoRA Reproduction 01: MiniLoRA

3.1.1 core codes: torch.nn.utils.parametrize.register_parametrization, the function that applies a parametrization

3.2 LoRA Reproduction 02: LoRA from Scratch on MNIST

3.2.1 core codes: the Lightning deep learning framework

3.3 LoRA Reproduction 03: Torch tutorial with torchtune

3.3.1 core codes: introduction to the torchtune package

3.4 LoRA Reproduction 04: peft implementation

3.4.1 core codes: introduction to AutoModelForSeq2SeqLM

3.4.2 core codes: introduction to the peft package

3.5 *LoRA 05: Explanation

Reference:

1. prompt-tuning background

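problem: Earlier fine-tuning (model tuning) re-trains the large model on a downstream task, i.e. it updates all of the model's parameters. Because an LLM has so many parameters, full fine-tuning needs large amounts of data and compute to update them, which makes it impractical.

solution: prompt-tuning (p-tuning) is a technique that adapts a generative pre-trained model (e.g. GPT) through prompt tokens. It tunes the prompts rather than the whole model's parameters to improve performance on a specific task, which saves compute and resources while maintaining or even improving model performance.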
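To give a rough sense of the savings, the following sketch compares the number of trainable values in a soft prompt with the size of a full model. All sizes (`hidden_size`, `model_params`, `num_prompt_tokens`) are hypothetical values picked for illustration, not figures from any of the papers discussed here.

```python
# Hypothetical sizes, chosen only for illustration.
hidden_size = 4096             # embedding dimension of the frozen model
model_params = 7_000_000_000   # parameters that full fine-tuning would update
num_prompt_tokens = 20         # length of the trainable soft prompt

# Prompt tuning trains one embedding vector per prompt token.
prompt_params = num_prompt_tokens * hidden_size
trainable_fraction = prompt_params / model_params

print(f"trainable prompt parameters: {prompt_params:,}")
print(f"fraction of full fine-tuning: {trainable_fraction:.6%}")
```

Under these assumed sizes the prompt accounts for a tiny fraction of the parameters that full fine-tuning would touch, which is the motivation stated above.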

In chronological order, prompt-tuning evolved through: prefix-tuning, p-tuning v1, parameter-efficient prompt tuning, p-tuning v2.

2. Prompt Tuning Models

2.1 2021 prefix-tuning

prefix-tuning, paper: Prefix-Tuning: Optimizing Continuous Prompts for Generation. It puts a few task-specific tokens in front of the input tokens and trains a separate $\mathrm{MLP}_\theta$ to generate their embeddings.

Note: the prefix is not concatenated to the input as real tokens. The original input tokens still get their embeddings from the Transformer, whose parameters stay frozen. The prefix tokens' embeddings $h_i$ (for $i \in P_{idx}$) are drawn from a trainable matrix $P_\theta$, which is itself produced by an MLP: $P_\theta[i,:] = \mathrm{MLP}_\theta(P'_\theta[i,:])$. The remaining tokens' embeddings are then computed by the Transformer.
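The reparameterization described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it models a single layer only, the sizes are hypothetical, the two-layer tanh `mlp` is a stand-in for $\mathrm{MLP}_\theta$, and `h_input` is random data standing in for what the frozen Transformer would compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration.
prefix_len, small_dim, mlp_dim, hidden = 5, 8, 32, 16

# Trainable parameters: a small matrix P' and the weights of a two-layer MLP.
P_prime = rng.normal(size=(prefix_len, small_dim))
W1 = rng.normal(size=(small_dim, mlp_dim))
W2 = rng.normal(size=(mlp_dim, hidden))

def mlp(x):
    """Two-layer MLP with tanh, standing in for MLP_theta (an assumption)."""
    return np.tanh(x @ W1) @ W2

# P_theta[i, :] = MLP_theta(P'_theta[i, :]) for every prefix position i.
P_theta = mlp(P_prime)

# Stand-in for the activations the frozen Transformer computes for real tokens.
input_len = 7
h_input = rng.normal(size=(input_len, hidden))

# Prefix positions take their activations from P_theta, the rest from the model.
h = np.concatenate([P_theta, h_input], axis=0)
print(h.shape)  # (12, 16)
```

During training only `P_prime`, `W1` and `W2` would receive gradients in this sketch; everything that produces `h_input` stays frozen.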

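Pros: simple to implement, efficient to train, consistent across tasks.
Cons: limited applicability. On some tasks prefix-tuning performs worse than p-tuning, e.g. because of context limits: the prefix embeddings always sit at the front of the sequence, so they may not fully exploit the context of the input sequence.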

2.2 2021 P-tuning v1

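p-tuning v1, paper: GPT Understands, Too. It improves model performance by inserting trainable prompt token embeddings at fixed positions of the prompt template in the input layer.

problem: Previous prompt methods work in a discrete vector space. They pick a word vi from the vocabulary V as the prompt to insert at the i-th position of the prompt template, and use a prompt generator to produce the prompt embeddings. Such fixed prompts are called hard prompts, and they can only be used for fine-tuning the parameters of the whole pre-trained model.

solution: p-tuning v1 works in a continuous vector space. A prompt encoder generates trainable, parameterized prompt embeddings that are inserted into the input layer in place of the vocabulary words vi. These generated trainable prompts are called soft prompts.
- Workflow: initialize prompts (e.g. around the input "The movie was fantastic .") -> train and optimize -> inference, with no backpropagation at this stage.
- Pros: few parameters, better performance, broadly applicable.
- Cons: training is complex; results depend on the prompt positions.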
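The idea of filling fixed template slots with soft prompts can be sketched as follows. This is an illustration under assumptions: the toy vocabulary, the template layout, and the slot positions are hypothetical, and the prompt encoder is omitted, so the soft prompts are plain trainable vectors here.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8  # hypothetical embedding size

# Hypothetical toy vocabulary with a frozen embedding table.
vocab = {"The": 0, "movie": 1, "was": 2, "fantastic": 3, ".": 4, "[MASK]": 5}
frozen_embeddings = rng.normal(size=(len(vocab), hidden))

# Template: None marks a fixed slot that is filled by a soft prompt vector.
template = [None, None, "The", "movie", "was", "fantastic", ".", None, "[MASK]"]

num_slots = sum(tok is None for tok in template)
soft_prompts = rng.normal(size=(num_slots, hidden))  # the only trainable part

def build_inputs(template, soft_prompts):
    """Mix frozen word embeddings with soft prompts at the template slots."""
    rows, slot = [], 0
    for tok in template:
        if tok is None:
            rows.append(soft_prompts[slot])
            slot += 1
        else:
            rows.append(frozen_embeddings[vocab[tok]])
    return np.stack(rows)

inputs = build_inputs(template, soft_prompts)
print(inputs.shape)  # (9, 8)
```

Because the slot positions are hard-coded in `template`, moving them changes what the model sees, which matches the dependence on prompt position listed among the cons.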
  
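2.3 2021 Parameter-efficient prompt tuning (PET)

Parameter-efficient prompt tuning, paper: The Power of Scale for Parameter-Efficient Prompt Tuning. It adds trainable prompt embeddings to the input sequence (in the paper they are prepended to the embedded input) and trains only those while the model stays frozen.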
  
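2.4 2022 P-tuning v2

p-tuning v2, paper: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. It uses multi-layer prompts: prefix prompt embeddings are added at every layer.

problem: for models with fewer than 10B parameters, prompt tuning performs worse than fine-tuning.

solution: p-tuning v2 adds layer-wise prefix prompt embeddings at every layer. Different tasks can share the same network parameters, which supports multi-task learning.
- Pros: captures and uses context better, which further improves performance; better generalization; more flexible.
- Cons: complex to implement; higher compute cost.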
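The extra cost of layer-wise prompts can be made concrete with some arithmetic. The sizes below are hypothetical, and the factor of 2 assumes that each layer gets one key prefix and one value prefix, which is how deep prompts are commonly implemented. Treat it as an assumption rather than a detail taken from this article.

```python
# Hypothetical model and prompt sizes, chosen only for illustration.
num_layers = 24
hidden_size = 1024
prefix_len = 20

# Input-layer-only prompt: one vector per prompt position.
shallow_params = prefix_len * hidden_size

# Layer-wise prompt: assumed to hold a key prefix and a value prefix per layer.
deep_params = num_layers * 2 * prefix_len * hidden_size

print(f"input-layer prompt parameters: {shallow_params:,}")
print(f"layer-wise prompt parameters:  {deep_params:,}")
print(f"ratio: {deep_params // shallow_params}x")
```

Under these assumptions the layer-wise variant trains many more prompt parameters than an input-layer prompt, while still being far smaller than the model itself.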
    
2.5 2019 Adapter

paper: Parameter-Efficient Transfer Learning for NLP.
    
2.6 2021 LoRA (Low-Rank Adaptation)

paper: LoRA: Low-Rank Adaptation of Large Language Models.
    
$W = W_{orig} + \Delta W$
    
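LoRA keeps the pre-trained model's parameters frozen and only adds a $\Delta W$ term to the original matrix; $\Delta W$ has fewer parameters than the original matrix.

problem: if we build a new $\Delta W$ matrix with the same n x m dimensions as Worig to fine-tune the model, model performance does not improve, and the parameter count doubles!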
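To put numbers on the doubling problem, the sketch below compares a full-size $\Delta W$ with a low-rank product $\Delta W = BA$ of rank r, which is the construction LoRA is named after. The sizes `n`, `m`, `r` are hypothetical, and the later perturbation of `B` only simulates training so that the update is non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 512, 256, 4  # hypothetical dimensions and rank

W_orig = rng.normal(size=(n, m))  # frozen pre-trained weight

# A full-size update has as many parameters as W_orig itself.
full_delta_params = n * m

# Low-rank factors: B (n x r) starts at zero, A (r x m) starts random,
# so the initial update B @ A is exactly zero.
B = np.zeros((n, r))
A = rng.normal(size=(r, m))
lora_params = B.size + A.size
starts_at_zero = np.allclose(B @ A, 0.0)

# Simulate training by moving B away from zero.
B = 0.01 * rng.normal(size=(n, r))

x = rng.normal(size=(3, n))
y_merged = x @ (W_orig + B @ A)      # build delta W explicitly, then multiply
y_split = x @ W_orig + (x @ B) @ A   # same result without forming an n x m update

print(f"full delta W parameters: {full_delta_params:,}")
print(f"low-rank parameters:     {lora_params:,}")
```

The two forward computations agree up to floating-point error, so the low-rank path adds only r(n + m) trainable values instead of n x m.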
    
solution: the authors therefore introduced a low rank r and construct $\Delta W$ by multiplying low-dimensional matrices of rank r, r