LoRA Survey

PiSSA

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large

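As shown in Figure 1, PiSSA (Figure 1c) has exactly the same model architecture as LoRA [1] (Figure 1b); only the way the adapter is initialized differs. LoRA initializes A with Gaussian noise and B with zeros, whereas PiSSA initializes A and B from the principal singular values and singular vectors of the pretrained weight.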
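The initialization can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the function name `pissa_init`, the matrix sizes, and the choice to split the square root of each singular value evenly between A and B are assumptions made here for the example.

```python
import torch

def pissa_init(weight: torch.Tensor, rank: int):
    """Split `weight` (out_features x in_features) into a rank-`rank`
    principal adapter (A, B) and a residual matrix (hypothetical helper)."""
    # Thin SVD: weight = U @ diag(S) @ Vh, singular values in descending order.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    sqrt_s = torch.sqrt(S[:rank])
    # The top-`rank` singular triplets initialize the trainable adapter.
    A = U[:, :rank] * sqrt_s               # (out_features, rank)
    B = sqrt_s.unsqueeze(1) * Vh[:rank]    # (rank, in_features)
    # Everything else stays in the residual, which would be kept frozen.
    residual = weight - A @ B
    return A, B, residual

torch.manual_seed(0)
W = torch.randn(64, 32)
A, B, W_res = pissa_init(W, rank=4)
# At initialization, residual + adapter reproduces the pretrained weight.
print(torch.allclose(W_res + A @ B, W, atol=1e-5))
```

Unlike LoRA's zero-initialized B, the adapter here starts out non-zero, so the residual rather than the original weight is the part that would be frozen during fine-tuning.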

MiLoRA

MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning

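Goal: reduce the latency of generating new tokens in multi-tenant settings by using a prompt-based routing mechanism.
Approach: the hidden states of the input prompt are passed through a pooler, an activation, and a router to determine which LoRA to use; the choice is not changed in subsequent layers.
Results:
Open question: is the input used directly as the prompt?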
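A rough sketch of the pooler, activation, and router pipeline described above, under several assumptions: mean pooling, a tanh activation, a single linear router with softmax gating, and the class name `PromptRouter` are all choices made here for illustration and are not taken from the paper. The sketch also reads "not changed" as "computed once from the prompt and then reused".

```python
import torch
import torch.nn as nn

class PromptRouter(nn.Module):
    """Hypothetical prompt-level router: pooler -> activation -> linear router."""

    def __init__(self, hidden_dim: int, num_loras: int):
        super().__init__()
        self.router = nn.Linear(hidden_dim, num_loras)

    def forward(self, prompt_hidden: torch.Tensor) -> torch.Tensor:
        # prompt_hidden: (batch, prompt_len, hidden_dim)
        pooled = prompt_hidden.mean(dim=1)   # pooler (mean pooling is an assumption)
        activated = torch.tanh(pooled)       # activation (tanh is an assumption)
        # Gate weights over the available LoRA modules: (batch, num_loras)
        return torch.softmax(self.router(activated), dim=-1)

torch.manual_seed(0)
router = PromptRouter(hidden_dim=16, num_loras=3)
prompt_hidden = torch.randn(2, 5, 16)
with torch.no_grad():
    gate = router(prompt_hidden)             # computed once from the prompt

# Stand-in outputs of each LoRA module for one newly generated token.
lora_outputs = torch.randn(2, 3, 16)         # (batch, num_loras, hidden_dim)
# The cached gate is reused for the new token; the router is not run again.
mixed = (gate.unsqueeze(-1) * lora_outputs).sum(dim=1)   # (batch, hidden_dim)
```

Because the gate depends only on the prompt, it can be cached per request, which is where the latency saving during generation would come from.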

VeRA

VeRA: Vector-based Random Matrix Adaptation

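Goal: reduce the number of trainable parameters in LoRA.
Results: the performance gap between VeRA and LoRA is not significant, while the total number of trainable parameters drops sharply; VeRA trains only about 1/100 as many parameters as LoRA.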
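The size of the reduction can be estimated with simple arithmetic. The sketch below assumes VeRA's usual formulation, in which the low-rank matrices are random, frozen, and shared across layers, and only two scaling vectors per layer are trained. The dimensions, ranks, and layer count are hypothetical values chosen for illustration.

```python
def lora_trainable_params(d_in, d_out, rank, num_layers):
    # LoRA trains A (rank x d_in) and B (d_out x rank) in every adapted layer.
    return num_layers * rank * (d_in + d_out)

def vera_trainable_params(d_out, rank, num_layers):
    # VeRA trains only two scaling vectors per layer:
    # d (length rank) and b (length d_out); the shared matrices stay frozen.
    return num_layers * (rank + d_out)

# Hypothetical configuration: 64 adapted 4096 x 4096 projections.
lora = lora_trainable_params(d_in=4096, d_out=4096, rank=16, num_layers=64)
vera = vera_trainable_params(d_out=4096, rank=256, num_layers=64)
print(lora, vera, round(lora / vera, 1))
```

The ratio depends strongly on the ranks and layer sizes, so a figure such as 1/100 should be read as specific to the configuration in which it was measured.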

LoRA-drop

LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation

Goal: prune the LoRA modules of unimportant layers.
Approach: score each layer's importance from its LoRA output $B A X$.
Results:
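One way to turn the LoRA output $B A X$ into a per-layer importance score is sketched below. It assumes the score is the squared norm of the LoRA output over sampled inputs, normalized across layers, and it uses a simple keep-top-k rule as a stand-in for the selection step. The function name and the toy data are hypothetical.

```python
import torch

def lora_output_importance(lora_pairs, inputs):
    """lora_pairs: list of (A, B) with A: (rank, d_in), B: (d_out, rank).
    inputs: one sampled activation matrix X per layer, shape (n, d_in).
    Returns one importance score per layer; the scores sum to 1."""
    scores = []
    for (A, B), X in zip(lora_pairs, inputs):
        delta = X @ A.T @ B.T              # each row is (B A x)^T, shape (n, d_out)
        scores.append(delta.pow(2).sum())  # squared norm of the LoRA output
    scores = torch.stack(scores)
    return scores / scores.sum()

torch.manual_seed(0)
d_in, d_out, rank, n = 8, 8, 2, 16
A0, B0, X0 = torch.randn(rank, d_in), torch.randn(d_out, rank), torch.randn(n, d_in)
# Three toy layers that differ only in the scale of B.
scales = [0.0, 1.0, 3.0]
lora_pairs = [(A0, s * B0) for s in scales]
inputs = [X0 for _ in scales]

importance = lora_output_importance(lora_pairs, inputs)
keep = importance.argsort(descending=True)[:2]   # keep the two most important layers
print(importance, keep)
```

In this toy setup the scores are proportional to the squared scale of B, so the layer with a zero update gets zero importance and is the one that would be pruned.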

AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

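Goal:
Approach: compared with standard LoRA of the same rank, both methods have the same total number of parameters, but those parameters are distributed differently; in AdaLoRA some matrices get a higher rank and others a lower one.
Results:
Open questions: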
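The idea of a fixed budget with uneven ranks can be illustrated with a small allocation routine. This is only a sketch: it assumes each adapter is split into rank-one components that already carry an importance score, and it keeps the globally highest-scoring components. The function name, the module names, and the scores are made up for the example, and how AdaLoRA actually computes importance is not covered here.

```python
def allocate_ranks(importance, budget):
    """importance: dict mapping a matrix name to per-component scores.
    Keeps the `budget` highest-scoring components overall and returns
    the resulting rank of each matrix."""
    flat = [(score, name) for name, scores in importance.items() for score in scores]
    flat.sort(key=lambda item: item[0], reverse=True)
    ranks = {name: 0 for name in importance}
    for _, name in flat[:budget]:
        ranks[name] += 1
    return ranks

# Hypothetical scores for three matrices, four candidate components each.
importance = {
    "q_proj": [0.9, 0.8, 0.7, 0.6],
    "k_proj": [0.5, 0.1, 0.05, 0.01],
    "v_proj": [0.95, 0.4, 0.3, 0.02],
}
# Budget of 6 components, the same total as a uniform rank of 2 per matrix.
ranks = allocate_ranks(importance, budget=6)
print(ranks)  # {'q_proj': 4, 'k_proj': 1, 'v_proj': 1}
```

The total rank is the same as giving every matrix rank 2, but the budget is concentrated on the matrix whose components score highest.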

GaLore

Goal:
Approach:
Results:
Open questions:

Deep LoRA

Goal:
Approach:
Results:
Open questions:

CorDA

QLoRA

DoRA

The main idea of DoRA (Weight-Decomposed Low-Rank Adaptation) is to decompose the pretrained weights into a magnitude and a direction, and to use LoRA to fine-tune the direction matrix.

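import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALayer(nn.Module):
    # Assumed helper (not shown in the original notes): A is (in, rank), B is (rank, out).
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) / rank ** 0.5)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init: no update at start
        self.alpha = alpha


class LinearWithDoRAMerged(nn.Module):

    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(
            linear.in_features, linear.out_features, rank, alpha
        )
        # Magnitude m: column-wise L2 norm of the pretrained weight, shape (1, in_features)
        self.m = nn.Parameter(
            self.linear.weight.norm(p=2, dim=0, keepdim=True))

    # Code loosely inspired by
    # https://github.com/catid/dora/blob/main/dora.py

    def forward(self, x):
        # Low-rank update, shape (in_features, out_features)
        lora = self.lora.A @ self.lora.B
        numerator = self.linear.weight + self.lora.alpha*lora.T
        # Normalize each column to unit length to obtain the direction
        denominator = numerator.norm(p=2, dim=0, keepdim=True)
        directional_component = numerator / denominator
        # Rescale the direction by the learned magnitude
        new_weight = self.m * directional_component
        return F.linear(x, new_weight, self.linear.bias)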
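The merged weight computed in the forward pass above can be checked with a small functional sketch. It assumes the same shape convention as the class (A is in_features x rank, B is rank x out_features, with B zero-initialized); the function name `dora_weight` and the toy sizes are hypothetical.

```python
import torch

def dora_weight(W0, A, B, m, alpha):
    """W0: (out, in); A: (in, rank); B: (rank, out); m: (1, in)."""
    numerator = W0 + alpha * (A @ B).T
    # Unit-norm columns give the direction; m supplies the magnitude.
    direction = numerator / numerator.norm(p=2, dim=0, keepdim=True)
    return m * direction

torch.manual_seed(0)
W0 = torch.randn(6, 4)
A = torch.randn(4, 2)
B = torch.zeros(2, 6)                     # zero init, so the update starts at zero
m = W0.norm(p=2, dim=0, keepdim=True)     # magnitude initialized from W0

# At initialization the DoRA weight equals the pretrained weight.
W_init = dora_weight(W0, A, B, m, alpha=4.0)
print(torch.allclose(W_init, W0, atol=1e-5))

# Scaling m changes the column norms but not the directions.
W_scaled = dora_weight(W0, A, B, 2 * m, alpha=4.0)
print(torch.allclose(W_scaled, 2 * W0, atol=1e-5))
```

This makes the decomposition concrete: the LoRA factors can only rotate the columns of the weight, while the magnitude vector alone controls their lengths.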