LoRA Variants

AdaLoRA

QLoRA

DoRA

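The main idea of DoRA (Weight-Decomposed Low-Rank Adaptation) is to decompose the pretrained weights into a magnitude component and a direction component, and to use LoRA to fine-tune the direction matrix while the magnitude is learned as a separate parameter.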
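The decomposition can be written compactly as follows. This is a sketch in the notation of the code on this page, and the symbols are assumptions for illustration: $W_0$ is the frozen pretrained weight, $A$ and $B$ are the LoRA factors, $\alpha$ is the LoRA scaling factor, $\lVert\cdot\rVert_c$ is the column-wise L2 norm, and $m$ is the trainable magnitude vector, initialized to $\lVert W_0 \rVert_c$.

```latex
W_0 = \lVert W_0 \rVert_c \odot \frac{W_0}{\lVert W_0 \rVert_c},
\qquad
W' = m \odot \frac{W_0 + \alpha\,(AB)^{\top}}{\lVert W_0 + \alpha\,(AB)^{\top} \rVert_c}
```

The fraction has unit-norm columns, so it carries only direction. All scale information lives in $m$.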

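import torch.nn as nn
import torch.nn.functional as F


# Code loosely inspired by
# https://github.com/catid/dora/blob/main/dora.py
class LinearWithDoRAMerged(nn.Module):
    # Assumes LoRALayer (defined elsewhere) exposes A (in_features x rank),
    # B (rank x out_features) and the scaling factor alpha.

    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(
            linear.in_features, linear.out_features, rank, alpha
        )
        # Magnitude vector: L2 norm of each column of the pretrained weight.
        self.m = nn.Parameter(
            self.linear.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x):
        # Low-rank update with shape (in_features, out_features).
        lora = self.lora.A @ self.lora.B
        # Transpose so the update matches the (out_features, in_features) weight.
        numerator = self.linear.weight + self.lora.alpha * lora.T
        denominator = numerator.norm(p=2, dim=0, keepdim=True)
        # Unit-norm columns: the direction component.
        directional_component = numerator / denominator
        # Rescale each column by its learned magnitude.
        new_weight = self.m * directional_component
        return F.linear(x, new_weight, self.linear.bias)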
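A quick way to sanity-check the layer is to confirm that it reproduces the original linear layer before any training. The sketch below is self-contained, so it repeats the DoRA layer and adds a minimal, hypothetical `LoRALayer`. That class is not shown on this page, so its shapes and its zero initialization of `B` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALayer(nn.Module):
    # Hypothetical minimal LoRA layer: A is (in_dim, rank), B is (rank, out_dim).
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) / rank ** 0.5)
        # Assumption: B starts at zero, so the low-rank update starts at zero.
        self.B = nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha


class LinearWithDoRAMerged(nn.Module):
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank, alpha)
        # detach() so the magnitude is built from a plain tensor copy of the norms.
        self.m = nn.Parameter(
            linear.weight.detach().norm(p=2, dim=0, keepdim=True))

    def forward(self, x):
        lora = self.lora.A @ self.lora.B
        numerator = self.linear.weight + self.lora.alpha * lora.T
        denominator = numerator.norm(p=2, dim=0, keepdim=True)
        new_weight = self.m * (numerator / denominator)
        return F.linear(x, new_weight, self.linear.bias)


torch.manual_seed(0)
base = nn.Linear(8, 4)
dora = LinearWithDoRAMerged(base, rank=2, alpha=4)
x = torch.randn(3, 8)

# With B = 0 the merged weight is m * W / ||W||_c, which equals W.
same_at_init = torch.allclose(base(x), dora(x), atol=1e-5)
print(same_at_init)
print(tuple(dora.m.shape))
```

Under these assumptions the two outputs agree up to floating-point error, and `m` holds one magnitude per input column of the weight.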


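LoRA tends to increase or decrease magnitude and direction proportionally. By decomposing the pretrained weight matrix into magnitude and direction and updating the two separately, DoRA can come closer to the behavior of full fine-tuning.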
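To make "magnitude change" and "direction change" concrete, the sketch below measures both between two weight matrices. The helper name `magnitude_direction_change` and the two metrics (mean absolute change in column norms, and mean of one minus column-wise cosine similarity) are illustrative assumptions, not something defined on this page.

```python
import torch
import torch.nn.functional as F


def magnitude_direction_change(w_before, w_after):
    # Column-wise L2 norms, matching dim=0 in the DoRA layer above.
    m_before = w_before.norm(p=2, dim=0)
    m_after = w_after.norm(p=2, dim=0)
    delta_m = (m_after - m_before).abs().mean()
    # 1 - cosine similarity per column: 0 when the direction is unchanged.
    cos = F.cosine_similarity(w_before, w_after, dim=0)
    delta_d = (1.0 - cos).mean()
    return delta_m.item(), delta_d.item()


torch.manual_seed(0)
w0 = torch.randn(4, 8)

# Pure rescaling: magnitude changes, direction does not.
dm_scale, dd_scale = magnitude_direction_change(w0, 2.0 * w0)

# Perturb the columns, then restore the original column norms:
# direction changes, magnitude does not.
perturbed = w0 + 0.5 * torch.randn(4, 8)
w_dir = perturbed / perturbed.norm(dim=0, keepdim=True) * w0.norm(dim=0, keepdim=True)
dm_dir, dd_dir = magnitude_direction_change(w0, w_dir)

print(f"rescale:  dM={dm_scale:.4f}, dD={dd_scale:.4f}")
print(f"redirect: dM={dm_dir:.4f}, dD={dd_dir:.4f}")
```

Because DoRA keeps the magnitude in its own parameter, an update can move along either axis independently, as in the two synthetic cases here.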

Last updated: 2024/07/12, 15:43:54
