## Soft Prompt Tuning

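This notebook applies soft prompt tuning to a sentiment analysis task: a handful of new prompt-token embeddings are trained while the rest of `t5-large` stays frozen.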

In [1]:
import torch
import t2t

In [2]:
trainer_arguments = t2t.TrainerArguments(
    model_name_or_path="t5-large",
    output_dir="/tmp/saved_model",
    cache_dir="/workspace/cache",
    overwrite_output_dir=True,
    train_file="../sample_data/trainlines.json",
    validation_file="../sample_data/validlines.json",
    source_id="s",
    target_id="t",
    max_source_length=128,
    max_target_length=8,
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=1,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    logging_strategy="epoch",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    learning_rate=0.1,
    fp16=False,
)
trainer = t2t.Trainer(arguments=trainer_arguments)

Using custom data configuration default-364cfa96496637f3


Downloading and preparing dataset json/default to ../cache/json/default-364cfa96496637f3/0.0.0/d75ead8d5cfcbe67495df0f89bd262f0023257fbbbd94a730313295f3d756d50...


Dataset json downloaded and prepared to ../cache/json/default-364cfa96496637f3/0.0.0/d75ead8d5cfcbe67495df0f89bd262f0023257fbbbd94a730313295f3d756d50. Subsequent calls will reuse this data.
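The arguments above name `s` and `t` as the source and target fields of the JSON data files, but the on-disk layout of those files isn't shown in this notebook. The sketch below assumes one JSON object per line and assumes the prompt tokens are stored as a prefix of the source text, the way the generation inputs at the end of the notebook are written. `make_record` and `PROMPT_PREFIX` are hypothetical names used only for illustration.

```python
import json

# Assumption: the prompt tokens are part of the stored source text.
PROMPT_PREFIX = "<p1><p2><p3><p4><p5>"

def make_record(text, label, prefix=PROMPT_PREFIX):
    """Build one training record with the field names used by source_id / target_id."""
    return {"s": f"{prefix} {text}", "t": label}

# Illustrative rows only; these are not taken from the sample data files.
rows = [
    make_record("This is the worst movie I have ever seen!", "negative"),
    make_record("This is the best movie I have ever seen!", "positive"),
]

# One JSON object per line (JSON Lines layout, assumed).
jsonl = "\n".join(json.dumps(row) for row in rows)
print(jsonl)
```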



### Prompt Tuning Conversion

* Add the new prompt tokens to the tokenizer and to the model's embedding layer
* Freeze the entire model except for the embedding parameters of the new tokens
* Initialize the new token embeddings (currently to the mean of the embedding layer)

This example uses 5 prompt tokens (`<p1>` through `<p5>`), a count that has been demonstrated to be effective.

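Note: the entire embedding layer is marked as trainable, but the gradients are zeroed out for every embedding except the newly added prompt tokens. This is why the model summary reports the full embedding matrix as trainable even though only the prompt rows are actually updated.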
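The three steps above can be sketched in plain PyTorch on a toy embedding layer. This is a minimal illustration under assumptions, not the `t2t` implementation: the sizes are made up, and the gradient mask is applied with a tensor hook, which is one possible way to get the behaviour the note describes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, dim, n_prompt = 10, 4, 5   # toy sizes, not those of t5-large

# Stand-in for the pretrained embedding layer.
old_emb = nn.Embedding(vocab_size, dim)

# Step 1: extend the embedding matrix with one row per new prompt token.
emb = nn.Embedding(vocab_size + n_prompt, dim)
with torch.no_grad():
    emb.weight[:vocab_size] = old_emb.weight
    # Step 3: initialise the new rows to the mean of the existing embeddings.
    emb.weight[vocab_size:] = old_emb.weight.mean(dim=0)

# Step 2: the layer stays trainable, but a hook zeroes the gradient
# of every row except the prompt rows.
mask = torch.zeros(vocab_size + n_prompt, 1)
mask[vocab_size:] = 1.0
emb.weight.register_hook(lambda grad: grad * mask)

# One SGD step on a dummy loss that touches prompt rows and ordinary rows.
before = emb.weight.detach().clone()
optimizer = torch.optim.SGD(emb.parameters(), lr=0.1)
ids = torch.tensor([[10, 11, 12, 13, 14, 3, 4, 4]])
loss = emb(ids).pow(2).sum()
loss.backward()
optimizer.step()
after = emb.weight.detach()

frozen_unchanged = torch.equal(before[:vocab_size], after[:vocab_size])
prompt_changed = not torch.equal(before[vocab_size:], after[vocab_size:])
print(frozen_unchanged, prompt_changed)
```

With plain SGD, a zeroed gradient leaves the original rows untouched. An optimizer that applies weight decay independently of the gradient would still shrink those rows, so that case needs separate handling.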

In [3]:
prompt_list = ["<p1>","<p2>","<p3>","<p4>","<p5>"]
trainer.convert_to_prompt_tuning(prompt_list=prompt_list)
trainer.model_summary()

Summary
- model name: t5-large
- model params:
  - train: 32.9 M
  - total: 737.7 M
  - vocab: 32105
- prompt tuning only: True
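The 32.9 M trainable figure matches the full embedding matrix rather than the five prompt rows alone. A quick check of that arithmetic, assuming a hidden size of 1024 for `t5-large` (taken from the public model configuration, not from this notebook's output):

```python
vocab_size = 32105        # "vocab" in the summary above
d_model = 1024            # assumption: hidden size of t5-large
num_prompt_tokens = 5

# Parameters marked trainable: the whole embedding matrix.
embedding_params = vocab_size * d_model
# Parameters that actually receive non-zero gradients: the prompt rows only.
prompt_params = num_prompt_tokens * d_model

print(f"{embedding_params:,}")   # 32,875,520, i.e. about 32.9 M
print(f"{prompt_params:,}")      # 5,120
```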


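Examples of tokenization and embeddings. The five prompt tokens map to IDs 32100 through 32104, the last five entries of the 32105-entry vocabulary, and the trailing ID 1 is T5's end-of-sequence token. ID 32099, just below the first prompt token, serves as the frozen-token example.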

In [4]:
test_seq = "<p1><p2><p3><p4><p5> XXX"
print("input:", test_seq)
tokenized_seq = trainer.tokenizer(test_seq)["input_ids"]
print("tokenized:", tokenized_seq)
print("prompt tokens ->", tokenized_seq[:5])

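# The shared embedding layer exposes a single parameter: its weight matrix.
for name, param in trainer.model.shared.named_parameters():
    print("example of frozen token:")
    # The ID just below the first prompt token is a pre-existing vocabulary entry.
    # clone() stores a copy; a plain slice would be a view that follows in-place updates.
    frozen_token_example = param.data[tokenized_seq[0]-1][:10].clone()
    print(frozen_token_example)
    print("^ this should not change after training\n")
    print("example of prompt token:")
    print(param.data[tokenized_seq[0]][:10])
    print("^ this is trainable")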

input: <p1><p2><p3><p4><p5> XXX
tokenized: [32100, 32101, 32102, 32103, 32104, 3, 4, 4, 4, 1]
prompt tokens -> [32100, 32101, 32102, 32103, 32104]
example of frozen token:
tensor([ 0.7617, -1.4844,  3.0625,  1.8516, -1.7812, -5.5938,  0.7852,  2.3594,
        -2.0781, -6.2188])
^ this should not change after training

example of prompt token:
tensor([-4.3131, -2.5815,  2.5195,  2.8988, -2.4462, -3.0837, -2.9662, -3.0946,
         3.9238, -2.4886])
^ this is trainable


### Train Model

In [5]:
trainer.train(valid=True)

Running tokenizer on train dataset:   0%|          | 0/8 [00:00<?, ?ba/s]

Running tokenizer on validation dataset:   0%|          | 0/3 [00:00<?, ?ba/s]

***** Running training *****
  Num examples = 8000
  Num Epochs = 3
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 750
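The step count in this log follows from the training arguments: 8000 examples at a batch size of 32 give 250 steps per epoch, and 3 epochs give 750 steps. The sketch below reproduces that arithmetic and a linear warmup-then-decay schedule. The formula is the usual linear shape that `lr_scheduler_type="linear"` with `warmup_ratio=0.1` suggests; that `t2t` uses exactly this formula is an assumption.

```python
import math

num_examples = 8000
batch_size = 32        # per-device batch size x 1 accumulation step x 1 device
epochs = 3
warmup_ratio = 0.1
peak_lr = 0.1

steps_per_epoch = math.ceil(num_examples / batch_size)   # 250
total_steps = steps_per_epoch * epochs                    # 750
warmup_steps = math.ceil(total_steps * warmup_ratio)      # 75

def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate reaches its peak of 0.1 after 75 steps and falls back to 0 at step 750.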


| Epoch | Training Loss | Validation Loss | Bleu | Gen Len |
|---|---|---|---|---|
| 1 | 15.7009 | 0.314461 | 2.2748 | 2.0435 |
| 2 | 0.4194 | 0.222372 | 0.0 | 2.0 |
| 3 | 0.3428 | 0.211115 | 0.0 | 2.0 |
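BLEU falls to 0.0 in the later epochs, most likely because the targets are single words and leave no higher-order n-grams to match, so it says little about classification quality here. For one-word labels, exact-match accuracy is an easier number to read. A minimal sketch follows; `exact_match_accuracy` is a hypothetical helper, not part of `t2t`, and the sample values are illustrative rather than outputs of this run.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference label (case-insensitive)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)

# Illustrative values only.
preds = ["negative", "positive", "positive", "negative"]
golds = ["negative", "positive", "negative", "negative"]
print(exact_match_accuracy(preds, golds))   # 0.75
```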


***** Running Evaluation *****
  Num examples = 2001
  Batch size = 32
Saving model checkpoint to /tmp/saved_model/checkpoint-250
Configuration saved in /tmp/saved_model/checkpoint-250/config.json
Model weights saved in /tmp/saved_model/checkpoint-250/pytorch_model.bin
tokenizer config file saved in /tmp/saved_model/checkpoint-250/tokenizer_config.json
Special tokens file saved in /tmp/saved_model/checkpoint-250/special_tokens_map.json
Copy vocab file to /tmp/saved_model/checkpoint-250/spiece.model
***** Running Evaluation *****
  Num examples = 2001
  Batch size = 32
Saving model checkpoint to /tmp/saved_model/checkpoint-500
Configuration saved in /tmp/saved_model/checkpoint-500/config.json
Model weights saved in /tmp/saved_model/checkpoint-500/pytorch_model.bin
tokenizer config file saved in /tmp/saved_model/checkpoint-500/tokenizer_config.json
Special tokens file saved in /tmp/saved_model/checkpoint-500/special_tokens_map.json
Copy vocab file to /tmp/saved_model/checkpoint-500/spiec

***** train metrics *****
  epoch                    =        3.0
  total_flos               = 12287178GF
  train_loss               =     5.4877
  train_runtime            = 0:07:15.95
  train_samples            =       8000
  train_samples_per_second =     55.052
  train_steps_per_second   =       1.72


In [6]:
for name, param in trainer.model.shared.named_parameters():
    print("original frozen token:", frozen_token_example)
    print("current frozen token:", param.data[tokenized_seq[0]-1][:10])
    print("^ this should not have changed\n")
    print("prompt token:")
    print(param.data[tokenized_seq[0]])
    print("^ this is trainable, so should have changed")

original frozen token: tensor([ 0.7617, -1.4844,  3.0625,  1.8516, -1.7812, -5.5938,  0.7852,  2.3594,
        -2.0781, -6.2188])
current frozen token: tensor([ 0.7617, -1.4844,  3.0625,  1.8516, -1.7812, -5.5938,  0.7852,  2.3594,
        -2.0781, -6.2188], device='cuda:0')
^ this should not have changed

prompt token:
tensor([ 1.3241, -2.8448,  2.0462,  ..., -0.2151,  8.1019, 13.3674],
       device='cuda:0')
^ this is trainable, so should have changed
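The cell above inspects a single frozen row by eye. A stricter check compares a full snapshot of the embedding matrix against the trained weights and lists every row that moved. The sketch below runs on a toy matrix; `changed_rows` is a hypothetical helper, and the device handling reflects that the trained weights in this notebook live on `cuda:0` while the earlier printout was on CPU.

```python
import torch

def changed_rows(before, after):
    """Return the indices of rows that differ between two same-shaped matrices."""
    after = after.detach().to(before.device)   # e.g. bring CUDA weights back to CPU
    differs = (before != after).any(dim=1)
    return differs.nonzero(as_tuple=True)[0].tolist()

# Toy stand-in for the embedding matrix: 6 tokens, rows 4 and 5 play the prompt rows.
weight = torch.zeros(6, 3)
snapshot = weight.clone()    # independent copy taken before "training"
weight[4:] += 1.0            # simulate an update that touches only the prompt rows

print(changed_rows(snapshot, weight))   # [4, 5]
```

In this notebook the snapshot would be taken before `trainer.train` (for example with `trainer.model.shared.weight.detach().clone()`, assuming `trainer.model.shared` is a standard `torch.nn.Embedding`), and the expected result would be exactly the prompt IDs 32100 through 32104.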


### Test Model

In [7]:
input_text = "<p1><p2><p3><p4><p5> This is the worst movie I have ever seen!"
trainer.generate_single(input_text, max_length=8)

'negative'

In [8]:
input_text = "<p1><p2><p3><p4><p5> This is the best movie I have ever seen!"
trainer.generate_single(input_text, max_length=8)

'positive'
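Both inputs above spell out the five prompt tokens by hand. A small helper keeps that prefix consistent across calls; `with_prompt` is a hypothetical name introduced for illustration and is not part of `t2t`.

```python
prompt_list = ["<p1>", "<p2>", "<p3>", "<p4>", "<p5>"]

def with_prompt(text, prompts=prompt_list):
    """Prepend the prompt tokens to the input, in the layout used by the cells above."""
    return "".join(prompts) + " " + text.strip()

print(with_prompt("This is the best movie I have ever seen!"))
```

The returned string can then be passed to `trainer.generate_single` in the same way as `input_text` above.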