[Paper] Alpaca: A Strong, Replicable Instruction-Following Model
Points
- Alpaca aims to support academic research on instruction-following large language models (LLMs), addressing deficiencies like hallucinations, toxicity, and biases.
- Uses the self-instruct approach to create an instruction-following dataset with text-davinci-003, costing under $500.
- The LLaMA 7B model is fine-tuned using efficient techniques.
Background
LLMs trained through instruction-following, such as ChatGPT, have significantly impacted daily life. However, these models still face issues like generating misinformation, toxic content, and exhibiting social biases. To address these problems, academic research is essential. Closed-source models hinder this research, making it difficult to study instruction-following models.
Alpaca is a model designed for academic research, fine-tuned from the LLaMA 7B model on 52k instruction-following demonstrations generated with OpenAI’s text-davinci-003. Commercial use of Alpaca is prohibited for the following reasons:
- Non-commercial license: LLaMA is released under a non-commercial license.
- Data restrictions: The data is derived from text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI.
- Deployment caution: Alpaca is not designed with adequate safety measures for general use.
Training Recipe
To train a high-quality instruction-following model under an academic budget, two key challenges are addressed:
- Strong pre-trained language model: LLaMA models
- High-quality instruction-following data: Self-instruct method
Self-instruct method
- Seed set: 175 human-written instruction–output pairs from the self-instruct seed set.
- Data generation: Prompting text-davinci-003 to generate more instructions using the seed set as examples.
- Efficiency: A simplified version of the self-instruct pipeline generated 52k unique instructions and outputs for less than $500 using the OpenAI API.
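The generation step above can be sketched as follows. This is a minimal illustration, not the released pipeline: the prompt wording, helper names, and parsing logic are assumptions, and the actual API call to text-davinci-003 is indicated in a comment rather than executed.

```python
# Hypothetical sketch of self-instruct-style data generation.
# Prompt wording and parsing are illustrative assumptions.
import random
import re

def build_prompt(seed_tasks, num_examples=3):
    """Sample a few seed demonstrations and ask the model to continue the list."""
    demos = random.sample(seed_tasks, num_examples)
    lines = ["Come up with a series of new tasks in the same format:", ""]
    for i, task in enumerate(demos, 1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        lines.append(f"   Output: {task['output']}")
    lines.append(f"{len(demos) + 1}. Instruction:")
    return "\n".join(lines)

def parse_generation(text):
    """Split a raw completion into (instruction, output) pairs."""
    pattern = r"Instruction:\s*(.+?)\s*Output:\s*(.+?)(?=\d+\.\s*Instruction:|\Z)"
    return [(i.strip(), o.strip()) for i, o in re.findall(pattern, text, re.DOTALL)]

seed = [
    {"instruction": "Name three primary colors.", "output": "Red, blue, yellow."},
    {"instruction": "What is the capital of France?", "output": "Paris."},
    {"instruction": "Translate 'hello' to Spanish.", "output": "Hola."},
]
prompt = build_prompt(seed)
# In the real pipeline, the prompt would be sent to the API, e.g.:
# completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, ...)
```

Batching many new tasks per completion, as the in-context numbering encourages, is what keeps the total API cost low.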
Fine-tuning the model
- Process: LLaMA models are fine-tuned on the generated instruction-following dataset using fully sharded data parallel (FSDP) and mixed-precision training.
- Cost and time: Fine-tuning a 7B LLaMA model took 3 hours on eight 80GB A100s, costing less than $100 on most cloud compute providers.
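Before fine-tuning, each demonstration is serialized into a single training prompt. The sketch below shows one plausible formatting step; the template mirrors the style of the released Alpaca prompt but is reproduced from memory, so treat the exact wording as an assumption.

```python
# Sketch of serializing one demonstration into a training prompt.
# Template wording is an approximation of the released Alpaca template.
def format_example(example):
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        "### Response:\n"
    )

example = {
    "instruction": "Summarize the text.",
    "input": "LLaMA is a family of LLMs.",
    "output": "LLaMA is a group of language models.",
}
prompt = format_example(example)
full_text = prompt + example["output"]  # loss is typically computed on the response tokens only
```

The formatted `full_text` strings are then tokenized and fed to the FSDP + mixed-precision training loop described above, which shards model states across the eight A100s.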
Preliminary Evaluation
Human evaluation was conducted on inputs from the self-instruct evaluation set. Key findings include:
- Comparison: Alpaca 7B vs. text-davinci-003
- Performance: Alpaca won 90 of the blind pairwise comparisons, versus 89 for text-davinci-003.
- Given Alpaca’s much smaller size and modest training data, this near parity with text-davinci-003 is notable.
- Generation style: Alpaca’s outputs tend to be similar to text-davinci-003’s, and reflect the general style of the instruction-tuning dataset.
- Evaluation limitation: The evaluation set is small and limited in scale and diversity, so these results should be interpreted cautiously.
- An interactive demo was released to gather further feedback.
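The head-to-head result above amounts to an essentially even split, as a quick tally shows (the counts come from the evaluation summary; the helper function is only illustrative):

```python
def win_rate(wins, losses):
    """Fraction of pairwise comparisons won (ties excluded)."""
    return wins / (wins + losses)

alpaca_wins, davinci_wins = 90, 89  # from the blind pairwise evaluation
rate = win_rate(alpaca_wins, davinci_wins)
print(f"Alpaca win rate: {rate:.1%}")  # prints "Alpaca win rate: 50.3%"
```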
Known Limitations
Alpaca shares common deficiencies with LLMs, such as hallucinations, toxicity, and stereotypes. It struggles particularly with hallucination, sometimes producing well-written misinformation. Despite these issues, Alpaca provides a lightweight model for studying these deficiencies, aiding academic research.
Release
Released assets:
- Demo: Interactive demo for evaluation
- Data: 52k demonstrations used to fine-tune Alpaca
- Data generation process: Code for generating the data
- Training code: Fine-tuning code using Hugging Face API
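The released demonstrations are distributed as JSON records with `instruction`, `input`, and `output` fields. The snippet below sketches one such record; the field names match the released dataset, but the record text here is an illustrative stand-in, not a quoted entry.

```python
import json

# Illustrative record in the shape of the released 52k-demonstration dataset.
record = json.loads(
    '{"instruction": "Give three tips for staying healthy.", '
    '"input": "", '
    '"output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well."}'
)
# An empty "input" field marks instructions that need no extra context.
```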
Future release:
- Model weights: Pending guidance from Meta
The release aims to support academic study of instruction-following LMs and the development of new techniques to address their existing deficiencies.