[Paper] Alpaca: A Strong, Replicable Instruction-Following Model
Points
- Alpaca aims to support academic research on instruction-following large language models (LLMs), addressing deficiencies like hallucinations, toxicity, and biases.
- Uses the self-instruct approach to create an instruction-following dataset with text-davinci-003, costing under $500.
- The LLaMA 7B model is fine-tuned using efficient techniques.
Background
LLMs trained through instruction-following, such as ChatGPT, have significantly impacted daily life. However, these models still face issues like generating misinformation, toxic content, and exhibiting social biases. To address these problems, academic research is essential. Closed-source models hinder this research, making it difficult to study instruction-following models.
Alpaca is a model designed for academic research, fine-tuned from the LLaMA 7B model on 52k instruction-following demonstrations generated with OpenAI’s text-davinci-003. Commercial use of Alpaca is prohibited for the following reasons:
- Non-commercial license: LLaMA is released under a non-commercial license.
- Data restrictions: The data is derived from text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI.
- Deployment caution: Alpaca is not designed with adequate safety measures for general use.
Training Recipe
To train a high-quality instruction-following model under an academic budget, two key challenges are addressed:
- Strong pre-trained language model: LLaMA models
- High-quality instruction-following data: Self-instruct method
Self-instruct method
- Seed set: 175 human-written instruction–output pairs from the self-instruct seed set.
- Data generation: Prompting text-davinci-003 to generate more instructions using the seed set as examples.
- Efficiency: A simplified version of the self-instruct pipeline generated 52k unique instructions and outputs for less than $500 using the OpenAI API.
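The generation step above can be sketched as follows. This is a minimal illustration, not the released pipeline: the prompt wording, helper names, and parsing logic are assumptions, and the actual API call to text-davinci-003 is indicated in a comment rather than executed.

```python
# Hypothetical sketch of self-instruct-style data generation.
# Prompt wording and parsing are illustrative assumptions.
import random
import re

def build_prompt(seed_tasks, num_examples=3):
    """Sample a few seed demonstrations and ask the model to continue the list."""
    demos = random.sample(seed_tasks, num_examples)
    lines = ["Come up with a series of new tasks in the same format:", ""]
    for i, task in enumerate(demos, 1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        lines.append(f"   Output: {task['output']}")
    lines.append(f"{len(demos) + 1}. Instruction:")
    return "\n".join(lines)

def parse_generation(text):
    """Split a raw completion into (instruction, output) pairs."""
    pattern = r"Instruction:\s*(.+?)\s*Output:\s*(.+?)(?=\d+\.\s*Instruction:|\Z)"
    return [(i.strip(), o.strip()) for i, o in re.findall(pattern, text, re.DOTALL)]

seed = [
    {"instruction": "Name three primary colors.", "output": "Red, blue, yellow."},
    {"instruction": "What is the capital of France?", "output": "Paris."},
    {"instruction": "Translate 'hello' to Spanish.", "output": "Hola."},
]
prompt = build_prompt(seed)
# In the real pipeline, the prompt would be sent to the API, e.g.:
# completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, ...)
```

Batching many new tasks per completion, as the in-context numbering encourages, is what keeps the total API cost low.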
Fine-tuning the model
- Process: LLaMA models are fine-tuned on the generated instruction-following dataset using fully sharded data parallel (FSDP) and mixed-precision training.
- Cost and time: Fine-tuning a 7B LLaMA model took 3 hours on eight 80GB A100s, costing less than $100 on most cloud compute providers.
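Before fine-tuning, each demonstration is serialized into a single training prompt. The sketch below shows one plausible formatting step; the template mirrors the style of the released Alpaca prompt but is reproduced from memory, so treat the exact wording as an assumption.

```python
# Sketch of serializing one demonstration into a training prompt.
# Template wording is an approximation of the released Alpaca template.
def format_example(example):
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        "### Response:\n"
    )

example = {
    "instruction": "Summarize the text.",
    "input": "LLaMA is a family of LLMs.",
    "output": "LLaMA is a group of language models.",
}
prompt = format_example(example)
full_text = prompt + example["output"]  # loss is typically computed on the response tokens only
```

The formatted `full_text` strings are then tokenized and fed to the FSDP + mixed-precision training loop described above, which shards model states across the eight A100s.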
Preliminary Evaluation
Human evaluation was conducted on inputs from the self-instruct evaluation set. Key findings include:
- Comparison: Alpaca 7B vs. text-davinci-003
- Performance: Alpaca won 90 of the blind pairwise comparisons, versus 89 for text-davinci-003.
- Given Alpaca’s much smaller size and modest training data, this near parity with text-davinci-003 is notable.
- Generation style: Alpaca’s outputs tend to be similar to text-davinci-003’s, and reflect the general style of the instruction-tuning dataset.
- Evaluation limitation: The evaluation set is small and limited in scale and diversity, so these results should be interpreted cautiously.
- An interactive demo was released to gather further feedback.
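The head-to-head result above amounts to an essentially even split, as a quick tally shows (the counts come from the evaluation summary; the helper function is only illustrative):

```python
def win_rate(wins, losses):
    """Fraction of pairwise comparisons won (ties excluded)."""
    return wins / (wins + losses)

alpaca_wins, davinci_wins = 90, 89  # from the blind pairwise evaluation
rate = win_rate(alpaca_wins, davinci_wins)
print(f"Alpaca win rate: {rate:.1%}")  # prints "Alpaca win rate: 50.3%"
```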
Known Limitations
Alpaca shares common deficiencies with LLMs, such as hallucinations, toxicity, and stereotypes. It struggles particularly with hallucination, sometimes producing well-written misinformation. Despite these issues, Alpaca provides a lightweight model for studying these deficiencies, aiding academic research.
Release
Released assets:
- Demo: Interactive demo for evaluation
- Data: 52k demonstrations used to fine-tune Alpaca
- Data generation process: Code for generating the data
- Training code: Fine-tuning code using Hugging Face API
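The released demonstrations are distributed as JSON records with `instruction`, `input`, and `output` fields. The snippet below sketches one such record; the field names match the released dataset, but the record text here is an illustrative stand-in, not a quoted entry.

```python
import json

# Illustrative record in the shape of the released 52k-demonstration dataset.
record = json.loads(
    '{"instruction": "Give three tips for staying healthy.", '
    '"input": "", '
    '"output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well."}'
)
# An empty "input" field marks instructions that need no extra context.
```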
Future release:
- Model weights: Pending guidance from Meta
The release aims to support academic study of instruction-following LMs and the development of new techniques to address their existing deficiencies.