OpenLLaMA, an open-source reproduction of Meta AI’s LLaMA model, is now publicly available and licensed for commercial use. This is excellent news, because previously released models such as Vicuna, OpenAssistant, and Alpaca, which we wrote about earlier, are suitable only for research and cannot be used for commercial purposes.
OpenLLaMA’s creators used the RedPajama dataset, a reproduction of the LLaMA training dataset containing over 1.2 trillion tokens. They applied the same preprocessing steps and training hyperparameters as the original LLaMA. The models were trained on cloud TPU-v4s using EasyLM, a JAX-based pipeline built for training and fine-tuning language models.
Engineers evaluated the 7B OpenLLaMA model on a range of tasks, comparing its results against the original LLaMA and GPT-J, a 6B-parameter model trained on the Pile dataset by EleutherAI. In this evaluation, OpenLLaMA mostly displayed performance comparable to, or better than, the original LLaMA and GPT-J. Below are the results:
| Task/Metric | GPT-J 6B | LLaMA 7B | OpenLLaMA 7B (400B tokens) | OpenLLaMA 3B (350B tokens) |
|---|---|---|---|---|
| anli_r1/acc | 0.32 | 0.35 | 0.33 | 0.34 |
| anli_r2/acc | 0.34 | 0.34 | 0.33 | 0.34 |
| anli_r3/acc | 0.35 | 0.37 | 0.34 | 0.37 |
| arc_challenge/acc | 0.34 | 0.39 | 0.34 | 0.31 |
| arc_challenge/acc_norm | 0.37 | 0.41 | 0.34 | 0.33 |
| arc_easy/acc | 0.67 | 0.68 | 0.68 | 0.65 |
| arc_easy/acc_norm | 0.62 | 0.52 | 0.64 | 0.59 |
| boolq/acc | 0.66 | 0.75 | 0.67 | 0.60 |
| cb/acc | 0.36 | 0.36 | 0.43 | 0.11 |
| cb/f1 | 0.26 | 0.24 | 0.22 | 0.10 |
| hellaswag/acc | 0.50 | 0.56 | 0.49 | 0.45 |
| hellaswag/acc_norm | 0.66 | 0.73 | 0.67 | 0.61 |
| openbookqa/acc | 0.29 | 0.29 | 0.28 | 0.26 |
| openbookqa/acc_norm | 0.38 | 0.41 | 0.39 | 0.37 |
| piqa/acc | 0.75 | 0.78 | 0.74 | 0.72 |
| piqa/acc_norm | 0.76 | 0.78 | 0.74 | 0.73 |
| record/em | 0.88 | 0.91 | 0.88 | 0.86 |
| record/f1 | 0.89 | 0.91 | 0.88 | 0.87 |
| rte/acc | 0.54 | 0.56 | 0.61 | 0.56 |
| truthfulqa_mc/mc1 | 0.20 | 0.21 | 0.22 | 0.23 |
| truthfulqa_mc/mc2 | 0.36 | 0.34 | 0.36 | 0.35 |
| wic/acc | 0.50 | 0.50 | 0.50 | 0.50 |
| winogrande/acc | 0.64 | 0.68 | 0.66 | 0.61 |
| wsc/acc | 0.37 | 0.35 | 0.40 | 0.39 |
| Average | 0.50 | 0.52 | 0.51 | 0.47 |
Source: GitHub
Furthermore, although OpenLLaMA has so far been trained on 200 billion tokens, compared with the 1 trillion tokens used for the original LLaMA and the 500 billion used for GPT-J, its creators expect performance to improve even further once training on the full 1 trillion tokens finishes.
The OpenLLaMA team has released a preview checkpoint of the weights in both EasyLM and PyTorch formats for further experimentation. OpenLLaMA’s tokenizer and weights are trained from scratch, so developers won’t need to obtain the original LLaMA tokenizer and weights. It is worth noting that OpenLLaMA uses the BOS token during training, so the BOS token should be prepended to the input for optimal performance during few-shot evaluation.
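As a minimal sketch of that BOS-prepending step (the helper and token ids below are illustrative, not taken from the OpenLLaMA codebase; LLaMA-family tokenizers conventionally assign id 1 to the BOS token):

```python
BOS_TOKEN_ID = 1  # conventional BOS id for LLaMA-family tokenizers (assumption)

def prepend_bos(token_ids, bos_id=BOS_TOKEN_ID):
    """Ensure the token sequence starts with BOS, without duplicating it."""
    if token_ids and token_ids[0] == bos_id:
        return list(token_ids)
    return [bos_id] + list(token_ids)

# Example: few-shot prompt tokens (illustrative ids) get BOS prepended once.
print(prepend_bos([523, 29871, 1234]))  # [1, 523, 29871, 1234]
print(prepend_bos([1, 523]))            # already has BOS: [1, 523]
```

Most Hugging Face tokenizers for LLaMA-style models can prepend BOS automatically, but when building evaluation inputs by hand, a guard like this keeps the behavior explicit.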
The core developers have since retrained the tokenizer based on community feedback, as the previous checkpoint release shipped with a misconfigured tokenizer that did not preserve newlines. The retrained tokenizer reduced the training loss.
Currently, the team plans to complete training on the entire RedPajama dataset to enable a direct comparison between the original LLaMA and OpenLLaMA.
We at UnidataLab carefully select models for your needs. If you have any questions about how we can leverage the newest technologies together, let’s discuss them!