Popular Large Language Models of 2023
Today, I would like to explore some of the popular Large Language Models (LLMs) that have gained prominence in 2023. Let’s take a look.
1. GPT-3 and GPT-4 by OpenAI
GPT-3 is a general-purpose model that can be used for a wide range of language-related tasks, while ChatGPT is tuned specifically for conversational use. GPT-4 is OpenAI’s most advanced system: it produces safer and more useful responses and can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities.
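Both models are exposed through the same OpenAI API. Here is a minimal sketch of a chat call, assuming the openai Python package (the pre-1.0 interface current as of this writing) and an API key with GPT-4 access; the key value is a placeholder.

```python
# Minimal sketch of calling GPT-4 via the OpenAI chat completions API.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

response = openai.ChatCompletion.create(
    model="gpt-4",  # or "gpt-3.5-turbo" for the ChatGPT model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers in one sentence."},
    ],
)
print(response["choices"][0]["message"]["content"])
```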
2. LaMDA by Google
LaMDA is a family of Transformer-based models specialized for dialog. These models have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text.
3. PaLM by Google
PaLM is a language model with 540B parameters that can handle a variety of tasks, including complex learning and reasoning. It outperforms prior state-of-the-art language models on many language and reasoning tests. PaLM uses a few-shot learning approach to generalize from small amounts of data, approximating how humans learn and apply knowledge to solve new problems; the sketch below illustrates the idea.
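Few-shot prompting means showing the model a handful of worked examples in the prompt itself, then asking it to continue the pattern. The task and examples below are made up purely for illustration:

```python
# Illustrative sketch of building a few-shot prompt for sentiment
# classification. The resulting string is sent to the model as-is.
examples = [
    ("I loved this film!", "positive"),
    ("The plot was dull and predictable.", "negative"),
]
query = "The acting was superb."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # fed to the model, it should complete with "positive"
```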
4. Gopher by DeepMind
DeepMind’s language model Gopher is significantly more accurate than existing large language models on tasks like answering questions about specialized subjects such as science and the humanities, and roughly equal to them on other tasks like logical reasoning and mathematics. Gopher has 280B parameters, making it larger than OpenAI’s GPT-3, which has 175 billion.
5. Chinchilla by DeepMind
Chinchilla uses the same compute budget as Gopher but has only 70 billion parameters and is trained on four times more data. It outperforms models like Gopher and GPT-3 on many downstream evaluation tasks, and because it is smaller, it uses significantly less compute for fine-tuning and inference, greatly facilitating downstream usage.
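The key finding behind Chinchilla is that, for a fixed compute budget, model size and training data should be scaled together; a commonly cited rule of thumb from the paper is roughly 20 training tokens per parameter. A back-of-the-envelope sketch (the 6·N·D FLOPs estimate is a standard approximation, not a figure from this post):

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    # Rule of thumb from the Chinchilla paper: compute-optimal training
    # uses roughly 20 tokens per model parameter.
    return 20 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    # Standard approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

params = 70e9  # Chinchilla's 70B parameters
tokens = chinchilla_optimal_tokens(params)
print(f"Optimal tokens: {tokens:.2e}")        # ~1.4e12, i.e. 1.4T tokens
print(f"Training FLOPs: {training_flops(params, tokens):.2e}")
```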
6. Ernie 3.0 Titan by Baidu
Ernie 3.0 Titan was released by Baidu and Peng Cheng Laboratory. It has 260B parameters and excels at natural language understanding and generation. It was trained on massive unstructured data and achieved state-of-the-art results on over 60 NLP tasks, including machine reading comprehension, text categorization, and semantic similarity. Additionally, Titan performs well on 30 few-shot and zero-shot benchmarks, showing its ability to generalize to various downstream tasks with a small quantity of labeled data.
7. PanGu-Alpha by Huawei
Huawei has developed a Chinese-language equivalent of OpenAI’s GPT-3 called PanGu-Alpha. The model was trained on 1.1 TB of Chinese-language sources, including books, news, social media, and web pages, and contains over 200 billion parameters, 25 billion more than GPT-3. PanGu-Alpha is highly efficient at completing various language tasks like text summarization, question answering, and dialogue generation.
8. LLaMA by Meta AI
The Meta AI team introduced LLaMA (Large Language Model Meta AI), a collection of foundational language models with 7B to 65B parameters. LLaMA 33B and 65B were trained on 1.4 trillion tokens, while the smallest model, LLaMA 7B, was trained on one trillion tokens. The team exclusively used publicly available datasets, without depending on proprietary or restricted data, and implemented key architectural enhancements and training speed optimizations. Consequently, LLaMA-13B outperformed GPT-3 despite being more than 10 times smaller, and LLaMA-65B was competitive with PaLM-540B.
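Because the weights were released to researchers, LLaMA can be run locally. A minimal sketch using Hugging Face transformers, assuming you have obtained the weights and converted them to the Hugging Face format (the local path below is a placeholder):

```python
# Sketch: generating text from a locally converted LLaMA checkpoint.
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "./llama-7b-hf"  # placeholder path to converted weights
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```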
9. OPT-IML by Meta AI
OPT-IML is an instruction-tuned language model based on Meta’s OPT model, with 175 billion parameters. It is fine-tuned on roughly 2,000 natural language tasks to improve performance on tasks such as question answering, text summarization, and translation.
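Smaller OPT-IML checkpoints are publicly available, which makes the model easy to try. A minimal sketch, assuming the facebook/opt-iml-max-1.3b checkpoint on the Hugging Face Hub:

```python
# Sketch: running a small public OPT-IML checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-iml-max-1.3b"  # assumed Hub checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Summarize: Large language models are trained on vast text corpora."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```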
That’s all for now. I hope you enjoyed it.
By Asahi
waithaw at 2023-07-11 10:00:00