{"id":13244,"date":"2023-07-11T10:00:00","date_gmt":"2023-07-11T01:00:00","guid":{"rendered":"https:\/\/www.gigas-jp.com\/appnews\/?p=13244"},"modified":"2023-07-10T19:49:21","modified_gmt":"2023-07-10T10:49:21","slug":"popular-large-language-models-of-2023","status":"publish","type":"post","link":"https:\/\/www.gigas-jp.com\/appnews\/archives\/13244","title":{"rendered":"Popular Large Language Models of 2023"},"content":{"rendered":"\n<p>Today, I would like to explore some of the popular Large Language Models (LLMs) that have gained prominence in 2023. Let\u2019s take a look.<\/p>\n\n\n\n<p><strong>1. GPT-3 and GPT-4 by OpenAI<\/strong><\/p>\n\n\n\n<p>GPT-3 is a more general-purpose model that can be used for a wide range of language-related tasks. ChatGPT is designed specifically for conversational tasks. GPT-4 is OpenAI\u2019s most advanced system, producing safer and more useful responses and can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem solving abilities.<\/p>\n\n\n\n<p><strong>2. LaMDA by Google<\/strong><\/p>\n\n\n\n<p>LaMDA is a family of Transformer-based models that is specialized for dialog. These models have up to 137B parameters and are trained on 1.56T words of public dialog data.<\/p>\n\n\n\n<p><strong>3. PaLM by Google<\/strong><\/p>\n\n\n\n<p>PaLM is a language model with 540B parameters that is capable of handling various tasks, including complex learning and reasoning. It can outperform state-of-the-art language models and humans in language and reasoning tests. The PaLM system uses a few-shot learning approach to generalize from small amounts of data, approximating how humans learn and apply knowledge to solve new problems.<\/p>\n\n\n\n<p><strong>4. 
Gopher by DeepMind<\/strong><\/p>\n\n\n\n<p>DeepMind\u2019s language model Gopher is significantly more accurate than existing large language models on tasks such as answering questions about specialized subjects like science and the humanities, and roughly equal to them on other tasks such as logical reasoning and mathematics. Gopher has 280B tunable parameters, making it larger than OpenAI\u2019s GPT-3, which has 175 billion.<\/p>\n\n\n\n<p><strong>5. Chinchilla by DeepMind<\/strong><\/p>\n\n\n\n<p>Chinchilla uses the same compute budget as Gopher but has only 70 billion parameters and four times as much training data. It outperforms models such as Gopher and GPT-3 on many downstream evaluation tasks, and it requires significantly less compute for fine-tuning and inference, greatly facilitating downstream use.<\/p>\n\n\n\n<p><strong>6. Ernie 3.0 Titan by Baidu<\/strong><\/p>\n\n\n\n<p>Ernie 3.0 Titan was released by Baidu and Peng Cheng Laboratory. It has 260B parameters and excels at natural language understanding and generation. It was trained on massive unstructured data and achieved state-of-the-art results on over 60 NLP tasks, including machine reading comprehension, text categorization, and semantic similarity. Additionally, Titan performs well on 30 few-shot and zero-shot benchmarks, showing its ability to generalize across various downstream tasks with a small quantity of labeled data.<\/p>\n\n\n\n<p><strong>7. PanGu-Alpha by Huawei<\/strong><\/p>\n\n\n\n<p>Huawei has developed a Chinese-language equivalent of OpenAI\u2019s GPT-3 called PanGu-Alpha. The model was trained on 1.1 TB of Chinese-language sources, including books, news, social media, and web pages, and contains over 200 billion parameters, about 25 billion more than GPT-3. PanGu-Alpha is highly efficient at various language tasks such as text summarization, question answering, and dialogue generation.<\/p>\n\n\n\n<p><strong>8. 
LLaMA by Meta AI<\/strong><\/p>\n\n\n\n<p>The Meta AI team introduced LLaMA (Large Language Model Meta AI), a collection of foundational language models ranging from 7B to 65B parameters. LLaMA 33B and 65B were trained on 1.4 trillion tokens, while the smallest model, LLaMA 7B, was trained on one trillion tokens. The team exclusively used publicly available datasets, without depending on proprietary or restricted data, and implemented key architectural enhancements and training-speed optimizations. As a result, LLaMA-13B outperformed GPT-3 despite being more than 10 times smaller, and LLaMA-65B is competitive with PaLM-540B.<\/p>\n\n\n\n<p><strong>9. OPT-IML by Meta AI<\/strong><\/p>\n\n\n\n<p>OPT-IML is a pre-trained language model based on Meta\u2019s OPT model and has 175 billion parameters. It is fine-tuned on roughly 2,000 natural language tasks for better performance on tasks such as question answering, text summarization, and translation.<\/p>\n\n\n\n<p>This is all for now. 
Hope you enjoyed it.<\/p>\n\n\n\n<p>By Asahi<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, I would like to explore some of the popular Large 
Language Models (LLMs) that have gained prominence in [&hellip;]<\/p>\n","protected":false},"author":20,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[96,100],"tags":[],"acf":[],"_links":{"self":[{"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/posts\/13244"}],"collection":[{"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/comments?post=13244"}],"version-history":[{"count":1,"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/posts\/13244\/revisions"}],"predecessor-version":[{"id":13245,"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/posts\/13244\/revisions\/13245"}],"wp:attachment":[{"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/media?parent=13244"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/categories?post=13244"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gigas-jp.com\/appnews\/wp-json\/wp\/v2\/tags?post=13244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}