• Published on

    AI21 has released the Jamba 1.5 family of open models, including Mini and Large versions, under the Jamba Open Model License.

    These models feature a 256K context window, the longest among open models.

    They are built on a novel combination of State Space Model (Mamba), Transformer architecture, and Mixture of Experts, resulting in excellent speed, efficiency, and quality.

    Jamba 1.5 supports multiple languages, including English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew.

    Jamba natively supports structured JSON output and function calling.

    The models are available through various platforms, including AI21 Studio, Google Cloud Vertex AI, Hugging Face, and OpenRouter. Jamba 1.5 Mini outperforms larger models like Mixtral 8x22B on benchmarks, while Jamba 1.5 Large surpasses Llama 3.1 70B and 405B.

    The models are designed with enterprise applications in mind.
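    The structured JSON output and function calling mentioned above generally follow the OpenAI-style tool schema. As a minimal sketch (the tool name and fields are hypothetical, and AI21's exact request format may differ), defining a tool and validating the model's returned arguments might look like:

```python
import json

# Hypothetical tool definition in the widely used OpenAI-style schema.
# The tool name and fields are illustrative, not AI21's exact format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def parse_tool_arguments(raw_arguments: str) -> dict:
    """Check that the model's tool-call arguments are well-formed JSON."""
    args = json.loads(raw_arguments)  # raises ValueError if not valid JSON
    required = get_weather_tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args

# A model with reliable structured output should emit arguments like:
print(parse_tool_arguments('{"city": "Tel Aviv"}'))
```

    Validating the arguments before executing the tool guards client code against occasional malformed model output.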

  • Published on

    Nous Research has unveiled Hermes 3, a new series of open language models designed for enhanced personalisation and adaptability. The models are available in 8B, 70B, and 405B parameter versions.

    Built by fine-tuning Llama 3.1, Hermes 3 offers improved reasoning capabilities and creative expression compared to previous versions. The 405B variant achieves state-of-the-art results among open models on several evaluations.

    Key features include a 128K token context window, advanced long-term memory retention, complex roleplaying abilities, and enhanced agentic function-calling. The models are trained on approximately 390 million tokens across diverse domains including general instructions, expert knowledge, mathematics, coding and tool use.

    A notable characteristic is the models' high steerability through system prompts, particularly in the 405B version which can exhibit unique behaviours like "Amnesia Mode" when given blank system prompts.

    The models are available on Hugging Face under an open license. Nous Research emphasises Hermes 3's neutral alignment and flexibility compared to more restricted commercial models.

  • Published on

    xAI has released Grok-2 and Grok-2 mini, the next generation of their large language models.

    These models are available to Grok users on the 𝕏 platform and will soon be accessible through an enterprise API.

    Grok-2 outperforms Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard. Both models show significant improvements in reasoning, reading comprehension, math, science, and coding capabilities.

    Grok-2 excels in visual tasks, achieving state-of-the-art performance in visual math reasoning and document-based question answering.

    The models are integrated with real-time information from the 𝕏 platform.

  • Published on

    Gemma 2 is a new generation of open language models available in 2B, 9B and 27B parameter sizes. The models are built on a redesigned architecture optimised for inference efficiency.

    The 27B variant can run on a single NVIDIA H100, A100 80GB GPU, or Google Cloud TPU host at full precision. The 9B model outperforms comparable models like Llama 3 8B in its size class.

    Gemma 2 is available under the commercially-friendly Gemma license.

    Technical specifications include compatibility with major AI frameworks including Hugging Face Transformers, JAX, PyTorch, and TensorFlow via Keras 3.0. The models support vLLM, Gemma.cpp, Llama.cpp and Ollama implementations, with NVIDIA TensorRT-LLM optimization.

    The models are available through Google AI Studio, Kaggle, Hugging Face, and Vertex AI.

  • Published on

    Alibaba Cloud’s QwenLM has released Qwen2, the latest iteration of its language model series.

    This release includes five models of varying sizes, from Qwen2-0.5B to Qwen2-72B, all of which have base and chat versions.

    The models have been trained on data in 27 additional languages beyond English and Chinese, significantly expanding their multilingual capabilities. Qwen2 demonstrates state-of-the-art performance in a wide range of benchmark evaluations, with notable improvements in coding and mathematics.

    The models also support extended context lengths, up to 128K tokens for the Qwen2-7B-Instruct and Qwen2-72B-Instruct models.

    The Qwen2-57B-A14B version is a mixture-of-experts model with 57B total parameters, of which 14B are active.

    The 72B versions are released under the original Qianwen License, while all other models have adopted the Apache 2.0 license.

  • Published on

    Google has released PaliGemma, an open vision-language model that combines the SigLIP vision model and Gemma language model.

    PaliGemma is designed for transfer learning to a wide range of vision-language tasks like image/video captioning, visual question answering, object detection/segmentation, and text reading.

    It is a 3B-parameter model (the SigLIP vision encoder plus Gemma 2B), supporting images at up to 896 x 896 resolution, and is capable of document understanding, image detection, visual question answering, captioning and more.

    The release includes models fine-tuned on various downstream tasks as well as code to use and fine-tune PaliGemma.

    However, PaliGemma is an experimental research model, so caution is advised when using it for applications.

  • Published on

    OpenAI has announced GPT-4o, a new multimodal AI model that can reason across text, audio, and visual inputs to generate text, audio, or image outputs in real-time.

    GPT-4o matches GPT-4’s performance on text tasks while providing major improvements in multilingual understanding, audio processing with latencies as low as 232ms, and vision capabilities – all within a single model.

    It achieves state-of-the-art results on multimodal benchmarks while being 50% cheaper than GPT-4 in the API.

    GPT-4o represents a step towards more natural human-AI interaction by seamlessly integrating multiple modalities. Initial demos showcase GPT-4o’s abilities like real-time translation, multimodal dialogue, audio generation like singing, and visual understanding.

  • Published on

    DeepSeek-V2 is a Mixture-of-Experts (MoE) language model featuring 236B total parameters with 21B active parameters per token.

    It was pretrained on 8.1 trillion tokens and supports a 128K context window. It specialises in math, code and reasoning.

    The model introduces two key architectural innovations: Multi-head Latent Attention (MLA) for key-value compression, and DeepSeekMoE for enhanced Feed-Forward Network performance.

    The model requires 8x80GB GPUs for BF16 inference and is available through Hugging Face Transformers and vLLM implementations.

    DeepSeek-V2 is released under a commercial-use friendly license.

  • Published on

    Snowflake AI Research has announced Arctic, a new open source large language model that achieves top-tier performance on enterprise tasks like coding, SQL generation, and instruction following at a very low training cost of under $2 million.

    Arctic combines a dense transformer model with a residual mixture-of-experts component to enable efficient training and inference.

    It sets a new baseline for cost-effective training of high-quality custom LLMs for enterprises.

    The model weights, code, data recipes, and research insights are being fully open sourced under an Apache 2.0 license.

    Arctic combines a 10B dense transformer model with a residual 128×3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating. It launches with a 4K context window, with 32K to come.
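    The headline figures can be sanity-checked from the component sizes with a quick back-of-the-envelope calculation (the announcement rounds the results to 480B total and 17B active):

```python
# Back-of-the-envelope check of Arctic's parameter counts: a 10B dense
# transformer plus a residual MoE MLP of 128 experts at 3.66B each, with
# top-2 gating activating two experts per token.
dense_b = 10.0
num_experts, expert_b, top_k = 128, 3.66, 2

total_b = dense_b + num_experts * expert_b   # all expert weights are stored
active_b = dense_b + top_k * expert_b        # only two experts run per token

print(f"total ~ {total_b:.1f}B, active ~ {active_b:.2f}B")
```

    This gives roughly 478B total and 17.3B active, consistent with the quoted rounded figures.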

    Arctic is available now on Hugging Face, the NVIDIA API catalog, Replicate, and other model catalogs, with support for Snowflake’s Cortex platform.

  • Published on

    Microsoft has developed a new series of open language models called Phi-3.

    The Phi-3 models were trained using a dataset of textbook-style content and synthetically generated data, rather than raw web data typically used for large language models.

    The first model being released is Phi-3-mini, which has 3.8 billion parameters. Phi-3-mini is instruction-tuned and available in two context-length variants — 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.

    The Phi-3 models are positioned for tasks like writing summaries, content generation, and answering straightforward queries.

    Other Phi-3 models include Phi-3-small (7 billion parameters) and Phi-3-medium (14 billion parameters).

    The family also includes a multimodal model, Phi-3 Vision Instruct, with a 128K context length and 4.2B parameters; it comprises an image encoder, connector, projector, and the Phi-3 Mini language model, and was trained on 500B vision and text tokens.

  • Published on

    Meta Llama 3 is an open-source large language model available in two sizes, 8B and 70B parameters. It has been fine-tuned for instruction following, making it more steerable and capable of performing complex tasks.

    The model has been trained on over 15 trillion tokens of publicly available data and has demonstrated improved performance on a wide range of industry benchmarks.

    The model's architecture has been designed with simplicity and efficiency in mind, using a standard decoder-only transformer with a tokenizer that encodes language more efficiently. The model has also been optimised for inference efficiency, making it suitable for deployment on a wide range of devices.

    To ensure responsible development and deployment of the model, Meta has adopted a system-level approach to responsibility, which includes red-teaming for safety and the development of new trust and safety tools such as Llama Guard 2 and Code Shield.

    The company plans to release additional models with new capabilities, including multimodality, longer context windows, and stronger overall capabilities, in the coming months.

    Overall, Meta Llama 3 represents a significant advancement in language model technology and has the potential to enable a wide range of applications and use cases across industries.

  • Published on

    Mistral has introduced Mixtral 8x22B, a sparse Mixture-of-Experts (SMoE) model that offers improved performance and efficiency.

    The model uses 39 billion active parameters out of 141 billion, making it a cost-efficient option.

    It is fluent in five languages: English, French, Italian, German, and Spanish. Additionally, it possesses strong mathematics and coding abilities, native function calling ability, and a 64K tokens context window.

    Mixtral 8x22B is being released under the Apache 2.0 license.

  • Published on

    Reka has introduced a multimodal language model called Reka Core that aims to compete with leading offerings from OpenAI, Anthropic and Google.

    The Core model can understand images, videos and audio in addition to text.

    It has a 128,000-token context window for improved information recall and enhanced reasoning skills for language and math tasks. Core also offers top-tier code generation for automated workflows and is fluent in English as well as several Asian and European languages.

    Deployment is via API, on-premises servers or on-device.

    Reka Core joins the company’s existing Edge and Flash models, forming a suite of AI tools for various industries.

  • Published on

    Stability AI has released Stable LM 2 12B, a pair of 12 billion parameter multilingual language models trained on 7 languages (English, Spanish, German, Italian, French, Portuguese, and Dutch).

    The release includes both a base model and an instruction-tuned variant.

    The models balance strong performance with efficiency, memory requirements, and speed.

    Stable LM 2 12B is available on Hugging Face for non-commercial and commercial use with a Stability AI membership.

    The release also includes an updated version of the smaller Stable LM 2 1.6B with improved conversational abilities across the 7 languages and added tool usage/function calling capabilities.

  • Published on

    Command R+ is a RAG-optimised model designed to tackle enterprise-grade workloads.

    It is a purpose-built LLM for real-world enterprise use cases offering advanced Retrieval Augmented Generation (RAG) with citation to reduce hallucinations, multilingual coverage in 10 key languages, and tool use to automate sophisticated business processes.

    It outperforms similar models and is competitive with significantly more expensive models on key business-critical capabilities.

    Command R+ is available on Azure, Oracle Cloud Infrastructure, and Cohere’s hosted API, with plans to expand to additional cloud platforms.

    Cohere remains committed to data privacy and security, offering private LLM deployments and the option to opt out of data sharing.

  • Published on

    Grok-1.5, the latest model from xAI, boasts enhanced reasoning capabilities and a context length of up to 128,000 tokens.

    The model achieved impressive scores on benchmarks like MATH (50.6%), GSM8K (90%), and HumanEval (74.1%), showcasing advancements in math, coding, and problem-solving skills.

    Grok-1.5 also demonstrates powerful retrieval abilities, handling longer and more complex prompts thanks to its expanded context window.

    Built on a custom distributed training framework, Grok-1.5 represents the team’s progress in large language model research.

  • Published on

    Databricks has unveiled DBRX, a new open-source LLM.

    DBRX outperforms established models like GPT-3.5 and open models like Grok-1 on a range of benchmarks, including language understanding, programming, and math tasks.

    The model uses a fine-grained mixture-of-experts architecture, which Databricks claims provides efficiency improvements in both training and inference.

    DBRX Instruct even surpasses specialised coding models like CodeLLaMA-70B on programming tasks.

  • Published on

    Apple’s MM1 is a multimodal large language model (MLLM) that can interpret both images and text data, developed by a team of computer scientists and engineers at Apple.

    The model is part of a family of multimodal models and is designed to improve capabilities in image captioning, visual question answering, and query learning by integrating text and image data. The largest model in MM1 is 30B and beats many 80B open-source LLMs in visual tasks. The family of multimodal models consists of both dense models and mixture-of-experts (MoE) variants.

    The MM1 model can count objects, identify objects that are part of an image, and use common sense about everyday objects to offer users useful information about what the image presents.

    It also has the ability to perform in-context learning, which means it does not need to start over every time a question is asked; it uses what it has learned in the current conversation.

  • Published on

    Cohere has launched Command-R, a new LLM designed for large-scale production workloads.

    The “scalable” model balances efficiency with accuracy, enabling companies to move from pilot projects to full-scale production.

    Command-R is optimised for tasks such as retrieval-augmented generation (RAG) and using external APIs and tools, and is designed to work with Cohere’s existing Embed and Rerank models to offer the best integration for RAG applications.

    It boasts strong accuracy for RAG and tool use, low latency and high throughput, a 128k context, and lower pricing.

    It is available immediately on Cohere’s API and will be available on major cloud providers soon.

  • Published on

    Inflection has released its latest version, Inflection-2.5, which it claims is as good as, if not better than, the world’s leading LLMs, such as GPT-4 and Gemini, but using only 40% of the usual computational power for training.

    The company, which aims to create a personal AI for every user, has already seen a significant uptick in user sentiment, engagement and retention since rolling out the upgrade to its one million daily and six million monthly active users.

    An average conversation with Pi, Inflection’s AI chatbot, lasts 33 minutes, with return usage from 60% of customers each week.

  • Published on

    Anthropic has released the Claude 3 model family, featuring three new models – Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus – that are more powerful and responsive than previous versions.

    The new models are available through the company’s API or platform, Claude.ai.

    Each model offers different balances of speed, cost and intelligence to suit different applications, with the company claiming that its new offering outperforms competitors on a range of cognitive tasks, including knowledge, reasoning, mathematics and forecasting.

    The models can power live customer chats, auto-completions, and data extraction tasks where responses must be immediate and in real-time.

    The company has also improved the models’ ability to understand longer context and visual materials such as images, charts and technical diagrams, as well as reducing refusal rates and inaccuracies.

    The Claude 3 family of models will initially offer a 200K context window.

  • Published on

    Mistral AI has released Mistral Large, a new advanced language model available through la Plateforme and Azure.

    The model offers top-tier reasoning capabilities, native fluency in five languages, a 32K tokens context window, and function calling capabilities.

    Mistral Large is the world’s second-ranked model generally available through an API.

    The company has also released Mistral Small, an optimised model for low latency workloads.

    Both models are available with open-weight and optimised model endpoints.

    JSON format mode and function calling have been introduced for easier interaction with the models.
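    In practice, function calling means the model emits a tool name plus JSON-encoded arguments, which client code routes to a local function. A minimal sketch (the tool and the name/arguments shape follow the common OpenAI-style convention; exact field names on la Plateforme may differ):

```python
import json

# Hypothetical local tool the model is allowed to call; the name and
# signature are illustrative, not part of Mistral's API.
def get_time(city: str) -> str:
    return f"12:00 in {city}"  # stub implementation

TOOLS = {"get_time": get_time}

def dispatch_tool_call(tool_call: dict) -> str:
    """Route a model-issued tool call to the matching local function."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON string
    return fn(**args)

print(dispatch_tool_call({"name": "get_time", "arguments": '{"city": "Paris"}'}))
```

    The tool's result is then sent back to the model as a new message so it can compose its final answer.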

  • Published on

    Google has released Gemma, a lightweight family of open models for AI development, built using the same technology as the company’s Gemini models.

    Gemma is available in two sizes, Gemma 2B and Gemma 7B, both of which have been pre-trained and instruction-tuned for specific tasks.

    Google has also released a responsible generative AI toolkit to provide guidance on building safe applications with Gemma. It includes a model debugging tool, best practices for developers and a methodology for building robust safety classifiers with minimal examples.

  • Published on

    Google has released its next-generation AI model, Gemini 1.5, which offers enhanced performance and a breakthrough in long-context understanding.

    The model can process up to 1 million tokens, the longest context window of any large-scale foundation model to date.

    The first 1.5 model, 1.5 Pro, achieves comparable quality to 1.0 Ultra while using less compute.

    A limited preview of the model with the long context window is available to select developers and enterprise customers in AI Studio and Vertex AI.

    The model offers a range of potential new capabilities, including seamless analysis of large amounts of content, sophisticated understanding and reasoning across different modalities and relevant problem-solving over longer blocks of code.

  • Published on

    Qwen has released the next iteration of its open-source large language model series, Qwen1.5. The update includes base and chat models in six sizes from 0.5 billion to 72 billion parameters.

    Key improvements include enhanced alignment with human preferences, stronger multilingual capabilities, long context support up to 32,768 tokens, and better performance on retrieval-augmented generation and tool use tasks compared to previous models. However, coding abilities still trail GPT-4.

    Qwen1.5 is available on platforms like Hugging Face, Ollama and DashScope’s API services.

  • Published on

    The Allen Institute for AI (AI2) has released its first set of Open Language Models (OLMo), which includes 7-billion-parameter models and a 1-billion-parameter variant.

    The institute aims to promote openness in the development and use of AI, allowing researchers to access and build upon the OLMo models, datasets, and tools.

    The models are available for download and fine-tuning, and the dataset and tools used to create the models are also openly available.

    Additionally, a license for the models and dataset allows researchers to contribute to the development of the OLMo family.

    The release of OLMo is seen as a step towards making AI more transparent and open, allowing for a broader discussion on the potential risks and benefits of language models.

  • Published on

    Adept has released its new multimodal model, Fuyu-Heavy, which has been designed specifically for digital agents.

    The model is the third-most capable of its kind in the world, outranked only by GPT-4V and Gemini Ultra, both of which are 10 to 20 times larger.

    Fuyu-Heavy excels at multimodal reasoning and UI understanding, and scores higher on the MMMU benchmark than even Gemini Pro.

    The model matches or exceeds the performance of those in its compute class on text-based benchmarks, despite having to devote part of its capacity to image modelling.

  • Published on

    OpenChat-3.5-1210 is the latest LLM model from OpenChat, built to excel at coding and enhance performance over previous versions.

    It has achieved a near 15-point increase on HumanEval, making it one of the best generalist models currently available.

    OpenChat-3.5-1210 is specialised for coding and coding-related tasks, such as code generation, understanding, and debugging. It surpasses ChatGPT and Grok models in terms of performance.

    This upgrade demonstrates OpenChat’s ongoing commitment to developing cutting-edge large language models for specific applications, specialising in domains that are vital for business and everyday life, such as coding.

  • Published on

    Microsoft has released Phi-2, a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities.

    Phi-2 matches or outperforms models up to 25x larger thanks to innovations in model scaling and training data curation.

    On complex benchmarks, Phi-2 achieves state-of-the-art performance among base models with less than 13 billion parameters.

    On average, Phi-2 outperforms the 7 billion parameter model Mistral and the 13 billion parameter model Llama-2, whilst being a fraction of their size.

    Furthermore, Phi-2 outperforms Google’s newly announced Gemini Nano 2 on several benchmarks.

    Licensed under the MIT open source license.

  • Published on

    Mixtral 8x7B has been launched by Mistral AI, a company that focuses on creating and providing access to open AI models to benefit the developer community.

    This is a sparse mixture-of-experts model that outperforms competitors on benchmarks, offering 6x faster inference and superior cost-performance compared to alternatives, including GPT-3.5.

    It can handle contexts of up to 32k tokens and has been trained on web data to recognise and process English, French, Italian, German and Spanish to generate human-like responses and code.

    Two versions are available: Mixtral 8x7B, which offers a range of capabilities, and Mixtral 8x7B Instruct, which has been fine-tuned to follow instructions and achieves a score of 8.3 on the MT-Bench benchmark.

    Licensed under Apache 2.0.

  • Published on

    Stability AI has released StableLM Zephyr 3B, the latest in its series of lightweight language models designed for use on edge devices.

    Featuring 3 billion parameters, StableLM Zephyr 3B has been trained on multiple instruction datasets and fine-tuned using the Direct Preference Optimisation algorithm to perform well in text generation and align with human preferences.

    Stability AI says the model is particularly good at instruction following and answering questions, but is also capable of more creative tasks like copywriting and content personalisation.

    The model is released under a license that permits non-commercial use only.

  • Published on

    Google has released its largest and most capable AI model to date, called Gemini.

    The model has been trained at scale using the company’s AI-optimised infrastructure and its latest Tensor Processing Units (TPUs).

    It has been designed to be the most reliable and scalable model for training, as well as the most efficient to serve.

    The first version of Gemini, Gemini 1.0, has been optimised for three different sizes: Gemini Ultra, which is the company’s largest and most capable model; Gemini Pro, which is its best model for scaling across a wide range of tasks; and Gemini Nano, its most efficient model for on-device tasks.

    The Ultra model has been found to outperform human experts on a range of tasks, including complex maths problems.

  • Published on

    Perplexity AI is introducing two new large language models, pplx-7b-online and pplx-70b-online, the first of their kind: online LLMs available through an API.

    These models are capable of using knowledge from the internet to provide up-to-date information in their responses. To achieve this, Perplexity has developed in-house search technology that prioritises non-SEOed sites.

    The models are finetuned to effectively use snippets to inform their responses and are accessible via the pplx-api and Perplexity Labs.

    Evaluations conducted using curated evaluation datasets have demonstrated that Perplexity’s models can match or surpass the performance of other large language models in providing accurate and up-to-date answers.

    The pplx-api is now available to the general public, following its release from beta. A new usage-based pricing structure has also been introduced.

  • Published on

    Language model Claude 2.1 has been released with a range of new features to boost its capabilities for enterprises, including support for a context window of up to 200,000 tokens, allowing around 150,000 words of text to be analysed; a 2x decrease in hallucination rates; and a beta tool use feature that enables it to integrate with users' existing systems.

    The new model has been designed to provide more accurate and reliable responses than previous versions, including a 30% reduction in incorrect answers and up to a fourfold decrease in the rate of mistakes in concluding whether a document supports a given statement.

    The model is now available in API form for console users.

  • Published on

    xAI has launched an AI-assistant tool called Grok, which is modelled on the fictional Hitchhiker’s Guide to the Galaxy.

    The technology uses real-time knowledge and is designed to answer questions with a bit of wit, as well as being able to answer “spicy questions” that are rejected by most other AI systems, according to the company.

    Grok-1 displayed strong results, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1.

  • Published on

    OpenChat is an open-source language model with 7 billion parameters, based on the transformer architecture.

    It has been trained using a method called C-RLFT, which uses reinforcement learning to fine-tune the model on a mixed-quality dataset of instruction-oriented data.

    The developers claim that OpenChat outperforms ChatGPT, even though it has fewer parameters, demonstrating the effectiveness of the training approach.

    The model can be accessed through an OpenAI-compatible API and is designed to handle high-throughput traffic for deployment on consumer GPUs.

    The developers have made the model and training code publicly available, and it is licensed under the Apache 2.0 license, meaning that commercial use is permitted.
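    An OpenAI-compatible API means standard chat-completions requests work against the model's endpoint. A minimal sketch using only the standard library (the host, port, and model identifier are assumptions for a local deployment, not official values):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:18888/v1") -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    body = json.dumps({
        "model": "openchat_3.5",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Summarise open-source LLM licensing in one sentence.")
# urllib.request.urlopen(req) would send it; here the request is only built.
print(req.full_url)
```

    Because the wire format matches OpenAI's, existing client libraries can usually be pointed at such a server by changing only the base URL.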

  • Published on

    Adept AI has released Fuyu-8B, a smaller version of its multimodal model that powers the company’s product.

    According to the company, the model is exciting because it is designed from the ground up for digital agents and is easy to understand, scale and deploy, supporting arbitrary image resolutions and doing fine-grained localisation on screen images.

    In addition, the model performs well on standard image understanding benchmarks.

    The company warns that faces and people are generally not generated properly and that the model should not be used to generate factual representations of people or events.

  • Published on

    OpenHermes 2 Mistral 7B is a language model fine-tuned on Mistral 7B, trained to navigate complex conversations with finesse.

    It uses the ChatML format, which allows for multi-turn conversations with structured system prompts, enabling OpenAI endpoint compatibility.
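    The ChatML format wraps each turn in special delimiter tokens. A minimal sketch of the rendering (the system prompt is illustrative; real deployments should use the model's own chat template):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a multi-turn conversation as a ChatML prompt string."""
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    # Leave an assistant turn open so the model generates the reply from here.
    return rendered + "<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are Hermes, a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

    Because each role is explicitly delimited, structured system prompts and multi-turn histories map cleanly onto OpenAI-style message lists.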

    OpenHermes 2 was trained on 900,000 entries of primarily GPT-4-generated data and outperformed previous models on benchmark tests, including GPT4All, AGIEval, and BigBench.

    The model is available on the AI2 platform and users can access it through the LM Studio interface for interactive use.

  • Published on

    Mistral AI has released Mistral 7B, a 7.3 billion parameter language model, which it claims outperforms Llama 2 13B on all benchmarks and Llama 1 34B on many.

    The model uses Grouped-query attention for faster inference and Sliding Window Attention to handle longer sequences.

    It is being released under the Apache 2.0 licence, with the company’s reference implementation and deployment options on various clouds.

    Mistral 7B has been fine-tuned for chat, achieving better performance than Llama 2 13B on MT-Bench, a benchmark for evaluating multi-turn conversational ability.

    The company said it was looking forward to working with the community on developing moderation for the model to allow it to be used in environments that require guardrails for outputs.

  • Published on

    Meta has introduced Llama 2, the next generation of its open-source large language model, in partnership with Microsoft.

    Llama 2 is free for research and commercial use, available through various cloud providers including Microsoft Azure and Amazon Web Services.

    The model comes in multiple versions, including pretrained and fine-tuned conversational variants.

    It supports multiple languages and focuses on responsible AI development, offering resources like a Responsible Use Guide and Acceptable Use Policy.

    Meta emphasises transparency and safety, having conducted red-teaming exercises and created a transparency schematic. Pricing details are not provided in the article.

  • Published on

    Claude 2, the upgraded version of the conversational AI model, is now available to the public in the US and UK.

    With improvements in performance, reasoning and coding abilities, the AI model can now assist users in tasks like writing documents and solving mathematical problems.

    The model can also power chat experiences and has been made available to businesses through the Claude API, which is being offered at the same price as its predecessor.

    Additionally, the model can handle inputs of up to 100,000 tokens, allowing it to work through hundreds of pages of text.

  • Published on

    Inflection AI has launched Pi, a digital assistant designed to be kind, supportive and curious.

    Pi is intended to be a confidante, creative partner and sounding board, as well as a source of knowledge based on each user’s interests.

    Built using the company’s own technology, Pi is intended to be a companion that puts human relationships first.

    It is available on several platforms including Instagram, Facebook Messenger and WhatsApp, and can continue the conversation wherever the user goes. It is free to use and can be downloaded from the Apple App Store.

  • Published on

    LLaVA (Large Language-and-Vision Assistant) is an end-to-end trained large multimodal model that combines a vision encoder and the Vicuna large language model to enable general-purpose visual and language understanding.

    With 158,000 unique language-image instruction-following samples, LLaVA achieves impressive chat capabilities that sometimes mimic the behaviours of multimodal GPT-4 on unseen images and instructions.

    LLaVA utilises a two-stage instruction tuning procedure to align features and fine-tune the model end-to-end for visual chat and science question answering applications. Early experiments show LLaVA yields an 85.1% relative score compared to GPT-4 on a synthetic multimodal instruction-following dataset.

    The developers have open-sourced the GPT-4 generated visual instruction tuning data, model and code base to support further research. LLaVA demonstrates the potential for large multimodal models to enable powerful visual-language understanding and reasoning capabilities.

  • Published on

    Nomic AI has developed GPT4All, an open-source ecosystem of large language models that can run on most GPUs and CPUs, including consumer-grade hardware.

    The ecosystem supports various bindings such as Python, TypeScript, Go, C# and Java, and an API for inferencing LLMs from Docker containers.

    Users can access the models via a desktop chat client that supports a range of open-source models, with offline build functions available for older versions.

    The ecosystem is supported by compute partner Paperspace. Nomic AI encourages external contributions through its Discord channel and issue tracker.

  • Published on

    Anthropic has announced the launch of two new products built on its research into training helpful, harmless and honest AI systems: Claude and Claude Instant.

    Based on natural language processing, the products are aimed at improving productivity in the workplace and education settings by summarising information, generating creative writing, aiding coding and search functions, and offering Q&A capabilities.

    Early adopters include tutoring service Juni Learning, which uses Anthropic to power its Juni Tutor Bot; productivity tool Notion; legal contract specialists Robin AI; and search engine DuckDuckGo.

    The company is also working with audio AI company AssemblyAI.