• Published on

    StackBlitz unveiled bolt.new, a chatbot interface that can help create full-stack applications directly in your web browser.

    Unlike other AI coding assistants, bolt.new integrates with package managers and development environments, allowing users to install and run npm packages, Vite, Next.js and other popular tools without leaving the browser.

    The platform uses WebContainers, StackBlitz's WebAssembly-based technology that runs Node.js in the browser, enabling developers to prompt, edit and debug full-stack applications in real time.

    Key features include direct integration with deployment services like Netlify and Cloudflare, database connectivity through Supabase, and the ability to share projects via URL. The system can generate production-ready applications with both frontend and backend components from natural language prompts.

    Its core components have been released as open-source software on GitHub.

  • Published on

    Pika Labs has released Pika 1.5, which introduces "Pikaffects", a suite of six transformation effects: Inflate, Explode, Crush, Melt, Squish and "Cake-ify".

    Key technical improvements include enhanced physics simulations, longer video clip generation capabilities, and improved realism in character animations and movements.

    The platform now offers advanced cinematic camera controls like Bullet Time, Crash Zoom, Whip Pan and Crane shots.

  • Published on

    Yi-Coder, a new open-source code LLM series, has been released by 01.ai.

    Available in 1.5B and 9B parameter versions, Yi-Coder offers base and chat models for efficient inference and flexible training.

    The 9B version, built on Yi-9B, incorporates an additional 2.4T high-quality tokens from GitHub and filtered CommonCrawl data. Key features include continued pretraining on 52 major programming languages, a 128K token context window, and impressive performance metrics.

    Yi-Coder-9B-Chat achieved a 23.4% pass rate on LiveCodeBench, outperforming larger models.

    It also excelled in code editing, completion, and mathematical reasoning tasks, demonstrating capabilities comparable to or surpassing models with significantly more parameters.

  • Published on

    AI21 has released the Jamba 1.5 family of open models, including Mini and Large versions, under the Jamba Open Model License.

    These models feature a 256K context window, the longest among open models.

    They are built on a novel combination of State Space Model (Mamba), Transformer architecture, and Mixture of Experts, resulting in excellent speed, efficiency, and quality.

    Jamba 1.5 supports multiple languages, including English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew.

    Jamba natively supports structured JSON output and function calling.
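    Structured JSON output of this kind generally means the model's reply is guaranteed to parse and to match a declared schema. A minimal sketch of the pattern (the schema shape and reply below are illustrative assumptions, not Jamba's actual wire format):

```python
import json

# Hypothetical function/tool schema of the style structured-output models
# accept; the exact format Jamba expects is an assumption here.
tool = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# In JSON mode, the model's reply should always parse cleanly and name
# the tool plus its arguments:
reply = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'
call = json.loads(reply)
assert call["tool"] == tool["name"]
assert call["arguments"]["city"] == "Paris"
```

    The caller can then dispatch `call["tool"]` to real code, which is what makes the feature useful for business-process automation.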

    The models are available through various platforms, including AI21 Studio, Google Cloud Vertex AI, Hugging Face and OpenRouter. Jamba 1.5 Mini outperforms larger models like Mixtral 8x22B on benchmarks, while Jamba 1.5 Large surpasses Llama 3.1 70B and 405B.

    The models are designed with enterprise applications in mind.

  • Published on

    Ideogram has launched Ideogram 2.0, now freely available on ideogram.ai and their new iOS app, with premium features accessible via subscription plans.

    The beta Ideogram API is also released for developers. Ideogram 2.0, trained from scratch, excels in generating realistic images, graphic design, typography, and more, outperforming other text-to-image models in image-text alignment, subjective preference, and text rendering accuracy.

    The launch includes the Ideogram iOS app, Ideogram Search, and the Ideogram API.

    Users can choose from styles like Realistic, Design, 3D, and Anime, and control colour palettes.

  • Published on

    Nous Research has unveiled Hermes 3, a new series of open language models designed for enhanced personalisation and adaptability. The models are available in 8B, 70B, and 405B parameter versions.

    Built by fine-tuning Llama 3.1, Hermes 3 offers improved reasoning capabilities and creative expression compared to previous versions. The 405B variant achieves state-of-the-art results among open models on several evaluations.

    Key features include a 128K token context window, advanced long-term memory retention, complex roleplaying abilities, and enhanced agentic function-calling. The models are trained on approximately 390 million tokens across diverse domains including general instructions, expert knowledge, mathematics, coding and tool use.

    A notable characteristic is the models' high steerability through system prompts, particularly in the 405B version which can exhibit unique behaviours like "Amnesia Mode" when given blank system prompts.

    The models are available on Hugging Face under an open license. Nous Research emphasises Hermes 3's neutral alignment and flexibility compared to more restricted commercial models.

  • Published on

    xAI has released Grok-2 and Grok-2 mini, the next generation of their large language models.

    These models are available to Grok users on the 𝕏 platform and will soon be accessible through an enterprise API.

    Grok-2 outperforms Claude 3.5 Sonnet and GPT-4-Turbo on the LMSYS leaderboard. Both models show significant improvements in reasoning, reading comprehension, math, science, and coding capabilities.

    Grok-2 excels in visual tasks, achieving state-of-the-art performance in visual math reasoning and document-based question answering.

    The models are integrated with real-time information from the 𝕏 platform.

  • Published on

    Gemma 2 is a new generation of open language models available in 2B, 9B and 27B parameter sizes. The models are built on a redesigned architecture optimised for inference efficiency.

    The 27B variant can run on a single NVIDIA H100, A100 80GB GPU, or Google Cloud TPU host at full precision. The 9B model outperforms comparable models like Llama 3 8B in its size class.

    Gemma 2 is available under the commercially-friendly Gemma license.

    Technical specifications include compatibility with major AI frameworks including Hugging Face Transformers, JAX, PyTorch, and TensorFlow via Keras 3.0. The models support vLLM, Gemma.cpp, Llama.cpp and Ollama implementations, with NVIDIA TensorRT-LLM optimization.

    The models are available through Google AI Studio, Kaggle, Hugging Face, and Vertex AI.

  • Published on

    Stability AI has announced the release of Stable Diffusion 3 Medium, a new version of its popular AI model for generating images. SD3 Medium is a 2 billion parameter model.

    This release includes significant performance enhancements achieved through collaborations with NVIDIA and AMD. NVIDIA’s TensorRT optimisation for RTX GPUs boosts performance by 50%, while AMD has optimised inference for various devices.

    Stability AI has introduced new licensing options, including the Creator License for commercial use and an Enterprise License for large-scale commercial applications.

    The model is available for download or via API.

  • Published on

    Alibaba Cloud’s QwenLM has released Qwen2, the latest iteration of its language model series.

    This release includes five models of varying sizes, from Qwen2-0.5B to Qwen2-72B, all of which have base and chat versions.

    The models have been trained on data in 27 additional languages beyond English and Chinese, significantly expanding their multilingual capabilities. Qwen2 demonstrates state-of-the-art performance in a wide range of benchmark evaluations, with notable improvements in coding and mathematics.

    The models also support extended context lengths, up to 128K tokens for the Qwen2-7B-Instruct and Qwen2-72B-Instruct models.

    The Qwen2-57B-A14B version is a mixture-of-experts model with 14B active parameters.

    The 72B versions are released under the original Qianwen License, while all other models have adopted the Apache 2.0 license.

  • Published on

    Kling, developed by the Kuaishou AI Team, is an AI video generation model capable of creating high-quality videos up to two minutes long at 30fps.

    Kling utilises a 3D spatio-temporal joint attention mechanism to model complex motion while maintaining physical accuracy. The model can generate 1080p videos with variable aspect ratios, and can transform static images into five-second animated sequences.

    The system allows users to control video generation through text prompts and can automatically extend existing videos by an additional 4.5 seconds. Furthermore, Kling supports consecutive video extensions, enabling the creation of videos up to 3 minutes in length.

    The model is available through Kuaishou's platform and Fal.

  • Published on

    Stability AI has introduced Stable Audio Open, an open-source text-to-audio model that generates up to 47 seconds of audio samples, sound effects, and production elements.

    The model enables users to create drum beats, instrument riffs, ambient sounds, and foley recordings using text prompts. It also allows for audio variations and style transfer of audio samples.

    Stable Audio Open is a more specialised model compared to Stability AI’s commercial product, which can produce full tracks up to three minutes long.

    This open-source model is trained on audio data from Freesound and the Free Music Archive, respecting creator rights.

    The model weights are available on Hugging Face, and users can download and explore its capabilities.

  • Published on

    Mistral AI has released Codestral, a 22B parameter code model designed for code generation.

    It supports over 80 programming languages, including Python, Java, C, C++, JavaScript, and Bash.

    Codestral features a 32k token context window, enabling it to outperform competitors on the RepoBench code completion evaluation.

    It achieves notable results on benchmarks like HumanEval, MBPP, CruxEval, and Spider, demonstrating proficiency in Python and SQL. Codestral also excels in completing partial code segments across various languages.

    It is available under the Mistral AI Non-Production License for research and testing, with commercial licenses available upon request.

  • Published on

    Aider, an AI-powered code assistant, achieved a state-of-the-art result of 26.3% on the SWE Bench Lite benchmark, surpassing the previous top leaderboard entry of 20.3% from Amazon Q Developer Agent.

    Aider’s success is attributed to its focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.

    The AI does not use RAG, vector search, tools, or give the LLM access to search the web or unilaterally execute code.

    It emphasises being an interactive tool for engineers to get real work done in real code bases using a chat interface.

    The benchmark methodology involved running aider in each problem’s git repository, with the problem statement submitted as the opening chat message. Aider scored 25.0% using GPT-4o alone, which itself matched the previous state of the art before being surpassed by the 26.3% result using both GPT-4o and Opus.

  • Published on

    Google has released PaliGemma, an open vision-language model that combines the SigLIP vision model and Gemma language model.

    PaliGemma is designed for transfer learning to a wide range of vision-language tasks like image/video captioning, visual question answering, object detection/segmentation, and text reading.

    It is a 3B-parameter model (SigLIP plus Gemma 2B) supporting images up to 896 × 896 resolution, and is capable of document understanding, object detection, visual question answering, captioning and more.

    The release includes models fine-tuned on various downstream tasks as well as code to use and fine-tune PaliGemma.

    However, PaliGemma is an experimental research model, so caution is advised when using it for applications.

  • Published on

    OpenAI has announced GPT-4o, a new multimodal AI model that can reason across text, audio, and visual inputs to generate text, audio, or image outputs in real-time.

    GPT-4o matches GPT-4’s performance on text tasks while providing major improvements in multilingual understanding, audio processing with latencies as low as 232ms, and vision capabilities – all within a single model.

    It achieves state-of-the-art results on multimodal benchmarks while being 50% cheaper than GPT-4 in the API.

    GPT-4o represents a step towards more natural human-AI interaction by seamlessly integrating multiple modalities. Initial demos showcase GPT-4o’s abilities like real-time translation, multimodal dialogue, audio generation like singing, and visual understanding.

  • Published on

    IBM Research has released the Granite family of open source decoder-only code models ranging from 3 to 34 billion parameters.

    The models are available in base and instruction-tuned variants optimised for enterprise software development workflows like code generation, fixing, explanation, and modernisation tasks.

    Benchmarks show the Granite models match state-of-the-art performance among open source code LLMs across multiple coding tasks and programming languages.

    The models leverage training data from sources like GitHub, CodeNet, and synthetic code-instruction pairs.

    IBM is open sourcing the Granite code models under Apache 2.0 to enable open innovation and provide high-performing, cost-efficient foundation models for enterprises to build generative AI tools for developers.

  • Published on

    DeepSeek-V2 is a Mixture-of-Experts (MoE) language model featuring 236B total parameters with 21B active parameters per token.

    It was pretrained on 8.1 trillion tokens and supports a 128K context window. It specialises in math, code and reasoning.

    The model introduces two key architectural innovations: Multi-head Latent Attention (MLA) for key-value compression, and DeepSeekMoE for enhanced Feed-Forward Network performance.
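    At long context lengths the attention key-value cache becomes the dominant memory cost, which is what MLA's key-value compression targets. A back-of-the-envelope estimate of that cost, using illustrative (assumed) dimensions rather than DeepSeek-V2's published configuration:

```python
# Rough KV-cache size for one 128K-token sequence. The layer/head
# dimensions below are illustrative assumptions.
layers, kv_heads, head_dim = 60, 8, 128
seq_len, bytes_per_scalar = 128_000, 2          # BF16

# 2x for keys and values, cached at every layer
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_scalar
kv_gib = kv_bytes / 2**30
print(f"~{kv_gib:.1f} GiB per sequence")   # shrinking this is MLA's goal
```

    Even at these modest assumed dimensions the cache runs to tens of GiB per sequence, which is why compressing it pays off directly in serving cost.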

    The model requires 8x80GB GPUs for BF16 inference and is available through Hugging Face Transformers and vLLM implementations.

    DeepSeek-V2 is released under a commercial-use friendly license.

  • Published on

    Snowflake AI Research has announced Arctic, a new open source large language model that achieves top-tier performance on enterprise tasks like coding, SQL generation, and instruction following at a very low training cost of under $2 million.

    Arctic combines a dense transformer model with a residual mixture-of-experts component to enable efficient training and inference.

    It sets a new baseline for cost-effective training of high-quality custom LLMs for enterprises.

    The model weights, code, data recipes, and research insights are being fully open sourced under an Apache 2.0 license.

    Arctic combines a 10B dense transformer with a residual 128×3.66B MoE MLP, resulting in 480B total and 17B active parameters selected via top-2 gating. It launches with a 4K context window, with 32K to follow.
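    The quoted totals can be sanity-checked with simple arithmetic:

```python
# Parameter count for a dense trunk plus a residual MoE, using the
# figures quoted above (numbers in billions).
dense = 10.0
n_experts, expert_size, top_k = 128, 3.66, 2

total = dense + n_experts * expert_size   # all expert weights are stored
active = dense + top_k * expert_size      # only the top-2 experts run per token

# total ~478B, active ~17.3B: close to the ~480B / 17B quoted
# (3.66B is itself a rounded figure)
print(f"total ~{total:.0f}B, active ~{active:.1f}B")
```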

    Arctic is available now on Hugging Face, the NVIDIA API catalog, Replicate, and other model catalogs, with support for Snowflake’s Cortex platform.

  • Published on

    Microsoft has developed a new series of open language models called Phi-3.

    The Phi-3 models were trained using a dataset of textbook-style content and synthetically generated data, rather than raw web data typically used for large language models.

    The first model being released is Phi-3-mini, which has 3.8 billion parameters. Phi-3-mini is instruction-tuned and available in two context-length variants — 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.

    The Phi-3 models are positioned for tasks like writing summaries, content generation, and answering straightforward queries.

    Other Phi-3 models include Phi-3-small (7 billion parameters) and Phi-3-medium (14 billion parameters).

    The family also includes a multimodal model, Phi-3 Vision Instruct, with a 128K context length and 4.2B parameters. It comprises an image encoder, connector, projector, and the Phi-3 Mini language model, and was trained on 500B vision and text tokens.

  • Published on

    Meta Llama 3 is an open-source large language model available in two sizes, 8B and 70B parameters. It has been fine-tuned for instruction following, making it more steerable and capable of performing complex tasks.

    The model has been trained on over 15 trillion tokens of publicly available data and has demonstrated improved performance on a wide range of industry benchmarks.

    The model’s architecture has been designed with simplicity and efficiency in mind, using a standard decoder-only transformer with a tokenizer that encodes language more efficiently. The model has also been optimised for inference efficiency, making it suitable for deployment on a wide range of devices.

    To ensure responsible development and deployment of the model, Meta has adopted a system-level approach to responsibility, which includes red-teaming (testing) for safety, developing new trust and safety tools such as Llama Guard 2 and Code Shield.

    The company plans to release additional models with new capabilities, including multimodality, longer context windows, and stronger overall capabilities, in the coming months.

    Overall, Meta Llama 3 represents a significant advancement in language model technology and has the potential to enable a wide range of applications and use cases across industries.

  • Published on

    Mistral has introduced Mixtral 8x22B, a sparse Mixture-of-Experts (SMoE) model that offers improved performance and efficiency.

    The model uses 39 billion active parameters out of 141 billion, making it a cost-efficient option.

    It is fluent in five languages: English, French, Italian, German, and Spanish. Additionally, it possesses strong mathematics and coding abilities, native function calling ability, and a 64K-token context window.

    Mixtral 8x22B is being released under the Apache 2.0 license.
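    The "sparse" in sparse Mixture-of-Experts comes from the router: each token activates only its top-scoring experts, so a fraction of the total weights does the work. A toy top-2 gate (random scores stand in for a learned router; this is the generic mechanism, not Mixtral's implementation):

```python
import math
import random

# Toy top-2 gating: pick the two highest-scoring experts for a token
# and weight them with a softmax renormalised over just those two.
random.seed(0)
n_experts = 8
logits = [random.gauss(0, 1) for _ in range(n_experts)]  # stand-in router scores

top2 = sorted(range(n_experts), key=lambda i: logits[i], reverse=True)[:2]
exp_scores = [math.exp(logits[i]) for i in top2]
gates = [s / sum(exp_scores) for s in exp_scores]        # sums to 1.0

print("experts chosen:", top2, "gate weights:", [round(g, 3) for g in gates])
```

    Only the chosen experts' feed-forward weights are multiplied against the token, which is why active parameters (39B) sit far below total parameters (141B).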

  • Published on

    Reka has introduced a multimodal language model called Reka Core that aims to compete with leading offerings from OpenAI, Anthropic and Google.

    Reka Core can understand images, videos and audio in addition to text.

    It has a 128,000-token context window for improved information recall and enhanced reasoning skills for language and math tasks. Core also offers top-tier code generation for automated workflows and is fluent in English as well as several Asian and European languages.

    Deployment is via API, on-premises servers or on-device.

    Reka Core joins the company’s existing Edge and Flash models, forming a suite of AI tools for various industries.

  • Published on

    Udio is an AI app that transforms text into music across genres from pop to metal.

    Users enter prompts describing desired styles, lyrics, and elements to generate professional-quality vocal and instrumental tracks.

    Udio is backed by musicians will.i.am and Common, as well as leading AI researchers and engineers formerly at Google DeepMind.

    The free beta allows 1200 song generations per month. While imperfect, Udio iterates quickly to improve quality, language support, and controllability. The team believes AI can expand musical boundaries for everyone.

  • Published on

    Stability AI has released Stable LM 2 12B, a pair of 12 billion parameter multilingual language models trained on 7 languages (English, Spanish, German, Italian, French, Portuguese, and Dutch).

    The release includes both a base model and an instruction-tuned variant.

    The models balance strong performance with efficiency, memory requirements, and speed.

    Stable LM 2 12B is available on Hugging Face for non-commercial and commercial use with a Stability AI membership.

    The release also includes an updated version of the smaller Stable LM 2 1.6B with improved conversational abilities across the 7 languages and added tool usage/function calling capabilities.

  • Published on

    Command R+ is a RAG-optimised model designed to tackle enterprise-grade workloads.

    It is a purpose-built LLM for real-world enterprise use cases offering advanced Retrieval Augmented Generation (RAG) with citation to reduce hallucinations, multilingual coverage in 10 key languages, and tool use to automate sophisticated business processes.
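    The citation-grounded RAG pattern can be sketched generically: number the retrieved snippets so the model can cite them inline. This is the common prompt-assembly approach, not Cohere's actual grounding format:

```python
# Generic RAG-with-citations prompt assembly (illustrative only; the
# exact format Command R+ uses is not assumed here).
docs = [
    "Refund requests are honoured within 30 days of purchase.",
    "Enterprise plans include priority support.",
]

# Number the snippets so the model can cite them as [1], [2], ...
context = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs, start=1))
prompt = (
    "Answer using only the numbered sources, citing them inline.\n\n"
    f"{context}\n\nQuestion: What is the refund window?"
)
print(prompt)
```

    Because every claim in the answer can be traced to a numbered source, unsupported statements (hallucinations) become easy to spot and reject.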

    It outperforms similar models and is competitive with significantly more expensive models on key business-critical capabilities.

    Command R+ is available on Azure, Oracle Cloud Infrastructure, and Cohere’s hosted API, with plans to expand to additional cloud platforms.

    Cohere remains committed to data privacy and security, offering private LLM deployments and the option to opt out of data sharing.

  • Published on

    AssemblyAI’s Universal-1 achieves industry-leading performance in multilingual speech-to-text, with 10% or greater accuracy improvement over the next-best system in English, Spanish, and German.

    It reduces hallucination rate by 30% on speech data and 90% on ambient noise, compared to a widely used open-source model.

    Universal-1 also exhibits the ability to code-switch, transcribing multiple languages within a single audio file.

    Additionally, it improves word-level timestamp accuracy by 25.5% relative to a popular open-source model and enables 5x faster parallel inference.

    These advancements are the result of leveraging state-of-the-art ASR research and a robust system design.

  • Published on

    Stability AI has unveiled its next-generation AI music model, Stable Audio 2.0.

    The update enables generation of high-quality, structured music tracks up to 3 minutes long at 44.1 kHz from text prompts.

    It adds new audio-to-audio capabilities to transform uploaded samples through text guidance.

    Enhancements include expanded sound effect creation, style transfer for customisation and a diffusion transformer architecture for improved long-form coherence.

    The free model is available on the Stable Audio website, with an API upcoming.

    A 24/7 Stable Radio YouTube stream featuring AI-generated tracks also launched.

  • Published on

    SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice.

    It equips the LM with a series of LM-centric commands that allow it to browse the repository, and to view, edit and execute code files.

    The system has been tested on the SWE-bench benchmark, where it resolved 12.29% of issues on the full test set, the best performance to date.

    The agent can be run on any GitHub issue.

  • Published on

    Grok-1.5, the latest model from xAI, boasts enhanced reasoning capabilities and a context length of up to 128,000 tokens.

    The model achieved impressive scores on benchmarks like MATH (50.6%), GSM8K (90%), and HumanEval (74.1%), showcasing advancements in math, coding, and problem-solving skills.

    Grok-1.5 also demonstrates powerful retrieval abilities, handling longer and more complex prompts thanks to its expanded context window.

    Built on a custom distributed training framework, Grok-1.5 represents the team’s progress in large language model research.

  • Published on

    Databricks has unveiled DBRX, a new open-source LLM.

    DBRX outperforms established models like GPT-3.5 and open models like Grok-1 on a range of benchmarks, including language understanding, programming, and math tasks.

    The model uses a fine-grained mixture-of-experts architecture, which Databricks claims provides efficiency improvements in both training and inference.

    DBRX Instruct even surpasses specialised coding models like CodeLLaMA-70B on programming tasks.

  • Published on

    AI music generator Suno has released v3 of its platform, its first model capable of producing radio-quality music.

    Users can now create full two-minute songs in seconds in a variety of genres and styles, with better audio quality and improved prompt adherence, meaning fewer hallucinations and more graceful endings.

    The company is also developing a proprietary, inaudible watermarking technology to detect whether a song is created using Suno, to prevent users from creating music based on other artists’ references.

  • Published on

    Stability AI has released Stable Video 3D, which generates 3D model videos from single images without requiring additional data.

    The technology builds on the company’s earlier release of Stable Video Diffusion, which could be used for a variety of tasks, and is now being used commercially with a Stability AI membership.

    The company claims Stable Video 3D outperforms other open-source alternatives, such as Zero123-XL, and features two variants for generating orbital videos and 3D video along specified camera paths.

    It can also create novel multi-view videos of an object and generate 3D meshes.

  • Published on

    Apple’s MM1 is a multimodal large language model (MLLM) that can interpret both images and text data, developed by a team of computer scientists and engineers at Apple.

    The model is part of a family of multimodal models and is designed to improve capabilities in image captioning, visual question answering, and query learning by integrating text and image data. The largest model in MM1 is 30B and beats many 80B open-source LLMs in visual tasks. The family of multimodal models consists of both dense models and mixture-of-experts (MoE) variants.

    The MM1 model can count objects, identify objects that are part of an image, and use common sense about everyday objects to offer users useful information about what the image presents.

    It also has the ability to perform in-context learning, which means it does not need to start over every time a question is asked; it uses what it has learned in the current conversation.

  • Published on

    Software engineering company Cognition has developed an autonomous AI software engineer, called Devin, which can plan and execute complex coding tasks and learn over time.

    Devin has been equipped with developer tools, including a shell, code editor and browser, and can work alongside human engineers or autonomously on tasks, reporting progress in real time and accepting feedback.

    The AI has been evaluated on the SWE-bench coding benchmark, which assesses an algorithm’s ability to solve real-world coding issues, and outperformed all previous models, resolving 13.86% of issues, compared with 1.96% for the best previous model.

  • Published on

    Cohere has launched Command-R, a new LLM designed for large-scale production workloads.

    The “scalable” model balances efficiency with accuracy, enabling companies to move from pilot projects to full-scale production.

    Command-R is optimised for tasks such as retrieval-augmented generation (RAG) and using external APIs and tools, and is designed to work with Cohere’s existing Embed and Rerank models to offer the best integration for RAG applications.

    It boasts strong accuracy for RAG and tool use, low latency and high throughput, a 128K context window, and lower pricing.

    It is available immediately on Cohere’s API and will be available on major cloud providers soon.

  • Published on

    Inflection has released Inflection-2.5, the latest version of its model, which it claims is as good as, if not better than, the world’s leading LLMs, such as GPT-4 and Gemini, while using only 40% of the usual computational power for training.

    The company, which aims to create a personal AI for every user, has already seen a significant uptick in user sentiment, engagement and retention since rolling out the upgrade to its one million daily and six million monthly active users.

    An average conversation with Pi, Inflection’s AI chatbot, lasts 33 minutes, with return usage from 60% of customers each week.

  • Published on

    Anthropic has released the Claude 3 model family, featuring three new models – Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus – that are more powerful and responsive than previous versions.

    The new models are available through the company’s API or platform, Claude.ai.

    Each model offers different balances of speed, cost and intelligence to suit different applications, with the company claiming that its new offering outperforms competitors on a range of cognitive tasks, including knowledge, reasoning, mathematics and forecasting.

    The models can power live customer chats, auto-completions, and data extraction tasks where responses must be immediate and in real time.

    The company has also improved the models’ ability to understand longer context and visual materials such as images, charts and technical diagrams, as well as reducing refusal rates and inaccuracies.

    The Claude 3 family of models will initially offer a 200K context window.

  • Published on

    Vercel has released the Vercel AI SDK 3.0, making it easier for developers to create user interfaces for language models by using React Server Components.

    The SDK lets developers stream UI components directly from LLMs without the need for heavy client-side JavaScript, making apps more interactive and responsive.

    It also allows developers to give LLMs rich, component-based interfaces so users can better interpret and visualise the outputs from AI models.

    The release is based on the company’s previous development of v0, a generative UI design tool that uses React Server Components to convert text and image prompts to React UIs.

  • Published on

    Ideogram has released Ideogram 1.0, its most advanced text-to-image model to date, with a feature called Magic Prompt that helps create detailed prompts for artistic images.

    The company believes generative media models will transform the creative economy, and it has raised $80m in Series A financing led by Andreessen Horowitz to accelerate its own growth in this field.

    Ideogram 1.0 offers state-of-the-art text rendering, photorealism, and prompt adherence, enabling the creation of personalised messages and designs.

  • Published on

    Mistral AI has released Mistral Large, a new advanced language model available through la Plateforme and Azure.

    The model offers top-tier reasoning capabilities, native fluency in five languages, a 32K-token context window, and function calling capabilities.

    Mistral Large is the world’s second-ranked model generally available through an API.

    The company has also released Mistral Small, an optimised model for low latency workloads.

    Both models are available with open-weight and optimised model endpoints.

    JSON format mode and function calling have been introduced for easier interaction with the models.

  • Published on

    STORM is a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking.

    It is designed to assist in writing grounded and organised long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

    A new study addresses the challenges in the pre-writing stage, such as researching the topic and preparing an outline, by using STORM to model the pre-writing stage. STORM discovers diverse perspectives, simulates conversations, and curates collected information to create an outline.

    The study evaluates STORM using FreshWiki, a dataset of recent high-quality Wikipedia articles, and outline assessments.

    Compared to an outline-driven retrieval-augmented baseline, STORM’s articles were deemed more organised (by 25%) and broader in coverage (by 10%).

    The study also identifies new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts, based on feedback from experienced Wikipedia editors.
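    The perspective-driven pipeline described above can be caricatured in a few lines (purely illustrative structure; none of these names come from the paper's code, and the `ask` callable stands in for a retrieval-backed LM conversation):

```python
# Toy sketch of STORM-style pre-writing: pose questions from several
# perspectives, then curate the answers into an outline.
def storm_outline(topic, perspectives, ask):
    outline = []
    for persona in perspectives:
        question = f"As a {persona}, what should an article on {topic} cover?"
        outline.append(f"## {persona.title()} perspective: {ask(question)}")
    return outline

# Stub "LM" so the sketch runs standalone
outline = storm_outline(
    "solar eclipses",
    ["historian", "astronomer"],
    ask=lambda q: "key points gathered via retrieval",
)
print(outline)
```

    The real system replaces the stub with simulated multi-turn conversations grounded in retrieved sources, but the shape — perspectives in, curated outline out — is the same.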

  • Published on

    Google has released Gemma, a lightweight family of open models for AI development, built using the same technology as the company’s Gemini models.

    Gemma is available in two sizes, Gemma 2B and Gemma 7B, both of which have been pre-trained and instruction-tuned for specific tasks.

    Google has also released a responsible generative AI toolkit to provide guidance on building safe applications with Gemma. It includes a model debugging tool, best practices for developers and a methodology for building robust safety classifiers with minimal examples.

  • Published on

    Sora is a new AI-powered tool that creates videos from text prompts.

    It can currently generate videos up to a minute long that include complex scenes, specific types of motion and vibrant, accurate details.

    It achieves this by combining a deep understanding of language with a knowledge of how the physical world works in motion.

    The tool is still in development and researchers are seeking feedback from visual artists, designers and filmmakers to help improve it.

    Currently, Sora has some limitations and may struggle with simulating complex physics and interpreting some prompts with precise descriptions of events.

  • Published on

    Google has released its next-generation AI model, Gemini 1.5, which offers enhanced performance and a breakthrough in long-context understanding.

    The model can process up to 1 million tokens, the longest context window of any large-scale foundation model to date.

    The first 1.5 model, 1.5 Pro, achieves comparable quality to 1.0 Ultra while using less compute.

    A limited preview of the model with the long context window is available to select developers and enterprise customers in AI Studio and Vertex AI.

    The model offers a range of potential new capabilities, including seamless analysis of large amounts of content, sophisticated understanding and reasoning across different modalities and relevant problem-solving over longer blocks of code.

  • Published on

    Qwen has released the next iteration of its open-source large language model series, Qwen1.5. The update includes base and chat models in six sizes from 0.5 billion to 72 billion parameters.

    Key improvements include enhanced alignment with human preferences, stronger multilingual capabilities, long context support up to 32,768 tokens, and better performance on retrieval-augmented generation and tool use tasks compared to previous models. However, coding abilities still trail GPT-4.

    Qwen1.5 is available on platforms like Hugging Face, Ollama and DashScope’s API services.

  • Published on

    The Allen Institute for AI (AI2) has released its first set of Open Language Models (OLMo), which includes seven billion parameter and one billion parameter variants.

    The institute aims to promote openness in the development and use of AI, allowing researchers to access and build upon the OLMo models, datasets, and tools.

    The models are available for download and fine-tuning, and the dataset and tools used to create the models are also openly available.

    Additionally, a license for the models and dataset allows researchers to contribute to the development of the OLMo family.

    The release of OLMo is seen as a step towards making AI more transparent and open, allowing for a broader discussion on the potential risks and benefits of language models.

  • Published on

    Code Llama 70B is the largest and most advanced model in the Code Llama family.

    This state-of-the-art language model is available in three versions: Code Llama – 70B, the foundational code model; Code Llama – 70B – Python, specialized for Python; and Code Llama – 70B – Instruct, fine-tuned for understanding natural language instructions.

    Built on top of Llama 2, Code Llama is capable of generating code and natural language about code from both code and natural language prompts.

    In internal benchmark testing, Code Llama outperformed other publicly available language models in code tasks and is free for both research and commercial use.

  • Published on

    Adept has released its new multimodal model, Fuyu-Heavy, which has been designed specifically for digital agents.

    The model is the third-most capable of its kind in the world, outranked only by GPT-4V and Gemini Ultra, both of which are 10 to 20 times larger.

    Fuyu-Heavy excels at multimodal reasoning and UI understanding, and scores higher on the MMMU benchmark than even Gemini Pro.

    The model matches or exceeds the performance of those in its compute class on text-based benchmarks, despite having to devote part of its capacity to image modelling.

  • Published on

    LangGraph is a new open-source module built on top of the popular LangChain framework. It is designed to enable the creation of cyclical graphs, which are often needed for developing AI agent runtimes.

    While LangChain already supported the creation of custom chains, it lacked an easy way to introduce cycles into these chains. LangGraph solves this problem by providing a simple interface for defining state machines as graphs, with nodes representing different components or actions, and edges defining the flow between them.

    The release includes two pre-built agent runtimes: the Agent Executor and the Chat Agent Executor.
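
    The idea of a cyclical graph over a shared state can be sketched in plain Python (this illustrates the concept only, not LangGraph's actual API): nodes are functions over a state dict, and a conditional edge routes back to the agent node until a stop condition holds.

```python
# Plain-Python sketch of a cyclical agent graph (not LangGraph's real API).
# Nodes transform a shared state; edges decide which node runs next.

def agent(state):
    # stand-in for an LLM deciding whether to call a tool or finish
    state["action"] = "tool" if state["count"] < 3 else "finish"
    return state

def tool(state):
    state["count"] += 1  # stand-in for a tool result
    return state

nodes = {"agent": agent, "tool": tool}
edges = {
    "agent": lambda s: "tool" if s["action"] == "tool" else "END",  # conditional edge
    "tool": lambda s: "agent",                                      # cycle back
}

def run(entry, state):
    node = entry
    while node != "END":
        state = nodes[node](state)
        node = edges[node](state)
    return state

final = run("agent", {"count": 0, "action": None})
```

    The `tool -> agent` edge is the cycle that plain chains lack: control keeps returning to the agent until it decides to stop.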

  • Published on

    CrewAI is a new library for building and coordinating groups of AI agents, designed to work together on complex tasks.

    Its four building blocks are agents, which have distinct roles and capabilities; tasks, which are small, focused missions for agents to accomplish; tools, which are used by agents to carry out their tasks; and crews, which bring together agents, tasks and a process for coordinating their work.

    CrewAI is built on LangChain, a framework that enables developers to use a wide range of existing tools and toolkits, from local open-source models to popular platforms such as Ollama.

    One key advantage of CrewAI is that it enables AI agents to run in the cloud, making it quick and simple to get started with the platform.

    Additionally, as CrewAI is built on LangChain, it can be debugged using LangSmith, which allows developers to inspect what calls are being made, what input is being used and what output is being generated, helping to optimise the performance of the AI agents.
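
    The four building blocks can be illustrated with a small plain-Python sketch (the roles, tool and class names are hypothetical, not CrewAI's real API): agents carry a role and tools, tasks are small focused missions, and a crew runs them through a simple sequential process.

```python
# Illustrative sketch of agents, tasks, tools and crews (hypothetical API).

def word_count_tool(text):
    # a "tool" an agent can use to carry out its task
    return len(text.split())

class Agent:
    def __init__(self, role, tools):
        self.role, self.tools = role, tools

    def perform(self, task):
        # a real agent would call an LLM; here we just apply a tool
        return f"{self.role}: {task['goal']} -> {self.tools['wc'](task['input'])} words"

class Crew:
    def __init__(self, agents, tasks):
        self.agents, self.tasks = agents, tasks

    def kickoff(self):
        # a simple sequential process coordinating agents and their tasks
        return [agent.perform(task) for agent, task in zip(self.agents, self.tasks)]

researcher = Agent("researcher", {"wc": word_count_tool})
writer = Agent("writer", {"wc": word_count_tool})
crew = Crew(
    agents=[researcher, writer],
    tasks=[{"goal": "summarise notes", "input": "three words here"},
           {"goal": "draft intro", "input": "two words"}],
)
results = crew.kickoff()
```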

  • Published on

    OpenChat-3.5-1210 is the latest language model from OpenChat, built to excel at coding and improve on previous versions.

    It has achieved a near 15-point increase on HumanEval over its predecessor, making it one of the best generalist models currently available.

    OpenChat-3.5-1210 is specialised for coding and coding-related tasks, such as code generation, understanding, and debugging. It surpasses ChatGPT and Grok models in terms of performance.

    This upgrade demonstrates OpenChat’s ongoing commitment to developing cutting-edge large language models for specific applications, specialising in domains that are vital for business and everyday life, such as coding.

  • Published on

    Microsoft has released Phi-2, a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities.

    Phi-2 matches or outperforms models up to 25x larger thanks to innovations in model scaling and training data curation.

    On complex benchmarks, Phi-2 achieves state-of-the-art performance among base models with less than 13 billion parameters.

    On average, Phi-2 outperforms Mistral AI’s 7 billion parameter model, Mistral 7B, and Meta’s 13 billion parameter model, Llama-2, despite being a fraction of their size.

    Furthermore, Phi-2 outperforms Google’s newly announced Gemini Nano 2 on several benchmarks.

    Licensed under the MIT open source license.

  • Published on

    Mixtral 8x7B has been launched by Mistral AI, a company that focuses on creating and providing access to open AI models to benefit the developer community.

    This is a sparse mixture-of-experts model that outperforms competitors on benchmarks, offering 6x faster inference and superior cost-performance results compared to alternatives, including GPT-3.5.

    It can handle contexts of up to 32k tokens and has been trained on web data to recognise and process English, French, Italian, German and Spanish to generate human-like responses and code.

    Two versions are available: Mixtral 8x7B, which offers a range of capabilities, and Mixtral 8x7B Instruct, which has been fine-tuned to follow instructions and achieves a score of 8.3 on the MT-Bench benchmark.

    Licensed under Apache 2.0.

  • Published on

    Stability AI has released StableLM Zephyr 3B, the latest in its series of lightweight language models designed for use on edge devices.

    Featuring 3 billion parameters, StableLM Zephyr 3B has been trained on multiple instruction datasets and fine-tuned using the Direct Preference Optimisation algorithm to perform well in text generation and align with human preferences.

    Stability AI says the model is particularly good at instruction following and answering questions, but is also capable of more creative tasks like copywriting and content personalisation.

    The model is released under a licence permitting non-commercial use.

  • Published on

    Google has released its largest and most capable AI model to date, called Gemini.

    The model has been trained at scale using the company’s AI-optimised infrastructure and its latest Tensor Processing Units (TPUs).

    It has been designed to be the most reliable and scalable model for training, as well as the most efficient to serve.

    The first version of Gemini, Gemini 1.0, has been optimised for three different sizes: Gemini Ultra, which is the company’s largest and most capable model; Gemini Pro, which is its best model for scaling across a wide range of tasks; and Gemini Nano, its most efficient model for on-device tasks.

    The Ultra model has been found to outperform human experts on a range of tasks, including complex maths problems.

  • Published on

    AI art generator Playground has released an upgraded version, Playground v2, which the developers say outperforms competitors in quality and creativity.

    Users can visit the website and compare the results of using Playground v2 with those from other platforms, such as Stable Diffusion XL, via a series of prompts that ask users to compare images.

    The company has also released the model’s pre-trained weights, meaning those with fewer computational resources can build on the technology.

    Additionally, the company has devised a new benchmark for AI image generation, MJHQ-30K, which uses FID (Fréchet Inception Distance) scores to assess output quality.
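
    For reference, FID as commonly defined is the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images (lower is better):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

    where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature mean and covariance for real and generated images respectively.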

    The release is part of Playground’s mission to make AI more accessible, and the company has said it would like to hear from people who use the new tools.

  • Published on

    Generating art with text-to-image models such as DALL-E has become popular, but one challenge is maintaining a consistent style when generating a series of images.

    To address this, Google Research has developed StyleAligned, a method for achieving consistent style across images using a pre-trained diffusion model without the need for fine-tuning.

    StyleAligned operates by encouraging information retention and style consistency through a shared attention mechanism, in which an image being generated attends to a user-provided reference image during the diffusion process.

    The researchers demonstrate the efficacy of the method across a range of artistic styles and text prompts, showing that StyleAligned can produce a series of images that maintain a consistent visual style without the need for fine-tuning or manual intervention.

    StyleAligned image generation can also be used in combination with other methods.

  • Published on

    Facebook AI Research has created a tool called DensePose, which can be applied to videos to generate colour-coded representations of the human figure, labelling each body part.

    Flode-Labs has now created Vid2DensePose, a tool based on DensePose designed to convert videos into this format for use in animation.

    It is particularly useful in conjunction with MagicAnimate, an application that can take a series of labelled frames and generate smooth, animated transitions between them, enabling the creation of advanced, realistic human animations from still images.

    Vid2DensePose is available on GitHub, and includes instructions for installation and use.

  • Published on

    MagicAnimate is a framework for generating new animation data given a reference image and a motion sequence.

    The authors use a conditional diffusion model for generation, with an additional encoder used to preserve the identity of the reference image in the output.

    A simple but effective method of producing smooth transitions between video frames is also introduced, which is necessary for producing animations of reasonable length.

    Comparisons with other state-of-the-art methods on two benchmark datasets show that MagicAnimate produces more temporally consistent animations, while also better preserving the appearance of the reference image.

    The method performs well on both short and long animations, and when animating reference images with different identities to the motion sequence, showing the robustness and versatility of the approach.

  • Published on

    Magicoder is an open-source series of large language models (LLMs) for code, developed to close the gap between models trained on synthetic instruction data and real-world coding performance.

    To do this, Magicoder was trained on 75k synthetic instruction data created by a method called OSS-Instruct, which uses open-source code snippets to generate instruction data.

    This approach is designed to mitigate the inherent bias found in LLMs trained on synthetic data alone by enlightening the model with references, generating more diverse, realistic data and greater controllability.

    Magicoder and its enhanced version, MagicoderS, outperform other code models on a variety of benchmarks, including text-to-code generation in Python, multilingual coding and data-science program completion.

  • Published on

    Microsoft Research Asia scientists have developed GAIA, a method for generating talking avatars from a single portrait image and a speech sample.

    Previous avatar generation methods used domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the diversity and realism of the results.

    GAIA uses a two-stage process, first disentangling the input video into motion and appearance representations, and then generating a motion sequence from the speech and portrait reference.

    The researchers trained the system on a large-scale, high-quality talking avatar dataset, and the resulting avatar generator was shown to be superior to existing methods in terms of naturalness, diversity, lip-sync quality and visual quality.

    Furthermore, the system is scalable, general and can be used for other applications such as generating avatars from textual instructions.

  • Published on

    Character animation, or the task of creating the illusion of movement in otherwise static images, is an important and challenging aspect of computer graphics.

    In a paper released on arXiv, a group of researchers from the Alibaba Group detail a new method for training a character animation model using a type of neural network called a diffusion model. Their method, which they have called Animate Anyone, preserves the intricate appearance details of the reference image and uses a technique called spatial attention to merge detail features.

    It also uses a technique called pose guiding to direct the character’s movement and an approach called temporal modelling to ensure smooth transitions between frames in the resulting animation.

    In testing, the researchers used datasets of fashion photos and human dance videos to demonstrate that their method outperformed existing approaches and could generate realistic animations from images of different characters.

  • Published on

    Meta has launched Audiobox, a successor to its Voicebox tool for generating audio from natural language prompts.

    The new tool can generate audio clips including speech in various styles and environments, non-speech sound effects and soundscapes.

    It can also restyle voices, making them sound as if they are speaking in a particular environment such as a cathedral, or with a certain emotion.

    Users can input a description of the sound they wish to generate, or combine a voice input with a text style prompt to create the desired audio.

    The tool has been released to a limited number of researchers and institutions to encourage the development of responsible AI use.

  • Published on

    Meta has launched Seamless, a new system for preserving expression and improving real-time translation using AI.

    The system includes two new models. The first is SeamlessExpressive, which preserves expression in speech-to-speech translation, and the second is SeamlessStreaming, which delivers “state-of-the-art results with around two seconds of latency”.

    The models are based on the latest version of the company’s foundational model, SeamlessM4T, and are designed to improve automatic speech recognition, speech-to-speech, speech-to-text and text-to-speech capabilities.

    Alongside the models, Meta is releasing metadata, data and data alignment tools to help the research community to improve on the work.

  • Published on

    Researchers from Shanghai AI Lab and Stanford University have presented an approach that enables the extraction of camera trajectory and character motion from filmed content for replication in new 2D or 3D content.

    The method enables the preservation of complex camera movements and character motion from the original shot and simulates new scenes with different characters, lighting, or environments.

    The researchers have demonstrated the technology through a series of video examples showcasing 2D and 3D cinematic transfers of filmed content ranging from Hollywood movies to animated features, with accurate preservation of camera movements and character motions.

    The team has made their code and datasets publicly available.

  • Published on

    Perplexity AI is introducing two new large language models, pplx-7b-online and pplx-70b-online, the first of their kind: online LLMs offered through an API.

    These models are capable of using knowledge from the internet to provide up-to-date information in their responses. To achieve this, Perplexity has developed in-house search technology that prioritises non-SEOed sites.

    The models are fine-tuned to use search snippets effectively in their responses and are accessible via the pplx-api and Perplexity Labs.

    Evaluations conducted using curated evaluation datasets have demonstrated that Perplexity’s models can match or surpass the performance of other large language models in providing accurate and up-to-date answers.

    The pplx-api is now available to the general public, following its release from beta. A new usage-based pricing structure has also been introduced.

  • Published on

    Amazon Web Services (AWS) has launched Amazon Q, a generative AI tool for answering questions and solving problems using a customer’s own data and systems.

    The tool can be tailored to individual users, with more than 40 connectors to popular data sources, including Salesforce, Dropbox, ServiceNow and Zendesk.

    Amazon Q will be available through the AWS Management Console, and has been specifically developed to be accessible to companies of all sizes.

    The product will be rolled out in two flavours: Amazon Q in Connect, an application that will suggest answers to customer service workers, and Amazon Q in AWS Supply Chain, which will offer supply chain workers answers to complex questions based on their data.

  • Published on

    Video creation AI company Pika has announced Pika 1.0, a new AI model which generates and edits videos in several styles, including 3D animation and cinema.

    The new web experience also aims to improve usability.

    Pika 1.0 marks the company’s efforts to fulfil its vision of enabling everyone to be a director of their own content.

    Having started six months ago, the company has grown to half a million users generating millions of videos weekly. Pika has also announced it has raised $55m in funding, including investment from Elad Gil and Adam D’Angelo.

  • Published on

    Stability AI has launched SDXL Turbo, a text-to-image model that uses a method called Adversarial Diffusion Distillation (ADD) to create images in one step, generating near-real-time results.

    In tests, SDXL Turbo outperformed other state-of-the-art models in terms of both image quality and the number of steps required to generate an image, beating a 50-step model with just four steps.

    It also offers faster generation times; specifically, it can generate a 512×512 image in 207ms.

    However, the company has made it clear that SDXL Turbo is not yet available for commercial use.

    The release of SDXL Turbo marks an important step forward for text-to-image generation models, promising both speed and high fidelity.

  • Published on

    Stability AI has released Stable Video Diffusion, its first foundation model for generative video, based on the image model Stable Diffusion.

    Currently available in research preview, the company has made the code available on GitHub and the weights available on the Hugging Face page.

    Two image-to-video models are available, generating 14 and 25 frames at customisable frame rates.

    The model is for research purposes only and is not yet intended for commercial use.

  • Published on

    Language model Claude 2.1 has been released with a range of new features to boost its capabilities for enterprises: a context window of up to 200,000 tokens, allowing around 150,000 words of text to be analysed; a 2x decrease in hallucination rates; and a beta tool-use feature that enables it to integrate with users’ existing systems and APIs.

    The new model has been designed to provide more accurate and reliable responses than previous versions, including a 30% reduction in incorrect answers and up to a fourfold decrease in the rate of mistakes in concluding whether a document supports a given statement.

    The model is now available in API form for console users.

  • Published on

    GitHub has announced a number of updates at its GitHub Universe 2023 event, including the general availability of GitHub Copilot Chat.

    The tool enables developers to write and understand code using natural language, and will be available as part of an existing GitHub Copilot subscription from December 2023.

    As well as integrating the tool into its web and mobile apps, GitHub has also launched GitHub Copilot Enterprise, which allows developers to quickly get up to speed with a codebase, search through and build documentation, and review pull requests.

    It costs $39 per user per month and will be generally available in February 2024.

    GitHub has also announced its GitHub Copilot Partner Program, which will see the integration of the tool with third-party developer tools and services, and new AI-powered security features for GitHub Advanced Security, including code scanning autofix and secret scanning.

  • Published on

    xAI has launched an AI assistant called Grok, modelled on the fictional Hitchhiker’s Guide to the Galaxy.

    The technology uses real-time knowledge and is designed to answer questions with a bit of wit, as well as being able to answer “spicy questions” that are rejected by most other AI systems, according to the company.

    Grok-1 displayed strong results, surpassing all other models in its compute class, including GPT-3.5 and Inflection-1.

  • Published on

    OpenChat is an open-source language model based on a 7 billion parameter version of the transformer architecture.

    It has been trained using a method called C-RLFT, which uses reinforcement learning to fine-tune the model on a mixed-quality dataset of instruction-oriented data.

    The developers claim that OpenChat outperforms ChatGPT, even though it has fewer parameters, demonstrating the effectiveness of the training approach.

    The model can be accessed through an OpenAI-compatible API and is designed to handle high-throughput traffic for deployment on consumer GPUs.

    The developers have made the model and training code publicly available, and it is licensed under the Apache 2.0 license, meaning that commercial use is permitted.

  • Published on

    Adept AI has released Fuyu-8B, a smaller version of its multimodal model that powers the company’s product.

    According to the company, the model is exciting because it is designed from the ground up for digital agents and is easy to understand, scale and deploy, supporting arbitrary image resolutions and doing fine-grained localisation on screen images.

    In addition, the model performs well on standard image understanding benchmarks.

    The company warns that faces and people are generally not generated properly and that the model should not be used to generate factual representations of people or events.

  • Published on

    OpenHermes 2 Mistral 7B is a language model fine-tuned on Mistral 7B, trained to navigate complex conversations with finesse.

    It uses the ChatML format, which allows for multi-turn conversations with structured system prompts, enabling OpenAI endpoint compatibility.
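
    In ChatML, each turn is wrapped in `<|im_start|>{role} ... <|im_end|>` delimiters, so a multi-turn chat with a structured system prompt serialises to a single string. A minimal sketch of assembling such a prompt (the helper function is illustrative, not part of any library):

```python
# Minimal sketch of ChatML serialisation: wrap each message in
# <|im_start|>{role} ... <|im_end|> and leave the assistant turn open
# so the model continues from there.

def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # open turn for the model to fill
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```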

    OpenHermes 2 was trained on 900,000 entries of primarily GPT-4-generated data and outperformed previous models on benchmark tests, including GPT4All, AGIEval and BigBench.

    The model is available on Hugging Face, and users can access it through the LM Studio interface for interactive use.

  • Published on

    MemGPT, developed by researchers at UC Berkeley, is a Python framework designed to create LLM agents equipped with persistent memory capabilities and customisable tools.

    Drawing inspiration from operating system design, MemGPT uses a memory management system to remember information generated over time, allowing it to deal with contexts that extend beyond the LLM’s usual limits.

    The research paper, published on arXiv, shows that MemGPT can outperform fixed-context LLM baselines on a number of tasks requiring long-term context recollection.
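
    The OS-inspired idea can be caricatured in a few lines of plain Python (a toy sketch, not MemGPT's real interface): a small "main context" with a token budget, backed by an archival store that older messages are paged out to and searched on demand.

```python
# Toy sketch of tiered memory management (hypothetical, not MemGPT's API).
# The word count stands in for a token count.

class TieredMemory:
    def __init__(self, budget):
        self.budget = budget   # max words kept in the main context
        self.main = []         # recent messages, always in context
        self.archive = []      # messages paged out to archival storage

    def _used(self):
        return sum(len(t.split()) for t in self.main)

    def add(self, text):
        self.main.append(text)
        # evict oldest messages to the archive when over budget
        while self._used() > self.budget and len(self.main) > 1:
            self.archive.append(self.main.pop(0))

    def recall(self, keyword):
        # search archival storage to recover long-term context on demand
        return [t for t in self.archive if keyword in t]

mem = TieredMemory(budget=6)
mem.add("user name is Ada")
mem.add("favourite colour is green")
mem.add("ordered a laptop yesterday")
found = mem.recall("Ada")
```

    The recalled snippet would be re-inserted into the prompt, giving the agent an effective context far beyond the LLM's fixed window.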

  • Published on

    AI models are becoming more prevalent and developers require appropriate tools to effectively utilise them.

    Current approaches to working with LLMs, such as prompting and fine-tuning, are limited, since developers do not fully understand how the models produce outputs from their inputs.

    To address this, Martian has developed a model mapping technique to turn transformers into programs, allowing developers to understand how models work and make use of them more effectively.

    The first application of this technique is the model router, which can determine the best LLM to use for each query and route it in real time to achieve the best performance at the lowest cost.

    Martian says this is the first commercial application of large-scale AI interpretability, claiming better results than GPT-4 at a lower cost.

  • Published on

    AutoGen has been developed as an open-source Python package to help developers build complex applications using large language models (LLMs). The framework simplifies the design, implementation and automation of LLM workflows, reducing coding effort by more than four times, according to its developers.

    To use AutoGen, users first define a set of agents, which combine LLMs with human and tool-based intelligence to handle specific tasks. These agents can then engage in automated chat to solve workflows, with their interactions governed by reusable and composable behaviours.

    As well as reducing the complexity of LLM workflows, AutoGen also supports the creation of entirely new applications, such as conversational chess. The researchers behind AutoGen are now encouraging users to trial the package and provide feedback.
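
    The agents-plus-automated-chat pattern can be sketched in plain Python (scripted stand-in agents, not AutoGen's actual classes): two agents exchange messages until a termination marker appears in a reply.

```python
# Toy sketch of automated two-agent chat (hypothetical, not AutoGen's API).

class ScriptedAgent:
    def __init__(self, name, replies):
        self.name = name
        self.replies = iter(replies)  # scripted stand-in for an LLM

    def reply(self, message):
        return next(self.replies)

def automated_chat(a, b, opening, max_turns=10):
    transcript = [(a.name, opening)]
    speaker, msg = b, opening
    for _ in range(max_turns):
        msg = speaker.reply(msg)
        transcript.append((speaker.name, msg))
        if "TERMINATE" in msg:  # termination condition governing the chat
            break
        speaker = a if speaker is b else b  # alternate speakers
    return transcript

assistant = ScriptedAgent("assistant", ["plan: sort then search", "TERMINATE"])
user_proxy = ScriptedAgent("user_proxy", ["looks good, finish up"])
log = automated_chat(user_proxy, assistant, "solve the task")
```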

  • Published on

    Mistral AI has released Mistral 7B, a 7.3 billion parameter language model, which it claims outperforms Llama 2 13B on all benchmarks and Llama 1 34B on many.

    The model uses Grouped-query attention for faster inference and Sliding Window Attention to handle longer sequences.
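
    Sliding Window Attention restricts each position to attend only to the previous `w` positions, keeping per-layer cost linear in sequence length while stacked layers still propagate longer-range context. A minimal sketch of the attention mask (1 = may attend):

```python
# Sketch of a sliding-window causal attention mask: position i may attend
# only to positions j with i - w < j <= i.

def sliding_window_mask(seq_len, window):
    return [
        [1 if (i - window) < j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
```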

    It is being released under the Apache 2.0 licence, with the company’s reference implementation and deployment options on various clouds.

    Mistral 7B has been fine-tuned for chat, achieving better performance than Llama 2 13B on MT-Bench, a benchmark for evaluating multi-turn instruction-following.

    The company said it was looking forward to working with the community on developing moderation for the model to allow it to be used in environments that require guardrails for outputs.

  • Published on

    ChatDev is a virtual software company powered by various intelligent agents with distinct roles, including CEO, CTO, product officer, programmer, reviewer, tester and designer. These agents work collaboratively, participating in specialised functional seminars to design, code, test and document software.

    The framework is based on large language models and offers incremental development and human-agent interaction modes. Users can activate the designer agent to generate images. The Art mode is available now.

    ChatDev launched as a SaaS platform to enable software developers and entrepreneurs to build software at low cost and with a low barrier to entry.

  • Published on

    Meta has released Code Llama, a large language model specialising in coding that it claims outperforms other publicly available options on code-related tasks.

    Based on Meta’s LLM Llama 2 and built on a foundation of 500 billion code-related training tokens, Code Llama can generate code from prompts, as well as offering code completion and debugging support for a range of popular programming languages.

    Three versions are being released, with 7, 13 and 34 billion parameters, with the 7B and 13B models also including “fill-in-the-middle” capabilities to support code completion.
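
    Fill-in-the-middle works by splitting the code around the cursor into a prefix and a suffix and asking the model to generate the middle. The sketch below uses placeholder sentinel strings; real models use their own special tokens, so treat the exact format as an assumption.

```python
# Sketch of fill-in-the-middle prompt assembly. <PRE>/<SUF>/<MID> are
# placeholder sentinels for illustration, not a specific model's tokens.

PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_prompt(prefix, suffix):
    # the model is expected to continue after <MID> with the missing code
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"

source = "def add(a, b):\n    return CURSOR\n"
prefix, suffix = source.split("CURSOR")
prompt = fim_prompt(prefix, suffix)
# the model's completion is then spliced back between prefix and suffix
```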

    While Meta recommends the use of Code Llama to assist software engineers, the company also warns developers not to use it for general natural language tasks, noting it is “not designed to follow natural language instructions”.

  • Published on

    Meta has introduced Llama 2, the next generation of its open-source large language model, in partnership with Microsoft.

    Llama 2 is free for research and commercial use, available through various cloud providers including Microsoft Azure and Amazon Web Services.

    The model comes in multiple versions, including pretrained and fine-tuned conversational variants.

    It supports multiple languages and focuses on responsible AI development, offering resources like a Responsible Use Guide and Acceptable Use Policy.

    Meta emphasises transparency and safety, having conducted red-teaming exercises and created a transparency schematic.

  • Published on

    MetaGPT is a multi-agent framework that assigns different roles to GPTs, forming a collaborative software entity capable of tackling complex tasks.

    MetaGPT can take a one-line requirement and output user stories, competitive analysis, requirements, data structures, APIs and documents.

    The framework includes product manager, architect, project manager and engineer roles, covering the entire process of a software company. Its core philosophy is Code = SOP(Team): standard operating procedures materialised by applying them to teams composed of LLMs.
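
    The role pipeline described above can be sketched in a few lines of Python. Everything here — the role names, the `act` method and the stand-in for an LLM call — is a hypothetical illustration of the Code = SOP(Team) idea, not MetaGPT's actual classes:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Role:
        name: str
        produces: str

        def act(self, upstream: str) -> str:
            """Hypothetical stand-in for an LLM call performing this role's SOP."""
            return f"{self.produces} based on [{upstream}]"

    # a pipeline of roles mirroring a software company's process
    PIPELINE = [
        Role("product manager", "user stories and competitive analysis"),
        Role("architect", "data structures and APIs"),
        Role("project manager", "task breakdown"),
        Role("engineer", "code"),
    ]

    def run_company(requirement: str) -> str:
        # Code = SOP(Team): each role applies its procedure to the
        # artefact produced by the role before it
        artefact = requirement
        for role in PIPELINE:
            artefact = role.act(artefact)
        return artefact

    output = run_company("a todo-list web app")
    ```

    The one-line requirement flows through each role in turn, with every agent's output becoming the next agent's input.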

  • Published on

    GPT Researcher is a tool that uses AI to undertake online research. It aims to address the problems of speed, determinism and reliability that researchers often face.

    The tool uses “planner” and “execution” agents to generate research questions, which are then used to search for relevant information online. This information is then filtered and aggregated into a research report.

    The tool is intended to produce factual and unbiased research whilst offering customisation options and using over 20 web sources to reduce the risk of incorrect information.

    The average research task takes around three minutes and costs around $0.10.
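
    The planner/execution split can be illustrated with a minimal sketch. The agent functions below are hypothetical placeholders — not GPT Researcher's API — showing only how the two stages fit together:

    ```python
    def planner_agent(query: str) -> list[str]:
        """Hypothetical planner: turn a query into research questions."""
        return [f"{query} - question {i}" for i in range(1, 4)]

    def execution_agent(question: str) -> str:
        """Hypothetical executor: search the web and summarise findings."""
        return f"findings for '{question}'"

    def research(query: str) -> str:
        questions = planner_agent(query)                    # generate research questions
        findings = [execution_agent(q) for q in questions]  # gather information per question
        # filter and aggregate the findings into a single report
        return "\n".join(findings)

    report = research("state of open-source LLMs")
    ```

    Splitting planning from execution is what lets the real tool fan a single query out over many web sources before aggregating a report.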

  • Published on

    Superagent is a framework for building AI assistants with custom knowledge, brand identity, and external APIs.

    It allows developers to add AI-assistant functionality to their applications without requiring expertise in AI or machine learning.

    Superagent supports production use cases and provides a cloud platform for easy deployment. It offers features such as data sources, API connectivity, memory, and reporting.

    Examples of AI assistants built with Superagent include a legal document analyser, customer support chatbot, educational content generator, automated sales assistant, and code review assistant.

    Superagent is also open source, allowing users to contribute to its development.

  • Published on

    Claude 2, the upgraded version of Anthropic’s conversational AI model, is now available to the public in the US and UK.

    With improvements in performance, reasoning and coding abilities, the AI model can now assist users in tasks like writing documents and solving mathematical problems.

    The model can also power chat experiences and has been made available to businesses through the Claude API, which is being offered at the same price as its predecessor.

    Additionally, the model can handle inputs of up to 100,000 tokens, allowing it to work through hundreds of pages of text.

  • Published on

    PrivateGPT is an API that allows users to ask questions about their documents using the power of large language models (LLMs) without risking a data leak.

    The project is production-ready and provides two logical blocks: a high-level API that manages ingestion, splitting, metadata extraction, embedding generation and storage, and a low-level API for implementing more complex pipelines.

    It also offers a Gradio UI client to test the API and tools such as a bulk model download script and a documents folder watch.

    Based on the LlamaIndex framework, PrivateGPT is designed to be easily extended and adapted to user needs, using dependency injection, clear abstractions and a simple design.
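
    The high-level block's pipeline — ingestion, splitting, storage and retrieval, all on the local machine — might be sketched roughly as follows. This is an illustrative toy (bag-of-words retrieval stands in for real embeddings), not PrivateGPT's actual API:

    ```python
    class LocalIndex:
        """Toy sketch of a local ingestion/retrieval pipeline: documents are
        split, stored and queried without leaving the machine."""

        def __init__(self, chunk_size: int = 40):
            self.chunk_size = chunk_size
            self.chunks: list[str] = []

        def ingest(self, document: str) -> None:
            # split the document into fixed-size chunks and store them locally
            for i in range(0, len(document), self.chunk_size):
                self.chunks.append(document[i:i + self.chunk_size])

        def query(self, question: str) -> str:
            # retrieve the stored chunk sharing the most words with the question;
            # a real deployment would hand this context to a local LLM
            q = set(question.lower().split())
            return max(self.chunks, key=lambda c: len(q & set(c.lower().split())))

    index = LocalIndex()
    index.ingest("PrivateGPT keeps documents on the local machine. " * 3)
    context = index.query("where are the documents kept")
    ```

    The low-level API exposes the individual stages of such a pipeline so that more complex variants can be assembled.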

  • Published on

    Inflection AI has launched Pi, a digital assistant designed to be kind, supportive and curious.

    Pi is intended to be a confidante, creative partner and sounding board, as well as a source of knowledge based on each user’s interests.

    Built using the company’s own technology, Pi is intended to be a companion that puts human relationships first.

    Pi is available on several platforms, including Instagram, Facebook Messenger and WhatsApp, and can continue a conversation wherever the user goes. It is free to use and can be downloaded from the Apple App Store.

  • Published on

    LLaVA (Large Language-and-Vision Assistant) is an end-to-end trained large multimodal model that combines a vision encoder and the Vicuna large language model to enable general-purpose visual and language understanding.

    With 158,000 unique language-image instruction-following samples, LLaVA achieves impressive chat capabilities that sometimes mimic the behaviours of multimodal GPT-4 on unseen images and instructions.

    LLaVA utilises a two-stage instruction tuning procedure to align features and fine-tune the model end-to-end for visual chat and science question answering applications. Early experiments show LLaVA yields an 85.1% relative score compared to GPT-4 on a synthetic multimodal instruction-following dataset.

    The developers have open-sourced the GPT-4 generated visual instruction tuning data, model and code base to support further research. LLaVA demonstrates the potential for large multimodal models to enable powerful visual-language understanding and reasoning capabilities.

  • Published on

    BabyAGI is a lightweight Python script that uses OpenAI’s API to create and execute tasks in an infinite loop, storing the results in a vector database such as Chroma or Weaviate.

    The script is designed to run continuously: it pulls the first task from a task list, sends it to an execution agent that uses the OpenAI API, enriches and stores the result in the database, and then creates new tasks based on the objective and the result of the previous task.

    Tasks are created and reprioritised using OpenAI’s API. The script can be run either directly or inside a Docker container, and supports a range of OpenAI models as well as the Llama model through Llama.cpp.
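
    The control loop described above can be sketched in a few lines of Python. The `llm` function and the in-memory store below are hypothetical stand-ins for the OpenAI API and the vector database — a sketch of the loop's shape, not BabyAGI's actual implementation:

    ```python
    from collections import deque

    def llm(prompt: str) -> str:
        """Hypothetical stand-in for an OpenAI API call."""
        return f"result of: {prompt}"

    def run_task_loop(objective: str, first_task: str, max_iterations: int = 3):
        tasks = deque([first_task])
        memory = []  # stand-in for a vector store such as Chroma or Weaviate
        while tasks and max_iterations > 0:
            task = tasks.popleft()                      # 1. pull the first task
            result = llm(f"{objective}: {task}")        # 2. execution agent runs it
            memory.append((task, result))               # 3. enrich and store the result
            new_task = llm(f"new task given {result}")  # 4. create follow-up tasks
            tasks.append(new_task)
            # 5. reprioritise the list (no-op here; BabyAGI asks the LLM to reorder)
            max_iterations -= 1
        return memory

    history = run_task_loop("write a report", "outline the report")
    ```

    The real script has no iteration cap — it loops until stopped — which is why its results need to accumulate in an external store rather than in memory.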

  • Published on

    AutoGPT is a project designed to make the creation and use of AI agents as accessible as possible.

    It provides tools to enable developers to build, test and use AI agents, eliminating boilerplate code to allow them to focus on the essential elements of their creations.

    As well as the tools, AutoGPT also provides an arena in which developers can measure the performance of their agents using an objective benchmark, and earn recognition for their best agents on a leaderboard.

    AutoGPT uses the agent protocol standard from the AI Engineer Foundation to keep things consistent and ensure it works well with many current and future applications. This standardises how your agent communicates with the frontend and benchmark.

  • Published on

    Nomic AI has developed GPT4All, an open-source ecosystem of large language models that can run on most GPUs and CPUs, including consumer-grade hardware.

    The ecosystem supports bindings for Python, TypeScript, Go, C# and Java, as well as an API for running LLM inference from Docker containers.

    Users can access the models via a desktop chat client that supports a range of open-source models, with offline build functions available for older versions.

    The ecosystem is supported by compute partner Paperspace. Nomic AI encourages external contributions through its Discord channel and issue tracker.

  • Published on

    Anthropic has announced the launch of two new products built on its research into training helpful, harmless and honest AI systems: Claude and Claude Instant.

    Based on natural language processing, the products are aimed at improving productivity in the workplace and education settings by summarising information, generating creative writing, aiding coding and search functions, and offering Q&A capabilities.

    Early adopters include tutoring service Juni Learning, which uses Anthropic to power its Juni Tutor Bot; productivity tool Notion; legal contract specialists Robin AI; and search engine DuckDuckGo.

    The company is also working with audio AI company AssemblyAI.