• Published on

    StackBlitz unveiled bolt.new, an AI chat interface that helps users create full-stack applications directly in the web browser.

    Unlike other AI coding assistants, bolt.new integrates with package managers and development environments, allowing users to install and run npm packages, Vite, Next.js and other popular tools without leaving the browser.

    The platform uses WebContainers, StackBlitz's WebAssembly-based technology that runs Node.js in the browser, enabling developers to prompt, edit and debug full-stack applications in real time.

    Key features include direct integration with deployment services like Netlify and Cloudflare, database connectivity through Supabase, and the ability to share projects via URL. The system can generate production-ready applications with both frontend and backend components from natural language prompts.

    Its core components have been released as open-source software on GitHub.

  • Published on

    Aider, an AI-powered code assistant, achieved a state-of-the-art result of 26.3% on the SWE-bench Lite benchmark, surpassing the previous top leaderboard entry of 20.3% from Amazon Q Developer Agent.

    Aider’s success is attributed to its focus on static code analysis, reliable LLM code editing, and pragmatic UX for AI pair programming.

    Aider does not use RAG, vector search or tools, and does not give the LLM access to web search or the ability to execute code unilaterally.

    It is designed as an interactive tool for engineers to get real work done in real code bases through a chat interface.

    The benchmark methodology involved running aider in each problem’s git repository, with the problem statement submitted as the opening chat message. Aider scored 25.0% using GPT-4o alone, itself a state-of-the-art result at the time, before reaching 26.3% using both GPT-4o and Opus.

  • Published on

    SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice.

    It exposes a set of LM-centric commands that allow the model to browse the repository and to view, edit and execute code files.

    On the SWE-bench benchmark, the system resolved 12.29% of issues on the full test set, the best performance to date at the time of release.

    The agents can be run on any GitHub issue.
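The command-driven loop described above can be illustrated with a toy example. The command names, the in-memory "repository" and the dispatcher below are invented for this sketch, not SWE-agent's actual command set; they only show the pattern of an LM choosing from a small set of repository actions.

```python
# Toy illustration of an LM-centric command interface: the model would
# pick commands from this table based on the observations it gets back.
# The repo is a single in-memory file containing a deliberate bug.
files = {"bug.py": "def add(a, b):\n    return a - b\n"}

def view(path):
    """Return the contents of a file."""
    return files[path]

def edit(path, old, new):
    """Replace a snippet in a file, like a scoped patch command."""
    files[path] = files[path].replace(old, new)
    return "edited"

def run_tests():
    """Execute the (possibly patched) code and check its behaviour."""
    ns = {}
    exec(files["bug.py"], ns)
    return "PASS" if ns["add"](2, 3) == 5 else "FAIL"

commands = {"view": view, "edit": edit, "run_tests": run_tests}

# A model would issue something like this sequence to fix the issue:
commands["view"]("bug.py")
commands["edit"]("bug.py", "a - b", "a + b")
print(commands["run_tests"]())  # → PASS
```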

  • Published on

    Software engineering company Cognition has developed an autonomous AI software engineer, called Devin, which can plan and execute complex coding tasks and learn over time.

    Devin has been equipped with developer tools, including a shell, code editor and browser, and can work alongside human engineers or autonomously on tasks, reporting progress in real time and accepting feedback.

    The AI has been evaluated on the SWE-bench coding benchmark, which assesses a system’s ability to resolve real-world coding issues, and outperformed all previous entrants, resolving 13.86% of issues compared with 1.96% for the previous best model.

  • Published on

    LangGraph is a new open-source module built on top of the popular LangChain framework. It is designed to enable the creation of cyclical graphs, which are often needed for developing AI agent runtimes.

    While LangChain already supported the creation of custom chains, it lacked an easy way to introduce cycles into these chains. LangGraph solves this problem by providing a simple interface for defining state machines as graphs, with nodes representing different components or actions, and edges defining the flow between them.

    The release includes two pre-built agent runtimes: the Agent Executor and the Chat Agent Executor.
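The idea of a state machine expressed as a graph with cycles can be sketched in plain Python. This is not the LangGraph API; the `Graph` class, its methods and the node names below are invented for illustration, showing only why an agent runtime needs edges that loop back.

```python
# Minimal sketch of a cyclical state-machine graph: nodes transform
# state, edges route to the next node, and a route may point back to
# an earlier node, forming the cycle agent runtimes rely on.
class Graph:
    def __init__(self):
        self.nodes = {}   # name -> function(state) -> state
        self.edges = {}   # name -> function(state) -> next node name

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, name, router):
        self.edges[name] = router

    def run(self, start, state, max_steps=20):
        node = start
        for _ in range(max_steps):        # cap steps to bound the cycle
            state = self.nodes[node](state)
            node = self.edges[node](state)
            if node == "END":
                break
        return state

# A toy "agent" that keeps looping on itself until a condition holds.
g = Graph()
g.add_node("act", lambda s: {**s, "count": s["count"] + 1})
g.add_edge("act", lambda s: "END" if s["count"] >= 3 else "act")

result = g.run("act", {"count": 0})
print(result["count"])  # → 3, after cycling through "act" three times
```

In LangGraph proper the same shape is expressed with its own node/edge interface; the point here is only that the routing function, not the chain order, decides when the loop ends.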

  • Published on

    CrewAI is a new library for building and coordinating groups of AI agents, designed to work together on complex tasks.

    Its four building blocks are agents, which have distinct roles and capabilities; tasks, which are small, focused missions for agents to accomplish; tools, which are used by agents to carry out their tasks; and crews, which bring together agents, tasks and a process for coordinating their work.

    CrewAI is built on LangChain, giving developers access to a wide range of existing tools and toolkits, and to models ranging from local open-source models, served through platforms such as Ollama, to hosted providers.

    One key advantage of CrewAI is that it enables AI agents to run in the cloud, making it quick and simple to get started with the platform.

    Additionally, as CrewAI is built on LangChain, it can be debugged using LangSmith, which allows developers to inspect what calls are being made, what input is being used and what output is being generated, helping to optimise the performance of the AI agents.
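How the four building blocks fit together can be shown with a stdlib-only sketch. The class and field names below are illustrative, not the real CrewAI API, and the `summarise` tool is a stand-in for an actual LLM-backed tool.

```python
# Hypothetical sketch of agents, tasks, tools and crews: an agent has a
# role and tools, a task names the tool and agent it needs, and a crew
# runs its tasks through a simple sequential process.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    tools: dict = field(default_factory=dict)  # tool name -> callable

    def perform(self, task):
        tool = self.tools[task.tool]
        return f"[{self.role}] {tool(task.description)}"

@dataclass
class Task:
    description: str
    tool: str
    agent: "Agent"

@dataclass
class Crew:
    tasks: list

    def kickoff(self):
        # Sequential process: each task is run by its assigned agent.
        return [t.agent.perform(t) for t in self.tasks]

def summarise(text):
    return f"summary of '{text}'"  # stand-in for an LLM tool call

researcher = Agent(role="researcher", tools={"summarise": summarise})
crew = Crew(tasks=[Task("market trends", "summarise", researcher)])
print(crew.kickoff())  # → ["[researcher] summary of 'market trends'"]
```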

  • Published on

    MemGPT, developed by researchers at UC Berkeley, is a Python framework designed to create LLM agents equipped with persistent memory capabilities and customisable tools.

    Drawing inspiration from operating system design, MemGPT uses a memory management system to remember information generated over time, allowing it to deal with contexts that extend beyond the LLM’s usual limits.

    The research paper, published on arXiv, shows that MemGPT can outperform LLMs on a number of tasks requiring long-term context recollection.
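The OS-inspired memory idea can be illustrated with a toy two-tier store: a small "main context" (analogous to RAM, or the LLM's context window) that pages older messages out to an archival tier (analogous to disk), from which they can be searched back in. The class and method names are invented for this sketch; this is not MemGPT's actual interface.

```python
# Toy tiered memory: keep only the newest messages in the bounded
# context, page the rest out to an archive, and recall by search.
from collections import deque

class TieredMemory:
    def __init__(self, context_limit):
        self.context = deque()  # what would fit in the LLM prompt
        self.archive = []       # paged-out overflow, searchable later
        self.limit = context_limit

    def remember(self, message):
        self.context.append(message)
        while len(self.context) > self.limit:
            self.archive.append(self.context.popleft())  # evict oldest

    def recall(self, keyword):
        # Search archival memory for paged-out information.
        return [m for m in self.archive if keyword in m]

mem = TieredMemory(context_limit=2)
for msg in ["user likes tea", "user lives in Oslo", "user asks about tea"]:
    mem.remember(msg)

print(list(mem.context))  # only the two most recent messages remain
print(mem.recall("tea"))  # an older fact recovered from the archive
```

The real system has the LLM itself decide, via function calls, what to page in and out; the fixed eviction rule here is only the simplest possible stand-in.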

  • Published on

    AutoGen has been developed as an open-source Python package to help developers build complex applications using large language models (LLMs). The framework simplifies the design, implementation and automation of LLM workflows, reducing coding effort by a factor of more than four, according to its developers.

    To use AutoGen, users first define a set of agents, which combine LLMs with human and tool-based intelligence to handle specific tasks. These agents can then engage in automated chat to solve workflows, with their interactions governed by reusable and composable behaviours.

    As well as reducing the complexity of LLM workflows, AutoGen also supports the creation of entirely new applications, such as conversational chess. The researchers behind AutoGen are now encouraging users to trial the package and provide feedback.
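The define-agents-then-let-them-chat pattern can be sketched without the library. The `Agent` class and `automated_chat` function below are invented for illustration and the reply lambdas stand in for LLM or tool calls; this is not AutoGen's real API.

```python
# Hypothetical sketch of automated chat between two agents: messages
# bounce back and forth until one agent emits a termination marker.
class Agent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn  # stand-in for an LLM or tool call

    def reply(self, message):
        return self.reply_fn(message)

def automated_chat(a, b, opening, max_turns=6):
    transcript = [(a.name, opening)]
    speaker, listener, message = b, a, opening
    for _ in range(max_turns):
        message = speaker.reply(message)
        transcript.append((speaker.name, message))
        if "TERMINATE" in message:
            break
        speaker, listener = listener, speaker  # hand over the turn
    return transcript

# The "assistant" counts down; the "user proxy" ends the chat at zero.
assistant = Agent("assistant", lambda m: str(int(m) - 1))
user = Agent("user_proxy", lambda m: "TERMINATE" if m == "0" else m)
log = automated_chat(user, assistant, "3")
print(log[-1])  # → ('user_proxy', 'TERMINATE')
```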

  • Published on

    ChatDev is a virtual software company powered by various intelligent agents with distinct roles, including CEO, CTO, product officer, programmer, reviewer, tester and designer. These agents work collaboratively, participating in specialised functional seminars to design, code, test and document software.

    The framework is based on large language models and offers incremental development and human-agent interaction modes. Users can also activate the designer agent, via the now-available Art mode, to generate images.

    ChatDev has also launched as a SaaS platform to let software developers and entrepreneurs build software at low cost and with a low barrier to entry.

  • Published on

    MetaGPT is a multi-agent framework that assigns different roles to GPT instances, forming a collaborative software entity able to tackle more complex tasks than a single model.

    MetaGPT can take a one-line requirement and output user stories, competitive analysis, requirements, data structures, APIs and documents.

    The framework includes product managers, architects, project managers and engineers, modelling the entire process of a software company. Its core philosophy is Code = SOP(Team): standard operating procedures materialised by applying them to teams composed of LLMs.

  • Published on

    Superagent is a framework for building AI assistants with custom knowledge, brand identity, and external APIs.

    It allows developers to add AI-assistant functionality to their applications without requiring expertise in AI or machine learning.

    Superagent supports production use cases and provides a cloud platform for easy deployment. It offers features such as data sources, API connectivity, memory, and reporting.

    Examples of AI assistants built with Superagent include a legal document analyser, customer support chatbot, educational content generator, automated sales assistant, and code review assistant.

    Superagent is also open source, allowing users to contribute to its development.

  • Published on

    Babyagi is a lightweight Python script that uses OpenAI’s API to create and execute tasks in an infinite loop, storing the results in a vector database such as Chroma or Weaviate.

    The script runs continuously: it pulls the first task from a task list, sends it to an execution agent that calls the OpenAI API, enriches and stores the result in the database, and then creates new tasks based on the objective and the result of the task just completed.

    Tasks are created and reprioritised using OpenAI’s API. The script can be run either directly or inside a Docker container, and supports a range of OpenAI models as well as the Llama model through Llama.cpp.
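The control loop described above can be sketched in a few lines of stdlib Python. The execute, create and reprioritise steps below are placeholder functions standing in for OpenAI API calls, and the results list stands in for a vector database such as Chroma; none of this is babyagi's actual code.

```python
# Toy version of the babyagi loop: pop the first task, execute it,
# store the result, derive new tasks from the objective and the
# latest result, then reprioritise the queue.
from collections import deque

def run(objective, seed_task, steps=3):
    tasks = deque([seed_task])
    results = []  # stand-in for the vector store
    for _ in range(steps):
        if not tasks:
            break
        task = tasks.popleft()                          # 1. pull first task
        result = f"result of '{task}'"                  # 2. execution agent
        results.append(result)                          # 3. enrich + store
        tasks.append(f"follow-up to {task} for {objective}")  # 4. create
        tasks = deque(sorted(tasks))                    # 5. reprioritise
    return results

out = run("learn AI", "read one paper", steps=2)
print(out[0])  # → result of 'read one paper'
```

In the real script, steps 2, 4 and 5 are each separate LLM prompts and the stored results are embedded so step 4 can retrieve relevant context.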

  • Published on

    AutoGPT is a project designed to make the creation and use of AI agents as accessible as possible to all.

    It provides tools to enable developers to build, test and use AI agents, eliminating boilerplate code to allow them to focus on the essential elements of their creations.

    As well as the tools, AutoGPT also provides an arena in which developers can measure the performance of their agents using an objective benchmark, and earn recognition for their best agents on a leaderboard.

    AutoGPT uses the agent protocol standard from the AI Engineer Foundation to keep things consistent and ensure it works well with many current and future applications. This standardises how your agent communicates with the frontend and benchmark.