- general
Awesome-AI-Agents
A collection of autonomous agents 🤖️ powered by LLM. - Jenqyang/Awesome-AI-Agents
12 Jun 2026 GitHub
- general
AgentLLM
AgentLLM is a PoC for browser-native autonomous agents - idosal/AgentLLM
12 Jun 2026 GitHub
- multi-agent
AutoGen
A programming framework for agentic AI. - microsoft/autogen
10 Jun 2026 GitHub
- tool-use
Arcade AI
MCP Server Framework and Tool Development library for building custom capabilities into agents. - ArcadeAI/arcade-mcp
9 Jun 2026 GitHub
- general
Manus
An AI agent for automated tasks and web app creation, capable of performing complex actions beyond simple chat responses.
8 Jun 2026 Web
- general
Mistral AI
Offers frontier models and a flexible AI platform for enterprise applications, including open-weight models and low-cost APIs.
8 Jun 2026 Web
- voice
Vapi
The official Python SDK for accessing Vapi's API. - VapiAI/server-sdk-python
8 Jun 2026 GitHub
- tool-use
Composio
Composio powers 1000+ toolkits, tool search, context management, authentication, and a sandboxed workbench to help you build AI agents that turn intent into action. - ComposioHQ/composio
8 Jun 2026 GitHub
- tool-use
Agent S2
Agent S: an open agentic framework that uses computers like a human - simular-ai/Agent-S
8 Jun 2026 GitHub
- search
Tavily
The Tavily Python SDK allows for easy interaction with the Tavily API, offering the full range of our search, extract, crawl, map, and research functionalities directly from your Python programs. E...
8 Jun 2026 GitHub
- code
Claude Code
An AI agent that assists software engineers by learning code conventions and best practices, effectively acting as a super-powered teammate.
7 Jun 2026 Hacker News
- general
Survey on Evaluation of LLM-based Agents
LLM-based agents represent a paradigm shift in AI, enabling autonomous systems to plan, reason, and use tools while interacting with dynamic environments. This paper provides the first comprehensive survey of evaluation methods for these increasingly capable agents. We analyze the field of agent evaluation across five perspectives: (1) core LLM capabilities needed for agentic workflows, such as planning and tool use; (2) application-specific benchmarks such as web and SWE agents; (3) evaluation of generalist agents; (4) analysis of agent benchmarks' core dimensions; and (5) evaluation frameworks and tools for agent developers. Our analysis reveals current trends, including a shift toward more realistic, challenging evaluations with continuously updated benchmarks. We also identify critical gaps that future research must address, particularly in assessing cost-efficiency, safety, and robustness, and in developing fine-grained, scalable evaluation methods.
7 Jun 2026 Web