ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

08.08.2023 - 22:39

cointelegraph.com:

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.

LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.

Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on the subject of LLMs as agents.

Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers will often train an AI agent to navigate a complex digital environment as a method for studying the use of machine learning to develop autonomous robots safely.

Traditional machine learning agents aren’t typically built as LLMs due to the prohibitive costs involved with training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.

The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, something the team claims is the first of its kind.

According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments — video games

Read more on cointelegraph.com
