Anthropic, the artificial intelligence (AI) research organization responsible for the Claude large language model (LLM), recently published landmark research into how and why AI chatbots choose to generate the outputs they do.
At the heart of the team’s research lies the question of whether LLM systems such as Claude, OpenAI’s ChatGPT and Google’s Bard rely on “memorization” to generate outputs, or whether there’s a deeper relationship between training data, fine-tuning and what the models eventually produce.
“On the other hand, individual influence queries show distinct influence patterns. The bottom and top layers seem to focus on fine-grained wording, while middle layers reflect higher-level semantic information,” Anthropic wrote, noting that in the accompanying visualization, rows correspond to layers and columns correspond to sequences.
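To give a concrete sense of what a per-layer influence comparison might look like, the sketch below is a minimal, first-order illustration rather than Anthropic’s actual method or code: it scores a hypothetical training sequence against an “influence query” by comparing per-layer gradients on a toy network. The model, the data and the per_layer_grads helper are all invented for illustration; the real research applies approximate influence functions to full-scale LLMs.

```python
# Minimal sketch (not Anthropic's code): approximate the "influence" of a
# training example on a query, layer by layer, as the alignment of their
# gradients. The tiny model and data below are purely illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 3-layer network standing in for the "bottom", "middle" and "top"
# layers discussed in the research summary.
model = nn.Sequential(
    nn.Linear(8, 16),   # "bottom" layer
    nn.ReLU(),
    nn.Linear(16, 16),  # "middle" layer
    nn.ReLU(),
    nn.Linear(16, 4),   # "top" layer
)
loss_fn = nn.CrossEntropyLoss()

def per_layer_grads(x, y):
    """Return a dict mapping parameter name -> gradient for one example."""
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return {name: p.grad.detach().clone() for name, p in model.named_parameters()}

# A hypothetical "influence query" (the output we want to explain) and a
# candidate training sequence.
query_x, query_y = torch.randn(1, 8), torch.tensor([2])
train_x, train_y = torch.randn(1, 8), torch.tensor([2])

g_query = per_layer_grads(query_x, query_y)
g_train = per_layer_grads(train_x, train_y)

# Per-layer influence score: gradient alignment restricted to each layer's
# parameters (a crude stand-in for the Hessian-weighted influence functions
# used in the actual paper).
for name in g_query:
    score = torch.sum(g_query[name] * g_train[name]).item()
    print(f"{name:12s} influence score: {score:+.4f}")
```

In this simplified framing, a larger score for a given layer suggests the training example pushes that layer’s parameters in the same direction the query depends on, loosely mirroring the bottom/middle/top distinction described above.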
According to a recent blog post from Anthropic, scientists simply don’t know why AI models generate the outputs they do.
One of the examples provided by Anthropic involves an AI model that, when given a prompt explaining that it will be permanently shut down, refuses to consent to the termination.
When an LLM generates code, begs for its life or outputs information that is demonstrably false, is it “simply regurgitating (or splicing together) passages from the training set,” the researchers ask, or is it “combining its stored knowledge in creative ways and building on a detailed world model?”
The answer to those questions lies at the heart of predicting the future capabilities of larger models. And if there is more going on under the hood than even the developers themselves can predict, answering them could be crucial to identifying greater risks as the field moves forward.
Unfortunately, AI models such as Claude remain largely opaque even to the scientists who build them.