Notes on arxiv.org 🧵 Massive claims coming out of this paper: a framework for artificial superintelligence built on an LLM-based agent loop that automatically discovers and optimizes new NN architectures. Let's see if it holds up.
The paper's main claim is that they have hit AI's AlphaGo moment, meaning the system designs completely new, unorthodox architectures that no human would ever come up with. For context, see en.wikipedia.org
Their system, which they call ASI-ARCH, uses LLMs as autonomous agents that do research, write their own NN architectures, and evaluate them for performance. They claim to have discovered 106 novel state-of-the-art architectures using the system.
Their agent roles seem to be:
* Researcher - look at the most recent 50 attempts and propose changes
* Engineer - train, evaluate
* Analyst - mine data for insights
* A fitness function - mimics a genetic algorithm to pick the best design
* Rinse and repeat
Prompts are provided in the appendix.
Their testing ground was improving DeltaNet. They provide tables showing the architectures developed by the system outperforming the baseline in 20M and 340M implementations.
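To make the loop above concrete, here is a toy sketch of how I read it. Everything in it is an assumption for illustration: the function names, the single-number "design", and the fake objective are all hypothetical, and none of it is the paper's code. In the real system the researcher and analyst would be LLM calls and the engineer would actually train a network.

```python
import random
from dataclasses import dataclass


@dataclass
class Attempt:
    design: float   # hypothetical stand-in for an architecture (just one number)
    score: float    # hypothetical stand-in for benchmark performance
    note: str = ""  # the analyst's comment on this attempt


def fitness_select(attempts):
    """Fitness function: pick the best-scoring design among the given attempts."""
    return max(attempts, key=lambda a: a.score)


def researcher(parent, rng):
    """Researcher: propose a change to the selected design (random tweak here)."""
    return parent.design + rng.gauss(0.0, 0.5)


def engineer(design):
    """Engineer: 'train and evaluate'. Faked with a toy objective peaking at 3.0."""
    return -(design - 3.0) ** 2


def analyst(history, score):
    """Analyst: summarize how the new result compares with everything so far."""
    best_so_far = max(a.score for a in history)
    return "improved" if score > best_so_far else "no gain"


def run_loop(steps=200, seed=0):
    rng = random.Random(seed)
    # Seed the history with a baseline design (DeltaNet plays this role in the paper).
    history = [Attempt(design=0.0, score=engineer(0.0), note="baseline")]
    for _ in range(steps):
        recent = history[-50:]            # only the most recent 50 attempts
        parent = fitness_select(recent)   # pick the best of those
        design = researcher(parent, rng)  # propose a change
        score = engineer(design)          # train and evaluate
        note = analyst(history, score)    # mine the result for insight
        history.append(Attempt(design, score, note))  # rinse and repeat
    return fitness_select(history)


if __name__ == "__main__":
    best = run_loop()
    print(best)
```

In this sketch the only source of novelty is the random tweak in `researcher`. In ASI-ARCH that proposal step is an LLM.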
There's some interesting research here, but calling this AlphaGo levels of game changing is a bit much. IMO they are really just presenting an agentic workflow. It's interesting, but not exactly earth-shattering.
It doesn't address what I think is the elephant in the room: how does a system trained to reproduce what humans write (a standard LLM) create something that no human would even dream of, which is what AlphaGo did for Go? Essentially, I think they'd need to show that their process can break out of the constraints described in arxiv.org