Has anyone actually done anything of use with AutoGPT? I tried it for 4 tasks and it would inevitably get stuck on each and produce absolutely nothing of value.
These were fairly simple tasks, like researching popular topics and writing articles on them.
It would do things like Google something, find the result wasn’t relevant, try again, get an error from one of the pages, and then seemingly start doing something completely incoherent related to the error message.
It is likely we can do better than 1:1 human input to GPT output on current tech; but the human in the loop is doing a lot of work very easily that the LLM is very bad at, just like the LLM is doing a lot of work very easily that is otherwise laborious for the human. We can't just take the things that the LLM is bad at and humans do easily and expect to fix it with more LLM.
Right now we have:
Step 1: Human reasoning, tool use, input.
Step 2: LLM output.
Step 3: Human reasoning, tool use, input.
Step 4: LLM output.
&etc.
The observation that the input and output are both just text makes it possible to make "agents". But the "agent" movement trying to totally close the whole loop right away is way too early.
It's fine to lay the groundwork though, and the frameworks for it, like AutoGPT, can be used to just do a couple extra steps rather than close the whole loop.
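To make "a couple extra steps rather than close the whole loop" concrete, here is a rough sketch. It is not how AutoGPT is implemented; `semi_auto_loop`, `llm`, `run_tool` and `review` are all made-up names, and the callables stand in for whatever model, tool and human check you actually have.

```python
from typing import Callable, List, Optional


def semi_auto_loop(
    task: str,
    llm: Callable[[str], str],            # hypothetical: prompt in, proposed action out
    run_tool: Callable[[str], str],       # hypothetical: action in, tool result out
    review: Callable[[List[str]], Optional[str]],  # human check; None means "done"
    auto_steps: int = 2,
    max_rounds: int = 5,
) -> List[str]:
    """Run a few LLM/tool steps unattended, then hand control back to a human."""
    history = [task]
    for _ in range(max_rounds):
        # The "couple extra steps": LLM output and tool use without a human.
        for _ in range(auto_steps):
            action = llm("\n".join(history))
            history.append(action)
            history.append(run_tool(action))
        # The human is still in the loop, just less often.
        feedback = review(history)
        if feedback is None:
            break
        history.append(feedback)
    return history
```

With `auto_steps=1` this is the plain human/LLM alternation listed above; raising it compresses iterations without pretending the human is gone.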
Plugins and browsing can be seen as merging some of step 2 and 3. But then you still need the &etc iteration with the human closely in the loop.
Chain of thought prompting techniques are similarly an attempt to merge a little bit of the human's process of vetting the output by trying to get better output in individual iterations. Sometimes I make the LLM output multiple options and pick the best one with its reasoning; this is really just compressing multiple runs of the LLM and having it pick one, rather than me retrying if I get a bad output.
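The "multiple options, pick the best" trick can be sketched like this. It is only an illustration: `best_of_n`, `generate` and `score` are hypothetical names, and in practice `score` might itself be another LLM call that explains its choice.

```python
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: one LLM run
    score: Callable[[str, str], float],  # hypothetical: rates a candidate for the prompt
    n: int = 3,
) -> Tuple[str, List[str]]:
    """Compress n manual retries into one call: generate n candidates, keep the best."""
    candidates = [generate(prompt) for _ in range(n)]
    best = max(candidates, key=lambda c: score(prompt, c))
    return best, candidates
```

This does the same thing as retrying by hand after a bad output, except the picking step is automated too, so it is only as good as the scoring.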
Anyway I think this is the right way to look at it; these are good tools for trying to compress iterations of human-in-the-loop. For some things maybe we'll eventually remove the human, but we shouldn't expect it right now. The Twitter demonstrations of "it did the whole thing" are a trick; good for influencers, but not realistic right now.
In my experience, AutoGPT is limited primarily by the poor state of its tools. For instance, browsing web pages often does not return relevant text that a human would pick out of the same page content. GPT-4 makes very good plans of what it should do, but the tools fail to give it what a human would receive.
For example, when asked to search for the top executives at company X, it rightly uses Google Search with the query “top executives at company X,” which returns a list of web pages such as the company’s About page. It then parses the About page but because of messed up page formatting, it returns nonsense data like the LinkedIn profile URL and some marketing material like a case study link, even though the executive profiles are right there.
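As an illustration of the kind of page cleanup that seems to be missing, here is a small sketch using only Python's standard `html.parser`. It is not what AutoGPT does; the `VisibleText` class, the `visible_text` helper and the choice of tags to skip are my own assumptions.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text a reader would see, skipping scripts, styles and page chrome."""

    # Assumption: these tags rarely hold the content we are after.
    SKIP = {"script", "style", "noscript", "nav", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the visible text of an HTML page, one chunk per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Real About pages are messier than this handles, but even a crude filter like this keeps link URLs and navigation out of what the model sees.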
The Google search function is also limited. For comparison, SerpAPI masterfully scrapes Google Search using a proxy network and very intelligent parsing. In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.
Fortunately, a lot of people are contributing to AutoGPT now and it is improving quickly. They are revamping the core right now and I expect it will work far better when they are done. With time, better tools will be made available to GPT-4 and progress should then be faster.
The ratio of 45-second “Twitter video demos” to examples of actual code, prompts, or real-world use cases you can replicate is quite striking. Dipping into related Discords, I feel like I’m always missing something obvious, because there is so much activity but what feels to me like so little replicable substance. I’m a terrible coder so I partially chalk it up to that, but it definitely seems like it’s hitting the current boundaries of a parrot echoing itself into gibberish.
Can you help me understand AutoGPT? Is it just a recursive GPT, where an initial prompt is given and the output can be piped into further GPT prompts? Am I missing something?
I tried doing more complex tasks using GPT4 and was initially optimistic about plugins but they have all been very disappointing.
For instance, a dream for me would be something like: "Find some rental property opportunities within a 1 hour commute of New York City that have a high rent-to-sale-price ratio and low taxes"
Broken down into steps it would be:
1. Find towns within 50 miles or so of Manhattan. Take the top 100 or so by population.
2. Find commute times for each one, leaving at 9am Monday and coming back at 6pm. Narrow down the towns to those within 1 hour. Unfortunately I didn't see any map plugins, but maybe something like Wolfram Alpha can suffice, or just Google the commute time for each town.
3. Use Zillow to pull typical rents and sale prices for each town. Build a simple model (maybe in Wolfram) of the rent and apply it to homes for sale, including taxes. Calculate median expected rent / median sale price.
4. Remove towns that you don't have enough data on (not enough rentals or homes for sale) and return the top towns with a few examples of how much you can get.
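The ranking part of the steps above is easy to write by hand once the data exists, which is roughly the point. A sketch, with everything assumed: the `Town` fields, the `net_yield` formula (annual median rent minus property tax, divided by median sale price) and the thresholds are my own choices, and getting the commute and Zillow data in is the part that actually falls apart.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Town:
    name: str
    commute_minutes: float      # step 2, however it was obtained
    monthly_rents: List[float]  # step 3, rental listings
    sale_prices: List[float]    # step 3, homes for sale
    tax_rate: float             # annual property tax as a fraction of sale price


def net_yield(town: Town) -> float:
    """Annual median rent net of property tax, relative to median sale price."""
    price = median(town.sale_prices)
    annual_rent = 12 * median(town.monthly_rents)
    return (annual_rent - town.tax_rate * price) / price


def rank_towns(
    towns: List[Town],
    max_commute: float = 60,
    min_listings: int = 3,
    top: int = 5,
) -> List[Town]:
    """Steps 2 and 4: filter by commute and data volume, then rank by yield."""
    eligible = [
        t for t in towns
        if t.commute_minutes <= max_commute
        and len(t.monthly_rents) >= min_listings
        and len(t.sale_prices) >= min_listings
    ]
    return sorted(eligible, key=net_yield, reverse=True)[:top]
```

Something like this could also serve as the fixed half of an integration test: feed the agent's scraped data into it and compare against a hand-collected set.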
If I were building something like AutoGPT, I would start with an example like this and use it almost like an integration test. Theoretically all the pieces are there, but it just falls apart very quickly. I've heard these models can't yet do "planning", and I'm not sure what that means technically, but I think this kind of problem requires planning, so it might be a model limitation.
This is exactly the promise, and there are a number of handwavey demos, and it feels like it should be easy enough to have something exactly like this as a “hello world,” but I haven’t seen any of the AutoGPT-type tools reliably execute even a basic version of this. As others have mentioned, a little scaffolding custom-built for the project can work great, but having GPT build that scaffolding isn’t there, as far as I can see (I think a lot of people could benefit from a step-by-step of the parent’s use case as a proof of concept).
The search plugin is bad. I have done some exploratory work [1] on specializing search agents to understand the query and result syntax and use that to structure results but I started with an explicitly structured data source to prove the concept and then kinda wandered off.
It was my hypothesis that the variety of trash returned by things like SerpAPI needs to be massaged into something consistent, and potentially run through a result retrieval and fine-tuning stage, to be useful to a high-level agent like AutoGPT, but I didn't make it far enough to have anything working to show.
The agent loop is too frail and too prone to losing focus, especially if you also want to combine it with some sort of persistent chat context. Hardcoding the control loop instead of letting the AI figure it out was the only way I've found to extract actual work out of GPT, and it works, to an extent, with smaller models as well.
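A hardcoded control loop can look something like the sketch below, using the "research a topic and write an article" task as the example. All names here (`research_and_write`, `search`, `llm`) are hypothetical placeholders, and the prompts are just illustrative.

```python
from typing import Callable, List


def research_and_write(
    topic: str,
    search: Callable[[str], List[str]],  # hypothetical: query in, page texts out
    llm: Callable[[str], str],           # hypothetical: prompt in, completion out
    max_sources: int = 3,
) -> str:
    """The order of steps is fixed in code; the model only fills in narrow subtasks."""
    # Step 1: the code decides what to search for and how many pages to read.
    pages = search(topic)[:max_sources]
    if not pages:
        raise ValueError(f"no search results for {topic!r}")

    # Step 2: one small, self-contained LLM call per page, with no shared context.
    notes = [
        llm(f"Summarize the following for an article on {topic}:\n{page}")
        for page in pages
    ]

    # Step 3: a final LLM call that sees only the notes, not the raw pages.
    joined = "\n".join(notes)
    return llm(f"Write a short article on {topic} using these notes:\n{joined}")
```

Because the model never chooses the next step, an error on one page cannot send the whole run off chasing the error message.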
Were you using it with gpt3 or 4? Personally, I can’t get gpt4 api access despite being a PI at a well known research institution with a project that would be great publicity for OpenAI. My theory is that people are running it mostly with 3 and then saying it’s useless. It definitely is useless with 3.
Are you an OpenAI user who has a valid payment method attached to your account?
I've noticed they approve GPT-4 as long as you have a valid payment method and a reasonable justification, which in my case was a one-liner.
Anecdotally (and unsurprisingly), they seem to be prioritising those with "value-add" use cases in a variety of industries over individuals just wanting to play.
Yes. I think LangChain addresses a lot of the problems, and it becomes more like programming. So instead of having it “reason” about the results and plans, you just code those in and sometimes call an LLM when it makes sense.
Right now it’s extremely chaotic since there’s no human correction in the process, so the errors compound until you quickly reach incoherence.