Has anyone actually done anything of use with autogpt? I tried it for 4 tasks and...

furyofantares · on June 4, 2023

It is likely we can do better than 1:1 human input to GPT output on current tech; but the human in the loop is doing a lot of work very easily that the LLM is very bad at, just like the LLM is doing a lot of work very easily that is otherwise laborious for the human. We can't just take the things that the LLM is bad at and humans do easily and expect to fix it with more LLM.

Right now we have:

Step 1: Human reasoning, tool use, input.

Step 2: LLM output.

Step 3: Human reasoning, tool use, input.

Step 4: LLM output.

&etc.
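
As a rough sketch, the alternation above can be written as a loop. Everything here is hypothetical and only for illustration: `human_in_the_loop`, `fake_llm`, and `fake_human` are made-up names, and the stubs stand in for a real model call and a real person.

```python
from typing import Callable, List


def human_in_the_loop(
    llm: Callable[[str], str],
    human: Callable[[str], str],
    first_input: str,
    max_rounds: int = 4,
) -> List[str]:
    """Alternate human input and LLM output, returning the transcript."""
    transcript: List[str] = []
    prompt = first_input
    for _ in range(max_rounds):
        transcript.append(f"human: {prompt}")  # Steps 1, 3, ...: human input
        output = llm(prompt)
        transcript.append(f"llm: {output}")    # Steps 2, 4, ...: LLM output
        # The human vets the output (reasoning, tool use) and writes the next input.
        prompt = human(output)
        if not prompt:  # an empty reply means the human is satisfied
            break
    return transcript


# Hypothetical stand-ins so the sketch runs without any API access.
def fake_llm(prompt: str) -> str:
    return f"draft for '{prompt}'"


def fake_human(output: str) -> str:
    return "" if "v2" in output else "v2 please"


transcript = human_in_the_loop(fake_llm, fake_human, "write a summary")
```

Closing the loop amounts to replacing `human` with another LLM call, which is exactly the part I think is too early.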

The observation that the input and output are both just text makes it possible to make "agents". But the "agent" movement trying to totally close the whole loop right away is way too early.

It's fine to lay the groundwork though, and the frameworks for it, like AutoGPT, can be used to just do a couple extra steps rather than close the whole loop.

Plugins and browsing can be seen as merging some of step 2 and 3. But then you still need the &etc iteration with the human closely in the loop.

Chain of thought prompting techniques are similarly an attempt to merge a little bit of the human's process of vetting the output by trying to get better output in individual iterations. Sometimes I make the LLM output multiple options and pick the best one with its reasoning; this is really just compressing multiple runs of the LLM and having it pick one, rather than me retrying if I get a bad output.
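
Spelled out as separate calls, the "multiple options, pick one" idea looks roughly like this. It is a minimal sketch with made-up names (`best_of_n`, `fake_generate`, `fake_choose`); the stubs are placeholders for real model calls, and in practice I do it in a single prompt.

```python
from typing import Callable, List


def best_of_n(
    generate: Callable[[str], str],
    choose: Callable[[str, List[str]], int],
    task: str,
    n: int = 3,
) -> str:
    """Generate n candidate answers, then let a chooser pick one of them."""
    candidates = [generate(f"{task} (option {i + 1})") for i in range(n)]
    index = choose(task, candidates)
    # Fall back to the first candidate if the chooser returns a bad index.
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]


# Hypothetical stand-ins for the two model calls.
def fake_generate(prompt: str) -> str:
    return f"answer to: {prompt}"


def fake_choose(task: str, candidates: List[str]) -> int:
    return len(candidates) - 1  # pretend the last option was judged best


picked = best_of_n(fake_generate, fake_choose, "name a color")
```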

Anyway I think this is the right way to look at it; these are good tools for trying to compress iterations of human-in-the-loop. For some things maybe we'll eventually remove the human, but we shouldn't expect it right now. The twitter demonstrations of "it did the whole thing" are a trick; good for influencers, but not realistic right now.

coffeebeqn · on June 4, 2023

Right - programming is how you get computers to do things. AI isn’t magic

dinvlad · on June 4, 2023

Very well put, thanks for laying it out so clearly.

ttul · on June 4, 2023

In my experience, AutoGPT is limited primarily by the poor state of its tools. For instance, browsing web pages often does not return relevant text that a human would pick out of the same page content. GPT-4 makes very good plans of what it should do, but the tools fail to give it what a human would receive.

For example, when asked to search for the top executives at company X, it rightly uses Google Search with the query “top executives at company X,” which returns a list of web pages such as the company’s About page. It then parses the About page but because of messed up page formatting, it returns nonsense data like the LinkedIn profile URL and some marketing material like a case study link, even though the executive profiles are right there.

The google search function is also limited. For comparison, SerpAPI masterfully scrapes Google Search using a proxy network and very intelligent parsing. In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.

Fortunately, a lot of people are contributing to AutoGPT now and it is improving quickly. They are revamping the core right now and I expect it will work far better when they are done. With time, better tools will be made available to GPT-4 and progress should then be faster.

ilyazub · on June 15, 2023

> The google search function is also limited. For comparison, SerpAPI masterfully scrapes Google Search using a proxy network and very intelligent parsing. In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.

Thanks for your kind words. We are working on SerpApi integration for Auto-GPT: https://github.com/serpapi/public-roadmap/issues/905

behnamoh · on June 4, 2023

> In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.

would love to see how you implemented this with guidance. Did you use GPT4?

morisy · on June 4, 2023

The ratio of 45 second “Twitter video demos” vs. examples of actual code/prompts/real world use cases you can replicate is quite striking. Dipping into related discords, I feel like I’m always missing something obvious because there is so much activity but what feels like to me so little replicable substance. I’m a terrible coder so I partially chalk it up to that but it definitely seems like it’s hitting the current boundaries of a parrot echoing itself into gibberish.

bko · on June 4, 2023

Can you help me understand autogpt? Is it just a recursive gpt, where an initial prompt is given and the output can then be piped into other gpt prompts? Am I missing something?

I tried doing more complex tasks using GPT4 and was initially optimistic about plugins but they have all been very disappointing.

For instance, a dream for me would be something like: "Find some rental property opportunities within 1 hour commute to New York City, that have a high rent to sale price and low taxes"

Broken down into steps it would be:

1. Find towns within 50 miles or so from within Manhattan. Take the top 100 or so by population

2. Find commute times for each one leaving at 9am monday and coming back at 6pm. Narrow down the cities to 1 hour. Unfortunately I didn't see any map plugins but maybe something like wolfram alpha can suffice or just google commute time for each town

3. Use zillow to pull typical rents and sale prices for each town. build a simple model (maybe wolfram) to model the rent and apply them to homes for sale including taxes. calculate median expected rent / median sale price

4. Remove towns that you don't have enough data on (not enough rentals or homes for sale) and return the top towns with a few examples of how much you can get
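
The filtering and ranking part of steps 2 to 4 is easy to write by hand once the data is there. Here is a minimal sketch. The `Town` fields, the thresholds, and all the numbers are made up for illustration, and it assumes rents, prices, and commute times were already pulled from somewhere, which is the part the plugins fail at.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass
class Town:
    name: str
    commute_minutes: int
    monthly_rents: List[float]
    sale_prices: List[float]
    annual_tax_rate: float  # property tax as a fraction of sale price


def rank_towns(
    towns: List[Town],
    max_commute: int = 60,
    min_listings: int = 3,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank towns by (annual median rent - tax) / median sale price."""
    ranked = []
    for t in towns:
        if t.commute_minutes > max_commute:
            continue  # step 2: drop towns outside the commute limit
        if len(t.monthly_rents) < min_listings or len(t.sale_prices) < min_listings:
            continue  # step 4: drop towns without enough data
        price = median(t.sale_prices)
        annual_rent = 12 * median(t.monthly_rents)
        net_yield = (annual_rent - t.annual_tax_rate * price) / price
        ranked.append((t.name, round(net_yield, 4)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Invented sample data, not real listings.
towns = [
    Town("Town A", 45, [2000, 2200, 2400], [300000, 320000, 340000], 0.02),
    Town("Town B", 55, [3000, 3100, 3200], [600000, 620000, 640000], 0.01),
    Town("Town C", 90, [1800, 1900, 2000], [200000, 210000, 220000], 0.02),
    Town("Town D", 30, [1500], [250000, 260000, 270000], 0.015),
]
result = rank_towns(towns)
```

With this sample data, Town C is dropped for the commute and Town D for having too few rentals, leaving Town A ahead of Town B.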

If I were building something like autogpt, I would start with an example like this and use this almost like an integration test. Theoretically all the pieces are there, but it just falls apart very quickly. I've heard these models can't yet do "planning" and i'm not sure what that means technically but I think this kind of problem requires planning so it might be a model limitation

morisy · on June 4, 2023

This is exactly the promise and there’s a number of handwavey demos, and it feels like it should be easy enough to have something exactly like this as a “hello world,” but I haven’t seen any of the auto GPT-type that can reliably execute even a basic version of this. As others have mentioned, a little scaffolding custom for the project can work great, but having GPT build that scaffolding isn’t there, as far as I can see (I think a lot of people could benefit from a step-by-step of the parent’s use case as a proof of concept).

mikeravkine · on June 4, 2023

The search plugin is bad. I have done some exploratory work [1] on specializing search agents to understand the query and result syntax and use that to structure results but I started with an explicitly structured data source to prove the concept and then kinda wandered off.

It was my hypothesis that the variety of trash returned by things like serpapi needs to be massaged into something consistent and potentially run through a result retrieval and fine tuning stage to be useful to a high level agent like autogpt, but I didn't make it far enough to have anything working to show.

[1] https://bitbucket.org/mike-ravkine/sara/src/master/

avereveard · on June 4, 2023

The agent loop is too frail, and too prone to lose focus, especially if you also want to combine it with some sort of persistent chat context. Hardcoding the control loop instead of letting the AI figure it out was the only way I've found to extract actual work out of gpt, and it works, to an extent, with smaller models as well.
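
A minimal sketch of what a hardcoded control loop can look like: the code decides the steps and checks each answer, and the model only fills in narrow blanks. All names here (`ask_until_valid`, `run_pipeline`, `fake_llm`) and the example task are hypothetical, and the stub replaces a real model call.

```python
from typing import Callable, Dict


def ask_until_valid(
    llm: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[str], bool],
    max_retries: int = 2,
) -> str:
    """Call the model, retrying a fixed number of times if validation fails."""
    for _ in range(max_retries + 1):
        answer = llm(prompt)
        if is_valid(answer):
            return answer
    raise ValueError(f"no valid answer for prompt: {prompt!r}")


def run_pipeline(llm: Callable[[str], str], topic: str) -> Dict[str, str]:
    """Fixed two-stage flow: the code, not the model, chooses the next step."""
    outline = ask_until_valid(
        llm,
        f"List 3 section titles about {topic}, one per line.",
        lambda s: len(s.splitlines()) == 3,
    )
    sections: Dict[str, str] = {}
    for title in outline.splitlines():
        sections[title] = ask_until_valid(
            llm,
            f"Write one sentence for: {title}",
            lambda s: bool(s.strip()),
        )
    return sections


# Hypothetical stub so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("List"):
        return "Intro\nBody\nEnd"
    return "Sentence for " + prompt.split(": ", 1)[1] + "."


sections = run_pipeline(fake_llm, "agents")
```

Because each call is small and validated, a bad answer gets retried or fails loudly instead of derailing everything after it.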

rohankshir · on June 4, 2023

Yeah I played around with it for a few days to improve my python code, and narrowed the command space enough for it to be helpful 50% of the time

here's the video: https://www.loom.com/share/5e83475be2464778950f7df7e209ac2d

_yb2s · on June 4, 2023

Were you using it with gpt3 or 4? Personally, I can’t get gpt4 api access despite being a PI at a well known research institution with a project that would be great publicity for OpenAI. My theory is that people are running it mostly with 3 and then saying it’s useless. It definitely is useless with 3.

gersh · on June 4, 2023

I've used GPT4, and never got it to really do anything useful.

tmikaeld · on June 4, 2023

Same here, tried both autoGPT and autoGPT.js with GPT-4, kept failing at even the simplest of tasks.

tornato7 · on June 4, 2023

Same experience here, I really wanted it to work but it often got stuck in errors or infinite loops. Hoping it'll improve in the next few months.

dinvlad · on June 4, 2023

Same here with GPT-4 via ChatGPT Plus

supriyo-biswas · on June 4, 2023

Are you an OpenAI user who has a valid payment method attached to your account? I've noticed they approve GPT-4 as long as you have a valid payment method and a reasonable justification, which in my case was a one-liner.

blowski · on June 4, 2023

There's a waiting list to use the GPT-4 API. https://openai.com/waitlist/gpt-4-api

Anecdotally (and unsurprisingly), they seem to be prioritising those with "value-add" use cases in a variety of industries over individuals just wanting to play.

dinvlad · on June 4, 2023

A waiting list that gave preferential treatment to YC companies, that is. This is according to a recent lawsuit that was filed against OpenAI.

coffeebeqn · on June 4, 2023

Yes. I think LangChain fills in a lot of the problems and it becomes more like programming. So instead of having it “reason” about the results and plans, just code those in and sometimes call an LLM when it makes sense.

Right now it’s extremely chaotic since there’s no human correction in the process so the errors compound until you quickly reach incoherence

_yb2s · on June 4, 2023

Yes I am

phillipcarter · on June 4, 2023

That about matches my experience. It's a neat project but that's about it.

coffeebeqn · on June 4, 2023

It’s a neat version 0.0001 of agentic AI. I’m sure one day it’ll be useful but not for a few years at least

kmod · on June 4, 2023

I think it's safe to say that if people were getting value out of this then we would be hearing about it a lot