GPT-4o with scheduled tasks (jawbone) is available in beta

141 points5 monthschatgpt.com

imsotiredspacex • 5 months ago

This is the prompt describing the function call parameters:

When calling the automation, you need to provide three main parameters: 1. Title (title): A brief descriptive name for the automation. This helps identify it at a glance. For example, "Check for recent news headlines". 2. Prompt (prompt): The detailed instruction or request you want the automation to follow. For example: "Search for the top 10 headlines from multiple sources, ensuring they are published within the last 48 hours, and provide a summary of any recent Russian military strikes in the Lviv Oblast." 3. Schedule (schedule): This uses the iCalendar (iCal) VEVENT format to specify when the automation should run. For example, if you want it to run every day at 8:30 AM, you might provide:

BEGIN:VEVENT RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0 END:VEVENT

Optionally, you can also include: • DTSTART (start time): If you have a specific starting point, you can include it. For example:

BEGIN:VEVENT DTSTART:20250115T083000 RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0 END:VEVENT

In summary, the call typically includes: • title (string): A short name. • prompt (string): What you want the automation to do. • schedule (string): The iCal VEVENT defining when it should run.

notachatbot123 • 5 months ago

What is the source for that claim?

thenameless7741 • 5 months ago

https://gist.github.com/thenameless7741/a1957c2898d80ce99ebd...

notachatbot123 • 5 months ago

That's different text.

dmadisetti • 5 months ago

The beta is inconsistently showing (required a few refreshes to get something to show up), but my limited usage of it showed a plethora of issues:

- Assumed UTC instead of EST. Corrected it and it still continued to bork

- Added random time deltas to my asked times (+2, -10 min).

- Couple notifications didn't go off at all

- The one that did go off didn't provide a push notification.

---

On top of that, only usable without search mode. In search mode, it was totally confused and gave me a Forbes article.

Seems half baked to me.

Doing scheduled research behind the scenes or sending a push notification to my phone would be cool, but surprised they thought this was OK for a public beta.

gukov • 5 months ago

You'd think Open AI's dev velocity and quality would be off the charts since they live and breathe "AI." If a company building ChatGPT itself often delivers buggy features then it doesn't bode well for this whole 'AI will eat the world' notion.

practice9 • 5 months ago

Well none of the labs have good frontend or mobile engineers or even infra engineers

Anthropic is ahead in this because they keep their UIs simplistic so the failure modes are also simple (bad connection)

OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).

Find it hilarious and sad that o1-pro just times out thinking on very long or image-intense chats. Need to reload page multiple times after it fails to reply and maybe answer will appear (or not? Or in 5 minutes?). Kinda shows they’re not testing enough and “not eating their own food” and feels like chatgpt 3.5 ui before the redesign

lolinder • 5 months ago

> Anthropic is ahead in this because they keep their UIs simplistic ... OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).

What's funny is that OpenAI's Canvas was their attempt to copy Anthropic's Artifacts! So it's not like Anthropic is stagnant and OpenAI is at least shipping, Anthropic is shipping and OpenAI can't even copy them right.

jeffgreco • 5 months ago

It's a good point, Anthropic is being VERY choosy and winds up knocking it out of the park with stuff like Artifacts. Meanwhile their MacOS app is junk, but obviously not a priority.

cma • 5 months ago

> because they keep their UIs simplistic

How do I edit a sent message in the Claude Android app? It's so simplistic I can't find it.

siquick • 5 months ago

You can’t edit on iOS either

golergka • 5 months ago

So far, I've found AI to be a great force multiplier in green field, small projects. In a huge corporate codebase, it has the power of advanced refactoring (which doesn't touch more than a handful files at a time) and a CSS wizard.

cruffle_duffle • 5 months ago

According to all the magazines I've been reading, all that is required is to just prompt it with "please fix all of these issues" and give it a bulleted list with a single sentence describing each issue. I mean, it's AI powered and therefore much better than overpaid prima-donna engineers, so obviously it should "just work" and all the problems will get fixed. I'm sure most of the bugs were the result of humans meddling in the AI's brilliant output.

Right now, in fact, my understanding is OpenAI is using their current LLM's to write the next generation ones which will far surpass anything a developer can currently do. Obviously we'll need to keep management around to tell these things what to do, but the days of being a paid software engineer are numbered.

xarope • 5 months ago

I think you forgot the /s (sarcasm) in your post!

ineedasername • 5 months ago

When I have it do a search I have to tell it to just get all the info it can in the search but wait for the next request. The I explicitly tell it we’re done searching and to treat the next prompt as a new request but using the new info it found.

That’s the only way I get it to have a halfway decent brain after a web search. Something about that mode makes it more like a PR drone version of whatever I asked it to search, repeating things verbatim even when I ask for more specifics in follow-up.

emkee • 5 months ago

Can you give an example prompt for this approach?

imsotiredspacex • 5 months ago

i posted the system prompt part describing the function call; if you read it and adjust your prompt for creating the task it works way better.

potatoman22 • 5 months ago

I'd rather have buggy things now than perfect things in a year.

dmadisetti • 5 months ago

Doesn't need to be perfect- but using this would actively reduce productivity

sprobertson • 5 months ago

First impressions matter, if the experience is this bad you're probably waiting a year to come back anyway.

jahewson • 5 months ago

Worked out great for Sonos when their timers and alarms didn’t work.

broknbottle • 5 months ago

Found the PM

arthurcolle • 5 months ago

DateTime stuff is generally super annoying to debug. Can't fault them too badly. Adding a scheduler is a key enabling idea for a ton of use cases

sensanaty • 5 months ago

> Can't fault them too badly

The same company that touts their super hyper advanced AI tool that can do everyone's (except the C-level's, apparently) jobs to the world can't figure out how to make a functional cron job happen? And we're giving them a pass, despite the bajillions of dollars that M$ and VC is funneling their way?

Quite interesting they wouldn't just throw the "proven to be AGI cause it passes some IQ tests sometimes" tooling at it and be done with it.

arthurcolle • 5 months ago

it would explain the bugs if they used the AI to make the datetime implementation though

cbeach • 5 months ago

Agreed on date/time being a frustrating area of software development.

But wouldn't a company like OpenAI use a tick-based system in this architecture? i.e. there's an event emitter that ticks every second (or maybe minute), and consumers that operate based on these events in realtime? Obviously things get complicated due to the time consumed by inference models, but if OpenAI knows the task upfront it could make an allowance for the inference time?

If the logic is event driven and deterministic, it's easy to test and debug, right?

singron • 5 months ago

The original cron was programmed this way, but it has to examine every job every tick to check if it should run, which doesn't scale well. Instead, you predict when the next run for a job will be and insert that into an indexed schedule. Then each tick it checks the front of the schedule in ascending order of timestamps until the remaining jobs are in the future.

This is also a bad case in terms of queueing theory. Looking at Kingmans equation, the arrival variance is very high (a ton of jobs will run at 00:00 and much fewer at 00:01), and the service time also has pretty high variance. That combo will either require high queue delay variance, low utilization (i.e. over-provosioning), or a sophisticated auto-scaler that aggressively starts and stops instances to anticipate the schedule. Most of the time it's ok to let jobs queue since most use cases don't care if a daily or weekly job is 5 minutes late.

dmadisetti • 5 months ago

Yeah, they're not exactly a scrappy startup- I'd be surprised if they had 0 QA.

Makes me wonder if they internally have "press releases / Q" as an internal metric to keep up the hype.

airstrike • 5 months ago

Maybe that's the Q* we've been hearing rumors about

ttul • 5 months ago

Amazon had an insane number of people working on just the alarms feature in Alexa when they interviewed me for a position years ago. They had entire teams devoted to the tiniest edge case within the realm of scheduling things with Alexa. This is no doubt one of the biggest use cases in computing: getting your computer to tell you what to do at a given time.

qgin • 5 months ago

Recurring schedules across time zones is an unbelievably maddening thing to implement. At first glance it seems simple, but it gets very weird very quickly.

uncomplexity_ • 5 months ago

this.

some people cant even wrap gheir heads around it, taking hours and hours of discussions. still my favourite problem though.

wkat4242 • 5 months ago

Yeah summer time in different countries switching on different days and often in a different direction (other hemisphere). I used to work on such matters and those weeks were the toughest.

ethbr1 • 5 months ago

Developers when they first start working with time across timezones: "This is a technical problem."

Developers after more research: "Oh... this is a political problem."

echeese • 5 months ago

Considering my iPhone alarm still sometimes fails to go off (it just shows the alarm screen silently), I'd be inclined to believe you.

ineedasername • 5 months ago

Thanks for that— I though I was going crazy (well still could be I guess) or had some strange habit or gesture I didn’t realize was silencing the alarm somehow.

emptiestplace • 5 months ago

https://www.theverge.com/2025/1/9/24340238/apple-iphone-alar...

yakz • 5 months ago

Whenever I have to wake for something that I absolutely can’t miss, I set 2-3 extra reminders 5 minutes apart precisely because of this “silent alarm” bug. It’s only happened to me a couple of times but twice was enough to completely destroy my trust in the alarm. The first time I thought I just did something in my sleep to cause it, but the UI shows it as if the alarm worked. I’m lucky to have the privilege that if I oversleep an hour or so it’s no big deal, otherwise ye olde tabletop alarm clock would be back.

emptiestplace • 5 months ago

I love the questioning my sanity before I've completely opened my eyes part. It's like a jump start to my day.

pedroslopez • 5 months ago

Hah - I also just assumed that I was turning the alarm off in my sleep without noticing. I started doubting it and really wish there was a log of when you tapped snooze or stopped the alarm...

This is too much of a dev feature for apple to implement and there are probably third party apps that do this, but meh

paul7986 • 5 months ago

Open AI just needs to create & release their own phone with Microsoft's help! H.E.R. the movie phone.

Apple has not innovated in years and a GPT Phone where your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you would give Apple a run for it's money! Pick up your phone & see your agent waiting to assist & it could be skinned to look like a deceased loved one (mom still guiding your through life).

To get things done it would interface with other AI Agents of businesses, companies, your doctor, friends & family to schedule things & used as a knowledgebase.

Maybe this is their step towards creating said agents?

elicksaur • 5 months ago

> your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you

I just… don’t want this. I don’t think anyone I know wants this.

paul7986 • 5 months ago

Cool, thanks for the comment!

I use chatGPT now for almost everything and when in the car have full back n forth conversations to get things done (or as a knowledge-base) there too. Recently i was discussing with it how do i properly get rid (junk) my old car in Pennsylvania. It provided all the steps and gave me local businesses. Though it didn't call them or interface with them to find their available times/costs, tell me such details & have me instruct it to schedule my preferred choice. Which i wish it did and prompted thoughts how it could do so, as technology that gets adopted mostly is tech that has simplified our lives.

I think my concept above is similar to what was seen in the movie H.E.R. (Joaquin Phoenix & Scarlett Johansson starred) so it's not that crazy or odd. Throwing in skinning it to be whoever like a deceased loved one, might to probably is.

elicksaur • 5 months ago

I wouldn’t want my grandmother “skinned” as a bland amalgamation of all of the internet’s text. That’s fucking horrifying.

H.E.R. is a deeply unsettling movie.

android521 • 5 months ago

And gmail schedule delivery just won't work if you want to email yourself a month later.

cbeach • 5 months ago

I'm sure it's brilliant, but I have no idea what it's capable of. What will it do? Send me a push notification? Have an answer waiting for me when I come back to it in a while?

I switched over to the "GPT4o with scheduled tasks" model and there were no UI hints as to how I might use the feature. So I asked it "what you can you follow up later on and how?"

It replied "Could you clarify what specifically you’d like me to follow up on later?"

This is a truly awful way to launch a new product.

benaduggan • 5 months ago

After asking it to schedule something, it prompted me to allow or block notifications, so sounds like this is just chatGPT scheduling push notifications? We'll see!

jerpint • 5 months ago

So basically canibalizing Siri ?

1propionyl • 5 months ago

Siri has access to a wealth of private existing and future on-device APIs to fuel context sensitive responses to queries on vendor locked devices used all day long. (Which Apple has apparently decided to just not use yet.)

OpenAI doesn't, they just have a ton of funding and (up to recently) a good mass media story, and the best natural language responses.

The moat around Siri is much deeper, and I don't really see any evidence OpenAI has any special sauce that can't be reproduced by others.

My prediction is that OpenAI's reliance on AI doomerism to generate a regulatory moat falters as they become unable to produce step changes in new models, while Apple's efforts despite being halting and incomplete become ubiquitous thanks to market share and access to on device context.

I wouldn't (and don't) put my money in OpenAI anymore. I don't see a future for them beyond being the first mover in an "LLM as a service" space in which they have no moat. On top of that they've managed to absorb the worst of criticism as a sort of LLM lightning rod. Worst of all, it may turn out that off-device isn't even really necessary for most consumer applications in which case they'll start to have to make most of their money on corporate contracts.

Maybe something will change, but right now OpenAI is looking like a bubble company with no guarantee to its dominant position. Because it is what it is: simply the largest pooling of money to try to corner this market. What else do they have?

secfirstmd • 5 months ago

1propionyl • 5 months ago

The counter argument is that Google doesn't maintain any of those services beyond the bare minimum for customer facing interactions, and exchanges between their services are even more poorly supported if they even exist at all.

Remember Google Sheets (already the Tonka Toys of spreadsheets) adding named tables to Sheets?

You can't use them in any of the AppsScript APIs. You have to fall back to manual searching for strings and index arithmetic.

Google Drive still barely supports anything like moving an entire folder to another folder.

They have failed at least a half dozen times now to deliver a functional chat/VOIP app after they already had one in Google Talk.

They regularly sunset products that actually have devoted and zealous user bases for indiscernible reasons.

Android is just chugging along doing nothing interesting and still carrying the same baggage it did before. It's a painful platform to develop for and the Jetpack Compose/Kotlin shift hasn't ameliorated much of that at all.

Their search offering is now worse than Bing, worse than Kagi, and worse than some of the LLM based tools that use their index. It's increasingly common that you can't even find a single link that you know an entire verbatim sentence from via Google search for inexplicable reasons. Exact keyword or phrase searches no longer work. You can't require a keyword in results.

I don't trust Google to deliver a single functional software product at this point, let alone a compelling integration of many different ones developed in different siloes.

About the only thing going for them is how many people still have Gmail accounts from that initial invite only and generous limits campaign... 20 years ago?

Google is not a healthy company. I don't invest in them anymore, and barring some major change I probably won't again. It's a dying blue chip which is a terrible position to have your money in.

P.S. oh, and Gemini is awful by comparison in both price and quality to competitors. It isn't saving them. It's just a "me too".

P.P.S. I'm personally just waiting for their next "game changing" announcement bound to fail to get in at the top floor on shorting what stock I have. It's one of those cases where finance has rose coloured glasses based on brand name that anyone who's used Google products for years would be thoroughly disabused of.

jerpint • 5 months ago

There are so many opportunities for google to improve their services.

For example, I found myself asking Claude about places to see in a city I’m visiting while switching back and forth to gmaps. This would have been a much better experience integrated directly with gmaps knowledge graph

siva7 • 5 months ago

Yep, this is a truly bad feature launch. I have no clue what this model does. Did they somehow lose their competent product people?

cbeach • 5 months ago

Ah, I've just stumbled on some hints after clicking around.. click on your avatar image (top right) and then click "Tasks"

Then there are some UI hints.

"Remind me of your mom's birthday on [X] date"

Wow, really maximising that $10bn GPU investment!

danpalmer • 5 months ago

Glad to see that the thriving 2010 market of TODO list apps will see a resurgence in the AI era.

delichon • 5 months ago

A todo app that you can write and modify by editing a natural language prompt, and that can parse inputs from the whole web with flexibility and nuance, is not a small thing.

danpalmer • 5 months ago

That also seems to not get timezones right, has a confusing search function...?

More seriously, todo apps are about productivity, not just about becoming a huge bucket of tasks. I've always found that the productivity comes from getting context out of my head and scheduled for the right time. This release appears to be more about that big bag of tasks and less about productivity. I'm all for AI in products, I think it can be powerful, but I've not had a use-case for it in my todo app.

frontalier • 5 months ago

prettyblocks • 5 months ago

It could get really interesting if they allow webhooks and structured output

sandspar • 5 months ago

Maybe it's effective at hitting a goal which you do not see.

PittleyDunkin • 5 months ago

Where are the release notes?

Edit: I suppose they'll be here at some point: https://help.openai.com/en/articles/9624314-model-release-no...

These seem like extremely shitty release notes. I have no clue why anybody pays for this model.

ben_w • 5 months ago

You might want this? It's more technical than the one you linked to:

https://platform.openai.com/docs/changelog

sunaookami • 5 months ago

Does this show "Invalid DateTime" only for me? Kinda ironic! https://i.imgur.com/ZAcwhxT.png

ben_w • 5 months ago

Not for me, this time; they're "December, 2024", "Dec 18", "Dec 17" respectively.

Recently someone shared a link to one of their chat sessions here, and it reliably 404'd for me but not others.

speedgoose • 5 months ago

It has consistently been the best model for the two last years and only Gemini is perhaps slightly better now.

PittleyDunkin • 5 months ago

Right, but free models you run on your local computer are just as good for 99% of use cases and don't cost an arm and a leg.

throwup238 • 5 months ago

The docs for the beta seem to already be up: https://help.openai.com/en/articles/10291617-scheduled-tasks...

TheJCDenton • 5 months ago

Nothing yet

sky2224 • 5 months ago

Pretty useless so far. I'm not sure what the intended application of this is so far, but I wanted it to schedule some work for me.

It only scheduled the first thing and that was after having to be specific by saying "7:30pm-11pm". I wanted to say "from now to 11pm" but it did couldn't process "now"

sandspar • 5 months ago

If you find a tool useless then it's likely that you lack imagination.

sky2224 • 5 months ago

Okay, let's say I do lack imagination: please enlighten me after you've had a chance to actually use this half-baked feature.

mulmboy • 5 months ago

https://www.theverge.com/2025/1/14/24343528/openai-chatgpt-r...

phgn • 5 months ago

What am I supposed to see at the link?

swifthesitation • 5 months ago

You click the drop down menu for model selection and choose 4o with scheduled tasks

elyase • 5 months ago

There is more information in these twitter threads:

https://x.com/karinanguyen_/status/1879270529066262733 https://x.com/OpenAI/status/1879267276291203329

encoderer • 5 months ago

Founder of Cronitor.io here — if you’re a developer considering using this, would it be valuable for you to be able to report in to Cronitor when it runs so we can keep an eye and alert you if your tasks are late, skipped or accidentally deleted?

We support just about every other job platform but I’d love to hear from potential users before I hack something together.

simple10 • 5 months ago

The UI is different in the desktop app for macOS. The ability to edit the schedule task is only available in the web UI for me.

I got the best results by not enabling Search the Web when I was trying to create tasks. It confuses the model. But scheduled tasks can successfully search the web.

It's flaky, but looks promising!

throwaway314155 • 5 months ago

Less relevant but why isn't canvas available in the desktop app? I thought they had feature parity but it seems not.

nycdatasci • 5 months ago

Lots of complaints mentioned here. If you have a legitimate need for a product like Tasks that is more fully baked, I’d encourage you to check out lindy.ai (no affiliation). I’ve been using it to send periodic email updates on specific topics and it works flawlessly.

luke-stanley • 5 months ago

Whoops! Might have been built wrong? I'm seeing a source map error: Source map error: Error: request failed with status 404 Stack in the worker:networkRequest@resource://devtools/client/shared/source-map-loader/utils/network-request.js:43:9

Resource URL: https://cdn.oaistatic.com/assets/jbl0aowda306m4s1.js Source Map URL: jbl0aowda306m4s1.js.map

Also I am getting`Unable to display this message due to an error.`a lot.

kgeist • 5 months ago

So I opened "gpt4o with scheduled tasks" in the mobile app and there was no hint in the UI how to use it. I asked, "what's a scheduled task" and it answered with a generic response about scheduled tasks in general. Then I tried my luck and said, "remind me to pet my cat in 5 minutes," and it seemed to work. I then closed the mobile app, but no push notification came after 5 minutes, however I got an email, which I didn't expect (I expected push notifications). Clearly the feature needs more polish.

abrichr • 5 months ago

It seems there are some issues with the rollout.

Me:

> Give me positive feedback every hour

ChatGPT:

> Provide positive feedback

> Next run Jan 15, 2025

> Got it! I’ll send you positive feedback every hour.

An hour later, I received the following email:

```

Your scheduled task couldn't be completed

ChatGPT tried to complete Provide positive feedback multiple times, but it encountered an error and wasn't able to send. It will try again the next time this task is scheduled.

Open chat If you have any questions, please contact through the help center.

All the best, ChatGPT

```

halamadrid • 5 months ago

This is interesting, although I am a little confused about the purpose of ChatGPT with this feature.

We already have many implementations where at a cron interval one could call the GPT APIs for stuff. And its nice to monitor it and see how things are working etc.

So I am curious whats the use case to embed a schedule inside the ChatGPT infrastructure. Seems like a little off its true purpose?

runeblaze • 5 months ago

I think we all agree this feature seems broadly of use, and given that presumably a professional full-time 1P team was behind this feature I am gonna use this product over the other implementations

sandspar • 5 months ago

It's for normies.

TheJCDenton • 5 months ago

There is an editable tasks list and in the settings menu you can choose to receive notifications via push and/or email.

sandspar • 5 months ago

It's a tech demo to get normies used to the idea of agents. HackerNews "20 years in industry" guys are flabbergasted because it defaults to UTC so is therefore totally useless, clearly. Perhaps you live in a bubble?

serjester • 5 months ago

This seems like such a strange product decision - why clutter the interface with such a niche use case? I’m trying to imagine OpenAI’s reasoning - a new angle on long term memory maybe? Or a potential interface for their agents?

sandspar • 5 months ago

It's to warm normal people up to the fact that we have agents now.

reversethread • 5 months ago

Does the world need another reminder/todo app?

Many existing apps (like Todoist) have already had LLM integrations for a while now, and have more features like calendars and syncing.

Or do I completely not understand what this product is trying to be?

bogdan • 5 months ago

Why not? I already pay for chatgpt but I don't pay for todoist so that doesn't help me.

frontalier • 5 months ago

how does it handle timezones?

i saw no mention of them on the help article, or the ui

if i ask for a daily early morning news summary will it show up in the middle of the night or around lunch time? will it get updated when i travel? seems interesting if what you're looking for is a reminder that is not time relevant, just a thing that should happen at some point with a time precision of about 1 day.

https://help.openai.com/en/articles/10291617-scheduled-tasks...

zb3 • 5 months ago

The link doesn't work, presumably because I won't pay OpenAI which stole my API credits by making them have an "expiration date".

throwaway314155 • 5 months ago

This is shaping up to be as bad as the Sora release.

krishadi • 5 months ago

For those unable to find this, you can find it as a new model in the model drop-down menu.

rfdearborn • 5 months ago

These are best understood as scheduled tasks for the AI instead of tasks for the user.

krishadi • 5 months ago

The biggest outcome here is that now the app has memory.

picografix • 5 months ago

why are they trying to be a model provider as well as service provider

rlt • 5 months ago

Why wouldn’t they? Most big tech cos offer products at multiple layers of the stack.

golergka • 5 months ago

Couldn't you do the same with giving an LLM access to your shell and a cron command?

onemoresoop • 5 months ago

Would you give an LLM that priviledge?

reverseblade2 • 5 months ago

lol. I was a going to build this. Even purchased the domain alert.now I even have news based version active implementation at alarms.global if you install it to your phone as PWA you get push notifications when something important happens in your region or can notify you before public holidays

I even have an automated x account @alarmsglobal

geepytee • 5 months ago

Imagine being an engineer on the Siri team, must be so demoralizing.

sagarpatil • 5 months ago

A glorified reminder? Really?

mempko • 5 months ago

Sorry, I simply cannot use OpenAI because it's leadership is kissing the ring of Trump.

ablation • 5 months ago

Friend, I've got some news about the leadership of the majority of tech services you will use over the next 4-8 years...

ldjkfkdsjnv • 5 months ago

This is going to eat software, and is the beginning of agents. The orchestrator of these tasks will come, and OpenAI will turn into a general purpose compute system, the endgame of workflow software. Soon there will be a database, and your prompts will be able to directly read and write to an openai hosted postgres instance. And your CRUD app will begin to disappear. Programming will feel pointless

rglover • 5 months ago

Possibly, but that's going to require 100% consistent, accurate outputs (tricky as that's not the nature of LLMs).

Otherwise, you'll have a lot of systems dependent on these orchestrators creating hard-to-debug mistakes up and down the pipeline. With software, you can reach a state where it does what you tell it to without having to worry if some model adjustment or API change is going to break the output.

If they solve that, then yes. Otherwise, what I personally expect is a lot of businesses rushing into implementing "agents" only to backpedal later when they start to have negative material effects on bottom lines.

ldjkfkdsjnv • 5 months ago

Its inevitable. You can argue about what's possible right now, but I'm not looking at it from that angle. I think these issues will be solved with time

ryan93 • 5 months ago

They are using infinity compute and can’t do simple notifications. How will changing the architecture slightly or ingesting more data change that?

rglover • 5 months ago

That belief is at odds with the mechanics of how LLMs work. It's not a question of more effort/investment/compute/whatever, it's just a reality of how the underlying systems work (non-deterministic). If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.

People want us to be at "Her" levels of AI, but we're at a far earlier stage. We can fake certain aspects of that (using TTS), but blindly trusting an AI to run everything is going to be a big mistake in the short-term. And in order for the inevitability of what you describe to take place, the predecessor(s) to that have to work in a way that doesn't scare people and businesses away.

The plowing of money and hype into the current forms of AI (not to mention the gaslighting about their ability) makes me think the real inevitability is a meltdown in the next 5-10 years which leads to AI-hesitancy on a mass scale.

ldjkfkdsjnv • 5 months ago

ben_w • 5 months ago

potatoman22 • 5 months ago

Why? Past progress =/= equal rate of future progress.

worldsayshi • 5 months ago

Sure but do they have a moat here? Anyone that can connect to an LLM could make that app.

zb3 • 5 months ago

Yes, they have the name "ChatGPT". For non-technical people this appears to be the most important thing.

nozzlegear • 5 months ago

Is it a household name? Anecdotally, only two of my five millennial/gen-z siblings use an AI app at all, and one of them calls her's "Gary" instead of ChatGPT. I'd be interested in seeing some actual data showing how much ChatGPT is an actual household name versus one that us technical people assume is a household name due to its ubiquity in our space.

ben_w • 5 months ago

> Is it a household name?

I think it is, yes.

It was interviewed under that name on one of the UK's main news broadcasts almost immediately after it came out. Few hundred million users. Anecdotes about teachers whose students use it to cheat.

But who knows. I was surprising people about the existence of Wikipedia as late as 2004, and Google Translate's augmented reality mode some time around the start of the pandemic.

ldjkfkdsjnv • 5 months ago

Does AWS have a moat on cloud computing?

scarface_74 • 5 months ago

Yes, it would take 10s of billions of dollars to recreate the infrastructure as far as servers and AWS has its own pipelines running under the oceans.

Then you have to recreate all of the services on top of the AWS.

Then you have to deal with regulations and certifications.

Then you have to convince decision makers to go against their own interests. “No one ever got fired for Amazon”.

Then you have to convince corporations to spend money to migrate.

worldsayshi • 5 months ago

Yes that requires huge infrastructure investments. Creating an LLM requires huge investments. Running an LLM requires medium to big investments but using one remotely require very little investment.

daveguy • 5 months ago

This significantly overestimates the reliability of LLMs -- both their output integrity and their ability to understand context.

throwaway314155 • 5 months ago

Bit of advice: you might want to actually use an offering before claiming it is revolutionary.

ldjkfkdsjnv • 5 months ago

I've got 15 years of engineering experience, worked on some of the largest distributed systems at FAANG. Its coming

scarface_74 • 5 months ago

> worked on some of the largest distributed systems at FAANG.

As have 10s of thousands of other people who could invert a btree on the whiteboard….

throwaway314155 • 5 months ago

Oh wow good for you! Didn't realize you were a prodigy or that this was a contest. I take it all back. /s

Maybe try some humility. You're not helping yourself with the bragging about frankly underwhelming and common (here) experience.