
Gemini "duck" demo was not done in realtime or with voice

1039 points · 7 months ago · twitter.com
dvsfish7 months ago

I did this at university. It was our first comp sci class ever; we were given Raspberry Pis. We had no coding experience or guidance, and were asked to create "something". All we had to work with was information on how to communicate with the Pi using PuTTY. Oddly, this assignment didn't require us to submit code, but simply to demonstrate it working.

My group (3 of us) bought a moisture sensor to plug into the Pi, and had the idea to make a "flood detection system" that would be housed under a bridge and would send an email to relevant people when the bridge home from work was about to flood.

So for our demonstration, we had a guy in the back of the class with Gmail open, ready to send an email saying some variation of "flood warning". Our script was literally just printing lines with wait statements in between. Running the script, it printed "awaiting moisture" to the screen, and after 3 seconds it would print "moisture detected". In those 3 seconds I dipped the sensor into the glass of water. Then the script would wait a few more seconds before printing "sending email to xxx@yyy.com". We then opened up our email, our mate at the back of the room hit send, an email appeared saying flood warning, and we got full marks.
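
For the curious, a minimal sketch of what a script like that might have looked like; the strings and timings are illustrative, not the original code:

    import time

    # Nothing here reads the sensor; the whole "demo" is prints with sleeps in between.
    print("awaiting moisture")
    time.sleep(3)  # long enough to dip the sensor into the glass for show
    print("moisture detected")
    time.sleep(5)
    print("sending email to xxx@yyy.com")  # the real email was sent by hand from the back row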

10u1527 months ago

Related: I work with industrial control systems. We’d call this “smoke and mirrors”. Sometimes the client would insist on seeing a small portion of a large project working well before it was ready. They’d fail to understand that 90% of the work is not visible to the user, but they’d want to see a finished state.

We’d set up a dummy HMI and have someone pressing buttons on it for the demo, and someone in the next room manually driving outputs and inputs to make it seem like it was working. Very common.

JohnFen7 months ago

I, too, work with industrial control systems. If any of us did that sort of thing, we'd be fired instantly -- and rightfully so.

nullc7 months ago

There would be no problem if you told the client that you were faking the backend behavior. If the client's motivation was to see the workflow and make sure there wasn't a misunderstanding about what you were supposed to be implementing, then a mocked backend would be perfectly fine for their purposes.

JohnFen7 months ago

Absolutely true. The comment, however, very, very strongly implied that the client wasn't made aware that the demo was faked. If the client was made aware, then why would they have someone hiding in the next room to "make it seem like it was working"?

im3w1l7 months ago

To me it sounds like a useful thing, for communicating your vision and getting early feedback on whether you are building the right thing. Like uh, a live action blueprint. Let's say the client is like "Email???? That's no good. It needs to be a text message". Now you have saved yourself the trouble of implementing email.

godelski7 months ago

> To me it sounds like a useful thing

To me it sounds like lying...

Context matters a ton, though. Are you presenting the demo as if the events are being automatically triggered (as in the OP), or are you explicitly presenting it as your plan? If it's merely implicit, it's deceptive; if you don't explicitly say which parts are faked, it's lying. Of course in a magic show this is totally okay, because you go to the show with the explicit intention of being lied to. I'm not convinced the same is true for business, but I'm sure someone could make a compelling argument.

turquoisevar7 months ago

There have literally been people sued over this because they used it to get funding; the most extreme example, where they kept doing it well beyond demo purposes, is Theranos.

And yet, you’ll still have people here acting like it’s totally fine.

As you said, it’s one thing to demonstrate a prototype, a “this is how we intend for it to work.” It’s a whole other thing to present it as the real deal.

lloeki7 months ago

Definitely a useful thing to do to validate UI/UX/fitness.

Relying on black-box components that are first mocked and later swapped for the real implementation (or kept for testing) is also a thing, especially when dealing with hardware. It's a bit hard to put moisture sensors on CI!
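
As a minimal sketch of that pattern (the names are hypothetical; the real component boundary depends on your hardware), the sensor becomes an interface and CI gets a scripted fake:

    from typing import Protocol

    class MoistureSensor(Protocol):
        def read(self) -> float: ...  # 0.0 (dry) to 1.0 (soaked)

    class FakeSensor:
        """Stands in for the real driver on CI; replays scripted readings."""
        def __init__(self, readings):
            self._readings = iter(readings)

        def read(self) -> float:
            return next(self._readings)

    def flood_alert(sensor: MoistureSensor, threshold: float = 0.8) -> bool:
        return sensor.read() >= threshold

    # The logic under test never knows it's talking to a fake:
    assert flood_alert(FakeSensor([0.95])) is True
    assert flood_alert(FakeSensor([0.10])) is False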

That said...

I would have recommended at least being open about it being a mockup, but even when doing so I've had customers telling me, "So why can't you release this tomorrow, since it's basically done!? Why is there still two months' worth of work? You're trying to rip us off!!!"

gattilorenz7 months ago

It definitely is a thing; we literally teach this to CS students as part of the design process in the UX design course.

It’s called prototyping; in this case it would be a hi-fi prototype. It lets you communicate ideas, test the implementation, and try out alternatives before committing to your final product.

Lo-fi prototyping usually precedes it and is done on pen and paper, Figma, or similar very basic approaches.

Google (unsurprisingly) uses it and has instructional videos about it: https://youtu.be/lusOgox4xMI

magicalhippo7 months ago

I've done similar things by manually inserting and updating rows in the database, etc., to "demo" some process.

Like you say, it can be useful as a way to uncover overall process or UX issues before all the internals are coded.

I'm quite open about what I'm doing though, as most of our clients are reasonable.

WWLink7 months ago

the phrase I've heard is "dog and pony show" lol

dr_dshiv7 months ago

Or “Wizard of Oz” — as a standard HCI practice for understanding human experience

chasd007 months ago

The industry term is “art of the possible”.

lifthrasiir7 months ago

I did this as well. I worked on a localized navigation system back when I was in school, and unfortunately we broke every GPS receiver we had over the course of the project---that particular model of RS-232 GPS module was really fragile. As a result we couldn't actually demonstrate live navigation (and it was incomplete anyway). We proceeded to finish the GUI nevertheless, and then pretended that it was what you would see during navigation, but we never actually ran the navigation code. It was an extracurricular activity and didn't affect GPA or anything, for the curious, but I remain kinda uneasy about it.

1vuio0pswjnm77 months ago

Learning those fraud skills needed later in the so-called "tech" industry.

gardenhedge7 months ago

Is this cheating? It sounds like cheating and reflects quite poorly on you.

thewakalix7 months ago

> It was our first comp sci class ever, we were given raspberry pi's. We had no coding experience or guidance, and were asked to create "something".

Garbage in, garbage out.

addicted7 months ago

Wow this is such an awful excuse.

Here’s a whole list of projects intended for kids.

https://all3dp.com/2/best-raspberry-pi-projects-for-kids/

It includes building out a whole weather station, with a humidity sensor among the many things it can do.

latency-guy27 months ago

Creativity is a good thing; sad to see trust abused this way.

klntsky7 months ago

More like cheaters in, cheaters out.

nullc7 months ago

It's plausible to me that they weren't provided with what they needed precisely because pervasive cheating allowed their predecessor classmates to complete the assignments.

lizard7 months ago

This can depend a lot on the context, which we don't have a lot of.

Looking at this a different way, they gave first-year students, likely with no established prerequisites, an open-ended project with fixed hardware but no expectation to submit the final project for review. If they wanted to verify that the students actually developed a working program, they could have easily asked for the Pis to be returned along with the source code.

A project like this was likely intended to get the students to think about the "what" and not worry so much about the "how." Faking it entirely may have gone a bit further than intended, but it would still meet the goal of getting the students to think about what they could do with this computer (if they knew how).

While university instructors can vastly underestimate students' creativity, they are, generally speaking, not stupid. At the very least, they know that if you don't tell students to submit their work, you can often count on them doing as little as possible.

godelski7 months ago

> If they wanted to verify the students actually developed a working program, they could have easily asked for the Pi's to be returned along with the source code.

Wait, is your argument honestly "it's not cheating because they just trusted the students"?

There's a huge difference between demoing something as "this is what we did" vs "we didn't quite get there, but this is what we're envisioning."

Edit: You all are responding very weirdly. The cheating is because you're presenting "something" that is not that thing. Put a dog in a dress and call it a pretty woman and I'll call you a conman.

lizard7 months ago

No, the argument is, "It's not cheating because it wasn't a programming assignment."

jimkoen7 months ago

> Put a dog in a dress and call it a pretty woman and I'll call you a conman.

Well, if you're the TA and you're unwilling/too lazy to call out the conman, I call you an accomplice! Also, since when was the ideal of scientific rigour ever built on interpersonal trust?

Jagerbizzle7 months ago

It certainly reflects poorly on the institution for not requiring anything other than a dog and pony show for grading.

whatever17 months ago

BS. The CEO of one of the largest public companies just did it and he is fine. Board and shareholders all happy.

theGnuMe7 months ago

I gleefully await Matt Levine’s article on AI demos as securities fraud.

meiraleal7 months ago

Well done, you're halfway to securing a job at Google; no ethics/morals needed.

elwell7 months ago

Of course it's cheating

JohnFen7 months ago

It's very obviously cheating. They didn't do what the assignment asked.

Uptrenda7 months ago

I'd call it cheating too, but yeah. I like the Pi and sensors though. Sounds like the start of something cool. Wish I could get a product like this to put in my roof to detect leaks. That would be useful.

icedchai7 months ago

If the teacher was competent, they would've asked to see the code.

frail_figure7 months ago

The view from up on that high horse must be interesting! Were you the kid who reminded teachers about homework?

Literally all that matters is that they passed.

gardenhedge7 months ago

> Were you the kid who reminded teachers about homework?

Are you trying to bully me or something? Not going to work with me. You've revealed your poor character with that comment.

Kuraj7 months ago

Kind of? Yes, but they still demonstrated as much as was expected of them, which was very little to begin with.

Moomoomoo3097 months ago

It depends on what the intention of the assignment was. If it was primarily to help the students understand what these devices _could_ be used for, then it's fine. If it was to have them actually do it, well, then the professor should have at least tried to verify that. Given that it's for first-years who have no experience programming, it really could be either.

motbus37 months ago

Well, you literally had a backend

JCharante7 months ago

do things that don't scale

masteruvpuppetz7 months ago

more like backhand lol

defaultcompany7 months ago

There’s a story about sales guys at a company (NewTek?) who faked a demo at CES of an Amiga 500 with two monitors, showing the “Boing” ball bouncing from one screen to the next. This was insane because the Amiga didn’t have support for multiple monitors in hardware or software, so nobody could figure out how they did it. Turns out they had another Amiga hidden behind it, running the same animation on the second monitor. When they started them at the right offset, it looked believable.

bonney_io7 months ago

My version of this involved a Wii remote: freshman-level CompSci class, and the group had to build a simple game in Python to be displayed at a showcase among the class. We wrote a Space Invaders clone. I found a Bluetooth driver that allowed your Wiimote to connect to your Mac as a game controller, so I set up a basic left/right tilt control using a Wiimote for our Space Invaders clone.

The Wiimote connection was the star of the show by a long shot :P

meiraleal7 months ago

Are you looking for a job at Google? Don't be evil. They have enough scammers there already, no help needed, including PR at hacker forums

cipritom7 months ago

Sounds very familiar... UoM, UK? :)

DonnyV7 months ago

This is so crazy. Google invented transformers, which are the basis for all these models. How do they keep fumbling like this, over and over? Google Docs, created in 2006! Microsoft is eating their lunch. Google created the ability to live-migrate VMs and built fully automated datacenters; Amazon and Microsoft are killing them in the cloud. Google has been working on self-driving longer than anyone; Tesla is catching up and will most likely beat them.

The number of fumbles is monumental.

sourcegrift7 months ago

I was at MS in September 2008, and internally they already had a very beautiful and well-functioning web Office (named differently; I forgot the name, but it wasn't SharePoint if I recall correctly--I think it had something to do with expense reports?) that would put Google Docs to shame today. They just didn't want to cannibalize their own product.

vitorgrs7 months ago

Microsoft demoed Office Web Apps at the 2008 PDC in L.A., it seems: https://www.wired.com/2008/10/pdc-2008-look-out-google-docs-...

flakiness7 months ago

Don't forget they also invented XHR (aka fetch) in 2001. https://en.wikipedia.org/wiki/XMLHttpRequest

benbristow7 months ago

Funny, I asked Google Bard to guess what the actual product name was from the comment.

"It was probably Office Web Apps. It was a web-based office suite that was introduced in 2008. It included Word Web App, Excel Web App, Powerpoint Web App, and OneNote Web App. It was not SharePoint, but it was based on SharePoint technology."

corndoge7 months ago

Does bard browse the web yet? Is it possible it read the parent comment?

Wild that we have to ask these questions.

happytiger7 months ago

Don’t forget that McAfee was delivering virus scanning in a browser in 1998 with ActiveX support, TinyMCE was full WYSIWYG for content in the browser by 2004, and Google Docs was released in 2006 on top of a huge ecosystem of document solutions and even some real-time co-authoring document writing platforms.

2008 is late to the party for a Docs competitor! Google gave Microsoft the runaround: after Google launched Docs, it could have clobbered Microsoft, which kind of failed to respond properly in kind. But Google didn’t push the platform hard enough to eat the corporate market share, didn’t follow up with a SharePoint alternative that would appeal to the enterprise, and kind of blew the opportunity, IMO.

I mean, to this day Google Docs is free but it still hasn’t unseated Word in the marketplace. The real killer app that keeps Office on top is Excel, which some companies have built their entire tooling around.

It’s crazy interesting to look back and realize how many twists there were leading us to where we are today.

BTW, it was Office Server or SharePoint Portal earlier (this is like FrontPage days, so around 2001?) and Microsoft called it Tahoe internally. I don’t think it became SharePoint until Office 365 launched.

The XMLHTTP object launched in 2001 and was part of the DHTML wave. That gave browsers a LOT of the capabilities we currently see as browser-based word processing, but there were earlier efforts with proprietary extensions; they just didn’t get broad support or become standards. I saw some crazy stuff at SGI in the late 90s when I was working on their Visual Workstation series launch.

addicted7 months ago

Google Apps have several other problems as well.

1. Poor Google Drive interface makes managing documents difficult.

2. You cannot just get a first class Google Doc file which you can then share with others over email, etc. Very often you don’t want to just share a link to a document online.

3. Lack of desktop apps.

amatwl7 months ago

NetDocs was an effort in 2000/2001 that is sometimes characterized as a web productivity suite. There was an internal battle between the Netdocs and Office groups, and Office won.

https://www.zdnet.com/article/netdocs-microsofts-net-poster-...

https://www.eweek.com/development/netdocs-succumbs-to-xp/

ginko7 months ago

> I was at MS in September 2008, and internally they already had a very beautiful and well-functioning web Office

So why did they never release that, and go with Office 365 instead?

rchaud7 months ago

They did, it was called Office Online with Word, PowerPoint, Excel and SkyDrive (later OneDrive). Everything got moved under the Office 365 umbrella because selling B2B cloud packages (with Sharepoint, Azure AD, Power BI, Teams, Power Automate) was more lucrative than selling B2C subscriptions.

kdamica7 months ago

Classic innovator’s dilemma!

lmm7 months ago

Interesting how it seems like MS may have been right this time? They were able to milk Office for years, and despite seeming like it might, Google didn't eat their lunch.

rchaud7 months ago

More like the Acquirer's dilemma.

Google Analytics - acquired 2005, renamed from Urchin

Google Docs - acquired 2006, renamed from Writely

YouTube - acquired 2006

Android - acquired 2005 (Samsung has done more to advance the OS than Google themselves)

renegade-otter7 months ago

Google doesn't know how to do anything else.

A product requires commitment; it requires grind. The last 10% is the most critical part, and Google persistently refuses to push products across the finish line, just giving up on them and adding to the infamous Google Product Graveyard.

Honestly, what is the point? They could just maintain the core search/ads and not pay billions of dollars for tens of thousands of expensive engineers who have to go through a bullshit interview process and achieve nothing.

lumost7 months ago

If they tried to focus on ads, then they wouldn’t have the talent to support the business. They probably don’t need 17 chat apps - but they can’t start saying no without having other problems.

Shorel7 months ago

They only hire some talent to prevent other companies from hiring them.

It's a way to strangle the competition, but it's also not good for the industry in general.

rurp7 months ago

While it is crazy, it's not too surprising. Google has become as notorious for product ineptitude as they have been for technical prowess. Dominating the fundamental research for GenAI but face-planting on the resulting consumer products is right in line with the company that built Stadia, Gmail/Inbox, and 17 different chat apps.

ren_engineer7 months ago

>Google Docs created in 2006

The tech was based on an acquired company; Google just abused their search monopoly to make it more popular (same thing they did with YT). This has been the strategy for every service they've ever made. Google really hasn't launched a decent in-house product since Gmail, and even that was grown using their search monopoly as free advertising.

>Google Docs originated from Writely, a web-based word processor created by the software company Upstartle and launched in August 2005

robertlagrant7 months ago

> Google really hasn't launched a decent in-house product since Gmail

What about Chrome? And Chromebooks?

nieve7 months ago

Sorry if this was a joke and I didn't spot it. Chrome was based on WebKit, which was itself based on KHTML, if memory serves. Chromebooks are based on a version of that outside engine running on top of Linux, which they also didn't create.

robertlagrant7 months ago

It's not a joke. Just because they didn't write everything from scratch (Chromebooks also are made with hard disks that Google didn't create from directly mining raw materials and performing all intermediate manufacturing stages) doesn't mean they haven't released successful products that they didn't just buy in.

lmm7 months ago

They used the KDE-derived HTML renderer, sure, but they wrote the whole JavaScript runtime from scratch, which was what gave it the speed they used as a selling point.

rchaud7 months ago

Chromebooks are a worse version of the netbooks from 2008, which ran an actual desktop OS. Chromebooks are OLPCs for the rich world, designed with vendor lock-in built in. They eventually end up in discount wholesale lots, if not landfills, because of how quickly they go obsolete.

camflan7 months ago

mmm, WebKit?

andiareso7 months ago

I laughed out loud for this one

impulser_7 months ago

You bring up fumbles, but they still have more products with more than a billion users than any other company in the world.

This is what Google has always cared about: bringing applications to billions of users.

People are forgetting Google is the most profitable AI company in the world right now. All of their products use ML and AI.

So who is losing?

The goal of Gemini isn't to build a chatbot like ChatGPT despite Google having Bard.

The goal for Gemini is to integrate it into those 10 products they have with a billion users.

dmix7 months ago

This is like critiquing Disney for putting out garbage and then defending them because dummies keep giving them money regardless of quality. Having standards and expectations of greatness is a good thing and the last thing you want is for mediocrity to become acceptable in society.

acdha7 months ago

> People are forgetting Google is the most profitable AI company in the world right now. All of their products use ML and AI.

> So who is losing?

The people who use their products, which are worse than they’ve been in decades? The people who make the content Google now displays without attribution in search results?

x0x07 months ago

Sure, but I think Google's commanding market share is more at risk than it has been in a long time due to their fumbles in the AI space.

rchaud7 months ago

> All of their products use ML and AI.

Is that supposed to be a vote of confidence for the current state of Google search?

qingcharles7 months ago

I demo'd a full browser office suite in 1998 called Office Wherever (o-w.com). It used Java applets to do a lot of the more tricky functions.

Shopped it around VCs. Got laughed out of all the meetings. "Companies storing their documents on the Internet?! You're out of your mind!"

renegade-otter7 months ago

Some things are just too far ahead of their time.

TheGlobe.com was basically Facebook, but the critical mass wasn't there. Nor were the smartphones.

nar0017 months ago

I'm curious whether the code is available somewhere? I have to admit I'm curious how it worked!

Tommah7 months ago

Netscape had a feature called LiveConnect that allowed interaction between Java and JavaScript. See http://medialab.di.unipi.it/web/doc/JavaScriptGuide/livecon.... for some examples of how it worked. Even though AJAX wasn't available yet in 1998, I think you could have used LiveConnect to achieve the same thing. Java applets had the ability to make HTTP requests to the originating host (the host that served the applet to the browser).

qingcharles7 months ago

Long lost! Unless Internet Archive has a copy of the site? Never checked! The main domain was officewherever.com.

overconfident597 months ago

I agreed until the last bit. Waymo is making continuous progress and is years ahead of everyone else. Tesla is not catching up and won't beat anyone. Tesla plateaued years ago and has no clue how to improve further. Their Partial Self Driving app has never been anywhere near reliable.

_the_inflator7 months ago

I say it again and again: sales, sales. Money is earned in enterprise domains.

And this business is so totally different from Google in every way imaginable.

Senior managers love customer support and SLAs - Google loves automation. Two worlds collide.

hot_gril7 months ago

Google customer support says "Won't Fix [Skill Issue]"

ASalazarMX7 months ago

Google Workspace works through resellers; Google trains fewer people, and those resellers provide the customer support instead. IMO Google's bad reputation comes from their public customer support.

michaelt7 months ago

If you want the kind of support that, when there is a fault with the product, can get the fault fixed - then unfortunately Google Workspace's support is also trash.

Good if you want someone else to google the error message for you though.

rchaud7 months ago

> IMO Google's bad reputation comes from their public customer support.

Garbage in = garbage out.

If Google cannot deign to assign internal resources and staffing towards providing first-party support for paid products, it's not a good choice over the competition. You're not going to beat the incumbent (Office 365) by skimping on customer service.

holoduke7 months ago

They are an ads company. Focus is never on "core" products.

vitorgrs7 months ago

Sundar Pichai should have been out of Google long ago.

paxys7 months ago

> Google Docs created in 2006

Word and Excel have been dominant since the early 1980s. Google has never had a real shot in the space.

acheron7 months ago

You mean 1990s? I don't think Word and Excel even existed until the late 80s, and nobody[0] used them until Windows 3.1.

[0] yes, not literally nobody. I know about the Windows 2.0 Excel or whatever, but the user base compared to WordPerfect or 1-2-3 was tiny up until MS was able to start driving them out by leveraging Windows in the early-mid 90s.

atleastoptimal7 months ago

It's reassuring that the biggest tech company doesn't automatically make the best tech. If it were guaranteed that Google's resources would automatically trump any startup in the AI field, that would likely mean guaranteed dominance by incumbents and a consolidation of power in the AI space.

w10-17 months ago

Isn't it always easier to learn from others' mistakes?

Google has the problem that it's typically the first to encounter a problem, and it has the resources to approach it (from search), but the incentive to monetize it (to get away from depending entirely on search revenue). And, management.

rurp7 months ago

I don't know if that really excuses Google in this case because it's a productization problem. Google never tried to release a ChatGPT competitor until after OpenAI had. OpenAI has been wildly successful as the first mover, despite having to blaze some new product trails. Even after months of watching them and with near-infinite resources, Google is still struggling to catch up.

hosh7 months ago

Outside of outliers like Gmail, Google didn’t get their success from product. The organization is set up for engineering to carry the day, funded by search.

An AI product that makes search irrelevant is an existential threat, but I don’t think Google has the product DNA to pull off a replacement product for search themselves. I heard Google has been taken over by more business / management types, but it is still missing product as a core pillar.

rtsil7 months ago

Considering the number of messaging apps they tried to launch, if there's at least one thing that can be concluded, it's that it isn't easier to learn from their own mistakes.

Shorel7 months ago

It's the curse of the golden goose.

They can't do anything that threatens their main income. They are tied to ads and ads technology, and can't do anything about it.

Microsoft had a crisis, and that drove focus. Google... they probably mistreat their good employees if they don't work on ads.

lern_too_spel7 months ago

I was with you until the Tesla hot take. I'd bet dollars to donuts that Tesla doesn't get to level 4 by the end of the decade. Waymo is already there.

hot_gril7 months ago

I agree, but I also bet Waymo doesn't exist by the end of the decade. Not just because it's Google but because it's hard to profit from.

smat7 months ago

I could see that in the coming years the value of Waymo for Google is not actually in collecting revenue from transportation fees but in collecting multi-modal data to feed into its models.

The amount of data that is collected by these cars is massive.

bendbro7 months ago

[flagged]

onion2k7 months ago

None of that matters. They'll still make heaps of profit long into the future unless someone beats them in Search or Ads.

AI is a threat there, but it'd require an AI company to transform the culture of Internet use to stop people 'Googling', and that will require two things: something significantly better than Google Search that's worth switching to, and a company that is willing to reject whatever offer Google makes to buy it. Neither is very likely.

mejutoco7 months ago

I would love to see internal data on search volume at Google. Depending on how you interpret them, ChatGPT can meet both of your requirements. Personally, I still mostly search instead of using ChatGPT, but I have seen other users use ChatGPT more and more.

Also "interesting" to see whether results being SEO spam generated using AI will keep SEO-driven search viable.

spaceman_20207 months ago

The difference seems to be the top leadership.

Nadella is an all time great CEO. Pichai is an uninspired MBA-type.

addicted7 months ago

Nadella is as much of an MBA type as Pichai. Their education and career paths are incredibly similar.

The difference is Nadella is a good CEO and Pichai isn’t.

Part of it could also be a result of circumstance. Nadella came in at a time when MS was foundering and had to make what appeared to be fairly obvious decisions (pivot to cloud… he was literally picked because of this, and reducing dependence on Windows… which was an obvious necessary step for the pivot to cloud). Pichai, OTOH, was selected to run Google when it was already doing pretty well. His biggest mandate was likely to not upset the apple cart.

If roles were reversed, I suspect Nadella would still have been more successful than Pichai, but you never know. If Nadella's introduction to the CEO job had been to keep things going as they were, and Pichai's had been to change the entire direction of the company, maybe a decade later Pichai would have been the aggressive decision maker, whereas Nadella would have been the overly cautious guy making canned demos.

kamaal7 months ago

>>Google Docs created in 2006! Microsoft is eating their lunch.

Of all the things, this.

I use both Google and Microsoft office products. One thing that strikes you is just how feature-rich the Microsoft products are.

Google doesn't look like it is serious about making money.

I squarely blame rockstar product managers and OKRs for this. Not everything can be a 1000%-profitable product built in the next quarter. A lot of things require small continuous improvements and care over years.

spaceman_20207 months ago

Microsoft’s killer product is Excel. I didn’t realize how powerful it was until I saw an expert use it. There are entire billion dollar organisations that would collapse without Excel.

hot_gril7 months ago

Engineer-driven company. Not enough top-down direction on the products. Too much self-perceived moral high ground. But lately they've been changing this.

Slackwise7 months ago

Uhh, no, not really; quite the opposite in fact.

Under Eric Schmidt they were engineer-driven, during the golden era of the 2000s. Nowadays they're MBA-driven, which is why they had 4 different messaging apps from different product managers.

hot_gril7 months ago

Lack of top-down direction is what allowed that situation. Microsoft is MBA-driven and usually has a coherent product lineup, including messaging.

Also, "had." Google cleaned things up. They still sometimes do stuff just cause, but it's a lot less now. I still feel like Meet using laggy VP9 (vs H.264 like everyone else) is entirely due to engineer stubbornness.

gscott7 months ago

20 versions of .NET is wonderful. Changing the names of features over and over again is great too. I am also pleased that Windows 10 is the last version of Windows.

addicted7 months ago

The golden era of the 2000s produced no revenue stream other than ads on Google Search.

hot_gril7 months ago

Exactly. I never cared for the "golden age" Google. Maybe the old days were fun, but it wasn't going to be tenable forever.

igor477 months ago

My engineer friends who work at Google would strongly disagree with this assertion. I keep hearing about all sorts of hijinks initiated by senior PMs and managers trying to build their fiefdoms.

hot_gril7 months ago

Disagree with which part? The hijinks are there, no denying it. Kind of a thing at any company, but remedied by leaders above those PMs taking charge.

mmanfrin7 months ago

> Google has been working on self driving longer than anyone. Tesla is catching up and will most likely beat them.

I agree with your general post but I disagree with this. Tesla's FSD is so far behind Google it's almost negligent on the part of Tesla despite having so much more data.

asylteltine7 months ago

I can tell you exactly why. It’s because they have a separate VP and org for each of these different products (Search, Maps, etc.). None of them talk to each other, and they all compete for promotions. There is no single shot-caller; same thing with GCP. Google does not know products.

theGnuMe7 months ago

A lot of companies have this structure. You have the Doritos line, the Pepsi line, etc… Maybe you find some common synergies, but it’s not unusual.

What would be the ideal setup, in your opinion?

Uptrenda7 months ago

Big companies are where innovation goes to die.

rhuru7 months ago

There is little to no drive inside Google to build and invent stuff, but there is massive bloat around shipping stuff.

barfingclouds7 months ago

Tesla will not beat them at self-driving, due to hardware at the very least.

larodi7 months ago

Errr, sorry, what’s the innovation of Google Docs exactly? Being able to write simultaneously with somebody else? OK, so this is what it takes for a top-notch docs app to exist? Microsoft has been developing this product for ages; Google tried to steal the show, although it had little to no experience in producing and marketing office apps…

Besides, collaborative editing is a non-feature, and there is much more important stuff than this for a word processor to be useful.

bradhe7 months ago

Microsoft eating Google's lunch on documents is laughable at best. Not to mention it confuses the entire timeline of office productivity software??

hot_gril7 months ago

Is paid MS Teams more or less common than paid GSuite? It's hard to find stats on this. GSuite is the better product IMO, but MS has a stronger B2B reputation, and anecdotally I hear more about people using Teams.

bombcar7 months ago

Nobody pays for Teams, but everyone pays for Office, and if you get Teams for free with it ...

olyjohn7 months ago

This is how it became so popular so fast. If they had charged for it, all those Teams users would still be using Zoom.

abustamam7 months ago

Does anyone use paid GSuite for anything other than docs/drive/Gmail ? In all companies I've worked at, we've used GSuite exclusively for those, and used slack/discord for chat, and zoom/discord for video/meetings.

I know that MS Teams is a more full-featured product suite, but even at companies that used it, we still used Zoom for meetings.

hot_gril7 months ago

GSuite for calendar makes sense too. Chat sucks, and Meet would be decent if it weren't so laggy, but those are two things you can easily not use.

UrineSqueegee7 months ago

I've worked at many companies in my time, and all of them used Teams except one that used Slack; but all used MS products, none used Google's.

darthrupert7 months ago

GSuite is clearly a much better product than Office 365. I feel like I'm taking crazy pills when I see so many institutions make the wrong choice here.

I base about 50% of my choice of employer on what they choose in that area.

addicted7 months ago

GSuite is an awful product for an employer.

If you have a problem there’s no one available to help you.

On the MS side, if you’re large enough, they will literally pull in an engineer who writes the code for the product you’re having a problem with to help resolve the issue.

The part you see in your browser isn’t the only part of the product a company has to buy. In fact, it’s not even the most expensive bit. If you look at the most expensive plans for most SaaS products (i.e., the enterprise plans), almost the entire difference in cost is driven by support, illustrating the importance and value of support.

Google unfortunately is awful at this.

bbarnett7 months ago

Teams will likely still be around in 20 years. I doubt gsuite will exist in 5... or even 1.

suslik7 months ago

> Microsoft is eating their lunch.

Well, that is truly shocking.

dbish7 months ago

Also hard to say it’s really true. OpenAI certainly is; but is Microsoft, without OpenAI’s tech, eating Google’s lunch?

maroonblazer7 months ago

Given MSFT's level of investment in OpenAI, and all the benefits that accrue from it, they're one and the same.

latency-guy27 months ago

It is yet to be seen whether MSFT has actually gained a benefit. Maybe from a marketing perspective it has insane potential to print big bucks, but it is a bit too soon to announce that the efforts to deliver Copilot (all tools + agents) far and wide were/are successful.

We'll get a definitive answer in a few years. 'Til then, OpenAI benefits from the $ value of their end of the products; MSFT eats the compute costs, but also gets a stock bump.

mugivarra697 months ago

[flagged]

htk7 months ago

The whole Gemini webpage and its contents felt weird to me; it's in the uncanny valley of trying to look and feel like an Apple marketing piece. The hyperbolic language, surgically precise ethnic/gender diversity, unnecessary animations, and the sales pitch from the CEO felt like a small player in the field trying to pass as a big one.

kozikow7 months ago

It's funny because now the OpenAI keynote feels like it's emulating the Google keynotes from 5 years ago.

Google Keynote feels like it's emulating the Apple keynote from 5 years ago.

And the Apple keynote looks like robots just out of an uncanny valley pretending to be humans - just like keynotes might look in 5 years, but actually made by AI. Apple is always ahead of the curve in keynote trends.

hanspeter7 months ago

You know those memes where AI keeps escalating a theme to more extreme levels with each request?

That's what Apple keynotes feel like now. It seems like each year, they're trying to make their presentations even more essentially 'Apple.' They crossed the uncanny valley a long time ago.

ilkke7 months ago

"make it feel more like a hospital"

wharvle7 months ago

I hadn’t thought about it until just now, but the most recent Apple events really are the closest real-person thing I’ve ever seen to some of the “good” computer generated photorealistic (kinda…) humans “reading” with text-to-speech that I’ve seen.

It’s the stillness between “beats” that does it, I think, and the very-constrained and repetitive motion.

tailspin20197 months ago

Is there such a concept as a “reverse uncanny valley”??

Where humans behave so awkwardly that they seem artificial but are just not quite close enough…

If so, Apple have totally nailed the reverse uncanny valley!

lmm7 months ago

Hmm. Like the "NPC fetish" stuff that was going around for a brief minute?

robbomacrae7 months ago

The more I think about this the more it rings true...

cedws7 months ago

I got the same vibes. Ultra and Pro. It feels tacky that it declares the "Gemini era" before it's even available. Google really want to be seen as level on the playing field.

kjkjadksj7 months ago

I’m imagining the project managers are patting themselves on the back for checking all the performative boxes, blind to the absolute satire of it all.

crazygringo7 months ago

> surgically precise ethnic/gender diversity

What does that mean and why is it bad?

Diversity in marketing is used because, well, your desired market is diverse.

I don't know what it means for it to be surgically precise, though.

kokanee7 months ago

I imagine the commenter was calling out what they perceived to be an inauthentic yet carefully planned facade of diversity. This marketing trend rubs me the wrong way as well, because it reminds me of how I was raised and educated as a 90s kid to believe that racism was a thing of the past. That turned out to be a damaging lie.

I don't mean to imply that companies should avoid displays of diversity, I just mean that it's obvious when it's inauthentic. Virtue signaling in exchange for business is not progress.

idk17 months ago

I think it could be seen as a good thing; it's a little chicken-and-egg. If you want to increase diversity at a company, one good way would be to represent diversity in your keynotes in order to make it look to a diverse base that they would be happy working there, thus increasing the diversity at the company.

asadotzler7 months ago

You'd prefer the alternative with just a few white guys in the picture and no consideration given at all to appearing diverse?

xwolfi7 months ago

Just take a group of people that actually know and work with each other and you're authentic. Forced diversity is idiotic: either you do it or you don't, but you should show what you're actually doing in order to be authentic.

Imagine how cringe it would be if only white guys were allowed to work at Google and they displayed a fully diverse group of non-white girls in all their marketing. That would be... inauthentic.

Just the fact that there are fewer women than men in IT is something we should demonstrate, understand, and change if needed. Not hide behind a facade of 50/50 representation everywhere, as if the problem were already solved or was even a problem in the first place.

rtsil7 months ago

It's bad if the makeup of the company doesn't reflect the diversity seen in the marketing, because it doesn't reflect any genuine value and is just for show.

Now, I don't know how diverse the AI workforce is at Google, but the YT thumbnails show precisely 50% white men. Maybe that's what the parent meant by "surgically precise".

scotty797 months ago

It's the new token black guy. It's not completely bad, it just feels inauthentic.

cheeze7 months ago

Agreed with your comment. This is every marketing department on the planet right now, and it's not a bad thing IMO. Can feel a bit forced at times, but it's better than the alternative.

ilkke7 months ago

The alternative being showing actual level of diversity in the company?

GaryNumanVevo7 months ago

Of course to normal people, this just seems like another Google keynote. If OP is counting the number of white people, maybe they're the weird one here.

davesque7 months ago

A big red flag for me was that Sundar was prompting the model to report lots of facts that can be either true or false. We all saw the benchmark figures that they published and the results mostly showed marginal improvements. In other words, the issue of hallucination has not been solved. But the demo seemed to imply that it had. My conclusion was that they had mostly cherry picked instances in which the model happened to report correct or consistent information.

They oversold its capabilities, but it does still seem that multi-modal models are going to be a requirement for AI to converge on a consistent idea of what kinds of phenomena are truly likely to be observed across modalities. So it's a good step forward. Now if they can just show us convincingly that a given architecture is actually modeling causality.

LesZedCB7 months ago

i think this was demonstrated in that mark rober promo video[1] where he asked why the paper airplane stalled by blatantly leading the witness.

"do you believe that a pocket of hot air would lead to lower air pressure causing my plane to stall?"

he could barely even phrase the question correctly because it was so awkward. just embarrassing.

[1] https://www.youtube.com/watch?v=mHZSrtl4zX0&t=277s

dom967 months ago

Yeah, this was so obvious too. Clearly Mark Rober tried to ask it what to try and got stupid answers, then tried to give it clues and had to get really specific before he got a usable answer.

ryeights7 months ago

This has got to be satire! That is too funny.

nblgbg7 months ago

The issue of hallucinations won't be solved with the RAG approach. It requires a fundamentally different architecture. These aren't my words but Yann LeCun's. You can easily see this if you spend some time playing around. The autoregressive nature won't allow LLMs to create an internally consistent model before answering the question. We have approaches like Chain of Thought and others, but they are merely band-aids that superficially address the issue.

golol7 months ago

If you build a complex Chain of Thought-style agent and then train/finetune further by reinforcement learning with this architecture, then it is not a band-aid anymore; it is an integral part of the model, and the weights will optimize to make use of this CoT ability.

phire7 months ago

It's been 3.5 years since GPT-3 was released, and just over a year since ChatGPT was released to the public.

If it were possible to solve LLM hallucinations with simple Chain-of-Thought-style agents, someone would have done so and released a product by now.

The fact that nobody has released such a product, is pretty strong evidence that you can't fix hallucinations via Chain-of-Thought or Retrieval-Augmented Generation, or any other band-aid approaches.

calf7 months ago

Ever since the "stochastic parrots" and "super-autocomplete" criticisms of LLMs, the question is whether hallucinations are solvable in principle at all. And if hallucinations are solvable, it would of such basic and fundamental scientific importance that I think would be another mini-breakthrough in AI.

adriand7 months ago

An interesting perspective on this I’ve heard discussed is whether hallucinations ought to be solved at all, or whether they are core to the way human intelligence works as well, in the sense that that is what is needed to produce narratives.

I believe it is Hinton that prefers “confabulation” to “hallucination” because it’s more accurate. The example in the discussion about hallucination/confabulation was that of someone who had been present in the room during Nixon’s Watergate conversations. Interviewed about what he heard, he provided a narrative that got many facts wrong (who said what, and what exactly was said). Later, when audio tapes surfaced, the inaccuracies in his testimony became known. However, he had “confabulated truthfully”. That is, he had made up a narrative that fit his recall as best as he was able, and the gist of it was true.

Without the ability to confabulate, he would have been unable to tell his story.

(Incidentally, because I did not check the facts of what I just recounted, I just did the same thing…)

TerrifiedMouse7 months ago

> Without the ability to confabulate, he would have been unable to tell his story.

You can tell a story without making up fiction. Just say you don’t know when you don’t know.

Inaccurate information is worse than no information.

addicted7 months ago

If “confabulation” is necessary, you can use confabulation for the use cases where it’s needed and turn it off for the use cases where you need to return actual “correct” information.

AlienRobot7 months ago

I've read similar thoughts before about AI art. When the process was still developing, you would see AI "artwork" of the most inhumanly uncanny kind - pictures that twisted the physical forms human artists perceive through the fundamental pixel format/denoising algorithms the AI works with. It was just uniquely AI, not something a human being would be able to replicate. "There are no errors, just happy accidents." You could say there was a real art medium/genre in there, with its own intrinsic worth.

After a few months, AI developers refined the process to just replicate images so they looked like a human being made them, in effect killing what was the real AI art.

GolfPopper7 months ago

The best one I've run across so far is, "spicy autocomplete".

plaidfuji7 months ago

These LLMs do not have a concept of factual correctness and are not trained/optimized as such. I find it laughable that people expect these things to act like quiz bots - this misunderstands the nature of a generative LLM entirely.

It simply spits out whatever output sequence it feels is most likely to occur after your input sequence. How it defines “most likely” is the subject of much research, but to optimize for factual correctness is a completely different endeavor. In certain cases (like coding problems) it can sound smart enough because for certain prompts, the approximate consensus of all available text on the internet is pretty much true and is unpolluted by garbage content from laypeople. It is also good at generating generic fluffy “content” although the value of this feature escapes me.

In the end, the quality of the information you get back is no better than the quality of a thorough Google search... it will just get you a more concise and well-formatted answer faster.

zmmmmm7 months ago

> because for certain prompts, the approximate consensus of all available text on the internet is pretty much true

I think you're slightly mischaracterising things here. It has the potential to be at least slightly, and possibly much, better than that. This is evidenced by the fact that it is much better than chance at answering "novel" questions that don't have a direct source in the training data. It can do this because, at a certain point, the least complex strategy for solving the optimisation problem of "what word comes next" actually becomes modeling the principles of logic and the facts connecting them. It does this in no systematic or reliable way, so you can't ever guarantee when or how well it is going to apply these principles, but it is absolutely learning higher-order patterns than simple text/pattern matching, and it is absolutely able to generalise them across topics.

plaidfuji7 months ago

You’re absolutely right, and I’m sure that something resembling higher-level pattern matching is present in the architecture and weights of the model. I’m just saying that I’m not aware of “logical thought” being explicitly optimized or designed for - it’s more of a sometimes-emergent feature of a machine that tries to approximate the content of the internet, which for some topics is dominated by mostly logical thought. I’m also unaware of a ground truth against which “correct facts” could even be trained.

Closi7 months ago

> I’m also unaware of a ground truth against which “correct facts” could even be trained for..

Seems like there are quite a few obvious possibilities here off the top of my head. Ground truth for correct facts could be:

1) Wikidata

2) Mathematical ground truth (can be both generated and results validated automatically) including physics

3) Programming ground truth (can be validated by running the code and defining inputs/outputs; see the sketch after this list)

4) Chess

5) Human labelled images and video

6) Map data

7) Depending on your viewpoint, peer-reviewed journals, as long as they're cited with sources.
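
As a minimal sketch of item 3 (the harness and the assumption that the model emits a function named "solution" are hypothetical), a generated answer counts as correct only if the code reproduces known input/output pairs when actually executed:

    def check_program_ground_truth(src, cases):
        # Run model-generated code and test it against known input/output pairs.
        # (In practice you would sandbox this; exec on untrusted code is dangerous.)
        namespace = {}
        exec(src, namespace)
        f = namespace["solution"]  # assumed entry-point name
        return all(f(*args) == expected for args, expected in cases)

    # e.g. a model-proposed "solution" for "add two numbers":
    src = "def solution(a, b):\n    return a + b"
    print(check_program_ground_truth(src, [((1, 2), 3), ((-1, 1), 0)]))  # True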

eurekin7 months ago

The first question I always ask myself in such cases: how much of the input data contains simple "I don't know" lines? Not knowing something is clearly a concept that has to be learned in order to be expressed in the output.

visarga7 months ago

What stops you from asking the same question multiple times and seeing if the answers are consistent? I am sure the capital of France is always going to come out as Paris, but the name of a river passing a small village might be hallucinated differently. Even better: use two different models; if they agree, it's probably true. And probably best: provide the data to the model in context, if you have a good source. Don't use the model as a fact knowledge base; use RAG.
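
A minimal sketch of those two checks, assuming a hypothetical ask(model, question) wrapper around whatever LLM API you use (exact string matching stands in for a real answer comparison, which would need to be fuzzier):

    from collections import Counter

    def ask(model, question):
        # Hypothetical wrapper around your LLM API of choice.
        raise NotImplementedError

    def self_consistent_answer(question, model="model-a", n=5):
        # Sample the same question several times; only trust a clear majority.
        answers = Counter(ask(model, question).strip().lower() for _ in range(n))
        best, count = answers.most_common(1)[0]
        return best if count > n // 2 else None  # None == not confident enough

    def cross_model_answer(question):
        # Two independent models agreeing is (weak) evidence against hallucination.
        a = ask("model-a", question).strip().lower()
        b = ask("model-b", question).strip().lower()
        return a if a == b else None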

plaidfuji7 months ago

Ha, probably an insignificant amount. The internet is nothing if not confidently-stated positive results, no matter how wrong they might be. No wonder this is how LLMs act.

TerrifiedMouse7 months ago

> In the end the quality of the information it will get back to you is no better than the quality of a thorough google search.. it will just get you a more concise and well-formatted answer faster.

I would say it’s worse than Google search. Google tells you when it can’t find what you are looking for. LLMs “guess” a bullshit answer.

Closi7 months ago

Not always; I think that is an unfair reflection of LLMs in their current state. See two trivial examples below:

https://chat.openai.com/share/ca733a4a-7cdb-4515-abd0-0444a4...

https://chat.openai.com/share/dced0cb7-b6c3-4c85-bc16-cdbf22...

Hallucinations are definitely a problem, but they are certainly less common than they used to be - the models will often say that they aren't sure but can speculate, or "it might be because...", etc.

TerrifiedMouse7 months ago

I get the feeling that LLMs will tell you they don’t know if “I don’t know” is one of the responses in their training data set. If they actually don’t know, i.e. no trained responses, that’s when they start hallucinating.

visarga7 months ago

> It simply spits out whatever output sequence it feels is most likely to occur after your input sequence... but to optimize for factual correctness is a completely different endeavor

What if the input sequence says "the following is true:"? Assuming it skillfully predicts the following text, that would mean telling the most likely truth according to its training data.

melagonster7 months ago

Unfortunately, this is the product they want to sell.

kristopolous7 months ago

I mean, it's a demo. Isn't this kinda what they all do?

dougmwne7 months ago

I was fooled. The model release announcement said it could accept video and audio multi-modal input. I understood that there was a lot of editing and cutting, but I really believed I was looking at an example of video and audio input. I was completely impressed, since it’s quite a leap to go from text and still images to “eyes and ears.” There’s even the segment where instruments are drawn and music was generated. I thought I was looking at a model that could generate music based on language prompts, as we have seen specialized models do.

This was all fake. It was a collection of cherry-picked, prompt-engineered examples, dramatized for maximum shareholder hype. The music example was just outputting a description of a song, not the generated music we heard in the video.

It’s one thing to release a hype video with what-ifs and quite another to claim that your new multi-modal model is king of the hill then game all the benchmarks and fake all the demos.

Google seems to be in an evil phase. OpenAI and MS must be quite pleased with themselves.

skepticATX7 months ago

Exactly. Personally I’m fine with both:

1) Forward-looking demos that demonstrate the future of your product, where it's clear that you're not there yet but are working in that direction,

or

2) Demos that show off current capabilities, but are scripted and edited to do so in the best light possible.

Both of those are standard practice and acceptable. What Google did was just wrong. They deserve to face backlash for this.

renegade-otter7 months ago

This kind of moral fraud - unethical behavior - is tolerated for some reason. It's almost like investors want to be fooled. There is no room for due diligence. They squeal like excited Taylor Swift fans as they are being lied to.

hackerlight7 months ago

This shouldn't be a surprise. Companies optimize for what benefits shareholders. Or, if there's an agency conflict of interest, companies optimize for what benefits management's career and bonuses (perhaps at the expense of shareholders). Companies pay lip service to external stakeholders, but really that's a ploy to reduce attention and the risk of regulation; there is no fundamental incentive to treat all stakeholders well.

If lying helps, which can happen if there aren't large legal costs or social repercussions on brand equity, or if the lie goes undetected, then they'll lie. This is what we necessarily get from the upstream incentives. Fortunately, lying in a marketing video is fairly low on the list of ethical violations that have happened in the recent past.

We've effectively got a governance alignment problem that we've been trying to solve with regulations, taxes and social norms. How can you structure guardrails in the form of an incentive system to align companies with ethical outcomes? That's the question and it's a difficult one. This question also applies to any form of human organization, including governments.

joshspankit7 months ago

As long as you’re not the last one out, “being fooled” can be very profitable

AndyKelley7 months ago

"phase"?

My friend, all these large corporations are going to get away with exactly as much as they can, for as long as they can. You're implying there's nothing to do but wait until they grace us with a "not evil phase", when in reality we need to be working on restoring our anti-monopoly regulation that was systematically torn down over the last 30 years.

hanspeter7 months ago

I too thought it was able to accept video.

Given the massive data volume in videos, I assumed it processed video into pictures by extracting a frame per second or something along those lines, while still taking the entire video as the initial input.

Turns out, it wasn't even doing that!

rdedev7 months ago

Seems reminiscent of a video where the lead research department within Google is an animation studio (wish I could remember more about that video)

Doing all these hype videos just to satisfy shareholders or whatever is making me lose trust in their research division. I don't think they did anything like this when they released BERT.

Davidzheng7 months ago

I agree completely. When AlphaZero was announced, I remember being shocked at how they presented a revolutionary breakthrough as if it were a regular thing. AlphaFold and AlphaCode are also impressive, but this one just sounds like it was forced by Sundar and not the usual DeepMind.

chupapimunyenyo7 months ago

[flagged]

replwoacause7 months ago

Well put. I'm not touching anything Google does any more. They're far too dishonest. This failed attempt at a release (which, it turns out, was all sizzle and no steak) only underscored how far behind OpenAI they actually are. I'd love to have been a fly on the wall in the OAI offices when this demo video went live.

sinuhe697 months ago

I, too, was fooled into thinking Gemini sees and hears through a video/audio feed, instead of being shown still images and prompted through text. While there might not seem to be much difference between still images and a video feed, in fact it requires a lot of (changing) context understanding to keep the bot from babbling like an idiot all the time. It also requires the bot to recognize the "I don't know it yet" state, to keep appropriately silent in a conversation with a live video feed, which is notoriously difficult with generative AI. Certainly one can do some hacking and build in some heuristics to make it easier, but making a bot seem like a human partner in a conversation is indeed very hard. And that has been the most impressive aspect of the showcased "conversations", which are unfortunately all faked :(

WrockBro7 months ago

I went back to the video and it said Gemini was "searching" for that music, not generating it. Google has done some stuff with generative music (https://aitestkitchen.withgoogle.com/experiments/music-lm) but we don't know if they'll bring that into Gemini.

pms7 months ago

I bet OpenAI and MS do the same, but people have a positive perception of them due to the massive ChatGPT hype wave.

miraculixx7 months ago

Do you believe everything verbatim that companies tell you in advertising?

gregshap7 months ago

If they show a car driving I believe it's capable of self-propulsion and not just rolling downhill.

macNchz7 months ago

A marketing trick that has, in fact, been tried: https://arstechnica.com/cars/2020/09/nikola-admits-prototype...

olliej7 months ago

If I recall correctly, that led to literal criminal fraud charges.

And IIRC Tesla is also being investigated for fraud over claims about the safety of its self-driving cars.

dylan6047 months ago

Hmm, might I interest you in a video of an electric semi-truck?

sp3327 months ago

When a company invents tech that can do this, how would their ad be different?

steego7 months ago

No, but most people tend to make a mental note of which companies tend to deliver and which ones work hard to mislead them.

You do understand the concept of reputation, right?

slim7 months ago

This was plausible.

AndrewKemendo7 months ago

I have used Swype texting since the t9 days.

If I demoed swype texting as it functions in my day to day life to someone used to a querty keyboard they would never adopt it

The rate at which it makes wrong assumptions about the word, or I have to fix it is probably 10% to 20% of the time

However because it’s so easy to fix this is not an issue and it doesn’t slow me down at all. So within the context of the different types of text Systems out there, I t’s the best thing going for me personally, but it takes some time to learn how to use it.

This is every product.

If you demonstrated to people how something will actually work after 100 hours of habituation and compensation for edge cases, nobody would ever adopt anything.

I’m not sure how to solve this because both are bad.

(Edit: I’m keeping all my typos as meta-comment on this given that I’m posting via swype on my phone :))

mvdtnz7 months ago

Showing a product in its best light is one thing. Demonstrating a mode of operation that doesn't exist is entirely another. It would be like if a demo of your swipe keyboard included telepathic mind control for correcting errors.

AndrewKemendo7 months ago

I’m not sure I’d agree that what they showed will never be possible and in fact my whole point is that I think Google can most likely deliver on that in this specific case. Chalk it up to my experience in the space, but from what I can see it looks like something Google can actually execute on (unlike many areas where they fail on product regularly).

I would agree completely that it’s not ready for consumers the way it was displayed, which is my point.

I do want to add that I believe that the right way to do these types of new product rollout is not with these giant public announcements.

In fact, I think that, generally speaking, the "right" way to do something like this is to demonstrate only things that are robustly possible. However, that's not the market Google lives in. They're capitalists trying to make as much money as possible. I'm simply evaluating that what they're showing is, I think, absolutely technically possible, and that Google can deliver it even if it's not ready today.

Do I think it’s supremely ethical the way that they did it? No I don’t.

robbomacrae7 months ago

The voice interaction part didn't look far off from what we are doing with Dynamic Interaction at SoundHound. Because of this I assumed (like many, it seems) that they had caught up.

And it's dangerous to assume they can just "deliver later". It's not that simple. If it is, why not bake it in right now instead of committing fraud?

This is damaging to companies that walk the walk. People have literally said to me "but what about that Gemini?" and dismissed our work.

mvdtnz7 months ago

I don't care what Google could, in theory, deliver at some point in the future. That's irrelevant. They are demonstrating something that can't be done with the product as they are selling it.

mulmen7 months ago

Does swype make editing easier somehow? iOS spellcheck has negative value. I turned it off years ago and it reduced errors but there are still typos to fix.

Unfortunately iOS text editing is also completely worthless. It forces strange selections and inserts edited text in awkward ways.

I’m a QWERTY texter but text entry on iOS is a complete disaster that has only gotten worse over time.

mikepurvis7 months ago

I'm an iOS user and prefer the swipe input implementation in GBoard over the one in the native keyboard. I'm not sure what the differences are, but GBoard just seems to overall make fewer mistakes and do a better job correcting itself from context.

nozzlegear7 months ago

As I was reading Andrew's comment to myself, I was trying to figure out when and why I stopped using swype typing on my phone. Then it hit me – I stopped after I switched from Android to iOS a few years ago. Something about the iOS implementation just doesn't feel right.

wlesieutre7 months ago

Have you tried the native keyboard since iOS 17? It’s quite a lot better than older versions.

pb77 months ago

Hard disagree. I could type your whole comment completely blindly without any typos (except maybe "QWERTY", because all-caps words don't get autocorrected).

newaccount747 months ago

Apple autocorrect has a tendency to replace technical terms with similar words, e.g. rvm turns into rum or ram or something.

It's even worse on the watch somehow. I take care to hit every key exactly, the correct word is there, I hit space, boom, it's replaced with a completely different word. On the watch it seems to replace almost every word with bullshit, not just technical terms.

rootusrootus7 months ago

> seems to replace almost every word with bullshit

Sort of related, it also doesn't let you cuss. It will insist on replacing fuck with pretty much anything else. I had to add fuck to the custom replacement dictionary so it would let me be. What language I choose to use is mine and mine alone, I don't want Nanny to clean it up.

turquoisevar7 months ago

They've pretty much solved this with iOS 17. You can even use naughty words now, provided you use it for a day or so to have it get used to your vocabulary.

edgyquant7 months ago

Maybe my fingers are just too big, but the watch is basically impossible for me to use for anything like texting.

Aurornis7 months ago

> However because it’s so easy to fix this is not an issue and it doesn’t slow me down at all.

But that's a different issue than LLM hallucinations.

With Swype, you already know what the correct output looks like. If the output doesn't match what you wanted, you immediately understand and fix it.

When you ask an LLM a question, you don't necessarily know the right answer. If the output looks confident enough, people take it as the truth. Outside of experimenting and testing, people aren't using LLMs to ask questions for which they already know the correct answer.

snowwrestler7 months ago

The insight here is that the speed of correction is a crucial component of the perceived long-term value of an interface technology.

It is the main reason that handwriting recognition did not displace keyboards. Once the handwriting is converted to text, it’s easier to fix errors with a pointer and keyboard. So after a few rounds of this most people start thinking: might as well just start with the pointer and keyboard and save some time.

So the question is, how easy is it to detect and correct errors in generative AI output? And the unfortunate answer is that unless you already know the answer you’re asking for, it can be very difficult to pick out the errors.

AndrewKemendo7 months ago

I think this is a good rebuttal.

Yeah the feedback loop with consumers has a higher likelihood of being detrimental, so even if the iteration rate is high, it’s potentially high cost at each step.

I think the current trend is to nerf the models or otherwise put bumpers on them so people can’t hurt themselves. That’s one approach that is brittle at best and someone with more risk tolerance (OpenAI) will exploit that risk gap.

It's a contradiction at best, and depending on the level of unearned trust created by the misleading marketing, it will certainly lead to some really odd externalities.

Think “man follows google maps directions into pond” but for vastly more things.

I really hated marketing before but yeah this really proves the warning I make in the AI addendum to my scarcity theory (in my bio).

skywhopper7 months ago

I know marketing is marketing, but it's bad form IMO to "demo" something in a manner totally detached from its actual manner of use. A swype keyboard takes practice to use, but the demos of that sort of input typically show it being used in a realistic way, even if the demo driver is an "expert".

This is the sort of demo that 1) gives people a misleading idea of what the product can actually do; and 2) ultimately contributes to the inevitable cynical backlash.

If the product is really great, people can see it in a realistic demo of its capabilities.

cja7 months ago

I think you mean swipe. Swype was a brilliant third party keyboard app for Android which was better at text prediction and manual correction than Gboard is today. If however you really do still use Swype then please tell me how because I miss it.

AndrewKemendo7 months ago

Ha, good point. And yes, I agree Swype continues to be the best text input technology that I'll never be able to use again. I guess I just committed genericide here; I meant the general "swiping" process at this point.

thequadehunter7 months ago

I don't buy it. OpenAI did not have to do this with ChatGPT, and they always include a live demo when they release new products.

Maybe you can spice up a demo, but misleading to the point of implying things are generated when they're not (like the audio example) is pretty bad.

peteradio7 months ago

What is the latency of Swype? <10ms? Not at all comparable to the video.

raincole7 months ago

> This is every product.

Except actually good ones, like ChatGPT or Gmail (in their time).

turquoisevar7 months ago

You make a decent point, but you might underestimate how much of this Gemini demo was faked[0].

In your Swype analogy, it would be as if Swype works by having to write out on a piece of paper the general goal of what you're trying to convey, then having to write each individual letter on a Post-it, only for you to then organize these Post-its in the correct order yourself.

This process would then be translated into a slick promo video of someone swiping away on their keyboard.

This is not a matter of “eh, it doesn't 100% work as smooth as advertised.”

0: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

kjkjadksj7 months ago

It's honestly pretty mind-boggling that we'd even use QWERTY on a smartphone. The entire point of the layout is to keep your fingers on the home row. Meanwhile, people text with one or two thumbs 100% of the time.

heleninboodler7 months ago

The reason we use QWERTY on a smartphone is extremely straightforward: people tend to know where to look for the keys already, so it's easy to adopt even though it's not "efficient". We know it better than we know the positions of letters in the alphabet. You can easily see the difference if you're ever presented with an onscreen keyboard in alphabetical order instead of QWERTY (TVs do this a lot, for some reason, and while it's a different physical input method, alpha order really does make you stop and hunt). It slows you down quite a bit.

swores7 months ago

That's definitely a good reason why, but perhaps if iOS or Android were to research the best layout for typical touch-screen typing and release it as a new default, people would find it quite quick to learn a second layout and soon reap the benefits?

After all, with TVs I've had the same experience as you with the annoying alphabetical keyboard, but we type into them maybe a couple of times a year, or maybe once in 5 years, whereas if we changed our phone keyboard layout we'd likely get used to it quite quickly.

Even if they didn't go so far as to push it as a new default for all users, they could at least figure out what the best layout is (maybe this has been studied and decided already by somebody?) and offer it as an option for us geeks. (I'm willing to accept that I may be speaking for myself here, as the kind of geeky person who wouldn't mind the initial inconvenience of a new keyboard layout if it meant saving time in the long run, and that a large majority of people might just hate it too much to give it a chance.)

hiccuphippo7 months ago

I use 8vim[0] from time to time. It's a good idea but needs a dictionary/autocompletion. You can get OK speeds after an hour of usage.

[0] https://f-droid.org/en/packages/inc.flide.vi8/

rurp7 months ago

Path dependency is the reason for this, and is the reason why a lot of things are the way they are. An early goal with smart phone keyboards was to take a tool that everyone already knew how to use, and port it over with as little friction as possible. If smart phones happened to be invented before external keyboards the layouts probably would have been quite different.

jerf7 months ago

"The entire point of the layout is to keep your fingers on the home row."

No, that is how you're told to type. You have to be told to type that way precisely because QWERTY is not designed to keep your fingers on the home row. If you type in a layout that is designed to do that, you don't need to be told to keep your fingers on the home row, because you naturally will.

Nobody really knows what the designers were thinking, which I do not mean as sarcasm, I mean it straight. History lost that information. But whatever they were thinking that is clearly not it because it is plainly obvious just by looking at it how bad it is at that. Nobody trying to design a layout for "keeping your fingers on the home row" would leave hjkl(semicolon) under the resting position of the dominant hand for ~90% of the people.

This, perhaps in one of technical history's great ironies, makes it a fairly good layout for swipe-like technologies! A keyboard layout like Dvorak, which has "aoeui" all right next to each other on one hand's home row and "dhtns" on the other, would constantly have trouble figuring out which word you meant, between "hat" and "ten" to name just one example. "uio" on QWERTY could probably stand a bit more separation, but "a" and "e" are generally far enough apart that at least for me they don't end up confused, and pushing the most common consonants toward the outer part of the keyboard, rather than clustering them next to each other in the center (on the home row), helps them stay distinguishable too. "fghjkl" is almost a probability dead zone, and "asd" on the left are generally reasonably distinct even if you miss one of them badly.

I don't know what an optimal swype keyboard would be, and there's probably still a good 10% gain to be made if someone tried to make one, but it wouldn't be enough to justify learning a new layout.

bigtunacan7 months ago

Hold up, young one. The reason for QWERTY's design has absolutely not been lost to history yet.

The design was to spread out the hammers of the most frequently used letters to reduce the frequency of hammer jamming, back when people actually used typewriters and not computers.

The problem it attempted to improve upon, and which it was pretty effective at, is just a problem that no longer exists.

jerf7 months ago

Also apocryphal: https://en.wikipedia.org/wiki/QWERTY#Contemporaneous_alterna...

And it does a bad job at it, which is further evidence that it was not the design consideration. People may not have been able to run a quick perl script over a few gigabytes of English text, but they would have gotten much closer if that was the desire. I don't believe their goal was jam prevention and they were simply too stupid to get it even close to right.

throw109207 months ago

> The design was to spread out the hammers of the most frequently used letters to reduce the frequency of hammer jamming

That's a folk myth that's mostly debunked.

https://www.smithsonianmag.com/arts-culture/fact-of-fiction-...

kjkjadksj7 months ago

You have to be taught to use the home row because the natural inclination for most people is to hunt and peck with their two index fingers. Watch how old people or young kids type. That being said, staying on the home row is how you type fast and make the most of the layout. Everything is comfortably reachable for the most part, unless you are a Windows user, IME.

mvdtnz7 months ago

> Nobody really knows what the designers were thinking, which I do not mean as sarcasm, I mean it straight. History lost that information.

My understanding of the QWERTY layout is that it was designed so that characters frequently used in succession could not be typed in rapid succession, so that typewriter hammers had less chance of colliding. Or is this an urban myth?

Animats7 months ago

The Twitter-linked Bloomberg page is now down.[1] Alternative page: [2] New page says it was partly faked. Can't find old page in archives.

[1] https://www.bloomberg.com/opinion/articles/2023-12-07/google...

[2] https://www.bloomberg.com/opinion/articles/2023-12-07/google...

atdaemon7 months ago

The report from TechCrunch has more details - https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

sowbug7 months ago

I am similarly enraged when TV show characters respond to text messages faster than humans can type. It destroys the realism of my favorite rom-coms.

pizzafeelsright7 months ago

I suppose this is a great example of how trust in authentic-seeming videos, audio, images, and company marketing must be questioned and, until verified, assumed to be 'generated'.

I am curious: if voice, email, chat, and soon video can all be generated in real or near-real time, how can we be sure that a remote employee is not a fully or partially generated entity?

Shared secrets are great for verifying, but when the bodies are fully remote - what is the solution?

I am traveling at the moment. How can my family validate that it is ME claiming lost luggage and sending a Venmo request?
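
The best scheme I can think of is a keyed challenge-response over a secret agreed in person. A minimal sketch with Python's standard library (the secret string is purely illustrative):

    import hmac
    import hashlib

    # Agreed face to face, never sent over any channel.
    SHARED_SECRET = b"the lake house, summer 2009"

    def respond(challenge: str) -> str:
        # Prove knowledge of the secret without revealing it.
        return hmac.new(SHARED_SECRET, challenge.encode(), hashlib.sha256).hexdigest()

    def verify(challenge: str, answer: str) -> bool:
        # Constant-time comparison avoids leaking information.
        return hmac.compare_digest(respond(challenge), answer)

The family sends a fresh random challenge each time, so replaying an old answer fails; the secret itself never crosses the wire.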

raincole7 months ago

If you can't verify whether your employee is AI, then you fire them and replace them with AI.

vasco7 months ago

The question is what happens if an attacker tells you "I lost access, can you please reset some credential?", and your security process is to get on a video call, because, say, you're a fully remote company.

takoid7 months ago

>I am traveling at the moment. How can my family validate that it is ME claiming lost luggage and requesting a Venmo request?

PGP

tadfisher7 months ago

Now you have two problems.

(I say this in jest, as a PGP user)

vasco7 months ago

Ask for information that only the actual person would know.

pizzafeelsright7 months ago

That will only work once if the channels are monitored.

vasco7 months ago

You only know one piece of information about your family? I feel like I could reference many childhood facts or random things that happened years ago in social situations.

adewinter7 months ago

Make up a code phrase/word for emergencies, share it with your family, then use it for these types of situations.

mikepurvis7 months ago

Fair, but that also assumes the recipients ("family") are in a mindset of constantly thinking about the threat model in this type of situation and will actually insist on hearing the passphrase.

pizzafeelsright7 months ago

This will only work once.

robbomacrae7 months ago

I think it's also why we as a community should speak out when we catch them doing this, as they are discrediting tech demos. It won't be enough, because a lie will be around the world before the truth gets out of the starting gates, but we can't just let this go unchecked.

kjkjadksj7 months ago

At this point, probably a handwritten letter. Back to the 20th century we go.

kweingar7 months ago

The video itself and the video description give a disclaimer to this effect. Agreed that some will walk away with an incorrect view of how Gemini functions, though.

Hopefully realtime interaction will be part of an app soon. Doesn’t seem like there would be too many technical hurdles there.

TillE7 months ago

The entirety of the disclaimer is "sequences shortened throughout", in tiny text at the bottom for two seconds.

They do disclose most of the details elsewhere, but the video itself is produced and edited in such a way that it's extremely misleading. They really want you to think that it's responding in complex ways to simple voice prompts and a video feed, and it's just not.

dogprez7 months ago

Yea, of all the edits in the video, the editing for timing is the least concerning. My gripe is that the prompting was different, and to get that information you have to watch the video on YouTube, expand the description, and click through to a separate blog article. Linking a "making of" video where they show this and interview some of the minds behind Gemini would have been better PR.

jefftk7 months ago

The disclaimer in the description is "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."

That's different from "Gemini was shown selected still images and not video".

tobr7 months ago

What I found impressive about it was the voice, the fast real-time response to video, and the succinct responses. So apparently all of that was fake. You got me, Google.

anigbrowl7 months ago

People don't really pay attention to disclaimers. Google made a choice knowing people would remember the hype, not the disclaimer.

3pt141597 months ago

I remember watching it and being pretty impressed, but as I walked around thinking it over I came to the conclusion that there was something fishy about the demo. I didn't know exactly what they fudged, but it was far too polished to square with how well their current AI demos perform.

I'm not saying there have been no improvements in AI. There have been, and that includes Google. But the reason ChatGPT has really taken over the world is that the demo is in your own hands, and it does quite well there.

anigbrowl7 months ago

Indeed, and this is how Google used to be as a company. I remember when Google Maps and Earth launched, and how they felt like world-changing technology. I'm sure they're still doing lots of innovative science and development, but it's an advertising/services company now, and one that increasingly talks down to its users. Disappointing considering their early sense of mission.

Thinking back to the firm's early days, it strikes me that some HN users and perhaps even some Googlers have no memory of a time before Google Maps and simply can't imagine how disruptive and innovative things like that were at the time. Being able to browse satellite imagery for the whole world was something previously confined to the upper echelons of the military-industrial complex.

That's one reason I wish the firm (along with several other tech giants) were broken up; it's full of talented innovative people, but the advertising economics at the core of their business model warp everything else.

lainga7 months ago

    :%s/Google/the team
    :%s/people/the promotion board
Conway's law applied to the corporate-public interface :)

skepticATX7 months ago

No. The disclaimer was not nearly enough.

The video fooled many people, including myself. This was not your typical super optimized and scripted demo.

This was blatant false advertising. Showing capabilities that do not exist. It’s shameful behavior from Google, to be perfectly honest.

titzer7 months ago

Yeah, and ads on Google search have the teeniest, tiniest little "ad" chip on them, a long progression of making ads more in-your-face and less well-distinguished.

In my estimation, given the context around AI-generated content and general fakery, this video was deceptive. The only impressive thing about the video (to me) was how snappy and fluid it seemed to be, presumably processing video in real time. None of that was real. It's borderline fraudulent.

Jagerbizzle7 months ago

They were just parroting this video on CNBC without any disclaimers, so viewers who don't happen to also read Hacker News will likely form a different opinion than those of us who do.

peteradio7 months ago

If there weren't serious technical hurdles they wouldn't have faked it.

ruszki7 months ago

The difference between "Hey, figure out a game based on what you see right now" and "here is a description of a game, with the only two possible outcomes as examples" cannot be explained by the disclaimer.

billconan7 months ago

performance and cost are hurdles?

kweingar7 months ago

It can be realtime while still having more latency than depicted in the video (and the video clearly stated that Gemini does not respond that quickly).

A local model could send relevant still images from the camera feed to Gemini, along with the text transcript of the user’s speech. Then Gemini’s output could be read aloud with text-to-speech. Seems doable within the present cost and performance constraints.
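
A rough sketch of the glue (all stubs standing in for real STT/TTS/model services; only the structure matters):

    # Stub pipeline: sampled frames + transcript in, spoken reply out.
    def speech_to_text(audio: bytes) -> str:
        return "what do you see right now?"        # stub transcript

    def pick_keyframes(frames: list, max_n: int = 3) -> list:
        return frames[-max_n:]                     # naive: most recent frames

    def model_generate(images: list, text: str) -> str:
        return "I see a rubber duck on a table."   # stub model reply

    def text_to_speech(reply: str) -> bytes:
        return reply.encode()                      # stub audio

    def assistant_turn(audio: bytes, frames: list) -> bytes:
        transcript = speech_to_text(audio)
        stills = pick_keyframes(frames)
        reply = model_generate(images=stills, text=transcript)
        return text_to_speech(reply)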

crazygringo7 months ago

Does it matter at all with regards to its AI capabilities though?

The video has a disclaimer that it was edited for latency.

And good speech-to-text and text-to-speech already exists, so building that part is trivial. There's no deception.

So then it seems like somebody is pressing a button to submit stills from a video feed, rather than live video. It's still just as useful.

My main question then is about the cup game, because that absolutely requires video. Does that mean the model takes short video inputs as well? I'm assuming so, and that it generates audio outputs for the music sections as well. If those things are not real, then I think there's a problem here. The Bloomberg article doesn't mention those, though.

beering7 months ago

Even your skeptical take doesn't fully show how faked this was.

> The video has a disclaimer that it was edited for latency.

There was no disclaimer that the prompts were different from what's shown.

> And good speech-to-text and text-to-speech already exists, so building that part is trivial. There's no deception.

Look at how many people thought it can react to voice in real-time - the net result is that a lot of people (maybe most?) were deceived. And the text prompts were actually longer and more specific than what was said in the video!

> somebody is pressing a button to submit stills from a video feed, rather than live video.

Somebody hand-picked images to convey exactly the right amount of information to Gemini.

> Does that mean the model takes short video inputs as well? I'm assuming so

It was given a hand-picked series of still images with the hands still on the cups so that it was easier to understand what cup moved where.
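
Per that post (linked below), the prompt was roughly few-shot: worked examples of interleaved stills and explanations first, then the new hand-picked stills. An illustrative sketch of the structure only, not the actual prompt or API:

    # img() is a placeholder for a real image part in a multimodal prompt.
    def img(description: str) -> str:
        return f"<image: {description}>"

    prompt_parts = [
        img("ball shown under the middle cup"),
        "The ball is under the middle cup.",
        img("hands resting on the left and middle cups after a swap"),
        "The left and middle cups were swapped, so the ball is under the left cup.",
        img("hands resting on the middle and right cups after a swap"),
        "Where is the ball now?",
    ]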

Source for the above: https://developers.googleblog.com/2023/12/how-its-made-gemin...

phire7 months ago

I'm ok with "edited for latency" or "only showing the golden path".

But the most impressive part of the demo was the way the LLM just seemed to know when to jump in with a response. It appeared to be able to wait until the user had finished the drawing, or even jump in slightly before the drawing was finished. At one point the LLM was halfway through a response, saw the user was now colouring the duck in blue, and started talking about how the duck appeared to be blue.

The LLM also appeared to know when a response wasn't needed because the user was just agreeing with the LLM.

I'm not sure how many people noticed that on a conscious level, but I'm positive everyone noticed it subconsciously and felt the interaction was much more natural.

As you said, good speech-to-text and text-to-speech have already been done, along with multi-modal image/video/audio LLMs and image/music generation. The only novel thing Google appeared to be demonstrating, and the most impressive part, was this apparently natural interaction. But that part was all fake.

IanCal7 months ago

Audio input that's not text in the middle and video input are two things they made a big deal out of. Then they called it a hands on demo and it was faked.

> My main question then is about the cup game, because that absolutely requires video.

They did it with carefully timed images, and provided a few examples first.

> I'm assuming so, and that it generates audio outputs for the music sections as well

No, it was given the ability to search for music and so it was just generating search terms.

Here's more details:

https://developers.googleblog.com/2023/12/how-its-made-gemin...

Inward7 months ago

Yes, that was obvious; as soon as I saw it wasn't live I clicked off. You can train any LLM to perform certain tasks well, and Google engineers are not that dense. This was obvious marketing PR. OpenAI has made Google basically obsolete for me: 90% of my queries can be answered without wading through LLM generated text for a simple answer.

nirvael7 months ago

>without wading through LLM generated text

...OpenAI solved this by generating LLM text for you to wade through?

rose_ann_7 months ago

No. It solved it by (most of the time) giving the OP and me the answer to our queries, without us needing to wade through spammy SERP links.

kweingar7 months ago

If LLMs can replace 90% of your queries, then you have very different search patterns from me. When I search on Kagi, much of the time I’m looking for the website of a business, a public figure’s social media page, a restaurant’s hours of operation, a software library’s official documentation, etc.

LLMs have been very useful, but regular search is still a big part of everyday life for me.

balder19917 months ago

Sure we now have options, but before LLMs, most queries relied solely on search engines, often leading to sifting through multiple paragraphs on websites to find answers — a barrier for most users.

Today, LLMs excel in providing concise responses, addressing simple, curious questions like, "Do all bees live in colonies?"

GolfPopper7 months ago

How do you tell a plausible wrong answer from a real one?

data-ottawa7 months ago

GPT4 search is a very good experience.

Though because you don’t see the answers it doesn’t show you, it’s hard to really validate the quality, so I’m still wary, but when I look for specific stuff it tends to find it.

nojvek7 months ago

Anyone remember the Google I/O demo where they had their "AI" call a barber to book an appointment?

Turns out it was all staged.

Lost a lot of trust after that.

Google is stuck in the innovator's dilemma.

They make $300B of revenue, of which ~90% is ads.

The actual mission their management chain optimizes for is dollar growth.

A superior AI model that gives the user exactly what they want would crash their market cap.

Microsoft has tons of products with billion-plus profits; Google has only a handful, and other than cloud they all tie back to ads.

Google is addicted to ads. If Chrome adds a feature that decreases ad revenue, that team gets the stick.

Nothing at Google is allowed to jeopardize their ad revenue.

AI is directly a threat to Google's core business model: ads. It's obvious they're gonna half-ass it.

For OpenAI, AI is existential. If they don't deliver, they'll die.

Alifatisk7 months ago

No way, was Google Duplex fake?!

Alifatisk7 months ago

Bummer

modeless7 months ago

That's not the only thing wrong. Gemini makes a false statement in the video, serving as a great demonstration of how these models still outright lie so frequently, so casually, and so convincingly that you won't notice, even if you have a whole team of researchers and video editors reviewing the output.

It's the single biggest problem with LLMs and Gemini isn't solving it. You simply can't rely on them when correctness is important. Even when the model has the knowledge it would need to answer correctly, as in this case, it will still lie.

The false statement is after it says the duck floats, it continues "It is made of a material that is less dense than water." This is false; "rubber" ducks are made of vinyl polymers which are more dense than water. It floats because the hollow shape contains air, of course.
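
The arithmetic is easy to check with approximate densities and a made-up duck geometry:

    # Back-of-the-envelope buoyancy check; all numbers are illustrative.
    water = 1.00            # g/cm^3
    vinyl = 1.40            # g/cm^3, typical PVC: denser than water
    shell_volume = 20.0     # cm^3 of vinyl in the duck's shell
    cavity_volume = 180.0   # cm^3 of enclosed air (mass negligible)

    mass = vinyl * shell_volume                              # 28 g
    average_density = mass / (shell_volume + cavity_volume)  # 0.14 g/cm^3

    print(vinyl > water)             # True: the material alone sinks
    print(average_density < water)   # True: the hollow duck floats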

jmathai7 months ago

This seems to be a common view among some folks. Personally, I'm impartial.

Search or even asking other expert human beings are prone to provide incorrect results. I'm unsure where this expectation of 100% absolute correctness comes from. I'm sure there are use cases, but I assume it's the vast minority and most can tolerate larger than expected inaccuracies.

dylan6047 months ago

> I'm unsure where this expectation of 100% absolute correctness comes from.

It's a computer. That's why. Change the concept slightly: would you use a calculator if you had to wonder if the answer was correct or maybe it just made it up? Most people feel the same way about any computer based anything. I personally feel these inaccuracies/hallucinations/whatevs are only allowing them to be one rung up from practical jokes. Like I honestly feel the devs are fucking with us.

bostonpete7 months ago

Speech to text is often wrong too. So is autocorrect. And object detection. Computers don't have to be 100% correct in order to be useful, as long as we don't put too much faith in them.

clumpthump7 months ago

Call me old fashioned, but I would absolutely like to see autocorrect turned off in many contexts. I much prefer to read messages with 30% more transparent errors rather than any increase in opaque errors. I can tell what someone meant if I see "elephent in the room", but not "element in the room" (not an actual example, autocorrect would likely get that one right).

llbeansandrice7 months ago

People put too much faith in conspiracy theories they find on YT, TikTok, FB, Twitter, etc. What you're claiming is already not the norm. People already put too much faith into all kinds of things.

kamikaz1k7 months ago

Okay, but search is done on a computer, and like the person you’re replying to said, we accept close enough.

I don’t necessarily disagree with your interpretation, but there’s a revealed preference thing going on.

The number of non-tech ppl I’ve heard directly reference ChatGPT now is absolutely shocking.

altruios7 months ago

Why should all computing be deterministic?

Let me show you what this "genius"/"wrong-thinking" person has to say about AL (artificial life) and deterministic computing.

https://www.cs.unm.edu/~ackley/

https://www.youtube.com/user/DaveAckley

To sum up a bunch of their content: you can make intractable problems solvable/crunchable if you allow just a little error into the result (error which shrinks the longer the calculation runs). And this is acceptable for a number of use cases where initial accuracy is less important than instant feedback.

It is radically different from the von Neumann model of a computer, where a deterministic 'totalitarian finger pointer' pointing at some register (and only one register at a time) is an inherent limiting factor. In this model, each computational resource (a unit of RAM plus a processing unit) fights for and coordinates reality with its neighbors, without any central coordination.

Really interesting stuff, still in its infancy...

jen207 months ago

"Computer says no" is not a meme for no reason.

binwiederhier7 months ago

I'm a software engineer, and I more or less stopped asking ChatGPT for stuff that isn't mainstream. It just hallucinates answers and invents config file options or language constructs. Google will maybe not find it, or give you an occasional outdated result, but it rarely happens that it just finds stuff that's flat out wrong (in technology at least).

For mainstream stuff on the other hand ChatGPT is great. And I'm sure that Gemini will be even better.

potatolicious7 months ago

The important thing is that with Web Search as a user you can learn to adapt to varying information quality. I have a higher trust for Wikipedia.org than I do for SEO-R-US.com, and Google gives me these options.

With a chatbot that's largely impossible, or at least impractical. I don't know where it's getting anything from - maybe it trained on a shitty Reddit post that's 100% wrong, but I have no way to tell.

There has been some work (see: Bard, Bing) where the LLM attempts to cite its sources, but even then that's of limited use. If I get a paragraph of text as an answer, is the expectation really that I crawl through each substring to determine their individual provenances and trustworthiness?

The shape of a product matters. Google as a linker introduces the ability to adapt to imperfect information quality, whereas a chatbot does not.

As an exemplar of this point - I don't trust when Google simply pulls answers from other sites and shows it in-line in the search results. I don't know if I should trust the source! At least there I can find out the source from a single click - with a chatbot that's largely impossible.

jmathai7 months ago

> it rarely happens that it just finds stuff that's flat out wrong

"Flat out wrong" implies determinism. For answers which are deterministic such as "syntax checking" and "correctness of code" - this already happens.

ChatGPT, for example, will write and execute code. If the code has an error or returns the wrong result it will try a different approach. This is in production today (I use the paid version).
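
Conceptually it's a simple loop. A minimal sketch, with ask_llm as a hypothetical stand-in for the model call (not a claim about how OpenAI actually implements it):

    # Generate-run-retry: feed execution errors back into the next attempt.
    # Never exec model-generated code outside a real sandbox.
    def solve_with_retries(task: str, ask_llm, max_tries: int = 3):
        feedback = ""
        for _ in range(max_tries):
            code = ask_llm(f"Write Python that sets `result` to {task}. {feedback}")
            scope = {}
            try:
                exec(code, scope)            # run the generated code
                return scope["result"]
            except Exception as e:           # report the failure and retry
                feedback = f"The previous attempt failed with {e!r}; fix it."
        raise RuntimeError("no working attempt")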

yieldcrv7 months ago

I use chatgpt4 for very obscure things

If I were ever worried about being quoted, I'd verify the information.

Otherwise I'm being conversational: I've turned an abstract idea into a concrete one and can build on top of it.

But I’m quickly migrating over to mistral and if that starts going off the rails I get an answer from chatgpt4 instead

epalm7 months ago

I know exactly where the expectation comes from. The whole world has demanded absolute precision from computers for decades.

Of course, I agree that if we want computers to “think on their own“ or otherwise “be more human“ (whatever that means) we should expect a downgrade in correctness, because humans are wrong all the time.

jmathai7 months ago

> The whole world has demanded absolute precision from computers for decades.

Computer engineers maybe. I think the general population is quite tolerant of mistakes as long as the general value is high.

People generally assign very high value to things computers do. To test this hypothesis all you have to do is ask folks to go a few days without their computer or phone.

creer7 months ago

> The whole world has demanded absolute precision from computers

The opposite. Far too tolerant of the excuse "sorry, computer mistake." (But yeah, just at the same time as "the computer says so".)

lancesells7 months ago

Is it less reliable than an encyclopedia? Is it less reliable than Wikipedia? Those aren't infallible, but what's the expectation if it's wrong on something relatively simple?

With the rush of investment in dollars and to use these in places like healthcare, government, security, etc. there should be absolute precision.

SkyBelow7 months ago

Humans are imperfect, but this comes with some benefits to make up for it.

First, we know they are imperfect. People seem to put more faith into machines, though I do sometimes see people being too trusting of other people.

Second, we have methods for measuring their imperfection. Many people develop ways to tell when someone is answering with false or unjustified confidence, at least in fields they spend significant time in. Talk to a scientist about cutting edge science and you'll get a lot of 'the data shows', 'this indicates', or 'current theories suggest'.

Third, we have methods to handle false information that causes harm. Not always perfect methods, but there are systems of remedies available when experts get things wrong, and these even include some level of judging reasonable errors from unreasonable errors. When a machine gets it wrong, who do we blame?

howenterprisey7 months ago

Absolutely! And fourth, we have ways to make sure the same error doesn't happen again; we can edit Wikipedia, or tell the person they were wrong (and stop listening to them if they keep being wrong).

taurath7 months ago

I find it ironic that computer scientists and technologists are frequently uberrationalists to the point of self parody but they get hyped about a technology that is often confidently wrong.

Just like the hype with AI and the billions of dollars going into it. There’s something there but it’s a big fat unknown right now whether any part of the investment will actually pay off - everyone needs it to work to justify any amount of the growth of the tech industry right now. When everyone needs a thing to work, it starts to really lose the fundamentals of being an actual product. I’m not saying it’s not useful, but is it as useful as the valuations and investments need it to be? Time will tell.

Frost1x7 months ago

>I'm unsure where this expectation of 100% absolute correctness comes from. I'm sure there are use cases, but I assume it's the vast minority and most can tolerate larger than expected inaccuracies.

As others hinted at, there's some bias because it's coming from a computer, but I think it's far more nuanced than that.

I've worked with many experts and professionals through my career, ranging across medicine, various types of engineering, science, academia, and research, and the pattern that always bothers me is the level of certainty presented; the same is often embedded in LLM responses.

While humans don't typically quantify the certainty of their statements, the best SMEs I've ever worked with make it very clear what level of certainty they have when making professional statements. The SMEs who seem to be wrong more often than not speak in certainty quite often (some of this is due to cultural pressures and expectations surrounding being an "expert").

In this case, I would expect a seasoned scientist to respond to the duck question with something like: "many rubber ducks exist and are designed to float, so this one very well might, but we'd really need to test it, or have far more information about the composition of the duck, the design, the medium we want it in (water? mercury? helium?)" and so on. It's not an exact answer, but you understand there's uncertainty there and that we need to better clarify our question and the information surrounding it. The fact is, it's really complex to know from visual information alone whether it will float.

It could have an osmium ball inside that overcomes most of the assumed buoyancy of the material, including the air demonstrated to make it squeak. It's not transparent. You don't know for sure, and the easiest way to alleviate the uncertainty in this case is simply to test it.

There's so much uncertainty in the world, around what seem like the most certain and obvious things. LLMs seem to have grabbed some of this bad behavior from human language and culture where projecting confidence is often better (for humans) than being correct.

pid-17 months ago

Most people I've worked with either tell me "I don't know" or "I think X, but I'm not sure" when they are not sure about something. The issue with LLMs is that they don't have this concept.

tdeck7 months ago

The bigger problem is lack of context. When I speak with a person or review search results, I can use what I know about the source to evaluate the information I'm given. People have different areas of expertise and use language and mannerisms to communicate confidence in their knowledge or lack thereof. Websites are created by people (most times) and have a number of contextual clues that we have learned to interpret over the years.

LLMs do none of this. They pose as a confident expert on almost everything, and are just as likely to spit out BS as a true answer. They don't cite their sources, and if you ask for the source sometimes they provide ones that don't contain the information cited or don't even exist. If you hired a researcher and they did that you wouldn't hire them again.

kaffeeringe7 months ago

1. Humans may also never be 100% - but it seems they are more often correct. 2. When AI is wrong, it's often not only slightly off, but completely off the rails. 3. Humans often tell you when they are not sure, even if it's only their tone. AI is always 100% convinced it's correct.

henriquez7 months ago

It's not AI, it's a machine learning model.

snowwrestler7 months ago

If it’s no better than asking a random person, then where is the hype? I already know lots of people who can give me free, maybe incorrect guesses to my questions.

At least we won’t have to worry about it obtaining god-like powers over our society…

JadeNB7 months ago

> At least we won’t have to worry about it obtaining god-like powers over our society…

We all know someone who's better at self promotion than at whatever they're supposed to be doing. Those people often get far more power than they should have, or can handle—and ChatGPT is those people distilled.

stefan_7 months ago

Let's see, so we exclude law, we exclude medical.. it's certainly not a "vast minority" and the failure cases are nothing at all like search or human experts.

jmathai7 months ago

Are you suggesting that failure cases are lower when interacting with humans? I don't think that's my experience at all.

Maybe I've only ever seen terrible doctors but I always cross reference what doctors say with reputable sources like WebMD (which I understand likely contain errors). Sometimes I'll go straight to WebMD.

This isn't a knock on doctors - they're humans and prone to errors. Lawyers, engineers, product managers, teachers too.

sorokod7 months ago

Guessing from the last sentence that you are one of those "most" who "can tolerate larger than expected inaccuracies".

How much inaccuracy would that be?

eviks7 months ago

Where did you get the 100% number from? It's not in the original comment, it's not in a lot of similar criticisms of the models.

modeless7 months ago

Honestly I agree. Humans make errors all the time. Perfection is not necessary and requiring perfection blocks deployment of systems that represent a substantial improvement over the status quo despite their imperfections.

The problem is a matter of degree. These models are substantially less reliable than humans and far below the threshold of acceptability in most tasks.

Also, it seems to me that AI can and will surpass the reliability of humans by a lot. Probably not by simply scaling up further or by clever prompting, although those will help, but by new architectures and training techniques. Gemini represents no progress in that direction as far as I can see.

observationist7 months ago

There's a huge difference between demonstrating something with fuzzy accuracy and playing something off as if it's giving good, correct answers. An honest way to handle that would be to highlight where the bot got it wrong instead of running with the answer as if it was right.

Deception isn't always outright lying. This video was deceitful in form and content and presentation. Their product can't do what they're implying it can, and it was put together specifically to mislead people into thinking it was comparable in capabilities to gpt-4v and other competitor's tech.

Working for Google AI has to be infuriating. They're doing some of the most cutting edge research with some of the best and brightest minds in the field, but their shitty middle management and marketing people are doing things that undermine their credibility and make them look like untrustworthy fools. They're a year or more behind OpenAI and Anthropic, barely competitive with Meta, and they've spent billions of dollars more than any other two companies, with a trashcan fire for a tech demo.

It remains to be seen whether they can even outperform Mistral 7b or some of the smaller open source models, or if their benchmark numbers are all marketing hype.

latexr7 months ago

If a human expert gave wrong answers as often and as confidently as LLMs, most would consider no longer asking them. Yet people keep coming back to the same LLM despite the wrong answers to ask again in a different way (try that with a human).

This insistence on comparing machines to humans to excuse the machine is as tiring as it is fallacious.

toxik7 months ago

Aside: this is not what impartial means.

eurleif7 months ago

To be fair, one could describe the duck as being made of air and vinyl polymer, which in combination are less dense than water. That's not how humans would normally describe it, but that's kind of arbitrary; consider how aerogel is often described as being mostly made of air.

colonwqbang7 months ago

Is an aircraft carrier made of a material that is less dense than water?

skeaker7 months ago

I think you can safely say that air is a critical component of an aircraft carrier. I suppose the frame of it is not made of air, but the ballasts are designed with air in mind and are certainly made to utilize air. The whole system fails without air, meaning that it requires air to function. It comes down to a definitional argument of the word "made" which is pointless.

colonwqbang7 months ago

I guess it's a purely philosophical question. But no normal person would say "my house is made of air" or "atoms are made of vacuum".

leeoniya7 months ago

only if you average it out over volume :P

andrewmutz7 months ago

Is an aircraft carrier made of metal and air? Or just metal?

bee_rider7 months ago

Where’s the distinction between the air that is part of the boat, and the air that is not? If the air is included in the boat, should we all be wearing life vests?

oh_sigh7 months ago

If I take all of the air out of a toy duck, it is still a toy duck. If I take all of the vinyl/rubber out of a toy duck, it is just the atmosphere remaining

modeless7 months ago

The material of the duck is not air. It's not sealed. It would still be a duck in a vacuum and it would still float on a liquid the density of water too.

glitchc7 months ago

Well this seems like a huge nitpick. If a person said that, you would afford them some leeway, maybe they meant the whole duck, which includes the hollow part in the middle.

As an example, when most people say a balloon's lighter than air, they mean a balloon inflated with hot air or helium, but you catch their meaning and don't rush to correct them.

modeless7 months ago

The model specifically said that the material is less dense than water. If you said that the material of a balloon is less dense than air, very few people would interpret that as a correct statement, and it could be misleading to people who don't know better.

Also, lighter-than-air balloons are intentionally filled with helium and sealed; rubber ducks are not sealed and contain air only incidentally. A balloon in a vacuum would still contain helium (if strong enough) but would not rise, while a rubber duck in a vacuum would not contain air but would still easily float on a liquid of similar density to water.

The reason why it seems like a nitpick is that this is such an inconsequential thing. Yeah, it's a false statement but it doesn't really matter in this case, nobody is relying on this answer for anything important. But the point is, in cases where it does matter these models cannot be trusted. A human would realize when the context is serious and requires accuracy; these models don't.

awongh7 months ago

I’m not an expert but I suspect that this aspect of lack of correctness in these models might be fundamental to how they work.

I suppose there are two possible solutions. One is a new training or inference architecture that somehow understands "facts". I'm not an expert, so I'm not sure how that would work, but from what I understand about how a model generates text, "truth" can't really be an element of the training or inference that affects the output.

The second would be a technology built on top of the inference to check correctness, some sort of complex RAG. Again, I'm not sure how that would work in a real-world way.
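
To make it concrete, here is one shape such a checking layer could take (all functions hypothetical; a sketch of the idea, not a known-working design):

    # Draft an answer, retrieve evidence, and only assert the draft if the
    # evidence supports it. generate, retrieve, and entails are stand-ins.
    def answer_with_check(question: str, generate, retrieve, entails):
        draft = generate(question)
        evidence = retrieve(draft)        # sources relevant to the claim
        if entails(evidence, draft):      # does the evidence support it?
            return draft, evidence
        return "I'm not sure.", evidence  # decline rather than assert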

I say it might be fundamental to how the model works because as someone pointed out below, the meaning of the word “material” could be interpreted as the air inside the duck. The model’s answer was correct in a human sort of way, or to be more specific in a way that is consistent with how a model actually produces an answer: it outputs in the context of the input. If you asked it if PVC is heavier than water it would answer correctly.

Because language itself is inherently ambiguous and the model doesn’t actually understand anything about the world, it might turn out that there’s no universal way for a model to know what’s true or not.

I could also see a version of a model that is “locked down” but can verify the correctness of its statements, but in a way that limits its capabilities.

ajkjk7 months ago

> this aspect of lack of correctness in these models might be fundamental to how they work.

Is there some sense in which this isn't obvious to the point of triviality? I keep getting confused because other people seem to keep being surprised that LLMs don't have correctness as a property. Anyone with even the most cursory understanding of what they're doing knows that it is, fundamentally, predicting words from other words. I am also capable of predicting words from other words, so I can guess how well that works. It doesn't seem to include correctness even as a concept.

Right? I am actually genuinely confused by this. How is it that people think it could be correct in a systematic way?

michaelt7 months ago

I think very few people on this forum believe LLMs are correct in a systematic way, but a lot of people seem to think there's something more than predicting words from other words.

Modern machine learning models contain a lot of inscrutable inner layers, with far too many billions of parameters for any human to comprehend, so we can only speculate about what's going on. A lot of people think that, in order to be so good at generating text, there must be a bunch of understanding of the world in those inner layers.

If a model can write convincingly about a soccer game, producing output that's consistent with the rules, the normal flow of the game and the passage of time - to a lot of people, that implies the inner layers 'understand' soccer.

And anyone who noodled around with the text prediction models of a few decades ago, like Markov chains, Bayesian text processing, sentiment detection and things like that can see that LLMs are massively, massively better than the output from the traditional ways of predicting the next word.

spadufed7 months ago

> Is there some sense in which this isn't obvious to the point of triviality?

This is maybe a pedantic "yes", but is also extremely relevant to the outstanding performance we see in tasks like programming. The issue is primarily the size of the correct output space (that is, the output space we are trying to model) and how that relates to the number of parameters. Basically, there is a fixed upper bound on the amount of complexity that can be encoded by a given number of parameters (obvious in principle, but we're starting to get some theory about how this works). Simple systems or rather systems with simple rules may be below that upper bound, and correctness is achievable. For more complex systems (relative to parameters) it will still learn an approximation, but error is guaranteed.

I am speculating now, but I seriously suspect that the space of not only one or more human languages but also every fact that we would want to encode into one of these models is far too big a space for correctness to ever be possible without RAG. At least without some massive pooling of compute, which long-term may not be out of the question but is likely never intended for individual use.

If you're interested, I highly recommend checking out some of the recent work around monosemanticity for what fleshing out the relationship between model-size and complexity looks like in the near term.

janalsncm7 months ago

Just to play devil’s advocate: we can train neural networks to model some functions exactly, given sufficient parameters. For example simple functions like ax^2 + bx + c.

The issue is that “correctness” isn’t a differentiable concept. So there’s no gradient to descend. In general, there’s no way to say that a sentence is more or less correct. Some things are just wrong. If I say that human blood is orange that’s not more incorrect than saying it’s purple.
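
To make the first point concrete, here is a minimal sketch (assuming PyTorch is available; the target function and hyperparameters are just illustrative) of driving a small network to near-zero error on a quadratic, which is easy precisely because MSE gives a smooth gradient to descend:

    # Fit f(x) = 2x^2 - 3x + 1 with a small MLP; MSE provides the gradient.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.linspace(-2, 2, 256).unsqueeze(1)  # (256, 1) inputs
    y = 2 * x**2 - 3 * x + 1                     # targets: ax^2 + bx + c

    model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(loss.item())  # typically tiny on the training interval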

carstenhag7 months ago

Because it is assumed that it can think and/or reason. In this case, knowing the concepts of density, the density of a material, detecting the material from an image, detecting what object the image shows. And, most importantly, knowing that this object is not solid. Because then it could not float.

xwolfi7 months ago

Maybe you simplify a bit what "guessing words from other words" means. HOW do you guess this, is what's mysterious to many: you can guess words from other words due to habit of language, a model of mind of how other people expect you to predict, a feedback loop helping you do it better over time if you see people are "meh" at your bad predictions, etc.

So if the chatbot is used to talking, knows what you'd expect, and listens to your feedback, why wouldn't it also want to tell the truth like you would instinctively, even if only on a best-effort basis?

Sadly, the chatbot doesn't yet really care about the game it's playing; it doesn't want to make it interesting, it's just like a slave producing minimal low-effort outputs. I've talked to people exploited for money in dark places, and when they "seduce" you, they talk like a chatbot: most of it is lies, it just has to convince you a little bit to go their way, they pretend to understand or care about what you say, but at the end of the day, the goal is for you to pay. Like the chatbot.

awongh7 months ago

Yeah. I think there's some ambiguity around the meaning of reasoning, because it is a kind of reasoning to say a duck's material is less dense than water. In a way it's reasoned that out, and it might actually say something about the way a lot of human reasoning works... (especially if you've ever listened to certain people talk out loud and said to yourself... huh?)

ilaksh7 months ago

Bing Chat uses GPT-4 and cites sources from its retrieval.

freedomben7 months ago

I think this problem needs to be solved at a higher level, and in fact Bard is doing exactly that. The model itself generates its output, and then higher-level systems can fact check it. I've heard promising things about feeding back answers to the model itself to check for consistency and stuff, but that should be a higher level function (and seems important to avoid infinite recursion or massive complexity stemming from the self-check functionality).
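
As a rough sketch of that higher-level idea (this is not Bard's actual pipeline; ask() stands in for whatever LLM API is at hand):

    # One bounded self-check pass: ask, verify, optionally retry once.
    def self_check(question: str, ask) -> str:
        answer = ask(question)
        verdict = ask(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Is the proposed answer factually correct? Reply YES or NO, then explain."
        )
        if verdict.strip().upper().startswith("NO"):
            # Single retry only, to avoid the infinite recursion mentioned above.
            return ask(f"{question}\nAvoid this mistake: {verdict}")
        return answer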

modeless7 months ago

I'm not a fan of current approaches here. "Chain of thought" or other approaches where the model does all its thinking using a literal internal monologue in text seem like a dead end. Humans do most of their thinking non-verbally and we need to figure out how to get these models to think non-verbally too. Unfortunately it seems that Gemini represents no progress in this direction.
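
For reference, the "chain of thought" technique in question is mechanically very simple; a sketch, where llm() is a hypothetical text-in/text-out model call:

    # Chain-of-thought prompting in its entirety: elicit the model's reasoning
    # as literal output text before the final answer.
    def chain_of_thought(llm, question: str) -> str:
        prompt = question + "\nLet's think step by step."  # stock CoT suffix
        return llm(prompt)  # all of the "thinking" happens as visible tokens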

dragonwriter7 months ago

> "Chain of thought" or other approaches where the model does all its thinking using a literal internal monologue in text seem like a dead end. Humans do most of their thinking non-verbally and we need to figure out how to get these models to think non-verbally too.

Insofar as we can say that models think at all between the input and the stream of tokens output, they do it nonverbally. Forcing the model to reduce some of it to verbal form ahead of the actual response of concern does not change that, just as the fact that humans reduce some of their thought to verbal form to work through problems doesn't change that human thought is mostly nonverbal.

(And if you don't consider what goes on between input and output thought, then chain of thought doesn't force all LLM thought to be verbal, because only the part that comes out in words is "thought" to start with in that case -- you are then saying that the basic architecture, not chain of thought prompting, forces all thought to be verbal.)

modeless7 months ago

+1

janalsncm7 months ago

The point of “verbalizing” the chain of thought isn’t that it’s the most effective method. And frankly I don’t think it matters that humans think non verbally. The goal isn’t to create a human in a box. Verbalizing the chain of thought allows us to audit the thought process, and also create further labels for training.

modeless7 months ago

+1

freedomben7 months ago

> Humans do most of their thinking non-verbally and we need to figure out how to get these models to think non-verbally too.

That's a very interesting point, both technically and philosophically.

Where Gemini is "multi-modal" from training, how close do you think that gets? Do we know enough about neurology to identify a native language in which we think? (not rhetorical questions, I'm really wondering)

janalsncm7 months ago

Neural networks are only similar to brains on the surface. Their learning process is entirely different and their internal architecture is different as well.

We don’t use neural networks because they’re similar to brains. We use them because they are arbitrary function approximators and we have an efficient algorithm (backprop) coupled with hardware (GPUs) to optimize them quickly.

twobitshifter7 months ago

I, a non-AGI, just ‘hallucinated’ yesterday. I hallucinated that my plan was to take all of Friday off and started wondering why I had scheduled morning meetings. I started canceling them in a rush. In fact, all week I had been planning to take a half day, but somehow my brain replaced the idea of a half day off with a full day off. You could have asked me and I would have been completely sure that I was taking all of Friday off.

crazygringo7 months ago

EDIT: never mind, I missed the exact wording about being "made of a material..." which is definitely false then. Thanks for the correction below.

Preserving the original comment so the replies make sense:

---

I think it's a stretch to say that's false.

In a conversational human context, saying it's made of rubber implies it's a rubber shell with air inside.

It floats because it's rubber [with air] as opposed to being a ceramic figurine or painted metal.

I can imagine most non-physicist humans saying it floats because it's rubber.

By analogy, we talk about houses being "made of wood" when everybody knows they're made of plenty of other materials too. But the context is instead of brick or stone or concrete. It's not false to say a house is made of wood.

modeless7 months ago

> In a conversational human context, saying it's made of rubber implies it's a rubber shell with air inside.

Disagree. It could easily be solid rubber. Also, it's not made of rubber, and the model didn't claim it was made of rubber either, so it's irrelevant.

> It floats because it's rubber [with air] as opposed to being a ceramic figurine or painted metal.

A ceramic figurine or painted metal in the same shape would float too. The claim that it floats because of the density of the material is false. It floats because the shape is hollow.

> It's not false to say a house is made of wood.

It's false to say a house is made of air simply because its shape contains air.

furyofantares7 months ago

This is what the reply was:

> Oh, if it's squeaking then it's definitely going to float.

> It is a rubber duck.

> It is made of a material that is less dense than water.

Full points for saying if it's squeaking then it's going to float.

Full points for saying it's a rubber duck, with the implication that rubber ducks float.

Even with all that context though, I don't see how "it is made of a material that is less dense than water" scores any points at all.

yowzadave7 months ago

Yeah, I think arguing the logic behind these responses misses the point, since an LLM doesn't use any kind of logic--it just responds in a pattern that mimics the way people respond. It says "it is made of a material that is less dense than water" because that is a thing that is similar to what the samples in its training corpus have said. It has no way to judge whether it is correct, or even what the concept of "correct" is.

When we're grading the "correctness" of these answers, we're really just judging the average correctness of Google's training data.

Maybe the next step in making LLMs more "correct" is not to give them more training data, but to find a way to remove the bad training data from the set?

ugh1237 months ago

I don't see it as a problem with most non-critical use cases (critical being things like medical diagnoses, controlling heavy machinery or robotics, etc).

LLMs right now are most practical for generating templated text and images, which when paired with an experienced worker, can make them orders of magnitude more productive.

Oh, DALL-E created graphic images with a person with 6 fingers? How long would it have taken a pro graphic artist to come up with all the same detail but with perfect fingers? Nothing there they couldn't fix in a few minutes and then SHIP.

zer00eyz7 months ago

>> Nothing there they couldn't fix in a few minutes and then SHIP.

If by ship, you mean put directly into the public domain then yes.

https://www.goodwinlaw.com/en/insights/publications/2023/08/...

and for more interesting takes: https://www.youtube.com/watch?v=5WXvfeTPujU&

FoeNyx7 months ago

After asserting it's a rubber duck, there are some claims without follow-up:

- Just after that it doesn't translate the "rubber" part

- It states there's no land nearby for it to rest or find food in the middle of the ocean: if it's a rubber duck it doesn't need to rest or feed. (That's a missed opportunity to mention the infamous "Friendly Floatees spill"[1] of 1992, as some rubber ducks floated to that map position.) Although it seems to recognize geographical features of the map, it fails to mention that Easter Island is relatively nearby. And if it were recognized as a simple duck — which it described as a bird swimming in the water — it seems oblivious to the fact that the duck might feed itself in the water. Nor does it mention that the size of the duck seems abnormally big in that map context.

- The concept of friends and foes doesn't apply to a rubber duck either. Btw labeling the duck picture as a friend and the bear picture as a foe seems arbitrary (e.g. a real duck can be very aggressive even with other ducks.)

Among other things, the astronomical riddle also seems flawed to me: it answered "The correct order is Sun, Earth, Saturn".

I'd like for it to state:

- the premises it used, like "Assuming it depicts the Sun, Saturn and the Earth" (there are other stars, other ringed planets, and the Earth similarity seems debatable)

- the sorting criteria it used (e.g. using another sorting key like the average distance from us, "Earth, Sun, Saturn" can also be a correct order)

[1] https://en.wikipedia.org/wiki/Friendly_Floatees_spill

lemmsjid7 months ago

I did some reading and it seems that rubber's relative density to water has to do with its manufacturing process. I see a couple of different quotes on the specific gravity of so-called 'natural rubber', and most claim it's lower than water.

Am I missing something?

I asked both Bard (Gemini at this point I think?) and GPT-4 why ducks float, and they both seemed accurate: they talked about the density of the material plus the increased buoyancy from air pockets and went into depth on the principles behind buoyancy. When pressed they went into the fact that "rubber"'s density varies by the process and what it was adulterated with, and if it was foamed.

I think this was a matter of the video being a brief summary rather than a falsehood. But please do point out if I'm wrong on the rubber bit, I'm genuinely interested.

I agree that hallucinations are the biggest problems with LLMs, I'm just seeing them get less commonplace and clumsy. Though, to your point, that can make them harder to detect!

modeless7 months ago

Someone on Twitter was also skeptical that the material is more dense than water. I happened to have a rubber duck handy so I cut a sample of material and put it in water. It sinks to the bottom.

Of course the ultimate skeptic would say one test doesn't prove that all rubber ducks are the same. I'm sure someone at some point in history has made a rubber duck out of material that is less dense than water. But I invite you to try it yourself and I expect you will see the same result unless your rubber duck is quite atypical.

Yes, the models will frequently give accurate answers if you ask them this question. That's kind of the point. Despite knowing that they know the answer, you still can't trust them to be correct.

lemmsjid7 months ago

Ah good show :). I was rather preoccupied with the question but didn't have one handy. Well, I do, but my kid would roast me slowly over coals if I so much as smudged it. Ah the joy of the Internet, I did not predict this morning that I would end the day preoccupied with the question of rubber duck density!

I guess for me the question of whether or not the model is lying or hallucinating is if it's correctly summarizing its source material. I find very conflicting materials on the density of rubber, and most of the sources that Google surfaces claim a lower density than water. So it makes sense to me that the model would make the inference.

I'm splitting hairs though, I largely agree with your comment above and above that.

To illustrate my agreement: I like testing AIs with this kind of thing... a few months ago I asked GPT for advice as to how to restart my gas powered water heater. It told me the first step was to make sure the gas was off, then to light the pilot light. I then asked it how the pilot light was supposed to stay lit with the gas off and it backpedaled. My imagining here is that because so many instructional materials about gas powered devices emphasize to start by turning off the gas, that weighted it as the first instruction.

Interestingly, the above shows progress though. I realized I had asked GPT 3.5 back then, so I just re-asked 3.5 and then asked 4 for the first time. 3.5 was still wrong. 4 told me to initially turn off the gas to let it dissipate, then to ensure gas was flowing to the pilot before sparking it.

But that said I am quite familiar with the AI being confidently wrong, so your point is taken, I only really responded because I was wondering if I was misunderstanding something quite fundamental about the question of density.

dogprez7 months ago

That's a tricky one though since the question is, is the air inside of the rubber duck part of the material that makes it? If you removed the air it definitely wouldn't look the same or be considered a rubber duck. I gave it to the bot since when taking ALL the material that makes it a rubber duck, it is less dense than water.

modeless7 months ago

A rubber duck in a vacuum is still a rubber duck and it still floats (though water would evaporate too quickly in a vacuum, it could float on something else of the same density).

dogprez7 months ago

A rubber duck with a vacuum inside it (removing the air "material") is just a piece of rubber with eyes. Assuming OP's point about the rubber not being less dense than water, it would sink, no?

modeless7 months ago

+1

bee_rider7 months ago

If you hold a rubber duck under water and squeeze out the air, it will fill with water and still be a rubber duck. If you send a rubber duck into space, it will become almost completely empty but still be a rubber duck. Therefore, the liquid used to fill the empty space inside it is not part of the duck.

I mean apply this logic to a boat, right? Is the entire atmosphere part of the boat? Are we all on this boat as well? Is it a cruise boat? If so, where is my drink?

WhitneyLand7 months ago

Agree, then the question becomes how will this issue play out?

Maybe AI correctness will be similar to automobile safety. It didn’t take long for both to be recognized as fundamental issues with new transformative technologies.

In both cases there seems to be no silver bullet. Mitigations and precautions will continue to evolve, with varying degrees of effectiveness. Public opinion and legislation will play some role.

Tragically accidents will happen and there will be a cost to pay, which so far has been much higher and more grave for transportation.

bbarnett7 months ago

Devil's advocate. It is made of a material less dense than water. Air.

It certainly isn't how I would phrase it, and I wouldn't count air as what something is made of, but...

Soda pop is chock-full of air, it's part of it! And I'd say carbon dioxide is part of the recipe of pop.

So it's a confusing world for a young LLM.

(I realise it may have referenced rubber prior, but it may have meant air... again, Devil's advocate)

modeless7 months ago

When you make carbonated soda you put carbon dioxide in deliberately and use a sealed container to hold it in. When you make a rubber duck you don't put air in it deliberately and it is not sealed. Carbonated soda ceases to be carbonated when you remove the gas. A rubber duck in a vacuum is still a rubber duck and it even still floats.

bbarnett7 months ago

If the rubber duck has air inside, it is known, and intentional, for it is part of that design.

If you remove the air from the duck, and stop it so it won't refill, you have a flat rubber duck, which is useless for its design.

Much as flat pop is useless for its design.

And this nuance is even more nuance-ish than this devil's advocate post.

modeless7 months ago

A rubber duck in a vacuum (not a duck in atmosphere with a vacuum only inside) would not go flat or pop. It would remain entirely normal, as useful as it ever was, and it would still float on a liquid the density of water. Removing the air would have no effect on the duck whatsoever. It's not part of the material of the duck in any reasonable interpretation.

But pedantic correctness isn't even what matters here. The model made a statement where the straightforward interpretation is false and misleading. A person who didn't know better would be misled. Whether you can possibly come up with a tortured alternative interpretation that is technically not incorrect is irrelevant.

bitshiftfaced7 months ago

There's nothing wrong with what you're saying, but what do you suggest? Factuality is an area of active research, and Deepmind goes into some detail in their technical paper.

The models are too useful to say, "don't use them at all." Hopefully people will heed the warnings of how they can hallucinate, but further than that I'm not sure what more you can expect.

modeless7 months ago

The problem is not with the model, but with its portrayal in the marketing materials. It's not even the fact that it lied, which is actually realistic. The problem is the lie was not called out as such. A better demo would have had the user note the issue and give the model the opportunity to correct itself.

bitshiftfaced7 months ago

But you yourself said that it was so convincing that the people doing the demo didn't recognize it as false, so how would they know to call it out as such?

I suppose they could've deliberately found a hallucination and showcased it in the demo. In which case, pretty much every company's promo material is guilty of not showcasing negative aspects of their product. It's nothing new or unique to this case.

modeless7 months ago

They should have looked more carefully, clearly. Especially since they were criticized for the exact same thing in their last launch.

rowanG0777 months ago

The duck is indeed made of a material that is less dense. Namely water and air.

If you go down such technical routes, your definition is wrong too. It doesn't float because it contains air. If you push in the head of the duck it will sink, even though it contains air at all times.

recursive7 months ago

The duck is made of water and air? Which duck are we talking about here.

brookst7 months ago

Is it possible for humans to be wrong about something, without lying?

windowshopping7 months ago

I don't agree with the argument that "if a human can fail in this way, we should overlook this failing in our tooling as well." Because of course that's what LLMs are, tools, like any other piece of software.

If a tool is broken, you seek to fix it. You don't just say "ah yeah it's a broken tool, but it's better than nothing!"

All these LLM releases are amazing pieces of technology and the progress lately is incredible. But don't rag on people critiquing it, how else will it get better? Certainly not by accepting its failings and overlooking them.

stocknoob7 months ago

“Broken” is a word used by pedants. A broken tool doesn’t work. This works, most of the time.

Is a drug “broken” because it only cures a disease 80% of the time?

The framing most critics seem to have is “it must be perfect”.

It’s ok though, their negativity just means they’ll miss out on using a transformative technology. No skin off the rest of us.

bee_rider7 months ago

I think the comparison to humans is just totally useless. It isn’t even just that, as a tool, it should be better than humans at the thing it does, necessarily. My monitor is on an arm, the arm is pretty bad at positioning things compared to all the different positions my human arms could provide. But it is good enough, and it does it tirelessly. A tool is fit for a purpose or not, the relative performance compared to humans is basically irrelevant.

I think the folks making these tools tend to oversell their capabilities because they want us to imagine the applications we can come up with for them. They aren’t selling the tool, they are selling the ability to make tools based on their platform, which means they need to be speculative about the types of things their platform might enable.

rkeene27 months ago

If a broken tool is useful, do you not use it because it is broken?

Overpowered LLMs like GPT-4 are both broken (according to how you are defining it) and useful -- they're just not the idealized version of the tool.

freejazz7 months ago

+1

freedomben7 months ago

I think you're reading a lot into GP's comment that isn't there. I don't see any ragging on people critiquing it. I think it's perfectly compatible to think we should continually improve on these things while also recognizing that things can be useful without being perfect

freejazz7 months ago

I don't think people are disputing that things can be useful without being perfect. My point was that when things aren't perfect, they can also lead to defects that would not otherwise be perceived based upon the belief that the tool was otherwise working at least adequately. Would you use a staple gun if you weren't sure it was actually working? If it's something you don't know a lot about, how can you be sure it's working adequately?

lxgr7 months ago

Lying implies an intent to deceive, or giving a response despite having better knowledge, which I'd argue LLMs can't do, at least not yet. It just requires a more robust theory of mind than I'd consider them to realistically be capable of.

They might have been trained/prompted with misinformation, but then it's the people doing the training/prompting who are lying, still not the LLM.

hunter2_7 months ago

To the question of whether it could have intent to deceive, going to the dictionary, we find that intent essentially means a plan (and computer software in general could be described as a plan being executed) and deceive essentially means saying something false. Furthermore, its plan is to talk in ways that humans talk, emulating their intelligence, and some intelligent human speech is false. Therefore, I do believe it can lie, and will whenever statistically speaking a human also typically would.

Perhaps some humans never lie, but should the LLM be trained only on that tiny slice of people? It's part of life, even non-human life! Evolution works based on things lying: natural camouflage, for example. Do octopuses and chameleons "lie" when they change color to fake out predators? They have intent to deceive!

og_kalu7 months ago

Not to say this example was lying but they can lie just fine - https://arxiv.org/abs/2311.07590

lxgr7 months ago

+1

vkou7 months ago

Most humans I professionally interact with don't double down on their mistakes when presented with evidence to the contrary.

The ones that do are people I do my best to avoid interacting with.

LLMs act more like the latter, than the former.

eviks7 months ago

Given the misleading presentation by the real humans on these "whole teams", which this tweet corrects, this doesn't illustrate any underlying powers of the model

PepperdineG7 months ago

>It's the single biggest problem with LLMs and Gemini isn't solving it.

I loved it when the lawyers got busted for using a hallucinating LLM to write their briefs.

omginternets7 months ago

People seem to want to use LLMs to mine knowledge, when really it appears to be a next-gen word-processor.

margorczynski7 months ago

LLMs do not lie, nor do they tell the truth. They have no goal as they are not agents.

modeless7 months ago

With apologies to Dijkstra, the question of whether LLMs can lie is about as relevant as the question of whether submarines can swim.

ace23587 months ago

I totally agree with you on the confident lies. And it’s really tough. Technically the duck is made out of air and plastic right?

If I pushed the model further on the composition of a rubber duck, and it failed to mention its construction, then it’d be lying.

However there is this disgusting part of language where a statement can be misleading, technically true, not the whole truth, missing caveats etc.

Very challenging problem. Obviously Google decided to mislead the audience and basically cover up the shortcomings. Terrible behaviour.

modeless7 months ago

Calling the air inside the duck (which is not sealed inside) part of its "material" would be misleading. That's not how most people would interpret the statement and I'm confident that's not the explanation for why the statement was made.

onedognight7 months ago

The air doesn’t matter. Even with a vacuum inside it would float. It’s the overall density of “the duck” that matters, not the density of the plastic.

hunter2_7 months ago

A canoe floats, and that doesn't even command any thought regarding whether you can replace trapped air with a vacuum. If you had a giant cube half full of water, with a boat on the water, the boat would float regardless of whether the rest of the cube contained air or vacuum, and regardless of whether the boat traps said air (like a pontoon) or is totally vented (like a canoe). The overall density of the canoe is NOT influenced by its shape or any air, though. The canoe is strictly more dense than water (it will sink if it capsizes) yet in the correct orientation it floats.

What does matter, however, is the overall density of the space that was water and became displaced by the canoe. That space can be populated with dense water, or with a less dense canoe+air (or canoe+vacuum) combination. That's what a rubber duck also does: the duck+air (or duck+vacuum) combination is less dense than the displaced water.

mechagodzilla7 months ago

No, the density of the object is less than that of water, not the density of the material. The duck is made of plastic, and it traps air. Similarly, you can make a boat that floats in water out of concrete or metal. It is an important distinction when trying to understand buoyancy.
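
A back-of-the-envelope check of that distinction, using assumed (but plausible) numbers for a duck-sized PVC shell:

    # Material density vs. object density; volumes in cm^3, densities in g/cm^3.
    WATER = 1.0
    PVC = 1.4            # typical PVC; a solid chip of it sinks (PVC > WATER)

    shell = 15.0         # volume of actual plastic (assumed)
    hollow = 185.0       # enclosed empty volume (assumed)

    mass = shell * PVC                    # air mass is negligible
    object_density = mass / (shell + hollow)
    print(object_density)                 # 0.105 < 1.0, so the duck floats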

__s7 months ago

It also says the attribute of squeaking means it'll definitely float

bongodongobob7 months ago

That's actually pretty clever because if it squeaks, there is air inside. How many squeaking ducks have you come across that don't float?

davesque7 months ago

You could call it clever or you could call it a spurious correlation.

catchnear43217 months ago

language models do not lie. (this pedantic distinction being important, because language models.)

zozbot2347 months ago

Good, that video was mostly annoying and creepy. The AI responses as shown in the linked Google dev blogpost are a lot more reasonable and helpful. BTW I agree that the way the original video was made seems quite misleading in retrospect. But that's also par for the course for AI "demos", it's an enduring tradition in that field and part of its history. You really have to look at production systems and ignore "demos" and pointless proofs of concept.

danielbln7 months ago

The GPT-4 demo early this year when it was released was a lot less... fake, and in fact very much indicative of its feature set. The same is true for what OpenAI showed during their dev days, so at the very least those demos don't have too much fakery going on, as far as I could tell.

ukuina7 months ago

A certain minimum level of jank always makes demos more believable. Watching Brockman wade through Discord during the napkin-to-website demo immediately made the whole thing convincing.

AI is in the "hold it together with hope and duct tape" phase, and marketing videos claiming otherwise are easy to spot and debunk.

Frost1x7 months ago

>You really have to look at production systems and ignore "demos" and pointless proofs of concept.

While I agree, I wouldn't call proofs of concept and demos pointless. They often illustrate a goal or target functionality you're working towards. In some cases it's really just a matter of allotting some time and resources to go from a concept to a product; no real engineering is needed, it all exists, but there's capital needed to get there.

Meanwhile some proofs of concept skip steps and show higher-level function that needs some serious breakthrough work to get to, maybe multiple steps of it. Even this is useful, because it illustrates a vision that may be possible, so people can understand and internalize the things you're trying to do or the real potential impact of something. That wasn't done here; it was buried in a side note. That information needs to come before the demo to some degree, without throwing a wet blanket on everything, and it needs to be in the same medium as the demo itself so it's very clear what you're seeing.

I have no problem with any of that. I have a lot of problems when people don't make it explicitly clear beforehand that it's a demo and explain earnestly what's needed. Is it really something that exists today in working systems, where someone just needs to invest money and wire it up, with no new research needed? Or is it missing some breakthroughs? How many, and what are they? How long have these things been pursued, how many people are working on them, what does recent progress look like, and so on (in a nice summarized fashion).

Any demo/PoC should come up front with an earnest general feasibility assessment. When a breakthrough or two are needed, the stated risk should skyrocket. If it's just a lot of expensive engineering then that's also a challenge, but a tractable one.

I've given a lot of scientific tech demonstrations over the years and the businesses behind me obviously want me to be as vague as possible to pull money in. I of course have some of those same incentives (I need to eat and pay my mortgage like everyone else). Nonetheless, the draw of science to me has always been pulling the veil from deception and mystery, and I'm a firm believer in being as upfront as possible. If you don't lead with disclaimers, imaginations run wild into what can be done today. Adding disclaimers helps imaginations run wild about what can be done tomorrow, which I think is great.

peteradio7 months ago

What the Quack? I found it tasty as pâté.

Nekorosu7 months ago

The Gemini demo looks like ChatGPT with a video feed, except it doesn't exist the way ChatGPT does. I have ChatGPT on my phone right now, and it works (and it can process images, audio, and an audio feed in). This means Google has shown nothing of substance. In my world, it's a classic stock price manipulation move.

onlyrealcuzzo7 months ago

Gemini Pro is available on Bard now.

Ultra is not yet available.

replwoacause7 months ago

Yeah and have you tried it? It’s as dogshit as the original Bard.

CrayKhoi7 months ago

I've been using Gemini in Bard since the launch; with respect to coding it is outperforming GPT-4 in my opinion. There is some convergence in the answers, but Bard is outputting really good code now.

replwoacause7 months ago

+1

suriyaG7 months ago

The bloomberg article seems to have been taken down and is now going to 404. https://www.bloomberg.com/opinion/articles/2023-12-07/google...

dilap7 months ago

Just an error in the link, here's the corrected version: https://www.bloomberg.com/opinion/articles/2023-12-07/google...

thrtythreeforty7 months ago

and here's a readable version: https://archive.ph/ABhZi

rollulus7 months ago

I watched this video, impressed, and thought: what if it’s fake. But then dismissed the thought because it would come out and the damage wouldn’t be worth it. I was wrong.

imiric7 months ago

The worst part is that there won't be any damage. They'll release a blog post with PR apologies, but the publicity they got from this stunt will push up their brand in mainstream AI conversations regardless.

"There's no such thing as bad publicity."

steego7 months ago

“There’s no such thing as bad publicity” only applies to people and companies that know how to spin it.

Reading the comments of all these disillusioned developers, it’s already damaged them because now smart people will be extra dubious when Google starts making claims.

They just made it harder for themselves to convince developers to even try their APIs, let alone bet on them.

This was stupid.

xnx7 months ago

That demo was much further on the "marketing" end of the spectrum when compared to some of their other videos from yesterday which even included debug views: https://youtu.be/v5tRc_5-8G4?t=43

iamleppert7 months ago

You can tell whoever put together that demo video gave no f*cks whatsoever. This is the quality of work you can expect under an uninspiring leader (Sundar) in a culture of constant layoff fear and bureaucracy.

Literally everyone I know who works at Google hates their job and is completely checked out.

CamperBob27 months ago

Huh? It was a GREAT demo video!

If it had been real, that is.

recursive7 months ago

Thanks. This is the first I'm hearing of a duck demo, and couldn't figure out what it was.

valine7 months ago

It’s not live, but it’s in the realm of outputs I would expect from a GPT trained on video embeddings.

Implying they’ve solved single token latency, however, is very distasteful.

zozbot2347 months ago

OP says that Gemini had still images as input, not video - and the dev blog post shows it was instructed to reply to each input in relevant terms. Needless to say, that's quite different from what's implied in the demo, and at least theoretically is already within GPT's abilities.
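
Per the dev blog, the real interaction looked roughly like the following sketch; generate() and the file names are hypothetical, and the prompt is paraphrased:

    # Hand-picked still frames plus a carefully worded text prompt, sent as a
    # single request. No realtime or voice constraints apply.
    def demo_interaction(generate):
        frames = ["cup_start.png", "cup_swap.png", "cup_end.png"]  # made-up names
        prompt = ("The cups are shuffled as shown in these pictures, in order. "
                  "Which cup is the ball under now? Explain your reasoning.")
        return generate([prompt, *frames])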

valine7 months ago

How do you think the cup demo works? Lots of still images?

watusername7 months ago

A few hand-picked images (search for "cup shuffling"): https://developers.googleblog.com/2023/12/how-its-made-gemin...

valine7 months ago

Holy crap that demo is misleading. Thanks for the link.

cryptoz7 months ago

I'll admit I was fooled. I didn't read the description of the video. The most impressive thing they showed was the real-time responses to watching a video. Everything else was about as expected.

Very misleading and sad Google would so obviously fake a demo like this. Mentioning in the description that it's edited is not really in the realm of doing enough to make clear the fakery.

LesZedCB7 months ago

i too was excited and duped about the real-time implications. though i'm not surprised at all to find out it's false.

mea culpa i should have looked at the bottom of the description box on youtube where it probably says "this demonstration is based on an actual interaction with an LLM"

wvenable7 months ago

I'm surprised it was false. It was made to look realistic and I wouldn't expect Google to fake this kind of thing.

All they've done is completely destroy my trust in anything they present.

teleforce7 months ago

Is the link to the article broken? Does anyone have it archived somewhere?

I wish people would stop posting Twitter messages to HN and provide a link directly to the original article instead. What's next, posting Instagram posts to HN?

gloosx7 months ago

I don't get it either; my browser loaded for a good 15 seconds and made 141 requests fetching almost 9 MB of resources to show me exactly the same content as provided in the OpenGraph tags, plus a freaking redirect to a Bloomberg link. Feels like a slap in the face to open such a phishing link at any time, just a useless redirect with nine million bytes of overhead.

kaoD7 months ago

How is this not false advertising?

barbazoo7 months ago

Or worse, fraud to make their stock go up

edit: s/stuck/stock

Alifatisk7 months ago

Why did you have to mention your edit?

barbazoo7 months ago

It was a late one

drcode7 months ago

I suppose it's not false advertising, since they don't even claim to have a released product that can do this yet: Gemini Ultra won't be available until an unspecified time next year.

imiric7 months ago

It's still false advertising.

This is common in all industries. Take gaming, for example. Game publishers love this kind of publicity, as it creates hype, which leads to sales. There have been numerous examples of this over the years: Watch Dogs, No Man's Sky, Cyberpunk 2077, etc. There's a period of controversy once consumers realize they've been duped, the company releases some fake apology and promises or doubles down, but they still walk out of it richer, and ready to do it again next time.

It's absolutely insidious, and should be heavily fined and regulated.

stephbu7 months ago

You're right, it's astroturfing a placeholder in the market in the absence of product. The difference is probably just the target audience - feels like this one is more aimed at share-holders and internal politics.

empath-nirvana7 months ago

possibly securities fraud though. Their stock popped a few percent on the back of that faked demo.

Tao33007 months ago

It's a software demo. If you ever gave an honest demo, you gave a bad demo. If you ever saw a good and honest demo, you were fooled.

dragontamer7 months ago

As a programmer, I'd say that all the demos of my code were honest and representative of what my code was doing.

But I recognize we're all different programmers in different circumstances. But at a minimum, I'd like to be honest with my work. My bosses seem to agree with me and I've never been pressured into hosting a fake demo or lie about the features.

In most cases, demos are needed because there's that dogfood problem. It's just not possible for me to know how my (prospective) customers will use my code. So I need to show off what has been coded, my progress, and my intentions for the feature set. In response, the (prospective) customer may walk away, they may have some comments that increase the odds of adoption, or they may think it's cool and amazing and take it on the spot. We can go back and forth with regards to feature changes or what is possible, but that's how things should work.

------------

I've done a few "I could do it like this" demos, where everyone in the room knew that I hadn't finished the code yet and it's just me projecting into the future of how the code would work and/or how it'd be used. But everyone knew the code wasn't done yet (despite that, I've always delivered on what I've promised).

There is a degree of professional ethics I'd expect from my peers. Hosting honest demos is one of them, especially with technical audience members.

saagarjha7 months ago

I prefer to let my software be good enough to let it speak for itself without resorting to fraud, thank you very much.

eh_why_not7 months ago

There was also the cringey "niiice!", "sweeeet!", "that's greaatt", "that's actually pretty good" responses from the narrator in a few of the demo videos that gave them the feel of a cheap 1980's TV ad.

carabiner7 months ago

It really reminds me of the Black Mirror episode Smithereens with the tech CEO talking with the shooter. Tech people really struggle with empathy, not just one-on-one but with the rest of the outside world, which is, relatively speaking, predominantly low-income and without college education. Paraphrased, the Black Mirror episode went like:

[Tech CEO read instructions to "show empathy" from his assistant via Slack]

CEO: I hear you. It must be very hard for you.

Shooter: Of course you fucking hear me, we're on the phone! Talk like a normal person!

baobabKoodaa7 months ago

I remember that conversation! Man, that was a great episode.

neilv7 months ago

I missed the disclaimer. So, when watching it, I started to think "Wow, so Google is releasing their best stuff".

But then I soon noticed some things that were too smooth, so seemed at best to be cherry-picked interactions occasionally leaning on hand-crafted situation handlers. Or, it turns out, faked.

Regardless of disclaimers, this video seems misleading to be releasing right now, in the context of OpenAI eating Google's lunch.

Everyone is expecting Google to try to show they can do better. This isn't that. This isn't even a mocked-up future-of-HCI interaction concept video, because it's not showing a vision of what people want to do --- it's only showing a demo of technical capabilities.

It's saying "This is what a contrived tech demo (not application vision concept) could look like, but we can't do it yet, so we faked it. Hopefully, the viewer will get the message that we're competitive with OpenAI."

(This fake demo could just be an isolated oops of a small group, not representative of Google's ability to rise to the current disruption challenge, I don't know.)

miraculixx7 months ago

I knew immediately this was just overhyped PR when I noticed the author of the blogpost is Sundar.

jonplackett7 months ago

OK I get that everyone’s hype-sensitive and I absolutely remain to be convinced of Gemini’s actual ability

BUT

The fact this wasn’t realtime or with voice is not the issue. Voice to text could absolutely capture this conversation easily. And from what I’ve seen Gemini seems quicker than GPT4

Being able to respond quicker and via voice chat is not actually a big deal.

The underlying performance of the model is what we should be focussing on

michaelt7 months ago

I disagree.

One of the major issues in LLMs is the economics; a lot of people suspect ChatGPT loses money on every user, or at least every heavy user, because they've got a big model and A100 GPUs are expensive and in short supply.

They're kinda reluctant to have customers, with API rate limits galore, and I've heard people claiming ChatGPT has lost the performance crown having switched to a cheaper-to-run model.

If google had a model that operated on video in realtime, that would imply they've got a model that performs well, and is also very fast or that their 'TPUs' outperform the A100 quite a bit, either of which would be a big step forward.

turquoisevar7 months ago

Even if you'd be inclined to shrug off the fact that this wasn't real-time and voice- and video-based (which you shouldn't, because the performance implications would be huge), there's still the matter of the prompts used: the prompts shown are miles apart from those actually given and significantly misrepresent the performance of the underlying model.

It goes from a model being able to infer a great deal at the level of human intelligence to a model that needs to be fed essential details, and that doesn't do much inferring.

I get the feeling that many here on HN who are just shrugging it off don't realize how much of the “demo” was faked. Here’s a decent article that goes into it some more: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

pphysch7 months ago

Exactly.

Corporate tech demo exaggerates actual capabilities and smoothes over rough edges? Impossible, this is unprecedented!!

The Apple vs Google brand war is so tiresome. Let's focus on the tech.

jonplackett7 months ago

To be clear - I’m not saying this makes Gemini good. Just that it isn’t bad for these reasons!

kccqzy7 months ago

While this might just be a bit of bad PR now, it will eventually be a nothing burger. Remember the original debut of Apple's Siri for which Apple also put out a promotional demo with greatly exaggerated functionality? People even sued Apple and they lost.

As much as I hate it, this is absolutely fine by our society's standards. https://www.theregister.com/AMP/2014/02/14/apple_prevails_in...

turquoisevar7 months ago

There's a vast difference between advertising a product while slightly shortening the sequences and cutting out failed prompts, and completely misrepresenting the product at hand to a degree that the depiction doesn't resemble the product at all[0].

The former is considered Puffery[1] and is completely legal, and the latter is straight up lying.

0: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

1: https://en.wikipedia.org/wiki/Puffery

pookietactics7 months ago

At the 5:35 point, the phone screen rotates before he touches it. https://youtu.be/UIZAiXYceBI?t=335

sakesun7 months ago

This is what convinced me that it's all a hoax.

onemoresoop7 months ago

It seems like the fake video did the trick, their stock is up 5.5% today.

sheepscreek7 months ago

I don’t understand why Gemini is even considered “jaw-dropping” to begin with. GPT-4V has set the bar so high that all their demos and presentations paled in comparison. And it’s available for anyone to use. People have already built mind-blowing demos with it (like https://arstechnica.com/information-technology/2023/11/ai-po...).

The entire launch felt like a concentrated effort to “appear” competitive to OpenAI. Google was splitting hairs talking about low single digit percentage improvement in benchmarks. Against a model that has been out for over 6 months.

I have never been so unimpressed with them. Not only has OpenAI managed to snag this one from under Google’s nose; IMO they also seem to be defending their lead quite well. Now that is something unmistakably remarkable. Color me impressed!

kjkjadksj7 months ago

Some other commenter, a former googler, a while back alluded to figuring out the big secret and being thrown into a tizzy by the resulting cognitive dissonance once they realized what they'd been buying into. It's never about making a good product. It's about keeping up with the Joneses in the eyes of tech investors. And just look at the movement on the stock today as a result of this probable lemon of a product: nothing else mattered except keeping up appearances. CEOs make historic careers optimizing companies for appearances over function like this.

sheepscreek7 months ago

I get that. But I don’t get why journalists who cover the industry and are expected to get it would call this “jaw dropping”. Maybe I am reading too much into it. It was likely added to increase the shock factor.

mirkodrummer7 months ago

I didn’t believe Google’s presentation offhand because I don’t care anymore, especially when it comes from them. I just use tools and adapt. Copilot helps me automate boring tasks but can’t help much with new stuff, so I actually discovered I often do “interesting” work. I use GPT 3.5/4 for everything but work; it’s been a blessing, the best suggestion engine for movies, books and music with just a prompt and without the need for tons of data about my watch history (looking at you, YouTube). In these strange times I’m actually learning a lot more; productivity is more or less the same as before LLMs, but annoying tasks are relieved a bit. All of that without the hype. Sometimes I laugh at Google, it must be a real shit show inside that mega corporation, but I kinda understand the need for marketing edits: having a first class ticket on the AI train is so important for them, as it seems they see it as an existential threat. At least it seems so, since they decided to take the risk of lying.

CamperBob27 months ago

Any sufficiently-advanced technology is indistinguishable from a rigged demo.

peteradio7 months ago

Fake it til you make it, then keep faking it.

mtrovo7 months ago

I guess a much better next step is to compare how GPT-4V performs when asked similar prompts. Even if mostly staged, this is very impressive to me: not so much the current tech, but how much leverage Google has to win this race in the long run because of its hardware presence.

The more these models improve, the more we will want less friction and faster interactions. This means that in the long term, having to open an app and ask a question is not gonna fly compared to just pointing your phone camera at something, asking a question, and getting an answer that's tailored to everything Google knows about you in real time.

Apple will most likely also roll their own in house solution for Siri instead of relying on an external company. This leaves OpenAI and the other small companies not just competing for the best models but also on how to put them in front of people in the first place and how to get access to their personal information.

bradhe7 months ago

> Even if mostly staged, this is very impressive to me: not so much the current tech, but how much leverage Google has to win this race in the long run because of its hardware presence.

I think you have too much information to form a reasonable opinion on the situation. Google is using editing techniques and specific scripting to try to demonstrate they have a sufficiently powerful general AI. The magnitude of this claim is huge, and the fact that they're faking it should be a likewise enormous scandal.

Summing this up as "well, I guess they're doing better than XYZ" discounts the absurd context of all this.

karaterobot7 months ago

This is endemic to public product demos. The thing never works as it does in the video. I'm not excusing it, I'm saying: don't trust public product demos. They are commercials, they exist to sell to you, not to document objectively and accurately, and they will always lie and mislead within the limits of the law.

1024core7 months ago

For more details about how the video was created, see this blog post: https://developers.googleblog.com/2023/12/how-its-made-gemin...

Kim_Bruning7 months ago

Even a year ago, this advert would have been obvious puffery in advertising.

But right now, all the bits needed to do this already exist (just need to be assembled and -to be fair- given a LOT of polish), so it would be somewhat reasonable to think that someone had actually Put In The Work already.

SheinhardtWigCo7 months ago

Just how many lives does Sundar have? Where is the board?

miraculixx7 months ago

Counting their bonuses?

maremmano7 months ago

Well, sometimes I have this "Google Duplex: A.I. Assistant Calls Local Businesses To Make Appointments" feeling.

https://www.youtube.com/watch?v=D5VN56jQMWM

davesque7 months ago

The hype really is drowning out the simple fact that basically no one really knows what these models are doing. Why does it matter so much that we include auto-correlation of embedding vectors as the "attention" mechanism in these models? And that we do this sufficiently many times across all the layers? And that we blindly smoosh values together with addition and call it a "skip" connection? Yes, you can tell me a bunch of stuff about gradients and residual information, but tell me why any of this stuff is or isn't a good model of causality.
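
For readers outside the field, the two mechanisms being questioned really are this mechanically simple; a minimal numpy sketch with a single head and the learned projection matrices omitted:

    import numpy as np

    def attention(X):                   # X: (seq_len, d) matrix of embeddings
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)   # the "auto-correlation" of embeddings
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax
        return w @ X                    # each position: weighted mix of the rest

    X = np.random.randn(4, 8)           # 4 tokens, 8-dim embeddings
    out = attention(X) + X              # the "skip" connection: blind addition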

Al-Khwarizmi7 months ago

I didn't even look at the demo. And not due to lack of interest in LLMs (I'm an academic working in NLP, and I have some work on LLMs).

The thing is that for all practical intents and purposes, if I can't try it, it doesn't exist. If they claim it exists, they should show it, a video or a few cherry-picked examples don't prove anything. It's easy to make a demo making even Eliza look like AGI by asking the right questions.

dingclancy7 months ago

What is clear is that Google really has no 'God model' that they were holding back all along.

Gemini Ultra is barely beating ChatGPT in their manufactured benchmarks, and this is all that they got.

What this means is that the claims, including from people at Google, of having better models that are being withheld in the name of AI safety ring hollow: at least in the realm of LLMs, Google DeepMind had nothing all along.

vjerancrnjak7 months ago

There is a possibility of dataset contamination in the competitive programming benchmark. There's a nice discussion on the page where AlphaCode2 was solving the problems: https://codeforces.com/blog/entry/123035

The problem shown in the video had been reused in a recent competition (so it could already have been available in the training data).

gsuuon7 months ago

Wow, my first thought was to wonder what framerate they were sending video at. The whole demo seems significantly less impressive in that case.

Uptrenda7 months ago

Imagine being Google, with $100 billion+ in liquid cash, tens of thousands of the best engineers worldwide, and everything you could possibly need to run tech products, yet being completely unable to launch anything new or worthwhile. Like, how tf does that even happen? Is Google the next Kodak?

k__7 months ago

I really thought this was a realtime demo.

Shame on them :(

kokanee7 months ago

Bloomberg link not working; here's TechCrunch: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

imacomputertoo7 months ago

It was obviously marketing material, but if this tweet is right, then it was just blatant false advertising.

kjkjadksj7 months ago

Google always does fake advertising: "unlimited" Google Drive accounts, for example. They just have such a beastly legal team that no one is going to challenge them on anything like that.

Dylan168077 months ago

What was fake about unlimited google drive? There were some people using petabytes.

The eventual removal of that tier and anything even close speaks to Google's general issues with cancelling services, but that doesn't mean it was less real while it existed.

ambrose27 months ago

What about when Gmail was released and the storage was advertised as increasing forever, but they first slowed the increase to a crawl and then stopped increasing it altogether?

Dylan168077 months ago

Oh, long before google drive existed?

I don't remember the "increasing forever" ever being particularly fast. I found some results from 2007 and 2012 both saying it was 4 bytes per second, <130MB per year.
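
(Sanity-checking that figure: 4 bytes/s × 86,400 s/day × 365 days ≈ 126 MB/year, which indeed lands just under 130 MB, so the two reports are consistent.)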

So it's true that the number hasn't increased in ten years, but that last increase was +5GB all by itself. They've done a reasonable job of keeping up.

Arguably they should have kept adding a gigabyte each year, based on the intermittent boosts they were giving, but by that metric they're only about 5GB behind.

larodi7 months ago

This must’ve been shot by one of the directors who did Apple’s new show Extrapolations. Very plausible illustration of AI in daily life though, despite the aggressive climate change claims made in it.

But neither AI nor climate is there yet…

ShamelessC7 months ago

That show’s most out-there predictions all take place after 2050. Your criticism isn’t relevant.

jbverschoor7 months ago

Well, Google has a history of faking things, so I'm not surprised. I expected this.

All companies are just yelling that they're "in" the AI/LLM game; if they don't, their share prices will drop.

macilacilove7 months ago

The demo may be fake, but people genuinely love it. The top YouTube comment on that demo is still:

"Absolutely mindblowing. The amount of understanding the model exhibits here is way way beyond anything else."

spaceman_20207 months ago

I’ll assume Gemini is vaporware unless I can actually use it.

dramm7 months ago

The more Google over-hypes this stuff, the stronger my impression that they are well behind OpenAI. Time to STFU and focus on the work.

borissk7 months ago

Google did the same with the Pixel 8 Pro advertising: they showed stuff like photo and video editing that people couldn't replicate on their phones.

jaimex27 months ago

Google is done, they can't compete in this space.

milofeynman7 months ago

I looked at it as a good aspirational target for five years from now. It was obvious the whole video was edited together, not real time.

zdrummond7 months ago

This is just a tweet that makes a claim without backing, and links to an article that was pulled.

Can we change the URL to the real article if it still exists?

L_2267 months ago

The original article linked in the tweet [0] now returns a 404 for me.

[0] - https://www.bloomberg.com/opinion/articles/2023-12-07/google...

golly_ned7 months ago

If you've seen the video, it's very apparent it's a product video, not a tech demo. They cut out the latencies to make a compelling product video.

I wasn't at all under the impression that they were showcasing TTS or low latency as product features. I don't find the marketing misleading at all, and these criticisms don't hit the mark for me.

https://www.youtube.com/watch?v=UIZAiXYceBI

DominikPeters7 months ago

It's not just cutting. The answers were obtained by feeding still photos into the model along with detailed text instructions explaining the context and the task, giving some examples first, and using careful chain-of-thought-style prompting (see e.g. https://developers.googleblog.com/2023/12/how-its-made-gemin...). My guess is that the video was produced by a different team after the Gemini outputs were generated, not while or before.
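
To illustrate the shape of that workflow (a hypothetical sketch, not Google's actual prompt or API; model.generate and the frame variables are invented placeholders):

    # Hypothetical reconstruction of the prompting style the blog post
    # describes: still images plus few-shot, chain-of-thought text.
    prompt = """You are watching a cup-shuffling game.
    Think step by step about each swap before answering.

    Example:
    Images: ball under the middle cup; then the left and middle cups swap.
    Reasoning: the ball was under the middle cup, and that cup moved left.
    Answer: the left cup.

    Now do the same for the images below."""

    # Invented placeholder call, standing in for whatever Google used.
    response = model.generate(images=[frame1, frame2, frame3], text=prompt)

That is a very different artifact from a model watching live video and speaking back.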

GaggiX7 months ago

I imagine the model also has some video embeddings, e.g. for the example where it needed to find where the ball was hiding.

pacomerh7 months ago

I was looking at this demo today and was wondering the same. It looked way too fast to be calculating all of that info.

zetabyte7 months ago

I wanted to see what's really going on, but the Bloomberg article linked in the tweet seems to have been taken down right now.

taspeotis7 months ago

Google Gemi-lie

peteradio7 months ago

Lol, could have done without the cocky narration. "I think we're done here."

cedws7 months ago

The whole launch is cocky. Bleh. Stick to the engineering.

nsonha7 months ago

Why didn't they do it for real, then? What's the challenge? I suppose they could've programmed in some prompt indicator (like "OK Google" but less obvious); then the demo could have been technically feasible.
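
Something like the loop below, presumably. A rough sketch; listen, transcribe, grab_frame, and model.generate are all placeholders I'm inventing, not real APIs:

    # Hypothetical wake-word pipeline: the trigger phrase causes a single
    # frame grab, then a still image plus the transcribed question go to
    # the model. Every function here is an invented placeholder.
    WAKE_WORD = "hey gemini"

    while True:
        heard = transcribe(listen()).lower()
        if WAKE_WORD in heard:
            frame = grab_frame()  # one still photo, not a video stream
            question = heard.split(WAKE_WORD, 1)[1].strip()
            print(model.generate(images=[frame], text=question))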

nblgbg7 months ago

That was my suspicion when I first saw it. It's really an impressive demo though.

jgalt2127 months ago

If the truck doesn't have a working engine, we can just roll it down the hill.

Brilliant idea!

drcongo7 months ago

Remember when they faked that Google Assistant restaurant-booking demo too?

umeshunni7 months ago

How was that fake?

dannyw7 months ago

It was actually a human doing the entire call.

miraculixx7 months ago

Mhm

retox7 months ago

AI: artificial incompetence

rattatossk7 months ago

It was in fact done by a duck.

I for one, welcome our new quack overlords.

ElijahLynn7 months ago

Link to the Bloomberg article from the Tweet is 404 now.

dilawar7 months ago

Bloomberg link in Xeet is 404 for me (Bangalore).

sugavaneshb7 months ago

I can't find the original article anymore

nyxtom7 months ago

Silicon Valley is plagued by over-promising far too early and fooling everyone into thinking we are years ahead of where we actually are. Estimates are always way off, and claims, and even faked demos, are almost always oversold to get investors.

throw__away73917 months ago

Which means anyone who isn’t engaged in the hype cycle looks bad by comparison.

Still, I would distinguish between genuine optimism and cynical exaggeration. If you go back and read what was being said in and around the 70s about technology, and watch the demos they gave, it always makes me feel a little sad for them. People really thought so many things were right around the corner: e.g., if a computer can beat a star chess player, surely understanding natural language or driving a car will be easy. There were so many things being made, like primitive robots that could follow a line painted on the floor, that seemed so promising in a controlled environment, but realizing the vision would take decades longer than predicted. Without their efforts we would not be where we are today.

nyxtom7 months ago

True. I think the optimism and grandiosity of engineers with gems in their eyes keeps the passion to work hard alive; some of us are just terrible at grounding ourselves in reality. Genuine science needs to take a stronger foothold: skepticism and curiosity over certainty about accelerated outcomes or nonexistent (heck, in some situations even fraudulent) capabilities.

Alifatisk7 months ago

The Bloomberg article gives a 404 for me.

lumost7 months ago

Anybody remember the google duo demo?

fennecfoxy7 months ago

I mean, I know people aren't gonna boycott Google (cause people aren't capable of boycotting anything), but fuck Google, man.

This AI stuff is becoming super cool, and the last thing we need is the usual charlatans that surround new technology. They should be ashamed of themselves, but I know who called for this: not the engineers but the "management" layer, and they're incapable of feeling shame.

trash_cat7 months ago

I genuinely don't understand why this comes as a surprise. If it were a realtime video with realtime responses, it would have been some other generational leap beyond transformers. Not only that, it would somehow handle video + audio with lower latency than the current SOTA models? C'mon...

skilled7 months ago

Fake benchmarks, fake stitched-together videos, disingenuous charts, no developer API at launch, announcements stuffed with marketing fluff.

As soon as I saw that opening paragraph from Sundar and how it was written, I knew Gemini was going to be a steaming pile of shit.

They should have watched the GPT-4 announcement from OpenAI again. That demo Greg Brockman did with converting a sketch on a piece of paper to a CodePen from a Discord channel, with all the error correcting and whatnot, is how you launch a product that's appealing to users.

TechCrunch, Twitter, and some other sites (including HN, I guess) are already piling on, and by Monday things will go back to how they were, and Google will have to go back to the drawing board to figure out another way to relaunch Gemini in the future.

FartyMcFarter7 months ago

Unpaywalled Bloomberg article linked in the tweet:

https://archive.is/4H1fB

bradhe7 months ago

Fucking. Shocking.

Anyone with half a brain could see through this "demo." It was vastly too uncanny to be real, to the point that the fake was poorly set up. Google should be ashamed.

stormfather7 months ago

So what? Voice to text is a solved problem. And in cases where realtime is important, just throw more compute at it. I'm missing the damning gotcha moment here.

squigglydonut7 months ago

Hard to build when you fire everyone, huh?

seydor7 months ago

I thought it was implied and obvious that the video was edited.

So what?

hifreq7 months ago

The red flag for me was that they opened the demo video with background noise to make it seem like raw footage. A subtle manipulation, for no reason; it's obviously not a raw video.

The fact that they did not fact-check the videos once again makes me not particularly confident in the quality of Google's work. The bit where the model misinterpreted the music notation (the circled area does not mean "piano") and the "less dense than water" rubber duck are beyond the pale. The SVG demo where they generate a South Park-looking tree looks like a parody.

frozenlettuce7 months ago

Too little, too late. My impression is that Google is not one but two steps behind what MS can offer (they need a bigger leap if they want to get ahead).