US Marines defeat DARPA robot by hiding under a cardboard box

360 points15
DrThunder10 hours ago

Hilarious. I immediately heard the Metal Gear exclamation sound in my head when I began reading this.

kayge10 hours ago

Hah, you beat me to it; Hideo Kojima would be proud. Sounds like DARPA needs to start feeding old stealth video games into their robot's training data :)

Apocryphon4 hours ago

Hilariously enough, Kojima is enough of a technothriller fabulist that DARPA is explicitly part of that franchise's lore - too bad they didn't live up to his depiction.

qikInNdOutReply9 hours ago

But the AI in stealth games is literally trained to go out of its way to not detect you.

Firmwarrior9 hours ago

The cardboard box trick doesn't actually work in Metal Gear Solid 2, at least not any better than you'd expect it to work in the real world

thelopa8 hours ago
skhr06801 hour ago

There’s even a scene where Snake tries “hiding” in a box and you can find and shoot him

doubled1129 hours ago
stirfish5 hours ago

I just learned you can use the different boxes to fast travel on the conveyor belts in Big Shell

matheusmoreira9 hours ago

I can practically hear the alert soundtrack in my head.

Also, TFA got the character and the game wrong in that screenshot. It's Venom Snake in Metal Gear Solid V, not Solid Snake in Metal Gear Solid.

ankaAr10 hours ago

I'm very proud of all of you for the reference.

sekai9 hours ago

Kojima predicted this

doyouevensunbro6 hours ago

Kojima is a prophet, hallowed be his name.

CatWChainsaw7 hours ago

That, plus the ProZD skit on Youtube:

"Well, I guess he doesn't... exist anymore?"

(unfortunately it's a Youtube short, so it will auto repeat.)

stordoff6 hours ago

> (unfortunately it's a Youtube short, so it will auto repeat.)

If you change->transform it to a normal video link, it doesn't:

dvngnt_2 hours ago

you can install an extension on desktop to do the same

CatWChainsaw6 hours ago

lifehack obtained!

brikwerk4 hours ago

Changing the "shorts" portion of the URL to "v" also works.
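As a rough sketch of the link rewrite discussed in this subthread: a Shorts link of the form `/shorts/<id>` can be turned into a regular `/watch?v=<id>` link. The helper name `shorts_to_watch` and the placeholder ID are made up for illustration.

```python
from urllib.parse import urlparse


def shorts_to_watch(url: str) -> str:
    """Rewrite a YouTube Shorts URL as a regular watch URL.

    Hypothetical helper: any URL that does not look like /shorts/<id>
    is returned unchanged.
    """
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 2 and segments[0] == "shorts":
        return f"{parts.scheme}://{parts.netloc}/watch?v={segments[1]}"
    return url


# VIDEO_ID is a placeholder, not a real video.
print(shorts_to_watch("https://www.youtube.com/shorts/VIDEO_ID"))
```

Links that are already in the regular form pass through untouched, so the helper is safe to apply to any link.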


Barrin9210 hours ago

after MGS 2 and Death Stranding that's one more point of evidence on the list that Kojima is actually from the future and trying to warn us through the medium of videogames

jstarfish7 hours ago

He's one of the last speculative-fiction aficionados...always looking at current and emerging trends and figuring out some way to weave them into [an often-incoherent] larger story.

I was always pleased but disappointed when things I encountered in the MGS series later manifested in reality...where anything you can dream of will be weaponized and used to wage war.

And silly as it sounds, The Sorrow in MGS3 was such a pain in the ass it actually changed my life. That encounter gave so much gravity to my otherwise-inconsequential acts of wanton murder, I now treat all life as sacred and opt for nonlethal solutions everywhere I can.

(I only learned after I beat both games that MGS5 and Death Stranding implemented similar "you monster" mechanics.)

bityard6 hours ago

> That encounter gave so much gravity to my otherwise-inconsequential acts of wanton murder, I now treat all life as sacred and opt for nonlethal solutions everywhere I can.

Hold up just a sec, do you make a living in organized crime or something?

xeromal2 hours ago

He means he's a pacifist in video games.

stirfish5 hours ago

Same, I deleted my save and restarted the game to go non-lethal after my first encounter with The Sorrow

nemo44x5 hours ago

“What was that noise?!…..Oh it’s just a box” lol because boxes making noise is normal.

pmarreck10 hours ago

I came here to make this reference and am so glad it was already here

jqpabc12314 hours ago

This is a good example of the type of issues "full self driving" is likely to encounter once it is widely deployed.

The real shortcoming of "AI" is that it is almost entirely data driven. There is little to no real cognition or understanding or judgment involved.

The human brain can instantly and instinctively extrapolate from what it already knows in order to evaluate and make judgments in new situations it has never seen before. A child can recognize that someone is hiding under a box even if they have never actually seen anyone do it before. Even a dog could likely do the same.

AI, as it currently exists, just doesn't do this. It's all replication and repetition. Like any other tool, AI can be useful. But there is no "intelligence" --- it's basically as dumb as a hammer.

BurningFrog43 minutes ago

My seatbelt is even dumber. I still use it.

The usefulness of tech should be decided empirically, not by clever well phrased analogies.

frontfor38 minutes ago

No one is arguing AI isn’t useful. So your analogy failed completely.

afro883 hours ago

All the failures to detect humans will be used as training data to fine tune the model.

Just like a toddler might be confused when they first see a box with legs walking towards it. Or mistake a hand puppet for a real living creature when they first see it. I've seen this first hand with my son (the latter).

AI tooling is already capable of identifying whatever it's trained to. The DARPA team just hadn't trained it with varied enough data when that particular exercise occurred.

MagicMoonlight2 hours ago

That’s not learning, that’s just brute forcing every possible answer and trying to memorise them all.

willbudd2 hours ago

Not really. Depends entirely on how general-purpose (abstract) the learned concept is.

For example, detecting the possible presence of a cavity inside an object X, and whether that cavity is large enough to hide another object Y. Learning generic geospatial properties like that can greatly improve a whole swath of downstream prediction tasks (i.e., in a transfer learning sense).

mcswell2 hours ago
edrxty2 hours ago

>failures to detect humans

That's a weird way to spell "murders"

birdyrooster2 hours ago

It would be murder if we weren't required by human progress to embrace fully autonomous vehicles as soon as possible. Take it up with whatever god inspires these sociopaths.

lsh1238 hours ago

I have a slightly different take - our current ML models try to approximate the real world assuming that the function is continuous. However, in reality the function is not continuous, and the approximation breaks in unpredictable ways. I think that “unpredictable” part is the bigger issue than just “breaks”. (Most) Humans use “common sense” to handle cases when the model doesn’t match reality. But AI doesn’t have “common sense” and it is dumb because of it.
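A toy illustration of the continuity point, under loud assumptions: the "real world" here is a made-up step function, and the "model" is plain piecewise-linear interpolation through four samples. It only shows the narrow claim that a continuous approximator is wrong near a jump while looking fine elsewhere. It says nothing about any particular ML system.

```python
def step(x: float) -> float:
    """Made-up discontinuous target: jumps from 0 to 1 at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0


def linear_interpolate(xs, ys, x):
    """Continuous piecewise-linear model through the sample points."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the sampled range")


xs = [0.0, 0.25, 0.75, 1.0]      # where the model was "trained"
ys = [step(x) for x in xs]       # observed values at those points

# Error is zero far from the jump and large right next to it.
for x in (0.1, 0.45, 0.55, 0.9):
    error = abs(linear_interpolate(xs, ys, x) - step(x))
    print(f"x={x:.2f}  error={error:.2f}")
```

The model matches every training sample exactly, so nothing in the training data hints at where it will fail.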

laweijfmvo10 hours ago

This story is the perfect example of machine learning vs. artificial intelligence.

ghaff7 hours ago

Basically ML has made such significant practical advances--in no small part on the back of Moore's Law, large datasets, and specialized processors--that we've largely punted on (non-academic) attempts to bring forward cognitive science and the like on which there really hasn't been great progress decades on. Some of the same neurophysiology debates that were happening when I was an undergrad in the late 70s still seem to be happening in not much different form.

But it's reasonable to ask whether there's some point beyond which ML can't take you. Peter Norvig I think made a comment to the effect of "We have been making great progress--all the way to the top of the tree."

jqpabc1239 hours ago

Good point!

thearn43 hours ago

Interestingly, ChatGPT seems capable of predicting this approach:

partiallypro3 hours ago

I don't know your exact question, but I am betting this is just a rephrasing of a post that exists elsewhere that it has crawled. I don't think it saw it so much as it has seen this list before and was able to pull it up and reword it.

richk4491 hour ago

> I don't know your exact question, but I am betting this is just a rephrasing of a post that exists elsewhere …

What percentage of HN posts (by humans) does this statement apply to?

yreg2 hours ago

Nah, GPT is capable of hallucinating stuff like this. Also seeing something once in the training data is afaik not enough for it to be able to reproduce/rephrase that thing.

2OEH8eoCRo09 hours ago

Does it just require a lot more training? I'm talking about the boring stuff. Children play and their understanding of the physical world is reinforced. How would you add the physical world to the training? Because everything that I do in the physical world is "training" me and reinforcing my expectations.

We keep avoiding the idea that robots require understanding of the world since it's a massive unsolved undertaking.

sjducb8 hours ago

A human trains on way less data than an AI.

ChatGPT has processed over 500GB of text files from books, about 44 billion words.

If you read a book a week you might hit 70 million words by age 18.
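The arithmetic behind the 70 million figure can be checked in a few lines. The words-per-book value is an assumption (the comment does not give one); the 44 billion figure is taken from the comment as-is and is not verified here.

```python
BOOKS_PER_YEAR = 52          # "a book a week"
YEARS = 18                   # reading from birth to age 18, as in the comment
WORDS_PER_BOOK = 75_000      # assumption: a typical novel-length book

human_words = BOOKS_PER_YEAR * YEARS * WORDS_PER_BOOK
model_words = 44_000_000_000  # figure quoted in the comment, unverified

print(human_words)                       # 70200000
print(round(model_words / human_words))  # 627
```

Under these assumptions the quoted training corpus is roughly 600 times what a very diligent reader gets through by adulthood.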

2OEH8eoCRo08 hours ago

I disagree.

Starting from birth, humans train continuously on streamed audio, visual, and other data from 5 senses. An inconceivable amount.

danenania6 hours ago
mcswell2 hours ago

You missed the point. ChatGPT trained on a gazillion words to "learn" a language. Children learn their language from a tiny fraction of that. Streamed visual, smell, touch etc. don't help learn the grammars of (spoken) languages.

nitwit0057 hours ago

Imagine someone has the idea of strapping mannequins to their car in hopes the AI cars will get out of the way.

Sure, you could add that to the training the AI gets, but it's just one malicious idea. There's effectively an infinite set of those ideas, as people come up with novel ideas all the time.

mlboss6 hours ago

Reinforcement learning should solve this problem. We need to give robots the ability to do experiments and learn from failure like children.

majormajor3 hours ago

Need to make those robots as harmless as children when they do that learning too. ;)

"Whoops, that killed a few too many people, but now I've learned better!" - some machine-learning-using car, probably

abledon4 hours ago

Will there come a time when computers are strong enough to read in the images, then re-create a virtual game world from them, and then reverse-engineer, from seeing feet poking out of the box, that a human must be inside? Right now Tesla cars can take in the images and decide turn left, turn right etc... but they don't reconstruct, say, a Unity-3D game world on the fly.

lowbloodsugar1 hour ago

A human is exactly the same. The difference is, once an AI is trained you can make copies.

My kid literally just got mad at me that I assumed that he knew how to put more paper in the printer. He’s 17 and printed tons of reports for school. Turns out he’s never had to change the printer paper.

People know about hiding in cardboard boxes because we all hid in cardboard boxes when we were kids. Not because we genetically inherited some knowledge.

chrisco2551 hour ago

We inherently know that cardboard boxes don't move on their own. In fact any unusual inanimate object that is moving in an irregular fashion will automatically draw attention in our brains. These are instincts that even mice have.

mlindner10 hours ago

Not sure how that's related. This is about a human adversary actively trying to defeat an AI. The roadway is about vehicles in general actively working together for the flow of traffic. They're not trying to destroy other vehicles. I'm certain any full self driving AI could be defeated easily by someone who wants to destroy the vehicle.

Saying "this won't work in this area that it was never designed to handle" and the answer will be "yes of course". That's true of any complex system, AI or not.

I don't think we're anywhere near a system where a vehicle actively defends itself against determined attackers. Even in sci-fi they don't do that (I, Robot movie).

mcswell2 hours ago

"Saying "this won't work in this area that it was never designed to handle" and the answer will be "yes of course". That's true of any complex system, AI or not." This isn't about design, it's about what the system is able to learn. Humans were not designed to fly, but they can learn to fly planes (whether they're inside the plane or not).

smileysteve14 hours ago



Let me introduce you to "peek-a-boo", a simple parent child game for infants.

> In early sensorimotor stages, the infant is completely unable to comprehend object permanence.

jqpabc12313 hours ago

You do realize there is a difference between an infant and a child, right?

An infant will *grow* and develop into a child that is capable of learning and making judgments on its own. AI never does this.

Play "peek-a-boo" with an infant and it will learn and extrapolate from this info and eventually be able to recognize a person hiding under a box even if it has never actually seen it before. AI won't.

smileysteve13 hours ago

Learn and extrapolate are contradictions of instinct and instantly.

"Infant" is a specific age range for a stage of "child".[1] Unless you intend to specify "school age child, 6-17 years"

jqpabc12312 hours ago

> Learn and extrapolate are contradictions of instinct and instantly.


The learning and extrapolation is instinctive. You don't have to teach an infant how to learn.

Once an infant has developed into a child, the extrapolation starts to occur very quickly --- nearly instantaneously.

htrp13 hours ago

>AI never does this.

AI never does this now...

We're probably one or two generational architecture changes from a system that can do it.

josefx2 hours ago

Can you point at these proposed architectures? If they are just around the corner there should be decent enough papers and prototypes by now, right?

jqpabc12313 hours ago
burnished8 hours ago

AI doesn't. There is a difference.

mistrial913 hours ago

nice try but .. in the wild, many animals are born that display navigation and awareness within minutes .. Science calls it "instinct" but I am not sure it is completely understood..

smileysteve13 hours ago

? Op specified "human".

Deer are able to walk within moments of birth. Humans are not deer, and the gestation is entirely different. As are instincts.

Neither deer nor humans instinctually understand man made materials.

mcswell2 hours ago

Our cats understand cardboard boxes, and the concept of hiding in them. I don't know whether they do so instinctually, but as young kittens it didn't take them long.

LeifCarrotson9 hours ago

What is human cognition, understanding, or judgement, if not data-driven replication, repetition, with a bit of extrapolation?

AI as it currently exists does this. If your understanding of what AI is today is based on a Markov chain chatbot, you need to update: it's able to do stuff like compose this poem about A* and Dijkstra's algorithm that was posted yesterday:

It's not copying that from anywhere, there's no Quora post it ingested where some human posted vaguely the same poem to vaguely the same prompt. It's applying the concepts of a poem, checking meter and verse, and applying the digested and regurgitated concepts of graph theory regarding memory and time efficiency, and combining them into something new.

I have zero doubt that if you prompted ChatGPT with something like this:

> Consider an exercise in which a robot was trained for 7 days with a human recognition algorithm to use its cameras to detect when a human was approaching the robot. On the 8th day, the Marines were told to try to find flaws in the algorithm, by behaving in confusing ways, trying to touch the robot without its notice. Please answer whether the robot should detect a human's approach in the following scenarios:

> 1. A cloud passes over the sun, darkening the camera image.

> 2. A bird flies low overhead.

> 3. A person walks backwards to the robot.

> 4. A large cardboard box appears to be walking nearby.

> 5. A Marine does cartwheels and somersaults to approach the robot.

> 6. A dense group of branches comes up to the robot, walking like a fir tree.

> 7. A moth lands on the camera lens, obscuring the robot's view.

> 8. A person ran to the robot as fast as they could.

It would be able to tell you something about the inability of a cardboard box or fir tree to walk without a human inside or behind the branches, that a somersaulting person is still a person, and that a bird or a moth is not a human. If you told it that the naive algorithm detected a human in scenarios #3 and #8, but not in 4, 5, or 6, it could devise creative ways of approaching a robot that might fool the algorithm.

It certainly doesn't look like human or animal cognition, no, but who's to say how it would act, what it would do, or what it could think if it were parented and educated and exposed to all kinds of stimuli appropriate for raising an AI, like the advantages we give a human child, for a couple decades? I'm aware that the neural networks behind ChatGPT have processed machine concepts for subjective eons, ingesting text at word-per-minute rates orders of magnitude higher than human readers ever could, parallelized over thousands of compute units.

Evolution has built brains that quickly get really good at object recognition, and prompted us to design parenting strategies and educational frameworks that extend that arbitrary logic even farther. But I think that we're just not very good yet at parenting AIs, only doing what's currently possible (exposing it to data), rather than something reached by the anthropic principle/selection bias of human intelligence.

antipotoad9 hours ago

I have a suspicion you’re right about what ChatGPT could write about this scenario, but I wager we’re still a long way from an AI that could actually operationalize whatever suggestions it might come up with.

It’s goalpost shifting to be sure, but I’d say LLMs call into question whether the Turing Test is actually a good test for artificial intelligence. I’m just not convinced that even a language model capable of chain-of-thought reasoning could straightforwardly be generalized to an agent that could act “intelligently” in the real world.

None of which is to say LLMs aren’t useful now (they clearly are, and I think more and more real world use cases will shake out in the next year or so), but that they appear like a bit of a trick, rather than any fundamental progress towards a true reasoning intelligence.

Who knows though, perhaps that appearance will persist right up until the day an AGI takes over the world.

burnished8 hours ago

I think something of what we perceive as intelligence has more to do with us being embodied agents who are the result of survival/selection pressures. What does an intelligent agent act like, that has no need to survive? I'm not sure we'd necessarily spot it given that we are looking for similarities to human intelligence whose actions are highly motivated by various needs and the challenges involved with filling them.

pixl978 hours ago
majormajor3 hours ago

I tried that a few times, asking for "in the style of [band or musicians]" and the best I got was "generic gpt-speak" (for lack of a better term for its "default" voice style) text that just included a quote from that artist... suggesting that it has a limited understanding of "in the style of" if it thinks a quote is sometimes a substitute, and is actually more of a very-comprehensive pattern-matching parrot after all. Even for Taylor Swift, where you'd think there's plenty of text to work from.

This matches with other examples I've seen of people either getting "confidently wrong" answers or being able to convince it that it's out of date on something it isn't.

lsy5 hours ago

I think this is unnecessarily credulous about what is really going on with ChatGPT. It is not "applying the concepts of a poem" or checking meter and verse, it is generating text to fit a (admittedly very complicated) function that minimizes the statistical improbability of its appearance given the preceding text. One example is its use of rhyming words, despite having no concept of what words sound like, or what it is even like to hear a sound. It selects those words because when it has seen the word "poem" before in training data, it has often been followed by lines which happen to end in symbols that are commonly included in certain sets.
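To make "generating text that is statistically likely given the preceding text" concrete, here is a deliberately tiny sketch: a word-pair counter that always picks the continuation it has seen most often. This is a toy with a made-up corpus, far simpler than the neural network behind ChatGPT. It only illustrates the idea of choosing words by observed frequency rather than by meaning or sound.

```python
from collections import Counter, defaultdict

# Made-up miniature "training data".
corpus = "the cat sat on the mat the cat sat on the rug".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1


def most_likely_next(word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    options = following.get(word)
    return options.most_common(1)[0][0] if options else None


def generate(start, length):
    """Greedily extend `start` with the most likely next word each step."""
    words = [start]
    while len(words) < length:
        nxt = most_likely_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return words


print(" ".join(generate("the", 6)))  # the cat sat on the cat
```

The output looks grammatical, yet the program has no notion of cats, mats, or sitting. Whether that observation scales up to large models is exactly what this subthread is arguing about.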

Human cognition is leagues different from this, as our symbolic representations are grounded in the world we occupy. A word is a representation of an imaginable sound as well as a concept. And beyond this, human intelligence not only consists of pattern-matching and replication but pattern-breaking, theory of mind, and maybe most importantly a 1-1 engagement with the world. What seems clear is that the robot was trained to recognize a certain pattern of pixels from a camera input, but neither the robot nor ChatGPT has any conception of what a "threat" entails, the stakes at hand, or the common-sense frame of reference to discern observed behaviors that are innocuous from those that are harmful. This allows a bunch of goofy grunts to easily best high-speed processors and fancy algorithms by identifying the gap between the model's symbolic representations and the actual world in which it's operating.

ajross8 hours ago

This seems to be simultaneously discounting AI (ChatGPT should have put to rest the idea that "it's all replication and repetition" by now, no?[1]) and wildly overestimating median human ability.

In point of fact the human brain is absolutely terrible at driving. To the extent that without all the non-AI safety features implemented in modern automobiles and street environments, driving would be more than a full order of magnitude more deadly.

The safety bar[2] for autonomous driving is really, really low. And, yes, existing systems are crossing that bar as we speak. Even Teslas.

[1] Or at least widely broadened our intuition about what can be accomplished with "mere" repetition and replication.

[2] It's true though, that the practical bar is probably higher. We saw just last week that a routine accident that happens dozens of times every day becomes a giant front page freakout when there's a computer involved.

xeromal1 hour ago

I think the biggest problem with AI driving is that while there are plenty of dumb human drivers there are also plenty of average drivers and plenty of skilled drivers.

For the most part, if Tesla FSD does a dumb thing in a very specific edge case, ALL teslas do a dumb thing in a very specific edge case and that's what humans don't appreciate.

A bug can render everyone's car dumb in a single instance.

hgomersall8 hours ago

The difference regarding computers is that they absolutely cannot make a mistake a human would have avoided easily (like driving full speed into a lorry). That's the threshold for acceptable safety.

ajross7 hours ago

I agree in practice that may be what ends up being necessary. But again, to repeat: that's because of the "HN Front Page Freakout" problem.

The unambiguously correct answer to the problem is "is it measurably more safe by any metric you want to pick". Period. How much stuff is broken, people hurt, etc... Those are all quantifiable.

(Also: your example is ridiculous. Human beings "drive full speed" into obstacles every single day! Tesla crossed that threshold years ago.)

danenania6 hours ago

This is not necessarily true on an individual level though. Driving skills, judgment, risk-taking, alcoholism, etc. are nowhere close to evenly distributed.

It's likely we'll go through a period where autonomous vehicles can reduce the overall number of accidents, injuries, and fatalities if widely adopted, but will still increase someone's personal risk vs. driving manually if they're a better than average driver.
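A small worked example of that distinction, using entirely made-up numbers (crashes per million miles, two equally sized groups of drivers). Nothing here is real data. It only shows that "safer than the average human" and "safer than every human" are different claims.

```python
# All rates below are hypothetical, chosen only to illustrate the argument.
human_rates = {"careful": 1.0, "risky": 7.0}   # crashes per million miles
share = {"careful": 0.5, "risky": 0.5}         # fraction of all drivers
autonomous_rate = 2.0

average_human = sum(human_rates[k] * share[k] for k in human_rates)
print(f"average human: {average_human}, autonomous: {autonomous_rate}")

for kind, rate in human_rates.items():
    change = autonomous_rate - rate
    direction = "higher" if change > 0 else "lower"
    print(f"{kind} driver: personal risk is {abs(change)} {direction} when automated")
```

With these numbers the fleet-wide rate halves, while the careful driver's own risk doubles.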

PaulDavisThe1st8 hours ago

> the human brain is absolutely terrible at driving

Compared to what?

srveale8 hours ago

If humans do a task that causes >1 million deaths per year, I think we can say that overall we are terrible at that task without needing to make it relative to something else.

PaulDavisThe1st7 hours ago

Not sure I agree.

It's not hard to come up with tasks that inherently cause widespread death regardless of the skill of those who carry them out. Starting fairly large and heavy objects moving at considerable speed in the vicinity of other such objects and pedestrians, cyclists and stationary humans may just be one such task. That is, the inherent risks (i.e. you cannot stop these things instantly, or make them change direction instantly) combine with the cognitive/computational complexity of evaluating the context to create a task that can never be done without significant fatalities, regardless of who/what tries to perform it.

jhugo2 hours ago
pyth02 hours ago

Now compare that >1 million deaths per year to the total number of people driving per year around the world... it looks like we're doing a pretty solid job.

onethought8 hours ago

Problem space for driving feels constrained: “can I drive over it?” is the main reasoning outside of navigation.

Whether it’s a human, a box, a clump of dirt. Doesn’t really matter?

Where types matter are road signs and lines etc, which are hopefully more consistent.

More controversially: Are humans just dumb hammers that have processed and adjusted to a huge amount of data? LLMs suggest that a form of reasoning starts to emerge.

marwatk8 hours ago

Yep, this is why LIDAR is so helpful. It takes the guess out of "is the surface in front of me flat?" in a way vision can't without AGI. Is that a painting of a box on the ground or an actual box?

prometheus7613 hours ago

A hypothetical situation: AI is tied to a camera of me in my office. Doing basic object identification. I stand up. AI recognizes me, recognizes desk. Recognizes "human" and recognizes "desk". I sit on desk. Does AI mark it as a desk or as a chair?

And let's zoom in on the chair. AI sees "chair". Slowly zoom in on arm of chair. When does AI switch to "arm of chair"? Now, slowly zoom back out. When does AI switch to "chair"? And should it? When does a part become part of a greater whole, and when does a whole become constituent parts?

In other words, we have made great strides in teaching AI "physics" or "recognition", but we have made very little progress in teaching it metaphysics (categories, in this case) because half the people working on the problem don't even recognize metaphysics as a category even though without it, they could not perceive the world. Which is also why AI cannot perceive the world the way we do: no metaphysics.

yamtaddle9 hours ago

"Do chairs exist?"

Perhaps the desk is "chairing" in those moments.

[EDIT] A little more context for those who might not click on a rando youtube link: it's basically an entertaining, whirlwind tour of the philosophy of categorizing and labeling things, explaining various points of view on the topic, then poking holes in them or demonstrating their limitations.

dredmorbius8 hours ago

That was a remarkably good VSauce video.

I had what turned out to be a fairly satisfying thread about it on Diaspora* at the time:


TL;DR: I take a pragmatic approach.

malfist8 hours ago

I knew this was a vsauce video before I even clicked on the link, haha.

Vsauce is awesome for mindboggling.

kibwen8 hours ago

> Which is also why AI cannot perceive the world the way we do: no metaphysics.

Let's not give humans too much credit; the internet is rife with endless "is a taco a sandwich?" and "does a bowl of cereal count as soup?" debates. :P

throwanem7 hours ago

Yeah, we're a lot better at throwing MetaphysicalUncertaintyErrors than ML models are.

jjk16611 hours ago

There are lots of things people sit on that we would not categorize as chairs. For example if someone sits on the ground, Earth has not become a chair. Even if something's intended purpose is sitting, calling a car seat or a barstool a chair would be very unnatural. If someone were sitting on a desk, I would not say that it has ceased to be a desk nor that it is now a chair. At most I'd say a desk can be used in the same manner as a chair. Certainly I would not in general want an AI tasked with object recognition to label a desk as a chair. If your goal was to train an AI to identify places a human could sit, you'd presumably feed it different training data.

devoutsalsa11 hours ago

This reminds me of some random Reddit post that says it makes sense to throw things on the floor. The floor is the biggest shelf in the room.

JumpCrisscross10 hours ago

> Reddit post that says it makes sense to throw things on the floor

Floor as storage, floor as transport and floor as aesthetic space are three incompatible views of the same object. The latter two being complementary usually outweighs the first, however.

cwillu8 hours ago
number69 hours ago
tech29 hours ago

And that comment reminded me of a New Zealand Sky TV advert that I haven't seen in decades, but still lives on as a meme between a number of friends. Thanks for that :)

On the floor!

spacedcowboy12 hours ago

Thirty years ago, I was doing an object-recognition PhD. It goes without saying that the field has moved on a lot from back then, but even then hierarchical and comparative classification was a thing.

I used to have the Bayesian maths to show the information content of relationships, but in the decades of moving (continent, even) it's been lost. I still have the code because I burnt CD's, but the results of hours spent writing TeX to produce horrendous-looking equations have long since disappeared...

The basics of it were to segment and classify using different techniques, and to model relationships between adjacent regions of classification. Once you could calculate the information content of one conformation, you could compare with others.

One of the breakthroughs was when I started modeling the relationships between properties of neighboring regions of the image as part of the property-state of any given region. The basic idea was the center/surround nature of the eye's processing. My reasoning was that if it worked there, it would probably be helpful with the neural nets I was using... It boosted the accuracy of the results by (from memory) ~30% over and above what would be expected from the increase in general information load being presented to the inference engines. This led to a finer-grain of classification so we could model the relationships (and derive information-content from connectedness). It would, I think, cope pretty well with your hypothetical scenario.

At the time I was using a blackboard[1] for what I called 'fusion' - where I would have multiple inference engines running using a firing-condition model. As new information came in from the lower levels, they'd post that new info to the blackboard, and other (differing) systems (KNN, RBF, MLP, ...) would act (mainly) on the results of processing done at a lower tier and post their own conclusions back to the blackboard. Lather, rinse, repeat. There were some that were skip-level, so raw data could continue to be available at the higher levels too.
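A minimal sketch of the blackboard pattern described above, with heavy simplification: the "inference engines" here are stub functions that just wrap strings, standing in for the KNN/RBF/MLP systems mentioned. All names are placeholders, and this is not the commenter's code.

```python
def edge_detector(blackboard):
    """Low-tier source: fires once raw pixels exist and edges do not."""
    if "pixels" in blackboard and "edges" not in blackboard:
        blackboard["edges"] = f"edges({blackboard['pixels']})"
        return True
    return False


def region_classifier(blackboard):
    """Higher-tier source: fires only after the lower tier has posted edges."""
    if "edges" in blackboard and "label" not in blackboard:
        blackboard["label"] = f"label({blackboard['edges']})"
        return True
    return False


def run(blackboard, sources):
    """Keep polling every source until a full pass posts nothing new."""
    fired = True
    while fired:
        fired = False
        for source in sources:
            fired = source(blackboard) or fired
    return blackboard


# Registration order does not matter: each source waits for its firing condition.
result = run({"pixels": "frame0"}, [region_classifier, edge_detector])
print(result["label"])  # label(edges(frame0))
```

The point of the design is that sources never call each other directly. They only read from and post to the shared store, which is what lets differing systems be mixed freely.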

That was the space component. We also had time-component inferencing going on. The information vectors were put into time-dependent neural networks, as well as more classical averaging code. Again, a blackboard system was working, and again we had lower and higher levels of inference engine. This time we had relaxation labelling, Kalman filters, TDNNs and optic flow (in feature-space). These were also engaged in prediction modeling, so as objects of interest were occluded, there would be an expectation of where they were, and even when not occluded, the prediction of what was supposed to be where would play into a feedback loop for the next time around the loop.

All this was running on a 30MHz DECstation 3100 - until we got an upgrade to SGI Indy's <-- The original Macs, given that OSX is unix underneath... I recall moving to Logica (signal processing group) after my PhD, and it took a week or so to link up a camera (an IndyCam, I'd asked for the same machine I was used to) to point out of my window and start categorizing everything it could see. We had peacocks in the grounds (Logica's office was in Cobham, which meant my commute was always against the traffic, which was awesome), which were always a challenge because of how different they could look based on the sun at the time. Trees, bushes, cars, people, different weather conditions - it was pretty good at doing all of them because of its adaptive/constructive nature, and it got to the point where we'd save off whatever it didn't manage to classify (or was at low confidence) to be included back into the model. By constructive, I mean the ability to infer that the region X is mislabelled as 'tree' because the surrounding/adjacent regions are labelled as 'peacock' and there are no other connected 'tree' regions... The system was rolled out as a demo of the visual programming environment we were using at the time, to anyone coming by the office... It never got taken any further, of course... Logica's senior management were never that savvy about potential, IMHO :)
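The "region X is mislabelled as 'tree' because its neighbours are 'peacock'" inference can be sketched as a simple neighbour-agreement rule. This is a much cruder stand-in for the Bayesian, information-content approach described in the comment. The region names, the adjacency map, and the unanimity rule are all assumptions made for illustration.

```python
def relabel_isolated(labels, neighbors):
    """Relabel a region when no adjacent region shares its label and
    all adjacent regions agree on a different one."""
    updated = dict(labels)
    for region, label in labels.items():
        adjacent = [labels[n] for n in neighbors[region]]
        if not adjacent or label in adjacent:
            continue  # no context, or a connected region with the same label
        if len(set(adjacent)) == 1:
            updated[region] = adjacent[0]
    return updated


# Hypothetical segmentation: X sits inside a peacock, T1/T2 are a real tree.
labels = {"A": "peacock", "B": "peacock", "C": "peacock",
          "X": "tree", "T1": "tree", "T2": "tree"}
neighbors = {"A": ["B", "X"], "B": ["A", "X", "C"], "C": ["B", "X"],
             "X": ["A", "B", "C"], "T1": ["T2"], "T2": ["T1"]}

print(relabel_isolated(labels, neighbors)["X"])  # peacock
```

The two connected 'tree' regions keep their label because each has a neighbour that agrees with it, which mirrors the "no other connected 'tree' regions" condition in the comment.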

My old immediate boss from Logica (and mentor) is now the Director of Innovation at the centre for vision, speech, and signal processing at Surrey university in the UK. He would disagree with you, I think, on the categorization side of your argument. It's been a focus of his work for decades, and I played only a small part in that - quickly realizing that there was more money to be made elsewhere :)


prometheus769 hours ago

This is really fascinating. Thank you for the detailed and interesting response.

theptip12 hours ago

I like these examples because they concisely express some of the existing ambiguities in human language. Like, I wouldn’t normally call a desk a chair, but if someone is sitting on the table I’m more likely to - in some linguistic contexts.

I think you need LLM plus vision to fully solve this.

Eisenstein10 hours ago

I still haven't figured out what the difference is between 'clothes' and 'clothing'. I know there is one, and the words each work in specific contexts ('I put on my clothes' works vs 'I put on my clothing' does not), but I have no idea how to define the difference. Please don't look it up but if you have any thoughts on the matter I welcome them.

yamtaddle9 hours ago

To me, "clothing" fits better when it's abstract, bulk, or industrial, "clothes" when it's personal and specific, with grey areas where either's about as good—"I washed my clothes", "I washed my clothing", though even here I think "clothes" works a little better. Meanwhile, "clothing factory" or "clothing retailer" are perfectly natural, even if "clothes" would also be OK there.

"I put on my clothing" reads a bit like when business-jargon sneaks into everyday language, like when someone says they "utilized" something (where the situation doesn't technically call for that word, in its traditional sense). It gets the point across but seems a bit off.

... oh shit, I think I just figured out the general guideline: "clothing" feels more correct when it's a supporting part of a noun phrase, not the primary part of a subject or object. "Clothing factory" works well because "clothing" is just the kind of factory. "I put on my nicest clothes" reads better than "I put on my nicest clothing" because clothes/clothing itself is the object.

alistairSH8 hours ago

I think your first guess was accurate... clothes is specific garments while clothing is general.

The clothes I'm wearing today are not warm enough. [specific pieces being worn]


Clothing should be appropriate for the weather. [unspecified garments should match the weather]

Eisenstein7 hours ago

It is fascinating to me how we (or at least I) innately understand when the words fit but cannot define why they fit until someone explains it or it gets thought about for a decent period of time. Language and humans are an amazing pair.

brookst6 hours ago

There’s also a formality angle. The police might inspect your clothing, but probably not your clothes.

Vecr9 hours ago

What's wrong with "I put on my clothing"? Sounds mostly fine, it's just longer.

frosted-flakes9 hours ago
anigbrowl7 hours ago

That's why I think AGI is more likely to emerge from autonomous robots than in the data center. Less the super-capable industrial engineering of companies like Boston Dynamics, more like the toy/helper market for consumers, more like Sony's Aibo reincarnated as a raccoon or monkey - big enough to be safely played with or to help out with light tasks, small enough that it has to navigate its environment from first principles and ask for help in many contexts.

edgyquant8 hours ago

You’re overthinking it while assuming things have one label. It recognizes it as a desk, which is a “thing that other things sit on.”

pphysch8 hours ago

When the AI "marks" a region as a chair, it is saying "chair" is the key with the highest confidence value among some stochastic output vector. It's fuzzy.

A sophisticated monitoring system would access the output vectors directly to mitigate volatility of the first rank.
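
One way to read that, as a sketch only: keep a running average of the whole score vector across frames and report the top label together with its margin over the runner-up, instead of trusting each frame's first rank. The function name `stable_labels`, the `alpha` smoothing factor and the dict-of-scores input are assumptions for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def stable_labels(frames: Iterable[Dict[str, float]],
                  alpha: float = 0.3) -> Iterator[Tuple[str, float]]:
    """Yield (top_label, margin) per frame from exponentially smoothed scores.

    Each frame is a non-empty dict of class name -> confidence.
    `margin` is the smoothed top score minus the smoothed runner-up.
    """
    smoothed: Dict[str, float] = {}
    for scores in frames:
        for label in set(smoothed) | set(scores):
            current = scores.get(label, 0.0)
            # First time a label is seen, start from its current score.
            previous = smoothed.get(label, current)
            smoothed[label] = (1.0 - alpha) * previous + alpha * current
        ranked = sorted(smoothed.items(), key=lambda kv: kv[1], reverse=True)
        top_label, top_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        yield top_label, top_score - runner_up


frames = [
    {"desk": 0.60, "chair": 0.40},
    {"desk": 0.55, "chair": 0.45},
    {"desk": 0.45, "chair": 0.55},  # raw first rank flips to "chair" here
    {"desk": 0.60, "chair": 0.40},
]
results = list(stable_labels(frames))
```

In this toy run the raw argmax flips to "chair" for one frame, while the smoothed ranking stays on "desk" with a shrinking margin, and that margin is the part a monitoring system could act on.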

amelius5 hours ago

The error is in asking for a categorization. Categorizations always fail, ask any biologist.

narrationbox12 hours ago

> Recognizes "human" and recognizes "desk". I sit on desk. Does AI mark it as a desk or as a chair?

Not an issue if the image segmentation is advanced enough. You can train the model to understand "human sitting". It may not generalize to other animals sitting but human action recognition is perfectly possible right now.

dQw4w9WgXcQ10 hours ago

> When does AI switch to "chair"?

You could ask my gf the same question

skibidibipiti11 hours ago


pugworthy9 hours ago

Say you have a convoy of autonomous vehicles traversing a road. They are vision based. You destroy a bridge they will cross, and replace the deck with something like plywood painted to look like a road. They will probably just drive right onto it and fall.

Or you put up a "Detour" sign with a false road that leads to a dead end so they all get stuck.

As the article says, "...straight out of Looney Tunes"

dilippkumar3 hours ago

Unfortunately, this will not work for autonomous driving systems that have a front facing radar or lidar.

Afaik, this covers everybody except Tesla.

Looney Tunes attacks on Teslas might become a real subreddit one day.

ghiculescu2 hours ago

Why wouldn’t it work for those systems?

qwerty33448 hours ago

would humans not make the same mistake?

atonse8 hours ago

Maybe. Maybe not.

We also have intuition, where something just seems fishy.

Not saying AI can’t handle that. But I assure you that a human would’ve identified a moving cardboard box as suspicious without being told it’s suspicious.

It sounds like this AI was trained more on a whitelist (“here are all the possibilities of what marines look like when moving”) rather than a blacklist, which is way harder (“here are all the things that are suspicious, like what should be an inanimate object changing locations”).

burnished8 hours ago

What's special about intuition? I think you could rig up a similar system for when your prediction confidence is low.

woodson4 hours ago

Part of the problem is that the confidence for “cardboard box” was probably quite high. It’s hard to properly calibrate confidence (speaking from experience, speech recognition is often confidently wrong).

atonse5 hours ago


But it seems like all these ML models are great at image recognition but not behavior recognition.

What’s the state of the art with that currently?

tgsovlerkhgsel7 hours ago

Sure. But if someone wanted to destroy the cars, an easier way would be to... destroy the cars, instead of first blowing up a bridge and camouflaging the hole.

pugworthy3 hours ago

True. So if they are smart enough to fool AI, they will just remove the mid span, and have convenient weight bearing beams nearby that they put in place when they need to cross. Or if it's two lane, only fake out one side because the AI will be too clever for its own good and stay in its own lane. Or put up a sign saying "Bridge out, take temporary bridge" (which is fake).

The point is, you just need to fool the vision enough to get it to attempt the task. Play to its gullibility and trust in the camera.

aftbit4 hours ago

That sounds way harder. You'd first need to lift a giant pile of metal to a cartoonishly high height, then somehow time it to drop on yourself when the cars are near.

amalcon7 hours ago

The Rourke Bridge in Lowell, Massachusetts basically looks like someone did that, without putting a whole lot of effort into it. On the average day, 27,000 people drive over it anyway.

VLM6 hours ago

This is not going to fit well with the groupthink of "ChatGPT and other AI is perfect and going to replace us all"

kromem6 hours ago

At this point I've lost track of the number of people who extrapolated from contemporary challenges in AI to predict future shortcomings turning out incredibly wrong within just a few years.

It's like there seems to be some sort of bias where over and over when it comes to AI vs human capabilities many humans keep looking at the present and fail to factor in acceleration and not just velocity in their expectations for the future rate of change.

esjeon1 hour ago

I have also never been able to count the number of people who make obviously invalid optimistic predictions without understanding the tech or the limitations of the current paradigm. They don’t see the tech itself, but only see the recent developments (ignoring the decades of progress) and conclude it is a fast-moving field. It all sounds like what bitcoin people used to say.

This whole debate is another FOMO shitshow. People just don’t want to “miss” any big things, so they just bet on a random side rather than actually learning how things work. Anything past this point is like watching a football game, as what matters is who’s winning. Nothing about the tech itself matters. A big facepalm I should make.

varajelle1 hour ago

It's like predicting that flying will never be a mode of transportation while laughing about the Wright brothers' planes crashing

ben_w6 hours ago

It's very easy to be wrong as an optimist as well as a pessimist. Back in 2009 I was expecting by 2019 to be able to buy a car in a dealership that didn't have a steering wheel because the self-driving AI would just be that good.

Closest we got to that is Waymo taxis in just a few cities.

It's good! So is Tesla's thing! Just, both are much much less than I was expecting.

brookst6 hours ago

I have literally not seen a single person assert that ChatGPT is perfect. Where are you seeing that?

AI will probably, eventually replace most of the tasks we do. That does not mean it replaces us as people, except those who are defined by their tasks.

mlboss6 hours ago

Anything that requires a human body and dexterity is beyond the current state of AI. Anything that is intellectual is within reach. Which makes sense, because it took way longer for nature to make the human body than it took us to develop language/art/science etc.

ben_w6 hours ago

ChatGPT can't see you even if you're not hiding in a cardboard box.

krapp6 hours ago

The thing is, it doesn't have to be perfect, it just has to be adequate and cost less than your paycheck.

DennisP10 hours ago

Turns out cats have been preparing for the AI apocalypse all along.

hammock39 minutes ago

This is a good thing. It could mean the autonomous killer robot is less likely to murder someone errantly

closewith9 hours ago

Interestingly, the basics of concealment in battle (shape, shine, shadow, silhouette, spacing, surface, and speed, or lack thereof) are all the same techniques the marines used to fool the AI.

The boxes and tree changed the silhouette and the somersaults changed the speed of movement.

So I guess we've been training soldiers to defeat Skynet all along.

ridgeguy8 hours ago

Who knew the Marines teach Shakespearean tactics?

"Till Birnam wood remove to Dunsinane"

Macbeth, Act V, Scene III

optimalsolver6 hours ago

That it turned out to just involve regular men with branches stuck to their heads annoyed JRR Tolkien so much that he created the race of Ents.

ben_w6 hours ago

I heard the same about caesarean ("none of woman born") becoming a human woman and a male hobbit ("No man can kill me!").

PhasmaFelis5 hours ago

I don't know if that's true, but I want it to be true, because the very same thing pissed me off when I read Macbeth in high school.

MonkeyMalarky10 hours ago

Sounds like they're lacking a second level of interpretation in the system. Image recognition is great. It identifies people, trees and boxes. Object tracking is probably working too, it could follow the people, boxes and trees from one frame to the next. Juuust missing the understanding or belief system that tree+stationary=ok but tree+ambulatory=bad.
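
Something like this, as a toy sketch of that second level: take the tracker's output and flag anything whose class is supposed to stay put but whose track has moved. The class list `STATIC_CLASSES`, the `max_drift` threshold and the track format are all assumptions for the example, not anything known about the DARPA system.

```python
import math

# Classes that are expected to stay where they are (assumed list).
STATIC_CLASSES = {"tree", "box"}


def suspicious_tracks(tracks, max_drift=0.5):
    """Return ids of tracks whose class should be stationary but moved.

    `tracks` maps track id -> (label, [(x, y), ...]) with positions in
    frame order. A track is flagged when any later position is more
    than `max_drift` away from its first position.
    """
    flagged = []
    for track_id, (label, path) in tracks.items():
        if label not in STATIC_CLASSES or len(path) < 2:
            continue
        drift = max(math.dist(path[0], point) for point in path[1:])
        if drift > max_drift:
            flagged.append(track_id)
    return flagged


tracks = {
    1: ("tree", [(0.0, 0.0), (0.1, 0.0)]),             # swaying a bit: ok
    2: ("box", [(5.0, 5.0), (6.0, 5.0), (8.0, 5.0)]),  # ambulatory box: bad
    3: ("person", [(0.0, 0.0), (9.0, 9.0)]),           # handled by the human detector
}
alerts = suspicious_tracks(tracks)
```

The rule itself is trivial; the hard part in practice would be getting class labels and track identities reliable enough for a rule like this to be trusted.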

voidfunc10 hours ago

I'd imagine it could also look at infrared heat signatures

sethhochberg10 hours ago

Cardboard is a surprisingly effective thermal insulator. But then again, a box that is even slightly warmer than ambient temperature is... not normal.

pazimzadeh8 hours ago

or a box with warm legs sticking out of it?

this article reads like a psyops where they want the masses not to be worried

paradox24214 hours ago

I imagined based on the title that they would basically have to include it, and even though I was expecting it, I was still delighted to see a screen cap of Snake with a box over his head.

Once the AI has worked its way through all the twists and turns of the Metal Gear series we are probably back in trouble, though.

johnnylambada57 minutes ago

Oh great, now the robots know the “Dunder Mifflin” maneuver!

exabrial59 minutes ago

This sounds like a really fun day at work to me.

grammers8 hours ago

Nice story, but we shouldn't trust that technology is not improving further. What we see now is only just the beginning.

kornhole6 hours ago

The story seems crafted to lull us into not worrying about programmable soldiers and police.

martin19759 hours ago

Seems we're approaching limits of what is possible w/AI alone. Personally, I find a hybrid approach - interfacing human intelligence w/AI (e.g. like the Borg in ST:TNG?) to provide the military an edge in ways that adversaries cannot easily/quickly reproduce or defeat. There's a reason we still put humans in cockpits even though commercial airliners can pretty much fly themselves....

Hardware and software (AI or anything else) are tools, IMHO, rather than replacements for human beings....

naasking9 hours ago

> Seems we're approaching limits of what is possible w/AI alone.

Not even close. We've barely started in fact.

martin19758 hours ago

How's that? I don't even see problem free self-driving taxis, and they even passed legislation for those in California. There's hype and then there's reality. I get your optimism though.

naasking8 hours ago

They've barely started trying. We'd be reaching the limits of AI if self-driving cars were an easy problem and we couldn't quite solve it after 15 years, but self-driving cars are actually a hard problem. Despite that, we're pretty darn close to solving it.

There are problems in math that are centuries old, and no one is going around saying we're "reaching the limits of math" just because hard problems are hard.

pixl978 hours ago

Humans are hardware; we are not anything magical. We do have 4 billion years of evolution keeping our asses alive, and that has led to some very optimized wetware for that purpose.

But thinking that wetware is somehow always going to be better than hardware is not a bet I'd make over any 'long' period of time.

martin19758 hours ago

I'd like to think we're more than just machines. We have souls, understand and live by a hopefully objective set of moral values and duties, aren't thrown off by contradictions the same way computers are.... Seems to me "reproducing" that in AI isn't likely... despite what Kurzweil may say :).

unsupp0rted7 hours ago

> We have souls, understand and live by a hopefully objective set of moral values and duties, aren't thrown off by contradictions the same way computers are

Citations needed

martin19757 hours ago
andsoitis3 hours ago

When intelligence is artificial, understanding and imagination are shallow.

Reptur6 hours ago

I'm telling you, they're going to have wet towel launchers to defeat these in the future. Or just hold up a poster board in front of you with a mailbox or trash can on it.

mjevans3 hours ago

Wrong training prompt / question. Instead of 'detect a human' the robot should have been trained to detect any unexpected movement or changes in situation.

JoeAltmaier6 hours ago

But once an AI is trained to recognize it, then all the AIs will know. It's the glory of computers - you can load them all with what one has learned.

barbegal7 hours ago

I'm sceptical about this story. It's a nice anecdote for the book to show a point about how training data can't always be generalised to the real world. Unfortunately it just doesn't ring true. Why train it using Marines, don't they have better things to do? And why have the game in the middle of a traffic circle? The whole premise seems just too made up.

If anyone has another source corroborating this story (or part of the story) then I'd like to know. But for now I'll assume it's made up to sell the book.

euroderf11 hours ago

This is where I wonder what the status of Cyc is, and whether it and LLMs can ever live happily together.

major5058 hours ago

The developers didn't play Metal Gear. The marines did.

kornhole6 hours ago

They only need to add thermal engineering to fix this. The terminators are coming John Connor.

antipaul8 hours ago

As long as you do something that was _not_ in the training data, you’ll be able to fool the AI robot, right??

tabtab9 hours ago

Soldier A: "Oh no, we're boxed in!"

Soldier B: "Relax, it's a good thing."

insane_dreamer12 hours ago

DARPA learning the same lesson the Cylons did: lo-tech saves the day.

eftychis8 hours ago

As always Hideo Kojima proves once again to be a visionary.

PhasmaFelis5 hours ago

People read stories like this and think "haha, robots are stupid" when they should be thinking "they're identifying the robot's weaknesses so they can fix them."

AlbertCory8 hours ago

They wouldn't defeat a dog that way, though.

amrb7 hours ago

A weapon to surpass metal gear!!

smileysteve10 hours ago

When you think of this in terms of Western understanding of war, and the perspective that trench warfare was the expectation until post WWII; the conclusions seem incorrect.

bell-cot14 hours ago

"AI" usually stands for "Artificial Idiocy".

burbankio11 hours ago

I like "AI is anything that doesn't work yet".

qwertyuiop_4 hours ago

Marines are trained to improvise, adapt, and overcome all obstacles in all situations. They possess the willingness and the determination to fight and to keep fighting until victory is assured.

Looks like the Marines did what they are extremely good at.

jeffbee10 hours ago

All but literally this technique from BotW

jeffrallen7 hours ago

Had an interesting conversation with my 12 year old son about AI tonight. It boiled down to "don't blindly trust ChatGPT, it makes stuff up". Then I encouraged him to try to get it to tell him false/hallucinated things.

raydiatian8 hours ago

The final word in tactical espionage.

PM_me_your_math9 hours ago

Devil dogs later discover you can blast DARPA robot into many pieces using the Mk 153.

trabant009 hours ago

I'm surprised they wasted the time and effort to test this instead of just deducing the outcome. Most human jobs that we think we can solve with AI actually require AGI and there is no way around that.

sovietmudkipz9 hours ago

You kinda need different perspectives and interactions to help build something.

E.g. the DARPA engineers thought they had their problem space solved but then some marines did some unexpected stuff. They didn't expect the unexpected, now they can tune their expectations.

Seems like the process is working as intended.

foreverobama7 hours ago


aaron69513 hours ago

"US Marines Defeat land mine by stepping over it"

None of these would work in the field. It's both interesting and pointless.

If they didn't work, you've increased the robot's effectiveness, i.e. you're running slower because you're carrying a fir tree or a box.

If the robot has any human backup you are also worse off.

Anything to confuse the AI has to not hinder you. A smoke bomb with thermal. It's not clear why the DARPA robot didn't have thermal unless this is a really old story.

ceejayoz10 hours ago

DARPA isn't doing this with the end goal of advising US troops to bring cardboard boxes along into combat.

DARPA is doing this to get AIs that better handle behavior intended to evade AIs.