
Self-Supervised Learning from Images with JEPA (2023)

40 points | 2 days ago | arxiv.org
byyoung3 2 days ago

It’s not new and only superior in a very narrow set of categories.

heyitsguay 2 days ago

As a computer vision guy, I'm sad JEPA didn't end up more effective. It makes perfect sense conceptually and would have transferred easily to video, but other self-supervised methods just seem to beat it!

turnersr 1 day ago

Yeah! JEPA seems awesome. Do you mind sharing what other self-supervised methods work better than JEPA?

blixt 1 day ago

Needs a (2023) tag. But the release of ARC2 and image outputs from 4o definitely got me thinking about the JEPA family too.

I don't know if it's the right approach (and I'm sure JEPA has lots of performance issues), but it seems good to have a fully latent-space representation, ideally shared across all modalities, so that turning the concept "an apple a day keeps the doctor away" into image/audio/text is a choice of decoder, rather than dedicated token ranges being chosen before the model's actual creation process even begins.
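
Roughly what I mean, as a toy sketch (every module, name, and dimension here is made up, purely to illustrate "modality as a decoding choice"):

    import torch
    import torch.nn as nn

    LATENT_DIM = 512

    class LatentConceptModel(nn.Module):
        def __init__(self, input_dim=768):
            super().__init__()
            # One encoder produces a modality-agnostic latent vector.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 1024), nn.GELU(), nn.Linear(1024, LATENT_DIM)
            )
            # Modality is a decoding choice, not baked into the representation.
            self.decoders = nn.ModuleDict({
                "text":  nn.Linear(LATENT_DIM, 32000),        # vocab logits
                "image": nn.Linear(LATENT_DIM, 3 * 64 * 64),  # flattened pixels
                "audio": nn.Linear(LATENT_DIM, 16000),        # 1 s of waveform samples
            })

        def forward(self, x, modality):
            z = self.encoder(x)                # "an apple a day..." lives here as a vector
            return self.decoders[modality](z)  # pick text/image/audio at decode time

The same z can be routed to any of the three decoders, which is the opposite of committing to an output modality by emitting into a reserved token range up front.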

niemandhier 1 day ago

GPTs are in the “exploit” phase of the “explore-exploit” trade-off.

JEPA is still in the explore phase; it's good to read the paper and understand the architecture to gain an alternative perspective.

laughingcurve 2 days ago

Not new, not notable right now, not sure why it's getting upvoted (just kidding, it's because people see YLC and upvote based on names)

MoonGhost 2 days ago

Even average papers can have a nice overview of the problem and references.

Grimblewald 1 day ago

I don't care about names; I just thought it was an interesting read.

justanotheratom 2 days ago

JEPA is presumably superior to Transformers. Can any expert enlighten us on the implications of this paper?

spmurrayzzz 1 day ago

Transformers are usually part of JEPA architectures. In I-JEPA's case, a ViT is used as the context encoder.
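
Very loosely, the I-JEPA training setup looks something like this (a minimal sketch rather than the paper's actual code; a plain transformer encoder stands in for the ViT, and positional embeddings plus the block-masking details are omitted):

    import copy
    import torch
    import torch.nn as nn

    class IJEPASketch(nn.Module):
        def __init__(self, dim=768, heads=12):
            super().__init__()
            # Context encoder: a ViT-style encoder that only sees the visible context patches.
            ctx_layer = nn.TransformerEncoderLayer(dim, nhead=heads, batch_first=True)
            self.context_encoder = nn.TransformerEncoder(ctx_layer, num_layers=12)
            # Target encoder: a frozen copy here; in the paper its weights are an
            # exponential moving average of the context encoder and get no gradients.
            self.target_encoder = copy.deepcopy(self.context_encoder)
            for p in self.target_encoder.parameters():
                p.requires_grad = False
            # Predictor: fills in representations of the masked target blocks,
            # entirely in latent space.
            pred_layer = nn.TransformerEncoderLayer(dim, nhead=heads, batch_first=True)
            self.predictor = nn.TransformerEncoder(pred_layer, num_layers=6)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

        def forward(self, context_tokens, target_tokens):
            # context_tokens: (B, N_ctx, dim); target_tokens: (B, N_tgt, dim)
            b, n_tgt, d = target_tokens.shape
            z_ctx = self.context_encoder(context_tokens)
            with torch.no_grad():
                z_tgt = self.target_encoder(target_tokens)   # regression targets, in latent space
            # Append one mask token per target patch and let the predictor fill them in.
            queries = self.mask_token.expand(b, n_tgt, d)
            pred = self.predictor(torch.cat([z_ctx, queries], dim=1))[:, -n_tgt:]
            # The loss compares representations, never pixels -- that's the JEPA part.
            return nn.functional.mse_loss(pred, z_tgt)

In the actual paper the predictor also gets positional information about where each target block sits, so it knows which region it's being asked to predict.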