
Self-Supervised Learning from Images with JEPA (2023)

40 points | 2 days ago | arxiv.org
byyoung3 2 days ago

It’s not new and only superior in a very narrow set of categories.

heyitsguay 2 days ago

As a computer vision guy, I'm sad JEPA didn't end up more effective. It makes perfect sense conceptually and would have transferred easily to video, but other self-supervised methods just seem to beat it!

turnersr 1 day ago

Yeah! JEPA seems awesome. Do you mind sharing what other self-supervised methods work better than JEPA?

blixt 1 day ago

Needs a (2023) tag. But the release of ARC2 and image outputs from 4o definitely got me thinking about the JEPA family too.

I don't know if it's the right approach (and I'm sure JEPA has lots of performance issues), but it seems good to have a fully latent-space representation, ideally shared across all modalities, so that turning the concept "an apple a day keeps the doctor away" into image/audio/text is a choice of decoder, rather than dedicated token ranges being chosen before the model's actual creation process even begins.
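
Roughly what I mean, as a toy sketch (every module, name, and dimension here is made up, purely to illustrate "modality as a decoding choice"):

    import torch
    import torch.nn as nn

    LATENT_DIM = 512

    class LatentConceptModel(nn.Module):
        def __init__(self, input_dim=768):
            super().__init__()
            # One encoder produces a modality-agnostic latent vector.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 1024), nn.GELU(), nn.Linear(1024, LATENT_DIM)
            )
            # Modality is a decoding choice, not baked into the representation.
            self.decoders = nn.ModuleDict({
                "text":  nn.Linear(LATENT_DIM, 32000),        # vocab logits
                "image": nn.Linear(LATENT_DIM, 3 * 64 * 64),  # flattened pixels
                "audio": nn.Linear(LATENT_DIM, 16000),        # 1 s of waveform samples
            })

        def forward(self, x, modality):
            z = self.encoder(x)                # "an apple a day..." lives here as a vector
            return self.decoders[modality](z)  # pick text/image/audio at decode time

The same z can be routed to any of the three decoders, which is the opposite of committing to an output modality by emitting into a reserved token range up front.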

niemandhier 1 day ago

GPTs are in the “exploit” phase of the “explore-exploit” trade-off.

JEPA is still in the explore phase; it's good to read the paper and understand the architecture to gain an alternative perspective.

laughingcurve 2 days ago

Not new, not notable right now, not sure why it's getting upvoted (just kidding, it's because people see YLC and upvote based on names)

MoonGhost 2 days ago

Even average papers can have a nice overview of the problem and references.

Grimblewald 1 day ago

I don't care about names; I just thought it was an interesting read.

justanotheratom 2 days ago

JEPA is presumably superior to Transformers. Can any expert enlighten us on the implications of this paper?

spmurrayzzz 1 day ago

Transformers are usually part of JEPA architectures. In I-JEPA's case, a ViT is used as the context encoder.
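
Very loosely, the I-JEPA training setup looks something like this (a minimal sketch rather than the paper's actual code; a plain transformer encoder stands in for the ViT, and positional embeddings plus the block-masking details are omitted):

    import copy
    import torch
    import torch.nn as nn

    class IJEPASketch(nn.Module):
        def __init__(self, dim=768, heads=12):
            super().__init__()
            # Context encoder: a ViT-style encoder that only sees the visible context patches.
            ctx_layer = nn.TransformerEncoderLayer(dim, nhead=heads, batch_first=True)
            self.context_encoder = nn.TransformerEncoder(ctx_layer, num_layers=12)
            # Target encoder: a frozen copy here; in the paper its weights are an
            # exponential moving average of the context encoder and get no gradients.
            self.target_encoder = copy.deepcopy(self.context_encoder)
            for p in self.target_encoder.parameters():
                p.requires_grad = False
            # Predictor: fills in representations of the masked target blocks,
            # entirely in latent space.
            pred_layer = nn.TransformerEncoderLayer(dim, nhead=heads, batch_first=True)
            self.predictor = nn.TransformerEncoder(pred_layer, num_layers=6)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

        def forward(self, context_tokens, target_tokens):
            # context_tokens: (B, N_ctx, dim); target_tokens: (B, N_tgt, dim)
            b, n_tgt, d = target_tokens.shape
            z_ctx = self.context_encoder(context_tokens)
            with torch.no_grad():
                z_tgt = self.target_encoder(target_tokens)   # regression targets, in latent space
            # Append one mask token per target patch and let the predictor fill them in.
            queries = self.mask_token.expand(b, n_tgt, d)
            pred = self.predictor(torch.cat([z_ctx, queries], dim=1))[:, -n_tgt:]
            # The loss compares representations, never pixels -- that's the JEPA part.
            return nn.functional.mse_loss(pred, z_tgt)

In the actual paper the predictor also gets positional information about where each target block sits, so it knows which region it's being asked to predict.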