Open-source rival for OpenAI’s DALL-E runs on your graphics card

345 points · 2 years ago · mixed-news.com
vanadium1st2 years ago

Stable Diffusion is mind-blowingly good at some things. If you are looking for modern artistic illustrations (like the stuff that you would find on the front page of ArtStation) - it's state of the art, better in my opinion than DALL-E 2 and Midjourney.

But the interesting thing is that while it is so good at producing detailed artworks and matching the styles of popular artists, it's surprisingly weak at other things, like interpreting complex original prompts. We've all seen the meme pictures made in Craiyon (previously DALL-E mini) of photoshop-collage-like visual jokes. Stable Diffusion, with all its sophistication, is much worse at those and struggles to interpret a lot of prompts that the free and public Craiyon is great with. The compositions are worse, and it misses a lot of requested objects or even misses the idea entirely.

Also, as good as it is at complex artistic illustrations, it is equally bad at minimalistic and simple ones, like logos and icons. I am a logo designer and I am already using AI a lot to produce sketches and ideas for commercial logos, and right now the free and publicly available Craiyon is head and shoulders better at that than Stable Diffusion.

Maybe in the future we will have a universal winner AI that is the best at any style of picture you can imagine. But right now we have an interesting competition in which different AIs have surprising strengths and weaknesses, and there's good reason to try them all.

russdill2 years ago

Just think where we'll be two more papers down the line

elil172 years ago

For those unaware, this is a catchphrase of Dr. Károly Zsolnai-Fehér from the absolutely wonderful YouTube channel "Two Minute Papers", which focuses on advances in computer graphics and AI.

PoignardAzur2 years ago

Random rant: it feels like over time Two Minute Papers has started to lean more and more into its catchphrases and gimmicks, while the density of interesting content keeps decreasing.

The whole "we're all fellow scholars here" bit feels like I'm watching a kid's show about science vulgarization, patting me on the head for being here.

"Look how smart you are, we're doing science!"

I dunno. I like the channel for what it is (a popularization newsletter for cool ML developments) but sometimes the author feels really patronizing / full of himself.

elil172 years ago

I agree that I like it for what it is - something more along the lines of Popular Science or Wired than Scientific American, if you want to compare to magazines. However, the content, while surface level, is always accurate - something that can't be said for other content creators in the field.

fezfight2 years ago

I agree that it can be a lot at times, especially if you watch several in a row, but I dunno, I kind of love that he's keeping that enthusiasm (real or not). I think the world is a brighter place because of it. Just a tiny bit, but still.

victor90002 years ago

I think the biggest benefit is the curation aspect. After all, how much can you actually learn in two minutes? Once I see something interesting, I go and read through the actual paper. Having said that, you're lucky if you can find a paper with enough details to actually reproduce the work.

balthigor2 years ago

You're mistaking earnest for patronizing. He's a genuinely positive dude.

QuadmasterXLII2 years ago

He stopped summarizing methods at some point - now it's just results.

pizza2 years ago

> Now squeeeze those papers!

GaggiX2 years ago

It's surprisingly weak at interpreting complex original prompts because the model is really small; the text encoder is just 183M parameters. Craiyon's is much larger.
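
If you want to sanity-check numbers like that yourself, here's a minimal sketch using Hugging Face transformers; the checkpoint name is my assumption about which CLIP text encoder Stable Diffusion v1 uses:

    from transformers import CLIPTextModel

    # "openai/clip-vit-large-patch14" is an assumption about which CLIP
    # text encoder Stable Diffusion v1 ships with
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    n_params = sum(p.numel() for p in text_encoder.parameters())
    print(f"text encoder parameters: {n_params / 1e6:.0f}M")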

andybak2 years ago

I have a penchant for wanting to make technically "bad" or heavily stylized photos - and Stable Diffusion is pretty poor at those. There's very little good bokeh or tilt-shift stuff, and CCTV/trailcam doesn't come out too well.

In fact, DALL-E isn't as impressive for some styles as "older" models (Jax / Latent Diffusion, etc.)

vhold2 years ago

My hunch is that is the result of this: https://github.com/CompVis/stable-diffusion#weights

> 515k steps at resolution 512x512 on "laion-improved-aesthetics" (a subset of laion2B-en, filtered to images with an original size >= 512x512, estimated aesthetics score > 5.0)

https://github.com/LAION-AI/laion-datasets/blob/main/laion-a... for more details.

What's remarkable is this: https://github.com/LAION-AI/laion-datasets/blob/main/laion-a...

That aesthetic predictor was apparently trained on only 4000 images. If my thinking is correct, imagine the impact those 4000 ratings have had on all of the output of this model.

You can see samples (some NSFW) of different images from the original training set in different rating buckets here, to get an idea of what was included or not in those training steps. http://3080.rom1504.fr/aesthetic/aesthetic_viz.html

babypuncher2 years ago

That is really a shame, because all I really want is a version of Craiyon that I can modify and run on my own hardware.

The amount of enjoyment I have derived from playing with Craiyon over the last two months is ridiculous.

aiddun2 years ago

IIRC Craiyon runs Dalle-mega. https://huggingface.co/dalle-mini/dalle-mega

Note: I think you need 16 GB of VRAM to run it.

emikulic2 years ago

You can run Craiyon / dalle-mini on a card with 8 GB of VRAM if you decrease the batch size to 1 and skip the CLIP step. Takes about 7 seconds to generate an image on a 3070.

I started with https://github.com/borisdayma/dalle-mini/blob/main/tools/inf... and pared it down.
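
Roughly, the pared-down flow looks like this (model references and call signatures are from memory of the dalle-mini inference notebook linked above, and may have drifted - treat it as an outline rather than gospel):

    import random
    import jax
    import jax.numpy as jnp
    from dalle_mini import DalleBart, DalleBartProcessor
    from vqgan_jax.modeling_flax_vqgan import VQModel

    # fp16 weights are what make an 8 GB card workable
    DALLE_MODEL = "dalle-mini/dalle-mini/mega-1-fp16:latest"
    model, params = DalleBart.from_pretrained(DALLE_MODEL, dtype=jnp.float16, _do_init=False)
    vqgan, vqgan_params = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384", _do_init=False)
    processor = DalleBartProcessor.from_pretrained(DALLE_MODEL)

    tokens = processor(["an armchair in the shape of an avocado"])  # batch of one
    key = jax.random.PRNGKey(random.randint(0, 2**31 - 1))
    encoded = model.generate(**tokens, prng_key=key, params=params)
    # decode the VQGAN codes to pixels, dropping the BOS token; no CLIP re-ranking
    images = vqgan.decode_code(encoded.sequences[..., 1:], params=vqgan_params)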

culi2 years ago

Have you checked out MidJourney? Makes Craiyon look like crayons :P

glenneroo2 years ago

Craiyon is free, whereas Midjourney is not. If you want MJ-level quality, check out Disco Diffusion or go straight to Visions of Chaos, which runs just about every AI diffusion script in existence. The dev is very active and adds new features every couple of days - most recently the ability to train your own diffusion models, which I've been doing the last 3 days nonstop on my little 3060 Ti (8 GB VRAM, which is barely sufficient to run at mostly default settings).

PoignardAzur2 years ago

> Of course, with open access and the ability to run the model on a widely available GPU, the opportunity for abuse increases dramatically.

> “A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, it is our belief this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society.”

Holy shit.

On the one hand, I'm super excited by this technology, and the novel applications that will become possible with these open-source models (stuff that would never be usable if Google and OpenAI had a monopoly on image generation).

On the other hand, I really really really hope Bostrom's urn[0] has no black ball in it, because we as a society seem to be rushing to extract as many balls as possible over increasingly short timescales.

[0] https://nickbostrom.com/papers/vulnerable.pdf

CM302 years ago

I don't see why this is incorrect. Ever since DALL-E and Midjourney caught on, it seems like we've got more and more people trying to 'filter out' incorrect uses of their software under the assumption that people cannot be trusted to just use it for whatever they want.

And it depresses me, because well... imagine if other pieces of tech were treated this way. If the internet or crypto or computers or whatever were heavily limited/restricted so the 'wrong people' couldn't use them for bad things. We'd consider it ridiculous, yet it's somehow accepted for these image generation systems.

yreg2 years ago

Imagine if image editors worked like this: only over the internet, with rules attached, and if a human moderator finds you breaking them, you get banned.

It sounds ridiculous, yet Photoshop is more dangerous than DALL-E in all regards.

dinosaurdynasty2 years ago

Nuclear tech is treated this way (much stricter even).

CM302 years ago

I think there might be at least a small difference between nuclear tech and image generation, at least as far as the effects that could happen if it goes wrong.

dkjaudyeqooe2 years ago

Get back to me when you can vaporise a city with an AI-generated image.

rcxdude2 years ago

Good cryptography was treated this way for a while, at least by the US government.

atwood222 years ago

Yes, and look how many amazing uses of cryptography have proliferated after it was made available, unrestricted, to the masses.

mortenjorck2 years ago

The length of the democratization cycle we're seeing – months to weeks between a breakthrough model and a competent open-source alternative that runs on commodity hardware – really highlights the genie-stuffing posture of Google and OpenAI. All the thoughtful, if highly paternalistic, guardrails they build in amount to little more than fig leaves over the possible applications they intend to close off.

I'm personally in the "AI risk is overstated" camp. But if I'm wrong, all the top-down AI safety in the world is going to be meaningless in the face of a global network of researchers, enthusiasts, and tinkerers.

nonbirithm2 years ago

I wonder how it would be possible to stop people from publishing and spreading research that will doom some subsegment of humanity/culture. If it turns out that proliferating something like DALL-E 5 causes serious irreversible effects to human culture, what would the researchers conclude was the correct thing to do at this exact point in time? Stop publishing AI research? How would we get all 8 billion people on Earth to agree?

It's the reason why the rapid amount of progress in this field scares me at times. It feels like gradually being crushed under a wall of the inevitable march towards progress. It could be the case that stopping ourselves before it's too late isn't possible.

Sometimes I get the feeling that the laws of nature will eventually destroy or severely impact any lifeform that gains too much of an understanding about the world. For example, in some other universe it might be possible to survive X more decades if the laws of physics weakened the effects of nuclear war just enough for civilization to recover in a relatively short period of time, but we're stuck with what we have, and that doesn't necessarily map to the long-term survival of a highly intelligent lifeform.

It would be a shame if unbounded curiosity would be humanity's undoing. That curiosity is also a part of me, and my family, and my neighbors down the street, and those in the situation room.

Aeolun2 years ago

> How would we get all 8 billion people on Earth to agree?

You don’t, so the question is mostly academic.

hedora2 years ago

They claim the guardrails are for public good, but they're pretty clearly using them to try to establish a competitive moat.

It's similar to the "we don't sell personal information" claim. Sure, but that's because they make money renting malicious actors access to a black box that contains your personal information. Selling the contents of the box would reduce their overall revenue.

TulliusCicero2 years ago

To me it seems like an obvious case of reputational risk being much larger for more prominent organizations than for smaller ones.

It makes sense for Google to wait for some startup to "go first" in releasing a model largely without controls. That way, some random startup takes the initial heat of "people are using AI for bad things!!" headlines plastering tech blogs. Then Google can do basically the same thing a little bit later, and any attack pieces will sound old hat.

narrator2 years ago

I was using AI Dungeon with the full power GPT-3 model before they crippled it. That thing had a very uninhibited mind for erotica! Imagine what would happen when that power comes to image models!

lodovic2 years ago

It's only a matter of time before someone trains a model on stills from the major adult sites. Actually surprised it hasn't been done / made public yet.

jackblemming2 years ago

Yes, I feel much safer if OpenAI and Google are the sole keepers of such technology. They have my and the public's best interests at heart.

geraldwhen2 years ago

Is this satire?

pizza2 years ago

Yes

ljlolel2 years ago

Googlers and OpenAI legitimately believe this

dkjaudyeqooe2 years ago

Sarcasm.

PoignardAzur2 years ago

Let me put it this way: it's not great to live in a world where the immense majority of nukes are controlled by Donald Trump and Vladimir Putin.

But it's arguably better than living in a world where every single citizen has a nuke.

(Though the potential for harm of diffusion models is far below a nuke's; it's not "kill millions of people", it's "produce cheap disinformation and very convincing fake evidence to ruin someone's life")

nightski2 years ago

I think that would be a tough argument to make (in regards to image generation). The same could be said of just about any computing technology. The problem is we lose out on a lot of potential good.

Either way it doesn't matter; you can't control bits like you can enriched uranium. It's just a matter of time. In the grand scheme of things OpenAI will be irrelevant.

Iv2 years ago

People will generate creepy porn and fake pictures. Humanity will survive this.

PoignardAzur2 years ago

You're not addressing my broader point, though, just the easy-to-snide-at version of my point.

Yes, it's pretty obvious that Dall-E and similar models won't destroy humanity.

My point isn't that Dall-E is a black ball. My point is we better hope a black ball doesn't exist at all, because the way this is going, if it exists, we are going to pick it, we clearly won't be able to stop ourselves.

(For the sake of discussion, we can imagine a black ball could be "an ML model running on a laptop that can tell you how to produce an undetectable ultra-transmissible deadly virus from easily purchased items")

Aeolun2 years ago

> we can imagine a black ball could be "an ML model running on a laptop that can tell you how to produce an undetectable ultra-transmissible deadly virus from easily purchased items"

I think we’re already past the point where we could have done something about this. In fact, we’ve probably been past that point since humanity was born.

I think it’s probably more valuable if we think about how we’ll deal with it if we do draw something that could be/is a black ball.

That said, so far all evidence points to extreme destruction just being really hard, which leads me to believe that truly black ball technologies may not exist.

geenew2 years ago

What do you mean by 'black ball'?

Like 'black balling', eg shunning?

Or 'black box', eg poorly understood technology?

metaphor2 years ago

https://nickbostrom.com/papers/vulnerable.pdf

> black ball: a technology that invariably or by default destroys the civilization that invents it.

Iv2 years ago

If you know that image generators are not it, then why talk about it here? Do you get this kind of angst at every technological increment?

Apart from nuclear scientists, I don't know a field where participants are as conscious of the risks as in AI research.

krisoft2 years ago

> Apart from nuclear scientists, I don't know a field where participants are as conscious of the risks as in AI research.

Great. Now, some of these researchers perceived some risk with this technology. Not human-extinction-level risk, but risks. So they attempted to control the technology. To be specific: OpenAI was worried about deepfakes, so they engineered guard rails into their implementation. OpenAI was worried about misinformation, so they did not release the bigger GPT models.

Note: I’m not arguing either way whether OpenAI was right, or honest about their motivations; just observing that they expressed this opinion and acted on it to guard against the risk.

Got this so far? Keep this in mind because I’m going to use this information to answer your question:

> If you know that image generators are not it, then why talk about it here?

Because it is a technology which was deemed risky by some practitioners, and they attempted to control it, and those attempts to control the spread of the technology failed. This does not bode well for our ability to restrain ourselves from picking up a real black ball, if we ever come across one. And that is why it is worth talking about black balls in this context.

Note it is unlikely that a black ball event will completely blindside us. It is unlikely that someone develops a clone of Pac-Man with improved graphics and boom, that alone leads to the inevitable death of humanity. It is much more likely that when new and dangerous tech appears on our horizon, there will be people talking about the potential dangers. What remains to be answered: what can we do then? This experience has shown us that if we ever encounter a black ball technology, steps like those taken by OpenAI don't seem to be enough.

This is why it is worth talking about black ball technologies here. I hope this answers your question?

dkjaudyeqooe2 years ago

Social media is being used to undermine societies globally, arguably it has a fair chance of destroying humanity.

So I think that horse may have bolted already.

astrange2 years ago

Who says a computer would be able to invent a virus by thinking about it really hard?

dkjaudyeqooe2 years ago

Arthouse cinema has been doing that and more for decades and we're still here.

forty2 years ago

Exactly, and there are many pros for humanity to this too: people will be able to make funny pictures and things like that, so it's not like it's a bad deal.

sinenomine2 years ago

What if the black ball was a red herring all along, and the usual suspect tech-CEOs' hands rushing to control said crystal ball are the real hazard?

manquer2 years ago

Wouldn’t nuclear weapons or even plastics be a black ball already?

Humanity is not homogeneous; we will always react to new inventions or tools differently - many will use them positively, some won't. Short of weapons of mass destruction, I am not sure anything else will destroy civilization itself.

ALittleLight2 years ago

No, the black ball is a technology that, once invented, humanity cannot survive. Nuclear weapons have been invented and humanity is surviving. Same with plastics.

A black ball would be like - suppose nuclear weapons ignited the atmosphere. We test the first nuke, it ignites the atmosphere, a global fire storm consumes all breathable oxygen, kills all plants and everyone on the surface and everything else suffocates shortly after. Plastics aren't even close to this level of harm.

tablespoon2 years ago

>> Wouldn’t nuclear weapons or even plastics be a black ball already?

> No, the black ball is a technology that, once invented, humanity cannot survive. Nuclear weapons have been invented and humanity is surviving. Same with plastics.

I don't think that definition is a good one. Technological civilization [1] has survived nuclear weapons for ~80 years, but there's no guarantee it will survive them for another 80 years, let alone forever. It seems like these "black balls" should be thought of like time bombs; there are at least two variables: how much destruction it will cause when it goes off AND the delay time before that happens. We shouldn't confuse a dangerous technology with a long delay time for a safe technology. My intuition tells me that there will probably be nuclear war at some point over the next 1,000+ years.

[1] I don't think nuclear weapons can make humanity extinct, so long as there are still little poorly-connected subsistence communities in remote areas. However, if The Market manages to extend its tentacles into every human community, we're probably fucked.

michaelt2 years ago

I guess people are getting confused because in terms of risk-of-destroying-humanity, nuclear weapons seem higher risk than DALL-E.

neuronic2 years ago

> Same with plastics

Plastics are playing the long game. They have to turn into micro- and nanoplastics first and may then enact undesired, unforeseen biological functions, just like BPA [1].

Not even talking about weaponizing this stuff...

[1] https://pubmed.ncbi.nlm.nih.gov/21605673/

ALittleLight2 years ago

It seems implausible to me that plastics are going to kill all humanity. If the paper you linked makes that case, then I will read it, but I didn't get that from skimming the abstract.

yreg2 years ago

>Nuclear weapons have been invented and humanity is surviving.

Isn't the point of the discussion started by PoignardAzur about how we deal with such technology after it is pulled out of the bag?

If you define black ball technology as fundamentally uncontainable, then there is no point in talking about our practices of restricting access to new tech.

krono2 years ago

Either we equalise chaos or we reduce chaos, there exist no other options for entropy incarnate.

colordrops2 years ago

Yet another "open" model that isn't open. We shall see if they actually do release to the public. We keep seeing promises from various orgs but it never pans out.

TulliusCicero2 years ago

Their plan seems less hand-wavy; they're being explicit with "first we release it like this, then like that, then freely to everyone".

You're right that they could always change their minds and that would suck, but so far they seem to be up front.

runnerup2 years ago

Zero of their flagship models have been released AFAIK. E.g. the pre-DALL-E model, CLIP, still has not had the weights for its ViT-H/16 variant released, from a paper 19 months ago.

This was before DALL-E and way before DALL-E 2.

Not that I think they need to. It’s just weird people defend them as being open just because they release crippled versions of their model.

TulliusCicero2 years ago

They've said that they intend to release the model soon-ish for anyone to run on their own machine, with no censorship*. I don't think the other major players, like OpenAI, have committed to doing the same.

* They're actually working on a filter right now, but IIRC it's an optional one, for when you don't want to accidentally generate NSFW output.

bloppe2 years ago

OpenAI should probably rebrand, lol - the "open" part is basically parody at this point.

colordrops2 years ago

Agreed, and they are influencing others, like the company in this article, which is following their model of "opening" things.

dang2 years ago

Recent and related:

Stable Diffusion launch announcement - https://news.ycombinator.com/item?id=32414811 - Aug 2022 (37 comments)

axg112 years ago

I’m excited for the coming race to improve and miniaturise this tech. Apple has a great track record of making ML models light enough to run locally. There will come a day when photorealistic image generation can run on an iPhone.

andrewacove2 years ago

Maybe this is their long term plan for getting rid of the camera bump.

rasz2 years ago

3 days from launch to getting your Twitter account suspended.

https://twitter.com/DiffusionPics/

ShamelessC2 years ago

Context?

humanistbot2 years ago

Can someone tell me how this compares to the guide and repo shared a few days ago on HN: https://news.ycombinator.com/item?id=32384646

Voloskaya2 years ago

This version is a bit more optimized, and better packaged. Also the model has been trained longer, so when the weights become publicly available the resulting quality should be much higher.

Geee2 years ago

There's also Disco Diffusion: https://www.reddit.com/r/DiscoDiffusion/

Not sure how they compare. DD seems to be quite popular. I'm currently setting up DD locally.

glenneroo2 years ago

I've been running DD for a few months now. I tend to just edit the Python script, or use e.g. entmike's fork, which can read config files to make changes to the 50+ parameters (basically anything is better than having to use Jupyter Notebooks IMO). If you don't have a GPU with 6+ GB of VRAM, you can often get a decent enough GPU for free from Google Colab. For running locally, I can also highly recommend Visions of Chaos, which includes multiple versions/forks of Disco Diffusion, as well as a ton of other latent diffusion scripts, not to mention many many many other generation features such as fractals and even music. They also recently added the ability to train your own diffusion models, which I've been doing the last few days using thousands of my own photographs. It also has a pretty nice GUI and the dev is extremely responsive on Discord. Also, after you do the setup for VoC, it handles all the Python venv setup stuff otherwise necessary with local DD installs. In any case, check out the DD Discord and/or VoC Discord for lots of info, tips, help, examples, and support.

Geee2 years ago

Thanks for the info. Is it possible to do something like transfer learning on top of existing models, or do you train your own models from scratch? I'll check out that Visions of Chaos thing. I'm just beginning my journey into this generative art stuff and basically just trying to get this running right now.

thorum2 years ago

If you want to see more examples of what this AI is capable of, check out the subreddit:

https://reddit.com/r/stablediffusion

humanistbot2 years ago

If anyone from Stability is reading, the confirmation e-mail to sign up is sending a broken link:

"We couldn't process your request at this time. Please try again later. If you are seeing this message repeatedly, please contact Support with the following information:

ip: XXXX

date: Mon Aug 15 2022 XX:XX:XX GMT-0700 (Pacific Daylight Time)

url: https://stability.us18.list-manage.com/subscribe/confirm"

wccrawford2 years ago

It worked for me just now, so maybe it was temporary, or they already fixed it?

notrealyme1232 years ago

I forwarded this thread to a member of the project.

hifikuno2 years ago

I also had this response.

ruuda2 years ago

After the first paragraph, the site shows a notification in German that I need to enable JavaScript to use the site. But after that is the full article, including images, which would be almost perfectly readable - except it's at 5% opacity (or maybe the JavaScript popup is overlaid at 95% opacity), which makes it impossible to read. :'(

belltaco2 years ago

The article says it needs 5.1 GB of graphics RAM.

Does anyone know how much data it downloads and how much disk storage it needs?

_blop2 years ago

The v1.3 model weighs in at 4.3 GB. There's an additional download of 1.6 GB of other models due to the usage of Hugging Face's transformers (only once, on startup), and the conda env takes another 6 GB due to PyTorch and CUDA - so figure roughly 12 GB of disk in total.

Larger images will require (much) more than 5.1 GB. In my case, a target resolution of 768x384 (landscape) with a batch size of 1 will max out my 12 GB card, an RTX 3080 Ti.

mdorazio2 years ago

I think this is a good time to ask: is anyone still working on parallelizing machine-learning compute? For at-home computation like this, it seems like it would be a lot better to let people stack a few cheaper GPUs rather than having to pony up thousands of dollars for ML-oriented beast cards to be able to do things like generate large images.
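
The crudest form of this already works in stock PyTorch: put different halves of the network on different cards and move the activations between them. A toy sketch (module names and layer sizes are made up for illustration):

    import torch
    import torch.nn as nn

    class TwoGPUNet(nn.Module):
        """Toy model parallelism: half the layers live on each of two cards."""
        def __init__(self):
            super().__init__()
            self.front = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
            self.back = nn.Linear(4096, 4096).to("cuda:1")

        def forward(self, x):
            x = self.front(x.to("cuda:0"))
            return self.back(x.to("cuda:1"))  # hop the activations across cards

    net = TwoGPUNet()
    out = net(torch.randn(1, 4096))  # each card only ever holds half the weights

Libraries like DeepSpeed and FairScale automate this kind of sharding, though the inter-card hops usually make it slower than one big card.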

andybak2 years ago

AI upscaling will solve everything ;)

I've generated some remarkably good-looking print quality images by upscaling 512x512 sources

luismmolina2 years ago

If you read directly from the site, the requirement for the graphics card is 10 GB of VRAM as a minimum. Because it runs locally, you don't need to download anything apart from the initial model; the same applies to the disk space.

kgc2 years ago

Does this work on Apple silicon processors? They have plenty of RAM accessible to the GPU.

sroussey2 years ago

The article says it will, but that it does not use the GPU, unfortunately.
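
For what it's worth, PyTorch 1.12 added an Apple-GPU backend ("mps"), so this is mostly a question of the project adopting it. A minimal availability check:

    import torch

    # PyTorch 1.12+ can target the Apple GPU via the Metal ("mps") backend
    if torch.backends.mps.is_available():
        x = torch.randn(1, 3, 64, 64, device="mps")  # allocated in unified memory
        print((x * 2).device)  # mps:0
    else:
        print("no MPS backend; falling back to CPU")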

9999000009992 years ago

Has anyone made a pixel art generator that can create animation sprites?

_w1kke_2 years ago

@KaliYuga did - she got hired by StabilityAI just a few days ago. Here is a link to the Pixel Art Diffusion notebook:

https://colab.research.google.com/github/KaliYuga-ai/Pixel-A...

0xdead1eaf2 years ago

Check out NUWA-Infinity[0][1], submitted to arxiv jul 20, 2022. It captures artistic style very well (though can't speak to the quality of the pixel art it would generate) and can do image to video.

[0] https://nuwa-infinity.microsoft.com/#/ [1] https://arxiv.org/abs/2207.09814

gxqoz2 years ago

You can use DALL-E and other models to make pixel art (by appending "as pixel art" to the prompt), although it can be both overkill and hard to get consistent results that you'd put into animation. I'm guessing that starting from more of a video model and then converting to pixel art could be better, although it's also non-trivial to turn "realistic" video into convincing animation.

9999000009992 years ago

I'd pay good money for a specialized machine-learning model that can take a pixel art character and then generate all the animated sprites for it.

I actually tried to get DALL-E to do this, and it made like three good sprites; the rest were just broken. But it was so strange, because you could see it was still organized as a sprite sheet - it's just that the sprites were useless.

I think the practical applications of this technology will be hyper-specialized models for specific purposes.

panabee2 years ago

hi there. we're working on this, been working on a model for months now. hope to release something soon. how best to get in touch with you?

chucky1232 years ago

That's a really good idea.

tckerr2 years ago

This is exactly the type of application I am interested in as well. As a hobby game dev with only mediocre pixel art skills, having a generator to finish the busy work would be an absolute lifesaver. I'm also interested in using it for fleshing out artistic vision through generating variations of an initial concept.

Hopefully we aren't more than a few years away from something practical like this.

kragen2 years ago

This article says both that it's "open-source" and that it's "available for [only] research purposes upon request". These can't both be correct. Where is the error?

zone4112 years ago

They jumped the gun with this announcement. I get wanting to share the excitement of AI doing something cool with the world (I've been there) but they should've waited until it's accessible to the public.

Diris2 years ago

The code is open source, the models are not.

upupandup2 years ago

My friend wants to know when she can use this to generate porn. Are we close?

GistNoesis2 years ago

I did a Show HN about this https://news.ycombinator.com/item?id=31900095 a month ago, to experiment with the technology. The training was done in a weekend, with 2 old GPUs (1080 Ti).

Currently waiting to scale up to improve quality, mainly for economic reasons; not quite sure I could recoup the training costs yet, even more so if I go with cloud training.

Nvidia will release the 4090 in September, and Ethereum may do "the merge", which will make GPUs useless for mining, so prices could drop and I could update my home cluster with affordable 3090s. (But electricity prices are also up.)

Also, there are new algorithms every month, like Stable Diffusion, that would obsolete your previous training.

The video generation cost is probably still too high compared to just paying a cam girl in a low-wage country. But it will probably go down soon.

This is also sensitive data, plagued with copyright issues, so it's quite troublesome to legally share training datasets to split costs.

It also has its own challenges with respect to custom dataset creation with text descriptions, so it's probably a better idea to adapt the algorithm to the currently available data to keep the costs low.

Finally, once someone releases a model, within a month there will be at least 3 clones.

There is also the problem to find an adult friendly payment processor.

And the multitude of potential legal issues.

But it's probably inevitable.

Voloskaya2 years ago

The data used to train those models is specifically filtered to remove sexual content, so the model can't generate porn because it has no idea what it looks like, beyond a few samples that made it past the filter.

So no, your "friend" can't use it for that.

upupandup2 years ago

Why is it that sexual content is so frowned upon in this space? If it were a content publishing platform, I would understand that advertisers don't want it, but this is literally dictating to people what is bad and good. I just don't understand this Puritan outrage over text-to-image porn generation.

Voloskaya2 years ago

Because you can't control what the model is going to output in response to a query. The model is trained to respond in a way that is aligned but there is no guarantee.

Since we certainly don't want to show generated images of porn or violence to someone who didn't specifically ask for that, the easiest way to ensure that's not going to happen is to just not train on that kind of data in the first place. The worst that can happen with a model trained on "safe" images is that the image is irrelevant or makes no sense, meaning you could deploy systems with no human curator on the other end and nothing bad is going to happen. You lose that ability as soon as you integrate porn.

Also with techniques like in-painting, the potential for misuse of a model trained on porn/violence would be pretty terrifying.

So the benefits of training on porn seem very small compared to the inconvenience. I don't think it has anything to do with puritanism; it's just that if I am the one putting dollars and time into training such a model, I am certainly not going to take on the added complexity and implications of dealing with porn just to let a few people realize their fetishes, at the risk of my entire model being undeployable because it's outputting too much porn or violence.

spywaregorilla2 years ago

Because it's a lot more annoying for your innocuous content to be rendered as porn when the AI happens to interpret it that way than it is for you to be unable to render your pervy desires intentionally.

A porn model should really be its own thing.

gs172 years ago

I imagine a large part of it is that it could generate photorealistic child porn (also "deepfake" porn of real people) and there's not really a good way to prevent it entirely while also allowing generalized sexual content AFAIK. There's probably some debate on how big a problem this really is, but no one wants their system to be the one with news stories about how it's popular with pedophiles. It was the issue they had with AI Dungeon.

sbierwagen2 years ago

Because if the model generates anything problematic the New York Times will ruin your life.

blowski2 years ago

I'd guess that, for general purpose companies, it's an area full of legal ambiguity and potential for media outrage, so just not worth the risk. However, given the evidence of human history, it's certain that someone with an appetite for exploiting this niche will develop exactly that kind of tool.

djbebs2 years ago

Because the law makes it very difficult to provide such services in the spirit of preventing the exploitation of minors.

Make no mistake, this is indirectly a legal hurdle.

_blop2 years ago

This article hints that Stable Diffusion can at least generate normal looking nude women: https://techcrunch.com/2022/08/12/a-startup-wants-to-democra...

There are attempts to gather porn images and train or fine-tune existing networks on it, here's a recent attempt by an art student mentioned in the article above (NSFW!!): https://www.vice.com/en/article/m7ggqq/this-furry-porn-ai-ge...

isoprophlex2 years ago

Jesus H Christ those are some seriously cursed hindquarters

SV_BubbleTime2 years ago

I was a better person this morning for not knowing that furries had the term "hindquarters". I mean, that's fine for other people, you do you, but for me, I was better this morning.

isoprophlex2 years ago

You'll have to train it on your own data. As others have mentioned, the training data for DALL-E, Stable Diffusion, etc. has been cleaned prior to training.

However, if it is possible to restart the training process from the weights of a non-sexually-aware model, this fine-tuning might not take all that long!
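
That restart-from-weights pattern is ordinary fine-tuning: initialize from the released checkpoint and keep training on new data with a small learning rate. A toy sketch of the idea, using a small torchvision classifier as a stand-in for a diffusion model:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18, ResNet18_Weights

    # Start from released weights instead of random init, then keep training
    model = resnet18(weights=ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 10)              # new head for the new task
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # small LR: nudge, don't erase
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(4, 3, 224, 224)      # stand-in batch; substitute your own dataset
    y = torch.randint(0, 10, (4,))
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()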

Mountain_Skies2 years ago

Is there a way to make money selling the model to people who want to use it to make porn? If so, it will trickle down relatively quickly. If not, it'll still eventually trickle down but will take longer.

TigeriusKirk2 years ago

Surely you could sell custom prompt runs for porn for a great deal more than OpenAI is charging for generalist custom prompts.

Making money at it should be easy, and places like PornHub wouldn't care about any outrage. The real challenge would be limiting criminal and civil liability, at least to my not-in-the-business thinking.

upupandup2 years ago

This is already a thing in the kpop fake porn industry. I don't know how Patreon/OnlyFans are allowing this to happen; I mean, it's a travesty that highly suggestive lyrics and stripper dance moves from scantily clad kpop idols are being used for sexual gratification.

astrange2 years ago

Not their fault their country banned actual porn.

TaylorAlexander2 years ago

Once training can be done on a beefy home rig folks will be all over it.

planetsprite2 years ago

You'll need to train your own model, though I'm sure if someone manages to crowdsource it there's a very obvious economic incentive.

TulliusCicero2 years ago

Props for just coming out and saying it

GaggiX2 years ago

It is possible to generate nudes, but not pornographic ones.

unethical_ban2 years ago

Wait, so the closed source generator known as DALL-E is owned by a company called OpenAI?

alephxyz2 years ago

It's a bit of a dead horse at this point but yes. See the previous discussion: https://news.ycombinator.com/item?id=28416997

fariszr2 years ago

As Elon said, "OpenAI should be more Open IMO".

glenneroo2 years ago

Curiously it seemed to lock down even more after they "partnered" with Microsoft.

stuckinhell2 years ago

This is pretty amazing. Anyone have any tips on building a PC for machine learning with a RAID device?

jessfyi2 years ago

Hasn't been updated since 2020, but Tim Dettmers' guide [0] is pretty much the gold standard for optimizing what to buy for which area of DL/ML you're interested in. The pricing has changed thanks to GPU prices coming back down to earth a bit, but what to look out for and how much RAM you need for which task hasn't. Check out the "TL;DR advice" section, then scroll back up for detailed info on why, and common misconceptions. For tips on a RAID/NAS setup alongside it, just head to the datahoarders subreddit and their FAQ.

[0] https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...

cellis2 years ago

Look into building an Ethereum mining machine... it can double as an ML workstation. That's what I did.

hedora2 years ago

If you just want to try it out, consider using a remote CAD workstation from a company like Paperspace.

(No affiliation.)

dustingetz2 years ago

Doesn't build on my Mac Studio due to a dependency whose Mac version is two major versions behind.

fswd2 years ago

Unfortunately it's a commercial license and the model isn't available to the public so it isn't very useful.

andybak2 years ago

It's going to be MIT from what I have heard. On phone atm so can't provide sources.

andybak2 years ago

That’s just a restricted interim release. The proper public release isn’t ready yet. No timescale but sounds like days/weeks rather than months/years.

AgentME2 years ago

Isn't that just temporary until the public release? Or is the article misleading by calling it open source?

th1s1sit2 years ago

Would be a blast if the cloud were upended by RISC and GPUs powerful enough to crunch "big data" at home.

Would love to see FAANG and SV crash and burn, margins chipped away to nothing.

keepquestioning2 years ago

We are heading into uncharted territory :(