Open-source rival for OpenAI’s DALL-E runs on your graphics card

345 points · 2 years ago · mixed-news.com
vanadium1st2 years ago

Stable Diffusion is mind-blowingly good at some things. If you are looking for modern artistic illustrations (like the stuff that you would find on the front page of ArtStation) - it's state of the art, better in my opinion than DALL-E 2 and Midjourney.

But the interesting thing is that while it is so good at producing detailed artworks and matching the styles of popular artists, it's surprisingly weak at other things, like interpreting complex original prompts. We've all seen the meme pictures made in Craiyon (previously DALL-E mini) of photoshop-collage-like visual jokes. Stable Diffusion, with all its sophistication, is much worse at those and struggles to interpret a lot of prompts that the free and public Craiyon is great with. The compositions are worse, and it misses a lot of requested objects or even misses the idea entirely.

Also, as good as it is at complex artistic illustrations, it is equally bad at minimalistic and simple ones, like logos and icons. I am a logo designer and I am already using AI a lot to produce sketches and ideas for commercial logos, and right now the free and publicly available Craiyon is head and shoulders better at that than Stable Diffusion.

Maybe in the future we will have a universal winner AI that is the best at any style of picture you can imagine. But right now we have an interesting competition in which different AIs have surprising strengths and weaknesses, and there's good reason to try them all.

russdill2 years ago

Just think where we'll be two more papers down the line

elil172 years ago

For those unaware, this is a catchphrase of Dr. Károly Zsolnai-Fehér from the absolutely wonderful YouTube channel "Two Minute Papers", which focuses on advances in computer graphics and AI.

PoignardAzur2 years ago

Random rant: it feels like over time Two Minute Papers has started to lean more and more into its catchphrases and gimmicks, while the density of interesting content keeps decreasing.

The whole "we're all fellow scholars here" bit feels like I'm watching a kid's show about science vulgarization, patting me on the head for being here.

"Look how smart you are, we're doing science!"

I dunno. I like the channel for what it is (a popularization newsletter for cool ML developments) but sometimes the author feels really patronizing / full of himself.

elil172 years ago

I agree that I like it for what it is - something more along the lines of Popular Science or Wired than Scientific American, if you want to compare to magazines. However, the content, while surface level, is always accurate - something that can't be said for other content creators in the field.

fezfight2 years ago

I agree that it can be a lot at times, especially if you watch several in a row, but I dunno, I kind of love that he's keeping that enthusiasm (real or not). I think the world is a brighter place because of it. Just a tiny bit, but still.

victor90002 years ago

I think the biggest benefit is the curation aspect. After all, how much can you actually learn in two minutes? Once I see something interesting, I go and read through the actual paper. Having said that, you're lucky if you can find a paper with enough details to actually reproduce the work.

balthigor2 years ago

You're mistaking earnest for patronizing. He's a genuinely positive dude.

QuadmasterXLII2 years ago

He stopped summarizing methods at some point - now it's just results.

pizza2 years ago

> Now squeeeze those papers!

GaggiX2 years ago

It's surprisingly weak at interpreting complex original prompts because the model is really small; the text encoder is just 183M parameters. Craiyon's is much larger.
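
If you want to sanity-check numbers like that yourself, here's a minimal sketch using Hugging Face transformers; the checkpoint name is my assumption about which CLIP text encoder Stable Diffusion v1 uses:

    from transformers import CLIPTextModel

    # "openai/clip-vit-large-patch14" is an assumption about which CLIP
    # text encoder Stable Diffusion v1 ships with
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    n_params = sum(p.numel() for p in text_encoder.parameters())
    print(f"text encoder parameters: {n_params / 1e6:.0f}M")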

andybak2 years ago

I have a penchant for wanting to make technically "bad" or heavily stylized photos - and Stable Diffusion is pretty poor at those. There's very little good bokeh or tilt-shift stuff, and CCTV/trailcam doesn't come out too well.

In fact, DALL-E isn't as impressive for some styles as "older" models (Jax / Latent Diffusion, etc.)

vhold2 years ago

My hunch is that is the result of this: https://github.com/CompVis/stable-diffusion#weights

> 515k steps at resolution 512x512 on "laion-improved-aesthetics" (a subset of laion2B-en, filtered to images with an original size >= 512x512, estimated aesthetics score > 5.0)

https://github.com/LAION-AI/laion-datasets/blob/main/laion-a... for more details.

What's remarkable is this: https://github.com/LAION-AI/laion-datasets/blob/main/laion-a...

That aesthetic predictor was apparently trained on only 4000 images. If my thinking is correct, imagine the impact those 4000 ratings have had on all of the output of this model.

You can see samples (some NSFW) of different images from the original training set in different rating buckets here, to get an idea of what was included or not in those training steps. http://3080.rom1504.fr/aesthetic/aesthetic_viz.html

babypuncher2 years ago

That is really a shame, because all I really want is a version of Craiyon that I can modify and run on my own hardware.

The amount of enjoyment I have derived from playing with Craiyon over the last two months is ridiculous.

aiddun2 years ago

IIRC Craiyon runs Dalle-mega. https://huggingface.co/dalle-mini/dalle-mega

Note: I think you need 16 GB of VRAM to run it.

emikulic2 years ago

You can run Craiyon / dalle-mini on a card with 8 GB of VRAM if you decrease the batch size to 1 and skip the CLIP step. Takes about 7 seconds to generate an image on a 3070.

I started with https://github.com/borisdayma/dalle-mini/blob/main/tools/inf... and pared it down.
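
Roughly, the pared-down flow looks like this (model references and call signatures are from memory of the dalle-mini inference notebook linked above, and may have drifted - treat it as an outline rather than gospel):

    import random
    import jax
    import jax.numpy as jnp
    from dalle_mini import DalleBart, DalleBartProcessor
    from vqgan_jax.modeling_flax_vqgan import VQModel

    # fp16 weights are what make an 8 GB card workable
    DALLE_MODEL = "dalle-mini/dalle-mini/mega-1-fp16:latest"
    model, params = DalleBart.from_pretrained(DALLE_MODEL, dtype=jnp.float16, _do_init=False)
    vqgan, vqgan_params = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384", _do_init=False)
    processor = DalleBartProcessor.from_pretrained(DALLE_MODEL)

    tokens = processor(["an armchair in the shape of an avocado"])  # batch of one
    key = jax.random.PRNGKey(random.randint(0, 2**31 - 1))
    encoded = model.generate(**tokens, prng_key=key, params=params)
    # decode the VQGAN codes to pixels, dropping the BOS token; no CLIP re-ranking
    images = vqgan.decode_code(encoded.sequences[..., 1:], params=vqgan_params)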

culi2 years ago

Have you checked out MidJourney? Makes Craiyon look like crayons :P

glenneroo2 years ago

Craiyon is free, whereas Midjourney is not. If you want MJ-level quality, check out Disco Diffusion or go straight to Visions of Chaos, which runs just about every AI diffusion script in existence. The dev is very active and adds new features every couple of days - most recently the ability to train your own diffusion models, which I've been doing the last 3 days nonstop on my little 3060 Ti (8 GB VRAM, which is barely sufficient to run at mostly default settings).

PoignardAzur2 years ago

> Of course, with open access and the ability to run the model on a widely available GPU, the opportunity for abuse increases dramatically.

> “A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, it is our belief this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society.”

Holy shit.

On the one hand, I'm super excited by this technology, and the novel applications that will become possible with these open-source models (stuff that would never be usable if Google and OpenAI had a monopoly on image generation).

On the other hand, I really really really hope Bostrom's urn[0] has no black ball in it, because we as a society seem to be rushing to extract as many balls as possible over increasingly short timescales.

[0] https://nickbostrom.com/papers/vulnerable.pdf

CM302 years ago

I don't see why this is incorrect. Ever since DALL-E and Midjourney caught on, it seems like we've got more and more people trying to 'filter out' incorrect uses of their software under the assumption that people cannot be trusted to just use it for whatever they want.

And it depresses me, because well... imagine if other pieces of tech were treated this way. If the internet or crypto or computers or whatever were heavily limited/restricted so the 'wrong people' couldn't use them for bad things. We'd consider it ridiculous, yet it's somehow accepted for these image generation systems.

yreg2 years ago

Imagine if image editors worked like this: only over the internet, with rules attached, and if a human moderator finds you breaking them, you get banned.

It sounds ridiculous, yet Photoshop is more dangerous than DALL-E in all regards.

dinosaurdynasty2 years ago

Nuclear tech is treated this way (much stricter even).

CM302 years ago

I think there might be at least a small difference between nuclear tech and image generation, at least as far as the effects that could happen if it goes wrong.

dkjaudyeqooe2 years ago

Get back to me when you can vaporise a city with an AI-generated image.

rcxdude2 years ago

Good cryptography was treated this way for a while, at least by the US government.

atwood222 years ago

Yes, and look how many amazing uses of cryptography have proliferated after it was made available, unrestricted, to the masses.

mortenjorck2 years ago

The length of the democratization cycle we're seeing – months to weeks between a breakthrough model and a competent open-source alternative that runs on commodity hardware – really highlights the genie-stuffing posture of Google and OpenAI. All the thoughtful, if highly paternalistic, guardrails they build in amount to little more than fig leaves over the possible applications they intend to close off.

I'm personally in the "AI risk is overstated" camp. But if I'm wrong, all the top-down AI safety in the world is going to be meaningless in the face of a global network of researchers, enthusiasts, and tinkerers.

nonbirithm2 years ago

I wonder how it would be possible to stop people from publishing and spreading research that will doom some subsegment of humanity/culture. If it turns out that proliferating something like DALL-E 5 causes serious irreversible effects to human culture, what would the researchers conclude was the correct thing to do at this exact point in time? Stop publishing AI research? How would we get all 8 billion people on Earth to agree?

It's the reason why the rapid amount of progress in this field scares me at times. It feels like gradually being crushed under a wall of the inevitable march towards progress. It could be the case that stopping ourselves before it's too late isn't possible.

Sometimes I get the feeling that the laws of nature will eventually destroy or severely impact any lifeform that gains too much of an understanding about the world. For example, in some other universe it might be possible to survive X more decades if the laws of physics weakened the effects of nuclear war just enough for civilization to recover in a relatively short period of time, but we're stuck with what we have, and that doesn't necessarily map to the long-term survival of a highly intelligent lifeform.

It would be a shame if unbounded curiosity would be humanity's undoing. That curiosity is also a part of me, and my family, and my neighbors down the street, and those in the situation room.

Aeolun2 years ago

> How would we get all 8 billion people on Earth to agree?

You don’t, so the question is mostly academic.

hedora2 years ago

They claim the guardrails are for public good, but they're pretty clearly using them to try to establish a competitive moat.

It's similar to the "we don't sell personal information" claim. Sure, but that's because they make money renting malicious actors access to a black box that contains your personal information. Selling the contents of the box would reduce their overall revenue.

TulliusCicero2 years ago

To me it seems like an obvious case of reputational risk being much larger for more prominent organizations than for smaller ones.

It makes sense for Google to wait for some startup to "go first" in releasing a model largely without controls. That way, some random startup takes the initial heat of "people are using AI for bad things!!" headlines plastering tech blogs. Then Google can do basically the same thing a little bit later, and any attack pieces will sound old hat.

narrator2 years ago

I was using AI Dungeon with the full power GPT-3 model before they crippled it. That thing had a very uninhibited mind for erotica! Imagine what would happen when that power comes to image models!

lodovic2 years ago

It's only a matter of time before someone trains a model on stills from the major adult sites. Actually surprised it hasn't been done / made public yet.

jackblemming2 years ago

Yes, I feel much safer if OpenAI and Google are the sole keepers of such technology. They have my and the public's best interests at heart.

geraldwhen2 years ago

Is this satire?

pizza2 years ago

Yes

ljlolel2 years ago

Googlers and OpenAI legitimately believe this

dkjaudyeqooe2 years ago

Sarcasm.

PoignardAzur2 years ago

Let me put it this way: it's not great to live in a world where the immense majority of nukes are controlled by Donald Trump and Vladimir Putin.

But it's arguably better than living in a world where every single citizen has a nuke.

(Though the potential for harm of diffusion models is far below a nuke's; it's not "kill millions of people", it's "produce cheap disinformation and very convincing fake evidence to ruin someone's life")

nightski2 years ago

I think that would be a tough argument to make (in regards to image generation). The same could be said of just about any computing technology. The problem is we lose out on a lot of potential good.

Either way it doesn't matter; you can't control bits like you can enriched uranium. It's just a matter of time. In the grand scheme of things OpenAI will be irrelevant.

Iv2 years ago

People will generate creepy porn and fake pictures. Humanity will survive this.

PoignardAzur2 years ago

You're not addressing my broader point, though, just the easy-to-snide-at version of my point.

Yes, it's pretty obvious that Dall-E and similar models won't destroy humanity.

My point isn't that Dall-E is a black ball. My point is we better hope a black ball doesn't exist at all, because the way this is going, if it exists, we are going to pick it, we clearly won't be able to stop ourselves.

(For the sake of discussion, we can imagine a black ball could be "an ML model running on a laptop that can tell you how to produce an undetectable ultra-transmissible deadly virus from easily purchased items")

Aeolun2 years ago

> we can imagine a black ball could be "an ML model running on a laptop that can tell you how to produce an undetectable ultra-transmissible deadly virus from easily purchased items"

I think we’re already past the point where we could have done something about this. In fact, we’ve probably been past that point since humanity was born.

I think it’s probably more valuable if we think about how we’ll deal with it if we do draw something that could be/is a black ball.

That said, so far all evidence points to extreme destruction just being really hard, which leads me to believe that truly black ball technologies may not exist.

geenew2 years ago

What do you mean by 'black ball'?

Like 'black balling', eg shunning?

Or 'black box', eg poorly understood technology?

metaphor2 years ago

https://nickbostrom.com/papers/vulnerable.pdf

> black ball: a technology that invariably or by default destroys the civilization that invents it.

Iv2 years ago

If you know that image generators are not it, then why talk about it here? Do you get this kind of angst at every technological increment?

Apart from nuclear scientists, I don't know a field where participants are as conscious of the risks as in AI research.

krisoft2 years ago

> Apart from nuclear scientists, I don't know a field where participants are as conscious of the risks as in AI research.

Great. Now, some of these researchers perceived some risk with this technology. Not human-extinction-level risk, but risks. So they attempted to control the technology. To be specific: OpenAI was worried about deepfakes, so they engineered guard rails into their implementation. OpenAI was worried about misinformation, so they did not release the bigger GPT models.

Note: I’m not arguing either way whether OpenAI was right, or honest about their motivations; just observing that they expressed this opinion and acted on it to guard against the risk.

Got this so far? Keep this in mind because I’m going to use this information to answer your question:

> If you know that image generators are not it, then why talk about it here?

Because it is a technology which was deemed risky by some practitioners, and they attempted to control it, and those attempts to control the spread of the technology failed. This does not bode well for our ability to restrain ourselves from picking up a real black ball, if we ever come across one. And that is why it is worth talking about black balls in this context.

Note it is unlikely that a black ball event will completely blindside us. It is unlikely that someone develops a clone of Pac-Man with improved graphics and boom, that alone leads to the inevitable death of humanity. It is much more likely that when new and dangerous tech appears on our horizon, there will be people talking about the potential dangers. What remains to be answered: what can we do then? This experience has shown us that if we ever encounter a black ball technology, steps like those taken by OpenAI don't seem to be enough.

This is why it is worth talking about black ball technologies here. I hope this answers your question?

dkjaudyeqooe2 years ago

Social media is being used to undermine societies globally, arguably it has a fair chance of destroying humanity.

So I think that horse may have bolted already.

astrange2 years ago

Who says a computer would be able to invent a virus by thinking about it really hard?

dkjaudyeqooe2 years ago

Arthouse cinema has been doing that and more for decades and we're still here.

forty2 years ago

Exactly, and there are many pros for humanity to this too: people will be able to make funny pictures and things like that, so it's not like it's a bad deal.

sinenomine2 years ago

What if the black ball was a red herring all along, and the usual suspect tech-CEOs' hands rushing to control said crystal ball are the real hazard?

manquer2 years ago

Wouldn’t nuclear weapons or even plastics be a black ball already?

Humanity is not homogeneous; we will always react to new inventions or tools differently - many will use them positively, some won't. Short of weapons of mass destruction, I am not sure anything else will destroy civilization itself.

ALittleLight2 years ago

No, the black ball is a technology that, once invented, humanity cannot survive. Nuclear weapons have been invented and humanity is surviving. Same with plastics.

A black ball would be like - suppose nuclear weapons ignited the atmosphere. We test the first nuke, it ignites the atmosphere, a global fire storm consumes all breathable oxygen, kills all plants and everyone on the surface and everything else suffocates shortly after. Plastics aren't even close to this level of harm.

tablespoon2 years ago

>> Wouldn’t nuclear weapons or even plastics be a black ball already?

> No, the black ball is a technology that, once invented, humanity cannot survive. Nuclear weapons have been invented and humanity is surviving. Same with plastics.

I don't think that definition is a good one. Technological civilization [1] has survived nuclear weapons for ~80 years, but there's no guarantee it will survive them for another 80 years, let alone forever. It seems like these "black balls" should be thought of like time bombs; there are at least two variables: how much destruction it will cause when it goes off AND the delay time before that happens. We shouldn't confuse a dangerous technology with a long delay time for a safe technology. My intuition tells me that there will probably be nuclear war at some point over the next 1,000+ years.

[1] I don't think nuclear weapons can make humanity extinct, so long as there are still little poorly-connected subsistence communities in remote areas. However, if The Market manages to extend its tentacles into every human community, we're probably fucked.

michaelt2 years ago

I guess people are getting confused because in terms of risk-of-destroying-humanity, nuclear weapons seem higher risk than DALL-E.

neuronic2 years ago

> Same with plastics

Plastics are playing the long game. They have to turn into micro- and nanoplastics first and may then enact undesired, unforeseen biological functions, just like BPA [1].

Not even talking about weaponizing this stuff...

[1] https://pubmed.ncbi.nlm.nih.gov/21605673/

ALittleLight2 years ago

It seems implausible to me that plastics are going to kill all humanity. If the paper you linked makes that case, then I will read it, but I didn't get that from skimming the abstract.

yreg2 years ago

>Nuclear weapons have been invented and humanity is surviving.

Isn't the point of the discussion started by PoignardAzur about how we deal with such technology after it is pulled out of the bag?

If you define black ball technology as fundamentally uncontainable, then there is no point in talking about our practices of restricting access to new tech.

krono2 years ago

Either we equalise chaos or we reduce chaos, there exist no other options for entropy incarnate.

colordrops2 years ago

Yet another "open" model that isn't open. We shall see if they actually do release to the public. We keep seeing promises from various orgs but it never pans out.

TulliusCicero2 years ago

Their plan seems less hand-wavy; they're being explicit with "first we release it like this, then like that, then freely to everyone".

You're right that they could always change their minds and that would suck, but so far they seem to be up front.

runnerup2 years ago

Zero of their flagship models have been released AFAIK. E.g. the pre-DALL-E model, CLIP, still has not had the weights for its ViT-H/16 variant released, from a paper 19 months ago.

This was before DALL-E and way before DALL-E 2.

Not that I think they need to. It’s just weird people defend them as being open just because they release crippled versions of their model.

TulliusCicero2 years ago

They've said that they intend to release the model soon-ish for anyone to run on their own machine, with no censorship*. I don't think the other major players, like OpenAI, have committed to doing the same.

* They're actually working on a filter right now, but IIRC it's an optional one, for when you don't want to accidentally generate NSFW output.

bloppe2 years ago

OpenAI should probably rebrand, lol - the "open" part is basically parody at this point.

colordrops2 years ago

Agreed, and they are influencing others, like the company in this article, which is following their model of "opening" things.

dang2 years ago

Recent and related:

Stable Diffusion launch announcement - https://news.ycombinator.com/item?id=32414811 - Aug 2022 (37 comments)

axg112 years ago

I’m excited for the coming race to improve and miniaturise this tech. Apple has a great track record of making ML models light enough to run locally. There will come a day when photorealistic image generation can run on an iPhone.

andrewacove2 years ago

Maybe this is their long term plan for getting rid of the camera bump.

rasz2 years ago

3 days from launch to getting your Twitter account suspended.

https://twitter.com/DiffusionPics/

ShamelessC2 years ago

Context?

humanistbot2 years ago

Can someone tell me how this compares to the guide and repo shared a few days ago on HN: https://news.ycombinator.com/item?id=32384646

Voloskaya2 years ago

This version is a bit more optimized, and better packaged. Also the model has been trained longer, so when the weights become publicly available the resulting quality should be much higher.

Geee2 years ago

There's also Disco Diffusion: https://www.reddit.com/r/DiscoDiffusion/

Not sure how they compare. DD seems to be quite popular. I'm currently setting up DD locally.

glenneroo2 years ago

I've been running DD for a few months now. I tend to just edit the Python script, or use e.g. entmike's fork, which can read config files to make changes to the 50+ parameters (basically anything is better than having to use Jupyter Notebooks IMO). If you don't have a GPU with 6+ GB of VRAM, you can often get a decent enough GPU for free from Google Colab. For running locally, I can also highly recommend Visions of Chaos, which includes multiple versions/forks of Disco Diffusion, as well as a ton of other latent diffusion scripts, not to mention many many many other generation features such as fractals and even music. They also recently added the ability to train your own diffusion models, which I've been doing the last few days using thousands of my own photographs. It also has a pretty nice GUI and the dev is extremely responsive on Discord. Also, after you do the setup for VoC, it handles all the Python venv setup stuff otherwise necessary with local DD installs. In any case, check out the DD Discord and/or VoC Discord for lots of info, tips, help, examples, and support.

Geee2 years ago

Thanks for the info. Is it possible to do something like transfer learning on top of existing models, or do you train your own models from scratch? I'll check out that Visions of Chaos thing. I'm just beginning my journey into this generative art stuff and basically just trying to get this running right now.

thorum2 years ago

If you want to see more examples of what this AI is capable of, check out the subreddit:

https://reddit.com/r/stablediffusion

humanistbot2 years ago

If anyone from Stability is reading, the confirmation e-mail to sign up is sending a broken link:

"We couldn't process your request at this time. Please try again later. If you are seeing this message repeatedly, please contact Support with the following information:

ip: XXXX

date: Mon Aug 15 2022 XX:XX:XX GMT-0700 (Pacific Daylight Time)

url: https://stability.us18.list-manage.com/subscribe/confirm"

wccrawford2 years ago

It worked for me just now, so maybe it was temporary, or they already fixed it?

notrealyme1232 years ago

I forwarded this thread to a member of the project.

hifikuno2 years ago

I also had this response.

ruuda2 years ago

After the first paragraph, the site shows a notification in German that I need to enable JavaScript to use the site. But after that is the full article, including images, which would be almost perfectly readable - except it's at 5% opacity (or maybe the JavaScript popup is overlaid at 95% opacity), which makes it impossible to read. :'(

belltaco2 years ago

The article says it needs 5.1 GB of graphics RAM.

Does anyone know how much data it downloads and how much disk storage it needs?

_blop2 years ago

The v1.3 model weighs in at 4.3 GB. There's an additional download of 1.6 GB of other models due to the usage of Hugging Face's transformers (only once, on startup), and the conda env takes another 6 GB due to PyTorch and CUDA - so figure roughly 12 GB of disk in total.

Larger images will require (much) more than 5.1 GB. In my case, a target resolution of 768x384 (landscape) with a batch size of 1 will max out my 12 GB card, an RTX 3080 Ti.

mdorazio2 years ago

I think this is a good time to ask: is anyone still working on parallelizing machine-learning compute? For at-home computation like this, it seems like it would be a lot better to let people stack a few cheaper GPUs rather than having to pony up thousands of dollars for ML-oriented beast cards to be able to do things like generate large images.
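
The crudest form of this already works in stock PyTorch: put different halves of the network on different cards and move the activations between them. A toy sketch (module names and layer sizes are made up for illustration):

    import torch
    import torch.nn as nn

    class TwoGPUNet(nn.Module):
        """Toy model parallelism: half the layers live on each of two cards."""
        def __init__(self):
            super().__init__()
            self.front = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
            self.back = nn.Linear(4096, 4096).to("cuda:1")

        def forward(self, x):
            x = self.front(x.to("cuda:0"))
            return self.back(x.to("cuda:1"))  # hop the activations across cards

    net = TwoGPUNet()
    out = net(torch.randn(1, 4096))  # each card only ever holds half the weights

Libraries like DeepSpeed and FairScale automate this kind of sharding, though the inter-card hops usually make it slower than one big card.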

andybak2 years ago

AI upscaling will solve everything ;)

I've generated some remarkably good-looking print quality images by upscaling 512x512 sources

luismmolina2 years ago

If you read directly from the site, the requirement for the graphics card is 10 GB of VRAM as a minimum. Because it runs locally, you don't need to download anything apart from the initial model; the same applies to the disk space.

kgc2 years ago

Does this work on Apple silicon processors? They have plenty of RAM accessible to the GPU.

sroussey2 years ago

The article says it will, but that it does not use the GPU, unfortunately.
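
For what it's worth, PyTorch 1.12 added an Apple-GPU backend ("mps"), so this is mostly a question of the project adopting it. A minimal availability check:

    import torch

    # PyTorch 1.12+ can target the Apple GPU via the Metal ("mps") backend
    if torch.backends.mps.is_available():
        x = torch.randn(1, 3, 64, 64, device="mps")  # allocated in unified memory
        print((x * 2).device)  # mps:0
    else:
        print("no MPS backend; falling back to CPU")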

9999000009992 years ago

Has anyone made a pixel art generator that can create animation sprites?

_w1kke_2 years ago

@KaliYuga did - she got hired by StabilityAI just a few days ago. Here is a link to the Pixel Art Diffusion notebook:

https://colab.research.google.com/github/KaliYuga-ai/Pixel-A...

0xdead1eaf2 years ago

Check out NUWA-Infinity[0][1], submitted to arxiv jul 20, 2022. It captures artistic style very well (though can't speak to the quality of the pixel art it would generate) and can do image to video.

[0] https://nuwa-infinity.microsoft.com/#/ [1] https://arxiv.org/abs/2207.09814

gxqoz2 years ago

You can use DALL-E and other models to make pixel art (by appending "as pixel art" to the prompt), although it can be both overkill and hard to get consistent results that you'd put into animation. I'm guessing that starting from more of a video model and then converting to pixel art could be better, although it's also non-trivial to turn "realistic" video into convincing animation.

9999000009992 years ago

I'd pay good money for a specialized machine-learning model that can take a pixel art character and then generate all the animated sprites for it.

I actually tried to get DALL-E to do this, and it made like three good sprites; the rest were just broken. But it was so strange, because you could see it was still organized as a sprite sheet - it's just that the sprites were useless.

I think the practical applications of this technology will be hyper-specialized models for specific purposes.

panabee2 years ago

hi there. we're working on this, been working on a model for months now. hope to release something soon. how best to get in touch with you?

chucky1232 years ago

That's a really good idea.

tckerr2 years ago

This is exactly the type of application I am interested in as well. As a hobby game dev with only mediocre pixel art skills, having a generator to finish the busy work would be an absolute lifesaver. I'm also interested in using it for fleshing out artistic vision through generating variations of an initial concept.

Hopefully we aren't more than a few years away from something practical like this.

kragen2 years ago

This article says both that it's "open-source" and that it's "available for [only] research purposes upon request". These can't both be correct. Where is the error?

zone4112 years ago

They jumped the gun with this announcement. I get wanting to share the excitement of AI doing something cool with the world (I've been there) but they should've waited until it's accessible to the public.

Diris2 years ago

The code is open source, the models are not.

upupandup2 years ago

My friend wants to know when she can use this to generate porn. Are we close?

GistNoesis2 years ago

I did a Show HN about this https://news.ycombinator.com/item?id=31900095 a month ago, to experiment with the technology. The training was done in a weekend, with 2 old GPUs (1080 Ti).

Currently waiting to scale up to improve quality, mainly for economic reasons; not quite sure I could recoup the training costs yet, even more so if I go with cloud training.

Nvidia will release the 4090 in September, and Ethereum may do "the merge", which will make GPUs useless for mining, so prices could drop and I could update my home cluster with affordable 3090s. (But electricity prices are also up.)

Also, there are new algorithms every month, like Stable Diffusion, that would obsolete your previous training.

The video generation cost is probably still too high compared to just paying a cam girl in a low-wage country. But it will probably go down soon.

This is also sensitive data, plagued with copyright issues, so it's quite troublesome to legally share training datasets to split costs.

It also has its own challenges with respect to custom dataset creation with text descriptions, so it's probably a better idea to adapt the algorithm to the currently available data to keep the costs low.

Finally, once someone releases a model, within a month there will be at least 3 clones.

There is also the problem to find an adult friendly payment processor.

And the multitude of potential legal issues.

But it's probably inevitable.

Voloskaya2 years ago

The data used to train those models is specifically filtered to remove sexual content, so the model can't generate porn because it has no idea what it looks like, beyond a few samples that made it past the filter.

So no, your "friend" can't use it for that.

upupandup2 years ago

Why is it that sexual content is so frowned upon in this space? If it were a content publishing platform, I would understand that advertisers don't want it, but this is literally dictating to people what is bad and good. I just don't understand this Puritan outrage over text-to-image porn generation.

Voloskaya2 years ago

Because you can't control what the model is going to output in response to a query. The model is trained to respond in a way that is aligned but there is no guarantee.

Since we certainly don't want to show generated images of porn or violence to someone who didn't specifically ask for that, the easiest way to ensure that's not going to happen is to just not train on that kind of data in the first place. The worst that can happen with a model trained on "safe" images is that the image is irrelevant or makes no sense, meaning you could deploy systems with no human curator on the other end and nothing bad is going to happen. You lose that ability as soon as you integrate porn.

Also with techniques like in-painting, the potential for misuse of a model trained on porn/violence would be pretty terrifying.

So the benefits of training on porn seem very small compared to the inconvenience. I don't think it has anything to do with puritanism; it's just that if I am the one putting dollars and time into training such a model, I am certainly not going to take on the added complexity and implications of dealing with porn just to let a few people realize their fetishes, at the risk of my entire model being undeployable because it's outputting too much porn or violence.

spywaregorilla2 years ago

Because it's a lot more annoying for your innocuous content to be rendered as porn when the AI happens to interpret it that way than it is for you to be unable to render your pervy desires intentionally.

A porn model should really be its own thing.

gs172 years ago

I imagine a large part of it is that it could generate photorealistic child porn (also "deepfake" porn of real people) and there's not really a good way to prevent it entirely while also allowing generalized sexual content AFAIK. There's probably some debate on how big a problem this really is, but no one wants their system to be the one with news stories about how it's popular with pedophiles. It was the issue they had with AI Dungeon.

sbierwagen2 years ago

Because if the model generates anything problematic the New York Times will ruin your life.

blowski2 years ago

I'd guess that, for general purpose companies, it's an area full of legal ambiguity and potential for media outrage, so just not worth the risk. However, given the evidence of human history, it's certain that someone with an appetite for exploiting this niche will develop exactly that kind of tool.

djbebs2 years ago

Because the law makes it very difficult to provide such services in the spirit of preventing the exploitation of minors.

Make no mistake, this is indirectly a legal hurdle.

_blop2 years ago

This article hints that Stable Diffusion can at least generate normal looking nude women: https://techcrunch.com/2022/08/12/a-startup-wants-to-democra...

There are attempts to gather porn images and train or fine-tune existing networks on it, here's a recent attempt by an art student mentioned in the article above (NSFW!!): https://www.vice.com/en/article/m7ggqq/this-furry-porn-ai-ge...

isoprophlex2 years ago

Jesus H Christ those are some seriously cursed hindquarters

SV_BubbleTime2 years ago

I was a better person this morning for not knowing that furries had the term "hindquarters". I mean, that's fine for other people, you do you, but for me, I was better this morning.

isoprophlex2 years ago

You'll have to train it on your own data. As others have mentioned, the training data for DALL-E, Stable Diffusion, etc. has been cleaned prior to training.

However, if it is possible to restart the training process from the weights of a non-sexually-aware model, this fine-tuning might not take all that long!
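
That restart-from-weights pattern is ordinary fine-tuning: initialize from the released checkpoint and keep training on new data with a small learning rate. A toy sketch of the idea, using a small torchvision classifier as a stand-in for a diffusion model:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18, ResNet18_Weights

    # Start from released weights instead of random init, then keep training
    model = resnet18(weights=ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 10)              # new head for the new task
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # small LR: nudge, don't erase
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(4, 3, 224, 224)      # stand-in batch; substitute your own dataset
    y = torch.randint(0, 10, (4,))
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()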

Mountain_Skies2 years ago

Is there a way to make money selling the model to people who want to use it to make porn? If so, it will trickle down relatively quickly. If not, it'll still eventually trickle down but will take longer.

TigeriusKirk2 years ago

Surely you could sell custom prompt runs for porn for a great deal more than OpenAI is charging for generalist custom prompts.

Making money at it should be easy, and places like PornHub wouldn't care about any outrage. The real challenge would be limiting criminal and civil liability, at least to my not-in-the-business thinking.

upupandup2 years ago

This is already a thing in the kpop fake porn industry. I don't know how Patreon/OnlyFans are allowing this to happen; I mean, it's a travesty that highly suggestive lyrics and stripper dance moves from scantily clad kpop idols are being used for sexual gratification.

astrange2 years ago

Not their fault their country banned actual porn.

TaylorAlexander2 years ago

Once training can be done on a beefy home rig folks will be all over it.

planetsprite2 years ago

You'll need to train your own model, though I'm sure if someone manages to crowdsource it there's a very obvious economic incentive.

TulliusCicero2 years ago

Props for just coming out and saying it

GaggiX2 years ago

It is possible to generate nudes, but not pornographic ones.

unethical_ban2 years ago

Wait, so the closed source generator known as DALL-E is owned by a company called OpenAI?

alephxyz2 years ago

It's a bit of a dead horse at this point but yes. See the previous discussion: https://news.ycombinator.com/item?id=28416997

fariszr2 years ago

As Elon said, "OpenAI should be more Open IMO".

glenneroo2 years ago

Curiously it seemed to lock down even more after they "partnered" with Microsoft.

stuckinhell2 years ago

This is pretty amazing. Anyone have any tips on building a PC for machine learning with a RAID device?

jessfyi2 years ago

Hasn't been updated since 2020, but Tim Dettmers' guide [0] is pretty much the gold standard for optimizing what to buy for which area of DL/ML you're interested in. The pricing has changed thanks to GPU prices coming back down to earth a bit, but what to look out for and how much RAM you need for which task hasn't. Check out the "TL;DR advice" section, then scroll back up for detailed info on why, and common misconceptions. For tips on a RAID/NAS setup alongside it, just head to the datahoarders subreddit and their FAQ.

[0] https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...

cellis2 years ago

Look into building an Ethereum mining machine... it can double as an ML workstation. That's what I did.

hedora2 years ago

If you just want to try it out, consider using a remote CAD workstation from a company like Paperspace.

(No affiliation.)

dustingetz2 years ago

Doesn't build on my Mac Studio due to a dependency whose Mac version is two major versions behind.

fswd2 years ago

Unfortunately it's a commercial license and the model isn't available to the public so it isn't very useful.

andybak2 years ago

It's going to be MIT from what I have heard. On phone atm so can't provide sources.

andybak2 years ago

That’s just a restricted interim release. The proper public release isn’t ready yet. No timescale but sounds like days/weeks rather than months/years.

AgentME2 years ago

Isn't that just temporary until the public release? Or is the article misleading by calling it open source?

th1s1sit2 years ago

Would be a blast if the cloud were upended by RISC and GPUs powerful enough to crunch "big data" at home.

Would love to see FAANG and SV crash and burn, margins chipped away to nothing.

keepquestioning2 years ago

We are heading into uncharted territory :(