
AMD funded a drop-in CUDA implementation built on ROCm: It's now open-source

1045 points | 6 months ago | phoronix.com
Keyframe6 months ago

This release, however, is the result of AMD having stopped funding it. Per the FAQ: "After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs. One of the terms of my contract with AMD was that if AMD did not find it fit for further development, I could release it. Which brings us to today." (from https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq)

So, the same mistake Intel made before (per the same FAQ, Intel funded ZLUDA's original CUDA-on-Intel incarnation and likewise decided not to pursue it).

tgsovlerkhgsel6 months ago

How is this not priority #1 for them, with NVIDIA stock shooting to the moon because everyone does machine learning using CUDA-centric tools?

If AMD could get 90% of the CUDA ML stuff to seamlessly run on AMD hardware, and could provide hardware at a competitive cost-per-performance (which I assume they probably could since NVIDIA must have an insane profit margin on their GPUs), wouldn't that be the opportunity to eat NVIDIA's lunch?

llm_trw6 months ago

Never underestimate AMD's ability to fail.

Ryzen was a surprise to everyone not because it was good, but because they didn't fuck it up within two generations.

AMD cards have more raw compute than Nvidia's; the hardware is better than Nvidia's, yet the software is so bad that I gave up on using it and switched to Nvidia. Two weeks of debugging driver errors vs. 30 minutes of automated updates.

tormeh6 months ago

It's rather shocking that with RADV, Valve (mostly) has written a better RDNA2 driver than AMD has managed for their own cards. Besides the embarrassment, AMD is leaving tons of performance and therefore market share on the table. You have to wonder wtf is going on over at AMD.

dralley6 months ago

RADV was started by David Airlie of Red Hat, although Valve has been dedicating some very significant resources to it over the past few years.

pheatherlite6 months ago

The only reason our lab bought 20k worth of Nvidia GPU cards rather than AMD was CUDA being the industry standard (might as well be). It's kind of mind-boggling how much business AMD must be losing over this.

Modified30196 months ago

That was a good decision. The number of lamenting engineers I’ve seen over the years who’ve been given the task of trying to get more affordable AMD cards to work with enterprise functionality is nontrivial. AMD borders on hostility with its silence; even if you want to throw millions at them, it’s insane.

At least Nvidia, which I fucking hate, will happily hold out their hand for cash even from individuals.

So now we’re in a hilarious situation where people from hobbyists to enterprise devs are hoping for intel to save the day.

Rafuino6 months ago

So, your lab bought ~1 GPU?

polygamous_bat6 months ago

Hey stop shaming the GPU poor, not everyone is Mark Zuckerberg ordering $8bn. of GPUs.

paulmd6 months ago

or a rack of 3090s/4090s or quadros

(the "no datacenter" clause obviously excludes workstations, and the terms of this license cannot be applied to the open kernel driver since it's GPL'd)

exikyut6 months ago

Hey, I should go play with those workstation/server configurators now they'll have been updated to supply A100Xs and such...

up2isomorphism5 months ago

Your “lab” does not sound like a lab in the classical sense.

HarHarVeryFunny6 months ago

IMO the trouble is that CUDA is too low level to allow emulation without a major loss of performance, and even if there was a choice of CUDA-compatible vendors, people are ultimately going to vote with their wallets. It's not enough to be compatible - you need to be compatible while providing the same or better performance (else why not just use NVIDIA).

A better level to target compatibility would be at the framework level such as PyTorch, where the building blocks of neural networks (convolution, multi-head attention, etc, etc) are high level and abstract enough to allow flexibility in mapping them onto AMD hardware without compromising performance.
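For what it's worth, this is roughly the route AMD's ROCm builds of PyTorch already take: the HIP backend is exposed through the existing torch.cuda API, so framework-level code runs unchanged. A quick sanity check, assuming a ROCm build of PyTorch is installed (torch.version.hip is only set on those builds):

  python3 -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"
  # ROCm build: prints True plus a HIP version string; CUDA build: torch.version.hip is None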

However, these frameworks are forever changing, and playing continual catch-up there still wouldn't be a great place to be, especially without a large staff dedicated to the effort (writing hand-optimized kernels), which AMD don't seem to be able or willing to muster.

So, finally, perhaps the strategically best place for AMD to invest would be in compilers and software tools to allow kernels to be written in a high level language. Becoming a first class Mojo target wouldn't be a bad place to start, assuming they are not already in partnership.

hnfong5 months ago

> However, these frameworks are forever changing, and playing continual catch-up there still wouldn't be a great place to be, especially without a large staff dedicated to the effort (writing hand-optimized kernels), which AMD don't seem to be able or willing to muster.

The situation in reality is actually quite bad.

Given that I have an M2 Max and no Nvidia cards, I've tried enough PyTorch-based ML libraries that at some point I basically expect them to flat-out show an error saying CUDA 10.x+ is required once the dependencies are installed (e.g. one of them being the bitsandbytes library -- in fairness, there's apparently some effort to port the code to other platforms as well).

As of today, the whole field is moving so fast that it's simply not worth it for a solo dev or even a small team to attempt getting a non-CUDA stack up and running, especially with the other major GPU vendors not hiring (or not able to hire?) people to port the hand-optimized CUDA kernels.

Hopefully the situation will change after these couple of years of frenzy, but for the time being I don't see any viable way to avoid using a CUDA stack if one is serious about getting ML stuff done.

test65545 months ago

Nvidia controls CUDA, the software spec; Nvidia also controls the hardware CUDA runs on. The industry adopts CUDA standards and uses the latest features.

AMD cannot keep up with arbitrarily changing hardware and software while trying to please developers that want what was just released. They would always be a generation behind at tremendous expense.

make36 months ago

It's a common misconception that deep learning stuff is built in CUDA. It's actually built on cuDNN kernels that don't use CUDA but are GPU assembly written by hand by PhDs. I'm really not convinced that this project here would be able to be used for this. The ROCm kernels that are analogous to cuDNN, though, yes.

abbra5 months ago

This project relies on ROCm for all its cuDNN magic.

VoxPelli6 months ago

Sounds like he had a good contract, would be great to read more about that, hopefully more devs could include the same phrasing!

jacoblambda6 months ago

I mean it could also be that there was no business case for it as long as it remained closed source work.

If the now very clearly well-functioning implementation continues to perform as well as it does, the community may be able to keep it funded and functioning.

And the other side of this is that with renewed AMD interest/support for the rocm/HIP project, it might be just good enough as a stopgap step to push projects towards rocm/HIP adoption. (included below is another blurb from the readme).

> I am a developer writing CUDA code, does this project help me port my code to ROCm/HIP?

> Currently no, this project is strictly for end users. However this project could be used for a much more gradual porting from CUDA to HIP than anything else. You could start with an unmodified application running on ZLUDA, then have ZLUDA expose the underlying HIP objects (streams, modules, etc.), allowing to rewrite GPU kernels one at a time. Or you could have a mixed CUDA-HIP application where only the most performance sensitive GPU kernels are written in the native AMD language.
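(For reference, the gradual porting path described above also exists outside ZLUDA: ROCm ships hipify-perl and hipify-clang, which rewrite CUDA API calls into HIP source that hipcc can compile for AMD GPUs. A minimal sketch, with vector_add.cu standing in for one of your own kernel files:)

  hipify-perl vector_add.cu > vector_add.hip.cpp   # cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, etc.
  hipcc vector_add.hip.cpp -o vector_add           # build against the local ROCm install
  ./vector_add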

RachelF6 months ago

Yeah, AMD look like idiots for doing this.

Either they are very stupid, or open sourcing the library stops NVidia from suing them in a repeat of the Oracle/Google lawsuit over Java APIs?

I'm not sure what the reason is?

yen2236 months ago

I think AMD is focusing on the "inference" side of ML, which doesn't really require CUDA or similar, and which they believe is a much larger market.

Time will tell if that strategy is going to pan out. Ceding the ML "training" market entirely to Nvidia is certainly a bold move

bitbang6 months ago

That was my immediate thought: the software is made publicly available to help add value to their product offering, but they keep plausible deniability in court and don't have to bear the burden of potential lawsuits or support.

theropost5 months ago

Perhaps AMD realizes that if they released something like this in a formal capacity, they might face a barrage of lawsuits and IP claims from Nvidia. If this is completed through an open-source project, however, which AMD is not directly funding, Nvidia would not have many legal avenues to attack. Just an opinion.

pk-protect-ai6 months ago

> After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs

Who was responsible at AMD for this project, and why is he still not fired???? How brain-dead does someone have to be to reject such a major market share????

nikanj6 months ago

This should be the top comment here, people are getting their hopes up for nothing

enonimal6 months ago

From the ARCHITECTURE.md:

> Those pointers point to undocumented functions forming the CUDA Dark API. It's impossible to tell how many of them exist, but debugging experience suggests there are tens of function pointers across tens of tables. A typical application will use one or two of the most common. Due to their undocumented nature they are exclusively used by the Runtime API and NVIDIA libraries (and by CUDA applications in turn). We don't have names of those functions nor names or types of the arguments. This makes implementing them time-consuming. Dark API functions are reverse-engineered and implemented by ZLUDA on a case-by-case basis once we observe an application making use of them.

gdiamos6 months ago

These were a huge pain in the ass when I tried this 20 years ago on Ocelot.

Eventually one of the NVIDIA engineers just asked me to join and I did. :-P

jherico6 months ago

Do the job you want, not the job you have, eh?

leeoniya6 months ago

fertile soil for Alyssa and Asahi Lina :)

https://rosenzweig.io/

https://vt.social/@lina

smcl6 months ago

I know that Lina doesn't like a lot of the attention HN sends her way so it may be better if you don't link her socials here.

leeoniya6 months ago

pretty sure she's a vt.social admin, so she can always do what jwz does with HN referer headers :D

given how omnipresent she is with her live streaming, it's a bit like South Park's Worldwide Privacy Tour: https://www.youtube.com/watch?v=2N8_5LDkZwY

bigdict6 months ago

Sounds ridiculous, why have a public presence on a social network then?

dralley6 months ago

IIRC, it's not that she doesn't like the attention to her work, it's that she doesn't like how quickly conversations get derailed by things that have nothing to do with the work.

smcl5 months ago

She's gotten a fair amount of unwanted attention from HN specifically, guys trying to dox her and stirring rumours about her. Like just mean-spirited stuff, and if you or I had experienced this then we'd likely feel the same animosity towards HN.

throw109206 months ago

I think it would be better to avoid mention of them (or the Asahi project) on HN entirely, for that matter.

If they don't want HN to criticize them, then they should expect to not get the free publicity that HN offers. Seems fair enough.

Also, between accusing HN of "supporting trans genocide" (which is some mix between "impossible" and "false"), and poisoning links with HN referrer URLs, they don't seem like very good people themselves.

PoignardAzur6 months ago

Having an ARCHITECTURE.md file at all is extremely promising, but theirs seems pretty polished too!

Cu3PO426 months ago

I'm really rooting for AMD to break the CUDA monopoly. To this end, I genuinely don't know whether a translation layer is a good thing or not. On the upside it makes the hardware much more viable instantly and will boost adoption, on the downside you run the risk that devs will never support ROCm, because you can just use the translation layer.

I think this is essentially the same situation as Proton+DXVK for Linux gaming. I think that that is a net positive for Linux, but I'm less sure about this. Getting good performance out of GPU compute requires much more tuning to the concrete architecture, which I'm afraid devs just won't do for AMD GPUs through this layer, always leaving them behind their Nvidia counterparts.

However, AMD desperately needs to do something. Story time:

On the weekend I wanted to play around with Stable Diffusion. Why pay for cloud compute, when I have a powerful GPU at home, I thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer card from AMD at this time. Only very few AMD GPUs are supported by ROCm at this time, but mine is, thankfully.

So, how hard could it possibly be to get Stable Diffusion running on my GPU? Hard. I don't think my problems were actually caused by AMD: I had ROCm installed and my card recognized by rocminfo in a matter of minutes. But the whole ML world is so focused on Nvidia that it took me ages to get a working installation of pytorch and friends. The InvokeAI installer, for example, asks if you want to use CUDA or ROCm, but then always installs the CUDA variant whatever you answer. Ultimately, I did get a model to load, but the software crashed my graphical session before generating a single image.

The whole experience left me frustrated and wanting to buy an Nvidia GPU again...

sophrocyne6 months ago

Hey there -

I'm a maintainer (and CEO) of Invoke.

It's something we're monitoring as well.

ROCm has been challenging to work with - we're actively talking to AMD to keep apprised of ways we can mitigate some of the more troublesome experiences that users have with getting Invoke running on AMD (and hoping to expand official support to Windows AMD)

The problem is that a lot of the solutions proposed involve significant/unsustainable dev effort (i.e., supporting an entirely different inference paradigm), rather than "drop in" for the existing Torch/diffusers pipelines.

While I don't know enough about your setup to offer immediate solutions, if you join the Discord, I'm sure folks would be happy to try walking through some manual troubleshooting/experimentation to get you up and running - discord.gg/invoke-ai

Cu3PO426 months ago

Hi! I really appreciate you taking the time to reply.

I have since gotten Invoke to run and was already able to get some results I'm really quite happy with, so thank you for your time and commitment working on Invoke!

I understand that ROCm is still challenging, but it seems my problems were less related to ROCm or Invoke itself and more to Python dependency management. It really boiled down to getting the correct (ROCm) versions of packages installed. Installing Invoke from PyPi always removed my Torch and installed CUDA-enabled Torch (as well as cuBLAS, cuDNN, ...). Once I had the correct versions of packages, everything just worked.

To me, your pyproject.toml looks perfectly sane, so I wasn't sure how to go about fixing the problem.

What ended up working for me was to use one of AMD's ROCm OCI base images, manually installing all dependencies, foregoing a virtual environment, cloning your repo (and building the frontend), and then installing from there.

The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.) Trying to build the Dockerfile from your repo, I also ended up with a CUDA-enabled Torch. It did install the correct one first, but in a later step removed the ROCm-enabled Torch to switch it for the CUDA-enabled one.

I hope you'll consider investing some resources into publishing newer, working builds of your Docker image.

sophrocyne6 months ago

You bet - Thanks for the feedback. Glad you're enjoying Invoke!

We do have Docker packages hosted on GH, but I'll be the first to admit that we haven't prioritized ROCm. Contributors who have AMDs are a scant few, but maybe we'll find some help in wrangling that problem now that we know there's an avenue to do so.

Cu3PO425 months ago

As promised in my other comment, I did send a PR! https://github.com/invoke-ai/InvokeAI/pull/5714

Cu3PO425 months ago

I hate maintaining my own build instructions as much as the next guy, so I'll try to get your Dockerfile working for me and then send a PR.

doctorpangloss6 months ago

> Installing Invoke from PyPi... To me, your pyproject.toml looks perfectly sane, so I wasn't sure how to go about fixing the problem.

You can't install the PyTorch that's best for the currently running platform using a pyproject.toml with a setuptools backend, for starters. Invoke would have to author a setup.py that deals with all the issues, in a way that is compatible with build isolation.

> The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.)

Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device? (https://rocm.docs.amd.com/projects/MIVisionX/en/latest/docke...)

There is a lot of ad-hoc reinvention of tooling in this space.

Cu3PO425 months ago

> You can't install the PyTorch that's best for the currently running platform using a pyproject.toml with a setuptools backend, for starters.

I see. I do know Python, but my knowledge of setuptools, pip, poetry, and whatever else have you is limited. To get my working setup, I specified an --index-url for my Torch installation. Does that not work while using their current setup?
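(For anyone hitting the same wall: what I ended up doing was roughly the following, pointing pip at PyTorch's ROCm wheel index instead of the default one; the exact rocm tag depends on which ROCm release and PyTorch version you're on:)

  pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7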

> Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device?

Well, they did work for me. Though I think only passing /dev/{dri,kfd} and setting seccomp=unconfined was sufficient. So for my particular case, getting a working image was the only missing step.
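Concretely, something along these lines (the image tag is just an example; rocm/pytorch publishes several):

  docker run -it --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined rocm/pytorch:latest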

From a more general POV: it might not make sense to invest in a ROCm OCI image from a short-term business perspective, but in the long term, and based purely on principle, I do think the ecosystem should strive to be less reliant on CUDA and only CUDA.

westurner5 months ago

> AMD's ROCm OCI base images,

ROCm docs > "Install ROCm Docker containers" > Base Image: https://rocm.docs.amd.com/projects/install-on-linux/en/lates... links to ROCm/ROCm-docker: https://github.com/ROCm/ROCm-docker which is the source of docker.io/rocm/rocm-terminal: https://hub.docker.com/r/rocm/rocm-terminal :

  docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal
ROCm docs > "Docker image support matrix": https://rocm.docs.amd.com/projects/install-on-linux/en/lates...

ROCm/ROCm-docker//dev/Dockerfile-centos-7-complete: https://github.com/ROCm/ROCm-docker/blob/master/dev/Dockerfi...

Bazzite is a ublue (Universal Blue) fork of the Fedora Kinoite (KDE) or Fedora Silverblue (Gnome) rpm-ostree Linux distributions; ublue-os/bazzite//Containerfile : https://github.com/ublue-os/bazzite/blob/main/Containerfile#... has, in addition to fan and power controls, automatic updates on desktop, supergfxctl, system76-scheduler, and an fsync kernel:

  rpm-ostree install rocm-hip \
        rocm-opencl \
        rocm-clinfo
But it's not `rpm-ostree install --apply-live` because it's a Containerfile.

To install a ublue-os distro, you install any of the Fedora ostree distros: {Silverblue, Kinoite, Sway Atomic, or Budgie Atomic} from e.g. a USB stick and then `rpm-ostree rebase <OCI_host_image_url>`:

  rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite:stable
  rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite-nvidia:stable
  rpm-ostree rebase ostree-image-signed:
ublue-os/config//build/ublue-os-just/40-nvidia.just defines the `ujust configure-nvidia` and `ujust toggle-nvk` commands: https://github.com/ublue-os/config/blob/main/build/ublue-os-...

There's a default `distrobox` with pytorch in ublue-os/config//build/ublue-os-just/etc-distrobox/apps.ini: https://github.com/ublue-os/config/blob/main/build/ublue-os-...

  [mlbox]
  image=nvcr.io/nvidia/pytorch:23.08-py3
  additional_packages="nano git htop"
  init_hooks="pip3 install huggingface_hub tokenizers transformers accelerate datasets wandb peft bitsandbytes fastcore fastprogress watermark torchmetrics deepspeed"
  pre-init-hooks="/init_script.sh"
  nvidia=true
  pull=true
  root=false
  replace=false
docker.io/rocm/pytorch: https://hub.docker.com/r/rocm/pytorch

pytorch/builder//manywheel/Dockerfile: https://github.com/pytorch/builder/blob/main/manywheel/Docke...

ROCm/pytorch//Dockerfile: https://github.com/ROCm/pytorch/blob/main/Dockerfile

The ublue-os (and so also bazzite) OCI host image Containerfile has Sunshine installed; which is a 4k HDR 120fps remote desktop solution for gaming.

There's a `ujust remove-sunshine` command in system_files/desktop/shared/usr/share/ublue-os/just/80-bazzite.just : https://github.com/ublue-os/bazzite/blob/main/system_files/d... and also kernel args for AMD:

  pstate-force-enable:
    rpm-ostree kargs --append-if-missing=amd_pstate=active
ublue-os/config//Containerfile: https://github.com/ublue-os/config/blob/main/Containerfile

LizardByte/Sunshine: https://github.com/LizardByte/Sunshine

moonlight-stream https://github.com/moonlight-stream

Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files

conda-forge/pytorch-cpu-feedstock > "Add ROCm variant?": https://github.com/conda-forge/pytorch-cpu-feedstock/issues/...

And Fedora supports OCI containers as host images and also podman container images with just systemd to respawn one or a pod of containers.

latchkey6 months ago

Invoke is awesome. Let me know if you guys want some MI300x to develop/test on. =) We've also got some good contacts at AMD if you need help there as well.

nocombination6 months ago

As other folks have commented, CUDA not being an open standard is a large part of the problem. That and the developers who target CUDA directly when writing Stable Diffusion algorithms—they are forcing the monopoly. Even at the cost of not being able to squeeze every ounce out of the GPU, portability greatly improves software access when people target Vulkan et al.

westurner6 months ago

> Proton+DXVK for Linux gaming

"Building the DirectX shader compiler better than Microsoft?" (2024) https://news.ycombinator.com/item?id=39324800

E.g. llama.cpp already supports hipBLAS; is there an advantage to this ROCm CUDA-compatibility layer - ZLUDA on Radeon (and not yet Intel OneAPI) - instead or in addition? https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi... https://news.ycombinator.com/item?id=38588573
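(At the time of writing, llama.cpp's hipBLAS path is a build-time option; a rough sketch per its README, assuming ROCm and hipcc are installed:)

  make LLAMA_HIPBLAS=1
  # or with CMake: cmake -B build -DLLAMA_HIPBLAS=ON && cmake --build build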

What can't WebGPU abstract away from CUDA unportability? https://news.ycombinator.com/item?id=38527552

HarHarVeryFunny5 months ago

BLAS will only get you so far. About the highest level operation it has is matmul, which you can use to build convolution (im2col, matmul, col2im), but that won't be as performant as a hand optimized cuDNN convolution kernel. Same goes for any other high level neural net building blocks - trying to build them on top of BLAS will not get you remotely close to performance of a custom kernel.

What's nice about BLAS is that there are optimized implementations for CPUs (Intel MKL) as well as NVIDIA (cuBLAS) and AMD (hipBLAS), so while it's very much limited in what it can do, you can at least write portable code around it.

westurner5 months ago

"CUDNN API supported by HIP" has a coverage table: https://rocm.docs.amd.com/projects/HIPIFY/en/amd-staging/tab...

ROCm/hipDNN wraps CuDNN on Nvidia and MiOpen on AMD; but hasn't been updated in awhile: https://github.com/ROCm/hipDNN

https://news.ycombinator.com/item?id=37808036 : conda-forge has various BLAS implementations, including MKL-optimized BLAS, and compatible NumPy and SciPy builds.

BLAS: Basic Linear Algebra Sub programs: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...

"Using CuPy on AMD GPU (experimental)" https://docs.cupy.dev/en/v13.0.0/install.html#using-cupy-on-... :

  $ sudo apt install hipblas hipsparse rocsparse rocrand rocthrust rocsolver rocfft hipcub rocprim rccl
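Roughly per the same CuPy doc, the ROCm build is then selected with environment variables at pip-install time (gfx906 below is just an example target; use whatever your GPU reports):

  $ export ROCM_HOME=/opt/rocm HCC_AMDGPU_TARGET=gfx906 CUPY_INSTALL_USE_HIP=1
  $ pip install cupy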
HarHarVeryFunny5 months ago

I guess I misunderstood you.

You were asking if this CUDA compatibility layer might hold any advantage over HIP (e.g. for use by llama.cpp)?

I think the answer is no, since HIP includes pretty full-featured support for many of the higher level CUDA-based APIs (cuDNN, cuBLAS, etc), while per the Phoronix article ZLUDA only (currently) has minimal support for them.

I wouldn't expect ZLUDA to provide any performance benefit over HIP either, since on AMD hardware HIP is just a pass-thru to MIOpen (AMD's equivalent to cuDNN), rocBLAS, etc.

Certhas6 months ago

They are focusing on HPC first. Which seems reasonable if your software stack is lacking. Look for sophisticated customers that can help build an ecosystem.

As I mentioned elsewhere, 25% of GPU compute on the Top 500 Supercomputer list is AMD. This all on the back of a card that came out only three years ago. We are very rapidly moving towards a situation where there are many, many high-performance developers that will target ROCm.

ametrau6 months ago

Is a top 500 super computer list a good way of measuring relevancy in the future?

latchkey6 months ago

No, it isn't. What is a better measure is to look at businesses like what I'm building (and others), where we take on the capex/opex risk around top end AMD products and bring them to the masses through bare metal rentals. Previously, these sorts of cards were only available to the Top 500.

llm_trw6 months ago

Yes, it is; it's how CUDA got its dominance 10 years ago. Businesses don't release their source code; supercomputers are attached to labs and universities, which have much better licenses for software, and publish papers about it.

nialv76 months ago

I am surprised that everybody seems to have forgotten the (in)famous Embrace, Extend, and Extinguish strategy.

It's time for Open Source to be on the extinguishing side for once.

formerly_proven6 months ago

> I'm really rooting for AMD to break the CUDA monopoly. To this end, I genuinely don't know whether a translation layer is a good thing or not. On the upside it makes the hardware much more viable instantly and will boost adoption, on the downside you run the risk that devs will never support ROCm, because you can just use the translation layer.

On the other hand:

> The next major ROCm release (ROCm 6.0) will not be backward [source] compatible with the ROCm 5 series.

Even worse, not even the driver is backwards-compatible:

> There are some known limitations though like currently only targeting the ROCm 5.x API and not the newly-released ROCm 6.x releases.. In turn having to stick to ROCm 5.7 series as the latest means that using the ROCm DKMS modules don't build against the Linux 6.5 kernel now shipped by Ubuntu 22.04 LTS HWE stacks, for example. Hopefully there will be enough community support to see ZLUDA ported to ROCM 6 so at least it can be maintained with current software releases.

bntyhntr6 months ago

I would love to have a native Stable Diffusion experience; my RX 580 takes 30s to generate a single image. But it does work after following https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...

I got this up and running on my windows machine in short order and I don't even know what stable diffusion is.

But again, it would be nice to have first class support to locally participate in the fun.

Cu3PO426 months ago

I have heard that DirectML was a somewhat easier story, but allegedly has worse performance (and obviously it's Windows only...). But I'm not entirely surprised that setup is somewhat easier on Windows, where bundling everything is an accepted approach.

With AMD's official 15GB(!) Docker image, I was now able to get the A1111 UI running. With SD 1.5 and 30 sample iterations, generating an image takes under 2s. I'm still struggling to get InvokeAI running.

washadjeffmad6 months ago

That has to include the model(s), no?

Also, nothing is easier on Windows. It's a wonder that anything works there, except for the power of recalcitrance.

Not dogging Windows users, but once your brain heals, it just can't go back.

Cu3PO425 months ago

It actually doesn't include the models! The image is Ubuntu with ROCm and a number of ML libraries, such as Torch, preinstalled.

> Also, nothing is easier on Windows.

As much as I, too, dislike Windows, I still have to disagree. I have encountered (proprietary) software which was much easier to get working on Windows. For example, Cisco AnyConnect with SmartCard authentication has been a nightmare for me on Linux.

whywhywhywhy6 months ago

> I'm really rooting for AMD to break the CUDA monopoly

Personally I want Nvidia to break the x86-64 monopoly, with how amazing properly spec'd Nvidia cards are to work with I can only dream of a world where Nvidia is my CPU too.

weebull6 months ago

> Personally I want Nvidia to break the x86-64 monopoly

The one supplied by two companies?

Keyframe6 months ago

Maybe he meant homogeneity, which Nvidia did try and still tries with Arm. But, on the other hand, how wild would it be for Nvidia to enter x86-64 as well? It's probably never going to happen due to licensing if nothing else, lest we forget the nForce chipset ordeal with Intel legal.

weebull5 months ago

Indeed, but I think people forget that the reason AMD have a license in the first place was because Intel's customers in the early days required a second source for its processors.

Who owns the Cyrix x86 license these days?

paulmd6 months ago

"minor spelling/terminology mistake, activate the post-o-tron"

mickael-kerjean6 months ago

How would this be a good idea? I am not very familiar with GPU programming, but the small amount I've tried was nothing but pain a few years ago on Linux; it was so bad that Torvalds publicly used the f word at a very public event. That aside, CUDA seems like a great way to lock people in even further, like AWS does with absolutely everything.

whywhywhywhy5 months ago

>CUDA seem like a great way to lock people in even further like AWS does with absolutely everything

Lock people into something that didn’t exist in a way any user could use before it existed? I get that people hate CUDA's dominance, but no one else was pushing this before CUDA, and Apple+AMD completely fumbled OpenCL.

Can’t hate on something good just because it’s successful, and I can’t be angry at the talent behind the success wanting to profit.

Qwertious5 months ago

>I am not very familiar with GPU programming but the small amount I've tried was nothing but pain a few years ago on linux, it was so bad that Torvald publicly used the f word in a very public event.

I'm pretty sure Torvalds was giving the finger over the subject of GPU drivers (which run on the CPU), not programming on the Nvidia GPU itself. Particularly, they namedropped Bumblebee (and maybe Optimus?) which was more about power-management and making Nvidia cooperate with a non-Nvidia integrated GPU than it was about the Nvidia GPU itself.

smcleod6 months ago

That’s already been done with ARM.

kuschkufan6 months ago

apt username

bavell5 months ago

Try with ComfyUI... works great and easy setup on my 6750XT. I've had it working for about a year now with SD, LlamaCpp and WhisperCpp.

mtrower5 months ago

This is the exact reason* I bought a 4090 for my recent rebuild instead of the RDNA card I actually wanted. I really wanted to go with AMD for the driver integration with the Linux graphics stack - I’m so, so tired of shenanigans when it comes to decades-old features of X not working or working poorly due to some Nvidia bug/non-integration.

But being able to leverage my graphics card for GPGPU was a top priority for me, and like you, I was appalled with the ROCm situation. Not necessarily the tech itself (though I did not enjoy the docker approach), but more the developer situation surrounding it.

* well, that and some vague notions about RTX

fariszr6 months ago

> after the CUDA back-end was around for years and after dropping OpenCL, Blender did add a Radeon HIP back-end... But the real kicker here is that using ZLUDA + CUDA back-end was slightly faster than the native Radeon HIP backend.

This is absolutely crazy.

toxik6 months ago

Is AMD just a puppet org to placate antitrust fears? Why are they like this?

swozey6 months ago

Is this really a theory? If so, my $8 AMD stock from 2015(?) is currently worth $176, so they should make more shell companies; they're doing great.

I guess that might answer my "Why would AMD find that having a CUDA competitor isn't a business case unless they couldn't do it or the cards underperformed significantly."

kllrnohj6 months ago

For some reason AMD's GPU division continues to be run, well, horribly. The CPU division is crushing it, but the GPU division is comically bad. During the great GPU shortage AMD had multiple opportunities to capture chunks of the market and secure market share, increasing the priority for developers to acknowledge and target AMD's GPUs. What did they do instead? Not a goddamn thing, they followed Nvidia's pricing and managed to sell jack shit (like seriously the RX 580 is still the first AMD card to show up on the steam hardware survey).

They're not making big enough dies at the top end to compete with Nvidia for the halo, and they're refusing to undercut at the low end, where Nvidia's reputation for absurd pricing is at an all-time high. AMD's GPU division is a clown show; it's impressively bad. Even though the hardware itself is fine, they just can't stop making terrible product launches, awful pricing strategies, or just brain-dead software choices, like shipping a feature that triggered anti-cheat, getting their customers predictably banned and angering game devs in the process.

And relevant to this discussion Nvidia's refusal to add VRAM to their lower end cards is a prime opportunity for AMD to go after the lower-end compute / AI interested crowd who will become the next generation software devs. What are they doing with this? Well, they're not making ROCm available to basically anyone, that's apparently the winning strategy. ROCm 6.0 only supports the 7900 XTX and the... Radeon VII. The weird one-off Vega 20 refresh. Of all the random cards to support, why the hell would you pick that one???

swozey6 months ago

> The (AMD) CPU division is crushing it

I worked at a bare-metal CDN with 60 PoPs, and a few years ago we had to switch to AMD because of PCIe bandwidth to our smartNICs and NVMe-oF sort of things. We'd long hit limits on Intel before the Epyc stuff came out, so we had to run more servers than we wanted, limiting how much we did with one server so as not to hit the limits and cause everything to lock.

And we were excited, not a single apprehension. Epyc crushed the server market, everyone is using them. Well, it's going ARM now but Epyc will still be around awhile.

dehrmann6 months ago

Like Mozilla?

lambdaone6 months ago

It seems to me that AMD are crazy to stop funding this. CUDA-on-ROCm breaks NVIDIA's moat, and would also act as a disincentive for NVIDIA to make breaking changes to CUDA; what more could AMD want?

When you're #1, you can go all-in on your own proprietary stack, knowing that network effects will drive your market share higher and higher for you for free.

When you're #2, you need to follow de-facto standards and work on creating and following truly open ones, and try to compete on actual value, rather than rent-seeking. AMD of all companies should know this.

RamRodification6 months ago

> and would also act as a disincentive for NVIDIA to make breaking changes to CUDA

I don't know about that. You could kinda argue the opposite. "We improved CUDA. Oh it stopped working for you on AMD hardware? Too bad. Buy Nvidia next time"

freeone30006 months ago

Most CUDA applications do not target the newest CUDA version! Despite 12.1 being out, lots of code still targets 7 or 8 to support old NVIDIA cards. Similar support for AMD isn’t unthinkable (but a rewrite to rocm would be).

lambdaone6 months ago

That's exactly the point I was making above.

mnau6 months ago

Also known as OS/2: Redux strategy.

outside4156 months ago

NVIDIA is about ecosystem plays; they have no interest in sabotage or anti-competition plays. Leave that to Apple and Google and their dumb app stores and mobile OSs.

0x4576 months ago

> NVIDIA is about ecosystem plays, they have no interest in sabotage or anti competition plays.

Are we talking about the same NVIDIA? The entire GPU strategy for Nvidia is: make a feature (or find an existing one) that performs better on their cards, then pay developers to use (and sometimes misuse) it extensively.

tester7566 months ago

If you see:

1) billions of dollars at stake

2) one of the most successful leadership teams

3) the hottest period of their business, where they've heard about Nvidia's moat probably thousands of times over the last 18 months...

and you call some decision "crazy", then you probably do not have the same information that they do.

Or they underperformed, who knows, but I bet on reason #1.

Eisenstein6 months ago

The 'crazy' decision is them slowly abandoning the PC gaming market, which is where consumers get these cards, and focusing on the 'client' market to sell their 'Instinct' datacenter/AI cards. I think the parent you are responding to isn't questioning why it is a bad 'make money now' profit decision, but why it is a bad 'get people to use your system' decision.

"AMD’s client segment, mostly chips for PCs and laptops, rose 62% year over year to $1.46 billion in sales, thanks to recent chip launches.

Sales in AMD’s gaming segment, which includes “semi-custom” processors for Microsoft Xbox and Sony PlayStation consoles, fell 17%. "

* https://www.cnbc.com/2024/01/30/amd-earnings-report-q4-2024....

saboot6 months ago

Yep, I develop several applications that use CUDA. I see AMD/Radeon powered computers for sale and want to buy one, but I am not going to risk not being able to run those applications or having to rewrite them.

If they want me as a customer, and they have not created a viable alternative to CUDA, they need to pursue this.

weebull6 months ago

Define "viable"?

croutons6 months ago

A backend that runs PyTorch out of the box and is as easy to set up / use as the Nvidia stack.

weebull5 months ago

Installing PyTorch with the PyTorch website instructions for AMD was pretty painless for me on Linux. I know everybody's experience is different, but install wasn't the issue for me.

For me the issue on AMD was stability in situations when VRAM was getting tight.

AndrewKemendo6 months ago

ROCm is not spelled out anywhere in their documentation and the best answers in search come from Github and not AMD official documents

"Radeon Open Compute Platform"

https://github.com/ROCm/ROCm/issues/1628

And they wonder why they are losing. Branding absolutely matters.

sorenjan6 months ago

Funnily enough it doesn't work on their RDNA ("Radeon DNA") hardware (with some exceptions I think), but it's aimed at their CDNA (Compute DNA). If they would come up with a new name today it probably wouldn't include Radeon.

AMD seems to be a firm believer in separating the consumer chips for gaming and the compute chips for everything else. This probably makes a lot of sense from a chip design and current business perspective, but I think it's shortsighted and a bad idea. GPUs are very competent compute devices, and basically wasting all that performance for "only" gaming is strange to me. AI and other compute is getting more and more important for things like image and video processing, language models, etc. Not only for regular consumers, but for enthusiasts and developers it makes a lot of sense to be able to use your 10 TFLOPS chip even when you're not gaming.

While reading through the AMD CDNA whitepaper I saw this and got a good chuckle. "culmination of years of effort by AMD" indeed.

> The computational resources offered by the AMD CDNA family are nothing short of astounding. However, the key to heterogeneous computing is a software stack and ecosystem that easily puts these abilities into the hands of software developers and customers. The AMD ROCm 4.0 software stack is the culmination of years of effort by AMD to provide an open, standards-based, low-friction ecosystem that enables productivity creating portable and efficient high-performance applications for both first- and third-party developers.

https://www.amd.com/content/dam/amd/en/documents/instinct-bu...

slavik816 months ago

ROCm works fine on the RDNA cards. On Ubuntu 23.10 and Debian Sid, the system packages for the ROCm math libraries have been built to run on every discrete Vega, RDNA 1, RDNA 2, CDNA 1, and CDNA 2 GPU. I've manually tested dozens of cards and every single one worked. There were just a handful of bugs in a couple of the libraries that could easily be fixed by a motivated individual. https://slerp.xyz/rocm/logs/full/

The system package for HIP on Debian has been stuck on ROCm 5.2 / clang-15 for a while, but once I get it updated to ROCm 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will work.

stonogo6 months ago

It doesn't matter to my lab whether it technically runs. According to https://rocm.docs.amd.com/projects/install-on-linux/en/lates... it only supports three commercially-available Radeon cards (and four available Radeon Pro) on Linux. Contrast this to CUDA, which supports literally every nVIDIA card in the building, including the crappy NVS series and weirdo laptop GPUs, and it basically becomes impossible to convince anyone to develop for ROCm.

phh6 months ago

I have no idea what CUDA stands for, and I live just fine without knowing it.

rvnx6 months ago

Countless Updates Developer Agony

hyperbovine6 months ago

Lost five hours of my life yesterday discovering the fact that "CUDA 12.3" != "CUDA 12.3 Update 2".

(Yes, that's obvious, but not so obvious when your GPU applications submitted to a cluster start crashing randomly for no apparent reason.)

egorfine6 months ago

This is the right definition.

moffkalast6 months ago

Cleverly Undermining Disorganized AMD

smokel6 months ago

Compute Unified Device Architecture [1]

[1] https://en.wikipedia.org/wiki/CUDA

alfalfasprout6 months ago

Crap, updates destroyed (my) application

rtavares6 months ago

Later in the same thread:

> ROCm is a brand name for ROCm™ open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures).

> Note, ROCm no longer functions as an acronym.

ametrau6 months ago

>> Note, ROCm no longer functions as an acronym.

That is really dumb. Like LLVM.

atq21196 months ago

My understanding is that there was some trademark silliness around "open compute", and AMD decided that instead of doing a full rebrand, they would stick to ROCm but pretend that it wasn't ever an acronym.

michaellarabel6 months ago

Yeah it was due to the Open Compute Project AFAIK... Though for a little while AMD was telling me they really meant to call it "Radeon Open eCosystem" before then dropping that too with many still using the original name.

Farfignoggen6 months ago

Lisa Su in a later presentation/event announced that ROCM is no longer an acronym! So Radeon Open CoMpute is no longer the definition there! But ROCm/HIP and CUDA/CUDA Tools, and OneAPI/Level-0 are essentially the same coverage/scope for AMD, Nvidia, Intel respectively as far as GPU Compute API support goes and HPC/Accelerator workloads as well.

So there's a YouTube Video from some Supercomputer conference where the presenter goes over the support Matrix info for ROCm/HIP, CUDA/CUDA Tools, and OneAPI/Level-0 and they are similar in scope there.

marcus0x626 months ago

That, and it only runs on a handful of their GPUs.

NekkoDroid6 months ago

If you are talking about the "supported" list of GPUs, those listed are only the ones they fully validate and QA test; others of the same gen are likely to work, but most likely with some bumps along the way. In one of the somewhat older Phoronix posts about ROCm, one of their engineers did say they are trying to expand the list of validated & QA'd cards, as well as distinguishing between "validated", "supported" and "non-functional".
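(In practice, the workaround people usually report for officially unsupported consumer cards is the HSA_OVERRIDE_GFX_VERSION environment variable, which makes ROCm treat the card as a nearby supported ISA; entirely unofficial and not guaranteed to work, e.g. for many RDNA 2 cards:)

  HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 run_sd.py   # run_sd.py is a stand-in for whatever PyTorch/ROCm workload you're launching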

machomaster6 months ago

They can say whatever, but the action is what matters, not wishes and promises. And the reality is that the list of supported GPUs has been unchanged since they first announced it a year ago.

slavik816 months ago

That is intentional. We had to change the name. ROCm is no longer an acronym.

AndrewKemendo6 months ago

I assume you’re on the team if you’re saying “we”

Can you say why you had to change the name?

alwayslikethis6 months ago

I mean, I also had to look up what CUDA stands for.

mjcohen6 months ago

Can't Use Devices (by) AMD

hasmanean6 months ago

Compute unified device architecture ?

swozey6 months ago

I may have missed it in the article, but this post would mean absolutely nothing to me except for the fact that last week I got into stable diffusion so I'm crushing my 4090 with pytorch and deepspeed, etc and dealing with a lot of nvidia ctk/sdk stuff. Well, I'm actually trying to do this in windows w/ wsl2 and deepmind/torch/etc in containers and it's completely broken so not crushing currently.

I guess a while ago it was found that Nvidia was bypassing the kernel's GPL license driver check, and I read that kernel 6.6 was going to lock that driver out if they didn't fix it, and from what I've read there was no reply or anything done by Nvidia yet. Which I think I probably just can't find.

Am I wrong about that part?

We're on kernel 6.7.4 now and I'm still using the same drivers. Did it get pushed back, did nvidia fix it?

Also, while trying to find answers myself I came across this 21 year old post which is pretty funny and very apt for the topic https://linux-kernel.vger.kernel.narkive.com/eVHsVP1e/why-is...

I'm seeing conflicting info all over the place so I'm not really sure what the status of this GPL nvidia driver block thing is.

bongodongobob6 months ago

Just run that stuff in Linux, you're making it harder than it needs to be. Can it work in Windows? Sure. Is there as much documentation? Not even close.

wheybags6 months ago

Cannot understand why AMD would stop funding this. It seems like this should have a whole team allocated to it.

otoburb6 months ago

They would always be at the mercy of NVIDIA's API. Without knowing the inner workings, perhaps a major concern with this approach is the need to implement on NVIDIA's schedule instead of AMD's which is a very reactive stance.

This approach actually would make sense if AMD felt, like most of us perhaps, that the NVIDIA ecosystem is too entrenched, but perhaps they made the decision recently to discontinue funding because they (now?) feel otherwise.

blagie6 months ago

They've been at mercy of Intel x86 APIs for a long time. Didn't kill them.

What happens here is that the original vendor loses control of the API once there are multiple implementations. That's the best possible outcome for AMD.

In either case, they have a limited window to be adopted, and that's more important. The abstraction layer here helps too. AMD code is !@#$%. If this were adopted, it makes it easier to fix things underneath. All that is a lot more important than a dream of disrupting CUDA.

lambdaone6 months ago

More than that, a second implementation of CUDA acts as a disincentive for NVIDIA to make breaking changes to it, since it would reduce any incentive for software developers to follow those changes, as it reduces the value of their software by eliminating hardware choice for end-users (which in some case like large companies are also the developers themselves).

At the same time, open source projects can be pretty nimble in chasing things like changing APIs, potentially frustrating the effectiveness of API pivoting by NVIDIA in a second way.

tikkabhuna6 months ago

My understanding is that with AMD64 there's a circular dependency where AMD need Intel for x86 and Intel need AMD for x86_64?

monocasa6 months ago

That's true now, but AMD has been making x86 compatible CPUs since the original 8086.

rubatuga6 months ago

x86 is not the same, the courts forced the release of x86 architecture to AMD during an antitrust lawsuit

hardware2win6 months ago

You think x86 would be changed in such a way that it'd break AMD?

Because what else?

If so, then I think that this is crazy, because software is harder to change than hardware.

visarga6 months ago

> They would always be at the mercy of NVIDIA's API.

They only need to support PyTorch. Not CUDA

viraptor5 months ago

AMD should be able to pay for two teams. They had a decent growth period recently. Doing one doesn't stop them from doing the other.

Farfignoggen6 months ago

Phoronix Article from earlier(1):

"While AMD ships pre-built ROCm/HIP stacks for the major enterprise Linux distributions, if you are using not one of them or just want to be adventurous and compile your own stack for building HIP programs for running on AMD GPUs, one of the AMD Linux developers has written a how-to guide. "(1)

(1)

"Building An AMD HIP Stack From Upstream Open-Source Code

Written by Michael Larabel in Radeon on 9 February 2024 at 06:45 AM EST."

https://www.phoronix.com/news/Building-Upstream-HIP-Stack

rekado6 months ago

You can also install HIP/ROCm via Guix:

https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix...

> AMD has just contributed 100+ Guix packages adding several versions of the whole HIP and ROCm stack

JonChesterfield6 months ago

Hähnle is one of our best, that'll be solid. http://nhaehnle.blogspot.com/2024/02/building-hip-environmen.... Looks pretty similar to how I build it.

Side point, there's a driver in your linux kernel already that'll probably work. The driver that ships with rocm is a newer version of the same and might be worth building via dkms.

Very strange that the rocm github doesn't have build scripts but whatever, I've been trying to get people to publish those for almost five years now and it just doesn't seem to be feasible.

Farfignoggen6 months ago

From the Phoronix comments section of the Article that I linked to:

https://www.phoronix.com/forums/forum/linux-graphics-x-org-d...

And I'm on Linux Mint 21.3, so how do I change an installation script to think that Mint is Ubuntu to maybe get that to work? There's no how-to for Mint like the one that AMD provides for Ubuntu! And really, that's compiled by AMD for a specific Linux kernel, so no DKMS sort of methods there AFAIK! But I'm no Linux expert and just want some one-click install, or for it to ship with the distro already working, so that Blender 3D's iGPU/dGPU-accelerated Cycles rendering is possible on AMD Radeon consumer GPUs.

JonChesterfield6 months ago

If you wait for a while then Debian packaging effort will flow down through the repos and you'll have apt install or synaptic GUI access to the prebuilt binaries.

You've already got an amdgpu driver in your kernel. Possibly an old one but it'll be there. ROCm is userspace.

Farfignoggen6 months ago

I'm waiting for Rusticl (OpenCL implemented in the Rust programming language) as part of the Mesa driver stack to get enabled in Mint 22 (sometime in April 2024). And maybe I can use the older legacy Blender 2.93/earlier editions that use OpenCL as the GPU compute API, instead of Blender 3.0/later editions that have dropped support for OpenCL in favor of CUDA/PTX (whatever), which requires the HIP part of ROCm to get translated into a form that can be executed on Radeon GPU hardware.

CUDA PTX is that Intermediate Language representation that's portable for cross platform usage but I'm not exactly sure how that is implemented for Blender 3.0/later.

P.S I have Ryzen 3000/Zen+ series APUs and Vega Integrated Graphics on 2 systems and the laptop has Ryzen 3550H/Vega 8CU iGPU and Polaris Radeon RX560X dGPU while the Mini Desktop PC has Ryzen 3400G/Vega 11CU iGPU only.

MegaDeKay6 months ago

Latest commit message: "Nobody expects the Red Team"

CapsAdmin6 months ago

One thing I didn't see mentioned anywhere apart from the repos readme:

> PyTorch received very little testing. ZLUDA's coverage of cuDNN APIs is very minimal (just enough to run ResNet-50) and realistically you won't get much running.

throwaway20375 months ago

From the same repo, I found this excellent, well-written architecture document: https://github.com/vosen/ZLUDA/blob/master/ARCHITECTURE.md

I love the direct, "no bullshit" style of writing.

Some gems:

> Anyone familiar with C++ will instantly understand that compiling it is a complicated affair.

> Additionally CUDA allows, to a large degree, mixing CPU code and GPU code. What does all this complexity mean for ZLUDA? Absolutely nothing

> Since an application can dynamically link to either Driver API or Runtime API, it would seem that ZLUDA needs to provide both. In reality very few applications dynamically link to Runtime API. For the vast majority of applications it's sufficient to provide Driver API for dynamic (runtime) linking.

miduil6 months ago

Wow, this is great news. I really hope that the community will find ways to sustainably fund this project; suddenly being able to run a lot of innovative CUDA-based projects on AMD GPUs is a big game-changer, especially because you don't have to deal with the poor state of Nvidia support on Linux.

sam_goody6 months ago

I don't really follow this, but isn't it a bad sign for ROCm that, for example, ZLUDA + Blender 4's CUDA back-end delivers better performance than the native Radeon HIP back-end?

whizzter6 months ago

Could be that the CUDA backend has seen far more specialization optimizations, whereas the seemingly fairly fresh HIP backend hasn't had as many developers looking at it. In the end, a few more control instructions on the CPU side to go through the ZLUDA wrapper will be insignificant compared to all the time spent inside better-optimized GPU kernels.

fariszr6 months ago

It really shows how neglected their software stack is, or at least how neglected this implementation is.

mdre6 months ago

I'd say it's even worse, since for rendering Optix is like 30% faster than CUDA. But that requires the tensor cores. At this point AMD is waaay behind hardware wise.

KeplerBoy6 months ago

Surely this can be attributed to Blender's HIP code just being suboptimal because nobody really cares about it. By extension nobody cares about it because performance is suboptimal.

It's AMDs job to break that circle.

hd46 months ago

The interest in this thread tells me there are a lot of people who are not cool with the CUDA monopoly.

smoldesu6 months ago

Those people should have spoken up when their hardware manufacturers abandoned OpenCL. The industry set itself 5-10 years behind by ignoring open GPGPU compute drivers while Nvidia slowly built their empire. Just look at how long it's taken to re-implement a fraction of the CUDA feature set on a small handful of hardware.

CUDA shouldn't exist. We should have hardware manufacturers working together, using common APIs and standardizing instead of going for the throat. The further platforms drift apart, the more valuable Nvidia's vertical integration becomes.

mnau6 months ago

Common API means being replaceable, fungible. There are no margins in that.

smoldesu6 months ago

Correct. It's why the concept of 'proprietary UNIX' didn't survive long once program portability became an incentive.

Avamander6 months ago

Is my impression wrong, that people understood the need for OCL only after CUDA had already cornered and strangled the market?

pjmlp5 months ago

I attended a webinar from Khronos where no one on the panel understood why the research community would want anything beyond C to program GPUs.

Meanwhile NVidia was adding C++, Fortran, PTX, supporting other programming language communities trying to target GPUs (Java, .NET, Haskell, ...).

Making it as easy to debug GPUs as modern graphical debuggers for CPUs, building libraries,...

Intel, and AMD together with Khronos did this to themselves.

smoldesu6 months ago

You're mostly right. CUDA was a "sleeper product" that existed early-on but didn't see serious demand until later. OpenCL was Khronos Group's hedged bet against the success of CUDA; it was assumed that they would invest in it more as demand for GPGPU increased. After 10 years though, OpenCL wasn't really positioned to compete and CUDA was more fully-featured than ever. Adding insult to injury, OS manufacturers like Microsoft and Apple started to avoid standardized GPU libraries in favor of more insular native APIs. By the time demand for CUDA materialized, OpenCL had already been left for dead by most of the involved parties.

Qwertious5 months ago

AMD's budget isn't measured in people, it's measured in dollars. HN commenters don't necessarily decide how many graphics cards their business will buy.

codedokode6 months ago

As I understand it, Vulkan allows you to run custom code on the GPU, including code to multiply matrices. Can one simply use Vulkan and ignore CUDA, PyTorch and ROCm?

PeterisP5 months ago

You probably can, but why would you? The main (only?) reason to ignore the CUDA-based stack is so that you could save a bit of money by using some other hardware instead of nVidia. So the amount of engineering labor/costs you should be willing to accept is directly tied to how much hardware you intend to buy or rent and what % discount, if any, the alternative hardware enables compared to nVidia.

So if you'd want to ignore CUDA+PyTorch and reimplement all of what you need on top of Vulkan.... well, that becomes worthy of discussion only if you expect to spend a lot on hardware, if you really consider that savings on hardware can recoup many engineer-years of costs - otherwise it's more effective to just go with the flow.

Const-me6 months ago

I did a few times with Direct3D 11 compute shaders. Here’s an open-source example: https://github.com/Const-me/Cgml

Pretty sure Vulkan gonna work equally well, at the very least there’s an open source DXVK project which implements D3D11 on top of Vulkan.

sorenjan5 months ago

ncnn uses Vulkan for GPU acceleration, I've seen it used in a few projects to get AMD hardware support.

https://github.com/Tencent/ncnn
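
For anyone wondering what that looks like from the application side, here is a rough sketch of ncnn's Vulkan path (the model file names and the "data"/"output" blob names are placeholders, not from any particular model):

    // ncnn_vulkan_sketch.cpp -- run an ncnn model on the GPU via Vulkan instead of CUDA/ROCm.
    #include "net.h"   // ncnn

    int main() {
        ncnn::Net net;
        net.opt.use_vulkan_compute = true;       // route inference through ncnn's Vulkan backend

        if (net.load_param("model.param") || net.load_model("model.bin"))
            return 1;                            // files missing or failed to parse

        ncnn::Mat in(224, 224, 3);               // dummy 224x224x3 input tensor
        in.fill(0.5f);

        ncnn::Extractor ex = net.create_extractor();
        ex.input("data", in);                    // input blob name depends on the model

        ncnn::Mat out;
        ex.extract("output", out);               // runs the Vulkan compute pipelines
        return 0;
    }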

0xDEADFED55 months ago

there's a pretty cool Vulkan LLM engine here for example:

https://github.com/mlc-ai/mlc-llm

eddiewithzato6 months ago

of course, but then you are just recreating CUDA. And that won’t scale well across an industry since each company would have their own language. AMD can just do what you are describing and then sell it as a standard.

I mean they literally did that, but then dropped it so yea

sharts6 months ago

AMD fails to realize the software toolchain is what makes Nvidia great. AMD thinks the hardware is all that's needed.

JonChesterfield6 months ago

Nvidia's toolchain is really not great. Applications are just written to step around the bugs.

ROCm has different bugs, which the application workarounds tend to miss.

bornfreddy6 months ago

Yes. This is what makes Nvidia's toolchain, if not great, at least OK. As a developer I can actually use their GPUs. And what I developed locally I can then run on Nvidia hardware in the cloud and pay by usage.

AMD doesn't seem to understand that affordable entry-level hardware with good software support is key.

JonChesterfield6 months ago

Ah yes, so that one does seem to be a stumbling block. ROCm is not remotely convinced that running on gaming cards is a particularly useful thing. HN is really sure that being able to develop code on ~free cards that you've got lying around anyway is an important gateway to running on amdgpu.

The sad thing is people can absolutely run ROCm on gaming cards if they build from source. Weirdly GPU programmers seem determined to use proprietary binaries to run "supported" hardware, and thus stick with CUDA.

I don't understand why AMD won't write the names of some graphics cards under "supported", even if they didn't test them as carefully as the MI series, and I don't understand why developers are so opposed to compiling their toolchains from source. For one thing, it means you can't debug the toolchain effectively when it falls over; a weird limitation to inflict on oneself.

Strange world.

sorenjan5 months ago

Maybe someone that's trying to use a GPU to solve a particular problem doesn't necessarily also have the time, energy, knowledge, or interest to also first 1) find out how to construct the toolchain, 2) build said toolchain, and 3) debug the toolchain. Just because you're a programmer writing code for physics simulations, image processing, AI models, or whatever, doesn't mean you also want to spend hours or days on getting your tools working before you can even start writing your own code. And then do it again when deploying on another computer.

And it's really not surprising that people, GPU programmers included, don't want to spend time and money on trying out unsupported hardware and software combinations when, again, it's supposed to be a tool to get a job done. If I've got some Phillips head screws I'm not reaching for a flat-head screwdriver even though it probably will work, and if it's the only thing I have, I'll buy some Phillips head ones for the next project.

pjmlp5 months ago

The strange world is expecting we are all Gentoo users who like to build our software like making fire with stones and sticks.

Then people act surprised that CUDA won the hearts of the scientific developer community, who would rather spend their time actually doing research work.

zoobab6 months ago

"For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product."

Managers at AMD never heard of AI?

navbaker6 months ago

The other big need is for a straightforward library for dynamic allocation/sharing of GPUs. Bitfusion was a huge pain in the ass, but at least it was something. Now it’s been discontinued, the last version doesn’t support any recent versions of PyTorch, and there’s only two(?) possible replacements in varying levels of readiness (Juice and RunAI). We’re experimenting now with replacing our Bitfusion installs with a combination of Jupyter Enterprise Gateway and either MIGed GPUs or finding a way to get JEG to talk to a RunAI installation to allow quick allocation and deallocation of portions of GPUs for our researchers.

CapsAdmin6 months ago

Hope this can benefit from the seemingly infinite enthusiasm from rust programmers

michalf66 months ago

Złuda roughly means "delusion" / "mirage" / "illusion" in Polish, given the author is called Andrzej Janik this may be a pun :)

rvba6 months ago

Arguably one could also translate it as "something that will never happen".

At the same time "cuda" could be translated as "wonders".

ultra_nick6 months ago

If anyone wants to work in this area, AMD currently has a lot of related job posts open.

yieldcrv6 months ago

Sam could get more chips for way less than $7 trillion if he helps fund and mature this

JonChesterfield6 months ago

I'm pretty tired of the business model of raising capital from VCs to give to Nvidia.

TheMagicHorsey6 months ago

I'm hoping something like Mojo makes it so the CUDA lock-in fades over time. It's never a good thing to have one company, like Nvidia, have a stranglehold on the hardware the whole ecosystem evolves on.

shmerl6 months ago

Anything that breaks CUDA lock-in is great! This reminds me of how DX/D3D lock-in was broken by dxvk and vkd3d-proton.

> It apparently came down to an AMD business decision to discontinue the effort

Bad decision, if that's the case. Maybe someone can pick it up, since it's open now.

mdre6 months ago

Fun fact: ZLUDA means something like illusion/delusion/figment. Well played! (I see the main dev is from Poland.)

Detrytus6 months ago

You should also mention that CUDA in Polish means "miracles" (plural).

btown6 months ago

Why would this not be AMD’s top priority among priorities? Someone recently likened the situation to an Iron Age where NVIDIA owns all the iron. And this sounds like AMD knowing about a new source of ore and not even being willing to sink a single engineer’s salary into exploration.

My only guess is they have a parallel skunkworks working on the same thing, but in a way that they can keep it closed-source - that this was a hedge they think they no longer need, and they are missing the forest for the trees on the benefits of cross-pollination and open source ethos to their business.

hjabird6 months ago

The problem with effectively supporting CUDA is that it encourages CUDA adoption all the more strongly. Meanwhile, AMD will always be playing catch-up, forever having to patch issues, work around Nvidia/AMD differences, and accept the performance penalty that comes from having code optimised for another vendor's hardware. AMD needs to encourage developers to use their own ecosystem or an open standard.

jvanderbot6 months ago

If you replace CUDA -> x86 and NVIDIA -> Intel, you'll see a familiar story which AMD has already proved it can work through.

These were precisely the arguments for 'x86 will entrench Intel for all time', and we've seen AMD succeed at that game just fine.

ianlevesque6 months ago

And indeed more than succeed, they invented x86_64.

clhodapp6 months ago

If that's the model, it sounds like the path would be to burn money to stay right behind NVIDIA and wait for them to become complacent and stumble technically, creating the opportunity to leapfrog them. Keeping up could be very expensive if they don't force something like the mutual licensing requirements around x86.

HarHarVeryFunny5 months ago

The difference is that AMD's CPUs are designed to implement the x86 and x86-64 ISA, so there is no loss of performance. In contrast, AMD's and NVIDIA's GPU instruction sets and architectures are not the same, and to get top performance out of these architectures code needs to be customized for them.

If you slap a CUDA compatibility layer on top of AMD, then CUDA code optimized for NVIDIA chips would run, but would suffer a performance penalty compared to code that was customized/tuned for AMD, so unless AMD GPUs were sold cheap enough (i.e. with low profit margin) to mitigate this loss of performance you might as well buy NVIDIA in the first place.

ethbr16 months ago

> These were precisely the arguments for 'x86 will entrench Intel for all time', and we've seen AMD succeed at that game just fine.

... after a couple decades of legal proceedings and a looming FTC monopoly case convinced Intel to throw in the towel, cross-license, and compete more fairly with AMD.

https://jolt.law.harvard.edu/digest/intel-and-amd-settlement

AMD didn't just magically do it on its own.

samstave6 months ago

Transmeta was Intel's boogeyman in the 90s.

hjabird6 months ago

There are some great replies to my comment - my original comment was too reductive. However, I still think that entrenching CUDA as the de-facto language for heterogeneous computing is a mistake. We need an open ecosystem for AI and HPC, where vendors compete on producing the best hardware.

ethbr16 months ago

The problem with open standards is that someone has to write them.

And that someone usually isn't a manufacturer, lest the committee be accused of bias.

Consequently, you get (a) outdated features that SotA has already moved beyond, (b) designed in a way that doesn't correspond to actual practice, and (c) that are overly generalized.

There are some notable exceptions (e.g. IETF), but the general rule has been that open specs please no one, slowly.

IMHO, FRAND and liberal cross-licensing produce better results.

bick_nyers6 months ago

The latest version of CUDA is 12.3, and version 12.2 came out 6 months prior. How many people are running an older version of CUDA right now on NVIDIA hardware for whatever particular reason?

Even if AMD lagged support on CUDA versioning, I think it would be widely accepted if the performance per dollar at certain price points was better.

Taking the whole market from NVIDIA is not really an option, it's better to attack certain price points and niches and then expand from there. The CUDA ship sailed a long time ago in my view.

swozey6 months ago

I just went through this this weekend - if you're running in Windows and want to use DeepSpeed, you still have to use CUDA 12.1, because DeepSpeed 13.1 is the latest that works with 12.1. There's no DeepSpeed for Windows that works with 12.3.

I tried to get it working this weekend but it was a huge PITA, so I switched to putting everything into WSL2, then running Arch in there with PyTorch etc. in containers so I could flip versions easily, now that I know how SPECIFIC the versions are to one another.

I'm still working on that part; halfway into it my WSL2 completely broke and I had to reinstall Windows. I'm scared to mount the vhdx right now. ALL of my work and ALL of my documentation is inside of the WSL2 Arch Linux and NOT on my Windows machine. I have EVERYTHING I need to quickly put another server up (dotfiles, configs) sitting in a chezmoi git repo ON THE VM, and I only committed it once, at init, like 5 mins into everything. THAT was a learning experience. Now I have no idea if I should follow the "best practice" of keeping projects in WSL or having WSL reach out to Windows, where there's a performance drop. The 9p networking stopped working and no matter what I reinstalled, reset, removed features, reset Windows, etc., it wouldn't start. But at least I have that WSL2 .vhdx image that will hopefully mount and start. And probably break WSL2 again. I even SPECIFICALLY took backups of the image as tarballs every hour in case I broke LINUX, not WSL.

If anyone has done sd containers in wsl2 already let me know. I've tried to use WSL for dev work (i use osx) like this 2-3 times in the last 4-5 years and I always run into some catastrophically broken thing that makes my WSL stop working. I hadn't used it in years so hoped it was super reliable by now. This is on 3 different desktops with completely different hardware, etc. I was terrified it would break this weekend and IT DID. At least I can be up in windows in 20 minutes thanks to chocolately and chezmoi. Wiped out my entire gaming desktop.

Sorry I'm venting now this was my entire weekend.

This repo is from a DeepSpeed contributor (IIRC) and lists the requirements for DeepSpeed + Windows that mention the version matches:

https://github.com/S95Sedan/Deepspeed-Windows

> conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

It may sound weird to do any of this in Windows, or maybe not, but if it does just remember that it's a lot of gamers like me with 4090s who just want to learn ML stuff as a hobby. I have absolutely no idea what I'm doing but thank god I know containers and linux like the back of my hand.

andrecarini5 months ago

> I'm scared to mount the vhdx right now [...] I have EVERYTHING I need [...] sitting in a chezmoi git repo ON THE VM.

You probably already know but just in case you don't: you can set up a Linux VM with VirtualBox on your Windows and then mount the vhdx (read-only) as an additional disk to extract the stuff you need via shared folders.

CoolCold5 months ago

> halfway into it my WSL2 completely broke and I had to reinstall windows

I'm curious: did WSL2 break so badly that you cannot even add new distros or remove broken distros? Or did the Windows host itself become unstable?

carlossouza6 months ago

Great comment.

I bet there are at least two markets (or niches):

1. People who want the absolute best performance and the latest possible version and are willing to pay the premium for it;

2. People who want to trade performance for cost and accept working with not-the-latest versions.

In fact, I bet the market for (2) is much larger than (1).

bluedino6 months ago

> How many people are running an older version of CUDA right now on NVIDIA hardware for whatever particular reason?

I would guess there are lots of people still running CUDA 11. Older clusters, etc. A lot of that software doesn't get updated very often.

coderenegade6 months ago

Especially if you're deploying models. The latest version of onnx runtime still defaults to cuda 11.

kgeist6 months ago

Intel embraced Amd64 ditching Itanium. Wasn't it a good decision that worked out well? Is it comparable?

kllrnohj6 months ago

Intel & AMD have a cross-license agreement covering everything x86 (and x86_64) thanks to lots and lots of lawsuits over their many years of competition.

So while Intel had to bow to AMD's success and give up Itanium, they weren't then limited by that and could proceed to iterate on top of it.

Meanwhile it'll be a cold day in hell before Nvidia licenses anything about CUDA to AMD, much less allows AMD to iterate on top of it.

krab6 months ago

Isn't an API out of scope for copyright? In the case of CUDA, it seems they could copy most of it and then iterate on their own, keeping a compatible subset.

teucris6 months ago

In hindsight, yes, but just because a specific technology is leading an industry doesn’t mean it’s going to be the best option. It has to play out long enough for the market to indicate a preference. In this case, for better or worse, it looks like CUDA’s the preference.

coldtea6 months ago

>The problem with effectively supporting CUDA is that encourages CUDA adoption all the more strongly

Worked fine for MS with Excel supporting Lotus 123 and Word supporting WordPerfect's formats when those were dominant...

Dork12346 months ago

Microsoft could do that because they had the Operating System monopoly to leverage and take out both Lotus 123 and WordPerfect. Without the monopoly of the operating system they wouldn't have been able to Embrace, Extend, Extinguish.

https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...

bell-cot6 months ago

But MS controlled the underlying OS. Letting them both throw money at the problem, and (by accounts at the time) frequently tweak the OS in ways that made life difficult for Lotus, WordPerfect, Ashton-Tate, etc.

slashdev6 months ago

With Nvidia controlling 90%+ of the market, this is not a viable option. They'd better lean hard into CUDA support if they want to be relevant.

cduzz6 months ago

A bit of story telling here:

IBM and Microsoft made OS/2. The first version worked on 286s and was stable but useless.

The second version worked only on 386s and was quite good, and even had wonderful windows 3.x compatibility. "Better windows than windows!"

At that point Microsoft wanted out of the deal and they wanted to make their newer version of windows, NT, which they did.

IBM now had a competitor to "new" Windows and a very compatible version of "old" Windows. Microsoft killed OS/2 in a variety of ways (including just letting IBM be IBM), but also by making it very difficult for last month's version of OS/2 to run next month's batch of Windows programs.

To bring this back to the point -- IBM vs Microsoft is akin to AMD vs Nvidia -- where nvidia has the standard that AMD is implementing, and so no matter what if you play in the backward compatibility realm you're always going to be playing catch-up and likely always in a position where winning is exceedingly hard.

As WOPR once said "interesting game; the only way to win is to not play."

neerajsi6 months ago

I'm not in the gpu programming realm, so this observation might be inaccurate:

I think the case of cuda vs an open standard is different from os2 vs Windows because the customers of cuda are programmers with access to source code while the customers of os2 were end users trying to run apps written by others.

If your shrink-wrapped software didn't run on os2, you'd have no choice but to go buy Windows. Otoh if your ai model doesn't run on an AMD device and the issue is something minor, you can edit the shader code.

andy_ppp6 months ago

When the alternative is failure I suppose you choose the least bad option. Nobody is betting the farm on ROCm!

hjabird6 months ago

True. This is the big advantage of an open standard instead of jumping from one vendor's walled garden to another.

mqus6 months ago

If their primary objective is to sell cards, then they should make it as easy as possible to switch cards.

If their primary objective is to break the CUDA monopoly, they should up their game in software, which means going as far as implementing support for their hardware in the most popular user apps themselves, if necessary. But since they don't seem to want to do that, they should really go for option one, especially if a single engineer already got so far.

Let's say AMD sold a lot of cards with CUDA support. Now nvidia tries to cut them off. What will happen next? A lot of people will replace their cards with nvidia ones. But a lot of the rest will try to make their expensive AMD cards work regardless. And if AMD provides a platform for that, they will get that work for free.

mindcrime6 months ago

Yep. This is very similar to the "catch-22" that IBM wound up in with OS/2 and the Windows API. On the one hand, by supporting Windows software on OS/2, they gave OS/2 customers access to a ready base of available, popular software. But in doing so, they also reduced the incentive for ISV's to produce OS/2 native software that could take advantage of unique features of OS/2.

It's a classic "between a rock and a hard place" scenario. Quite a conundrum.

ianlevesque6 months ago

Thinking about the highly adjacent graphics APIs history, did anyone really 'win' the Direct3D, OpenGL, Metal, Vulkan war? Are we benefiting from the fragmentation?

If the players in the space have naturally coalesced around one over the last decade, can we skip the thrashing and just go with it this time?

tadfisher6 months ago

The game engines won. Folks aren't building Direct3D or Vulkan renderers; they're using Unity or Unreal or Godot and clicking "export" to target whatever API makes sense for the platform.

WebGPU might be the thing that unifies the frontend API for folks writing cross-platform renderers, seeing as browsers will have to implement it on top of the platform APIs anyway.

pjmlp5 months ago

You missed the Nintendo and Sony APIs as well.

FOSS folks make this a bigger issue than it really is, game studios make a pluggable API on their engine and call it a day, move on into everything else that matters in actually delivering a game.

bachmeier6 months ago

> The problem with effectively supporting CUDA is that encourages CUDA adoption all the more strongly.

I'm curious about this. Sure some CUDA code has already been written. If something new comes along that provides better performance per dollar spent, why continue writing CUDA for new projects? I don't think the argument that "this is what we know how to write" works in this case. These aren't scripts you want someone to knock out quickly.

Uehreka6 months ago

> If something new comes along that provides better performance per dollar spent

They won’t be able to do that, their hardware isn’t fast enough.

Nvidia is beating them at hardware performance, AND ALSO has an exclusive SDK (CUDA) that is used by almost all deep learning projects. If AMD can get their cards to run CUDA via ROCm, then they can begin to compete with Nvidia on price (though not performance). Then, and only then, if they can start actually producing cards with equivalent performance (also a big stretch) they can try for an Embrace Extend Extinguish play against CUDA.

dotnet006 months ago

If something new comes along that provides better performance per dollar, but you have no confidence that it'll continue to be available in the future, it's far less appealing. There's also little point in being cheaper if it just doesn't have the raw performance to justify the effort in implementing in that language.

CUDA currently has the better raw performance, better availability, and a long record indicating that the platform won't just disappear in a couple of years. You can use it on pretty much any NVIDIA GPU and it's properly supported. The same CUDA code that ran on a GTX680 can run on an RTX4090 with minimal changes if any (maybe even the same binary).

In comparison, AMD has a very spotty record with their compute technologies, stuff gets released and becomes effectively abandonware, or after just a few years support gets dropped regardless of the hardware's popularity. For several generations they basically led people on with promises of full support on consumer hardware that either never arrived or arrived when the next generation of cards were already available, and despite the general popularity of the rx580 and the popularity of the Radeon VII in compute applications, they dropped 'official' support. AMD treats its 'consumer' cards as third class citizens for compute support, but you aren't going to convince people to seriously look into your platform like that. Plus, it's a lot more appealing to have "GPU acceleration will allow us to take advantage of newer supercomputers, while also offering massive benefits to regular users" than just the former.

This was ultimately what removed AMD as a consideration for us when we were deciding on which to focus on for GPU acceleration in our application. Many of us already had access to an NVIDIA GPU of any sort, which would make development easier, while the entire facility had one ROCm capable AMD GPU at the time, specifically so they could occasionally check in on its status.

throwoutway6 months ago

Is it? Apple Silicon exists, but Apple created a translation layer above it so the transition could be smoother.

Jorropo6 months ago

This is extremely different: Apple was targeting end consumers who just want their app to run. The performance between Apple's Rosetta and native CPU was still multiple times different.

People writing CUDA apps don't just want stuff to run; performance is an extremely important factor, else they would target CPUs, which are easier to program for.

From their readme:

> On Server GPUs, ZLUDA can compile CUDA GPU code to run in one of two modes:

> Fast mode, which is faster, but can make exotic (but correct) GPU code hang.

> Slow mode, which should make GPU code more stable, but can prevent some applications from running on ZLUDA.

piva006 months ago

> The performance between apple rosetta and native cpu were still multiple times different.

Not at all, the performance hit was in the low tens of percent. Before natively supporting Apple Silicon, most of the apps I use for music/video/photography didn't seem to have a performance impact at all, even more so since the M1 machines were so much faster than the Intels.

jack_pp6 months ago

Not really the same, in that Apple was absolutely required to do this in order for people to transition smoothly, and it wasn't competing against another company/platform; it just needed apps from its previous platform to work while people recompiled apps for the current one, which they will.

more_corn6 months ago

They have already lost. The question is do they want to come in second in the game to control the future of the world or not play at all?

panick21_6 months ago

That's not guaranteed at all. One could make the same argument about Linux vs Commercial Unix.

If they put their stuff out as open source, including firmware, I think they will win out eventually.

And it's also not a guarantee that Nvidia will always produce the superior hardware for that code.

fariszr6 months ago

According to the article, AMD seems to have pulled the plug on this because they think it will hinder ROCm v6 adoption, which btw still only supports two consumer cards out of their entire lineup [1].

1. https://www.phoronix.com/news/AMD-ROCm-6.0-Released

kkielhofner6 months ago

With the most recent card being their one year old flagship ($1k) consumer GPU...

Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released. They'll even go as far as doing things like adding support for new GPUs/compute families to older CUDA versions (see Hopper/Ada and CUDA 11.8).

You can go out and buy any Nvidia GPU the day of release, take it home, plug it in, and everything just works. This is what people expect.

AMD seems to have no clue that this level of usability is what it will take to actually compete with Nvidia and it's a real shame - their hardware is great.

KingOfCoders6 months ago

AMD thinks the reason Nvidia is ahead of them is bad marketing on their part, and good marketing (All is AI) by Nvidia. They don't see the difference in software stacks.

For years I've wanted to get off the Nvidia train for AI, but I'm forced to buy another Nvidia card because AMD stuff just doesn't work, and all the examples work with Nvidia cards as they should.

fortran776 months ago

At the risk of sounding like Steve Ballmer, the reason I only use NVIDIA for GPGPU work (our company does a lot of it!) is the developer support. They have compilers, tools, documentation, and tech support for developers who want to do any type of GPGPU computing on their hardware that just isn't matched on any other platform.

roenxi6 months ago

You've got to remember that AMD are behind at all aspects of this, including documenting their work in an easily digestible way.

"Support" means that the card is actively tested and presumably has some sort of SLA-style push to fix bugs for. As their stack matures, a bunch of cards that don't have official support will work well [0]. I have an unsupported card. There are horrible bugs. But the evidence I've seen is that the card will work better with time even though it is never going to be officially supported. I don't think any of my hardware is officially supported by the manufacturer, but the kernel drivers still work fine.

> Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released...

A lot of older Nvidia cards don't support CUDA v9 [1]. It isn't like everything supports everything, particularly in the early part of building out capability. The impression I'm getting is that in practice the gap in strategy here is not as large as the current state makes it seem.

[0] If anyone has bought an AMD card for their machine to multiply matrices they've been gambling on whether the capability is there. This comment is reasonable speculation, but I want to caveat the optimism by asserting that I'm not going to put money into AMD compute until there is some some actual evidence on the table that GPU lockups are rare.

[1] https://en.wikipedia.org/wiki/CUDA#GPUs_supported

paulmd6 months ago

All versions of CUDA support PTX, which is an intermediate bytecode/compiler representation that can be final-compiled even by a CUDA 1.0 driver.

So the contract is: as long as your future program does not touch any intrinsics etc. that do not exist in CUDA 1.0, you can export the new program from CUDA 27.0 as PTX, and the GTX 680 driver will read the PTX and let your GPU run it as CUDA 1.0 code… so it is quite literally just as they describe, unlimited forward and backward compatibility/support as long as you go through PTX in the middle.

https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-ex...

https://en.wikipedia.org/wiki/Parallel_Thread_Execution
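
A minimal sketch of that round trip (file and kernel names are made up, error handling omitted): emit PTX once with nvcc, then let whatever driver is installed JIT-compile it for the local GPU through the Driver API.

    // ptx_jit.cpp -- load and JIT-compile PTX at runtime via the CUDA Driver API.
    // Build the PTX separately, e.g.:  nvcc --ptx kernel.cu -o kernel.ptx
    // (kernel.cu containing:  extern "C" __global__ void fill(float *p) { p[threadIdx.x] = 1.0f; })
    // Build the host side with:  g++ ptx_jit.cpp -lcuda
    #include <cuda.h>
    #include <fstream>
    #include <sstream>
    #include <string>

    int main() {
        cuInit(0);
        CUdevice dev;    cuDeviceGet(&dev, 0);
        CUcontext ctx;   cuCtxCreate(&ctx, 0, dev);

        // Read the PTX text; the installed driver JIT-compiles it for whatever GPU is present.
        std::ifstream f("kernel.ptx");
        std::stringstream ss; ss << f.rdbuf();
        std::string ptx = ss.str();

        CUmodule mod;    cuModuleLoadData(&mod, ptx.c_str());
        CUfunction fn;   cuModuleGetFunction(&fn, mod, "fill");

        CUdeviceptr buf; cuMemAlloc(&buf, 256 * sizeof(float));
        void *args[] = { &buf };
        cuLaunchKernel(fn, 1, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
        cuCtxSynchronize();

        cuMemFree(buf);
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }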

spookie6 months ago

To be fair, if anything, that table still shows you'll have compatibility with at least 3 major releases. Either way, I agree their strategy is getting results, it just takes time. I do prefer their open source commitment, I just hope they continue.

Certhas6 months ago

The most recent "card" is their MI300 line.

It's annoying as hell to you and me that they are not catering to the market of people who want to run stuff on their gaming cards.

But it's not clear it's bad strategy to focus on executing in the high-end first. They have been very successful landing MI300s in the HPC space...

Edit: I just looked it up: 25% of the GPU Compute in the current Top500 Supercomputers is AMD

https://www.top500.org/statistics/list/

Even though the list has plenty of V100 and A100s which came out (much) earlier. Don't have the data at hand, but I wouldn't be surprised if AMD got more of the Top500 new installations than nVidia in the last two years.

voakbasda6 months ago

In the embedded space, Nvidia regularly drops support for older hardware. The last supported kernel for their Jetson TX2 was 4.9. Their newer Jetson Xavier line is stuck on 5.10.

The hardware may be great, but their software ecosystem is utter crap. As long as they stay the unchallenged leader in hardware, I expect Nvidia will continue to produce crap software.

I would push to switch our products in a heartbeat, if AMD actually gets their act together. If this alternative offers a path to evaluate our current application software stack on an AMD devkit, I would buy one tomorrow.

kkielhofner6 months ago

In the embedded space, customers develop bespoke solutions to, well, embed them in products, where they (essentially) bake the firmware image and more or less freeze the entire software stack aside from incremental updates. The next version of your product uses the next fresh Jetson and JetPack release. Repeat. Using the latest and greatest kernel is far from a top consideration in these applications...

I was actually advising an HN user against using Jetson just the other day because it's such an extreme outlier when it comes to Nvidia and software support. Frankly Jetson makes no sense unless you really need the power efficiency and form-factor.

Meanwhile, any seven year old >= Pascal card is fully supported in CUDA 12 and the most recent driver releases. That combined with my initial data points and others people have chimed in with on this thread is far from "utter crap".

Use the right tool for the job.

streb-lo6 months ago

I have been using rocm on my 7800xt, it seems to be supported just fine.

bhouston6 months ago

AMD should have the funds to push both of these initiatives at once. If the ROCM team has political reasons to kill the competition, it is because they are scared it will succeed. I've seen this happen in big companies.

But management at AMD should be above petty team politics and fund both because at the company level they do not care which solution wins in the end.

imtringued6 months ago

Why would they be worried about people using their product? Some CUDA wrapper on top of ROCM isn't going to get them fired. It doesn't get rid of ROCM's function as a GPGPU driver.

zer00eyz6 months ago

If you're AMD, you don't want to be compatible till you have a compelling feature of your own.

Good enough CUDA + new feature X gives them leverage in the inevitable court battle(s) and patent sharing agreement that everyone wants to see.

AMD has already stuck its toe in the water: new CPUs with their AI cores built in. If you can get an AM5 socket to run with 196 gigs, that's a large (albeit slow) model you can run.

MrBuddyCasino6 months ago

AMD truly deserves its misfortune in the GPU market.

incrudible6 months ago

That is really out of touch. ROCm is garbage as far as I am concerned. A drop in replacement, especially one that seems to perform quite well, is really interesting however.

jandrese6 months ago

AMD's management seems to be only vaguely aware that GPU compute is a thing. All of their efforts in the field feel like afterthoughts. Or maybe they are all just hardware guys who think of software as just a cost center.

giovannibonetti6 months ago

Maybe they just can't lure in good software developers with the right skill set, either due to not paying them enough or not having a good work environment in comparison to the other places that could hire them.

captainbland6 months ago

I did a cursory glance at Nvidia's and AMD's respective careers pages for software developers at one point - what struck me was they both have similarly high requirements for engineers in fields like GPU compute and AI but Nvidia hires much more widely, geographically speaking, than AMD.

As a total outsider it seems to me that maybe one of AMD's big problems is they just aren't set up to take advantage of the global talent pool in the same way Nvidia is.

newsclues6 months ago

They are aware, but it wasn’t until recently that they had the resources to invest in the space. They had to build Zen and start making buckets of money first

beebeepka6 months ago

Exactly. AMD stock was like 2 dollars just eight years ago. They didn't have any money and, amusingly, it was their GPU business that kept them going on life support.

Their leadership seems quite a bit more competent than random forum commenters give them credit for. I guess what they need, marketing wise, is a few successful halo GPU launches. They haven't done that in a while. Lisa acknowledged this years ago. It's marketing 101. I guess these things are easier said than done.

lostdog6 months ago

It feels like "Make the AI software work on our GPUs," is on some VP's OKRs, but isn't really being checked on for progress or quality.

trynumber96 months ago

That doesn't explain CDNA. They focused on high-throughput FP64 which is not where the market went.

modeless6 months ago

I've been critical of AMD's failure to compete in AI for over a decade now, but I can see why AMD wouldn't want to go the route of cloning CUDA and I'm surprised they even tried. They would be on a never ending treadmill of feature catchup and bug-for-bug compatibility, and wouldn't have the freedom to change the API to suit their hardware.

The right path for AMD has always been to make their own API that runs on all of their own hardware, just as CUDA does for Nvidia, and push support for that API into all the open source ML projects (but mostly PyTorch), while attacking Nvidia's price discrimination by providing features they use to segment the market (e.g. virtualization, high VRAM) at lower price points.

Perhaps one day AMD will realize this. It seems like they're slowly moving in the right direction now, and all it took for them to wake up was Nvidia's market cap skyrocketing to 4th in the world on the back of their AI efforts...

matchagaucho6 months ago

But AMD was formed to shadow Intel's x86?

atq21196 months ago

AMD was founded at almost the same time as Intel. X86 didn't exist at the time.

But yes, AMD was playing the "follow x86" game for a long time until they came up with x86-64, which evened the playing field in terms of architecture.

modeless6 months ago

ISAs are smaller and less stateful and better documented and less buggy and most importantly they evolve much more slowly than software APIs. Much more feasible to clone. Especially back when AMD started.

owlbite6 months ago

Code portability isn't performance portability, a fact that was driven home back in the bad old OpenCL era. Code is going to have to be rewritten to be efficient on AMD architectures.

At which point, why tie yourself to the competitor's language? Probably much more effective to just write a well-optimized library that serves MLIR or whatever API is popular in order to run big ML jobs.

whywhywhywhy6 months ago

> Why would this not be AMD’s top priority among priorities?

Same reason it wasn't when it was obvious Nvidia was taking over this space maybe 8 years ago now, when they let OpenCL die and then proceeded to do nothing until it was too late.

Speaking to anyone working in general-purpose GPU coding back then, they all just said the same thing: OpenCL was a nightmare to work with and CUDA was easy and mature compared to it. The writing was on the wall about where things were heading the second you saw a photon-based renderer running on GPU vs CPU all the way back then. AMD has only themselves to blame, because Nvidia basically showed them the potential with CUDA.

btown6 months ago

One would hope that they've learned since then - but it could very well be that they haven't!

largbae6 months ago

It certainly seems ironic that the company that beat Intel at its own compatibility game with x86-64 would abandon compatibility with today's market leader.

rob746 months ago

The situation is a bit different: AMD got its foot in the door with the x86 market because IBM back in the early 1980s forced Intel to license the technology so AMD could act as a second source of CPUs. In the GPU market, ATI (later bought by AMD) and nVidia emerged as the market leaders after the other 3D graphics pioneers (3Dfx) gave up - but their GPUs were never compatible in the first place, and if AMD tried to make them compatible, nVidia could sue the hell out of them...

alberth6 months ago

DirectX vs OpenGL.

This brings back memories of late 90s / early 00s of Microsoft pushing hard their proprietary graphic libraries (DirectX) vs open standards (OpenGL).

Fast forward 25-years and even today, Microsoft still dominates in PC gaming as a result.

There's a bad track record of open standard for GPUs.

Even Apple themselves gave up on OpenGL and has their own proprietary offering (Metal).

okanat6 months ago

OpenGL was invented at SGI and it was closed source until it was given away. It is very popular in its niche i.e. CAD design because the original closed source SGI APIs were very successful.

DirectX was targeted at gaming and was a much more limited, simpler API, which made programming games in it easier. It couldn't do everything that OpenGL can, which is why CAD programs didn't use it even on Windows. DirectX worked because it chose its market correctly and delivered what the customers wanted. Windows' exceptional backwards compatibility helped greatly as well. Many simple game engines still use the DX9 API to this day.

It is not so much about having an open standard as about being able to provide extra functionality and performance. Unlike the CPU-dominated areas where executing the common baseline ISA is very competitive, in accelerated computing using every single bit of performance and having new and niche features matters. So providing exceptional hardware with good software is critical for the competition. Closed APIs have much quicker delivery times and they don't have to deal with multiple vendors.

Nobody except Nvidia delivers good enough low level software and their hardware is exceptionally good. AMD's combination is neither. The hardware is slower and it is hard to program so they continuously lose the race.

incrudible6 months ago

To add to that, Linux gaming today is dominated by a wrapper implementing DirectX.

Zardoz846 months ago

Vulkan running an emulation of DirectX and being faster

pjmlp6 months ago

Also to note, despite urban myths, OpenGL never mattered on game consoles, which people keep forgetting when praising OpenGL "portability".

Then there is the whole issue of extension spaghetti, and incompatibilities across OpenGL, OpenGL ES and WebGL; it's hardly possible to have portable code 1:1 everywhere, beyond toy examples.

beebeepka6 months ago

I guess every recent not-xbox never mattered.

pjmlp6 months ago

Like Nintendo, SEGA and Sony ones?

Keyframe6 months ago

Let's not forget the Fahrenheit maneuver by Microsoft that left SGI stranded and OpenGL without a way forward.

pjmlp6 months ago

Yeah, it never mattered to game consoles either way.

iforgotpassword6 months ago

Someone built the same thing a while ago for Intel GPUs, I think even for the old pre-Xe ones. With Arc/Xe on the horizon, people had the same question: why isn't Intel sponsoring this or even building their own? It was speculated that this might get them into legal hot water with Nvidia, Google vs. Oracle was brought up, etc...

my1236 months ago

They financed the prior iteration of Zluda: https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq

but then stopped

formerly_proven6 months ago

> [2021] After some deliberation, Intel decided that there is no business case for running CUDA applications on Intel GPUs.

oof

Cheer21716 months ago

Are you freaking kidding me!?!? Fire those MBAs immediately.

AtheistOfFail6 months ago

> After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs.

Oof x2

chem836 months ago

To be fair to AMD, they've been trying to solve ML workload portability at more fundamental levels with the acquisition of Nod.ai and de-facto incorporation of Google's IREE compiler project + MLIR.

izacus6 months ago

Why do you think running after nVidia for this submarket is a good idea for them? The AMD GPU team isn't especially big and the development investment is massive. Moreover, they'll have the opportunity cost for projects they're now dominating in (all game consoles for example).

Do you expect them to be able to capitalize on the AI fad so much (and quickly enough!) that it's worth dropping the ball on projects they're now doing well in? Or perhaps continue investing into the part of the market where they're doing much better than nVidia?

nindalf6 months ago

AMD is betting big on GPUs. They recently released the MI300, which has "2x transistors, 2.4x memory and 1.6x memory bandwidth more than the H100, the top-of-the-line artificial-intelligence chip made by Nvidia" (https://www.economist.com/business/2024/01/31/could-amd-brea...).

They very much plan to compete in this space, and hope to ship $3.5B of these chips in the next year. Small compared to Nvidia's revenues of $59B (includes both consumer and data centre), but AMD hopes to match them. It's too big a market to ignore, and they have the hardware chops to match Nvidia. What they lack is software, and it's unclear if they'll ever figure that out.

incrudible6 months ago

They are trying to compete in the segment of data center market where the shots are called by bean counters calculating FLOPS per dollar.

latchkey6 months ago

That's why I'm going to democratize that business and make it available to anyone who wants access. How does bare metal rentals of MI300x and top end Epyc CPUs sound? We take on the capex/opex/risk and give people what they want, which is access to HPC clusters.

BearOso6 months ago

A market where Nvidia chips are all bought out, so what's left?

currymj6 months ago

everyone buying GPUs for AI and scientific workloads wishes AMD was a viable option, and this has been true for almost a decade now.

the hardware is already good enough, people would be happy to use it and accept that it's not quite as optimized for DL as Nvidia.

people would even accept that the software is not as optimized as CUDA, I think, as long as it is correct and reasonably fast.

the problem is just that every time i've tried it, it's been a pain in the ass to install and there are always weird bugs and crashes. I don't think it's hubris to say that they could fix these sorts of problems if they had the will.

throwawaymaths6 months ago

IIRC (this could be old news) AMD GPUs are preferred in the supercomputer segment because they offer better flops per unit of energy. However, without a CUDA-like you're missing out on the AI part of supercompute, which is an increasing proportion.

The margins on supercompute-related sales are very high. Simplifying, but you can basically take a consumer chip, unlock a few things, add more memory capacity, relicense, and your margin goes up by a huge factor.

anonylizard6 months ago

They are preferred not because of any inherent superiority of AMD GPUs, but simply because they have to price lower and accept lower margins.

Nvidia could always just halve their prices one day and wipe out every non-state-funded competitor. But Nvidia prefers to collect their extreme margins and funnel them into even more R&D in AI.

Symmetry6 months ago

It's more that the resource balance in AMD's compute line of GPUs (the CDNA ones) has been more focused on the double precision operations that most supercomputer code makes heavy use of.

throwawaymaths6 months ago

Thanks for clarifying! I had a feeling I had my story slightly wrong

jandrese6 months ago

If the alternative is to ignore one of the biggest developing markets then yeah, maybe they should start trying to catch up. Unless you think GPU compute is a fad that's going to fizzle out?

izacus6 months ago

One of the most important decisions a company can do, is to decide which markets they'll focus in and which they won't. This is even true for megacorps (see: Google and their parade of messups). There's just not enough time to be in all markets all at once.

So, again, it's not at all clear that AMD being in the compute GPU game is the automatic win for them in the future. There's plenty of companies that killed themselves trying to run after big profitable new fad markets (see: Nokia and Windows Phone, and many other cases).

So let's examine that - does AMD actually have a good shot of taking a significant chunk of market that will offset them not investing in some other market?

imtringued6 months ago

Are you seriously telling me they shouldn't invest in one of their core markets? The necessary investments are probably insignificant. Let's say you need a budget of 10 million dollars (50 developers) to assemble a dev team to fix ROCm. How many 7900 XTXs would they have to sell to break even on revenue? Roughly 9000. How many did they sell? I'm too lazy to count, but Mindfactory, a German online shop, alone sold around 6k units.

yywwbbn6 months ago

> So, again, it's not at all clear that AMD being in the compute GPU game is the automatic win for them in the future. There's

You’re right about that, but it seems pretty clear that not being in the compute GPU game is an automatic loss for them (look at their revenue growth in the past quarter or two in each sector).

thfuran6 months ago

Investing in what other market?

hnlmorg6 months ago

GPU for compute has been a thing since the 00s. Regardless of whether AI is a fad (it isn't, but we can agree to disagree on this one) not investing more in GPU compute is a weird decision.

carlossouza6 months ago

Because the supply for this market is constrained.

It's a pure business decision based on simple math.

If the estimated revenues from selling to the underserved market are higher than the cost of funding the project (they probably are, considering the obscene margins from NVIDIA), then it's a no-brainer.

bonton896 months ago

AMD also has the problem that they make much better margins on their CPUs than on their GPUs and there are only so many TSMC wafers. So in a way making more GPUs is like burning up free money.

yywwbbn6 months ago

Because their current market valuation was massively inflated because of the AI/GPU boom and/or bubble?

In a rational world, their stock price would collapse if they don't focus on it and are unable to deliver anything competitive in the upcoming year or two.

> of the market where they're doing much better than nVidia?

So the market that's hardly growing, that Nvidia is not competing in, and where Intel still has a bigger market share and is catching up performance-wise? AMD's valuation is this high only because they are seen as the only company that could directly compete with Nvidia in the data center GPU market.

FPGAhacker6 months ago

It was Microsoft’s strategy for several decades (outsiders called it embrace, extend, extinguish, only partially in jest). It can work for some companies.

geodel6 months ago

Well, the simplest reason would be money. There are few companies rolling in the kind of money Nvidia is, and AMD is not one of them. Cloud vendors would care a bit; for them it is just business: if Nvidia costs a lot more, they in turn charge their customers a lot more while keeping their margins. I know some people still harbor the notion that competition will lower prices, and it may, just not in the sense customers imagine.

imtringued6 months ago

This feels like a massive punch in the gut. An open-source project, not ruined by AMD's internal mismanagement, gets shit done within two years and AMD goes "meh"?!? There are billions of dollars on the line! It's like AMD actively hates its customers.

Now the only thing they need to do is make sure ROCm itself is stable.

phero_cnstrcts6 months ago

Because the two CEOs are family? Like literally.

CamperBob26 months ago

That didn't stop World War I...

irusensei6 months ago

I'll try it later with projects I had issues getting to work, like TortoiseTTS. I'm not expecting comparable Nvidia speeds, but definitely faster than pure CPU.

rasz5 months ago

>For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product.

classic AMD

pjmlp6 months ago

So polyglot programming workflows via PTX targeting are equally supported?

fancyfredbot6 months ago

Wouldn't it be fun to make this work on Intel graphics as well?

eqvinox6 months ago

Keeping my hopes curtailed until I see proper benchmarks…

mogoh6 months ago

Why is CUDA so prevalent as opposed to its alternatives?

smoldesu6 months ago

At first, it was because Nvidia had a wide variety of highly used cards that almost all support some form of CUDA. By-and-large, your gaming GPU could debug and run the same code that you'd scale up to a datacenter, which was a huge boon for researchers and niche industry applications.

With that momentum, CUDA got incorporated into a lot of high-performance computing applications. Few alternatives show up because there aren't many acceleration frameworks that are as large or complete as CUDA. Nvidia pushed forward by scaling down to robotics and edge-compute scale hardware, and now are scaling up with their DGX/Grace platforms.

Today, Nvidia is prevalent because all attempts to subvert them have failed. Khronos Group tried to get the industry to rally around OpenCL as a widely-supported alternative, but too many stakeholders abandoned it before the initial crypto/AI booms kicked off the demand for GPGPU compute.

JonChesterfield6 months ago

OpenCL was the alternative; it came along later and couldn't express a lot of programs that CUDA can. CUDA is legitimately better than OpenCL.
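
To illustrate the kind of gap being described (an illustrative sketch, not taken from the comment): CUDA device code is full C++, so things like templated kernels just work, which plain OpenCL C 1.x had no direct equivalent for.

    // template_kernel.cu -- templated device code, a convenience CUDA had that OpenCL C 1.x lacked.
    #include <cstdio>

    template <typename T>
    __global__ void axpy(int n, T a, const T *x, T *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1024;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        axpy<float><<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // instantiated at compile time
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);   // expect 5.0
        cudaFree(x); cudaFree(y);
        return 0;
    }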

cashsterling6 months ago

I feel like AMD's senior executives all own a lot of nVIDIA stock.

2OEH8eoCRo06 months ago

Question: Why aren't we using LLMs to translate programs to use ROCm?

Isn't translation one of the strengths of LLMs?

JonChesterfield6 months ago

You can translate CUDA to HIP using a regex. An LLM is rather overkill.
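
The mapping really is mostly textual, roughly what AMD's hipify-perl script automates. As a sketch, every runtime call in a small CUDA program has a near 1:1 HIP spelling (shown in the comments; the kernel itself is unchanged):

    // saxpy.cu -- each CUDA call below has a near 1:1 HIP spelling (see comments),
    // which is why a textual cuda->hip substitution (hipify-perl) gets most code across.
    #include <cuda_runtime.h>        // HIP: #include <hip/hip_runtime.h>
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *x, *y;
        cudaMalloc(&x, bytes);                     // HIP: hipMalloc(&x, bytes);
        cudaMalloc(&y, bytes);                     // HIP: hipMalloc(&y, bytes);
        cudaMemset(x, 0, bytes);                   // HIP: hipMemset(x, 0, bytes);
        cudaMemset(y, 0, bytes);                   // HIP: hipMemset(y, 0, bytes);

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);   // HIP supports <<<>>> too;
                                                          // hipify emits hipLaunchKernelGGL(...)
        cudaDeviceSynchronize();                   // HIP: hipDeviceSynchronize();

        cudaFree(x);                               // HIP: hipFree(x);
        cudaFree(y);                               // HIP: hipFree(y);
        return 0;
    }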

sam_goody6 months ago

Aside from the latest commit, there has been no activity for almost 3 years (latest code change on Feb 22, 2021).

People are criticizing AMD for dropping this, but it makes sense to stop paying for development when the dev has stopped doing the work, no?

And if he means that AMD stopped paying 3 years ago - well, that was before dinosaurs and ChatGPT, and a lot has changed since then.

https://github.com/vosen/ZLUDA/commits/v3

michaellarabel6 months ago

As I wrote in the article, it was privately developed the past 2+ years while being contracted by AMD during that time... In a private GitHub repo. Now that he's able to make it public / open-source, he squashed all the changes into a clean new commit to make it public. The ZLUDA code from 3+ years ago was when he was experimenting with CUDA on Intel GPUs.

EspadaV96 months ago

Pretty sure this was developed in private, but because AMD cancelled the contract he has been allowed to open source the code, and this is the "throw it over the fence" code dump.

rrrix16 months ago

This.

    762 changed files with 252,017 additions and 39,027 deletions.
https://github.com/vosen/ZLUDA/commit/1b9ba2b2333746c5e2b05a...
Ambroisie6 months ago

My thinking is that the dev _did_ work on it for X amount of time, but as part of their contract is not allowed to share the _actual_ history of the repo, thus the massive code dumped in their "Nobody expects the Red Team" commit?

rswail6 months ago

Have a look at the latest commit and the level of change.

Effectively the internal commits while he was working for AMD aren't in the repo, but the squashed commit contains all of the changes.

Zopieux6 months ago

If only this exact concern was addressed explicitly in the first FAQ at the bottom of the README...

https://github.com/vosen/ZLUDA/tree/v3?tab=readme-ov-file#fa...

SushiHippie6 months ago

The code prior to this was all for the Intel GPU ZLUDA, and then the latest commit is all the AMD ZLUDA code, hence why the commit talks about the red team.

Detrytus6 months ago

This is really interesting (from the project's README):

> AMD decided that there is no business case for running CUDA applications on AMD GPUs.

Is AMD leadership brain-damaged, or something?