Making C and Python Talk to Each Other

71 points | 3 days | leetarxiv.substack.com
hugs2 hours ago

This is one of the "killer apps" for Nim. Nim makes it easy to wrap C and easy to talk to Python (via Nimpy).

hughw2 hours ago

I realize I'm talking about C++ not C, but coincidentally just today I ported our 7 year old library's Swig/Python interface to nanobind. What a fragile c9k Swig has been all these years (don't touch it!) and the nanobind transformation is so refreshing and clean, and lots of type information suddenly available to Python programs. One day of effort and our tests all pass, and now nanobind seems able to allow us to improve the ergonomics (from the Python pov) of our lib.

rossant4 hours ago

My visualization library [1] is written in C and exposes a visualization API in C. It is packaged as a Python wheel using auto-generated ctypes bindings, which includes the shared library (so, dylib, or dll) and a few dependencies. This setup works very well, with no need to compile against each Python version. I only need to build it for the supported platforms, which is handled automatically by GitHub Actions. The library is designed to minimize the number of C calls, making the ctypes overhead negligible in practice.

[1] https://datoviz.org/
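
For readers unfamiliar with the approach, here is a minimal sketch of a hand-written ctypes binding. It is not Datoviz's generated code: it wraps `strlen` from the system C library purely as an illustration, it assumes a POSIX-like system where the C library can be loaded, and `c_strlen` is a hypothetical helper name.

```python
import ctypes
import ctypes.util

# Locate the system C library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process
# (POSIX behaviour; an assumption of this sketch).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical helper: byte length of text as counted by C's strlen."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("hello"))
```

Because the binding is resolved at runtime, nothing here is compiled against a specific Python version, which is the property the comment above relies on.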

muragekibicho3 days ago

Lots of articles focus on Cython and optimizing Python using C code.

This article is about embedding Python scripts inside a C codebase.

kvemkon5 hours ago

When I once needed to implement a simple Python plugin engine in C/C++ software, I successfully used the official guide [1].

[1] https://docs.python.org/3/extending/embedding.html

jebarker7 hours ago

Lots of people argue that AI R&D is currently done in Python because of the benefits of the rich library ecosystem. This makes me realize that's actually a poor reason for everything to be in Python since the actually useful libraries for things like visualization could easily be called from lower level languages if they're off the hot path.

crote6 hours ago

> could easily be called from lower level languages

Could? Yes. Easily? No.

People write their business logic in Python because they don't want to code in those lower-level languages unless they absolutely have to. The article neatly shows the kind of additional coding overhead you'd have to deal with - and you're not getting anything back in return.

Python is successful because it's a high-level language which has the right tooling to create easy-to-use wrappers around low-level high-performance libraries. You get all the benefits of a rich high-level language for the cold path, and you only pay a small penalty over using a low-level language for the hot path.
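
A rough way to see the hot-path/cold-path difference on your own machine is sketched below. It assumes CPython, where the built-in `sum` is implemented in C; the function names `python_sum` and `builtin_sum` are made up for this illustration, and actual timings will vary by machine and Python version.

```python
import timeit


def python_sum(values):
    """Sum the values with an explicit loop executed by the interpreter."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Sum the values with the built-in sum(), implemented in C in CPython."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(100_000))
    for fn in (python_sum, builtin_sum):
        # Time 20 calls of each variant on the same input.
        seconds = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Both functions return the same result; only where the loop runs differs.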

jebarker5 hours ago

The problem I see (day to day working on ML framework optimization) is that it's not just a case of Python calling lower level compiled code. PyTorch, for example, has a much closer integration of Python and the low level functions than that and it does cause performance bottlenecks. So in theory I agree that using high level languages to script calls to low level is a good idea, but in practice that gets abused to put Python in the hot path. Perhaps if the lower level language were the bulk of the framework and just called Python for helper functions we'd see better performance-aware design from developers.

yowlingcat5 hours ago

> but in practice that gets abused to put python in the hot path

But if that's an abuse of the tools (which I agree with) how does that make it the fault of the language rather than the user or package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages) rather than the other way around -- and so in your example, wouldn't the solution just be to address the abuse in pytorch rather than throw away the entire universe within which it's already functionally working?

jebarker4 hours ago

The problem is that Python allows people to be lazy and ignore subtle performance issues. That's much harder in a lower level language. Obviously the tradeoff is that it'd slow down (or completely stop) some developers. I'm really just wondering out loud if the constraints of a lower level language would help people write better code in this case and whether that trade-off would be worth it.

efavdb3 hours ago

FWIW I would be up to write in C or something else, but use Python for the packages / network effects.

ashishb6 hours ago

I rewrote a simple RAG ingestion pipeline from Python to Go.

It reads from a database. Generates embeddings. Writes it to a vector database.

  - ~10X faster
  - ~10X lower memory usage

The only problem is that you have to spend a lot of time figuring out how to do it.

All instructions on the Internet and even on the vector database documentation are in Python.

chpatrick5 hours ago

If speed and memory use aren't a bottleneck then "a lot of time figuring out how to do it" is probably the biggest cost for the company. Generally these things can be run offline and memory is fairly cheap. You can get a month of a machine with a ton of RAM for the equivalent of one hour of developer time of someone who knows how to do this. That's why Python is so popular.

kgeist4 hours ago

>I rewrote a simple RAG ingestion pipeline from Python to Go

I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.

For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too.

ashishb4 hours ago

> I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too

There are Python libraries that will simplify the task by giving a better structure to your problem. The knobs will be fewer and more high-level.

giancarlostoro7 hours ago

I think it's more than just the available libraries. I think the industry has just predominantly preferred Python. Python is a really rich modern language. It might be quirky, but so is every single language you can name. Nothing is quite as quirky as JavaScript though, maybe VB6 but that's mostly dead, though slightly lingering.

Mind you I've programmed in all the mentioned languages. ;)

whattheheckheck6 hours ago

It's the ease of distribution of packages and big functionality being a pip install away

kstrauser6 hours ago

That's the killer feature. Whatever it is you want to do, there's almost certainly a package for it. The joke is that Python's the second best language for everything. It's not the best for web backends, but it's pretty great. It's not the best for data analysis, but it's pretty great. It's not the best at security tooling, but it's pretty great. And it probably is the best language for doing all three of those things in one project.

wallunit6 hours ago

This is actually rather a reason to avoid Python in my opinion. You don't want pip to pollute your system with untracked files. There are tools like virtualenv to contain your Python dependencies but this isn't the default, and pip is generally rather primitive compared to npm.

bee_rider5 hours ago

Ubuntu complains now if you try to use pip outside a virtual environment… I think things are in a basically ok state as far as that goes.

Arguably it could be a little easier to automatically start up a virtual environment if you call pip outside of one… but, I dunno, default behavior that papers over too many errors is not great. If they don’t get a hard error, confused users might become even more confused when they don’t learn they need to load a virtual environment to get things working.
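
For scripts that want to fail loudly themselves, a small sketch of detecting an active environment follows. It assumes environments created with the standard `venv` module (Python 3.3+), and `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"virtual environment: {sys.prefix}")
    else:
        print("no virtual environment active; create one with: python -m venv .venv")
```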

kristjansson6 hours ago

One ... could? But it doesn't seem particularly ergonomic.

jebarker5 hours ago

Ergonomics isn't the point, performance is.

mkoubaa3 hours ago

Nobody has ever, in the history of Python, called the Python C API easy.

dexzod3 hours ago

The title of the article is misleading. Making C and Python talk to each other implies calling Python from C and calling C from Python. The article only covers the former.

eth_hack775 hours ago

Thanks a lot for the article. Here's a QQ: did you measure the time of some basic operations python vs C? (e.g. if I do a loop of 10 billion iterations, just dividing numbers in C and do the same in python, and then import these operations into one another as libraries, does anything change?)

I'm a beginner engineer so please don't judge me if my question is not making perfect sense.

bdbenton52554 hours ago

C is many magnitudes faster than Python and you can measure this using nested conditionals. Python is built for a higher level of abstraction and this comes at the cost of speed. It is what makes it very natural and human-like to write in.

xandrius4 hours ago

Syntax has nothing to do with the speed of the language: python could be "natural" and "human-like" while being much faster and also "unnatural" and "inhuman" while being slower.

bdbenton52554 hours ago

It does, actually, as the syntax is a result of the language's design and a simpler and more human-like syntax requires a higher level of abstraction that reduces efficiency.

The design of a language, including its syntax, has a great bearing on its speed and efficiency.

Compare C with Assembly, for example, and you will see that higher level languages take complex actions and simplify them into a more terse syntax.

You will also observe that languages such as Python are not nearly as suitable for lower level tasks like writing operating systems where C is much more suitable due to speed.

Languages like Python and Ruby include a higher level of built-in logic to make writing in them more natural and easy at the cost of efficiency.

johannes12343214 hours ago

Then let's look at C++, which in some areas has a higher abstraction level than C, but in some areas can still be faster than C. (This is due to the use of templates, which inline the library code so it can be optimized for the actual types, rather than using library functions that take void pointers, which require a function call and have a less optimized compiled form.)

The main thing about Python being slower is that in most contexts it is used as an interpreted (bytecode-compiled) language running on its own VM in CPython.
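
To make the VM point concrete, the standard `dis` module shows the bytecode CPython compiles a function into. This is only a sketch: `divide` and `opnames` are names invented for the illustration, and the exact instruction names differ between CPython versions, so no particular listing is shown.

```python
import dis


def divide(a, b):
    return a / b


def opnames(func):
    """Return the names of the bytecode instructions CPython compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    # Print a human-readable disassembly of the function's bytecode.
    dis.dis(divide)
```

Each of those instructions is dispatched by the interpreter loop at runtime, whereas a C compiler would typically reduce the same division to a few machine instructions.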

throwaway3141554 hours ago

Language abstractions that are not "zero-cost" inevitably lead to worse performance. Python has many such abstractions designed to improve developer experience. I think that's all the person you're responding to meant.

SandmanDP7 hours ago

I’ve been curious, what are the motivations for most projects to use Lua for enabling scripting in C over this? Is the concern around including an entire Python interpreter in a project and Lua is lighter?

crote5 hours ago

Lua is absolutely trivial to isolate. As the embedder, you have complete control over what the interpreter and VM are doing. Don't want your Lua scripts to have file access? Don't hook up those functions and you're done. Want to guard against endless loops? Tell the VM to stop after 10,000 instructions. Want to limit the amount of memory a script can use? Absolutely trivial. This makes Lua very attractive for things like game development. You can run untrusted addon code without any worry that it'll be able to mess up the game - or the rest of the system.

Doing the same with Python is a lot harder. Python is designed first and foremost to run on its own. If you embed Python you are essentially running it alongside your own code, with a bunch of hooks in both directions. Running hostile Python code? Probably not a good idea.

OskarS5 hours ago

Another thing to mention is that until very recently (Python 3.12, I think?) every interpreter in the address space shared a lot of global state, including most importantly the GIL. For my area (audio plugins) that made Python a non-starter for embedding, while Lua works great.

I agree though: biggest reason is probably the C API. Lua's is so simple to embed and to integrate with your code-base compared to Python. The language is also optimized for "quick compiling", and it's also very lightweight.

These days, however, one might argue that you gain so much from embedding either Python or JavaScript, it might be worth the extra pain on the C/C++ side.

bandoti6 hours ago

Lua is much lighter but the key is that it’s probably one of the easiest things to integrate (just copy the sources/includes and add them to build it’ll work)—like a “header only” kind of vibe.

But, you can strip down a minimal Python build and statically compile it without too much difficulty.

I tend to prefer Tcl because it has what I feel is the perfect amount of functionality by default with a relatively small size. Tcl also has the better C APIs of the bunch if you’re working more in C.

Lua is very “pushy” and “poppy” due to its stack-based approach, but that can be fun too if you enjoy programming RPN calculators haha :)

spacechild15 hours ago

People already mentioned that Lua is very lightweight and easy to integrate. It's also significantly faster than Python. (I'm not even talking about LuaJIT.)

Another big reason: the Lua interpreter does not have any global variables (and therefore also no GIL) so you can have multiple interpreters that are completely independent from each other.

90s_dev7 hours ago

Network effect.

a_t487 hours ago

Useful, I’m going to be doing something similar w/C++ soon.

brcmthrowaway4 hours ago

How does this compare to pybind11?

nubinetwork4 hours ago

Isn't this the whole point of cffi and Cython?

softwaredoug4 hours ago

Definitely, though Cython is a layer of abstraction that might feel like Python but has all kinds of weirdness, so you might as well write in a better-understood language like C.