OCaml's Wings for Machine Learning

108 points · 2 months · github.com

toolslive • 2 months ago

Am I the only one who doesn't like notebooks?

I don't want to write code in a browser. For Python you have something like IPython, which gives you an interactive experience while still letting you use your favourite editing environment. For OCaml, surely there are also REPLs that provide this kind of thing.

TJSomething • 2 months ago

My issue with REPLs is that they're often too transient. Sometimes I make a mistake on step 2 that I realize while iterating on step 10 of exploring some data. Then I need to iterate on step 2 a little, taking advantage of the ability to easily just run some of the calculations. Once I've got that, I can run the whole notebook again and make sure all my intermediate graphs look good.

arbitrandomuser • 2 months ago

The alternative is not programming purely in the REPL, but having an editor where you type out your code and from which you can send bits of text to the REPL to run and get feedback. All major editors have plugins to do this.

neonsunset • 2 months ago

Me too. I just write .fsx scripts (F# Interactive) instead; they work nicely with Plotly.NET and can crunch/parallelize well if you want to do something over many files.

nathan_compton • 2 months ago

I hate 'em. They make people write bad code, and using them for interactive development is much worse than using a REPL. I cannot understand what people see in them.

pletnes • 2 months ago

I use my IDE when I care about the production code, but I prefer notebooks when I’m just using code to understand data through visualization and statistics.

It drives me nuts that some people see this as «either or».

LINQPad is another, different take on data-driven environments, as are various «SQL admin» alternatives.

nathan_compton • 2 months ago

I love my fellow humans and do not begrudge you the workflow that you feel works for you.

nsingh2 • 2 months ago

My workflow is to keep the notebook as thin as possible, and concise enough to fit in my brain. I explore the problem space and write some functions and data structures directly in the notebook, but as soon as I can, I refactor them into separate modules that I then import.

If I don’t, the notebook quickly becomes bloated, I start getting paranoid about stale cells and lose track of what I'm actually trying to do.

Ideally, the notebook’s sole purpose is interactivity, a kind of quick-and-dirty frontend for when it's not worth writing an actual one; everything infrastructural gets moved out.

aldanor • 2 months ago

You can write great code in notebooks; it all depends on you. E.g., if you use VS Code rather than the browser for Jupyter notebooks, you get the same type checking and formatting as in the editor itself, if you have them enabled.

For interactive development, the REPL is a joke. Sketching out a fully working prototype of a few hundred lines of code (possibly depending on some data and context you don't want to reload) is impossible in a REPL.

patagurbon • 2 months ago

They’re great for teaching. But I agree for actual work.

az09mugen • 2 months ago

I dislike notebooks in the first place for a totally different reason. I'm really attached to my keyboard shortcuts: when I'm in a text editor or an IDE I have one specific "layout", but in the browser another. Some of my text-editing shortcuts conflict with those in the browser (looking at you, Ctrl-W).

I don't want to adapt: the browser shortcuts stay in the browser, the text-editing shortcuts stay in the editor, with no overlap.

bb1234 • 2 months ago

I don't like them either, and I find it hard to articulate why. I have definitely experienced problems where they get into some state in which the cell results are incorrect; then, if I restart the kernel and run the cells again, I get the correct result. But I dislike them for reasons beyond this one, and I can't explain why. I prefer the IPython REPL to the notebook.

sahilagarwal • 2 months ago

It's a bit of nostalgia for me. It took some work to understand IPython when I first started as a programmer, but that effort helped me in the long run. Using ipdb for breakpoints was a game changer in my first job.

It was also a good way to get comfortable using the terminal.

Using notebooks removes all these learnings. I dislike it because it makes for less confident programmers in the long run.

abathologist • 2 months ago

OCaml has utop, down, and the unadorned ocaml top-level.

mattpallissard • 2 months ago

Obligatory, Joel Grus: I don't like notebooks.

https://www.youtube.com/watch?v=7jiPeIFXb6U

And the slides; https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...

FrustratedMonky • 2 months ago

Is anybody building things like this using F#? It seems like F# would have more of an ecosystem for machine learning and AI than OCaml, while offering much of OCaml's functionality.

rybosome • 2 months ago

As it happens, I’ve been thinking about a library I’d like to develop in F# as an exercise to learn the language.

I have a hand-rolled proxying inference framework written in Python that is similar in purpose to something like LangChain but much more stripped down, less abstraction. Similarly to many other Python tools, it leverages the reflective capabilities of the language to do things like ask LLMs for responses conforming to a data class, or pass native Python functions as tools. Best of all, it relies on native Python constructs like docstrings to provide additional context to the inference APIs, making clean and well documented code a secondary programming model in a sense.

Perhaps it’s vanity, but at least in Python I find the resulting code quite elegant. I became curious what it would be like to port this to other languages, and surprisingly found that F# would, to my eyes, end up with the most lovely analogue.

Even languages I expected to be expressive and terse, like Haskell, couldn’t express the same ideas as understandably yet concisely as F#.

nickpeterson • 2 months ago

It’s a true shame F# isn’t more popular. It’s a great language/runtime combination for doing work quickly, correctly.

greener_grass • 2 months ago

A small community but some things going on:

- https://github.com/dotnet/TorchSharp

- https://github.com/DiffSharp/DiffSharp

- https://github.com/fsprojects/Furnace

- https://github.com/plotly/Plotly.NET

brianberns • 2 months ago

Yes, I use F# with TorchSharp to do machine learning. My most recent project is a Hearts AI that is quite good. Here are some links:

* https://github.com/brianberns/Hearts

* https://github.com/brianberns/MinGptSharp

* https://github.com/brianberns/ModestGpt

* https://github.com/brianberns/DeepKuhnPoker

nextos • 2 months ago

F# has bindings to Infer.NET, by MS Research, which is incredibly good for some classes of probabilistic models and very mature.

In particular, it shines on very large models or where fast quasi real-time inference is required.

StopDisinfo910 • 2 months ago

F# is a nice language if you want to use the .NET ecosystem, but it’s basically OCaml without the one thing that makes OCaml interesting (parametrised modules).
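For readers who haven't met parametrised modules (functors), a minimal sketch may help: `MakeSet` below is a module that takes another module satisfying the `ORDERED` signature as a parameter and builds a set implementation from it. `ORDERED`, `MakeSet`, `IntSet` and `StringSet` are illustrative names only; the standard library's real version of this idea is `Set.Make`.

```ocaml
(* A signature: any module with a type t and a total order on it *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor: a module parametrised by any module matching ORDERED *)
module MakeSet (O : ORDERED) = struct
  type elt = O.t
  type t = elt list (* invariant: sorted by O.compare, no duplicates *)

  let empty : t = []

  (* Insert x, keeping the list sorted and duplicate-free *)
  let rec add (x : elt) (s : t) : t =
    match s with
    | [] -> [ x ]
    | y :: rest ->
        let c = O.compare x y in
        if c = 0 then s else if c < 0 then x :: s else y :: add x rest

  let mem (x : elt) (s : t) = List.exists (fun y -> O.compare x y = 0) s
  let to_list (s : t) : elt list = s
end

(* The same functor instantiated with two different stdlib modules *)
module IntSet = MakeSet (Int)
module StringSet = MakeSet (String)

let () =
  let s = IntSet.(empty |> add 3 |> add 1 |> add 3) in
  List.iter (Printf.printf "%d ") (IntSet.to_list s);
  print_newline ()
```

The point is that `IntSet` and `StringSet` share one implementation, and the type checker enforces that each is only ever used with its own element type.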

neonsunset • 2 months ago

Besides parametrized modules, what else do you think F# is missing? To me it seems like a 10x more powerful choice because of its much richer GUI and web framework options, great tooling, and high flexibility as a language (computation expressions; it also handles scripting, multitasking/multithreading, and even low-level work when needed quite well).

deredede • 2 months ago

This looks interesting, it's great to see more machine learning efforts in typed languages.

I'm a bit surprised to see no mention of Owl (https://github.com/owlbarn/owl), an older project for scientific computing in OCaml that was resurrected recently. I wonder how they compare.

The Raven README mentions:

> We prioritize developer experience and seamless integration.

so maybe that's one difference — I used Owl on a course project about a decade ago, and while it got the job done, I remember the experience being rather painful compared to Numpy (even though I was more experienced with OCaml than with Python at the time).

UncleOxidant • 2 months ago

Wasn't there something about Owl being "concluded" about a year ago because the 2 developers no longer had time for the project? Is Raven the successor to Owl?

deredede • 2 months ago

It was, but apparently they changed their mind.

https://discuss.ocaml.org/t/owl-project-restructured/14226

Too early to tell what it will lead to.

> Is Raven the successor to Owl?

That's what I am wondering too (especially with the name), but I couldn't find a reference to Owl in the repo.

behnamoh • 2 months ago

OCaml surprises me: it's old enough to be mature in terms of features and libraries, and it has a small but enthusiastic community, but every time I tried to convince myself to use OCaml, I found myself more drawn to Haskell and Elixir.

giraffe_lady • 2 months ago

I like OCaml for the things other people like Go for. It's a grimy roughneck language. Not a lot of fun to play around with or explore ideas in, but in my experience codebases written in it are stable, age well, and are easy to maintain.

Elixir vs. OCaml: I use both languages, but for such completely different things that I don't even think about comparing them. Elixir is for when the problem I have suits the BEAM's strengths.

noelwelsh • 2 months ago

It's unfortunate that the cleaned-up syntax never took off, and that OCaml dropped the ball on multicore for over a decade. If OCaml had had decent multicore around 2010 or so, the current programming language landscape could look very different.

StopDisinfo910 • 2 months ago

Python had no multicore during the same period, and that never prevented it from becoming successful. Plus, OCaml always had a decent solution for concurrent I/O. The absence of multicore is a complete red herring when it comes to why OCaml isn't more successful.

OCaml's issue was never the syntax, which is completely fine. The current syntax is actually a lot nicer than what Facebook proposed. OCaml's issue is that it isn't a USA-born project and doesn't have a significant marketing push in English.

Plus, OCaml was always too far ahead of its time (including now, with its effect system). First, you have the functional approach, which was already very unfamiliar to most. Then you add module-level programming on top, which is still very unfamiliar to most. Just look at this comment page and the people thinking OCaml is not fun to use or less interesting than Haskell; it's truly sad.

Multicore has added the extremely promising effect system, but once again that's a step too far for most current developers.

In a lot of ways, OCaml is to programming languages what the Pixies are to rock music. Everyone who fell deeply in love with it went on to write a language of their own. Some got really successful.
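For readers unfamiliar with the effect system mentioned above: in OCaml 5, code can perform an effect and leave it to an enclosing handler to decide what happens, with the handler resuming the computation through a continuation. Below is a minimal sketch using the stdlib `Effect` and `Effect.Deep` modules (OCaml 5.0+); the `Ask` effect and the `add_two_asks` and `run_with` functions are hypothetical names chosen for illustration.

```ocaml
open Effect
open Effect.Deep

(* Declare an effect: "ask the surrounding handler for an int" *)
type _ Effect.t += Ask : int Effect.t

(* Performs Ask twice without knowing who will answer *)
let add_two_asks () : int = perform Ask + perform Ask

(* A deep handler that answers every Ask with the fixed value v
   by resuming the suspended computation via its continuation k *)
let run_with (v : int) (f : unit -> int) : int =
  try_with f ()
    {
      effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Ask -> Some (fun (k : (a, _) continuation) -> continue k v)
          | _ -> None);
    }

let () = Printf.printf "%d\n" (run_with 20 add_two_asks)
```

Running it prints 40: each `perform Ask` is answered with 20. Swapping in a different handler changes what `Ask` means without touching `add_two_asks`, which is the kind of decoupling that makes effects attractive.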

aseipp • 2 months ago

OCaml's syntax is not what's "holding it back" in my personal opinion, but that doesn't mean it's "fine". I regularly praise it as a language (even as a former developer of the main Haskell compiler, which adopted Mixin-style modules decades after the fact) and "the syntax looks like shit" is absolutely one of the first immediate turn-offs for pretty much everyone I talk to. It's also one of my own complaints, but OCaml has a lot of good stuff too so I can put up with it.

dismalaf • 2 months ago

> Ocaml issue is not being a USA-born project

Tons of languages started outside the US. Python and Ruby being notable.

GIFtheory • 2 months ago

100% agreed. The moment I saw the “revised” syntax I had a real “what were they thinking” moment. You’re supposed to take the good parts of a good thing and add them to a bad thing, not the other way around!

Really, OCaml’s syntax is beautiful, unless you’re one of those people who loves to show off how adept they are at typing matching braces, parentheses, commas, and semicolons. Writing a little lambda in C++ is an impressive display of manual dexterity… [&](…,…){… return …; }; Who would rather write that than let f x y = … in?

The only way I’d improve OCaml's syntax would be to add something like Python-style list comprehensions.
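Stock OCaml has no comprehension syntax, but the standard library's list combinators get fairly close. Below is a minimal sketch pairing two Python comprehensions with their OCaml equivalents; `even_squares` and `ordered_pairs` are illustrative names, and `List.filter_map` (OCaml 4.08+) and `List.concat_map` (OCaml 4.10+) are the stdlib functions doing the work.

```ocaml
(* Python: [x * x for x in xs if x % 2 == 0] *)
let even_squares (xs : int list) : int list =
  List.filter_map (fun x -> if x mod 2 = 0 then Some (x * x) else None) xs

(* Python: [(x, y) for x in xs for y in ys if x < y]
   The outer "for" becomes concat_map, the inner "for ... if" a filter_map *)
let ordered_pairs (xs : int list) (ys : int list) : (int * int) list =
  List.concat_map
    (fun x -> List.filter_map (fun y -> if x < y then Some (x, y) else None) ys)
    xs

let () =
  even_squares [ 1; 2; 3; 4 ] |> List.iter (Printf.printf "%d ");
  print_newline ()
```

The nesting order of `concat_map` calls mirrors the left-to-right order of the `for` clauses in the Python version.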

giraffe_lady • 2 months ago

Yeah, though IIRC they did have to rewrite the runtime to get multicore, so who knows what tradeoffs they'd have had to make to have it that way from the beginning. A lot of what made it good (to the extent it was) in the 2000s was that you got a very sophisticated type and module system and a fast compiler without giving up any runtime performance compared to its peers. I don't know if that would have been achievable with its dev resources alongside early multicore. I don't know that it wouldn't either, though. Just a big what-if all round.

I suspect a larger, or at least comparable, limitation was essentially pretending Windows didn't exist for, uh, like thirty years. If you knew what you were doing you could cross-compile for it, but it was not easy. Getting a dev environment running on Windows was basically impossible until like five years ago.

The syntax, idk, I don't have strong feelings about it. I initially recoiled like everyone else, of course, but to me style and naming conventions are almost as important, and on that front OCaml's are also among the worst in the world lol. Once you get used to it, it's kind of endearing how fucked up it is.

behnamoh • 2 months ago

> Its unfortunate the cleaned up syntax never took off, and that OCaml dropped the ball on multicore for over a decade.

It just shows the mindset of its devs was a little behind the realities of the industry, or they simply didn't care about concurrency.

In comparison, I like how Python always tries to be on top of things by exploring new PEPs.

ackfoobar • 2 months ago

To be pedantic, concurrency in OCaml has been fine with lwt. Single threaded concurrency, not unlike how JS does it.

toolslive • 2 months ago

I'm going to be sarcastic _and_ on topic (you baited me): yeah, like how Python completely solved concurrency and parallelism.

abathologist • 2 months ago

OCaml has had great concurrency support for years and years. Multicore is only a question of shared memory parallelism.
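To make the concurrency/parallelism distinction concrete: libraries like Lwt interleave tasks on a single core, while OCaml 5 domains run code truly in parallel over one shared heap. Below is a minimal sketch using the stdlib `Domain.spawn` and `Domain.join` (OCaml 5.0+); `sum_range` and `parallel_sum` are hypothetical helper names, and splitting the work in two is just the simplest possible partition.

```ocaml
(* Sequentially sum a.(lo) .. a.(hi - 1) *)
let sum_range (a : int array) (lo : int) (hi : int) : int =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + a.(i)
  done;
  !acc

(* Sum the left half in a new domain and the right half in the current one;
   both domains read the same array in shared memory, no copying needed *)
let parallel_sum (a : int array) : int =
  let n = Array.length a in
  let mid = n / 2 in
  let left = Domain.spawn (fun () -> sum_range a 0 mid) in
  let right = sum_range a mid n in
  Domain.join left + right

let () =
  let a = Array.init 1_000 (fun i -> i) in
  Printf.printf "%d\n" (parallel_sum a)
```

Because the two halves only read the array and each writes to its own local accumulator, no locking is required here; shared mutable state written from several domains would need `Atomic` or `Mutex`.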

cube2222 • 2 months ago

Fingers crossed, though I’m not holding my breath for anything taking a sizable bite out of Python in the area of ML/DL.

OCaml seems to be a lovely language based on my limited experience with it. It’s a pity it’s not more popular.

lawnchair • 2 months ago

Terrateam is hiring ;)

evacchi • 2 months ago

so finally someone is actually putting ML in ML