a contest is a measurement ritual
an experiment is a measurement ritual
an election is a measurement ritual
At some point, while trying to wrap my head around the Standard Model, I realized most of science is about measurement rituals: records of measurements, repeating and/or replicating measurement rituals. It all basically revolves around measurement, instruments of measurement, keeping historical records of the measured data and of which instruments were used, and so on...
A lot of the "science" we do is running experiments on groups of humans, giving them surveys, and treating the results as objective. In how many places could we do much better by surveying a specific AI instead?
It may not be objective, but at least it's consistent, and it reflects something about the default human position.
For example, there are no good ways of measuring the amount of technical debt in a codebase. It's such a fuzzy question that only subjective measures work. But what if we showed the AI one file at a time, asked "Rate, 1-10, the comprehensibility, complexity, and malleability of this code," and then averaged across the codebase? Then we'd get a measure of tech debt, which we can track over time to see whether it's rising or falling. The AI makes subjective measurements consistent.
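For instance, a minimal sketch of that loop, assuming the OpenAI Python client; the exact prompt, model name, and score parsing are illustrative, not a tested methodology:

```python
import json
from pathlib import Path

from openai import OpenAI  # assumes the openai package (>=1.0) is installed

client = OpenAI()

PROMPT = (
    "Rate this code from 1-10 on comprehensibility, complexity, and "
    "malleability. Reply with JSON only, e.g. "
    '{"comprehensibility": 7, "complexity": 4, "malleability": 6}.\n\n'
)

def rate_file(path: Path) -> dict:
    """Survey the model about a single file, returning its three scores."""
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # pin a dated snapshot so the "instrument" stays fixed
        temperature=0,              # make the ritual as repeatable as the API allows
        messages=[{"role": "user", "content": PROMPT + path.read_text()}],
    )
    return json.loads(resp.choices[0].message.content)

def tech_debt_score(repo: Path) -> float:
    """Average all three axes across every file into one trendable number."""
    scores = [rate_file(p) for p in repo.rglob("*.py")]
    if not scores:
        return 0.0
    return sum(sum(s.values()) / 3 for s in scores) / len(scores)
```

Pinning a dated model snapshot and using zero temperature keeps the instrument as fixed as the provider allows, which matters if you want the trend line to mean anything.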
This essay introduces such a cool new idea, while only scratching the surface.
> it reflects something about the default human position
No it doesn't. Nothing that comes out of an LLM reflects anything except the corpus it was trained on and the sampling method used. That's definitionally true, since those are the very things it is a product of.
You get NO subjective or objective insight from asking the AI about "technical debt"; you only get an opaque statistical metric that you can't explain.
If you knew that the model never changed, it might be very helpful, but most of the big providers constantly mess with their models.
Still reading TFA, but this made me rage:
> In fact, we as engineers are quite willing to subject each other to completely inadequate tooling, bad or missing documentation and ridiculous API footguns all the time. “User error” is what we used to call this, nowadays it's a “skill issue”. It puts the blame on the user and absolves the creator, at least momentarily. For APIs it can be random crashes if you use a function wrong
I recently implemented Microsoft's MSAL authentication on iOS, which includes, as you might expect, a function that retrieves the authenticated accounts. Oh sorry, I said function, but there are actually two: one that retrieves a single account, and one that retrieves multiple accounts. Odd, but harmless enough, right?
Wrong, because whoever designed this had an absolutely galaxy-brained moment and decided that if you try to retrieve one account when multiple accounts are signed in, then instead of, oh I dunno, just returning an error, or perhaps returning the most recently used account, no no no, the right thing to do is throw an exception and crash the fucking app.
I just. Why. Why would you design anything this way!? I can't fathom any situation where you'd use the one-account function when the multi-account one does the exact same fucking thing, notably WITHOUT the potential to cause a CRASH, and just returns a set of one. And if you were REALLY INTENT ON offering a function that only returns one account, why wouldn't it just call the other function and return Accounts.first? (Sketch of what I mean below.)
</rant>
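A minimal sketch of that design, in Python to keep it readable (the real MSAL API is Swift/Objective-C; these names are made up, not Microsoft's):

```python
def get_accounts() -> list[str]:
    """Stand-in for the SDK's multi-account call; returns whatever is signed in."""
    return ["alice@example.com", "bob@example.com"]

def get_account() -> str | None:
    """The one-account convenience as it arguably should be: no exception,
    just delegate to the multi-account call and take the first result."""
    accounts = get_accounts()
    return accounts[0] if accounts else None
```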
Call me a web programmer, but
isn't this actually an issue of a missing try-except (or try-catch, if not Python) block higher up the call stack?
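Concretely, something like this, with hypothetical names standing in for whatever the SDK actually throws:

```python
class MultipleAccountsError(Exception):
    """Stand-in for the exception the real SDK raises."""

SIGNED_IN = ["alice@example.com", "bob@example.com"]

def sdk_get_account() -> str:
    """Models the behavior the rant describes: throws when >1 account is signed in."""
    if len(SIGNED_IN) > 1:
        raise MultipleAccountsError("multiple accounts signed in")
    return SIGNED_IN[0]

# Higher up the call stack, catch instead of crashing:
try:
    account = sdk_get_account()
except MultipleAccountsError:
    account = SIGNED_IN[0]  # fall back, e.g. to the most recently used account
```

That would stop the crash, though every caller would still have to remember the footgun exists.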