Username ending with MIME type format is not allowed

wcoenen • 4 years ago

I'm a little confused about the issue description, because "mov" is not a MIME type.

Examples of MIME types: "text/plain", "text/html", "image/png" "application/pdf", "video/quicktime", ...

If I was prevented from using the username "wcoenentext/html", then I wouldn't really be bothered by that. (Although I might question the design decisions that would necessitate such a restriction.)

scblzn • 4 years ago

Hello,

I’m the author of the issue on Gitlab (small world, isn’t it ?)

Yes the message is confusing and I agree that .mov isn’t a MIME type but I was merely reporting the error message shown ( plus, they added .mov in their list of file types and had aliased it to .mp4 format, please see: https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/in... )

codetrotter • 4 years ago

> they added .mov in their list of file types and had aliased it to .mp4 format

That’s weird. Why’d they do that. They should make a separate entry for mov and associate it with video/quicktime

Guess it might be something related to https://stackoverflow.com/a/44785870 but like they point out, mov is a container format that can contain one of many different codecs used. And isn’t mp4 just a container too? Referring to mov files as video/mp4 seems straight up incorrect to me

robertony • 4 years ago

Modern mov files are just mp4 containers.

+2

codetrotter • 4 years ago

john_cogs • 4 years ago

A GitLab team member has opened a merge request to make the error message more clear: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70374/...

splintercell • 4 years ago

Instead of saying with ‘a file extension’, It should say with ‘a reserved file extension’.

+1

john_cogs • 4 years ago

eyelidlessness • 4 years ago

Damn and I was about to start registering usernames like for json/bourne but I guess that’s overly specific.

waspight • 4 years ago

That will be the name of my next child.

anonymousiam • 4 years ago

Better than Bobby Tables.

https://xkcd.com/327/

pulse7 • 4 years ago

Please don't do that... Elon Musk's son is named "X Æ A-12", don't follow his steps...

+7

ben_w • 4 years ago

+4

bierjunge • 4 years ago

zodiakzz • 4 years ago

The word they're looking for is "file extension name".

qwerty456127 • 4 years ago

There is no such thing as a file extension name, it's a file name extension. It's called this way because file names could only be 8-byte strings initially and then this was extended with 3 extra byte places.

Neither part of the MIME type format even has to match any existing (commonly used) file name extension anyway. E.g. it can be `text/plain`. Even when it does it is just a coincidence (although very common), it actually references the format name (IIRC `image/jpeg` was used even when almost nobody were using `jpeg` for the extension and the convention was to use `jpg`).

kps • 4 years ago

> * file names could only be 8-byte strings initially and then this was extended*

At least part of that is not true. The ‘popular’ MS-DOS 8.3 form derives via CP/M from DEC OSes which used 6 character file names and 3 character file types, due to their use of RAD50¹ to fit 3 characters in a 16- or 18-bit word. The type field always existed, so the word ‘extension’ most likely refers to its presentation on the end of the file name, rather than an addition to a previous format.

¹ https://en.wikipedia.org/wiki/DEC_RADIX_50

pastage • 4 years ago

This is interesting because compounds are mostly written together in other languages.

Never thought about it but file extension name really is a word. Someone replied saying this is three words, but it is not is it? It's an open compound word or maybe a "set phrase", I wanted to call it an idiomatic expression but that was clearly wrong.

hunter2_ • 4 years ago

The Wikipedia articles for "Set phrase" [0] and "Compound (linguistics)" [1] actually don't offer much of a distinction, so it's hard to say which of those is correct. Regardless, a compound with spaces ("open," as you said) is multiple words, not one word.

[0] https://en.m.wikipedia.org/wiki/Set_phrase

[1] https://en.m.wikipedia.org/wiki/Compound_(linguistics)

iamtedd • 4 years ago

Who are you, Kath Day-Night?

iamtedd • 4 years ago

I wanted to find an example from Kath & Kim, but couldn't find any on Youtube that wasn't just the "Look at moi" set-up to the actual joke. I'm sorry.

pwdisswordfish8 • 4 years ago

> I'm a little confused about the issue description

Not helped by the fact that even with explicit cajoling* to give steps to reproduce, the reporter wrote:

> Steps to reproduce

>Try to login/create an user in Gitlab (on-premises/Gitlab.com) where the username ends with a MIME type format

Goddamit, those are not steps to reproduce! Has the current era of "social coding" and its terrible software development practices completely turned people's brains to mush?

(In fact, such cajoling shouldn't even be required. If you don't know you need to provide STR without someone going to the lengths of coddling you by going out of their way to create a template, you don't have any business using a bug tracker.)

svrtknst • 4 years ago

ok boomer

IIsi50MHz • 4 years ago

The provided step-to-reproduce (not steps, since there's only one) seems to be regarded by the reporter as an automic operation (indivisble; a single step).

The other person is, I think, trying to express that it can potentially be done more than one way, so additional steps are required.

From a QA perspective, I prefer not to guess what the reporter might have intended. It's much better to have tons of detail, but I can sympathise with being the user and _thinking_ you have been totally clear and am know I've sometimes done this myself.

(( Unrelatedly, "ok boomer"? Since that seems such a non sequitur, I'll take in another unrelated direction and raise you an "ok athena" and "waddaya hear, starkuck?". ))

pwdisswordfish8 • 4 years ago

You have been trolled. The entire point of saying "ok boomer" here is to, as economically as possible, be dismissive while getting the other side to spend disproportionate energy defending the the thing being dismissed.

> From a QA perspective, I prefer not to guess what the reporter might have intended

There's not really any other perspective. That thing which you say you "prefer" is much more than a preference. It is the only reason why the bug tracker is configured to even ask for STR. If you have a "no solicitors" sign on your front door and a solicitor comes knocking anyway, or you have your door locked and a burglar climbs through a broken window and and takes all your stuff, you wouldn't respond by saying, "from my perspective, it's really disruptive and time-consuming to deal with solicitors who ignore the sign" or, "try to understand that from my perspective, it's really inconvenient when you take my things, because I have to work and spend money to replace them when I could have used that time and money doing something else." To do so tacitly legitimizes an illegitimate position held by the other.

Please look up "DARVO" to understand why explaining yourself like this is a bad idea.

pwdisswordfish8 • 4 years ago

What, did that make you mad? Why're you mad?

sytse • 4 years ago

This is a very confusing issue description which is caused by a confusing error message. This comment https://news.ycombinator.com/item?id=28540665 does a great job explaining the context. TLDR; you can't have a username end in .filetype because it might cause the user profile page to not load. The limitation is _not_ related to injection attack prevention, that would be concerning (bobby tables xkcd https://xkcd.com/327/ ).

duskwuff • 4 years ago

And it was exacerbated by another bug which was causing the absence of a period to be ignored, so any username ending in a recognized filetype was blocked (e.g. "AsiMOV" in the example, or "MaasTIFF" in the comments).

I initially suspected that a regex was involved and someone forgot to escape the period, but it looks like that wasn't even the case -- the erroneous code was literally checking if the username ended in any recognized extension.

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65954/...

a-dub • 4 years ago

it seems pretty obvious that they mean any file extension registered to a known MIME type.

gpvos • 4 years ago

It's easy, basic, and very important for clarity to use the correct terminology in this case. A MIME type is really something different.

a-dub • 4 years ago

"ERROR: The Gitlab server has rejected the proposed username, as it ends in the same suffix as a file extension that is registered to a MIME type in the Ruby runtime under which the server runs. This list is quite long, and possibly difficult to retrieve, so we will not list it here, but you can find a list of extensions commonly used if you Google for MIME. Alternatively, if this makes no sense at all, ask a local alpha geek and they should be able to help. We understand this is weird, but the reasons for doing so are currently embargoed as they potentially have wide ranging security or stability consequences for a large number of installs. If you wouldn't mind, please do us a small one and keep quiet while we have a chance to prepare and distribute a patch without forcing anyone to forego any nights of sleep 18 months into a global pandemic complete with associated societal fracturing and potential economic collapse. Thank you for your cooperation on this easy, basic and very important matter."

fixed it!

p49k • 4 years ago

It’s unrealistic to expect everyone to be able to know and use terminology perfectly. The description itself was well-written and that’s what’s important in terms of finding/fixing the issue.

+2

DonHopkins • 4 years ago

rjmunro • 4 years ago

The bug reporter merely copies the incorrect error that gitlab gave him; The issue here is that someone working in login / security of gitlab doesn't know what a mime type is. That is extremely worrying - it's not a part of the code where you can afford to be sloppy.

It's a very odd error. Apparently .nro is a file extension used by the Nintendo Switch video game console; .o is obviously the output of compilers, so I'm not sure why my username wasn't rejected. Maybe it would be if I tried to register now.

jfrunyon • 4 years ago

No file extension <-> MIME type "registry" exists. File extensions exist completely outside of MIME. Many file extensions do not correspond to a registered MIME type, or in many cases even a de-facto one (other than application/octet-stream or text/plain).

It seems pretty obvious - based on the failed username in question, and to someone with fairly deep technical knowledge - that they mean anything they consider a file extension. Which is not an excuse for this marvel of awful UI slapped on top of a poorly-thought-out workaround (for some unknown vuln (that's been patched for over 2 months and is still private? quite strange for an "open" company eh?).

defanor • 4 years ago

FWIW, the IANA media type registry [1] lists "File extension(s)" under "Additional information" for some of the media types, so it may make sense to speak of filename extensions associated with registered media types. Though given the initial odd wording, it could be anything.

Edit: as for what's actually used, looks like [2] it's the ruby mime-types gem [3], which is based on both IANA registry and various other recommendations [4], AIUI.

[1] https://www.iana.org/assignments/media-types/media-types.xht...

[2] https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65954/...

[3] https://rubygems.org/gems/mime-types

[4] https://github.com/mime-types/mime-types-data

+1

jfrunyon • 4 years ago

yuliyp • 4 years ago

Officially no such registry exists. In practice, Apache does have such a registry by default: https://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf... and other systems do use that mapping or a similar one.

+1

jfrunyon • 4 years ago

hnlmorg • 4 years ago

In fairness, it is Gitlabs wording the issue reporter is using. Check the error message.

Grollicus • 4 years ago

Gitlab recently exchanged the "WIP" prefix for merge requests (Work in Progress = started to do something but didn't complete it yet) for "Draft", which has connotations of throwing the draft/sketch away to build the final product.

Which is definitively not what is meant there. But I think it shows that Gitlab is not a company I'd go to if I wanted linguistic precision.

+4

gls2ro • 4 years ago

d1sxeyes • 4 years ago

Unfortunately, here, I think the problem is that English is not very clear.

As a native English speaker, the first time I encountered the word 'draft' was at primary school. We used it to describe a piece of writing where presentation was not the focus, instead, content and accuracy in terms of spelling, grammar, and punctuation would be the focus. Once the drafts were complete, we would 'copy these up' in our neatest handwriting.

However, if I prepare a 'draft' of some document or other for my boss, I expect it to be essentially an unapproved version of a final document, perhaps needing some minor modifications before release, but also perhaps not.

Although personally, I would use 'sketch' to mean something disposable which illustrates a more perfect version, my grandmother is an artist, and she refers to the initial drawings she makes on the canvas as 'sketches', which she then paints over in more detail.

Essentially, I don't think there's a big difference between draft and sketch - both could (in my opinion) represent either a version which will be discarded or which will be developed further.

Overall, I think here, the 'Work In Progress' label is the clearest and least likely to be interpreted differently by different users.

Grollicus • 4 years ago

> I am not a native English speaker (and I don't work at Gitlab) but I am curious and interested in using proper terms, as I think as developer that naming is a very important skill.

That is it for me too and why I think this change is so annoying.

WIP means something is being worked on. [0]

Draft can mean the same, but it also has a bunch of other possible meanings. Note that if you search in both of your sources, you'll find "sketch" as an explaination. It can mean the same, but it can also mean a bunch of other things. It's a strictly worse name.

[0] https://en.wiktionary.org/wiki/WIP

dmurray • 4 years ago

Yeah, I think this is fair.

A draft in the visual arts (drawing, painting), is typically abandoned, like a sketch. Perhaps it uses a different medium to the final version, and in any case can't easily be adapted.

A draft in writing is typically incrementally improved, or at least we think of it that way now that we write on computers. We don't need to start again even for substantial changes like adding a new paragraph.

gerdesj • 4 years ago

Draft is also the depth of water displaced by a boat/ship.

Draught (sounds the same as draft) is the wind through a crack or a type of beer. A door or window might be draughty but a beer wont.

Drought looks similar but sounds like "drowt" and is what you get when there is a long period of time without rain.

Be careful with draft, draught and drought!

capableweb • 4 years ago

> which has connotations of throwing the draft/sketch away to build the final product

I think it's only software developers who urge others to first write a draft, then throw it away and start working on the real thing. Usually, a draft precedes the "real" version and it's a status attached to something. Eventually, a "draft" becomes "published" or something similar.

Authors, scientists, email writes, report creators, movie/music producers all create drafts that (maybe) eventually become the real thing, I don't think many of them throw away the draft but rather work on the draft until it's not a draft anymore.

justinclift • 4 years ago

> ... which has connotations of throwing the draft/sketch away to build the final product.

Interesting. What industry does it that way?

eurasiantiger • 4 years ago

GitHub is using ”draft”, maybe that is the reason.

hibbelig • 4 years ago

> it seems pretty obvious that they mean any file extension registered to a known MIME type.

I guess I'm dense, but I actually thought it's about users such as Mr. Joe R Text/Plain. When I read about "mov" in the actual issue, then it became clear.

DonHopkins • 4 years ago

Since file extensions can be any three or fewer (or even more) valid characters, then no string ending with zero to three characters, in other words, no user names are valid.

AlfeG • 4 years ago

I guess author of ticket were referring to internal type name mime_type

id5j1ynz • 4 years ago

This seems on par with the general GitLab-style. Is anybody else getting a bit frustrated with them?

They keep on having high-severity security bugs being fixed every month (e.g. auth checks not being done everywhere). Then there's all these odd edge case bugs everywhere.

As an outsider, it just seems to me that GitLab isn't being engineered in a principled way: on sound abstractions and with separation of concerns (e.g. auth should be some universal middleware, not ad-hoc per call). Just really basic stuff.

john_cogs • 4 years ago

GitLab team member here. I'd like to add some additional context to my previous comments [1][2].

Due to a security concern in which a profile containing a file extension would not load [3], we do not allow usernames that end with file extensions (ex: .mov). As noted by many folks here, these are associated with a MIME type but are not MIME types themselves. It is not related to preventing an injection or any such attack vector.

The error message for this check incorrectly included MIME type rather than file extension. This has been updated [4].

Additionally, there was an issue with how the actual check as it did not include the leading dot. The leading dot was added to the check in a subsequent MR [5].

Thanks for all the feedback.

1 - https://news.ycombinator.com/item?id=28535739

2 - https://news.ycombinator.com/item?id=28538166

3 - https://gitlab.com/gitlab-org/gitlab/-/issues/26295

4 - https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70374/...

5 - https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65954

Arnavion • 4 years ago

>Due to a security concern in which a profile containing a file extension would not load [3], we do not allow usernames that end with file extensions (ex: .mov).

Why did you not fix your routing engine to not consider file extensions where the username / group name should go?

bogwog • 4 years ago

gitlab profile URLs are `gitlab.com/<username>`, so a user with the name "dashboard.html" would have the URL `gitlab.com/dashboard.html`, which obviously conflicts with the existing dashboard.html file.

Besides blacklisting certain usernames or breaking a bunch links to profiles, how would you fix that?

EDIT: IIRC, github has the same issue, but they have profiles as lower priority. So if your username conflicts with an existing URL, your profile page doesn't work.

smallbizdev420 • 4 years ago

Isn't the problem arising because GitLabs files are in the global namespace? If the user is the namespace for all their files, and GitLab files were under a Gitlab user, this wouldn't be a problem. Under the current implementation, every time you add a file, you have to make sure its name doesn't conflict with an existing profile. And a username has to avoid conflict with all present and future filenames. Mutual pain doesn't seem like a good way forward.

jrochkind1 • 4 years ago

I can think of a few possible ways to fix it.

1. deny-list only usernames that are actually existing conflicts

2. Change the URL for only usernames that have conflicts, to `https://gitlab.com/u/<username>`.

3. Change the URL for _all_ usernames to `gitlab.com/u/<username>` as this collision points out the flaw in the original URL design in the first place, because of possible collisions. 301 redirects could of course be used for any non-colliding usernames.

I am now wondering how _github_ takes care of it though. Github also has `github.com/<username>` urls. What does it do with collisions? Github pages don't even all end in `.html` or contain a `.` at all, so gitlab's particular solution would not work. For instance, there is a page `https://github.com/topics`. What happens if you try to create a github user called "topics"?

If I try to create one, it says "Username 'topics' is unavailable." Same for say `marketplace` or `trending`. Perhaps they've deny-listed only actually-existing github urls? That does seem tricky, whenever they want to create a new top-level /page on github, they can only do it if there isn't already a github account with that name?

But if as someone else says `/dashboard.html` is just a weird non-canonical alternate for `/dashboard`, which already had to be reserved, maybe gitlab is already doing (1) anyway? Then why do they need to also deny any username with ending in any valid extension? Unclear.

It still makes me wonder if they have a routing precedence problem, which they worked around by just forbidding any username that triggered it, instead of fixing the actual issue.

jhugo • 4 years ago

In what situation would someone be requesting `https://gitlab.com/dashboard.html`? When I go there, I get the exact same page as I get at `https://gitlab.com/`, why was it necessary to support both URLs? Now they're stuck with it of course, if anyone actually uses /dashboard.html, but surely they could just special-case filenames that actually exist, just like they presumably special-case URLs they use like /help already. It doesn't seem necessary to blanket-ban anything with a file extension.

remram • 4 years ago

Or gitlab.com/dashboard. GitLab, GitHub, etc already have a need to reserve specific usernames (like `org`, `settings`, `projects`, `new`, `explore`, `marketplace`, `help`, ...). Since you already have to blocklist specific names not containing extensions, I really don't see how banning extensions help them.

Hopefully we'll know more once their security ticket becomes public.

Arnavion • 4 years ago

For dashboard.html, sure. Fixing that requires making a breaking change to URLs.

My comment, and the issue that was submitted here, is about *.mov

boleary-gl • 4 years ago

GitHub doesn't allow a "." or really an special characters besides "-" and "_" in usernames

symlinkk • 4 years ago

Band aids on top of band aids. Respect for being honest and open about it though

stephenr • 4 years ago

I prefer the term “lipstick on a pig”.

bob1029 • 4 years ago

Why should the username matter? In my systems, I could have an insane URL like...

  https://myservice.com/my.super.duper.crazy-ass.user.name.pdf.exe

and still have it return a proper HTML document that covers that user's profile page. Hell, the username could be some insane zalgo-tier shit and still function properly.

I see some comments defending arbitrary "bandaid" architecture and I think that this is not defensible for something the scale of GitLab. This is basic HTTP stuff.

boleary-gl • 4 years ago

Of note, GitHub doesn't allow periods in usernames either. I'm not a Ruby expert but I wonder if file extensions give some specific Ruby gotchas that means both GitLab and GitHub operate this way.

gargron • 4 years ago

It's not a Ruby gotcha, it's a Rails routing gotcha. You can specify alternative formats on any path, e.g. you can access /path to get HTML (or whatever the "default" format for that controller method is) or you can access /path.json or /path.xml and if the controller method specifies handlers for those formats, you get that format. So if you allow a username like "john.doe" and the route is something like /john.doe then Rails will interpret the "john" as the ID part of the path and "doe" as the format part of the path. You can override this in your routes to support periods but then you do lose the capability of accessing alternative formats which sometimes can be useful.

dnsmichi • 4 years ago

Thanks. Now I also get a better understanding of why .patch for MRs or .keys for user names as file extension work on both Rails platforms. I always found these file extension hacks very useful for quick access and automation.

Examples:

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70427....

https://gitlab.com/dnsmichi.keys

e12e • 4 years ago

Ruby? No. Rails? Perhaps. (Both github and gitlab are built with ruby on rails).

For an old post on a somewhat related topic, see:

https://ryanbigg.com/2009/04/how-rails-works-2-mime-types-re...

I could imagine the mix of rails #respond_to and "file extensions" at the end of urls might make a mess (think /users/profile/smith.html vs /smith vs /smith.json vs /smith.txt - essentially what might have been /smith?format=json etc).

Ed: current documentation: https://apidock.com/rails/v6.1.3.1/ActionController/MimeResp...

boleary-gl • 4 years ago

I meant to say Rails.

Note to self: Never say Ruby when you mean Rails on HN

thrwn_frthr_awy • 4 years ago

Some apps like to allow usernames to be used as sub domains and so periods are not allowed.

dnsmichi • 4 years ago

Hi, how would you address this in GitLab's code? Maybe you'd like to create a merge request with suggested fixes :)

https://news.ycombinator.com/item?id=28540665 has all URLs and issues available to get started in the code.

vultour • 4 years ago

It’s not his job to fix a paid product.

bob1029 • 4 years ago

I would begin by familiarizing myself with the Content-Type HTTP header:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Co...

As for actually performing this change myself, it would take me quite some time to grok the codebase.

Maybe I get bored tonight and see how hard this would be to resolve.

pechay • 4 years ago

Once had my Australian production system go down because a js plugin we were using had added .au to its list of media types. Integration had a different TLD. Response I got from the author was 'LOL!'. :)

CodesInChaos • 4 years ago

Why is the application treating a TLD like a file extension?

detaro • 4 years ago

probably because some broken code matched on the end of the URL, without checking if it was just the naked domain?

hinkley • 4 years ago

Every programming language has a library to parse urls.

Please, for the love of your own mothers, people: stop pretending like you know how to parse urls like they're strings. You don't. And even if you can, you won't do it right every time.

It's been demonstrated over and over again.

Just use the already tested url facilities to give you the host/path/query parameters

cmpb • 4 years ago

I'm going to have to find a way to work "naked domain" into my lexicon today, thanks!

detaro • 4 years ago

On second thought, I actually used it wrong above: usually, naked domain specifically refers to the registered domain, i.e. without any subdomains.

quesera • 4 years ago

Yet another reason Australia should have kept TLD .oz!

ncann • 4 years ago

Looks like the fix restricted the check to "usernames that end with dot and MIME type"

Still, what is the attack vector?

anentropic • 4 years ago

Probably: some web frameworks do content negotiation by appending a content type like .json to the end of the url

Not sure if it's an attack vector per se, or just that the behaviour is incompatible with allowing usernames containing . and then having urls where the username is the last segment of the url

seems like a badly designed url scheme :)

jfrunyon • 4 years ago

> some web frameworks do content negotiation by appending a content type like .json to the end of the url

This has always disturbed me, considering that HTTP has had content negotiation for ... oh, basically its entire history [https://www.w3.org/Protocols/HTTP/1.0/spec.html#Accept].

Macha • 4 years ago

For non-programmatic usage and verification, the extensions can be easier?

On a similar topic HIBP allowed people to request versioning via a custom HTTP Header, a Accept Content-Type, or a version segment in the URL path and approximately everyone went with option 3.

+1

jfrunyon • 4 years ago

gkop • 4 years ago

Yes, the content negotiation you describe is a very longstanding default behavior of Rails. It should probably be made opt-in rather than opt-out in the next major version.

walty8 • 4 years ago

But the in the screen capture of article, the user name is actually 'issac.asimov', i.e. the mime type does not immediately follow the dot.

nobody9999 • 4 years ago

>But the in the screen capture of article, the user name is actually 'issac.asimov', i.e. the mime type does not immediately follow the dot.

A variation on the Scunthorpe Problem[0] then, eh?

[0] https://en.wikipedia.org/wiki/Scunthorpe_problem

whizzter • 4 years ago

Somebody probably put in a regexp with .mov$ , however for regexps the dot (.) matches everything (and $ matches end) so the i in asimov is eaten regardless and then the rest of the match succeeds.

chippiewill • 4 years ago

You can see the fix they made in the linked MR.

It wasn't a regex, they just did a generic "ends with" check.

iechoz6H • 4 years ago

Perhaps the sub-clause is redundant there?

'The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, Lincolnshire, England, from creating accounts with AOL, because the town's name contains the substring "cunt".'

nobody9999 • 4 years ago

>'The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, Lincolnshire, England, from creating accounts with AOL, because the town's name contains the substring "cunt".'

Right. Regardless of the specific pattern matching function, in both cases, the results were both incorrect and unwanted. Which is why I consider this instance to be a variation on the same issue.

ajkjk • 4 years ago

That was before the fix.

boomskats • 4 years ago

This doesn't look like a security issue, unless I'm missing something.

paxys • 4 years ago

Definitely a security issue.

- The merge request which originally added this check is inaccessible (https://gitlab.com/gitlab-org/security/gitlab/-/merge_reques...)

- In the issue comments the Gitlab employee says "Sorry, I cannot go into details right now. I will link the issue here once it goes public, is it ok?"

nine_k • 4 years ago

It could maybe potentially be exploited in a very interestingly crafted email, where there's link to download something (e.g. the source tarball, or a build artifact) with an URL containing the username, or being otherwise close by, so that the downloaded file would be interpreted differently. But I'm not creative enough at this hour to suggest a working exploit.

dolmen • 4 years ago

I suspect a case of impersonating a user which doesn't have the suffix. Ex: create user "toto.mov" to takeover some resources of user "toto".

amjd • 4 years ago

Maybe it's something to do with a MIME sniffing attack. The user profile URL may be detected as a different MIME type by the browser based on the extension: https://gitlab.com/myname.js

I'm not sure how one could exploit it though...

citrin_ru • 4 years ago

AFAIK in MIME sniffing IE guessed file type based on its content [1], not URL. And given that IE is rarely used nowadays not sure if it still relevant.

[1]: https://docs.microsoft.com/en-us/previous-versions/windows/i...

Update: tested this link in FireFox 92 - it still performs sniffing in 2021: http://www.debugtheweb.com/test/mime/script.asp (based on the content, not extension)

foota • 4 years ago

I've played with this before. A correctly implemented mail library should handle e.g., subject lines that contain SMTP control characters. I developed a lengthy repro for an email parsing issue in an ancient version of some java email library that contained a truely horrendous parser, only to find out that the library had been updated internally recently :-)

a-dub • 4 years ago

MIME types are used all over the place:

1) web servers, browsers, proxies 2) graphical os shells 3) email

every file a webserver returns has a mime type in the header, and that is how the browser knows how to present it.

+1

mmis1000 • 4 years ago

dolmen • 4 years ago

The ticket here reveals some blacklisting of known file extensions (attached to a know list of MIME type), not MIME types directly.

LewisVerstappen • 4 years ago

Here's a link to the fix:

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65954/...

tankenmate • 4 years ago

I get the sneaking suspicion that this is a case of Ruby's magic being slightly too magic; it's a problem I have tripped across in the past.

Slightly tangentially reminds me of the "More Magic" switch of GLS fame.

zorr • 4 years ago

There is not much ruby magic in that code. This is just a naive use of data from another module.

Somewhere in the Gitlab code base there is a MIME_TYPES map with common extensions as the map key. No idea what it is used for but that module is very likely the target of a recent security issue.

The first fix to combat the "publicly unknown" vulnerability was to prevent usernames ending with any of the keys in the MIME_TYPES map using a simple "ends_with" strings check. Of course the map keys did not have periods so the ends_with would also match "Asimov" with the "mov" suffix.

The second fix in this PR is to extend the ends_with check to add an extra dot.

The actual vulnerability is still unknown but I suspect it's something like an intermediate component that performs special handling based on interpreting URLs and that could bypass security/ACL checks.

+1

LewisVerstappen • 4 years ago

+2

jeltz • 4 years ago

pavon • 4 years ago

Indeed the MR template does not have the security box checked.

kinix • 4 years ago

My guess is that the username is used in a url somewhere? So browsers might try and interpret it as a file

+2

numlock86 • 4 years ago

paxys • 4 years ago

TL;DR for those wondering:

- There was a yet-undisclosed security vulnerability in Gitlab usernames

- Staff member made a change to disallow usernames ending with `Mime::EXTENSION_LOOKUP.keys`, which I assume is a set of recognized file extensions (hidden – https://gitlab.com/gitlab-org/security/gitlab/-/merge_reques...)

- This was overly broad since it caught a lot of common names (like "asimov") (https://gitlab.com/gitlab-org/gitlab/-/issues/335278)

- The check was updated to additionally look for a "." before the extension (https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65954)

OskarS • 4 years ago

> Staff member made a change to disallow usernames ending with `Mime::EXTENSION_LOOKUP.keys`

This is a hell of a thing to commit and pass code review in 2021 on a project like GitLab. I understand that the staff member was fixing a security issue and was probably not thinking deeply about the ramifications, but even so. How many "Falsehoods programmers believe about names" articles do we need?

mavhc • 4 years ago

That's usually the exact time you want to think deeply about the ramifications

OskarS • 4 years ago

I don't want to be too harsh on the programmer, everyone makes mistakes. It's easy to see how a person focused on a security issue with user names makes a quick fix without thinking through how this will affect account creation. As programmers, we have to make like 7,000 decisions every day, you're bound to fuck up some of them. This is a pretty big one, though.

The bigger question is how this passed code review and testing.

ryandrake • 4 years ago

That's a good question, but I think it's still more important to understand why a root cause analysis wasn't finished. They got the cause, but not the root cause. It's like seeing your car battery dead and just replacing the battery, without understanding whether the alternator works or whether the alternator belt is broken.

hinkley • 4 years ago

What's the security issue though? Not disambiguating filenames and user names in URLs for the site? That's a hard problem to fix after the fact, which is why we call it a Rookie Mistake.

Use unique prefixes for users, groups, projects, assets in your webapp design, kids.

Maxion • 4 years ago

Surely it would be easy to run the new rules against random-list-of-usernames-found-through-google before pushing to prod? Or perhaps the security issue was deemed so great that they needed a fix out yesterday?

Aeolun • 4 years ago

I mean, they’re still not telling us what it was and it’s been fixed for a month. Must’ve been pretty big.

akie • 4 years ago

If anyone ever says that software engineering is completely unlike plumbing, I will point them towards your comment. All systems, however nicely architected, are full of duct-tape solutions like this.

andix • 4 years ago

Professional plumbers don't use duct tape for fixing leaks ;)

capableweb • 4 years ago

As someone who has traveled to some of the lesser traveled places on this planet, they do too! Sometimes even worse techniques are being used to fix things.

Grollicus • 4 years ago

Are they though? What I can think of (broken upstream reverse proxies that do mime type inference by filename) would warrant a WE_USE_BROKEN_LEGACY_SHIT_UPSTREAM config flag so that it doesn't get in the way of normal users.

So I'm probably missing something and I'm really curious for the underlying vulnerability.

akie • 4 years ago

I have no clue about the underlying issue, but I'm guessing it's occurring on a boundary or an interplay between two systems.

Something like "the username can be part of the URL, and if the URL contains .mov, some browsers will misinterpret this and assume it's a movie file, leading to bad things™".

Or: "the username is sometimes used as a folder name, and our syncing software contains rules to exclude certain file extensions, so these folders were never synced, which lead to issues on production servers"

I'm guessing it's something along these lines. Something that you control, but not really, leading to these kind of haphazard workarounds.

john_cogs • 4 years ago

GitLab team member here. This issue provides additional context on the changes: https://gitlab.com/gitlab-org/gitlab/-/issues/26295

cesarb • 4 years ago

That seems to be a consequence of what IMO is an unfortunately common bad design: having user-controlled data like usernames as the first path component (without a prefix like ~). There are many things which are expected to be found at the root of the path (the classic example being robots.txt, but there's also favicon.ico, .well-known, and probably others; I vaguely recall that IIRC Flash used a fixed filename in the root for cross-domain access control), and you never know when a new one will be invented by someone (though .well-known is supposed to contain the spread of these "magic" names).

captn3m0 • 4 years ago

crossdomain.xml: https://stackoverflow.com/questions/4174317/what-is-crossdom...

janpot • 4 years ago

Don't know the details of the vulnerability, but from the comfort of my armchair, it sounds like it's being patched in the wrong location. e.g. It's better to fix an XSS issue by escaping the input, rather than restricting the values it can take.

boleary-gl • 4 years ago

It doesn't represent any XSS - the details are here: https://gitlab.com/gitlab-org/gitlab/-/issues/26295

janpot • 4 years ago

I was just using XSS as an analogy.

DonHopkins • 4 years ago

>- Staff member made a change to disallow usernames ending with `Mime::EXTENSION_LOOKUP.keys`, which I assume is a set of recognized file extensions (hidden – https://gitlab.com/gitlab-org/security/gitlab/-/merge_reques...)

If only github had an established system and procedure for doing code reviews before releasing security fixes...

SSLy • 4 years ago

s/hub/lab

sam0x17 • 4 years ago

Further proof that gitlab is just super jank. What on earth are they doing inspecting usernames to see if they end in a file extension? What horrendous vulnerability is this band-aiding?

That they don't seem to understand what a MIME type is just adds to this perception.

bogwog • 4 years ago

It's not that weird if you look into it. Imagine a username "index.html", so the URL to their profile is `gitlab.com/index.html`.

Github has the exact same issue, and they solved it by just restricting usernames to alphanumeric characters and hyphens (but IIRC, there are some existing profiles with unreachable URLs from before they made the change).

Gitlab decided to just restrict known file extensions, which IMO is not the best idea, but it's also not that unreasonable.

> That they don't seem to understand what a MIME type is just adds to this perception.

That's not fair, people misspeak/mistype things all the time.

mdoms • 4 years ago

What would be the problem with gitlab.com/index.html username? There's no rule that a web server must serve up index.html from some /public/ folder just because that's the URL. This screams poor engineering to me.

sam0x17 • 4 years ago

I am in complete agreement with this statement. Unless they're running on an ancient LAMP stack or something the web server isn't going to do anything with the .html extension (or any other extension), and browsers are supposed to look at mime type.

thehappypm • 4 years ago

Most sites would have the URL to their username be something like gitlab.com/profile/index.html. The URL structure here is the problem.

lol768 • 4 years ago

Yeah, this didn't leave me with a good impression of their security and development practices (not that I really had one anyway after they accidentally deleted their production PostgreSQL data directory and did a fairly poor job of responding to the situation).

This change absolutely seems like the wrong place to fix any real security vulnerability and the fact that it affected a bunch of legitimate usernames is the icing on the cake.

city41 • 4 years ago

I've never used gitlab but this issue has definitely caused me to take pause. It sort of feels like "not allowing 'OR' in strings to prevent SQL injection."

dnsmichi • 4 years ago

Hi, the error message is a bit confusing - certain file type extensions may cause the user profile page not to load. Disallowing specific extensions in the username helps prevent that problem - it is not to prevent an injection attack or similar.

This comment https://news.ycombinator.com/item?id=28540665 helps with more content and issue URLs including the problem discussion.

lol768 • 4 years ago

> certain file type extensions may cause the user profile page not to load

There is no reason for this. They should always hit the same controller and be served as text/html. Why would the username ever influence this?

If that breaks an existing page then you either shouldn't have allowed the username to be created with the same name as an existing page (exactly what GitHub does) or shouldn't have created a page with the same name as a username [depending on which came first!] - or better yet, had them in a separate namespace so they cannot conflict in the first place.

arianvanp • 4 years ago

That you change the Content-Type of a page based on the URL is quite a vulnerability on itself no?

I don't think browsers infer meaning about file types based on the URL. The Content-Type is always what is being used.

If you have backend side code that maps URLs Content-Type header mime types. Don't. Instead simply always return text/html for user profiles. Then the extension shouldn't matter.

city41 • 4 years ago

The analogy to SQL injection is a user's input (what they chose their username to be) is directly influencing how the system works.

boleary-gl • 4 years ago

GitHub doesn't allow a "." in usernames at all.

jrochkind1 • 4 years ago

> @brabant.benjamin Sorry, I cannot go into details right now. I will link the issue here once it goes public, is it ok?

— (July 14) https://gitlab.com/gitlab-org/gitlab/-/issues/335278#note_62...

Has anyone traced down an explanation of the (presumably security-vuln-related) thing that was going on, motivating the original restriction?

I would think it would be public now, with the restriction removed, but maybe not?

It does seem like an unfortunate UX to exclude the lastnames of millions of people as usernames.

john_cogs • 4 years ago

Please see https://news.ycombinator.com/item?id=28540665 for more detail and links to related issues and MRs.

fregante • 4 years ago

My guess is that they were using /.mov$/ to check the username, which is missing an escape.

ajkjk • 4 years ago

Nope: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65954/...

just a regular .ends_with()

banana_giraffe • 4 years ago

So, .html is bad for a username, but .Html is ok?

What is the reason for this?

marwis • 4 years ago

Rails, which GitLab is based on, uses suffixes to select content renderer (such as .html, .json - apparently in case sensitive manner), also I think it will serve underlying files in some cases (e.g. the example of dashboard.html) given elsewhere.

The proper fix is to disable this mechanism at least for the username segment of gitlab path but perhaps GitLab developers are too lazy or unaware or just in rush.

ajkjk • 4 years ago

Probably it's a spot fix for a security problem and not what anyone thinks is ideal.

Pxtl • 4 years ago

The fact that usernames can break anything speaks to something profoundly wrong in general. Why should usernames be leaking into URLs instead of a surrogate key?

I mean, barring some tricks that could break layouts during rendering if people got really creative with them, why isn't any valid non-blank Unicode string a valid username?

Reminds me of this xkcd:

https://xkcd.com/1700/

SSLy • 4 years ago

>why isn't any valid non-blank Unicode string a valid username?

it's because HTTP is stringly-typed

malinens • 4 years ago

gitlab naming restrictions are also ridiculous for repositories. For example our files repository needed to be named filez. There is another repo which we needed to rename while migrating to gitlab but I dont remember it's name now. This has made many issues for us because repo name is no more related to project name...

EE84M3i • 4 years ago

Something that hasn't been mentioned, this could also be related to defenses against web cache deception attacks.

em3rgent0rdr • 4 years ago

Reminds me of https://xkcd.com/327/

"Did you really name your son Robert'); DROP TABLE Students;-- ?"

"Oh, yes. Little Bobby Tables, we call him."

RobertWorkbench • 4 years ago

came here to downvote this

InsomniacL • 4 years ago

why would this get a single downvote? i suspect the underlying reason to the restriction is somewhat related to sanitising input.

Grollicus • 4 years ago

I suspect because Isaac Asimov was named before any of these problems with his name were a thing and his name is not something artificially constructed to cause problems with buggy computer systems.

FroshKiller • 4 years ago

I did not downvote, but I don't like reading the same xkcd references over and over, so I can imagine others feel the same way.

alfiedotwtf • 4 years ago

Sounds like a WAF is getting in the way. Wikileaks had the same issue years ago when you couldn't search for programming language binaries e.g "Ruby" or "Perl" etc

kzrdude • 4 years ago

What's waf?

Grollicus • 4 years ago

https://en.wikipedia.org/wiki/Web_application_firewall

Basically something to extract boatloads of money from enterprise customers by annoying THEIR customers so they can't write "<script>" in texts in their application.

Or, less tongue-in-cheek, a way to harden web applications against known attack patterns like sql injections or xss-attacks. As they work on pattern recognition and don't know anything about your application they sometimes get in the way. But they'll probably check some box for some security audit so they're used.

Cloudflare for example offers one at https://www.cloudflare.com/waf/

saint_abroad • 4 years ago

> But they'll probably check some box for some security audit so they're used.

A WAF is useful for when a zero-day is found for that legacy application you just can't get patches for anymore because the team has "moved on".

a-dub • 4 years ago

oh weird. i thought they also did stuff like parse and reconstruct requests to try and catch any funny business and centralize/add ease for things like ratelimiting and fail2ban for webapps. looking at this one, it appears not.

bigiain • 4 years ago

Web Application Firewall.

https://en.wikipedia.org/wiki/Web_application_firewall