diff --git a/content/posts/CVE-2019-19844/index.md b/content/posts/CVE-2019-19844/index.md new file mode 100644 index 0000000..08f60dd --- /dev/null +++ b/content/posts/CVE-2019-19844/index.md @@ -0,0 +1,226 @@ +--- +title: CVE-2019-19844 +date: 2019-12-18 +subtitle: Potential account hijack via password reset form +tags: [security] +--- + +Yesterday, an email was sent to `django-announce`, informing of an upcoming security update, labelled "high" severity. Previous notifications like this have been 1 week before the actual disclosure; This email, just 12 hours. The updates were scheduled to be released 12:00 UTC the next day (today). Already, not the best thing to be reading just 1 week before Christmas, and 1 day before the company production freeze. + +{{< resource src="initial-announcement-email.png" >}} +Email announcing the upcoming security release. +{{< /resource >}} + +This morning, at 09:23 UTC, said updates were [released](https://www.djangoproject.com/weblog/2019/dec/18/security-releases/), and an email hit my inbox, almost 3 hours early. I can only imagine what seeing that notification did to my heart rate. + +{{< resource src="release-email.png" >}} +Email announcing the release +{{< /resource >}} + +These updates, versions 3.0.1, 2.2.9, and 1.11.27, contain a fix for CVE-2019-19844, a vulnerability around the password reset mechanism, potentially enabling accounts to be hijacked, simply by knowing the user's email address. It was possible to receive the password reset email for an account you didn't control, reset their password, and hence gain access to the account. GitHub was hit by a very similar issue [only last month](https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/). Because of the high-profile nature of the vulnerability, and its high impact, the Django security team decided to release updates as quickly as possible, hence the small notification period. + +It's around this time I realised today would be _interesting_. + +The vulnerability itself is a side-effect of how case-insensitive SQL queries work in many locale-aware database engines, and how this relates to email sending. The patches were applied to `django.contrib.auth.forms.PasswordResetForm`. Libraries which use this form directly, with little to no modification, such as `django-rest-auth`, shouldn't require any additional patches, besides bumping the Django version. + +The exact fix for CVE-2019-19844 came in 2 parts: Fixing unicode comparison, and not trusting user input. + +If your project, or a package you maintain, handles password reset in a custom way, however small, as `django-allauth` [did](https://github.com/pennersr/django-allauth/commit/9ec5a5456a59781771e1c3a0df3d555a0089accd), or overrides specific parts of `PasswordResetForm`, keep reading! Alternatively, if you're like me and find security vulnerabilities or weird unicode issues interesting, you should keep reading too. + +## Unicode is hard + + +What I'm about to talk about may be completely incorrect, because I, chances are much like you, find unicode a gloriously complicated, but rather interesting concept to grasp. I'm not sure anyone truly knows all its caveats, but if you know more than I do, and found something in the below which is wrong, please [tell me](https://twitter.com/RealOrangeOne). + + +Contrary to what many people believe, computers can display a lot more than just letters and numbers. Or at least, what primarily english speakers consider letters and numbers - There are a lot more languages and character sets than just those used in the English language! + +Whilst I could go quite in depth about unicode, why it's great, why it's terrible, and why you really should be aware of it, [Tom Scott](https://www.tomscott.com/) has done a number of great videos on this, which I highly recommend checking out! + +- [Characters, Symbols and the Unicode Miracle - Computerphile](https://www.youtube.com/watch?v=MijmeoH9LT4) +- [᚛ᚈᚑᚋ ᚄᚉᚑᚈᚈ᚜ and ᚛ᚑᚌᚐᚋ᚜](https://www.youtube.com/watch?v=2yWWFLI5kFU) +- [Why Do Flag Emoji Count As Two Characters?](https://www.youtube.com/watch?v=sTzp76JXsoY) +- [⚫ How The Black Point Message Crashes Android Apps](https://www.youtube.com/watch?v=jC4NNUYIIdM) + +The issue with this relies on collisions, where 2 characters can have the same operation done to them, such as changing their case, and produce the same output. + +A good example of this is the "ß" character in German. The german alphabet has an extra character when compared to the standard english alphabet, "ß", which sounds _almost_ identical to a "ss". As a human, watching a computer interact with this can lead to some confusing results: + +```python +"ß" +>>> "ß" # Looks correct + +"ß" == "ss" +>>> False # Well, obviously + +"ß".lower() +>>> "ß" # Yup, with you so far... + +"ß".upper() +>>> "SS" # lolwhut?! +``` + +(The same happens in both NodeJS and Ruby) + +The final example doesn't really make sense, until you think about it. "ß" is _almost_ equivalent to "ss", therefore making it lower case would result in the same thing. However, there is no upper-case version of "ß", meaning to deal with locales properly, it's converted into "SS", the upper-case version of "ss". However, "ss" isn't actually equal to "ß", whether as part of another string or otherwise. + +### Databases + +Databases do a very similar thing. PostgreSQL, my database engine of choice, compares strings byte-for-byte when querying based on strings, locale-aware or not. However, when querying in a case-insensitive manner, it uses locale-aware matching, meaning "ß" *is* equal to "ss". + +SQLite doesn't do locales in quite the same way. Try the above with SQLite, and you'll find "ß" and "ss" are in fact different, even when querying in a case-insensitive manner. + +## Don't trust user input + +One of the greatest security lessons you'll ever be taught is "Assume everyone's out to get you". Nothing is safe, every request could be _that request_, and everyone has malicious intent. In this case, do as little with the raw user-provided details as you can. + +Django's password reset request flow work like: + +1. User sends their email address to Django +2. Django validates what they sent looks like an email address +3. Django fetches users who's email matches what's provided, _in a case-insensitive manner_ +4. Django filters out users who don't have usable passwords +5. For each of those users, Django sends them an email containing a tokenized URL which can be used to reset their password +6. The user is informed "_If a user with this email exists_, we've sent them a password reset link" + +Now, nothing in this flow is necessarily insecure, or necessarily secure. The proof is in the detail. In this case, the cause of the issue lies in step 5. + +Once Django pulls users out of the database, and validates they have usable passwords, an email is crafted in memory for that users email. Importantly, said email address isn't the one from the database row, it's the one from the users request. But as we jsut learnt, a case-insensitive query can yield results which aren't exactly identical to the search term, meaning in malicious cases, they'll be different. + +Email addresses, and domain names for that matter, are widely accepted as being case insensitive. ME@GOOGLE.COM and me@google.com will probably end up in the same place, just as browsing to `GOOGLE.COM` will probably lead you to that ~~data collector~~ search engine you know and love. + +The issue here lies in the fact that the 2 don't work in exactly the same way. PostgreSQL, and many other locale-aware storages consider the locale when comparing case-insensitive. DNS on the other hand, converts domains to [punycode](https://en.wikipedia.org/wiki/Punycode) before resolving, at which point the character becomes 'just another character'. + +For example, the GitHub attack used the Turkish dotless i "ı". "GıtHub" isn't the same as "GitHub" to us, nor is it to DNS, where it becomes the punycode `gthub-2ub`, but as far as case-insensitive locale-correctness is concerned, they're the same, or at least the same enough. + +Now this isn't a bash on PostgreSQL, what they're doing is definitely correct, and is required for the modern, multi-charset world. Nor am I bashing Python, or DNS, or anyting for that matter. Really, us humans are the issue, assuming that everything works in the nice super simple way we'd expect it to. We're wrong. + +## _"So how does all this relate to CVE-2019-19844?"_ + +Back on topic, CVE-2019-19844. As I said, the patch to Django was in 2 parts: Fixing unicode comparisons, and fixing user input. + +> 1. After retrieving a list of potentially-matching accounts from the database, Django's password reset functionality now also checks the email address for equivalence in Python, using the recommended identifier-comparison process from +> 2. Unicode Technical Report 36, section 2.11.2(B)(2). +When generating password-reset emails, Django now sends to the email address retrieved from the database, rather than the email address submitted in the password-reset request form. + +The exact patch can be seen [on GitHub](https://github.com/django/django/commit/5b1fbcef7a8bec991ebe7b2a18b5d5a95d72cb70), and the split can be seen quite nicely. + +### Fixing unicode comparison + +A modification was made to `PasswordResetForm.get_users`, to add an additional check. Once users were retrieved from the database, their email addresses were normalised, and compared against a normalised version of the user input, before being allowed through. This means even if the database returns a user which is like the provided email address, but different in a locale-aware manner, it will still be filtered out. + +### User input sanitization + +Once users have been retrieved from the database using `PasswordResetForm.get_users`, and the emails are being created, the `to_email` is set to be the one pulled from the database, rather than what was provided by the user. This is more correct, as the recipient address fully matches the email address for the user, but also removes the use of the user-provided email value for anything other than retrieving database users. + +#### Non-obvious patch + +The exact change to this isn't obvious. Take the below 2 code examples. These are 2 snippets of the same method on `PasswordResetForm`, taken from Django's `master` branch. 1 is vulnerable to CVE-2019-19844, the other is not. + +This method is vulnerable: + +```python +def save(self, domain_override=None, + subject_template_name='registration/password_reset_subject.txt', + email_template_name='registration/password_reset_email.html', + use_https=False, token_generator=default_token_generator, + from_email=None, request=None, html_email_template_name=None, + extra_email_context=None): + """ + Generate a one-use only link for resetting password and send it to the + user. + """ + email = self.cleaned_data["email"] + if not domain_override: + current_site = get_current_site(request) + site_name = current_site.name + domain = current_site.domain + else: + site_name = domain = domain_override + email_field_name = UserModel.get_email_field_name() + for user in self.get_users(email): + user_email = getattr(user, email_field_name) + context = { + 'email': user_email, + 'domain': domain, + 'site_name': site_name, + 'uid': urlsafe_base64_encode(force_bytes(user.pk)), + 'user': user, + 'token': token_generator.make_token(user), + 'protocol': 'https' if use_https else 'http', + **(extra_email_context or {}), + } + self.send_mail( + subject_template_name, email_template_name, context, from_email, + email, html_email_template_name=html_email_template_name, + ) +``` + +And this method _isn't_ vulnerable: + +```python +def save(self, domain_override=None, + subject_template_name='registration/password_reset_subject.txt', + email_template_name='registration/password_reset_email.html', + use_https=False, token_generator=default_token_generator, + from_email=None, request=None, html_email_template_name=None, + extra_email_context=None): + """ + Generate a one-use only link for resetting password and send it to the + user. + """ + email = self.cleaned_data["email"] + if not domain_override: + current_site = get_current_site(request) + site_name = current_site.name + domain = current_site.domain + else: + site_name = domain = domain_override + email_field_name = UserModel.get_email_field_name() + for user in self.get_users(email): + user_email = getattr(user, email_field_name) + context = { + 'email': user_email, + 'domain': domain, + 'site_name': site_name, + 'uid': urlsafe_base64_encode(force_bytes(user.pk)), + 'user': user, + 'token': token_generator.make_token(user), + 'protocol': 'https' if use_https else 'http', + **(extra_email_context or {}), + } + self.send_mail( + subject_template_name, email_template_name, context, from_email, + user_email, html_email_template_name=html_email_template_name, + ) +``` + +Spot the difference yet? It's just 5 characters. + +The issue is which email address is passed into `self.send_mail`. In the vulnerable example, `email` is passed, which is pulled from `self.cleaned_data["email"]`, which is the user-provided address. Whereas the fixed example passes `user_email`, which is pulled form `getattr(user, email_field_name)`, and therefore from the database address. + +Now this example is intentionally vague, as the actual patch wasn't identical to this, but it's a prime example of how easy it is to miss what is actually quite a large security hole. + +#### Custom reset flows + +If you've got a custom password reset flow, and can't simply update Django, manually patching isn't hard. If you're doing something custom, ensure you're sending the email to the actual users email rather than the provided email address. + +An easy way of achieving this using Django's `PasswordResetForm` is by overriding `send_mail` to pull the email address from the user, which can be retrieved from the email context, rather than using the provided one: + +```python +def send_mail(self, *args, **kwargs): + args[2] = getattr(args[2]['user'], get_user_model().get_email_field_name()) + return super().send_mail(*args, **kwargs) +``` + +If you are doing this, add a test case to make sure it works, and doesn't accidentally get reverted. `django-allauth` has a nice [example](https://github.com/pennersr/django-allauth/commit/9ec5a5456a59781771e1c3a0df3d555a0089accd) of this. + +## Takeaways + +The biggest take away from this is to keep things up-to-date. If you take nothing else away, let it be that! Packages are updated for far more important reasons than simply new features or a slight performance improvement. + +If you're reading this, and have projects on versions of Django older than 3.0.1, 2.2.9, and 1.11.27, please go and fix them. Today I audited, patched, reviewed and deployed over 20 projects, in 1 day! + +When accepting user input, use it directly for as little as possible, and where you do have to use it, make sure it's valid and sanitary. + +And remember, Unicode is *weird*! diff --git a/content/posts/CVE-2019-19844/initial-announcement-email.png b/content/posts/CVE-2019-19844/initial-announcement-email.png new file mode 100644 index 0000000..0253395 Binary files /dev/null and b/content/posts/CVE-2019-19844/initial-announcement-email.png differ diff --git a/content/posts/CVE-2019-19844/release-email.png b/content/posts/CVE-2019-19844/release-email.png new file mode 100644 index 0000000..f91d8a5 Binary files /dev/null and b/content/posts/CVE-2019-19844/release-email.png differ