Solutions for a better Login/Register extra

sean69 · December 7, 2021, 2:20pm

Having issues with the register portion of Login/Register - specifically with thousands of spurious spam registrations. I’m looking for solutions/plugins/extras to fix the issue.

reCAPTCHA v3 seems to mitigate it some, but not enough.
The system still generates the confirmation email, so there is a high probability of getting blacklisted as a spam server.
The spurious registrations need to be expunged manually (25,000 on one site!!)

Has anyone got any working solutions to this issue?

a plugin to purge registrations that have not been confirmed after “X” days?
pre-validation of an email address before the registration form can be filled out?
some kind of validation on the email address before the email is sent. i.e. the user sends the system an email, headers could be checked to ensure it comes from a mail client/MTA rather than a bot…

-thanks
-sean

bennyb · December 7, 2021, 4:46pm

Hey Sean, have you seen modmore’s Akismet extra?

Details here:

Ben

sean69 · December 7, 2021, 5:04pm

Interesting - no, I had not seen it (or heard of it) will have to check it out.

smashingred · December 7, 2021, 6:21pm

I’ve previously implemented the CSRFHelper and spam hook. Akismet may work better than the basic spam tool. I’ve also layered in a password validator snippet that checks for password patterns and also against HaveIBeenPwned so that folks don’t use compromised passwords. This helps eliminate the super weak passwords that many script kiddies might load into a registration form. I’ll see if it’s possible to share the code from my snippet. I can’t promise it’s easily reusable.

sean69 · December 7, 2021, 7:03pm

I was kind of thinking on getting them before any email was actually generated i.e something along the lines:

fill in email address only
click submit
log it to DB
app comes back with a mailto link with a code in the subject line
user clicks and sends email from their client
system retrieves the email …

then it gets a little fuzzy, we can inspect headers for SPF etc, validate the email came from the correct server, check the subject code etc. I’m sure there is a bunch of other stuff…

But that should validate most cases (unless someone is intentionally targeting a given site) then we can either create an account & send credentials or send them an invite to create an account (in a hidden and validated resource) or if they have to fill out some account info - just enforce that on the first login.

Of course if they never come back - we prune that as well

smashingred · December 7, 2021, 8:39pm

That’s a pretty extraordinary and potentially error-prone task flow for the unsuspecting but legitimate person trying to sign up. Usually when someone is trying to sign up for a site or service they want to do so as easily and quickly as possible, because the reason they’re signing up is that they want to do something right now. Stepping out of the typical signup flow (enter your stuff, wait for an email confirmation, confirm and log in) is going to add significant friction and could end in abandonment: “this is too much of a hassle to be bothered with. I just wanted to sign up.”

What you’re describing could work but is dependent on the query string remaining intact and may still require them to further fill out a form and wait for a confirmation email. Say the query string fails? What would that mean for the legitimate user who encounters a problem in this scenario?

I think a better route is adding more validation to the form before sending an email, then redirecting failed/filtered submissions to a thank-you page so it looks like a successful submission but doesn’t do anything, and especially doesn’t send an email. So, stacking the spam hook, CSRFHelper and Akismet, possibly with a rate limiter, would enable you to block the vast majority of spam submissions. Most importantly, when spammers try to send emails by registering, they’ll not get any and will stop poking at it.
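A rough sketch of that "fail silently" idea, for illustration only. `handle_registration()` and the two example checks are hypothetical names; in Login/Register the individual checks would be hooks (spam, CSRF, Akismet and so on) rather than plain closures.

```php
<?php
/**
 * Sketch: run stacked checks and fail silently.
 * handle_registration() and the check callables are hypothetical, not Login/Register API.
 */
function handle_registration(array $fields, array $checks): array
{
    foreach ($checks as $name => $check) {
        if (!$check($fields)) {
            // Same redirect as a success, but no account and no email.
            return ['redirect' => 'thank-you', 'send_email' => false, 'failed' => $name];
        }
    }
    return ['redirect' => 'thank-you', 'send_email' => true, 'failed' => null];
}

// Two example checks: a hidden field that must stay empty, and a syntactically valid email.
$checks = [
    'honeypot' => static function (array $f): bool {
        return ($f['nospam'] ?? '') === '';
    },
    'email' => static function (array $f): bool {
        return filter_var($f['email'] ?? '', FILTER_VALIDATE_EMAIL) !== false;
    },
];

$bot  = handle_registration(['email' => 'x@example.com', 'nospam' => 'http://spam'], $checks);
$real = handle_registration(['email' => 'x@example.com', 'nospam' => ''], $checks);
```

Both calls end up on the same page, so the bot gets no signal that it was filtered; only `$real` would go on to trigger the confirmation email.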

smashingred · December 7, 2021, 8:43pm

I want to add that I hate CAPTCHA and reCAPTCHA as a user and don’t want to inflict its UX onto forms. If I’m unwilling to make customers jump through those hoops, I’m personally going to lean on “how do I make the standard signup task flow less prone to abuse” vs “how do I stop abuse by moving my ideal customer’s cognitive cheese”. You can always do what the Superstore Pharmacy does and require me to go into the Pharmacy to set up online prescription refills, so, consequently, I can’t be bothered.

markh · December 7, 2021, 9:25pm

Akismet works a lot better. It’s blocking 88% of signups on modmore.com as spam with remarkable accuracy. I think we’ve had 2 instances since first releasing that where it flagged a legitimate signup, and maybe a handful of missed spam signups, out of a total of about 450 signup attempts.

I know it’s kinda tooting our own horn as we built the integration… but Akismet is absolutely fantastic and I 100% recommend it.

This would certainly be interesting!

That seems really complex and bound to hurt signups a lot unless you’re some kind of essential service people must sign up for.

One thing that could be considered, which I don’t think Register currently does, is doing an MX lookup for the domain used in the email address. That’ll tell you in advance if the email can even be delivered.

While the presence of a valid MX DNS record is not a positive signal (i.e. you can’t determine the email is ham from it), the lack of one is a really good negative signal (i.e. you can block the signup because it’s invalid).
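A minimal sketch of such a check, assuming it would be called from a Register preHook. `email_domain_has_mx()` and its injectable `$lookup` parameter are made-up names for illustration; `checkdnsrr()` is PHP's built-in DNS record check.

```php
<?php
/**
 * Sketch: reject a signup when the email's domain has no MX record.
 * email_domain_has_mx() and $lookup are hypothetical, not part of Login/Register.
 */
function email_domain_has_mx(string $email, ?callable $lookup = null): bool
{
    // Default to PHP's built-in DNS check; a stub can be injected for testing.
    $lookup = $lookup ?? static function (string $domain): bool {
        return checkdnsrr($domain, 'MX');
    };

    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return false; // no domain part at all
    }

    return (bool) $lookup(substr($email, $at + 1));
}
```

In a hook this would look something like `if (!email_domain_has_mx($hook->getValue('email'))) { $hook->addError('email', '...'); return false; }`. One caveat: SMTP allows delivery to a domain's A/AAAA record when no MX exists, so this is slightly stricter than the mail rules themselves.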

In the end it’s a rat race between spammers and people protecting against it, so a layered approach makes sense. Hidden nospam fields, recaptcha 3, and Akismet will go a very long way.

sean69 · December 7, 2021, 9:29pm

It’s also a lot of work to set up a POP retriever and wait for emails that may or may not show up… I didn’t say it was a well-formed idea

Captcha sucks and does not work that well.

Another thought…

Obfuscate the form fields on presentation using some sort of key/encoding then decode on submission

i.e. your “email” and “confirm_email” controls look like random crap “awef342” & “hkj32ger_ewf2”
That way a bot is not going to know what fields are email addresses & need to match … it would need to be smart enough to associate the labels (or closest text) with a field type. (and smart enough to be able to respond to error messages - which I have not seen)
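One way this could be sketched, assuming a random per-session key: derive each field's public name from an HMAC of its real name, then map the submitted names back on the server. `obfuscated_name()` and `decode_fields()` are hypothetical helpers, and where the key is stored is left open.

```php
<?php
/**
 * Sketch: per-session obfuscated form field names.
 * obfuscated_name() and decode_fields() are hypothetical helpers.
 */
function obfuscated_name(string $realName, string $key): string
{
    // Prefix with a letter so the alias never starts with a digit.
    return 'f' . substr(hash_hmac('sha256', $realName, $key), 0, 12);
}

/** Map a submitted POST array back to the real field names; unknown keys are dropped. */
function decode_fields(array $post, array $realNames, string $key): array
{
    $decoded = [];
    foreach ($realNames as $name) {
        $alias = obfuscated_name($name, $key);
        if (array_key_exists($alias, $post)) {
            $decoded[$name] = $post[$alias];
        }
    }
    return $decoded;
}

// Example: the key would normally be generated once and kept in the visitor's session.
$key = bin2hex(random_bytes(16));
$fields = ['email', 'confirm_email'];
$post = [
    obfuscated_name('email', $key) => 'a@example.com',
    obfuscated_name('confirm_email', $key) => 'a@example.com',
    'email' => 'bot@example.com', // a bot posting the "obvious" name is ignored
];
$clean = decode_fields($post, $fields, $key);
```

Because the aliases change with the key, a bot replaying a previously scraped form posts names the server no longer recognises.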

hmmm…

sean69 · December 7, 2021, 9:45pm

I kind of want to avoid 3rd party services … neither client that is having the issues currently would pay for it anyway

Yes - that would be interesting! But in both cases here I am generating and forcing strong passwords (users can’t pick their own)

Yes - both sites will have no problems with people jumping through hoops to sign up - one of them in fact should be done manually by the client as it’s only their customer base from CRM that is allowed in! (and they got tagged with 35,000 spurious registrations!!)

The MX thing should actually be quite easy to implement I think… pretty low on the totem, but I think it would clean out maybe 20% ???

and yes, layered approach, there is no silver bullet.

also thinking about checking for a crumb trail - making sure the user has been on the site for a couple pages before allowing a registration… bots also start losing interest in multi-page forms as well.

smashingred · December 7, 2021, 9:50pm

I’m sure as soon as you do that, the client will start sending out direct links to the page.

sean69 · December 7, 2021, 10:02pm

hah! I’ll bet. Though - again not fully formed… a random string could be attached to the session and a “register here” button so that it is passed to the register page … a bot would have to establish a session first before trying to get the register form.
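Roughly what that could look like, as a sketch only. `issue_register_token()` and `check_register_token()` are hypothetical helpers, and `$session` stands in for `$_SESSION` so the logic can be tried in isolation.

```php
<?php
/**
 * Sketch: a one-time token tied to the visitor's session.
 * Both helpers are hypothetical; $session stands in for $_SESSION.
 */
function issue_register_token(array &$session): string
{
    $token = bin2hex(random_bytes(16));
    $session['register_token'] = $token;
    return $token; // append to the "register here" link, e.g. ?t=<token>
}

function check_register_token(array &$session, string $submitted): bool
{
    $expected = $session['register_token'] ?? '';
    unset($session['register_token']); // single use
    // hash_equals() compares in constant time.
    return $expected !== '' && hash_equals($expected, $submitted);
}
```

The register page would call `check_register_token()` before rendering the form. Direct links to the register page would fail this check, which is exactly the objection about the client mailing out links.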

even less thought out … randomize the FURL of the register page on each request? Combined with some htaccess regex … hmmm.

markh · December 7, 2021, 10:13pm

Does the CRM have an API you can use to check if the provided email is allowed to signup? If you already have an “allowlist”, that’s as close as you’re gonna get to perfection.

How much are they spending time-wise on filtering through spam signups? What’s the AWS bill for sending emails that just bounce, or the cost of getting blacklisted when too many spam signups happen?

I understand the general sentiment, but there is actually a business case to make that 3rd party services are cheaper for this sort of thing.

Akismet can be installed and blocking spam in about 10 minutes. You can try it out on the personal plan (“pay what you want”) to see if it’s effective for a few days, and then present the case to the client.

This is what Recaptcha 3 also does; it tries to identify normal user behavior on the site.

sean69 · December 7, 2021, 10:30pm

Hmmm, looking into the code behind an MX lookup … it looks pretty dead simple. I can do a reverse lookup as well … which should increase effectiveness…

sean69 · December 7, 2021, 10:41pm

It does actually and I do use it to match an email with a customer number - but I misspoke a little, they do need some people to sign up without being in the CRM system (that actually waffles back and forth )

Well they paid me over $400 this morning to clear out 35,000ish users … like 4 years of an Akismet subscription… it’s always an uphill battle

Yes, I am aware … it does not work very well at all … and that sort of thing won’t fool a bot that crawls a site looking for forms.

The particular bot they have is a little smarter than most, throttling won’t catch it - it’s pretty relaxed with the requests one every 2-3 minutes randomly and it will “go away” for a few hours here and there then wander back after a bit. Somehow ~ and I have no frikken clue how it managed this - it actually confirmed about 20 registrations!! (out of 35,000)

jako · December 8, 2021, 7:35am

A similar processor runs on a customer installation, removing registered but not confirmed users.

<?php
/**
 * Remove expired processor
 *
 * @package login
 * @subpackage processor
 */

class RemoveExpiredProcessor extends modProcessor
{
    public function process()
    {
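        // Unconfirmed registrations: still inactive, still holding a cached
        // password, and created more than 12 hours ago.
        $c = $this->modx->newQuery('modUser');
        $c->where(array(
            'cachepwd:!=' => '',
            'active' => false,
            'createdon:<=' => strtotime('-12 hour')
        ));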

        /** @var modUser[] $users */
        $users = $this->modx->getIterator('modUser', $c);
        foreach ($users as $user) {
            $user->remove();
        }

        if (php_sapi_name() == 'cli') {
            exit (0);
        } else {
            return $this->success();
        }
    }
}

return 'RemoveExpiredProcessor';

sean69 · December 8, 2021, 2:56pm

Ha! thanks - I was literally just about to sit down and write that

smashingred · December 8, 2021, 7:08pm

FYI,

I collaborated with @elizabeth on a standalone proof of concept yesterday for a simple Snippet to use if you did want to validate passwords against haveibeenpwned.com. An advantage of this in your validation chain is that lazy password field entries will be blocked from registering. Here’s the untested POC:

<?php
/*
/ This hook is designed to make sure
/ a password is not terrible and doesn't
/ use other fields or any of the most
/ common passwords.
/ @@Author Jay Gilmore
/ @@Date Friday, March 8, 2019
/
*/

// acquire paths for requires.

// Get form fields and set them up for use.

$passField = $modx->getOption('passValField', $scriptProperties, 'password');
$pass = $hook->getValue($passField);

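function is_compromised_password(string $password): bool {
    $hash = strtoupper(sha1($password));
    $range = substr($hash, 0, 5);
    // The range API returns only the 35-character hash suffixes ("SUFFIX:COUNT"),
    // so the full hash never appears in the response.
    $suffix = substr($hash, 5);

    $handle = curl_init();
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($handle, CURLOPT_TIMEOUT_MS, 2000);
    curl_setopt($handle, CURLOPT_URL, "https://api.pwnedpasswords.com/range/" . $range);
    $result = curl_exec($handle);

    // If the request failed or timed out, don't block the registration.
    if (!is_string($result)) {
        return false;
    }

    // we could explode into an array, but just checking for the suffix in the string is cheaper and easier
    // (compare against false: a match at position 0 is still a match)
    return strpos($result, $suffix . ':') !== false;
}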

if (is_compromised_password($pass)) {
    $err = 'Please choose a different password. This one appears in a compromised password database.';
    $hook->addError($passField, $err);
    return false;
} else {
    return true;
}

Also, you could optionally check the password against the other fields: if the submitter is reusing the values of other fields, it’s not only a bad password but also a potential spammer. Finally, you can also check character repetition levels and validate against that; you probably don’t want customers using 3334445555 as their password.
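The other-fields check could be sketched along these lines. `password_reuses_field()` is a hypothetical helper, not part of the POC above, and it assumes the other field values are plain strings.

```php
<?php
/**
 * Sketch: flag a password that equals or contains another submitted field.
 * password_reuses_field() is a hypothetical helper; field values are assumed scalar.
 */
function password_reuses_field(string $password, array $otherFields): bool
{
    $needle = mb_strtolower($password, 'UTF-8');
    foreach ($otherFields as $value) {
        $value = mb_strtolower(trim((string) $value), 'UTF-8');
        // Ignore very short values so a one- or two-letter field can't trigger a match.
        if (mb_strlen($value, 'UTF-8') < 3) {
            continue;
        }
        if (mb_strpos($needle, $value, 0, 'UTF-8') !== false) {
            return true;
        }
    }
    return false;
}
```

In a hook, `$otherFields` could be built from `$hook->getValue()` calls for the username, email and name fields, with an `$hook->addError()` on a match.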

Here is the unique character check tidbit:

function unique_check($input) {
    $l = mb_strlen($input, 'UTF-8');
    if ($l === 0) {
        return true; // nothing to check; leave empty values to the required validator
    }
    // Fail when any single character makes up more than roughly 40% of the password.
    $thold = round($l * 2 / 5);
    $unique = array();
    for ($i = 0; $i < $l; $i++) {
        $char = mb_substr($input, $i, 1, 'UTF-8');
        if (!array_key_exists($char, $unique)) {
            $unique[$char] = 0;
        }
        $unique[$char]++;
    }
    return max($unique) <= $thold;
}

if (unique_check($pass) !== true) {
    $err = "This password has too many repeated characters.";
    $hook->addError($passField, $err);
    return false;
}

bobray · December 8, 2021, 10:37pm

I’ve had good luck with extensive rewrite rules in .htaccess and a double opt-in requirement.

That said, some miscreants are reportedly hiring kids to register manually. It’s pretty difficult to block that.