Diagnosing SimpleSearch memory limit error

I have a large magazine site with 15K+ pages, each with 1,000+ words of content. SimpleSearch is producing PHP memory limit errors on terms that match a large number of results, and it is quite slow in returning results in general. I’ve never had any issues with SimpleSearch before, but I don’t recall using it on a site this size. Has anyone else had a similar experience, or any advice? Is this simply a limitation of the script?

There’s nothing particularly unique about the setup. I’ve confirmed it’s not some element within the site template that might be causing or compounding the problem: it also occurs when I run the search on a blank page devoid of any other MODX tags or markup. I’ve also confirmed there’s nothing within the results template that might be causing the problem. I’m not using any parameters that would cause excessive memory use.

Any advice would be welcome. I’m guessing I’ll need to implement a custom engine, but maybe one of you geniuses can tell me where to look for a fix.

MODX 2.8.2 • PHP 7.4 • MODX Cloud (NGINX) • SimpleSearch 2.1.2

It sounds like you’ve already checked this, but that error is often caused by a circular reference in the material being searched, like a chunk that includes another chunk that includes the first chunk.

Another possibility is a circular reference in the resource tree, such as a resource that’s its own parent, two resources that are each other’s parents, or a descendant that has a descendant with a parent that’s above it in the tree. SiteCheck will test for that (if it doesn’t run out of memory itself). A new version of SiteCheck is in the works that does a much better job of conserving memory, and is also much faster.
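
For anyone who wants to see what a circular-parent check amounts to, here is a minimal sketch. It is Python rather than PHP so it runs without a MODX install, and it is not how SiteCheck does it. The function name and the id-to-parent mapping are made up for illustration; on a real site you would first export each resource's id and parent from the database.

```python
def find_parent_cycles(parents):
    """Return the sorted ids of resources that sit on a circular parent chain.

    `parents` is a hypothetical mapping of resource id -> parent id,
    where a parent of 0 means the resource is at the top of the tree.
    """
    in_cycle = set()
    for start in parents:
        chain = []          # ids visited on this walk, in order
        visited = set()     # same ids, for fast membership tests
        node = start
        # Walk upward until we reach the top, leave the known ids,
        # or come back to an id we have already visited on this walk.
        while node != 0 and node in parents and node not in visited:
            chain.append(node)
            visited.add(node)
            node = parents[node]
        if node in visited:
            # Everything from the first visit of `node` onward is the loop.
            in_cycle.update(chain[chain.index(node):])
    return sorted(in_cycle)


# Demo data: 4 is its own parent, 5 and 6 are each other's parents,
# and 7 is a normal child hanging off that loop.
demo_parents = {1: 0, 2: 1, 3: 2, 4: 4, 5: 6, 6: 5, 7: 5}
cycle_ids = find_parent_cycles(demo_parents)
```

This walks every chain from scratch, which is fine for a one-off check but does more work than strictly necessary on a very large tree.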

I’m not familiar with the SimpleSearch architecture, but if it gets all the resources with getCollection() and walks through them, that could explain the problem. The modResource object is pretty heavy.

You might try CustomSearch. It’s mainly for searching for things using a form in the Manager, but if you can do PHP, you can call it directly in code.

I should also mention that hosts will sometimes bump up the memory limit for you if you ask.

SimpleSearch loads all the matching resources with getCollection() (line 270) and then selects the portion for the current page with array_slice() (line 285).

If there are a lot of search results, many resources are loaded and then immediately discarded. This probably uses up a lot of memory and slows things down.

Maybe try the extra AdvSearch instead.
In this extra there is an initial query to count the search results, and then a second one that applies a LIMIT and only loads the data for the current page. Maybe this is faster.
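
To make the difference concrete, here is a minimal sketch of the two approaches side by side. It is Python against an in-memory SQLite table rather than PHP against xPDO, purely so it runs on its own. The table name, helper names, and demo data are all made up for illustration and are not taken from SimpleSearch or AdvSearch.

```python
import sqlite3


def build_demo_db(n=200):
    """Create a throwaway table where every even-numbered page matches 'magazine'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (id INTEGER PRIMARY KEY, pagetitle TEXT, content TEXT)"
    )
    rows = [
        (i, f"Page {i}", ("magazine " if i % 2 == 0 else "other ") + "x" * 1000)
        for i in range(1, n + 1)
    ]
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", rows)
    return conn


def search_load_all(conn, term, offset, per_page):
    """Load every match, then slice out the current page (the costly pattern)."""
    rows = conn.execute(
        "SELECT id, pagetitle, content FROM pages WHERE content LIKE ? ORDER BY id",
        (f"%{term}%",),
    ).fetchall()
    return len(rows), rows[offset:offset + per_page]


def search_count_then_limit(conn, term, offset, per_page):
    """Count matches first, then fetch only the rows for the current page."""
    like = f"%{term}%"
    total = conn.execute(
        "SELECT COUNT(*) FROM pages WHERE content LIKE ?", (like,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT id, pagetitle, content FROM pages WHERE content LIKE ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (like, per_page, offset),
    ).fetchall()
    return total, rows


conn = build_demo_db()
all_total, all_page = search_load_all(conn, "magazine", offset=10, per_page=10)
lim_total, lim_page = search_count_then_limit(conn, "magazine", offset=10, per_page=10)
```

Both functions return the same total and the same page of rows, but the second never holds more than one page of content in memory. Whether it is also faster on a real site depends on the database and the query, so treat that part as something to measure.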

Huh, that’s surprising. This is almost certainly the problem. I had looked at AdvSearch, but it’s been 7 years since that package has had any activity, so I’m leery about putting it into use.

Will try hacking this a bit for a temporary fix. I expect simply switching to getIterator might help, but maybe that will break something else.
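
The idea behind that kind of change, sketched loosely: walk the results one at a time, keep only the ones that belong on the current page, and still count the total for pagination. This is Python, not xPDO code, and the function names are made up. I also haven't checked whether anything else in SimpleSearch relies on having the full array, so treat it as an illustration of the pattern only.

```python
def iter_results(ids):
    """Stand-in for a lazy result set: builds one 'heavy' record at a time."""
    for i in ids:
        yield {"id": i, "content": "x" * 1000}


def page_window(results, offset, per_page):
    """Count every result, but keep only those inside the current page window."""
    total = 0
    window = []
    for index, item in enumerate(results):
        total += 1
        if offset <= index < offset + per_page:
            window.append(item)
        # Items outside the window are dropped right away,
        # so they never pile up in memory.
    return total, window


total, window = page_window(iter_results(range(1, 501)), offset=20, per_page=10)
window_ids = [item["id"] for item in window]
```

Every match still gets visited, so this helps with peak memory more than with speed.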

Gotta say, search doesn’t seem to be a strong point for MODX. Wish I had time to work on a custom package.

Thanks for the reply, at least I know what’s happening now.

Yeah, these were the first things I checked. Based on what halftrainedharry points out in the source code, I’m pretty confident I’m simply hitting the limitations of the script. Seems like getCollection is a bad idea for this function.

Maybe have a look into this one:
sepiariver/fulltextsearch: Full Text Search for MODX (github.com)