How can I speed up a faceted search using advsearch on collections (300k records)?

I am hoping someone here with more experience using collections, advsearch and databases can help out. I do not have a lot of backend or php experience.

I created a simple site for a public library to use and it is mostly complete, however after I uploaded the largest dataset, the faceted search slowed down so much it is virtually unusable now.

There are about 300,000 entries with the largest dataset, but it is simple records (Last Name, First Name, Date, Page, & Type of Notice). The idea was that library staff could easily add this info using customized Collections forms, and patrons could use the search form on the website. Here is the search page: https://historysearch.cclibraries.com/?ref=5

It is much slower than the original site here:
http://lhdatabases.cclibraries.com/ObituaryIndex/home.php

The filtered search works, but it is painfully slow. Does anyone here have any suggests on how to speed up this search? I know there is a way to create custom tables with ModX, but I don’t know if that would be faster than searching standard collections documents with TVs. (Plus I don’t know how to tie in a custom table to collections.)

I have already put a lot of work into figuring out the query hooks, but I also recognize that using the TVs might be slowing this down.

Thank you all in advance.

So how exactly do you store the data?
Do you have 300’000 resources and each resource has 5 TVs for Last Name, First Name, Date, Page and Type of Notice respectively?

What does your query hook look like?

1 Like

Yes, that’s what I did. I learned later that TVs can slow things down. Here is my QH:

<?php
$andConditions = array(
    'tv.date:REGEXP' => 'tags:request:all',
    'tv.typeOfNotice:REGEXP' => 'category:request:all',
    'tv.lastName:REGEXP' => 'mark:request:all',
    'tv.firstName:REGEXP' => 'fname:request:all'
);

$qhDeclaration = array(
    'qhVersion' => '1.2',	
    'andConditions' => $andConditions
);

$hook->setQueryHook($qhDeclaration);
return true;

Since I wasn’t using any fields, I also deleted this line of text in AdvSearch.php:

$scriptProperties['fields'] = $modx->getOption('fields', $scriptProperties, 'pagetitle,longtitle,alias,description,introtext,content');

I’d suggest looking into mSearch2 (available from modstore) - that builds its own index that will perform much better than advsearch, and with its companian mFilter you can also drill-down into the results.

1 Like

This query hook creates an SQL query with 8 left joins to connect all the TVs and their default values. This can’t be efficient.

SELECT ...
FROM `modx_site_content`
LEFT JOIN `modx_site_tmplvars` `date` ON ...
LEFT JOIN `modx_site_tmplvar_contentvalues` `date_cv` ON ...
LEFT JOIN `modx_site_tmplvars` `typeOfNotice` ON ...
LEFT JOIN `modx_site_tmplvar_contentvalues` `typeOfNotice_cv` ON ...
LEFT JOIN `modx_site_tmplvars` `lastName` ON ...
LEFT JOIN `modx_site_tmplvar_contentvalues` `lastName_cv` ON ...
LEFT JOIN `modx_site_tmplvars` `firstName` ON ...
LEFT JOIN `modx_site_tmplvar_contentvalues` `firstName_cv` ON ...
WHERE ...

Maybe you could try querying just the database table modx_site_tmplvar_contentvalues with a custom query.


Personally I’d probably move all the data into a custom database table.

But you can’t then use Collections anymore to manage the data. You would have to create a Custom Manager Page (which can be challenging). Maybe you could use MIGXdb for that or a JS framework like Vue.js (if you’re familiar with that.)

Maybe you could also just use the custom database table as a kind of “search index” and copy the data from the resources when they are saved.

2 Likes

Thank you. One of the few things I knew about databases was that JOIN slow things down so makes perfect sense. Do you think that something like this plugin would work to create a Custom Manager Page? customExtra / Utilities / MODX extras / modstore.pro

Thank you. I’m looking into this now. Do you think this could make the search less than 5 seconds? (Right now its taking over ten seconds.)

Thank you. One of the few things I knew about databases was that JOIN slow things down so makes perfect sense. Do you think that something like this plugin would work to create a Custom Manager Page? customExtra / Utilities / MODX extras / modstore.pro

I never used the extra customExtra and as it isn’t an open-source extra, I can’t look at the code to see how it actually works.

The easiest way is probably to use MIGXdb (that is part of the MIGX extra).
Are you still using MODX 2.x? (I don’t know if the package manager in MIGX already works in MODX 3).
Maybe take a look at this video:


By the way, for your “date” TV: Wouldn’t it be more practical to search for a timespan than to have an input field and search for matching strings with the LIKE operator?

1 Like

I’ve only used it on one site and that wasn’t quite as big as the data set you’ve got here, so I can’t make any promises. But I would say that searching a dedicated index is basically guaranteed to be more performant than this filtering through TVs.

Going with a completely custom data model would definitely be worthwhile, though. I think mSearch would be a good half-way point between making what you already have more easily searchable, and having to start from scratch.

1 Like

Thank you! I think I will try this first, since it does seem easier for my experience level. I’ll just purchase this without asking the libraries to pay for it though. If it’s not fast enough, it’s not too much of an investment, and I always can ask for reimbursement.

Thank you! I think I will try this first, since it does seem easier for my experience level. I’ll just purchase this without asking the libraries to pay for it though. If it’s not fast enough, it’s not too much of an investment, and I always can ask for reimbursement.

Thanks for all the info! Yes, I am still using 2x. I’m excited to try 3x, but since I already had lots of work in this version, I’m planning on upgrading next time around.

Since creating a custom database does seem more advanced, this will be my second option to try after mSearch2. I really want the search to be reasonable fast, so patrons are not frustrated, so it may be worth it in the long run. Hopefully I’ll know in the next few days.

Also, to answer your question, it would be great if I could use a timestamp, but the data from the library is not in standard formats and many of the dates are from decades ago, or even the late 1800s.

Hi there, just my two cents, reading your comments, I think it could be the case where investing the time on learning how to create a custom table, would be way less expensive on time and effort than trying to find alternate ways to happily satisfy your performance and data volume needs.

You can check this great tool GitHub - jaredfhealy/extrabuilder: An Extra for MODX to allow creating custom tables and packages directly in the manager that makes the table creation process a walk in the part, and once you get your table(s), creating a full snippet or rest service with pagination its a matter of a few lines of code, since MODX, as a framework already provides this kind of stuff, and then,
simple calls from the UI to do any data operation (Create, Retrieve, Update, Delete), and also opens the door for future developers to easily improve on your work, this might sound daunting, but after you get it running the first time, it’s like riding a bike.

The thing is that managing big datasets is a complex task, and what could seem like simple records, ends up being a complex text search/database optimization problem, that needs to be addressed from the database level.

here is some final information on improving the search on big datasets Advanced text searching using full-text indexes – Hacking with PHP - Practical PHP

1 Like

I’ll just say that using TVs for this is extremely cumbersome and slow, and can’t really be speeded up beyond a certain point. In part, that’s because the data is spread between the TV object table (which holds the TV default values) and the Table with the TV values for each resource.

Your method would also involve a whole lot of permission checks unless the user is a sudo user, and even then, I think that status would be checked many times.

I would use ClassExtender to create a custom DB table (it will put the extra fields onto the Create/Edit Resource panel for easy entry and editing), then I’d query it with PDO for the searches, though markh’s solution is definitely worth trying.

3 Likes

UPDATE: I got it working and it’s still fast even with 340k records! I had to learn some PHP in the process, which wasn’t as bad as I thought it would be. (I know a bit of python and a bit of JavaScript, but PHP has always been intimidating to me. I just hear back-end and run.) I ended up creating a custom table using MIGX and ExtraBuilder and did the query with XPDO. I wanted to thank you all for your responses here and the resources you have provided online, which really helped me to figure this out! Also wanted to list online sources, in case it might help another newb like me. :slightly_smiling_face:

Here is my much faster demo:
https://historysearch.cclibraries.com/?ref=689916

SOURCES:

What helped me to set up MIGX custom data tables: (Thank you @halftrainedharry and @robcarey )

What helped me to understand the fundamentals, like snippets, placeholders and XPDO: (Thank you @bobray )
https://www.amazon.com/MODX-Official-Guide-Bob-Ray/dp/0983619409 &
https://bobsguides.com/

What helped me with the pagination of my snippet: (Thank you @markh )

Using PDOTools for pagination helped a lot, and this documentation really helped:
https://docs.modx.pro/en/components/pdotools/snippets/pdopage

Other sources that helped:

https://docs.modx.com/3.x/en/extending-modx/modx-class/reference/modx.getchunk

https://docs.modx.com/3.x/en/extending-modx/xpdo/class-reference/xpdo/xpdo.getcount

https://docs.modx.com/3.x/en/extending-modx/xpdo/class-reference/xpdoquery

MIGX/migxconfigs.config.inc.php at b14c5abdb5503095b18975b391ef2615811c63aa · Bruno17/MIGX · GitHub (Refrerenced the query example starting at line 48.)

AdvSearch/advsearch.snippet.php at e0850067604da1c70052d175e6ca89647e578ede · Coroico/AdvSearch · GitHub (Looked at how to sanitize input with PHP.)

Thanks again!

4 Likes

This topic was automatically closed 2 days after discussion ended and a solution was marked. New replies are no longer allowed. You can open a new topic by clicking the link icon below the original post or solution and selecting “+ New Topic”.