Streamlining a Website

chrisandy · June 16, 2021, 10:22am

I’ve got a website that’s been around for over 12 years. It was first built in Evo, then ported to Revo and kept up to date over the years. It’s currently running 2.8.

We’ve trimmed down unused and irrelevant content significantly, but we’ve got to the point where the site is getting unwieldy. We’ve got redundant TVs, Chunks and, worst of all, hundreds of images in various folders that are no longer used on the site.

With images, I thought about moving them, one folder at a time, into a kind of ‘holding’ folder and then checking to see what was broken.

I’d be grateful if anyone has tips on how I could efficiently trim down the site assets and elements (and check for broken image links).

I’m also concerned that images may still show up on the site because they are in the image cache, so I could get false positives.

bobray · June 16, 2021, 8:12pm

Orphans might help identify orphan elements.

Images are trickier because it’s common for them to have fake filenames in the code that are translated by some snippet.

Rather than moving everything to a “holding” folder and trying to remember where it came from if it’s actually in use, you might just rename them. I just put xxxx at the beginning of the name. If they’re in use, I just rename them back.
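A minimal sketch of that rename-and-restore idea, written as a standalone Python script rather than anything inside MODX. The function names `mark_unused` and `restore` are made up for illustration; only the `xxxx` prefix comes from the post above.

```python
from pathlib import Path

PREFIX = "xxxx"  # the marker mentioned above; any unlikely prefix would do


def mark_unused(path: Path) -> Path:
    """Rename a file in place by prepending PREFIX; return the new path."""
    if path.name.startswith(PREFIX):
        return path  # already marked, nothing to do
    target = path.with_name(PREFIX + path.name)
    path.rename(target)
    return target


def restore(path: Path) -> Path:
    """Undo mark_unused by stripping PREFIX from the file name."""
    if not path.name.startswith(PREFIX):
        return path  # not marked, nothing to do
    target = path.with_name(path.name[len(PREFIX):])
    path.rename(target)
    return target
```

Because the file stays in its original folder, restoring it only needs the prefix removed; there is no need to remember where it came from.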

chrisandy · June 17, 2021, 10:23am

Thank you Bob - Orphans will be useful.

With the images, since 2014 they’ve been split into folders for every six months so I think I’ll stick with the idea of creating a holding folder inside each of those folders and moving what I think are unused files.

The image cache for pThumb and MoreGallery does concern me though.

How feasible do you think the following is…

I was thinking of creating an uncached pdoResources or getResources call to show the content of groups of pages on a single page. This would give me an easy reference for spotting any missing images in one place. I’m just concerned it would blow up the site or database.

bobray · June 17, 2021, 7:02pm

It would be a pretty big load on the server to produce a single page containing all pages, but I doubt it would cause any permanent damage, though it might cause a PHP timeout.

I think I would write a snippet that used $modx->getIterator() to look at all pages with a regular expression and just displayed the img tags and the file name in the quotes. That would be somewhat lighter than the mega-page you’re suggesting.
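To illustrate just the regex part, here is a rough Python sketch that pulls the quoted src value out of each img tag in a string of page content. It is not a MODX snippet and does not use getIterator(); the pattern and the name `image_sources` are assumptions for illustration, and a regex like this is only an approximation of real HTML parsing.

```python
import re

# Rough pattern: an <img ...> tag, then a whitespace-preceded src attribute
# with a single- or double-quoted value. Requiring whitespace before "src"
# skips attributes such as data-src. This is not a full HTML parser.
IMG_SRC = re.compile(
    r'<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def image_sources(html: str) -> list:
    """Return the quoted src value of every <img> tag found in html."""
    return [match.group(2) for match in IMG_SRC.finditer(html)]
```

If a src contains a MODX tag instead of a real path, the tag text is returned as-is, so those entries would still need checking by hand.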

Here’s another way to go: https://serpstat.com/blog/how-to-detect-broken-images-on-the-website/

Actually, before doing either of those, I think I’d write a snippet that used DirWalker to grab the names of all image files. Then use getIterator() to look at all the pages on the site to identify images that aren’t used anywhere. This assumes that the hrefs contain the actual filenames.
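The comparison step could look something like the Python sketch below. It uses `os.walk` in place of DirWalker and takes the page content as a plain list of strings instead of reading it through getIterator(), so it shows only the matching logic. The names `image_files` and `unreferenced_images` and the extension list are assumptions, and it shares the limitation above: it only works if the actual file names appear in the content.

```python
import os

# Assumed set of image extensions; adjust to whatever the site actually uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def image_files(root):
    """Yield image paths under root, relative to root, with forward slashes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, root).replace(os.sep, "/")


def unreferenced_images(root, texts):
    """Return sorted image paths whose file name appears in none of the texts."""
    haystack = "\n".join(texts)
    return sorted(
        path for path in image_files(root)
        if path.rsplit("/", 1)[-1] not in haystack
    )
```

Matching on the bare file name is deliberately loose: a name that also appears inside a longer name counts as used, which errs on the side of keeping files.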

chrisandy · June 18, 2021, 9:59am

Thank you Bob. I’ve just been reading about that functionality and yes, it does look like the way to go. As I’m not a coder, I’m wondering if Rowboat might be a good way to go with this.

bobray · June 18, 2021, 5:18pm

I haven’t used Rowboat, but it might work.

It just occurred to me (duh) that many, if not most, of the images would be in templates and chunks rather than resources. Some could also be in snippets, plugins, system settings, properties, or property sets.

Another way to go might be to grab the image tags in a plugin attached to OnWebPagePrerender to append them to a file. Once that plugin was in place, running the RefreshCache extra would cause all web pages to be requested.

xgarb · June 27, 2021, 6:30pm

Could you make a clone of the site, remove all the images, delete the cache, then run a site indexer tool, look for 404 errors for missing images, and add those images back in?

chrisandy · June 28, 2021, 9:48am

That is basically what we’re doing, but because the site is so large, cloning it isn’t an option. Once we start to get the site slimmed down, we should have more flexibility for cloning and redeveloping.