Google indexing - page with redirect - affected page - index.html

Hi, it’s been one issue after another trying to get Google to index my site properly… Hopefully I’m at the last hurdle.

I’m using friendly URLs and have installed the Canonical extra, but Google is not happy with www.mydomain.com/index.html, saying it’s being redirected. The only entry in .htaccess is for index.php, and I cannot find any setting in System Settings that might resolve this.

Can someone please assist?

Many thanks : -)

I think I may have fixed it by deleting the resource alias on the homepage resource in the Manager. The alias is (index) by default, and deleting it changes it to (home). Hopefully this fixes the issue?

Can’t you just remove the index.html file? I can’t think of any reason why you would need one, unless your server is misconfigured.

Hey Bobray, just jumping in here. Removing the index.html file does make sense, especially if it’s conflicting with the default index.php or causing confusion with redirects. I’ve seen similar issues where leftover files like index.html end up getting indexed or prioritized by Google, even when the actual site runs off index.php or MODX’s routing. It’s worth double-checking the server’s DirectoryIndex settings too, in case it’s still trying to serve index.html by default.

Thank you for your replies…

The thing is, index.html is not a physical file; it’s being generated from the homepage in the Manager with friendly URLs enabled.

I did try renaming the homepage resource alias to “home”, but then Google said the problem was “index.html - 404 not found”. Maybe I should have left it that way and given it more time?

Currently Google reports an issue in Search Console for index.php: “Alternative page with proper canonical tag”.

Also, a few weeks ago I changed the homepage back to “index.html” and, after reading some forum posts, placed the following into .htaccess: “RewriteRule ^index.html$ / [R=301,L]”. This was just a somewhat desperate attempt to fix the issue. I’m clutching at straws!
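
For reference, a slightly stricter variant of that rule could look like the sketch below. It assumes Apache with mod_rewrite enabled and a homepage served at the site root. The escaped dot and the THE_REQUEST condition are additions that were not in the rule quoted above. The condition limits the redirect to requests where the browser or crawler literally asked for /index.html, so internally rewritten requests are left alone.

```apache
# Sketch only: assumes Apache with mod_rewrite and a homepage served at "/".
RewriteEngine On

# Match only when the client's original request line asked for /index.html
# (for example "GET /index.html HTTP/1.1"), not an internal rewrite.
RewriteCond %{THE_REQUEST} \s/index\.html[\s?] [NC]

# Escaping the dot makes the pattern match a literal "index.html" only;
# unescaped, "." matches any single character.
RewriteRule ^index\.html$ / [R=301,L]
```

A rule like this would normally need to sit above the friendly URL rewrite block in .htaccess so that it runs first. With it in place, Search Console listing /index.html under “Page with redirect” would be the expected outcome rather than an error.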

Google also currently reports a problem in Search Console for index.html, with the reason “404 - Not Found” (Started), after I clicked the fix button a few weeks ago. I think I should have heard from them by now?

Google currently has only indexed my domain “mydomain.com”.

Cheers

I think you’re on the right track with renaming the resource to “home.”

What do the base href lines look like in your templates?
Are all links to the home page on the site in the form of link tags ([[~##]])? They should be. You may have some links with index hard-coded, which would confuse Google.

Be sure to clear the site cache after making changes.

Hey Bob, thanks for your help, appreciated, and sorry for the delay in replying; I am watching Wimbledon…

Ok the base href looks set correctly in my template…

<base href="[[!++site_url]]">

All of my links are set as [[~##]], and that’s thanks to you: I read a forum post where you mentioned this method, so thanks for that!

I have just heard from google (email):

Google has validated your fix for Page indexing issues on https://www.codenameboo.com/. The specific issue validated was: Not found (404).

1 pages on your site were validated as fixed.

Which is for /index.html

But then the next email:

Search Console has identified that some pages on your site are not being indexed due to the following new reason:

Page with redirect

Which again is for /index.html

Also, /index.php has the following status in Search Console: Alternative page with proper canonical tag

So let’s proceed with your wisdom. I have renamed index.html back to home.html and have removed the following redirect code from .htaccess:

# Added by me - rewrite domain/index.html to /

RewriteRule ^index.html$ / [R=301,L]

I have also clicked on fix in Search Console for both issues, so all is set, Bob. We’ll see how we go…

Speak soon

Well, good news: I now have 10 pages indexed by Google. I got there by using “inspect any URL” in Search Console, pasting in each URL, and pressing “request indexing” on each one.

I did have a major error with the “breadcrumbs” extra, as it was using an obsolete schema (‘data-vocabulary.org schema deprecated’). This was fixed by switching to the “breadcrumb” extra.

Although I don’t know how to configure the “breadcrumb” extra to use this:
BreadCrumb 1.4.4-pl: Added placeholder to support BreadcrumbList - Schema.org Type

There are now two outstanding issues in Google Search Console:

Alternative page with proper canonical tag - /index.php (Probably expected and unavoidable?)

Not found (404) - /index.html (Maybe just give Google time to remove it, or maybe just keep clicking “fix issue”?)

Anyway I’m very happy today : - )

Thank you both bobray and poldx for your input with this!

Not sure if this will make things better or worse, but you might try this in your .htaccess file:

DirectoryIndex index.php

Hey Bob,
I’ve been looking into DirectoryIndex; I’m just not sure it’s necessary at the moment, as things are going well and the search engines seem happy now.

poldx mentioned DirectoryIndex as well, but as both of you wrote: get rid of index.html, which is now sorted.

Tell you what, though: I had a long break from web design (years), and things are so different now. I used Evolution back in the day (wow, that was a long time ago), and now it’s Revolution, with containers that can hold content. And there’s no need to have an index.html / index.php inside a container (folder).

The only index document is now in the site root: index.php, as you know.

I think I’ll see how things go for a few weeks and then reconsider using DirectoryIndex.

One issue I do have is the W3C HTML validator returning a 500 Internal Server Error message when I submit any URL from my site. That’s not good.

Speak Soon

I could be wrong, but I think that index.html is still the default directory index for many servers. So the point of the command in .htaccess is to change that. If you only list index.php, it should stop looking for index.html.
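
As a rough illustration of that point (a sketch, assuming Apache with mod_dir, which is the module that provides DirectoryIndex): Apache’s built-in default for the directive is index.html, though many hosts override it with a longer list. Listing only index.php means a request for a directory, including the site root, is answered by index.php and nothing else. The IfModule wrapper is an optional addition that was not part of the suggestion above.

```apache
# Sketch only: assumes Apache with mod_dir available.
<IfModule mod_dir.c>
    # When a directory such as "/" is requested, look for index.php only.
    # A leftover index.html in that directory is no longer picked automatically.
    DirectoryIndex index.php
</IfModule>
```

This only affects which file answers a directory request. A direct request for /index.html is still handled by whatever file or rewrite rule matches that path.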

Ok I’ve done it Bob…

DirectoryIndex index.php - is now set.

Thanks : - )