Regex? Or is there another way to Find & Replace?

peterb3 · September 11, 2024, 10:50pm

Out of necessity I’ve converted 2 PDFs into 2 separate html files.

The converter adds a serial number and a class to each tag, and too few of them are alike to do a global find and replace. Here’s an example:

I thought Regex would help me match any and all characters between “<p” and “>”
so that I could wind up with just

and then globally find and replace that with

I’m using Sublime Text but none of the regex solutions work. I’m having a hard time grasping why various solutions don’t work even though I have the regex button on.

Any help would be most appreciated as I have 13,000 lines I’d like to edit in just a few minutes.

bobray · September 13, 2024, 10:17pm

Regular expressions are notoriously unreliable when used on HTML.

Can you give a few examples of something you would want to find?

Another approach is to use an HTML parser to find particular tags, then search within them for what you want. I’ve done this, but it was a while back and I can’t remember which specific parser I used.
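To give a rough idea of what the parser approach looks like, here is a minimal sketch. It uses Python's standard-library `html.parser` rather than the PHP parser I used back then, so treat it as an illustration only. The class name `PAttributeStripper`, the helper `strip_p_attributes_with_parser`, and the sample markup are all made up for the example, and it assumes the goal is to reduce every opening paragraph tag to a bare `<p>`.

```python
from html.parser import HTMLParser


class PAttributeStripper(HTMLParser):
    """Re-emits the HTML it is fed, dropping all attributes from <p> tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.out.append("<p>")  # assumed target: a bare <p>
        else:
            # Every other opening tag is copied through exactly as written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_p_attributes_with_parser(html: str) -> str:
    parser = PAttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Hypothetical input: note the ">" inside a quoted attribute value.
sample = '<p id="a" title="1 > 0">Tom &amp; Jerry</p><img src="x.png">'
result = strip_p_attributes_with_parser(sample)
print(result)
```

The point of going through a parser is that it understands quoted attribute values, so a stray `>` inside an attribute does not end the tag early the way it can with a simple pattern.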

More info: DOMDocument. DOMDocument is used in PHPUnit and in parts of the MODX core code.

Back to Regex. I’ve found this site helpful: Rubular.

peterb3 · September 14, 2024, 12:22am

The Rubular website was most helpful. I was able to get rid of the various <p whateverhere"> by subtraction, using the F&R panel in Sublime, which showed me what the results would be before I committed to the Replace. Lots of inconsistencies were found that way.
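For anyone landing here later, the kind of pattern involved can be sketched in Python like this. It is not the exact expression I typed into Sublime, and the sample markup, the name `strip_p_attributes`, and the bare `<p>` replacement are assumptions made up for the illustration.

```python
import re

# "<p", then a word boundary so that <pre> is left alone,
# then any run of characters that are not ">", then the closing ">".
P_OPEN = re.compile(r"<p\b[^>]*>", re.IGNORECASE)


def strip_p_attributes(html: str) -> str:
    """Replace every opening <p ...> tag with a bare <p>."""
    return P_OPEN.sub("<p>", html)


# Hypothetical converter output; the ids and classes are invented.
sample = '<p id="x1" class="c7">First</p>\n<pre>code</pre>\n<p class="c9">Second</p>'
cleaned = strip_p_attributes(sample)
print(cleaned)
# <p>First</p>
# <pre>code</pre>
# <p>Second</p>
```

A pattern along the lines of `<p\b[^>]*>` should also work in an editor's Find & Replace panel with regex mode on. It will cut a tag short if an attribute value itself contains `>`, which is one of the inconsistencies worth checking in the preview before replacing.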

All in all it took me 15 minutes to get rid of 5k

while fixing the inconsistencies.

BIG THANK YOU!

bobray · September 14, 2024, 6:18am

I’m glad you got it sorted.