Out of necessity I’ve converted 2 PDFs into 2 separate HTML files.
The converter creates a serial number and a class. Very few of them are alike, so a simple global find and replace isn’t practical. Here’s an example:
I thought regex would help me match any and all characters between “<p” and “>”
so that I could wind up with just
and then globally find and replace that with
I’m using Sublime Text but none of the regex solutions work. I’m having a hard time grasping why various solutions don’t work even though I have the regex button on.
Any help would be most appreciated as I have 13,000 lines I’d like to edit in just a few minutes.
Regular expressions are notoriously unreliable when used on HTML.
Can you give a few examples of something you would want to find?
Another approach is to use an HTML parser to find particular tags, then search within them for what you want. I’ve done this, but it was a while back and I can’t remember which specific parser I used.
More info: DOMDocument. DOMDocument is used in PHPUnit and in parts of the MODX core code.
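To make the parser idea a bit more concrete, here is a rough sketch. It is not DOMDocument: it swaps in Python's standard-library html.parser, purely as an illustration. The class and function names (PAttributeStripper, strip_p_attributes_parsed) and the sample markup are made up for this example, and it assumes the goal is to reduce every opening paragraph tag to a bare <p> while leaving everything else alone.

```python
from html.parser import HTMLParser


class PAttributeStripper(HTMLParser):
    """Re-emits the HTML it is fed, dropping all attributes from <p> tags.

    Hypothetical helper for illustration; not taken from the thread.
    """

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are passed
        # through to the handlers below instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("<p>")  # discard id, class, and the rest
        else:
            self.parts.append(self.get_starttag_text())  # keep as written

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_p_attributes_parsed(html: str) -> str:
    stripper = PAttributeStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.parts)


# Made-up sample; the real converter output is not shown in the thread.
sample = '<p id="p1" class="c12">Hello &amp; <a href="x.html">bye</a></p>'
print(strip_p_attributes_parsed(sample))
```

The main advantage over a plain regex is that the parser understands quoted attribute values, so a stray ">" inside an attribute should not cut the tag short.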
The Rubular website was most helpful. I was able to get rid of the various <p whateverhere"> tags by subtraction, using the find-and-replace panel in Sublime Text, which showed me what the results would be before I committed to the Replace. I found lots of inconsistencies that way.
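For anyone landing here with the same problem, a minimal sketch of one pattern that fits the description above is <p\b[^>]*> replaced with <p>. That it should end up as a bare <p> is an assumption, as are the function name and the sample markup below. The \b keeps the pattern from matching tags like <pre>, and [^>]* stops at the first ">" rather than running greedily to the end of the line the way <p.*> would. It is shown in Python so it can be tested outside the editor; the same pattern should also work in Sublime Text's find panel with regex mode on.

```python
import re

# "<p", then a word boundary (so <pre>, <param> etc. are skipped),
# then anything up to the first ">".
P_OPEN_TAG = re.compile(r"<p\b[^>]*>", re.IGNORECASE)


def strip_p_attributes_regex(html: str) -> str:
    """Replace every opening <p ...> tag with a bare <p>."""
    return P_OPEN_TAG.sub("<p>", html)


# Made-up sample standing in for the converter's output.
sample = (
    '<p id="p0001" class="s12">First</p>\n'
    '<pre>kept</pre>\n'
    '<P CLASS="s7">Second</p>'
)
print(strip_p_attributes_regex(sample))
```

One caveat: this breaks if an attribute value itself contains a ">" character, which is part of why regex on HTML has the reputation it does. Previewing the matches before replacing, as described above, is a good way to catch that.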