Python Web Scraping and MODX

Hello all, I am starting to learn web scraping with Python. My website uses MODX tags like [[*stuff]] everywhere, and I’m worried about whether these MODX features will carry over when I scrape the site and bring the result into Python.

Has anyone done this yet? Any advice or warnings for me? Thank you in advance.

If you’re going to scrape the MODX output, all those tags will already have been replaced by their rendered values. You just need to be mindful of anything that may be user- or session-specific, since your scrape is stateless, I guess?
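
To illustrate the point about tags being replaced before the scraper ever sees them, here is a minimal sketch that looks for leftover MODX-style tags in a piece of HTML. The two sample strings and the `leftover_modx_tags` helper are made up for illustration; they are not output from a real MODX site.

```python
import re

# Matches anything that looks like an unprocessed MODX tag,
# e.g. [[*pagetitle]] or [[~12]].
MODX_TAG = re.compile(r"\[\[[^\[\]]+\]\]")


def leftover_modx_tags(html: str) -> list[str]:
    """Return any MODX-style tags still present in the HTML."""
    return MODX_TAG.findall(html)


# Hypothetical template source and the HTML MODX might render from it.
template_source = '<h1>[[*pagetitle]]</h1><a href="[[~12]]">Contact</a>'
rendered_output = '<h1>Home</h1><a href="contact.html">Contact</a>'

print(leftover_modx_tags(template_source))  # ['[[*pagetitle]]', '[[~12]]']
print(leftover_modx_tags(rendered_output))  # []
```

Running the same check on a scraped page is a quick way to confirm you fetched the rendered output and not the template.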

OK, I was hoping it would work like that. Thank you Mark, wish me luck. I’m going to start this in a few days.

After watching the courses on Udemy, I believe Mark is 100% right: the scraper works from the HTML that MODX renders and sends to the browser, not from the template code. I’ll update this thread as I work on transferring my MODX site into Python, along with any discoveries along the way.

I completed my Python training on Udemy, and at this point it seems the MODX tags will not transfer directly to Python. You would need to write Python functions that mirror what MODX does with tags such as [[*stuff]] and [[~stuff]].
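
As a rough idea of what "mirroring" those two tags could look like, here is a minimal sketch. It assumes [[*name]] means a resource field and [[~id]] means a link to the resource with that ID. The `resource_fields` and `resource_urls` dictionaries and the `render` function are hypothetical stand-ins for what MODX keeps in its database, and this is nowhere near a full MODX parser (no chunks, snippets, filters, or nested tags).

```python
import re

# Hypothetical stand-ins for data MODX would normally hold.
resource_fields = {"pagetitle": "Home", "longtitle": "Welcome to my site"}
resource_urls = {12: "contact.html"}

# Captures the token (* or ~) and the name or ID that follows it.
TAG = re.compile(r"\[\[([*~])([^\[\]]+)\]\]")


def render(template: str) -> str:
    """Replace [[*field]] and [[~id]] tags with values from the dicts above."""

    def substitute(match: re.Match) -> str:
        token, name = match.group(1), match.group(2).strip()
        if token == "*":
            # Unknown field names become an empty string in this sketch.
            return resource_fields.get(name, "")
        # token == "~": look up the URL for a numeric resource ID.
        if name.isdigit():
            return resource_urls.get(int(name), "")
        return ""

    return TAG.sub(substitute, template)


print(render('<h1>[[*pagetitle]]</h1><a href="[[~12]]">Contact</a>'))
# <h1>Home</h1><a href="contact.html">Contact</a>
```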

If you don’t feel like doing all that work, fear not, I too am lazy. You can just open a page on the front end of your site, press Ctrl+U to view the source, copy all of it, and paste it into Python, or fetch it with code like the snippet below in JupyterLab:

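import requests
from bs4 import BeautifulSoup

# Page to fetch; swap in the URL of your own rendered page.
vgm_url = 'https://www.vgmusic.com/music/console/nintendo/nes/'
response = requests.get(vgm_url, timeout=10)
response.raise_for_status()  # stop early on a 4xx/5xx response
html_text = response.text
soup = BeautifulSoup(html_text, 'html.parser')

# In a notebook, a bare name on the last line of a cell displays the parsed document.
soup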
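
Once the page is parsed, the next step is usually pulling specific pieces out of it. Here is a minimal sketch of that, using a small made-up HTML string in place of a real download so it runs offline; the page content and file names are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page, so the example runs offline.
html_text = """
<html><head><title>NES Music</title></head>
<body>
  <a href="zelda.mid">Zelda</a>
  <a href="metroid.mid">Metroid</a>
  <a name="top">No link here</a>
</body></html>
"""

soup = BeautifulSoup(html_text, "html.parser")

# The text inside the <title> tag.
page_title = soup.title.string

# href=True skips <a> tags that have no href attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print(page_title)
for text, href in links:
    print(text, "->", href)
```

The same `find_all` call should work on the `soup` built from a live page, though the tags you search for will depend on how that page is marked up.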

You’ll mainly want to look into HIP (HTML Inside Python). Here is a quick look at what to expect:
http://karrigell.sourceforge.net/en/htmlinsidepython.html

My new ultimate goal with this project is to have Python run everything my MODX site currently does. The reason for using Python is basically to get more flexible code, and also to hide my sloppy code from all the imaginary threats I created that make me feel important. I’ve seen another website use Python to hide its HTML (and yes, I know it was HTML first, THEN Python later), and I’m just trying to replicate that.

Thank you to MarkH for bouncing ideas around with me. I know we didn’t see Bob Ray this time, but seriously, that guy is just amazing, as are the other people at MODX.

How can you have a website that hides its HTML? I’m utterly confused.

Why use MODX (a PHP framework) when you want to code in Python? Why not use a Python CMS?
