What is the recommended way to search PDFs?

Hey all,

I was attempting to write a snippet that searches for a phrase within any number of resources, provided each resource has a PDF attachment. My method is highly inefficient because it takes a long time to search each resource for the phrase.

My snippet

require_once $modx->getOption('core_path') . 'components/bjs/model/bjs/bjs.class.php';
$bjs = new bjs($modx);

// Comma-separated resource IDs, e.g. 1,2,3,4,5
$ids = $modx->getOption('ids', $scriptProperties, '');
$ids = array_filter(array_map('trim', explode(',', $ids)));

foreach ($ids as $id) {
    // Converts the attached PDF to text on every call, which is the slow part
    $pdftext = $bjs->pdftotext($id);

    //echo "$id<br> ";
}
//$text = $bjs->pdftotext ($id);

return '';

My class

<?php
require_once MODX_ASSETS_PATH.'components/bjs/libraries/pdftotext/PdfToText.php';

class bjs {
    /**
     * @access public
     * @var modX A reference to the modX object.
     */
    public $modx = null;
    /**
     * @access public
     * @var array A collection of properties to adjust the class's behaviour.
     */
    public $config = array();
    
    public function __construct(modX & $modx, array $config = array()) {
        $this->modx = &$modx;
        // Keep the passed-in options instead of silently discarding them
        $this->config = $config;
    }

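    public function pdftotext ($id){
        // Default to an empty string so the return value is always defined
        $data = '';

        $resource = $this->modx->getObject('modResource', $id);

        if ($resource){
            $pdf_file = "/paas/c0240/www/".$resource->getTVValue('pdf_attachment');

            // is_file() also rejects the bare directory when the TV is empty
            if (is_file($pdf_file)){
                $pdf = new PdfToText($pdf_file);
                $data = $pdf->Text;
            }
        }

        return $data;
    }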
    
}

Obviously, using the foreach is not optimal when doing searches. Does anyone have any suggestions?

I don’t know if there is a “recommended way”, but you probably have to convert the PDF to text when you attach it to the resource and save the text to the database where it can be queried.

Maybe write a plugin that runs on the event OnDocFormSave or OnBeforeDocFormSave, detect if the value of your TV pdf_attachment has changed and then do the conversion and save the text to another (hidden) TV or to a custom database table. (With a custom database table you could maybe even create a FULLTEXT index.)
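
A rough sketch of that plugin idea, untested and resting on a few assumptions that are not from this thread: two extra TVs named pdf_text and pdf_text_source exist (both names are made up for illustration), pdf_attachment holds a path relative to the site root so MODX_BASE_PATH can stand in for the hard-coded path, and the TV values have already been saved by the time OnDocFormSave fires. It reuses the same PdfToText class as the question and only converts when the file fingerprint changes.

```php
<?php
/**
 * Plugin sketch for OnDocFormSave: cache the PDF text in a hidden TV.
 * Hypothetical TV names: pdf_text (the extracted text) and
 * pdf_text_source (a fingerprint of the file that text came from).
 */

// Fingerprint of the attached file; changes when the path, size or mtime changes.
// Returns an empty string when there is no readable file at $path.
$fingerprint = function ($path) {
    return is_file($path)
        ? md5($path . '|' . filemtime($path) . '|' . filesize($path))
        : '';
};

// Only act inside MODX, on the save event, when a resource was passed in.
if (isset($modx) && $modx->event->name === 'OnDocFormSave' && !empty($resource)) {
    require_once MODX_ASSETS_PATH . 'components/bjs/libraries/pdftotext/PdfToText.php';

    // Assumption: the TV stores a path relative to the site root.
    $relative = (string) $resource->getTVValue('pdf_attachment');
    $path     = MODX_BASE_PATH . ltrim($relative, '/');
    $current  = $relative === '' ? '' : $fingerprint($path);

    // Skip the expensive conversion when the attachment has not changed.
    if ($current !== (string) $resource->getTVValue('pdf_text_source')) {
        $text = '';
        if ($current !== '') {
            $pdf  = new PdfToText($path);
            $text = $pdf->Text;
        }
        $resource->setTVValue('pdf_text', $text);
        $resource->setTVValue('pdf_text_source', $current);
    }
}
```

With the text stored like this, the search snippet only has to compare the phrase against the saved pdf_text values instead of converting every PDF on each request. A custom table with a FULLTEXT index would follow the same pattern, just with a different storage target.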