RIAB and CDFA Web Searching PHP Script

I’ve been documenting my tribulations with the RIAB (Rendering Industry Advisory Board) for a few months now, and if you can help out our cause in the advocacy of personal-use biofuel collection in any way, I would greatly appreciate it.  If you can contact your legislator, attend some of the RIAB meetings, or contribute to our lobbying cause, that would be amazing.

Users cannot sign up for RIAB Press Releases, so they are uninformed on when and where new RIAB meetings and vacancies can occur.

The major problem with the board is that they never formally announce their meetings.  They are required by law to announce their meetings 72 hours in advance, and all they have to do is post a “press release” on the CDFA website.  However, they do this in a shady and non-transparent way that discourages citizens from attending the meetings.  If personal-use advocates cannot attend, then the corporate interests pass whatever they want and the legislators accept it as gospel.

The CDFA website is supposed to have an email service to notify people when a new press release has been posted.  When you try to sign up, you run into the issues shown in the image to the right.

Basically, the RIAB posts their meeting announcement without much warning, they bury it in an inaccessible part of the website, and their email notification service malfunctions.  For the last meeting, in November 2018, less than 72 hours of notice was provided.  Unless you are literally checking their website every 24 hours, you’ll miss the announcement and the meeting.  Since they don’t allow for letters or phone conferencing, if you are not attending the meeting in person, you cannot object to their regulations on personal use of biofuel.

So, we came up with a custom PHP script to search the press release archives on their website and post notifications to an HTML5 page.  We also embedded it in our legislative page, so that whenever someone reads our content, it simultaneously checks the RIAB page for any new press releases.  Lastly, the page sends an email to anyone signed up for notifications whenever a new press release shows up in the archive.

Now on to the tech.  The code isn’t anything surprising, but I thought I would post my findings regardless.  It starts off…

$prepath = "https://www.cdfa.ca.gov/egov/Press_Releases/";

$yr = date("Y");
$yr2 = substr($yr, -2);
$f = file_get_contents($prepath."Press_Releases.asp?y=".$yr2);

The first part of the PHP script works around an issue with the CDFA website.  Rather than using the main CDFA Press Release page (which contains only the five latest press releases), I’m using the archive page, which contains all the press releases for a given year.

The sketchy part is, if you go to the Archives Main Page, there is no link to 2019.  That means if you want to see the Press Releases for this year, you have to go to the Press Release Main Page, which only shows the latest five.  If you want to see the releases from January 2019, you are out of luck.

My script (above) gets around the missing link by building the archive URL for the current year directly, without relying on their website’s navigation.
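For anyone who wants to check a specific year, the URL-building step can be pulled out into a small function.  This is only a minimal sketch: the function name riab_archive_url() is made up for illustration, and it assumes the archive page keeps accepting a two-digit “y” parameter the way the snippet above uses it.

```php
<?php
// Hypothetical helper (not part of the original script): builds the archive
// URL for a given year. Assumes the site keeps accepting a two-digit "y"
// query parameter, as in the snippet above.
function riab_archive_url(string $prepath, int $year): string
{
    $yr2 = substr((string) $year, -2);   // e.g. 2019 -> "19"
    return $prepath . "Press_Releases.asp?y=" . $yr2;
}

$prepath = "https://www.cdfa.ca.gov/egov/Press_Releases/";
echo riab_archive_url($prepath, 2019), "\n";
// prints https://www.cdfa.ca.gov/egov/Press_Releases/Press_Releases.asp?y=19
```

Passing (int) date("Y") as the year reproduces what the script does for the current year.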

Moving on.  Now I set up my list of saved press releases, so that I don’t send out emails about old releases.  We only want to notify people about new ones.  This section of the code opens the text file and loads the previously saved links into a simple string array.

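$histfilepath = "name.txt";
// Open in append mode so new links are added after the existing ones
// (this also creates the file if it does not exist yet).
$myfile = fopen($histfilepath, "a") or die("Unable to open file!");
// Load the previously saved links, one per line, without trailing newlines.
$histarr = file($histfilepath, FILE_IGNORE_NEW_LINES);
$emailbool = 0;     // set to 1 when a new press release is found
$emailbodystr = ""; // collects the new links for the email body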
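The same “only report links we haven’t seen before” idea can be sketched as one self-contained function.  This is an alternative arrangement, not the code running on my page: the function name record_new_links() is hypothetical, and it uses file_put_contents() with FILE_APPEND instead of the fopen()/fwrite() pair.

```php
<?php
// Hypothetical helper: returns the links that are not yet in the history
// file and appends them to it, so that each link is reported only once.
function record_new_links(string $histfilepath, array $links): array
{
    // Read previously saved links; a missing file means no history yet.
    $seen = is_file($histfilepath)
        ? file($histfilepath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];

    // Keep only links that are not in the history (and drop duplicates).
    $new = array_values(array_diff(array_unique($links), $seen));

    if ($new !== []) {
        // Append one link per line; LOCK_EX guards against two page loads
        // writing at the same moment.
        file_put_contents($histfilepath, implode("\n", $new) . "\n", FILE_APPEND | LOCK_EX);
    }
    return $new;
}

// Usage with a temporary file standing in for the real history file.
$tmp = tempnam(sys_get_temp_dir(), "riab");
$first  = record_new_links($tmp, ["a.html", "b.html"]);
$second = record_new_links($tmp, ["b.html", "c.html"]);
print_r($second);   // only "c.html" is new on the second call
unlink($tmp);
```

The returned array doubles as the list of links to put in the notification email, so a separate flag is not needed in this variant.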

I’m going to use the Simple HTML DOM library to read the site’s HTML.  If you aren’t familiar, it’s a great utility that lets programmers read the elements of any webpage using PHP.  It has easy-to-use string-based searching, and you can also search by HTML element type.  Here’s how you set it up; “str_get_html()” is the function that parses the HTML string into a searchable object.

include 'simple_html_dom.php';
$html = str_get_html($f);
$ctr = 0;

The rest is basic PHP loops.  First, we use the “find” method in Simple HTML DOM to collect every “a” hyperlink on the page.  Then we use a string comparison to keep only the hyperlinks containing “View this”.  Once we have the “href” value of a link, we fetch that page with file_get_contents() and filter for press releases containing the word “rendering”, as in Rendering Industry Advisory Board.

foreach($html->find('a') as $element) {

// stripos() returns false when there is no match, so compare strictly.
if (stripos($element,"View this") === false) {
 //nothing found with "View this..."
 } else {
 $value = $element->href;
 $g = file_get_contents($prepath.$value); //live link
 
 if (stripos($g,"rendering") === false) {
 //nothing found... with "rendering" in the text.

Then the script prints the found press release to our HTML page and checks whether the link is already in our txt file.  If yes, it moves on.  If no, it saves the link to the txt file so we don’t send it out again in the future, and it sets a flag ($emailbool) that is used later in the code.

 } else {
 
 //echo "Found RIAB or Rendering Press Release: ";
 echo "<b><a href=\"".$prepath.$value."\">Found RIAB Press Release</a>
    </b>"; //$prepath.$value;
 echo "<br>";
 
 if (in_array($prepath.$value, $histarr)) {
 // item found
 } else {
 // item not found, saving to txt file, sending email.
 $emailbool = 1;
 $emailbodystr = $emailbodystr.$prepath.$value."\r\n";
 fwrite($myfile, $prepath.$value."\r\n");
 }

Next it searches the “td” elements of the press release page; once one of them mentions “rendering”, it prints a short preview taken from the page’s plain text.  I figured it would be nice for our subscribers to know a little more than just “a new press release was found.”

 $gtml = str_get_html($g);
 $firstr = 0;
 foreach($gtml->find('td') as $gele) {
 if (stripos($gele,"rendering") === false) {
 // no rendering found
 } else {
 if ($firstr == 0) {
 // print the preview only once per press release
 $firstr++;
 $gtmlplain = $gtml->plaintext;
 echo "<p class=\"tab\">Preview Text: ...<i>".substr($gtmlplain, 70, 500)."...
     (more)</i></p><br><br>";
 $ctr++;
 }
 }
 //echo "<br>".$firstr."<br>";
 }
 } // end: press release mentions "rendering"
 } // end: link contains "View this"
} // end: foreach over the links
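If you would rather not depend on a third-party library, the link-filtering step can be approximated with PHP’s built-in DOM extension.  The sketch below is not what my page runs: it swaps Simple HTML DOM for DOMDocument, the helper names contains_ci() and find_links() are made up, the sample markup is invented rather than copied from the CDFA site, and it assumes the DOM extension is enabled on your host.

```php
<?php
// Case-insensitive "contains" check. stripos() returns int 0 for a match at
// the very start of the string and false for no match, hence the strict !==.
function contains_ci(string $haystack, string $needle): bool
{
    return stripos($haystack, $needle) !== false;
}

// Hypothetical helper: returns the href of every <a> whose text or title
// attribute contains $needle. Uses the built-in DOM extension instead of
// Simple HTML DOM.
function find_links(string $html, string $needle): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $label = $a->textContent . ' ' . $a->getAttribute('title');
        if (contains_ci($label, $needle) && $a->getAttribute('href') !== '') {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Made-up sample markup for illustration only.
$sample = '<html><body><table><tr>'
    . '<td><a href="Press_Release.asp?PRnum=19-001">View this press release</a></td>'
    . '<td><a href="about.asp">About</a></td>'
    . '</tr></table></body></html>';

print_r(find_links($sample, "view this"));
```

Each returned href would still need to be appended to $prepath and fetched, exactly as in the loop above, before checking the page text for “rendering”.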

Lastly, I used a simple conditional to report if nothing was found.  After that, if $emailbool was set, the script sends a simple text email with the list of new press release links in the body.

if ($ctr == 0) {
 echo "Our automated search found no current \"RIAB\" or \"Rendering\"
      Press Releases on the CDFA website for ".date("Y").".
      <a href=\"https://www.cdfa.ca.gov/egov/Press_Releases/Press_Releases.asp\"
      target=\"_blank\">Click here</a> to personally investigate.
      Good luck to you. If you want to be on our automated email
      notification list, <a href=\"http://www.nickpisca.com/wvo/contact-us/\"
      target=\"_blank\">contact us</a>." ;
}

if ($emailbool == 1) {
 $to = 'email@gmail.com';
 $subject = 'Possible RIAB Press Releases on CDFA Website.';
 $message = 'The following RIAB or Rendering Press Releases were
        recently posted to the CDFA website: '."\r\n\r\n".$emailbodystr;
 $headers = 'From: email@gmail.com' . "\r\n" .
 'Reply-To: email@gmail.com' . "\r\n" .
 'X-Mailer: PHP/' . phpversion();

if(mail($to, $subject, $message, $headers)) {
 echo 'New RIAB Press Releases found and emails sent.';
 }
}
fclose($myfile);

After it’s all done, you close the text file.
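The email step above can also be split into “build the message” and “send the message”, which makes it easy to check the text before anything goes out.  A minimal sketch, assuming a hypothetical helper named build_notification(); the example.com links and the added Content-Type header are placeholders and my own choice, not part of the script above.

```php
<?php
// Hypothetical helper: assembles the arguments for mail() without sending.
function build_notification(array $links, string $from): array
{
    $body = "The following RIAB or Rendering Press Releases were recently "
          . "posted to the CDFA website:\r\n\r\n"
          . implode("\r\n", $links) . "\r\n";

    // Header lines are separated by CRLF, as in the original script.
    $headers = implode("\r\n", [
        "From: " . $from,
        "Reply-To: " . $from,
        "Content-Type: text/plain; charset=UTF-8",
    ]);

    return [
        "subject" => "Possible RIAB Press Releases on CDFA Website.",
        "message" => $body,
        "headers" => $headers,
    ];
}

// Placeholder links and address, for illustration only.
$mail = build_notification(
    ["https://example.com/a", "https://example.com/b"],
    "email@gmail.com"
);
echo $mail["message"];
// To send: mail($to, $mail["subject"], $mail["message"], $mail["headers"]);
```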

On the HTML end, I used a Unicode character to make a “refresh” icon for anyone who feels the need to reload.

echo "<span class=reload><b><a href=\"http://nickpisca.com/wvo/riab.php\"
     style=\"text-decoration:none\" title=\"Refresh Search...\">&#x21bb;</a>
     </b></span><br><br>";

That’s about it.  Simple PHP.  If you have any suggestions on improving it, feel free to leave a note in the comments.  Also, if you can help out in any way (contacting legislators, going to RIAB meetings, and/or contributing some money to the advocacy effort), we would really appreciate it.  Thanks a million!

About the Author

Nicholas Pisca: Founder 0001d LLC; Former Technical Manager Gehry Technologies; Former Lecturer/Adviser/Faculty UCSB MAT/USC/SCIarc; Author YSYT; Editor 0001d BLAST;