All posts by mberding

Web Scraping Tools

I’ve been shying away from web-scraping projects because of the amount of anti-scraping tech out there lately.  But today I found a new package that makes web scraping a whole lot easier.  I’m not sure how much anti-scraping software it can get around, but it worked for a recent project, and I’m quite pleased with it.

The software is called “Goutte” (pronounced ‘goot’, i.e. it rhymes with boot and not out).

Here’s some example code.  Let’s say you’re trying to get the prices from every element on a page that has a CSS class of ‘dollarAmountHere’:

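// Assumes Goutte was installed with Composer
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$url = ''; // the page you want to scrape
$crawler = $client->request('GET', $url);

// Collect the text of every span that has the dollarAmountHere class
$nodeValues = $crawler->filter('span.dollarAmountHere')->each(function ($node) {
    return $node->text();
});

$nodeValues now contains an array of the contents of the spans that matched the dollarAmountHere class.  Very handy!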

Let’s Encrypt on Amazon Linux

My favorite SSL provider recently increased their pricing from $7.99/year to $49.99/year.  That felt like blatant abuse to me, so I’m canceling my services with them.

I’m now using the latest and greatest in SSL providers, Let’s Encrypt.  It’s a well-supported non-profit certificate authority (CA) that issues certificates for free.  But there’s a different kind of setup required to use it, so a little code is in order.  I’ll document here how to use the system (including automatic renewals) on Amazon Linux.

First, log in as the ec2-user, the standard way of logging into an Amazon Linux EC2 instance, then run the following commands:

sudo pip install -U certbot

# note, if pip is not available, you can download the certbot manually:
# wget

chmod a+x certbot-auto

If you’re using Amazon Linux 2, you’ll need to run some additional commands, as documented by AWS:

cd /tmp
wget -O epel.rpm -nv
sudo yum install -y ./epel.rpm
sudo yum install python2-certbot-apache.noarch

In places where the text YOUR_WEBSITE_HERE appears, replace that text with your website domain.  If additional websites are needed (such as a www version), append an additional “-d [extra_domain_here]” to the command below.

sudo ./certbot-auto --debug -v --server certonly -d YOUR_WEBSITE_HERE

Note: If you need to protect multiple websites, append additional “-d YOUR_WEBSITE_HERE” arguments to the command above.  Let’s Encrypt will generate a single set of keys that can be used to protect multiple websites.
Note: If using the Amazon Linux 2 instructions above, use an install command like this:

sudo certbot certonly --debug -v --webroot -d YOUR_WEBSITE_HERE
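If you have more than a couple of domains, typing out all of the “-d” arguments gets tedious.  Here’s a rough sketch of a shell helper that builds them from a list.  Note that build_domain_args is a name I made up for illustration; it is not part of certbot, and the example.com domains are placeholders.

```shell
# Hypothetical helper (not part of certbot): print one "-d <domain>" pair
# for each argument, separated by single spaces.
build_domain_args() {
  args=""
  for domain in "$@"; do
    # Add a separating space only when something is already in the list
    args="${args:+$args }-d $domain"
  done
  printf '%s\n' "$args"
}
```

Something like this should then work with the commands above, leaving the substitution unquoted so the shell splits it back into separate arguments: sudo certbot certonly --debug -v --webroot $(build_domain_args example.com www.example.com)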

In the setup wizard you’ll come to a menu that looks like this:

How would you like to authenticate with the ACME CA?
1: Apache Web Server plugin - Beta (apache)
2: Spin up a temporary webserver (standalone)
3: Place files in webroot directory (webroot)
Select the appropriate number [1-3] then [enter] (press 'c' to cancel):

Choose option 3.  I tried option 1 and it didn’t work, and I didn’t want a temporary setup either.  It’ll ask for an email address, a legal agreement, and a newsletter opt-in, then show another prompt:

Select the webroot for YOUR_WEBSITE_HERE:
1: Enter a new webroot
Press 1 [enter] to confirm the selection (press 'c' to cancel):

Since your only option is ‘1’, enter 1 and press enter.

Input the webroot for YOUR_WEBSITE_HERE: (Enter 'c' to cancel):

Input /var/www/html and press enter.

At this point the system will finish generating the certificates.  You’ll now need to edit the apache config to tell apache to use these new certificates.  Note: for this example, I’m assuming you’re running a web server with a single working directory and no virtual hosts.

sudo nano /etc/httpd/conf.d/ssl.conf

You’ll want to find the SSLCertificateFile, SSLCertificateKeyFile, and SSLCertificateChainFile entries and change their values to be as specified here:

SSLCertificateFile /etc/letsencrypt/live/YOUR_WEBSITE_HERE/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/YOUR_WEBSITE_HERE/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/YOUR_WEBSITE_HERE/fullchain.pem

Note: If you’re encrypting multiple websites at once, “YOUR_WEBSITE_HERE” is the first website name you used in the certbot-auto command above.  That one directory holds the certificate for all of the websites you are encrypting.
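If you’d rather not hand-edit the file, a sed filter along these lines could make the same three changes.  Treat it as a sketch under a few assumptions: rewrite_ssl_directives is a made-up name, it only rewrites directives that are already active (lines that are commented out are left alone), and you should look over the output before putting it in place of ssl.conf.

```shell
# Hypothetical helper: read an Apache config on stdin and write it to stdout
# with the three certificate directives pointed at the Let's Encrypt files
# for the domain given as $1. Commented-out directives are not touched.
rewrite_ssl_directives() {
  live="/etc/letsencrypt/live/$1"
  sed -E \
    -e "s|^([[:space:]]*)SSLCertificateFile[[:space:]].*|\1SSLCertificateFile $live/fullchain.pem|" \
    -e "s|^([[:space:]]*)SSLCertificateKeyFile[[:space:]].*|\1SSLCertificateKeyFile $live/privkey.pem|" \
    -e "s|^([[:space:]]*)SSLCertificateChainFile[[:space:]].*|\1SSLCertificateChainFile $live/fullchain.pem|"
}
```

For example, rewrite_ssl_directives example.com < /etc/httpd/conf.d/ssl.conf > /tmp/ssl.conf.new writes a modified copy that you can diff against the original first.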

Save your changes, then restart the web server.

sudo service httpd restart

If the web server restarts successfully, you can try the https version of your site.  It should be up and running at this point.

To set up a crontab that will automatically renew the certs as needed (since Let’s Encrypt certificates are only valid for 90 days), I like to use this command to open the crontab editor:

sudo env EDITOR=nano crontab -e

Then add this line to have it automatically attempt to renew the certificate every day at 4:17AM UTC (a randomly selected time in the middle of the night):

17 4 * * * /home/ec2-user/certbot-auto renew --debug > /dev/null 2>&1

Save your crontab, and you’re good to go.
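If you want to pick your own time of day, here’s a small sketch that prints a daily crontab entry in the same shape as the line above.  cron_line is a hypothetical helper, not a real command; it only checks that the minute and hour are in range and then prints the entry for you to paste into the crontab editor.

```shell
# Hypothetical helper: print a crontab entry that runs a job once a day.
# Usage: cron_line MINUTE HOUR 'command to run'
cron_line() {
  minute=$1
  hour=$2
  job=$3
  # Both time fields must be plain non-negative integers
  for field in "$minute" "$hour"; do
    case "$field" in
      ''|*[!0-9]*)
        echo "minute and hour must be whole numbers" >&2
        return 1
        ;;
    esac
  done
  if [ "$minute" -gt 59 ] || [ "$hour" -gt 23 ]; then
    echo "minute must be 0-59 and hour must be 0-23" >&2
    return 1
  fi
  # Day-of-month, month, and day-of-week stay as wildcards; output is discarded
  printf '%s %s * * * %s > /dev/null 2>&1\n' "$minute" "$hour" "$job"
}
```

Running cron_line 17 4 '/home/ec2-user/certbot-auto renew --debug' reproduces the entry used in this post.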

Much of this blog entry came from

2017-12-14 Update: if you run into this kind of error, “Error: couldn’t get currently installed version for /opt/”, run these commands (solution came from here):

rm -rf ~/.local/share/letsencrypt
sudo rm -rf ~/.local/share/letsencrypt
sudo rm -rf /opt/
sudo /home/ec2-user/certbot-auto renew --debug

WordPress Login Brute Force Protection

Lately I’ve had a bunch of WordPress sites that seem to randomly come under brute-force attacks on their wp-login pages.  One relatively simple solution is to password-protect the login file, which blocks the attacker from even trying to log in.  Seems like a reasonable security measure to put in place.  Use an htpasswd generator to create the .htpasswd file.

Add the code below to your root level .htaccess file, and make sure to change out the path to your .htpasswd file as necessary.

# START brute force protection
 ErrorDocument 401 "Unauthorized Access"
 ErrorDocument 403 "Forbidden"
 <FilesMatch "wp-login.php">
 AuthName "Authorized Only"
 AuthType Basic
 AuthUserFile /path/to/.htpasswd
 require valid-user
 </FilesMatch>
 # END brute force protection
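If you don’t want to use an online generator for the .htpasswd file, a line for it can also be produced locally.  This is a minimal sketch that assumes the openssl command is installed; make_htpasswd_line is a name I made up, and the user name and password in the example are placeholders.

```shell
# Hypothetical helper: print a "user:hash" line for a .htpasswd file.
# Assumes the openssl CLI is available; -apr1 selects the Apache MD5 format.
make_htpasswd_line() {
  user=$1
  password=$2
  # Feed the password on stdin so it does not show up in the process list
  hash=$(printf '%s\n' "$password" | openssl passwd -apr1 -stdin) || return 1
  printf '%s:%s\n' "$user" "$hash"
}
```

For example, make_htpasswd_line admin 'choose-a-real-password' > /path/to/.htpasswd creates the file.  The hash uses a random salt, so the output differs on every run.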

PayPal 2016 merchant security upgrades

PayPal is upgrading SSL certificates, connection types, and so forth, which is generally a good thing.  However, CentOS has a tendency to be a bit behind on just about everything and that includes the version of cURL that is bundled into the OS.

To test your CentOS machine (or just about any Linux machine) to see if it’ll be compatible with PayPal’s new updates, run this command:


If you get an SSL Error (35), you’ll need to run some updates.  If you get a long string back from PayPal complaining about the method not being supported, then you’re in good shape.
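Whatever URL you test against, the two outcomes can be told apart by curl’s exit status: 35 means the SSL/TLS handshake failed, while 0 means the connection worked, even if the server then replies with a complaint about the request.  Here’s a rough sketch of that check.  Both function names are made up for illustration, and the URL is left as an argument rather than filled in.

```shell
# Hypothetical helper: turn a curl exit status into a short verdict.
describe_curl_exit() {
  case "$1" in
    0)  echo "TLS handshake OK" ;;
    35) echo "SSL connect error (35): update cURL and its SSL library" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Hypothetical wrapper: request the URL given as $1, throw away the
# response body, and report what the exit status means.
check_tls() {
  curl --silent --output /dev/null "$1"
  describe_curl_exit "$?"
}
```

Call it as check_tls followed by the endpoint you want to test.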

To get CentOS to actually use a modern version of cURL, there are some great instructions here that worked perfectly for me.

Why You Should Never Use MongoDB

Had a project today that I figured might be a good use case for MongoDB.  Since I’ve never used that tech before, I started googling around to see if my use case would be a good fit.

As it turns out, I can probably just use a combination of MySQL and Amazon S3 and make it easier and cheaper than using MongoDB, but along the way I found a great read: Why You Should Never Use MongoDB by Sarah Mei.  She talks about using MongoDB as a replacement for a relational database (MySQL, PostgreSQL, etc.) and how it doesn’t fit that use case in real-world situations.  If you’ve got some random JSON data and that data can be formatted any which way, then MongoDB works great.  It also seems to work well as a cache layer between an RDBMS and code, but as a straight-up replacement for an RDBMS? Nope.

SNI is Awesome!

I learned something awesome the other day — you no longer need a dedicated IP address for every site that needs to have an SSL certificate!  Not only does this greatly help with IPv4 exhaustion, but on a personal level, it means that I can throw more SSL-based sites on a single server without needing to add additional IP addresses.  So cool!

To make it work, your server needs to be SNI compatible.  Looks like most modern operating systems are, including CentOS 6, which I’m using to host my “big” server at AWS with cPanel.

I’m just so excited about being able to host as many secure sites as I want without any concern for IP address allocation.  Cool!

BTW, I found a place to get SSL certificates for $8 per year, and they’re using the same backend technology as GoDaddy.  If you haven’t already, check out Starfield Technologies.

Video Creation

I just got done putting together the “why” video for DuxCal.  1 minute and 15 seconds of finished video took I-don’t-know how many hours to put together.  Writing the script, recording the raw footage, finding the right supporting media, doing voiceovers, editing the script, additional voiceovers, editing, and finally publishing, takes quite a while.

When you see a really smooth professional video with a great script, good camera work, and a clear voiceover, that should be appreciated just for the art and talent it took to create it.

I don’t think I’ll ever become a professional at video creation, but it is really nice to have a finished product!