Automate your Datafeed with Cron Jobs

Datafeeds are extremely common in affiliate marketing these days and offer affiliates a quick and easy way to import hundreds (if not thousands) of products into 3rd party applications and websites.

For sites that offer price comparison, this import has to be done regularly, which quickly becomes a real pain if done manually.

What is needed is a way of automating the process – and (if your host allows it) the use of cron jobs makes this a breeze.

WHAT IS A CRON JOB?

A cron job is simply a script or task that is scheduled to run automatically at specified times or intervals.

Crons are commonly used to schedule system maintenance or to administer content but, as we are about to discover, they can also be used to perform common, repetitive tasks (such as downloading remote datafeeds from networks and merchants).

SO HOW IS IT DONE?

It all depends on your individual hosting setup, but the basics remain the same: we need a simple script that downloads the remote datafeed to our host (and unzips it if necessary), and then we need to tell our host where that script lives and how often we would like it to run. Simple.

So let’s get started and write a simple script to download a single datafeed.

Open up Notepad (or any plain-text editor) and type (or copy and paste) the following, remembering to change the paths to match your own server setup:


/usr/bin/wget -O "/path/to/your/feeds/feed/feed.csv.gz" "http://datafeedurl/gzip/"
/usr/bin/gzip -c -d -S "" /path/to/your/feeds/feed/feed.csv.gz > /path/to/your/feeds/feed/feed.csv
rm /path/to/your/feeds/feed/feed.csv.gz

The code above may look daunting but it is quite simple when we break it down.

In order to retrieve remote files our script will have to use wget (wget is a software package for retrieving files using HTTP, HTTPS and FTP – the most widely-used Internet protocols).


/usr/bin/wget -O

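That’s where the first part of our line comes in: it tells our script exactly where to find wget (cron jobs usually run with a very limited PATH, so giving the full path is the safest option), and the -O option tells wget which file to save the download as. If you are not sure of the path, or whether wget is even installed on your server, speak to your administrator or host.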
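If your host gives you SSH access, you can look these paths up yourself. Here is a minimal sketch for a POSIX shell; find_tool is a hypothetical helper name built on the standard command -v, which prints the path the shell would use for a command and fails if it cannot find one.

```shell
#!/bin/sh
# Hypothetical helper: find_tool NAME
# Prints the full path of NAME, or a hint on stderr if it is missing.
find_tool() {
  # command -v prints the path the shell would run for NAME
  # and exits non-zero when NAME cannot be found
  if command -v "$1"; then
    return 0
  fi
  echo "$1: not found; ask your host whether it is installed" >&2
  return 1
}
```

Running find_tool wget and find_tool gzip prints the paths to use in the script; they vary between servers, so it is worth checking rather than assuming /usr/bin.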

The second part of the line looks like this:


"/path/to/your/feeds/feed/feed.csv.gz" "http://datafeedurl/gzip/"

This tells wget to “fetch” the gzipped remote file from the URL (on the right) and save it to the location on the left (note: for this example we are calling the downloaded file feed.csv.gz, but you could call it anything).
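One weakness of saving straight to feed.csv.gz is that wget -O can leave an empty or partial file behind when a download fails, replacing the good file from the previous run. The sketch below is a hypothetical fetch_feed helper, not part of the original script: it downloads to a temporary .part file and only moves it into place when wget reports success. The retry count and timeout are arbitrary example values.

```shell
#!/bin/sh
# Hypothetical helper: fetch_feed URL DEST
# Downloads to DEST.part first and only renames it to DEST on success,
# so a failed download never overwrites the feed from the previous run.
fetch_feed() {
  url=$1
  dest=$2
  # -q: no progress output; --tries/--timeout: give up on an unresponsive
  # server instead of hanging (3 tries and 60 s are example values).
  # Swap in the full path, e.g. /usr/bin/wget, if cron cannot find wget.
  if wget -q --tries=3 --timeout=60 -O "$dest.part" "$url"; then
    mv "$dest.part" "$dest"
  else
    rm -f "$dest.part"
    echo "fetch_feed: could not download $url" >&2
    return 1
  fi
}
```

Used in place of the wget line, it would be called as fetch_feed "http://datafeedurl/gzip/" /path/to/your/feeds/feed/feed.csv.gz.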

We now (in theory) have a gzipped datafeed on our server, which needs ‘unzipping’. That is where the second line of the script comes in:


/usr/bin/gzip -c -d -S ""

As we did earlier with wget we are telling our script where it can find the gzip application that will be used to decompress or unzip our datafeed.

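We then tell gzip which file is to be unzipped (/path/to/your/feeds/feed/feed.csv.gz) and, using the > redirect, what to call the result once unzipping has been completed (feed.csv). The -d option tells gzip to decompress, -c sends the decompressed data to standard output (which is what the > captures), and -S "" means gzip does not require a particular file extension.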


/path/to/your/feeds/feed/feed.csv.gz > /path/to/your/feeds/feed/feed.csv
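There is a subtle catch in this line: the shell empties feed.csv as soon as it processes the > redirect, before gzip has even started. If the download failed, the previous run's good feed is replaced with an empty file. A minimal sketch of one fix, using a hypothetical unpack_feed helper, is to check the archive with gzip -t (which tests it without writing anything) before decompressing.

```shell
#!/bin/sh
# Hypothetical helper: unpack_feed ARCHIVE OUTPUT
unpack_feed() {
  archive=$1
  output=$2
  # gzip -t checks the archive without writing anything. The shell
  # empties OUTPUT as soon as it sees "> OUTPUT", so testing first
  # keeps the previous feed intact when the download was bad.
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "unpack_feed: $archive is not a valid gzip file" >&2
    return 1
  fi
  # -d: decompress, -c: write to standard output (redirected to OUTPUT)
  gzip -d -c "$archive" > "$output"
}
```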

Finally we have the last line of our script:


rm /path/to/your/feeds/feed/feed.csv.gz

“rm” simply stands for remove, and that is exactly what we are doing here: removing the now redundant gzipped file we originally downloaded (as we have already unzipped its contents to feed.csv).
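As written, the rm line runs whether or not the unzip worked. Joining the two commands with && (run the second only if the first succeeded) keeps a bad download on disk so you can inspect it. A minimal sketch, with unpack_and_tidy as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: unpack_and_tidy ARCHIVE OUTPUT
# Same gzip call as the script, but the archive is only removed when
# gzip reports success; a bad download stays on disk for inspection.
unpack_and_tidy() {
  gzip -d -c "$1" > "$2" && rm "$1"
}
```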

Easy!

So, now that we know what the script does and how it does it, all we need is a valid datafeed to download.

For this example I am using Affiliate Window, but the principle is the same with any network (or website) that lets you download datafeeds remotely (such as LinkShare, Webgains etc.).

If you are familiar with Affiliate Window, you will know that it has a cool wizard called ‘create a feed’ that lets you build and customise your own datafeeds.

At the end of this process we are presented with a couple of options: the URL of our datafeed or a download button.

Datafeed URL from Affiliate Window

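Make sure you select gzip as the compression mode, then copy the datafeed URL and paste it over http://datafeedurl/gzip/ in the first line of our script. Keep it inside the double quotes, as feed URLs often contain characters such as & that the shell would otherwise treat specially.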

ALMOST DONE

Now save the script as import.sh (or whatever you like) and upload it to your hosting package using an FTP client such as FileZilla (or whatever you are comfortable with).
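Editors such as Notepad save files with Windows line endings (a carriage return before every newline). Depending on your FTP client's transfer mode these can reach the server unchanged, and the shell then sees a stray carriage return at the end of every line, which can break commands and paths. A minimal sketch of a fix, assuming SSH access; fix_line_endings is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: fix_line_endings FILE
# Removes every carriage return (\r) from FILE. Writing back with
# "cat >" rewrites the existing file, so its permissions are kept.
fix_line_endings() {
  scratch="$1.unix"
  tr -d '\r' < "$1" > "$scratch" && cat "$scratch" > "$1" && rm "$scratch"
}
```

Usage: fix_line_endings /path/to/your/import.sh, run once after each upload.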

You should now be able to log in to your control panel and set up a cron job (sometimes called a Cron Task or Scheduled Task).

Most cron management areas have a similar look and feel, often something like this:

Cron management tool from Heart Internet's control panel

Enter the path to the script you have just uploaded, set how often you would like it to run, and you are done!

Simple as that.
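If your host gives you SSH access rather than (or as well as) a control panel, the same schedule can be written as a crontab line. The sketch below assumes the standard five-field cron format; the 02:30 schedule, the /path/to/your/import.sh placeholder and install_cron_line are hypothetical examples.

```shell
#!/bin/sh
# Hypothetical schedule: every day at 02:30, server time.
# Fields: minute hour day-of-month month day-of-week, then the command.
CRON_LINE='30 2 * * * /path/to/your/import.sh'

# Hypothetical helper: append CRON_LINE to the current user's crontab.
# crontab -l prints the existing entries (and fails harmlessly if there
# are none); running this twice adds the line twice.
install_cron_line() {
  ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
}
```

After calling install_cron_line, running crontab -l lists the installed entries so you can check the result.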

COUPLE OF GOTCHAS

There are a couple of things to watch out for when setting up cron jobs like this.

  • Firstly, you MUST ensure that the script you have created has the correct permissions to be executed by the server. You can do this by chmodding the script to 711 or 755. To do this in FileZilla, browse to your script and right-click on it before selecting “File Permissions”. Now type 711 or 755 into the “Numeric Value” text box before hitting “Ok”.
  • It is also worth checking that the folder you intend to save your datafeeds to has the correct permissions; again, 711 or 755 should suffice. Avoid 777 if you can, as it lets every user on the server write to that folder.
  • Another common mistake is getting the paths wrong within the script itself. If you run the script and nothing appears to happen, double- and triple-check all the paths to make sure they are correct.
  • You can also check that the datafeed URL itself is working by pasting it directly into your browser. If everything is OK, it should prompt you to download a gzipped file. If not, double-check that you have copied the URL correctly from your network.
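The permission and path checks above can be done in one go from an SSH session. A minimal sketch with a hypothetical check_script helper: it applies 755 (one of the values suggested above), confirms the file is now executable, and runs it once by hand so any error messages, such as a mistyped path, appear on screen.

```shell
#!/bin/sh
# Hypothetical helper: check_script FILE
# Applies 755 permissions, confirms the file is now executable,
# then runs it once so any errors appear on screen.
check_script() {
  script=$1
  # Without a slash the shell would search PATH instead of the current folder
  case $script in */*) ;; *) script=./$script ;; esac
  chmod 755 "$script" || return 1
  if [ ! -x "$script" ]; then
    echo "check_script: $script is still not executable" >&2
    return 1
  fi
  "$script"
}
```

For more detail, sh -x /path/to/your/import.sh prints each command just before running it, which makes a wrong path easy to spot.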

GREAT – SO WHAT NOW?

Now you are free to import or update your site as usual, and if you are running a script such as Price Tapestry, this can be automated too.

Simply create another cron job, scheduled to run once the download has had time to finish, and point it at the import file that comes with PT (in the scripts directory).

Like this:


/usr/bin/php /home/sites/yourdomain.com/public_html/scripts/import.php @MODIFIED

Notice the @MODIFIED parameter at the end of the line: it tells the import script to process only the feeds that have changed since they were last imported.
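Because two separate cron jobs run independently, the import could start before the download has finished. One way round this, sketched below on the assumption that both steps can run as a single job, is to chain them so the import only starts after the download script has finished, and is skipped if that script reports an error. FEED_SCRIPT, PT_SCRIPTS, PHP and update_site are hypothetical names; the defaults are the example paths used in this article.

```shell
#!/bin/sh
# Hypothetical wrapper: download the feed, then run the Price Tapestry import.
# Each setting can be overridden from the environment; the defaults are
# the example paths used in this article.
FEED_SCRIPT=${FEED_SCRIPT:-/path/to/your/import.sh}
PT_SCRIPTS=${PT_SCRIPTS:-/home/sites/yourdomain.com/public_html/scripts}
PHP=${PHP:-/usr/bin/php}

update_site() {
  # && runs the import only after the download script has finished,
  # and skips it entirely if that script exits with an error
  "$FEED_SCRIPT" && "$PHP" "$PT_SCRIPTS/import.php" @MODIFIED
}
```

To use it, save it as a script that ends with a call to update_site and point a single cron job at that script instead of the two separate ones.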

Good luck – and have fun!

If you have any thoughts, ideas or comments – feel free to leave them below.