In this post, I will detail the simple process of pulling a copy of your Google Drive data to your VPS using the rclone utility. This is useful if you ever want to migrate from Google Drive to ownCloud, for example, or if you simply wish to have a secondary copy of your Google Drive data on a headless VPS somewhere. I will also show you how to script this, so that the script runs every night and pulls only new and changed files down from Google to your VPS.
Real World Example
I use Google Drive and Google Photos, with Google Drive configured to show the “Google Photos” folder in it. I’m paranoid about my data, so I keep multiple copies of it: the copy in Google Drive itself, a local copy on my desktop machine at home, and a copy on a VPS that I have.
To sync the VPS with Google Drive, I use rclone, which stands out as the easiest way to sync Google Drive data to a headless (i.e., no GUI) server.
Install and Configure rclone to access Google Drive
This step is highly dependent on the OS of your VPS. For my example, I’ll be using CentOS 7, where rclone is available via the EPEL repository.

yum install rclone

Once installed, run the interactive configuration:

rclone config
– Choose option “n” for new remote
– At the name> prompt, type a name for the remote (e.g., google)
– Choose “Google Drive” from the list of storage types (option 7 here; the number varies between rclone versions)
– Hit Enter at the client_id prompt
– Hit Enter at the client_secret prompt
– Choose option “N” for headless machine setup
Now, copy and paste the presented URL into a web browser, authorize access to your Google account, and paste the verification code back into the terminal session of the headless box. This authorizes the rclone application to access your Google Drive data.
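When this finishes, rclone writes the remote into its config file (the path you will later pass with --config). The stanza looks roughly like the sketch below; the exact fields depend on your rclone version, and the token is redacted here:

```ini
[google]
type = drive
token = {"access_token":"REDACTED","token_type":"Bearer","refresh_token":"REDACTED","expiry":"2019-01-01T00:00:00Z"}
```

Guard this file carefully: the token grants access to your entire Google Drive.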
Dedupe your Google Drive file names
This step is important, primarily for Google Photos. Google Photos allows duplicate filenames, which really trips up rclone. For example, the Google Photos/year directories can contain any number of photos that share the same filename (e.g., several distinct files all named 002.jpg).
This typically happens when multiple cameras use the same naming convention for their files. The problem is that each time rclone syncs with Google, it notices a checksum difference between the VPS copy and the Google Drive copy of the files sharing a filename, assumes they have changed, and pulls them again. This results in an inconsistent data set on your VPS and a ton of wasted bandwidth, since the destination can never match the source.
For me, without deduping, I was missing about 3,000 pictures and was consuming gigabytes of data on every sync.
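You can see why this never converges with a tiny local analogue, using md5sum as a stand-in for the server-side hashes rclone compares:

```shell
# Two files share the name 002.jpg but have different contents (e.g., two
# cameras that both count up from 001.jpg). Any checksum comparison will
# flag them as "changed" on every sync, forever.
mkdir -p /tmp/cam1 /tmp/cam2
echo "photo from camera one" > /tmp/cam1/002.jpg
echo "photo from camera two" > /tmp/cam2/002.jpg
md5sum /tmp/cam1/002.jpg /tmp/cam2/002.jpg   # different checksums, same filename
```

Only one of these files can exist at the destination path, so the sync can never settle.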
So, we use rclone to dedupe the data on Google Drive:
rclone dedupe google:/ --dedupe-mode rename
What this does is as follows:
– Detects exact duplicates (identical files with the same filename) and deletes all but one copy.
– Detects duplicate filenames with different contents (e.g., two distinct files both named 002.jpg) and renames them with a -num suffix, so the duplicates above become 002-1.jpg, 002-2.jpg, and so on.
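The renaming happens server-side on Google Drive; the loop below is only an illustration of the resulting names (the exact suffix scheme can vary between rclone versions):

```shell
# Illustrative sketch of the -num renaming applied to three files
# that all arrived named 002.jpg
i=0
for f in 002.jpg 002.jpg 002.jpg; do
  i=$((i + 1))
  # split off the extension and insert the numeric suffix before it
  echo "${f%.*}-$i.${f##*.}"
done
```

After the rename, every file has a unique path, so rclone can compare source and destination one-to-one.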
So now rclone has a clean dataset to pull from.
Create a script
I created a very simple script to perform the data ingest.
#!/bin/sh
set -x

# download all google drive contents
rclone --config /path/to/.rclone.conf sync "google:/" /pathtodestination/
Make the script executable:
chmod +x scriptname
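If a nightly run could ever take longer than 24 hours (a slow VPS, or a large batch of new photos), two copies of the script may end up running at once. One way to guard against that is flock(1). Here is a sketch that writes a lock-protected wrapper; the paths (/tmp/gdrive-backup.*, /path/to/.rclone.conf, /pathtodestination/) are placeholders to adjust for your setup:

```shell
# Write a wrapper that refuses to start while a previous sync holds the lock
cat > /tmp/gdrive-backup.sh <<'EOF'
#!/bin/sh
# flock -n: exit immediately instead of queueing if the lock is already held
exec flock -n /tmp/gdrive-backup.lock \
  rclone --config /path/to/.rclone.conf sync "google:/" /pathtodestination/
EOF
chmod +x /tmp/gdrive-backup.sh
```

Point your cronjob at the wrapper instead of calling rclone directly.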
Test the script
Time to completion will depend on the download speed of your VPS and the amount of data in your Google Drive.
Create a cronjob for your sync
Now that the script has done its initial pull, you will want to set up a cronjob that will run nightly and pull all new and changed files down from Google to your VPS.
Here are the contents of my crontab. Adjust to your environment.
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=
HOME=/
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed
01 02 * * * /bin/sh /path/to/backup.sh >/var/log/backup-log.txt 2>&1
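Since MAILTO is empty and all output is redirected to a log file, it's worth checking the log now and then for failed transfers. The snippet below shows the idea against a fabricated sample log in /tmp; on your VPS you would run the grep against /var/log/backup-log.txt:

```shell
# Fabricated sample log, purely for illustration of the check
printf '%s\n' \
  '2019/01/02 02:00:01 INFO  : 002.jpg: Copied (new)' \
  '2019/01/02 02:00:05 ERROR : 003.jpg: Failed to copy' \
  > /tmp/backup-log.txt

# Count error lines from the last run
grep -c 'ERROR' /tmp/backup-log.txt
```

A non-zero count means last night's sync had failures worth investigating.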
Hope this helps.