Moving Data With Rclone

What is Rclone?

From https://hub.docker.com/r/rclone/rclone:

Rclone (“rsync for cloud storage”) is a command line program
to sync files and directories to and from different cloud storage providers.

Overview

You will install rclone on your local computer and run rclone config there to create a credential file that lets rclone connect to your WUSTL Box. After you copy that file to your home directory on an RIS compute1 client, you can access your Box storage from an rclone container on a compute1 exec node.

Prerequisites

  1. A WUSTL Box account

  2. A user account for RIS storage1 and compute1 services

Building an Endpoint

I. Installation

  • For macOS users, run the following command to install rclone with Homebrew.

    > brew install rclone
    
  • For Windows users, download the relevant archive file from https://rclone.org/downloads/ for your environment. Then, extract the rclone.exe binary from the archive.

  • For Linux/BSD users, run the following command to install rclone.

    > curl https://rclone.org/install.sh | sudo bash
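
Once the install finishes, it can help to confirm that the shell can actually find rclone before starting the configuration. The sketch below wraps that check in a small function. The name check_rclone is hypothetical and not part of rclone, while rclone version is a standard rclone subcommand.

```shell
# check_rclone is a hypothetical helper name, not something rclone provides.
# It prints the installed version, or fails if rclone is not on the PATH.
check_rclone() {
    if command -v rclone >/dev/null 2>&1; then
        rclone version
    else
        echo "rclone not found on PATH" >&2
        return 1
    fi
}
```

Running check_rclone in a macOS or Linux terminal should print a version banner. On Windows, running rclone.exe version from the folder that holds the extracted binary serves the same purpose.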
    

II. Configuration

  1. Creating the configuration file for the connection to WUSTL Box

     1. Open a terminal on the machine where rclone is installed.

     2. Run rclone config to start the interactive process.

> rclone config
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q>

     3. Type n to set up a new remote connection. rclone then asks for a name for the new remote.

n/s/q> n
name>

     4. Type a name for the new remote connection, for example Box. rclone then asks for the storage type.

name> Box
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value

     5. Type box for the storage type.

Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / 1Fichier
   \ "fichier"
 2 / Alias for an existing remote
   \ "alias"
 3 / Amazon Drive
   \ "amazon cloud drive"
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
 5 / Backblaze B2
   \ "b2"
 6 / Box
   \ "box"
 7 / Cache a remote
   \ "cache"
 8 / Citrix Sharefile
   \ "sharefile"
 9 / Dropbox
   \ "dropbox"
10 / Encrypt/Decrypt a remote
   \ "crypt"
11 / FTP Connection
   \ "ftp"
12 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
13 / Google Drive
   \ "drive"
14 / Google Photos
   \ "google photos"
15 / Hubic
   \ "hubic"
16 / In memory object storage system.
   \ "memory"
17 / Jottacloud
   \ "jottacloud"
18 / Koofr
   \ "koofr"
19 / Local Disk
   \ "local"
20 / Mail.ru Cloud
   \ "mailru"
21 / Mega
   \ "mega"
22 / Microsoft Azure Blob Storage
   \ "azureblob"
23 / Microsoft OneDrive
   \ "onedrive"
24 / OpenDrive
   \ "opendrive"
25 / OpenStack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
26 / Pcloud
   \ "pcloud"
27 / Put.io
   \ "putio"
28 / QingCloud Object Storage
   \ "qingstor"
29 / SSH/SFTP Connection
   \ "sftp"
30 / Sugarsync
   \ "sugarsync"
31 / Tardigrade Decentralized Cloud Storage
   \ "tardigrade"
32 / Transparently chunk/split large files
   \ "chunker"
33 / Union merges the contents of several upstream fs
   \ "union"
34 / Webdav
   \ "webdav"
35 / Yandex Disk
   \ "yandex"
36 / http Connection
   \ "http"
37 / premiumize.me
   \ "premiumizeme"
38 / seafile
   \ "seafile"
Storage> box
** See help for box backend at: https://rclone.org/box/ **

     6. Press Enter to leave the following prompts blank: client_id, client_secret, box_config_file, and access_token.

OAuth Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Box App config.json location
Leave blank normally.

Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.

Enter a string value. Press Enter for the default ("").
box_config_file>
Box App Primary Access Token
Leave blank normally.
Enter a string value. Press Enter for the default ("").
access_token>

     7. Type user so that rclone acts on behalf of your user account.

Enter a string value. Press Enter for the default ("user").
Choose a number from below, or type in your own value
 1 / Rclone should act on behalf of a user
   \ "user"
 2 / Rclone should act on behalf of a service account
   \ "enterprise"
box_sub_type> user

     8. Press Enter to accept the defaults for the remaining questions (Edit advanced config? and Use auto config?). rclone then prints a link and waits for an authorization code.

Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n>
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n>
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=#####################
Log in and authorize rclone for access
Waiting for code...

     9. On the machine where rclone config is running, open the link in your browser.

    10. Log in to WUSTL Box with your credentials, then approve the access request in your Duo app.

[Image: Box login page]

    11. Grant rclone access to Box. You will then see a confirmation page, and Box will send you an email notification with the subject: Box login from “rclone”.

[Image: Granting rclone access to Box]

[Image: rclone configuration success page]

    12. Close the browser. The configuration for the rclone connection to Box is displayed in your terminal. For example:

Got code
--------------------
[Box]
type = box
box_sub_type = user
token = {"access_token":"###########################","token_type":"bearer","refresh_token":"##############################################","expiry":"2020-12-11T12:45:22.744758-06:00"}
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d>

    13. Type y if the configuration looks correct. The new remote connection then appears in the list of remotes.

y/e/d> y
Current remotes:

Name                 Type
====                 ====
Box                  box

    14. Type q to finish the interactive process.

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
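
The new remote can also be confirmed without re-entering the interactive menu. As a minimal sketch, the function below checks the output of rclone listremotes, which prints each configured remote on its own line followed by a colon. The function name remote_exists is hypothetical, and Box is simply the example remote name used in this guide.

```shell
# remote_exists is a hypothetical helper; only `rclone listremotes` is real.
# Usage: remote_exists Box
remote_exists() {
    # -x matches the whole line, -F treats the name as a literal string,
    # so "Bo" does not match a remote called "Box".
    rclone listremotes | grep -qxF "$1:"
}
```

On macOS or Linux, remote_exists Box returns success (exit status 0) when a remote named Box is present in the configuration, and a non-zero status otherwise.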

  2. Copying the credential file to the home directory on compute1

     1. Confirm that the rclone configuration file exists, using the terminal where rclone config was run.

        • On Mac and Linux:

> ls -la $HOME/.config/rclone/rclone.conf

        • On Windows (using CMD; in PowerShell, replace %APPDATA% with $env:APPDATA):

> dir "%APPDATA%\rclone\rclone.conf"

Windows Command Assumptions

The above command assumes the rclone configuration file is in its default folder. Please see the rclone documentation for more information.

It is also assumed that the %APPDATA% environment variable is set to the correct location. Replace %APPDATA% with the correct path if needed.

     2. (Optional) View the content of the file to see the remote storage you’ve just created.

        • On Mac and Linux:

> view $HOME/.config/rclone/rclone.conf

        • On Windows (using CMD; in PowerShell, replace %APPDATA% with $env:APPDATA):

> type "%APPDATA%\rclone\rclone.conf"

     3. Copy the file to your compute1 home directory. For example (replacing <wustlkey> with your WUSTL key):

        • On Mac and Linux:

> scp $HOME/.config/rclone/rclone.conf <wustlkey>@compute1-client-1.ris.wustl.edu:~/.rclone.conf

        • On Windows (using CMD; in PowerShell, replace %APPDATA% with $env:APPDATA):

> scp "%APPDATA%\rclone\rclone.conf" <wustlkey>@compute1-client-1.ris.wustl.edu:~/.rclone.conf
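
If the configuration file is not where the commands in this step expect it, rclone can report the location itself through rclone config file. The sketch below extracts just the path on macOS or Linux. It assumes the path is printed on the last line of that command's output, and the helper name rclone_conf_path is hypothetical.

```shell
# rclone_conf_path is a hypothetical helper; `rclone config file` is real.
# Assumption: the configuration path is the last line of the output.
rclone_conf_path() {
    rclone config file | tail -n 1
}
```

The result can then stand in for the hard-coded path, for example by passing "$(rclone_conf_path)" as the source argument of the scp command.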

III. Test

  1. From a terminal, ssh to a compute1 client. You will get a shell in your compute1 home directory.

  2. Verify that the rclone configuration file is in your home directory.

> ls -la .rclone.conf

  3. Run bsub to start an rclone container on a compute1 exec node, replacing group-name with your own compute group.

> LSF_DOCKER_ENTRYPOINT=/bin/sh bsub -Is -G group-name -q general-interactive -a 'docker(rclone/rclone)' /bin/sh

  4. Run rclone lsd to check the connection from the compute1 exec node to your Box storage by listing its directories. For example:

> rclone lsd Box:/
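
If rclone lsd cannot find the Box remote, the container may not be reading the copied file. One possible workaround is to point rclone at the file explicitly through the RCLONE_CONFIG environment variable, which rclone honors as the equivalent of its --config flag. The function name use_home_conf below is hypothetical, and the sketch assumes the file was copied to $HOME/.rclone.conf as described earlier.

```shell
# use_home_conf is a hypothetical helper; RCLONE_CONFIG is a real rclone
# environment variable. Assumption: the file lives at $HOME/.rclone.conf.
use_home_conf() {
    conf="$HOME/.rclone.conf"
    if [ -f "$conf" ]; then
        RCLONE_CONFIG="$conf"
        export RCLONE_CONFIG
    else
        echo "no rclone configuration at $conf" >&2
        return 1
    fi
}
```

After calling use_home_conf inside the container shell, rerunning rclone lsd Box:/ shows whether the explicit path resolves the problem.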

Use Case

From Box to Storage1

Example: A user has a file, File_A, in WUSTL Box. The file needs to be copied to the storage1 space /storage1/fs1/${STORAGE_ALLOCATION}/Active.

  1. Run ssh to a compute1 client from a terminal. For example:

> ssh compute1-client-1.ris.wustl.edu

  2. Verify that the rclone configuration file is in your home directory.

> ls -la $HOME/.rclone.conf

  3. Prepare to mount the storage1 space in the job. The mapping below makes it available at /my_storage1 inside the container.

> export LSF_DOCKER_VOLUMES="/storage1/fs1/${STORAGE_ALLOCATION}/Active:/my_storage1"

  4. Run bsub to start an rclone container.

> LSF_DOCKER_ENTRYPOINT=/bin/sh bsub -Is -G group-name -q general-interactive -a 'docker(rclone/rclone)' /bin/sh

  5. Copy File_A from WUSTL Box to the storage1 space.

> rclone ls Box:/File_A
314572800 File_A

> ls /my_storage1/File_A
ls: /my_storage1/File_A: No such file or directory

> rclone copy Box:/File_A /my_storage1/

  6. Verify the file in the storage1 space.

> ls /my_storage1/File_A
/my_storage1/File_A

  7. Exit the rclone container.

> exit
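
For large transfers, seeing the file name in a listing is a weak check, so it may be worth comparing sizes before exiting the container. The sketch below compares the byte count reported by rclone ls with the size of the local copy. The function name same_size is hypothetical, and it assumes rclone ls prints the size in the first column, as in the example output in step 5.

```shell
# same_size is a hypothetical helper built on `rclone ls`.
# Usage: same_size Box:/File_A /my_storage1/File_A
same_size() {
    # First column of the first listed entry is the remote size in bytes.
    remote_size=$(rclone ls "$1" | awk 'NR == 1 { print $1 }')
    # wc -c may pad its output with spaces on some systems; strip them.
    local_size=$(wc -c < "$2" | tr -d ' ')
    [ -n "$remote_size" ] && [ "$remote_size" = "$local_size" ]
}
```

The function returns success only when both sizes are known and equal. A matching size does not prove the contents are identical; rclone's own check subcommand offers a more thorough comparison.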