=========================
Compute: Space Management
=========================

.. highlight:: none

.. contents::
   :local:

.. _`ris-compute-transfer-policy`:

Compute Data Transfer Policy
============================

Supported methods to transfer data into and out of the Scientific Compute Platform are:

#. :ref:`Globus `
#. :ref:`Rclone `
#. Submitting a job using a data transfer tool (e.g. ``rsync``, ``wget``, ``curl``, ``scp``)

   - Please do not use these data transfer tools directly on the compute1 client nodes.
     This type of activity can slow down the client node and negatively affect all users
     connected to the client nodes.
   - If you require assistance submitting jobs using these tools, please open a ticket at
     our `Service Desk `_.

Storage Service Integration Points
==================================

In addition to the Storage Service SMB Interface to a given Allocation, the Compute Service
exposes another three interfaces to the Storage Service. This brings the total number of
different types of locations where RIS Storage Space can be consumed to four:

1. Storage Service Allocations
#. Storage Service Allocations' Caches
#. Home directories in the Compute Platform
#. Scratch space directories in the Compute Platform

Each of these types of locations has different methods and policies for managing and
inspecting usage. This helps balance the availability of space and performance with the
capabilities of the resources that provide them.

Checking Storage Usage
======================

Storage Service Allocations
---------------------------

An accurate report of a Storage Service Allocation's space consumption can only be obtained
through the Storage Service SMB Interface. This is a :ref:`known limitation ` of the Storage
Service.

SMB Interface
~~~~~~~~~~~~~

.. maybe a windows or mac user could compose/reference more detailed instructions

The easiest way to do this is to mount the allocation to your desktop, right-click the
mounted folder, and select the appropriate menu option for more information.

Alternatively, maximum and available space can be obtained with ``smbclient``::

   $ smbclient -A .smb_creds -k //storage1.ris.wustl.edu/ris -c du
   137438953472 blocks of size 1024. 135136394752 blocks available
   Total number of bytes: 14619

The "blocks of size 1024" figure is the limit of how much total space can be consumed, and
the "blocks available" figure is how much of that limit is not consumed. [#f2]_

These values can be converted to TiB [#f1]_ ::

   $ bc -q
   scale=2
   137438953472/1024^3
   128.00
   135136394752/1024^3
   125.85

Or subtracted from each other to calculate the space used::

   (137438953472-135136394752)/1024^3
   2.14

This tells us that the Allocation is using 2.14 TiB of its 128 TiB limit.
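
If you check this regularly, the same arithmetic can be scripted. The following is a minimal
sketch only: it pipes the ``smbclient`` query above through standard ``awk`` to report the
limit, available space, and usage in TiB, and assumes the same ``.smb_creds`` file and share
path used in the example. Note that ``printf`` rounds where ``bc`` with ``scale=2`` truncates,
so the final digit may differ slightly from the figures above::

   smbclient -A .smb_creds -k //storage1.ris.wustl.edu/ris -c du \
     | awk '/blocks of size/ {
         # $1 is the block limit, $6 the blocks available; blocks are 1 KiB,
         # so dividing by 1024^3 converts to TiB
         limit = $1 / 1024^3; avail = $6 / 1024^3
         printf "limit: %.2f TiB  available: %.2f TiB  used: %.2f TiB\n", limit, avail, limit - avail
       }'
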

Cache Interface
~~~~~~~~~~~~~~~

.. _`storage-compute-cache-interface`:

The cache interface can be measured more simply using a shell on a Compute Platform
execution node or condo::

   $ df -Ph /storage1/fs1/ris
   Filesystem   Size  Used  Avail  Use%  Mounted on
   cache2-fs1   128T  3.3T   125T    3%  /storage1/fs1

Notice that this usage is over a TiB more than what SMB reported! That is because it is
*a different location*. Rewriting the same 1 TiB file with different data three times would
consume 3 TiB on the cache, but ultimately only use 1 TiB on the Storage Service Allocation,
where the data gets flushed for "permanent" storage.

.. _`storage-compute-cache-conflicts`:

Similarly, data written over SMB, the client (login) nodes, or the interactive nodes, but
not used by a Compute Platform execution node or condo, will not be pulled into the cache,
possibly resulting in a cache interface usage that is lower than the underlying Storage
Service Allocation's actual usage. Again, this discrepancy is a :ref:`known limitation ` of
the Storage Service.

Because data can be modified from the execution node and condo cache interface, the SMB
interface, the client (login) nodes, and the interactive nodes, conflicts are possible.
Conflicts happen when a file is deleted via SMB before the same file has finished writing
back to the home fileset from the cache, or when a file is modified at the same path from
both another source and the cache before the cache has had time to write the file back to
the home fileset. The cache fileset will detect these conflicts and move the data into
hidden directories for manual review, to prevent accidental loss of data in flight. The
data in these hidden directories counts towards the cache fileset usage. If you encounter a
conflict, contact the `RIS Service Desk`_ for assistance.

Compute Service Home Directories
--------------------------------

Every Compute Service User is assigned a limit of 9 GiB of home directory space on the
Compute Platform. This space is restricted at the user level, and can only be checked with
the appropriate ``mmlsquota`` command::

   mmlsquota -u foouser rdcw-fs2:home1

For example, the currently logged-in user's home directory usage, in automatically scaled
units, can be obtained like so::

   $ mmlsquota -u $(id -nu) --block-size auto rdcw-fs2:home1
                                Block Limits                            |     File Limits
   Filesystem Fileset  type  blocks  quota  limit  in_doubt  grace |  files  quota  limit  in_doubt  grace  Remarks
   cache1-fs1 home1    USR    1.37G     9G    10G         0   none |   5961      0      0         0   none  cache1-gpfs-cluster.ris.wustl.edu

There is no SMB interface to this space, and ``df`` reports space for the entire device,
which is shared among all home directories.

Compute Service Scratch Space
-----------------------------

High-performance Scratch Space is typically allocated for each lab as it is onboarded to
the Compute Service. This space is restricted at the group level, where the group is named
for the lab it represents. Because it is a shared device, like that for home directories,
this usage must also be inspected with the appropriate ``mmlsquota`` command, referencing a
group name and group quota on the scratch device::

   mmlsquota -g compute-foo scratch1-fs1

To see the usage of every compute group the currently logged-in user belongs to, in
automatically scaled units, try something like::

   $ groups | grep -Po 'compute-\S+' | while read COMPUTE_GROUP
   > do ls -ld "/scratch1/fs1/${COMPUTE_GROUP#compute-}"
   > mmlsquota -g "$COMPUTE_GROUP" --block-size auto scratch1-fs1
   > done
   drwxr-sr-x. 6 root compute-ris 4096 Aug 23 02:43 /scratch1/fs1/ris
   Disk quotas for group compute-ris (gid 1208827):
                            Block Limits                          |     File Limits
   Filesystem    type   blocks  quota  limit  in_doubt  grace |     files  quota  limit  in_doubt  grace  Remarks
   scratch1-fs1  GRP    2.168T     3T     4T         0   none |  33226772      0      0         0   none
   drwx--S---. 2 root compute-corcoran.william.p 4096 Aug 23 02:24 /scratch1/fs1/corcoran.william.p
   Disk quotas for group compute-corcoran.william.p (gid 1262586):
                            Block Limits                          |     File Limits
   Filesystem    type   blocks  quota  limit  in_doubt  grace |     files  quota  limit  in_doubt  grace  Remarks
   scratch1-fs1  GRP         0    50T    50T         0   none |         3      0      0         0   none

Staging Data
============

The Compute Service home directories and Scratch Space are not accessible from outside of
the Compute Platform. Data should be staged to these locations from a Storage Service
Allocation, and computational results or job output data should then be staged back to a
Storage Service Allocation.
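
As a rough illustration, such staging can itself be submitted as a batch job (consistent
with the data transfer policy above) rather than run directly on a client node. The
following is a minimal sketch only, assuming the platform's LSF scheduler and its Docker
integration; the compute group, queue name, container image, and paths are placeholders,
and additional site-specific ``bsub`` options may be required for your lab::

   # Stage in: copy input data from a Storage Allocation into Scratch Space.
   # "compute-foo", the "general" queue, the image, and both paths are placeholders.
   bsub -G compute-foo -q general -o stage_in.%J.log \
        -a 'docker(registry.example.com/rsync-image)' \
        rsync -a /storage1/fs1/foo/input_data/ /scratch1/fs1/foo/input_data/

   # Stage out: copy results from Scratch Space back to the Storage Allocation.
   bsub -G compute-foo -q general -o stage_out.%J.log \
        -a 'docker(registry.example.com/rsync-image)' \
        rsync -a /scratch1/fs1/foo/results/ /storage1/fs1/foo/results/
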

Additional Notes
================

.. warning::

   Note that your *compute lab group* and your *storage lab group* **are not the same**.
   That is, the memberships of ``storage-foo`` and ``compute-foo`` are likely intentionally
   different, for specific and meaningful reasons.

.. [#f1] Binary terabytes, or "tebibytes", are base 1024 (that is, there are 1024 gibibytes
   in every tebibyte, and so on). This comes from the interval between SI prefixes on
   computers historically being represented by ten binary digits, which is 1024 in decimal.
   They are commonly labeled as "TB", although this can lead to a problematic loss of
   precision when comparing with values calculated using base 1000.

.. [#f2] The figures representing limits and available space are not necessarily a
   *guarantee* that space is available. It is possible for space to be overprovisioned.
   This happens when the total space available to *all users* is less than the sum of their
   quotas. Thus, as every user approaches their quota, there is a potential lower effective
   limit if the *total space for all users* is exhausted first.

.. _`RIS Service Desk`: https://servicedesk.ris.wustl.edu/