Move2silo


The data-moving process is initiated by a cronjob running from the 'clascron' account, for example for the HPS experiment on clondaq5:

0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage5 /data/stage5 /mss/hallb/hps/physrun2019/data -jvm:-Dfile.transfer.client.displayrates=true

and for CLAS12 run group A on clondaq6:

0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage6 /data/stage6 /mss/clas12/rg-a/data -jvm:-Dfile.transfer.client.displayrates=true
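
To verify that the cronjob is actually installed, list the crontab of the 'clascron' account on the corresponding clondaq machine (a generic check, not part of the original setup):

# as user clascron on clondaq5/clondaq6
crontab -l | grep jmigrate
# or from a privileged account
sudo -u clascron crontab -l | grep jmigrate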

Cronjob files for different run periods can be found in the ~clascron/backup/ directory.
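
To bring back a cronjob for another run period, a backed-up crontab can be reinstalled; a minimal sketch, assuming the backups are plain crontab files (the file name below is hypothetical):

cat ~clascron/backup/crontab.rg-a       # hypothetical backup file name, review it first
crontab ~clascron/backup/crontab.rg-a   # install it as the active crontab for clascron
crontab -l                              # confirm the new entries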


OLD (previous setup, kept for reference)

In auto.master, make sure auto.direct is mounted without a timeout:

/-      /etc/auto.direct  --timeout 0

If it was changed, restart autofs (on RHEL7 run 'service autofs restart'). To forcibly unmount a path, run 'umount -lf /xxx/yyy'.

The following entry has to be in auto.direct:

/lustre/scicomp/jasmine/fairy2 -fstype=nfs,rw,async,vers=3 scidaqgw10b:/lustre/scicomp/jasmine/fairy2
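
To confirm that the direct map works and the Lustre area is reachable, access the path once and check the resulting mount (generic commands, not from the original page):

ls /lustre/scicomp/jasmine/fairy2 > /dev/null    # trigger the automount
mount | grep fairy2                              # should show the NFS mount from scidaqgw10b
df -h /lustre/scicomp/jasmine/fairy2             # verify it is mounted and has free space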

The following cronjobs have to be running as user 'clascron' on the machine moving data to tape:

# Scan for to-tape files every 15 minutes
10,25,40,55 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/totape /data/totape /mss/clas12/er-a/data -jvm:-Dfile.transfer.client.displayrates=true
# access occasionally to keep it visible
* * * * * /bin/csh -c "(ls -al /lustre/scicomp/jasmine/fairy2/) >>&! /usr/logs/disks/clondaq6_lustre"
* * * * * /bin/csh -c "(sleep 48; rm -f /usr/logs/disks/clondaq6_lustre) >>&! /dev/null"

Log files are in

/usr/local/scicomp/jasmine/log/jmigrate/data-totape/.
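
To follow what the migration is doing, look at the newest log file in that directory; a sketch (the exact log file naming is an assumption):

cd /usr/local/scicomp/jasmine/log/jmigrate/data-totape/
ls -ltr | tail                 # newest log files are listed last
tail -f `ls -tr | tail -1`     # follow the most recent one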

If a job is stuck, remove the lock file:

rm /tmp/jmigrate-data-totape.lock
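
Before removing the lock file, it is worth confirming that no jmigrate process is still running, otherwise the lock is legitimate (the process-name pattern below is an assumption):

pgrep -u clascron -fl jmigrate    # no output means nothing is running and the lock is stale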

A useful command to check the process status:

ps auxf | grep java

If you see something like

clascron 128847  0.0  1.0 7307240 662088 ?      D    Dec14   0:00  \_ java -DJMirror.minFileModif.........

a job marked as 'D' is in an uninterruptible state and cannot be killed even with 'kill -9'. Other stuck jobs can be killed; they become <defunct>.
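
To look at the process state directly, the STAT column of ps can be used (the 'java' pattern is an assumption about how the jmigrate process appears):

ps -u clascron -o pid,stat,etime,args | grep '[j]ava'    # D = uninterruptible I/O wait, Z = defunct/zombie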

To see files on tape, ssh to ifarm65 and type:

ls -ltrh /mss/clas12/er-a/data/ | tail

To see files still in cache:

ls -ltrh /cache/mss/clas12/er-a/data/ | tail

To retrieve files from tape:

jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y #copy to the current directory
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /path/to/dir #copy to the directory with path
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /mss/clas12/er-a/data/clas_00XXXX.evio.Z /path/to/dir # copy two files

The mover status can be checked at https://scicomp.jlab.org/scicomp/index.html#/moverStatus.

The tape job schedule can be checked at https://scicomp.jlab.org/scicomp/index.html#/tapeJobsSchedule.

NOTE: problems were observed in Dec 2017 when moving data with NFS-mounted scidaqgw10b or scidaqgw10f. To fix a stuck job, jcancel had to be issued on the mover side. When the NFS mount was removed and the process started to use a network socket, everything worked.