Move2silo
Revision as of 13:05, 5 December 2021
The data-moving process is initiated by a cron job running from the 'clascron' account; for example, for the HPS experiment on clondaq5:
0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage5 /data/stage5 /mss/hallb/hps/physrun2019/data -jvm:-Dfile.transfer.client.displayrates=true
and for CLAS12 run group A on clondaq6:
0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage6 /data/stage6 /mss/clas12/rg-a/data -jvm:-Dfile.transfer.client.displayrates=true
Cron job files for different run periods can be found in the ~clascron/backup/ directory.
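Before installing a new run period's entry, it can help to pull the crontab line apart and eyeball the pieces. A minimal sketch, using the clondaq6 line above as canned sample input (the field positions assume jmigrate's "stage dir, stage dir, /mss destination" argument order shown above):

```shell
#!/bin/sh
# Split a jmigrate crontab entry into its pieces for a sanity check.
line='0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage6 /data/stage6 /mss/clas12/rg-a/data -jvm:-Dfile.transfer.client.displayrates=true'
set -f                # keep the '*' schedule fields from globbing
set -- $line
set +f
shift 5               # drop the five cron schedule fields
binary=$1; stage=$2; dest=$4
echo "binary:      $binary"
echo "stage dir:   $stage"
echo "destination: $dest"
```

From here one could add checks that $stage exists on the DAQ machine and $binary is executable before the line goes into the crontab.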
OLD
In auto.master, make sure auto.direct is mounted without a timeout:
/- /etc/auto.direct --timeout 0
If changed, restart autofs (on RHEL7 run 'service autofs restart'). To forcibly unmount, run 'umount -lf /xxx/yyy'.
The following has to be in auto.direct:
/lustre/scicomp/jasmine/fairy2 -fstype=nfs,rw,async,vers=3 scidaqgw10b:/lustre/scicomp/jasmine/fairy2
The following cron jobs have to be running as user 'clascron' on the machine moving data to tape:
# Scan for to-tape files every 15 minutes
10,25,40,55 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/totape /data/totape /mss/clas12/er-a/data -jvm:-Dfile.transfer.client.displayrates=true
# access occasionally to keep it visible
* * * * * /bin/csh -c "(ls -al /lustre/scicomp/jasmine/fairy2/) >>&! /usr/logs/disks/clondaq6_lustre"
* * * * * /bin/csh -c "(sleep 48; rm -f /usr/logs/disks/clondaq6_lustre) >>&! /dev/null"
Log files are in
/usr/local/scicomp/jasmine/log/jmigrate/data-totape/.
If a job is stuck, remove the lock file:
rm /tmp/jmigrate-data-totape.lock
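Removing the lock while a jmigrate run is still active would let two instances collide, so it can be safer to check the lock's age first. A minimal sketch of that idea; the one-hour staleness threshold is an assumption (tune it to the cron cadence), and the demo operates on a throwaway file rather than the real /tmp/jmigrate-data-totape.lock:

```shell
#!/bin/sh
# Treat the lock as stale only when it is older than one hour.
# LOCK_FILE defaults to a freshly created demo file; point it at
# /tmp/jmigrate-data-totape.lock for the real case.
LOCK_FILE=${LOCK_FILE:-$(mktemp)}
# find prints the path only if the file was modified >60 minutes ago
if [ -n "$(find "$LOCK_FILE" -mmin +60 2>/dev/null)" ]; then
    rm -f "$LOCK_FILE"
    result="removed stale lock"
else
    result="lock is recent; a jmigrate run may still be active"
fi
echo "$result"
```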
Useful command to check process status:
ps auxf | grep java
If you see something like
clascron 128847 0.0 1.0 7307240 662088 ? D Dec14 0:00 \_ java -DJMirror.minFileModif.........
a job marked 'D' is in an uninterruptible state and cannot be killed even with 'kill -9'. Other stuck jobs can be killed; they become &lt;defunct&gt;.
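The 'D'-state processes can be picked out of the ps output mechanically. A small sketch of the filter, demonstrated on canned sample lines with the same layout as `ps -eo user,pid,stat,comm` (the sample PIDs are made up):

```shell
#!/bin/sh
# Pick out java processes in uninterruptible sleep ('D') -- the ones
# that kill -9 cannot touch.  Feed it real `ps -eo user,pid,stat,comm`
# output instead of the sample below.
sample='clascron 128847 D java
clascron 128850 S java
root     131313 D kworker'
stuck=$(printf '%s\n' "$sample" | awk '$3 ~ /^D/ && $4 == "java" {print $2}')
echo "uninterruptible java PIDs: $stuck"
```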
To see files on tape, ssh to ifarm65 and type:
ls -ltrh /mss/clas12/er-a/data/ | tail
To see files still in cache:
ls -ltrh /cache/mss/clas12/er-a/data/ | tail
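Comparing the two listings shows which files have already been written to tape but evicted from cache. A minimal sketch using `comm` on sorted name lists, demonstrated with two throwaway directories; on ifarm you would list /mss/clas12/er-a/data and /cache/mss/clas12/er-a/data instead:

```shell
#!/bin/sh
# List files present under the tape stub directory but absent from the
# cache copy.  Demo directories and file names are made up.
mss_dir=$(mktemp -d); cache_dir=$(mktemp -d)
touch "$mss_dir/clas_000001.evio.0" "$mss_dir/clas_000001.evio.1"
touch "$cache_dir/clas_000001.evio.0"     # .1 evicted from cache
mss_list=$(mktemp); cache_list=$(mktemp)
ls "$mss_dir"   | sort > "$mss_list"
ls "$cache_dir" | sort > "$cache_list"
# comm -23: lines only in the first file (on tape, not in cache)
evicted=$(comm -23 "$mss_list" "$cache_list")
echo "on tape but not in cache: $evicted"
rm -f "$mss_list" "$cache_list"
```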
To retrieve files from tape:
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y                                                        # copy to the current directory
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /path/to/dir                                           # copy to the directory with path
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /mss/clas12/er-a/data/clas_00XXXX.evio.Z /path/to/dir  # copy two files
Mover's status can be checked at https://scicomp.jlab.org/scicomp/tapeJob/scheduled
Job's schedule can be checked at https://scicomp.jlab.org/scicomp/tapeJob/scheduled
NOTE: problems were observed in Dec 2017 when moving data with NFS-mounted scidaqgw10b or scidaqgw10f. To fix a stuck job, jcancel had to be issued on the mover's side. Once the NFS mount was removed and the process switched to a network socket, everything worked.