Move2silo

The data moving process is initiated by a cronjob running from the 'clascron' account, for example for the HPS experiment on clondaq5:

0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage5 /data/stage5 /mss/hallb/hps/physrun2019/data -jvm:-Dfile.transfer.client.displayrates=true

and for CLAS12 run group A on clondaq6:

0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage6 /data/stage6 /mss/clas12/rg-a/data -jvm:-Dfile.transfer.client.displayrates=true

Cronjob files for different run periods can be found in the ~clascron/backup/ directory.
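
A quick way to check which schedule is currently active, or to restore one of the saved schedules, is sketched below (the backup file name is only an example; use whichever run period file actually exists in ~clascron/backup/):

# as user 'clascron': list the currently installed crontab
crontab -l
# install a saved schedule from the backup directory (file name is hypothetical)
crontab ~clascron/backup/crontab.rg-a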


OLD

In auto.master, make sure auto.direct is mounted without a timeout:

/-      /etc/auto.direct  --timeout 0

If this was changed, restart autofs (on RHEL7 run 'service autofs restart'). To forcibly unmount, run 'umount -lf /xxx/yyy'.
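
A minimal check-and-restart sequence, assuming the RHEL7 service name mentioned above (the umount path is taken from the auto.direct entry below):

# confirm the direct map is listed with no timeout
grep auto.direct /etc/auto.master
# restart the automounter so the change takes effect
service autofs restart
# if the mount point hangs, detach it forcibly
umount -lf /lustre/scicomp/jasmine/fairy2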

The following has to be in auto.direct:

/lustre/scicomp/jasmine/fairy2 -fstype=nfs,rw,async,vers=3 scidaqgw10b:/lustre/scicomp/jasmine/fairy2
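
To verify that the direct mount is actually served over NFS and reachable, something like this can be used (standard commands only, no jasmine tools involved):

# is the fairy2 area mounted via NFS?
mount | grep fairy2
# can its contents be listed?
ls /lustre/scicomp/jasmine/fairy2/ | head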

The following cronjob has to be running as user 'clascron' on the machine moving data to tape:

# Scan for to-tape files every 15 minutes
10,25,40,55 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/totape /data/totape /mss/clas12/er-a/data -jvm:-Dfile.transfer.client.displayrates=true
# access occasionally to keep it visible
* * * * * /bin/csh -c "(ls -al /lustre/scicomp/jasmine/fairy2/) >>&! /usr/logs/disks/clondaq6_lustre"
* * * * * /bin/csh -c "(sleep 48; rm -f /usr/logs/disks/clondaq6_lustre) >>&! /dev/null"
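
To confirm that the keep-alive entries above are firing, the log file they write can be inspected; note that the second entry removes it again after 48 seconds, so it may be absent at any given moment (the path comes from the clondaq6 crontab above):

# present for roughly the first 48 seconds of every minute
cat /usr/logs/disks/clondaq6_lustre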

Log files are in /usr/local/scicomp/jasmine/log/jmigrate/data-totape/.
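
To follow the most recent transfer, list the newest logs and tail the latest one (the log file name below is a placeholder):

# newest jmigrate logs appear at the bottom of the listing
ls -ltr /usr/local/scicomp/jasmine/log/jmigrate/data-totape/ | tail
# follow a specific log as the transfer proceeds
tail -f /usr/local/scicomp/jasmine/log/jmigrate/data-totape/<latest-log-file>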

If the job is stuck, remove the lock file:

rm /tmp/jmigrate-data-totape.lock

A useful command to check the process status:

ps auxf | grep java

If you see something like

clascron 128847  0.0  1.0 7307240 662088 ?      D    Dec14   0:00  \_ java -DJMirror.minFileModif.........

then the job marked 'D' is in an uninterruptible state and cannot be killed by 'kill -9'. Other stuck jobs can be killed; they become <defunct>.
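
For stuck jobs that are not in the 'D' state, a cleanup might look like the sketch below (the PID is the example value from the ps output above):

# kill a stuck (non-'D') jmigrate java process
kill -9 128847
# killed jobs linger as <defunct> zombies until reaped by their parent
ps auxf | grep defunct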

To see files on tape, ssh to ifarm65 and type:

ls -ltrh /mss/clas12/er-a/data/ | tail

To see files still in cache:

ls -ltrh /cache/mss/clas12/er-a/data/ | tail
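
Combining the two checks, the status of a single raw data file can be inspected like this (the file name keeps the placeholder used elsewhere on this page):

# listed under /mss once it has reached the tape library
ls -l /mss/clas12/er-a/data/clas_00XXXX.evio.Y
# still present on the cache disk, if not yet evicted
ls -l /cache/mss/clas12/er-a/data/clas_00XXXX.evio.Y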

To retrieve from tape:

jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y #copy to the current directory
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /path/to/dir #copy to the directory with path
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /mss/clas12/er-a/data/clas_00XXXX.evio.Z /path/to/dir # copy two files

The mover status can be checked at https://scicomp.jlab.org/scicomp/moverStatus

The job schedule can be checked at https://scicomp.jlab.org/scicomp/tapeJob/scheduled

NOTE: problems were observed in Dec 2017 when moving data with NFS-mounted scidaqgw10b or scidaqgw10f. To fix a stuck job, jcancel had to be issued on the mover side. When the NFS mount was removed and the process started to use a network socket, everything worked.