Move2silo

From CLONWiki
Latest revision as of 13:07, 5 December 2021

The data-moving process is initiated by a cron job running from the 'clascron' account, for example for the HPS experiment on clondaq5:

0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage5 /data/stage5 /mss/hallb/hps/physrun2019/data -jvm:-Dfile.transfer.client.displayrates=true

and for CLAS12 run group A on clondaq6:

0,10,20,30,40,50 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/stage6 /data/stage6 /mss/clas12/rg-a/data -jvm:-Dfile.transfer.client.displayrates=true

Cron job files for different run periods can be found in the ~clascron/backup/ directory.


OLD

In auto.master, make sure auto.direct is mounted without a timeout:

/-      /etc/auto.direct  --timeout 0

If changed, restart autofs (on RHEL7, run 'service autofs restart'). To forcibly unmount, run 'umount -lf /xxx/yyy'.
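The timeout check above can be scripted; a minimal sketch, where check_timeout is a hypothetical helper and the pattern assumes the auto.master line shown below:

```shell
#!/bin/sh
# check_timeout FILE: succeed if FILE mounts auto.direct with --timeout 0.
# Hypothetical helper; pass /etc/auto.master on a real host.
check_timeout() {
    grep -q 'auto\.direct[[:space:]][[:space:]]*--timeout 0' "$1" 2>/dev/null
}

if check_timeout /etc/auto.master; then
    echo "auto.direct is mounted without timeout"
else
    echo "add '--timeout 0' to the auto.direct line, then: service autofs restart"
fi
```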

The following entry must be present in auto.direct:

/lustre/scicomp/jasmine/fairy2 -fstype=nfs,rw,async,vers=3 scidaqgw10b:/lustre/scicomp/jasmine/fairy2

The following cron jobs must be running as user 'clascron' on the machine moving data to tape:

# Scan for to-tape files every 15 minutes
10,25,40,55 * * * * /usr/local/scicomp/jasmine/bin/jmigrate /data/totape /data/totape /mss/clas12/er-a/data -jvm:-Dfile.transfer.client.displayrates=true
# access occasionally to keep it visible
* * * * * /bin/csh -c "(ls -al /lustre/scicomp/jasmine/fairy2/) >>&! /usr/logs/disks/clondaq6_lustre"
* * * * * /bin/csh -c "(sleep 48; rm -f /usr/logs/disks/clondaq6_lustre) >>&! /dev/null"

Log files are in

/usr/local/scicomp/jasmine/log/jmigrate/data-totape/.

If the job is stuck, remove the lock file:

rm /tmp/jmigrate-data-totape.lock
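A guarded variant of this cleanup avoids deleting a lock that a live jmigrate may still hold; a sketch assuming a lock older than 60 minutes is stale (remove_if_stale is a hypothetical helper, not part of jasmine):

```shell
#!/bin/sh
# remove_if_stale LOCKFILE MINUTES: delete LOCKFILE only if it is older
# than MINUTES and therefore assumed abandoned.  Hypothetical helper.
remove_if_stale() {
    lock=$1; mins=$2
    if [ -f "$lock" ] && [ -n "$(find "$lock" -mmin +"$mins" 2>/dev/null)" ]; then
        rm -f "$lock"
        return 0
    fi
    return 1
}

# Example: treat the jmigrate lock as stale after one hour.
if remove_if_stale /tmp/jmigrate-data-totape.lock 60; then
    echo "removed stale lock"
fi
```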

A useful command to check process status:

ps auxf | grep java

If you see something like

clascron 128847  0.0  1.0 7307240 662088 ?      D    Dec14   0:00  \_ java -DJMirror.minFileModif.........

a job marked 'D' is in an uninterruptible state and cannot be killed, even with 'kill -9'. Other stuck jobs can be killed; they become <defunct>.
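The check above can be done mechanically; a sketch that filters ps output for processes whose state column starts with 'D' (the dstate filter is a hypothetical helper):

```shell
#!/bin/sh
# dstate: read 'ps -eo stat,pid,user,comm' output on stdin and print only
# processes in uninterruptible sleep (state column starts with 'D').
dstate() {
    awk '$1 ~ /^D/'
}

# Example: list java processes that kill -9 cannot terminate.
# No output simply means no process is stuck.
ps -eo stat,pid,user,comm | dstate | grep java || true
```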

To see files on tape, ssh to ifarm65 and type:

ls -ltrh /mss/clas12/er-a/data/ | tail

To see files still in cache:

ls -ltrh /cache/mss/clas12/er-a/data/ | tail
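The /mss and /cache listings above differ only by a path prefix; a sketch of that mapping, with hypothetical cache_path and in_cache helpers:

```shell
#!/bin/sh
# cache_path MSSPATH: map a tape-stub path under /mss to its cache copy.
# Hypothetical helper based on the prefix convention in the listings above.
cache_path() {
    printf '/cache%s\n' "$1"
}

# in_cache MSSPATH: succeed if the cached copy still exists on disk.
in_cache() {
    [ -f "$(cache_path "$1")" ]
}

cache_path /mss/clas12/er-a/data/clas_00XXXX.evio.Y
```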

To retrieve files from tape:

jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y # copy to the current directory
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /path/to/dir # copy to the given directory
jget /mss/clas12/er-a/data/clas_00XXXX.evio.Y /mss/clas12/er-a/data/clas_00XXXX.evio.Z /path/to/dir # copy two files
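Several stub files can be passed to a single jget invocation, as in the third example above; a sketch that builds such a command for a list of placeholder run numbers (the runs and the destination directory are hypothetical):

```shell
#!/bin/sh
# Build one jget command for several runs; the run numbers and the
# destination directory below are placeholders, not real data.
FILES=""
for run in 001234 001235; do
    FILES="$FILES /mss/clas12/er-a/data/clas_${run}.evio.0"
done

# Print the command instead of running it; drop 'echo' on a real host.
echo jget $FILES /path/to/dir
```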

Mover status can be checked at https://scicomp.jlab.org/scicomp/moverStatus

The tape-job schedule can be checked at https://scicomp.jlab.org/scicomp/tapeJob/scheduled

NOTE: problems were observed in Dec 2017 when moving data via NFS-mounted scidaqgw10b or scidaqgw10f. To fix a stuck job, jcancel had to be issued on the mover's side. Once the NFS mount was removed and the process switched to a network socket, everything worked.