ERROR Book: Difference between revisions
No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
'''CLAS12''' | '''CLAS12''' | ||
== Sergey B. Dec 2017 run: RESOLVED by always using even number ??? == | |||
TI VME (Local) address = 0x00a80000 (0xabf81000) | |||
tiReload: | |||
FPGA Re-Load ! | |||
CODAtcpServer: wait: coda request >download PROD< in progress | |||
ID: 0x7101 Firmware (type - revision): 0x3 - 0x07A (prodID = 1) | |||
... Configure as TI Slave... | |||
tiSetClockSource: Setting clock source to EXTERNAL-HFBR1 (1) | |||
tiSetTriggerSource: ERROR: Invalid Trigger Source (129). Must be between 0 and 11. | |||
FiberMeas: syncDelay_write=0x82828200 | |||
FiberMeas: syncDelay=0x8200820b | |||
FiberMeas: ERROR: TI Fiber Measurement failed! | |||
First Measurement != Second Measurement (60 != 61) | |||
tiInit: Fiber Measurement failure. Check fiber and/or fiber port, | |||
cannot find TI, ret=-1 - exit | |||
coda_roc is exiting, clear dma memory | |||
== Sergey B. Dec 2017 run: == | |||
== Sergey B. Dec 2017 run: == | |||
== Sergey B. Dec 2017 run: == | |||
Revision as of 13:43, 2 January 2018
CLAS12
Sergey B. Dec 2017 run: RESOLVED by always using even number ???
TI VME (Local) address = 0x00a80000 (0xabf81000) tiReload:
FPGA Re-Load !
CODAtcpServer: wait: coda request >download PROD< in progress
ID: 0x7101 Firmware (type - revision): 0x3 - 0x07A (prodID = 1)
... Configure as TI Slave... tiSetClockSource: Setting clock source to EXTERNAL-HFBR1 (1) tiSetTriggerSource: ERROR: Invalid Trigger Source (129). Must be between 0 and 11. FiberMeas: syncDelay_write=0x82828200 FiberMeas: syncDelay=0x8200820b
FiberMeas: ERROR: TI Fiber Measurement failed!
First Measurement != Second Measurement (60 != 61)
tiInit: Fiber Measurement failure. Check fiber and/or fiber port, cannot find TI, ret=-1 - exit coda_roc is exiting, clear dma memory
Sergey B. Dec 2017 run:
Sergey B. Dec 2017 run:
Sergey B. Dec 2017 run:
CLAS12 era - older records
Sergey B. 11-jan-2016: pcal0 test setup coda_roc error messages
.............. net_thread: waiting= 0 sending= 1 microsec per event (nev=120) proc_thread: waiting= 0 processing= 1568 microsec per event (nev=30) proc_thread: waiting= 0 processing= 1539 microsec per event (nev=30) ERROR in TI event header word (0x1d) ERROR in TI event header word (0x1d) ERROR in TI event header word (0x1d) ERROR in TI event header word (0x1d) ERROR in TI event header word (0x1d) ERROR in TI event header word (0x1d) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1e) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x1f) ERROR in TI event header word (0x20) ERROR in TI event header word (0x20) ERROR in TI event header word (0x20) TI UNKNOWN data: [162] 0x00000000
Previous stuff [142][162] 0xa41fc94d [143][162] 0x00000000 [144][162] 0x01010003 [145][162] 0x000052cd [146][162] 0xa41fe9db [147][162] 0x00000000 [148][162] 0x01010003 [149][162] 0x000052ce [150][162] 0xa4200a68 [151][162] 0x00000000 [152][162] 0x01010003 [153][162] 0x000052cf [154][162] 0xa4202af5 [155][162] 0x00000000 [156][162] 0x01010003 [157][162] 0x000052d0 [158][162] 0xa4204b82 [159][162] 0x00000000 [160][162] 0x0000fafe [161][162] 0x0101d97e [162][162] 0x00000000 [163][162] 0xb9820003 [164][162] 0x0000faff [165][162] 0x0101fa0b [166][162] 0x00000000 [167][162] 0x01010003 [168][162] 0x0000fb00 [169][162] 0xb9830003 [170][162] 0x0000fb01 [171][162] 0x01013b25 End Of Previous stuff net_thread: waiting= 104 sending= 1 microsec per event (nev=1080) coda_roc is exiting, clear dma memory net_thread: waiting= 0 sending= 1 microsec per event (nev=120) bb_delete1: gefVmeFreeDmaBuf[ 0] has been called bb_delete1: gefVmeFreeDmaBuf[ 1] has been called Segmentation fault clasrun@pcal0:clasrun>
AFTER roc_reboot:
....... TI UNKNOWN data: [ 0] 0x00000000
Previous stuff [-20][ 0] 0x00000000 [-19][ 0] 0x900052f7 [-18][ 0] 0xa0000005 [-17][ 0] 0x00000000 [-16][ 0] 0x00001466 [-15][ 0] 0x000052f7 [-14][ 0] 0x00000000 [-13][ 0] 0x00000000 [-12][ 0] 0x900052f8 [-11][ 0] 0xa0000005 [-10][ 0] 0x00000000 [ -9][ 0] 0x00001466 [ -8][ 0] 0x000052f8 [ -7][ 0] 0x5693d19e [ -6][ 0] 0x00000001 [ -5][ 0] 0x8800011a [ -4][ 0] 0x000069de [ -3][ 0] 0x00010e14 [ -2][ 0] 0x000000e2 [ -1][ 0] 0xe10a0106 [ 0][ 0] 0x00000000 [ 1][ 0] 0x01013cf3 [ 2][ 0] 0x00006cbe [ 3][ 0] 0x6cbe0003 [ 4][ 0] 0x00006cbf [ 5][ 0] 0x01010003 [ 6][ 0] 0xfd400000 [ 7][ 0] 0x8d407e0e [ 8][ 0] 0xff116cc0 [ 9][ 0] 0x85420003 End Of Previous stuff coda_roc is exiting, clear dma memory bb_delete1: gefVmeFreeDmaBuf[ 0] has been called bb_delete1: gefVmeFreeDmaBuf[ 1] has been called bb_delete1: gefVmeFreeDmaBuf[ 2] has been called bb_delete1: gefVmeFreeDmaBuf[ 3] has been called bb_delete1: gefVmeFreeDmaBuf[ 4] has been called bb_delete1: gefVmeFreeDmaBuf[ 5] has been called bb_delete1: gefVmeFreeDmaBuf[ 6] has been called bb_delete1: gefVmeFreeDmaBuf[ 7] has been called bb_delete1: gefVmeFreeDmaBuf[ 8] has been called bb_delete1: gefVmeFreeDmaBuf[ 9] has been called bb_delete1: gefVmeFreeDmaBuf[10] has been called bb_delete1: gefVmeFreeDmaBuf[11] has been called bb_delete1: gefVmeFreeDmaBuf[12] has been called bb_delete1: gefVmeFreeDmaBuf[13] has been called bb_delete1: gefVmeFreeDmaBuf[14] has been called bb_delete1: gefVmeFreeDmaBuf[15] has been called clasrun@pcal0:clasrun>
roc_reboot did not helped. Flashed firmwares in all boards (TI, SD, FADC250s), roc_reboot again - now it works. Will watch it ...
Happened again. Replaced TI. Took 300M events, happened again:
.......................... net_thread: waiting= 0 sending= 1 microsec per event (nev=120) TI UNKNOWN data: [162] 0x1519028c
Previous stuff [142][162] 0x01010003 [143][162] 0x1518991c [144][162] 0xed570324 [145][162] 0x0000028b [146][162] 0x01010003 [147][162] 0x1518991d [148][162] 0xed5723b1 [149][162] 0x0000028b [150][162] 0x01010003 [151][162] 0x1518991e [152][162] 0xed57443e [153][162] 0x0000028b [154][162] 0x01010003 [155][162] 0x1518991f [156][162] 0xed5764da [157][162] 0x0000028b [158][162] 0x01010003 [159][162] 0x15189920 [160][162] 0x00009bab [161][162] 0x01019815 [162][162] 0x1519028c [163][162] 0x01010003 [164][162] 0x15199bac [165][162] 0x0e0db894 [166][162] 0x0000028c [167][162] 0x01010003 [168][162] 0x15199bad [169][162] 0x0e0ed921 [170][162] 0x0000028c [171][162] 0x01010003 End Of Previous stuff
Notice that shortly before that error following condition occur:
............ proc_thread: waiting= 0 processing= 975 microsec per event (nev=30) proc_thread: waiting= 0 processing= 983 microsec per event (nev=30) proc_thread: waiting= 0 processing= 995 microsec per event (nev=30) bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... timer: 763 microsec (min= 722 max= 14613 rms**2=6321261) bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... bb_write[0x08a24418]: waiting for buffer (read=5 write=4) ... bb_write[0x08a25280]: waiting for buffer (read=9 write=8) ... net_thread: waiting= 0 sending= 2001 microsec per event (nev=1080) net_thread: waiting= 0 sending= 2 microsec per event (nev=120) net_thread: waiting= 0 sending= 2 microsec per event (nev=1080) ................
CLAS era
Sergey B. 10-dec-2011: nothing bad happened, security scan got through, need to block it better
..... FFFFFFFFFFFFFFFFFF WARN: ignore request from 129.57.71.28 CODAtcpServer: start work thread befor: socket=29 address>129.57.71.28< port=59401 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 accept: Too many open files .....
Sergey B. 23-nov-2011: EB1 hung, probably because if security scan
..... WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 tcl7.4(init.tcl): invalid command name "GET" clasprod::EB1> WARN: ignore request from 129.57.71.28 CODAtcpServer: start work thread befor: socket=6 address>129.57.71.28< port=47514 WARN: ignore request from 129.57.71.28 WARN: ignore request from 129.57.71.28 <HUNG>
Sergey B. 7-nov-2011: dc12, error message from from rol.h (10kHz pulser)
codaUpdateStatus: updating request done active, events so far 0 wait: coda request >go< in progress 0x3e55cd0 (twork0047): INIT_NAME: INFO: rolp->daproc = 5 0x3e55cd0 (twork0047): INFO: User Go ... codaExecute done SEND_BUFFER_ROC 4 (0x017f4050 0) (conditions 1 13) FILLING HEADER: dataInBuf>>2=11, g_events_in_buffer=1 SSS1 4 (0x017f4050 11) SSS2 4 (0x017f4050 11) SEND_BUFFER_ROC 4 (0x016bd8e0 0) (conditions 1 13) FILLING HEADER: dataInBuf>>2=11, g_events_in_buffer=1 SSS1 4 (0x016bd8e0 11) SSS2 4 (0x016bd8e0 11) SEND_BUFFER_ROC 4 (0x01587170 0) (conditions 1 13) FILLING HEADER: dataInBuf>>2=11, g_events_in_buffer=1 SSS1 4 (0x01587170 11) SSS2 4 (0x01587170 11) SEND_BUFFER_ROC 4 (0x01450a00 0) (conditions 1 13) FILLING HEADER: dataInBuf>>2=11, g_events_in_buffer=1 SSS1 4 (0x01450a00 11) SSS2 4 (0x01450a00 11) SEND_BUFFER_ROC 4 (0x0131a290 0) (conditions 1 13) FILLING HEADER: dataInBuf>>2=11, g_events_in_buffer=1 SSS1 4 (0x0131a290 11) SSS2 4 (0x0131a290 11) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) interrupt: INFO from bb_write: NO BUFS1 bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) setHeartError: 0 >sys 0, mask 4< WARN: HeartBeat[0]: heartbeat=21263(21263) heartmask=4 bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) UDP_cancel: cancel >inf:dc12 sys 0, mask 4< UDP_cancel: cancel >inf:dc12 sys 0, mask 4< bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) bb_write: no bufs, back end too slow ..) ...........
Click end - ended fine (probably network was stuck ?? or proc ???).
Sergey B. 8-jun-2011: test setup croctest2: error on croctest2 after 99M events, DMAing from single v1720 (NOTE: first test of ROL1 writing directly to the big buffer)
proc_thread: waiting= 846 processing= 3 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) rols_thread: waiting= 199 processing= 450 microsec per event (nev=188) proc_thread: waiting= 635 processing= 3 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) proc_thread: waiting= 635 processing= 3 microsec per event (nev=188) rols_thread: waiting= 192 processing= 460 microsec per event (nev=189) rols_thread: waiting= 194 processing= 457 microsec per event (nev=188) rols_thread: waiting= 199 processing= 450 microsec per event (nev=188) proc_thread: waiting= 846 processing= 3 microsec per event (nev=188) rols_thread: waiting= 185 processing= 469 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) proc_thread: waiting= 635 processing= 3 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) rols_thread: waiting= 200 processing= 449 microsec per event (nev=188) proc_thread: waiting= 847 processing= 3 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) net_thread: waiting= 720 sending= 0 microsec per event (nev=2218) proc_thread: waiting= 635 processing= 3 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) rols_thread: waiting= 199 processing= 451 microsec per event (nev=188) proc_thread: waiting= 635 processing= 3 microsec per event (nev=188) rols_thread: waiting= 199 processing= 450 microsec per event (nev=188) rols_thread: waiting= 199 processing= 450 microsec per event (nev=188) proc_thread: waiting= 847 processing= 3 microsec per event (nev=188) rols_thread: waiting= 205 processing= 450 microsec per event (nev=188) machine check Exception next instruction address: 0x0946e298 Machine Status Register: 0x0010b030 Condition Register: 0x80000088 Task: 0x946c670 "ROLS_LOOP" setHeartError: 0 >sys 0, mask 4< WARN: HeartBeat[0]: heartbeat=54077(54077) heartmask=4 UDP_cancel: cancel >inf:croctest2 sys 0, mask 4< UDP_cancel: cancel >inf:croctest2 sys 0, mask 4< -> tt 1668c8 vxTaskEntry +68 : rols_loop () d2e5290 rols_loop +b4 : output_proc_network () d2e5574 output_proc_network+298: vmenrol1__init () 946ccb8 vmenrol1__init +19c: 946cafc () 946cb0c end +91cb68c: 946ca1c () 946cabc end +91cb63c: 946c890 () 946c920 end +91cb4a0: 946e298 () value = 0 = 0x0 ->
Sergey B. 10-feb-2011: test setup croctest2: coda_net on pmc dies
0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=4) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=7) proc_thread: waiting= 127 processing= 3 microsec per event (nev=6559) net_thread: waiting= 131 sending= 2 microsec per event (nev=12708) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=11) proc_thread: waiting= 139 processing= 3 microsec per event (nev=6474) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=15) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=19) proc_thread: waiting= 129 processing= 3 microsec per event (nev=6452) net_thread: waiting= 133 sending= 2 microsec per event (nev=12598) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=19) proc_thread: waiting= 127 processing= 3 microsec per event (nev=6553) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=16) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=12) net_thread: waiting= 131 sending= 2 microsec per event (nev=12714) proc_thread: waiting= 139 processing= 3 microsec per event (nev=6489) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=8) proc_thread: waiting= 128 processing= 3 microsec per event (nev=6478) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=4) net_thread: waiting= 134 sending= 2 microsec per event (nev=12377) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=0) proc_thread: waiting= 127 processing= 3 microsec per event (nev=6554) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=3) proc_thread: waiting= 139 processing= 3 microsec per event (nev=6470) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=7) proc_thread: waiting= 128 processing= 3 microsec per event (nev=6516) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=11) 0x1aacca00 (coda_proc): timer: 1 microsec (min=1 max=173 rms**2=15) ERROR2: LINK_sized_write() returns errno=32 Broken pipe ERROR: net_thread failed (in LINK_sized_write). -> -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1effe6c0 0 PEND 14e508 1effe5a0 3d0001 0 tLogTask logTask 1effbb20 0 PEND 14e508 1effba10 0 0 tShell shell 1eb8cb70 1 READY 145290 1eb8c750 d 0 tRlogind rlogind 1ebb3850 2 PEND 13f934 1ebb33f0 0 0 tWdbTask wdbTask 1ebaa530 3 PEND 13f934 1ebaa400 0 0 tNetTask netTask 1ed0f280 50 PEND 13f934 1ed0f190 0 0 tPortmapd portmapd 1ebb1690 54 PEND 13f934 1ebb1430 3d0002 0 tFtpdTask b6064 1ebaebc0 55 PEND 13f934 1ebaea10 0 0 coda_proc proc_thread 1aacca00 115 DELAY 144bb4 1aacc900 0 10 coda_pmc coda_pmc 1e3f9720 200 DELAY 144bb4 1e3f9480 0 5 value = 0 = 0x0 -> printErrno 32 S_errno_EPIPE value = 14 = 0xe -> tt Sorry, traces of my own stack begin at tt (). value = -1 = 0xffffffff -> memShow invalid block at 0x1ec60000 deleted value = -1 = 0xffffffff -> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 421348656 56 7524083 411333840 alloc 79807344 1986 40184 - cumulative alloc 87490320 7392 11835 - value = 0 = 0x0 -> usrNetStackSysPoolStatus value = 0 = 0x0 -> usrNetStackSysPoolStatus("CODAtcpServer",0) value = 0 = 0x0 -> usrNetStackDataPoolStatus("CODAtcpServer",0) value = 454823200 = 0x1b1c0d20 -> usrNetStackDataPoolStatus("bla",0) value = 454823168 = 0x1b1c0d00 -> usrNetStackDataPoolStatus("bla",0) value = 454823040 = 0x1b1c0c80 -> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 421348528 55 7660882 411333840 alloc 79807472 1990 40104 - cumulative alloc 87496528 7401 11822 - value = 0 = 0x0 ->
EB looks fine, message was:
-- Got all fragments of go BOS_encode_spec reached BOS_encode_spec done: len=116 FFFFFFFFFFFFFFFFFF PUT: wait while icb=8 == cbp->read=8 ------- clastest::EB5>
Tried to reset-download-prestart and got error on pmc at Download:
-> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1effe6c0 0 PEND 14e508 1effe5a0 3d0001 0 tLogTask logTask 1effbb20 0 PEND 14e508 1effba10 0 0 tShell shell 1eb8cb70 1 READY 145290 1eb8c750 d 0 tRlogind rlogind 1ebb3850 2 PEND 13f934 1ebb33f0 0 0 tWdbTask wdbTask 1ebaa530 3 PEND 13f934 1ebaa400 0 0 tNetTask netTask 1ed0f280 50 PEND 13f934 1ed0f190 0 0 tPortmapd portmapd 1ebb1690 54 PEND 13f934 1ebb1430 3d0002 0 tFtpdTask b6064 1ebaebc0 55 PEND 13f934 1ebaea10 0 0 coda_pmc coda_pmc 1e3f9720 200 DELAY 144bb4 1e3f9480 30065 1 value = 0 = 0x0 -> Download command received -> bignet at 0x0d3a6e68 <- bignet at 0xcd3a6e68 bignet at 0xcd3a6e68, bignet.gbigBuffer at 0xcd3a6e78 -> 0xcd3a6e78 proc_on_pmc=1, net_on_pmc=1 rocID=0 ObjInitName >_rol2_tt_testsetup__init< data access Exception current instruction address: 0x000d2090 Machine Status Register: 0x00003030 Data Access Register: 0x3d40001c Condition Register: 0x40002083 Data storage interrupt Register: 0x40000000 Task: 0x1ed0f280 "tNetTask" -> tt 14cad8 vxTaskEntry +68 : netTask () c3e88 netTask +70 : 209c0 () 20a14 mgiEndLoad +cc8: 20a88 () 20bd8 mgiEndLoad +e8c: muxReceive () bc638 muxReceive +13c: b002c () b02c4 random +358: do_protocol_with_type () 9c920 do_protocol_with_type+2c : ipintr (1ec60e04) 9fb28 ipintr +c8c: dbbb4 () dbbb4 tcp_input +2968: tcp_output () dca24 tcp_output +5c8: mHdrClGet () aa940 mHdrClGet +8c : d1eb4 () value = 0 = 0x0 -> -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1effe6c0 0 PEND 14e508 1effe5a0 3d0001 0 tLogTask logTask 1effbb20 0 PEND 14e508 1effba10 0 0 tShell shell 1eb8cb70 1 READY 145290 1eb8c750 d 0 tRlogind rlogind 1ebb3850 2 PEND 13f934 1ebb33f0 0 0 tWdbTask wdbTask 1ebaa530 3 PEND 13f934 1ebaa400 0 0 tNetTask netTask 1ed0f280 50 SUSPEND d2090 1ed0efd0 0 0 tPortmapd portmapd 1ebb1690 54 PEND 13f934 1ebb1430 3d0002 0 tFtpdTask b6064 1ebaebc0 55 PEND 13f934 1ebaea10 0 0 coda_pmc coda_pmc 1e3f9720 200 PEND 1415a0 1e3f8ab0 3d0004 0 value = 0 = 0x0 ->
Sergey B. 16-apr-2010: end run failed, both ET systems are dead !!!???
clondaq1:et1_start> et_start -n 4000 -s 200000 -f /tmp/et_sys_clasprod -m eb -v et_nodename >eb< et_start: asking for 200000 byte events. et_start: asking for 4000 events. removing file >/tmp/et_sys_clasprod< file >/tmp/et_sys_clasprod< removed et_start: starting ET system /tmp/et_sys_clasprod ET user library >/usr/local/clas/release/current/coda/SunOS_i86pc/lib/ libet_user.so< will be used et INFO: et_add_stations, nstations = 2, stats_max = 10 et INFO: et_conductor 1, found station cec1d440 et INFO: et_station_create_at, created station LEVEL3 et INFO: et_add_stations, nstations = 3, stats_max = 10 et INFO: et_conductor 2, found station cec1d860 et INFO: et_station_create_at, created station ET2ER unique name >et_start_eb_clasprod< 11:45:20: TAL-SS-00088-I Connecting to project <clasprod> on <local> RTserver 11:45:20: TAL-SS-00089-I Using local protocol 11:45:20: TAL-SS-00090-I Could not connect to <local> RTserver 11:45:20: TAL-SS-00093-I Skipping starting <start_never:local> RTserver 11:45:20: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1> RTserver 11:45:20: TAL-SS-00089-I Using local protocol 11:45:20: TAL-SS-00090-I Could not connect to <clondb1> RTserver 11:45:20: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1> RTserver 11:45:20: TAL-SS-00089-I Using tcp protocol 11:45:20: TAL-SS-00091-I Message from RTserver: Connection established. 11:45:20: TAL-SS-00096-I Start subscribing to subject </et_start_eb_clasprod> 11:45:20: TAL-SS-00111-I Start subscribing to subject </_clondaq1.jlab.org> 11:45:20: TAL-SS-00111-I Start subscribing to subject </_all> ipc_init called at Fri Apr 16 11:45:20 2010 et_start: pthread_create(0x0000000e,...) done et ERROR: et_netserver: ET server being probed by non-ET client or read failure et ERROR: et_netserver: ET server being probed by non-ET client or read failure et INFO: et_sys_heartmonitor, kill bad process (2,22078) et INFO: et_sys_heartmonitor, cleanup process 2 et INFO: set_fix_nprocs, change # of ET processes from 2 to 2 et INFO: set_fix_natts, station GRAND_CENTRAL has 0 attachments et INFO: set_fix_natts, station LEVEL3 has 1 attachments et INFO: set_fix_natts, station ET2ER has 1 attachments et INFO: set_fix_natts, # total attachments 2 -> 2 et INFO: set_fix_natts, proc 0 has 1 attachments et INFO: set_fix_natts, proc 1 has 1 attachments
clondaq2:clasrun> cd et2_start clondaq2:et2_start> clondaq2:et2_start> et_start -n 4000 -s 200000 -f /tmp/et_sys_clasprod
-m er -v
et_nodename >er< et_start: asking for 200000 byte events. et_start: asking for 4000 events. removing file >/tmp/et_sys_clasprod< file >/tmp/et_sys_clasprod< removed et_start: starting ET system /tmp/et_sys_clasprod ET user library >/usr/local/clas/release/current/coda/SunOS_i86pc/lib/ libet_user.so< will be used et INFO: et_add_stations, nstations = 2, stats_max = 10 et INFO: et_conductor 1, found station cec1d440 et INFO: et_station_create_at, created station TAPE et INFO: et_add_stations, nstations = 3, stats_max = 10 et INFO: et_conductor 2, found station cec1d860 et INFO: et_conductor 2, loaded shared lib /usr/local/clas/release/cur rent/coda/SunOS_i86pc/lib/libet_user.so et INFO: et_conductor 2, loaded function et2et_function et INFO: et_station_create_at, created station ET2ET10 unique name >et_start_er_clasprod< 11:45:20: TAL-SS-00088-I Connecting to project <clasprod> on <local> R Tserver 11:45:20: TAL-SS-00089-I Using local protocol 11:45:20: TAL-SS-00090-I Could not connect to <local> RTserver 11:45:20: TAL-SS-00093-I Skipping starting <start_never:local> RTserve r 11:45:20: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1>
RTserver
11:45:20: TAL-SS-00089-I Using local protocol 11:45:20: TAL-SS-00090-I Could not connect to <clondb1> RTserver 11:45:20: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1>
RTserver
11:45:20: TAL-SS-00089-I Using tcp protocol 11:45:20: TAL-SS-00091-I Message from RTserver: Connection established . 11:45:20: TAL-SS-00096-I Start subscribing to subject </et_start_er_cl asprod> 11:45:20: TAL-SS-00111-I Start subscribing to subject </_clondaq2.jlab .org> 11:45:20: TAL-SS-00111-I Start subscribing to subject </_all>
ipc_init called at Fri Apr 16 11:45:20 2010
et_start: pthread_create(0x0000000e,...) done et INFO: et_netserver: magic numbers do match, accept ET client et INFO: et_station_attach, done et INFO: et_netserver: magic numbers do match, accept ET client et INFO: et_station_attach, done et INFO: et_netserver: magic numbers do match, accept ET client et INFO: et_station_attach, done et INFO: et_restore_events, no events found belonging to attachment 4 et INFO: et_station_detach, done et WARN: et_command_loop: remote client connection broken et ERROR: et_netserver: ET server being probed by non-ET client or rea d failure et ERROR: et_netserver: ET server being probed by non-ET client or rea d failure et INFO: et_sys_heartmonitor, kill bad process (1,8898) et INFO: et_sys_heartmonitor, detach attachment 3 from stat 1 et INFO: et_flush_events, read 4000 events et INFO: et_flush_events, wrote 4000 events et INFO: et_station_detach, make station "TAPE" (cec1d440) idle et INFO: et_restore_events, no events found belonging to attachment 3 et INFO: et_station_detach, done et INFO: et_sys_heartmonitor, cleanup process 1 et INFO: set_fix_nprocs, change # of ET processes from 1 to 1 et INFO: set_fix_natts, station GRAND_CENTRAL has 2 attachments et INFO: set_fix_natts, station ET2ET10 has 1 attachments et INFO: set_fix_natts, # total attachments 3 -> 3 et INFO: set_fix_natts, proc 0 has 1 attachments
clondaq1:dac>
clondaq1:dac>
clondaq1:dac>
clondaq1:dac> et_monitor -f /tmp/et_sys_clasprod
LOCALITY = 1
ET SYSTEM - (/tmp/et_sys_clasprod) (host clondaq1.jlab.org) (bits 32) (tcp port 0) (udp port 11111) (multicast port 11111) (pid 22276) (lang C) (local) (period = 5 sec)
STATIC INFO - maximum of: events(4000), event size(200000), temps(4000) stations(10), attaches(20), procs(10), concurrency added(16) network interfaces(3): 129.57.167.107, 129.57.69.200, 129.57.68.22,
DYNAMIC INFO - currently there are: processes(4), attachments(4), temps(0) stations(3), hearbeat(1)
STATIONS: "GRAND_CENTRAL" (id = 0) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(MULTI), select(ALL) restore(GC), prescale(1), cue(10), select words(-1,-1,-1,-1,) dynamic info attachments: total#(2), ids(2,3,) input list: cnt = 4000, events in = 0 output list: cnt = 0, events out = 0
"LEVEL3" (id = 1) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(1), select(ALL) restore(OUT), prescale(1), cue(100), select words(-1,-1,-1,-1,) dynamic info attachments: total#(1), ids(1,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
"ET2ER" (id = 2) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(1), select(ALL) restore(OUT), prescale(1), cue(100), select words(-1,-1,-1,-1,) dynamic info attachments: total#(1), ids(0,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
LOCAL USERS: process #0, # attachments(1), attach ids(0,), pid(22408), hbeat(1) process #1, # attachments(1), attach ids(1,), pid(22491), hbeat(1) process #2, # attachments(1), attach ids(2,), pid(23984), hbeat(70) process #3, # attachments(1), attach ids(3,), pid(24082), hbeat(68)
ATTACHMENTS: att #0, is at station(ET2ER) on host(clondaq1.jlab.org) at pid(22408) proc(0), blocked(NO) events: make(0), get(0), put(0), dump(0) att #1, is at station(LEVEL3) on host(clondaq1.jlab.org) at pid(22491) proc(1), blocked(YES) events: make(0), get(0), put(0), dump(0) att #2, is at station(GRAND_CENTRAL) on host(clondaq1.jlab.org) at pid(23984) proc(2), blocked(NO) events: make(0), get(0), put(0), dump(0) att #3, is at station(GRAND_CENTRAL) on host(clondaq1.jlab.org) at pid(24082) proc(3), blocked(NO) events: make(0), get(0), put(0), dump(0)
EVENTS OWNED BY: system (4000), att0 (0), att1 (0), att2 (0), att3 (0),
EVENT RATE of GC = 0 events/sec
CREATING STATIONS: IDLE STATIONS: STATION CHAIN: GRAND_CENTRAL, LEVEL3, ET2ER, LOCKED MUTEXES:
ET SYSTEM - (/tmp/et_sys_clasprod) (host clondaq1.jlab.org) (bits 32) (tcp port 0) (udp port 11111) (multicast port 11111) (pid 22276) (lang C) (local) (period = 5 sec)
STATIC INFO - maximum of: events(4000), event size(200000), temps(4000) stations(10), attaches(20), procs(10), concurrency added(16) network interfaces(3): 129.57.167.107, 129.57.69.200, 129.57.68.22,
DYNAMIC INFO - currently there are: processes(4), attachments(4), temps(0) stations(3), hearbeat(1)
STATIONS: "GRAND_CENTRAL" (id = 0) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(MULTI), select(ALL) restore(GC), prescale(1), cue(10), select words(-1,-1,-1,-1,) dynamic info attachments: total#(2), ids(2,3,) input list: cnt = 4000, events in = 0 output list: cnt = 0, events out = 0
"LEVEL3" (id = 1) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(1), select(ALL) restore(OUT), prescale(1), cue(100), select words(-1,-1,-1,-1,) dynamic info attachments: total#(1), ids(1,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
"ET2ER" (id = 2) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(1), select(ALL) restore(OUT), prescale(1), cue(100), select words(-1,-1,-1,-1,) dynamic info attachments: total#(1), ids(0,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
LOCAL USERS: process #0, # attachments(1), attach ids(0,), pid(22408), hbeat(1) process #1, # attachments(1), attach ids(1,), pid(22491), hbeat(1) process #2, # attachments(1), attach ids(2,), pid(23984), hbeat(70) process #3, # attachments(1), attach ids(3,), pid(24082), hbeat(68)
ATTACHMENTS: att #0, is at station(ET2ER) on host(clondaq1.jlab.org) at pid(22408) proc(0), blocked(NO) events: make(0), get(0), put(0), dump(0) att #1, is at station(LEVEL3) on host(clondaq1.jlab.org) at pid(22491) proc(1), blocked(YES) events: make(0), get(0), put(0), dump(0) att #2, is at station(GRAND_CENTRAL) on host(clondaq1.jlab.org) at pid(23984) proc(2), blocked(NO) events: make(0), get(0), put(0), dump(0) att #3, is at station(GRAND_CENTRAL) on host(clondaq1.jlab.org) at pid(24082) proc(3), blocked(NO) events: make(0), get(0), put(0), dump(0)
EVENTS OWNED BY: system (4000), att0 (0), att1 (0), att2 (0), att3 (0),
EVENT RATE of GC = 0 events/sec
CREATING STATIONS: IDLE STATIONS: STATION CHAIN: GRAND_CENTRAL, LEVEL3, ET2ER, LOCKED MUTEXES:
ET SYSTEM is DEAD!
^C clondaq1:dac>
clondaq2:stage_in>
clondaq2:stage_in> et_monitor -f /tmp/et_sys_clasprod
LOCALITY = 1
ET SYSTEM - (/tmp/et_sys_clasprod) (host clondaq2) (bits 32) (tcp port 0) (udp port 11111) (multicast port 11111) (pid 16127) (lang C) (local) (period = 5 sec)
STATIC INFO - maximum of: events(4000), event size(200000), temps(4000) stations(10), attaches(20), procs(10), concurrency added(16) network interfaces(3): 129.57.167.109, 129.57.69.202, 129.57.68.24,
DYNAMIC INFO - currently there are: processes(3), attachments(3), temps(0) stations(3), hearbeat(1)
STATIONS: "GRAND_CENTRAL" (id = 0) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(MULTI), select(ALL) restore(GC), prescale(1), cue(10), select words(-1,-1,-1,-1,) dynamic info attachments: total#(0), ids() input list: cnt = 4000, events in = 0 output list: cnt = 0, events out = 0
"TAPE" (id = 1) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(MULTI), select(ALL) restore(OUT), prescale(1), cue(10), select words(-1,-1,-1,-1,) dynamic info attachments: total#(2), ids(1,2,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
"ET2ET10" (id = 2) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(1), select(USER) restore(OUT), prescale(1), cue(100), select words(-1,-1,-1,-1,) lib = /usr/local/clas/release/current/coda/SunOS_i86pc/lib/libet_user.so, function = et2et_function, class = dynamic info attachments: total#(1), ids(0,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
LOCAL USERS: process #0, # attachments(1), attach ids(0,), pid(16189), hbeat(1) process #1, # attachments(1), attach ids(1,), pid(8432), hbeat(625) process #2, # attachments(1), attach ids(2,), pid(10990), hbeat(1662)
ATTACHMENTS: att #0, is at station(ET2ET10) on host(clondaq2) at pid(16189) proc(0), blocked(NO) events: make(0), get(0), put(0), dump(0) att #1, is at station(TAPE) on host(clondaq2) at pid(8432) proc(1), blocked(NO) events: make(0), get(0), put(0), dump(0) att #2, is at station(TAPE) on host(clondaq2) at pid(10990) proc(2), blocked(NO) events: make(0), get(0), put(0), dump(0)
EVENTS OWNED BY: system (4000), att0 (0), att1 (0), att2 (0),
EVENT RATE of GC = 0 events/sec
CREATING STATIONS: IDLE STATIONS: STATION CHAIN: GRAND_CENTRAL, TAPE, ET2ET10, LOCKED MUTEXES:
ET SYSTEM - (/tmp/et_sys_clasprod) (host clondaq2) (bits 32) (tcp port 0) (udp port 11111) (multicast port 11111) (pid 16127) (lang C) (local) (period = 5 sec)
STATIC INFO - maximum of: events(4000), event size(200000), temps(4000) stations(10), attaches(20), procs(10), concurrency added(16) network interfaces(3): 129.57.167.109, 129.57.69.202, 129.57.68.24,
DYNAMIC INFO - currently there are: processes(3), attachments(3), temps(0) stations(3), hearbeat(1)
STATIONS: "GRAND_CENTRAL" (id = 0) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(MULTI), select(ALL) restore(GC), prescale(1), cue(10), select words(-1,-1,-1,-1,) dynamic info attachments: total#(0), ids() input list: cnt = 4000, events in = 0 output list: cnt = 0, events out = 0
"TAPE" (id = 1) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(MULTI), select(ALL) restore(OUT), prescale(1), cue(10), select words(-1,-1,-1,-1,) dynamic info attachments: total#(2), ids(1,2,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
"ET2ET10" (id = 2) static info status(ACTIVE), flow(SERIAL), blocking(YES), user(1), select(USER) restore(OUT), prescale(1), cue(100), select words(-1,-1,-1,-1,) lib = /usr/local/clas/release/current/coda/SunOS_i86pc/lib/libet_user.so, function = et2et_function, class = dynamic info attachments: total#(1), ids(0,) input list: cnt = 0, events in = 0, events try = 0 output list: cnt = 0, events out = 0
LOCAL USERS: process #0, # attachments(1), attach ids(0,), pid(16189), hbeat(1) process #1, # attachments(1), attach ids(1,), pid(8432), hbeat(625) process #2, # attachments(1), attach ids(2,), pid(10990), hbeat(1672)
ATTACHMENTS: att #0, is at station(ET2ET10) on host(clondaq2) at pid(16189) proc(0), blocked(NO) events: make(0), get(0), put(0), dump(0) att #1, is at station(TAPE) on host(clondaq2) at pid(8432) proc(1), blocked(NO) events: make(0), get(0), put(0), dump(0) att #2, is at station(TAPE) on host(clondaq2) at pid(10990) proc(2), blocked(NO) events: make(0), get(0), put(0), dump(0)
EVENTS OWNED BY: system (4000), att0 (0), att1 (0), att2 (0),
EVENT RATE of GC = 0 events/sec
CREATING STATIONS: IDLE STATIONS: STATION CHAIN: GRAND_CENTRAL, TAPE, ET2ET10, LOCKED MUTEXES:
ET SYSTEM is DEAD!
^C clondaq2:stage_in>
Sergey B. 14-apr-2010: debugging clastrig2 endofrun problem
wait: coda request >end< in progress 0x916e80 (twork0028): TS GO bit cleared 0x916e80 (twork0028): TS csr: 0xfc070000 0x916e80 (twork0028): TS csr2: 0xffff1c1f interrupt: TRIG ERROR: no pool buffer available 0x916e80 (twork0028): setForceSyncInterval: forceSyncInterval set to 0 0x916e80 (twork0028): INFO: User End Executed codaUpdateStatus: dbConnecting .. codaUpdateStatus: dbConnect done codaUpdateStatus: >UPDATE process SET state='ending' WHERE name='croctest10'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done
Sergey B. 14-apr-2010: message from runcontrol during cancel-reset
Get all components failed: Lost connection to MySQL server during query
Sergey B. 6-apr-2010: EB waiting for following ROCs
[0] 0x76fefffe != 0x008cf17e, waiting for the following ROC IDs: 7 9 10 11 17 20 21 22 25 26 28 29 30
Files produced:
clon10:ccscans> ll | grep Apr drwxrwxrwx 2 clasrun onliners 4096 Apr 2 16:23 ./ drwxr-xr-x 163 clasrun 100 73728 Apr 6 2010 ../ -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_cc1* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:09 good_clastrig2* -rwxrwxrwx 1 boiarino clas 34 Apr 2 16:25 good_croctest10* -rwxrwxrwx 1 boiarino clas 34 Apr 2 16:25 good_croctest2* -rwxrwxrwx 1 boiarino clas 34 Apr 2 16:21 good_croctest3* -rwxrwxrwx 1 baturin clas 34 Apr 2 13:26 good_ctoftest1* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_dc1* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_dc10* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_dc11* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_dc2* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_dc3* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_dc4* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_dc5* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_dc6* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_dc7* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_dc8* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_dc9* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_ec1* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_ec2* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_ec3* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_ec4* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_polar* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_sc1* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_sc2* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_scaler1* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_scaler2* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_scaler3* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:26 good_scaler4* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_tage* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_tage2* -rwxrwxrwx 1 clasrun onliners 34 Apr 6 20:08 good_tage3* clon10:ccscans> more good_dc7 accepted link from >129.57.68.80< clon10:ccscans> more good_dc8 accepted link from >129.57.68.74< clon10:ccscans>
On 'Reset' EB crashed, outputs:
[0] 0x76fefffe != 0x008cf17e, waiting for the following ROC IDs: 7 9 10 11 17 20 21 22 25 26 28 29 30
[0] 0x76fefffe != 0x008cf17e, waiting for the following ROC IDs: 7 9 10 11 17 20 21 22 25 26 28 29 30
CODAtcpServer: start work thread befor: socket=5 address>129.57.167.14< port=34468 wait: coda request >prestart< in progress codaExecute reached, message >exit<, len=4 codaExecute: 'exit' transition debShutdownBuild 1 cb_delete roc=0 cb_delete roc=1 cb_delete roc=2 cb_delete roc=3 cb_delete roc=4 cb_delete roc=5 cb_delete roc=6 cb_delete roc=7 cb_delete roc=8 cb_delete roc=9 cb_delete roc=10 cb_delete roc=11 cb_delete roc=12 cb_delete roc=13 cb_delete roc=14 cb_delete roc=15 cb_delete roc=16 cb_delete roc=17 cb_delete roc=18 cb_delete roc=19 cb_delete roc=20 cb_delete roc=21 cb_delete roc=22 cb_delete roc=23 cb_delete roc=24 cb_delete roc=25 cb_delete roc=26 cb_delete roc=27 cb_delete roc=28 cb_delete roc=29 cb_delete roc=30 cb_delete roc=31 debShutdownBuild 2 Stop waiting for the rocs because of force_end condition [0] ROC mask 0x76fefffe force_end 1 end_event_done 0 [0] NEED TO TEST AND PROBABLY REDESIGN THAT PART !!! [0] et_event_new() 1 [0] EVENT_ENCODE_SPEC 2 .. BOS_encode_spec reached BOS_encode_spec done: len=116 [0] .. done [0] handle_build_cleanup [0] flush ET system [0] detached from ET [0] remove mutex locks and shutdown fifos [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[00] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[00] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[00] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[01] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[01] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[01] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[02] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[02] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[02] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[03] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[03] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[03] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[04] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[04] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[04] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[05] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[05] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[05] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[06] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[06] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[06] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[07] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[07] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[07] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[08] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[08] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[08] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[09] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[09] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[09] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 3 [0] count for roc[10] = 3 CCCCCCCCCC: 3 CCCCCCCCCC: 1 [0] count for roc[10] = 1 CCCCCCCCCC: 1 CCCCCCCCCC: 0 [0] count for roc[10] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 2 [0] count for roc[11] = 2 CCCCCCCCCC: 2 CCCCCCCCCC: 0 [0] count for roc[11] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 2 [0] count for roc[12] = 2 CCCCCCCCCC: 2 CCCCCCCCCC: 0 [0] count for roc[12] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 2 [0] count for roc[13] = 2 CCCCCCCCCC: 2 CCCCCCCCCC: 0 [0] count for roc[13] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[14] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[15] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[16] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[17] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[18] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[19] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[20] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[21] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[22] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[23] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[24] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[25] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[26] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[27] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[28] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[29] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[30] = 0 CCCCCCCCCC: 0 [0] flush input streams CCCCCCCCCC: 0 [0] count for roc[31] = 0 CCCCCCCCCC: 0 [0] ============= Build threads cleaned [0] ============= Build threads cleaned [0] ============= Build threads cleaned [0] ============= Build threads cleaned [0] ============= Build threads cleaned [0] build thread exiting: 1 1 [0] The implementation has detected that the value specified by threa d does not refer to a joinable thread. wait: coda request >exit< in progress polling_routine ================!!!!!!!!!!!!!!!!!!===== cancel building threads =c============================================= =c============================================= =c============================================= =c==== >>>>> close link from >dc1< link=0x081c2af0 debCloseLink: theLink=0x081c2af0 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >dc2< link=0x081c4a90 debCloseLink: theLink=0x081c4a90 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >dc3< link=0x081c4b28 debCloseLink: theLink=0x081c4b28 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >dc4< link=0x081c14f8 debCloseLink: theLink=0x081c14f8 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >dc5< link=0x081c1590 debCloseLink: theLink=0x081c1590 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >dc6< link=0x081c1628 debCloseLink: theLink=0x081c1628 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >dc8< link=0x081c16c0 debCloseLink: theLink=0x081c16c0 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >cc1< link=0x081c1a08 debCloseLink: theLink=0x081c1a08 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >sc1< link=0x081c1aa0 debCloseLink: theLink=0x081c1aa0 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >ec1< link=0x081c1b38 debCloseLink: theLink=0x081c1b38 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >ec2< link=0x081c1bd0 debCloseLink: theLink=0x081c1bd0 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >scaler2< link=0x681d95c8 debCloseLink: theLink=0x681d95c8 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >scaler3< link=0x681d9660 debCloseLink: theLink=0x681d9660 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >scaler4< link=0x681d96f8 debCloseLink: theLink=0x681d96f8 -> closing debCloseLink: link is down debCloseLink: free memory debCloseLink: done. =c==== >>>>> close link from >ec4< link=0x00000001 debCloseLink: theLink=0x00000001 -> closing Segmentation fault clondaq1:coda_eb>
DC7:
codaExecute done wait: coda request >download prod< in progress codaExecute reached, message >prestart<, len=8 codaExecute: 'prestart' transition roc_prestart reached partReInitAll() reached ? partReInitAll() reached ! partReInitAll() done rocCleanup reached partReInitAll() reached ? wait: coda request >prestart< in progress partReInitAll() reached ! partReInitAll() done 315: dbsock=410567520 >>> prestarting, run 62496, type 47 rocMask=0x76fefffe !!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc7'< !!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc7'< !!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc7'< ++++> output to network !!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1< 3 !!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1< !!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1< !!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1< rocCloselink reached rocOpenLinks reached LINK_constructor_C: set host to >dc7< LINK_constructor_C: set name to >dc7->EB1< 3121: dbsock=410567520 name selected nrow=1 fields [0] >>>TCP<<< fields [1] >>>clondaq1-daq1wait: coda req<<< uest >fields [pr2estart] >>>< in prodowngress <<< fields [3] >>>37683<<< parsing results: type=>TCP< host=>clondaq1-daq1< state=>down< port=>37683< -> 37683 LINK_establish: socket # 17 LINK_establish: socket buffer size is 48000(0x0000bb80) bytes LINK_establish: keepAlive is 8 LINK_establish: socket 17 is ready: host 129.57.68.22 port 37683 DB command >UPDATE links SET state='up' WHERE name='dc7->EB1'< UPDATE process success Spawn proc/net threads Spawn proc/net threads Spawn proc/net threads 0x18772230 (twork0028INFO: Entering Us): er Prestart 2 INIT_NAME: rolp->daproc = trans_rol2:2 adr(nddl)=1db60750 NWBOS=16384 uthbook1: WARN id=1047 already exist - will be replaced uthbook1: free histogram buffer ... done. RAW=0 PROFILE=1 NOTRANS=0 MAX_EVENT_LENGTH = 65536 NWBOS = 16384 0x18772230INFO: User Prest (art 2 executed twork0028): CTRIGRSS: set handler and done, code=1 0x18772230 (twork0028): CTRIGRSS: 0x1877da9c 0x1877d9c0 0x1877d2e0 0x00000001 0x1877d2d8 wait: coda request >prestart< in progress bignetptr=0x1d366fe0 offset=0x00000000 taskSpawn("coda_net",0x187d4130) returns 490106848 wait: coda request >prestart< in progress setHeartError: 0 >sys 0, mask 1< WARN: HeartBeat[0]: heartbeat=19424(19424) h eartmask=1 bignetptr=0x1d366fe0 offset=0x00000000 bigprocptr=0x1d366db8 offset=0x00000000 taskSpawn("coda_proc",0x1874a580) returns 490106296 0x18772230 (twork0028): INIT_NAME: rolp->dap roc = 2 0x18772230 (twork0028): INFO: Entering User Prestart disconnecting vector 236 0x18772230 (twork0028): CTRIGRSA: set handler and done, code=1 0x18772230 (twork0028): CTRIGRSA: 0x1882e7bc 0x1882d864 0x00000001 0x1882b0c8 Sequencer Status Register = FFFFA000 Last Sequencer KeyAddress = FFFF0003 Fastbus Status Register1 = FFFFF000 Fastbus Status Register2 = FFFF0000 Fastbus Last Primary Addr = 00000019 spds_mask1: 0xfdfbf0 0x18772230_tdc1877 ===> (4twork0028wait: co) : Use 32-bit Ram List da request >firsprestartt< in progre csr0=ss 100 (1100) csr18=1364 csr1=6400001b _tdc1877 ===>5 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>6 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>7 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>8 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>9 last csr0=100 (900) csr18=1364 csr1=6400001b _tdc1877 ===>11 first csr0=100 (1100) csr18= 1364 csr1=6400001b _tdc1877 ===>12 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>13 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>14 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>15 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>16 last csr0=100 (900) csr18=13 bignetptr=0x641d366fe0 csr1= offset=0x00bigp rocptr=0x000001d366db80 offset=0x6400000000 00001b bigproc at 0x 1d366db8, bigproc.gbigBuffer at 0x_tdc1877 = ==>1d366dc818 -> 0x1d366dc8 proc_thread reacheUDP_cancel: cancel >inf:d c7 sys 0, mask 1d < firstUDP_cancel: cancel >proc_threainf:dc7 sys 0, mask 1d reached < wait: cproc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached csr0=oda request >1prestart00< in progr (ess1100) csr18=1364 csr1=6400001b _tdc1877 ===>19 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>20 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>21 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>22 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>23 last csr0=100 (900) csr18=13 64 csr1=6400001b DEBUG in SFIRamLoad => 0 1 0 0 0 slot# 1872 1875 1881(2) 1877(2) 1877S(2) [ 0] 0 0 0 0bignetptr=0x 1d366fe00 offset=0x0000000 0 0 0 0 [ 1] 0 0 0 0 0 0 0 0 [ 2] 0 0 0 0 0 0 0 0 [ 3] 0 0 0 0 0 0 0 0 [ 4] 0 0 0 0 1 0 0 0 [ 5] 0 0 0 0 1 0 0 0 [ 6] 0 0 0 0 1 0 0 0 [ 7] 0 0 0 0 1 0 0 0 [ 8] 0 0 0 0 1 0 0 0 [ 9] 0 0 0 0 1 2 0 0 [10] wait: coda reques0t > prestart 0< in progress 0 0 0 0 0 0 [11] 0 0 0 0 1 0 0 0 [12] 0 0 0 0 1 0 0 0 [13] 0 0 0 0 1 0 0 0 [14] 0 0 0 0 1 0 0 0 [15] 0 0 0 0 1 0 0 0 [16] 0 0 0 0 1 2 0 0 [17] 0 0 0 0 0 0 0 0 [18] 0 0 0 0 1 0 0 0 [19] 0 0 0 0 1 0 0 0 [20] bignet at 0x 1d366fe00, bignet.gbig Buffer at 0x 1d366ff00 -> 0x1d366ff0 0 0 1 0 0 0 [21] 0 0 0 0 1 0 0 0 [22] 0 0 0 0 1 0 0 0 [23] 0 0 0 0 1 2 0 0 [24] 0 0 0 0 0 0 0 0 [25] 0 0 0 0 0 0 0 0 DEBUG 0 1 0 0 INFO: datascan = 0x00000000 INFO: datascan&spds_mask1 = 0x00000000 rcname >RC07< 0x18772230 (twork0028111): INFO: User Prestart Executed 333: rocp->primwait: coda request >efd=prest art17< in p rogress informEB reached informEB: 11 -1 7 1 17 0 - 0x00000004 0x001101cc 0x000004b0 0x0000f420 0x0000002f 0x00000000 codaUpdateStatus: dbConnecting .. codaUpdateStatus: dbConnect done codaUpdateStatu>>>>>>>>>>>>>>>> use pid=-1 < <<<<<<<<<<<<<<<< s: >UPDATE process SET state='paused' WHERE name='dc7'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done codaUpdateStatus: updating request .. UDP_standard_request >sta:dc7 paused< UDP_standard_request >sta:dc7 paused< UDP_standard_request >sta:dc7 paused< UDP_standard_request >sta:dc7 paused< UDP_standard_request >sta:dc7 paused< UDP_standard_request >sta:dc7 paused< UDP_cancel: cancel >sta:dc7 downloaded< codaUpdateStatus: updating request done prestarted POLLS: 0 0 codaExecute done -> ->
DC8:
codaExecute: 'prestart' transition roc_prestart reached partReInitAll() reached ? partReInitAll() reached ! partReInitAll() done rocCleanup reached partReInitAll() reached ? wait: coda request >prestart< in progress partReInitAll() reached ! partReInitAll() done 315: dbsock=410505056 >>> prestarting, run 62496, type 47 rocMask=0x76fefffe !!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc8'< !!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc8'< !!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc8'< ++++> output to network !!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1< 3 !!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1< !!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1< !!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1< rocCloselink reached rocOpenLinks reached LINK_constructor_C: set host to >dc8< LINK_constructor_C: set name to >dc8->EB1< 3121: dbsock=410505056 name selected nrow=1 fields [0] >>>TCP<<< fields [1] >>>clondaq1-daq1wait: coda re<<< quest >fields [p2restart] >>>< in prupogress <<< fields [3] >>>37772<<< parsing results: type=>TCP< host=>clondaq1-d aq1< state=>up< port=>37772< -> 37772 LINK_establish: socket # 17 LINK_establish: socket buffer size is 48000( 0x0000bb80) bytes LINK_establish: keepAlive is 8 LINK_establish: socket 17 is ready: host 129 .57.68.22 port 37772 DB command >UPDATE links SET state='up' WHER E name='dc8->EB1'< UPDATE process success Spawn proc/net threads Spawn proc/net threads Spawn proc/net threads 0x187ef280INFO: Entering U (ser Prestart 2 twork0028 trans_rol2:): adr(nddl)=INIT_NAME: rolp->da proc = 1d2b60750 NWBOS=16384 uthbook1: WARN id=1048 already exist - will be replaced uthbook1: free histogram buffer ... done. RAW=0 PROFILE=1 NOTRANS=0 MAX_EVENT_LENGTH = 65536 NWBOS = 16384 0x187ef280 (twork0028INFO: User Pr): estart 2 executedCTRIGRSS: set handler and done, co de= 1 0x187ef280 (twork0028): CTRIGRSS: 0x187705ac 0x187704d0 0x1876fdf0 0x00000001 0x1876fde8 wait: coda request >prestart< in progress bignetptr=0x1d366fe0 offset=0x00000000 taskSpawn("coda_net",0x187d4110) returns 490 106848 wait: coda request >prestart< in progress setHeartError: 0 >sys 0, mask 1< WARN: HeartBeat[0]: heartbeat=19424(19424) h eartmask=1 bignetptr=0x1d366fe0 offset=0x00000000 bigprocptr=0x1d366db8 offset=0x00000000 taskSpawn("coda_proc",0x1874a560) returns 49 0106296 0x187ef280 (twork0028): INIT_NAME: rolp->dap roc = 2 0x187ef280 (twork0028): INFO: Entering User Prestart disconnecting vector 236 0x187ef280 (twork0028): CTRIGRSA: set handle r and done, code=1 0x187ef280 (twork0028): CTRIGRSA: 0x1882e13c 0x1882d1e4 0x00000001 0x1882aa48 Sequencer Status Register = FFFFA000 Last Sequencer KeyAddress = FFFF0003 Fastbus Status Register1 = FFFFF000 Fastbus Status Register2 = FFFF0000 Fastbus Last Primary Addr = wait: coda req00 0000uest >19presta rtspds_mask1: 0x< ifdfbf0n progress 0x187ef280_tdc1877 === (>twork0028): Use 32- bit Ram List 4 first csr0=100 (1100) csr18=1364 csr1=6400 001b _tdc1877 ===>5 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>6 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>7 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>8 middle csr0=100 (1900) csr18= 1364 csr1=6400001b _tdc1877 ===>9 last csr0=100 (900) csr18=136 4 csr1=6400001b _tdc1877 ===>11 first csr0=100 (1100) csr18= 1364 csr1=6400001b _tdc1877 ===>12 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>13 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>14 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>15 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>16 last csr0bignetptr=0x=1d366f e0100 offset=0x0000000 (0 900) csr18=1364bigprocptr=0x csr1=1d366db864 00001b offset=0x 00000000 bigproc at 0x_tdc1877 ===1d366db8>, bigproc. gbigBuffer at 0x181d3UDP_cancel: cancel >inf :dc8 sys 0, mask 166dc8< wait: coda rUDP_cancel: cancel >inf:dc8 sys 0, mask 1 -> 0x< 1d366dc8eq firstproc_thread reached uest >proc_thread reached csr0=proc_thread reached prproc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached proc_thread reached 100estart (< in prog1100ress ) csr18=1364 csr1=6400001b _tdc1877 ===>19 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>20 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>21 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>22 middle csr0=100 (1900) csr18 =1364 csr1=6400001b _tdc1877 ===>23 last csr0=100 (900) csr18=13 64 csr1=6400001b DEBUG in SFIRamLoad => 0 1 0 0 0 slot# 1872 bignetptr=0x1875 181d366fe081 (2) 1877( offset=0x2) 1877S(00000002) 0 [ 0] 0 0 0 0 0 0 0 0 [ 1] 0 0 0 0 0 0 0 0 [ 2] 0 0 0 0 0 0 0 0 [ 3] 0 0 0 0 0 0 0 0 [ 4] 0 0 0 0 1 0 0 0 [ 5] 0 0 0 0 1 0 0 0 [ 6] 0 0 0 0 1 0 0 0 [ 7] 0 0 0 0 1 0 0 0 [ 8] 0 0 0 0 1 0 0 0 [ 9] 0 0 0 0 1 2 0 0 [10] wait: coda request0 > prestart 0< i n progress 0 0 0 0 0 0 [11] 0 0 0 0 1 0 0 0 [12] 0 0 0 0 1 0 0 0 [13] 0 0 0 0 1 0 0 0 [14] 0 0 0 0 1 0 0 0 [15] 0 0 0 0 1 0 0 0 [16] 0 0 0 0 1 2 0 0 [17] 0 0 0 0 0 0 0 0 [18] 0 0 0 0 1 0 0 0 [19] 0 bignet at 0x 1d366fe00, bigne t.gbigBuffer at 0x 1d366ff00 -> 0x1d366 ff0 0 1 0 0 0 [20] 0 0 0 0 1 0 0 0 [21] 0 0 0 0 1 0 0 0 [22] 0 0 0 0 1 0 0 0 [23] 0 0 0 0 1 2 0 0 [24] 0 0 0 0 0 0 0 0 [25] 0 0 0 0 0 0 0 0 DEBUG 0 1 0 0 INFO: datascan = 0x00000000 INFO: datascan&spds_mask1 = 0x00000000 rcname >RC08< 0x187ef280 (twork0028111 ): INFO: User Prestart Executed 333: rocp->primwait: coda request >efd=prest art17< in p rogress informEB reached informEB: 11 -1 8 1 17 0 - 0x00000004 0x0011 01cc 0x000004b0 0x0000f420 0x0000002f 0x0000 0000 codaUpdateStatus: dbConnecting .. codaUpdateStatus: dbConnect done codaUpdateStatus: >UPDATE process SET state= 'paused' WH>>>>>>>>>>>>>>>> use pid=-1 <<<<< <<<<<<<<<<<< ERE name='dc8'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done codaUpdateStatus: updating request .. UDP_standard_request >sta:dc8 paused< UDP_standard_request >sta:dc8 paused< UDP_standard_request >sta:dc8 paused< UDP_standard_request >sta:dc8 paused< UDP_standard_request >sta:dc8 paused< UDP_standard_request >sta:dc8 paused< UDP_cancel: cancel >sta:dc8 downloaded< codaUpdateStatus: updating request done prestarted POLLS: 0 0 codaExecute done .....
Sergey B. 6-apr-2010: after TS problem described below, polar self rebooted on next download
........
wait: coda request >download prod< in progress wait: coda request >download prod< in progress wait: coda request >download prod< in progress 3123: dbsock=59553504 3123: tmpp>SELECT id FROM runTypes WHERE name='prod'< ====1==== !!! object->runType=47 token_interval=64 object->name >polar< big0.proc_on_pmc=0, big0.net_on_pmc=0 big1.proc_on_pmc=0, big1.net_on_pmc=0 name selected nrow=1 download: name >polar< code selected nrow=1 download: code >{$CODA/VXWORKS_ppc/rol/moller.o usr} {$CODA/VXWORKS_ppc/rol/rol2_tt.o usr} < nrols [0] >$CODA/VXWORKS_ppc/rol/moller.o usr< nrols [1] >$CODA/VXWORKS_ppc/rol/rol2_tt.o usr< listArgc=2 listArgv >$CODA/VXWORKS_ppc/rol/moller.o usr< >$CODA/VXWORKS_ppc/rol/rol2_tt.o usr< set this_roc_id = 25, rocId() can be called from now on ObjInitName >_moller__init< wait: coda request >download prod< in progress wait: coda request >download prod< in progress wait: coda request >download prod< in progress wait: coda request >download prod< in progress wait: coda request >download prod< in progress INFO: >_moller__init()< routine found readout list >$CODA/VXWORKS_ppc/rol/moller.o< loaded allocate 200 buffers with length 65536 bytes partCreate1() reached partCreate1(): pPart=0x008e7570 listInit: cleanup 24 bytes starting from address 0x008e75a0 partCreate1() done 0x81c7e0 (twork0004): INIT_NAME: rolp->daproc = 0 0x81c7e0 (twork0004): INIT_NAME: Initializing new rol structures for SCALROL1 0x81c7e0 (twork0004): INIT_NAME: MAX_EVENT_LENGTH = 65536 bytes, MAX_EVENT_POOL = 200 0x81c7e0 (twork0004): INIT_NAME: name >SCALROL1:pool< 0x81c7e0 (twork0004): Init - Done 0x81c7e0 (twork0004): INIT_NAME: rolp->daproc = rol11: downloading DDL table ... wait: coda request >download prod< in progress wait: coda request >download prod< in progress Exception at interrupt level: Exception next instruction address: 0xfff901f0 Fixed Point Register: 0x00000012 Condition Register: 0x40482044 Regs at 0x64e498 Press any key to stop auto-boot... 2 [VxWorks Boot]:
It suppose to be as following:
..... 0x91ae30 (twork0001): INIT_NAME: rolp->daproc = 0 0x91ae30 (twork0001): INIT_NAME: Initializing new rol structures for SCALROL1 0x91ae30 (twork0001): INIT_NAME: MAX_EVENT_LENGTH = 65536 bytes, MAX_EVENT_POOL = 200 rol1: downloading DDL table ... 0x91ae30 (twork0001): INIT_NAME: name >SCALROL1:pool< 0x91ae30 (twork0001): Init - Done 0x91ae30 (twork0001): INIT_NAME: rolp->daproc = 1 wait: coda request >download prod< in progress adr1(nddl_)=0x037b69c8 N name (nname) fmt (nfmt) ncol [ 1] >PTRN< (4) >B32< (3) 1 [ 2] >PSYN< (4) >B32< (3) 1 [ 3] >RC00< (4) >B32< (3) 1 [ 4] >RC01< (4) >B32< (3) 1 .....
Sergey B. 6-apr-2010: TS busy condition at the beginning of the run, all rocs have 205 events, clastrig2 has 200 events, ts2status output
-> ts2status CSR 1 (0x0): Go : 1 Pause on Next scheduled Sync : 0 Sync and Pause : 0 Initiate Sync Event : 0 Initiate Program 1 Event : 0 Initiate Program 2 Event : 1 Enable Level 1 (drives outputs) : 1 Override Inhibit : 0 Test Mode : 0 Reserved : 0 Reset : 0 Initialize : 0 Sync Event occurred : 0 Program 1 Event occurred : 1 Program 2 Event occurred : 0 Late Fail occurred : 0 Inhibit occurred : 1 Write FIFO error occurred : 0 Read FIFO error occurreds : 1 CSR 2 (0x4): Enable Scheduled Sync : 1 Use Clear Permit Timer : 1 Use Front Busy Timer : 1 Use Clear Hold Timer : 1 Use External Front Busy : 1 Lock ROC Branch 1 : 0 Lock ROC Branch 2 : 0 Lock ROC Branch 3 : 0 Lock ROC Branch 4 : 0 Lock ROC Branch 5 : 0 Enable Program 1 front panel input : 1 Enable Interrupt : 1 Enable local ROC (branch 5) : 1 Trigger Control Register (0x8): 0x00001fff ROC Enable Register (0xc) val=0xf07fff7f: Branch 1: 0x7f bits: 01111111 Branch 2: 0xff bits: 11111111 Branch 3: 0x7f bits: 01111111 Branch 4: 0xf0 bits: 11110000 Synchronization Interval Register (0x10): 1000 Trigger Word Count Register (0x14): 0 Trigger Data Register (0x18): 256 Local ROC (Branch 5) Data Register (0x1c): 60 Synchronization Flag : 0 Late Fail Flag : 0 ROC code : 15 Input Trigger Prescale Registers: Input 1 Prescale Factor : 0 Input 2 Prescale Factor : 0 Input 3 Prescale Factor : 0 Input 4 Prescale Factor : 0 Input 5 Prescale Factor : 0 Input 6 Prescale Factor : 0 Input 7 Prescale Factor : 0 Input 8 Prescale Factor : 0 Clear Permit Timer Register = 0 Level2 Accept Timer Register = 83 Level3 Accept Timer Register = 0 Front Busy Timer Register = 325 Clear Hold Timer Register = 50 Branch (1-4) ROC Buffer Status Register (0x58): 0x40404040 Branch 1: Buffer Count = 0, Empty Flag = 1, Full Flag = 0 Branch 2: Buffer Count = 0, Empty Flag = 1, Full Flag = 0 Branch 3: Buffer Count = 0, Empty Flag = 1, Full Flag = 0 Branch 4: Buffer Count = 0, Empty Flag = 1, Full Flag = 0 Local ROC (Branch 5) Buffer Status Register (0x5c): 0xffff8086 Branch 5: Buffer Count = 6, Empty Flag = 0, Full Flag = 1,Local Acknowledge = 0 ROC Acknowledge Status Register (0x60): val=0x0 Branch 1: 0x 0 bits: 00000000 (enabled: 01111111) Branch 2: 0x 0 bits: 00000000 (enabled: 11111111) Branch 3: 0x 0 bits: 00000000 (enabled: 01111111) Branch 4: 0x 0 bits: 00000000 (enabled: 11110000) State Register (0x6C): Level 1 Accept : 0 Start Level 2 Trigger : 0 Level 2 Pass Latched : 0 Level 2 Fail Latched : 0 Level 2 Accept : 0 Start Level 3 Trigger : 0 Level 3 Pass Latched : 0 Level 3 Fail Latched : 0 Level 3 Accept : 0 Clear : 0 Front End Busy (external) : 0 External Inhibit : 0 Latched Trigger : 0 TS Busy : 1 TS Active : 1 TS Ready : 0 Main Sequencer Active : 0 Synchronization Sequencer Active : 0 Program 1 Event Sequencer Active : 1 Program 2 Event Sequencer Active : 0 Event Count (0xc8): 135 Live 1 Count (0xcc): 54866 Live 2 Count (0xd0): 9290375 value = 1 = 0x1 -> exit
clasprod::clastrig2> clastrig2 statistics 200 0 0 0
Sergey B. 6-apr-2010: end failed, only one ROC was in use (clastrig2), messages
interrupt: Program Events 1-2: 247 0 interrupt: live time = 93 percent (gated=1956740 ungated=2094143) interrupt: live_corr = 93 percent (gated=1956740 ungated=2094143) proc_thread: waiting= 3743 processing= 30 microsec per event (nev=404) rols_thread: waiting= 754 processing= 21 microsec per event (nev=454) wait: coda request >go< in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition roc_end reached ?? roc_end reached !! 0x18c6f5c0 (twork0003): INIT_NAME: rolp->daproc = 3 UDP_cancel: cancel >ts2: 93 0 0 0 0 0 0 0 0 507 0 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 339 0 0 0 interrupt: Program Events 1-2: 172 0 interrupt: live time = 97 percent (gated=1358086 ungated=1400034) interrupt: live_corr = 97 percent (gated=1358086 ungated=1400034) wait: coda request >end< in progress 0x18c6f5c0 (twork0003): TS GO bit cleared 0x18c6f5c0 (twork0003): TS csr: 0xfc000000 interrupt: TRIG ERROR: no pool buffer available 0x18c6f5c0 (twork0003): setForceSyncInterval: forceSyncInterval set to 0 0x18c6f5c0 (twork0003): INFO: User End Executed codaUpdateStatus: dbConnecting .. codaUpdateStatus: dbConnect done codaUpdateStatus: >UPDATE process SET state='ending' WHERE name='clastrig2'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done codaUpdateStatus: updating request .. UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_cancel: cancel >sta:clastrig2 active< codaUpdateStatus: updating request done NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! roc_end done codaExecute done coda_roc: processing case 'DA_ENDING': last event=12113 nevents=12314 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: in DA_ENDING dataInBuf=20212 BBHEAD_BYTES=24 coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: in DA_ENDING dataInBuf=7224 BBHEAD_BYTES=24 coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: in DA_ENDING last event=12313 nevents=12314 coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: in DA_ENDING last event=12313 nevents=12314 coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: in DA_ENDING last event=12313 nevents=12314 coda_roc: rocp->state == ENDING and input list is empty
Sergey B. 5-apr-2010: scrolling messages from all ROCs (start from Download - works fine)
..... event=2405 nevents=2405 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING': last event=2405 nevents=2405 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING': last event=2405 nevents=2405 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING': last event=2405 nevents=2405 coda_roc: processing case 'DA_ENDING' .....
Sergey B. 2-apr-2010: event recorder gives error during Prestart
file[ 0]->/raid/stage_in/clas_062414.A00<- bosopen.c: splitted files handling .. bosopen.c: >/raid/stage_in/clas_062414.A00< bosopen.c: len1=30 bosopen.c: len2=30 bosopen.c: len3=14 bosopen.c: >/raid/stage_in/clas_062414.A00< len=14 bosopen.c: checking how much disk space left in directory /raid/stage_in bosopen.c: total 1403778969 blocks, free 1100988411 blocks, available 1100988411 blocks (1 block = 512 bytes) bosopen.c: so we have 550493696 kbytes available space bosopen.c: we can continue to write file. bosopen.c: check for partition swaping: file->/raid/stage_in/clas_062414.A00<- bosopen.c: system call: >checkdisk 4094< [1] 9253 bosopen.c: system call completed codaUpdateStatus: dbConnecting .. ER: Not attached to ET system er_write_thread loop ended (0 145452664) codaUpdateStatus: dbConnect done codaUpdateStatus: >UPDATE process SET state='paused' WHERE name='ER3'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done codaUpdateStatus: updating request .. UDP_standard_request >sta:ER3 paused< UDP_standard_request >sta:ER3 paused<
Station TAPE was idle, but et system seems operational; end run (successful), restart coda_er, start from Download - everything works fine.
Sergey B. 18-mar-2010: previous run ended fine, click 'prestart' and got from EB
[0] 0x76fefffe != 0x40acf17e, waiting for the following ROC IDs: 7 9 10 11 17 20 22 25 26 28 29
Similar problem was observed before. Kill rcServer, next run started fine, ROCs were NOT rebooted. CC scans again ???
Sergey B. 14-mar-2010: old problem, can be posted already
If EB was not restarted afre the ROC crash, it sometimes waiting for several ROcs in the end of prestart, however runcontrol allows 'Go' button become active; something wrong with the logic, must check; will be good of course to avoid that situation by making EB reset itself properly on 'reset' transition after the ROC crash
Sergey B. 14-mar-2010: during reboot dc9 gives following
................. ppc/bootscripts/boot_dc9 taskSpawn("TCPSERVER") returns 475458320 bind on port 5001 myname >dc9< -> Query >UPDATE Ports SET Host='dc9',tcpClient_tcp=5001 WHERE Name='dc9'< succeeded INFO(mysql_real_connect9): errno=0 INFO(mysql_real_connect9): OK mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query) dbConnect ERROR: mysql == NULL 333333333333333 program Exception current instruction address: 0x1c5 e8734 Machine Status Register: 0x00081000 Condition Register: 0x40000085 Task: 0x1c5e8e50 "ROC" interrupt: Unconnected main interrupt 0 logTask: 3222 log messages lost. interrupt: Unconnected main interrupt 1 interrupt: Unconnected main interrupt 0 interrupt: Unconnected main interrupt 1 interrupt: Unconnected main interrupt 0 interrupt: Unconnected main interrupt 1 interrupt: Unconnected main interrupt 0 interrupt: Unconnected main interrupt 1 interrupt: Unconnected main interrupt 0 ..............
probably mysql connection problem
RESOLVING: will make 5 attempts to to connect waiting 3 seconds in between, then will give up.
Sergey B. 14-mar-2010
If in prestart stage hit abort and reset, all ROCs printing
wait: coda request >exit< in progress
indefinitely, must check
Sergey B. 14-mar-2010: coda_eb x-term hung completely, kill coda_eb but yellow window still stuck; last messages were
codaEnd 10 codaEnd 11 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! codaExecute done CODAtcpServer: start work thread befor: socket=5 address>129.57.71.28< port=59319 wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress Executing >< (len=0) codaExecute: ERROR: len=0 - do nothing CODAtcpServer: start work thread befor: socket=5 address>129.57.71.28< port=59342 wait: coda request >< in progress Executing >TP/1.0
Probably last string contains something, will remove printing
Sergey B. 13-mar-2010: sometimes after bad crash all rocs (even scalers) must be rebooted, otherwise EB is crashing with messages
..... case 0: no swap case 0: no swap case 0: no swap case 0: no swap case 0: no swap case 0: no swap case 0: no swap case 0: no swap case 0: no swap case 0: no swap .. done. [0] FATAL: Event (Num 1 type 1) NUMBER mismatch -- roc[11] (rocid 30) sent -1 (type 18) [0] ERROR: Discard data until next control event clondaq1:coda_eb>
After that problem stays, restarted every daq component (EB, ER, ETs, runcontrol, rcServer) - does not helped. Did daq_exit, daq_start, reboot all rocs again - not it works.
In the same time ROCs were printing messages 'no data', may be it was the source of EB troubles ...
Sergey B.12-mar-2010 around 10:50am: something is breaking coda_eb/coda_er
..... UDP_standard_request >sta:EB1 booted< UDP_standard_request >sta:EB1 booted< UDP_standard_request >sta:EB1 booted< UDP_standard_request >sta:EB1 booted< UDP_cancel: cancel >sta:EB1 booted< codaUpdateStatus: updating request done CODA_Init 14 2 clasprod::EB1> bind on port 5001 DB update: >UPDATE process SET inuse='5001' WHERE name='EB1'< CODAtcpServer: start work thread befor: socket=5 address>129.57.71.28< port=37258 wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress wait: coda request >< in progress Executing >< (len=0) Segmentation fault (core dumped) clondaq1:coda_eb>
In the same time et_start complains (we have now ET system protected against CC scans):
et ERROR: et_netserver: ET server being probed by non-ET client or read failure et ERROR: et_netserver: ET server being probed by non-ET client or read failure
FIXED by checking for the len=0 in the message string, all other possible checks were implemented already
Sergey B. 12-mar-2010: global reboot gives almost all PMCs (except first two) failing on MySQL access
Done executing startup script $CODA/VXWORKS_ ppc/bootscripts/boot_pmc1 -> INFO(mysql_real_connect9): errno=0 INFO(mysql_real_connect9): OK mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query) dbConnect ERROR: mysql == NULL program Exception current instruction address: 0x00000000 Machine Status Register: 0x0008b030 Condition Register: 0x40000085 Task: 0x1e404190 "coda_pmc"
It should be mentioned that we are doing something right now; day before it was no activity and all rebooted fine.
Sergey B. 1-sep-2009: during croctest1 reboot
INFO(mysql_real_connect9): errno=0 INFO(mysql_real_connect9): OK mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query) dbConnect ERROR: mysql == NULL
second reboot went fine
Sergey B. 15-june-2009
Just after run was ended noticed that all runs starting from 60073 has 'No configuration!' instead of 'PROD' etc. Correcting manually. Must check if other information is correct, and understand the reason if possible .. Is it related to the database update procedure inplemented around that time ? It appeares that begin_clasprod_xxxxxx.txt log files has that wrong starting from run 60073.
Sergey B. 11-June-2009: swaped ec4 and sc2 and during sc2 boot see following from host
.......................... taskDelay(sysClkRateGet()*5) Args = -session clasprod -objects sc2 ROC -i CODA_Init reached CODA_Init 1 CODA_Init 2 sc2 CODA_Init 11 CODA_Init: objectTy >(null)< CODA_Init 12 CODA_Init: use 'SESSION' as >clasprod< Tcl_AppInit CALLS !!!!! 11-11 11-12 11-13 11111111111111111 22222222222222222 0xdd0e880 (tNetTask): arptnew failed on 8139a743 value = 0 = 0x0 taskSpawn "TCP_SERVER",250,0,100000,tcpServer value = 231091088 = 0xdc62b90 proconhost value = 0 = 0x0 ......................
Second reboot shows the same:
................... 22222222222222222 0xdd0e880 (tNetTask): arptnew failed on 8139a743 >>>>>>>clon10 clasrun 2508 9998<<<<<<<<< 24 list >clon10 clasrun 2508 9998< machine name = clon10 user ID = 2508 group ID = 9998 .................
Sergey B. around 6-June-2009: ec2pmc1 error message
write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print: 255115 20 write_thread: about to print: 255115 20 write_thread: about to print: 255115 20 write_thread: wait= 199 send= 8 microsec per event (nev=12755) CALLING customized : machine check SRAM ERROR : SRAM Error Address : 0x00000000.f2000504 SRAM Error Data Low : 0x006805e0 SRAM Error Data High : 0x00420000 SRAM Error Parity : 0x00000004 SRAM Error Cause : 0x00000001 PCI_0 DEVICE ERROR : PCI_0 Error: Status=0x00000100 Master abort PCI Cmd=7 (MemWr) ByteEnable=0 Par=0 Error Address High: 0x00000000 Error Address Low : 0xf0d10128 CALLING generic : machine check Exception next instruction address: 0x0001f058 Machine Status Register: 0x0012b030 Condition Register: 0x40222042 Task: 0x1e3b0020 "coda_net"
reboot did not helped, have to turn pmc off and run on host only
Sergey B. 4-June-2009: when rebooting all rocs, some cannot connect to DB, as result have following
0x1e42a360 (coda_pmc): bb_new: 'big' buffer created (addr=0x1eba3590, 16 bufs, 3145728 size) coda_pmc: big buffer1 allocated myname >ec1pmc1< Done executing startup script $CODA/VXWORKS_ ppc/bootscripts/boot_pmc1 -> INFO(mysql_real_connect9): errno=0 INFO(mysql_real_connect9): OK program Exception current instruction address: 0x00000000 Machine Status Register: 0x0008b030 Condition Register: 0x40000085 Task: 0x1e42a360 "coda_pmc" -> tt 14cad8 vxTaskEntry +68 : coda_pmc () 1e2275c0 coda_pmc +144: mysql_query () 1e258520 mysql_query +54 : mysql_real_query () 1e249850 mysql_real_query+110: mysql_send_query () 1e249718 mysql_send_query+1d0: 0 () value = 0 = 0x0 ->
for some reason it efefcts mostly PMCs. First 10 or so PMCs booted fine, and then few shows that error, and few following booted fine again, and so on. It looks like DB does not respond right the way, and PMC exits on timeout and do not try again. Should fix that place.
Sergey B. 15-may-2009: last night following error messages from sc2 were reportedly associated with 2 crashes (I was not on shift)
write_thread: about to print: 394603 20 write_thread: about to print: 394603 20 write_thread: wait= 176 send= 6 microsec per event (nev=19730) 0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=11) proc_thread: wait= 176 send= 12 microsec per event (nev=14534) 0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=19) 0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=17) bosMgetid: ERROR: lenn=0 ???: 0x52432d32 0x38363333 0x31313534 0x00696d3d 0x25640a00 bosMgetid: ERROR: name >RC-2< does not described, update clonbanks.ddl file !!! bosMgetid: lenn=4, len=4, nddl1=116 bosMgetid: name=>RC-286331154< try again .. bosMgetid: ERROR: lenn=0 !!!: 0x52432d32 0x38363333 0x31313534 0x00696d3d 0x25640a00 bosMgetid: lenn=4, len=4 bosMgetid: ERROR: name >RC-2< does not descr ibed, update clonbanks.ddl file !!! bosMgetid: lenn=4, len=4, nddl1=116 bosMgetid: name=>RC-286331154< no way !! bosMlink: ERROR: bosMgetid returns -99
interrupt: timer: 48 microsec (min=9 max=195 rms**2=9) interrupt: timer: 45 microsec (min=9 max=195 rms**2=18) interrupt: timer: 45 microsec (min=9 max=195 rms**2=10) interrupt: timer: 48 microsec (min=9 max=195 rms**2=14) setHeartError: 0 >sys 0, mask 14< WARN: HeartBeat[0]: heartbeat=6658(6658) heartmask=14 UDP_cancel: cancel >inf:sc2 sys 0, mask 14< UDP_cancel: cancel >inf:sc2 sys 0, mask 14<
Sergey B. 14-May-2009 5:54am
Got page from clascron@clon10: 'Process monitor: missing process: alarm_server - Online problem: alarm_server did not respond to status pool request'. Found later that stadis was not updating etc. It appeares that smartsockets server was hund somehow on clondb1, for example command 'ipc_info -a clasprod' from clon10 or clon00 did not worked, messages were:
clon00:clasrun> ipc_info -a clasprod 08:08:31: TAL-SS-00088-I Connecting to project <clasprod> on <local> RTserver 08:08:31: TAL-SS-00089-I Using local protocol 08:08:31: TAL-SS-00090-I Could not connect to <local> RTserver 08:08:31: TAL-SS-00093-I Skipping starting <start_never:local> RTserver 08:08:31: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1> RTserver 08:08:31: TAL-SS-00089-I Using local protocol 08:08:31: TAL-SS-00090-I Could not connect to <clondb1> RTserver 08:08:31: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1> RTserver 08:08:31: TAL-SS-00089-I Using tcp protocol ^C
and it waits here. After restarting server on clondb1 by commands
/etc/init.d/smartsockets stop /etc/init.d/smartsockets start
everything came back to normal. We should think about detecting that problem and restarting server automatically, also it happens very rearly, few time a year. Probably it can be done by monitoring cpu usage: when server was hung it was using 100% of cpu:
top - 08:06:16 up 130 days, 8:59, 1 user, load average: 2.13, 2.03, 1.64 Tasks: 92 total, 2 running, 90 sleeping, 0 stopped, 0 zombie Cpu(s): 25.0% us, 0.1% sy, 0.0% ni, 73.0% id, 1.9% wa, 0.0% hi, 0.1% si Mem: 3995356k total, 3968880k used, 26476k free, 88168k buffers Swap: 8385920k total, 208k used, 8385712k free, 3432056k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4471 root 25 0 5908 4540 1956 R 100 0.1 300:24.06 rtserver.x 1 root 16 0 4756 556 460 S 0 0.0 0:11.25 init 2 root RT 0 0 0 0 S 0 0.0 0:03.85 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:10.23 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:01.79 migration/1 5 root 34 19 0 0 0 S 0 0.0 1:03.72 ksoftirqd/1 6 root RT 0 0 0 0 S 0 0.0 0:01.62 migration/2
while normally it uses fractions of percent. It is also will be useful to monitor cpu usage in general, few days ago clasrun's sshd on clon10 was taking almost 50% of cpu.
Sergey B. 12-may-2009
Looking for the reason of missing gate messages in all DC crates we found that front end busy cable from SF to FC was disconnected on the floor. For the future we should remember that disconnecting that cable may create missing gate messages. It is also important before every run to use 'dc_all' configuration and look at scope in the counting house to make sure front end busy signal is there.
Sergey B. 16-feb-2009: runcontrol pop-up message while pressing 'Configure'
rcConfigure::loadRcDbaseCbk: Loading database failed !!! On second 'Configure' click it worked.
Sergey B. June-2008
It looks like when one SILO drive goes bad, something is happening in script's logic and it never tries to run 2 streams, only one ! One .temp link can be observed, aling with one presilo1 link which is NOT becoming another .temp. Need check !!!
Sergey B. 7-may-2008: EB1 crashed, last messages
EB: WARNING - resyncronization in crate controller number 5 ... fixed . EB: WARNING - resyncronization in crate controller number 6 ... fixed . EB: WARNING - resyncronization in crate controller number 6 ... fixed . ERROR: lfmt=0 bankid=0
Found that tage2 and tage3 must be rebooted
Sergey B. 5-may-2008
polar crate turned off; turn it back on, but notice that one of str7201 scalers has all lights on; scaler was replaced in 2 days, everything works fine
Sergey B. 30-apr-2008: gam_server on clon04 takes 100% CPU
Found on the web that creating file '/etc/gamin/gaminrc' with following contents will help:
#configuration for gamin # Can be used to override the default behaviour. # notify filepath(s) : indicate to use kernel notification # poll filepath(s) : indicate to use polling instead # fsset fsname method poll_limit : indicate what method of notification for the filesystem # kernel - use the kernel for notification # poll - use polling for notification # none - don't use any notification # # the poll_limit is the number of seconds # that must pass before a resource is polled again. # It is optional, and if it is not present the previous # value will be used or the default. fsset nfs poll 10 # use polling on nfs mounts and poll once every 10 seconds # This will limit polling to every 10 seconds and seams to prevent it from running away
Created file, will see if it helped ..
Sergey B. 20-apr-2008: sc2pmc1 error message (during "end run' ???)
................................................. write_thread: wait= 135 send= 28 microsec per event (nev=12351) 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=4) proc_thread: wait= 130 send= 31 microsec per event (nev=14709) 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=10) 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=15) write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print: 252148 20 write_thread: about to print: 252148 20 write_thread: about to print: 252148 20 write_thread: wait= 135 send= 31 microsec per event (nev=12607) 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19) proc_thread: wait= 135 send= 31 microsec per event (nev=14739) 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19) 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=14) ERROR: bufout overflow - skip the rest ... bufout=485322144 hit=485387636 endofbufout=485387680 write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print write_thread: about to print: 255564 20 write_thread: about to print: 255564 20 write_thread: about to print: 255564 20 write_thread: wait= 128 send= 42 microsec per event (nev=12778) proc_thread: wait= 371 send= 31 microsec per event (nev=6291) ROC # 22 Event # 0 : Bad Block Read signature 0x00000140 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x0150107F -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x015810DC -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x02B80FD6 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x02A010BE -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ..................................................
At the same time at sc2:
...................................... interrupt: timer: 33 microsec (min=6 max=401 rms**2=14) interrupt: timer: 33 microsec (min=6 max=401 rms**2=4) interrupt: timer: 33 microsec (min=6 max=401 rms**2=17) interrupt: timer: 33 microsec (min=6 max=401 rms**2=1) interrupt: timer: 33 microsec (min=6 max=401 rms**2=17) interrupt: timer: 33 microsec (min=6 max=401 rms**2=0) interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums ................................ interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 220 log messages lost. interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 692 log messages lost. interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 544 log messages lost. .................................. interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 562 log messages lost. interrupt: SYNC: ERROR: [ 0] slot=16 error_flag=1 - clear logTask: 707 log messages lost. interrupt: SYNC: ERROR: [ 3] slot=19 error_flag=1 - clear logTask: 517 log messages lost. interrupt: SYNC: ERROR: [ 4] slot=20 error_flag=1 - clear logTask: 620 log messages lost. interrupt: SYNC: ERROR: [ 5] slot=21 error_flag=1 - clear logTask: 559 log messages lost. interrupt: SYNC: scan_flag=0x00390000 logTask: 512 log messages lost. interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums logTask: 671 log messages lost. ...................................... logTask: 563 log messages lost. interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums logTask: 385 log messages lost. interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress ....................................... wait: coda request in progress wait: coda request in progress interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: TRIGGER ERROR: no pool buffer available wait: coda request in progress wait: coda request in progress .....................................
Note: sc2 never warn about wrong slot number before that moment.
FIX: it seems there is a bug in 1190/1290-related rols and library: NBOARDS set to 21 and several arrays allocated with that length, but actual board maximum number is 21; NBOARDS was set to 22 everywhere, will test next time
Sergey B. 31-mar-2008: CC scans CODA !!! messages from EB
wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=59063 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60187 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60295 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60418 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60823 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=32961 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33122 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33795 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=34251 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=34529 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=35811 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=36134 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=41569 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=53913 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=57701 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33357 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=36136 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=39509 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=39900 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=45174 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=45909 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=34521 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=49023 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=49526 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=53555 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=55694 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=56833 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=58251 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=59023 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33687 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=35731 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=51581 wait: coda request in progress wait: coda request in progress Error(old rc): nRead=18, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=41371 wait: coda request in progress Error(old rc): nRead=6, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=43198 wait: coda request in progress Error(old rc): nRead=10, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=55260 wait: coda request in progress Error(old rc): nRead=0, must be 1032 clasprod::EB1> ^C
Similar messages were coming from ER.
ET reported following:
TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure
and another ET:
....... et_start: pthread_create(0x0000000d,...) done TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure et INFO: et_sys_heartmonitor, kill bad process (2,3063) et INFO: et_sys_heartmonitor, cleanup process 2 et INFO: set_fix_nprocs, change # of ET processes from 2 to 2 et INFO: set_fix_natts, station GRAND_CENTRAL has 0 attachments et INFO: set_fix_natts, station LEVEL3 has 1 attachments et INFO: set_fix_natts, station ET2ER has 1 attachments et INFO: set_fix_natts, # total attachments 2 -> 2 et INFO: set_fix_natts, proc 0 has 1 attachments et INFO: set_fix_natts, proc 1 has 1 attachments
Also noticed that CODAs (production and test setup) do not like each other: reported following during 'configure':
Query test_ts2 table failed: Error reading table'test_ts2' definition ec3 ec3
If one runcontrol is restarted, then it works, but another one complains !!!
Sergey B. 31-mar-2008
'run_log_comment.tcl' and 'rlComment' cannot recognize xml comments when extraction level1 trigger file name from <l1trig> tag, it seems using first line after <l1trig> tag without looking into "" comment sign.
Sergey B. 10-mar-2008
seems found error in run control: Xui/src.s/rcMenuWindow.cc parameters XmNpaneMinimum and XmNpaneMaximum were both set to 480, as result run control gui area above log messages window was not big enough; set to 100 and 900 respectively, will ask Jie
23-jan-2008
ER3 crashed several times during last 3 weeks, mostly (only ?) during 'End' transition; todays core file:
(dbx) where [1] 0xfe9e4c20(0x8068f9d), at 0xfe9e4c20 [2] codaExecute(0xce4fdbc0, 0xce4fdbc0, 0x1, 0x8068c39), at 0x8068f9d [3] CODAtcpServerWorkTask(0x811fa00, 0x0, 0x0, 0xce4fdff8, 0xfea60020, 0xfe591400), at 0x8068d3a [4] 0xfea5fd36(0xfe591400, 0x0, 0x0, ), at 0xfea5fd36 [5] 0xfea60020(), at 0xfea60020 (dbx)
14-nov-2007
first week of running G9A: crashes observed in ec1 (twice), tage3, scaler1, clastrig2, EB; no feather details were obtained so far
Sergey B. 3-nov-2007: after about 26Mevents during the run sc2pmc1 started to print following
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000080 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x03C80ADD -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x8A90097F -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! ROC # 22 Event # 0 : Bad Block Read signature 0x00680BD9 -> resyncronize !!!
end run failed. Reboot sc2. During end transition ec2 froze with message:
interrupt: timer: 32 microsec (min=19 max=86 rms**2=18) 0x1a05fdf0 (twork0005): sfiUserEnd: INFO: Last Event 26723663, status=0 (0x1ca648c8 0x1ca648c0) 0x1a05fdf0 (twork0005): data: 0x00000003 0x0007014f 0x00120000 0x00000000 0xc8009181 0xc0001181 0x1a05fdf0 (twork0005): jw1 : 0x00000000 0x0197c54f 0x00000003 0x0007014f 0x00120000 0x00000000 0x1a05fdf0 (twork0005): Last DMA status = 0x200000b count=11 blen=11 0x1a05fdf0 (twork0006): sfiUserEnd: ERROR: Last Transfer Event NUMBER 26723663, status = 0x1a000 (0x90001181 0x88001181 0x80009181 0x78001181) 0x1a05fdf0 (twork0006): SFI_SEQ_ERR: Sequencer not Enabled
Reboot ec2. Started new run 55463, everything looks normal.
Sergey B. 2-nov-2007: during the run ec2 started to print on tsconnect screen
Unknown error errno=65 Unknown error errno=65 Unknown error errno=65 Unknown error errno=65 Unknown error errno=65
data taking continues, but runcontrol printed message:
WARN : ec2 has not reported status for 1516 seconds ERROR : ec2 is in state disconnected should be active
ec2pmc1 looked fine; end run failed, need to reboot ec2