ERROR Book
- Sergey B. 20-apr-2008: sc2pmc1 error message (during 'end run' ???)
.................................................
write_thread: wait= 135 send= 28 microsec per event (nev=12351)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=4)
proc_thread: wait= 130 send= 31 microsec per event (nev=14709)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=10)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=15)
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 252148 20
write_thread: about to print: 252148 20
write_thread: about to print: 252148 20
write_thread: wait= 135 send= 31 microsec per event (nev=12607)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)
proc_thread: wait= 135 send= 31 microsec per event (nev=14739)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=14)
ERROR: bufout overflow - skip the rest ... bufout=485322144 hit=485387636 endofbufout=485387680
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 255564 20
write_thread: about to print: 255564 20
write_thread: about to print: 255564 20
write_thread: wait= 128 send= 42 microsec per event (nev=12778)
proc_thread: wait= 371 send= 31 microsec per event (nev=6291)
ROC # 22 Event # 0 : Bad Block Read signature 0x00000140 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x0150107F -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x015810DC -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x02B80FD6 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x02A010BE -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
..................................................
At the same time at sc2:
...................................... interrupt: timer: 33 microsec (min=6 max=401 rms**2=14) interrupt: timer: 33 microsec (min=6 max=401 rms**2=4) interrupt: timer: 33 microsec (min=6 max=401 rms**2=17) interrupt: timer: 33 microsec (min=6 max=401 rms**2=1) interrupt: timer: 33 microsec (min=6 max=401 rms**2=17) interrupt: timer: 33 microsec (min=6 max=401 rms**2=0) interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums ................................ interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 220 log messages lost. interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 692 log messages lost. interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 544 log messages lost. .................................. interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums logTask: 562 log messages lost. interrupt: SYNC: ERROR: [ 0] slot=16 error_flag=1 - clear logTask: 707 log messages lost. interrupt: SYNC: ERROR: [ 3] slot=19 error_flag=1 - clear logTask: 517 log messages lost. interrupt: SYNC: ERROR: [ 4] slot=20 error_flag=1 - clear logTask: 620 log messages lost. interrupt: SYNC: ERROR: [ 5] slot=21 error_flag=1 - clear logTask: 559 log messages lost. interrupt: SYNC: scan_flag=0x00390000 logTask: 512 log messages lost. interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums logTask: 671 log messages lost. ...................................... 
logTask: 563 log messages lost. interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums logTask: 385 log messages lost. interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 
-> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress ....................................... wait: coda request in progress wait: coda request in progress interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums interrupt: TRIGGER ERROR: no pool buffer available wait: coda request in progress wait: coda request in progress .....................................
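A quick arithmetic check on the pointer values in the 'bufout overflow' message from sc2pmc1 above. This is only a sketch: it assumes (the log does not confirm it) that bufout, hit and endofbufout are all byte addresses within the same output buffer.

```python
# Values copied from the "ERROR: bufout overflow" message above.
# Assumption: all three are byte addresses within one output buffer.
bufout = 485322144       # start of the output buffer
hit = 485387636          # position reached when the overflow was flagged
endofbufout = 485387680  # end of the output buffer

buffer_size = endofbufout - bufout  # total buffer length
used = hit - bufout                 # bytes already filled
remaining = endofbufout - hit       # bytes left before the end

print(buffer_size, used, remaining)  # 65536 65492 44
```

Under that assumption the buffer is 64 KB long and the overflow was flagged with 44 bytes left before its end.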
- Sergey B. 31-mar-2008: CC scans CODA !!! messages from EB:
wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=59063 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60187 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60295 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60418 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=60823 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=32961 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33122 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33795 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=34251 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=34529 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=35811 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=36134 wait: coda request in progress Error(old 
rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=41569 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=53913 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=57701 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33357 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=36136 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=39509 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=39900 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=45174 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=45909 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=34521 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=49023 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=49526 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=53555 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start 
work thread befor: socket=5 address>129.57.71.38< port=55694 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=56833 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=58251 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=59023 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=33687 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=35731 wait: coda request in progress Error(old rc): nRead=0, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=51581 wait: coda request in progress wait: coda request in progress Error(old rc): nRead=18, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=41371 wait: coda request in progress Error(old rc): nRead=6, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=43198 wait: coda request in progress Error(old rc): nRead=10, must be 1032 CODAtcpServer: start work thread befor: socket=5 address>129.57.71.38< port=55260 wait: coda request in progress Error(old rc): nRead=0, must be 1032 clasprod::EB1> ^C
Similar messages were coming from ER.
ET reported the following:
TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure
and another ET:
....... et_start: pthread_create(0x0000000d,...) done TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure TCP server got a connection so spawn thread et ERROR: et_client_thread: read failure et INFO: et_sys_heartmonitor, kill bad process (2,3063) et INFO: et_sys_heartmonitor, cleanup process 2 et INFO: set_fix_nprocs, change # of ET processes from 2 to 2 et INFO: set_fix_natts, station GRAND_CENTRAL has 0 attachments et INFO: set_fix_natts, station LEVEL3 has 1 attachments et INFO: set_fix_natts, station ET2ER has 1 attachments et INFO: set_fix_natts, # total attachments 2 -> 2 et INFO: set_fix_natts, proc 0 has 1 attachments et INFO: set_fix_natts, proc 1 has 1 attachments
Also noticed that the two CODA setups (production and test) interfere with each other; the following was reported during 'configure':
Query test_ts2 table failed: Error reading table'test_ts2' definition ec3 ec3
If one runcontrol is restarted, it works, but then the other one complains !!!
- Sergey B. 31-mar-2008: 'run_log_comment.tcl' and 'rlComment' do not recognize XML comments when extracting the level1 trigger file name from the <l1trig> tag; they seem to use the first line after the <l1trig> tag without checking whether that line is inside an XML comment.
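One possible way to handle this, shown as a minimal Python sketch rather than a patch to the actual Tcl or C++ code: strip XML comments first, then take the first non-empty line inside the tag. The function name, the sample file paths, and the presence of a closing </l1trig> tag are assumptions made for illustration only.

```python
import re

def l1trig_file_name(comment_text):
    """Return the first non-empty, non-commented line inside <l1trig>...</l1trig>."""
    # Remove XML comments first, including multi-line ones.
    without_comments = re.sub(r"<!--.*?-->", "", comment_text, flags=re.DOTALL)
    # Assumption: the tag is closed by </l1trig>.
    match = re.search(r"<l1trig>(.*?)</l1trig>", without_comments, flags=re.DOTALL)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        line = line.strip()
        if line:
            return line
    return None

# Hypothetical input: the commented-out name comes first and must be skipped.
sample = """<l1trig>
<!-- /old/path/trigger_old.trg -->
/new/path/trigger_new.trg
</l1trig>"""
print(l1trig_file_name(sample))  # /new/path/trigger_new.trg
```

Removing comments before looking for the tag also covers the case where the whole <l1trig> block is commented out, in which case the function returns None.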
- Sergey B. 10-mar-2008: seems to have found an error in run control: in Xui/src.s/rcMenuWindow.cc the parameters XmNpaneMinimum and XmNpaneMaximum were both set to 480; as a result the run control GUI area above the log messages window was not big enough. Set them to 100 and 900 respectively; will ask Jie.
- 23-jan-2008: ER3 crashed several times during the last 3 weeks, mostly (only ?) during the 'End' transition; today's core file:
(dbx) where
[1] 0xfe9e4c20(0x8068f9d), at 0xfe9e4c20
[2] codaExecute(0xce4fdbc0, 0xce4fdbc0, 0x1, 0x8068c39), at 0x8068f9d
[3] CODAtcpServerWorkTask(0x811fa00, 0x0, 0x0, 0xce4fdff8, 0xfea60020, 0xfe591400), at 0x8068d3a
[4] 0xfea5fd36(0xfe591400, 0x0, 0x0, ), at 0xfea5fd36
[5] 0xfea60020(), at 0xfea60020
(dbx)
- 14-nov-2007: first week of running G9A: crashes observed in ec1 (twice), tage3, scaler1, clastrig2, EB; no further details have been obtained so far
- Sergey B. 3-nov-2007: after about 26M events during the run, sc2pmc1 started to print the following:
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000080 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x03C80ADD -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x8A90097F -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 : Bad Block Read signature 0x00680BD9 -> resyncronize !!!
End run failed; rebooted sc2. During the end transition ec2 froze with the message:
interrupt: timer: 32 microsec (min=19 max=86 rms**2=18)
0x1a05fdf0 (twork0005): sfiUserEnd: INFO: Last Event 26723663, status=0 (0x1ca648c8 0x1ca648c0)
0x1a05fdf0 (twork0005): data: 0x00000003 0x0007014f 0x00120000 0x00000000 0xc8009181 0xc0001181
0x1a05fdf0 (twork0005): jw1 : 0x00000000 0x0197c54f 0x00000003 0x0007014f 0x00120000 0x00000000
0x1a05fdf0 (twork0005): Last DMA status = 0x200000b count=11 blen=11
0x1a05fdf0 (twork0006): sfiUserEnd: ERROR: Last Transfer Event NUMBER 26723663, status = 0x1a000 (0x90001181 0x88001181 0x80009181 0x78001181)
0x1a05fdf0 (twork0006): SFI_SEQ_ERR: Sequencer not Enabled
Rebooted ec2. Started new run 55463; everything looks normal.
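When these 'Bad Block Read signature' messages recur, it can help to tally which signature words appear and how often. The following is only a sketch of one way to do that from a saved console log; the function name is hypothetical and the pattern is taken from the message format printed above.

```python
import re
from collections import Counter

# Pattern based on the message format shown in the log excerpts above.
BAD_SIGNATURE = re.compile(
    r"ROC # (\d+) Event # \d+ : Bad Block Read signature (0x[0-9A-Fa-f]{8})"
)

def tally_bad_signatures(log_text):
    """Count (ROC number, signature) pairs found in the log text."""
    return Counter(BAD_SIGNATURE.findall(log_text))

# Three messages copied from the excerpt above.
sample = (
    "ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!! "
    "ROC # 22 Event # 0 : Bad Block Read signature 0x00000060 -> resyncronize !!! "
    "ROC # 22 Event # 0 : Bad Block Read signature 0x00000040 -> resyncronize !!!"
)
for (roc, signature), count in tally_bad_signatures(sample).most_common():
    print(roc, signature, count)
```

The pattern does not depend on line breaks, so it works equally on a log where the messages have run together on one line.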
- Sergey B. 2-nov-2007: during the run ec2 started to print on the tsconnect screen:
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Data taking continued, but runcontrol printed the message:
WARN : ec2 has not reported status for 1516 seconds
ERROR : ec2 is in state disconnected should be active
ec2pmc1 looked fine; end run failed, had to reboot ec2