ERROR Book

  • Sergey B. 14-May-2009 5:54am: got page from clascron@clon10: 'Process monitor: missing process: alarm_server - Online problem: alarm_server did not respond to status pool request'. Found later that stadis was not updating etc. It appears that the smartsockets server somehow hung on clondb1; for example, the command 'ipc_info -a clasprod' from clon10 did not work, messages were:


  • Sergey B. 16-feb-2009: runcontrol pop-up message while pressing 'Configure':
rcConfigure::loadRcDbaseCbk: Loading database failed !!!

On the second 'Configure' click it worked.

  • Sergey B. June-2008: it looks like when one SILO drive goes bad, something happens in the script's logic and it never tries to run 2 streams, only one ! One .temp link can be observed, along with one presilo1 link which is NOT becoming another .temp. Needs checking !!!
  • Sergey B. 7-may-2008: EB1 crashed, last messages:
EB: WARNING - resyncronization in crate controller number  5 ...  fixed
.
EB: WARNING - resyncronization in crate controller number  6 ...  fixed
.
EB: WARNING - resyncronization in crate controller number  6 ...  fixed
.
ERROR: lfmt=0 bankid=0

Found that tage2 and tage3 had to be rebooted.


  • Sergey B. 5-may-2008: polar crate turned off; turned it back on, but noticed that one of the str7201 scalers had all lights on; the scaler was replaced within 2 days, everything works fine


  • Sergey B. 30-apr-2008: gam_server on clon04 takes 100% CPU

Found on the web that creating the file '/etc/gamin/gaminrc' with the following contents should help:

#configuration for gamin
# Can be used to override the default behaviour.
# notify filepath(s) : indicate to use kernel notification
# poll filepath(s)   : indicate to use polling instead
# fsset fsname method poll_limit : indicate what method of notification for the filesystem
#                                  kernel - use the kernel for notification
#                                  poll - use polling for notification
#                                  none - don't use any notification
#
#                                  the poll_limit is the number of seconds
#                                  that must pass before a resource is polled again.
#                                  It is optional, and if it is not present the previous
#                                  value will be used or the default.
fsset nfs poll 10
# use polling on nfs mounts and poll once every 10 seconds
# This will limit polling to every 10 seconds and seems to prevent it from running away

Created the file; will see if it helps ..
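
For context, a minimal sketch of the client side that gam_server services (gamin implements the FAM monitoring API; the monitored path below is hypothetical):

#include <stdio.h>
#include <fam.h>   /* gamin ships the FAM API; link with -lfam */

int main(void)
{
    FAMConnection fc;
    FAMRequest fr;
    FAMEvent fe;

    if (FAMOpen(&fc) < 0) { perror("FAMOpen"); return 1; }

    /* A file on an NFS mount: with the gaminrc above, gam_server
       polls it once every 10 seconds instead of spinning. */
    FAMMonitorFile(&fc, "/home/clasrun/somefile", &fr, NULL);  /* hypothetical path */

    while (FAMNextEvent(&fc, &fe) > 0)                         /* blocks for events */
        printf("event code %d on %s\n", fe.code, fe.filename);

    FAMClose(&fc);
    return 0;
}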


  • Sergey B. 20-apr-2008: sc2pmc1 error message (during 'end run' ???)
.................................................
write_thread: wait=    135 send=     28 microsec per event (nev=12351)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=4)
proc_thread: wait=    130 send=     31 microsec per event (nev=14709)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=10)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=15)
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 252148 20
write_thread: about to print: 252148 20
write_thread: about to print: 252148 20
write_thread: wait=    135 send=     31 microsec per event (nev=12607)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)
proc_thread: wait=    135 send=     31 microsec per event (nev=14739)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=14)
 ERROR: bufout overflow - skip the rest ...
        bufout=485322144 hit=485387636 endofbufout=485387680
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 255564 20
write_thread: about to print: 255564 20
write_thread: about to print: 255564 20
write_thread: wait=    128 send=     42 microsec per event (nev=12778)
proc_thread: wait=    371 send=     31 microsec per event (nev=6291)
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000140 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x0150107F -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x015810DC -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x02B80FD6 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x02A010BE -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
.................................................. 


At the same time at sc2:

......................................
interrupt: timer: 33 microsec (min=6 max=401 rms**2=14)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=4)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=17)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=1)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=17)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=0)
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
................................
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 220 log messages lost.
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 692 log messages lost.
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 544 log messages lost.
..................................
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 562 log messages lost.
interrupt: SYNC: ERROR: [ 0] slot=16 error_flag=1 - clear
logTask: 707 log messages lost.
interrupt: SYNC: ERROR: [ 3] slot=19 error_flag=1 - clear
logTask: 517 log messages lost.
interrupt: SYNC: ERROR: [ 4] slot=20 error_flag=1 - clear
logTask: 620 log messages lost.
interrupt: SYNC: ERROR: [ 5] slot=21 error_flag=1 - clear
logTask: 559 log messages lost.
interrupt: SYNC: scan_flag=0x00390000
logTask: 512 log messages lost.
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
logTask: 671 log messages lost.
......................................
logTask: 563 log messages lost.
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
logTask: 385 log messages lost.
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
.................................. (the same WARN line repeated many more times)
wait: coda request in progress
codaExecute reached, message >end<, len=3
codaExecute: 'end' transition
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
.......................................
wait: coda request in progress
wait: coda request in progress
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: TRIGGER ERROR: no pool buffer available
wait: coda request in progress
wait: coda request in progress
.....................................

Note: sc2 never warned about a wrong slot number before that moment.

FIX: it seems there is a bug in the 1190/1290-related ROLs and library: NBOARDS was set to 21 and several arrays were allocated with that length, but the actual maximum board number is 21, so indexing board 21 runs past the end of those arrays; NBOARDS was set to 22 everywhere, will test next time. A sketch of the off-by-one is shown below.
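
A minimal sketch of that off-by-one, with hypothetical array and variable names (the actual ROL/library code differs):

#include <stdio.h>

/* was: #define NBOARDS 21 -- arrays then had valid indices 0..20,
   but the highest board (slot) number handled is 21, so any
   access like error_flag[21] wrote past the end of the array */
#define NBOARDS 22

static int error_flag[NBOARDS];   /* hypothetical per-slot status array */

int main(void)
{
    int slot;

    /* slots are indexed by their physical number, up to 21 */
    for (slot = 16; slot <= 21; slot++)
        error_flag[slot] = 0;     /* overflowed when NBOARDS was 21 */

    printf("NBOARDS=%d, highest valid index=%d\n", NBOARDS, NBOARDS - 1);
    return 0;
}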


  • Sergey B. 31-mar-2008: CC scans CODA !!! messages from EB:
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=59063
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=60187
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=60295
...................... (the same four lines repeated many more times, each with a different port number)
wait: coda request in progress
wait: coda request in progress
Error(old rc): nRead=18, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=41371
wait: coda request in progress
Error(old rc): nRead=6, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=43198
wait: coda request in progress
Error(old rc): nRead=10, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=55260
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
clasprod::EB1> ^C

Similar messages were coming from ER.

ET reported the following:

TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
...................... (the same two lines repeated many more times)

and another ET:

.......
et_start: pthread_create(0x0000000d,...) done
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
...................... (the same two lines repeated many more times)
et INFO: et_sys_heartmonitor, kill bad process (2,3063)
et INFO: et_sys_heartmonitor, cleanup process 2
et INFO: set_fix_nprocs, change # of ET processes from 2 to 2
et INFO: set_fix_natts, station GRAND_CENTRAL has 0 attachments
et INFO: set_fix_natts, station LEVEL3 has 1 attachments
et INFO: set_fix_natts, station ET2ER has 1 attachments
et INFO: set_fix_natts, # total attachments 2 -> 2
et INFO: set_fix_natts, proc 0 has 1 attachments
et INFO: set_fix_natts, proc 1 has 1 attachments 

Also noticed that the two CODAs (production and test setup) do not like each other: reported the following during 'configure':

 Query test_ts2 table failed: Error reading table'test_ts2' definition
        ec3                                        ec3

If one runcontrol is restarted it works, but then the other one complains !!!

  • Sergey B. 31-mar-2008: 'run_log_comment.tcl' and 'rlComment' cannot recognize XML comments when extracting the level1 trigger file name from the <l1trig> tag; they seem to take the first line after the <l1trig> tag without checking for the comment sign (see the sketch below).
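
A minimal sketch in C of the intended extraction logic (the actual scripts are Tcl; the tag layout and whole-line '<!--' comments are assumptions):

#include <stdio.h>
#include <string.h>

/* Return in 'out' the first non-blank, non-comment line that follows
   the <l1trig> tag; 0 on success, -1 otherwise. Assumes each XML
   comment sits on its own line starting with "<!--". */
int get_l1trig_file(const char *path, char *out, size_t outlen)
{
    char line[512];
    int in_tag = 0;
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        char *p = line + strspn(line, " \t");
        if (!in_tag)
        {
            if (strncmp(p, "<l1trig>", 8) == 0) in_tag = 1;
            continue;
        }
        if (*p == '\n' || *p == '\0') continue;   /* skip blank lines */
        if (strncmp(p, "<!--", 4) == 0) continue; /* skip comments - the old logic did not */
        p[strcspn(p, "\r\n")] = '\0';
        strncpy(out, p, outlen - 1);
        out[outlen - 1] = '\0';
        fclose(fp);
        return 0;
    }
    fclose(fp);
    return -1;
}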
  • Sergey B. 10-mar-2008: it seems I found an error in run control: Xui/src.s/rcMenuWindow.cc parameters

XmNpaneMinimum and XmNpaneMaximum were both set to 480; as a result, the run control GUI area above the log messages window was not big enough. Set them to 100 and 900 respectively, will ask Jie (see the sketch below).
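
A minimal sketch of the corrected resource settings, assuming a standard XmPanedWindow child (the widget handle here is hypothetical):

#include <Xm/Xm.h>
#include <Xm/PanedW.h>

/* Loosen the pane constraints: with both resources at 480 the pane
   was pinned to a fixed height and could not grow. */
void set_pane_limits(Widget pane_child)
{
    Arg args[2];
    int n = 0;
    XtSetArg(args[n], XmNpaneMinimum, 100); n++;   /* was 480 */
    XtSetArg(args[n], XmNpaneMaximum, 900); n++;   /* was 480 */
    XtSetValues(pane_child, args, n);
}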

  • 23-jan-2008: ER3 crashed several times during the last 3 weeks, mostly (only ?) during the 'End' transition; today's core file:
(dbx) where
  [1] 0xfe9e4c20(0x8068f9d), at 0xfe9e4c20 
  [2] codaExecute(0xce4fdbc0, 0xce4fdbc0, 0x1, 0x8068c39), at 0x8068f9d 
  [3] CODAtcpServerWorkTask(0x811fa00, 0x0, 0x0, 0xce4fdff8, 0xfea60020, 0xfe591400), at 0x8068d3a 
  [4] 0xfea5fd36(0xfe591400, 0x0, 0x0, ), at 0xfea5fd36 
  [5] 0xfea60020(), at 0xfea60020 
(dbx)
  • 14-nov-2007: first week of running G9A: crashes observed in ec1 (twice), tage3, scaler1, clastrig2, EB; no further details have been obtained so far
  • Sergey B. 3-nov-2007: after about 26M events during the run, sc2pmc1 started to print the following:
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000080 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x03C80ADD -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x8A90097F -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00680BD9 -> resyncronize !!!

End run failed; rebooted sc2. During the end transition ec2 froze with this message:

interrupt: timer: 32 microsec (min=19 max=86 rms**2=18)
0x1a05fdf0 (twork0005): sfiUserEnd: INFO: Last Event 26723663, status=0 (0x1ca648c8 0x1ca648c0)
0x1a05fdf0 (twork0005): data: 0x00000003 0x0007014f 0x00120000 0x00000000 0xc8009181 0xc0001181
0x1a05fdf0 (twork0005): jw1 : 0x00000000 0x0197c54f 0x00000003 0x0007014f 0x00120000 0x00000000
0x1a05fdf0 (twork0005): Last DMA status = 0x200000b count=11 blen=11
0x1a05fdf0 (twork0006): sfiUserEnd: ERROR: Last Transfer Event NUMBER 26723663, status = 0x1a000 (0x90001181 0x88001181 0x80009181  0x78001181)
0x1a05fdf0 (twork0006): SFI_SEQ_ERR: Sequencer not Enabled

Rebooted ec2. Started new run 55463; everything looks normal.

  • Sergey B. 2-nov-2007: during the run, ec2 started to print on the tsconnect screen:
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65

Data taking continued, but runcontrol printed the message:

WARN   : ec2 has not reported status for 1516 seconds
ERROR  : ec2 is in state disconnected should be active

ec2pmc1 looked fine; end run failed, had to reboot ec2