ERROR Book

Revision as of 11:29, 14 April 2010

Sergey B. 14-apr-2010: message from runcontrol during cancel-reset:

Get all components failed: Lost connection to MySQL server during query

Sergey B. 6-apr-2010: EB waiting for the following ROCs:

[0] 0x76fefffe != 0x008cf17e, waiting for the following ROC IDs:
 7  9 10 11 17 20 21 22 25 26 28 29 30
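
The ROC ID list in this message can be cross-checked against the two masks. A minimal sketch, assuming (inferred from the numbers, not from the EB source) that the first value is the expected ROC mask, the second is the mask of ROCs that have reported so far, and bit N stands for ROC ID N. The function name `missing_roc_ids` is hypothetical:

```python
def missing_roc_ids(expected_mask, reported_mask):
    """Return ROC IDs whose bit is set in expected_mask but not in reported_mask."""
    missing = expected_mask & ~reported_mask
    return [bit for bit in range(32) if (missing >> bit) & 1]


# Masks taken from the EB message above.
ids = missing_roc_ids(0x76FEFFFE, 0x008CF17E)
print(" ".join(f"{i:2d}" for i in ids))
```

With these two masks the sketch yields the same thirteen IDs that the EB printed, which supports the bit-per-ROC reading.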

Files produced:

clon10:ccscans> ll | grep Apr
drwxrwxrwx   2 clasrun  onliners    4096 Apr  2 16:23 ./
drwxr-xr-x 163 clasrun  100        73728 Apr  6  2010 ../
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_cc1*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:09 good_clastrig2*
-rwxrwxrwx   1 boiarino clas          34 Apr  2 16:25 good_croctest10*
-rwxrwxrwx   1 boiarino clas          34 Apr  2 16:25 good_croctest2*
-rwxrwxrwx   1 boiarino clas          34 Apr  2 16:21 good_croctest3*
-rwxrwxrwx   1 baturin  clas          34 Apr  2 13:26 good_ctoftest1*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_dc1*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_dc10*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_dc11*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_dc2*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_dc3*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_dc4*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_dc5*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_dc6*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_dc7*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_dc8*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_dc9*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_ec1*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_ec2*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_ec3*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_ec4*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_polar*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_sc1*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_sc2*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_scaler1*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_scaler2*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_scaler3*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:26 good_scaler4*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_tage*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_tage2*
-rwxrwxrwx   1 clasrun  onliners      34 Apr  6 20:08 good_tage3* 
clon10:ccscans> more good_dc7
accepted link from >129.57.68.80<
clon10:ccscans> more good_dc8
accepted link from >129.57.68.74<
clon10:ccscans>

On 'Reset' the EB crashed with the following output:

[0] 0x76fefffe != 0x008cf17e, waiting for the following ROC IDs:
 7  9 10 11 17 20 21 22 25 26 28 29 30

[0] 0x76fefffe != 0x008cf17e, waiting for the following ROC IDs:
 7  9 10 11 17 20 21 22 25 26 28 29 30

CODAtcpServer: start work thread
befor: socket=5 address>129.57.167.14< port=34468
wait: coda request >prestart< in progress
codaExecute reached, message >exit<, len=4
codaExecute: 'exit' transition
debShutdownBuild 1
cb_delete roc=0
cb_delete roc=1
cb_delete roc=2
cb_delete roc=3
cb_delete roc=4
cb_delete roc=5
cb_delete roc=6
cb_delete roc=7
cb_delete roc=8
cb_delete roc=9
cb_delete roc=10
cb_delete roc=11
cb_delete roc=12
cb_delete roc=13
cb_delete roc=14
cb_delete roc=15
cb_delete roc=16
cb_delete roc=17
cb_delete roc=18
cb_delete roc=19
cb_delete roc=20
cb_delete roc=21
cb_delete roc=22
cb_delete roc=23
cb_delete roc=24
cb_delete roc=25
cb_delete roc=26
cb_delete roc=27
cb_delete roc=28
cb_delete roc=29
cb_delete roc=30
cb_delete roc=31
debShutdownBuild 2
Stop waiting for the rocs because of force_end condition
[0] ROC mask 0x76fefffe force_end 1 end_event_done 0
[0] NEED TO TEST AND PROBABLY REDESIGN THAT PART !!!
[0] et_event_new() 1
[0] EVENT_ENCODE_SPEC 2 ..
BOS_encode_spec reached
BOS_encode_spec done: len=116
[0] .. done
[0] handle_build_cleanup
[0] flush ET system
[0] detached from ET
[0] remove mutex locks and shutdown fifos
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[00] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[00] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[00] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[01] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[01] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[01] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[02] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[02] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[02] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[03] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[03] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[03] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[04] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[04] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[04] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[05] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[05] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[05] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[06] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[06] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[06] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[07] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[07] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[07] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[08] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[08] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[08] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[09] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[09] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[09] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 3
[0] count for roc[10] = 3
CCCCCCCCCC: 3
CCCCCCCCCC: 1
[0] count for roc[10] = 1
CCCCCCCCCC: 1
CCCCCCCCCC: 0
[0] count for roc[10] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 2
[0] count for roc[11] = 2
CCCCCCCCCC: 2
CCCCCCCCCC: 0
[0] count for roc[11] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 2
[0] count for roc[12] = 2
CCCCCCCCCC: 2
CCCCCCCCCC: 0
[0] count for roc[12] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 2
[0] count for roc[13] = 2
CCCCCCCCCC: 2
CCCCCCCCCC: 0
[0] count for roc[13] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[14] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[15] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[16] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[17] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[18] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[19] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[20] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[21] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[22] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[23] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[24] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[25] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[26] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[27] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[28] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[29] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[30] = 0
CCCCCCCCCC: 0
[0] flush input streams
CCCCCCCCCC: 0
[0] count for roc[31] = 0
CCCCCCCCCC: 0
[0] ============= Build threads cleaned
[0] ============= Build threads cleaned
[0] ============= Build threads cleaned
[0] ============= Build threads cleaned
[0] ============= Build threads cleaned
[0] build thread exiting: 1 1
[0] The implementation has detected that the value  specified by thread does not refer to a joinable thread.
wait: coda request >exit< in progress
polling_routine ================!!!!!!!!!!!!!!!!!!=====
cancel building threads
=c=============================================
=c=============================================
=c=============================================
=c====
>>>>> close link from >dc1< link=0x081c2af0
debCloseLink: theLink=0x081c2af0 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >dc2< link=0x081c4a90
debCloseLink: theLink=0x081c4a90 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >dc3< link=0x081c4b28
debCloseLink: theLink=0x081c4b28 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >dc4< link=0x081c14f8
debCloseLink: theLink=0x081c14f8 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >dc5< link=0x081c1590
debCloseLink: theLink=0x081c1590 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >dc6< link=0x081c1628
debCloseLink: theLink=0x081c1628 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >dc8< link=0x081c16c0
debCloseLink: theLink=0x081c16c0 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >cc1< link=0x081c1a08
debCloseLink: theLink=0x081c1a08 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >sc1< link=0x081c1aa0
debCloseLink: theLink=0x081c1aa0 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >ec1< link=0x081c1b38
debCloseLink: theLink=0x081c1b38 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >ec2< link=0x081c1bd0
debCloseLink: theLink=0x081c1bd0 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >scaler2< link=0x681d95c8
debCloseLink: theLink=0x681d95c8 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >scaler3< link=0x681d9660
debCloseLink: theLink=0x681d9660 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >scaler4< link=0x681d96f8
debCloseLink: theLink=0x681d96f8 -> closing
debCloseLink: link is down
debCloseLink: free memory
debCloseLink: done.
=c====
>>>>> close link from >ec4< link=0x00000001
debCloseLink: theLink=0x00000001 -> closing
Segmentation fault
clondaq1:coda_eb>
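
The last link closed before the segmentation fault is reported with link=0x00000001, which does not look like a heap address, unlike every other link in the list. A minimal sketch for spotting such entries in a saved EB log, assuming the "close link from" line format shown above. The function name `suspicious_links` and the 0x1000 threshold are illustrative choices, not part of CODA:

```python
import re

# Matches lines such as: >>>>> close link from >ec4< link=0x00000001
LINK_RE = re.compile(r"close link from >(?P<roc>[^<]+)< link=0x(?P<addr>[0-9a-fA-F]+)")


def suspicious_links(log_text, min_addr=0x1000):
    """Return (roc, address) pairs whose link pointer is below min_addr."""
    found = []
    for match in LINK_RE.finditer(log_text):
        addr = int(match.group("addr"), 16)
        if addr < min_addr:
            found.append((match.group("roc"), addr))
    return found


sample = """\
>>>>> close link from >scaler4< link=0x681d96f8
>>>>> close link from >ec4< link=0x00000001
"""
print(suspicious_links(sample))
```

Run over the full EB output, a scan like this would point at ec4 as the only link with an implausible pointer value.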

DC7:

codaExecute done
wait: coda request >download prod< in progress
codaExecute reached, message >prestart<, len=8
codaExecute: 'prestart' transition
roc_prestart reached
partReInitAll() reached ? 
partReInitAll() reached !
partReInitAll() done
rocCleanup reached
partReInitAll() reached ?
wait: coda request >prestart< in progress
partReInitAll() reached !
partReInitAll() done
315: dbsock=410567520
>>> prestarting, run 62496, type 47
rocMask=0x76fefffe
!!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc7'<
!!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc7'<
!!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc7'<
++++> output to network
!!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1< 
3
!!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1<
!!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1<
!!!!!!!!!!!!! for rocOpenLinks: >dc7< >EB1<
rocCloselink reached
rocOpenLinks reached
LINK_constructor_C: set host to >dc7<
LINK_constructor_C: set name to >dc7->EB1< 
3121: dbsock=410567520
name selected
nrow=1
fields [0] >>>TCP<<<
fields [1] >>>clondaq1-daq1<<<
wait: coda request >prestart< in progress
fields [2] >>>down<<<
fields [3] >>>37683<<<
parsing results: type=>TCP< host=>clondaq1-daq1< state=>down< port=>37683< -> 37683
LINK_establish: socket # 17
LINK_establish: socket buffer size is 48000(0x0000bb80) bytes
LINK_establish: keepAlive is 8
LINK_establish: socket 17 is ready: host 129.57.68.22 port 37683
DB command >UPDATE links SET state='up' WHERE name='dc7->EB1'<
UPDATE process success
Spawn proc/net threads
Spawn proc/net threads
Spawn proc/net threads
0x18772230 (twork0028INFO: Entering Us): er 
Prestart 2
INIT_NAME: rolp->daproc = 
trans_rol2:2
adr(nddl)=1db60750 NWBOS=16384
uthbook1: WARN id=1047 already exist - will be replaced
uthbook1: free histogram buffer ... done.
RAW=0  PROFILE=1  NOTRANS=0
MAX_EVENT_LENGTH = 65536 NWBOS = 16384
0x18772230INFO: User Prest (art 2 executed
twork0028): CTRIGRSS: set handler and done, 
code=1
0x18772230 (twork0028): CTRIGRSS: 0x1877da9c 0x1877d9c0 0x1877d2e0 0x00000001 0x1877d2d8
wait: coda request >prestart< in progress
bignetptr=0x1d366fe0 offset=0x00000000
taskSpawn("coda_net",0x187d4130) returns 490106848
wait: coda request >prestart< in progress
setHeartError: 0 >sys 0, mask 1<
WARN: HeartBeat[0]: heartbeat=19424(19424) heartmask=1
bignetptr=0x1d366fe0 offset=0x00000000
bigprocptr=0x1d366db8 offset=0x00000000
taskSpawn("coda_proc",0x1874a580) returns 490106296
0x18772230 (twork0028): INIT_NAME: rolp->daproc = 2
0x18772230 (twork0028): INFO: Entering User Prestart
disconnecting vector 236
0x18772230 (twork0028): CTRIGRSA: set handler and done, code=1
0x18772230 (twork0028): CTRIGRSA: 0x1882e7bc 0x1882d864 0x00000001 0x1882b0c8
Sequencer Status Register = FFFFA000
Last Sequencer KeyAddress = FFFF0003
Fastbus Status Register1  = FFFFF000
Fastbus Status Register2  = FFFF0000
Fastbus Last Primary Addr = 00000019
spds_mask1: 0xfdfbf0
0x18772230_tdc1877 ===> (4twork0028wait: co)
:  Use 32-bit Ram List
da request >firsprestartt< in progre csr0=ss
100 (1100) csr18=1364 csr1=6400001b
_tdc1877 ===>5 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>6 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>7 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>8 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>9 last csr0=100 (900) csr18=1364 csr1=6400001b
_tdc1877 ===>11 first csr0=100 (1100) csr18=1364 csr1=6400001b
_tdc1877 ===>12 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>13 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>14 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>15 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>16 last csr0=100 (900) csr18=13
bignetptr=0x641d366fe0 csr1= offset=0x00bigp
rocptr=0x000001d366db80 offset=0x6400000000
00001b
bigproc at 0x
1d366db8, bigproc.gbigBuffer at 0x_tdc1877 =
==>1d366dc818 -> 0x1d366dc8
proc_thread reacheUDP_cancel: cancel >inf:d
c7 sys 0, mask 1d
<
firstUDP_cancel: cancel >proc_threainf:dc7 sys 0, mask 1d reached
<
wait: cproc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
csr0=oda request >1prestart00< in progr (ess1100) csr18=1364 csr1=6400001b
_tdc1877 ===>19 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>20 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>21 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>22 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>23 last csr0=100 (900) csr18=1364 csr1=6400001b
DEBUG in SFIRamLoad => 0 1 0 0 0
slot#   1872  1875  1881(2)  1877(2)  1877S(2)
[ 0]      0     0      0 0bignetptr=0x      
1d366fe00 offset=0x0000000 0
0       0 0
[ 1]      0     0      0 0      0 0       0 0
[ 2]      0     0      0 0      0 0       0 0
[ 3]      0     0      0 0      0 0       0 0
[ 4]      0     0      0 0      1 0       0 0
[ 5]      0     0      0 0      1 0       0 0
[ 6]      0     0      0 0      1 0       0 0
[ 7]      0     0      0 0      1 0       0 0
[ 8]      0     0      0 0      1 0       0 0
[ 9]      0     0      0 0      1 2       0 0
wait: coda request >prestart< in progress
[10]      0     0      0 0      0 0       0 0
[11]      0     0      0 0      1 0       0 0
[12]      0     0      0 0      1 0       0 0
[13]      0     0      0 0      1 0       0 0
[14]      0     0      0 0      1 0       0 0
[15]      0     0      0 0      1 0       0 0
[16]      0     0      0 0      1 2       0 0
[17]      0     0      0 0      0 0       0 0
[18]      0     0      0 0      1 0       0 0
[19]      0     0      0 0      1 0       0 0
[20] bignet at 0x     1d366fe00, bignet.gbig
Buffer at 0x     1d366ff00 -> 0x1d366ff0
     0 0      1 0       0 0
[21]      0     0      0 0      1 0       0 0
[22]      0     0      0 0      1 0       0 0
[23]      0     0      0 0      1 2       0 0
[24]      0     0      0 0      0 0       0 0
[25]      0     0      0 0      0 0       0 0
DEBUG 0 1 0 0 
INFO: datascan = 0x00000000
INFO: datascan&spds_mask1 = 0x00000000
rcname >RC07<
0x18772230 (twork0028111): INFO: User Prestart Executed
333: rocp->primwait: coda request >efd=prest
art17< in p
rogress
informEB reached
informEB: 11 -1 7 1 17 0 - 0x00000004 0x001101cc 0x000004b0 0x0000f420 0x0000002f 0x00000000
codaUpdateStatus: dbConnecting ..
codaUpdateStatus: dbConnect done
codaUpdateStatus: >UPDATE process SET state='paused' WHERE name='dc7'<
>>>>>>>>>>>>>>>> use pid=-1 <<<<<<<<<<<<<<<<<
codaUpdateStatus: dbDisconnecting ..
codaUpdateStatus: dbDisconnect done
codaUpdateStatus: updating request .. 
UDP_standard_request >sta:dc7 paused<
UDP_standard_request >sta:dc7 paused<
UDP_standard_request >sta:dc7 paused<
UDP_standard_request >sta:dc7 paused<
UDP_standard_request >sta:dc7 paused<
UDP_standard_request >sta:dc7 paused<
UDP_cancel: cancel >sta:dc7 downloaded<
codaUpdateStatus: updating request done
prestarted
POLLS: 0 0
codaExecute done
-> 
->

DC8:

codaExecute: 'prestart' transition
roc_prestart reached
partReInitAll() reached ?
partReInitAll() reached !
partReInitAll() done
rocCleanup reached
partReInitAll() reached ?
wait: coda request >prestart< in progress
partReInitAll() reached !
partReInitAll() done
315: dbsock=410505056
>>> prestarting, run 62496, type 47
rocMask=0x76fefffe
!!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc8'<
!!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc8'<
!!!!!!!!!!!!! tmpp >SELECT outputs FROM prod WHERE name='dc8'<
++++> output to network
!!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1< 
3
!!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1<
!!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1<
!!!!!!!!!!!!! for rocOpenLinks: >dc8< >EB1<
rocCloselink reached
rocOpenLinks reached
LINK_constructor_C: set host to >dc8<
LINK_constructor_C: set name to >dc8->EB1<
3121: dbsock=410505056
name selected
nrow=1
fields [0] >>>TCP<<<
fields [1] >>>clondaq1-daq1<<<
wait: coda request >prestart< in progress
fields [2] >>>up<<<
fields [3] >>>37772<<<
parsing results: type=>TCP< host=>clondaq1-daq1< state=>up< port=>37772< -> 37772
LINK_establish: socket # 17
LINK_establish: socket buffer size is 48000(0x0000bb80) bytes
LINK_establish: keepAlive is 8
LINK_establish: socket 17 is ready: host 129.57.68.22 port 37772
DB command >UPDATE links SET state='up' WHERE name='dc8->EB1'<
UPDATE process success
Spawn proc/net threads
Spawn proc/net threads
Spawn proc/net threads
0x187ef280INFO: Entering U (ser Prestart 2
twork0028
trans_rol2:):  adr(nddl)=INIT_NAME: rolp->da
proc = 1d2b60750
 NWBOS=16384
uthbook1: WARN id=1048 already exist - will be replaced
uthbook1: free histogram buffer ... done.
RAW=0  PROFILE=1  NOTRANS=0
MAX_EVENT_LENGTH = 65536 NWBOS = 16384
0x187ef280 (twork0028INFO: User Pr): estart 
2 executedCTRIGRSS: set handler and done, co
de=
1
0x187ef280 (twork0028): CTRIGRSS: 0x187705ac 0x187704d0 0x1876fdf0 0x00000001 0x1876fde8
wait: coda request >prestart< in progress
bignetptr=0x1d366fe0 offset=0x00000000
taskSpawn("coda_net",0x187d4110) returns 490106848
wait: coda request >prestart< in progress
setHeartError: 0 >sys 0, mask 1<
WARN: HeartBeat[0]: heartbeat=19424(19424) heartmask=1
bignetptr=0x1d366fe0 offset=0x00000000
bigprocptr=0x1d366db8 offset=0x00000000
taskSpawn("coda_proc",0x1874a560) returns 490106296
0x187ef280 (twork0028): INIT_NAME: rolp->daproc = 2
0x187ef280 (twork0028): INFO: Entering User Prestart
disconnecting vector 236
0x187ef280 (twork0028): CTRIGRSA: set handler and done, code=1
0x187ef280 (twork0028): CTRIGRSA: 0x1882e13c 0x1882d1e4 0x00000001 0x1882aa48
Sequencer Status Register = FFFFA000
Last Sequencer KeyAddress = FFFF0003
Fastbus Status Register1  = FFFFF000
Fastbus Status Register2  = FFFF0000
Fastbus Last Primary Addr = wait: coda req00
0000uest >19presta
rtspds_mask1: 0x< ifdfbf0n progress 
0x187ef280_tdc1877 === (>twork0028): Use 32-
bit Ram List
4 first csr0=100 (1100) csr18=1364 csr1=6400
001b
_tdc1877 ===>5 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>6 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>7 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>8 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>9 last csr0=100 (900) csr18=1364 csr1=6400001b
_tdc1877 ===>11 first csr0=100 (1100) csr18=1364 csr1=6400001b
_tdc1877 ===>12 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>13 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>14 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>15 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>16 last csr0bignetptr=0x=1d366f
e0100 offset=0x0000000 (0
900) csr18=1364bigprocptr=0x csr1=1d366db864
00001b offset=0x
00000000
bigproc at 0x_tdc1877 ===1d366db8>, bigproc.
gbigBuffer at 0x181d3UDP_cancel: cancel >inf
:dc8 sys 0, mask 166dc8<
wait: coda rUDP_cancel: cancel >inf:dc8 sys 
0, mask 1 -> 0x<
1d366dc8eq
firstproc_thread reached
uest >proc_thread reached
 csr0=proc_thread reached
prproc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
proc_thread reached
100estart (< in prog1100ress
) csr18=1364 csr1=6400001b
_tdc1877 ===>19 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>20 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>21 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>22 middle csr0=100 (1900) csr18=1364 csr1=6400001b
_tdc1877 ===>23 last csr0=100 (900) csr18=1364 csr1=6400001b
DEBUG in SFIRamLoad => 0 1 0 0 0
slot#   1872  bignetptr=0x1875  181d366fe081
(2)  1877( offset=0x2)  1877S(00000002)
0
[ 0]      0     0      0 0      0 0       0 0
[ 1]      0     0      0 0      0 0       0 0
[ 2]      0     0      0 0      0 0       0 0
[ 3]      0     0      0 0      0 0       0 0
[ 4]      0     0      0 0      1 0       0 0
[ 5]      0     0      0 0      1 0       0 0
[ 6]      0     0      0 0      1 0       0 0
[ 7]      0     0      0 0      1 0       0 0
[ 8]      0     0      0 0      1 0       0 0
[ 9]      0     0      0 0      1 2       0 0
wait: coda request >prestart< in progress
[10]      0     0      0 0      0 0       0 0
[11]      0     0      0 0      1 0       0 0
[12]      0     0      0 0      1 0       0 0
[13]      0     0      0 0      1 0       0 0
[14]      0     0      0 0      1 0       0 0
[15]      0     0      0 0      1 0       0 0
[16]      0     0      0 0      1 2       0 0
[17]      0     0      0 0      0 0       0 0
[18]      0     0      0 0      1 0       0 0
[19]      0 bignet at 0x    1d366fe00, bigne
t.gbigBuffer at 0x      1d366ff00 -> 0x1d366
ff0
 0      1 0       0 0
[20]      0     0      0 0      1 0       0 0
[21]      0     0      0 0      1 0       0 0
[22]      0     0      0 0      1 0       0 0
[23]      0     0      0 0      1 2       0 0
[24]      0     0      0 0      0 0       0 0
[25]      0     0      0 0      0 0       0 0
DEBUG 0 1 0 0 
INFO: datascan = 0x00000000
INFO: datascan&spds_mask1 = 0x00000000
rcname >RC08<
0x187ef280 (twork0028111
): INFO: User Prestart Executed
333: rocp->primwait: coda request >efd=prest
art17< in p
rogress
informEB reached
informEB: 11 -1 8 1 17 0 - 0x00000004 0x001101cc 0x000004b0 0x0000f420 0x0000002f 0x00000000
codaUpdateStatus: dbConnecting ..
codaUpdateStatus: dbConnect done
codaUpdateStatus: >UPDATE process SET state='paused' WHERE name='dc8'<
>>>>>>>>>>>>>>>> use pid=-1 <<<<<<<<<<<<<<<<<
codaUpdateStatus: dbDisconnecting ..
codaUpdateStatus: dbDisconnect done
codaUpdateStatus: updating request ..
UDP_standard_request >sta:dc8 paused<
UDP_standard_request >sta:dc8 paused<
UDP_standard_request >sta:dc8 paused<
UDP_standard_request >sta:dc8 paused<
UDP_standard_request >sta:dc8 paused<
UDP_standard_request >sta:dc8 paused<
UDP_cancel: cancel >sta:dc8 downloaded<
codaUpdateStatus: updating request done
prestarted
POLLS: 0 0
codaExecute done
.....

Sergey B. 6-apr-2010: after the TS problem described below, polar rebooted itself on the next download:

........

wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
3123: dbsock=59553504
3123: tmpp>SELECT id FROM runTypes WHERE name='prod'<
====1==== !!! object->runType=47
token_interval=64
object->name >polar<
big0.proc_on_pmc=0, big0.net_on_pmc=0
big1.proc_on_pmc=0, big1.net_on_pmc=0
name selected
nrow=1
download: name >polar<
code selected
nrow=1
download: code >{$CODA/VXWORKS_ppc/rol/moller.o usr} {$CODA/VXWORKS_ppc/rol/rol2_tt.o usr} <
nrols [0] >$CODA/VXWORKS_ppc/rol/moller.o usr<
nrols [1] >$CODA/VXWORKS_ppc/rol/rol2_tt.o usr<
listArgc=2 listArgv >$CODA/VXWORKS_ppc/rol/moller.o usr< >$CODA/VXWORKS_ppc/rol/rol2_tt.o usr<
set this_roc_id = 25, rocId() can be called from now on
ObjInitName >_moller__init< 
wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
INFO: >_moller__init()< routine found
readout list >$CODA/VXWORKS_ppc/rol/moller.o< loaded
allocate 200 buffers with length 65536 bytes
partCreate1() reached
partCreate1(): pPart=0x008e7570
listInit: cleanup 24 bytes starting from address 0x008e75a0
partCreate1() done
0x81c7e0 (twork0004): INIT_NAME: rolp->daproc = 0
0x81c7e0 (twork0004): INIT_NAME: Initializing new rol structures for SCALROL1
0x81c7e0 (twork0004): INIT_NAME: MAX_EVENT_LENGTH = 65536 bytes, MAX_EVENT_POOL = 200
0x81c7e0 (twork0004): INIT_NAME: name >SCALROL1:pool<
0x81c7e0 (twork0004): Init - Done
0x81c7e0 (twork0004): INIT_NAME: rolp->daproc = rol11: downloading DDL table ...
wait: coda request >download prod< in progress
wait: coda request >download prod< in progress
Exception at interrupt level:
Exception next instruction address: 0xfff901f0
Fixed Point Register: 0x00000012
Condition Register: 0x40482044
Regs at 0x64e498
Press any key to stop auto-boot...
2
[VxWorks Boot]:

It is supposed to look as follows:

.....
0x91ae30 (twork0001): INIT_NAME: rolp->daproc = 0
0x91ae30 (twork0001): INIT_NAME: Initializing new rol structures for SCALROL1
0x91ae30 (twork0001): INIT_NAME: MAX_EVENT_LENGTH = 65536 bytes, MAX_EVENT_POOL = 200
rol1: downloading DDL table ...
0x91ae30 (twork0001): INIT_NAME: name >SCALROL1:pool<
0x91ae30 (twork0001): Init - Done
0x91ae30 (twork0001): INIT_NAME: rolp->daproc = 1
wait: coda request >download prod< in progress
adr1(nddl_)=0x037b69c8
N   name (nname)     fmt (nfmt)     ncol
[ 1]   >PTRN< (4)         >B32< (3)     1
[ 2]   >PSYN< (4)         >B32< (3)     1
[ 3]   >RC00< (4)         >B32< (3)     1
[ 4]   >RC01< (4)         >B32< (3)     1
.....

Sergey B. 6-apr-2010: TS busy condition at the beginning of the run; all ROCs have 205 events, clastrig2 has 200 events. ts2status output:

-> ts2status
CSR 1 (0x0):
                                     Go : 1
           Pause on Next scheduled Sync : 0
                         Sync and Pause : 0
                    Initiate Sync Event : 0
               Initiate Program 1 Event : 0
               Initiate Program 2 Event : 1
        Enable Level 1 (drives outputs) : 1
                       Override Inhibit : 0
                              Test Mode : 0
                               Reserved : 0
                                  Reset : 0
                             Initialize : 0
                    Sync Event occurred : 0
               Program 1 Event occurred : 1
               Program 2 Event occurred : 0
                     Late Fail occurred : 0
                       Inhibit occurred : 1
              Write FIFO error occurred : 0
              Read FIFO error occurreds : 1
CSR 2 (0x4):
                     Enable Scheduled Sync : 1
                    Use Clear Permit Timer : 1
                      Use Front Busy Timer : 1
                      Use Clear Hold Timer : 1
                   Use External Front Busy : 1
                         Lock ROC Branch 1 : 0
                         Lock ROC Branch 2 : 0
                         Lock ROC Branch 3 : 0
                         Lock ROC Branch 4 : 0
                         Lock ROC Branch 5 : 0
        Enable Program 1 front panel input : 1
                          Enable Interrupt : 1
               Enable local ROC (branch 5) : 1
Trigger Control Register (0x8): 0x00001fff
ROC Enable Register (0xc) val=0xf07fff7f:
Branch 1: 0x7f  bits: 01111111
Branch 2: 0xff  bits: 11111111
Branch 3: 0x7f  bits: 01111111
Branch 4: 0xf0  bits: 11110000
Synchronization Interval Register (0x10): 1000
Trigger Word Count Register (0x14): 0
Trigger Data Register (0x18): 256
Local ROC (Branch 5) Data Register (0x1c): 60
                      Synchronization Flag : 0
                            Late Fail Flag : 0
                                  ROC code : 15
Input Trigger Prescale Registers:
                Input 1 Prescale Factor : 0
                Input 2 Prescale Factor : 0
                Input 3 Prescale Factor : 0
                Input 4 Prescale Factor : 0
                Input 5 Prescale Factor : 0
                Input 6 Prescale Factor : 0
                Input 7 Prescale Factor : 0
                Input 8 Prescale Factor : 0
Clear Permit Timer Register  = 0
Level2 Accept Timer Register = 83
Level3 Accept Timer Register = 0
Front Busy Timer Register    = 325
Clear Hold Timer Register    = 50
Branch (1-4) ROC Buffer Status Register (0x58): 0x40404040
Branch 1: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
Branch 2: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
Branch 3: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
Branch 4: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
Local ROC (Branch 5) Buffer Status Register (0x5c): 0xffff8086
Branch 5: Buffer Count = 6, Empty Flag = 0, Full Flag = 1,Local Acknowledge = 0
ROC Acknowledge Status Register (0x60): val=0x0
Branch 1: 0x 0  bits: 00000000 (enabled: 01111111)
Branch 2: 0x 0  bits: 00000000 (enabled: 11111111)
Branch 3: 0x 0  bits: 00000000 (enabled: 01111111)
Branch 4: 0x 0  bits: 00000000 (enabled: 11110000)
State Register (0x6C):
                      Level 1 Accept     : 0
                   Start Level 2 Trigger : 0
                    Level 2 Pass Latched : 0
                    Level 2 Fail Latched : 0
                          Level 2 Accept : 0
                   Start Level 3 Trigger : 0
                    Level 3 Pass Latched : 0
                    Level 3 Fail Latched : 0
                          Level 3 Accept : 0
                                   Clear : 0
               Front End Busy (external) : 0
                        External Inhibit : 0
                         Latched Trigger : 0
                                 TS Busy : 1
                               TS Active : 1
                                TS Ready : 0
                   Main Sequencer Active : 0
        Synchronization Sequencer Active : 0
        Program 1 Event Sequencer Active : 1
        Program 2 Event Sequencer Active : 0
Event Count (0xc8): 135
Live 1 Count (0xcc): 54866
Live 2 Count (0xd0): 9290375
value = 1 = 0x1
-> exit

clasprod::clastrig2> clastrig2 statistics
200 0 0 0

Sergey B. 6-apr-2010: end failed, only one ROC was in use (clastrig2), messages:

interrupt: Program Events 1-2:      247        0
interrupt: live time = 93 percent (gated=1956740 ungated=2094143)
interrupt: live_corr = 93 percent (gated=1956740 ungated=2094143)
proc_thread: waiting=   3743 processing=     30 microsec per event (nev=404)
rols_thread: waiting=    754 processing=     21 microsec per event (nev=454)
wait: coda request >go< in progress
codaExecute reached, message >end<, len=3
codaExecute: 'end' transition
roc_end reached ??
roc_end reached !!
0x18c6f5c0 (twork0003): INIT_NAME: rolp->daproc = 3
UDP_cancel: cancel >ts2: 93 0 0 0 0 0 0 0 0 507 0 0 0 <
interrupt: inputs  1-6:        0        0        0        0        0        0
interrupt: inputs 7-12:        0        0      339        0        0        0
interrupt: Program Events 1-2:      172        0
interrupt: live time = 97 percent (gated=1358086 ungated=1400034)
interrupt: live_corr = 97 percent (gated=1358086 ungated=1400034)
wait: coda request >end< in progress
0x18c6f5c0 (twork0003): TS GO bit cleared
0x18c6f5c0 (twork0003): TS csr:  0xfc000000
interrupt: TRIG ERROR: no pool buffer available
0x18c6f5c0 (twork0003): setForceSyncInterval: forceSyncInterval set to 0
0x18c6f5c0 (twork0003): INFO: User End Executed
codaUpdateStatus: dbConnecting ..
codaUpdateStatus: dbConnect done
codaUpdateStatus: >UPDATE process SET state='ending' WHERE name='clastrig2'<
codaUpdateStatus: dbDisconnecting ..
codaUpdateStatus: dbDisconnect done
codaUpdateStatus: updating request ..
UDP_standard_request >sta:clastrig2 ending<
UDP_standard_request >sta:clastrig2 ending<
UDP_standard_request >sta:clastrig2 ending<
UDP_standard_request >sta:clastrig2 ending<
UDP_standard_request >sta:clastrig2 ending<
UDP_standard_request >sta:clastrig2 ending<
UDP_cancel: cancel >sta:clastrig2 active<
codaUpdateStatus: updating request done
NOW !!!!!!!!!!!!!!!!!!!
NOW !!!!!!!!!!!!!!!!!!!
NOW !!!!!!!!!!!!!!!!!!!
NOW !!!!!!!!!!!!!!!!!!!
NOW !!!!!!!!!!!!!!!!!!!
roc_end done
codaExecute done
coda_roc: processing case 'DA_ENDING': last event=12113 nevents=12314
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: in DA_ENDING dataInBuf=20212 BBHEAD_BYTES=24
coda_roc: rocp->state == ENDING and input list is empty
coda_roc: process rol1 buffer by rol2 completed
coda_roc: break loop
coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: in DA_ENDING dataInBuf=7224 BBHEAD_BYTES=24
coda_roc: rocp->state == ENDING and input list is empty
coda_roc: process rol1 buffer by rol2 completed
coda_roc: break loop
coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: in DA_ENDING last event=12313 nevents=12314
coda_roc: rocp->state == ENDING and input list is empty
coda_roc: process rol1 buffer by rol2 completed
coda_roc: break loop
coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: in DA_ENDING last event=12313 nevents=12314
coda_roc: rocp->state == ENDING and input list is empty
coda_roc: process rol1 buffer by rol2 completed
coda_roc: break loop
coda_roc: processing case 'DA_ENDING': last event=12313 nevents=12314
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: in DA_ENDING last event=12313 nevents=12314
coda_roc: rocp->state == ENDING and input list is empty

Sergey B. 5-apr-2010: scrolling messages from all ROCs (start from Download - works fine):

.....
event=2405 nevents=2405
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING': last event=2405 nevents=2405
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING': last event=2405 nevents=2405
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING'
coda_roc: processing case 'DA_ENDING': last event=2405 nevents=2405
coda_roc: processing case 'DA_ENDING'
.....

Sergey B. 2-apr-2010: event recorder gives error during Prestart:

file[ 0]->/raid/stage_in/clas_062414.A00<-
bosopen.c: splitted files handling ..
bosopen.c: >/raid/stage_in/clas_062414.A00<
bosopen.c: len1=30
bosopen.c: len2=30
bosopen.c: len3=14
bosopen.c: >/raid/stage_in/clas_062414.A00< len=14
bosopen.c: checking how much disk space left in directory /raid/stage_in
bosopen.c: total 1403778969 blocks, free 1100988411 blocks, available 1100988411 blocks (1 block = 512 bytes)
bosopen.c: so we have 550493696 kbytes available space
bosopen.c: we can continue to write file.
bosopen.c: check for partition swaping: file->/raid/stage_in/clas_062414.A00<-
bosopen.c: system call: >checkdisk  4094<
[1] 9253
bosopen.c: system call completed
codaUpdateStatus: dbConnecting ..
ER: Not attached to ET system
er_write_thread loop ended (0 145452664)
codaUpdateStatus: dbConnect done
codaUpdateStatus: >UPDATE process SET state='paused' WHERE name='ER3'<
codaUpdateStatus: dbDisconnecting ..
codaUpdateStatus: dbDisconnect done
codaUpdateStatus: updating request ..
UDP_standard_request >sta:ER3 paused<
UDP_standard_request >sta:ER3 paused<

Station TAPE was idle, but the ET system seemed operational; ended the run (successfully), restarted coda_er, started from Download - everything works fine.

Sergey B. 18-mar-2010: previous run ended fine, click 'prestart' and got from EB:

[0] 0x76fefffe != 0x40acf17e, waiting for the following ROC IDs:
7  9 10 11 17 20 22 25 26 28 29

A similar problem was observed before. Killed rcServer; the next run started fine, ROCs were NOT rebooted. CC scans again ???
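Note on reading the EB message: the listed ROC IDs appear to be the bit positions that are set in the first mask and clear in the second. This is inferred from the numbers recorded on this page, not from the EB source, and the function name below is made up for illustration. A small Python sketch that reproduces the list:

```python
def missing_roc_ids(expected_mask, received_mask):
    """Return bit positions set in expected_mask but not in received_mask."""
    missing = expected_mask & ~received_mask
    return [bit for bit in range(32) if (missing >> bit) & 1]


# Masks copied from the 18-mar-2010 EB message above.
print(missing_roc_ids(0x76fefffe, 0x40acf17e))
# prints [7, 9, 10, 11, 17, 20, 22, 25, 26, 28, 29]
```

The same calculation on the masks from the 6-apr-2010 entry (0x76fefffe and 0x008cf17e) gives the 13 IDs listed there.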

Sergey B. 14-mar-2010: old problem, can be posted already: if EB was not restarted after a ROC crash, it sometimes waits for several ROCs at the end of prestart, yet runcontrol still lets the 'Go' button become active; something is wrong with the logic, must check; it would of course be good to avoid that situation by making EB reset itself properly on the 'reset' transition after a ROC crash

Sergey B. 14-mar-2010: during reboot dc9 gives following:

.................
ppc/bootscripts/boot_dc9
taskSpawn("TCPSERVER") returns 475458320
bind on port 5001
myname >dc9<
-> Query >UPDATE Ports SET Host='dc9',tcpClient_tcp=5001 WHERE Name='dc9'< succeeded
INFO(mysql_real_connect9): errno=0
INFO(mysql_real_connect9): OK
mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query)
dbConnect ERROR: mysql == NULL
333333333333333
program
Exception current instruction address: 0x1c5e8734
Machine Status Register: 0x00081000
Condition Register: 0x40000085
Task: 0x1c5e8e50 "ROC"
interrupt: Unconnected main interrupt 0
logTask: 3222 log messages lost.
interrupt: Unconnected main interrupt 1
interrupt: Unconnected main interrupt 0
interrupt: Unconnected main interrupt 1
interrupt: Unconnected main interrupt 0
interrupt: Unconnected main interrupt 1
interrupt: Unconnected main interrupt 0
interrupt: Unconnected main interrupt 1
interrupt: Unconnected main interrupt 0
..............

probably mysql connection problem

RESOLVING: will make 5 attempts to connect, waiting 3 seconds in between, then will give up.
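A minimal sketch of that retry policy (5 attempts, 3 seconds apart, then give up). The real code is C on the ROC; this Python version only illustrates the logic, and `connect_with_retry`, its arguments and the `connect` callable are hypothetical names, not part of CODA or the MySQL client:

```python
import time


def connect_with_retry(connect, attempts=5, delay_s=3.0, sleep=time.sleep):
    """Call connect() up to `attempts` times, pausing `delay_s` seconds between tries.

    Returns the connection object, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            conn = connect()
        except Exception as exc:  # assumption: connect() may raise on failure
            print(f"dbConnect attempt {attempt}/{attempts} failed: {exc}")
            conn = None
        if conn is not None:
            return conn
        if attempt < attempts:
            sleep(delay_s)  # no pause after the last failed attempt
    return None  # give up; the caller must handle the missing connection
```

The important part is the last line: the caller has to check for the "gave up" result instead of using a NULL handle, which is what the 'mysql == NULL' crash above looks like.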

Sergey B. 14-mar-2010: if abort and reset are hit during the prestart stage, all ROCs keep printing

wait: coda request >exit< in progress

indefinitely, must check

Sergey B. 14-mar-2010: coda_eb x-term hung completely, kill coda_eb but yellow window still stuck; last messages were:

codaEnd 10
codaEnd 11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
codaExecute done
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.28< port=59319
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
Executing >< (len=0)
codaExecute: ERROR: len=0 - do nothing
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.28< port=59342
wait: coda request >< in progress
Executing >TP/1.0

Probably last string contains something, will remove printing

Sergey B. 13-mar-2010: sometimes after bad crash all rocs (even scalers) must be rebooted, otherwise EB is crashing with messages:

.....
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
case 0: no swap
 .. done.
[0] FATAL: Event (Num 1 type 1) NUMBER mismatch -- roc[11] (rocid 30) sent -1 (type 18)
[0] ERROR: Discard data until next control event
clondaq1:coda_eb>

After that the problem stayed; restarting every daq component (EB, ER, ETs, runcontrol, rcServer) did not help. Did daq_exit, daq_start, rebooted all rocs again - now it works.

At the same time ROCs were printing 'no data' messages; maybe that was the source of the EB troubles ...

Sergey B. 12-mar-2010 around 10:50am: something is breaking coda_eb/coda_er:

.....
UDP_standard_request >sta:EB1 booted<
UDP_standard_request >sta:EB1 booted<
UDP_standard_request >sta:EB1 booted<
UDP_standard_request >sta:EB1 booted<
UDP_cancel: cancel >sta:EB1 booted<
codaUpdateStatus: updating request done
CODA_Init 14
2
clasprod::EB1> bind on port 5001
DB update: >UPDATE process SET inuse='5001' WHERE name='EB1'<
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.28< port=37258
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
wait: coda request >< in progress
Executing >< (len=0)
Segmentation fault (core dumped)
clondaq1:coda_eb>

At the same time et_start complains (we now have the ET system protected against CC scans):

et ERROR: et_netserver: ET server being probed by non-ET client or read failure
et ERROR: et_netserver: ET server being probed by non-ET client or read failure

FIXED by checking for len=0 in the message string; all other possible checks were already implemented
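A minimal sketch of the len=0 check described above, written in Python for illustration only (the actual fix is in the C server code). `execute_request` and `handlers` are hypothetical names; the message printed for an empty request is the one seen in the coda_eb log on this page:

```python
def execute_request(message, handlers):
    """Dispatch a request string; ignore empty ones instead of processing them."""
    if not message:  # len == 0: nothing was sent, e.g. a port probe
        print("codaExecute: ERROR: len=0 - do nothing")
        return False
    handler = handlers.get(message.strip())
    if handler is None:
        print(f"unknown request >{message}<")
        return False
    handler()
    return True


# An empty request is rejected, a known one is executed.
print(execute_request("", {}))
print(execute_request("end", {"end": lambda: print("codaExecute: 'end' transition")}))
```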

Sergey B. 12-mar-2010: global reboot gives almost all PMCs (except first two) failing on MySQL access:

Done executing startup script $CODA/VXWORKS_ppc/bootscripts/boot_pmc1
-> INFO(mysql_real_connect9): errno=0
INFO(mysql_real_connect9): OK
mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query)
dbConnect ERROR: mysql == NULL
program
Exception current instruction address: 0x00000000
Machine Status Register: 0x0008b030
Condition Register: 0x40000085
Task: 0x1e404190 "coda_pmc"

It should be mentioned that we are doing something right now; the day before there was no activity and everything rebooted fine.

Sergey B. 1-sep-2009: during croctest1 reboot:

INFO(mysql_real_connect9): errno=0
INFO(mysql_real_connect9): OK
mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query)
dbConnect ERROR: mysql == NULL

second reboot went fine

Sergey B. 15-june-2009: just after a run was ended, noticed that all runs starting from 60073 have 'No configuration!' instead of 'PROD' etc. Correcting manually. Must check if other information is correct, and understand the reason if possible .. Is it related to the database update procedure implemented around that time ?

It appears that the begin_clasprod_xxxxxx.txt log files have it wrong starting from run 60073.

Sergey B. 11-June-2009: swapped ec4 and sc2, and during sc2 boot saw the following from the host:

..........................
taskDelay(sysClkRateGet()*5)
 Args = -session clasprod -objects sc2 ROC -i
CODA_Init reached
CODA_Init 1
CODA_Init 2
sc2
CODA_Init 11
CODA_Init: objectTy >(null)<
CODA_Init 12
CODA_Init: use 'SESSION' as >clasprod<
Tcl_AppInit CALLS !!!!!
11-11
11-12
11-13
11111111111111111
22222222222222222
0xdd0e880 (tNetTask): arptnew failed on 8139a743
value = 0 = 0x0
taskSpawn "TCP_SERVER",250,0,100000,tcpServer
value = 231091088 = 0xdc62b90
proconhost
value = 0 = 0x0
......................

Second reboot shows the same:

...................
22222222222222222
0xdd0e880 (tNetTask): arptnew failed on 8139a743
>>>>>>>clon10 clasrun 2508 9998<<<<<<<<< 24
list >clon10 clasrun 2508 9998<
machine name = clon10
user ID      = 2508
group ID     = 9998
.................

Sergey B. around 6-June-2009: ec2pmc1 error message:

write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 255115 20
write_thread: about to print: 255115 20
write_thread: about to print: 255115 20
write_thread: wait=    199 send=      8 microsec per event (nev=12755)
CALLING customized :
machine check 
SRAM ERROR :
SRAM Error Address    : 0x00000000.f2000504
SRAM Error Data Low   : 0x006805e0
SRAM Error Data High  : 0x00420000
SRAM Error Parity     : 0x00000004
SRAM Error Cause      : 0x00000001
PCI_0 DEVICE ERROR :
PCI_0 Error: Status=0x00000100
Master abort
PCI Cmd=7 (MemWr)  ByteEnable=0  Par=0
Error Address High: 0x00000000
Error Address Low : 0xf0d10128
CALLING generic    :
machine check
Exception next instruction address: 0x0001f058
Machine Status Register: 0x0012b030
Condition Register: 0x40222042
Task: 0x1e3b0020 "coda_net"

reboot did not help; had to turn the PMC off and run on host only

Sergey B. 4-June-2009: when rebooting all rocs, some cannot connect to the DB; as a result we get the following:

0x1e42a360 (coda_pmc): bb_new: 'big' buffer created (addr=0x1eba3590, 16 bufs, 3145728 size)
coda_pmc: big buffer1 allocated
myname >ec1pmc1<
Done executing startup script $CODA/VXWORKS_ppc/bootscripts/boot_pmc1
-> INFO(mysql_real_connect9): errno=0
INFO(mysql_real_connect9): OK
program
Exception current instruction address: 0x00000000
Machine Status Register: 0x0008b030
Condition Register: 0x40000085
Task: 0x1e42a360 "coda_pmc"
-> tt
14cad8 vxTaskEntry    +68 : coda_pmc ()
1e2275c0 coda_pmc       +144: mysql_query ()
1e258520 mysql_query    +54 : mysql_real_query ()
1e249850 mysql_real_query+110: mysql_send_query ()
1e249718 mysql_send_query+1d0: 0 ()
value = 0 = 0x0
->

For some reason it affects mostly PMCs. The first 10 or so PMCs booted fine, then a few showed that error, the next few booted fine again, and so on. It looks like the DB does not respond right away, and the PMC exits on timeout and does not try again. Should fix that place.

Sergey B. 15-may-2009: last night following error messages from sc2 were reportedly associated with 2 crashes (I was not on shift):

write_thread: about to print: 394603 20
write_thread: about to print: 394603 20
write_thread: wait=    176 send=      6 microsec per event (nev=19730)
0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=11)
proc_thread: wait=    176 send=     12 microsec per event (nev=14534)
0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=19)
0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=17)
bosMgetid: ERROR: lenn=0
???: 0x52432d32 0x38363333 0x31313534 0x00696d3d 0x25640a00
bosMgetid: ERROR: name >RC-2< does not described, update clonbanks.ddl file !!!
bosMgetid: lenn=4, len=4, nddl1=116
bosMgetid: name=>RC-286331154<
try again ..
bosMgetid: ERROR: lenn=0
!!!: 0x52432d32 0x38363333 0x31313534 0x00696d3d 0x25640a00
bosMgetid: lenn=4, len=4
bosMgetid: ERROR: name >RC-2< does not described, update clonbanks.ddl file !!!
bosMgetid: lenn=4, len=4, nddl1=116
bosMgetid: name=>RC-286331154<
no way !!
bosMlink: ERROR: bosMgetid returns -99

interrupt: timer: 48 microsec (min=9 max=195 rms**2=9)
interrupt: timer: 45 microsec (min=9 max=195 rms**2=18)
interrupt: timer: 45 microsec (min=9 max=195 rms**2=10)
interrupt: timer: 48 microsec (min=9 max=195 rms**2=14)
setHeartError: 0 >sys 0, mask 14<
WARN: HeartBeat[0]: heartbeat=6658(6658) heartmask=14
UDP_cancel: cancel >inf:sc2 sys 0, mask 14<
UDP_cancel: cancel >inf:sc2 sys 0, mask 14<

Sergey B. 14-May-2009 5:54am: got page from clascron@clon10: 'Process monitor: missing process: alarm_server - Online problem: alarm_server did not respond to status pool request'. Found later that stadis was not updating etc. It appears that the smartsockets server was somehow hung on clondb1; for example, the command 'ipc_info -a clasprod' from clon10 or clon00 did not work, messages were:

clon00:clasrun> ipc_info -a clasprod
08:08:31: TAL-SS-00088-I Connecting to project <clasprod> on <local> RTserver
08:08:31: TAL-SS-00089-I Using local protocol
08:08:31: TAL-SS-00090-I Could not connect to <local> RTserver
08:08:31: TAL-SS-00093-I Skipping starting <start_never:local> RTserver
08:08:31: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1> RTserver
08:08:31: TAL-SS-00089-I Using local protocol
08:08:31: TAL-SS-00090-I Could not connect to <clondb1> RTserver
08:08:31: TAL-SS-00088-I Connecting to project <clasprod> on <clondb1> RTserver
08:08:31: TAL-SS-00089-I Using tcp protocol
^C

and it waits there. After restarting the server on clondb1 with the commands

/etc/init.d/smartsockets stop
/etc/init.d/smartsockets start

everything came back to normal. We should think about detecting that problem and restarting the server automatically, although it happens very rarely, a few times a year. It can probably be done by monitoring CPU usage: when the server was hung it was using 100% of CPU:

top - 08:06:16 up 130 days,  8:59,  1 user,  load average: 2.13, 2.03, 1.64
Tasks:  92 total,   2 running,  90 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.0% us,  0.1% sy,  0.0% ni, 73.0% id,  1.9% wa,  0.0% hi,  0.1% si
Mem:   3995356k total,  3968880k used,    26476k free,    88168k buffers
Swap:  8385920k total,      208k used,  8385712k free,  3432056k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4471 root      25   0  5908 4540 1956 R  100  0.1 300:24.06 rtserver.x
    1 root      16   0  4756  556  460 S    0  0.0   0:11.25 init
    2 root      RT   0     0    0    0 S    0  0.0   0:03.85 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:10.23 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:01.79 migration/1
    5 root      34  19     0    0    0 S    0  0.0   1:03.72 ksoftirqd/1
    6 root      RT   0     0    0    0 S    0  0.0   0:01.62 migration/2

while normally it uses fractions of a percent. It will also be useful to monitor CPU usage in general: a few days ago clasrun's sshd on clon10 was taking almost 50% of CPU.
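A possible starting point for such a check, as a sketch only: a Python function that scans a process listing in the column layout of the top output above and reports rows of a given command at or above a CPU threshold. The function name and the 90% threshold are assumptions; how the listing is obtained and when to restart are left to whoever sets up the monitor.

```python
def busy_processes(top_text, command, threshold=90.0):
    """Return (pid, cpu) pairs for `command` rows at or above `threshold` %CPU.

    `top_text` is expected in the column layout of the listing above:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    hits = []
    columns = None
    for line in top_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":  # header row: remember column positions
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None or len(fields) < len(columns):
            continue  # summary lines before the header, or short rows
        if fields[columns["COMMAND"]] != command:
            continue
        cpu = float(fields[columns["%CPU"]])
        if cpu >= threshold:
            hits.append((int(fields[columns["PID"]]), cpu))
    return hits


# Two rows copied from the listing above.
sample = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4471 root      25   0  5908 4540 1956 R  100  0.1 300:24.06 rtserver.x
    1 root      16   0  4756  556  460 S    0  0.0   0:11.25 init
"""
print(busy_processes(sample, "rtserver.x"))
# prints [(4471, 100.0)]
```

To avoid restarting on a short spike, a monitor would probably want to see the process above threshold in several consecutive samples before running the stop/start commands; that policy is a suggestion, not something tested here.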

Sergey B. 12-may-2009: looking for the reason for missing gate messages in all DC crates, we found that the front end busy cable from SF to FC was disconnected on the floor. For the future we should remember that disconnecting that cable may create missing gate messages. It is also important before every run to use the 'dc_all' configuration and look at the scope in the counting house to make sure the front end busy signal is there.

Sergey B. 16-feb-2009: runcontrol pop-up message while pressing 'Configure':

rcConfigure::loadRcDbaseCbk: Loading database failed !!!

On second 'Configure' click it worked.

Sergey B. June-2008: it looks like when one SILO drive goes bad, something happens in the script's logic and it never tries to run 2 streams, only one ! One .temp link can be observed, along with one presilo1 link which is NOT becoming another .temp. Need to check !!!

Sergey B. 7-may-2008: EB1 crashed, last messages:

EB: WARNING - resyncronization in crate controller number  5 ...  fixed
.
EB: WARNING - resyncronization in crate controller number  6 ...  fixed
.
EB: WARNING - resyncronization in crate controller number  6 ...  fixed
.
ERROR: lfmt=0 bankid=0

Found that tage2 and tage3 must be rebooted

Sergey B. 5-may-2008: polar crate turned off; turned it back on, but noticed that one of the str7201 scalers has all lights on; the scaler was replaced in 2 days, everything works fine

Sergey B. 30-apr-2008: gam_server on clon04 takes 100% CPU

Found on the web that creating file '/etc/gamin/gaminrc' with following contents will help:

#configuration for gamin
# Can be used to override the default behaviour.
# notify filepath(s) : indicate to use kernel notification
# poll filepath(s)   : indicate to use polling instead
# fsset fsname method poll_limit : indicate what method of notification for the filesystem
#                                  kernel - use the kernel for notification
#                                  poll - use polling for notification
#                                  none - don't use any notification
#
#                                  the poll_limit is the number of seconds
#                                  that must pass before a resource is polled again.
#                                  It is optional, and if it is not present the previous
#                                  value will be used or the default.
fsset nfs poll 10
# use polling on nfs mounts and poll once every 10 seconds
# This will limit polling to every 10 seconds and seams to prevent it from running away

Created file, will see if it helped ..

Sergey B. 20-apr-2008: sc2pmc1 error message (during 'end run' ???)

.................................................
write_thread: wait=    135 send=     28 microsec per event (nev=12351)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=4)
proc_thread: wait=    130 send=     31 microsec per event (nev=14709)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=10)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=15)
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 252148 20
write_thread: about to print: 252148 20
write_thread: about to print: 252148 20
write_thread: wait=    135 send=     31 microsec per event (nev=12607)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)
proc_thread: wait=    135 send=     31 microsec per event (nev=14739)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)
0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=14)
 ERROR: bufout overflow - skip the rest ...
        bufout=485322144 hit=485387636 endofbufout=485387680
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 255564 20
write_thread: about to print: 255564 20
write_thread: about to print: 255564 20
write_thread: wait=    128 send=     42 microsec per event (nev=12778)
proc_thread: wait=    371 send=     31 microsec per event (nev=6291)
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000140 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x0150107F -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x015810DC -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x02B80FD6 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x02A010BE -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
..................................................

At the same time at sc2:

......................................
interrupt: timer: 33 microsec (min=6 max=401 rms**2=14)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=4)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=17)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=1)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=17)
interrupt: timer: 33 microsec (min=6 max=401 rms**2=0)
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
................................
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 220 log messages lost.
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 692 log messages lost.
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 544 log messages lost.
..................................
interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -> use slotnums
logTask: 562 log messages lost.
interrupt: SYNC: ERROR: [ 0] slot=16 error_flag=1 - clear
logTask: 707 log messages lost.
interrupt: SYNC: ERROR: [ 3] slot=19 error_flag=1 - clear
logTask: 517 log messages lost.
interrupt: SYNC: ERROR: [ 4] slot=20 error_flag=1 - clear
logTask: 620 log messages lost.
interrupt: SYNC: ERROR: [ 5] slot=21 error_flag=1 - clear
logTask: 559 log messages lost.
interrupt: SYNC: scan_flag=0x00390000
logTask: 512 log messages lost.
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
logTask: 671 log messages lost.
......................................
logTask: 563 log messages lost.
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
logTask: 385 log messages lost.
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
wait: coda request in progress
codaExecute reached, message >end<, len=3
codaExecute: 'end' transition
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
.......................................
wait: coda request in progress
wait: coda request in progress
interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -> use slotnums
interrupt: TRIGGER ERROR: no pool buffer available
wait: coda request in progress
wait: coda request in progress
.....................................

Note: sc2 never warned about a wrong slot number before that moment.

FIX: it seems there is a bug in the 1190/1290-related rols and library: NBOARDS was set to 21 and several arrays were allocated with that length (valid indices 0-20), but the actual maximum board number is 21; NBOARDS is now set to 22 everywhere, will test next time
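The off-by-one can be illustrated with a short Python sketch. It assumes the arrays are indexed directly by board number, which is a guess from the note above; the function and variable names are invented for the example. In C the out-of-range write raises nothing and may silently overwrite neighbouring memory, so the sketch makes the bounds check explicit:

```python
MAX_BOARD = 21  # highest board number, from the note above


def make_table(nboards):
    """Allocate one entry per board number 0 .. nboards-1."""
    return [0] * nboards


def mark_board(table, board):
    """Record a hit for `board`; refuses indices outside the table."""
    if not 0 <= board < len(table):
        raise IndexError(f"board {board} outside table of length {len(table)}")
    table[board] += 1


old = make_table(21)  # NBOARDS = 21: valid indices 0..20
new = make_table(22)  # NBOARDS = 22: valid indices 0..21

mark_board(new, MAX_BOARD)  # fits
try:
    mark_board(old, MAX_BOARD)  # one past the end of the old table
except IndexError as exc:
    print(exc)
# prints: board 21 outside table of length 21
```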

Sergey B. 31-mar-2008: CC scans CODA !!! messages from EB:

wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=59063
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=60187
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=60295
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=60418
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=60823
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=32961
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=33122
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=33795
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=34251
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=34529
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=35811
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=36134
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=41569
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=53913
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=57701
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=33357
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=36136
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=39509
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=39900
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=45174
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=45909
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=34521
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=49023
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=49526
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=53555
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=55694
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=56833
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=58251
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=59023
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=33687
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=35731
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=51581
wait: coda request in progress
wait: coda request in progress
Error(old rc): nRead=18, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=41371
wait: coda request in progress
Error(old rc): nRead=6, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=43198
wait: coda request in progress
Error(old rc): nRead=10, must be 1032
CODAtcpServer: start work thread
befor: socket=5 address>129.57.71.38< port=55260
wait: coda request in progress
Error(old rc): nRead=0, must be 1032
clasprod::EB1> ^C

Similar messages were coming from ER.
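One possible reading of the EB messages above is that something connects to the component's TCP port and closes the connection before a complete request has arrived. The sketch below is a hypothetical Python illustration of that situation, not CODA's code: it assumes 1032 is a fixed request length in bytes and that `nRead` is the number of bytes actually received. The helper names `read_exact` and `describe_read` are invented for the example.

```python
import socket

MESSAGE_LEN = 1032  # length quoted in the log lines (assumed to be bytes)


def read_exact(conn, length):
    """Read until `length` bytes arrive or the peer closes; return what was read."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:  # empty result means the peer closed the connection
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def describe_read(data, expected=MESSAGE_LEN):
    """Format a short read the way the log does, or return None if complete."""
    if len(data) == expected:
        return None
    return f"nRead={len(data)}, must be {expected}"


# A client that connects and closes without sending anything,
# simulated here with a local socket pair.
server_side, client_side = socket.socketpair()
client_side.close()
result = describe_read(read_exact(server_side, MESSAGE_LEN))
server_side.close()
```

Under these assumptions a peer that sends nothing yields `nRead=0`, and one that sends a few bytes before closing yields a small non-zero count, matching the two kinds of lines in the log.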

ET reported the following:

TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure

and another ET:

.......
et_start: pthread_create(0x0000000d,...) done
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
TCP server got a connection so spawn thread
et ERROR: et_client_thread: read failure
et INFO: et_sys_heartmonitor, kill bad process (2,3063)
et INFO: et_sys_heartmonitor, cleanup process 2
et INFO: set_fix_nprocs, change # of ET processes from 2 to 2
et INFO: set_fix_natts, station GRAND_CENTRAL has 0 attachments
et INFO: set_fix_natts, station LEVEL3 has 1 attachments
et INFO: set_fix_natts, station ET2ER has 1 attachments
et INFO: set_fix_natts, # total attachments 2 -> 2
et INFO: set_fix_natts, proc 0 has 1 attachments
et INFO: set_fix_natts, proc 1 has 1 attachments

Also noticed that the CODAs (production and test setup) do not like each other; the following was reported during 'configure':

 Query test_ts2 table failed: Error reading table'test_ts2' definition
        ec3                                        ec3

If one runcontrol is restarted, it works, but then the other one complains !!!

Sergey B. 31-mar-2008: 'run_log_comment.tcl' and 'rlComment' cannot recognize XML comments when extracting the level1 trigger file name from the <l1trig> tag; they seem to use the first line after the <l1trig> tag without checking for the "<!-- -->" comment sign.
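A comment-aware extraction could look roughly like the sketch below. It is written in Python for illustration only (the real tools are Tcl and their code is not shown here), and it assumes the file name sits on its own line between `<l1trig>` and a closing `</l1trig>` tag. The function name `extract_l1trig` and the sample paths are hypothetical.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
L1TRIG = re.compile(r"<l1trig>(.*?)</l1trig>", re.DOTALL)


def extract_l1trig(text):
    """Return the first non-comment, non-blank line inside <l1trig>, or None."""
    match = L1TRIG.search(text)
    if match is None:
        return None
    body = COMMENT.sub("", match.group(1))  # drop XML comments first
    for line in body.splitlines():
        line = line.strip()
        if line:
            return line
    return None


# Hypothetical sample: the first line after the tag is commented out.
sample = """<l1trig>
<!-- /hypothetical/path/old_trigger.trg -->
/hypothetical/path/new_trigger.trg
</l1trig>"""
```

Stripping comments before picking the first line means a commented-out file name is skipped rather than returned.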

Sergey B. 10-mar-2008: seems to have found an error in run control, in the Xui/src.s/rcMenuWindow.cc parameters:

XmNpaneMinimum and XmNpaneMaximum were both set to 480; as a result the run control GUI area above the log messages window was not big enough. Set them to 100 and 900 respectively; will ask Jie.

23-jan-2008: ER3 crashed several times during the last 3 weeks, mostly (only?) during the 'End' transition; today's core file:

(dbx) where
  [1] 0xfe9e4c20(0x8068f9d), at 0xfe9e4c20 
  [2] codaExecute(0xce4fdbc0, 0xce4fdbc0, 0x1, 0x8068c39), at 0x8068f9d 
  [3] CODAtcpServerWorkTask(0x811fa00, 0x0, 0x0, 0xce4fdff8, 0xfea60020, 0xfe591400), at 0x8068d3a 
  [4] 0xfea5fd36(0xfe591400, 0x0, 0x0, ), at 0xfea5fd36 
  [5] 0xfea60020(), at 0xfea60020 
(dbx)

14-nov-2007: first week of running G9A: crashes observed in ec1 (twice), tage3, scaler1, clastrig2, EB; no further details have been obtained so far

Sergey B. 3-nov-2007: after about 26M events during the run, sc2pmc1 started to print the following:

ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000080 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x03C80ADD -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x8A90097F -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -> resyncronize !!!
ROC # 22 Event # 0 :  Bad Block Read signature 0x00680BD9 -> resyncronize !!!

End run failed. Rebooted sc2. During the end transition ec2 froze with the message:

interrupt: timer: 32 microsec (min=19 max=86 rms**2=18)
0x1a05fdf0 (twork0005): sfiUserEnd: INFO: Last Event 26723663, status=0 (0x1ca648c8 0x1ca648c0)
0x1a05fdf0 (twork0005): data: 0x00000003 0x0007014f 0x00120000 0x00000000 0xc8009181 0xc0001181
0x1a05fdf0 (twork0005): jw1 : 0x00000000 0x0197c54f 0x00000003 0x0007014f 0x00120000 0x00000000
0x1a05fdf0 (twork0005): Last DMA status = 0x200000b count=11 blen=11
0x1a05fdf0 (twork0006): sfiUserEnd: ERROR: Last Transfer Event NUMBER 26723663, status = 0x1a000 (0x90001181 0x88001181 0x80009181  0x78001181)
0x1a05fdf0 (twork0006): SFI_SEQ_ERR: Sequencer not Enabled

Rebooted ec2. Started new run 55463; everything looks normal.

Sergey B. 2-nov-2007: during the run ec2 started to print on the tsconnect screen:

Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65
Unknown error errno=65

Data taking continued, but runcontrol printed the message:

WARN   : ec2 has not reported status for 1516 seconds
ERROR  : ec2 is in state disconnected should be active

ec2pmc1 looked fine; end run failed, need to reboot ec2
