<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://clonwiki0.jlab.org/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=129.57.167.109</id>
	<title>CLONWiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://clonwiki0.jlab.org/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=129.57.167.109"/>
	<link rel="alternate" type="text/html" href="https://clonwiki0.jlab.org/wiki/index.php?title=Special:Contributions/129.57.167.109"/>
	<updated>2026-05-07T14:12:41Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://clonwiki0.jlab.org/wiki/index.php?title=ERROR_Book&amp;diff=4724</id>
		<title>ERROR Book</title>
		<link rel="alternate" type="text/html" href="https://clonwiki0.jlab.org/wiki/index.php?title=ERROR_Book&amp;diff=4724"/>
		<updated>2010-04-02T20:00:18Z</updated>

		<summary type="html">&lt;p&gt;129.57.167.109: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Sergey B. 2-apr-2010: event recorder gives error during Prestart:&lt;br /&gt;
 &lt;br /&gt;
 file[ 0]-&amp;gt;/raid/stage_in/clas_062414.A00&amp;lt;-&lt;br /&gt;
 bosopen.c: splitted files handling ..&lt;br /&gt;
 bosopen.c: &amp;gt;/raid/stage_in/clas_062414.A00&amp;lt;&lt;br /&gt;
 bosopen.c: len1=30&lt;br /&gt;
 bosopen.c: len2=30&lt;br /&gt;
 bosopen.c: len3=14&lt;br /&gt;
 bosopen.c: &amp;gt;/raid/stage_in/clas_062414.A00&amp;lt; len=14&lt;br /&gt;
 bosopen.c: checking how much disk space left in directory /raid/stage_in&lt;br /&gt;
 bosopen.c: total 1403778969 blocks, free 1100988411 blocks, available &lt;br /&gt;
 1100988411 blocks (1 block = 512 bytes)&lt;br /&gt;
 bosopen.c: so we have 550493696 kbytes available space&lt;br /&gt;
 bosopen.c: we can continue to write file.&lt;br /&gt;
 bosopen.c: check for partition swaping: file-&amp;gt;/raid/stage_in/clas_062414.A00&amp;lt;-&lt;br /&gt;
 bosopen.c: system call: &amp;gt;checkdisk  4094&amp;lt;&lt;br /&gt;
 [1] 9253&lt;br /&gt;
 bosopen.c: system call completed&lt;br /&gt;
 codaUpdateStatus: dbConnecting ..&lt;br /&gt;
 ER: Not attached to ET system&lt;br /&gt;
 er_write_thread loop ended (0 145452664)&lt;br /&gt;
 codaUpdateStatus: dbConnect done&lt;br /&gt;
 codaUpdateStatus: &amp;gt;UPDATE process SET state=&#039;paused&#039; WHERE name=&#039;ER3&#039;&amp;lt;&lt;br /&gt;
 codaUpdateStatus: dbDisconnecting ..&lt;br /&gt;
 codaUpdateStatus: dbDisconnect done&lt;br /&gt;
 codaUpdateStatus: updating request ..&lt;br /&gt;
 UDP_standard_request &amp;gt;sta:ER3 paused&amp;lt;&lt;br /&gt;
 UDP_standard_request &amp;gt;sta:ER3 paused&amp;lt;&lt;br /&gt;
&lt;br /&gt;
Station TAPE was idle, but the ET system seemed operational. Ended the run (successfully), restarted coda_er, and started again from Download; everything then worked fine.&lt;br /&gt;
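The &quot;kbytes available&quot; figure in the bosopen.c output above can be reproduced from the logged block count. A minimal sketch, assuming (inferred from the numbers only, not from the bosopen.c source) that the block count is integer-divided by 1024 before being multiplied by the 512-byte block size:&lt;br /&gt;

```python
# Reproduce the "kbytes available" figure from the bosopen.c log lines.
# Assumption (inferred from the numbers only, not from the bosopen.c source):
# the integer division by 1024 happens before the multiplication by 512.

BLOCK_SIZE_BYTES = 512          # "1 block = 512 bytes" in the log
AVAILABLE_BLOCKS = 1100988411   # "available 1100988411 blocks" in the log


def kbytes_truncating(blocks, block_size=BLOCK_SIZE_BYTES):
    """Divide first, then multiply: can under-report by about 512 kbytes."""
    return (blocks // 1024) * block_size


def kbytes_exact(blocks, block_size=BLOCK_SIZE_BYTES):
    """Multiply first, then divide: exact to within one kbyte."""
    return (blocks * block_size) // 1024


print(kbytes_truncating(AVAILABLE_BLOCKS))  # 550493696, the logged value
print(kbytes_exact(AVAILABLE_BLOCKS))       # 550494205
```

The two results differ by about 0.5 Mbyte, which does not matter for the &quot;can we continue to write&quot; decision.&lt;br /&gt;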
&lt;br /&gt;
* Sergey B. 18-mar-2010: previous run ended fine; clicked &#039;prestart&#039; and got from EB:&lt;br /&gt;
 [0] 0x76fefffe != 0x40acf17e, waiting for the following ROC IDs:&lt;br /&gt;
 7  9 10 11 17 20 22 25 26 28 29&lt;br /&gt;
A similar problem was observed before. Killed rcServer and the next run started fine; ROCs were NOT rebooted. CC scans again ???&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 14-mar-2010: old problem, could have been posted earlier: if EB was not restarted after a ROC crash, it sometimes waits for several ROCs at the end of prestart, yet runcontrol lets the &#039;Go&#039; button become active. Something is wrong with the logic; must check. It would of course be better to avoid that situation by making EB reset itself properly on the &#039;reset&#039; transition after a ROC crash.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 14-mar-2010: during reboot dc9 gives the following:&lt;br /&gt;
 .................&lt;br /&gt;
 ppc/bootscripts/boot_dc9&lt;br /&gt;
 taskSpawn(&amp;quot;TCPSERVER&amp;quot;) returns 475458320&lt;br /&gt;
 bind on port 5001&lt;br /&gt;
 myname &amp;gt;dc9&amp;lt;&lt;br /&gt;
 -&amp;gt; Query &amp;gt;UPDATE Ports SET Host=&#039;dc9&#039;,tcpClient_tcp=5001 WHERE Name=&#039;dc9&#039;&amp;lt; succeeded&lt;br /&gt;
 INFO(mysql_real_connect9): errno=0&lt;br /&gt;
 INFO(mysql_real_connect9): OK&lt;br /&gt;
 mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query)&lt;br /&gt;
 dbConnect ERROR: mysql == NULL&lt;br /&gt;
 333333333333333&lt;br /&gt;
 program&lt;br /&gt;
 Exception current instruction address: 0x1c5e8734&lt;br /&gt;
 Machine Status Register: 0x00081000&lt;br /&gt;
 Condition Register: 0x40000085&lt;br /&gt;
 Task: 0x1c5e8e50 &amp;quot;ROC&amp;quot;&lt;br /&gt;
 interrupt: Unconnected main interrupt 0&lt;br /&gt;
 logTask: 3222 log messages lost.&lt;br /&gt;
 interrupt: Unconnected main interrupt 1&lt;br /&gt;
 interrupt: Unconnected main interrupt 0&lt;br /&gt;
 interrupt: Unconnected main interrupt 1&lt;br /&gt;
 interrupt: Unconnected main interrupt 0&lt;br /&gt;
 interrupt: Unconnected main interrupt 1&lt;br /&gt;
 interrupt: Unconnected main interrupt 0&lt;br /&gt;
 interrupt: Unconnected main interrupt 1&lt;br /&gt;
 interrupt: Unconnected main interrupt 0&lt;br /&gt;
 ..............&lt;br /&gt;
&lt;br /&gt;
Probably a MySQL connection problem.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RESOLVING&#039;&#039;&#039;: will make 5 attempts to connect, waiting 3 seconds in between, and then give up.&lt;br /&gt;
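The retry policy described here (5 attempts, 3 seconds apart, then give up) can be sketched as follows. This is an illustration only, not the actual ROC code: connect_with_retry and the connect callable are hypothetical names, and connect is assumed to return None on failure, mirroring &quot;mysql == NULL&quot; in the log.&lt;br /&gt;

```python
import time


def connect_with_retry(connect, attempts=5, delay_s=3.0, sleep=time.sleep):
    """Call connect() up to `attempts` times, sleeping delay_s between tries.

    `connect` is a hypothetical callable that returns a connection object,
    or None on failure (mirroring "mysql == NULL" in the log).
    Returns the connection, or None after the last failed attempt.
    """
    for attempt in range(1, attempts + 1):
        conn = connect()
        if conn is not None:
            return conn
        print("dbConnect attempt %d of %d failed" % (attempt, attempts))
        if attempt != attempts:
            sleep(delay_s)  # no sleep after the final attempt
    return None  # give up; the caller must handle None instead of using it


# Illustration: a fake connection that fails twice and then succeeds.
# The sleep is replaced by a no-op so the demo returns immediately.
outcomes = iter([None, None, "connection"])
print(connect_with_retry(lambda: next(outcomes), sleep=lambda s: None))
```

The important part is the last line of the function: after giving up, the caller has to check the result before issuing a query, otherwise the crash seen above would simply happen later.&lt;br /&gt;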
&lt;br /&gt;
* Sergey B. 14-mar-2010: if abort and reset are hit during the prestart stage, all ROCs keep printing&lt;br /&gt;
 wait: coda request &amp;gt;exit&amp;lt; in progress&lt;br /&gt;
indefinitely, must check&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 14-mar-2010: coda_eb x-term hung completely; killed coda_eb but the yellow window stayed stuck. Last messages were:&lt;br /&gt;
&lt;br /&gt;
 codaEnd 10&lt;br /&gt;
 codaEnd 11&lt;br /&gt;
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;
 codaExecute done&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.28&amp;lt; port=59319&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 Executing &amp;gt;&amp;lt; (len=0)&lt;br /&gt;
 codaExecute: ERROR: len=0 - do nothing&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.28&amp;lt; port=59342&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 Executing &amp;gt;TP/1.0&lt;br /&gt;
&lt;br /&gt;
The last string probably contains something; will remove the printing.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 13-mar-2010: sometimes after a bad crash all ROCs (even scalers) must be rebooted, otherwise EB crashes with messages:&lt;br /&gt;
 .....&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
 case 0: no swap&lt;br /&gt;
  .. done.&lt;br /&gt;
 [0] FATAL: Event (Num 1 type 1) NUMBER mismatch -- roc[11] (rocid 30) &lt;br /&gt;
 sent -1 (type 18)&lt;br /&gt;
 [0] ERROR: Discard data until next control event&lt;br /&gt;
 clondaq1:coda_eb&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that the problem stayed; restarting every DAQ component (EB, ER, ETs, runcontrol, rcServer) did not help. Did daq_exit, daq_start, and rebooted all ROCs again; now it works.&lt;br /&gt;
&lt;br /&gt;
At the same time the ROCs were printing &#039;no data&#039; messages; maybe that was the source of the EB troubles ...&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 12-mar-2010 around 10:50am: something is breaking coda_eb/coda_er:&lt;br /&gt;
&lt;br /&gt;
 .....&lt;br /&gt;
 UDP_standard_request &amp;gt;sta:EB1 booted&amp;lt;&lt;br /&gt;
 UDP_standard_request &amp;gt;sta:EB1 booted&amp;lt;&lt;br /&gt;
 UDP_standard_request &amp;gt;sta:EB1 booted&amp;lt;&lt;br /&gt;
 UDP_standard_request &amp;gt;sta:EB1 booted&amp;lt;&lt;br /&gt;
 UDP_cancel: cancel &amp;gt;sta:EB1 booted&amp;lt;&lt;br /&gt;
 codaUpdateStatus: updating request done&lt;br /&gt;
 CODA_Init 14&lt;br /&gt;
 2&lt;br /&gt;
 clasprod::EB1&amp;gt; bind on port 5001&lt;br /&gt;
 DB update: &amp;gt;UPDATE process SET inuse=&#039;5001&#039; WHERE name=&#039;EB1&#039;&amp;lt;&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.28&amp;lt; port=37258&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 wait: coda request &amp;gt;&amp;lt; in progress&lt;br /&gt;
 Executing &amp;gt;&amp;lt; (len=0)&lt;br /&gt;
 Segmentation fault (core dumped)&lt;br /&gt;
 clondaq1:coda_eb&amp;gt; &lt;br /&gt;
&lt;br /&gt;
At the same time &#039;&#039;et_start&#039;&#039; complains (the ET system is now protected against CC scans):&lt;br /&gt;
&lt;br /&gt;
 et ERROR: et_netserver: ET server being probed by non-ET client or read failure&lt;br /&gt;
 et ERROR: et_netserver: ET server being probed by non-ET client or read failure&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;FIXED&#039;&#039;&#039; by checking for len=0 in the message string; all other possible checks were already implemented.&lt;br /&gt;
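The kind of guard described in this fix can be sketched as below. This is not the coda_eb source: handle_request and the execute callable are hypothetical names, and the printed text simply copies the &quot;len=0 - do nothing&quot; line seen in the 14-mar-2010 log.&lt;br /&gt;

```python
def handle_request(message, execute):
    """Run execute(message) unless the message is missing or empty.

    `execute` is a hypothetical stand-in for the real command handler.
    Returns True if the command was executed, False if it was skipped.
    """
    if message is None or len(message) == 0:
        # An empty request (for example from a port scan) is ignored
        # instead of being passed on to the command handler.
        print("codaExecute: ERROR: len=0 - do nothing")
        return False
    execute(message)
    return True


# Illustration: the empty request is skipped, the real one is executed.
executed = []
handle_request("", executed.append)
handle_request("prestart", executed.append)
print(executed)
```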
&lt;br /&gt;
* Sergey B. 12-mar-2010: after a global reboot almost all PMCs (except the first two) fail on MySQL access:&lt;br /&gt;
&lt;br /&gt;
 Done executing startup script $CODA/VXWORKS_ppc/bootscripts/boot_pmc1&lt;br /&gt;
 -&amp;gt; INFO(mysql_real_connect9): errno=0&lt;br /&gt;
 INFO(mysql_real_connect9): OK&lt;br /&gt;
 mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query)&lt;br /&gt;
 dbConnect ERROR: mysql == NULL&lt;br /&gt;
 program&lt;br /&gt;
 Exception current instruction address: 0x00000000&lt;br /&gt;
 Machine Status Register: 0x0008b030&lt;br /&gt;
 Condition Register: 0x40000085&lt;br /&gt;
 Task: 0x1e404190 &amp;quot;coda_pmc&amp;quot;&lt;br /&gt;
&lt;br /&gt;
It should be noted that the system is in use right now; the day before there was no activity and everything rebooted fine.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 1-sep-2009: during croctest1 reboot:&lt;br /&gt;
&lt;br /&gt;
 INFO(mysql_real_connect9): errno=0&lt;br /&gt;
 INFO(mysql_real_connect9): OK&lt;br /&gt;
 mysql_real_connect9: error message: 2013/HY000 (Lost connection to MySQL server during query)&lt;br /&gt;
 dbConnect ERROR: mysql == NULL&lt;br /&gt;
&lt;br /&gt;
second reboot went fine&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 15-june-2009: just after the run ended, noticed that all runs starting from 60073 have &#039;No configuration!&#039; instead of &#039;PROD&#039; etc. Correcting manually. Must check whether the other information is correct, and understand the reason if possible. Is it related to the database update procedure implemented around that time?&lt;br /&gt;
It appears that the begin_clasprod_xxxxxx.txt log files have it wrong starting from run 60073.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 11-June-2009: swapped ec4 and sc2, and during sc2 boot see the following from the host:&lt;br /&gt;
&lt;br /&gt;
 ..........................&lt;br /&gt;
 taskDelay(sysClkRateGet()*5)&lt;br /&gt;
  Args = -session clasprod -objects sc2 ROC -i&lt;br /&gt;
 CODA_Init reached&lt;br /&gt;
 CODA_Init 1&lt;br /&gt;
 CODA_Init 2&lt;br /&gt;
 sc2&lt;br /&gt;
 CODA_Init 11&lt;br /&gt;
 CODA_Init: objectTy &amp;gt;(null)&amp;lt;&lt;br /&gt;
 CODA_Init 12&lt;br /&gt;
 CODA_Init: use &#039;SESSION&#039; as &amp;gt;clasprod&amp;lt;&lt;br /&gt;
 Tcl_AppInit CALLS !!!!!&lt;br /&gt;
 11-11&lt;br /&gt;
 11-12&lt;br /&gt;
 11-13&lt;br /&gt;
 11111111111111111&lt;br /&gt;
 22222222222222222&lt;br /&gt;
 0xdd0e880 (tNetTask): arptnew failed on 8139a743&lt;br /&gt;
 value = 0 = 0x0&lt;br /&gt;
 taskSpawn &amp;quot;TCP_SERVER&amp;quot;,250,0,100000,tcpServer&lt;br /&gt;
 value = 231091088 = 0xdc62b90&lt;br /&gt;
 proconhost&lt;br /&gt;
 value = 0 = 0x0&lt;br /&gt;
 ......................&lt;br /&gt;
&lt;br /&gt;
Second reboot shows the same:&lt;br /&gt;
&lt;br /&gt;
 ...................&lt;br /&gt;
 22222222222222222&lt;br /&gt;
 0xdd0e880 (tNetTask): arptnew failed on 8139a743&lt;br /&gt;
 &amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;clon10 clasrun 2508 9998&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; 24&lt;br /&gt;
 list &amp;gt;clon10 clasrun 2508 9998&amp;lt;&lt;br /&gt;
 machine name = clon10&lt;br /&gt;
 user ID      = 2508&lt;br /&gt;
 group ID     = 9998&lt;br /&gt;
 .................&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. around 6-June-2009: ec2pmc1 error message:&lt;br /&gt;
&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print: 255115 20&lt;br /&gt;
 write_thread: about to print: 255115 20&lt;br /&gt;
 write_thread: about to print: 255115 20&lt;br /&gt;
 write_thread: wait=    199 send=      8 microsec per event (nev=12755)&lt;br /&gt;
 CALLING customized :&lt;br /&gt;
 machine check &lt;br /&gt;
 SRAM ERROR :&lt;br /&gt;
 SRAM Error Address    : 0x00000000.f2000504&lt;br /&gt;
 SRAM Error Data Low   : 0x006805e0&lt;br /&gt;
 SRAM Error Data High  : 0x00420000&lt;br /&gt;
 SRAM Error Parity     : 0x00000004&lt;br /&gt;
 SRAM Error Cause      : 0x00000001&lt;br /&gt;
 PCI_0 DEVICE ERROR :&lt;br /&gt;
 PCI_0 Error: Status=0x00000100&lt;br /&gt;
 Master abort&lt;br /&gt;
 PCI Cmd=7 (MemWr)  ByteEnable=0  Par=0&lt;br /&gt;
 Error Address High: 0x00000000&lt;br /&gt;
 Error Address Low : 0xf0d10128&lt;br /&gt;
 CALLING generic    :&lt;br /&gt;
 machine check&lt;br /&gt;
 Exception next instruction address: 0x0001f058&lt;br /&gt;
 Machine Status Register: 0x0012b030&lt;br /&gt;
 Condition Register: 0x40222042&lt;br /&gt;
 Task: 0x1e3b0020 &amp;quot;coda_net&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Reboot did not help; had to turn the PMC off and run on the host only.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 4-June-2009: when rebooting all ROCs, some cannot connect to the DB; as a result we get the following:&lt;br /&gt;
&lt;br /&gt;
 0x1e42a360 (coda_pmc): bb_new: &#039;big&#039; buffer &lt;br /&gt;
 created (addr=0x1eba3590, 16 bufs, 3145728 size)&lt;br /&gt;
 coda_pmc: big buffer1 allocated&lt;br /&gt;
 myname &amp;gt;ec1pmc1&amp;lt;&lt;br /&gt;
 Done executing startup script $CODA/VXWORKS_ppc/bootscripts/boot_pmc1&lt;br /&gt;
 -&amp;gt; INFO(mysql_real_connect9): errno=0&lt;br /&gt;
 INFO(mysql_real_connect9): OK&lt;br /&gt;
 program&lt;br /&gt;
 Exception current instruction address: 0x00000000&lt;br /&gt;
 Machine Status Register: 0x0008b030&lt;br /&gt;
 Condition Register: 0x40000085&lt;br /&gt;
 Task: 0x1e42a360 &amp;quot;coda_pmc&amp;quot;&lt;br /&gt;
 -&amp;gt; tt&lt;br /&gt;
 14cad8 vxTaskEntry    +68 : coda_pmc ()&lt;br /&gt;
 1e2275c0 coda_pmc       +144: mysql_query ()&lt;br /&gt;
 1e258520 mysql_query    +54 : mysql_real_query ()&lt;br /&gt;
 1e249850 mysql_real_query+110: mysql_send_query ()&lt;br /&gt;
 1e249718 mysql_send_query+1d0: 0 ()&lt;br /&gt;
 value = 0 = 0x0&lt;br /&gt;
 -&amp;gt; &lt;br /&gt;
&lt;br /&gt;
For some reason it affects mostly PMCs. The first 10 or so PMCs booted fine, then a few showed that error, then the next few booted fine again, and so on. It looks like the DB does not respond right away, and the PMC exits on timeout and does not try again. Should fix that place.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 15-may-2009: last night the following error messages from sc2 were reportedly associated with 2 crashes (I was not on shift):&lt;br /&gt;
&lt;br /&gt;
 write_thread: about to print: 394603 20&lt;br /&gt;
 write_thread: about to print: 394603 20&lt;br /&gt;
 write_thread: wait=    176 send=      6 microsec per event (nev=19730)&lt;br /&gt;
 0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=11)&lt;br /&gt;
 proc_thread: wait=    176 send=     12 microsec per event (nev=14534)&lt;br /&gt;
 0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=19)&lt;br /&gt;
 0x1b1c7e00 (coda_proc): timer: 9 microsec (min=3 max=909 rms**2=17)&lt;br /&gt;
 bosMgetid: ERROR: lenn=0&lt;br /&gt;
 ???: 0x52432d32 0x38363333 0x31313534 0x00696d3d 0x25640a00&lt;br /&gt;
 bosMgetid: ERROR: name &amp;gt;RC-2&amp;lt; does not described, update clonbanks.ddl file !!!&lt;br /&gt;
 bosMgetid: lenn=4, len=4, nddl1=116&lt;br /&gt;
 bosMgetid: name=&amp;gt;RC-286331154&amp;lt;&lt;br /&gt;
 try again ..&lt;br /&gt;
 bosMgetid: ERROR: lenn=0&lt;br /&gt;
 !!!: 0x52432d32 0x38363333 0x31313534 0x00696d3d 0x25640a00&lt;br /&gt;
 bosMgetid: lenn=4, len=4&lt;br /&gt;
 bosMgetid: ERROR: name &amp;gt;RC-2&amp;lt; does not described, update clonbanks.ddl file !!!&lt;br /&gt;
 bosMgetid: lenn=4, len=4, nddl1=116&lt;br /&gt;
 bosMgetid: name=&amp;gt;RC-286331154&amp;lt;&lt;br /&gt;
 no way !!&lt;br /&gt;
 bosMlink: ERROR: bosMgetid returns -99&lt;br /&gt;
&lt;br /&gt;
 interrupt: timer: 48 microsec (min=9 max=195 rms**2=9)&lt;br /&gt;
 interrupt: timer: 45 microsec (min=9 max=195 rms**2=18)&lt;br /&gt;
 interrupt: timer: 45 microsec (min=9 max=195 rms**2=10)&lt;br /&gt;
 interrupt: timer: 48 microsec (min=9 max=195 rms**2=14)&lt;br /&gt;
 setHeartError: 0 &amp;gt;sys 0, mask 14&amp;lt;&lt;br /&gt;
 WARN: HeartBeat[0]: heartbeat=6658(6658) heartmask=14&lt;br /&gt;
 UDP_cancel: cancel &amp;gt;inf:sc2 sys 0, mask 14&amp;lt;&lt;br /&gt;
 UDP_cancel: cancel &amp;gt;inf:sc2 sys 0, mask 14&amp;lt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 14-May-2009 5:54am: got a page from clascron@clon10: &#039;Process monitor: missing process: alarm_server - Online problem: alarm_server did not respond to status pool request&#039;. Found later that stadis was not updating etc. It appears that the smartsockets server was somehow hung on clondb1; for example, the command &#039;ipc_info -a clasprod&#039; from clon10 or clon00 did not work. The messages were:&lt;br /&gt;
&lt;br /&gt;
 clon00:clasrun&amp;gt; ipc_info -a clasprod&lt;br /&gt;
 08:08:31: TAL-SS-00088-I Connecting to project &amp;lt;clasprod&amp;gt; on &amp;lt;local&amp;gt; RTserver&lt;br /&gt;
 08:08:31: TAL-SS-00089-I Using local protocol&lt;br /&gt;
 08:08:31: TAL-SS-00090-I Could not connect to &amp;lt;local&amp;gt; RTserver&lt;br /&gt;
 08:08:31: TAL-SS-00093-I Skipping starting &amp;lt;start_never:local&amp;gt; RTserver&lt;br /&gt;
 08:08:31: TAL-SS-00088-I Connecting to project &amp;lt;clasprod&amp;gt; on &amp;lt;clondb1&amp;gt; RTserver&lt;br /&gt;
 08:08:31: TAL-SS-00089-I Using local protocol&lt;br /&gt;
 08:08:31: TAL-SS-00090-I Could not connect to &amp;lt;clondb1&amp;gt; RTserver&lt;br /&gt;
 08:08:31: TAL-SS-00088-I Connecting to project &amp;lt;clasprod&amp;gt; on &amp;lt;clondb1&amp;gt; RTserver&lt;br /&gt;
 08:08:31: TAL-SS-00089-I Using tcp protocol&lt;br /&gt;
 ^C&lt;br /&gt;
&lt;br /&gt;
and it waits here. After restarting the server on clondb1 with the commands&lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/smartsockets stop&lt;br /&gt;
 /etc/init.d/smartsockets start&lt;br /&gt;
&lt;br /&gt;
everything came back to normal. We should think about detecting that problem and restarting the server automatically, although it happens very rarely, a few times a year. It can probably be done by monitoring CPU usage: when the server was hung it was using 100% of CPU:&lt;br /&gt;
&lt;br /&gt;
 top - 08:06:16 up 130 days,  8:59,  1 user,  load average: 2.13, 2.03, 1.64&lt;br /&gt;
 Tasks:  92 total,   2 running,  90 sleeping,   0 stopped,   0 zombie&lt;br /&gt;
 Cpu(s): 25.0% us,  0.1% sy,  0.0% ni, 73.0% id,  1.9% wa,  0.0% hi,  0.1% si&lt;br /&gt;
 Mem:   3995356k total,  3968880k used,    26476k free,    88168k buffers&lt;br /&gt;
 Swap:  8385920k total,      208k used,  8385712k free,  3432056k cached&lt;br /&gt;
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND&lt;br /&gt;
  4471 root      25   0  5908 4540 1956 R  100  0.1 300:24.06 rtserver.x&lt;br /&gt;
     1 root      16   0  4756  556  460 S    0  0.0   0:11.25 init&lt;br /&gt;
     2 root      RT   0     0    0    0 S    0  0.0   0:03.85 migration/0&lt;br /&gt;
     3 root      34  19     0    0    0 S    0  0.0   0:10.23 ksoftirqd/0&lt;br /&gt;
     4 root      RT   0     0    0    0 S    0  0.0   0:01.79 migration/1&lt;br /&gt;
     5 root      34  19     0    0    0 S    0  0.0   1:03.72 ksoftirqd/1&lt;br /&gt;
     6 root      RT   0     0    0    0 S    0  0.0   0:01.62 migration/2&lt;br /&gt;
&lt;br /&gt;
while normally it uses a fraction of a percent. It would also be useful to monitor CPU usage in general: a few days ago clasrun&#039;s sshd on clon10 was taking almost 50% of CPU.&lt;br /&gt;
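A rough sketch of the CPU-based detection idea, under stated assumptions: the input is two-column text (%CPU, command) with one header line, however it is produced, and the 90% threshold and 3 consecutive samples are arbitrary illustrative values. The names cpu_by_command and looks_hung are hypothetical; nothing like this exists on clondb1 yet.&lt;br /&gt;

```python
def cpu_by_command(text):
    """Parse two-column text (%CPU, command) into {command: total %CPU}.

    The first line is assumed to be a header and is skipped.
    """
    usage = {}
    for line in text.splitlines()[1:]:
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        pcpu, command = fields[0], fields[1].strip()
        usage[command] = usage.get(command, 0.0) + float(pcpu)
    return usage


def looks_hung(samples, threshold=90.0, min_consecutive=3):
    """True if the last min_consecutive samples are all >= threshold."""
    recent = samples[-min_consecutive:]
    return len(recent) == min_consecutive and all(s >= threshold for s in recent)


# Illustration with numbers like those in the top output.
snapshot = "%CPU COMMAND\n 100 rtserver.x\n 0.0 init\n"
history = [0.2, 100.0, 100.0, cpu_by_command(snapshot)["rtserver.x"]]
print(looks_hung(history))
```

Requiring several consecutive high samples avoids restarting the server on a short burst; only when looks_hung returns True would the smartsockets stop/start commands be issued.&lt;br /&gt;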
&lt;br /&gt;
 &lt;br /&gt;
* Sergey B. 12-may-2009: looking for the reason for missing gate messages in all DC crates, we found that the front-end busy cable from SF to FC was disconnected on the floor. For the future we should remember that disconnecting that cable may cause missing gate messages. It is also important before every run to use the &#039;dc_all&#039; configuration and look at the scope in the counting house to make sure the front-end busy signal is there.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 16-feb-2009: runcontrol pop-up message while pressing &#039;Configure&#039;:&lt;br /&gt;
 rcConfigure::loadRcDbaseCbk: Loading database failed !!!&lt;br /&gt;
On second &#039;Configure&#039; click it worked.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. June-2008: it looks like when one SILO drive goes bad, something goes wrong in the script&#039;s logic and it never tries to run 2 streams, only one! One .temp link can be observed, along with one presilo1 link which is NOT becoming another .temp. Need to check !!!&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 7-may-2008: EB1 crashed, last messages:&lt;br /&gt;
&lt;br /&gt;
 EB: WARNING - resyncronization in crate controller number  5 ...  fixed&lt;br /&gt;
 .&lt;br /&gt;
 EB: WARNING - resyncronization in crate controller number  6 ...  fixed&lt;br /&gt;
 .&lt;br /&gt;
 EB: WARNING - resyncronization in crate controller number  6 ...  fixed&lt;br /&gt;
 .&lt;br /&gt;
 ERROR: lfmt=0 bankid=0&lt;br /&gt;
&lt;br /&gt;
Found that tage2 and tage3 must be rebooted&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 5-may-2008: polar crate was turned off; turned it back on, but noticed that one of the str7201 scalers had all lights on. The scaler was replaced within 2 days, and everything works fine.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 30-apr-2008: gam_server on clon04 takes 100% CPU&lt;br /&gt;
&lt;br /&gt;
Found on the web that creating the file &#039;/etc/gamin/gaminrc&#039; with the following contents should help:&lt;br /&gt;
&lt;br /&gt;
 #configuration for gamin&lt;br /&gt;
 # Can be used to override the default behaviour.&lt;br /&gt;
 # notify filepath(s) : indicate to use kernel notification&lt;br /&gt;
 # poll filepath(s)   : indicate to use polling instead&lt;br /&gt;
 # fsset fsname method poll_limit : indicate what method of notification for the filesystem&lt;br /&gt;
 #                                  kernel - use the kernel for notification&lt;br /&gt;
 #                                  poll - use polling for notification&lt;br /&gt;
 #                                  none - don&#039;t use any notification&lt;br /&gt;
 #&lt;br /&gt;
 #                                  the poll_limit is the number of seconds&lt;br /&gt;
 #                                  that must pass before a resource is polled again.&lt;br /&gt;
 #                                  It is optional, and if it is not present the previous&lt;br /&gt;
 #                                  value will be used or the default.&lt;br /&gt;
 fsset nfs poll 10&lt;br /&gt;
 # use polling on nfs mounts and poll once every 10 seconds&lt;br /&gt;
 # This will limit polling to every 10 seconds and seams to prevent it from running away&lt;br /&gt;
&lt;br /&gt;
Created file, will see if it helped ..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 20-apr-2008: sc2pmc1 error message (during &#039;end run&#039; ???)&lt;br /&gt;
&lt;br /&gt;
 .................................................&lt;br /&gt;
 write_thread: wait=    135 send=     28 microsec per event (nev=12351)&lt;br /&gt;
 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=4)&lt;br /&gt;
 proc_thread: wait=    130 send=     31 microsec per event (nev=14709)&lt;br /&gt;
 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=10)&lt;br /&gt;
 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=15)&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print: 252148 20&lt;br /&gt;
 write_thread: about to print: 252148 20&lt;br /&gt;
 write_thread: about to print: 252148 20&lt;br /&gt;
 write_thread: wait=    135 send=     31 microsec per event (nev=12607)&lt;br /&gt;
 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)&lt;br /&gt;
 proc_thread: wait=    135 send=     31 microsec per event (nev=14739)&lt;br /&gt;
 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=19)&lt;br /&gt;
 0x19c18050 (coda_proc): timer: 23 microsec (min=8 max=2161 rms**2=14)&lt;br /&gt;
  ERROR: bufout overflow - skip the rest ...&lt;br /&gt;
         bufout=485322144 hit=485387636 endofbufout=485387680&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print&lt;br /&gt;
 write_thread: about to print: 255564 20&lt;br /&gt;
 write_thread: about to print: 255564 20&lt;br /&gt;
 write_thread: about to print: 255564 20&lt;br /&gt;
 write_thread: wait=    128 send=     42 microsec per event (nev=12778)&lt;br /&gt;
 proc_thread: wait=    371 send=     31 microsec per event (nev=6291)&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000140 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x0150107F -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x015810DC -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x02B80FD6 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x02A010BE -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 .................................................. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
At the same time at sc2:&lt;br /&gt;
&lt;br /&gt;
 ......................................&lt;br /&gt;
 interrupt: timer: 33 microsec (min=6 max=401 rms**2=14)&lt;br /&gt;
 interrupt: timer: 33 microsec (min=6 max=401 rms**2=4)&lt;br /&gt;
 interrupt: timer: 33 microsec (min=6 max=401 rms**2=17)&lt;br /&gt;
 interrupt: timer: 33 microsec (min=6 max=401 rms**2=1)&lt;br /&gt;
 interrupt: timer: 33 microsec (min=6 max=401 rms**2=17)&lt;br /&gt;
 interrupt: timer: 33 microsec (min=6 max=401 rms**2=0)&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 ................................&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 logTask: 220 log messages lost.&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 logTask: 692 log messages lost.&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 logTask: 544 log messages lost.&lt;br /&gt;
 ..................................&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=1, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 logTask: 562 log messages lost.&lt;br /&gt;
 interrupt: SYNC: ERROR: [ 0] slot=16 error_flag=1 - clear&lt;br /&gt;
 logTask: 707 log messages lost.&lt;br /&gt;
 interrupt: SYNC: ERROR: [ 3] slot=19 error_flag=1 - clear&lt;br /&gt;
 logTask: 517 log messages lost.&lt;br /&gt;
 interrupt: SYNC: ERROR: [ 4] slot=20 error_flag=1 - clear&lt;br /&gt;
 logTask: 620 log messages lost.&lt;br /&gt;
 interrupt: SYNC: ERROR: [ 5] slot=21 error_flag=1 - clear&lt;br /&gt;
 logTask: 559 log messages lost.&lt;br /&gt;
 interrupt: SYNC: scan_flag=0x00390000&lt;br /&gt;
 logTask: 512 log messages lost.&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 logTask: 671 log messages lost.&lt;br /&gt;
 ......................................&lt;br /&gt;
 logTask: 563 log messages lost.&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 logTask: 385 log messages lost.&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 codaExecute reached, message &amp;gt;end&amp;lt;, len=3&lt;br /&gt;
 codaExecute: &#039;end&#039; transition&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 .......................................&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 interrupt: WARN: [ 0] slotnums=0, tdcslot=16 -&amp;gt; use slotnums&lt;br /&gt;
 interrupt: TRIGGER ERROR: no pool buffer available&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 .....................................&lt;br /&gt;
&lt;br /&gt;
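One possible reading of the scan_flag value in the log above, offered as an assumption that this entry does not confirm: if scan_flag is a bit mask with one bit per slot number, then 0x00390000 has bits 16, 19, 20 and 21 set, which are the same slots that reported error_flag=1. The helper name set_bits is hypothetical and is used only for this sketch.&lt;br /&gt;

```python
# Assumption: scan_flag is a bit mask with one bit per slot number.
# set_bits is a hypothetical helper, not part of any rol or library.
def set_bits(mask):
    """Return the positions of the bits set in `mask`, lowest first."""
    return [bit for bit in range(32) if (mask // 2**bit) % 2 == 1]


# Value copied from the "interrupt: SYNC: scan_flag=0x00390000" log line.
slots = set_bits(0x00390000)
print(slots)
```

If the assumption holds, decoding the flag this way gives a quick cross-check of which slots raised a SYNC error even when logTask drops messages.&lt;br /&gt;
&lt;br /&gt;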
Note: sc2 never warned about a wrong slot number before that moment.&lt;br /&gt;
&lt;br /&gt;
FIX: there appears to be a bug in the 1190/1290-related rols and library: NBOARDS was set to 21 and several arrays were allocated with that length, but the actual maximum board number is also 21, so indexing by board number needs 22 elements. NBOARDS was set to 22 everywhere; will test next time.&lt;br /&gt;
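&lt;br /&gt;
The off-by-one described in the FIX note can be illustrated with a small sketch. Python is used here only for illustration, since the rol and library sources are not shown in this entry; the names make_slot_table and store_by_slot are hypothetical. A table indexed directly by board number needs one more element than the highest number: number 21 needs length 22. In a language without bounds checking the out-of-range write would silently overwrite neighbouring memory, whereas this sketch reports the failure.&lt;br /&gt;

```python
# Hypothetical illustration of the NBOARDS off-by-one; not the actual rol code.
MAX_SLOT = 21  # highest slot number seen in the log above (slot=21)


def make_slot_table(nboards):
    """Return a per-slot table with `nboards` entries (indices 0..nboards-1)."""
    return [0] * nboards


def store_by_slot(table, slot, value):
    """Store `value` at index `slot`; return False if the slot does not fit."""
    try:
        table[slot] = value
    except IndexError:
        return False
    return True


# NBOARDS = 21: indices 0..20 only, so slot 21 does not fit.
too_small = store_by_slot(make_slot_table(21), MAX_SLOT, 1)
# NBOARDS = 22: indices 0..21, so slot 21 fits.
large_enough = store_by_slot(make_slot_table(MAX_SLOT + 1), MAX_SLOT, 1)
print(too_small, large_enough)
```

Sizing the table as MAX_SLOT + 1 rather than hard-coding 22 keeps the array length tied to the highest slot number, which is one way to avoid repeating the mistake if the crate layout changes.&lt;br /&gt;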
&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 31-mar-2008: CC scans CODA !!! messages from EB:&lt;br /&gt;
&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=59063&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=60187&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=60295&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=60418&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=60823&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=32961&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=33122&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=33795&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=34251&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=34529&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=35811&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=36134&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=41569&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=53913&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=57701&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=33357&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=36136&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=39509&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=39900&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=45174&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=45909&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=34521&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=49023&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=49526&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=53555&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=55694&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=56833&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=58251&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=59023&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=33687&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=35731&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=51581&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=18, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=41371&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=6, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=43198&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=10, must be 1032&lt;br /&gt;
 CODAtcpServer: start work thread&lt;br /&gt;
 befor: socket=5 address&amp;gt;129.57.71.38&amp;lt; port=55260&lt;br /&gt;
 wait: coda request in progress&lt;br /&gt;
 Error(old rc): nRead=0, must be 1032&lt;br /&gt;
 clasprod::EB1&amp;gt; ^C&lt;br /&gt;
&lt;br /&gt;
Similar messages were coming from ER.&lt;br /&gt;
&lt;br /&gt;
ET reported the following:&lt;br /&gt;
&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure  &lt;br /&gt;
&lt;br /&gt;
and another ET:&lt;br /&gt;
 .......&lt;br /&gt;
 et_start: pthread_create(0x0000000d,...) done&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 TCP server got a connection so spawn thread&lt;br /&gt;
 et ERROR: et_client_thread: read failure&lt;br /&gt;
 et INFO: et_sys_heartmonitor, kill bad process (2,3063)&lt;br /&gt;
 et INFO: et_sys_heartmonitor, cleanup process 2&lt;br /&gt;
 et INFO: set_fix_nprocs, change # of ET processes from 2 to 2&lt;br /&gt;
 et INFO: set_fix_natts, station GRAND_CENTRAL has 0 attachments&lt;br /&gt;
 et INFO: set_fix_natts, station LEVEL3 has 1 attachments&lt;br /&gt;
 et INFO: set_fix_natts, station ET2ER has 1 attachments&lt;br /&gt;
 et INFO: set_fix_natts, # total attachments 2 -&amp;gt; 2&lt;br /&gt;
 et INFO: set_fix_natts, proc 0 has 1 attachments&lt;br /&gt;
 et INFO: set_fix_natts, proc 1 has 1 attachments &lt;br /&gt;
&lt;br /&gt;
Also noticed that the two CODAs (production and test setup) interfere with each other; the following was reported during &#039;configure&#039;:&lt;br /&gt;
&lt;br /&gt;
  Query test_ts2 table failed: Error reading table&#039;test_ts2&#039; definition&lt;br /&gt;
         ec3                                        ec3&lt;br /&gt;
&lt;br /&gt;
If one runcontrol is restarted, it works, but then the other one complains !!!&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 31-mar-2008: &#039;run_log_comment.tcl&#039; and &#039;rlComment&#039; do not recognize XML comments when extracting the level1 trigger file name from the &amp;lt;l1trig&amp;gt; tag; they seem to take the first line after the &amp;lt;l1trig&amp;gt; tag without checking for the &amp;quot;&amp;lt;!-- xxx --&amp;gt;&amp;quot; comment markers.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 10-mar-2008: found what looks like an error in run control: in Xui/src.s/rcMenuWindow.cc the parameters
XmNpaneMinimum and XmNpaneMaximum were both set to 480; as a result, the run control GUI area above the log messages window
was not big enough. Set them to 100 and 900 respectively; will ask Jie.&lt;br /&gt;
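&lt;br /&gt;
A minimal model of why equal minimum and maximum values pin the pane size, assuming the pane height is simply limited to the range between the two parameters. This is not Motif code, and the function name clamp_pane_height is hypothetical.&lt;br /&gt;

```python
# Hypothetical model of a pane height constraint; this is not Motif code.
def clamp_pane_height(requested, pane_min, pane_max):
    """Return the height a pane gets when limited to [pane_min, pane_max]."""
    return max(pane_min, min(pane_max, requested))


# Old setting: minimum and maximum both 480, so every request becomes 480.
old = [clamp_pane_height(h, 480, 480) for h in (200, 480, 700)]
# New setting: 100 and 900 leave room to resize the pane.
new = [clamp_pane_height(h, 100, 900) for h in (200, 480, 700)]
print(old)
print(new)
```

With both limits at 480 the three requested heights all come back as 480, while the 100 and 900 limits return them unchanged.&lt;br /&gt;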
&lt;br /&gt;
* 23-jan-2008: ER3 crashed several times during the last 3 weeks, mostly (only ?) during the &#039;End&#039; transition; today&#039;s core file:&lt;br /&gt;
 (dbx) where&lt;br /&gt;
   [1] 0xfe9e4c20(0x8068f9d), at 0xfe9e4c20 &lt;br /&gt;
   [2] codaExecute(0xce4fdbc0, 0xce4fdbc0, 0x1, 0x8068c39), at 0x8068f9d &lt;br /&gt;
   [3] CODAtcpServerWorkTask(0x811fa00, 0x0, 0x0, 0xce4fdff8, 0xfea60020, 0xfe591400), at 0x8068d3a &lt;br /&gt;
   [4] 0xfea5fd36(0xfe591400, 0x0, 0x0, ), at 0xfea5fd36 &lt;br /&gt;
   [5] 0xfea60020(), at 0xfea60020 &lt;br /&gt;
 (dbx)&lt;br /&gt;
&lt;br /&gt;
* 14-nov-2007: first week of running G9A: crashes observed in ec1 (twice), tage3, scaler1, clastrig2, EB; no further details have been obtained so far&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 3-nov-2007: after about 26 Mevents during the run, sc2pmc1 started to print the following:&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000040 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000080 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x03C80ADD -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x8A90097F -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00000060 -&amp;gt; resyncronize !!!&lt;br /&gt;
 ROC # 22 Event # 0 :  Bad Block Read signature 0x00680BD9 -&amp;gt; resyncronize !!!&lt;br /&gt;
End run failed; rebooted sc2.&lt;br /&gt;
During the end transition, ec2 froze with the message:&lt;br /&gt;
 interrupt: timer: 32 microsec (min=19 max=86 rms**2=18)&lt;br /&gt;
 0x1a05fdf0 (twork0005): sfiUserEnd: INFO: Last Event 26723663, status=0 (0x1ca648c8 0x1ca648c0)&lt;br /&gt;
 0x1a05fdf0 (twork0005): data: 0x00000003 0x0007014f 0x00120000 0x00000000 0xc8009181 0xc0001181&lt;br /&gt;
 0x1a05fdf0 (twork0005): jw1 : 0x00000000 0x0197c54f 0x00000003 0x0007014f 0x00120000 0x00000000&lt;br /&gt;
 0x1a05fdf0 (twork0005): Last DMA status = 0x200000b count=11 blen=11&lt;br /&gt;
 0x1a05fdf0 (twork0006): sfiUserEnd: ERROR: Last Transfer Event NUMBER 26723663, status = 0x1a000 (0x90001181 0x88001181 0x80009181  0x78001181)&lt;br /&gt;
 0x1a05fdf0 (twork0006): SFI_SEQ_ERR: Sequencer not Enabled&lt;br /&gt;
Rebooted ec2. Started new run 55463; everything looks normal.&lt;br /&gt;
&lt;br /&gt;
* Sergey B. 2-nov-2007: during the run ec2 started to print on the tsconnect screen:&lt;br /&gt;
 Unknown error errno=65&lt;br /&gt;
 Unknown error errno=65&lt;br /&gt;
 Unknown error errno=65&lt;br /&gt;
 Unknown error errno=65&lt;br /&gt;
 Unknown error errno=65&lt;br /&gt;
Data taking continued, but runcontrol printed the messages:&lt;br /&gt;
 WARN   : ec2 has not reported status for 1516 seconds&lt;br /&gt;
 ERROR  : ec2 is in state disconnected should be active&lt;br /&gt;
ec2pmc1 looked fine; end run failed, need to reboot ec2&lt;/div&gt;</summary>
		<author><name>129.57.167.109</name></author>
	</entry>
</feed>