# if ER holds, 'end' does not work properly

# ER on clonxt2 holds, no messages, 99% CPU usage in 'top' - remove usleep(1000)

# ec4 error messages

interrupt: ERR: event# 355808, slot# 19, tdc# 02, error_flags=0x00004000, err=0x9030, lock=0x643580
0x8b1d80 (rols_loop): timer: 221 microsec (min=21 max=12428 rms**2=8)

# rcn: 'differentiation failed' when switched to "PEDS_ALL" and clicked download;
  opens 2 file choosers looking for ER1 - restart rcn;
  it looks like every time we change the coda configuration we have that problem;
  usually reset-reconfigure works

# rcn: click end: "120 sec. waiting sc2 to postend" - it took 2 minutes to get it,
  although sc2 finished much faster

# rcn: click end: "120 sec. waiting ec1 to postend" - it took 2 minutes to get it,
  although ec1 finished much faster

# roc_status:
..............................................
it waits forever for rebooting rocs

* Well, I'm no Solaris maven, but I always thought "truss" was a
* syscall viewer... If so, then the equivalent under Linux is "strace"...
* (There's also "ltrace" for tracing library calls, rather than syscalls...)
* But, it sounds like you want a packet-sniffer? If so, there are
* several to choose from... One you probably already have
* installed on your distro of choice is tcpdump (http://www.tcpdump.org/)... Depending on
* the distro, you may also already have Ethereal (http://www.ethereal.com/) installed, which
* is simply the finest sniffer on the planet, IMHO... ;-)

trying to use 'strace' for coda_er - shows nothing
'ltrace' - coda_er crashed

# DC11 missing gate

interrupt: missing gate !!!
interrupt: MISG: spds_mask1 = 0x000fffe0, datascan = 0x000fefe0, ii=30
interrupt: =2=> cleanup buffers:
interrupt: slot 5
interrupt: slot 6
interrupt: slot 7
interrupt: slot 8
interrupt: slot 9
interrupt: slot 10
interrupt: slot 11
interrupt: slot 13
interrupt: slot 14
interrupt: slot 15
interrupt: slot 16
interrupt: slot 17
interrupt: slot 18
interrupt: slot 19
interrupt: SYNC:MisgMask = 0x00001000
interrupt: timer: 23 microsec (min=7 max=977 rms**2=8)

* top(clonxt1) vs top(clonxt2)

clonxt1==================================================================
Tasks: 110 total,   1 running, 109 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.8% us,  1.3% sy,  0.0% ni, 96.8% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   7857744k total,  2085484k used,  5772260k free,    84704k buffers
Swap:  8289532k total,        0k used,  8289532k free,  1658596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5904 clasrun   16   0 2609m 326m 162m S  6.0  4.3   5:54.93 coda_eb
 1491 clasrun   16   0  777m 174m 174m S  3.7  2.3  11:58.66 et_2_et
 8043 root      16   0  214m  17m 6648 S  1.0  0.2  12:23.22 java
15432 clasrun   16   0  790m 181m 181m S  1.0  2.4  20:20.29 coda_l3
22205 clasrun   22   0  927m 765m 764m S  0.3 10.0  16:34.45 et_start
    1 root      16   0  2192  556  476 S  0.0  0.0   0:00.94 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.98 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.98 migration/1
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/1
    6 root      RT   0     0    0    0 S  0.0  0.0   0:01.07 migration/2
    7 root      34  19     0    0    0 S  0.0  0.0   0:00.09 ksoftirqd/2
    8 root      RT   0     0    0    0 S  0.0  0.0   0:01.29 migration/3
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.08 ksoftirqd/3
   10 root       5 -10     0    0    0 S  0.0  0.0   0:00.01 events/0
clonxt1:clasrun>

clonxt2==================================================================
Tasks: 148 total,   1 running, 147 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.3% us, 17.8% sy,  0.0% ni, 74.2% id,  4.6% wa,  0.1% hi,  0.0% si
Mem:   8055976k total,  8038876k used,    17100k free,     4236k buffers
Swap:  8193140k total,      160k used,  8192980k free,  7835024k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
16814 clasrun   16   0  927m 765m 764m S 57.8  9.7 178:50.22 et_start
23710 clasrun   16   0  828m 163m 162m S 55.5  2.1  43:51.42 coda_er
22673 root      15   0 1260m  54m  13m S 44.2  0.7  53:18.44 java
   76 root      15   0     0    0    0 S  4.0  0.0  36:07.73 kswapd0
17343 clasrun   16   0  777m 158m 158m S  3.3  2.0  11:18.98 et_2_et
 2165 root      15   0     0    0    0 S  2.3  0.0   8:01.48 kjournald
    6 root      RT   0     0    0    0 S  0.3  0.0   0:09.83 migration/2
   12 root      RT   0     0    0    0 S  0.3  0.0   0:26.61 migration/5
 3407 root      15   0     0    0    0 S  0.3  0.0   2:48.45 pdflush
    1 root      16   0  1824  556  476 S  0.0  0.0   0:02.97 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.08 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:29.63 migration/1
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/1
    7 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/2
    8 root      RT   0     0    0    0 S  0.0  0.0   0:09.05 migration/3
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/3
clonxt2:clasrun>

# all 6 PMCs reported problem, must be rebooted:

..............
bb_cleanup_pci 1: 0xcbc71400
bb_cleanup_pci 2
roc_network +++++++++++++++++++++++++++++++++++++ 2
LINK_close: socket #10 connection closed
roc_network +++++++++++++++++++++++++++++++++++++ 3
WRITE THREAD EXIT
Prestart command received
port=38960 (0xcb7dc874) host >clonxt1-daq1<
LINK_establish: socket # 10
LINK_establish: socket buffer size is 48000(0x0000bb80) bytes
LINK_establish: keepAlive is 8
LINK_establish: socket 10 is ready: host 129.57.68.22 port 38960
socket=10
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignet at 0xcb7dc868, bignet.gbigBuffer at 0xcb7dc888 -> 0xcb7dc888
bb_read_pci: cleanup2: 1
write_thread: ERROR: bigbuf==NULL
roc_network +++++++++++++++++++++++++++++++++++++ 1
bb_cleanup_pci 0: 0xcb7dc888
bb_cleanup_pci 1: 0xcbc71400
bb_cleanup_pci 2
roc_network +++++++++++++++++++++++++++++++++++++ 2
LINK_close: socket #10 connection closed
roc_network +++++++++++++++++++++++++++++++++++++ 3
WRITE THREAD EXIT
Prestart command received
port=39236 (0xcb7dc874) host >clonxt1-daq1<
LINK_establish: socket # 10
LINK_establish: socket buffer size is 48000(0x0000bb80) bytes
LINK_establish: keepAlive is 8
ERRORRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !!!!!!!!!!!!!!
ERRORRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !!!!!!!!!!!!!!
ERRORRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !!!!!!!!!!!!!!
LINK_establish: connect failed: host 129.57.68.22 port 39236, ret=-1
ERRORRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !!!!!!!!!!!!!!
ERRORRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !!!!!!!!!!!!!!
ERRORRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !!!!!!!!!!!!!!
socket=0
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignet at 0xcb7dc868, bignet.gbigBuffer at 0xcb7dc888 -> 0xcb7dc888

Copyright Motorola Inc. 1999-2003, All Rights Reserved
MOTLoad RTOS Version 2.0, PAL Version 1.1 RM01
Built on Tue Sep 30 19:15:43 CST 2003 by awang
MPU-Type =MPC7447
MPU-Int Clock Speed =1000MHz
MPU-Ext Clock Speed =133MHz
MPU-Int Cache(L2) Enabled, 512KB, L2CR =C0000000
System Controller =Negated
............

# after about 33000 events rate drops to 0, all 6 PMCs report:

.............
socket=7
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignetptr=0xcb7dc868 offset=0xc0000000
bignet at 0xcb7dc868, bignet.gbigBuffer at 0xcb7dc888 -> 0xcb7dc888
ERROR1: LINK_sized_write() returns errno=32
ERROR: write_thread failed (in LINK_sized_write).
Go command received
............

ts2status:
==========

-> ts2status
CSR 1 (0x0):
  Go : 1
  Pause on Next scheduled Sync : 0
  Sync and Pause : 0
  Initiate Sync Event : 0
  Initiate Program 1 Event : 1
  Initiate Program 2 Event : 0
  Enable Level 1 (drives outputs) : 1
  Override Inhibit : 0
  Test Mode : 0
  Reserved : 0
  Reset : 0
  Initialize : 0
  Sync Event occurred : 1
  Program 1 Event occurred : 1
  Program 2 Event occurred : 0
  Late Fail occurred : 0
  Inhibit occurred : 1
  Write FIFO error occurred : 0
  Read FIFO error occurreds : 1
CSR 2 (0x4):
  Enable Scheduled Sync : 1
  Use Clear Permit Timer : 1
  Use Front Busy Timer : 1
  Use Clear Hold Timer : 1
  Use External Front Busy : 1
  Lock ROC Branch 1 : 0
  Lock ROC Branch 2 : 0
  Lock ROC Branch 3 : 0
  Lock ROC Branch 4 : 0
  Lock ROC Branch 5 : 0
  Enable Program 1 front panel input : 0
  Enable Interrupt : 1
  Enable local ROC (branch 5) : 1
Trigger Control Register (0x8): 0x00000581
ROC Enable Register (0xc) val=0xf07fff0f:
  Branch 1: 0x f bits: 00001111
  Branch 2: 0xff bits: 11111111
  Branch 3: 0x7f bits: 01111111
  Branch 4: 0xf0 bits: 11110000
Synchronization Interval Register (0x10): 1000
Trigger Word Count Register (0x14): 0
Trigger Data Register (0x18): 64
Local ROC (Branch 5) Data Register (0x1c): 28
  Synchronization Flag : 0
  Late Fail Flag : 0
  ROC code : 7
Input Trigger Prescale Registers:
  Input 1 Prescale Factor : 0
  Input 2 Prescale Factor : 0
  Input 3 Prescale Factor : 0
  Input 4 Prescale Factor : 0
  Input 5 Prescale Factor : 0
  Input 6 Prescale Factor : 0
  Input 7 Prescale Factor : 0
  Input 8 Prescale Factor : 40
Clear Permit Timer Register = 0
Level2 Accept Timer Register = 83
Level3 Accept Timer Register = 0
Front Busy Timer Register = 325
Clear Hold Timer Register = 100
Branch (1-4) ROC Buffer Status Register (0x58): 0x40864040
  Branch 1: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
  Branch 2: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
  Branch 3: Buffer Count = 6, Empty Flag = 0, Full Flag = 1
  Branch 4: Buffer Count = 0, Empty Flag = 1, Full Flag = 0
Local ROC (Branch 5) Buffer Status Register (0x5c): 0xffff0040
  Branch 5: Buffer Count = 0, Empty Flag = 1, Full Flag = 0,Local Acknowledge = 0, Local Event Strob Status = 0
ROC Acknowledge Status Register (0x60): val=0x3f0000
  Branch 1: 0x 0 bits: 00000000 (enabled: 00001111)
  Branch 2: 0x 0 bits: 00000000 (enabled: 11111111)
  Branch 3: 0x3f bits: 00111111 (enabled: 01111111)
  Branch 4: 0x 0 bits: 00000000 (enabled: 11110000)
State Register (0x6C):
  Level 1 Accept : 1
  Start Level 2 Trigger : 0
  Level 2 Pass Latched : 0
  Level 2 Fail Latched : 0
  Level 2 Accept : 1
  Start Level 3 Trigger : 0
  Level 3 Pass Latched : 0
  Level 3 Fail Latched : 0
  Level 3 Accept : 1
  Clear : 0
  Front End Busy (external) : 0
  External Inhibit : 0
  Latched Trigger : 1
  TS Busy : 1
  TS Active : 1
  TS Ready : 0
  Main Sequencer Active : 1
  Synchronization Sequencer Active : 0
  Program 1 Event Sequencer Active : 1
  Program 2 Event Sequencer Active : 0
Event Count (0xc8): 432
Live 1 Count (0xcc): 28737
Live 2 Count (0xd0): 43116042
value = 1 = 0x1
->
DINC closing...
clon10:main>
clon10:main>
clon10:main>
clon10:main>
clon10:main>
clon10:main> or_stats
task spawned: id = 0x2d339a0, name = t3
t3 Front End Busy Status.
vhg Sat Feb 18 23:25:44 EST 2006 ========================================================= 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 421 0xf07fff0f clon10:main> # coda_l3 (running on clonxt1) crashed, last messages: ........... End .. Prestart .. [1] current run number = 50800 [1] set run number = 50801 Go .. End .. et ERROR: etn_events_get, read error etmacros: socket communication error # et_2_et clonxt2:/tmp/et_sys_clasprod clon10-daq1:/tmp/et_sys_clasprod ET2ET10 crashed, last messages: et ERROR: etn_events_get, read error et ERROR: et_events_bridge, error (status = -9) getting events from "from" ET system et ERROR: etr_forcedclose, read error # coda_er stuck with event rate 200Hz, it consumes 99% CPU; iostat output: clonxt2:clasrun> iostat Linux 2.6.9-22.ELsmp (clonxt2) 02/19/2006 avg-cpu: %user %nice %sys %iowait %idle 1.29 0.00 10.08 1.18 87.46 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.10 10.82 19.38 7381255 13216404 sdb 5.97 2919.10 1839.60 1990547571 1254428696 sdc 12.05 4776.70 5077.95 3257251235 3462678112 sdd 3.55 1490.20 1408.70 1016176795 960598624 sde 3.70 1717.89 1317.35 1171438659 898303344 clonxt2:clasrun> iostat Linux 2.6.9-22.ELsmp (clonxt2) 02/19/2006 avg-cpu: %user %nice %sys %iowait %idle 1.29 0.00 10.08 1.18 87.46 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.10 10.82 19.38 7381271 13216812 sdb 5.97 2919.08 1839.58 1990547571 1254428696 sdc 12.05 4776.66 5077.92 3257251235 3462682960 sdd 3.55 1490.19 1408.69 1016176795 960598624 sde 3.70 1717.88 1317.34 1171438659 898303344 clonxt2:clasrun> iostat -p 9199 Linux 2.6.9-22.ELsmp (clonxt2) 02/19/2006 avg-cpu: %user %nice %sys %iowait %idle 1.29 0.00 10.08 1.18 87.46 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.10 10.82 19.38 7381279 13216924 sda1 0.00 0.00 0.00 1301 4 sda2 2.93 10.82 19.38 7375082 13214128 sda3 0.00 0.01 0.00 3708 2784 sdb 5.97 2919.04 1839.56 1990547571 1254428696 sdb1 234.03 2919.02 1839.56 1990534099 1254428616 
sdc 12.05 4776.59 5077.86 3257251243 3462692728 sdc1 641.25 4776.57 5077.86 3257237771 3462691880 sdd 3.55 1490.17 1408.67 1016176795 960598624 sdd1 178.12 1490.15 1408.67 1016159219 960598424 sde 3.70 1717.85 1317.32 1171438659 898303344 sde1 166.96 1717.83 1317.32 1171421083 898303008 clonxt2:clasrun> iostat -p 9199 Linux 2.6.9-22.ELsmp (clonxt2) 02/19/2006 avg-cpu: %user %nice %sys %iowait %idle 1.29 0.00 10.08 1.18 87.46 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.10 10.82 19.38 7381279 13217284 sda1 0.00 0.00 0.00 1301 4 sda2 2.93 10.81 19.38 7375082 13214488 sda3 0.00 0.01 0.00 3708 2784 sdb 5.97 2918.93 1839.49 1990547571 1254428696 sdb1 234.02 2918.91 1839.49 1990534099 1254428616 sdc 12.05 4776.42 5077.71 3257251243 3462717008 sdc1 641.23 4776.40 5077.71 3257237771 3462716160 sdd 3.55 1490.12 1408.62 1016176795 960598624 sdd1 178.11 1490.09 1408.62 1016159219 960598424 sde 3.69 1717.79 1317.27 1171438659 898303344 sde1 166.96 1717.77 1317.27 1171421083 898303008 clonxt2:clasrun> iostat -p 9199 Linux 2.6.9-22.ELsmp (clonxt2) 02/19/2006 avg-cpu: %user %nice %sys %iowait %idle 1.29 0.00 10.08 1.18 87.46 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 1.10 10.82 19.38 7381279 13218172 sda1 0.00 0.00 0.00 1301 4 sda2 2.93 10.81 19.38 7375082 13215376 sda3 0.00 0.01 0.00 3708 2784 sdb 5.97 2918.68 1839.33 1990547571 1254428696 sdb1 234.00 2918.66 1839.33 1990534099 1254428616 sdc 12.05 4776.01 5077.36 3257251243 3462775480 sdc1 641.19 4775.99 5077.36 3257237771 3462774632 sdd 3.55 1489.99 1408.50 1016176795 960598624 sdd1 178.10 1489.96 1408.50 1016159219 960598424 sde 3.69 1717.65 1317.16 1171438659 898303344 sde1 166.94 1717.62 1317.15 1171421083 898303008 clonxt2:clasrun> strace -p 9199 Process 9199 attached - interrupt to quit select(1, [0], NULL, NULL, NULL Process 9199 detached clonxt2:clasrun> strace -p 9199 Process 9199 attached - interrupt to quit select(1, [0], NULL, NULL, NULL) = ? 
ERESTARTNOHAND (To be restarted)
Process 9199 detached
clonxt2:clasrun> strace -p 9199
attach: ptrace(PTRACE_ATTACH, ...): No such process
clonxt2:clasrun>
clonxt2:clasrun>
clonxt2:clasrun>

* INFO: during Prestart, when one 'Prestart' event passed to clonxt2 and nothing else, 'top' looks as follows:

Tasks: 135 total,   1 running, 134 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.1% us, 13.3% sy,  0.0% ni, 85.6% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8055976k total,  8023628k used,    32348k free,   102264k buffers
Swap:  8193140k total,      184k used,  8192956k free,  7715152k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
15191 clasrun   16   0  927m 765m 764m S 57.5  9.7 235:24.10 et_start
 2637 clasrun   15   0  827m 2484 1760 S 57.2  0.0   0:10.90 coda_er
 2684 clasrun   16   0  2856  984  752 R  0.3  0.0   0:00.17 top
    1 root      16   0  1824  556  476 S  0.0  0.0   0:06.70 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.14 migration/0

and it stays this way until 'Go' is pressed; too busy to process one event ?????

# Feb 18 2006: SmartSockets problem: for example, scaler_server was not running and
  could not be started because of a "duplicate name" message from SS; ipc_info shows
  that process as active; kill all rtservers on clon10/clon00/clon05 and start
  again - problem fixed

# ET2ER crashed; command was:

clonxt1:clasrun> et_2_et clonxt1:/tmp/et_sys_clasprod clonxt2-daq2:/tmp/et_sys_clasprod ET2ER

last message:

.........
=====================================
=====================================
=====================================
setting swap function 'et_bridge_BOS'
=====================================
=====================================
=====================================
tcp_writev(4,,201) = writev(4,,16) = -1
tcp_writev: Connection reset by peer
et ERROR: et_events_bridge, write error
et ERROR: etr_events_dump, write error
et ERROR: etr_forcedclose, write error
et ERROR: etr_forcedclose, read error
.........

# EB2 crashed:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
*** glibc detected *** corrupted double-linked list: 0x00884858 ***
Abort
clonxt1:clasrun>

================================================================
========= MOVE from clonxt2 (Linux) to clonxt3 (SunOS) =========
================================================================

# 'rm /mnt/raid2/stage_in/*' on clonxt3: 'ls' shows an empty directory immediately,
  but 'df -k .' hung for a long time; ctrl-C, ctrl-Z and 'kill -9' could not
  kill it (even as root); top shows:

RUN STOPPED:

load averages:  0.79,  1.00,  1.05                                    08:32:20
75 processes:  73 sleeping, 1 zombie, 1 on cpu
CPU states: 78.8% idle,  0.0% user, 21.2% kernel,  0.0% iowait,  0.0% swap
Memory: 6016M real, 4438M free, 931M swap in use, 12G swap free

   PID USERNAME LWP PRI NICE  SIZE   RES STATE     TIME    CPU COMMAND
     9 root      16  59    0 9032K 7992K sleep     0:06  0.07% svc.configd
  3211 root       1  59    0 7572K 3808K sleep     0:00  0.03% sshd
  3216 bhess      1  59    0 2628K 1920K sleep     0:00  0.03% tcsh
   673 noaccess  20  59    0  119M   52M sleep     1:01  0.02% java
  3207 clasrun    1  59    0 3140K 1280K cpu/1     0:00  0.01% top
     7 root      14  59    0   11M 8320K sleep     0:03  0.01% svc.startd
  3214 bhess      1  59    0 7448K 1936K sleep     0:00  0.01% sshd
    93 daemon     3  59    0 3756K 1860K sleep     0:00  0.01% kcfd
   510 root       1  59    0   11M 9968K sleep     0:14  0.00% Xorg
   315 root       2  59    0 4964K 2420K sleep     0:03  0.00% automountd
    92 root      26  59    0 3904K 3016K sleep     0:01  0.00% nscd
   910 clasrun    6  59    0  779M  104M sleep   777:12  0.00% coda_er
   907 clasrun   15  59    0  767M  766M sleep    25:11  0.00% et_start
   925 clasrun    4  59    0  788M  116M sleep     9:44  0.00% et_2_et
   714 root       1  59    0 9800K 7144K sleep     0:08  0.00% dtgreet

RUN IN PROGRESS:

load averages:  1.80,  1.54,  1.24                                    08:47:53
80 processes:  77 sleeping, 1 zombie, 2 on cpu
CPU states: 52.8% idle, 24.8% user, 22.5% kernel,  0.0% iowait,  0.0% swap
Memory: 6016M real, 4432M free, 934M swap in use, 12G swap free

   PID USERNAME LWP PRI NICE  SIZE   RES STATE     TIME    CPU COMMAND
   910 clasrun    7   0    0  779M  104M cpu/1   783:50 24.94% coda_er
   907 clasrun   15  59    0  767M  766M sleep    25:15  0.30% et_start
   925 clasrun    4  59    0  788M  116M sleep     9:46  0.14% et_2_et
   673 noaccess  20  59    0  119M   52M sleep     1:02  0.05% java
  3471 root       1  59    0 3036K 1176K cpu/3     0:00  0.02% top
   510 root       1  59    0   11M 9968K sleep     0:14  0.00% Xorg
   714 root       1  59    0 9800K 7144K sleep     0:08  0.00% dtgreet
     9 root      16  59    0 9032K 7992K sleep     0:06  0.00% svc.configd
     7 root      14  59    0   11M 8320K sleep     0:03  0.00% svc.startd
   315 root       2  59    0 4964K 2424K sleep     0:03  0.00% automountd
   679 root       7  59    0 2928K 2304K sleep     0:02  0.00% vold
    92 root      26  59    0 3904K 3040K sleep     0:01  0.00% nscd
   697 root       1  59    0 6644K 4724K sleep     0:00  0.00% snmpd
  3211 root       1  59    0 7572K 3788K sleep     0:00  0.00% sshd
  3165 root       1  59    0 7496K 3644K sleep     0:00  0.00% sshd

clonxt3:clasrun> iostat -I 2
      tty            sd0            sd1            sd2               sd3           cpu
  tin    tout  kpi  tpi serv  kpi  tpi serv  kpi  tpi serv    kpi    tpi serv  us sy wt id
 6712 3262331    0    9    0    0   16    0    0   19    0 503542  63045    3  21  3  0 76
    0     245    0    0    0    0    0    0    0    0    0      0      0    0  25 16  0 59
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 19  0 56
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 21  0 55
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 22  0 54
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 21  0 54
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 22  0 53
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 21  0 54
    0      80    0    0    0    0    0    0    0    0    0      0      0    0  25 22  0 53
    3      83    0    0    0    0    0    0    0    0    0      7      1    5  25 22  0 53
   10     158    0    0    0    0    0    0    0    0    0      0      0    0  25 22  0 53
   12     139    0    0    0    0    0    0    0    0    0      0      0    0  25 23  0 52
    6   22301    0    0    0    0    0    0    0    0    0      0      0    0  25 24  0 52
    4     114    0    0    0    0    0    0    0    0    0      0      0    0  25 23  0 53
    8     121    0    0    0    0    0    0    0    0    0      0      0    0  25 24  0 51
    3      83    0    0    0    0    0    0    0    0    0      0      0    0  25 13  0 62
   10     123    0    0    0    0    0    0    0    0    0      3      1   10  25 11  0 64
    2     322    0    0    0    0    0    0    0    0    0     36      5    3  25 18  0 57

# ec2 in the middle of the run:

clon10:clasrun> tsconnect ec2
Port '/dev/cua/4' flushed.
TYPE ~q TO QUIT
------- DINC --- port=/dev/cua/4 -------
9600 BAUD 8 NONE 1 SWFC=OFF HWFC=OFF
CAR=ON DTR=ON RTS=ON CTS=ON DSR=OFF
Type ~? for help.
No buffer space available (errno=55) No buffer space available (errno=55) -> No buffer space available (errno=55) -> -> -> No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) -> No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) iNo buffer space available (errno=55) NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask bffe200 0 PEND 1f9dc8 bffe0e0 0 0 tLogTask logTask bffb660 0 PEND 1f9dc8 bffb550 0 0 tShell shell bccef40 1 READY 226c88 bcceb20 3d0002 0 tRlogind rlogind bcdf430 2 PEND 221090 bcdefd0 0 0 tWdbTask wdbTask bcd1780 3 PEND 221090 bcd1650 0 0 tNetTask netTask bd17520 50 SUSPEND 1b967c bd17400 3d 0 tPortmapd portmapd bcdaec0 54 PEND 221090 bcdac60 3d0002 0 tTelnetd telnetd bcdd080 55 PEND 221090 bcdcf10 0 0 t1 CODAtcpServe bcc32b0 100 PEND 221090 bcc2c90 2b0001 0 rols_loop rols_loop 8afe2d0 105 DELAY 2265ac 8afddd0 0 13 roc_udp UDP_loop 8baaee0 110 DELAY 2265ac 8baae20 37 45 coda_net write_thread 8acd370 110 PEND 221090 8acd0d0 0 0 ROC coda_roc b0a9db0 200 PEND 221090 b0a8b70 3d0002 0 TCP_SERVER tcpServer bcbe270 250 PEND 221090 bcbdc50 2b0001 0 value = 0 = 0x0 -> No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available 
(errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) netStackDataPoolShowNo buffer space available (errno=55) type number --------- ------ FREE : 4849 DATA : 133 HEADER : 18 SOCKET : 0 PCB : 0 RTABLE : 0 HTABLE : 0 ATABLE : 0 SONAME : 0 ZOMBIE : 0 SOOPTS : 0 FTABLE : 0 RIGHTS : 0 IFADDR : 0 CONTROL : 0 OOBDATA : 0 IPMOPTS : 0 IPMADDR : 0 IFMADDR : 0 MRTABLE : 0 TOTAL : 5000 number of mbufs: 5000 number of times failed to find space: 0 number of times waited for space: 0 number of times drained protocols for space: 0 __________________ CLUSTER POOL TABLE 
_______________________________________________________________________________ size clusters free usage ------------------------------------------------------------------------------- 64 500 498 11622 128 500 418 3544619 256 500 499 20853No buffer space available (errno=512 500 55500 25090) 1024 500 485 1427879 2048 500 478 1907778 ------------------------------------------------------------------------------- value = 80 = 0x50 = 'P' -> No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) -> No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) netStackSysPoolShowNo buffer space available (errno=55) type number --------- ------ FREE : 4041 DATA : 0 HEADER : 0 SOCKET : 8 PCB : 14 RTABLE : 27 HTABLE : 0 ATABLE : 0 SONAME : 0 ZOMBIE : 0 SOOPTS : 0 FTABLE : 0 RIGHTS : 0 IFADDR : 4 CONTROL : 0 OOBDATA : 0 IPMOPTS : 0 IPMADDR : 2 IFMADDR : 0 MRTABLE : 0 TOTAL : 4096 number of mbufs: 4096 number of times failed to find space: 0 number of times waited for space: 0 number of times drained protocols for space: 0 __________________ CLUSTER POOL TABLE _______________________________________________________________________________ size clusters free usage ------------------------------------------------------------------------------- 64 512 495 18 128 512 497 456 256 512 489 896 512 512 512 0 ------------------------------------------------------------------------------- value = 80 = 0x50 = 'P' -> No buffer space available (errno=55) No buffer space available 
(errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask bffe200 0 PEND 1f9dc8 bffe0e0 0 0 tLogTask logTask bffb660 0 PEND 1f9dc8 bffb550 0 0 tShell shell bccef40 1 READY 226c88 bcceb20 3d0002 0 tRlogind rlogind bcdf430 2 PEND 221090 bcdefd0 0 0 tWdbTask wdbTask bcd1780 3 PEND 221090 bcd1650 0 0 tNetTask netTaskNo buffer spa ce available (e rrno= bd17520 50 SUSPEND55 1b967c ) bd17400 3d 0 tPortmapd portmapd bcdaec0 54 PEND 221090 bcdac60 3d0002 0 tTelnetd telnetd bcdd080 55 PEND 221090 bcdcf10 0 0 t1 CODAtcpServe bcc32b0 100 PEND 221090 bcc2c90 2b0001 0 rols_loop rols_loop 8afe2d0 105 DELAY 2265ac 8afddd0 0 55 roc_udp UDP_loop 8baaee0 110 DELAY 2265ac 8baae20 37 64 coda_net write_thread 8acd370 110 PEND 221090 8acd0d0 0 0 ROC coda_roc b0a9db0 200 PEND 221090 b0a8b70 3d0002 0 TCP_SERVER tcpServer bcbe270 250 PEND 221090 bcbdc50 2b0001 0 value = 0 = 0x0 -> No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) td No buffer space available (errno=55) No buffer space available (errno=55) No buffer space available (errno=55) rocNo buffer space available (errno=55) _No buffer space available (errno=55) udNo buffer space available (errno=55) p value = 0 = 0x0 -> -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask bffe200 0 PEND 1f9dc8 bffe0e0 0 0 tLogTask logTask bffb660 0 PEND 1f9dc8 bffb550 0 0 tShell shell bccef40 1 READY 226c88 bcceb20 3d0002 0 tRlogind rlogind 
                         bcdf430   2 PEND         221090  bcdefd0       0     0
tWdbTask   wdbTask       bcd1780   3 PEND         221090  bcd1650       0     0
tNetTask   netTask       bd17520  50 SUSPEND      1b967c  bd17400      3d     0
tPortmapd  portmapd      bcdaec0  54 PEND         221090  bcdac60  3d0002     0
tTelnetd   telnetd       bcdd080  55 PEND         221090  bcdcf10       0     0
t1         CODAtcpServe  bcc32b0 100 PEND         221090  bcc2c90  2b0001     0
rols_loop  rols_loop     8afe2d0 105 DELAY        2265ac  8afddd0       0    54
coda_net   write_thread  8acd370 110 PEND         221090  8acd0d0       0     0
ROC        coda_roc      b0a9db0 200 PEND         221090  b0a8b70  3d0002     0
TCP_SERVER tcpServer     bcbe270 250 PEND         221090  bcbdc50  2b0001     0
value = 0 = 0x0
->
->
->
->
->
-> DINC closing...
clon10:clasrun>
clon10:clasrun>
clon10:clasrun>
clon10:clasrun>
clon10:clasrun>
clon10:clasrun>
clon10:clasrun> roc_reboot ec2

# scaler4: lost half memory ????

informEB: alloc error - return
ERROR in bb_write_current: FAILED
clasprod::scaler4> exec memShow
 status    bytes     blocks   avg block  max block
 ------ ---------- --------- ---------- ----------
current
   free    3260864        63      51759    2073136
  alloc   25885984      8342       3103          -
cumulative
  alloc   29665296     13931       2129          -
clasprod::scaler4>
wait: coda request in progress
codaExecute reached, message >end<, len=3
codaExecute: 'end' transition

# dc6 end - too long !!!!!!!!!!!!!!!
write_thread: wait= 401 send= 34 microsec per event (nev=6157) interrupt: timer: 19 microsec (min=12 max=512 rms**2=17) 0x8b41ba0 (ROLS_LOOP): timer: 17 microsec (min=3 max=1326 rms**2=16) write_thread: wait= 407 send= 34 microsec per event (nev=6074) wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition roc_end reached INIT_NAME: rolp->daproc = 3 ENDINGGGGGG: sfiIntCount=2686119 syncFlag=1 lateFail=0 type=0 0x8ac7420 (twork0003): ending by force_sync (scaler) event sfiUserEnd: Ended after 2686119 events GEN End: 1 events left on sfiOUT queue INFO: User End 1 Executed INIT_NAME: rolp->daproc = 3 INFO: User End 2 Executed wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress codaUpdateStatus: >UPDATE process SET state='ending' WHERE name='dc6'< wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda 
request in progress i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask bffe200 0 PEND 1f9dc8 bffe0e0 0 0 tLogTask logTask bffb660 0 PEND 1f9dc8 bffb550 0 0 tShell shell bccf190 1 READY 226c88 bcced70 320002 0 tRlogind rlogind bcdf430 2 PEND 221090 bcdefd0 0 0 tWdbTask wdbTask bcd1780 3 PEND 221090 bcd1650 0 0 tNetTask netTask bd17520 50 PEND 221090 bd17430 0 0 tPortmapd portmapd bcdaec0 54 PEND 221090 bcdac60 3d0002 0 tTelnetd telnetd bcdd080 55 PEND 221090 bcdcf10 0 0 roc_udp UDP_loop 8bad0a0 110 DELAY 2265ac 8bacff0 0 27 coda_net write_thread 8af8380 110 DELAY 2265ac 8af8280 0 34 CODATCPSRV CODAtcpServe b81c170 150 DELAY 2265ac b81bbe0 2b0001 26 twork0003 CODAtcpServe 8ac7420 150 PEND 221090 8ac6030 2b0001 0 ROC coda_roc b0a99f0 200 PEND 221090 b0a8790 3d0002 0 TCP_SERVER tcpServerwait: coda reques t in progress bcc32b0 250 PEND 221090 bcc2c90 2b0001 0 ROLS_LOOP rols_loop 8b41ba0 255 READY bca4648 8b415e0 0 0 value = 0 = 0x0 -> UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_cancel: cancel >sta:dc6 active< wait: coda request in progress wait: coda request in progress wait: coda request in progress NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! 
roc_end done coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=2686119 nevents=2686119 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU coda_roc: call informEB(EV_END) informEB reached informEB: 11 -1 6 1 12 1 - 0x00000004 0x001401cc 0x000004b0 0x0000c73a 0x0028fc7 0x002a0006 Inserted End event on queue codaUpdateStatus: >UPDATE process SET state='downloaded' WHERE name='dc6'< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_cancel: cancel >sta:dc6 ending< ended after 2686119 events write_thread: END condition received roc_network +++++++++++++++++++++++++++++++++++++ 1 bb_cleanup 0: 0x0b7dca48 bb_cleanup 1: 0x0bc42150 bb_cleanup 2 roc_network +++++++++++++++++++++++++++++++++++++ 2 LINK_close: socket #12 connection closed roc_network +++++++++++++++++++++++++++++++++++++ 3 WRITE THREAD EXIT call: 'dc6 close_links' rocCloselink reached -> -> -> # dc6 go also slow ... -> wait: coda request in progress codaExecute reached, message >go<, len=2 codaExecute: 'go' transition activating .. 
informEB reached informEB: 11 -1 6 1 12 0 - 0x00000004 0x001201cc 0x000004b0 0x00000000 0x00000000 0x00000000 1-1: dabufp set to 0x97ad318 INIT_NAME: rolp->daproc = 5 WANNARAW=0 PROFILE =0 regular ------------------------- INFO: User Go 2 Executed INIT_NAME: rolp->daproc = 5 0x8ac7360 (twork0005): sfiIntCount=0 INFO: User Go 1 Executed wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress codaUpdateStatus: >UPDATE process SET state='active' WHERE name='dc6'< wait: coda request in progress UDP_standard_request >sta:dc6 active< UDP_standard_request >sta:dc6 active< UDP_standard_request >sta:dc6 active< UDP_standard_request >sta:dc6 active< UDP_standard_request >sta:dc6 active< UDP_standard_request >sta:dc6 active< UDP_cancel: cancel >sta:dc6 paused< active, events so far 0 POLLS: 1 1 -> nevents 1 newevnb 257 old 167 dabufp[0] = 0x00000244 (580) dabufp[1] = 0x00070101 (459009) interrupt: timer: 19 microsec (min=12 max=596 rms**2=10) 0x8b41ba0 (ROLS_LOOP): timer: 17 microsec (min=3 max=1326 rms**2=15) # when dc6 end fast: interrupt: timer: 18 microsec (min=12 max=596 rms**2=2) 0x8b41ba0 (ROLS_LOOP): timer: 13 microsec (min=3 max=1326 rms**2=18) write_thread: wait= 784 send= 26 microsec per event (nev=3304) 0x8b41ba0 (ROLS_LOOP): timer: 14 microsec (min=3 max=1326 rms**2=14) interrupt: timer: 18 microsec (min=12 max=596 rms**2=6) write_thread: wait= 775 send= 26 microsec per event (nev=3343) wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition roc_end reached INIT_NAME: rolp->daproc = 3 ENDINGGGGGG: sfiIntCount=1024651 syncFlag=1 lateFail=0 type=0 0x8ac7360 (twork0006): ending by force_sync (scaler) event sfiUserEnd: Ended after 1024651 events GEN End: 1 events left on sfiOUT queue INFO: User End 1 Executed INIT_NAME: rolp->daproc = 3 INFO: User End 2 Executed codaUpdateStatus: >UPDATE 
process SET state='ending' WHERE name='dc6'< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_standard_request >sta:dc6 ending< UDP_cancel: cancel >sta:dc6 active< wait: coda request in progress wait: coda request in progress wait: coda request in progress NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! roc_end done coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=1024651 nevents=1024651 coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' coda_roc: processing case 'DA_ENDING' UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU coda_roc: call informEB(EV_END) informEB reached informEB: 11 -1 6 1 12 1 - 0x00000004 0x001401cc 0x000004b0 0x0000c73b 0x000fa28b 0x0a6604de Inserted End event on queue codaUpdateStatus: >UPDATE process SET state='downloaded' WHERE name='dc6'< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_standard_request >sta:dc6 downloaded< UDP_cancel: cancel >sta:dc6 ending< ended after 1024651 events write_thread: END condition received roc_network +++++++++++++++++++++++++++++++++++++ 1 bb_cleanup 0: 0x0b7dca48 bb_cleanup 1: 0x0bc42150 bb_cleanup 2 roc_network +++++++++++++++++++++++++++++++++++++ 2 LINK_close: socket #12 connection closed roc_network +++++++++++++++++++++++++++++++++++++ 3 WRITE THREAD EXIT call: 'dc6 close_links' rocCloselink reached -> -> # ---bad--- UDP_standard_request >sta:ec1 active< UDP_standard_request >sta:ec1 
active< UDP_standard_request >sta:ec1 active< UDP_standard_request >sta:ec1 active< UDP_cancel: cancel >sta:ec1 paused< active, events so far 0 POLLS: 1 1 codaExecute done normTim=33 normTim=33 wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition roc_end reached INIT_NAME: rolp->daproc = 3 Address total free busy size incr (KBytes) Name ------- ----- ---- ---- ----- ---- -------- ---- 0x0bc9f7d0 200 199 1 65584 0 (12810) sfiIN 0x0bc992d0 0 0 0 48 0 (0) sfiOUT ENDINGGGGGG: sfiIntCount=77842 syncFlag=1 lateFail=0 type=0 0x8484490 (twork0003): ending by force_sync (scaler) event Address total free busy size incr (KBytes) Name INIT_------- ----- ---- ---- ----- ---- -------- ---- N0xAME: rolp->daproc 0= bc9f7d0 200 1993 1 65584 0 (12810) sfiINAddress tota l free busy si0xze incr (KBytes)0 Name bc992d0 0------- ----- - --- ---- ----- ---- -------- 1---- -1 48 0x 0 (0) sfiOUT0 sfiUserEnd: Ended after bc9f7d077842 events GEN End: 1 events left on sfiOUT queue 200GEN End(second check): 0 events left on sfiOUT queue INFO: User End 1 Executed 1INIT_NAME: rolp->daproc = 993 INFO: User End 2 Executed codaUpdateStatus: dbConnecting .. 1 6codaUpdateStatus: dbConnect done 5584codaUpdateStatus: > UPDATE process SET state='ending' WHERE name='ec1' < 0 (128codaUpdateStatus: dbDisconnecting .. 
10codaUpdateStatus: dbDisconnect done ) UDP_standard_request > sta:ec1 ending< sfiINUDP_standard_request >wait: coda rsta:ec1 endingequest< in progress UDP_standard_request >0xsta:ec1 ending0< UDP_standard_request >bsta:ec1 endingc992d0< UDP_standard_request > sta:ec1 ending0< UDP_standard_request > sta:ec1 ending < UDP_cancel: cancel >0sta:ec1 active < 0 48 0 (0) sfiOUT ENDINGGGGGG: sfiIntCount=77842 syncFlag=1 lateFail=0 type=0 0x84fec10 (ROLS_LOOPAddress tota): l free busy ending by force_sync (scaler) event size incr (KBytes) Name ------- ----- ---- ---- ----- ---- -------- ---- 0x0bc9f7d0 200 200 0 65584 0 (12810) sfiIN 0x0bc992d0 0 1 -1 48 0 (0) sfiOUT sfiUserEnd: Ended after 77842 events GEN End: 1 events left on sfiOUT queue GEN End(second check): 0 events left on sfiOUT queue INFO: User End 1 Executed wait: coda request in progress wait: coda request in progress wait: coda request in progress NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! ---good--- UDP_standard_request >sta:ec2 active< UDP_standard_request >sta:ec2 active< UDP_standard_request >sta:ec2 active< UDP_standard_request >sta:ec2 active< UDP_standard_request >sta:ec2 active< UDP_standard_request >sta:ec2 active< UDP_cancel: cancel >sta:ec2 paused< active, events so far 0 POLLS: 1 1 codaExecute done normTim=33 normTim=33 wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition roc_end reached INIT_NAME: rolp->daproc = 3 Address total free busy size incr (KBytes) Name ------- ----- ---- ---- ----- ---- -------- ---- 0x0bc9f7d0 200 199 1 65584 0 (12810) sfiIN 0x0bc992d0 0 0 0 48 0 (0) sfiOUT ENDINGGGGGG: sfiIntCount=77842 syncFlag=1 lateFail=0 type=0 0x8484490 (twork0003): ending by force_sync (scaler) event Address total free busy size incr (KBytes) Name ------- ----- ---- ---- ----- ---- -------- ---- 0x0bc9f7d0 200 200 0 65584 0 (12810) sfiIN 0x0bc992d0 0 0 0 48 0 (0) sfiOUT sfiUserEnd: Ended after 77842 events GEN End: sfiOUT queue Empty INFO: User 
End 1 Executed INIT_NAME: rolp->daproc = 3 codaUpdateStatus: dbConnecting .. codaUpdateStatus: dbConnect done codaUpdateStatus: >UPDATE process SET state='ending' WHERE name='ec2'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done UDP_standard_request >sta:ec2 ending< UDP_standard_request >sta:ec2 ending< UDP_standard_request >sta:ec2 ending< UDP_standard_request >sta:ec2 ending< UDP_standard_request >sta:ec2 ending< UDP_standard_request >sta:ec2 ending< UDP_cancel: cancel >sta:ec2 active< wait: coda request in progress wait: coda request in progress wait: coda request in progress NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! --- dmaPFreeItep must be called once !!!!!!!!!!!!!!!!!!!!! --- # sc2 ------- DINC --- port=/dev/cua/8 ------- 9600 BAUD 8 NONE 1 SWFC=OFF HWFC=OFF CAR=ON DTR=ON RTS=ON CTS=ON DSR=OFF Type ~? for help. (slot=7 channel=32 data=0x00000000) 0x00829a88 untabled id=0 (slot=15 channel=121 data=0x000003a6) 0x00829a8c untabled id=0 (slot=15 channel=0 data=0x00005b78) 0x00829a90 untabled id=0 (slot=7 channel=40 data=0x00000089) 0x00829a94 untabled id=0 (slot=12 channel=0 data=0x00000001) 0x00829a98 untabled id=0 (slot=9 channel=0 data=0x0000032c) 0x00829a9c untabled id=0 (slot=16 channel=2 data=0x00070000) 0x00829aa0 untabled id=0 (slot=5 channel=0 data=0x00000000) 0x00829aa4 untabled id=0 (slot=8 wait: coda request in progress channel=16 data=0x0002000c) 0x00829aa8 untabled id=0 (slot=16 channel=91 data=0x0001000c) 0x00829ab4 untabled id=0 (slot=7 channel=0 data=0x00000001) 0x00829ab8 untabled id=0 (slot=9 channel=0 data=0x00000154) 0x00829ac0 untabled id=0 (slot=16 channel=3 data=0x00040010) 0x00829ac4 untabled id=0 (slot=4 channel=39 data=0x00050000) 0x00829ac8 untabled id=0 (slot=15 channel=45 data=0x0001e914) 0x00829acc untabled id=0 (slot=10 channel=0 data=0x00000006) 0x00829ad0 untabled id=0 (slot=13 channel=0 data=0x0000e000) 0x00829ad4 untabled id=0 (slot=6 channel=40 data=0x0000ffff) 0x00829ad8 untabled id=0 (slot=15 
channel=37 data=0x00020110) 0x00829adc untabled id=0 (slot=15 channel=36 data=0x00005b79) 0x00829ae0 untabled id=0 (slot=8 channel=16 data=0x00020010) 0x00829ae4 untabled id=0 (slot=16 channel=3 data=0x00050000) 0x00829ae8 untabled id=0 (slot=5 channel=0 data=0x00000000) 0x00829aec untabled id=0 (slot=8 channel=16 data=0xwait: coda request in progress 00020124) 0x00829af0 untabled id=0 (slot=16 channel=2 data=0x00070000) 0x00829af4 untabled id=0 (slot=5 channel=0 data=0x00000000) 0x00829af8 untabled id=0 (slot=8 channel=16 data=0x000200f0) 0x00829afc untabled id=0 (slot=15 channel=92 data=0x0003bb78) 0x00829b00 untabled id=0 (slot=9 channel=0 data=0x000046ed) 0x00829b04 untabled id=0 (slot=16 channel=2 data=0x00070000) 0x00829b08 untabled id=0 (slot=5 channel=0 data=0x00000000) 0x00829b0c untabled id=0 (slot=8 channel=16 data=0x000200dc) 0x00829b10 untabled id=0 (slot=7 channel=36 data=0x00000089) 0x00829b14 untabled id=0 (slot=16 channel=1 data=0x00019090) 0x00829b18 untabled id=0 (slot=5 channel=0 data=0x00000000) 0x00829b1c untabled id=0 (slot=8 channel=48 data=0x000200c0) 0x00829b20 untabled id=0 (slot=7 channel=44 data=0x00000089) 0x00829b24 untabled id=0 (slot=7 channel=36 data=0x00001000) 0x00829b28 untabled id=0 (slot=16 channel=1 data=0x0003c6fc) 0x00829b2c wait: coda request in progress untabled id=0 (slot=12 channel=37 data=0x00010008) 0x00829b30 untabled id=0 (slot=15 channel=0 data=0x00004838) 0x00829b34 untabled id=0 (slot=15 channel=0 data=0x00004800) 0x00829b38 untabled id=0 (slot=8 channel=16 data=0x000200a4) 0x00829b3c untabled id=0 (slot=7 channel=36 data=0x00000089) 0x00829b40 untabled id=0 (slot=16 channel=45 data=0x0001c620) 0x00829b44 untabled id=0 (slot=14 channel=1 data=0x00010010) 0x00829b4c untabled id=0 (slot=8 channel=48 data=0x00020090) 0x00829b50 untabled id=0 (slot=7 channel=36 data=0x00000089) 0x00829b54 untabled id=0 (slot=16 channel=27 data=0x00040010) 0x00829b58 untabled id=0 (slot=7 channel=12 data=0x00000001) 0x00829b5c 
untabled id=0 (slot=16 channel=1 data=0x0001c74c) 0x00829b60 untabled id=0 (slot=7 channel=16 data=0x00000001) 0x00829b64 untabled id=0 (slot=7 channel=20 data=0x00000000) 0x00829b68 untabled id=0 (slot=15 channel=1 data=0x000003a6) 0x00829bwait: coda request in progress 6c untabled id=0 (slot=9 channel=24 data=0x00063182) 0x00829b70 untabled id=0 (slot=9 channel=80 data=0x00000021) 0x00829b74 untabled id=0 (slot=5 channel=0 data=0x00030000) 0x00829b78 untabled id=0 (slot=8 channel=48 data=0x00020064) 0x00829b7c untabled id=0 (slot=7 channel=36 data=0x00000089) 0x00829b80 untabled id=0 (slot=7 channel=40 data=0x00000089) 0x00829b84 untabled id=0 (slot=16 channel=27 data=0x00000004) 0x00829b88 untabled id=0 (slot=7 channel=32 data=0x00000089) 0x00829b90 untabled id=0 (slot=7 channel=12 data=0x00004ea2) 0x00829b94 untabled id=0 (slot=7 channel=16 data=0x00000002) 0x00829b9c untabled id=0 (slot=7 channel=28 data=0x00000000) 0x00829ba0 untabled id=0 (slot=10 channel=45 data=0x0003402e) 0x00829ba4 untabled id=0 (slot=16 channel=121 data=0x0000c768) 0x00829ba8 untabled id=0 (slot=7 channel=36 data=0x00000000) 0x00829bac untabled id=0 (slot=12 channel=45 data=0x00033000) 0x00829bb0 untabled idwait: coda request in progress =0 (slot=10 channel=0 data=0x0000801e) 0x00829bb4 untabled id=0 (slot=12 channel=45 data=0x00033400) 0x00829bb8 untabled id=0 (slot=7 channel=32 data=0x00000000) 0x00829bbc untabled id=0 (slot=15 channel=121 data=0x000003a6) 0x00829bc0 untabled id=0 (slot=15 channel=0 data=0x00005b78) 0x00829bc4 untabled id=0 (slot=7 channel=40 data=0x00000089) 0x00829bc8 untabled id=0 (slot=12 channel=0 data=0x00000001) 0x00829bcc untabled id=0 (slot=15 channel=0 data=0x00050378) 0x00829bd0 untabled id=0 (slot=9 channel=24 data=0x00063182) 0x00829bd8 untabled id=0 (slot=9 channel=80 data=0x00000021) 0x00829bdc untabled id=0 (slot=7 channel=44 data=0x00000089) 0x00829be0 untabled id=0 (slot=7 channel=84 data=0x00000041) 0x00829be4 untabled id=0 (slot=9 channel=0 
data=0x000001f8) 0x00829be8 untabled id=0 (slot=16 channel=46 data=0x00070000) 0x00829bec untabled id=0 (slot=16 channel=37 data=0x00030038) 0x00829bf4 untabled id=0 (slot=14 chawait: coda request in progress nnel=1 data=0x00020002) 0x00829bf8 untabled id=0 (slot=16 channel=101 data=0x00030040) 0x00829bfc untabled id=0 (slot=7 channel=37 data=0x00010001) 0x00829c00 untabled id=0 (slot=16 channel=89 data=0x0003003c) 0x00829c04 untabled id=0 (slot=18 channel=37 data=0x00030038) 0x00829c08 untabled id=0 (slot=8 channel=48 data=0x00020008) 0x00829c0c untabled id=0 (slot=16 channel=97 data=0x00030030) 0x00829c10 untabled id=0 (slot=16 channel=3 data=0x00040010) 0x00829c14 ...................... # roc_reboot sc-laser1 does not work (7-mar-2006) reset cable chain was not restored after BONUS ROCs removal - fixed # rcn problem (9-mar-2006) ER3 waiting for prestart forever; ER3 UDPs 'configured' reset-download helped # pretrig2 problem: diman cannot read TOF pretrig discrs pretrig2> pretrig2> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1bfe600 0 PEND 1c89dc 1bfe4e0 3d0001 0 tLogTask logTask 1bfba60 0 PEND 1c89dc 1bfb950 0 0 tShell shell 18c49a0 1 READY 1f5770 18c4580 320002 0 tRlogind rlogind 190efb0 2 PEND 1efb78 190eb50 0 0 tWdbTask wdbTask 19015d0 HUNG reboot it - problem fixed # raid after fix: clonxt3:clasrun> df -k Filesystem kbytes used avail capacity Mounted on /dev/dsk/c0t0d0s0 20170417 6080728 13887985 31% / /devices 0 0 0 0% /devices ctfs 0 0 0 0% /system/contract proc 0 0 0 0% /proc mnttab 0 0 0 0% /etc/mnttab swap 13726884 636 13726248 1% /etc/svc/volatile objfs 0 0 0 0% /system/object /dev/dsk/c0t0d0p0:boot 15922 1506 14416 10% /boot /usr/lib/libc/libc_hwcap2.so.1 20170417 6080728 13887985 31% /lib/libc.so.1 fd 0 0 0 0% /dev/fd swap 14508356 782108 13726248 6% /tmp swap 13726272 24 13726248 1% /var/run /dev/dsk/c0t0d0s7 42110568 41777 41647686 1% /export/home 
clonfs2:/home 20971520 14549968 6421552 70% /home clonfs2:/local 104857600 56944064 47913536 55% /usr/local jlabapps:/appsroot/SunOS 709247180 646079496 63167684 92% /u/apps clonfs2:/ssa 338309500 114581496 223728004 34% /ssa /dev/dsk/c6t600C0FF0000000000983922537B57400d0s0 704212675 113688817 583481732 17% /mnt/raid2 /dev/dsk/c6t600C0FF00000000009839217F9632F00d0s0 561765437 111345424 444802359 21% /mnt/raid3 jlabwrk:/vol/vol0/mss 709247180 646079496 63167684 92% /w/mss clonxt3:/ssa> rm /mnt/raid3/stage_in/* clonxt3:clasrun> df -k Filesystem kbytes used avail capacity Mounted on /dev/dsk/c0t0d0s0 20170417 6080728 13887985 31% / /devices 0 0 0 0% /devices ctfs 0 0 0 0% /system/contract proc 0 0 0 0% /proc mnttab 0 0 0 0% /etc/mnttab swap 13727072 636 13726436 1% /etc/svc/volatile objfs 0 0 0 0% /system/object /dev/dsk/c0t0d0p0:boot 15922 1506 14416 10% /boot /usr/lib/libc/libc_hwcap2.so.1 20170417 6080728 13887985 31% /lib/libc.so.1 fd 0 0 0 0% /dev/fd swap 14508544 782108 13726436 6% /tmp swap 13726460 24 13726436 1% /var/run /dev/dsk/c0t0d0s7 42110568 41777 41647686 1% /export/home clonfs2:/home 20971520 14549968 6421552 70% /home clonfs2:/local 104857600 56944060 47913540 55% /usr/local jlabapps:/appsroot/SunOS 709247180 646084188 63162992 92% /u/apps clonfs2:/ssa 338309500 114581544 223727956 34% /ssa /dev/dsk/c6t600C0FF0000000000983922537B57400d0s0 704212675 113819801 583350748 17% /mnt/raid2 /dev/dsk/c6t600C0FF00000000009839217F9632F00d0s0 561765437 65560 556082223 1% /mnt/raid3 jlabwrk:/vol/vol0/mss 709247180 646084148 63163032 92% /w/mss clonxt3:clasrun> df -k Filesystem kbytes used avail capacity Mounted on /dev/dsk/c0t0d0s0 20170417 6080728 13887985 31% / /devices 0 0 0 0% /devices ctfs 0 0 0 0% /system/contract proc 0 0 0 0% /proc mnttab 0 0 0 0% /etc/mnttab swap 13727280 636 13726644 1% /etc/svc/volatile objfs 0 0 0 0% /system/object /dev/dsk/c0t0d0p0:boot 15922 1506 14416 10% /boot /usr/lib/libc/libc_hwcap2.so.1 20170417 6080728 13887985 31% 
/lib/libc.so.1 fd 0 0 0 0% /dev/fd swap 14508752 782108 13726644 6% /tmp swap 13726668 24 13726644 1% /var/run /dev/dsk/c0t0d0s7 42110568 41777 41647686 1% /export/home clonfs2:/home 20971520 14549968 6421552 70% /home clonfs2:/local 104857600 56944064 47913536 55% /usr/local jlabapps:/appsroot/SunOS 709247180 646084192 63162988 92% /u/apps clonfs2:/ssa 338309500 114581536 223727964 34% /ssa /dev/dsk/c6t600C0FF0000000000983922537B57400d0s0 704212675 113924753 583245796 17% /mnt/raid2 ^C # rcn problem "wait clastrig2 to postprestart" 550sec clastrig2 looks good, it "paused" reset-download-prestart-go helped # dc10 problem ......... bb_cleanup 2 UPDATE proroc_network +++++++++++++++++++++++++++++++++++++ 2 cess SETLINK_close: socket #12 state='downloaded connection closed 'roc_network +++++++++++++++++++++++++++++++++++++ 3 WHERE nWRITE THREAD EXIT ame='dc10'< codaUpdateStatus: dbDisconnecting .. codaUpdateStatus: dbDisconnect done UDP_standard_request >sta:dc10 downloaded< UDP_standard_request >sta:dc10 downloaded< UDP_standard_request >sta:dc10 downloaded< UDP_standard_request >sta:dc10 downloaded< UDP_standard_request >sta:dc10 downloaded< UDP_standard_request >sta:dc10 downloaded< UDP_cancel: cancel >sta:dc10 ending< ended after 2944 events call: 'dc10 close_links' rocCloselink reached wait: coda request in progress codaExecute reached, message >prestart PROD13<, len=15 codaExecute: 'prestart' transition roc_prestart reached partReInitAll() reached ? partReInitAll() reached ! 
machine check
Exception next instruction address: 0x0012bfb0
Machine Status Register: 0x0010b030
Condition Register: 0x20000044
Task: 0x1d805ec0 "twork0004"
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
wait: coda request in progress
.............

==========================

# 24-mar-2006

after the CC_CALIB run was started, all buttons became invisible, as did the
database field, session field, configuration field, run number field, the window
with the list of components and the window with messages (bottom one).
Histograms were OK. After about 3 minutes everything was restored.

===========================

# 24-mar-2006: ec2 hung

...........
0x848d900 (ROLS_LOOP): timer: 6 microsec (min=2 max=246 rms**2=6) interrupt: timer: 18 microsec (min=12 max=55 rms**2=16) write_thread: wait= 611 send= 9 microsec per event (nev=4367) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=246 rms**2=11) interrupt: timer: 18 microsec (min=12 max=55 rms**2=7) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=246 rms**2=11) write_thread: wait= 313 send= 9 microsec per event (nev=8256) interrupt: timer: 18 microsec (min=12 max=55 rms**2=7) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=17) interrupt: timer: 18 microsec (min=12 max=55 rms**2=16) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=4) write_thread: wait= 336 send= 9 microsec per event (nev=7823) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=14) interrupt: timer: 18 microsec (min=12 max=55 rms**2=1) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=8) interrupt: timer: 18 microsec (min=12 max=55 rms**2=13) write_thread: wait= 334 send= 9 microsec per event (nev=7755) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=15) interrupt: timer: 18 microsec (min=12 max=55 rms**2=10) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=10) write_thread: wait= 337 send= 9 microsec per event (nev=7804) interrupt: timer: 18 microsec (min=12 max=55 rms**2=4) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=746 rms**2=20) interrupt: timer: 18 microsec (min=12 max=55 rms**2=19) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=753 rms**2=11) write_thread: wait= 260 send= 10 microsec per event (nev=9961) interrupt: timer: 18 microsec (min=12 max=55 rms**2=4) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=753 rms**2=6) interrupt: timer: 18 microsec (min=12 max=55 rms**2=9) 0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=753 rms**2=18) write_thread: wait= 253 send= 10 microsec per event (nev=10219) interrupt: timer: 18 microsec (min=12 max=55 rms**2=14) 0x848d900 (ROLS_LOOP): timer: 
7 microsec (min=2 max=771 rms**2=13)
interrupt: timer: 18 microsec (min=12 max=55 rms**2=0)
0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=771 rms**2=1)
write_thread: wait= 255 send= 10 microsec per event (nev=10169)
0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=771 rms**2=1)
interrupt: timer: 18 microsec (min=12 max=55 rms**2=15)
0x848d900 (ROLS_LOOP): timer: 7 microsec (min=2 max=771 rms**2=5)
interrupt: timer: 18 microsec (min=12 max=55 rms**2=8)
write_thread: wait= 343 send= 10 microsec per event (nev=7545)
write_thread: wait= 27528 send= 0 microsec per event (nev=97)
write_thread: wait= 27390 send= 0 microsec per event (nev=99)
write_thread: wait= 26145 send= 0 microsec per event (nev=102)
write_thread: wait= 26520 send= 0 microsec per event (nev=101)
interrupt: Unconnected main interrupt 17
interrupt: Unconnected main interrupt 17
interrupt: Unconnected main interrupt 28
interrupt: Unconnected main interrupt 17
interrupt: Unconnected main interrupt 49
interrupt: Unconnected main interrupt 28
interrupt: Unconnected main interrupt 28
interrupt: Unconnected main interrupt 28
interrupt: Unconnected main interrupt 17
interrupt: Unconnected main interrupt 49
interrupt: Unconnected main interrupt 28
interrupt: Unconnected main interrupt 31
interrupt: Unconnected main interrupt 31
interrupt: Unconnected main interrupt 31
interrupt: Unconnected main interrupt 31
interrupt: Unconnected main interrupt 28
interrupt: Unconnected main interrupt 31
...............
interrupt: Unconnected main interrupt 15
interrupt: Unconnected main interrupt 18
interrupt: Unconnected main interrupt 31
interr

and it hangs !!!!!!!!!!!!!!!!!

replace mv6100 -> mv5500

# ER3:

Daughty complains: 'ER3 waiting for predownload' is shown while it is actually
waiting for a script, which is confusing; could it say 'waiting for script' or
something similar ???
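A minimal sketch, not part of the DAQ software, of how the write_thread statistics in the 24-mar-2006 ec2 entry could be scanned for the moment the ROC stops shipping events: there the wait time jumps from a few hundred to about 27000 microsec per event while send drops to 0. The function name find_stall, the factor-of-10 threshold and the sample text are assumptions made for illustration only.

```python
import re

# Matches console lines such as:
#   write_thread: wait= 343 send= 10 microsec per event (nev=7545)
WRITE_RE = re.compile(
    r"write_thread:\s*wait=\s*(\d+)\s+send=\s*(\d+)\s+microsec per event\s+\(nev=(\d+)\)"
)


def find_stall(log_text, factor=10):
    """Return (index, wait, send, nev) of the first write_thread record whose
    wait time exceeds `factor` times the previous record's wait, or None.

    `factor` is an arbitrary illustrative threshold, not a tuned value.
    """
    records = [tuple(int(x) for x in m.groups()) for m in WRITE_RE.finditer(log_text)]
    for i in range(1, len(records)):
        prev_wait = records[i - 1][0]
        wait, send, nev = records[i]
        if prev_wait > 0 and wait > factor * prev_wait:
            return (i, wait, send, nev)
    return None


# Four records copied from the ec2 entry.
sample = """
write_thread: wait= 255 send= 10 microsec per event (nev=10169)
write_thread: wait= 343 send= 10 microsec per event (nev=7545)
write_thread: wait= 27528 send= 0 microsec per event (nev=97)
write_thread: wait= 27390 send= 0 microsec per event (nev=99)
"""

print(find_stall(sample))  # (2, 27528, 0, 97)
```

Because the pattern tolerates any whitespace between fields, the same function should also work on console captures where the line breaks were lost.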
# sc2pmc1

write_thread: wait= 165 send= 24 microsec per event (nev=7998)
proc_thread: wait= 153 send= 41 microsec per event (nev=5634)
0x19f9b710 (coda_proc): timer: 30 microsec (min=9 max=1446 rms**2=19)
proc_thread: wait= 153 send= 40 microsec per event (nev=5657)
0x19f9b710 (coda_proc): timer: 30 microsec (min=9 max=1446 rms**2=3)

program
Exception current instruction address: 0x00000000
Machine Status Register: 0x0008b030
Condition Register: 0x40222022
Task: 0x19f213d0 "coda_net"

->
-> i

  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERR
---------- ------------ -------- --- ---------- -------- -------- ----
tExcTask   excTask      1dffe8b0   0 PEND         1f70d4 1dffe790
tLogTask   logTask      1dffbd10   0 PEND         1f70d4 1dffbc00
tShell     shell        1dc5a9a0   1 READY        1f0760 1dc5a580  320
tRlogind   rlogind      1dca33d0   2 PEND         1eae04 1dca2f70
tWdbTask   wdbTask      1dc95fb0   3 PEND         1eae04 1dc95e80
tNetTask   netTask      1dd0f640  50 PEND         1eca70 1dd0f400
coda_net   write_thread 19f213d0  50 SUSPEND+I         0 19f20ed0
tPortmapd  portmapd     1dc9ee60  54 PEND         1eae04 1dc9ec00  3d0
tTelnetd   telnetd      1dca1020  55 PEND         1eae04 1dca0eb0
coda_proc  proc_thread  19f9b710 110 DELAY        1f0084 19f9b610
coda_pmc   coda_pmc     1d70a1f0 200 DELAY        1f0084 1d70a0f0  3d0
value = 0 = 0x0
->

MUST BE:

-> i

  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERR
---------- ------------ -------- --- ---------- -------- -------- ----
tExcTask   excTask      1effe6c0   0 PEND         11f5e4 1effe5a0
tLogTask   logTask      1effbb20   0 PEND         11f5e4 1effba10
tShell     shell        1eb98c90   1 READY        115fcc 1eb98870  320
tRlogind   rlogind      1ebb3850   2 PEND         110670 1ebb33f0
tWdbTask   wdbTask      1ebaa570   3 PEND         110670 1ebaa440
tNetTask   netTask      1ed0f280  50 PEND         110670 1ed0f190
tPortmapd  portmapd     1ebb1690  54 PEND         110670 1ebb1430  3d0
tFtpdTask  873fc        1ebaebc0  55 PEND         110670 1ebaea10
coda_proc  proc_thread  1af250f0 110 DELAY        1158f0 1af24fe0
coda_net   write_thread 1aeaadb0 110 DELAY        1158f0 1aeaacb0
coda_pmc   coda_pmc     1e7ad510 200 DELAY        1158f0 1e7ad410  3d0
value = 0 = 0x0
->

==================

# dc4: 28-mar-2006

proc_thread: wait= 25057 send= 25058 microsec per event (nev=3085)
proc_thread: wait= 21410 send= 9882 microsec per event (nev=5215)
write_thread: wait= 7815 send= 9552 microsec per event (nev=9891)
proc_thread: wait= 20012 send= 10776 microsec per event (nev=5579)
proc_thread: wait= 19126 send= 5885 microsec per event (nev=5838)
proc_thread: wait= 23357 send= 2920 microsec per event (nev=5884)

program
Exception current instruction address: 0x00000000
Machine Status Register: 0x0008b030
Condition Register: 0x40224042
Task: 0x1ae7c900 "coda_net"
proc_thread: wait= 19922 send= 5692 microsec per event (nev=6036)
proc_thread: wait= 18763 send= 8041 microsec per event (nev=6409)
ÿýÿÿÿÿÿÿÿÿýýý
Init serial MPSC0 port done
Force PowerPMC-280
Copyright Force Computers, Ltd., 2005 - 2006
Total Memory detected : 0x20000000
Error Checking and Correction is enabled on onboard SDRAM
Memory detection and configuration complete
Perséÿýÿÿÿÿÿÿÿÿýýý
Init serial MPSC0 port done

=========================

# sc1: 28-mar-2006

crashed and self-rebooted, no other information available

========================

# sc1: 28-mar-2006

crash during 'Go'

..........................
uthbook1: free histogram buffer ... done.
INFO: User Prestart 1 executed
111 333: rocp->primefd=8
informEB reached
informEB: 11 -1 13 1 8 0 - 0x00000004 0x001101cc 0x000004b0 0x0000c909 0x0000000b 0x00000000
codaUpdateStatus: >UPDATE process SET state='paused' WHERE name='sc1'<
UDP_standard_request >sta:sc1 paused<
UDP_standard_request >sta:sc1 paused<
UDP_standard_request >sta:sc1 paused<
UDP_standard_request >sta:sc1 paused<
UDP_standard_request >sta:sc1 paused<
UDP_standard_request >sta:sc1 paused<
UDP_cancel: cancel >sta:sc1 downloaded<
prestarted
POLLS: wait: coda requ1est in progress 1
codaExecute done

machine check
Exception next instruction address: 0x0012bfb0
Machine Status Register: 0x0010b030
Condition Register: 0x20000048
Task: 0x1d7d7cd0 "CODATCPSRV"

->
-> tt
130558 vxTaskEntry    +68 : 1d731b3c ()
1d731b3c CODAtcpServer  +3f8: taskSpawn ()
1fe5e8 taskSpawn      +64 : taskCreat ()
1fe9fc taskInit       +1cc: bfill ()
value = 0 = 0x0
-> i

  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask   excTask      1dffe120   0 PEND         1d47ac 1dffe000   30065     0
tLogTask   logTask      1dffb580   0 PEND         1d47ac 1dffb470       0     0
tShell     shell        1a4f5610   1 READY        200f7c 1a4f51f0  3d0002     0
tRlogind   rlogind      1dc8b600   2 PEND         1fb384 1dc8b1a0       0     0
tWdbTask   wdbTask      1dc7d950   3 PEND         1fb384 1dc7d820       0     0
tNetTask   netTask      1dd0ec80  50 PEND         1fb384 1dd0eb90       0     0
tPortmapd  portmapd     1dc87090  54 PEND         1fb384 1dc86e30  3d0002     0
tTelnetd   telnetd      1dc89250  55 PEND         1fb384 1dc890e0       0     0
UDP_LOOP   UDP_loop     1a526d60 110 DELAY        2008a0 1a526cb0       0    55
CODATCPSRV CODAtcpServe 1d7d7cd0 150 SUSPEND      12bfb0 1d7d75f0  2b0001     0
ROC        coda_roc     1ca236b0 200 PEND         1fb384 1ca22450  3d0002     0
TCP_SERVER tcpServer    1dc0d870 250 PEND         1fb384 1dc0d250  2b0001     0
ROLS_LOOP  rols_loop    1a4b8de0 255 DELAY        2008a0 1a4b8d30  2b0001     1
value = 0 = 0x0
->

................................
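In the crash entries above the dead task is identified by eye from the VxWorks `i` listing (STATUS column SUSPEND or SUSPEND+I). A minimal sketch of doing the same check on a captured console log follows; the helper name suspended_tasks is hypothetical, and it assumes the listing still has one task per line with the column order NAME ENTRY TID PRI STATUS as shown in this logbook.

```python
def suspended_tasks(listing):
    """Return (name, entry, status) for every task row whose STATUS column
    contains SUSPEND in the text of a VxWorks `i` listing."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # A task row has at least NAME ENTRY TID PRI STATUS PC SP.
        if len(fields) < 7:
            continue
        name, entry, tid, pri, status = fields[:5]
        if not pri.isdigit():
            continue  # header line or the row of dashes
        if "SUSPEND" in status:
            found.append((name, entry, status))
    return found


# Header plus three rows copied from the sc1 'Go' crash listing.
sample = """\
  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tNetTask   netTask      1dd0ec80  50 PEND         1fb384 1dd0eb90       0     0
CODATCPSRV CODAtcpServe 1d7d7cd0 150 SUSPEND      12bfb0 1d7d75f0  2b0001     0
ROLS_LOOP  rols_loop    1a4b8de0 255 DELAY        2008a0 1a4b8d30  2b0001     1
"""

print(suspended_tasks(sample))  # [('CODATCPSRV', 'CODAtcpServe', 'SUSPEND')]
```

The sketch does not handle rows that another task's console output has been interleaved into, which happens in several captures in this logbook; those still have to be read by hand.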
================================= sc1: crashed in the middle of the run interrupt: timer: 26 microsec (min=17 max=84 rms**2=6) interrupt: timer: 26 microsec (min=17 max=84 rms**2=18) interrupt: timer: 26 microsec (min=17 max=84 rms**2=4) interrupt: timer: 26 microsec (min=17 max=84 rms**2=8) interrupt: timer: 26 microsec (min=17 max=84 rms**2=17) interrupt: timer: 26 microsec (min=17 max=84 rms**2=3) machine check Exception next instruction address: 0x1d72bd14 Machine Status Register: 0x0010b030 Condition Register: 0x40000082 Task: 0x1a4b2ed0 "ROLS_LOOP" wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress -> wait: coda request in progress wait: coda request in progress wait: coda request in progress tt 130558 vxTaskEntry +68 : rols_loop () 1d72ba18 rols_loop +b0 : output_proc_network () value = 0 = 0x0 -> wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress tt 130558 vxTaskEntry +68 : rols_loop () 1d72ba18 rols_loop +b0 : output_proc_network () value = 0 = 0x0 -> wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress iwait: coda request in progress NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- 
tExcTask excTask 1dffe120 0 PEND 1d47ac 1dffe000 0 0 tLogTask logTask 1dffb580 0 PEND 1d47ac 1dffb470 0 0 tShell shell 1dc7b630 1 READY 200f7c 1dc7b210 3d0002 0 tRlogind rlogind 1dc8b600 2 PEND 1fb384 1dc8b1a0 0 0 tWdbTask wdbTask 1dc7dc20 3 PEND 1fb384 1dc7daf0 0 0 tNetTask netTask 1dd0ec80 50 PEND 1fb384 1dd0eb90 0 0 tPortmapd portmapd 1dc87090 54 PEND 1fb384 1dc86e30 3d0002 0 tTelnetd telnetd 1dc89250 55 PEND 1fb384 1dc890e0 0 0 UDP_LOOP UDP_loop 1a526d60 110 DELAY 2008a0 1a526cb0 0 18 CODATCPSRV CODAtcpServe 1d7d7ee0 150 DELAY 2008a0 1d7d7950 2b0001 38 twork0018 CODAtcpServe 1d805ec0 150 PEND 1fb384 1d8050a0 0 0 ROC coda_roc 1ca236b0 200 PEND 1fb384 1ca22450 3d0002 0 TCP_SERVER tcpServer 1dc0d780 250 PEND 1fb384 1dc0d160 2b0001 0 ROLS_LOOP rols_loop 1a4b2ed0 255 wait: coda request SUSPENDin progress 1d72bd14 1a4b2a10 0 0 value = 0 = 0x0 -> wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress tt Sorry, traces of my own stack begin at tt (). value = -1 = 0xffffffff -> wait: coda request in progress wait: coda request in progress wait: coda request in progress 130558 vxTaskEntry +68 : shell () 1c7284 shell +190: 1c72b0 () 1c73a4 shell +2b0: ledRead () 1bc644 ledRead +f58: read () 1abe2c read +5c : iosRead () 1ad418 iosRead +c8 : tyRead () 1b7a94 tyRead +60 : semTake () 1fc2f0 semTake +13c: semBTake () tShell restarted. -> wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress ~xwait: coda request in progress wait: coda request in progress wait: coda request in progress undefined symbol: x -> wait: coda request in progress DINC closing... 
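The crash signature above (an exception report naming a task, after which that task shows up as SUSPEND in the `i` listing) recurs throughout these notes. Below is a minimal Python sketch, assuming the console output has been captured to text, for pulling such reports out of a log. The names `EXC_RE` and `find_exceptions` are made up for illustration and are not part of CODA or VxWorks; the pattern covers only the two report kinds seen in these notes (`machine check` and `data access`).

```python
import re

# Pattern for the exception reports as they appear in these console logs, e.g.
#   machine check Exception next instruction address: 0x1d72bd14 ...
#   Task: 0x1a4b2ed0 "ROLS_LOOP"
# Only the two kinds seen in these notes are matched (an assumption).
EXC_RE = re.compile(
    r'(?P<kind>machine check|data access)\s+'
    r'Exception (?:next|current) instruction address:\s*(?P<addr>0x[0-9a-fA-F]+)'
    r'.*?Task:\s*(?P<tid>0x[0-9a-fA-F]+)\s+"(?P<task>[^"]+)"',
    re.DOTALL,
)


def find_exceptions(text):
    """Return (kind, instruction address, task id, task name) for each report."""
    return [
        (m.group("kind"), m.group("addr"), m.group("tid"), m.group("task"))
        for m in EXC_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Sample copied from the sc1 console output above.
    sample = (
        "interrupt: timer: 26 microsec (min=17 max=84 rms**2=3) "
        "machine check Exception next instruction address: 0x1d72bd14 "
        "Machine Status Register: 0x0010b030 Condition Register: 0x40000082 "
        'Task: 0x1a4b2ed0 "ROLS_LOOP" wait: coda request in progress'
    )
    for kind, addr, tid, task in find_exceptions(sample):
        print(f"{kind}: task {task} ({tid}) at {addr}")
```

Run over a saved console capture, this gives one tuple per crash, which makes it easier to compare which task and which instruction address were involved on different ROCs.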
=================================================== # rcn: 540 sec ER3 waiting for predownload etc pretrig2 seems hung: pretrig2> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1bfe600 0 PEND 1c89dc 1bfe4e0 3d0001 0 tLogTask logTask 1bfba60 0 PEND 1c89dc 1bfb950 0 0 tShell shell 18c4540 1 READY 1f5770 18c4120 320002 0 tRlogind rlogind 190efb0 2 PEND 1efb78 190eb50 0 0 tWdbTask wdbTask 19015d0 3 PEND 1efb78 19014a0 0 0 tNetTask netTask 1940730 50 SUSPEND 20 1940520 0 0 tPortmapd portmapd 190aa40 54 PEND 1efb78 190a7e0 3d0002 0 tTelnetd telnetd 190cc00 55 PEND 1efb78 190ca90 0 0 TCP_SERVER tcpServer 18c9580 250 PEND 1efb78 18c8f60 2b0001 0 value = 0 = 0x0 pretrig2> after reboot: pretrig2> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1bfe600 0 PEND 1c89dc 1bfe4e0 0 0 tLogTask logTask 1bfba60 0 PEND 1c89dc 1bfb950 0 0 tShell shell 18c4540 1 READY 1f5770 18c4120 320002 0 tRlogind rlogind 190efb0 2 PEND 1efb78 190eb50 0 0 tWdbTask wdbTask 19015d0 3 PEND 1efb78 19014a0 0 0 tNetTask netTask 1940730 50 PEND 1efb78 1940640 0 0 tPortmapd portmapd 190aa40 54 PEND 1efb78 190a7e0 3d0002 0 tTelnetd telnetd 190cc00 55 PEND 1efb78 190ca90 0 0 TCP_SERVER tcpServer 18c9580 250 PEND 1efb78 18c8f60 2b0001 0 value = 0 = 0x0 pretrig2> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 22998896 24 958287 20062016 alloc 6182928 5227 1182 - cumulative alloc 7604896 7394 1028 - value = 0 = 0x0 ================================== GOOD: -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 3bfe600 0 PEND 1db824 3bfe4e0 0 0 tLogTask logTask 3bfba60 0 PEND 1db824 3bfb950 0 0 tShell shell 385c3c0 1 READY 2086e4 385bfa0 3d0002 0 tRlogind rlogind 386c170 2 
PEND 202aec 386bd10 0 0 tWdbTask wdbTask 385e790 3 PEND 202aec 385e660 0 0 tNetTask netTask 390f390 50 PEND 202aec 390f2a0 0 0 tPortmapd portmapd 3867c00 54 PEND 202aec 38679a0 3d0002 0 tTelnetd telnetd 3869dc0 55 PEND 202aec 3869c50 0 0 UDP_LOOP UDP_loop 910000 110 DELAY 208008 90ff50 0 59 coda_net write_thread 849c50 110 DELAY 208008 849b50 0 18 CODATCPSRV CODAtcpServe 940f60 150 PEND 202aec 940940 2b0001 0 ROC coda_roc 2603eb0 200 PEND 202aec 2602c50 3d0002 0 TCP_SERVER tcpServer 381b070 250 PEND 202aec 381aa50 2b0001 0 ROLS_LOOP rols_loop 87abb0 255 READY 33edcdc 866590 2b0001 0 value = 0 = 0x0 -> -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1effe6c0 0 PEND 11f5e4 1effe5a0 0 0 tLogTask logTask 1effbb20 0 PEND 11f5e4 1effba10 0 0 tShell shell 1eb981c0 1 READY 115fcc 1eb97da0 320001 0 tRlogind rlogind 1ebb3850 2 PEND 110670 1ebb33f0 0 0 tWdbTask wdbTask 1ebaa570 3 PEND 110670 1ebaa440 0 0 tNetTask netTask 1ed0f280 50 PEND 110670 1ed0f190 0 0 tPortmapd portmapd 1ebb1690 54 PEND 110670 1ebb1430 3d0002 0 tFtpdTask 873fc 1ebaebc0 55 PEND 110670 1ebaea10 0 0 coda_proc proc_thread 1aede690 110 DELAY 1158f0 1aede580 0 56 coda_net write_thread 1ae64350 110 DELAY 1158f0 1ae64250 0 20 coda_pmc coda_pmc 1e7af8d0 200 DELAY 1158f0 1e7af7d0 3d0002 6 value = 0 = 0x0 -> # dc7 30-mar-2006 interrupt: timer: 27 microsec (min=14 max=613 rms**2=10) interrupt: timer: 28 microsec (min=14 max=613 rms**2=16) interrupt: timer: 27 microsec (min=14 max=613 rms**2=15) interrupt: timer: 27 microsec (min=14 max=613 rms**2=9) interrupt: timer: 27 microsec (min=14 max=613 rms**2=2) interrupt: timer: 27 microsec (min=14 max=613 rms**2=2) interrupt: timer: 27 microsec (min=14 max=613 rms**2=2) machine check Exception next instruction address: 0x1dc07f74 Machine Status Register: 0x00103030 Condition Register: 0x80000084 Task: 0x1a450470 "ROLS_LOOP" DINC closing... 
clon10:clasrun> -> tt 130558 vxTaskEntry +68 : rols_loop () 1d72b7f8 rols_loop +b0 : output_proc_network () 1d72baf8 output_proc_network+2b0: 1dc074b0 () 1dc074b0 rol1__init +480: 1dc07000 () 1dc07018 test_patt1 +4458: 1dc06c78 () 1dc06f94 test_patt1 +43d4: 1dc05950 () 1dc063b0 test_patt1 +37f0: davetrig (1, 1) value = 0 = 0x0 -> -> # dc5pmc1,dc8pmc1,dc9pmc1 in infinite loop: MTDC Header 0xB800A801 : slot 23 count 1 MTDC Header 0xB000280B : slot 22 count 11 Channel 5 : 016D 0925 0C6C Mux case : 20 -> 1 hits Channel 41 : 0904 0A89 Mux case : 2 -> 0 hits Channel 47 : 0099 09E9 0AE7 Mux case : 21 -> 1 hits Channel 70 : 0251 0B95 MTDC Header 0xA800A81B : slot 21 count 27 Channel 5 : 01DB 0212 09D4 0AEE Mux case : 29 -> 2 hits Channel 20 : 06F3 0ED0 Mux case : 11 -> 1 hits Channel 25 : 0748 1072 Mux case : 12 -> 1 hits Channel 27 : 0779 10A7 Mux case : 12 -> 1 hits Channel 30 : 0678 0FE3 Mux case : 12 -> 1 hits Channel 34 : 06BB 0E79 Mux case : 11 -> 1 hits Channel 35 : 01D3 03D0 0D00 1054 Mux case : 33 -> 1 hits Channel 47 : 01C9 03C5 0508 08A1 Mux case : 7 -> 0 hits Channel 77 : 011D 08D9 Mux case : 11 -> 1 hits Channel 89 : 012E 08D0 MTDC Header 0xA0002805 : slot 20 count 5 Channel 10 : 007B Mux case : 0 -> 0 hits Channel 29 : 0046 096D 0B56 MTDC Header 0x98002804 : slot 19 count 4 Channel 35 : 087C Mux case : 0 -> 0 hits Channel 47 : 0085 0878 MTDC Header 0x9000A807 : slot 18 count 7 Channel 38 : 09AB Mux case : 0 -> 0 hits Channel 40 : 01C0 0991 Mux case : 11 -> 1 hits Channel 44 : 02F6 0ACC Mux case : 11 -> 1 hits Channel 65 : 12DC MTDC Header 0x8000A817 : slot 16 count 23 Channel 0 : 0968 126E Mux case : 12 -> 1 hits Channel 1 : 08E7 125E Mux case : 12 -> 1 hits Channel 18 : 09A1 111A Mux case : 11 -> 1 hits Channel 20 : 0244 09E2 Mux case : 11 -> 1 hits Channel 48 : 094B 125F Mux case : 12 -> 1 hits Channel 50 : 093C 124D Mux case : 12 -> 1 hits Channel 53 : 015D 0A10 0A37 133A Mux case : 28 -> 1 hits Channel 55 : 0976 1259 Mux case : 12 -> 1 hits Channel 58 : 
0A9D 124D
Mux case : 11 -> 1 hits
Channel 66 : 0948 124F
MTDC Header 0x7800A80B : slot 15 count 11
Channel 48 : 01FF 036B 0959 09F8
Mux case : 32 -> 1 hits
Channel 49 : 0617 0EF7
Mux case : 12 -> 1 hits
Channel 50 : 0607 0EE7
Mux case : 12 -> 1 hits
Channel 66 : 0202 09FA
MTDC Header 0x7000A803 : slot 14 count 3
Channel 0 : 08B6 11B3
MTDC Header 0x6800A803 : slot 13 count 3
Channel 88 : 0869 0AF9

# sc1 crash (apr 5, 2006)

sc1 crashed after a whole day of downtime; sc1pmc1 does not respond;
'roc_reboot' reboots sc1 only, not sc1pmc1;
went into the hall: there is a red LED on the sc1pmc1 card;
tried blowing on it - did not help;
turned the crate off and on - problem is gone;
the pmc was not hot to the touch ...

# ec2 crash (apr 7, 2006)

looks like it self-rebooted

ec2pmc1 messages:

..........
write_thread: wait= 5705 send= 9508 microsec per event (nev=13551)
proc_thread: wait= 0 send= 15185 microsec per event (nev=11312)
proc_thread: wait= 0 send= 15130 microsec per event (nev=11354)
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print
write_thread: about to print: 272431 20
write_thread: about to print: 272431 20
write_thread: about to print: 272431 20
write_thread: wait= 4414 send= 11350 microsec per event (nev=13621)

data access
Exception current instruction address: 0x1e67fc28
Machine Status Register: 0x0000b030
Data Access Register: 0x8a97f13c
Condition Register: 0x80000083
Data storage interrupt Register: 0x42000000
Task: 0x1af27aa0 "coda_proc"
Download command received
-> bignet at 0x1d798138
<- bignet at 0xdd798138
bignet at 0xdd798138, bignet.gbigBuffer at 0xdd798158 -> 0xdd798158
proc_on_pmc=1, net_on_pmc=1 rocID=15
proc: downloading DDL table ...
clonbanks reached clonbanks: use file >/usr/local/clas/parms/clonbanks.ddl< adr1(nddl)=0x1eb53370, int: 4 bytes, long: 4 bytes N name (nname) fmt (nfmt) ncol [ 1] >PTRN< (4) >B32< (3) 1 [ 2] >PSYN< (4) >B32< (3) 1 [ 3] >RC00< (4) >B32< (3) 1 [ 4] >RC01< (4) >B32< (3) 1 [ 5] >RC02< (4) >B32< (3) 1 [ 6] >RC03< (4) >B32< (3) 1 [ 7] >RC04< (4) >B32< (3) 1 [ 8] >RC05< (4) >B32< (3) 1 [ 9] >RC06< (4) >B32< (3) 1 [10] >RC07< (4) >B32< (3) 1 [11] >RC08< (4) >B32< (3) 1 [12] >RC09< (4) >B32< (3) 1 [13] >RC10< (4) >B32< (3) 1 [14] >RC11< (4) >B32< (3) 1 [15] >RC12< (4) >B32< (3) 1 [16] >RC13< (4) >B32< (3) 1 [17] >RC14< (4) >B32< (3) 1 [18] >RC15< (4) >B32< (3) 1 .................. so it looks like ec2 was self-rebooted because of error in ec2pmc1 shown above; 'roc_reboot ec2' command was never issued, but next run started fine from 'Reset' ============================================== # ec2 crash (8-apr-2006) in the middle of the run .... interrupt: timer: 26 microsec (min=17 max=85 rms**2=18) interrupt: timer: 26 microsec (min=17 max=85 rms**2=11) interrupt: timer: 26 microsec (min=17 max=85 rms**2=2) interrupt: timer: 26 microsec (min=17 max=85 rms**2=6) interrupt: timer: 26 microsec (min=17 max=85 rms**2=14) interrupt: timer: 26 microsec (min=17 max=85 rms**2=15) interrupt: timer: 26 microsec (min=17 max=85 rms**2=7) interrupt: timer: 26 microsec (min=17 max=85 rms**2=0) interrupt: timer: 26 microsec (min=17 max=85 rms**2=7) interrupt: timer: 26 microsec (min=17 max=85 rms**2=15) interrupt: timer: 26 microsec (min=17 max=85 rms**2=15) interrupt: timer: 26 microsec (min=17 max=85 rms**2=7) interrupt: timer: 26 microsec (min=17 max=85 rms**2=0) interrupt: timer: 26 microsec (min=17 max=85 rms**2=9) interrupt: timer: 26 microsec (min=17 max=85 rms**2=17) machine check Exception next instruction address: 0x1dc08054 Machine Status Register: 0x0010b030 Condition Register: 0x80000084 Task: 0x1a4a3850 "ROLS_LOOP" -> wait: coda request in progress codaExecute reached, message 
>end<, len=3 codaExecute: 'end' transition wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress wait: coda request in progress ....... -> wait: coda request in progress i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1dffe120 0 PEND 1d47ac 1dffe000 0 0 tLogTask logTask 1dffb580 0 PEND 1d47ac 1dffb470 0 0 tShell shell 1dc7b360 1 READY 200f7c 1dc7af40 3d0002 0 tRlogind rlogind 1dc8b600 2 PEND 1fb384 1dc8b1a0 0 0 tWdbTask wdbTask 1dc7d950 3 PEND 1fb384 1dc7d820 0 0 tNetTask netTask 1dd0ec80 50 PEND 1fb384 1dd0eb90 3d 0 tPortmapd portmapd 1dc87090 54 PEND 1fb384 1dc86e30 3d0002 0 tTelnetd telnetd 1dc89250 55 PEND 1fb384 1dc890e0 0 0 UDP_LOOP UDP_loop 1a527430 110 DELAY 2008a0wait: coda requ est in progress 1a527380 0 33 CODATCPSRV CODAtcpServe 1d7d8320 150 DELAY 2008a0 1d7d7d90 2b0001 98 twork0063 CODAtcpServe 1d806510 150 PEND 1fb384 1d8056f0 0 0 ROC coda_roc 1ca23d80 200 PEND 1fb384 1ca22b20 3d0002 0 TCP_SERVER tcpServer 1dc0d870 250 PEND 1fb384 1dc0d250 2b0001 0 ROLS_LOOP rols_loop 1a4a3850 255 SUSPEND 1dc08054 1a4a3200 2b0001 0 value = 0 = 0x0 -> wait: coda request in progress .......... -> wait: coda request in progress ioswait: coda request in progress FdSwait: coda request in progress how fd name drv 3 /tyCo/0 1 in out err 4 (socket) 3 5 (socket) 3 6 (socket) 3 7 (socket) 3 8 (socket) 3 9 (socket) 3 10 (socket) 3 11 (socket) 3 value = 32 = 0x20 = ' ' -> wait: coda request in progress wait: coda request in progress ............... # clastrig2 crashed (8-apr-2006) I crashed clastrig2 by typing 'i' twice !!! .............. 
interrupt: inputs 7-12: 4193 1850 0 3 0 0 interrupt: live time = 94 percent (gated=1841796 ungated=1953103) interrupt: live_corr = 94 percent (gated=1841796 ungated=1953103) 0x8732a0 (ROLS_LOOP): timer: 18 microsec (min=3 max=44268 rms**2=22) UDP_cancel: cancel >ts2: 94 0 0 0 0 0 0 4193 1850 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 4201 1847 0 3 0 0 interrupt: live time = 87 percent (gated=1703575 ungated=1953116) interrupt: live_corr = 87 percent (gated=1703575 ungated=1953116) i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 3bfebd0 0 PEND 1cf9c4 3bfeab0 0 0 tLogTask logTask 3bfc030 0 PEND 1cf9c4 3bfbf20 0 0 tShell shell 38cdef0 1 READY 1fc758 38cdad0 320002 0 tRlogind rlogind 38de1e0 2 PEND 1f6b60 38ddd80 0 0 tWdbTask wdbTask 38d0800 3 PEND 1f6b60 38d06d0 0 0 tNetTask netTask 390f960 50 PEND 1f6b60 390f870 3d 0 tPortmapd portmapd 38d9c70 54 PEND 1f6b60 38d9a10 3d0002 0 tTelnetd telnetd 38dbe30 55 PEND 1f6b60 38dbcc0 0 0 t1 vme_server 38c23b0 100 PEND 1f6b60 38c2200 3d0001 0 UDP_LOOP UDP_loop 2d0a470 110 DELAY 1fc07c 2d0a3c0 0 48 coda_net write_thread 842340 110 DELAY 1fc07c 842240 0 55 FORCE_SYNC ts2syncTask 3897ab0 119 DELAY 1fc07c 3897a00 0 830 CODATCPSRV CODAtcpServe 2d3b3d0 150 PEND 1f6b60 2d3adb0 2b0001 0 ROC coda_roc 2c2c210 200 PEND 1f6b60 2c2afb0 3d0002 0 TCP_SERVER tcpServer 385ce90 250 PEND 1f6b60 385c870 2b0001 0 ROLS_LOOP rols_loop 8732a0 255 READY 87bc98 872d10 0 0 value = 0 = 0x0 -> write_thread: wait= 183 send= 2 microsec per event (nev=10931) i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 3bfebd0 0 PEND 1cf9c4 3bfeab0 0 0 tLogTask logTask 3bfc030 0 PEND 1cf9c4 3bfbf20 0 0 tShell shell 38cdef0 1 READY 1fc758 38cdad0 320002 0 tRlogind rlogind 38de1e0 2 PEND 1f6b60 38ddd80 0 0 tWdbTask wdbTask 38d0800 3 PEND 1f6b60 38d06d0 0 0 tNetTask 
netTask       390f960  50 PEND         1f6b60  390f870      3d     0
tPortmapd  portmapd      38d9c70  54 PEND         1f6b60  38d9a10  3d0002     0
tTelnetd   telnetd       38dbe30  55 PEND         1f6b60  38dbcc0       0     0
t1         vme_server    38c23b0 100 PEND         1f6b60  38c2200  3d0001     0
UDP_LOOP   UDP_loop      2d0a470 110 DELAY        1fc07c  2d0a3c0       0sysHwInit: PCI_
sysHwInit: LOCAL_MEM_LOCAL_ADRS = 0x00000000
sysHwInit: PCI_AUTOCONFIG_FLAG_OFFSET = 0x00004c00
sysHwInit: PCI_AUTOCONFIG_FLAG=1
sysHwInit 1
sysHwInit 2
.........

# NOT A CRASH: when restarting after the crash, dc5pmc1, dc6pmc1 and dc11pmc1 printed at the end of Prestart:

..........
0x1a675990 (coda_proc): bigprocptr=0xcb7dc8d8 offset=0xc0000000
0x1a675990 (coda_proc): bigproc at 0xcb7dc8d8, bigproc.gbigBuffer at 0xcb7dc8f8 -> 0xcb7dc8f8
proc_thread reached
bignetptr=0x1d79e5ec offset=0x00000000
>>>>>>>>>>>>>>>> use pid=-1 <<<<<<<<<<<<<<<<<
bignetptr=0x1d79e5ec offset=0x00000000
bignetptr=0x1d79e5ec offset=0x00000000
bignet at 0x1d79e5ec, bignet.gbigBuffer at 0x1d79e60c -> 0x1d79e60c
WARN: no data - return
WARN: no data - return
WARN: no data - return
WARN: no data - return
..............

but everything started fine:

..............
Go command received: net=1 proc=1
calls 'proc_go', rocid=6
proc_go proc_go proc_go proc_go proc_go
..... regular -------------------------
>>>>>>>>>>>>>>>> use pid=-1 <<<<<<<<<<<<<<<<<
0x1a675990 (coda_proc): timer: 27 microsec (min=4 max=1736 rms**2=3)
proc_thread: wait= 1472 send= 36 microsec per event (nev=2574)
...........

At the same time dc7pmc1 complained during 'Go' but started fine as well:

...........
0x1af27aa0 (coda_proc): bigproc at 0xdd797b08, bigproc.gbigBuffer at 0xdd797b28 -> 0xdd797b28 proc_thread reached bignetptr=0x1e6e295c offset=0x00000000 bignetptr=0x1e6e295c offset=0x00000000 bignetptr=0x1e6e295c offset=0x00000000 >>>>>>>>>>>>>>>> use pid=-1 <<<<<<<<<<<<<<<<< bignet at 0x1e6e295c, bignet.gbigBuffer at 0x1e6e297c -> 0x1e6e297c >>>>>>>>>>>>>>>> use pid=-1 <<<<<<<<<<<<<<<<< Go command received: net=1 proc=1 calls 'proc_go', rocid=7 proc_go proc_go proc_go proc_go proc_go ..... regular ------------------------- ERROR1: LINK_sized_write() returns errno=851971 ERROR: write_thread failed (in LINK_sized_write). 0x1af27aa0 (coda_proc): timer: 22 microsec (min=9 max=1042 rms**2=1) proc_thread: wait= 1163 send= 29 microsec per event (nev=3487) 0x1af27aa0 (coda_proc): timer: 22 microsec (min=9 max=1042 rms**2=16) ..................... # NOT A CRASH clastrig2 memory observations: after reboot: after first run: -> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 4938368 54 91451 2670256 alloc 55665760 7623 7302 - cumulative alloc 63542896 13335 4765 - value = 0 = 0x0 -> after second run: -> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 4735888 54 87701 2369168 alloc 55868240 7671 7283 - cumulative alloc 64165840 13520 4745 - value = 0 = 0x0 -> -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 3bfebd0 0 PEND 1cf9c4 3bfeab0 0 0 tLogTask logTask 3bfc030 0 PEND 1cf9c4 3bfbf20 0 0 tShell shell 38cdf20 1 READY 1fc758 38cdb00 320002 0 tRlogind rlogind 38de1e0 2 PEND 1f6b60 38ddd80 0 0 tWdbTask wdbTask 38d0800 3 PEND 1f6b60 38d06d0 0 0 tNetTask netTask 390f960 50 PEND 1f6b60 390f870 3d 0 tPortmapd portmapd 38d9c70 54 PEND 1f6b60 38d9a10 3d0002 0 tTelnetd telnetd 38dbe30 55 PEND 1f6b60 38dbcc0 0 0 t1 vme_server 38c23b0 100 PEND 1f6b60 38c2200 3d0001 0 
UDP_LOOP   UDP_loop      2d0a470 110 DELAY        1fc07c  2d0a3c0       0    59
coda_net   write_thread   8a74d0 110 DELAY        1fc07c   8a73d0       0    51
FORCE_SYNC ts2syncTask   3897ab0 119 DELAY        1fc07c  3897a00       0   442
CODATCPSRV CODAtcpServe  2d3b3d0 150 PEND         1f6b60  2d3adb0  2b0001     0
ROC        coda_roc      2c2c210 200 PEND         1f6b60  2c2afb0  3d0002     0
TCP_SERVER tcpServer     385ce90 250 PEND         1f6b60  385c870  2b0001     0
ROLS_LOOP  rols_loop      8f0db0 255 READY       389cfb0   8f0820  2b0001     0
value = 0 = 0x0
->
-> checkStack
  NAME        ENTRY        TID     SIZE   CUR  HIGH  MARGIN
------------ ------------ -------- ----- ----- ----- ------
tExcTask     excTask       3bfebd0   7984   288   528   7456
tLogTask     logTask       3bfc030   4992   272  1424   3568
tShell       shell         38cdf20  19040  1056  4496  14544
tRlogind     rlogind       38de1e0   7984  1120  1408   6576
tWdbTask     wdbTask       38d0800   7904   304   448   7456
tNetTask     netTask       390f960   9984   240  1488   8496
tPortmapd    portmapd      38d9c70   9984   608  3184   6800
tTelnetd     telnetd       38dbe30   7984   368   512   7472
t1           vme_server    38c23b0  19712   432  1520  18192
UDP_LOOP     UDP_loop      2d0a470 199984   176  3456 196528
coda_net     write_thread   8a74d0 199984   256  1520 198464
FORCE_SYNC   ts2syncTask   3897ab0   5984   176   416   5568
CODATCPSRV   CODAtcpServe  2d3b3d0 199312  1568  2864 196448
ROC          coda_roc      2c2c210 499312  4704 34640 464672
TCP_SERVER   tcpServer     385ce90  19312  1568  3360  15952
ROLS_LOOP    rols_loop      8f0db0 199312  1216  4608 194704
INTERRUPT                            5008     0  3120   1888
->

# 10-apr-2006 (Pasyuk)

event rate dropped to 0, no pulses on scope (?), no visible crashes
End worked fine, Prestart-Go worked as well, run started fine
the only problem: web runcontrol froze

# 10-apr-2006 - Sergey

event rate dropped to 3 Hz, bits 7 and 8 were gone, then it came back:

...............
UDP_cancel: cancel >ts2: 86 0 0 0 0 0 0 4876 2119 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 4863 2128 0 3 0 0 interrupt: live time = 90 percent (gated=1759152 ungated=1953102) interrupt: live_corr = 90 percent (gated=1759152 ungated=1953102) 0x83b3a0 (ROLS_LOOP): timer: 18 microsec (min=3 max=5737 rms**2=4) UDP_cancel: cancel >ts2: 90 0 0 0 0 0 0 4863 2128 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 4850 2106 0 3 0 0 interrupt: live time = 82 percent (gated=1618771 ungated=1953100) interrupt: live_corr = 82 percent (gated=1618771 ungated=1953100) 0x83b3a0 (ROLS_LOOP): timer: 18 microsec (min=3 max=5737 rms**2=27) write_thread: wait= 159 send= 2 microsec per event (nev=12394) UDP_cancel: cancel >ts2: 82 0 0 0 0 0 0 4850 2106 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 4895 2111 0 3 0 0 interrupt: live time = 89 percent (gated=1750483 ungated=1953108) interrupt: live_corr = 89 percent (gated=1750483 ungated=1953108) UDP_cancel: cancel >ts2: 89 0 0 0 0 0 0 4895 2111 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 475 211 0 3 0 0 interrupt: live time = 97 percent (gated=1905536 ungated=1953087) interrupt: live_corr = 97 percent (gated=1905536 ungated=1953087) UDP_cancel: cancel >ts2: 97 0 0 0 0 0 0 475 211 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1923188 ungated=1953102) interrupt: live_corr = 98 percent (gated=1923188 ungated=1953102) UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1923087 ungated=1953102) interrupt: live_corr = 98 percent (gated=1923087 ungated=1953102) write_thread: wait= 3108 send= 2 microsec per event (nev=646) UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 
98 percent (gated=1923443 ungated=1953101) interrupt: live_corr = 98 percent (gated=1923443 ungated=1953101) UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1923090 ungated=1953101) interrupt: live_corr = 98 percent (gated=1923090 ungated=1953101) UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 3530 1541 0 3 0 0 interrupt: live time = 91 percent (gated=1794345 ungated=1953113) interrupt: live_corr = 91 percent (gated=1794345 ungated=1953113) 0x83b3a0 (ROLS_LOOP): timer: 17 microsec (min=3 max=5737 rms**2=38) UDP_cancel: cancel >ts2: 91 0 0 0 0 0 0 3530 1541 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 4622 2030 0 3 0 0 interrupt: live time = 87 percent (gated=1707007 ungated=1953098) interrupt: live_corr = 87 percent (gated=1707007 ungated=1953098) ................. # same , run 51678 ..... 
interrupt: live time = 99 percent (gated=1946524 ungated=1953107) interrupt: live_corr = 99 percent (gated=1946524 ungated=1953107) write_thread: wait= 158 send= 2 microsec per event (nev=12452) 0x83b3a0 (ROLS_LOOP): timer: 18 microsec (min=3 max=6350 rms**2=17) UDP_cancel: cancel >ts2: 99 0 0 0 0 0 0 4939 2132 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 4884 2119 0 3 0 0 interrupt: live time = 86 percent (gated=1692568 ungated=1953088) interrupt: live_corr = 86 percent (gated=1692568 ungated=1953088) UDP_cancel: cancel >ts2: 86 0 0 0 0 0 0 4884 2119 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 816 355 0 3 0 0 interrupt: live time = 96 percent (gated=1894033 ungated=1953093) interrupt: live_corr = 96 percent (gated=1894033 ungated=1953093) UDP_cancel: cancel >ts2: 96 0 0 0 0 0 0 816 355 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1922575 ungated=1953100) interrupt: live_corr = 98 percent (gated=1922575 ungated=1953100) UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1923446 ungated=1953100) interrupt: live_corr = 98 percent (gated=1923446 ungated=1953100) write_thread: wait= 835 send= 2 microsec per event (nev=2403) -> -> -> UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 interrupt: live time = 98 percent (gated=1923340 ungated=1953102) interrupt: live_corr = 98 percent (gated=1923340 ungated=1953102) -> -> -> -> i NAME ENTRY TID PRI STATUS PC SP ERR ---------- ------------ -------- --- ---------- -------- -------- ---- tExcTask excTask 3bfebd0 0 PEND 1cf9c4 3bfeab0 30 tLogTask logTask 3bfc030 0 PEND 1cf9c4 3bfbf20 tShell shell 8ac930 1 READY 1fc758 8ac510 320 tRlogind rlogind 38de1e0 2 PEND 1f6b60 38ddd80 tWdbTask wdbTask 38d0800 3 PEND 1f6b60 38d06d0 
tNetTask netTask 390f960 50 PEND 1f6b60 390f870 tPortmapd portmapd 38d9c70 54 PEND 1f6b60 38d9a10 3d0 tTelnetd telnetd 38dbe30 55 PEND 1f6b60 38dbcc0 t1 vme_server 38c23b0 100 PEND 1f6b60 38c2200 3d0 UDP_LOOP UDP_loop 9318c0 110 DELAY 1fc07c 931810 coda_net write_thread 80a440 110 DELAY 1fc07c 80a340 FORCE_SYNC ts2syncTask 3897ab0 119 DELAY 1fc07c 3897a00 CODATCPSRV CODAtcpServe 2d3b3d0 150 PEND 1f6b60 2d3adb0 2b0 ROC coda_roc 2c2c210 200 PEND 1f6b60 2c2afb0 3d0 TCP_SERVER tcpServer 38bd370 250 PEND 1f6b60 38bcd50 2b0 ROLS_LOOP rols_loop 83b3a0 255 READY 8814ec 83ae60 2b0 value = 0 = 0x0 -> -> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 4628960 94 49244 2073136 alloc 55975168 8055 6949 - cumulative alloc 77593408 24846 3122 - value = 0 = 0x0 -> UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 interrupt: live time = 98 percent (gated=1922932 ungated=1953101) interrupt: live_corr = 98 percent (gated=1922932 ungated=1953101) -> checkStack NAME ENTRY TID SIZE CUR HIGH MARGIN ------------ ------------ -------- ----- ----- ----- ------ tExcTask excTask 3bfebd0 7984 288 1376 6608 tLogTask logTask 3bfc030 4992 272 1328 3664 tShell shell 8ac930 19040 1056 4496 14544 tRlogind rlogind 38de1e0 7984 1120 1408 6576 tWdbTask wdbTask 38d0800 7904 304 448 7456 tNetTask netTask 390f960 9984 240 1472 8512 tPortmapd portmapd 38d9c70 9984 608 3184 6800 tTelnetd telnetd 38dbe30 7984 368 512 7472 t1 vme_server 38c23b0 19712 432 1536 18176 UDP_LOOP UDP_loop 9318c0 199984 176 3456 196528 coda_net write_thread 80a440 199984 256 1520 198464 FORCE_SYNC ts2syncTask 3897ab0 5984 176 432 5552 CODATCPSRV CODAtcpServe 2d3b3d0 199312 1568 2864 196448 ROC coda_roc 2c2c210 499312 4704 34640 464672 TCP_SERVER tcpServer 38bd370 99312 1568 3360 95952 ROLS_LOOP rols_loop 83b3a0 199312 1216 4608 194704 INTERRUPT 5008 0 3120 1888 value = 36 = 0x24 = '$' -> -> 
iosFdSUDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 interrupt: live time = 98 percent (gated=1923339 ungated=1953103) interrupt: live_corr = 98 percent (gated=1923339 ungated=1953103) how fd name drv 3 /tyCo/0 1 in out err 4 (socket) 3 5 (socket) 3 6 (socket) 3 7 (socket) 3 8 (socket) 3 10 (socket) 3 11 (socket) 3 12 (socket) 3 13 (socket) 3 value = 32 = 0x20 = ' ' -> UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1923700 ungated=1953099) interrupt: live_corr = 98 percent (gated=1923700 ungated=1953099) write_thread: wait= 331697 send= 8 microsec per event (nev=6) UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 3 0 0 interrupt: live time = 98 percent (gated=1923084 ungated=1953100) interrupt: live_corr = 98 percent (gated=1923084 ungated=1953100) wait: coda request in progress codaExecute reached, message >end<, len=3 codaExecute: 'end' transition wait: coda request in progress roc_end reached INIT_NAME: rolp->daproc = 3 UDP_cancel: cancel >ts2: 98 0 0 0 0 0 0 0 0 0 3 0 0 < interrupt: inputs 1-6: 0 0 0 0 0 0 interrupt: inputs 7-12: 0 0 0 1 0 0 interrupt: live time = 98 percent (gated=905748 ungated=919899) interrupt: live_corr = 98 percent (gated=905748 ungated=919899) wait: coda request in progress 0x857d20 (twork0031): TS csr: 0xfc000000 0x857d20 (twork0031): setForceSyncInterval: forceSyncInterval set to 0 0x857d20 (twork0031): INFO: User End Executed INIT_NAME: rolp->daproc = 3 INFO: User End 2 Executed codaUpdateStatus: >UPDATE process SET state='ending' WHERE name='clastrig2'< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< UDP_standard_request >sta:clastrig2 ending< 
UDP_standard_request >sta:clastrig2 ending< UDP_cancel: cancel >sta:clastrig2 active< NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! NOW !!!!!!!!!!!!!!!!!!! roc_end done codaExecute done coda_roc: rocp->state == ENDING and input list is empty coda_roc: process rol1 buffer by rol2 completed coda_roc: break loop coda_roc: processing case 'DA_ENDING': last event=22924922 nevents=22924922 # rcn 12-apr-2006 0:15 End-Reset, then reboot sc2,ec3,ec4 (no crashes, just to download new soft), then Download: .................... Waiting for a predownload : 135 sec. Waiting for a predownload : 136 sec. Waiting for a predownload : 137 sec. Waiting for a predownload : 138 sec. Waiting for a predownload : 139 sec. .................... goes forever Reconfigure - the same story Reset again - the same story kill rcn, start again - problem fixed ======================================== # 12-apr-2006 (Sergey) event rate 0, no pulses on scope, web scope works clon10:clasrun> tsconnect clastrig2 Port '/dev/cua/21' flushed. TYPE ~q TO QUIT ------ DINC --- port=/dev/cua/21 ------ 9600 BAUD 8 NONE 1 SWFC=OFF HWFC=OFF CAR=ON DTR=ON RTS=ON CTS=ON DSR=OFF Type ~? for help. 
-> -> memShow status bytes blocks avg block max block ------ ---------- --------- ---------- ---------- current free 4501120 112 40188 2073136 alloc 56102576 8914 6293 - cumulative alloc 96331232 25787 3735 - value = 0 = 0x0 -> -> -> -> -> ts2status CSR 1 (0x0): Go : 1 Pause on Next scheduled Sync : 0 Sync and Pause : 0 Initiate Sync Event : 0 Initiate Program 1 Event : 1 Initiate Program 2 Event : 0 Enable Level 1 (drives outputs) : 1 Override Inhibit : 0 Test Mode : 0 Reserved : 0 Reset : 0 Initialize : 0 Sync Event occurred : 1 Program 1 Event occurred : 1 Program 2 Event occurred : 0 Late Fail occurred : 0 Inhibit occurred : 1 Write FIFO error occurred : 0 Read FIFO error occurreds : 1 CSR 2 (0x4): Enable Scheduled Sync : 1 Use Clear Permit Timer : 1 Use Front Busy Timer : 1 Use Clear Hold Timer : 1 Use External Front Busy : 1 Lock ROC Branch 1 : 0 Lock ROC Branch 2 : 0 Lock ROC Branch 3 : 0 Lock ROC Branch 4 : 0 Lock ROC Branch 5 : 0 Enable Program 1 front panel input : 0 Enable Interrupt : 1 Enable local ROC (branch 5) : 1 Trigger Control Register (0x8): 0x00000581 ROC Enable Register (0xc) val=0xf07fff0f: Branch 1: 0x f bits: 00001111 Branch 2: 0xff bits: 11111111 Branch 3: 0x7f bits: 01111111 Branch 4: 0xf0 bits: 11110000 Synchronization Interval Register (0x10): 1000 Trigger Word Count Register (0x14): 0 Trigger Data Register (0x18): 64 Local ROC (Branch 5) Data Register (0x1c): 28 Synchronization Flag : 0 Late Fail Flag : 0 ROC code : 7 Input Trigger Prescale Registers: Input 1 Prescale Factor : 0 Input 2 Prescale Factor : 0 Input 3 Prescale Factor : 0 Input 4 Prescale Factor : 0 Input 5 Prescale Factor : 0 Input 6 Prescale Factor : 0 Input 7 Prescale Factor : 0 Input 8 Prescale Factor : 30 Clear Permit Timer Register = 0 Level2 Accept Timer Register = 83 Level3 Accept Timer Register = 0 Front Busy Timer Register = 325 Clear Hold Timer Register = 100 Branch (1-4) ROC Buffer Status Register (0x58): 0x40864040 Branch 1: Buffer Count = 0, Empty Flag = 1, 
Full Flag = 0 Branch 2: Buffer Count = 0, Empty Flag = 1, Full Flag = 0 Branch 3: Buffer Count = 6, Empty Flag = 0, Full Flag = 1 Branch 4: Buffer Count = 0, Empty Flag = 1, Full Flag = 0 Local ROC (Branch 5) Buffer Status Register (0x5c): 0xffff0040 Branch 5: Buffer Count = 0, Empty Flag = 1, Full Flag = 0,Local Acknowledge = 0, Local Event Strob Status = 0 ROC Acknowledge Status Register (0x60): val=0x770000 Branch 1: 0x 0 bits: 00000000 (enabled: 00001111) Branch 2: 0x 0 bits: 00000000 (enabled: 11111111) Branch 3: 0x77 bits: 01110111 (enabled: 01111111) Branch 4: 0x 0 bits: 00000000 (enabled: 11110000) State Register (0x6C): Level 1 Accept : 1 Start Level 2 Trigger : 0 Level 2 Pass Latched : 0 Level 2 Fail Latched : 0 Level 2 Accept : 1 Start Level 3 Trigger : 0 Level 3 Pass Latched : 0 Level 3 Fail Latched : 0 Level 3 Accept : 1 Clear : 0 Front End Busy (external) : 0 External Inhibit : 0 Latched Trigger : 1 TS Busy : 1 TS Active : 1 TS Ready : 0 Main Sequencer Active : 1 Synchronization Sequencer Active : 0 Program 1 Event Sequencer Active : 1 Program 2 Event Sequencer Active : 0 Event Count (0xc8): 2777 Live 1 Count (0xcc): 74079 Live 2 Count (0xd0): 180774269 value = 1 = 0x1 -> DINC closing... 
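The ts2status dump above prints the ROC Enable Register (0xf07fff0f) and the ROC Acknowledge Status Register (0x770000) both as raw values and as per-branch bytes. A minimal sketch of that decoding follows, assuming branch 1 is the lowest byte and branch 4 the highest; this layout is inferred only from the printed values in the dump, not from the TS manual. The function names are hypothetical, and the sketch does not interpret what a set acknowledge bit means.

```python
def branch_bytes(value):
    """Split a 32-bit register value into per-branch bytes.

    Assumed layout (inferred from the ts2status printout):
    branch 1 = bits 0-7, branch 2 = bits 8-15,
    branch 3 = bits 16-23, branch 4 = bits 24-31.
    """
    return {branch: (value >> (8 * (branch - 1))) & 0xFF for branch in range(1, 5)}


def enabled_without_ack_bit(enable_reg, ack_reg):
    """Per branch, return the bits set in the enable byte but clear in the ack byte."""
    enabled = branch_bytes(enable_reg)
    acked = branch_bytes(ack_reg)
    return {b: enabled[b] & ~acked[b] & 0xFF for b in enabled}


if __name__ == "__main__":
    enable = 0xF07FFF0F  # ROC Enable Register value from the dump
    ack = 0x00770000     # ROC Acknowledge Status Register value from the dump
    for branch, bits in branch_bytes(enable).items():
        print(f"Branch {branch}: 0x{bits:02x} bits: {bits:08b}")
    for branch, bits in enabled_without_ack_bit(enable, ack).items():
        print(f"Branch {branch}: enabled, ack bit clear: {bits:08b}")
```

With the values from the dump, only branch 3 has any acknowledge bits set (0x77), which matches the branch whose buffer status shows Full Flag = 1.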
click "End": ec2 stays in 'ending' for a long time

ec2 information:

interrupt: timer: 26 microsec (min=17 max=85 rms**2=7)
interrupt: timer: 26 microsec (min=17 max=85 rms**2=17)
interrupt: timer: 26 microsec (min=17 max=85 rms**2=12)

machine check
Exception next instruction address: 0x00205290
Machine Status Register: 0x0010b030
Condition Register: 0x20000082
Task: 0x1a4484a0 "ROLS_LOOP"

-> i

  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask   excTask      1dffe120   0 PEND         1d5884 1dffe000       0     0
tLogTask   logTask      1dffb580   0 PEND         1d5884 1dffb470       0     0
tShell     shell        1dc7b110   1 READY        202054 1dc7acf0  3d0002     0
tRlogind   rlogind      1dc8b600   2 PEND         1fc45c 1dc8b1a0       0     0
tWdbTask   wdbTask      1dc7d950   3 PEND         1fc45c 1dc7d820       0     0
tNetTask   netTask      1dd0ec80  50 PEND         1fc45c 1dd0eb90      3d     0
tPortmapd  portmapd     1dc87090  54 PEND         1fc45c 1dc86e30  3d0002     0
tTelnetd   telnetd      1dc89250  55 PEND         1fc45c 1dc890e0       0     0
UDP_LOOP   UDP_loop     1a479a50 110 DELAY        201978 1a4799a0       0    20
CODATCPSRV CODAtcpServe 1a526f60 150 PEND         1fc45c 1a526940  2b0001     0
ROC        coda_roc     1ca238b0 200 PEND         1fc45c 1ca22650  3d0002     0
TCP_SERVER tcpServer    1dc3cb40 250 PEND         1fc45c 1dc3c520  2b0001     0
ROLS_LOOP  rols_loop    1a4484a0 255 SUSPEND      205290 1a4483b0  2b0001     0
value = 0 = 0x0
->

reboot ec2, start new run


# 12-apr-2006

after rebooting sc2,ec3,ec4, rcn downloaded EB and ER but none of the ROCs
EB2 not responding for 120 sec
restart rcn, does not help
restart again


# 21-apr-2006

after recovery (rcn was restarted) the same file was written again
(raid partition was switched during the crash from raid0 to raid2):

clonxt3:/ssa> ll /mnt/raid0/stage_in/*
-rw-r--r-- 1 clasrun 9998 1179648 Apr 20 23:55 /mnt/raid0/stage_in/clas_051797.A00
clonxt3:/ssa>
clonxt3:/ssa> ll /mnt/raid2/stage_in/
total 31504
drwxrwxr-x 2 clasrun 9998 11776 Apr 21 00:31 .
drwxr-xr-x 4 root root 512 Apr 13 21:24 ..
-rw-r--r-- 1 clasrun 9998 15958016 Apr 21 00:29 clas_051797.A00
-rw-r--r-- 1 clasrun 9998 131072 Apr 21 00:35 clas_051798.A00
clonxt3:/ssa>

what I typed during recovery:

clon10:clasrun> run_status
Info for session clasprod (clasprod): current run: 51796 config: PROD13 state: end
clon10:clasrun> run_status
Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: prestart
?msql database run is 51796, disagrees with rcstate file: 51797
clon10:clasrun>
clon10:clasrun>
clon10:clasrun>
clon10:clasrun>
clon10:clasrun> roc_status
Fri Apr 21 00:20:28 EDT 2006
...................................^C
clon10:clasrun> run_status
Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: prestart
?msql database run is 51796, disagrees with rcstate file: 51797
clon10:clasrun> run_status
Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: prestart
?msql database run is 51796, disagrees with rcstate file: 51797
clon10:clasrun>
clon10:clasrun>
clon10:clasrun> roc_status
Fri Apr 21 00:22:04 EDT 2006
...........................................^C
clon10:clasrun> roc_status
Fri Apr 21 00:22:11 EDT 2006
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... cc1 status - configured clastrig2 status - configured dc1 status - configured dc10 status - configured dc11 status - configured dc2 status - configured dc3 status - configured dc4 status - configured dc5 status - configured dc6 status - configured dc7 status - configured dc8 status - configured dc9 status - configured ec1 status - configured ec2 status - configured ec3 status - configured ec4 status - configured polar status - configured sc1 status - configured sc2 status - configured scaler1 status - configured scaler2 status - booted scaler3 status - configured scaler4 status - configured Fri Apr 21 00:23:59 EDT 2006 clon10:clasrun> roc_status Fri Apr 21 00:24:18 EDT 2006 ....... 
cc1 status - configured clastrig2 status - configured dc1 status - configured dc10 status - configured dc11 status - configured dc2 status - configured dc3 status - configured dc4 status - configured dc5 status - configured dc6 status - configured dc7 status - configured dc8 status - configured dc9 status - configured ec1 status - configured ec2 status - configured ec3 status - configured ec4 status - configured polar status - configured sc1 status - configured sc2 status - configured scaler1 status - configured scaler2 status - booted scaler3 status - configured scaler4 status - configured Fri Apr 21 00:24:20 EDT 2006 clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: prestart ?msql database run is 51796, disagrees with rcstate file: 51797 clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: prestart ?msql database run is 51796, disagrees with rcstate file: 51797 clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> roc_status Fri Apr 21 00:26:33 EDT 2006 ....... cc1 status - configured clastrig2 status - configured dc1 status - configured dc10 status - configured dc11 status - configured dc2 status - configured dc3 status - configured dc4 status - configured dc5 status - configured dc6 status - configured dc7 status - configured dc8 status - configured dc9 status - configured ec1 status - configured ec2 status - configured ec3 status - configured ec4 status - configured polar status - configured sc1 status - configured sc2 status - configured scaler1 status - configured scaler2 status - booted scaler3 status - configured scaler4 status - configured Fri Apr 21 00:26:35 EDT 2006 clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51796 config: PROD13 state: download clon10:clasrun> roc_status Fri Apr 21 00:28:15 EDT 2006 ....... 
cc1 status - downloaded clastrig2 status - downloaded dc1 status - downloaded dc10 status - downloaded dc11 status - downloaded dc2 status - downloaded dc3 status - downloaded dc4 status - downloaded dc5 status - downloaded dc6 status - downloaded dc7 status - downloaded dc8 status - downloaded dc9 status - downloaded ec1 status - downloaded ec2 status - downloaded ec3 status - downloaded ec4 status - downloaded polar status - downloaded sc1 status - downloaded sc2 status - downloaded scaler1 status - downloaded scaler2 status - downloaded scaler3 status - downloaded scaler4 status - downloaded Fri Apr 21 00:28:17 EDT 2006 clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51796 config: PROD13 state: download clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51796 config: PROD13 state: download ?msql database run is 51797, disagrees with rcstate file: 51796 clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51796 config: PROD13 state: download ?msql database run is 51797, disagrees with rcstate file: 51796 clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51796 config: PROD13 state: download ?msql database run is 51797, disagrees with rcstate file: 51796 clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: prestart clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: go clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: go clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: go clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51797 config: PROD13 state: end 
?msql database run is 51798, disagrees with rcstate file: 51797 clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51798 config: PROD13 state: prestart clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51798 config: PROD13 state: prestart clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> clon10:clasrun> run_status Info for session clasprod (clasprod): current run: 51798 config: PROD13 state: prestart ?msql database run is 51797, disagrees with rcstate file: 51798 clon10:clasrun> ==================================================== 4/27/06 Sergey clasrun's cron jobs are not executed on clon01 file '/var/cron/log' contains following: ............................ ! c queue max run limit reached Thu Apr 27 16:03:00 2006 ! rescheduling a cron job Thu Apr 27 16:03:00 2006 ! c queue max run limit reached Thu Apr 27 16:03:00 2006 ! rescheduling a cron job Thu Apr 27 16:03:00 2006 ! c queue max run limit reached Thu Apr 27 16:03:00 2006 ! rescheduling a cron job Thu Apr 27 16:03:00 2006 ! c queue max run limit reached Thu Apr 27 16:03:00 2006 ! rescheduling a cron job Thu Apr 27 16:03:00 2006 ! c queue max run limit reached Thu Apr 27 16:03:00 2006 ! rescheduling a cron job Thu Apr 27 16:03:00 2006 found a lot of processes like following: ..................................... clasrun 23583 23526 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_begin_update.pl 20 clasrun 10343 10340 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_files_upd clasrun 15630 15574 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_files_update.pl 3 clasrun 16606 16600 0 Apr 14 ? 
0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_end_updat clasrun 13409 13404 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_end_updat clasrun 18829 18772 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_files_update.pl 3 clasrun 3770 3712 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_end_update.pl 60 clasrun 13467 13409 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_end_update.pl 60 clasrun 7050 7045 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_end_updat clasrun 29545 1128 0 Apr 14 ? 0:00 sh -c (/bin/csh -c "source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_ clasrun 29426 1128 0 Apr 14 ? 0:00 sh -c (/bin/csh -c "source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_ clasrun 29577 29548 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_files_update.pl 3 clasrun 29432 29426 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_end_updat clasrun 3333 3274 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_begin_update.pl 20 clasrun 2402 2374 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_begin_update.pl 20 clasrun 2371 1128 0 Apr 14 ? 0:00 sh -c (/bin/csh -c "source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_ clasrun 2374 2371 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_begin_upd clasrun 1692 1686 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_files_upd clasrun 2842 1128 0 Apr 14 ? 0:00 sh -c (/bin/csh -c "source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_ clasrun 2845 2842 0 Apr 14 ? 0:00 /bin/csh -c source /apps/ingres/.setup; /home/freyberg/INGRES/run_log_files_upd ............... 
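The listing above shows run_log_* helpers that were started on Apr 14 and were still present on Apr 27. A minimal sketch for picking such long-lived entries out of saved `ps -ef` output follows. It assumes the usual Solaris column order (UID PID PPID C STIME TTY TIME CMD) and that STIME is printed as "Mon DD" for processes started on an earlier day; the function name and the regular expression are hypothetical and only cover the line shapes seen in this log. It only lists candidate PIDs and does not kill anything.

```python
import re

# One `ps -ef` line: UID PID PPID C STIME TTY TIME CMD.
# STIME is either HH:MM:SS (started today) or "Mon DD" (started on an earlier day).
PS_LINE = re.compile(
    r"^\s*(?P<uid>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<c>\d+)\s+"
    r"(?P<stime>\d{2}:\d{2}:\d{2}|[A-Z][a-z]{2}\s+\d{1,2})\s+"
    r"(?P<tty>\S+)\s+(?P<time>\d+:\d{2})\s+(?P<cmd>.*)$"
)


def stale_pids(ps_lines, user, pattern):
    """Return PIDs owned by `user` whose command contains `pattern`
    and whose STIME is a date, i.e. the process started on an earlier day."""
    pids = []
    for line in ps_lines:
        match = PS_LINE.match(line)
        if match is None:
            continue  # header line or a format this sketch does not handle
        started_earlier_day = ":" not in match.group("stime")
        if (match.group("uid") == user
                and pattern in match.group("cmd")
                and started_earlier_day):
            pids.append(int(match.group("pid")))
    return pids


if __name__ == "__main__":
    sample = [
        "clasrun 23583 23526 0 Apr 14 ? 0:00 /apps/perl/5.6.0/bin/perl /home/freyberg/INGRES/run_log_begin_update.pl 20",
        "clasrun 1640 6993 0 15:22:58 pts/43 0:00 grep -i dp",
    ]
    print(stale_pids(sample, "clasrun", "run_log_"))
```

The returned list can be reviewed by hand before passing the PIDs to `kill`.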
It looks like the problem is a poorly behaved job that did not exit and kept
being called, leaving many child processes that cron was waiting on.
Kill all of them; cron should work after that.

=============================================================

May 4, 2006 Sergey

'Prestart' progress window stuck on '... setting rawbanks ...', sc1 crashed:

.........................
UDP_standard_request >sta:sc1 downloaded<
UDP_standard_request >sta:sc1 downloaded<
UDP_standard_request >sta:sc1 downloaded<
UDP_cancel: cancel >sta:sc1 ending<
ended after 20249124 events

machine check
Exception next instruction address: 0x0012d084
Machine Status Register: 0x0010b030
Condition Register: 0x20000045
Task: 0x1dc1e770 "TCP_SERVER"

machine check
Exception next instruction address: 0x001a9700
Machine Status Register: 0x0010b030
Condition Register: 0x40000042
Task: 0x1a3ffdc0 "UDP_LOOP"

no prompt after that, sc1 is dead; sc1pmc1 looks good; roc_reboot sc1

a blank window stays on screen; kill all dpsh's:

clon10:clasrun> ps -ef | grep dp
root 22162 1483 0 Feb 07 ? 0:00 rpc.nisd_resolv -F -C 10 -p 1073741824 -t udp
clasrun 3567 1 0 Apr 21 ? 0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clasrun 23601 1 0 Apr 21 ? 0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clasrun 26471 1 0 Apr 21 ? 0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clasrun 13542 1 0 Apr 21 ? 0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clasrun 1640 6993 0 15:22:58 pts/43 0:00 grep -i dp
clasrun 19324 1 0 Apr 21 ? 0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clasrun 26712 1 0 Apr 21 ?
0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clasrun 16687 16655 0 14:57:36 pts/62 0:00 /usr/local/clas/release/current/source/coda/SunOS_sun4u/bin/dpsh -f /usr/local/
clon10:clasrun> kill -9 3567 23601 26471 13542 19324 26712 16687

blank window is gone
new run started fine

=====================================================

May 4, 2006 Sergey

rcn scroll messages after new run was started:

..............
Agents are not getting reports for 465 seconds.
Agents are not getting reports for 470 seconds.
Agents are not getting reports for 475 seconds.
Agents are not getting reports for 480 seconds.
Agents are not getting reports for 485 seconds.
Agents are not getting reports for 490 seconds.
Agents are not getting reports for 495 seconds.
Agents are not getting reports for 500 seconds.
..............

nothing is updated on the GUI: no event number, rates, etc.
CODA is running fine

===============================

May 5, 2006 Sergey

cc1 stopped:

.................
interrupt: timer: 27 microsec (min=19 max=90 rms**2=1)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=19)
interrupt: timer: 28 microsec (min=19 max=90 rms**2=33)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=33)
write_thread: wait= 99 send= 6 microsec per event (nev=18949)
interrupt: timer: 28 microsec (min=19 max=90 rms**2=7)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=18)
interrupt: timer: 27 microsec (min=19 max=90 rms**2=26)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=27)
interrupt: timer: 28 microsec (min=19 max=90 rms**2=14)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=20)
interrupt: timer: 27 microsec (min=19 max=90 rms**2=19)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=17)
write_thread: wait= 96 send= 6 microsec per event (nev=19504)
interrupt: timer: 27 microsec (min=19 max=90 rms**2=21)
0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819
rms**2=9) interrupt: timer: 28 microsec (min=19 max=90 rms**2=10) 0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=1) 0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=16) interrupt: timer: 27 microsec (min=19 max=90 rms**2=29) 0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=28) interrupt: timer: 27 microsec (min=19 max=90 rms**2=5) write_thread: wait= 98 send= 7 microsec per event (nev=19142) 0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=32) interrupt: timer: 27 microsec (min=19 max=90 rms**2=34) 0x82bb00 (ROLS_LOOP): timer: 20 microsec (min=10 max=14819 rms**2=2) program Exception current instruction address: 0x00000038 Machine Status Register: 0x00083030 Condition Register: 0x48000084 Task: 0x82bb00 "ROLS_LOOP" -> -> tt 12ca7c vxTaskEntry +68 : rols_loop () 330c648 rols_loop +b0 : output_proc_network () 330c948 output_proc_network+2b0: 870f80 () 870f80 rol1__init +480: 870ad0 () 870ae8 davetrig_done +3890: 870748 () 870a64 davetrig_done +380c: 86f420 () 86fe90 davetrig_done +2c38: 38 () value = 0 = 0x0 -> i NAME ENTRY TID PRI STATUS PC SP ERR ---------- ------------ -------- --- ---------- -------- -------- ---- tExcTask excTask 3bfe600 0 PEND 1db824 3bfe4e0 tLogTask logTask 3bfba60 0 PEND 1db824 3bfb950 tShell shell 385c1a0 1 READY 2086e4 385bd80 1c0 tRlogind rlogind 386c170 2 PEND 202aec 386bd10 tWdbTask wdbTask 385e790 3 PEND 202aec 385e660 tNetTask netTask 390f390 50 PEND 202aec 390f2a0 tPortmapd portmapd 3867c00 54 PEND 202aec 38679a0 3d0 tTelnetd telnetd 3869dc0 55 PEND 202aec 3869c50 UDP_LOOP UDP_loop 8f62f0 110 DELAY 208008 8f6240 coda_net write_thread 7faba0 110 DELAY 208008 7faaa0 CODATCPSRV CODAtcpServe 940f80 150 PEND 202aec 940960 2b0 ROC coda_roc 265d9d0 200 PEND 202aec 265c770 3d0 TCP_SERVER tcpServer 910020 250 PEND 202aec 90fa00 2b0 ROLS_LOOP rols_loop 82bb00 255 SUSPEND 38 82b500 2b0 value = 0 = 0x0 -> ======================================== May 6, 2006 Sergey sc1 crash: 
interrupt: timer: 31 microsec (min=16 max=89 rms**2=8) interrupt: timer: 31 microsec (min=16 max=89 rms**2=5) interrupt: timer: 31 microsec (min=16 max=89 rms**2=5) interrupt: timer: 31 microsec (min=16 max=89 rms**2=4) interrupt: timer: 31 microsec (min=16 max=89 rms**2=4) interrupt: timer: 31 microsec (min=16 max=89 rms**2=3) interrupt: timer: 31 microsec (min=16 max=89 rms**2=3) interrupt: timer: 31 microsec (min=16 max=89 rms**2=1) interrupt: timer: 31 microsec (min=16 max=89 rms**2=1) interrupt: timer: 31 microsec (min=16 max=89 rms**2=0) interrupt: timer: 31 microsec (min=16 max=89 rms**2=2) machine check Exception next instruction address: 0x1a495ef4 Machine Status Register: 0x0010b030 Condition Register: 0x80000084 Task: 0x1a458c90 "ROLS_LOOP" -> tt 13162c vxTaskEntry +68 : rols_loop () 1d72b818 rols_loop +b0 : output_proc_network () 1d72bc10 output_proc_network+3a8: 1a49504c () 1a49504c rol2_tt__init +480: 1a494b9c () 1a494bb4 StartOfEvent +1b24: 1a494814 () 1a494b30 StartOfEvent +1aa0: 1a493430 () 1a493f4c StartOfEvent +ebc: davetrig (1, 1) value = 0 = 0x0 -> i NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- tExcTask excTask 1dffe120 0 PEND 1d5884 1dffe000 0 0 tLogTask logTask 1dffb580 0 PEND 1d5884 1dffb470 0 0 tShell shell 1dc7b360 1 READY 202054 1dc7af40 1c0001 0 tRlogind rlogind 1dc8b600 2 PEND 1fc45c 1dc8b1a0 0 0 tWdbTask wdbTask 1dc7d950 3 PEND 1fc45c 1dc7d820 0 0 tNetTask netTask 1dd0ec80 50 PEND 1fc45c 1dd0eb90 3d 0 tPortmapd portmapd 1dc87090 54 PEND 1fc45c 1dc86e30 3d0002 0 tTelnetd telnetd 1dc89250 55 PEND 1fc45c 1dc890e0 0 0 UDP_LOOP UDP_loop 1a5058a0 110 DELAY 201978 1a5057f0 0 79 CODATCPSRV CODAtcpServe 1a536800 150 PEND 1fc45c 1a5361e0 2b0001 0 ROC coda_roc 1ca33150 200 PEND 1fc45c 1ca31ef0 3d0002 0 TCP_SERVER tcpServer 1dc1bf10 250 PEND 1fc45c 1dc1b8f0 2b0001 0 ROLS_LOOP rols_loop 1a458c90 255 SUSPEND 1a495ef4 1a458640 2b0001 0 value = 0 = 0x0 -> reboot sc1, 
restart EB, start new run, everything fine

======================================================

May 7, 2006 Sergey

ec3 hung, no messages on telnet window
roc_reboot does not work
69 switch hung on FC: cannot reboot any VME crates on FC, on SF it works (another switch)
power-cycled the switch - everything good now: roc_reboot works, ec3 and ec3pmc1 booted fine

!!!!!!!!!!!!!!!!!!!!! SWITCH NEEDS A LABEL !!!!!!!!!!!!!!!!!

while working with the switch, clonxt3 rebooted by itself; /mnt/raid1 was being
moved to SILO at the time; it did not mount, so fsck was run:

clonxt3:/ssa> mount /mnt/raid1
mount: The state of /dev/dsk/c6t600C0FF00000000009839205A879C400d0s0 is not okay
        and it was attempted to be mounted read/write
mount: Please run fsck and try again
clonxt3:/ssa> fsck /mnt/raid1
** /dev/rdsk/c6t600C0FF00000000009839205A879C400d0s0
** Last Mounted on /mnt/raid1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups

FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX? y

159 files, 316797475 used, 387415200 free (16 frags, 48426898 blocks, 0.0% fragmentation)
clonxt3:/ssa>

ec3 hung again !!!!!!!! this time roc_reboot works

remounted raid partitions on clonxt3, restarted daq from scratch, started run,
everything looks normal

=============================================================

May 8, 2006 Sergey

stadis does not update livetime; found that it takes data from the
$CLON_PARMS/scalers/archive/scalers_clasprod_XXXXXX.txt file.

tried restarting various scaler/stat/etc. servers, without success

found that rtserver on clon10 takes 12% cpu time, restarted it

restarted the scaler/stat/etc. servers again

livetime is updated now

==============================================================

12-may-2006 Sergey

9am EG4A is over !!!
switched to old runcontrol; EB0->rcServer UDP does not work: rcServer did not
update the port number field in the msql 'process' table; noticed that rcServer
had been running since May 8; killed it and started runcontrol again - this time
it works; EB0 was never restarted; probably rcServer did not expect the UDP port
to be erased in msql ...

==============================================================