Scratch: Difference between revisions


'''Sergey Boyarinov's TODO list'''

- PCAL test setup (with Sergey P.)

- get TIBCO license

- equipment list with DB, labels (with Sergey P.)

- JLAB discriminators: ask Volker to push it

- need 1881M ADCs, at least a few modules

- test sy527 which arrived from repair (with George Jacobs)

- buy labels for both labeling machines

- ET system debugging (with Carl)

clonpc7:/etc> ifconfig -a
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet6 fe80::20d:93ff:fe65:ba7e%en0 prefixlen 64 scopeid 0x4
        inet 129.57.68.7 netmask 0xffffff00 broadcast 129.57.68.255
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0d:93:65:ba:7e
        media: autoselect (100baseTX <full-duplex>) status: active
        supported media: none autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP
    <full-duplex,hw-loopback> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ether 00:11:24:a1:d0:47
        media: autoselect (<unknown type>) status: inactive
        supported media: autoselect
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 2030
        lladdr 00:0d:93:ff:fe:65:ba:7e
        media: autoselect <full-duplex> status: inactive
        supported media: autoselect <full-duplex>


ET system was confused:

clonpc7:et> et_start -n 100 -s 10000 -f /tmp/test3
et_netinfo reached
et_netinfo: fully qualified default hostname >clonpc7.jlab.org<


ifi->ifi_flags = 0xffff8049
LOOPBACK INTERFACE
INTERFACE IS UP
ifi->ifi_addr = 0x00300300
hptr = 0x00300190
addr_in->sin_addr = 127.0.0.1,


ifi->ifi_flags = 0xffff8863
INTERFACE IS UP
ifi->ifi_addr = 0x00300350
hptr = 0x00300190
addr_in->sin_addr = 129.57.68.7,


ifi->ifi_flags = 0xffff8863
INTERFACE IS UP
ifi->ifi_addr = 0x003003a0
hptr = 0x00000000
addr_in->sin_addr = 192.168.2.1,

et_netinfo: address = 129.57.68.7
et_netinfo: error in gethostbyaddr
we've got 192.168.2.1, do not believe it is true
et_netinfo: address = 129.57.68.7
removing file >/tmp/test3<
file >/tmp/test3< removed
et_udpreceive: port=11111
et_udpreceive: port=11112
et_udpreceive: port=11112
et_udpreceive: port=11112
ET user library >/usr/local/clas/devel/coda/Darwin_powerpc/lib/libet_user.so< will be used




To eliminate the alias ''192.168.2.1'', the following command was used:

  ifconfig en0 -alias 192.168.2.1

Now it looks better:


clonpc7:/etc> et_start -n 100 -s 10000 -f /tmp/test3
et_netinfo reached
et_netinfo: fully qualified default hostname >clonpc7.jlab.org<


ifi->ifi_flags = 0xffff8049
LOOPBACK INTERFACE
INTERFACE IS UP
ifi->ifi_addr = 0x00300300
hptr = 0x00300190
addr_in->sin_addr = 127.0.0.1,


ifi->ifi_flags = 0xffff8863
INTERFACE IS UP
ifi->ifi_addr = 0x00300350
hptr = 0x00300190
addr_in->sin_addr = 129.57.68.7,


et_netinfo: address = 129.57.68.7
removing file >/tmp/test3<
file >/tmp/test3< removed
et_udpreceive: port=11111
et_udpreceive: port=11112
ET user library >/usr/local/clas/devel/coda/Darwin_powerpc/lib/libet_user.so< will be used




- NTP servers on Solaris, update Solaris post-install page

- replug AC power to emergency generators

- learn, start and test auto-shutdown software on clons

- on Nerses's request: install S99caRepeater and S99logServer scripts, update corresponding procedure for Solaris (ask Paul Letta if necessary)

- fix colors for clasrun accounts (and others ?) on clons

- order 2 more discriminators for DVCSCAL

- make paper 'DVCSCAL trigger system' and pass it to Chris and Ben


'''Sergey Boyarinov's COMPLETED list'''


===

Latest revision as of 11:08, 31 August 2011

test setup error aug 31, 2011:

...
0x9478760 (ROLS_LOOP): INFO: User Go ... 
proc_thread: waiting=  31106 processing=     234 microsec per event (nev=59)
net_thread:  waiting=  21361    sending=      2 microsec per event (nev=108)
proc_thread: waiting=  12613 processing=    106 microsec per event (nev=66)
proc_thread: waiting=  13237 processing=    137 microsec per event (nev=66)
net_thread:  waiting=  13001    sending=      1 microsec per event (nev=129)
proc_thread: waiting=  12753 processing=    110 microsec per event (nev=65)
proc_thread: waiting=  13227 processing=    106 microsec per event (nev=66)
0x9478760 (ROLS_LOOP): tdc1190ReadBoardDmaDone: WRONG: nbytes_save[4]=176, res=0 => mbytes=176
0x9478760 (ROLS_LOOP): [ 4] ERROR: tdc1190ReadEvent[Dma] returns -2
0x9478760 (ROLS_LOOP): [ 5] ERROR: tdc1190ReadEvent[Dma] returns 0
0x9478760 (ROLS_LOOP): [ 6] ERROR: tdc1190ReadEvent[Dma] returns 0
0x9478760 (ROLS_LOOP): [ 7] ERROR: tdc1190ReadEvent[Dma] returns 0
0x9478760 (ROLS_LOOP): [ 8] ERROR: tdc1190ReadEvent[Dma] returns 0
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 0] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 1] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 2] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 3] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 4] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 5] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 6] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 7] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): tdc1190ReadStart: [ 8] not ready ! (nev=0)
0x9478760 (ROLS_LOOP): [ 0] ERROR: tdc1190ReadEvent[Dma] returns 0
0x9478760 (ROLS_LOOP): [ 1] ERROR: tdc1190ReadEvent[Dma] returns 0
0x9478760 (ROLS_LOOP): [ 2] ERROR: tdc1190ReadEvent[Dma] returns 0
...............

Sergey Kuleshov, Aug 15 2011: send Ben's schematic for DC TDC; also mentioned young engineers from Chile for CLAS disassembly / CLAS12 assembly

timeout 2: 9599514 + 64 = 9599578
timeout: 9599579 9599578
SEND_BUFFER_ROC 4
timeout 2: 9599579 + 64 = 9599643
timeout: 9599644 9599643
SEND_BUFFER_ROC 4
interrupt: SEND_BUFFER_ROL1
timeout 2: 9599644 + 64 = 9599708
timeout: 9599709 9599708
SEND_BUFFER_ROC 4
interrupt: SEND_BUFFER_ROL1
attempt to send short buffer failed !!!
timeout 2: 9599709 + 64 = 9599773
timeout: 9599774 9599773
SEND_BUFFER_ROC 4
timeout 2: 9599774 + 64 = 9599838
ERROR1: LINK_sized_write() returns errno=851971 (cc=-1, sizeof(nbytes)=4(104040), netlong=104040)
ERROR: net_thread failed (in LINK_sized_write).
timeout: 9599839 9599838
SEND_BUFFER_ROC 4
ERROR: big0.failure=0, big1.failure=1
interrupt: SEND_BUFFER_ROL1
0x9247630 (coda_proc): timer: 2 microsec (min=0 max=601 rms**2=1)
0x9247630 (coda_proc): timer: 2 microsec (min=0 max=601 rms**2=6)
interrupt: SEND_BUFFER_ROL1



Quota check:

http://cc.jlab.org/cgi-bin/quotacheck.cgi

Nerses:

Home number is +374 10 425049

Cell phone +374 91 206 217


clon02:/etc> ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
       inet 131.225.70.20 netmask ff000000 
age0: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
       inet 131.225.70.20 netmask fffffc00 broadcast 131.225.255.255
       ether 0:a0:80:0:52:e5 
eri0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
       inet 129.57.167.2 netmask ffffff00 broadcast 129.57.167.255
       ether 0:3:ba:1d:9b:c0 
clon02:/etc> 



SUNW-MSG-ID: ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Mar 5 17:41:00 EST 2009
PLATFORM: SUNW,Netra-240, CSN: -, HOSTNAME: clon10
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 2f3d5430-2bbb-c8aa-87d5-f79b44f89edf
DESC: The ZFS pool has experienced currently unrecoverable I/O failures. Refer to http://sun.com/msg/ZFS-8000-HC for more information.
AUTO-RESPONSE: No automated response will be taken.
IMPACT: Read and write I/Os cannot be serviced.
REC-ACTION:

The pool has experienced I/O failures. Since the ZFS pool property 'failmode' is set to 'wait', all I/Os (reads and writes) are blocked. See the zpool(1M) manpage for more information on the 'failmode' property. Manual intervention is required for I/Os to be serviced. You can see which devices are affected by running 'zpool status -x':


  # zpool status -x
    pool: test
   state: FAULTED
  status: There are I/O failures.
  action: Make sure the affected devices are connected, then run 'zpool clear'.
     see: http://www.sun.com/msg/ZFS-8000-HC
   scrub: none requested
  config:

       NAME        STATE     READ WRITE CKSUM
       test        FAULTED      0    13     0  insufficient replicas
         c0t0d0    FAULTED      0     7     0  experienced I/O failures
         c0t1d0    ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list


After you have made sure the affected devices are connected, run 'zpool clear' to allow I/O to the pool again:


  # zpool clear test


If I/O failures continue to happen, then applications and commands for the pool may hang. At this point, a reboot may be necessary to allow I/O to the pool again.
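
The config table printed by 'zpool status -x' can also be scanned by a script to pick out the entries that are not ONLINE before running 'zpool clear'. The sketch below is only an illustration under stated assumptions: the names SAMPLE, ROW and unhealthy_entries are hypothetical, and the column layout (NAME, STATE, READ, WRITE, CKSUM, optional note) is taken from the output shown above rather than from any zpool specification.

```python
import re

# Config table copied from the 'zpool status -x' output in these notes.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        test        FAULTED      0    13     0  insufficient replicas
          c0t0d0    FAULTED      0     7     0  experienced I/O failures
          c0t1d0    ONLINE       0     0     0
"""

# name, state, three numeric error counters, then an optional free-text note.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*(.*)$")


def unhealthy_entries(config_text):
    """Return (name, state, note) for every row whose STATE is not ONLINE."""
    rows = []
    for line in config_text.splitlines():
        match = ROW.match(line)
        if match is None:  # header line or blank line: counters are not numeric
            continue
        name, state, _read, _write, _cksum, note = match.groups()
        if state != "ONLINE":
            rows.append((name, state, note.strip()))
    return rows


if __name__ == "__main__":
    for name, state, note in unhealthy_entries(SAMPLE):
        print(f"{name}: {state} ({note})")
```

For the table above this reports the pool test and the device c0t0d0, which matches the device flagged in the status output.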



mgetty

Sergey --- does this help you ? Today marked the 3rd week since this case was opened....

Paul



-------- Original Message --------

Subject: CASE 66172087

Date: Wed, 28 Jan 2009 12:39:51 -0500

From: Roland 'butch' Morrissette - Sun Microsystems <Roland.Morrissette@sun.com>

To: letta@jlab.org


Paul

Received this from engineering. Hopefully this is of some help.

xdm is creating $HOME/.Xauthority file, xauth only reads it. In $HOME/.Xdefaults file try adding the following entry as an alternate location to see if it works.

DisplayManager.DISPLAY.userAuthDir

/DISPLAY should be the actual display, like

/DisplayManager.host1:0.0.userAuthDir: /path/to/alternate/file

(If this does not work and the $HOME/.Xauthority file still gets written, try changing the permissions on the $HOME/.Xauthority file to read-only.)

Regards,

--

Roland 'Butch' Morrissette

Sun Service OS Support

Sun Microsystems, Inc.

phone: (781) 442-7112

email: roland.morrissette@sun.com

(800)USA-4SUN (Reference your Case Id #)

My Working Hours : 8am-4pm ET, Monday thru Friday

My Manager's Email: dawn.ball@sun.com




5126 303-6644

CLAS12 DC (Mac 14-nov-2007): 2 stereo, +-6 degrees, good resolution (1% dp/p, 1 mrad angle), six 6-layer superlayers, 112 wires per layer; reconstruction improvements: use double hits, use segment angle in road dictionary, early l-r ambiguity resolution (now it is resolved locally and then corrected after the track is reconstructed ?), find tracks with no TOF hit and cut off accidental tracks (using residuals ?), derive off-diagonal terms in error matrix


SVT readout:

SVX4 - old chip

132ns clock, 40 pipeline cells -> 5.2us trigger latency; can select readout window like pipeline TDCs (position defined +-132us, window size 132ns fixed)

initial part stores up to 4 events, they are rotating internally

32 bits per hit, 128 channels per chip, ... -> 3.2us per chip to get data from the chip to the buffer of 512 full events

L2 pipeline 16us

L1 latency from Amrit is 3us, Amrit will try to make it 4us

Use the same clock as entire CLAS12 trigger system (256MHz)
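
The SVX4 latency figure above follows from the pipeline depth multiplied by the clock period: 40 cells at 132 ns give 5.28 us, which the notes quote as 5.2 us. A small Python sketch of that arithmetic is given below. The function names are hypothetical, and the comparison against the L1 latency reflects one reading of how the two numbers relate, not a statement from the chip documentation.

```python
def pipeline_latency_us(clock_period_ns, n_cells):
    """Maximum trigger latency covered by the pipeline, in microseconds."""
    return clock_period_ns * n_cells / 1000.0


def l1_fits(l1_latency_us, clock_period_ns, n_cells):
    """True if the L1 latency is within the pipeline depth (assumed criterion)."""
    return l1_latency_us <= pipeline_latency_us(clock_period_ns, n_cells)


if __name__ == "__main__":
    depth = pipeline_latency_us(132, 40)  # SVX4: 132 ns clock, 40 cells
    print(f"SVX4 pipeline depth: {depth:.2f} us")
    for l1 in (3.0, 4.0):  # L1 latencies mentioned in the notes
        print(f"L1 latency {l1} us fits: {l1_fits(l1, 132, 40)}")
```

Under this reading both the 3 us and the 4 us L1 latency stay inside the SVX4 pipeline depth.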

FSSR 125ns instead of 132ns (built for BTev, never used)

self-triggering, do not need L1ACCEPT, L2 pipe 16us can be implemented

SVX4 goes to review !!!

Generic DAQ drawing will be sent to Amrit in April