CLAS DAQ upgrade
Part 3: CLAS online cluster and network
V. Sapunenko, S. Boyarinov (CLAS online)
P. Letta (Computer Center)

Why upgrade?
Low clon10/clon00 performance
Network problems
Old equipment (RAID)
More network connections are needed

Goals
Make it possible to increase the event rate for the entire CLAS up to 10 kHz at a data rate up to 100 MB/s (i.e. an average event size of about 10 kB)
Provide dual-CPU Readout Controllers with Gbit Ethernet network and serial links
Provide a new network for Ethernet-controlled slow control devices (VME crates, scopes, High Voltage mainframes, etc.)

New network equipment
Two Foundry EdgeIron 24G-A switches on subnet 68: 24 Gbit ports, 4 Gbit uplinks (one in use)
Two Foundry EdgeIron 2402CF switches on subnet 69: 24 100BaseT ports, 2 Gbit uplinks (one in use)
24-pair optical cable from the Counting Room to the Forward Carriage
Inter-level patch panels on the Space Frame and Forward Carriage
Optical converters with auto-negotiation
Ethernet and serial cables

New Hall B Network
New UNIX cluster components
Clonxt1: SunFire V40z quad-Opteron server
Clonxt2: SuperServer 8042 quad-Xeon server
Clonxt3: SunFire X4200 dual-CPU dual-core Opteron server
RAID: Sun StorEdge 3510 2.7 TB RAID5 UFS system
All clonxt machines have a default Ethernet adapter on subnet 167 for general use and two designated adapters on subnets 68 and 69 for DAQ purposes only (a sketch of pinning traffic to a DAQ adapter follows this list)
Opteron-based servers come with 4-hour hardware support
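The subnet split above is what keeps DAQ traffic off the general-purpose network. As a hedged illustration only (not the actual CLAS code; the addresses and port below are invented), a C program can pin a connection to the designated DAQ adapter by binding the local end of the socket to that adapter's address before connecting:

/*
 * Sketch only: force traffic onto a designated DAQ adapter by binding
 * the local end of a TCP socket to that adapter's address.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int open_daq_socket(const char *local_ip, const char *remote_ip, int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    /* Bind to the DAQ-only adapter (e.g. the subnet-68 address) so this
     * connection goes out over that interface, not the general-use one. */
    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = inet_addr(local_ip);
    local.sin_port = 0;                       /* any local port */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind"); close(fd); return -1;
    }

    struct sockaddr_in remote;
    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_addr.s_addr = inet_addr(remote_ip);
    remote.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        perror("connect"); close(fd); return -1;
    }
    return fd;
}

int main(void)
{
    /* Illustrative addresses: local DAQ adapter and a peer on subnet 68. */
    int fd = open_daq_socket("192.168.68.10", "192.168.68.21", 5000);
    if (fd >= 0) close(fd);
    return fd >= 0 ? 0 : 1;
}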

OS configuration
RHEL4 was installed on all three servers; performance was good, including the RAID system
CLAS CODA was tested and some problems were observed: the Event Recorder crashed occasionally, and the debugger was useless
Clonxt2 and Clonxt3 were moved to Solaris 10 with Studio 11 compilers
A RAID system performance problem observed under Solaris was solved by patching
Huge delays in the 'df -k' command were temporarily fixed by turning off logging (see the sketch after this list)
The final configuration includes both Linux and Solaris machines and will run this way for a while
The new ZFS file system will be installed and tested on the RAID system when it becomes available
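For context on the 'df -k' symptom: df takes its numbers from one statvfs(2) call per mounted filesystem, so the stall can be reproduced and timed directly. A minimal sketch, assuming a hypothetical /raid mount point:

/*
 * Sketch: time the statvfs(2) call that df makes for a filesystem.
 * Compile with -lrt on older Solaris/Linux.
 */
#include <stdio.h>
#include <time.h>
#include <sys/statvfs.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/raid";  /* hypothetical */
    struct statvfs vfs;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (statvfs(path, &vfs) != 0) { perror("statvfs"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("statvfs(%s) took %.3f s, %llu kB available\n", path, dt,
           (unsigned long long)vfs.f_bavail * vfs.f_frsize / 1024ULL);
    return 0;
}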

DAQ structure
Primary Readout Controller CPUs are connected directly to the BigIron switch over 100 Mbit optical links (100BaseT through the optical converters)
Secondary Readout Controller CPUs (PMCs) are connected to the BigIron switch through two Gbit switches, each with one uplink in use (up to 4 possible)
The Event Builder runs on clonxt1 under Linux
The Event Recorder runs on clonxt3 under Solaris; the RAID system is mounted on clonxt3
RunControl and other service procedures run on clon10
Clonxt2 is reserved as a spare Event Builder or Event Recorder machine
DAQ scheme (next slide); a hedged sketch of the fragment path follows below
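For orientation, a sketch of the fragment path only: a Readout Controller streaming event fragments to the Event Builder over TCP. The header layout and all names below are invented for illustration; the real CODA protocol differs.

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Invented fragment header; the real CODA format is different. */
struct frag_hdr {
    uint32_t event_number;   /* lets the Event Builder match fragments */
    uint32_t roc_id;         /* which readout crate sent this fragment */
    uint32_t length;         /* payload bytes that follow the header   */
};

/* write() may send fewer bytes than asked; loop until done. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One fragment per trigger: header in network byte order, then payload. */
int send_fragment(int fd, uint32_t evt, uint32_t roc,
                  const void *payload, uint32_t len)
{
    struct frag_hdr h = { htonl(evt), htonl(roc), htonl(len) };
    if (send_all(fd, &h, sizeof(h)) < 0) return -1;
    return send_all(fd, payload, len);
}

int main(void)
{
    /* Demo: write one fragment to stdout instead of a real EB socket. */
    const char payload[] = "dummy fragment payload";
    return send_fragment(1, 42, 7, payload, sizeof(payload)) ? 1 : 0;
}

A real Readout Controller would open the connection to clonxt1's DAQ-subnet address and call send_fragment once per trigger.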

Project in numbers
4 switches
1 terminal server
6 patch panels, 2 fiber patch panels
About 60 Ethernet and serial cables
3 UNIX servers
1 RAID system
Overall cost over $200k

Multithreaded Event Building
The Event Builder was redesigned to utilize multiprocessor computers: in addition to a multithreaded input stage, it now has a multithreaded building stage (see the sketch below)
The current CLAS dataflow does not require multithreaded event building: clonxt1 is fast enough to build events at up to 100 MB/s, so the single-threaded version of the Event Builder is in use
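The two-stage design can be illustrated with a standard producer-consumer skeleton. This is a minimal sketch of the idea, not the actual CODA Event Builder: several input threads push fragments into a shared queue and building threads drain it; fragments are stand-in integers, and all names are illustrative.

/* Compile with: cc -std=c99 -pthread eb_sketch.c */
#include <pthread.h>
#include <stdio.h>

#define QSIZE   1024
#define N_INPUT 4      /* e.g. one thread per incoming data stream */
#define N_BUILD 2      /* event-building worker threads            */

struct queue {
    long items[QSIZE];
    int head, tail, count;
    pthread_mutex_t mtx;
    pthread_cond_t nonempty, nonfull;
};

static struct queue fragq = {
    .mtx = PTHREAD_MUTEX_INITIALIZER,
    .nonempty = PTHREAD_COND_INITIALIZER,
    .nonfull = PTHREAD_COND_INITIALIZER,
};

static void q_push(struct queue *q, long v)
{
    pthread_mutex_lock(&q->mtx);
    while (q->count == QSIZE)               /* block when queue is full */
        pthread_cond_wait(&q->nonfull, &q->mtx);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mtx);
}

static long q_pop(struct queue *q)
{
    pthread_mutex_lock(&q->mtx);
    while (q->count == 0)                   /* block when queue is empty */
        pthread_cond_wait(&q->nonempty, &q->mtx);
    long v = q->items[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    pthread_cond_signal(&q->nonfull);
    pthread_mutex_unlock(&q->mtx);
    return v;
}

static void *input_thread(void *arg)        /* stage 1: receive fragments */
{
    long id = (long)arg;
    for (long i = 0; i < 1000; i++)
        q_push(&fragq, id * 1000000 + i);   /* stand-in for a fragment */
    return NULL;
}

static void *build_thread(void *arg)        /* stage 2: assemble events */
{
    (void)arg;
    for (;;) {
        long frag = q_pop(&fragq);
        if (frag < 0)                       /* sentinel: shut down */
            return NULL;
        /* real code would match fragments by event number here */
    }
}

int main(void)
{
    pthread_t in[N_INPUT], bd[N_BUILD];
    for (long i = 0; i < N_INPUT; i++)
        pthread_create(&in[i], NULL, input_thread, (void *)i);
    for (long i = 0; i < N_BUILD; i++)
        pthread_create(&bd[i], NULL, build_thread, NULL);
    for (int i = 0; i < N_INPUT; i++)
        pthread_join(in[i], NULL);
    for (int i = 0; i < N_BUILD; i++)
        q_push(&fragq, -1);                 /* one sentinel per builder */
    for (int i = 0; i < N_BUILD; i++)
        pthread_join(bd[i], NULL);
    puts("all fragments consumed");
    return 0;
}

The same queue discipline lets the number of building threads scale with the CPU count, which is the point of the redesign.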

Project timeline
Jan 2004: decision to upgrade CLON cluster to satisfy EG3 run requirements
Feb - Jun 2004: clon10 performance tests; drops in data rate were observed in long-running tests, along with big data rate fluctuations; no explanation was received from Sun
Jun - Aug 2004: a Xeon server from the Computer Center and a V880 SPARC server from Sun were tested; the Xeon worked as expected with good performance; the V880 performed better than clon10 but showed the same problems
Sep - Oct 2004: received a V40z from Sun for testing; the machine showed good performance with no indication of the previously observed problems
Nov 2004: CLAS CODA tests completed on both Xeon and Opteron machines; 32-bit Solaris was the only OS where CODA ran without any problems; decision to buy Intel-style servers; Xeon vs Opteron discussion
Nov - Dec 2004: two quad servers (Xeon and Opteron) received and installed under 32-bit Solaris 9
Dec 2004: the Opteron-based SunFire V40z server under Solaris 9 was used as the Event Building machine in the EG3 run
Jan - Oct 2005: CLAS CODA redesigned for multi-platform support
Aug 2005: decision to replace RAID system
Oct 2005: new RAID system received
Dec 2005: third server (X4200) received
Nov 2005 - Feb 2006: server OS installation, RAID configuration, CLAS CODA testing
Feb 2006: project completed, EG4 run started using new CLON cluster

Conclusion
The CLON cluster is equipped with new hardware and is ready to run until the 12 GeV upgrade
The maximum data rate achieved so far in real run conditions is 35 MB/s, limited by other CLAS components
Data rate limit measured in test mode is <to be measured>

Acknowledgments
CODA group
Computer Center