CLAS DAQ upgrade
Part 3: CLAS online cluster and network
V. Sapunenko, S. Boyarinov (CLAS online)
P. Letta (Computer Center)
Why upgrade?
Low clon10/clon00 performance
Network problems
Aging equipment (RAID)
More network connections needed
Goals |
Make it possible to increase the event rate for the entire CLAS up to 10 kHz at a data rate up to 100 MB/sec
Provide dual-CPU Readout Controllers with Gbit Ethernet network and serial links
Provide a new network for Ethernet-controlled slow-control devices (VME crates, scopes, high-voltage mainframes, etc.)
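As a back-of-envelope check of the throughput goal (an illustration, not from the slides beyond the 10 kHz and 100 MB/sec figures), the two numbers together imply an average event size of about 10 kB:

```python
# Illustrative arithmetic only: the average event size implied by the
# 10 kHz event-rate and 100 MB/sec data-rate goals.
event_rate_hz = 10_000           # 10 kHz target event rate
data_rate_bytes = 100_000_000    # 100 MB/sec target data rate
avg_event_size = data_rate_bytes / event_rate_hz
print(avg_event_size)            # 10000.0 bytes, i.e. ~10 kB per event
```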
New network equipment |
Two Foundry EdgeIron 24G-A switches on subnet 68: 24 Gbit ports, 4 Gbit uplinks (one in use)
Two Foundry EdgeIron 2402CF switches on subnet 69: 24 100BaseT ports, 2 Gbit uplinks (one in use)
24-pair optical cable from the Counting Room to the Forward Carriage
Inter-level patch panels on the Space Frame and Forward Carriage
Optical converters with auto-negotiation
Ethernet and serial wires
New Hall B Network |
New UNIX cluster components |
Clonxt1: SunFire V40z quad-Opteron server
Clonxt2: SuperServer 8042 quad-Xeon server
Clonxt3: SunFire X4200 dual-CPU dual-core Opteron server
RAID: Sun StorEdge 3510, 2.7 TB RAID5 UFS system
All clonxt's have a default Ethernet adapter on subnet 167 for general use and two dedicated adapters on subnets 68 and 69 for DAQ purposes only
Opteron-based servers have 4-hour hardware support
OS configuration |
RHEL4 was installed on all three servers; performance was good, including the RAID system
CLAS CODA was tested and some problems were observed: the Event Recorder crashed occasionally, and the debugger was useless
Clonxt2 and clonxt3 were moved to Solaris 10 with Studio 11 compilers
A RAID system performance problem observed under Solaris was solved by patching
Huge delays in the 'df -k' command were temporarily fixed by turning off logging
The final configuration includes both Linux and Solaris machines and will run this way for a while
The new ZFS file system will be installed and tested on the RAID system when available
DAQ structure |
Primary Readout Controller CPUs are connected directly to the BigIron switch over 100BaseT optical links
Secondary Readout Controller CPUs (PMCs) are connected to the BigIron switch through two Gbit switches with 1 uplink each (can be up to 4)
The Event Builder runs on clonxt1 under Linux
The Event Recorder runs on clonxt3 under Solaris; the RAID system is mounted on clonxt3
Runcontrol runs on clon10, along with other service procedures
Clonxt2 is reserved for the Event Builder or Event Recorder
DAQ scheme (slide 9: diagram)
Project in numbers |
4 switches
1 terminal server
6 patch panels, 2 fiber patch panels
About 60 Ethernet and serial wires
3 UNIX servers
1 RAID system
Overall cost over $200k
Multithreaded Event Building |
The Event Builder was redesigned to utilize multi-processor computers: in addition to the multithreaded input part, it now has a multithreaded building part
The current CLAS dataflow does not require multithreaded event building; clonxt1 is fast enough to build up to 100 MB/sec, so the single-threaded version of the Event Builder is in use
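The multithreaded building scheme described above can be sketched as a fragment-assembly loop. The following is a minimal illustration (hypothetical names, not CLAS CODA code, assuming one fragment per event per Readout Controller): input threads drain one ROC stream each into a shared queue, and builder threads complete an event once every ROC has reported a fragment for it.

```python
import queue
import threading

NUM_ROCS = 3       # assumed number of Readout Controller streams
NUM_BUILDERS = 2   # assumed number of building threads

def input_thread(frag_q, roc_id, fragments):
    # In a real DAQ this would read fragments from a network link.
    for ev_num, payload in fragments:
        frag_q.put((ev_num, roc_id, payload))

def builder_thread(frag_q, built_q, partial, lock):
    while True:
        item = frag_q.get()
        if item is None:        # shutdown marker
            frag_q.put(None)    # pass it on to the other builders
            return
        ev_num, roc_id, payload = item
        with lock:              # the partial-event table is shared
            frags = partial.setdefault(ev_num, {})
            frags[roc_id] = payload
            if len(frags) == NUM_ROCS:   # every ROC reported: event done
                built_q.put((ev_num, partial.pop(ev_num)))

def build_events(streams):
    frag_q, built_q = queue.Queue(), queue.Queue()
    partial, lock = {}, threading.Lock()
    inputs = [threading.Thread(target=input_thread, args=(frag_q, i, s))
              for i, s in enumerate(streams)]
    builders = [threading.Thread(target=builder_thread,
                                 args=(frag_q, built_q, partial, lock))
                for _ in range(NUM_BUILDERS)]
    for t in inputs + builders:
        t.start()
    for t in inputs:
        t.join()
    frag_q.put(None)            # all inputs done: tell builders to stop
    for t in builders:
        t.join()
    return sorted(built_q.queue)

if __name__ == "__main__":
    streams = [[(ev, f"roc{r}-ev{ev}") for ev in range(4)]
               for r in range(NUM_ROCS)]
    events = build_events(streams)
    print(len(events))          # prints 4: all events fully assembled
```

The single lock around the partial-event table is the simplest correct choice for a sketch; a production builder would more likely shard events across builder threads (e.g. by event number) to avoid lock contention.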
Project timeline |
Jan 2004: decision to upgrade the CLON cluster to satisfy EG3 run requirements
Feb - Jun 2004: clon10 performance tests; drops in data rate were observed in long-running tests, along with big data-rate fluctuations; no explanation was received from Sun
Jun - Aug 2004: a Xeon server from CC and a V880 SPARC server from Sun were tested; the Xeon worked as expected with good performance; V880 performance was better than on clon10, but the same problems were observed
Sep - Oct 2004: received a V40z from Sun for testing; the machine showed good performance, with no indication of the previously observed problems
Nov 2004: CLAS CODA tests completed on both Xeon and Opteron machines; 32-bit Solaris is the only OS where CODA runs without any problems; decision to buy Intel-style servers; Xeon vs. Opteron discussion
Nov - Dec 2004: two quad servers (Xeon and Opteron) received and installed under 32-bit Solaris 9
Dec 2004: the Opteron-based SunFire V40z server under Solaris 9 was used as the event-building machine in the EG3 run
Jan - Oct 2005: CLAS CODA redesign for multiplatform support
Aug 2005: decision to replace the RAID system
Oct 2005: new RAID system received
Dec 2005: third server (X4200) received
Nov 2005 - Feb 2006: server OS installation, RAID configuration, CLAS CODA testing
Feb 2006: project completed; EG4 run started using the new CLON cluster
Conclusion |
The CLON cluster is equipped with new hardware and is ready to run until the 12 GeV upgrade
The maximum data rate achieved so far in real run conditions is 35 MB/sec, limited by other CLAS components
The data rate limit measured in test mode is <to be measured>
References |
CLAS DAQ paper
CODA paper
Acknowledgments |
CODA group
Computer Center