Data Monitor: Difference between revisions

From CLONWiki
Latest revision as of 01:37, 31 July 2010

Data Monitor is the data quality check program. It runs online, attached to the ET System.

to do

Sergey: Moscow cell phone: 9-011-7-916-729-8203, skype: boiarino

1. Normalization:

  • if running without beam (cosmic) for a long time, the program should recognize it and give ONE warning for the entire DC


2. Reporting:

  • warning messages for hot channels (50% up, 120% down, etc.); if up by more than 300%, discuss with Mac
  • any unusual effects - discuss with Mac


3. Results:

  • program
  • wiki manual with algorithm description

Juan's work

The file dmlibtest.c analyzes data from the drift chamber wires to determine whether a hardware malfunction exists within the drift chamber. After 90,000 events (~30 seconds), the program stores the hits from each wire into groups defined by the corresponding hardware (e.g. fuses, ADB boards/crates). Using these groups as a reference, the program examines subsequent events for alarming activity.
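The reference-building step can be sketched roughly as follows. This is a minimal illustration, not the code in dmlibtest.c: the GroupMonitor structure, the function names, and the group count of 64 are all hypothetical, and the mapping from a wire to its hardware group is assumed to be done elsewhere.

```c
#include <string.h>

#define N_GROUPS        64      /* hypothetical number of hardware groups */
#define EVENTS_PER_PASS 90000L  /* window length quoted in the text */

typedef struct {
    long current[N_GROUPS];    /* hits per group in the window being filled */
    long reference[N_GROUPS];  /* hits per group frozen from the first window */
    long events_in_window;     /* events counted in the current window */
    int  have_reference;       /* 1 once the first window has been frozen */
} GroupMonitor;

void monitor_init(GroupMonitor *m)
{
    memset(m, 0, sizeof *m);
}

/* Record one hit on a wire that belongs to hardware group `group`. */
void monitor_add_hit(GroupMonitor *m, int group)
{
    if (group >= 0 && group < N_GROUPS)
        m->current[group]++;
}

/* Clear the per-window counters; call after each comparison. */
void monitor_reset_window(GroupMonitor *m)
{
    memset(m->current, 0, sizeof m->current);
}

/* Call once per event. Returns 1 when a full window after the reference
   has just completed and is ready to be compared, 0 otherwise. */
int monitor_end_event(GroupMonitor *m)
{
    if (++m->events_in_window < EVENTS_PER_PASS)
        return 0;
    m->events_in_window = 0;
    if (!m->have_reference) {
        /* The first 90,000 events become the reference. */
        memcpy(m->reference, m->current, sizeof m->reference);
        m->have_reference = 1;
        monitor_reset_window(m);
        return 0;
    }
    return 1;
}
```

With this layout the first comparison becomes possible at the 180,000 event mark, after which the caller would compare `current` with `reference` and call `monitor_reset_window`.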


At the 180,000 event mark and every 90,000 events afterward, the program compares the past 90,000 events with the reference to determine whether suspicious activity has occurred. Any group that changes by 60% or more with respect to the reference is flagged as a warning only; a change greater than 80% is marked as an error. The thresholds were determined by trial and error on many data sets. A drop of roughly 60% across all regions is sometimes observed; the algorithm does not yet handle this case.
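A minimal sketch of the threshold test, using the 60% and 80% figures quoted above. The names are hypothetical, and two choices are assumptions rather than documented behavior: the change is taken as a relative difference in either direction, and a group with no reference hits is reported as OK.

```c
typedef enum { STATUS_OK = 0, STATUS_WARNING = 1, STATUS_ERROR = 2 } GroupStatus;

#define WARN_FRACTION  0.60   /* 60% or more: warning */
#define ERROR_FRACTION 0.80   /* more than 80%: error */

/* Classify one hardware group by its relative change against the reference. */
GroupStatus classify_group(long reference_hits, long current_hits)
{
    long diff;
    double change;

    if (reference_hits <= 0)
        return STATUS_OK;   /* assumption: nothing to compare against */

    diff = current_hits - reference_hits;
    if (diff < 0)
        diff = -diff;       /* assumption: rises and drops are treated alike */
    change = (double)diff / (double)reference_hits;

    if (change > ERROR_FRACTION)
        return STATUS_ERROR;
    if (change >= WARN_FRACTION)
        return STATUS_WARNING;
    return STATUS_OK;
}
```

For example, a group that falls from 1000 reference hits to 350 (a 65% change) would be a warning, while a fall to 100 (90%) would be an error.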


Alarms are stored in arrays named warnlist and errorlist. These lists are then scanned to ensure that no hardware alarms overlap. For example, if an entire ADB crate is malfunctioning, all associated ADB board warnings are cleared. This sorting prevents redundancy when sending alarms. Overlap between units that are not directly associated is not handled, except for LV supplies and ADB crates. For example, if a TDC crate malfunctions, all fuses corresponding to that section will also be flagged. If more than 200 alarms are triggered, a single message informs the user of a universal drop rather than bombarding them with error messages. Below is a list of how the algorithm prioritizes alarms with respect to corresponding units.


Corresponding Units:

  • TDC crate > TDC boards
  • ADB crate > ADB boards
  • LV supply > fuses
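The parent-over-child priority listed above could be sketched as follows. The Alarm structure, its parent_id field, and the function names are hypothetical and do not reflect the actual layout of warnlist or errorlist. Whether the 200-alarm limit is applied before or after suppression is not stated in the text, so the helper simply takes a count.

```c
#include <stddef.h>

typedef enum { TDC_CRATE, TDC_BOARD, ADB_CRATE, ADB_BOARD, LV_SUPPLY, FUSE } UnitType;

typedef struct {
    UnitType type;
    int id;          /* unit number within its type */
    int parent_id;   /* id of the owning crate/supply, -1 if none (hypothetical) */
    int active;      /* 1 while the alarm is still to be reported */
} Alarm;

/* The three parent > child pairs from the list above. */
static int is_child_of(UnitType child, UnitType parent)
{
    return (parent == TDC_CRATE && child == TDC_BOARD) ||
           (parent == ADB_CRATE && child == ADB_BOARD) ||
           (parent == LV_SUPPLY && child == FUSE);
}

/* Clear every child alarm whose parent unit is also alarmed.
   Returns the number of alarms still active. */
size_t suppress_redundant(Alarm *list, size_t n)
{
    size_t i, j, remaining = 0;

    for (i = 0; i < n; i++) {
        if (!list[i].active)
            continue;
        for (j = 0; j < n; j++) {
            if (list[j].active &&
                is_child_of(list[j].type, list[i].type) &&
                list[j].parent_id == list[i].id)
                list[j].active = 0;
        }
    }
    for (i = 0; i < n; i++)
        if (list[i].active)
            remaining++;
    return remaining;
}

#define UNIVERSAL_DROP_LIMIT 200

/* More than 200 alarms: report one universal-drop message instead. */
int is_universal_drop(size_t alarm_count)
{
    return alarm_count > UNIVERSAL_DROP_LIMIT;
}
```

Note that a fuse alarm is cleared only by its LV supply here; as the text says, a fuse sitting under a failed TDC crate would still be reported.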


The algorithm used for overlapping LV supplies and ADB crates is based on the layers associated with both. Upon finding alarming activity, the algorithm checks the malfunctioning hardware's config file and stores some key variables associated with it. The stored variables are:


  • sector
  • highest and lowest layer number


If an LV supply malfunction is found, the algorithm scans the entire alarm list for ADB crates. Using the variables collected from the config file (those listed above), it checks whether the LV supply and the ADB crate are in the same sector and whether the ADB crate's layer range lies within the LV supply's. Here is a hypothetical example of an ADB crate that would be removed:

  • LV supply sector:6 low:10 high:24
  • ADB crate sector:6 low:17 high:22 <--- the ADB crate's range is 17-22 while the LV supply's is 10-24. Since it falls inside the LV supply's range, the ADB crate alarm should be removed to prevent redundant alarms
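The containment test in this example can be written as a short sketch. The UnitRange structure and the function name are hypothetical; only the three stored variables (sector, lowest layer, highest layer) come from the description above.

```c
typedef struct {
    int sector;
    int low_layer;    /* lowest layer number served by the unit */
    int high_layer;   /* highest layer number served by the unit */
} UnitRange;

/* Returns 1 if the ADB crate is in the same sector as the LV supply and its
   layer range lies entirely inside the LV supply's range, 0 otherwise. */
int adb_covered_by_lv(const UnitRange *lv, const UnitRange *adb)
{
    return lv->sector == adb->sector &&
           adb->low_layer >= lv->low_layer &&
           adb->high_layer <= lv->high_layer;
}
```

With the values from the example (LV supply: sector 6, layers 10-24; ADB crate: sector 6, layers 17-22) the function returns 1, so the ADB crate alarm would be dropped.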


Another file, juanlib.c, creates histogram files from the collected data in a similar fashion. Numbered 101-106, 201-206 and 301-306, these plots correspond to data from the 6 respective sectors. Histograms 101-106 are one-dimensional, 201-206 are two-dimensional (hits in wire and layer), and 301-306 show normalized changes in 6x6 wire groups with respect to the reference. The box feature is recommended for the 301-306 plots.
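The numbering scheme can be expressed as a small helper. This is only a sketch: the names are hypothetical, and it assumes that the last digit of the histogram number is the sector number (1 to 6), which the text implies but does not state outright.

```c
typedef enum {
    HIST_1D     = 1,   /* 101-106: one-dimensional histograms */
    HIST_2D     = 2,   /* 201-206: hits in wire and layer */
    HIST_CHANGE = 3    /* 301-306: normalized change in 6x6 wire groups */
} HistKind;

/* Returns the histogram number for a kind and sector,
   or -1 if the sector is outside 1..6. */
int hist_id(HistKind kind, int sector)
{
    if (sector < 1 || sector > 6)
        return -1;
    return 100 * (int)kind + sector;
}
```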

Drift Chamber Hardware Mapping

  • TDC mapping
  • Low voltage mapping
  • fuses mapping
  • HV mapping
  • ADB mapping

dmlib.c algorithm

A so-called 'tree' is used to obtain all sets that are supposed to be filled from a particular data channel.

BankID->BankNumber->HighID->LowID->Place