EVIO: Difference between revisions
Latest revision as of 15:06, 17 April 2015
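EVIO is a data format and the corresponding software used at JLab.
EVIO Manual version 2.0: pdf (https://clonwiki0.jlab.org/wiki/clondocs/Docs/evio_Users_Guide.pdf), doc (https://clonwiki0.jlab.org/wiki/clondocs/Docs/evio_Users_Guide.doc).
Event Building EVIO Scheme: pdf (https://clonwiki0.jlab.org/wiki/clondocs/Docs/eventbuilding.pdf), ppt (https://clonwiki0.jlab.org/wiki/clondocs/Docs/eventbuilding.ppt), pptx (https://clonwiki0.jlab.org/wiki/clondocs/Docs/eventbuilding.pptx).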
Not in the manual yet: the 8-word EVIO block (record) header:
1. block length
2. block number
3. header length = 8
4. event count (number of banks, not counting the dictionary if one is inserted)
5. not used
6. bit info [31-8], version [7-0]; if a dictionary is present, bit 8 (counting from 0) is set
7. unused
8. magic int = 0xc0da0100
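The header layout above can be sketched as a small parser. This is only an illustration: the function name `parse_block_header` is hypothetical, and using the magic word to detect byte order is an assumption that the list above does not state.

```python
import struct

EVIO_MAGIC = 0xC0DA0100   # word 8 of the block header
HEADER_WORDS = 8          # word 3 is expected to hold this value


def parse_block_header(buf):
    """Decode the 8-word block header from the first 32 bytes of buf.

    Assumption: the byte order is whichever one makes word 8 equal the magic int.
    """
    nbytes = 4 * HEADER_WORDS
    if len(buf) < nbytes:
        raise ValueError("need at least 32 bytes for the block header")
    for endian in (">", "<"):
        words = struct.unpack(endian + "8I", buf[:nbytes])
        if words[7] == EVIO_MAGIC:
            break
    else:
        raise ValueError("magic int 0xc0da0100 not found in word 8")
    info_and_version = words[5]
    return {
        "endian": "big" if endian == ">" else "little",
        "block_length": words[0],
        "block_number": words[1],
        "header_length": words[2],
        "event_count": words[3],
        "bit_info": info_and_version >> 8,        # bits 31-8
        "version": info_and_version & 0xFF,       # bits 7-0
        "has_dictionary": bool(info_and_version & (1 << 8)),
    }
```

For example, a header packed with `struct.pack(">8I", 20, 1, 8, 3, 0, (1 << 8) | 4, 0, 0xC0DA0100)` is reported as big-endian, version 4, with the dictionary bit set.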
CLAS12 guidelines:
- 'EVIO BANK' (not 'EVIO SEGMENT' or 'EVIO TAGSEGMENT') will be used for data banks; the bank header has 2 words: the first is the exclusive bank length in words, the second contains 3 fields: tag[31:20], contentType[15:8] and num[7:0]
- no mixed formats (for example 'int' and 'short') will be allowed in the same bank; the format is described by the 8-bit field contentType (or use type=0 and do a custom swap?), except for formatted banks (see below)
- the BOS bank name will be replaced by the 12-bit field tag, and the BOS bank number by the 8-bit field num
- a bank dictionary (replacing the former DDL) will be provided in some form (ASCII?) and will contain a description of every bank tag; most important are the number of 'columns' and the meaning of every column (for example id,tdcl,adcl,tdcr,adcr); it will help to decode data from the bank and can be used by evio viewers; it will be written at least at the beginning of every file
- data banks will be in 'unsigned short' or 'unsigned int' format; the first element will be the ID ('(slot<<8)+channel' without translation, '(layer<<8)+wire' or similar with translation)
- some bank examples:
  tdc version 1: ID, TDC
  tdc version 2: ID, N_HITS, TDC_1, ... , TDC_N
  adc (raw data hits): ID, N_SAMPLES(=window_width), ADC_1, ... , ADC_N
  adc (window integral - 32bit): ID, ADC_SUM (can be 22 bits + 1 overflow bit)
  adc (pulse data hits): ID, N_SAMPLES, FIRST_SAMPLE, ADC_1, ..., ADC_N
- tags may be assigned to all existing CLAS banks so that CLAS12 bank tags do not overlap with them; this would allow old and new banks to be processed together
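A minimal sketch of the second bank header word, the ID encoding and the 'tdc version 2' row layout from the guidelines above. All function names are hypothetical. The field positions follow the tag[31:20], contentType[15:8], num[7:0] layout given here, and the ID is assumed to mean (slot<<8)+channel with both parts fitting in 8 bits.

```python
def pack_bank_word(tag, content_type, num):
    """Build the second bank header word: tag[31:20], contentType[15:8], num[7:0]."""
    if not (0 <= tag < 1 << 12 and 0 <= content_type < 1 << 8 and 0 <= num < 1 << 8):
        raise ValueError("field out of range")
    return (tag << 20) | (content_type << 8) | num


def unpack_bank_word(word):
    """Split the second bank header word into (tag, contentType, num)."""
    return (word >> 20) & 0xFFF, (word >> 8) & 0xFF, word & 0xFF


def make_id(high, low):
    """ID = (slot << 8) + channel, or (layer << 8) + wire after translation."""
    return (high << 8) | low


def split_id(ident):
    """Inverse of make_id: return (slot, channel) or (layer, wire)."""
    return ident >> 8, ident & 0xFF


def parse_tdc_v2(words):
    """Walk 'tdc version 2' rows: ID, N_HITS, TDC_1, ..., TDC_N."""
    hits = {}
    pos = 0
    while pos < len(words):
        if pos + 2 > len(words):
            raise ValueError("truncated row header")
        ident, nhits = words[pos], words[pos + 1]
        row = list(words[pos + 2:pos + 2 + nhits])
        if len(row) != nhits:
            raise ValueError("truncated row data")
        hits[split_id(ident)] = row
        pos += 2 + nhits
    return hits
```

The explicit parentheses in `make_id` matter: in C, `slot<<8+channel` parses as `slot<<(8+channel)`, which is not the intended packing.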
EVIO possible extensions:
- file split - implemented in CLAS
- 'dictionary-based' or 'format-based' bank swap (would allow mixed formats); can be enforced when contentType=0
EVIO format-based swap
Sergey B. proposed 2 functions to handle format-based banks in EVIO. The idea is to have a designated bank with contentType=0xf (or 0x11, 0x12 etc), which will contain 2 internal banks (or segments). The first bank (or segment) will contain the bank format description, and the second bank (or segment) will contain the data. The format can be stored as an ASCII string or in binary form. Two evio swap functions (https://clonwiki0.jlab.org/wiki/clondocs/Downloads/evioswap.tar) are provided to handle format-based banks: eviofmt() converts the format from character string form to binary form, and eviofmtswap() swaps data between big-endian and little-endian forms according to the format (using the binary form). The functions were tested using the included test program, but more detailed tests and checks are required, as well as integration into the EVIO package. Available formats follow FORTRAN rules in general, with some restrictions and additions.
Possible formats and corresponding binary codes:
1 'i' unsigned int
2 'F' floating point
3 'a' 8-bit char (C++)
4 'S' short
5 's' unsigned short
6 'C' char
7 'c' unsigned char
8 'D' double (64-bit float)
9 'L' long long (64-bit int)
10 'l' unsigned long long (64-bit int)
11 'I' int
12 'A' hollerith (4-byte char with int*32 bytes order for backward compatibility with BOS)
Examples:
"3I,4S,5I"
"3I,4S,(F,I)"
"3I,4S,3(F,I),F,I,2S,L"
"3I,4S,2(2(F,I)),2S,L"
"3I,4S,3(F,I),F,NS,L"
Rules:
- when format control reaches the last (outer) right parenthesis and data remain, the format resumes at the group closed by the last preceding right parenthesis, including its group repeat count if any; if no such group exists, it resumes at the first left parenthesis of the format specification; if neither exists, it restarts from the beginning of the format (FORTRAN rules)
- the repeat count must be between 2 and 15; a missing repeat count is taken as 1
- if the repeat count is 'N' instead of digits, it is read from the data as a 32-bit int (int*32); this allows variable-length rows (see the last example)
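The format letters and rules above can be illustrated with a small parser and decoder. This is a sketch, not the eviofmt()/eviofmtswap() code. The names `parse_format` and `decode` are hypothetical, the data are assumed to be packed without padding, 'A' is omitted, and the format is applied once only (the FORTRAN restart rule for leftover data is not implemented).

```python
import struct

# format letter -> (binary code from the table above, Python struct code)
FORMATS = {
    "i": (1, "I"), "F": (2, "f"), "a": (3, "c"), "S": (4, "h"),
    "s": (5, "H"), "C": (6, "b"), "c": (7, "B"), "D": (8, "d"),
    "L": (9, "q"), "l": (10, "Q"), "I": (11, "i"),
}


def parse_format(fmt):
    """Parse e.g. '3I,4S,3(F,I),F,NS,L' into a nested list of (count, item).

    count is an int or 'N'; item is a format letter or a nested list.
    """
    fmt = fmt.replace(" ", "")
    pos = 0

    def parse_list(in_group):
        nonlocal pos
        items = []
        while True:
            start = pos
            if pos < len(fmt) and fmt[pos] == "N":
                count = "N"
                pos += 1
            else:
                while pos < len(fmt) and fmt[pos].isdigit():
                    pos += 1
                count = int(fmt[start:pos]) if pos > start else 1
                if pos > start and not 2 <= count <= 15:
                    raise ValueError("repeat count must be between 2 and 15")
            if pos >= len(fmt):
                raise ValueError("format ends unexpectedly")
            ch = fmt[pos]
            pos += 1
            if ch == "(":
                items.append((count, parse_list(True)))
            elif ch in FORMATS:
                items.append((count, ch))
            else:
                raise ValueError("unknown format character %r" % ch)
            if pos == len(fmt):
                if in_group:
                    raise ValueError("missing right parenthesis")
                return items
            sep = fmt[pos]
            pos += 1
            if sep == ")" and in_group:
                return items
            if sep != ",":
                raise ValueError("unexpected character %r" % sep)

    return parse_list(False)


def decode(items, data, offset=0, endian=">"):
    """Apply a parsed format once to data; return (values, new_offset)."""
    values = []
    for count, item in items:
        if count == "N":  # repeat count stored in the data as a 32-bit int
            (count,) = struct.unpack_from(endian + "I", data, offset)
            offset += 4
        for _ in range(count):
            if isinstance(item, list):
                sub, offset = decode(item, data, offset, endian)
                values.append(sub)
            else:
                code = endian + FORMATS[item][1]
                (value,) = struct.unpack_from(code, data, offset)
                offset += struct.calcsize(code)
                values.append(value)
    return values, offset
```

A byte swap in the spirit of eviofmtswap() could be built on the same walk by re-packing each value with the opposite `endian` prefix.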
EVIO formatted banks - EVIO version 4.0 and later
EVIO banks with contentType=0xf are formatted banks. They contain two parts: a tagsegment with the format description as an ASCII string, and a bank with the data. All data banks produced by the CLAS12 DAQ will use primitive data types whenever possible, or the formatted type otherwise. Formatted banks will be used if different data types are needed inside the same bank and/or if the bank contains rows of variable length. The following formats will be used for basic data banks as they are inserted into the Event Transfer system from the Event Builder:
- CAEN v1190/v1290 TDCs: "2c,Ns" - slot, channel, the number of hits, hits
- JLAB FADC250 in WINDOW RAW DATA mode: "c,i,l,N(c,Ns)" - slot#, trig#, timestamp, the number of channels (channel#, the number of samples, samples)
- JLAB FADC250 in PULSE RAW DATA mode: "c,i,l,N(c,N(c,Ns))" - slot#, trig#, timestamp, the number of channels (channel#, the number of pulses (first sample#, the number of samples in pulse, samples))
- JLAB FADC250 in PULSE INTEGRAL mode: "c,i,l,N(c,N(s,i))" - slot#, trig#, timestamp, the number of channels (channel#, the number of pulses (pulse time, pulse integral))
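As an illustration of how such a format string maps onto bytes, the WINDOW RAW DATA format "c,i,l,N(c,Ns)" can be decoded by hand. This is a sketch under assumptions: the function name is hypothetical, the data are taken to be packed without padding, each 'N' is read as a 32-bit unsigned int, and only one record (one slot) is decoded.

```python
import struct


def decode_fadc250_window_raw(data, offset=0, endian=">"):
    """Decode one "c,i,l,N(c,Ns)" record; return (record, new_offset)."""
    # c = slot (unsigned char), i = trig (unsigned int),
    # l = timestamp (unsigned 64-bit int), N = number of channels
    head = struct.Struct(endian + "BIQI")
    slot, trig, timestamp, nchannels = head.unpack_from(data, offset)
    offset += head.size

    # per channel: c = channel number, N = number of samples, then N 's' samples
    chan_head = struct.Struct(endian + "BI")
    channels = {}
    for _ in range(nchannels):
        channel, nsamples = chan_head.unpack_from(data, offset)
        offset += chan_head.size
        samples = struct.unpack_from("%s%dH" % (endian, nsamples), data, offset)
        offset += 2 * nsamples
        channels[channel] = list(samples)

    record = {"slot": slot, "trig": trig, "timestamp": timestamp,
              "channels": channels}
    return record, offset
```

The returned offset marks where the record ends, so a caller can check that the whole data bank was consumed.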
For 'translated' banks, geographic identifiers (slot-channel) will be replaced by logical identifiers, for example layer-wire.
Every subsystem will be assigned unique 16-bit bank tags, while the 8-bit bank number can be used to separate data inside a subsystem, for example by crate or sector. The following scheme is proposed:
31..........24 23..........16 15..........8 7..........0
l e n g t h <- bank[0]
system# 0xf sysnum <- bank[1]
strlen 0x6 length <- tagsegment
"2c,l,N(c,i)" <- format string
l e n g t h <- bank[0]
subsystem# 0x0 subsysnum <- bank[1]
<- data
system# is a unique id assigned to a particular detector; it can be a number or a 2-letter code (EC, CC etc). sysnum can be used for further subdivision inside the system, for example sector number. strlen contains the length of the format string in bytes. subsystem# specifies the kind of data (ADC raw, ADC pulse, TDC, Scalers etc); it can be a number or a 2-letter code as well. subsysnum specifies the data representation (for example raw, translated, etc).
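The proposed layout can be sketched as a builder that assembles the outer bank, the tagsegment with the format string, and the inner data bank. Several details here are assumptions rather than part of the scheme as written: the function name, the 16/8/8-bit split of the bank header word, the 12/4/16-bit split of the tagsegment word (strlen, 0x6, length), NUL padding of the format string to a 4-byte boundary, and lengths counted in 32-bit words excluding the length word itself.

```python
import struct


def build_formatted_bank(system, sysnum, subsystem, subsysnum,
                         fmt, payload, endian=">"):
    """Assemble outer bank + tagsegment(format string) + inner data bank."""
    fmt_bytes = fmt.encode("ascii")
    strlen = len(fmt_bytes)
    if strlen >= 1 << 12:
        raise ValueError("format string too long for the assumed 12-bit field")
    fmt_bytes += b"\0" * (-strlen % 4)           # pad to a whole number of words
    fmt_words = len(fmt_bytes) // 4
    if len(payload) % 4:
        raise ValueError("payload must be a whole number of 32-bit words")
    data_words = len(payload) // 4

    # tagsegment: one header word (strlen | 0x6 | length), then the string
    tagseg = struct.pack(endian + "I",
                         (strlen << 20) | (0x6 << 16) | fmt_words) + fmt_bytes

    # inner bank: bank[0] = exclusive length, bank[1] = subsystem# | 0x0 | subsysnum
    inner = struct.pack(endian + "2I",
                        1 + data_words,
                        (subsystem << 16) | (0x0 << 8) | subsysnum) + payload

    # outer bank: bank[0] = exclusive length, bank[1] = system# | 0xf | sysnum
    body = tagseg + inner
    outer = struct.pack(endian + "2I",
                        1 + len(body) // 4,
                        (system << 16) | (0xF << 8) | sysnum)
    return outer + body
```

With a 2-letter system id such as "EC" stored as 0x4543, the format string "2c,l,N(c,i)" and an 8-byte payload, this yields a 10-word (40-byte) bank.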