For an introductory overview, see LCIO (Linear Collider I/O) and ROOT Notes.
Software Chain and Data Flow
The image depicts how data flows from one stage to the next. Note that, during this process, the data may be converted between formats if required. The software in each stage receives as input the processed data from the previous stage, thus forming a pipeline, which also takes as a separate input the data describing the geometry of the detector.
Overview of the Detector Design
Detectors used in this type of experiment consist of several subdetectors: materials that interact with various particles and record the traces, called tracks, that the particles leave as they move through them. The innermost subdetectors (the ones closest to the collision event) are the tracking subdetectors; these interact only with charged particles (the neutral ones pass undetected). The individual detector hits which form the tracks in this layer of the detector are referred to as tracker hits. When the particles leave the tracking chamber, their direction and energy are practically unaffected. The track data collected in the tracking chamber provides directional information for charged particles.
The middle layer of the overall detector consists of two types of calorimeters: the electromagnetic calorimeter, which measures the total energy of e-, e+, and photons, and the hadron calorimeter, which measures the total energy of hadrons. The dense material in the calorimeters causes particle showers, which leave tree-like traces that, for a given particle, originate from a single point, called the vertex. (This is why the calorimeters are located behind the tracker: directional information can be collected first.) The tree-like traces left by the showers are referred to as clusters, while the individual detector hits that form them are called calorimeter hits. As the calorimeters generally absorb all of the energy of the particles they detect, specific classes of particles are not expected to be found beyond their corresponding calorimeters.
Muons and neutrinos penetrate the tracker and the calorimeters; muons are then registered by the muon detectors, while neutrinos escape without being detected directly.
The data collected in this way is then fed into software that uses pattern recognition and reconstruction algorithms to infer, working backwards, the particles which were involved in the detection event, up to the originator.
- In accordance with best practices in software development, the persistency system is designed in a way which hides the implementation details of the data storage mechanism from the user code (thus allowing the framework to change the underlying representation without affecting the code that uses it in a significant way, if at all).
- The simulation is ultimately a detector response simulation. For each event produced by the generator software, the simulation stage ends by writing out the data that represents the simulated detector response; this can be done (after format conversion, if required) by LCIO.
- The reconstruction stage involves reading, extending and generally manipulating data. LCIO's job is to read the previously saved data into memory, and pass it to the reconstruction program.
- The analysis stage implies read-only access to (the original copies of) the input data.
The data model was designed with the above requirements in mind. Conceptual constituents of the data model are described below.
- Events hold collections of simulation and reconstruction output data.
- A list of generated particles, which may be extended with particles created during detector response simulation. The simulation adds hits from tracking detectors and calorimeters to TrackerHit and CalorimeterHit. The name MCParticle stands for Monte Carlo particle, due to the approach used.
Track and Cluster
- In the reconstruction stage, pattern recognition and clustering algorithms store their output in Track and Cluster entities, respectively. These in turn reference a list of the contributing hits (Track points to TrackerHit-s, and Cluster to CalorimeterHit-s). These references are optional: the hit data may no longer be required at a later stage, and may therefore be dropped. Clusters can also be combined from other clusters, allowing for a tree data structure, which may be exploited by the simulation program for easier generation of clusters.
- A cluster may be assigned to a track (for example, if a track leading to the cluster's vertex was detected).
- ReconstructedParticle is intended to be the basis for most physics analyses (where the need to go into details of the constituent tracks, clusters or hits rarely arises). These objects are the end goal of the reconstruction phase: they represent the detected particles, as inferred by the reconstruction software from the detector data. "Particle flow" algorithms create a list of reconstructed particles from tracks and clusters. Generally, such an algorithm takes the tracker measurement alone for charged particles, and derives the kinematics of the neutral particles from clusters not assigned to tracks. By combining particles from the original list of reconstructed particles, a composite ReconstructedParticle object may be created and added to the event; such ReconstructedParticle objects will point back to their constituent ReconstructedParticle objects, and may also include lists of Track-s and Cluster-s.
- Objects that cannot be related through the relations built into the LCIO data model can be related using LCRelation objects. This design provides a clear separation between the classes used in data analysis and reconstruction and the Monte Carlo-related classes. Relation objects are typically used to store links between raw data and Monte Carlo truth information. They are also used to relate reconstructed particles back to the Monte Carlo truth, for quick reference or if the hits are dropped from the files to conserve memory. Another use case is to represent temporary links between various objects at runtime. Relation objects are generic (they do not expect a specific type), and support one-to-one, one-to-many and many-to-many relationships.
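The reconstruction-level entities described above can be pictured with a minimal, self-contained C++ sketch. It does not use the LCIO classes: every type and function name here (Cluster, RecoParticle, Relation, hitEnergy, combine, invariantMass, truthFor) is a simplified, hypothetical stand-in chosen for illustration. It shows a cluster tree with optional hit references, a composite particle that points back to its constituents, and weighted links to Monte Carlo truth. For brevity the Relation type here is fixed to one pair of types, whereas the relation objects described above are generic.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not the LCIO classes.
struct CalorimeterHit { double energy; };

struct Cluster {
    std::vector<const CalorimeterHit*> hits;  // optional: empty once hits are dropped
    std::vector<const Cluster*> subClusters;  // clusters combined from other clusters
};

// Sums hit energies over a cluster and, recursively, over its sub-clusters.
double hitEnergy(const Cluster& c) {
    double sum = 0.0;
    for (const CalorimeterHit* h : c.hits) { assert(h != nullptr); sum += h->energy; }
    for (const Cluster* sub : c.subClusters) { assert(sub != nullptr); sum += hitEnergy(*sub); }
    return sum;
}

struct RecoParticle {
    double e = 0.0, px = 0.0, py = 0.0, pz = 0.0;   // four-momentum
    std::vector<const RecoParticle*> constituents;  // empty unless composite
    std::vector<const Cluster*> clusters;           // optional back-references
};

// Builds a composite particle whose four-momentum is the sum of its parts
// and which points back to the particles it was built from.
RecoParticle combine(const std::vector<const RecoParticle*>& parts) {
    RecoParticle out;
    for (const RecoParticle* p : parts) {
        assert(p != nullptr);
        out.e += p->e; out.px += p->px; out.py += p->py; out.pz += p->pz;
        out.constituents.push_back(p);
    }
    return out;
}

// Invariant mass from the four-momentum (natural units); 0 if m^2 is not positive.
double invariantMass(const RecoParticle& p) {
    const double m2 = p.e * p.e - (p.px * p.px + p.py * p.py + p.pz * p.pz);
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;
}

struct MCParticle { int pdg; };

// Weighted link between two objects that do not reference each other directly.
struct Relation { const RecoParticle* from; const MCParticle* to; float weight; };

// Returns every Monte Carlo particle linked to the given reconstructed particle.
std::vector<const MCParticle*> truthFor(const std::vector<Relation>& rels,
                                        const RecoParticle* reco) {
    std::vector<const MCParticle*> out;
    for (const Relation& r : rels)
        if (r.from == reco) out.push_back(r.to);
    return out;
}
```

Keeping the links in separate Relation records, rather than as members of RecoParticle, mirrors the separation between reconstruction classes and Monte Carlo classes mentioned above: the reconstructed particle type never needs to know that MCParticle exists.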
Support For User Extensions
- The LCIntVec, LCFloatVec, LCStringVec, and LCGenericObject classes support custom extensions of the data model, enabling the user to define and store custom data.
This section describes how the data model outlined above is supported by the API (Application Programming Interface) of the LCIO framework. The same API can be used for both modifying and read-only access to data.
Abstract High Level Interfaces
The following interfaces are high-level, abstract interfaces. Note that these provide only get-methods (they cannot be used to change the data). The LCIO framework's data writing code uses these interfaces and does not see the concrete implementations. The code that reads the data, however, uses the default implementations provided by LCIO to load the data into memory; the user code can then either use the abstract interfaces for read-only access, or type-cast to the concrete types if the data needs to be modified (since the concrete implementations also provide set-methods). The LCIO framework also abstracts away the actual I/O implementations, so the user code does not depend on the concrete data format.
TODO: Add image (simplified class diagram)
- LCEvent holds untyped collections (lists) of data representing simulation or reconstruction output, or arbitrary numerical values. The collections (LCCollection objects) are "untyped" in the sense that there is no restriction on what kind of data they may store, as long as, in the C++ implementation, the classes that represent the actual data implement the LCObject interface.
- LCObject is a detail specific to the C++ implementation; it is an empty class with no functionality of its own that simply provides a common base class for all types of data that can be stored in one of the LCEvent's collections. This enables the collections stored in the LCEvent to hold data of any kind as LCObject-s, without "knowing" the actual underlying type: since each specific data type inherits from LCObject, the collection "sees" every element as an LCObject. It is an abstraction mechanism (see inheritance and polymorphism in C++).
- The actual LCObject can be an MCParticle, a TrackerHit or a CalorimeterHit, or an LCFloatVec or LCIntVec instance. The last two types provide a way to insert arbitrary numerical data, if required (e.g. locations, scalar multipliers, colors that may be used by a visualizer later on...).
- LCCollection represents a collection (a list) of LCObject-s. It provides functions that retrieve information about the collection, such as its size (element count), or the type of the elements stored, as well as functions that enable access to individual elements.
An LCEvent internally contains a list of one or more of these collection objects, that are associated with the event (the collections can be added, retrieved, removed).
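The relationship between the event, its collections and the common base class can be illustrated with a small self-contained C++ sketch. This is not the LCIO API: the class names Object, Collection and Event, and all of their member functions, are hypothetical stand-ins for the roles that LCObject, LCCollection and LCEvent play in the description above. It is a sketch under the assumption that collections are stored by name and carry a type name for their elements.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the LCIO classes.
struct Object { virtual ~Object() = default; };  // empty common base class

struct MCParticle : Object {
    explicit MCParticle(int pdgCode) : pdg(pdgCode) {}
    int pdg;
};

struct FloatVec : Object {                       // arbitrary numerical data
    std::vector<float> values;
};

// An "untyped" list: it only knows that its elements are Objects.
class Collection {
public:
    explicit Collection(std::string typeName) : typeName_(std::move(typeName)) {}
    const std::string& typeName() const { return typeName_; }
    std::size_t size() const { return elements_.size(); }
    const Object* elementAt(std::size_t i) const {
        assert(i < elements_.size());
        return elements_[i].get();
    }
    void add(std::unique_ptr<Object> obj) { elements_.push_back(std::move(obj)); }
private:
    std::string typeName_;
    std::vector<std::unique_ptr<Object>> elements_;
};

// An event owns a set of named collections that can be added, retrieved, removed.
class Event {
public:
    // Returns false if a collection with this name already exists.
    bool addCollection(const std::string& name, Collection col) {
        return collections_.emplace(name, std::move(col)).second;
    }
    // Returns nullptr if no collection with this name exists.
    const Collection* getCollection(const std::string& name) const {
        auto it = collections_.find(name);
        return it == collections_.end() ? nullptr : &it->second;
    }
    bool removeCollection(const std::string& name) {
        return collections_.erase(name) > 0;
    }
private:
    std::map<std::string, Collection> collections_;
};
```

User code that retrieves an element gets it back as a pointer to the base class and recovers the concrete type with a cast, for example `dynamic_cast<const MCParticle*>(col->elementAt(0))`, which yields a null pointer if the element is of a different type.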
Default Implementation Classes
The classes provided by LCIO in namespaces with names containing IMPL represent concrete implementations of the abstract interface classes described above. Unlike the abstract classes, these classes also provide set-methods, thus enabling user code to change the underlying data.
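The split between a read-only abstract interface and a concrete implementation with set-methods can be sketched in a few lines of self-contained C++. The names Particle, ParticleImpl, isElectron and trySetMass are hypothetical and chosen only for illustration; they are not the LCIO classes, although the pattern follows the description above.

```cpp
#include <cassert>

// Hypothetical names for illustration; not the LCIO classes.
class Particle {                         // abstract interface: get-methods only
public:
    virtual ~Particle() = default;
    virtual int pdg() const = 0;
    virtual double mass() const = 0;
};

class ParticleImpl : public Particle {   // concrete implementation adds set-methods
public:
    int pdg() const override { return pdg_; }
    double mass() const override { return mass_; }
    void setPdg(int pdg) { pdg_ = pdg; }
    void setMass(double mass) { assert(mass >= 0.0); mass_ = mass; }
private:
    int pdg_ = 0;
    double mass_ = 0.0;
};

// Analysis-style code needs only the read-only interface.
bool isElectron(const Particle& p) { return p.pdg() == 11; }

// Code that must modify the data casts to the concrete type first.
// Returns false if the object is not backed by ParticleImpl.
bool trySetMass(Particle& p, double mass) {
    auto* impl = dynamic_cast<ParticleImpl*>(&p);
    if (impl == nullptr) return false;
    impl->setMass(mass);
    return true;
}
```

With this arrangement, code written against Particle keeps working if the concrete class is replaced, while the explicit cast in trySetMass makes the places where data is modified easy to find.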
TODO: Add image (simplified class diagram)
I/O is achieved through the abstract LCWriter and LCReader interfaces.
- LCWriter simply allows user code to open/close files (data streams), and to store event and run data.
- LCReader provides several data access options:
  - Reading run by run from the file (provides no access to event information)
  - Reading event by event from the file (standard way of analyzing events in simple scenarios)
  - Via (quasi) direct (indexed) access to a given event
  - Via a call-back mechanism to read run and event data (client code implements listener interfaces - observer pattern)
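Three of the access options listed above (event-by-event reading, direct access, and call-backs) can be illustrated with a self-contained C++ sketch. It reads from an in-memory vector rather than a file, and all names (Event, EventListener, InMemoryReader, CountingListener and their member functions) are hypothetical stand-ins rather than the LCIO reader API. Run-by-run reading is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the LCIO classes.
struct Event { int run; int number; };

// Listener interface implemented by client code (observer pattern).
class EventListener {
public:
    virtual ~EventListener() = default;
    virtual void processEvent(const Event& evt) = 0;
};

class InMemoryReader {
public:
    explicit InMemoryReader(std::vector<Event> events) : events_(std::move(events)) {}

    // Sequential access: returns the next event, or nullptr at end of stream.
    const Event* readNextEvent() {
        return next_ < events_.size() ? &events_[next_++] : nullptr;
    }

    // Direct access by run and event number (a linear search in this sketch).
    // Does not change the sequential read position.
    const Event* readEvent(int run, int number) const {
        for (const Event& e : events_)
            if (e.run == run && e.number == number) return &e;
        return nullptr;
    }

    void registerListener(EventListener* listener) {
        assert(listener != nullptr);
        listeners_.push_back(listener);
    }

    // Call-back access: pushes all remaining events to the registered listeners.
    void readStream() {
        while (const Event* e = readNextEvent())
            for (EventListener* l : listeners_) l->processEvent(*e);
    }

private:
    std::vector<Event> events_;
    std::size_t next_ = 0;
    std::vector<EventListener*> listeners_;
};

// Example client: counts the events it is notified about.
struct CountingListener : EventListener {
    int count = 0;
    void processEvent(const Event&) override { ++count; }
};
```

The call-back style inverts control: instead of the client looping over `readNextEvent()`, it registers one or more listeners and lets the reader drive the loop, which makes it straightforward to attach several independent processing steps to a single pass over the data.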