4. Usage: as a Python package¶
![digraph class {
rankdir="LR";
{
rank=same;
File [shape=box];
filedata [label="pandas.DataFrame"];
};
{
rank=same;
Table [shape=record; label="<1>Table|<2>Table|..."]
FileInfo [shape=box]
ColumnInfo [shape=none; label=<<table BORDER="0" CELLBORDER="1" CELLSPACING="0"><tr><td>ColumnInfo</td></tr><tr><td>ColumnInfo</td></tr><tr><td>...</td></tr></table>>];
}
{
rank=same;
tabledata [label="pandas.DataFrame"]
ParameterInfo [shape=record; label="ParameterInfo|ParameterInfo|..."]
ValueInfo [shape=record; label="ValueInfo|ValueInfo|..."]
CrossSectionAttributes [shape=box]
};
File->Table [label=".tables (dict)"];
File->filedata [label=" .raw_data"];
File->FileInfo [label=".info"]
Table:1->CrossSectionAttributes [label=".attributes"]
Table:1->tabledata[label="._df"]
FileInfo->ValueInfo [label="values (list)"];
FileInfo->ParameterInfo [label="parameters (list)"];
FileInfo->ColumnInfo:1 [label="columns (list)"];
}](_images/graphviz-e33adce6f19150137acd4b427a679a5304d8bb3d.png)
Fig. 4.1 Conceptual structure of data and classes.¶
4.1. Grid-data file and Info file¶
The fundamental objects of this package are File and Table classes, representing the files and the cross-section grid tables, respectively.
A File instance carries two files as paths: File.table_path for grid-data file and File.info_path.
A grid-data file contains a table representing one or more cross sections.
The content of a grid-data file is read and parsed by pandas.read_csv, which can parse most of table-format files [1] with a proper reader_options specified in the “info” file.
The resulting pandas.DataFrame object is stored as-is in File.raw_table for further interpretation.
| [1] | Parsable formats include comma-separated values (CSV), tab-separated values (TSV), and space-separated values (SSV); in addition, fixed-width formatted tables are usually parsable. |
A “info” file corresponds FileInfo instance and is provided in JSON format [ECMAInternational17].
It has data for ColumnInfo, ParameterInfo, and ValueInfo objects in addition to reader_options.
Those three types of information is used to interpret the File.raw_table data.
Detailed specification of “info” files are described below.
One grid table has multiple columns, where the name and unit of each column is specified by ColumnInfo.
Some columns are “parameters” for cross sections, such as the mass of relevant particles, which are specified by ParameterInfo.
Other columns are for “values” and ValueInfo is used to define the values.
ValueInfo uses one column as a central value, and one or more columns as uncertainties, which can be relative or absolute and symmetric or asymmetric.
Multiple columns for an uncertainty are combined in quadrature, i.e., \(\sigma_1\oplus\sigma_2 := \sqrt{\sigma_1^2 + \sigma_2^2}\).
For each ValueInfo, the interpreter constructs one DataFrame object.
It is parameterized by Index or MultiIndex and three columns, value, unc+, and unc-, respectively containing the cross-section central value, positive combined absolute uncertainty, and (the absolute values of) negative combined absolute uncertainty.
The DataFrame is wrapped by Table class and stored in File.tables (dict) with keys being the name of the value columns.
This is an example of data handling:
from susy_cross_section import utility
from susy_cross_section.table import File, Table
grid_path, info_path = utility.get_paths("13TeV.n2x1+.wino")
file = File(grid_path, info_path)
xsec_table = file.tables["xsec"]
Here an utility function get_paths is used to look-up paths for the key 13TeV.n2x1+.wino and from the passes a File instance is constructed.
Then a table with the column name xsec is read from the tables dictionary.
4.2. Interpolation¶
The table interpolation is handled by susy_cross_section.interp subpackage.
This package first performs axes transformation using axes_wrapper module, and then use one of the interpolators defined in interpolator module.
Detail information is available in the API document of each module.
The cross-section data with one mass parameter are usually fit well by a negative power of the mass, i.e., \(\sigma(m)\propto m^{-n}\). For such cases, interpolating the function by piece-wise lines in log-log axes would work well, which is implemented as
from susy_cross_section.interp.interpolator import Scipy1dInterpolator
xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(xsec_table)
print(xs(500), xs.fp(500), xs.fm(500), xs.unc_p_at(500), xs.unc_m_at(500))
One can implement more complicated interpolators by extending AbstractInterpolator.
4.3. A proposal for INFO file format¶
An info file is a JSON file and its data is one dict object.
The dict has six keys: document, attributes (optional), columns, reader_options (optional), parameters, and values.
This dictionary may contain any values and no specification is given, but the content should be used only for documental purposes; i.e., programs should not change their behavior by the content of
document. Data for such purposes should be stored not indocumentbut inattributes.Possible keys are:
title,authors,calculator,source, andversion.
This dictionary contains the default values for
CrossSectionAttributes, which is attached to each values. These default values are overridden by theattributesdefined in respective values.
CrossSectionAttributesstores, contrary todocument, non-documental information, based on which programs may change their behavior. Therefore the content must be neat and in machine-friendly formats. The proposed keys are:processes,collider,ecm,order, andpdf_name. For details, see the API document ofCrossSectionAttributes.
columns as a list of dict(str, str):
This is a list of dictionaries used to constructColumnInfo; the n-th element defines n-th column in the grid-data file. The length of this list thus matches the number of the columns. Each dictionary must have two keys:nameandunit, respectively specify the name and unit of the column. The names must be unique in one file. For dimension-less column,unitis an empty string.
reader_options as dict(str, Any):
This dictionary is directly passed toread_csv()and used as the keyword arguments.
parameters as a list of dict(str, Any):
This list defines the parameters for indexing. Each element is a dictionary, which has two keyscolumnandgranularityand constructs aParameterInfoobject. The value forcolumnis one of thenameofcolumns. The value forgranularityis a number used to quantize the parameter grid; for details see the API document ofParameterInfo.
values as a list of dictionary:
This list defines the cross-section values. Each element is a dictionary and constructs a
ValueInfoobject. The dictionary has possibly the keyscolumn,unc,unc+,unc-, andattributes. Among these keys,columnis mandatory and corresponding value must be one of thenameofcolumns, where the column is used as the central value of cross-section. The value forattributesis a dictionary dict(str, Any). It overrides the file-wide default values (explained above) to construct aCrossSectionAttributes.The other three keys are used to specify uncertainties.
uncspecifies symmetric uncertainty, while a pair ofunc+andunc-specifies asymmetric uncertainty;uncwill not be present together withunc+orunc-. Each value ofunc,unc+, andunc-is a list of dictionaries, list(dict(str, str)). Each element of the list, being a dictionary with two keyscolumnandtype, describes one source of uncertainties. The value forcolumnis one of thenameofcolumns, or a list of the names. If one name is specified, the column is used as the source. If a list is specified, the column with the largest value among them is used as the source. The value fortypespecifies the type of uncertainty; possible options and further details are found in the API document ofValueInfo.
4.4. How to use own tables¶
Users may use this package to handle their own cross-section grid tables, once they provide an INFO file. The procedure is summarized as follows.
Find proper
reader_optionsto read the table.This package uses
pandas.read_csv()to read the grid table, for which proper options should be specified. The following script may be useful to find the proper option for your table. Possible keys forreader_optionsare found in the API document ofpandas.read_csv().import pandas reader_options = { "sep": ";", "skiprows": 1 } grid_path = "mydata/table_grid.txt" data_frame = pandas.read_csv(grid_path, **reader_options) print(data_frame)
Write the INFO file. One should be careful especially of “type” of uncertainties and “unit” of columns.
Verify whether the file is correctly read. show sub-command is useful for this purpose; for example,
$ susy-xs show mydata/table_grid.txt mydata/table_grid.infoAfter verifying with show sub-command, users can use get sub-command, or read the data in their code as:
my_grid = File("mydata/table_grid.txt", "mydata/table_grid.info")
| [ECMAInternational17] | ECMA International. The JSON Data Interchange Format. Standard ECMA-404 2nd Edition, ECMA International, December 2017. |