4. Usage: as a Python package
4.1. Grid-data file and Info file
The fundamental objects of this package are the File and Table classes, representing the files and the cross-section grid tables, respectively.
A File instance carries two files as paths: File.table_path for the grid-data file and File.info_path for the “info” file.
A grid-data file contains a table representing one or more cross sections.
The content of a grid-data file is read and parsed by pandas.read_csv, which can parse most table-format files [1] when proper reader_options are specified in the “info” file.
The resulting pandas.DataFrame object is stored as-is in File.raw_table for further interpretation.
[1] Parsable formats include comma-separated values (CSV), tab-separated values (TSV), and space-separated values (SSV); in addition, fixed-width formatted tables are usually parsable.
An “info” file corresponds to a FileInfo instance and is provided in JSON format [ECMAInternational17].
It has data for ColumnInfo, ParameterInfo, and ValueInfo objects in addition to reader_options.
These three types of information are used to interpret the File.raw_table data.
The detailed specification of “info” files is given below.
One grid table has multiple columns, where the name and unit of each column are specified by ColumnInfo.
Some columns are “parameters” for cross sections, such as the mass of relevant particles, which are specified by ParameterInfo.
Other columns are for “values”, and ValueInfo is used to define the values.
A ValueInfo uses one column as a central value and one or more columns as uncertainties, which can be relative or absolute and symmetric or asymmetric.
Multiple columns for an uncertainty are combined in quadrature, i.e., \(\sigma_1\oplus\sigma_2 := \sqrt{\sigma_1^2 + \sigma_2^2}\).
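As a small worked illustration of the quadrature rule above, here is a sketch in plain Python. It is not the package's own code; the helper name combine_in_quadrature and the numbers are made up for this example.

```python
import math


def combine_in_quadrature(*sigmas):
    """Return sqrt(sigma_1^2 + sigma_2^2 + ...) for the given uncertainties."""
    return math.sqrt(sum(s * s for s in sigmas))


# Hypothetical example: a 3 fb uncertainty and a 4 fb uncertainty
# from two independent sources.
combined = combine_in_quadrature(3.0, 4.0)
print(combined)  # 5.0
```

The combined uncertainty is always at least as large as the largest single source, and is dominated by it when the sources differ strongly in size.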
For each ValueInfo, the interpreter constructs one DataFrame object.
It is indexed by an Index or MultiIndex built from the parameters and has three columns, value, unc+, and unc-, respectively containing the cross-section central value, the positive combined absolute uncertainty, and (the absolute value of) the negative combined absolute uncertainty.
The DataFrame is wrapped by the Table class and stored in File.tables (a dict) with keys being the names of the value columns.
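To visualize the layout described above, the following sketch builds by hand a DataFrame of the same shape using pandas alone. The parameter name m_wino, the numbers, and the units are invented for illustration; the real tables are produced by the package's interpreter from the grid-data and info files.

```python
import pandas as pd

# Hypothetical parameter grid: a single mass parameter (in GeV).
index = pd.Index([100, 200, 300], name="m_wino")

# Central value and combined absolute uncertainties (made-up numbers, in fb).
frame = pd.DataFrame(
    {
        "value": [1000.0, 100.0, 10.0],
        "unc+": [50.0, 6.0, 0.7],
        "unc-": [40.0, 5.0, 0.6],  # stored as absolute values, hence positive
    },
    index=index,
)

# Look up the row for one grid point.
row = frame.loc[200]
print(row["value"], row["unc+"], row["unc-"])
```

With more than one parameter, the index would be a MultiIndex with one level per parameter, and a grid point would be looked up with a tuple of parameter values.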
This is an example of data handling:
from susy_cross_section import utility
from susy_cross_section.table import File, Table
grid_path, info_path = utility.get_paths("13TeV.n2x1+.wino")
file = File(grid_path, info_path)
xsec_table = file.tables["xsec"]
Here the utility function get_paths is used to look up the paths for the key 13TeV.n2x1+.wino, and a File instance is constructed from the paths.
Then the table for the value column xsec is read from the tables dictionary.
4.2. Interpolation
Table interpolation is handled by the susy_cross_section.interp subpackage.
This subpackage first performs an axes transformation using the axes_wrapper module, and then uses one of the interpolators defined in the interpolator module.
Detailed information is available in the API document of each module.
Cross-section data with one mass parameter are usually fit well by a negative power of the mass, i.e., \(\sigma(m)\propto m^{-n}\). In such cases, piecewise-linear interpolation in log-log axes works well; it is implemented as
from susy_cross_section.interp.interpolator import Scipy1dInterpolator
xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(xsec_table)
print(xs(500), xs.fp(500), xs.fm(500), xs.unc_p_at(500), xs.unc_m_at(500))
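To make explicit what linear interpolation in log-log axes does, here is a sketch that is independent of the package. The function name loglog_interp and the sample points are assumptions for this illustration, not part of the package's API. For data that follow a power law exactly, the interpolation reproduces the power law between grid points.

```python
import math


def loglog_interp(x, x0, y0, x1, y1):
    """Interpolate linearly in (log x, log y) between two grid points."""
    t = (math.log(x) - math.log(x0)) / (math.log(x1) - math.log(x0))
    log_y = (1 - t) * math.log(y0) + t * math.log(y1)
    return math.exp(log_y)


# Hypothetical grid points lying on sigma(m) = 1e6 * m**-2.
m0, m1 = 400.0, 600.0
s0, s1 = 1e6 * m0**-2, 1e6 * m1**-2

# The exact power-law value at m = 500 is 1e6 / 500**2 = 4.0;
# the interpolated value agrees up to floating-point rounding.
print(loglog_interp(500.0, m0, s0, m1, s1))
```

A plain linear interpolation between the same two points would instead overestimate the value at m = 500, because a falling power law is convex.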
One can implement more complicated interpolators by extending AbstractInterpolator.
4.3. A proposal for INFO file format
An info file is a JSON file and its data is one dict object.
The dict has six keys: document, attributes (optional), columns, reader_options (optional), parameters, and values.
document as a dictionary:
This dictionary may contain any values and no specification is given, but the content should be used only for documentation purposes; i.e., programs should not change their behavior based on the content of document. Data meant to affect program behavior should be stored not in document but in attributes.
Possible keys are: title, authors, calculator, source, and version.
attributes as a dictionary:
This dictionary contains the default values for CrossSectionAttributes, which is attached to each value. These default values are overridden by the attributes defined in the respective values.
CrossSectionAttributes stores, contrary to document, non-documentation information, based on which programs may change their behavior. Therefore the content must be neat and in machine-friendly formats. The proposed keys are: processes, collider, ecm, order, and pdf_name. For details, see the API document of CrossSectionAttributes.
columns as a list of dict(str, str):
This is a list of dictionaries used to construct ColumnInfo; the n-th element defines the n-th column in the grid-data file. The length of this list thus matches the number of columns. Each dictionary must have two keys, name and unit, which respectively specify the name and unit of the column. The names must be unique within one file. For a dimensionless column, unit is an empty string.
reader_options as dict(str, Any):
This dictionary is passed directly to read_csv() as keyword arguments.
parameters as a list of dict(str, Any):
This list defines the parameters for indexing. Each element is a dictionary that has two keys, column and granularity, and constructs a ParameterInfo object. The value for column is one of the names in columns. The value for granularity is a number used to quantize the parameter grid; for details, see the API document of ParameterInfo.
values as a list of dictionaries:
This list defines the cross-section values. Each element is a dictionary and constructs a ValueInfo object. The dictionary may have the keys column, unc, unc+, unc-, and attributes. Among these keys, column is mandatory, and its value must be one of the names in columns; that column is used as the central value of the cross section. The value for attributes is a dictionary, dict(str, Any). It overrides the file-wide default values (explained above) to construct a CrossSectionAttributes.
The other three keys are used to specify uncertainties.
unc specifies a symmetric uncertainty, while a pair of unc+ and unc- specifies an asymmetric uncertainty; unc must not be present together with unc+ or unc-. Each value of unc, unc+, and unc- is a list of dictionaries, list(dict(str, str)). Each element of the list, being a dictionary with two keys, column and type, describes one source of uncertainty. The value for column is one of the names in columns, or a list of such names. If one name is specified, that column is used as the source. If a list is specified, the column with the largest value among them is used as the source. The value for type specifies the type of uncertainty; possible options and further details are found in the API document of ValueInfo.
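Putting the keys above together, the following sketch builds a minimal info structure as a Python dict, checks that every referenced column is declared, and serializes it to JSON. All concrete entries (column names, units, the granularity value, the uncertainty type strings, and the attribute values) are illustrative assumptions; consult the API documents of ColumnInfo, ParameterInfo, ValueInfo, and CrossSectionAttributes for the values actually accepted.

```python
import json

info = {
    "document": {"title": "Example grid (hypothetical)"},
    "attributes": {"collider": "pp", "ecm": "13TeV"},  # file-wide defaults
    "columns": [
        {"name": "m_wino", "unit": "GeV"},
        {"name": "xsec", "unit": "fb"},
        {"name": "unc_scale", "unit": "%"},
        {"name": "unc_pdf", "unit": "%"},
    ],
    "reader_options": {"sep": ";", "skiprows": 1},
    "parameters": [{"column": "m_wino", "granularity": 1}],
    "values": [
        {
            "column": "xsec",
            # Symmetric uncertainty with two sources, combined in quadrature.
            # The "type" strings are assumed; see the ValueInfo API document.
            "unc": [
                {"column": "unc_scale", "type": "relative"},
                {"column": "unc_pdf", "type": "relative"},
            ],
        }
    ],
}

# Basic consistency check: every referenced column must be declared.
declared = {c["name"] for c in info["columns"]}
referenced = {p["column"] for p in info["parameters"]}
for value in info["values"]:
    referenced.add(value["column"])
    for key in ("unc", "unc+", "unc-"):
        for source in value.get(key, []):
            cols = source["column"]
            referenced.update(cols if isinstance(cols, list) else [cols])
missing = referenced - declared

print(json.dumps(info, indent=2))
```

The check at the end is only a convenience for catching typos in column names before the file is handed to the package; it is not part of the proposed format.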
4.4. How to use your own tables
Users may use this package to handle their own cross-section grid tables, once they provide an INFO file. The procedure is summarized as follows.
1. Find proper reader_options to read the table.
   This package uses pandas.read_csv() to read the grid table, for which proper options should be specified. Possible keys for reader_options are found in the API document of pandas.read_csv(). The following script may be useful to find the proper options for your table.
       import pandas
       reader_options = {"sep": ";", "skiprows": 1}
       grid_path = "mydata/table_grid.txt"
       data_frame = pandas.read_csv(grid_path, **reader_options)
       print(data_frame)
2. Write the INFO file. Take particular care with the “type” of uncertainties and the “unit” of columns.
3. Verify that the file is read correctly. The show sub-command is useful for this purpose; for example,
       $ susy-xs show mydata/table_grid.txt mydata/table_grid.info
   After verifying with the show sub-command, users can use the get sub-command, or read the data in their code as:
       from susy_cross_section.table import File
       my_grid = File("mydata/table_grid.txt", "mydata/table_grid.info")
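As a self-contained variant of the first step of the procedure, the sketch below tries reader options on an in-memory table instead of a file, which can be handy for quick experiments. The table content and column layout are invented; sep and skiprows are the same pandas.read_csv options used in the script above, and header=None is added on the assumption that the grid has no header row after the skipped line.

```python
import io

import pandas

# Hypothetical grid-data content: one comment line, then ';'-separated rows.
grid_text = """# mass [GeV]; xsec [fb]; unc [%]
100;1000.0;5.0
200;100.0;6.0
300;10.0;7.0
"""

reader_options = {"sep": ";", "skiprows": 1, "header": None}
data_frame = pandas.read_csv(io.StringIO(grid_text), **reader_options)

# With header=None, pandas labels the columns 0, 1, 2, ...
print(data_frame)
```

Once the printed table shows one column per quantity and one row per grid point, the same dictionary can be copied into the reader_options entry of the INFO file.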
[ECMAInternational17] ECMA International. The JSON Data Interchange Format. Standard ECMA-404, 2nd Edition, ECMA International, December 2017.