Basic APIs
This page includes basic APIs that are commonly used.
pyttop.table
Created on Wed Sep 4 17:15:10 2024
@author: Yu-Chen Wang
- class Data(data=None, name=None, **kwargs)
A class to store, manipulate and visualize data tables.
- Parameters:
data (str, file-like, astropy.table.Table, pandas.DataFrame, or similar) –
The data table, which can be one of the following:
- A string path to a data file
- A file-like object (e.g., returned by open())
- An astropy.table.Table object
- A pandas.DataFrame, or any object that can be initialized as an astropy.table.Table
name (str, optional) – The name of this Data object. This name will be used in many cases to distinguish datasets. The default is None.
**kwargs –
Additional keyword arguments passed when initializing an astropy.table.Table object. Common arguments include:
- format (str, optional)
File format specifier for astropy.table.Table.read() (relevant when reading from a file path or file-like object). For a list of supported formats, see the Astropy documentation.
Notes
The data table of a Data instance (i.e. data.t) is not expected to change after creation. If data.t is modified, the matching and subset information may become inconsistent with the table. Create a new Data instance instead.
- t
The table.
- Type:
astropy.table.Table
- colnames
A list of column names.
- Type:
list
- shape
(<number_of_rows>, <number_of_columns>)
- Type:
tuple
- add_subsets(*subsets, group=None, listalways=False, verbose=True)
Add subsets to a subset group.
A subset refers to a subset (selection) of rows; a subset group is a group of subsets.
Beware that a subset does not “watch” the changes in the data: once added to the data, it never changes, even if the data changes. If you would like to update your subset, you may add it again to replace the old one.
- Parameters:
*subsets (pyttop.table.Subset) – The subsets to be added to this group. See Subset for more information.
group (str, optional) – The name of the subset group. If not specified, the default subset group will be used.
listalways (bool, optional) – If True, always return a list of subsets (even if it contains only one subset). The default is False.
verbose (bool, optional) – Whether or not information is printed on the screen. The default is True.
- Returns:
subsets – The arguments, i.e. a tuple of subset objects.
- Return type:
tuple
- adsub(*subsets, group=None, listalways=False, verbose=True)
Alias of add_subsets()
- apply(func, processes=None, args=(), progress_bar=False, **kwargs)
Apply the function func to each row of the table (data.t) to get the values of a new column. This operation is not vectorized.
- Parameters:
func (function) –
A function to be applied to each row. Example:
>>> def func(row):  # row is a row of the Table
...     return row['a'] + row['b']
Note that if processes is not None, func must be a global (module-level) function rather than a lambda, and it should accept only one argument, row.
processes (None or int, optional) – If a positive int is given, it specifies the number of processes used to compute the results. If -1 is given, all available CPU cores are used. If None, multiprocessing is not enabled. The default is None.
args (Iterable, optional) – Additional arguments to be passed to func. The default is ().
progress_bar (bool, optional) – Only relevant when multiprocessing is not enabled. If set to True, a progress bar will be shown. The default is False.
**kwargs – Additional keyword arguments to be passed to func (not supported for multiprocessing).
- Returns:
Result of applying func to each row.
- Return type:
list
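The multiprocessing constraints on func can be illustrated with a minimal sketch: a row function defined at module level (not a lambda) that takes the row as its only argument. The column names `mag_b` and `mag_v` are hypothetical. Plain dicts stand in for table rows here because both support `row['<column name>']` indexing.

```python
def color_index(row):
    """Return mag_b - mag_v for one table row (hypothetical column names)."""
    return row['mag_b'] - row['mag_v']


# Plain dicts stand in for astropy.table.Row objects in this sketch.
rows = [{'mag_b': 12.5, 'mag_v': 12.0}, {'mag_b': 10.0, 'mag_v': 10.75}]
colors = [color_index(row) for row in rows]
print(colors)
```

With a Data object holding such columns, the same function would be passed as `data.apply(color_index, processes=-1)` to use all CPU cores. A lambda would not meet the requirement stated above.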
- check_duplication(*cols, action='print')
Check for duplicate values in the given columns.
- Parameters:
*cols (str) – The names of columns (if not given, all columns will be checked).
action (str, optional) –
What to do after checking. The valid actions are:
‘print’: print the results
‘bool’: return whether duplicates are found
‘detail’: return a dict containing the duplicate values for columns with duplicates
‘subset’: return a row subset including the rows where duplicates are found
The default is ‘print’.
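The sketch below gives a concrete picture of the `'detail'` action in plain Python. For each column that contains duplicates, it collects the values that occur more than once. The function name `find_duplicates` and the dict-of-lists input are assumptions made for illustration, and the exact structure returned by `check_duplication` may differ.

```python
from collections import Counter


def find_duplicates(columns):
    """Map each column name to the sorted values that appear more than once.

    Columns without duplicates are omitted, following the described
    behaviour of action='detail' (hypothetical helper, not pyttop code).
    """
    result = {}
    for name, values in columns.items():
        counts = Counter(values)
        dups = sorted(value for value, n in counts.items() if n > 1)
        if dups:
            result[name] = dups
    return result


table = {'id': [1, 2, 3, 2, 1], 'flag': ['a', 'b', 'c', 'd', 'e']}
print(find_duplicates(table))  # {'id': [1, 2]}
```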
- checkdup(*cols, action='print')
Alias of check_duplication()
- chkdup(*cols, action='print')
Alias of check_duplication()
- clear_subsets(group=None)
Clear user-defined subsets.
- Parameters:
group (str, optional) – Name of the subset group to be cleared. If not specified, all user-defined subsets are deleted.
- eval(expression, to_col=None, **kwargs)
Evaluate an expression.
In the expression, the columns of the table can be referred to with:
- The name of the column, if the name is a valid Python variable name and does not coincide with a name in the local/global namespace.
- $(<column name>).
- self['<column name>'].
The Data object itself can be referred to as self.
- Parameters:
expression (str) – The expression to be evaluated.
to_col (str, optional) – Sets data[to_col] to the evaluated values of the expression. This is preferred to data['name'] = data.eval(...), because data.eval(..., to_col='name') also adds the expression to the metadata. The default is None.
**kwargs –
If the expression uses a name that is not recognized (e.g. a user-defined name, which would otherwise raise a NameError), you can pass the values of such names here.
For example, if you use the expression ‘my_function(col) + my_value’ (where ‘col’ is a column name in the data), you can pass my_function and my_value by:
data.eval('my_function(col) + my_value', my_function=my_function, my_value=my_value)
- Returns:
The result of the evaluation.
- Return type:
result
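The `**kwargs` mechanism is similar to giving Python's built-in `eval` an explicit namespace. Names that are otherwise unknown raise `NameError` until their values are supplied. The sketch below uses only the built-in `eval`, with a plain list standing in for a column. The helper `my_function` is hypothetical, and the expression is wrapped in `sum()` because a list cannot be added to a number. This is an analogy only, not pyttop's implementation.

```python
def my_function(values):
    # Hypothetical user-defined helper: doubles each value.
    return [2 * v for v in values]


expression = 'sum(my_function(col)) + my_value'
namespace = {'col': [1, 2, 3]}  # a plain list stands in for a table column

try:
    eval(expression, {}, namespace)
except NameError as err:
    # my_function and my_value are not known yet.
    print('NameError:', err)

# Supplying the missing names, analogous to passing them as **kwargs.
namespace.update(my_function=my_function, my_value=10)
result = eval(expression, {}, namespace)
print(result)  # 22
```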
- from_which(colname=None, detail=True)
When a dataset is read from a file using Data(<path>, name=<name>), the name of the data is associated with each column. After matching and merging it with other datasets, you may want to check which dataset the column colname came from. See the examples below.
WARNING: The information for user-added columns may be invalid.
- Parameters:
colname (str, optional) – Column name. If this argument is not given, a dict with the information for all columns will be returned.
detail (bool, optional) – Whether the detail of the data is returned. The default is True.
- Returns:
The name (str) of the data from which colname is matched, or a dict containing the information for all columns.
- Return type:
str or dict
Examples
Say you have two catalog files, cat1.csv and cat2.csv.
>>> cat1 = Data('cat1.csv', name='cat1')  # with columns 'col1', etc.
>>> cat2 = Data('cat2.csv', name='cat2')  # with columns 'col2', etc.
>>> cat_merged = cat1.match(cat2, SkyMatcher()).merge()
>>> # cat_merged has columns 'col1', 'col2', etc.
>>> cat_merged.from_which('col1')
cat1 (loaded from "cat1.csv")
>>> cat_merged.from_which('col2')
cat2 (loaded from "cat2.csv")
- get_labels(*cols, listalways=False, eval=False)
Get the labels of columns (if a label is not set with set_labels, the column name is used).
- Parameters:
*cols (str) – names of the columns
listalways (bool, optional) – If True, always return a list of labels (even if it contains only one label). The default is False.
eval (bool, optional) – If True, column names that do not belong to this data are treated as expressions and evaluated with Data.eval(). The default is False.
- Return type:
str or list of str
- get_subsets(path=None, name=None, group=None, listalways=False, force=False)
Retrieve one or more subsets by specifying a group name, subset name(s), or path(s) formatted as '<group_name>/<subset_name>'.
If no arguments are provided, this method returns all subsets organized by group and subset names, accessible as a nested dictionary:
>>> subsets = data.get_subsets()
>>> mysubset = subsets['group_name']['subset_name']
Note that a special subset is created temporarily when it is retrieved (or referred to). Special subsets can only be retrieved using their paths (e.g., '$unmasked/<column name>'). Otherwise, a GroupNotFoundError will be raised.
- Parameters:
path (str or list of str, optional) – The path or a list of paths. If provided, the name and group arguments are ignored. If a Subset object is given, that object itself is returned. The default is None.
name (str or list of str, optional) – The name of a subset or a list of names. Defaults to all subsets in the specified group.
group (str, optional) – The name of the group. Defaults to the default group.
listalways (bool, optional) – If True, always returns a list of subsets (even if the list contains only one subset). The default is False.
force (bool, optional) – Relevant only when path (or name) is a Subset object. If this Subset object is not a subset of this data, an exception will be raised. Setting force to True bypasses this exception. The default is False.
- Returns:
The specified subset or list of subsets.
- Return type:
pyttop.table.Subset or list of pyttop.table.Subset
Notes
A special subset is a virtual subset that does not actually exist. It is used to create a (new) subset as if retrieving an existing subset from the data. These virtual subsets are only created when get_subsets() is called and are not added to the data. To store a virtual subset as a “normal” subset in the pyttop.table.Data instance, use the following:
data.add_subsets(
    data.get_subsets('<path to the special subset>'),
)
- gs(path=None, name=None, group=None, listalways=False, force=False)
Alias of get_subsets()
- gtsub(path=None, name=None, group=None, listalways=False, force=False)
Alias of get_subsets()
- classmethod load(path, format='data', **kwargs)
Load a data file saved with Data.save() (usually in the “.data” or “.pkl” format).
Note: You may also read a raw table file like '*.csv', but it is recommended to use Data('your_catalog.csv') instead of Data.load('your_catalog.csv', format='ascii.csv').
- Parameters:
path (str) – Path to the file.
format (str, optional) – The format of the file (see Data.save()). The default is ‘data’.
**kwargs – Other arguments passed when initializing Data (only used when format is neither ‘data’ nor ‘pkl’).
- Returns:
data
- Return type:
pyttop.table.Data
- mask_missing(cols=None, missval=None, verbose=True)
Mask missing values represented by missval (e.g. -999) in the columns cols.
For example, data.mask_missing(cols='col', missval=-999) masks all -999 values in column “col”, indicating that they are missing.
If verbose, information on the process is printed. Note that the printed information gives the number of elements masked in this call, not the total number of masked elements in the columns. To get the number of unmasked elements in a column, try:
print(data.get_subsets('$unmasked/<column_name>'))
- Parameters:
cols (str or list of str, optional) – Name(s) of the columns to be masked. The default is all columns.
missval (optional) – The value regarded as a missing value. The default is NaN.
verbose (bool, optional) – If verbose, the information for missing values will be printed. The default is True.
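Masking a sentinel value such as -999 turns each occurrence into a masked (missing) entry. The NumPy sketch below illustrates this with `numpy.ma.masked_equal` and counts the elements that remain unmasked, which is analogous to what the `$unmasked/<column_name>` subset reports. The values are made up, and the code is an illustration of the idea rather than pyttop's own implementation.

```python
import numpy as np

# A column where -999 marks missing measurements (made-up values).
col = np.array([3.2, -999.0, 4.1, -999.0, 5.0])

masked = np.ma.masked_equal(col, -999.0)
n_masked = int(np.ma.count_masked(masked))
n_unmasked = int(masked.count())

print(n_masked, n_unmasked)  # 2 3
print(masked.mean())  # statistics ignore the masked entries
```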
- match(data1, matcher, verbose=True, replace=False)
Match this data object with another pyttop.table.Data object data1.
- Parameters:
data1 (pyttop.table.Data) – Data to be matched to this Data.
matcher (any recognized matcher object) –
A matcher object used to match the two data objects. Built-in matchers include, e.g., ExactMatcher and SkyMatcher.
A matcher object should be defined as below:
class MyMatcher:
    def __init__(self, args):
        # 'args' stands for any number of arguments that you need;
        # initialize the matcher with them
        pass

    def get_values(self, data, data1, verbose=True):
        # data1 is matched to data;
        # prepare the values needed for matching (if necessary)
        pass

    def match(self):
        # do the matching and compute:
        # idx : int array of shape (len(data),)
        #     index of the record in data1 that best matches each record in data
        # matched : bool array of shape (len(data),)
        #     whether each record in data can be matched to a record in data1
        return idx, matched
verbose (bool, optional) – Whether to output matching information. The default is True.
replace (bool, optional) – When data1 (the Data to be matched) has the same name as a Data object that has already been matched to this Data, whether to replace the old matching. If False, a ValueError is raised. The default is False.
- Raises:
ValueError –
A Data object with the same name is matched to this Data twice (and replace is False).
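A concrete matcher that follows the get_values()/match() protocol above might look like the sketch below. `NearestValueMatcher` is hypothetical and not a pyttop built-in. It pairs each row with the row of data1 whose value in one shared column is closest, and it accepts a pair only if the difference is within a tolerance. The sketch assumes that the column exists in both tables and that `data.t[column]` can be converted to a float array. `SimpleNamespace` objects stand in for Data instances so that the code runs without pyttop.

```python
from types import SimpleNamespace

import numpy as np


class NearestValueMatcher:
    """Hypothetical matcher pairing rows by the closest value in one column."""

    def __init__(self, column, tolerance):
        self.column = column        # column assumed to exist in both tables
        self.tolerance = tolerance  # maximum accepted |difference|

    def get_values(self, data, data1, verbose=True):
        # data1 is matched to data; keep the two columns as float arrays.
        self.values = np.asarray(data.t[self.column], dtype=float)
        self.values1 = np.asarray(data1.t[self.column], dtype=float)
        if verbose:
            print(f'matching {len(self.values)} rows against {len(self.values1)}')

    def match(self):
        # Pairwise distances, shape (len(data), len(data1)).
        dist = np.abs(self.values[:, None] - self.values1[None, :])
        idx = np.argmin(dist, axis=1)  # best candidate in data1 for each row
        best = dist[np.arange(len(self.values)), idx]
        matched = best <= self.tolerance
        return idx, matched


# Stand-ins exposing a .t mapping, used only to exercise the sketch.
data = SimpleNamespace(t={'x': [1.0, 2.0, 5.0]})
data1 = SimpleNamespace(t={'x': [2.05, 0.95, 9.0]})

matcher = NearestValueMatcher('x', tolerance=0.1)
matcher.get_values(data, data1, verbose=False)
idx, matched = matcher.match()
print(idx, matched)
```

With real Data objects, this matcher would be passed in the same way as the built-in ones, e.g. `data.match(data1, NearestValueMatcher('x', tolerance=0.1))`. The brute-force distance matrix needs O(len(data) * len(data1)) memory, so for large tables a sorted search (e.g. `numpy.searchsorted`) would scale better.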
- match_merge(data1, matcher, keep_unmatched=[], merge_columns={}, ignore_columns={}, outname=None, verbose=True)
Match this data with data1 and immediately merge everything that can be matched to this data. See match() and merge() for more information.
- match_tree(depth=-1, detail=True)
Print a “match tree”, showing all data that can be matched and merged to this data.
The data that are directly matched to this data are called “children” of this data, and are on “depth 1”. The data directly matched to data on “depth 1” are on “depth 2”, etc.
- Parameters:
depth (int, optional) – The depth. For example, if depth == 1, only the direct children (without grandchildren) of this data are shown. If depth == -1, all children (including all grandchildren) are shown. The default is -1.
detail (bool, optional) – Whether to show detail (including how the data are matched). The default is True.
- merge(depth=-1, keep_unmatched=[], merge_columns={}, ignore_columns={}, innames={}, outname=None, keep_subsets=False, matchinfo_subset=False, verbose=True)
Merge all data objects that are matched to this data.
The data that are directly matched to this data are called “children” of this data, and are on “depth 1”. The data directly matched to data on “depth 1” are on “depth 2”, etc.
- Parameters:
depth (int, optional) – The depth of merging. For example, if depth == 1, only the direct children (without grandchildren) of this data are merged. If depth == -1, all children (including all grandchildren) are merged. The default is -1.
keep_unmatched (Iterable or True, optional) – A list of names of pyttop.table.Data objects (you can check the names with, e.g., data.name). A record (row) of THIS data is kept even if it cannot be matched to a dataset in this list. To set keep_unmatched for all data, pass keep_unmatched=True. The default is [], which means that only rows that can be matched to every child data of this data are kept.
merge_columns (dict, optional) –
A dict that specifies fields (columns) to be merged. For example, if data1 with name ‘Data_1’ is matched to this object, and you want to merge only ‘column1’ and ‘column2’ of data1 into the merged catalog, use:
{'Data_1': ['column1', 'column2']}
If, e.g., merge_columns for data2 (with name ‘Data_2’) is not specified, all fields (columns) of data2 will be merged. The list can also include regular expressions, e.g.:
# requires `import re`
{'Data_1': ['column1', re.compile('class.*')]}
The default is {}.
ignore_columns (dict, optional) – A dict that specifies fields (columns) not to be merged, similar to the argument merge_columns. If both merge_columns and ignore_columns are specified for a field, the columns IN merge_columns AND NOT IN ignore_columns are merged. The default is {}.
innames (dict, optional) – A dict in the form of {data_name: rename_name}. This is used to generate unique output column names in case of conflicts (i.e., the same column names in different Data objects). By default, columns are renamed as ‘{column_name}_{data_name}’. If a data_name is included in innames, ‘{column_name}_{rename_name}’ will be used instead. This can be used to avoid long column names. If subsets are kept (keep_subsets=True), conflicts in subset or group names are handled in a similar manner. The default is {}.
outname (str, optional) – The name of the merged data. If not given, it is automatically generated from the names of the data that are merged. The default is None.
keep_subsets (bool, optional) – Whether the subsets of the data are kept and merged. The default is False.
matchinfo_subset (bool, optional) – If keep_unmatched != [], whether to add a subset ‘matched/<this_data_name>/<name_of_data_matched_to_this_data>’, indicating whether each row can be matched to that data. The default is False.
verbose (bool, optional) – Whether to show more information on merging. The default is True.
- Returns:
matched_data – A pyttop.table.Data object containing the merged catalog.
- Return type:
pyttop.table.Data
Notes
Suppose keep_unmatched is not empty ([]), say keep_unmatched=['data1']. Then the rows in THIS data that have no match in the dataset called ‘data1’ are kept, and for these rows the columns from ‘data1’ contain missing values.
‘data1’ may also have its own subsets. When keep_subsets is set to True, the subsets of ‘data1’ are also merged. Rows with no match in ‘data1’ never belong to the subsets merged from ‘data1’.
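The `keep_unmatched` behaviour can be pictured using the `(idx, matched)` pair that a matcher returns. The NumPy sketch below builds one merged column in two ways. When unmatched rows are kept, their values from ‘data1’ are masked. When they are not kept, those rows are dropped. The arrays are made up, and the sketch illustrates the semantics only. It is not how pyttop implements `merge()`.

```python
import numpy as np

# Matcher output for a 4-row table ("this data") against data1.
idx = np.array([2, 0, 1, 0])
matched = np.array([True, True, False, True])

# A column from data1 to be merged (made-up values).
col_from_data1 = np.array([10.0, 20.0, 30.0])

# keep_unmatched=['data1']: keep every row, mask values where there is no match.
kept = np.ma.masked_array(col_from_data1[idx], mask=~matched)

# keep_unmatched=[] (default): only rows matched to data1 survive.
dropped = col_from_data1[idx[matched]]

print(kept)
print(dropped)
```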
- merge_matchinfo(depth=-1)
Merge the matchinfo of all child data of this data, so that each entry describes the match with respect to this data. If there are duplicates in the child data, only the first one found is used.
- Parameters:
depth (int, optional) – The depth of merging. For example, if depth == 1, only the direct children (without grandchildren) of this data are merged. If depth == -1, all children (including all grandchildren) are merged. The default is -1.
- Returns:
outinfo – .
- Return type:
list of objdicts
- metaJson(save_path=None, yes=False)
Generate a json string for the metadata of this Data.
The metadata of a pyttop.table.Data object typically records how it was initialized, how it was merged (if it is a merged catalog), etc. It can be retrieved with data.meta. This is stored as the metadata of data.t, i.e. data.meta is data.t.meta.
- Parameters:
save_path (str, optional) – A path to save the json as a file. The default is None (do not save).
yes (bool, optional) – If set to True, existing files will be overwritten without prompts. The default is False.
- Returns:
meta – A json string.
- Return type:
str
- mm(cols=None, missval=None, verbose=True)
Alias of mask_missing()
- mskmis(cols=None, missval=None, verbose=True)
Alias of mask_missing()
- plot(func, *args, col_input=None, cols=None, kwcols={}, eval=False, eval_kwargs={}, paths=None, subsets=None, groups=None, autolabel=True, ax=None, verbose=True, global_selection=None, title=None, iter_kwargs={}, **kwargs)
Make a plot given a plot function.
The arguments paths, subsets and groups are used to specify the subsets of data that are plotted in the same subplot.
- Parameters:
func (str or Callable) – Function to make plots, e.g. plt.plot, or the name of the function, e.g. 'plot'.
*args – Arguments to be passed to func.
cols (str or list of str, optional) –
The names of the columns to be passed to func. For example, if cols = ['col1', 'col2'], func will be called by:
func(data['col1'], data['col2'], *args)
Note: When autolabel is True, the length of this argument is used to guess the dimension of the plot (e.g. 2D/3D). The default is None.
kwcols (dict, optional) –
Names of data columns that are passed to func as keyword arguments. For example, if kwcols={'x': 'col1', 'y': 'col2'}, func will be called by:
func(x=data['col1'], y=data['col2'])
eval (bool, optional) – If set to True, the names of data columns for cols and kwcols are treated as expressions to be evaluated with Data.eval(). This means that you can input expressions as well as column names. See eval() for the syntax of expressions. Otherwise, the names are simply treated as column names. The default is False.
eval_kwargs (dict, optional) – Keyword arguments to be passed to Data.eval() when evaluating cols and kwcols. Ignored if the argument eval is set to False. The default is {}.
paths (str or list of str, optional) – The full path of a subset (e.g. '<group_name>/<subset_name>') or a list of paths. If this is given, the arguments subsets and groups are ignored. The default is None.
subsets (str or list of str, optional) – The name of a subset, or a list of names. The default is all subsets in the specified group.
groups (str, optional) – The name of the group. The default is the default group.
autolabel (bool, optional) –
If True, will try to automatically add labels to the plot (made by func) as well as the axes, using the labels stored in Data and Subset objects.
NOTE: The axis labels are set automatically according to the argument cols, and may not give the results you expect. Axis labels and legends can only be added if the argument ax is given.
The default is True.
verbose (bool, optional) – Whether some detailed information is printed. The default is True.
ax (axes, optional) – The axis to make the plot. The default is None.
global_selection (pyttop.table.Subset or str or list of str, optional) –
The global selection [or the path(s) of the selection(s)] for this plot. If not None, only data selected by this argument are plotted. Accepted input:
- A pyttop.table.Subset object. Note that logical operations on subsets are supported, e.g. subset1 & subset2 | subset3.
- The path to the subset, i.e. 'groupname/subsetname'. If the group name is ‘default’, you can directly use ‘subsetname’.
- A list/tuple/set of paths to the subsets. The global selection will be the logical AND (i.e. the intersection) of the subsets.
The default is None.
title (str, optional) – Manually set the title of the plot. This overrides the automatically generated title. The default is None (a title is generated automatically if autolabel is True).
iter_kwargs (dict, optional) –
Lists of keyword arguments that differ for each specified subset. Suppose 3 subsets are specified using the subsets argument; an example value for iter_kwargs is:
{'color': ['b', 'r', 'k'], 'linestyle': ['-', '--', '-.']}
The default is {}.
**kwargs – Additional keyword arguments to be passed to func.
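`iter_kwargs` can be read as follows: the i-th subset receives the i-th element of every list, merged with the shared `**kwargs`. The helper `per_subset_kwargs` below is hypothetical and only spells out that reading. The length check is this sketch's own choice, because the documentation does not say whether pyttop validates list lengths.

```python
def per_subset_kwargs(n_subsets, iter_kwargs, **kwargs):
    """Return one keyword-argument dict per subset (hypothetical helper)."""
    for key, values in iter_kwargs.items():
        if len(values) != n_subsets:
            raise ValueError(f'{key!r} has {len(values)} values for {n_subsets} subsets')
    return [
        {**kwargs, **{key: values[i] for key, values in iter_kwargs.items()}}
        for i in range(n_subsets)
    ]


calls = per_subset_kwargs(
    3,
    {'color': ['b', 'r', 'k'], 'linestyle': ['-', '--', '-.']},
    alpha=0.5,  # shared keyword argument, as passed via **kwargs
)
for kw in calls:
    print(kw)
```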
- plots(func, *args, cols=None, kwcols={}, eval=False, eval_kwargs={}, plotpaths=None, plotsubsets=None, plotgroups=None, arraygroups=None, global_selection=None, share_ax=False, autobreak=False, autolabel=True, ax_callback=None, returns='fig', verbose=True, axes=None, fig=None, iter_kwargs={}, **kwargs)
Make a plot given the function func used for plotting.
If arraygroups is not None, plot an “array” of subplots (panels arranged in several rows and columns) for the different selections given in arraygroups. Each panel consists of several plots for the different selections given in plotgroups. This is useful for comparing a plot across different subsets of the data. For example, say plotgroups='group1' and arraygroups=['group2', 'group3']. Then each panel compares different subsets in 'group1', and different panels compare the results between subsets in 'group2' and 'group3'. Note that the dataset for each plot in each panel is the INTERSECTION of the corresponding subsets in 'group1', 'group2' and 'group3'.
- Parameters:
func (str or Callable or pyttop.plot.PlotFunction) –
Name of the matplotlib.pyplot function used to make plots, e.g. 'plot', 'scatter'.
Also accepts custom functions that receive an axis as the only argument and return a function (called the “plotting function” hereafter) to make plots. Example: lambda ax: ax.plot.
You can also input your custom plot function func defined by:
from pyttop.plot import plotFunc

@plotFunc
def func(<your inputs>):
    <make the plot>
    return  # you can return something here
Or:
from pyttop.plot import plotFuncAx

@plotFuncAx
def func(ax):  # input ax axis
    def plot(<your inputs>):
        <make the plot>
        return  # you can return something here
    return plot
*args – Arguments to be passed to the plotting function.
cols (str or list of str, optional) –
The names of the columns to be passed to the plotting function. For example, if cols = ['col1', 'col2'], the plotting function will be called by:
func(data['col1'], data['col2'], *args)
Note: When autolabel is True, the length of this argument is used to guess the dimension of the plot (e.g. 2D/3D). The default is None.
kwcols (dict, optional) –
Names of data columns that are passed to the plotting function as keyword arguments. For example, if kwcols={'x': 'col1', 'y': 'col2'}, the plotting function will be called by:
func(x=data['col1'], y=data['col2'])
eval (bool, optional) – If set to True, the names of data columns for cols and kwcols are treated as expressions to be evaluated with Data.eval(). This means that you can input expressions as well as column names. See Data.eval() for the syntax of expressions. Otherwise, the names are simply treated as column names. The default is False.
eval_kwargs (dict, optional) – Keyword arguments to be passed to Data.eval() when evaluating cols and kwcols. Ignored if the argument eval is set to False. The default is {}.
paths – Alias of plotpaths.
subsets – Alias of plotsubsets.
groups – Alias of plotgroups.
plotpaths (str or list of str, optional) – The full path of a subset (e.g. '<group_name>/<subset_name>') or a list of paths, for the plots in each subplot. If this is given, the arguments plotsubsets and plotgroups are ignored. The default is None.
plotsubsets (str or list of str, optional) – The name of a subset, or a list of names, for the plots in each subplot. The default is all subsets in the specified group.
plotgroups (str, optional) – The name of the subset group used to make different plots in each panel. For example, when the plotting function plots curves and plotgroups consists of 3 subsets, 3 curves for the 3 subsets are plotted in each panel. The default is None.
arraygroups (str or iterable of len <= 2, optional) –
The names of the subset groups used to make different panels. Examples:
- arraygroups = ['group1'], where ‘group1’ consists of 3 subsets. Then subplots with nrow=1, ncol=3 (1x3) are generated.
- arraygroups = ['group1', 'group2'], where ‘group1’ and ‘group2’ consist of 3 and 4 subsets, respectively. Then subplots with nrow=3, ncol=4 (3x4) are generated.
The default is None.
global_selection (pyttop.table.Subset or str or list of str, optional) –
Only consider data in the subset global_selection. Accepted input:
- A pyttop.table.Subset object. Note that logical operations on subsets are supported, e.g. subset1 & subset2 | subset3.
- The path to the subset, i.e. 'groupname/subsetname'. If the group name is ‘default’, you can directly use ‘subsetname’.
- A list/tuple/set of paths to the subsets. The global selection will be the logical AND (i.e. the intersection) of the subsets.
The default is None (the whole dataset is considered).
share_ax (bool, optional) – Whether the x, y axes are shared. The default is False.
autobreak (bool, optional) – When arraygroups consists of only one group, whether to automatically break the single row of subplots into several rows (by default, the result is a grid of subplots with only one row). The default is False.
autolabel (bool, optional) –
If True, will try to automatically add labels to the plot (made by func) as well as the axes, using the labels stored in Data and Subset objects.
NOTE: The axis labels are set automatically according to the argument cols, and may not give the results you expect. Axis labels and legends can only be added if the argument ax is given.
The default is True.
ax_callback (function, optional) – The function to be called as ax_callback(ax) after plotting in each panel, where ax is the axis object of this panel.
returns (str, optional) –
Decide what to return.
'fig' or 'fig, axes': return the figure and axes.
'plot' or 'return': return a list of the returned values of the plot function.
Whatever this argument is, you can always retrieve the figure, axes and the returned values (of the plot function) of the last call of data.plot() with data.plot_fig, data.plot_axes and data.plot_returns.
verbose (bool, optional) – Whether some detailed information is printed. The default is True.
ax – alias of “axes”.
axes (list of axes, optional) – The axes of the subplots. The default is None.
fig (matplotlib.figure.Figure, optional) – The figure on which the subplots are made. The default is None.
iter_kwargs (dict, optional) –
Lists of keyword arguments that differ for each subset in plotgroups. Suppose plotgroups='group1' consists of 3 subsets; an example value for iter_kwargs is:
{'color': ['b', 'r', 'k'], 'linestyle': ['-', '--', '-.']}
The default is {}.
**kwargs – Additional keyword arguments to be passed to the plotting function.
- Returns:
fig (matplotlib.figure.Figure)
axes (matplotlib.axes.Axes or array of matplotlib.axes.Axes)
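The intersection rule for plotgroups and arraygroups can be sketched with boolean masks. The rows drawn for plot k in panel (i, j) are those that fall in subset k of the plot group, subset i of the first array group and subset j of the second array group. The three groups and their masks below are hypothetical. The 2x3 panel grid follows the nrow/ncol convention described for arraygroups.

```python
import numpy as np

# A made-up column used to define three hypothetical subset groups.
values = np.array([0, 1, 2, 3, 4, 5, 2, 3])

plotgroup = [values % 2 == 0, values % 2 == 1]  # 'group1': 2 plots per panel
rowgroup = [values < 3, values >= 3]  # 'group2': nrow = 2
colgroup = [values % 3 == 0, values % 3 == 1, values % 3 == 2]  # 'group3': ncol = 3

counts = np.zeros((len(rowgroup), len(colgroup), len(plotgroup)), dtype=int)
for i, row_mask in enumerate(rowgroup):
    for j, col_mask in enumerate(colgroup):
        for k, plot_mask in enumerate(plotgroup):
            # Rows for plot k in panel (i, j): intersection of the three subsets.
            counts[i, j, k] = np.count_nonzero(row_mask & col_mask & plot_mask)

print(counts.shape)  # (2, 3, 2)
print(counts.sum() == len(values))  # True: each group here partitions the rows
```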
- save(path, format='data', overwrite=False)
Save data to file.
- Parameters:
path (str) – Path to the file.
format (str, optional) –
The format of the file. The default is ‘data’. Supported formats include:
- ‘pkl’:
Saves the full data object to a "*.pkl" file.
- ‘data’ (default):
Saves key data (including the data table, the subsets, etc.) to a "*.data" file. Note that the matching data is not saved.
- Other formats: Any format supported by astropy.table.Table.write.
Only the data table (astropy.table.Table) is saved. This is equivalent to data.t.write(<...>).
overwrite (bool, optional) – Whether to overwrite the file if it exists. If set to False, a FileExistsError will be raised. The default is False.
- Raises:
FileExistsError – The file already exists.
Notes
Notes for developers
When format='pkl' is set, a Data object is saved with the standard pickle module. This means that all data for the object is serialized into a byte stream. When format='data' is set, only selected attributes are saved separately, and they are not necessarily saved with Python’s standard pickling protocols. This makes it possible to retrieve some data from a '*.data' file even without, e.g., Python’s pickle module.
- set_labels(**kwargs)
label(<column_name>=<label>)
Add/update the labels used for, e.g., the labels on the axes of the plots.
Example: if col1='$x_1$' is passed, the data in data.t['col1'] will be labeled as ‘$x_1$’ on the plots.
- Parameters:
**kwargs (<column_name:str>=<label:str>)
- sort(keys, *, keep_subsets=False, kind=None, reverse=False)
Returns a new Data instance with the table sorted according to one or more keys (columns).
Unlike the sort() method of astropy.table.Table (i.e., data.t.sort()), this method does not perform an in-place sort.
- Parameters:
keys (str or list of str) – The column name(s) to order the table by.
keep_subsets (bool, optional) – If True, the subsets will be preserved. The default is False.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm used by numpy.argsort.
reverse (bool, optional) – If True, sort in reverse order. The default is False.
- Returns:
A Data instance with the table sorted.
- Return type:
pyttop.table.Data
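The difference from an in-place sort can be shown with NumPy alone. Computing an order with `numpy.argsort` and then indexing builds a new array and leaves the original unchanged, whereas `ndarray.sort()` reorders the array in place. This is an analogy for `Data.sort()` versus `data.t.sort()`. Flipping the index order is only one possible reading of `reverse=True` and may not be what pyttop does.

```python
import numpy as np

col = np.array([3, 1, 2])

order = np.argsort(col, kind='stable')  # indices that sort the column
sorted_copy = col[order]  # new array; col itself is unchanged
descending = col[order[::-1]]  # one way to get the reverse order

assert col.tolist() == [3, 1, 2]  # original untouched so far

col.sort()  # in-place, like data.t.sort()
print(sorted_copy, descending, col)
```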
- ss(group=None)
Alias of subset_summary()
- subdat(path=None, name=None, group=None, expr=None, verbose=True, **kwargs)
Alias of subset_data()
- subplot_array(func, *args, cols=None, kwcols={}, eval=False, eval_kwargs={}, plotpaths=None, plotsubsets=None, plotgroups=None, arraygroups=None, global_selection=None, share_ax=False, autobreak=False, autolabel=True, ax_callback=None, returns='fig', verbose=True, axes=None, fig=None, iter_kwargs={}, **kwargs)
Deprecated name of plots()
- subset_data(path=None, name=None, group=None, expr=None, verbose=True, **kwargs)
Get a subset (or several subsets) of data by specifying the subset(s) using the name(s) of subset group(s), subset(s), or the full path(s) (i.e. '<group_name>/<subset_name>'). This is different from the get_subsets method, which returns the Subset objects.
You may also pass a Subset object or a list of Subset objects to the path parameter to get the data directly.
For convenience, you can also directly specify an expression:
data.subset_data(expr = 'col1 == 1')
Which is equivalent to:
data.subset_data(data.add_subsets(Subset(expr), group='temp'))
This is similar to:
data.t[data.t['col1'] == 1]
but supports expressions and returns a Data.
- Parameters:
path (Subset or list of Subset or str or list of str, optional) – A Subset object or a list of Subset objects, or the path or a list of paths. If this is given, the arguments name and group are ignored. The default is None.
name (str or list of str, optional) – The name of a subset, or a list of names. The default is all subsets in the specified group.
group (str, optional) – The name of the group. The default is the default group.
expr (str, optional) – An expression that can be evaluated with Data.eval() (e.g. col1 == 1). The default is None.
verbose (bool, optional) – Whether to print more information. The default is True.
**kwargs – Arguments passed to Subset() or Data.eval().
- Returns:
The subset of data or list of subsets of data specified.
- Return type:
pyttop.table.Dataor list ofpyttop.table.Data
Examples
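As a hedged illustration of the row selection that subset_data(expr='col1 == 1') describes, here is a pure-Python sketch that builds a row mask and returns a new table, analogous to data.t[data.t['col1'] == 1]. The helper select_rows and the dict-of-lists table are hypothetical and not part of pyttop.

```python
from typing import Callable, Dict, List

Table = Dict[str, List]  # hypothetical stand-in: column name -> column values


def select_rows(table: Table, predicate: Callable[[Dict], bool]) -> Table:
    """Return a NEW table with only the rows for which predicate(row) is True."""
    nrows = len(next(iter(table.values())))
    # Boolean row mask, analogous to data.t['col1'] == 1
    mask = [predicate({name: col[i] for name, col in table.items()})
            for i in range(nrows)]
    return {name: [v for v, keep in zip(col, mask) if keep]
            for name, col in table.items()}


t = {'col1': [1, 2, 1], 'col2': ['a', 'b', 'c']}
sub = select_rows(t, lambda row: row['col1'] == 1)
print(sub)  # {'col1': [1, 1], 'col2': ['a', 'c']}
```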
- subset_group_from_ranges(column, ranges, group_name=None, overwrite=False)
Create a subset group by setting several ranges of values of a column.
For example, data.subset_group_from_ranges(column='col1', ranges=[[0, 1], [1, 2]]) defines a subset group named 'col1', which includes 2 subsets, 0 < col1 < 1 and 1 < col1 < 2.
- Parameters:
column (str) – The name of the column.
ranges (list of lists (or similar objects)) – List of ranges, each given as a pair of values (e.g. [0, 1]).
group_name (str, optional) – The name of the created subset group. The default is the name of the column.
overwrite (bool, optional) – Whether to overwrite the group if a group with group_name already exists. The default is False.
- Returns:
A list of the created subsets.
- Return type:
list
- Raises:
ValueError – If a group with group_name already exists and overwrite is set to False.
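The range-to-subset logic can be sketched in pure Python. This assumes strict inequalities, as in the '0 < col1 < 1' wording above; whether pyttop includes the endpoints is not stated here. masks_from_ranges is a hypothetical helper that returns one boolean mask per range, keyed by a label built from the column name.

```python
from typing import Dict, List, Sequence


def masks_from_ranges(values: Sequence[float], column: str,
                      ranges: Sequence[Sequence[float]]) -> Dict[str, List[bool]]:
    """Return one boolean mask per [low, high] range (hypothetical helper)."""
    masks: Dict[str, List[bool]] = {}
    for low, high in ranges:
        # Assumption: strict bounds, i.e. low < value < high
        masks[f'{low} < {column} < {high}'] = [low < v < high for v in values]
    return masks


col1 = [0.5, 1.5, 1.0, 2.5]
for label, mask in masks_from_ranges(col1, 'col1', [[0, 1], [1, 2]]).items():
    print(label, mask)
# 0 < col1 < 1 [True, False, False, False]
# 1 < col1 < 2 [False, True, False, False]
```

Under this strict-inequality assumption, a value lying exactly on a shared boundary (1.0 above) falls into neither subset.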
- subset_group_from_values(column, group_name=None, overwrite=False)
Create a subset group from the unique values of a column.
For example, if a column named “class” has 3 possible values, “A”, “B” and “C”, a subset group will be defined with 3 subsets for class=A, B, C, respectively.
- Parameters:
column (str) – The name of the column.
group_name (str, optional) – The name of the created subset group. The default is the name of the column.
overwrite (bool, optional) – Whether to overwrite the group if a group with group_name already exists. The default is False.
- Raises:
ValueError – If a group with group_name already exists and overwrite is set to False.
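A pure-Python sketch of grouping by unique values follows. masks_from_values is a hypothetical helper that returns one boolean mask per distinct value. Listing the groups in order of first appearance is a choice of this sketch, and pyttop may order them differently.

```python
from typing import Dict, Hashable, List, Sequence


def masks_from_values(values: Sequence[Hashable]) -> Dict[Hashable, List[bool]]:
    """One boolean mask per distinct value, in order of first appearance."""
    masks: Dict[Hashable, List[bool]] = {}
    for v in values:
        if v not in masks:
            masks[v] = [x == v for x in values]
    return masks


cls = ['A', 'B', 'A', 'C']
for value, mask in masks_from_values(cls).items():
    print(f'class={value}', mask)
```

In this sketch every row carries exactly one value, so the masks partition the rows: each row is selected by exactly one subset.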
- subset_summary(group=None)
Get a summary table for the subsets and subset groups.
The table consists of the following columns:
group: name of the subset group
name: name of the subset
size: size of the subset
fraction: fraction of the subset size relative to the total number of rows
expression: expression/source code that specifies the selection of the subset
label: label of the subset used for plotting
- Parameters:
group (str or list of str, optional) – The name (or list of names) of the subset group(s) to be shown in the table. If not given, all groups will be shown by default.
- Returns:
summary
- Return type:
astropy.table.Table
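The size and fraction columns can be reproduced from boolean masks with a short pure-Python sketch. The helper summarize and the (group, name)-keyed dictionary of masks are hypothetical. The real method returns an astropy.table.Table that also has expression and label columns.

```python
from typing import Dict, List, Tuple


def summarize(masks: Dict[Tuple[str, str], List[bool]]) -> List[dict]:
    """One summary row (group, name, size, fraction) per boolean mask."""
    rows = []
    for (group, name), mask in masks.items():
        size = sum(mask)             # number of selected rows (True counts as 1)
        fraction = size / len(mask)  # relative to the total number of rows
        rows.append({'group': group, 'name': name,
                     'size': size, 'fraction': fraction})
    return rows


masks = {('default', 'pos'): [True, False, True, True],
         ('default', 'neg'): [False, True, False, False]}
for row in summarize(masks):
    print(row)
```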
- subsets()
Retrieve subsets organized by group and subset names, accessible like a nested dictionary.
Example
>>> subsets = data.subsets()
>>> mysubset = subsets['group_name']['subset_name']
- subsum(group=None)
Alias of subset_summary().
- tree(depth=-1, detail=True)
Alias of match_tree().
- class Subset(selection, name=None, expression=None, label=None, **kwargs)
A class to specify a row subset of a pyttop.table.Data object. Although this class is independent of the Data class, it should only be used together with a Data object.
The common way to specify the selection criteria, name, etc. of a subset is:
Subset(<selection>, name=<name>, <...>)
Convenient methods for specifying a subset are:
Subset.by_range(<column name>=<value range>, <...>)
Subset.by_value(<column name>, <value>)
See by_range() and by_value() for more information.
In practice, a subset of data is usually defined using the add_subsets() method:
subset = data.add_subsets(Subset(<...>))
You may also define multiple subsets at a time:
subset1, subset2 = data.add_subsets(
    Subset(<...>),
    Subset(<...>))
Subset objects can be used as if they were arrays in most cases. For example, you can get the intersection subset1 & subset2, the union subset1 | subset2, and the complement ~subset1.
Note that the name (auto-generated if not given) is used as the address of a subset in a Data object. If you add a subset to a subset group in which its name is already used by another subset, the original subset will be replaced and no longer recognized as part of that data:
s1 = data.add_subsets(Subset(<...>, name='subset'))
s2 = data.add_subsets(Subset(<...>, name='subset'))  # this replaces the original subset at 'default/subset'
s1 in data, s2 in data  # (False, True)
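The array-like set operations on Subset objects amount to element-wise logic on their boolean selections. The class below is a hypothetical, minimal stand-in (not pyttop's Subset) that shows what &, | and ~ compute.

```python
from typing import List


class Mask:
    """Hypothetical minimal stand-in for a row subset: a boolean selection."""

    def __init__(self, selection: List[bool]):
        self.selection = list(selection)

    def __and__(self, other: 'Mask') -> 'Mask':   # intersection
        return Mask([a and b for a, b in zip(self.selection, other.selection)])

    def __or__(self, other: 'Mask') -> 'Mask':    # union
        return Mask([a or b for a, b in zip(self.selection, other.selection)])

    def __invert__(self) -> 'Mask':               # complement
        return Mask([not a for a in self.selection])


s1 = Mask([True, True, False, False])
s2 = Mask([True, False, True, False])
print((s1 & s2).selection)  # [True, False, False, False]
print((s1 | s2).selection)  # [True, True, True, False]
print((~s1).selection)      # [False, False, True, True]
```

Each operator returns a new Mask and leaves its operands unchanged.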
- Parameters:
selection (callable (e.g. function), iterable (e.g. array-like) or string) –
If it is iterable, it should be a boolean array indicating whether each row is included in this subset. It should have a shape of (len(data),), where data is a pyttop.table.Data instance.
If it is callable, it should be defined like below:
def selection(table):  # input: astropy.table.Table object
    <...>
    return arr  # boolean array: whether each row is included in the subset
If it is a string, it should be an expression that can be evaluated by Data.eval, e.g. '(column1 > 0) & (column2 < 1)'. Refer to eval() for details.
name (str, optional) – The name of the subset. The default is None.
expression (str, optional) – The expression (e.g. '(col1 > 0) & (col2 == "A")') used to describe the selection conditions. The default is None.
label (str, optional) – The label used in figures. The default is None.
kwargs – Arguments passed to Data.eval() if selection is evaluated as an expression.
Notes
The parameters selection, name, expression, and label will become attributes of the Subset object. When a subset is defined using data.add_subsets(Subset(<...>)), they are evaluated given data:
The attribute selection will be converted to a boolean array;
The attribute name will be set to the default name if it is None;
The attribute expression will be automatically set if it is None;
The attribute label will be set to name if it is None; strings will be replaced according to the mapping of dict data.col_labels.
If the input selection is (or results in) a masked boolean array, the masked elements are filled with False, which means that they are NOT included in this subset by definition. This often happens when selection is calculated from a masked column of the table. The final selection after evaluation is never a masked array.
Caveat. Subsets constructed with expr and ~expr are NOT necessarily complements of each other! See the example below:
>>> from pyttop.table import Data, Subset
>>> d = Data(name='test')
>>> d['x'] = [-1, 1, -99]
>>> d.mask_missing(missval=-99)
[mask missing] col 'x': 1/3 (33.33%) masked (value: -99).
>>> s1 = d.add_subsets(Subset('x < 0'))
>>> s2 = d.add_subsets(Subset('~(x < 0)'))
>>> s21 = d.add_subsets(Subset('x >= 0'))
>>> s1, s1.selection
(<Subset 'x < 0' of Data 'test' (1/3)>, array([ True, False, False]))
>>> s2, s2.selection
(<Subset '~(x < 0)' of Data 'test' (1/3)>, array([False,  True, False]))
>>> s21, s21.selection
(<Subset 'x >= 0' of Data 'test' (1/3)>, array([False,  True, False]))
>>> (~s1), (~s1).selection
(<Subset 'NOT(x < 0)' of Data 'test' (2/3)>, array([False,  True,  True]))
>>> (~s2), (~s2).selection
(<Subset 'NOT(~(x < 0))' of Data 'test' (2/3)>, array([ True, False,  True]))
>>> (~s21), (~s21).selection
(<Subset 'NOT(x >= 0)' of Data 'test' (2/3)>, array([ True, False,  True]))
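The caveat follows from the fill-with-False rule. Negating inside the expression happens before masked rows are filled, while ~subset negates after filling. A pure-Python sketch, using None as a hypothetical stand-in for a masked entry and a hypothetical helper select, reproduces the selections above:

```python
from typing import Callable, List, Optional


def select(values: List[Optional[float]],
           cond: Callable[[float], bool]) -> List[bool]:
    """Evaluate `cond` row by row; masked rows (None here) are filled with False."""
    return [False if v is None else bool(cond(v)) for v in values]


x = [-1, 1, None]                       # None stands in for the masked -99
s1 = select(x, lambda v: v < 0)         # like Subset('x < 0')
s2 = select(x, lambda v: not (v < 0))   # like Subset('~(x < 0)'): negated BEFORE filling
not_s1 = [not b for b in s1]            # like ~s1: negated AFTER filling
print(s1)      # [True, False, False]
print(s2)      # [False, True, False]
print(not_s1)  # [False, True, True]
```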
- classmethod by_range(**ranges)
Initializes a subset by specifying ranges for the data.
For example, Subset.by_range(col1=[0, 1], col2=[0, np.inf]) defines a subset with (0 < col1 < 1) & (col2 > 0).
- Parameters:
**ranges (key-value pairs) –
- key : str
Name of the column in the data.
- value : list or tuple (or similar object) of length 2
List of 2 numbers, e.g. [0, 1], specifying a range of the column.
- Return type:
pyttop.table.Subset
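A pure-Python sketch of the range conditions that by_range describes combines each column's strict range with AND. range_mask is a hypothetical helper operating on a dict-of-lists table, and math.inf stands in for np.inf.

```python
import math
from typing import Dict, List, Sequence


def range_mask(table: Dict[str, List[float]],
               **ranges: Sequence[float]) -> List[bool]:
    """AND of strict range conditions: col1=[0, 1] means 0 < col1 < 1."""
    nrows = len(next(iter(table.values())))
    mask = [True] * nrows
    for column, (low, high) in ranges.items():
        mask = [m and (low < v < high) for m, v in zip(mask, table[column])]
    return mask


t = {'col1': [0.5, 0.5, 2.0], 'col2': [3.0, -1.0, 3.0]}
print(range_mask(t, col1=[0, 1], col2=[0, math.inf]))  # [True, False, False]
```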
- classmethod by_value(column, value)
Initializes a subset by specifying the exact value of a column.
- Parameters:
column (str) – Name of the data column.
value – Value of the column.
- Return type:
pyttop.table.Subset
- eqs(subset)
Checks whether the selections of two subsets are the same. For example:
if subset1.eqs(subset2):
    print('same')
- Parameters:
subset (pyttop.table.Subset)
- Return type:
bool
- eval_(data, existing_keys=())
Evaluate the selection array, expression, name and label, given data. This method should be called if self.selection is not a boolean array, or if either self.name or self.expression is None.
Note that if a subset is added to data using data.add_subsets(Subset(<...>)), this method has already been called and does not need to be called again.
- Parameters:
data (pyttop.table.Data)
existing_keys (Iterable, optional) – Names of subsets that already exist. This is used to automatically generate subset names. The default is ().
pyttop.matcher
Created on Sat Jul 30 2022
@author: Yuchen Wang
Built-in matchers.
- class ExactMatcher(value, value1=None)
Used to match the pyttop.table.Data object data1 to data by matching records with exactly equal values. It should be passed to the data.match() method. See help(data.match).
- Parameters:
value (str or Iterable) – Values of data used to match the catalogs. Possible inputs are:
- str: name of the field used for matching;
- Iterable: values for data; len(value) should be equal to len(data).
value1 (str or Iterable, optional) –
Values of data1 used to match the catalogs. Possible inputs are:
- str: name of the field used for matching;
- Iterable: values for data1; len(value1) should be equal to len(data1).
If not given and value is a string, value1 is set to the same as value.
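Exact-value matching is essentially a hash lookup. In the pure-Python sketch below, the hypothetical helper exact_match returns, for each row of data, the index of a row in data1 with the same value, or -1 when there is none. Keeping the first of several duplicates in data1 and using -1 as the no-match marker are assumptions of this sketch, not documented pyttop behaviour.

```python
from typing import Dict, Hashable, List, Sequence


def exact_match(value: Sequence[Hashable],
                value1: Sequence[Hashable]) -> List[int]:
    """Index into `value1` of an identical value for each entry of `value`, or -1."""
    index: Dict[Hashable, int] = {}
    for j, v in enumerate(value1):
        index.setdefault(v, j)  # assumption: keep the first occurrence of duplicates
    return [index.get(v, -1) for v in value]


ids = ['a', 'b', 'c']      # e.g. a key column of data
ids1 = ['c', 'x', 'a']     # the same key column of data1
print(exact_match(ids, ids1))  # [2, -1, 0]
```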
- class IdentityMatcher
Used to match the pyttop.table.Data object data1 to data. Records are matched directly row by row, i.e. row #1 is matched to row #1, row #2 to row #2, etc. This is only possible if len(data1) == len(data). It should be passed to the data.match() method. See help(data.match).
- class SkyMatcher(thres=1, coord=None, coord1=None, unit=Unit('deg'), unit1=Unit('deg'))
Used to match the pyttop.table.Data object data1 to data by matching records with the nearest coordinates. It should be passed to the data.match() method. See help(data.match).
- Parameters:
thres (number, optional) – Threshold in arcsec. The default is 1.
coord (str or astropy.coordinates.SkyCoord, optional) – Coordinates of the base data. Possible inputs are:
- astropy.coordinates.SkyCoord (recommended): the coordinate object;
- str: should look like 'RA-DEC', specifying the column names for RA and Dec;
- None (default): the column names ['ra', 'RA'] and ['DEC', 'Dec', 'dec'] are tried.
coord1 (str or astropy.coordinates.SkyCoord, optional) – Coordinates of the matched data. The possible inputs are the same as for coord. The default is None.
unit (astropy.units.core.Unit or list/tuple/array of it) – If coord is not given as an astropy.coordinates.SkyCoord object, this specifies the unit of coord. The default is astropy.units.deg.
unit1 (astropy.units.core.Unit or list/tuple/array of it) – If coord1 is not given as an astropy.coordinates.SkyCoord object, this specifies the unit of coord1. The default is astropy.units.deg.
Notes
The data columns for RA and Dec may already have units (e.g. data.t['RA'].unit). In this case, any input for unit or unit1 is ignored, and the units recorded in the columns are used.
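The nearest-neighbour matching that SkyMatcher describes can be sketched in pure Python. The sketch uses the haversine formula for angular separation and a brute-force search. sky_match and angsep_deg are hypothetical helpers, and coordinates are assumed to be (RA, Dec) tuples in degrees. Treating a separation equal to thres as a match is also an assumption. For real catalogues, astropy's SkyCoord.match_to_catalog_sky performs the same nearest-neighbour search far more efficiently, though whether pyttop uses it internally is not stated here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (RA, Dec) in degrees


def angsep_deg(c1: Coord, c2: Coord) -> float:
    """Great-circle separation in degrees, via the haversine formula."""
    ra1, dec1 = map(math.radians, c1)
    ra2, dec2 = map(math.radians, c2)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def sky_match(coord: Sequence[Coord], coord1: Sequence[Coord],
              thres: float = 1.0) -> List[int]:
    """For each source in `coord`, the index of the nearest source in `coord1`,
    or -1 if that nearest source is farther than `thres` arcsec."""
    if not coord1:
        return [-1] * len(coord)
    result = []
    for c in coord:
        seps = [angsep_deg(c, c1) * 3600 for c1 in coord1]  # arcsec
        j = min(range(len(seps)), key=seps.__getitem__)     # nearest source
        result.append(j if seps[j] <= thres else -1)
    return result


base = [(10.0, 20.0), (50.0, -5.0)]
other = [(50.0, -5.0 + 0.5 / 3600), (10.0, 20.0 + 5 / 3600)]
print(sky_match(base, other, thres=1))  # [-1, 0]: 5 arcsec is too far, 0.5 arcsec matches
```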
pyttop.plot
Created on Sun Jul 13 12:33:34 2025
@author: Yu-Chen Wang
pyttop.utils
Created on Sat Jul 30 2022
@author: Yuchen Wang