Basic APIs#

This page includes basic APIs that are commonly used.

pyttop.table#

Created on Wed Sep 4 17:15:10 2024

@author: Yu-Chen Wang

class Data(data=None, name=None, **kwargs)[source]#

A class to store, manipulate and visualize data tables.

Parameters:
  • data (str, file-like, astropy.table.Table, pandas.DataFrame, or similar) –

    The data table, which can be one of the following:

    • A string path to a data file

    • A file-like object (e.g., returned by open())

    • An astropy.table.Table object

    • A pandas.DataFrame, or any object that can be initialized as an astropy.table.Table

  • name (str, optional) – The name of this Data object. This name will be used in many cases to distinguish datasets. The default is None.

  • **kwargs

    Additional keyword arguments passed when initializing an astropy.table.Table object.

    Common arguments include:

    format : str, optional

    File format specifier for astropy.table.Table.read() (relevant when reading from a file path or file-like object). For a list of supported formats see the Astropy documentation.

Notes

  • The data table of a Data instance (i.e. data.t) is not expected to be changed after creation. If data.t is changed, the matching and subset information may become inconsistent with the table. Create a new Data instance instead.

t#

The table.

Type:

astropy.table.Table

colnames#

A list of column names.

Type:

list

shape#

(<number_of_rows>, <number_of_columns>)

Type:

tuple

add_subsets(*subsets, group=None, listalways=False, verbose=True)[source]#

Add subsets to a subset group.

A subset refers to a subset (selection) of rows; a subset group is a group of subsets.

Beware that a subset does not “watch” the changes in the data: once added to the data, it never changes, even if the data changes. If you would like to update your subset, you may add it again to replace the old one.

Parameters:
  • *subsets (pyttop.table.Subset) – The subsets to be added to this group. See Subset for more information.

  • group (str, optional) – The name of the subset group. If not specified, the default subset group will be used.

  • listalways (bool, optional) – If True, always returns list of subsets (even if len(list) == 1). The default is False.

  • verbose (bool, optional) – Whether or not information is printed on the screen. The default is True.

Returns:

subsets – The arguments, i.e. a tuple of subset objects.

Return type:

tuple
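The "does not watch" behaviour described above can be pictured with a plain boolean mask. The following is only an illustrative sketch in pure Python, assuming a subset behaves like a selection that is evaluated once when it is added; it does not use pyttop, and the names `values`, `mask` and `refreshed` are hypothetical.

```python
values = [1, 5, 10]

# The selection is evaluated once, at the moment it is "added".
mask = [v > 3 for v in values]

# The data changes afterwards, but the stored selection is not updated.
values[0] = 100

# Re-evaluating the same condition (the analogue of adding the subset again)
# yields a selection that reflects the new values.
refreshed = [v > 3 for v in values]
```

Here `mask` still describes the old values, which is why re-adding a subset is the suggested way to refresh it.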

adsub(*subsets, group=None, listalways=False, verbose=True)#

Alias of add_subsets()

apply(func, processes=None, args=(), progress_bar=False, **kwargs)[source]#

Apply function func to each row of the Table (data.t) to get a new column. This operation is not vectorized.

Parameters:
  • func (function) –

    A function to be applied to each row. Example:

    >>> def func(row): # row is a row of the Table.
    ...     return row['a'] + row['b']
    

    Note that if processes is not None, func must be a global (module-level) function rather than a lambda function, and it must accept only a single argument, row.

  • processes (None or int) – If a positive int is given, it specifies the number of processes used to compute the results. If -1 is given, all available CPU cores are used. If None, multiprocessing is not enabled. The default is None.

  • args (Iterable, optional) – Additional arguments to be passed to func. The default is ().

  • progress_bar (bool, optional) – Only relevant when multiprocessing is not enabled. If set to True, a progress bar will be shown. The default is False.

  • **kwargs – Additional keyword arguments to be passed to func (not supported for multiprocessing).

Returns:

Result of applying func to each row.

Return type:

list
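As a rough picture of what a non-vectorized, row-by-row application does, the sketch below applies a module-level function to each row and collects the results in a list. It is a pure-Python stand-in, not pyttop code: rows are modelled as dictionaries, and `add_ab`, `OFFSET` and `rows` are hypothetical names. A module-level constant is used instead of extra arguments, since the multiprocessing mode is documented to accept only the single argument row.

```python
OFFSET = 10  # module-level constant, used instead of extra keyword arguments


def add_ab(row):
    """Row function taking a single argument, as the multiprocessing mode requires."""
    return row["a"] + row["b"] + OFFSET


# Stand-in for the rows of a table.
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

# One call per row; the results form the new column.
new_column = [add_ab(row) for row in rows]
```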

check_duplication(*cols, action='print')[source]#

Check for duplicates in the given columns.

Parameters:
  • *cols (str) – The names of columns (if not given, all columns will be checked).

  • action (str, optional) –

    What to do after checking. The valid actions are:

    • ’print’: print the results

    • ’bool’: return whether duplicates are found

    • ’detail’: return a dict containing the duplicate values for columns with duplicates

    • ’subset’: return a row subset including those where duplicates are found

    The default is ‘print’.
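To make the idea behind the 'detail' action concrete, here is a minimal pure-Python sketch that collects the duplicated values per column. The exact structure of the dict returned by pyttop is not specified on this page, so the shape used here (column name mapped to a list of duplicated values) is an assumption, and `duplicate_values` is a hypothetical helper.

```python
from collections import Counter


def duplicate_values(columns, *cols):
    """Return {column name: [values occurring more than once]} for the given columns."""
    cols = cols or tuple(columns)  # no names given: check every column
    found = {}
    for name in cols:
        counts = Counter(columns[name])
        duplicated = [value for value, n in counts.items() if n > 1]
        if duplicated:
            found[name] = duplicated
    return found


table = {"id": [1, 2, 2, 3], "name": ["a", "b", "c", "d"]}
detail = duplicate_values(table)
has_duplicates = bool(detail)  # analogue of the 'bool' action
```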

checkdup(*cols, action='print')#

Alias of check_duplication()

chkdup(*cols, action='print')#

Alias of check_duplication()

clear_subsets(group=None)[source]#

Clear user-defined subsets.

Parameters:

group (str, optional) – Name of the subset group to be cleared. If not specified, all user-defined subsets are deleted.

eval(expression, to_col=None, **kwargs)[source]#

Evaluate the value with an expression.

In the expression, the columns of the table can be referred to with:

  • The name of the column, if the name is a valid Python variable name and does not coincide with a name in the local/global namespace.

  • $(<column name>).

  • self['<column name>'].

The Data object itself can be referred to as self.

Parameters:
  • expression (str) – The expression to be evaluated.

  • to_col (str, optional) – Sets data[to_col] to the evaluated values of the expression. This is preferred to using data['name'] = data.eval(...), because the information of the expression is added to the metadata with data.eval(..., to_col='name'). The default is None.

  • **kwargs

    If the expression uses some name that is not recognized (e.g. using a user-defined name will result in NameError), you can pass the values of the names here.

    For example, if you use an expression ‘my_function(col) + my_value’ (where ‘col’ is a column name in the data), you can pass my_function and my_value by:

    Data.eval('my_function(col) + my_value', my_function=my_function, my_value=my_value)
    

Returns:

The result of the evaluation.

Return type:

result
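The three ways of referring to a column, and the role of the extra keyword arguments, can be illustrated with a small stand-in evaluator. This is a simplified sketch under stated assumptions, not pyttop's implementation: it evaluates the expression once per row rather than on whole columns, models the table as a dict of lists, and the helper `evaluate` is hypothetical.

```python
import re


def evaluate(expression, columns, **names):
    """Evaluate `expression` once per row of `columns` (a dict of equal-length lists)."""
    # Rewrite $(<column name>) into a dictionary lookup, so names with spaces work.
    rewritten = re.sub(
        r"\$\(([^)]*)\)", lambda m: f"_row[{m.group(1)!r}]", expression
    )
    n_rows = len(next(iter(columns.values())))
    results = []
    for i in range(n_rows):
        row = {name: values[i] for name, values in columns.items()}
        # Columns whose names are valid identifiers are visible as bare names.
        namespace = {k: v for k, v in row.items() if k.isidentifier()}
        # User-supplied names (the analogue of **kwargs) take precedence.
        namespace.update(names)
        namespace["_row"] = row
        results.append(eval(rewritten, {"__builtins__": {}}, namespace))
    return results


columns = {"flux": [1.0, 4.0], "flux err": [0.5, 1.0]}
snr = evaluate("scale * flux / $(flux err)", columns, scale=2)
```

The column "flux err" is not a valid variable name, so it can only be reached through the `$(...)` form, while `scale` is an unrecognized name that has to be passed explicitly.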

from_which(colname=None, detail=True)[source]#

When reading a dataset from a file using Data(<path>, name=<name>), the name of the data is associated with each column. After matching and merging it with other datasets, you may want to check the name of the data from which colname is matched. See examples below.

WARNING: The information for user-added columns may be invalid.

Parameters:
  • colname (str, optional) – Column name. If this argument is not given, a dict with the information for all columns will be returned.

  • detail (bool, optional) – Whether the detail of the data is returned. The default is True.

Returns:

The name (str) of the data from which colname is matched, or a dict containing the information for all columns.

Return type:

str or dict

Examples

Say you have two catalog files, cat1.csv and cat2.csv.

>>> cat1 = Data('cat1.csv', name='cat1') # with columns 'col1', etc.
>>> cat2 = Data('cat2.csv', name='cat2') # with columns 'col2', etc.
>>> cat_merged = cat1.match(cat2, SkyMatcher()).merge()
... # cat_merged has columns 'col1', 'col2', etc.
>>> cat_merged.from_which('col1')
cat1 (loaded from "cat1.csv")
>>> cat_merged.from_which('col2')
cat2 (loaded from "cat2.csv")
get_labels(*cols, listalways=False, eval=False)[source]#

Get the labels of columns (if not set by set_labels, the column name will be used).

Parameters:
  • *cols (str) – names of the columns

  • listalways (bool, optional) – If True, always returns list of labels (even if len(list) == 1). The default is False.

  • eval (bool, optional) – If True, column names that do not belong to this data will be considered as expressions that can be evaluated with Data.eval(). The default is False.

Return type:

str or list of str
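The fallback to the column name and the effect of listalways can be sketched as follows. This is a pure-Python illustration, assuming labels are stored as a simple mapping from column name to label; `lookup_labels` is a hypothetical helper and not part of pyttop.

```python
def lookup_labels(labels, *cols, listalways=False):
    """Return the label of each column, falling back to the column name itself."""
    found = [labels.get(col, col) for col in cols]
    if len(found) == 1 and not listalways:
        return found[0]  # a single label is returned as a bare string
    return found


labels = {"col1": "$x_1$"}
single = lookup_labels(labels, "col1")
as_list = lookup_labels(labels, "col1", listalways=True)
mixed = lookup_labels(labels, "col1", "col2")
```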

get_subsets(path=None, name=None, group=None, listalways=False, force=False)[source]#

Retrieve one or more subsets by specifying a group name, subset name(s), or path(s) formatted as '<group_name>/<subset_name>'.

If no arguments are provided, this method returns all subsets organized by group and subset names, accessible as a nested dictionary:

>>> subsets = data.get_subsets()
>>> mysubset = subsets['group_name']['subset_name']

Note that a special subset is created temporarily when it is retrieved (or referred to). Special subsets can only be retrieved using paths (e.g., '$unmasked:<column name>'). Otherwise, a GroupNotFoundError will be raised.

Parameters:
  • path (str or list of str, optional) – The path or a list of paths. If provided, the name and group arguments are ignored. If a Subset object is given, that object itself is returned. The default is None.

  • name (str or list of str, optional) – The names of subsets or a list of names. Defaults to all subsets in the specified group.

  • group (str, optional) – The name of the group. Defaults to the default group.

  • listalways (bool, optional) – If True, always returns a list of subsets (even if the list contains only one subset). The default is False.

  • force (bool, optional) – Relevant only when path (or name) is a Subset object. If this Subset object is not a subset of this data, an exception will be raised. Setting force to True will bypass this exception. Default is False.

Returns:

The specified subset or list of subsets.

Return type:

pyttop.table.Subset or list of pyttop.table.Subset

Notes

A special subset is a virtual subset that does not actually exist. It is used to create a (new) subset as if retrieving an existing subset from the data. These virtual subsets are only created when get_subsets() is called and are not added to the data. To store a virtual subset as a “normal” subset in the pyttop.table.Data instance, use the following:

data.add_subsets(
    data.get_subsets('<path to the special subset>'),
    )
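The '<group_name>/<subset_name>' path convention maps naturally onto the nested dictionary returned when no arguments are given. The sketch below resolves such a path against a plain nested dict. It is only an illustration: `resolve_path` is a hypothetical helper, the subset objects are replaced by strings, and the use of a group called 'default' for paths without a slash is an assumption based on the description of paths elsewhere on this page.

```python
def resolve_path(subsets, path, default_group="default"):
    """Look up '<group_name>/<subset_name>' in a nested {group: {name: subset}} dict."""
    group, sep, name = path.partition("/")
    if not sep:
        # No slash: treat the whole path as a subset name in the default group.
        group, name = default_group, path
    return subsets[group][name]


subsets = {
    "default": {"bright": "<subset bright>"},
    "redshift": {"low_z": "<subset low_z>", "high_z": "<subset high_z>"},
}
by_path = resolve_path(subsets, "redshift/low_z")
by_name = resolve_path(subsets, "bright")
```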
gs(path=None, name=None, group=None, listalways=False, force=False)#

Alias of get_subsets()

gtsub(path=None, name=None, group=None, listalways=False, force=False)#

Alias of get_subsets()

classmethod load(path, format='data', **kwargs)[source]#

Load a data file saved with Data.save() (usually with “.data” or “.pkl” format).

Note: You may also read a raw table file like '*.csv', but it is suggested to use Data('your_catalog.csv') instead of Data.load('your_catalog.csv', format='ascii.csv').

Parameters:
  • path (str) – Path to the file.

  • format (str, optional) – The format of the file (see Data.save()). The default is ‘data’.

  • **kwargs – other arguments passed when initializing Data [Only used when format is neither ‘data’ nor ‘pkl’.]

Returns:

data

Return type:

pyttop.table.Data

mask_missing(cols=None, missval=None, verbose=True)[source]#

Mask missing values represented by missval (e.g. -999) for columns cols.

For example, data.mask_missing(cols='col', missval=-999) masks all -999 values in column “col”, indicating that they are missing.

If verbose, the information for the process will be printed. Note that the printed information indicates the number of elements masked in this process, rather than the total number of masked elements in the columns. To get the number of unmasked elements in a column, try:

print(data.get_subsets('$unmasked/<column_name>'))
Parameters:
  • cols (str or list of str, optional) – Name(s) of the columns to be masked. The default is all columns.

  • missval (optional) – The value regarded as a missing value. The default is NaN.

  • verbose (bool, optional) – If verbose, the information for missing values will be printed. The default is True.
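The distinction between "elements masked in this process" and "total masked elements" can be seen in a small stand-in. The sketch below is pure Python and does not use pyttop: masked entries are represented by None, and `mask_sentinel` is a hypothetical helper that mirrors the documented defaults (NaN when missval is not given).

```python
import math


def mask_sentinel(values, missval=None):
    """Replace sentinel values by None and count how many were newly masked."""

    def is_missing(value):
        if value is None:  # already masked earlier: not counted again
            return False
        if missval is None:  # documented default: NaN marks a missing value
            return isinstance(value, float) and math.isnan(value)
        return value == missval

    masked = [None if is_missing(v) else v for v in values]
    newly_masked = sum(is_missing(v) for v in values)
    return masked, newly_masked


column = [1.0, -999, None, 3.0]
masked, newly_masked = mask_sentinel(column, missval=-999)
total_masked = sum(v is None for v in masked)
```

In this example one element is masked by the call, while two elements are masked in total, which is the difference the verbose output note refers to.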

match(data1, matcher, verbose=True, replace=False)[source]#

Match this data object with another pyttop.table.Data object data1.

Parameters:
  • data1 (pyttop.table.Data) – Data to be matched to this Data.

  • matcher (any recognized matcher object) –

    A matcher object used to match the two data objects. Built-in matchers include, e.g., ExactMatcher and SkyMatcher.

    A matcher object should be defined like below:

    class MyMatcher():
        def __init__(self, args): # 'args' means any number of arguments that you need
            # initialize it with args you need
            pass
    
        def get_values(self, data, data1, verbose=True): # data1 is matched to data
            # prepare the data that is needed to do the matching (if necessary)
            pass
    
        def match(self):
            # do the matching process and calculate:
            # idx : array of shape (len(data), ).
            #     the index of a record in data1 that best matches the records in data
            # matched : boolean array of shape (len(data), ).
            #     whether the records in data can be matched to those in data1.
            return idx, matched
    

  • verbose (bool, optional) – Whether to output matching information. The default is True.

  • replace (bool, optional) – When data1 (Data to be matched) has the same name as a Data object that has already been matched to this Data, whether to replace the old matching. If False, a ValueError is raised. The default is False.

Raises:

ValueError

  • A Data object with the same name has already been matched to this Data (and replace is False).
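A concrete instance of the matcher interface outlined above might look like the following. This is a sketch under assumptions rather than a built-in matcher: `KeyEqualityMatcher` is a hypothetical class, the inputs are only assumed to support `data[<column name>]`, plain lists are returned instead of arrays to keep the example dependency-free, and -1 is used as an arbitrary placeholder index for rows without a match.

```python
class KeyEqualityMatcher:
    """Hypothetical matcher: pairs rows whose key values are exactly equal."""

    def __init__(self, key, key1=None):
        self.key = key                             # column name in data
        self.key1 = key if key1 is None else key1  # column name in data1

    def get_values(self, data, data1, verbose=True):
        # Prepare the values needed for matching (data1 is matched to data).
        self.values = list(data[self.key])
        self.values1 = list(data1[self.key1])

    def match(self):
        # Map each value in data1 to the index of its first occurrence.
        first_index = {}
        for i, value in enumerate(self.values1):
            first_index.setdefault(value, i)
        # idx[i]: index in data1 of the match for row i of data (-1 if none).
        idx = [first_index.get(value, -1) for value in self.values]
        # matched[i]: whether row i of data has a match in data1.
        matched = [i >= 0 for i in idx]
        return idx, matched


# Dicts of lists stand in for the two data objects.
matcher = KeyEqualityMatcher("id")
matcher.get_values({"id": [10, 20, 30]}, {"id": [30, 10, 40]})
idx, matched = matcher.match()
```

Both returned sequences have one entry per row of the first data object, as the interface requires.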

match_merge(data1, matcher, keep_unmatched=[], merge_columns={}, ignore_columns={}, outname=None, verbose=True)[source]#

Match this data with data1 and immediately merge everything that can be matched to this data. See match() and merge() for more information.

match_tree(depth=-1, detail=True)[source]#

Print a “match tree”, showing all data that can be matched and merged to this data.

The data that are directly matched to this data are called “children” of this data, and are on “depth 1”. The data directly matched to data on “depth 1” are on “depth 2”, etc.

Parameters:
  • depth (int, optional) – The depth. For example, if depth == 1, only the direct children (without grandchildren) of this data are shown. If depth == -1, all children (including all grandchildren) are shown. The default is -1.

  • detail (bool, optional) – Whether to show detail (including how the data are matched). The default is True.
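The meaning of depth can be illustrated by walking a small tree of matched datasets. The sketch below is pure Python and makes no claim about how pyttop stores match information: the tree is a plain dict mapping a dataset name to the names matched to it, and `descendants` is a hypothetical helper.

```python
def descendants(tree, name, depth=-1, level=1):
    """List (depth, dataset name) pairs below `name`; depth=-1 means no limit."""
    if depth != -1 and level > depth:
        return []
    found = []
    for child in tree.get(name, []):
        found.append((level, child))
        found.extend(descendants(tree, child, depth, level + 1))
    return found


# B and C are matched to A (depth 1); D is matched to B (depth 2).
tree = {"A": ["B", "C"], "B": ["D"]}
everything = descendants(tree, "A")
children_only = descendants(tree, "A", depth=1)
```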

merge(depth=-1, keep_unmatched=[], merge_columns={}, ignore_columns={}, innames={}, outname=None, keep_subsets=False, matchinfo_subset=False, verbose=True)[source]#

Merge all data objects that are matched to this data.

The data that are directly matched to this data are called “children” of this data, and are on “depth 1”. The data directly matched to data on “depth 1” are on “depth 2”, etc.

Parameters:
  • depth (int, optional) – The depth of merging. For example, if depth == 1, only the direct children (without grandchildren) of this data are merged. If depth == -1, all children (including all grandchildren) are merged. The default is -1.

  • keep_unmatched (Iterable or True, optional) – A list of names of pyttop.table.Data objects (you can check the names with e.g. data.name). A record (row) of THIS data is kept even if a dataset in the above list cannot be matched to this data. To set keep_unmatched for all data, pass keep_unmatched=True. The default is [] (which means that only those that can be matched to each child data of this data are kept).

  • merge_columns (dict, optional) –

    A dict that specifies fields (columns) to be merged. For example, if data1 with name ‘Data_1’ is matched to this object, and you want to merge only ‘column1’, ‘column2’ in data1 into the merged catalog, use:

    {'Data_1': ['column1', 'column2']}
    

    If, e.g., merge_columns for data2 (with name ‘Data_2’) is not specified, every field (column) of data2 will be merged. The list can also include regular expressions, e.g.:

    # requires `import re`
    {'Data_1': ['column1', re.compile('class.*')]}
    

    The default is {}.

  • ignore_columns (dict, optional) – A dict that specifies fields (columns) not to be merged. Similar to argument merge_columns. If both merge_columns and ignore_columns are specified for a field, the columns IN merge_columns AND NOT IN ignore_columns are merged. The default is {}.

  • innames (dict, optional) – A dict in the form of {data_name: rename_name}. This is used to generate unique output column names in case of conflicts (i.e., same column names in different Data objects). By default, columns are renamed as ‘{column_name}_{data_name}’. If a data_name is included in innames, the corresponding ‘{column_name}_{rename_name}’ will be used instead. This can be used to avoid long column names. If subsets are kept (keep_subsets=True), conflicts in subset or group names will be handled in a similar manner. The default is {}.

  • outname (str, optional) – The name of the merged data. If not given, this will be automatically generated from the names of data that are merged. The default is None.

  • keep_subsets (bool, optional) – Whether the subsets of the data are kept and merged. The default is False.

  • matchinfo_subset (bool, optional) – If keep_unmatched != [], whether to add a subset ‘matched/<this_data_name>/<name_of_data_matched_to_this_data>’, indicating whether each row can be matched to that data. The default is False.

  • verbose (bool, optional) – Whether to show more information on merging. The default is True.

Returns:

matched_data – A pyttop.table.Data object containing the merged catalog.

Return type:

pyttop.table.Data
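The rule that the merged columns are those in merge_columns and not in ignore_columns, with regular expressions allowed in the lists, can be sketched as below. This is an illustration only: `columns_to_merge` is a hypothetical helper, and the assumption that a compiled pattern must match the whole column name (fullmatch) is not confirmed by this page.

```python
import re


def _matches(column, patterns):
    """True if `column` equals a string or fully matches a compiled pattern."""
    return any(
        p.fullmatch(column) is not None if isinstance(p, re.Pattern) else p == column
        for p in patterns
    )


def columns_to_merge(available, merge=None, ignore=()):
    """Select columns listed in `merge` (all if None) and not listed in `ignore`."""
    selected = [c for c in available if merge is None or _matches(c, merge)]
    return [c for c in selected if not _matches(c, ignore)]


available = ["column1", "column2", "class_a", "class_b"]
chosen = columns_to_merge(
    available,
    merge=["column1", re.compile("class.*")],
    ignore=["class_b"],
)
unrestricted = columns_to_merge(available)
```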

Notes

Suppose keep_unmatched is not empty, say keep_unmatched=['data1']. Then the rows in THIS data that have no match in the dataset called ‘data1’ are kept, and the columns from ‘data1’ for these rows are filled with missing values.

‘data1’ may also have its subsets. When keep_subsets is set to True, the subsets of ‘data1’ are also merged. The rows with no match with ‘data1’ always do NOT belong to the subsets merged from ‘data1’.
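The effect of keep_unmatched on the merged rows is similar to the difference between an inner join and a left join. The following pure-Python sketch shows this with rows modelled as dictionaries and missing values as None. It is not pyttop's merging code: `merge_rows` is a hypothetical helper, and `idx` and `matched` follow the shapes described for matcher objects in match().

```python
def merge_rows(left, right, idx, matched, keep_unmatched=False):
    """Combine each row of `left` with its matched row of `right`."""
    merged = []
    for i, row in enumerate(left):
        if matched[i]:
            merged.append({**row, **right[idx[i]]})
        elif keep_unmatched:
            # Keep the row; the columns coming from `right` are missing values.
            missing = {key: None for key in right[0]}
            merged.append({**row, **missing})
    return merged


left = [{"id": 1}, {"id": 2}, {"id": 3}]
right = [{"mag": 20.5}, {"mag": 18.1}]
idx = [1, -1, 0]               # -1 is a placeholder for "no match"
matched = [True, False, True]

only_matched = merge_rows(left, right, idx, matched)
with_unmatched = merge_rows(left, right, idx, matched, keep_unmatched=True)
```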

merge_matchinfo(depth=-1)[source]#

Merge the matchinfo for all of the children data of this data, so that each info is the match with respect to this data. If there are duplicates in the child data, only the first found is used.

Parameters:

depth (int, optional) – The depth of merging. For example, if depth == 1, only the direct children (without grandchildren) of this data are merged. If depth == -1, all children (including all grandchildren) are merged. The default is -1.

Returns:

outinfo – .

Return type:

list of objdicts

metaJson(save_path=None, yes=False)[source]#

Generate a json string for the metadata of this Data.

The metadata of a pyttop.table.Data object typically saves the information on how it was initialized, how it was merged (if it is a merged catalog), etc. It can be retrieved with data.meta. This is saved as the metadata of data.t, i.e. data.meta is data.t.meta.

Parameters:
  • save_path (str, optional) – A path to save the json as a file. The default is None (do not save).

  • yes (bool, optional) – If set to True, existing files will be overwritten without prompts. The default is False.

Returns:

meta – A json string.

Return type:

str

mm(cols=None, missval=None, verbose=True)#

Alias of mask_missing()

mskmis(cols=None, missval=None, verbose=True)#

Alias of mask_missing()

plot(func, *args, col_input=None, cols=None, kwcols={}, eval=False, eval_kwargs={}, paths=None, subsets=None, groups=None, autolabel=True, ax=None, verbose=True, global_selection=None, title=None, iter_kwargs={}, **kwargs)[source]#

Make a plot given a plot function.

Arguments paths, subsets, groups are used to specify the subsets of data that are plotted in the same subplot.

Parameters:
  • func (str or Callable) – Function to make plots, e.g. plt.plot, or name of the function, e.g. 'plot'.

  • *args – Arguments to be passed to func.

  • cols (str or list of str, optional) –

    The name of the columns to be passed to func. For example, if cols = ['col1', 'col2'], func will be called by:

    func(data['col1'], data['col2'], *args)
    

    Note: When autolabel is True, the len of this argument is used to guess the dimension of the plot (e.g. 2D/3D). The default is None.

  • kwcols (dict, optional) –

    Names of data columns that are passed to func as keyword arguments. For example, if kwcols={'x': 'col1', 'y':'col2'}, func will be called by:

    func(x=data['col1'], y=data['col2'])
    

  • eval (bool, optional) – If set to True, the names of data columns for cols and kwcols will be regarded as expressions to be evaluated with Data.eval(). This means that you can not only input column names, but also input expressions. See eval() for the syntax of expressions. Otherwise, the names will simply be considered as column names. The default is False.

  • eval_kwargs (dict, optional) – Keyword arguments to be passed to Data.eval() when evaluating cols and kwcols. Ignored if argument eval set to False. The default is {}.

  • paths (str or list of str, optional) – The full path of a subset (e.g. '<group_name>/<subset_name>') or a list of paths. If this is given, arguments subsets and groups are ignored. The default is None.

  • subsets (str or list of str, optional) – The names of subsets, or a list of names. The default is all subsets in the specified group.

  • groups (str, optional) – The name of the group. The default is the default group.

  • autolabel (bool, optional) –

    If True, will try to automatically add labels to the plot (made by func) as well as the axes, using the labels stored in Data and Subset objects.

    NOTE: The axis labels are set automatically according to the argument columns and may not give the results you expect. Labels for axes and legends are only possible if the argument ax is given.

    The default is True.

  • verbose (bool, optional) – Whether some detailed information is printed. The default is True.

  • ax (axes, optional) – The axis to make the plot. The default is None.

  • global_selection (pyttop.table.Subset or str or list of str, optional) –

    The global selection [or the path(s) of the selection(s)] for this plot. If not None, only data selected by this argument is plotted. Accepted input:

    • A pyttop.table.Subset object. Note that logical operations of subsets are supported, e.g. subset1 & subset2 | subset3.

    • The path to the subset, i.e. 'groupname/subsetname'. If group name is ‘default’, you can directly use ‘subsetname’.

    • A list/tuple/set of paths to the subsets. The global selection will be the logical AND (i.e. the intersection set) of the subsets.

    The default is None.

  • title (str) – Manually setting the title of the plot. This will overwrite the title automatically generated. The default is None (automatically generated if autolabel is True).

  • iter_kwargs (dict, optional) –

    Lists of keyword arguments that are different for each subset specified. Suppose 3 subsets are specified using the subsets argument; an example value for iter_kwargs is

    {'color': ['b', 'r', 'k'], 'linestyle': ['-', '--', '-.']}
    

    The default is {}.

  • **kwargs – Additional keyword arguments to be passed to func.
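How iter_kwargs relates to the shared keyword arguments can be pictured by expanding the lists into one set of keyword arguments per subset. The sketch below is pure Python and only illustrates the idea: `per_subset_kwargs` is a hypothetical helper, and whether pyttop combines the two kinds of arguments in exactly this way is an assumption.

```python
def per_subset_kwargs(iter_kwargs, n_subsets, **shared):
    """Build one keyword-argument dict per subset from lists of per-subset values."""
    expanded = []
    for i in range(n_subsets):
        kwargs = dict(shared)  # arguments common to every subset
        kwargs.update({key: values[i] for key, values in iter_kwargs.items()})
        expanded.append(kwargs)
    return expanded


iter_kwargs = {"color": ["b", "r", "k"], "linestyle": ["-", "--", "-."]}
calls = per_subset_kwargs(iter_kwargs, 3, alpha=0.5)
```

Each entry of `calls` corresponds to the keyword arguments that would accompany the plot of one subset.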

plots(func, *args, cols=None, kwcols={}, eval=False, eval_kwargs={}, plotpaths=None, plotsubsets=None, plotgroups=None, arraygroups=None, global_selection=None, share_ax=False, autobreak=False, autolabel=True, ax_callback=None, returns='fig', verbose=True, axes=None, fig=None, iter_kwargs={}, **kwargs)[source]#

Make a plot given the function func used for plotting.

If arraygroups is not None, plot an “array” of subplots (panels; subplots with several rows and columns) for different selections given in arraygroups; each of the panels consists of several plots for different selections given in plotgroups. This is useful if one wishes to compare a plot for different subsets of the data. For example, say plotgroups='group1', arraygroups=['group2', 'group3']. Then each panel compares different subsets in 'group1'; different panels compare the results between subsets in 'group2' and 'group3'. Note that the dataset for each plot in each panel is the INTERSECTION of the corresponding subsets in 'group1', 'group2' and 'group3'.

Parameters:
  • func (str or Callable or pyttop.plot.PlotFunction) –

    Name of the matplotlib.pyplot function used to make plots, e.g. 'plot', 'scatter'.

    Also accepts custom functions that receive an axis as the only argument and return a function (called “plotting function” hereafter) to make plots. Example: lambda ax: ax.plot.

    You can also input your custom plot function func defined by:

    from pyttop.plot import plotFunc
    @plotFunc
    def func(<your inputs>):
        <make the plot>
        return # you can return something here
    

    Or:

    from pyttop.plot import plotFuncAx
    @plotFuncAx
    def func(ax): # input ax axis
        def plot(<your inputs>):
            <make the plot>
            return # you can return something here
        return plot
    

  • *args – Arguments to be passed to the plotting function.

  • cols (str or list of str, optional) –

    The name of the columns to be passed to the plotting function. For example, if cols = ['col1', 'col2'], the plotting function will be called by:

    func(data['col1'], data['col2'], *args)
    

    Note: When autolabel is True, the len of this argument is used to guess the dimension of the plot (e.g. 2D/3D). The default is None.

  • kwcols (dict, optional) –

    Names of data columns that are passed to the plotting function as keyword arguments. For example, if kwcols={'x': 'col1', 'y': 'col2'}, the plotting function will be called by:

    func(x=data['col1'], y=data['col2'])
    

  • eval (bool, optional) – If set to True, the names of data columns for cols and kwcols will be regarded as expressions to be evaluated with Data.eval(). This means that you can not only input column names, but also input expressions. See Data.eval() for the syntax of expressions. Otherwise, the names will simply be considered as column names. The default is False.

  • eval_kwargs (dict, optional) – Keyword arguments to be passed to Data.eval() when evaluating cols and kwcols. Ignored if argument eval set to False. The default is {}.

  • paths, subsets, groups – Aliases of plotpaths, plotsubsets and plotgroups, respectively.

  • plotpaths (str or list of str, optional) – The full path of a subset (e.g. '<group_name>/<subset_name>') or a list of paths, for plots in each subplot. If this is given, arguments plotsubsets and plotgroups are ignored. The default is None.

  • plotsubsets (str or list of str, optional) – The names of subsets, or a list of names, for plots in each subplot. The default is all subsets in the specified group.

  • plotgroups (str, optional) – The name of the subset group used to make different plots in each one of the panels. For example, when the plotting function plots curves and plotgroups consists of 3 subsets, 3 curves for the 3 subsets are plotted in each of the panels. The default is None.

  • arraygroups (str or iterable of len <= 2, optional) –

    The name of subset groups used to make different panels. Examples:

    • arraygroups = ['group1'], where ‘group1’ consists of 3 subsets. Then subplots with nrow=1, ncol=3 (1x3) are generated.

    • arraygroups = ['group1', 'group2'], where ‘group1’ and ‘group2’ consist of 3 and 4 subsets respectively. Then subplots with nrow=3, ncol=4 (3x4) are generated.

    The default is None.

  • global_selection (pyttop.table.Subset or str or list of str, optional) –

    Only consider data in subset global_selection. Accepted input:

    • A pyttop.table.Subset object. Note that logical operations of subsets are supported, e.g. subset1 & subset2 | subset3.

    • The path to the subset, i.e. 'groupname/subsetname'. If group name is ‘default’, you can directly use ‘subsetname’.

    • A list/tuple/set of paths to the subsets. The global selection will be the logical AND (i.e. the intersection set) of the subsets.

    The default is None (the whole dataset is considered).

  • share_ax (bool, optional) – Whether the x, y axes are shared. The default is False.

  • autobreak (bool, optional) – When arraygroups consists of only one group, whether to automatically break the row into several rows (since the default result is a group of subplots with only one row). The default is False.

  • autolabel (bool, optional) –

    If True, will try to automatically add labels to the plot (made by func) as well as the axes, using the labels stored in Data and Subset objects.

    NOTE: The axis labels are set automatically according to the argument columns and may not give the results you expect. Labels for axes and legends are only possible if the argument ax is given.

    The default is True.

  • ax_callback (function, optional) – The function to be called as ax_callback(ax) after plotting in each panel, where ax is the axis object of this panel.

  • returns (str, optional) –

    Decide what to return.

    • 'fig' or 'fig, axes':

      return figure and axes.

    • 'plot' or 'return':

      return a list of the returned values of the plot function.

    Whatever this argument is, you can always retrieve the figure, axes and the returned values (of the plot function) of the last call of data.plot() with data.plot_fig, data.plot_axes, data.plot_returns.

  • verbose (bool, optional) – Whether some detailed information is printed. The default is True.

  • ax – alias of “axes”.

  • axes (list of axes, optional) – The axes of the subplots. The default is None.

  • fig (matplotlib.figure.Figure, optional) – The figure on which the subplots are made. The default is None.

  • iter_kwargs (dict, optional) –

    Lists of keyword arguments that are different for each subset in plotgroups. Suppose plotgroups='group1' consists of 3 subsets; an example value for iter_kwargs is

    {'color': ['b', 'r', 'k'], 'linestyle': ['-', '--', '-.']}
    

    The default is {}.

  • **kwargs – Additional keyword arguments to be passed to the plotting function.

Returns:

  • fig (matplotlib.figure.Figure)

  • axes (matplotlib.axes.Axes or array of matplotlib.axes.Axes)
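The panel layout and the intersection rule described for arraygroups and plotgroups can be worked through with plain sets of row indices. The sketch below is a pure-Python illustration, not pyttop code: subsets are modelled as sets, and the group and subset names are hypothetical.

```python
from itertools import product

# Each group maps subset names to the set of selected row indices.
group1 = {"bright": {0, 1, 2}, "faint": {3, 4, 5}}    # plotgroups
group2 = {"low_z": {0, 1, 3}, "high_z": {2, 4, 5}}    # first array group (rows)
group3 = {"star": {0, 3, 4}, "galaxy": {1, 2, 5}}     # second array group (columns)

# One panel per combination of a subset in group2 and a subset in group3.
shape = (len(group2), len(group3))

panels = {}
for (name2, rows2), (name3, rows3) in product(group2.items(), group3.items()):
    # Within a panel, each plotted dataset is the intersection of three subsets.
    panels[(name2, name3)] = {
        name1: rows1 & rows2 & rows3 for name1, rows1 in group1.items()
    }
```

With two subsets in each array group, this gives a 2x2 array of panels, and each panel holds one dataset per subset of the plot group.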

reset_match()[source]#

Remove all match information.

save(path, format='data', overwrite=False)[source]#

Save data to file.

Parameters:
  • path (str) – Path to the file.

  • format (str, optional) –

    The format of the file. The default is ‘data’. Supported formats include:

    • ’pkl’:

      Saving the full data object to a "*.pkl" file.

    • ’data’ (default):

      Saving key data (including the data table, the subsets, etc.) to a "*.data" file. Note that the matching data is not saved.

    • Other formats: Any format supported by astropy.table.Table.write.

      Only saving the data table (astropy.table.Table). This is equivalent to data.t.write(<...>).

  • overwrite (bool, optional) – Whether to overwrite the file if it exists. If set to False, a FileExistsError will be raised. The default is False.

Raises:

FileExistsError – The file already exists.

Notes

Notes for developers

When setting format='pkl', a Data object will be saved with the standard pickle module. This means that all data for the object is converted and saved as a byte stream. When setting format='data', only a selected subset of attributes will be saved separately, and they are not necessarily saved with Python’s standard pickling protocols. This makes it possible to retrieve some data from the '*.data' file even without e.g. Python’s pickle module.

set_labels(**kwargs)[source]#

label(<column_name>=<label>)

Add/update the labels used for, e.g., the labels on the axes of the plots.

Example: if col1='$x_1$', the data in data.t['col1'] will be labeled as ‘$x_1$’ on the plots.

Parameters:

**kwargs (<column_name:str>=<label:str>)

sort(keys, *, keep_subsets=False, kind=None, reverse=False)[source]#

Returns a new Data instance with the table sorted according to one or more keys (columns).

Unlike the sort() method of astropy.table.Table (i.e., data.t.sort()), this method does not perform an in-place sort.

Parameters:
  • keys (str or list of str) – The column name(s) to order the table by.

  • keep_subsets (bool, optional) – If True, the subsets will be preserved. The default is False.

  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – Sorting algorithm used by numpy.argsort.

  • reverse (bool, optional) – If True, sort in reverse order. The default is False.

Returns:

A Data instance with the table sorted.

Return type:

pyttop.table.Data
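
The difference between returning a sorted copy and sorting in place can be sketched in plain Python. The helper `sorted_copy` and the list-of-dicts table are hypothetical stand-ins, not part of pyttop.

```python
rows = [
    {"id": 3, "flux": 0.5},
    {"id": 1, "flux": 2.0},
    {"id": 2, "flux": 1.2},
]


def sorted_copy(rows, keys, reverse=False):
    """Return a new list ordered by one or more keys; the input is untouched."""
    if isinstance(keys, str):
        keys = [keys]
    return sorted(rows, key=lambda r: tuple(r[k] for k in keys), reverse=reverse)


by_id = sorted_copy(rows, "id")
by_flux_desc = sorted_copy(rows, ["flux"], reverse=True)
```

After these calls `rows` still has its original order, which mirrors the behavior described above for sort().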

ss(group=None)#

Alias of subset_summary()

subdat(path=None, name=None, group=None, expr=None, verbose=True, **kwargs)#

Alias of subset_data()

subplot_array(func, *args, cols=None, kwcols={}, eval=False, eval_kwargs={}, plotpaths=None, plotsubsets=None, plotgroups=None, arraygroups=None, global_selection=None, share_ax=False, autobreak=False, autolabel=True, ax_callback=None, returns='fig', verbose=True, axes=None, fig=None, iter_kwargs={}, **kwargs)[source]#

Deprecated name of plots()

subset_data(path=None, name=None, group=None, expr=None, verbose=True, **kwargs)[source]#

Get a subset (or several subsets) of data by specifying the subset(s) using the name(s) of subset group(s), subset(s), or the full path(s) (i.e. '<group_name>/<subset_name>'). This is different from the get_subsets method, which returns the Subset objects.

You may also pass a Subset object or a list of Subset objects to the path parameter, to directly get the data.

For convenience, you can also directly specify an expression:

data.subset_data(expr = 'col1 == 1')

Which is equivalent to:

data.subset_data(data.add_subsets(Subset(expr), group='temp'))

This is similar to:

data.t[data.t['col1'] == 1]

but supports expressions and returns a Data.

Parameters:
  • path (Subset OR list of Subset OR str OR list of str, optional) – A Subset object or a list of Subset objects, or the path or a list of paths. If this is given, arguments name and group are ignored. The default is None.

  • name (str or list of str, optional) – The name of a subset, or a list of names. The default is all subsets in the specified group.

  • group (str, optional) – The name of the group. The default is the default group.

  • expr (str, optional) – An expression that can be evaluated with Data.eval() (e.g. col1 == 1). The default is None.

  • verbose (bool, optional) – Whether to print more information. The default is True.

  • kwargs – Arguments passed to Subset() or Data.eval().

Returns:

The subset of data or list of subsets of data specified.

Return type:

pyttop.table.Data or list of pyttop.table.Data

Examples

subset_group_from_ranges(column, ranges, group_name=None, overwrite=False)[source]#

Create a subset group by setting several ranges of values of a column.

For example, data.subset_group_from_ranges(column='col1', ranges=[[0, 1], [1, 2]]) defines a subset group named 'col1', which includes 2 subsets, 0 < col1 < 1 and 1 < col1 < 2.

Parameters:
  • column (str) – The name of the column.

  • ranges (list of lists (or similar objects)) – List of ranges.

  • group_name (str, optional) – The name of the created subset group. The default is the name of the column.

  • overwrite (bool, optional) – When a group with group_name already exists, whether to overwrite the group. The default is False.

Returns:

A list of the created subsets.

Return type:

list

Raises:

ValueError – A group with group_name already exists, and overwrite is set to False.
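
As a rough sketch of what such a group contains, the ranges can be turned into one boolean mask per range. The helper `masks_from_ranges` and the naming of the subsets as 'low-high' are assumptions for illustration; the open intervals follow the example above.

```python
def masks_from_ranges(values, ranges):
    """Build one boolean mask per (low, high) range, using open intervals."""
    return {
        f"{low}-{high}": [low < v < high for v in values]
        for low, high in ranges
    }


col1 = [0.2, 0.9, 1.0, 1.5, 2.4]
group = masks_from_ranges(col1, [[0, 1], [1, 2]])
```

Note that with open intervals the value 1.0 falls into neither subset.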

subset_group_from_values(column, group_name=None, overwrite=False)[source]#

Create a subset group by the unique values of a column.

For example, if a column named “class” has 3 possible values, “A”, “B” and “C”, a subset group will be defined with 3 subsets for class=A, B, C, respectively.

Parameters:
  • column (str) – The name of the column.

  • group_name (str, optional) – The name of the created subset group. The default is the name of the column.

  • overwrite (bool, optional) – When a group with group_name already exists, whether to overwrite the group. The default is False.

Raises:

ValueError – A group with group_name already exists, and overwrite is set to False.
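
The idea of one subset per unique value can be sketched as follows. The helper `masks_from_values` is hypothetical, and ordering the subsets by first appearance is an assumption, not documented pyttop behavior.

```python
def masks_from_values(values):
    """Build one boolean mask per unique value, in order of first appearance."""
    unique = list(dict.fromkeys(values))
    return {u: [v == u for v in values] for u in unique}


classes = ["A", "B", "A", "C", "B"]
group = masks_from_values(classes)
```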

subset_summary(group=None)[source]#

Get a summary table for the subsets and subset groups.

The table consists of the following columns:

  • group: name of the subset group

  • name: name of the subset

  • size: size of the subset

  • fraction: fraction of the subset size to the total number of rows

  • expression: expression/source code that specifies the selection of the subset

  • label: label of the subset used for plotting

Parameters:

group (str or list of str, optional) – The name (or list of names) of the subset group(s) to be shown in the table. If not given, all groups will be shown by default.

Returns:

summary

Return type:

astropy.table.Table
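
How the size and fraction columns relate to the selections can be sketched with plain boolean lists. The helper `summarize` and the nested-dict layout are assumptions for illustration; the real method returns an astropy.table.Table with the additional columns listed above.

```python
def summarize(groups, n_rows):
    """Collect group, name, size and fraction for every subset."""
    rows = []
    for group_name, subsets in groups.items():
        for subset_name, mask in subsets.items():
            size = sum(mask)  # True counts as 1
            rows.append({
                "group": group_name,
                "name": subset_name,
                "size": size,
                "fraction": size / n_rows,
            })
    return rows


groups = {"class": {"A": [True, False, True, False],
                    "B": [False, True, False, False]}}
summary = summarize(groups, n_rows=4)
```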

subsets()[source]#

Retrieve subsets organized by group and subset names, accessible like a nested dictionary.

Example

>>> subsets = data.subsets()
>>> mysubset = subsets['group_name']['subset_name']

subsum(group=None)#

Alias of subset_summary()

tree(depth=-1, detail=True)#

Alias of match_tree()

unmatch(data1, verbose=True)[source]#

Remove the match of data1.

Parameters:
  • data1 (pyttop.table.Data or str) – The Data or the name of the Data.

  • verbose (bool, optional) – Whether to output information. The default is True.

Return type:

None.

class Subset(selection, name=None, expression=None, label=None, **kwargs)[source]#

A class to specify a row subset of a pyttop.table.Data object. Although this class is independent of the Data class, it should only be used together with a Data object.

The common way to specify the selection criteria, name, etc. of a subset is:

Subset(<selection>, name=<name>, <...>)

Convenient methods for specifying a subset are:

Subset.by_range(<column name>=<value range>, <...>)
Subset.by_value(<column name>, <value>)

See by_range() and by_value() for more information.

In practice, a subset of data is usually defined using the add_subsets() method:

subset = data.add_subsets(Subset(<...>))

You may also define multiple subsets at a time:

subset1, subset2 = data.add_subsets(
    Subset(<...>),
    Subset(<...>))

Subset objects can be used as if they are arrays (for most cases). For example, you can get the intersection set subset1 & subset2, the union set subset1 | subset2, and the complementary set ~subset1.

Note that the name (auto-generated if not given) is used as the address of a subset within a Data object. If you add a subset to a subset group in which the name is already used by another subset, the original subset is replaced and is no longer recognized as part of that data:

s1 = data.add_subsets(Subset(<...>, name='subset'))
s2 = data.add_subsets(Subset(<...>, name='subset')) # this replaces the original subset at 'default/subset'
s1 in data, s2 in data # (False, True)

Parameters:
  • selection (callable (e.g. function), iterable (e.g. array-like) or string) –

    If it is iterable, it should be a boolean array indicating whether each row is included in this subset. It should have a shape of (len(data),) where data is a pyttop.table.Data instance.

    If it is callable, it should be defined like below:

    def selection(table): # input: astropy.table.Table object
        <...>
        return arr # boolean array
                   # whether each row is included in subset
    

    If it is a string, it should be an expression that can be evaluated by Data.eval, e.g. '(column1 > 0) & (column2 < 1)'. Refer to eval() for details.

  • name (str, optional) – The name of the subset. The default is None.

  • expression (str, optional) – The expression [e.g. ‘(col1 > 0) & (col2 == “A”)’] used to recognize the conditions. The default is None.

  • label (str, optional) – The label used in figures. The default is None.

  • kwargs – Arguments passed to Data.eval() if selection is evaluated as an expression.

Notes

The parameters selection, name, expression, and label will become attributes of the Subset object. By defining a subset using data.add_subsets(Subset(<...>)), they are evaluated given data:

  • The attribute selection will be converted to a boolean array;

  • The attribute name will be set to the default name if it is None;

  • The attribute expression will be automatically set if it is None;

  • The attribute label will be set to name if it is None; strings will be replaced according to the mapping of dict data.col_labels.

If the input selection is/results in a masked (boolean) array, the masked elements are filled with False (which means that they are NOT included in this subset by definition). This often happens when selection is calculated from a masked column of the table. The final selection after evaluation is never a masked array.
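
The fill-with-False rule can be sketched without pyttop or NumPy. In the sketch below `None` stands in for a masked entry, and the helpers `less_than_zero`, `negate` and `fill_false` are hypothetical; they only mimic how a mask propagates through an expression before the final fill.

```python
MASKED = None  # stand-in for a masked table entry


def less_than_zero(x):
    """Comparison that propagates the mask instead of returning a bool."""
    return MASKED if x is MASKED else x < 0


def negate(b):
    """Logical NOT that also propagates the mask."""
    return MASKED if b is MASKED else not b


def fill_false(mask):
    """Final step: masked entries are treated as not selected."""
    return [False if b is MASKED else b for b in mask]


x = [-1, 1, MASKED]

sel_neg = fill_false([less_than_zero(v) for v in x])              # like 'x < 0'
sel_not_neg = fill_false([negate(less_than_zero(v)) for v in x])  # like '~(x < 0)'
complement = [not b for b in sel_neg]                             # like ~subset
```

The masked row is excluded from both `sel_neg` and `sel_not_neg`, but it is included in `complement`, because the complement is taken after the fill.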

Caveat. Subsets constructed with expr and ~expr are NOT necessarily complements of each other! See the example below:

>>> from pyttop.table import Data, Subset
>>> d = Data(name='test')
>>> d['x'] = [-1, 1, -99]
>>> d.mask_missing(missval=-99)
[mask missing] col 'x': 1/3 (33.33%) masked (value: -99).
>>> s1 = d.add_subsets(Subset('x < 0'))
>>> s2 = d.add_subsets(Subset('~(x < 0)'))
>>> s21 = d.add_subsets(Subset('x >= 0'))
>>> s1, s1.selection
(<Subset 'x < 0' of Data 'test' (1/3)>, array([ True, False, False]))
>>> s2, s2.selection
(<Subset '~(x < 0)' of Data 'test' (1/3)>, array([False,  True, False]))
>>> s21, s21.selection
(<Subset 'x >= 0' of Data 'test' (1/3)>, array([False,  True, False]))
>>> (~s1), (~s1).selection
(<Subset 'NOT(x < 0)' of Data 'test' (2/3)>, array([False,  True,  True]))
>>> (~s2), (~s2).selection
(<Subset 'NOT(~(x < 0))' of Data 'test' (2/3)>, array([ True, False,  True]))
>>> (~s21), (~s21).selection
(<Subset 'NOT(x >= 0)' of Data 'test' (2/3)>, array([ True, False,  True]))

classmethod by_range(**ranges)[source]#

Initializes a subset by specifying ranges for the data.

For example, Subset.by_range(col1=[0, 1], col2=[0, np.inf]) defines a subset with (0 < col1 < 1) & (col2 > 0).

Parameters:

**ranges –

Key-value pairs:

  • key (str) – Name of the column in the data.

  • value (list or tuple (or other similar objects) with length=2) – List of 2 numbers, e.g. [0, 1], specifying a range of the column.

Return type:

pyttop.table.Subset
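
A rough sketch of the selection that by_range() describes, combining several column ranges with a logical AND. The helper `select_by_range` and the dict-of-lists table are assumptions for illustration; the strict inequalities follow the example above.

```python
import math


def select_by_range(table, **ranges):
    """Return a mask that is True where every column lies strictly inside its range."""
    n = len(next(iter(table.values())))
    return [
        all(low < table[col][i] < high for col, (low, high) in ranges.items())
        for i in range(n)
    ]


table = {"col1": [0.5, 0.5, 2.0], "col2": [3.0, -1.0, 3.0]}
mask = select_by_range(table, col1=[0, 1], col2=[0, math.inf])
```

Only the first row satisfies both conditions: the second fails on col2 and the third on col1.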

classmethod by_value(column, value)[source]#

Initializes a subset by specifying the exact value of column.

Parameters:
  • column (str) – Name of the data column.

  • value – Value of the column.

Return type:

pyttop.table.Subset

eqs(subset)[source]#

Checks if selections of two subsets are the same. For example:

if subset1.eqs(subset2):
    print('same')

Parameters:

subset (pyttop.table.Subset)

Return type:

bool

eval_(data, existing_keys=())[source]#

Evaluate the selection array, expression, name and label, given data. This method should be executed if self.selection is not a boolean array, or if either self.name or self.expression is None.

Note that if a subset is added to data using data.add_subsets(Subset(<...>)), this method has already been called and does not need to be called again.

Parameters:
  • data (pyttop.table.Data)

  • existing_keys (Iterable, optional) – Names of subsets that already exist. This is used to automatically generate subset names. The default is ().

pyttop.matcher#

Created on Sat Jul 30 2022

@author: Yuchen Wang

Built-in matchers.

exception DuplicationWarning[source]#

class ExactMatcher(value, value1=None)[source]#

Used to match pyttop.table.Data objects data1 to data. Match records with exact values. This should be passed to method data.match(). See help(data.match).

Parameters:
  • value (str or Iterable) –

    Specify values for data used to match catalogs. Possible inputs are:

    • str, name of the field used for matching.

    • Iterable, values for data. len(value) should be equal to len(data).

  • value1 (str or Iterable, optional) –

    Specify values for data1 used to match catalogs. Possible inputs are:

    • str, name of the field used for matching.

    • Iterable, values for data1. len(value1) should be equal to len(data1).

    If not given and value is a string, value1 is set to the same as value.
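
The idea of matching by exact values can be sketched with a dictionary lookup. The helper `exact_match` is hypothetical, and keeping the first of several equal entries in `values1` is an assumption; how pyttop treats such duplicates is not specified in this section.

```python
def exact_match(values, values1):
    """For each entry of values, return the index of the first equal entry in values1, or None."""
    first_index = {}
    for j, v in enumerate(values1):
        first_index.setdefault(v, j)  # keep the first occurrence only
    return [first_index.get(v) for v in values]


ids = [10, 11, 12]
ids1 = [12, 10, 10, 99]
idx = exact_match(ids, ids1)
```

Here the value 11 has no counterpart, so its entry in `idx` is None.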

class IdentityMatcher[source]#

Used to match pyttop.table.Data objects data1 to data. Directly match records row by row, i.e. row #1 matched to row #1, row #2 matched to row #2, etc. Only possible if len(data1) == len(data). This should be passed to method data.match(). See help(data.match).

class SkyMatcher(thres=1, coord=None, coord1=None, unit=Unit('deg'), unit1=Unit('deg'))[source]#

Used to match pyttop.table.Data objects data1 to data. Match records with nearest coordinates. This should be passed to method data.match(). See help(data.match).

Parameters:
  • thres (number, optional) – Threshold in arcsec. The default is 1.

  • coord (str or astropy.coordinates.SkyCoord, optional) –

    Specify coordinate for the base data. Possible inputs are:

    • astropy.coordinates.SkyCoord (recommended), the coordinate object.

    • str, should be like ‘RA-DEC’, which specifies the column names for RA and Dec.

    • None (default), will try [‘ra’, ‘RA’] and [‘DEC’, ‘Dec’, ‘dec’].

    The default is None.

  • coord1 (str or astropy.coordinates.SkyCoord, optional) –

    Specify coordinate for the matched data. Possible inputs are:

    • astropy.coordinates.SkyCoord (recommended), the coordinate object.

    • str, should be like ‘RA-DEC’, which specifies the column names for RA and Dec.

    • None (default), will try [‘ra’, ‘RA’] and [‘DEC’, ‘Dec’, ‘dec’].

    The default is None.

  • unit (astropy.units.core.Unit or list/tuple/array of it) – If astropy.coordinates.SkyCoord object is not given for coord, this is used to specify the unit of coord. The default is astropy.units.deg.

  • unit1 (astropy.units.core.Unit or list/tuple/array of it) – If astropy.coordinates.SkyCoord object is not given for coord1, this is used to specify the unit of coord1. The default is astropy.units.deg.

Notes

The data columns for RA, Dec may already have units (e.g. data.t['RA'].unit). In this case, any input for unit or unit1 is ignored, and the units recorded in the columns are used.
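
Nearest-coordinate matching with a threshold in arcsec can be sketched in pure Python. The helpers `separation_arcsec` and `nearest_match`, the brute-force search, and the inclusive threshold are assumptions for illustration; this is not pyttop's implementation.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def nearest_match(coords, coords1, thres=1.0):
    """For each (ra, dec) in coords, index of the nearest entry of coords1 within thres arcsec, else None."""
    result = []
    for ra, dec in coords:
        seps = [separation_arcsec(ra, dec, r1, d1) for r1, d1 in coords1]
        j = min(range(len(seps)), key=seps.__getitem__)
        result.append(j if seps[j] <= thres else None)
    return result


base = [(10.0, 20.0), (30.0, -5.0)]
other = [(30.0, -5.0 + 0.5 / 3600), (10.0, 20.0 + 5.0 / 3600)]
idx = nearest_match(base, other, thres=1)
```

The first base source has its nearest neighbor about 5 arcsec away and stays unmatched, while the second is matched to a source about 0.5 arcsec away.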

explore(data, data1)[source]#

Plot a simple histogram to check the distribution of the minimum (2-d) sky separation.

Parameters:
  • data (pyttop.table.Data) – The base data of the match.

  • data1 (pyttop.table.Data) – The data to be matched to data.

Return type:

None.

pyttop.plot#

Created on Sun Jul 13 12:33:34 2025

@author: Yu-Chen Wang

plotFunc(f)[source]#

Makes a function compatible with pyttop.table.Data.

Usage:

@plotFunc
def plot_func(<your inputs ...>):
    <make the plot>

plotFuncAx(f)[source]#

Makes a function compatible with pyttop.table.Data.

Usage:

@plotFuncAx
def f(ax): # inputs axis object `ax`
    def plot_func(<your inputs ...>):
        <make the plot>
    return plot_func

pyttop.utils#

Created on Sat Jul 30 2022

@author: Yuchen Wang

bitwise_all(iterable)[source]#

Return the bitwise all of an iterable. For example, bitwise_all([a, b, c]) is equivalent to a & b & c.
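
The equivalence to a & b & c can be sketched with functools.reduce. The name `bitwise_all_sketch` is hypothetical, and this is not necessarily how pyttop implements bitwise_all; in particular, this sketch raises a TypeError for an empty iterable.

```python
from functools import reduce
import operator


def bitwise_all_sketch(iterable):
    """Combine all items with the & operator, left to right."""
    return reduce(operator.and_, iterable)


flags = bitwise_all_sketch([0b1110, 0b0111, 0b0110])
common = bitwise_all_sketch([{1, 2, 3}, {2, 3}, {3, 4}])
```

Because only the & operator is used, the same helper works for integers, sets, and boolean arrays alike.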