Debugging and History Tracking Using Metadata of Data

PyTTOP stores relevant information in the metadata of tables and their columns. This can be helpful for debugging and for tracking the history of tables, especially when working with a Data object for which you have not recorded the operations performed on it.

Warning

Please be aware that the metadata is for reference only, and its correctness is NOT guaranteed (especially for the metadata of columns). PyTTOP records only a limited set of the operations that you perform, and it is never aware of operations done directly on the underlying astropy Table or Column objects. As a result, the metadata can be incorrect. Avoid relying on it entirely. It is recommended to keep track of your operations and to keep a record of all your data processing and analysis code.

The metadata of `Data`#

For a Data object, the metadata primarily records how it was initialized and which operations (merging, and cutting by subsets, as described in the corresponding pages) have been performed on the data. This is illustrated in the examples below.

from pyttop.table import Data, Subset
from pyttop.matcher import ExactMatcher

d1 = Data(
    {
        'index': [0, 1, 2, 3, 4],
        'id': [101, 102, 104, 105, 108],
        },
    name='d1',
    )

d1.print_meta()

{
    "path": "(initalized from a <class 'dict'> object)"
}

By printing the metadata with d1.print_meta(), we can see that this Data object was initialized from a dictionary. If it had been loaded from a file, the 'path' field would indicate the file's location.

If the data is cut by a subset, the metadata records the details of the cut:

d1.add_subsets(Subset('id > 103'))
d1_cut = d1.subset_data('id > 103')

d1_cut.print_meta()

{
    "path": "(data 'd1' cut by subset)",
    "subset": {
        "name": "id > 103",
        "expression": "id > 103",
        "label": "id > 103",
        "fraction": "3/5"
    },
    "notes": "The metadata for the orignal data 'd1' is recorded in 'meta'.",
    "meta": {
        "path": "(initalized from a <class 'dict'> object)"
    }
}

Information about the subset used to cut the data is recorded in the 'subset' field (here, 'fraction' shows that 3 of the 5 rows were kept), and the metadata of the original data is recorded in the 'meta' field.
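
The 'fraction' field is stored as a string. The following is a minimal sketch of how it could be turned into numbers. It assumes the metadata printed above has been copied by hand into a plain Python dict; how to obtain that dict programmatically from a Data object is not covered here, and the variable names are only illustrative.

```python
# Metadata copied verbatim from the d1_cut.print_meta() output above
# (an assumption of this sketch: it is typed in by hand, not read from PyTTOP).
cut_meta = {
    "path": "(data 'd1' cut by subset)",
    "subset": {
        "name": "id > 103",
        "expression": "id > 103",
        "label": "id > 103",
        "fraction": "3/5",
    },
    "notes": "The metadata for the orignal data 'd1' is recorded in 'meta'.",
    "meta": {"path": "(initalized from a <class 'dict'> object)"},
}

# Split "selected/total" into two integers without reducing the fraction.
selected, total = (int(part) for part in cut_meta["subset"]["fraction"].split("/"))
print(f"{selected} of {total} rows kept ({selected / total:.0%})")

# The parent table's metadata sits one level down, under 'meta'.
parent_path = cut_meta["meta"]["path"]
print("parent:", parent_path)
```

Splitting the string by hand, rather than parsing it as a reduced fraction, keeps the two row counts as they were recorded.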

If the data is merged from several tables, the metadata will record the details of the matching and merging process:

d2 = Data(name='d2')
d2['index'] = [0, 1, 2, 3, 4, 5]
d2['ID'] = [103, 104, 105, 101, 107, 106]

d1.match(d2, ExactMatcher('id', 'ID'), verbose=False)

d_merged = d1.merge(verbose=False)

d_merged.print_meta()

{
    "path": "(merged data)",
    "merging": {
        "notes": "This is a table merged from several tables. The merging information is recorded below. The metadata for merged datasets are recorded in \"metas\".",
        "options": {
            "depth": -1,
            "keep_unmatched": [],
            "keep_subsets": false,
            "matchinfo_subset": false
        },
        "tree": [
            "",
            "d1 [base]",
            ":   d2 [ExactMatcher(\"id\", \"ID\")]",
            ""
        ],
        "merged": [
            "d1",
            "d2"
        ],
        "metas": {
            "d1": {
                "path": "(initalized from a <class 'dict'> object)"
            },
            "d2": {
                "path": "(initalized from a <class 'NoneType'> object)"
            }
        }
    }
}

The metadata of the original Data objects is always included. If the original Data objects were themselves generated through merging or cutting, their metadata, which contains the merging/cutting information, is nested within the metadata of the new Data object.
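
Because the metadata nests in this way, the tables at the origin of a history can be found by walking the structure recursively. The sketch below is not part of PyTTOP: root_paths is a hypothetical helper, and it assumes only the key layout visible in the outputs above ('meta' for a cut, 'merging' and 'metas' for a merge), applied to a hand-copied dict trimmed to the keys it uses.

```python
def root_paths(meta):
    """Collect the 'path' entries of tables that were not derived from others.

    Hypothetical helper; relies only on the keys shown in the printed metadata.
    """
    if "meta" in meta:  # this table was produced by a cut
        return root_paths(meta["meta"])
    if "merging" in meta:  # this table was produced by a merge
        roots = []
        for sub_meta in meta["merging"]["metas"].values():
            roots.extend(root_paths(sub_meta))
        return roots
    return [meta["path"]]  # neither cut nor merged: an original table


# Trimmed copy of the d_merged.print_meta() output above.
merged_meta = {
    "path": "(merged data)",
    "merging": {
        "merged": ["d1", "d2"],
        "metas": {
            "d1": {"path": "(initalized from a <class 'dict'> object)"},
            "d2": {"path": "(initalized from a <class 'NoneType'> object)"},
        },
    },
}

roots = root_paths(merged_meta)
for path in roots:
    print(path)
```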

The metadata of columns#

The metadata of columns records which Data object each column originates from, as well as modifications made to it. For example, when data is merged from several tables, you can use this metadata to determine which original table a particular column came from.

This information can be viewed using data.from_which(<column name>). For example:

for col in ['id', 'ID']:
    print(col, '-->', d_merged.from_which(col))

id --> d1 ((initalized from a <class 'dict'> object))
ID --> user-added (set by user)

/tmp/ipykernel_826/1437402851.py:2: UserWarning: WARNING: The information for user-added columns may be invalid.
  print(col, '-->', d_merged.from_which(col))

This indicates that the column named 'id' comes from a Data object named 'd1', which was initialized from a dictionary, while the 'ID' column was added by the user (via d2['ID'] = ...).

Modifications are recorded as well. If we overwrite 'id' using the eval() method (introduced elsewhere in this documentation), the change shows up in the column's metadata:

d_merged.eval('10 * id', to_col='id')
d_merged.from_which('id')

/tmp/ipykernel_826/529496481.py:2: UserWarning: WARNING: The information for user-added columns may be invalid.
  d_merged.from_which('id')

"d1 ((initalized from a <class 'dict'> object); modified by user with expr '10 * id')"

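The strings returned by from_which() in the examples above share a visible pattern: a source name, then details in parentheses. As a rough sketch only, they could be split as follows. The format is inferred from these three outputs and is an assumption, not a documented contract; split_origin is a hypothetical helper, and it would fail for table names that contain spaces.

```python
def split_origin(info):
    """Split a from_which()-style string into (source name, detail text).

    Hypothetical helper; the string format is inferred from the example outputs.
    """
    name, _, detail = info.partition(" ")
    # Drop one pair of enclosing parentheses around the detail text.
    if detail.startswith("(") and detail.endswith(")"):
        detail = detail[1:-1]
    return name, detail


# The three strings shown in the outputs above.
examples = [
    "d1 ((initalized from a <class 'dict'> object))",
    "user-added (set by user)",
    "d1 ((initalized from a <class 'dict'> object); modified by user with expr '10 * id')",
]

parsed = [split_origin(info) for info in examples]
for name, detail in parsed:
    status = "modified" if "modified by user" in detail else "unmodified"
    print(name, "|", detail, "|", status)
```

Given the warning at the top of this page, such parsing is best treated as a debugging aid rather than a reliable record.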