Core Subpackage

This subpackage contains the data processing and orchestration pipeline.

Data Module

yugiquery.core.data.find_cards(list_df: DataFrame, card_data: bool = False, set_data: bool = False) → DataFrame

Match a card list against latest datasets and optionally enrich with card data. The function takes an input DataFrame containing a list of cards, and attempts to match each card against the latest card and set datasets. It first checks if the input DataFrame is empty and returns it if so. Then it loads the necessary reference data based on the columns present in the input DataFrame. The matching process is performed in three stages: first by “Card number” using set data, then by “Password” using card data, and finally by “Name” using either card or set data. After matching, the function finalizes the matched cards by grouping and summing counts, and optionally merging with card data for enrichment.

Parameters:

list_df (pd.DataFrame) – DataFrame containing a list of cards to match, with columns such as “Name”, “Card number”, or “Password”.
card_data (bool, optional) – Whether to load and merge card data for enrichment. Defaults to False.
set_data (bool, optional) – Whether to load and merge set data for matching by card number. Defaults to False.

Returns:

A DataFrame containing the matched cards, enriched with card data if requested.

Return type:

pd.DataFrame
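
The staged matching described above can be pictured with a small pure-Python sketch. It is not the library's implementation: `match_cards` and the two reference tables are hypothetical, plain dictionaries stand in for DataFrames, and the "Count" column name is an assumption.

```python
from collections import Counter

# Hypothetical reference data; the real function loads the latest datasets.
SET_ROWS = [
    {"Card number": "LOB-001", "Name": "Blue-Eyes White Dragon"},
    {"Card number": "LOB-005", "Name": "Dark Magician"},
]
CARD_ROWS = [
    {"Password": "89631139", "Name": "Blue-Eyes White Dragon"},
    {"Password": "46986414", "Name": "Dark Magician"},
]


def match_cards(rows):
    """Resolve each input row to a card name in three stages, then sum counts."""
    by_number = {r["Card number"]: r["Name"] for r in SET_ROWS}
    by_password = {r["Password"]: r["Name"] for r in CARD_ROWS}
    known_names = set(by_number.values()) | set(by_password.values())

    totals = Counter()
    for row in rows:
        # Stage 1: match by card number using set data.
        name = by_number.get(row.get("Card number"))
        # Stage 2: fall back to the password using card data.
        if name is None:
            name = by_password.get(row.get("Password"))
        # Stage 3: fall back to the name itself.
        if name is None and row.get("Name") in known_names:
            name = row["Name"]
        if name is not None:
            totals[name] += row.get("Count", 1)
    return dict(totals)


matched = match_cards([
    {"Card number": "LOB-001", "Count": 2},
    {"Password": "46986414", "Count": 1},
    {"Name": "Blue-Eyes White Dragon", "Count": 1},
    {"Name": "Not A Real Card", "Count": 3},
])
```

Rows that fail all three stages are simply dropped in this sketch; how the actual function treats unmatched rows is not stated here.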

yugiquery.core.data.get_releases_by(df, column=None, operation='debut', numeric=False, crosstab=False) → DataFrame

Get release dates grouped by column using a selectable operation among “debut”, “last”, “first”, or “all”. By default, it returns the debut release date for each group, but it can also return the last release date, the first release date, or all unique release dates for each group. If a column is specified, it groups by that column and “Name”, otherwise it groups by “Name” alone. The function also attempts to convert the grouping column to numeric if numeric is True, and can return a crosstab of release dates by the grouping column if crosstab is True.

Parameters:

df (pd.DataFrame) – DataFrame containing card data with a “Release” column and optionally a column to group by.
column (str, optional) – The column name to group by (e.g., “Primary type”). If None, groups by “Name”. Defaults to None.
operation (str, optional) – The operation to perform on release dates. Options are “debut” (default), “last”, “first”, or “all”. Defaults to “debut”.
numeric (bool, optional) – Whether to attempt to convert the grouping column to numeric. Defaults to False.
crosstab (bool, optional) – Whether to return a crosstab of release dates by the grouping column instead of a DataFrame with release dates. Defaults to False.

Returns:

A DataFrame containing release dates grouped by the specified column and operation, or a crosstab if crosstab is True.

Return type:

pd.DataFrame

yugiquery.core.data.load_changelog_for(name: str, timestamp: str | Arrow | None) → DataFrame | None

Load the changelog covering the given timestamp for the specified name. If timestamp is None, loads the latest changelog available for the given name. Otherwise, finds the changelog file whose period covers the timestamp, or the latest one before it if none covers it. The changelog files are expected to be named in the format “{name}_changelog_{from}_{to}.bz2”, where “from” and “to” are timestamps in “YYYYMMDDTHHmmZ” format.

Parameters:

name (str) – The base name of the data file (e.g., ‘bandai’, ‘cards’, etc.)
timestamp (str | arrow.Arrow | None) – The timestamp string or Arrow object (e.g., ‘20250301T1551Z’). If None, loads the latest changelog available for the given name.

Returns:

The loaded changelog DataFrame (or None if not found).

Return type:

pd.DataFrame | None
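
As a rough illustration of the selection rule above, the sketch below parses filenames in the documented format and picks one. `pick_changelog` is a hypothetical helper, not part of the package. It assumes period bounds are inclusive and that "latest" means the greatest "to" timestamp.

```python
import re
from datetime import datetime, timezone

STAMP = "%Y%m%dT%H%MZ"  # strptime equivalent of "YYYYMMDDTHHmmZ"
PATTERN = re.compile(
    r"^(?P<name>.+)_changelog_(?P<start>\d{8}T\d{4}Z)_(?P<end>\d{8}T\d{4}Z)\.bz2$"
)


def parse_stamp(text):
    """Parse a 'YYYYMMDDTHHmmZ' string into an aware UTC datetime."""
    return datetime.strptime(text, STAMP).replace(tzinfo=timezone.utc)


def pick_changelog(filenames, name, timestamp=None):
    """Return the filename covering `timestamp`, else the latest one before it."""
    candidates = []
    for filename in filenames:
        match = PATTERN.match(filename)
        if match and match["name"] == name:
            start, end = parse_stamp(match["start"]), parse_stamp(match["end"])
            candidates.append((start, end, filename))
    if not candidates:
        return None
    candidates.sort()
    if timestamp is None:
        # No timestamp given: take the changelog with the latest end date.
        return max(candidates, key=lambda c: c[1])[2]
    when = parse_stamp(timestamp)
    covering = [c for c in candidates if c[0] <= when <= c[1]]
    if covering:
        return covering[-1][2]
    earlier = [c for c in candidates if c[1] < when]
    return max(earlier, key=lambda c: c[1])[2] if earlier else None


files = [
    "cards_changelog_20250101T0000Z_20250201T0000Z.bz2",
    "cards_changelog_20250201T0000Z_20250301T1551Z.bz2",
    "sets_changelog_20250101T0000Z_20250401T0000Z.bz2",
]
```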

yugiquery.core.data.load_latest(name_pattern: str, type: str = 'data') → Tuple[DataFrame | None, Arrow | None]

Loads the latest file matching the given name pattern and type, and attempts to parse specified columns as tuples. Returns the loaded DataFrame and the timestamp extracted from the filename. If no matching file is found, returns (None, None). If type is “changelog”, the returned timestamp refers to the end of the period covered by the changelog, i.e. the “to” date.

Parameters:

name_pattern (str) – The pattern to match in the filename (e.g., “cards”)
type (str, optional) – The type of file to look for, either “data” or “changelog”. Defaults to “data”.
tuple_cols (List[str], optional) – Additional list of column names to attempt to parse as tuples. Defaults to [].

Returns:

A tuple containing the loaded DataFrame (or None if not found) and the timestamp (or None if not found).

Return type:

Tuple[pd.DataFrame | None, arrow.Arrow | None]

yugiquery.core.data.merge_errata(input_df: DataFrame, input_errata_df: DataFrame) → DataFrame

Merge errata information into the input DataFrame by card name. The function checks if the input DataFrame contains a “Name” column, then applies the _format_errata helper function to the input_errata_df to create a Series of errata information. This Series is merged into the input DataFrame based on the “Name” column, resulting in an “Errata” column that indicates which fields are affected by errata for each card.

Parameters:

input_df (pd.DataFrame) – DataFrame containing card information, must include a “Name” column.
input_errata_df (pd.DataFrame) – DataFrame containing errata information, with columns indicating which cards have name or type errata.

Returns:

The input DataFrame merged with errata information, containing an “Errata” column that specifies which fields are affected by errata for each card.

Return type:

pd.DataFrame

yugiquery.core.data.merge_set_info(input_df: DataFrame, input_info_df: DataFrame) → DataFrame

Merge set metadata into set list data by set and region. The input dataframe must contain “Set” and “Region” columns. The function will look up release dates based on the region and merge additional set information from the input_info_df, which should be indexed by set name.

Parameters:

input_df (pd.DataFrame) – DataFrame containing at least “Set” and “Region” columns.
input_info_df (pd.DataFrame) – DataFrame indexed by set name containing set metadata, including release dates by region.

Returns:

The input dataframe merged with set metadata and release dates.

Return type:

pd.DataFrame

yugiquery.core.data.merge_set_to_cards(*card_df, set_df) → DataFrame

Merge set lists into one or more card information dataframes. The function takes one or more card dataframes and a set dataframe, and merges them based on card names. It creates an “index” column by normalizing card names (lowercasing and removing “#”) in both the card and set dataframes, then performs an inner merge on this index. The resulting dataframe is cleaned up by dropping the index and renaming columns appropriately.

Parameters:

*card_df – One or more DataFrames containing card information, each with a “Name” column.
set_df (pd.DataFrame) – DataFrame containing set information, with a “Name” column.

Returns:

A merged DataFrame containing card information enriched with set data, matched by normalized card names.

Return type:

pd.DataFrame
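
The normalized-name join can be sketched in plain Python as follows. `normalize` and `merge_by_name` are hypothetical names, lists of dictionaries stand in for DataFrames, and the sample values are made up for illustration.

```python
def normalize(name):
    """Build the join key: lowercase the name and drop '#' characters."""
    return name.lower().replace("#", "")


def merge_by_name(card_rows, set_rows):
    """Inner-join set rows to card rows on the normalized card name."""
    cards_by_key = {normalize(card["Name"]): card for card in card_rows}
    merged = []
    for set_row in set_rows:
        card = cards_by_key.get(normalize(set_row["Name"]))
        if card is not None:  # inner join: unmatched set rows are dropped
            # Card fields take precedence, so the card's spelling of "Name" is kept.
            merged.append({**set_row, **card})
    return merged


cards = [{"Name": "Harpie Lady #1", "Attribute": "WIND"}]
sets = [
    {"Name": "harpie lady 1", "Card number": "XX-001"},
    {"Name": "Unknown Card", "Card number": "XX-002"},
]
merged = merge_by_name(cards, sets)
```

Which spelling of the name survives the merge is a choice made for this sketch only.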

yugiquery.core.data.select_level(df: DataFrame) → DataFrame

Filters a DataFrame to select cards with a valid “Level” attribute, excluding “Xyz Monster” and “Link Monster” types.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with columns such as “Primary type” and one of “Level/Rank/Link”, “Level/Rank” or “Level”.
Raises:: ValueError – If none of the “Level/Rank/Link”, “Level/Rank” or “Level” columns is present in the DataFrame.
Returns:: Filtered DataFrame with a “Level” column, containing only rows with non-null level values and excluding “Xyz Monster” and “Link Monster” types.
Return type:: pd.DataFrame
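
The `select_*` helpers share one pattern: locate a source column, filter by “Primary type”, and drop null values. A minimal pure-Python sketch of that pattern for the level case follows. `select_level_rows` is a hypothetical stand-in, and the column precedence order is an assumption.

```python
LEVEL_COLUMNS = ("Level/Rank/Link", "Level/Rank", "Level")  # assumed precedence
EXCLUDED_TYPES = {"Xyz Monster", "Link Monster"}


def select_level_rows(rows):
    """Keep rows with a non-null level, excluding Xyz and Link monsters."""
    columns = set().union(*(row.keys() for row in rows))
    source = next((col for col in LEVEL_COLUMNS if col in columns), None)
    if source is None:
        raise ValueError("No level column found")
    selected = []
    for row in rows:
        if row.get("Primary type") in EXCLUDED_TYPES or row.get(source) is None:
            continue
        # Rename the source column to "Level" in the output row.
        out = {key: value for key, value in row.items() if key != source}
        out["Level"] = row[source]
        selected.append(out)
    return selected


rows = [
    {"Name": "A", "Primary type": "Effect Monster", "Level/Rank/Link": 4},
    {"Name": "B", "Primary type": "Xyz Monster", "Level/Rank/Link": 4},
    {"Name": "C", "Primary type": "Link Monster", "Level/Rank/Link": 2},
    {"Name": "D", "Primary type": None, "Level/Rank/Link": None},
]
levels = select_level_rows(rows)
```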

yugiquery.core.data.select_link(df: DataFrame) → DataFrame

Filters a DataFrame to select cards with a valid “Link” attribute, including only “Link Monster” types.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with columns such as “Primary type” and either “Level/Rank/Link” or “Link”.
Raises:: ValueError – If neither “Level/Rank/Link” nor “Link” columns are present in the DataFrame.
Returns:: Filtered DataFrame with a “Link” column, containing only rows with non-null link values and including only “Link Monster” types.
Return type:: pd.DataFrame

yugiquery.core.data.select_pendulum(df: DataFrame) → DataFrame

Filters a DataFrame to select cards with a valid “Pendulum Scale” attribute, including only “Pendulum Monster” types.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with columns such as “Primary type” and “Pendulum Scale”.
Raises:: ValueError – If “Pendulum Scale” column is not present in the DataFrame.
Returns:: Filtered DataFrame with a “Pendulum Scale” column, containing only rows with non-null pendulum scale values and including only “Pendulum Monster” types.
Return type:: pd.DataFrame

yugiquery.core.data.select_rank(df: DataFrame) → DataFrame

Filters a DataFrame to select cards with a valid “Rank” attribute, including only “Xyz Monster” types.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with columns such as “Primary type” and either “Level/Rank/Link”, “Level/Rank” or “Rank”.
Raises:: ValueError – If none of the “Level/Rank/Link”, “Level/Rank” or “Rank” columns is present in the DataFrame.
Returns:: Filtered DataFrame with a “Rank” column, containing only rows with non-null rank values and including only “Xyz Monster” types.
Return type:: pd.DataFrame

yugiquery.core.data.select_stars(df: DataFrame) → DataFrame

Filters a DataFrame to select cards with a valid “Level” or “Rank” attribute, excluding “Link Monster” type, and renames the column to “Stars”.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with columns such as “Primary type” and either “Level/Rank/Link”, “Level/Rank”, “Level”, “Rank” or “Stars”.
Raises:: ValueError – If none of the “Level/Rank/Link”, “Level/Rank”, “Level”, “Rank” or “Stars” columns is present in the DataFrame.
Returns:: Filtered DataFrame with a “Stars” column, containing only rows with non-null level or rank values and excluding “Link Monster” types.
Return type:: pd.DataFrame

yugiquery.core.data.select_tc(df: DataFrame) → DataFrame

Filters a DataFrame to select cards that are marked as “Token” or “Counter” in the “Card type” column.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with a “Card type” column.
Raises:: ValueError – If “Card type” column is not present in the DataFrame.
Returns:: Filtered DataFrame containing only rows where “Card type” is “Monster Token” or “Counter”.
Return type:: pd.DataFrame

yugiquery.core.data.select_unusable(df: DataFrame) → DataFrame

Filters a DataFrame to select cards that are marked as “Unusable” in the “Card status” column.

Parameters:: df (pd.DataFrame) – DataFrame containing card data with a “Card status” column.
Raises:: ValueError – If “Card status” column is not present in the DataFrame.
Returns:: Filtered DataFrame containing only rows where “Card status” is “Unusable”.
Return type:: pd.DataFrame

Decks Module

yugiquery.core.decks.assign_deck(collection_df: DataFrame, deck_df: DataFrame, return_collection: bool = False) → DataFrame: Match deck cards to collection cards and adjust card counts.

yugiquery.core.decks.check_limits(deck_df: DataFrame) → DataFrame: Check decklist limits for each format (e.g. TCG and OCG).

yugiquery.core.decks.convert_ydk(ydk_df: DataFrame) → DataFrame

Convert a DataFrame with YDK card codes to a DataFrame with card names.

Parameters:: ydk_df (pd.DataFrame) – DataFrame with YDK card codes.
Returns:: DataFrame with card names. If unable to obtain the card data, returns input DataFrame.
Return type:: pd.DataFrame

yugiquery.core.decks.get_collection(file_name: str = 'collection') → None | DataFrame

Load a user collection from CSV or Excel. The function looks for a file with the specified name in the data directory, first checking for an Excel file and then a CSV file. If an Excel file is found, it loads all sheets and concatenates them into a single DataFrame with an additional “Collection” column indicating the sheet name. If a CSV file is found, it loads it directly into a DataFrame. If no file is found, it logs a warning and returns None.

Parameters:: file_name (str, optional) – The base name of the collection file (without extension). Defaults to “collection”.
Returns:: The loaded collection DataFrame if a file is found, otherwise None.
Return type:: pd.DataFrame | None

yugiquery.core.decks.get_decklists(*files: Path | str) → DataFrame

Load decklist files and return a DataFrame with the card names.

Parameters:: files (Path | str) – Paths to the decklist files. If not provided, loads all decklist files in the data directory.
Returns:: DataFrame with card names.
Return type:: pd.DataFrame

yugiquery.core.decks.get_ydk(*files: Path | str) → DataFrame

Load YDK files and return a DataFrame with the card names.

Parameters:: files (Path | str) – Paths to YDK files. If not provided, loads all YDK files in the data directory.
Returns:: DataFrame with card names. If unable to obtain the card data, returns raw YDK DataFrame.
Return type:: pd.DataFrame

yugiquery.core.decks.read_decklist(file_path: Path | str) → DataFrame

Read a decklist file and return a DataFrame with the card names.

Parameters:: file_path (Path | str) – Path to the decklist file.
Returns:: DataFrame with the card names.
Return type:: pd.DataFrame

yugiquery.core.decks.read_ydk(file_path: Path | str) → DataFrame

Read a YDK file and return a DataFrame with the card codes.

Parameters:: file_path (Path | str) – Path to the YDK file.
Returns:: DataFrame with the card codes.
Return type:: pd.DataFrame
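
For orientation, a YDK file is a plain-text list of numeric card codes split into sections by the `#main`, `#extra` and `!side` marker lines. The sketch below parses that layout into (section, code) pairs. `parse_ydk` is hypothetical, and the actual columns produced by `read_ydk` are not documented here.

```python
SECTION_MARKERS = {"#main": "main", "#extra": "extra", "!side": "side"}


def parse_ydk(text):
    """Return a list of (section, code) pairs from the contents of a YDK file."""
    section = None
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in SECTION_MARKERS:
            section = SECTION_MARKERS[line]
            continue
        # Skip comment lines (e.g. "#created by ...") and anything non-numeric.
        if line.startswith("#") or not line.isdigit():
            continue
        # Codes are kept as strings so that leading zeros survive.
        entries.append((section, line))
    return entries


sample = """#created by ...
#main
89631139
89631139
46986414
#extra
!side
46986414
"""
entries = parse_ydk(sample)
```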

Maintenance Module

class yugiquery.core.maintenance.BenchmarkEntry: Bases: TypedDict

yugiquery.core.maintenance.benchmark(timestamp: Arrow, group: str = 'report', entry: str | None = None) → None

Record report execution time and persist benchmark history.

Parameters:

timestamp (arrow.Arrow) – Start timestamp for execution.
group (str, optional) – Group name. Defaults to ‘report’.
entry (str | None, optional) – Entry name. If None, infer notebook stem. Defaults to None.

yugiquery.core.maintenance.cleanup_data(dryrun: bool = False) → None

Clean up redundant data files and compact benchmark/changelog history.

Performs two main operations:

1. Condenses benchmark history using weighted averages.
2. Removes redundant changelog and data files, keeping only the most recent from each month and condensing multiple changelogs within the same month into a single consolidated file.

Parameters:: dryrun (bool, optional) – If True, log intended actions without modifying files. Defaults to False.
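
The "keep only the most recent from each month" rule can be sketched as a pure planning step that never touches the filesystem, which is roughly what a dry run would log. `plan_monthly_cleanup` and the filenames are hypothetical. Timestamps are assumed to use the same “YYYYMMDDTHHmmZ” format as the changelog filenames.

```python
from collections import defaultdict
from datetime import datetime


def plan_monthly_cleanup(stamped_files):
    """Split (filename, timestamp) pairs into files to keep and files to remove."""
    by_month = defaultdict(list)
    for filename, stamp in stamped_files:
        when = datetime.strptime(stamp, "%Y%m%dT%H%MZ")
        by_month[(when.year, when.month)].append((when, filename))

    keep, remove = [], []
    for month in sorted(by_month):
        ordered = sorted(by_month[month])
        keep.append(ordered[-1][1])  # most recent file of the month
        remove.extend(filename for _, filename in ordered[:-1])
    return keep, remove


keep, remove = plan_monthly_cleanup([
    ("cards_a", "20250103T0000Z"),
    ("cards_b", "20250128T0000Z"),
    ("cards_c", "20250210T0000Z"),
])
```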

yugiquery.core.maintenance.condense_benchmark(benchmark: Dict[str, Dict[str, List[BenchmarkEntry]]]) → Dict[str, Dict[str, List[BenchmarkEntry]]]

Condense benchmark history by weighted average and total weight for each key.

Parameters:: benchmark (Dict[str, Dict[str, List[BenchmarkEntry]]]) – Benchmark data dictionary.
Returns:: Condensed benchmark dictionary.
Return type:: Dict[str, Dict[str, List[BenchmarkEntry]]]
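
A weighted-average condensation over the nested structure might look like the sketch below. The fields of `BenchmarkEntry` are not listed in this reference, so the "average" and "weight" keys are assumptions, as are the function names.

```python
def condense_entries(entries):
    """Collapse a list of entries into one weighted-average entry."""
    total_weight = sum(entry["weight"] for entry in entries)
    if total_weight == 0:
        return []
    weighted_sum = sum(entry["average"] * entry["weight"] for entry in entries)
    return [{"average": weighted_sum / total_weight, "weight": total_weight}]


def condense(benchmark):
    """Apply condense_entries to every key of every group."""
    return {
        group: {key: condense_entries(entries) for key, entries in keys.items()}
        for group, keys in benchmark.items()
    }


history = {
    "report": {
        "Cards": [
            {"average": 10.0, "weight": 1},
            {"average": 20.0, "weight": 3},
        ]
    }
}
condensed = condense(history)
```

Carrying the total weight forward lets a later condensation combine the result with new entries without biasing the average.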

yugiquery.core.maintenance.condense_changelogs(files: List[str | Path]) → DataFrame

Condense multiple changelog files into a consolidated dataframe.

Parameters:: files (List[Path | str]) – A list of changelog file paths.
Returns:: The consolidated changelog dataframe.
Return type:: pd.DataFrame

yugiquery.core.maintenance.generate_changelog(previous_df: DataFrame, current_df: DataFrame, col: str | List[str]) → DataFrame

Generate a changelog DataFrame by comparing two DataFrames on key columns.

Parameters:

previous_df (pd.DataFrame) – Previous version of the data.
current_df (pd.DataFrame) – Current version of the data.
col (str | List[str]) – Column(s) used as comparison keys.

Returns:

DataFrame containing old/new rows for changed records.

Return type:

pd.DataFrame
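
The idea of an old/new changelog keyed on a column can be sketched in plain Python. `diff_rows` and the "Version" marker are hypothetical. Listing added and removed records with a single row each is an assumption, not documented behavior.

```python
def diff_rows(previous, current, key):
    """Return old/new rows for every record that differs between two versions."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}
    changes = []
    for value in sorted(prev_by_key.keys() | curr_by_key.keys()):
        old, new = prev_by_key.get(value), curr_by_key.get(value)
        if old == new:
            continue  # unchanged record
        if old is not None:
            changes.append({**old, "Version": "Old"})
        if new is not None:
            changes.append({**new, "Version": "New"})
    return changes


previous = [{"Name": "A", "ATK": 1000}, {"Name": "B", "ATK": 2000}]
current = [{"Name": "A", "ATK": 1500}, {"Name": "B", "ATK": 2000}]
changes = diff_rows(previous, current, "Name")
```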

yugiquery.core.maintenance.update_index(commit: bool = False, page_paths: List[str | Path] | None = None) → str

Update index.md and README.md report table and last execution timestamp.

Parameters:

commit (bool, optional) – If True, commit changes after updating the index.
page_paths (List[Path | str] | None, optional) – Additional .md pages to consider.

Returns:

Git commit output or dry-run advisory message.

Return type:

str

Pipeline Module

yugiquery.core.pipeline.PAPERMILL_TIMEOUT = 300: Timeout in seconds for papermill notebook execution. Can be set via the PAPERMILL_TIMEOUT environment variable. Defaults to 300 seconds (5 minutes).
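
One common way to implement an environment-configurable default like this is sketched below. `read_timeout` is a hypothetical helper, and falling back to the default on a non-integer value is an assumption rather than documented behavior.

```python
import os


def read_timeout(environ=os.environ, default=300):
    """Read PAPERMILL_TIMEOUT (in seconds) from the environment."""
    raw = environ.get("PAPERMILL_TIMEOUT")
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumed fallback for values that are not valid integers.
        return default
```

Because the constant is read from the environment, the variable would normally be set before the pipeline module is imported, for example `PAPERMILL_TIMEOUT=600` in the shell.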

yugiquery.core.pipeline.run(data: str | List[str] = [], report: str | List[str] | List[Path] = [], progress_handler: ProgressHandler | None = None, cleanup: bool | Literal['auto'] = 'auto', dryrun: bool = False, changelog: bool = True, benchmark: bool = True, squash: bool = True, jekyll: bool = False, discord: bool | Namespace = False, telegram: bool | Namespace = False) → None

Executes all notebooks in the user and package NOTEBOOKS directories that match the specified report, updates the page index to reflect the last execution timestamp, and cleans up redundant data files.

Parameters:

data (str | List[str], optional) – The data update flow(s) to run. Defaults to [].
report (str | List[str] | List[Path], optional) – The report(s) to generate. Can be a string, a Path, or a list of strings/Paths. Defaults to [].
progress_handler (ProgressHandler | None, optional) – An optional ProgressHandler instance to report execution progress. Defaults to None.
cleanup (bool | Literal["auto"], optional) – Whether to clean up data files after execution. If True, cleanup is performed; if False, it is skipped. If ‘auto’, cleanup is performed when there are more than 4 data files for each report (assuming one per week). Defaults to ‘auto’.
dryrun (bool, optional) – Dry-run flag passed to notebook execution and other operations. If True, changes are not committed to Git and data cleanup only logs intended changes. Defaults to False.
squash (bool, optional) – Whether to squash commits after execution. Defaults to True.
jekyll (bool, optional) – Whether to generate Jekyll markdown pages for HTML reports. Defaults to False.
discord (bool | argparse.Namespace, optional) – Discord configuration, either as a boolean or argparse.Namespace. Defaults to False.
telegram (bool | argparse.Namespace, optional) – Telegram configuration, either as a boolean or argparse.Namespace. Defaults to False.

Raises:

Exception – Raised if any exceptions occur during notebook execution.

Returns:

This function does not return a value.

Return type:

None

yugiquery.core.pipeline.run_notebooks(reports: str | list[str] | List[Path], pbars: List[tqdm_asyncio] = [], commit: bool = True) → None

Execute specified Jupyter notebooks using Papermill.

Parameters:

reports (str | List[str] | List[Path]) – List of notebooks to execute.
pbars (List[tqdm], optional) – List of progress bars to update during execution. Default is an empty list.
commit (bool, optional) – Whether to commit changes after execution. Default is True.

Returns:

None

Raises:

Exception – Raised if any exceptions occur during notebook execution.

yugiquery.core.pipeline.update_data(flows: List[str] | str = 'all', pbars: List[tqdm_asyncio] = [], changelog=True, benchmark=True, commit=True) → dict[str, tuple[DataFrame, Path, Path | None]]

Update data for the specified flow(s). Accepts a string or a list of strings. If ‘all’, updates all flows. Ignores unknown flows. Returns a dictionary with results for each flow.

Parameters:

flows (List[str] | str, optional) – The flow(s) to update. Defaults to ‘all’.
pbars (List[tqdm], optional) – List of progress bars to update during the process. Defaults to [].
changelog (bool, optional) – Whether to generate and save changelogs for the updated data. Defaults to True.
benchmark (bool, optional) – Whether to benchmark the data update process and save the results. Defaults to True.
commit (bool, optional) – Whether to commit the updated data and benchmark results to Git. Defaults to True.

Returns:

A dictionary where keys are flow names and values are tuples containing the updated DataFrame, the path to the saved data file, and the path to the generated changelog file (or None if no changelog was generated).

Return type:

dict[str, tuple[pd.DataFrame, Path, Path | None]]
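
The flow-selection rule (a string or a list, ‘all’ expands to every flow, unknown names are ignored) can be sketched as follows. `resolve_flows` and the flow names in `KNOWN_FLOWS` are hypothetical.

```python
KNOWN_FLOWS = ("cards", "sets", "bandai")  # hypothetical flow names


def resolve_flows(flows="all", known=KNOWN_FLOWS):
    """Normalize the `flows` argument into a list of known flow names."""
    if isinstance(flows, str):
        flows = [flows]
    if "all" in flows:
        return list(known)
    # Unknown flow names are silently dropped.
    return [flow for flow in flows if flow in known]
```

The result dictionary described above would then hold one entry per resolved flow.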