Graph Conversion
================

`TreeToGraphConverter` converts an `ete3.Tree` with numeric node attributes into a PyTorch Geometric `Data` object.

Basic workflow
--------------

.. code-block:: python

    from phylognn import TreeToGraphConverter

    converter = TreeToGraphConverter(feature_names=engineer.feature_names)
    data = converter.convert(tree, graph_attrs={"tree_id": "example"})

The converter expects every node to have every requested feature and every feature value to be numeric.

When to use it
--------------

Use this step after feature engineering and before any model or training code. Keep `feature_names` explicit when comparing experiments so `data.x` columns remain stable.

Output fields
-------------

`data.x`
    Floating-point node feature matrix with shape `[num_nodes, num_features]`. Column order is defined by `feature_names`, commonly `TreeFeatureEngineer.feature_names`.

`data.edge_index`
    `torch.long` tensor with shape `[2, num_edges]`. Tree parent-child relations are included, and bidirectional conversion adds reverse edges.

`data.edge_type`
    `torch.long` tensor aligned with `data.edge_index`. Values are `0` for tree edges, `1` for virtual-to-real edges, and `2` for virtual-chain edges.

`data.node_names`
    Optional list aligned with graph node order when `preserve_node_names=True`. Original nodes use ETE names, unnamed nodes use an empty string, and virtual nodes use generated names.

`data.original_num_nodes`
    Count of nodes from the original tree before virtual nodes are appended.

Converted data also includes `data.virtual_node_mask` and `data.node_type`.

User-provided `graph_attrs` are attached as graph-level attributes, except for reserved generated field names such as `time_bin`.

When `feature_names` includes `time_bin`, the converter also attaches `data.time_bin` as a one-dimensional `torch.long` tensor with one label per final graph node. The labels follow the same row order as `data.x`.

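
The `data.edge_index` and `data.edge_type` layout described above can be pictured with plain Python lists. The following is a minimal sketch, not the converter's implementation: the toy tree, the `build_edges` helper, and the placement of each reverse edge directly after its forward edge are assumptions made for illustration.

.. code-block:: python

    # Hypothetical toy tree, with node indices assigned in preorder:
    # 0 is the root, 1 and 2 are its children, 3 is a child of 1.
    parent_child = [(0, 1), (1, 3), (0, 2)]


    def build_edges(pairs, bidirectional=True):
        """Return (edge_index, edge_type) as plain lists (illustrative only)."""
        sources, targets = [], []
        for parent, child in pairs:
            sources.append(parent)
            targets.append(child)
            if bidirectional:
                # Reverse edge; emitting it right after the forward edge
                # is an assumption, not documented converter behaviour.
                sources.append(child)
                targets.append(parent)
        edge_index = [sources, targets]   # two rows: shape [2, num_edges]
        edge_type = [0] * len(sources)    # 0 marks tree edges
        return edge_index, edge_type


    edge_index, edge_type = build_edges(parent_child)

With three parent-child relations and bidirectional conversion, both rows of `edge_index` have six entries, and `edge_type` holds one `0` per column. Virtual-node edges, which would carry types `1` and `2`, are left out of this sketch.
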
When `feature_names` does not include `time_bin`, the converter does not infer or attach `data.time_bin`.

Virtual nodes
-------------

Set `add_virtual_nodes=True` to add one virtual node per time bin. In this mode, `feature_names` must include `time_bin`. Virtual-to-real edges have `edge_type=1`; virtual-chain edges have `edge_type=2`.

If `append_is_virtual_feature=True`, the final feature column identifies virtual nodes. If `num_time_bins` is configured, one virtual node is created for every configured bin, including empty bins. Generated `data.time_bin` labels for virtual nodes are appended in ascending bin order.

Deterministic ordering
----------------------

Feature order is deterministic when callers pass an ordered sequence such as `TreeFeatureEngineer.feature_names`. Node order follows the converter traversal strategy, with `preorder` as the default. Metadata aligned to nodes, including `node_names`, follows that same order.

Common validation errors
------------------------

The converter raises clear errors for missing requested node attributes, non-numeric feature values, duplicate feature names, unsupported traversal strategies, invalid virtual-node settings, and graph attribute names that collide with generated fields.

Saving and loading
------------------

Use `convert_and_save()` for preprocessing pipelines, `save_data()` to store a PyTorch Geometric `Data` object, and `load_data()` to restore it with `torch.load`. Saved graph files are complete PyTorch objects; load them only from trusted PhyloGNN project outputs.

Related pages
-------------

See :doc:`feature_engineering` for node features, :doc:`training` for model input expectations, and :doc:`../reference/data` for the public API.