Quickstart

This first tutorial creates an ete3.Tree, attaches node features, converts it to a PyTorch Geometric Data object, adds a target label, runs a tiny training smoke test, and prints a prediction.

Create a small tree

from ete3 import Tree

def build_tree() -> Tree:
    return Tree("((A:1.0,B:1.5)C:0.5,D:2.0)root:0.0;", format=1)
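
To see what this Newick string encodes without installing ete3, the following is a minimal sketch of a parser for the same syntax. It is an illustrative stand-in, not ete3's implementation: the names Node, parse_newick, and node_depths are assumptions made for this example, and the parser only handles named nodes with optional branch lengths. As with format=1 in ete3, the labels after closing parentheses (C and root) are read as internal node names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Illustrative stand-in for a tree node (not ete3's TreeNode)."""
    name: str = ""
    dist: float = 0.0
    children: List["Node"] = field(default_factory=list)


def _parse_subtree(text: str, pos: int) -> Tuple[Node, int]:
    """Parse one subtree starting at pos; return the node and the next position."""
    node = Node()
    if text[pos] == "(":
        pos += 1  # consume "("
        while True:
            child, pos = _parse_subtree(text, pos)
            node.children.append(child)
            if text[pos] == ",":
                pos += 1  # another sibling follows
            elif text[pos] == ")":
                pos += 1  # end of this child list
                break
            else:
                raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    # The label runs up to the next structural character.
    start = pos
    while text[pos] not in ",():;":
        pos += 1
    node.name = text[start:pos]
    # An optional ":length" follows the label.
    if text[pos] == ":":
        start = pos = pos + 1
        while text[pos] not in ",();":
            pos += 1
        node.dist = float(text[start:pos])
    return node, pos


def parse_newick(text: str) -> Node:
    """Parse a Newick string with named nodes and optional branch lengths."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    root, pos = _parse_subtree(text, 0)
    if pos != len(text) - 1:
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return root


def node_depths(root: Node) -> Dict[str, float]:
    """Map each node name to its distance from the root, in preorder."""
    depths: Dict[str, float] = {}
    stack = [(root, 0.0)]
    while stack:
        node, depth = stack.pop()
        depths[node.name] = depth
        # Push children reversed so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + child.dist))
    return depths


tree = parse_newick("((A:1.0,B:1.5)C:0.5,D:2.0)root:0.0;")
print(node_depths(tree))
# {'root': 0.0, 'C': 0.5, 'A': 1.5, 'B': 2.0, 'D': 2.0}
```

The tree has five nodes: the root, one internal node C with leaves A and B, and a third leaf D attached directly to the root.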

Attach node features

TreeFeatureEngineer writes numeric attributes to each tree node. Pass the same feature_names list to both the engineer and the converter so that the column order of the node feature matrix stays stable during graph conversion.

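import torch
from torch_geometric.data import Data

# TreeFeatureEngineer, TreeToGraphConverter, and FEATURE_NAMES are assumed to be imported or defined already.
def make_graph() -> Data: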
    engineer = TreeFeatureEngineer(num_time_bins=6)
    tree = engineer.add_features(
        build_tree(),
        origin_time=4.0,
        feature_names=FEATURE_NAMES,
        rescale=False,
        inplace=True,
    )
    converter = TreeToGraphConverter(
        feature_names=FEATURE_NAMES,
        add_virtual_nodes=False,
        append_is_virtual_feature=False,
        traversal_strategy=engineer.traversal_strategy,
    )
    data = converter.convert(tree, graph_attrs={"sample_id": "quickstart"})
    data.y = torch.tensor([1.0], dtype=torch.float32)
    return data
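
The role of feature_names as a stable column order can be illustrated with plain Python. This is a sketch only: EXAMPLE_FEATURE_NAMES and build_feature_rows are hypothetical names invented for this example, and the actual features written by TreeFeatureEngineer may differ. The point is that each row is built by iterating over the name list, not over the node's attributes, so every node yields its values in the same column order.

```python
from typing import Dict, List, Sequence

# Hypothetical feature names, for illustration only.
EXAMPLE_FEATURE_NAMES = ["branch_length", "depth", "is_leaf"]


def build_feature_rows(
    node_attrs: Sequence[Dict[str, float]],
    feature_names: Sequence[str],
) -> List[List[float]]:
    """Build one row per node, with columns in feature_names order."""
    rows = []
    for index, attrs in enumerate(node_attrs):
        missing = [name for name in feature_names if name not in attrs]
        if missing:
            raise KeyError(f"node {index} is missing features: {missing}")
        # Column order comes from feature_names, not from dict insertion order.
        rows.append([float(attrs[name]) for name in feature_names])
    return rows


nodes = [
    {"is_leaf": 0.0, "depth": 0.0, "branch_length": 0.0},
    {"depth": 1.5, "is_leaf": 1.0, "branch_length": 1.0},
]
print(build_feature_rows(nodes, EXAMPLE_FEATURE_NAMES))
# [[0.0, 0.0, 0.0], [1.0, 1.5, 1.0]]
```

Raising on a missing feature, as this sketch does, is one possible design choice; it surfaces a mismatch between the engineer and the converter early instead of silently filling a column.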

Convert the tree to graph data

TreeToGraphConverter reads the node attributes named in feature_names into graph tensors. The make_graph snippet above also sets a dummy graph-level target, data.y, which is the field the trainer expects during supervised training.
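
As a rough picture of what a tree-to-graph conversion produces, the sketch below numbers the quickstart tree's nodes in preorder and lists its edges as two parallel rows, which is the layout behind a (2, num_edges) edge_index. The assumptions are loud ones: preorder numbering and parent-to-child edge direction are chosen here for illustration, and the names CHILDREN, preorder, and build_edge_index are hypothetical. The ordering and edge conventions that TreeToGraphConverter actually uses are described in Graph Conversion.

```python
from typing import Dict, List, Tuple

# Child lists for the quickstart tree, keyed by node name.
CHILDREN: Dict[str, List[str]] = {
    "root": ["C", "D"],
    "C": ["A", "B"],
    "A": [],
    "B": [],
    "D": [],
}


def preorder(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return node names in preorder (parent first, then children left to right)."""
    order = []
    stack = [root]
    while stack:
        name = stack.pop()
        order.append(name)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(children[name]))
    return order


def build_edge_index(
    children: Dict[str, List[str]], root: str
) -> Tuple[List[str], List[List[int]]]:
    """Return the node order and a [sources, targets] pair of index lists."""
    order = preorder(children, root)
    index = {name: i for i, name in enumerate(order)}
    sources: List[int] = []
    targets: List[int] = []
    for parent in order:
        for child in children[parent]:
            sources.append(index[parent])  # assumed direction: parent -> child
            targets.append(index[child])
    return order, [sources, targets]


order, edge_index = build_edge_index(CHILDREN, "root")
print(order)       # ['root', 'C', 'A', 'B', 'D']
print(edge_index)  # [[0, 0, 1, 1], [1, 4, 2, 3]]
```

Because the node numbering depends only on the traversal, running the conversion twice on the same tree gives the same indices, which is what makes the feature rows and the edge list line up.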

Add a target label

The smoke test uses a single regression target:

data.y = torch.tensor([1.0], dtype=torch.float32)

For real datasets, attach one target per graph and keep the target shape compatible with the selected model head and loss.
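
One way to catch a target-shape mismatch before training is a small pre-flight check. The sketch below uses plain lists in place of tensors, and check_targets and output_dim are hypothetical names for this example, not part of the library. It assumes a model head that emits output_dim values per graph.

```python
from typing import Sequence


def check_targets(targets: Sequence[Sequence[float]], output_dim: int) -> None:
    """Raise if any per-graph target does not have output_dim values."""
    for graph_index, target in enumerate(targets):
        if len(target) != output_dim:
            raise ValueError(
                f"graph {graph_index}: target has {len(target)} values, "
                f"expected {output_dim}"
            )


# One regression value per graph matches a head with a single output.
check_targets([[1.0], [0.3]], output_dim=1)
```

A check like this fails with the index of the offending graph, which is usually easier to act on than a shape error raised inside the loss.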

Validate the graph fields

Before training, check the required tensor shapes and dtypes.

def validate_graph(data: Data) -> None:
    assert data.x.dim() == 2
    assert data.x.dtype == torch.float32
    assert data.edge_index.shape[0] == 2
    assert data.edge_index.dtype == torch.long
    assert data.y.shape == (1,)
    assert data.y.dtype == torch.float32

For complete field semantics, including data.x, data.edge_index, data.edge_type, data.time_bin, and deterministic node ordering, see Graph Conversion.

Run a tiny training smoke test

Run the maintained script from the repository root:

python examples/quickstart_training.py

The training function creates a temporary output directory, trains for two epochs on the one-graph dataset, and returns one prediction.

import tempfile
from pathlib import Path

def train_and_predict(data: Data) -> float:
    with tempfile.TemporaryDirectory(prefix="phylognn_quickstart_") as temp_dir:
        model = TinyGraphRegressor(input_dim=data.x.size(1))
        config = TrainingConfig(
            epochs=2,
            batch_size=1,
            learning_rate=1e-2,
            weight_decay=0.0,
            scheduler=None,
            early_stopping_patience=None,
            save_dir=str(Path(temp_dir)),
            save_best_only=False,
            verbose=False,
        )
        trainer = Trainer(model=model, config=config)
        trainer.fit(train_dataset=[data])
        prediction = trainer.predict(dataset=[data])
    return float(prediction[0].detach().cpu().item())
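
The temporary directory in train_and_predict is removed when the with block exits, so anything saved under save_dir is discarded after the smoke test. The standard-library sketch below shows that behavior in isolation; write_and_forget and the file name checkpoint.txt are placeholders invented for this example, not files the trainer is known to write.

```python
import tempfile
from pathlib import Path


def write_and_forget() -> Path:
    """Write a file inside a temporary directory and return its former path."""
    with tempfile.TemporaryDirectory(prefix="phylognn_quickstart_") as temp_dir:
        checkpoint = Path(temp_dir) / "checkpoint.txt"  # placeholder file name
        checkpoint.write_text("placeholder")
        assert checkpoint.exists()  # the file exists while the block is open
    # The directory and its contents are deleted on exit from the with block.
    return checkpoint


path = write_and_forget()
print(path.exists())  # False
```

To keep checkpoints from a real run, point save_dir at a persistent directory instead.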

Expected output includes stable markers like these:

Quickstart training summary
x shape:
edge_index shape:
target shape:
batch ready: true
prediction:
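
If you want to check these markers automatically, the following sketch compares captured output against the list above. The names EXPECTED_MARKERS, missing_markers, and check_quickstart are assumptions made for this example. Markers are matched at the start of a line, not as substrings, because "x shape:" also occurs inside "edge_index shape:".

```python
import subprocess
import sys
from typing import List, Sequence

EXPECTED_MARKERS = (
    "Quickstart training summary",
    "x shape:",
    "edge_index shape:",
    "target shape:",
    "batch ready: true",
    "prediction:",
)


def missing_markers(
    output: str, markers: Sequence[str] = EXPECTED_MARKERS
) -> List[str]:
    """Return the markers that do not start any line of the output."""
    lines = [line.strip() for line in output.splitlines()]
    return [
        marker
        for marker in markers
        if not any(line.startswith(marker) for line in lines)
    ]


def check_quickstart(script: str = "examples/quickstart_training.py") -> List[str]:
    """Run the script from the repository root and report missing markers."""
    result = subprocess.run(
        [sys.executable, script],
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with a non-zero status
    )
    return missing_markers(result.stdout)
```

Calling check_quickstart() from the repository root should return an empty list when every marker is present.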

Completion summary

At this point you have created a tree, attached deterministic features, converted it to graph data, validated the required fields, trained a tiny model, and printed a prediction.

Next steps

Need	Go to
Prepare real trees and features	User Guide
Understand graph fields	Graph Conversion
Configure datasets, splits, and TOML training	Training Configuration
Run complete scripts	Examples