Quickstart
This first tutorial creates an ete3.Tree, attaches node features, converts it
to a PyTorch Geometric Data object, adds a target label, runs a tiny training
smoke test, and prints a prediction.
Create a small tree
from ete3 import Tree

def build_tree() -> Tree:
    return Tree("((A:1.0,B:1.5)C:0.5,D:2.0)root:0.0;", format=1)
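To make the Newick string concrete, the following stdlib-only sketch parses it and reports each leaf's distance from the root. It is an illustration for this one simple string, not a replacement for ete3's parser; `parse_newick` and `leaf_depths` are hypothetical helper names, and quoted labels, comments, and other Newick extensions are not handled.

```python
from typing import Dict


def parse_newick(text: str) -> dict:
    """Parse a simple Newick string into nested dicts (illustration only)."""
    s = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
        # The node name follows its children and ends at a delimiter.
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        # An optional ":<length>" gives the branch leading to this node.
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_depths(node: dict, above: float = 0.0) -> Dict[str, float]:
    """Sum branch lengths from the root down to every leaf."""
    depth = above + node["length"]
    if not node["children"]:
        return {node["name"]: depth}
    depths: Dict[str, float] = {}
    for child in node["children"]:
        depths.update(leaf_depths(child, depth))
    return depths


tree = parse_newick("((A:1.0,B:1.5)C:0.5,D:2.0)root:0.0;")
print(leaf_depths(tree))  # {'A': 1.5, 'B': 2.0, 'D': 2.0}
```

The tree has three leaves (A, B, D) and two internal nodes (C, root), and the deepest leaves sit 2.0 branch-length units below the root.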
Attach node features
TreeFeatureEngineer writes numeric attributes to each tree node. Use
feature_names as the stable column order for graph conversion.
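import torch
from torch_geometric.data import Data

# FEATURE_NAMES, TreeFeatureEngineer, and TreeToGraphConverter are assumed to be
# defined or imported earlier in the script.
def make_graph() -> Data:
    # Write the numeric features onto the tree nodes.
    engineer = TreeFeatureEngineer(num_time_bins=6)
    tree = engineer.add_features(
        build_tree(),
        origin_time=4.0,
        feature_names=FEATURE_NAMES,
        rescale=False,
        inplace=True,
    )
    # Read the same features back, in the same column order, into graph tensors.
    converter = TreeToGraphConverter(
        feature_names=FEATURE_NAMES,
        add_virtual_nodes=False,
        append_is_virtual_feature=False,
        traversal_strategy=engineer.traversal_strategy,
    )
    data = converter.convert(tree, graph_attrs={"sample_id": "quickstart"})
    # Dummy graph-level regression target for the smoke test.
    data.y = torch.tensor([1.0], dtype=torch.float32)
    return data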
Convert the tree to graph data
TreeToGraphConverter reads node attributes into graph tensors. The snippet
above also sets a dummy graph-level target, data.y, which is the field the
trainer expects during supervised training.
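The idea behind the conversion can be sketched with plain lists: a fixed list of feature names determines the column order of the node feature matrix, and parent-child links become a two-row edge list. This is an assumption-laden illustration, not the converter's implementation. The node table, the column names `branch_length` and `num_children`, the node ordering, and the parent-to-child edge direction are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical node table for the quickstart tree: (name, parent, attributes).
NodeRow = Tuple[str, Optional[str], Dict[str, float]]
NODES: List[NodeRow] = [
    ("root", None, {"branch_length": 0.0, "num_children": 2.0}),
    ("C", "root", {"branch_length": 0.5, "num_children": 2.0}),
    ("A", "C", {"branch_length": 1.0, "num_children": 0.0}),
    ("B", "C", {"branch_length": 1.5, "num_children": 0.0}),
    ("D", "root", {"branch_length": 2.0, "num_children": 0.0}),
]

# Hypothetical feature names; their order fixes the column order of x.
COLUMNS = ["branch_length", "num_children"]


def to_graph_lists(nodes: List[NodeRow], columns: List[str]):
    """Build a feature matrix and a 2 x num_edges edge list from a node table."""
    index = {name: i for i, (name, _, _) in enumerate(nodes)}
    # One row per node, one column per feature name, always in `columns` order.
    x = [[attrs[c] for c in columns] for _, _, attrs in nodes]
    sources, targets = [], []
    for name, parent, _ in nodes:
        if parent is not None:
            sources.append(index[parent])  # assumed direction: parent -> child
            targets.append(index[name])
    return x, [sources, targets]


x, edge_index = to_graph_lists(NODES, COLUMNS)
print(x[1])        # [0.5, 2.0]
print(edge_index)  # [[0, 1, 1, 0], [1, 2, 3, 4]]
```

Passing the same name list to both the feature step and the conversion step, as the quickstart does with FEATURE_NAMES, keeps the columns aligned between the two.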
Add a target label
The smoke test uses a single regression target:
data.y = torch.tensor([1.0], dtype=torch.float32)
For real datasets, attach one target per graph and keep target shape compatible with the selected model head and loss.
Validate the graph fields
Before training, check the required tensor shapes and dtypes.
def validate_graph(data: Data) -> None:
    assert data.x.dim() == 2
    assert data.x.dtype == torch.float32
    assert data.edge_index.shape[0] == 2
    assert data.edge_index.dtype == torch.long
    assert data.y.shape == (1,)
    assert data.y.dtype == torch.float32
For complete field semantics, including data.x, data.edge_index,
data.edge_type, data.time_bin, and deterministic node ordering, see
Graph Conversion.
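The shape and dtype assertions do not check that every index in the edge list points at an existing node row. A minimal sketch of that bounds check on plain lists follows; `edge_index_problems` is a hypothetical helper, not part of the library. For a real graph, one option is to pass `data.edge_index.tolist()` and `data.x.size(0)`.

```python
from typing import List


def edge_index_problems(edge_index: List[List[int]], num_nodes: int) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    if len(edge_index) != 2:
        return [f"expected 2 rows, got {len(edge_index)}"]
    problems = []
    sources, targets = edge_index
    if len(sources) != len(targets):
        problems.append("source and target rows differ in length")
    for column, (src, dst) in enumerate(zip(sources, targets)):
        for node in (src, dst):
            if not 0 <= node < num_nodes:
                problems.append(
                    f"edge {column} references node {node}, "
                    f"but only {num_nodes} nodes exist"
                )
    return problems


print(edge_index_problems([[0, 1, 1, 0], [1, 2, 3, 4]], num_nodes=5))  # []
print(edge_index_problems([[0], [5]], num_nodes=5))
```

Returning a list of messages instead of asserting keeps the check active when Python runs with `-O`, which strips `assert` statements.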
Run a tiny training smoke test
Run the maintained script from the repository root:
python examples/quickstart_training.py
The training function creates a temporary output directory, trains for two epochs on the one-graph dataset, and returns one prediction.
import tempfile
from pathlib import Path

def train_and_predict(data: Data) -> float:
    # The temporary directory, and anything saved in it, is removed on exit.
    with tempfile.TemporaryDirectory(prefix="phylognn_quickstart_") as temp_dir:
        model = TinyGraphRegressor(input_dim=data.x.size(1))
        config = TrainingConfig(
            epochs=2,
            batch_size=1,
            learning_rate=1e-2,
            weight_decay=0.0,
            scheduler=None,
            early_stopping_patience=None,
            save_dir=str(Path(temp_dir)),
            save_best_only=False,
            verbose=False,
        )
        trainer = Trainer(model=model, config=config)
        # Train and predict on the same one-graph dataset.
        trainer.fit(train_dataset=[data])
        prediction = trainer.predict(dataset=[data])
        return float(prediction[0].detach().cpu().item())
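Because the output directory is a `tempfile.TemporaryDirectory`, nothing written to it survives the `with` block. The stdlib sketch below shows that behaviour with a placeholder file; the file name `checkpoint.txt` and the helper `write_dummy_checkpoint` are hypothetical and say nothing about what the trainer actually writes.

```python
import tempfile
from pathlib import Path


def write_dummy_checkpoint() -> Path:
    """Write a placeholder file into a temporary directory and return its path."""
    with tempfile.TemporaryDirectory(prefix="phylognn_quickstart_") as temp_dir:
        checkpoint = Path(temp_dir) / "checkpoint.txt"  # hypothetical file name
        checkpoint.write_text("weights placeholder")
        assert checkpoint.exists()  # present while the block is open
    # The directory and its contents are deleted when the block exits.
    return checkpoint


path = write_dummy_checkpoint()
print(path.exists())  # False
```

To keep checkpoints from a real run, point `save_dir` at a persistent directory instead.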
Expected output includes stable markers like these:
Quickstart training summary
x shape:
edge_index shape:
target shape:
batch ready: true
prediction:
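Since the markers are stable, a smoke test can check for them by substring rather than by comparing exact numbers. The sketch below assumes the script prints the markers to standard output; `missing_markers` and `run_quickstart` are hypothetical helpers, and the sample text uses `<value>` placeholders instead of real output.

```python
import subprocess
import sys
from typing import List

EXPECTED_MARKERS = [
    "Quickstart training summary",
    "x shape:",
    "edge_index shape:",
    "target shape:",
    "batch ready: true",
    "prediction:",
]


def missing_markers(output: str, markers: List[str] = EXPECTED_MARKERS) -> List[str]:
    """Return the markers that do not appear anywhere in the captured output."""
    return [marker for marker in markers if marker not in output]


def run_quickstart(script: str = "examples/quickstart_training.py") -> str:
    """Run the script from the repository root and return its stdout."""
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True, check=True
    )
    return result.stdout


# Made-up sample text for illustration only; real values will differ.
sample = "\n".join(
    [
        "Quickstart training summary",
        "x shape: <value>",
        "edge_index shape: <value>",
        "target shape: <value>",
        "batch ready: true",
    ]
)
print(missing_markers(sample))  # ['prediction:']
```

In a test, `missing_markers(run_quickstart())` should be an empty list when the run completes.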
Completion summary
At this point you have created a tree, attached deterministic features, converted it to graph data, validated the required fields, trained a tiny model, and printed a prediction.
Next steps
| Need | Go to |
|---|---|
| Prepare real trees and features | |
| Understand graph fields | |
| Configure datasets, splits, and TOML training | |
| Run complete scripts | |