mllf.cb.value_net

Value network for baseline estimation in REINFORCE.

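The value network learns to predict the expected reward for a given combination. This prediction serves as a state-dependent baseline b(s). Subtracting it from the observed reward leaves the expected policy gradient unchanged, because the baseline does not depend on the sampled action, but it typically reduces the variance of the REINFORCE updates. A learned baseline of this kind is the critic in Actor-Critic methods such as A2C and PPO.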
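A minimal sketch of the baseline idea follows. It is written in plain Python so it runs without a deep-learning framework, and it rests on several assumptions. A linear model `V(s) = w · s + b` stands in for the actual network. Each state is a made-up one-feature vector standing in for a combination's graph encoding. The helper names (`predict`, `fit_baseline`, `advantages`) are hypothetical and not part of `mllf.cb.value_net`.

```python
from statistics import mean, pvariance


def predict(w, b, s):
    """Linear stand-in (hypothetical) for the value network: V(s) = w . s + b."""
    return sum(wi * si for wi, si in zip(w, s)) + b


def fit_baseline(states, rewards, lr=0.1, epochs=500):
    """Regress V(s) onto observed rewards by full-batch gradient descent on MSE."""
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for s, r in zip(states, rewards):
            err = predict(w, b, s) - r  # d(err^2)/dV = 2 * err
            for i in range(dim):
                grad_w[i] += 2.0 * err * s[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def advantages(states, rewards, w, b):
    """A = R - V(s); REINFORCE with a baseline scales grad log pi(a|s) by A instead of R."""
    return [r - predict(w, b, s) for s, r in zip(states, rewards)]


# Toy data (made up): one feature per "combination", reward roughly 1 + 4 * feature.
states = [[0.0], [0.25], [0.5], [0.75], [1.0]]
rewards = [1.0, 2.1, 2.9, 4.05, 5.0]

w, b = fit_baseline(states, rewards)
adv = advantages(states, rewards, w, b)
print(f"reward variance:    {pvariance(rewards):.4f}")
print(f"advantage variance: {pvariance(adv):.4f}")
print(f"mean advantage:     {mean(adv):.4f}")
```

Fitting the baseline by regressing it onto observed rewards corresponds to the mean-squared-error critic loss used in A2C-style methods. A policy update would then weight each log-probability gradient by `adv` rather than by the raw reward.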

Classes

`QNetwork`(in_dim[, action_dim, hidden_dims])	Per-edge Q(s, a) critic for per-pair credit assignment.
`ValueNetwork`(emb_dim[, hidden_dims])	Value network that predicts expected reward from graph encoding.
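The two classes differ mainly in what they output. The following is a hedged sketch that assumes both are plain multilayer perceptrons; the real architectures are not documented here. `ValueNetwork` maps one graph embedding of size `emb_dim` to a single scalar V(s). `QNetwork` is applied to each edge's features of size `in_dim` and returns `action_dim` values, so every pair gets its own estimate instead of sharing one baseline. The `MLP` class, its default `hidden_dims`, and the sizes below are illustrative only.

```python
import random


class MLP:
    """Fully connected net: in_dim -> hidden_dims -> out_dim, ReLU on hidden layers."""

    def __init__(self, in_dim, out_dim, hidden_dims=(64, 64), seed=0):
        # hidden_dims default is a hypothetical choice, not the library's.
        rng = random.Random(seed)
        dims = [in_dim, *hidden_dims, out_dim]
        self.layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            bound = (1.0 / d_in) ** 0.5  # uniform init scaled by fan-in
            weights = [[rng.uniform(-bound, bound) for _ in range(d_in)] for _ in range(d_out)]
            self.layers.append((weights, [0.0] * d_out))

    def __call__(self, x):
        last = len(self.layers) - 1
        for i, (weights, bias) in enumerate(self.layers):
            x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
            if i < last:
                x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
        return x


emb_dim, in_dim, action_dim = 8, 6, 2  # illustrative sizes

# ValueNetwork analogue: one scalar V(s) per graph embedding.
value_net = MLP(emb_dim, 1, hidden_dims=(16, 16))
# QNetwork analogue: action_dim values Q(s, a) per edge, same weights for every edge.
q_net = MLP(in_dim, action_dim, hidden_dims=(16,), seed=1)

graph_embedding = [0.1 * i for i in range(emb_dim)]
edge_features = [[0.05 * (i + j) for i in range(in_dim)] for j in range(3)]  # 3 edges

v = value_net(graph_embedding)[0]
q_per_edge = [q_net(e) for e in edge_features]
print("V(s) =", round(v, 4))
for j, q in enumerate(q_per_edge):
    print(f"edge {j}: Q(s, a) =", [round(x, 4) for x in q])
```

Sharing one set of Q weights across edges keeps the critic's size independent of the number of pairs in a combination.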