Documentation
¶
Index ¶
- func FixedSplit(numBlocks, n int) []int
- func FormatTable(report EvalReport) string
- func HeadingSplit(blocks []model.Block) []int
- func Pk(ref, hyp []int, numBlocks, k int) float64
- func RandomSplit(numBlocks, numBoundaries int, seed int64) []int
- func SaveGold(path string, ann *GoldAnnotation) error
- func WindowDiff(ref, hyp []int, numBlocks, k int) float64
- type BoundaryScores
- type EvalReport
- type GoldAnnotation
- type GoldParams
- type MethodResult
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func FixedSplit ¶
FixedSplit places boundaries every n blocks, producing roughly equal-sized segments.
func FormatTable ¶
func FormatTable(report EvalReport) string
FormatTable renders an EvalReport as a human-readable table.
func HeadingSplit ¶
HeadingSplit places a boundary before every heading block. Returns sorted gap indices (the gap before the heading).
func Pk ¶
Pk computes the Pk evaluation metric for text segmentation. ref and hyp are sorted boundary indices (gap positions between blocks). numBlocks is the total number of blocks. k is the window half-width; 0 means auto (average reference segment length / 2). Returns a value in [0.0, 1.0] where lower is better.
func RandomSplit ¶
RandomSplit places numBoundaries boundaries at random gap positions. The seed parameter ensures reproducibility.
func SaveGold ¶
func SaveGold(path string, ann *GoldAnnotation) error
SaveGold writes a gold annotation to a JSON file.
func WindowDiff ¶
WindowDiff computes the WindowDiff metric, an improvement over Pk. It counts windows where the number of boundaries differs between ref and hyp. Returns a value in [0.0, 1.0] where lower is better.
Types ¶
type BoundaryScores ¶
type BoundaryScores struct {
Precision float64 `json:"precision"`
Recall float64 `json:"recall"`
F1 float64 `json:"f1"`
}
BoundaryScores holds precision, recall, and F1 for boundary detection.
func BoundaryPRF ¶
func BoundaryPRF(ref, hyp []int, numBlocks, tolerance int) BoundaryScores
BoundaryPRF computes precision, recall, and F1 for boundary detection. tolerance is the allowed distance (in blocks) for a match; 0 means exact match only.
type EvalReport ¶
type EvalReport struct {
DocID string `json:"doc_id"`
NumBlocks int `json:"num_blocks"`
Gold GoldAnnotation `json:"gold"`
Methods []MethodResult `json:"methods"`
}
EvalReport holds evaluation results for a document.
type GoldAnnotation ¶
type GoldAnnotation struct {
DocID string `json:"doc_id"`
Unit string `json:"unit"` // "block"
Boundaries []int `json:"boundaries"` // sorted gap indices
Params *GoldParams `json:"params,omitempty"` // recommended bench parameters
Notes map[string]string `json:"notes,omitempty"`
}
GoldAnnotation holds the reference segmentation for a document.
func LoadGold ¶
func LoadGold(path string) (*GoldAnnotation, error)
LoadGold reads a gold annotation from a JSON file.
type GoldParams ¶
type GoldParams struct {
EvalStage string `json:"eval_stage,omitempty"`
Tolerance *int `json:"tolerance,omitempty"`
K *int `json:"k,omitempty"`
MinGap *int `json:"min_gap,omitempty"`
MinTokens *int `json:"min_tokens,omitempty"`
MaxTokens *int `json:"max_tokens,omitempty"`
MaxLines *int `json:"max_lines,omitempty"`
Window *int `json:"window,omitempty"`
BlockK *int `json:"block_k,omitempty"`
Threshold *float64 `json:"threshold,omitempty"`
SplitCount *int `json:"split_count,omitempty"`
PackHeadingBarrier *int `json:"pack_heading_barrier,omitempty"`
PseudoHeading *bool `json:"pseudo_heading,omitempty"`
BoundaryHints *bool `json:"boundary_hints,omitempty"`
SectionLock *bool `json:"section_lock,omitempty"`
LockAfterHeading *bool `json:"lock_after_heading,omitempty"`
ParaSplitPattern *string `json:"para_split_pattern,omitempty"`
ForceBoundary *string `json:"force_boundary,omitempty"`
SuppressListBoundary *bool `json:"suppress_list_boundary,omitempty"`
AtomicBoundary *bool `json:"atomic_boundary,omitempty"`
BaselineMinGap *bool `json:"baseline_min_gap,omitempty"`
AllowBlockMismatch *bool `json:"allow_block_mismatch,omitempty"`
}
GoldParams holds recommended bench parameters for this annotation. All fields are optional; nil means "use default".
type MethodResult ¶
type MethodResult struct {
Name string `json:"name"`
Boundaries []int `json:"boundaries"`
NumSegs int `json:"num_segments"`
Pk float64 `json:"pk"`
WindowDiff float64 `json:"window_diff"`
Boundary BoundaryScores `json:"boundary_scores"`
}
MethodResult holds evaluation results for a single segmentation method.
func Evaluate ¶
func Evaluate(gold *GoldAnnotation, numBlocks int, hyp []int, name string, k, tolerance int) MethodResult
Evaluate computes all metrics for a single method against a gold standard.