eval

package
v0.1.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 9, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FixedSplit

func FixedSplit(numBlocks, n int) []int

FixedSplit places boundaries every n blocks, producing roughly equal-sized segments.

func FormatTable

func FormatTable(report EvalReport) string

FormatTable renders an EvalReport as a human-readable table.

func HeadingSplit

func HeadingSplit(blocks []model.Block) []int

HeadingSplit places a boundary before every heading block. Returns sorted gap indices (the gap before the heading).

func Pk

func Pk(ref, hyp []int, numBlocks, k int) float64

Pk computes the Pk evaluation metric for text segmentation. ref and hyp are sorted boundary indices (gap positions between blocks). numBlocks is the total number of blocks. k is the window half-width; 0 means auto (average reference segment length / 2). Returns a value in [0.0, 1.0] where lower is better.

func RandomSplit

func RandomSplit(numBlocks, numBoundaries int, seed int64) []int

RandomSplit places numBoundaries boundaries at random gap positions. The seed parameter ensures reproducibility.

func SaveGold

func SaveGold(path string, ann *GoldAnnotation) error

SaveGold writes a gold annotation to a JSON file.

func WindowDiff

func WindowDiff(ref, hyp []int, numBlocks, k int) float64

WindowDiff computes the WindowDiff metric, an improvement over Pk. It counts windows where the number of boundaries differs between ref and hyp. Returns a value in [0.0, 1.0] where lower is better.

Types

type BoundaryScores

type BoundaryScores struct {
	Precision float64 `json:"precision"`
	Recall    float64 `json:"recall"`
	F1        float64 `json:"f1"`
}

BoundaryScores holds precision, recall, and F1 for boundary detection.

func BoundaryPRF

func BoundaryPRF(ref, hyp []int, numBlocks, tolerance int) BoundaryScores

BoundaryPRF computes precision, recall, and F1 for boundary detection. tolerance is the allowed distance (in blocks) for a match; 0 means exact match only.

type EvalReport

type EvalReport struct {
	DocID     string         `json:"doc_id"`
	NumBlocks int            `json:"num_blocks"`
	Gold      GoldAnnotation `json:"gold"`
	Methods   []MethodResult `json:"methods"`
}

EvalReport holds evaluation results for a document.

type GoldAnnotation

type GoldAnnotation struct {
	DocID      string            `json:"doc_id"`
	Unit       string            `json:"unit"`             // "block"
	Boundaries []int             `json:"boundaries"`       // sorted gap indices
	Params     *GoldParams       `json:"params,omitempty"` // recommended bench parameters
	Notes      map[string]string `json:"notes,omitempty"`
}

GoldAnnotation holds the reference segmentation for a document.

func LoadGold

func LoadGold(path string) (*GoldAnnotation, error)

LoadGold reads a gold annotation from a JSON file.

type GoldParams

type GoldParams struct {
	EvalStage            string   `json:"eval_stage,omitempty"`
	Tolerance            *int     `json:"tolerance,omitempty"`
	K                    *int     `json:"k,omitempty"`
	MinGap               *int     `json:"min_gap,omitempty"`
	MinTokens            *int     `json:"min_tokens,omitempty"`
	MaxTokens            *int     `json:"max_tokens,omitempty"`
	MaxLines             *int     `json:"max_lines,omitempty"`
	Window               *int     `json:"window,omitempty"`
	BlockK               *int     `json:"block_k,omitempty"`
	Threshold            *float64 `json:"threshold,omitempty"`
	SplitCount           *int     `json:"split_count,omitempty"`
	PackHeadingBarrier   *int     `json:"pack_heading_barrier,omitempty"`
	PseudoHeading        *bool    `json:"pseudo_heading,omitempty"`
	BoundaryHints        *bool    `json:"boundary_hints,omitempty"`
	SectionLock          *bool    `json:"section_lock,omitempty"`
	LockAfterHeading     *bool    `json:"lock_after_heading,omitempty"`
	ParaSplitPattern     *string  `json:"para_split_pattern,omitempty"`
	ForceBoundary        *string  `json:"force_boundary,omitempty"`
	SuppressListBoundary *bool    `json:"suppress_list_boundary,omitempty"`
	AtomicBoundary       *bool    `json:"atomic_boundary,omitempty"`
	BaselineMinGap       *bool    `json:"baseline_min_gap,omitempty"`
	AllowBlockMismatch   *bool    `json:"allow_block_mismatch,omitempty"`
}

GoldParams holds recommended bench parameters for this annotation. All fields are optional; nil means "use default".

type MethodResult

type MethodResult struct {
	Name       string         `json:"name"`
	Boundaries []int          `json:"boundaries"`
	NumSegs    int            `json:"num_segments"`
	Pk         float64        `json:"pk"`
	WindowDiff float64        `json:"window_diff"`
	Boundary   BoundaryScores `json:"boundary_scores"`
}

MethodResult holds evaluation results for a single segmentation method.

func Evaluate

func Evaluate(gold *GoldAnnotation, numBlocks int, hyp []int, name string, k, tolerance int) MethodResult

Evaluate computes all metrics for a single method against a gold standard.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL