gsx

v0.0.0-...-c4d05e6
Published: Jan 12, 2026 License: MIT Imports: 15 Imported by: 0

README

gsx

An HTML manipulation toolkit

gsx is a lightweight HTML toolkit for parsing, manipulating, and generating HTML.

Examples

Examples can be found in the ./examples directory.

When should I use this?

This is not a "browser-grade" HTML parser, but it is close!

Specifically, the tokenizer is spec compliant and passes all the html5lib tokenizer tests. So gsx will accept any valid HTML "construct", such as numeric and named character references and void elements.

However, the tree-builder does not follow the spec. This was done on purpose. A spec-compliant tree-builder may restructure your markup for a multitude of reasons: badly nested tags, child elements that don't conform to the content model of their parent, missing end tags, and so on. The tree-builder in this module takes a simpler approach: it will parse any well-balanced HTML and output a tree that corresponds to that markup, exactly as written.

So this library will work well when you are parsing the output of HTML-generating tools like SSGs or Markdown parsers. Tools like these don't forget to add end tags :)

On the other hand, parsing random web content is more of a gamble. For example, many sites rely on the fact that you do not need to close your <p> tags. This library will fail on such markup.

TL;DR: If your HTML looks like well-formed XML if you squint, this library's HTML parser is for you.

Documentation

Overview

Example (Attributes)
root, err := Parse(`<input name="foo" value="bar" readonly>`)
if err != nil {
	panic(err)
}

input := root.Descendants().Filter(`input[name=foo]`).First()

{
	v, ok := input.Attr("value")
	fmt.Printf("value=%q/%v\n", v, ok)
}
{
	v, ok := input.Attr("readonly")
	fmt.Printf("readonly=%q/%v\n", v, ok)
}
{
	v, ok := input.Attr("blah")
	fmt.Printf("blah=%q/%v\n", v, ok)
}
Output:

value="bar"/true
readonly=""/true
blah=""/false
Example (Generating)
form := Form(
	Attrs{"method": "GET"},
	Input(Attrs{"value": "hello", "type": "text", "readonly": ""}),
	Button(nil, Text("Submit")),
)

fmt.Println(form.Html())
Output:

<form method="GET"><input readonly="" type="text" value="hello"><button>Submit</button></form>
Example (Manipulating)
root, err := Parse(`<html><body>hello<p class="hello">REMOVE ME</p></body></html>`)
if err != nil {
	panic(err)
}

target := root.Descendants().Filter(`.hello`).First()
target.Detach()

fmt.Println(root.Html())
Output:

<html><body>hello</body></html>
Example (Parsing)
root, err := Parse("<h1>Hello, <i>world!</i></h1>")
if err != nil {
	panic(err)
}
fmt.Println(root.Html())
Output:

<h1>Hello, <i>world!</i></h1>
Example (Selecting)
root, err := Parse(`
	  	<ul>
	        <li>Foo</li>
	        <li>Bar</li>
	    </ul>
		<div>
			<span>One</span>
			<span>Two</span>
		</div>
	`)
if err != nil {
	panic(err)
}

fmt.Println("===")
base := root.Descendants()
liElements := base.Filter("ul > li")
spanElements := base.Filter("div > span")
for el := range liElements.Chain(spanElements) {
	fmt.Println(el.TextContent())
}

fmt.Println("===")
// The selections were broken up to emphasise they are composable.
// But you could also do:
for el := range root.Descendants().Filter("ul > li").Chain(root.Descendants().Filter("div > span")) {
	fmt.Println(el.TextContent())
}
Output:

===
Foo
Bar
One
Two
===
Foo
Bar
One
Two

Constants

This section is empty.

Variables

var (
	// ex: </span>
	ErrEndTagWithoutCorrespondingStartTag = "no-matching-start-tag"
	// ex: <div>
	ErrUnclosedStartTag = "unclosed-start-tag"
	// ex: </ link>
	// ex: </ link/>
	ErrVoidElementAsEndTag = "void-element-as-end-tag"
	// ex: <div />
	ErrNonVoidElementStartTagWithTrailingSolidus = "non-void-element-with-trailing-solidus"
)

The ParserError.Code variants that Parse might return.
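
To make the four codes concrete, here is a minimal sketch of a balance check over an already-tokenized tag stream. It is not the library's tree builder: the `tag` type, the `check` function, and the abbreviated void-element list are assumptions made for illustration, and only the returned strings are taken from the variables above. The real parser may pick a different code in ambiguous cases such as mis-nested tags.

```go
package main

import "fmt"

// tag is a hypothetical, already-tokenized start or end tag.
type tag struct {
	name        string
	end         bool // true for </name>
	selfClosing bool // true for a trailing solidus, as in <name />
}

// void lists a few void elements; the full HTML list is longer.
var void = map[string]bool{"br": true, "hr": true, "img": true, "input": true, "link": true, "meta": true}

// check returns the first error code found, or "" when the tags are balanced.
func check(tags []tag) string {
	var open []string // stack of currently open element names
	for _, t := range tags {
		switch {
		case t.end && void[t.name]:
			return "void-element-as-end-tag"
		case t.end:
			if len(open) == 0 || open[len(open)-1] != t.name {
				return "no-matching-start-tag"
			}
			open = open[:len(open)-1]
		case void[t.name]:
			// Void elements never have children, so nothing is pushed.
		case t.selfClosing:
			return "non-void-element-with-trailing-solidus"
		default:
			open = append(open, t.name)
		}
	}
	if len(open) > 0 {
		return "unclosed-start-tag"
	}
	return ""
}

func main() {
	fmt.Println(check([]tag{{name: "div"}, {name: "br"}, {name: "div", end: true}}) == "") // true
	fmt.Println(check([]tag{{name: "span", end: true}}))                                  // no-matching-start-tag
	fmt.Println(check([]tag{{name: "div"}}))                                              // unclosed-start-tag
	fmt.Println(check([]tag{{name: "link", end: true}}))                                  // void-element-as-end-tag
	fmt.Println(check([]tag{{name: "div", selfClosing: true}}))                           // non-void-element-with-trailing-solidus
}
```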

Functions

This section is empty.

Types

type Attrs

type Attrs map[string]string

A set of HTML element attributes, keyed by attribute name.
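
Go maps have no defined iteration order, yet the Generating example prints its attributes alphabetically (readonly, type, value). The sketch below shows one common way to get deterministic output from a map. The `renderAttrs` helper is hypothetical, and both the sorting and the use of `html.EscapeString` for values are assumptions about how such a serializer could work, not a description of gsx internals.

```go
package main

import (
	"fmt"
	"html"
	"sort"
	"strings"
)

// renderAttrs is a hypothetical helper that writes attributes in sorted key order.
func renderAttrs(attrs map[string]string) string {
	names := make([]string, 0, len(attrs))
	for name := range attrs {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order regardless of map iteration order

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, ` %s="%s"`, name, html.EscapeString(attrs[name]))
	}
	return b.String()
}

func main() {
	attrs := map[string]string{"value": "hello", "type": "text", "readonly": ""}
	fmt.Println("<input" + renderAttrs(attrs) + ">")
	// Prints: <input readonly="" type="text" value="hello">
}
```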

type Node

type Node struct {
	// contains filtered or unexported fields
}

An object within an HTML document.

There are 5 types of nodes (see NodeKind). Nodes can have at most one parent, zero or more siblings and zero or more children.

func A

func A(attributes Attrs, children ...*Node) *Node

Returns a new '<a>' node.

func Abbr

func Abbr(attributes Attrs, children ...*Node) *Node

Returns a new '<abbr>' node.

func Address

func Address(attributes Attrs, children ...*Node) *Node

Returns a new '<address>' node.

func Area

func Area(attributes Attrs, children ...*Node) *Node

Returns a new '<area>' node.

func Article

func Article(attributes Attrs, children ...*Node) *Node

Returns a new '<article>' node.

func Aside

func Aside(attributes Attrs, children ...*Node) *Node

Returns a new '<aside>' node.

func Audio

func Audio(attributes Attrs, children ...*Node) *Node

Returns a new '<audio>' node.

func B

func B(attributes Attrs, children ...*Node) *Node

Returns a new '<b>' node.

func Base

func Base(attributes Attrs, children ...*Node) *Node

Returns a new '<base>' node.

func Bdi

func Bdi(attributes Attrs, children ...*Node) *Node

Returns a new '<bdi>' node.

func Bdo

func Bdo(attributes Attrs, children ...*Node) *Node

Returns a new '<bdo>' node.

func Blockquote

func Blockquote(attributes Attrs, children ...*Node) *Node

Returns a new '<blockquote>' node.

func Body

func Body(attributes Attrs, children ...*Node) *Node

Returns a new '<body>' node.

func Br

func Br(attributes Attrs, children ...*Node) *Node

Returns a new '<br>' node.

func Button

func Button(attributes Attrs, children ...*Node) *Node

Returns a new '<button>' node.

func Canvas

func Canvas(attributes Attrs, children ...*Node) *Node

Returns a new '<canvas>' node.

func Caption

func Caption(attributes Attrs, children ...*Node) *Node

Returns a new '<caption>' node.

func Cite

func Cite(attributes Attrs, children ...*Node) *Node

Returns a new '<cite>' node.

func Code

func Code(attributes Attrs, children ...*Node) *Node

Returns a new '<code>' node.

func Col

func Col(attributes Attrs, children ...*Node) *Node

Returns a new '<col>' node.

func Colgroup

func Colgroup(attributes Attrs, children ...*Node) *Node

Returns a new '<colgroup>' node.

func CollectFragment

func CollectFragment(iter Selection) *Node

Like Fragment, but accepts a Node iterator instead.

func Data

func Data(attributes Attrs, children ...*Node) *Node

Returns a new '<data>' node.

func Datalist

func Datalist(attributes Attrs, children ...*Node) *Node

Returns a new '<datalist>' node.

func Dd

func Dd(attributes Attrs, children ...*Node) *Node

Returns a new '<dd>' node.

func Del

func Del(attributes Attrs, children ...*Node) *Node

Returns a new '<del>' node.

func Details

func Details(attributes Attrs, children ...*Node) *Node

Returns a new '<details>' node.

func Dfn

func Dfn(attributes Attrs, children ...*Node) *Node

Returns a new '<dfn>' node.

func Dialog

func Dialog(attributes Attrs, children ...*Node) *Node

Returns a new '<dialog>' node.

func Div

func Div(attributes Attrs, children ...*Node) *Node

Returns a new '<div>' node.

func Dl

func Dl(attributes Attrs, children ...*Node) *Node

Returns a new '<dl>' node.

func Doctype

func Doctype() *Node

Returns a new '<!DOCTYPE html>' node.

func Dt

func Dt(attributes Attrs, children ...*Node) *Node

Returns a new '<dt>' node.

func Em

func Em(attributes Attrs, children ...*Node) *Node

Returns a new '<em>' node.

func Embed

func Embed(attributes Attrs, children ...*Node) *Node

Returns a new '<embed>' node.

func Fieldset

func Fieldset(attributes Attrs, children ...*Node) *Node

Returns a new '<fieldset>' node.

func Figcaption

func Figcaption(attributes Attrs, children ...*Node) *Node

Returns a new '<figcaption>' node.

func Figure

func Figure(attributes Attrs, children ...*Node) *Node

Returns a new '<figure>' node.

func Footer

func Footer(attributes Attrs, children ...*Node) *Node

Returns a new '<footer>' node.

func Form

func Form(attributes Attrs, children ...*Node) *Node

Returns a new '<form>' node.

func Fragment

func Fragment(nodes ...*Node) *Node

Returns a NodeKindFragment node with the given children.

Fragment nodes allow you to group multiple nodes without adding an extra wrapper element.

When you append a fragment node to another node, the fragment node "disappears" and its children are appended instead. The same applies when inserting.

Fragment nodes never have a [Parent], so calling [InsertBefore] or [InsertAfter] with a fragment node as a receiver will panic.
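
The splicing behaviour described above can be sketched with a toy tree. The `node` type and `appendChildren` method are hypothetical stand-ins, not the package's Node or Append, and leaving the fragment empty after the move is an assumption.

```go
package main

import "fmt"

// node is a hypothetical stand-in for the package's Node type.
type node struct {
	name     string
	fragment bool
	parent   *node
	children []*node
}

// appendChildren adds children to n, splicing in the children of any fragment.
func (n *node) appendChildren(children ...*node) {
	for _, c := range children {
		if c.fragment {
			// Assumption: the fragment's children move and the fragment ends up empty.
			moved := c.children
			c.children = nil
			n.appendChildren(moved...)
			continue
		}
		c.parent = n
		n.children = append(n.children, c)
	}
}

func main() {
	frag := &node{fragment: true}
	frag.appendChildren(&node{name: "li"}, &node{name: "li"})

	ul := &node{name: "ul"}
	ul.appendChildren(frag)

	// The <ul> receives the two <li> nodes directly, with no wrapper in between.
	fmt.Println(len(ul.children), len(frag.children)) // 2 0
}
```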

func H1

func H1(attributes Attrs, children ...*Node) *Node

Returns a new '<h1>' node.

func H2

func H2(attributes Attrs, children ...*Node) *Node

Returns a new '<h2>' node.

func H3

func H3(attributes Attrs, children ...*Node) *Node

Returns a new '<h3>' node.

func H4

func H4(attributes Attrs, children ...*Node) *Node

Returns a new '<h4>' node.

func H5

func H5(attributes Attrs, children ...*Node) *Node

Returns a new '<h5>' node.

func H6

func H6(attributes Attrs, children ...*Node) *Node

Returns a new '<h6>' node.

func Head

func Head(attributes Attrs, children ...*Node) *Node

Returns a new '<head>' node.

func Header

func Header(attributes Attrs, children ...*Node) *Node

Returns a new '<header>' node.

func Hgroup

func Hgroup(attributes Attrs, children ...*Node) *Node

Returns a new '<hgroup>' node.

func Hr

func Hr(attributes Attrs, children ...*Node) *Node

Returns a new '<hr>' node.

func Html

func Html(attributes Attrs, children ...*Node) *Node

Returns a new '<html>' node.

func I

func I(attributes Attrs, children ...*Node) *Node

Returns a new '<i>' node.

func Iframe

func Iframe(attributes Attrs, text string) *Node

Returns a new '<iframe>' node.

func Img

func Img(attributes Attrs, children ...*Node) *Node

Returns a new '<img>' node.

func Input

func Input(attributes Attrs, children ...*Node) *Node

Returns a new '<input>' node.

func Ins

func Ins(attributes Attrs, children ...*Node) *Node

Returns a new '<ins>' node.

func Kbd

func Kbd(attributes Attrs, children ...*Node) *Node

Returns a new '<kbd>' node.

func Label

func Label(attributes Attrs, children ...*Node) *Node

Returns a new '<label>' node.

func Legend

func Legend(attributes Attrs, children ...*Node) *Node

Returns a new '<legend>' node.

func Li

func Li(attributes Attrs, children ...*Node) *Node

Returns a new '<li>' node.

func Link

func Link(attributes Attrs, children ...*Node) *Node

Returns a new '<link>' node.

func Main

func Main(attributes Attrs, children ...*Node) *Node

Returns a new '<main>' node.

func Map

func Map(attributes Attrs, children ...*Node) *Node

Returns a new '<map>' node.

func Mark

func Mark(attributes Attrs, children ...*Node) *Node

Returns a new '<mark>' node.

func Menu

func Menu(attributes Attrs, children ...*Node) *Node

Returns a new '<menu>' node.

func Meta

func Meta(attributes Attrs, children ...*Node) *Node

Returns a new '<meta>' node.

func Meter

func Meter(attributes Attrs, children ...*Node) *Node

Returns a new '<meter>' node.

func Nav

func Nav(attributes Attrs, children ...*Node) *Node

Returns a new '<nav>' node.

func Noscript

func Noscript(attributes Attrs, text string) *Node

Returns a new '<noscript>' node.

func Object

func Object(attributes Attrs, children ...*Node) *Node

Returns a new '<object>' node.

func Ol

func Ol(attributes Attrs, children ...*Node) *Node

Returns a new '<ol>' node.

func Optgroup

func Optgroup(attributes Attrs, children ...*Node) *Node

Returns a new '<optgroup>' node.

func Option

func Option(attributes Attrs, children ...*Node) *Node

Returns a new '<option>' node.

func Output

func Output(attributes Attrs, children ...*Node) *Node

Returns a new '<output>' node.

func P

func P(attributes Attrs, children ...*Node) *Node

Returns a new '<p>' node.

func Parse

func Parse(html string) (*Node, error)

Parses the given HTML fragment.

Parsing is done in two phases, tokenization and tree construction:

  • The tokenization phase is spec compliant, so it does not fail hard. Instead, errors are recovered from in a spec-compliant way, and the input is still converted to tokens and emitted to the next phase.
  • The tree construction phase implements a very small subset of the spec and fails at the first ParserError, which is returned to the caller.

func Picture

func Picture(attributes Attrs, children ...*Node) *Node

Returns a new '<picture>' node.

func Pre

func Pre(attributes Attrs, children ...*Node) *Node

Returns a new '<pre>' node.

func Progress

func Progress(attributes Attrs, children ...*Node) *Node

Returns a new '<progress>' node.

func Q

func Q(attributes Attrs, children ...*Node) *Node

Returns a new '<q>' node.

func RawText

func RawText(text string) *Node

Returns a NodeKindText node with the given contents.

Unlike Text, text passed in to this function will be serialized verbatim.

This makes it useful for inlining SVGs, for example.
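
The difference between escaped and verbatim text can be illustrated with the standard library alone. In this sketch `escaped` and `verbatim` are hypothetical helpers, and `html.EscapeString` stands in for whatever escaping rules the package applies to Text nodes, which may differ in detail (for example in how quotes are handled).

```go
package main

import (
	"fmt"
	"html"
)

// escaped mimics a text node whose content is escaped on output.
func escaped(text string) string { return html.EscapeString(text) }

// verbatim mimics a raw text node whose content is written unchanged.
func verbatim(text string) string { return text }

func main() {
	svg := `<svg width="8"></svg>`
	fmt.Println(escaped(svg))  // &lt;svg width=&#34;8&#34;&gt;&lt;/svg&gt;
	fmt.Println(verbatim(svg)) // <svg width="8"></svg>
}
```

Because verbatim output bypasses escaping, it is only safe for markup you trust.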

func Rp

func Rp(attributes Attrs, children ...*Node) *Node

Returns a new '<rp>' node.

func Rt

func Rt(attributes Attrs, children ...*Node) *Node

Returns a new '<rt>' node.

func Ruby

func Ruby(attributes Attrs, children ...*Node) *Node

Returns a new '<ruby>' node.

func S

func S(attributes Attrs, children ...*Node) *Node

Returns a new '<s>' node.

func Samp

func Samp(attributes Attrs, children ...*Node) *Node

Returns a new '<samp>' node.

func Script

func Script(attributes Attrs, text string) *Node

Returns a new '<script>' node.

func Search

func Search(attributes Attrs, children ...*Node) *Node

Returns a new '<search>' node.

func Section

func Section(attributes Attrs, children ...*Node) *Node

Returns a new '<section>' node.

func Select

func Select(attributes Attrs, children ...*Node) *Node

Returns a new '<select>' node.

func Selectedcontent

func Selectedcontent(attributes Attrs, children ...*Node) *Node

Returns a new '<selectedcontent>' node.

func Slot

func Slot(attributes Attrs, children ...*Node) *Node

Returns a new '<slot>' node.

func Small

func Small(attributes Attrs, children ...*Node) *Node

Returns a new '<small>' node.

func Source

func Source(attributes Attrs, children ...*Node) *Node

Returns a new '<source>' node.

func Span

func Span(attributes Attrs, children ...*Node) *Node

Returns a new '<span>' node.

func Strong

func Strong(attributes Attrs, children ...*Node) *Node

Returns a new '<strong>' node.

func Style

func Style(attributes Attrs, text string) *Node

Returns a new '<style>' node.

func Sub

func Sub(attributes Attrs, children ...*Node) *Node

Returns a new '<sub>' node.

func Summary

func Summary(attributes Attrs, children ...*Node) *Node

Returns a new '<summary>' node.

func Sup

func Sup(attributes Attrs, children ...*Node) *Node

Returns a new '<sup>' node.

func Table

func Table(attributes Attrs, children ...*Node) *Node

Returns a new '<table>' node.

func Tag

func Tag(name string, attributes Attrs, children ...*Node) *Node

Creates an HTML element node with the given tag name, attributes, and children.

func Tbody

func Tbody(attributes Attrs, children ...*Node) *Node

Returns a new '<tbody>' node.

func Td

func Td(attributes Attrs, children ...*Node) *Node

Returns a new '<td>' node.

func Template

func Template(attributes Attrs, children ...*Node) *Node

Returns a new '<template>' node.

func Text

func Text(text string) *Node

Returns a NodeKindText node with the given contents.

The text will be properly escaped when serializing to HTML.

func Textarea

func Textarea(attributes Attrs, text string) *Node

Returns a new '<textarea>' node.

func Tfoot

func Tfoot(attributes Attrs, children ...*Node) *Node

Returns a new '<tfoot>' node.

func Th

func Th(attributes Attrs, children ...*Node) *Node

Returns a new '<th>' node.

func Thead

func Thead(attributes Attrs, children ...*Node) *Node

Returns a new '<thead>' node.

func Time

func Time(attributes Attrs, children ...*Node) *Node

Returns a new '<time>' node.

func Title

func Title(attributes Attrs, text string) *Node

Returns a new '<title>' node.

func Tr

func Tr(attributes Attrs, children ...*Node) *Node

Returns a new '<tr>' node.

func Track

func Track(attributes Attrs, children ...*Node) *Node

Returns a new '<track>' node.

func U

func U(attributes Attrs, children ...*Node) *Node

Returns a new '<u>' node.

func Ul

func Ul(attributes Attrs, children ...*Node) *Node

Returns a new '<ul>' node.

func Var

func Var(attributes Attrs, children ...*Node) *Node

Returns a new '<var>' node.

func Video

func Video(attributes Attrs, children ...*Node) *Node

Returns a new '<video>' node.

func Wbr

func Wbr(attributes Attrs, children ...*Node) *Node

Returns a new '<wbr>' node.

func (*Node) Ancestors

func (self *Node) Ancestors() Selection

Returns an iterator of ancestor nodes.

func (*Node) Append

func (self *Node) Append(children ...*Node)

Inserts children after this node's last child.

Inserting a node of kind NodeKindFragment inserts its children instead.

Only NodeKindFragment and NodeKindElement nodes can have children, so this method panics if the receiver's kind is something else.

func (*Node) Attr

func (self *Node) Attr(name string) (string, bool)

Returns the value of the attribute with the given name.

The name comparison is done case insensitively.

If there is no such attribute, or if the receiver isn't a NodeKindElement, the returned boolean is false.

func (*Node) Children

func (self *Node) Children() Selection

Returns an iterator of child nodes.

func (*Node) Descendants

func (self *Node) Descendants() Selection

Returns an iterator of descendant nodes.

func (*Node) Detach

func (self *Node) Detach()

Orphans this node by detaching it from its parent and siblings.

Children are not affected.

This has no effect on nodes of kind NodeKindFragment.

func (*Node) FirstChild

func (self *Node) FirstChild() *Node

Returns this node's first child, if any.

func (*Node) Following

func (self *Node) Following() Selection

Returns an iterator of sibling nodes after this one.

func (*Node) Html

func (self *Node) Html() string

Returns the HTML representation of the node.

This is the same thing as calling Node.Serialize with a strings.Builder.
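
The relationship between the two methods can be sketched as follows. The `textNode` type is a hypothetical stand-in that only carries pre-rendered markup; it illustrates the pattern of building a string-returning method on top of an `io.Writer` based one, not the package's actual implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// textNode is a hypothetical stand-in for a node that can serialize itself.
type textNode struct{ markup string }

// Serialize writes the node's markup into sink.
func (n textNode) Serialize(sink io.Writer) error {
	_, err := io.WriteString(sink, n.markup)
	return err
}

// Html serializes into an in-memory builder and returns the result.
func (n textNode) Html() string {
	var b strings.Builder
	// Writes to a strings.Builder never fail, so the error can be dropped here.
	_ = n.Serialize(&b)
	return b.String()
}

func main() {
	n := textNode{markup: "<h1>Hello</h1>"}
	fmt.Println(n.Html()) // <h1>Hello</h1>
}
```

A writer-based method is the better fit when the destination is a file or an HTTP response, since it avoids building the whole document in memory first.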

func (*Node) InsertAfter

func (self *Node) InsertAfter(siblings ...*Node)

Inserts siblings after this node.

Inserting a node of kind NodeKindFragment inserts its children instead.

This method panics if the receiver does not have a parent.

func (*Node) InsertBefore

func (self *Node) InsertBefore(siblings ...*Node)

Inserts siblings before this node.

Inserting a node of kind NodeKindFragment inserts its children instead.

This method panics if the receiver does not have a parent.

func (*Node) Kind

func (self *Node) Kind() NodeKind

func (*Node) LastChild

func (self *Node) LastChild() *Node

Returns this node's last child, if any.

func (*Node) Name

func (self *Node) Name() string

Returns the node's name.

For NodeKindElement, this returns the tag name. For NodeKindDoctype, this returns the doctype name. For all other node kinds, this returns an empty string.

func (*Node) Next

func (self *Node) Next() *Node

Returns this node's next sibling, if any.

func (*Node) Parent

func (self *Node) Parent() *Node

Returns this node's parent, if any.

Nodes of kind NodeKindFragment never have a parent.

func (*Node) Preceding

func (self *Node) Preceding() Selection

Returns an iterator of sibling nodes preceding this one.

func (*Node) Prepend

func (self *Node) Prepend(children ...*Node)

Inserts children before this node's first child.

Inserting a node of kind NodeKindFragment inserts its children instead.

Only NodeKindFragment and NodeKindElement nodes can have children, so this method panics if the receiver's kind is something else.

func (*Node) Previous

func (self *Node) Previous() *Node

Returns this node's previous sibling, if any.

func (*Node) RemoveAttr

func (self *Node) RemoveAttr(name string)

Removes the attribute with the given name from the node.

func (*Node) ReplaceChildren

func (self *Node) ReplaceChildren(newChildren ...*Node)

Replaces the existing children of the receiver with a new set of children.

Only NodeKindFragment and NodeKindElement nodes can have children, so this method panics if the receiver's kind is something else.

func (*Node) ReverseChildren

func (self *Node) ReverseChildren() Selection

Returns an iterator of child nodes in reverse order.

func (*Node) Serialize

func (self *Node) Serialize(sink io.Writer) error

Writes the receiver and its descendants as HTML to the writer.

func (*Node) SetAttr

func (self *Node) SetAttr(name, value string)

Sets the value of the given attribute on the node.

If the element already had an attribute by this name, it is overridden. Attribute names are case-insensitive, so this rule applies even if the existing attribute was inserted using a different case.
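
One simple way to obtain this behaviour is to normalize names before storing and looking them up. The `attrs` type below is a hypothetical sketch; whether gsx lower-cases names or compares them some other way is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// attrs is a hypothetical attribute store keyed by lower-cased name.
type attrs map[string]string

// set stores value under the lower-cased name, replacing any earlier entry.
func (a attrs) set(name, value string) { a[strings.ToLower(name)] = value }

// get looks the name up case-insensitively.
func (a attrs) get(name string) (string, bool) {
	v, ok := a[strings.ToLower(name)]
	return v, ok
}

func main() {
	a := attrs{}
	a.set("Data-Foo", "1")
	a.set("data-foo", "2") // overrides the entry written above

	v, ok := a.get("DATA-FOO")
	fmt.Println(v, ok, len(a)) // 2 true 1
}
```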

func (*Node) String

func (self *Node) String() string

Node implements the fmt.Stringer interface.

func (*Node) TextContent

func (self *Node) TextContent() string

Returns the text content of this node and its descendants.

If the receiver is a NodeKindComment, this returns the comment contents.

Returns an empty string if the receiver is a NodeKindDoctype.

type NodeKind

type NodeKind uint8

The type of the node.

const (
	// An HTML tag
	NodeKindElement NodeKind = iota
	// Text within an HTML element.
	NodeKindText
	// An HTML comment  (e.g. `<!-- blah -->`)
	NodeKindComment
	// A doctype element (e.g. `<!DOCTYPE html>`)
	NodeKindDoctype
	// A synthetic node that exists purely to contain the other nodes.
	NodeKindFragment
)

func (NodeKind) String

func (self NodeKind) String() string

NodeKind implements the fmt.Stringer interface.

type ParserError

type ParserError struct {
	// One of:
	// 		- [ErrEndTagWithoutCorrespondingStartTag]
	// 		- [ErrUnclosedStartTag]
	// 		- [ErrVoidElementAsEndTag]
	// 		- [ErrNonVoidElementStartTagWithTrailingSolidus]
	Code string
	// The byte offset into [ParserError.Markup] at which the error occurred
	Offset int
	// The HTML source.
	// The parser normalizes the HTML input according to the spec, so this might differ slightly from what you provided to [Parse].
	Markup string
}

An error that occurred while parsing HTML.

func (*ParserError) Error

func (self *ParserError) Error() string
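
A caller will often want to show the markup around the failure. The sketch below assumes the error can be unwrapped to a pointer type with `errors.As`; the lower-case `parserError` is a local stand-in that mirrors the documented fields, and `describe` and the ten-byte window are hypothetical choices.

```go
package main

import (
	"errors"
	"fmt"
)

// parserError is a local stand-in that mirrors the documented fields.
type parserError struct {
	Code   string
	Offset int
	Markup string
}

func (e *parserError) Error() string {
	return fmt.Sprintf("%s at byte %d", e.Code, e.Offset)
}

// describe extracts a parserError from err and quotes the markup around the offset.
func describe(err error) string {
	var pe *parserError
	if !errors.As(err, &pe) {
		return err.Error()
	}
	// Clamp everything so that an out-of-range offset cannot cause a panic.
	off := min(max(pe.Offset, 0), len(pe.Markup))
	start := max(off-10, 0)
	end := min(off+10, len(pe.Markup))
	return fmt.Sprintf("%s near %q", pe.Code, pe.Markup[start:end])
}

func main() {
	markup := `<ul><li>Foo</li></span></ul>`
	err := fmt.Errorf("parse failed: %w", &parserError{
		Code:   "no-matching-start-tag",
		Offset: 16, // byte index of "</span>" in markup
		Markup: markup,
	})
	fmt.Println(describe(err))
}
```

Slicing by byte offset can split a multi-byte character; `%q` escapes any invalid bytes, which keeps the message printable.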

type Selection

type Selection iter.Seq[*Node]

A Selection wraps an iterator of nodes. A nil Selection is empty.

Some methods on this type accept a CSS selector. If the selector is invalid, the method panics.

Such methods support a subset of regular CSS selectors. They will accept what the spec calls a complex selector. However, pseudo-elements, pseudo-classes (e.g. `:first-child`, `:hover`) and namespaces are not supported.

Here are some examples:

// Matches any node.
`*`

// Matches all <div> nodes.
`div`

// Matches all <button> nodes that are direct children of a <div> element.
`div > button`

// Matches all <div> nodes whose "data-foo" attribute starts with "bar".
`div[data-foo^="bar"]`

Type, universal, class, ID and attribute selectors are recognized. For attribute selectors though, the case-sensitivity flags are not supported.
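
For readers unfamiliar with attribute operators, the sketch below spells out what `^=` and a few of its siblings mean according to the CSS Selectors specification. The `matchAttr` function is hypothetical, and which operators gsx accepts beyond the `^=` shown in the examples is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// matchAttr reports whether an attribute value satisfies a CSS attribute operator.
// Only four operators are sketched; this is not the package's selector engine.
func matchAttr(op, value, want string) bool {
	switch op {
	case "=": // exact match
		return value == want
	case "^=": // prefix match; an empty operand matches nothing
		return want != "" && strings.HasPrefix(value, want)
	case "$=": // suffix match
		return want != "" && strings.HasSuffix(value, want)
	case "*=": // substring match
		return want != "" && strings.Contains(value, want)
	}
	return false
}

func main() {
	// Mirrors the selector div[data-foo^="bar"] from the examples above.
	fmt.Println(matchAttr("^=", "barista", "bar")) // true
	fmt.Println(matchAttr("^=", "rebar", "bar"))   // false
}
```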

func MakeSelection

func MakeSelection(nodes ...*Node) Selection

Returns a selection that contains the given nodes.

nil nodes are ignored.

func (Selection) Chain

func (self Selection) Chain(other Selection) Selection

Returns a new selection that first yields the elements from receiver and then the elements from the other selection.

In other words, it links two Selections together in a chain.

func (Selection) Collect

func (self Selection) Collect() []*Node

Collects the nodes from the iterator into a slice and returns it.

func (Selection) Exclude

func (self Selection) Exclude(selector string) Selection

Returns a new selection that will only yield elements that do not match selector.

CSS selectors only match elements, so the resulting selection will only yield NodeKindElement nodes.

func (Selection) Filter

func (self Selection) Filter(selector string) Selection

Returns a new selection that will only yield elements that match selector.

CSS selectors only match elements, so the resulting selection will only yield NodeKindElement nodes.

func (Selection) First

func (self Selection) First() *Node

Returns the first element of the selection.

This will be nil if the selection is empty.

func (Selection) Last

func (self Selection) Last() *Node

Returns the last element of the selection.

This will be nil if the selection is empty.

func (Selection) Len

func (self Selection) Len() int

Returns the number of elements in the selection.

Directories

Path Synopsis
internal/queue Package queue implements a growable queue using a circular buffer.
