dass

package module
v0.0.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 5, 2026 License: Apache-2.0 Imports: 14 Imported by: 2

README

Go Report Card godoc

Dass - Data Assembly

Why

If you have a process that regularly pulls data, you may find the SQL is hard to create and maintain. There are lots of fields calculated. There are multiple joins. It's a mess. Want to add something new? Expect days of tweaking and testing.

The idea of Dass is to move data manipulation away from SQL and into Go where it can be more easily created and maintained.

Concept

Dass is a container for data. It is similar to a dataframe construct in that it stores the data in memory. Unlike most dataframes, the data is stored by row.

Each row is a map which simplifies the code. The value of the map is type any. This does add overhead, but is very flexible. Individual elements can be any type, including slices and maps.

Dass is lightweight in the sense that it supplies a small, core set of methods. These methods mostly support data construction and access.

Key, though, is that it supports applying user functions to the rows via the Apply method. You can move those calculations out of SQL and even joins through these functions.

Dass can read data from SQL, or CSV and XLSX files either on the web or locally.

Documentation

Overview

Dass - Data Assembly

Why Dass

If you have a process that regularly pulls data, you may find the SQL is hard to create and maintain. There are lots of fields calculated. There are multiple joins. It's a mess. Want to add something new? Expect days of tweaking and testing.

The idea of Dass is to move data manipulation away from SQL and into Go where it can be more easily created and maintained.

Concept

Dass is a container for data. It is similar to a dataframe construct in that it stores the data in memory. Unlike most dataframes, the data is stored by row.

Each row is a map which simplifies the code. The value of the map is type *any*. This does add overhead, but is very flexible. Individual elements can be any type, including slices and maps.

Dass is lightweight in the sense that it supplies a small, core set of methods. These methods mostly support data construction and access.

Key, though, is that it supports applying user functions to the rows via the Apply method. You can move those calculations out of SQL and even joins through these functions.

Dass can read data from SQL, or CSV and XLSX files either on the web or locally.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func FetchCSV added in v0.0.3

func FetchCSV(source string) ([][]string, error)

FetchCSV returns the contents of source CSV

- source -- either a web address or local file

func FetchXLSX added in v0.0.3

func FetchXLSX(source string) ([][]string, error)

FetchXLSX returns the contents of source XLSX

- source -- either a web address or local file

func Save added in v0.0.3

func Save(data, localFile string) error

Save saves the string returned by WebFetch. Works for both CSV and XLSX.

func WebFetch added in v0.0.3

func WebFetch(url string) (string, error)

WebFetch returns the contents of the page specified by url.

Types

type FnSpec

type FnSpec struct {
	// contains filtered or unexported fields
}

FnSpec manages converting an arbitrary user-supplied function to a RowFn.

func NewFnSpec

func NewFnSpec(fnName string, rawFn any, args []string, assignTo string) (*FnSpec, error)

NewFnSpec creates a new FnSpec. Note that it cannot be applied to a Row until Make is run.

fnName - function name

rawFn - the raw function which is converted to a RowFn and then applied to a Row

args - the key values in Row that are the arguments to rawFn

assignTo - the key in row to assign the output to

func (*FnSpec) F

func (fs *FnSpec) F() RowFn

func (*FnSpec) Make

func (fs *FnSpec) Make(r Row) error

Make creates a RowFn from the user's raw function.

r - an example row that will serve as the argument to the created RowFn.

func (*FnSpec) Reset

func (fs *FnSpec) Reset()

Reset resets the RowFn to nil.

func (*FnSpec) Return1

func (fs *FnSpec) Return1() reflect.Kind

Return1 is the primary return type of RowFn

func (*FnSpec) Return2

func (fs *FnSpec) Return2() reflect.Kind

Return2 is the secondary return type. If the return is a slice, Return1 is reflect.Slice and Return2 is the type of the slice elements. If the return is simple (e.g. Return1 = reflect.Int), Return2 has value reflect.Invalid.

func (*FnSpec) UpdateArgs

func (fs *FnSpec) UpdateArgs(newArgs []string)

func (*FnSpec) UpdateAssignTo

func (fs *FnSpec) UpdateAssignTo(newAT string)

type Row

type Row map[string]any

Row is the base type: a single row of the data

func ParseRow added in v0.0.3

func ParseRow(inRow, names, template, miss []string) (Row, string)

ParseRow will parse a row of []string into a Row.

inRow - slice to parse template - expected type of each element: float, int, date, CCYYMMDD or string miss - when a missing value is encountered:

skip - skip row
fail - throw an error
return - return anyway with the value substituted

missing elements are set to math.Maxfloat64, math.MaxInt, 1/1/1900.

func (Row) Apply

func (r Row) Apply(fs *FnSpec) error

Apply applies the function fs and adds the result to r.

func (Row) Equal

func (r Row) Equal(r2 Row) bool

func (Row) Names

func (r Row) Names() []string

func (Row) String

func (r Row) String() string

type RowFn

type RowFn func(r Row) any

RowFn is a function that can be applied to a row. The user supplies aribitrary function that is converted to a RowFn via FnSpec.Make. The function can return most data types including slices.

type Rows

type Rows struct {
	// contains filtered or unexported fields
}

Rows collect a slice of row. Each row must have the same set of keys, though the underlying data can differ.

func NewRows

func NewRows(r Row) *Rows

func ParseRows added in v0.0.3

func ParseRows(rows [][]string, names, template, miss []string, skip int) (*Rows, error)

ParseRows will parse rows of []string

rows - data to parse template - expected type of each element: float, int, date, CCYYMMDD or string miss - when a missing value is encountered:

	skip - skip row
	fail - throw an error
	return - return anyway with the value substituted

      missing elements are set to math.Maxfloat64, math.MaxInt, 1/1/1900.

skip - number of rows to skip before parsing

func ReadCSV

func ReadCSV(r io.ReadCloser) (rows *Rows, e error)

func ReadSQL

func ReadSQL(r *sql.Rows) (rows *Rows, e error)

ReadSQL creates a *Rows object from r.

Example

This example shows out to load data from a query

table := os.Getenv("table")
qry := fmt.Sprintf("SELECT distinct state from %s ORDER BY state", table)

db := newConnectCH()
defer db.Close()

r, _ := db.Query(qry)

// load the data into rows
rows, _ := ReadSQL(r)

fmt.Println(rows)
Output:

state
"AK"
"AL"
"AR"
"AZ"
"CA"
Example (Slice)

This example shows that you can return a slice from the query

table := os.Getenv("table")

db := newConnectCH()
defer db.Close()

qry := fmt.Sprintf("SELECT lnId, monthly.upb FROM %s WHERE length(monthly.upb) == 3  ORDER BY lnId limit 10", table)

r, _ := db.Query(qry)
rows, _ := ReadSQL(r)

fmt.Println(rows)
Output:

lnId,monthly.upb
"000097473420",[304379.06 303970.03 0]
"000097473926",[427500 426962.12 0]
"000097474438",[206160.44 205878.55 0]
"000097474496",[165000 164782.7 0]
"000097474584",[386250 385729.72 0]

func (*Rows) AddColumn

func (rs *Rows) AddColumn(name string, data []any) error

func (*Rows) Append

func (rs *Rows) Append(r Row) error

Append appends a row

func (*Rows) Apply

func (rs *Rows) Apply(fs *FnSpec) error

Apply applies fs to all rows

Example (Map)

This example shows how you can replace a SQL join with a RowFn. First, the data that would be the smaller table in the join is created. Then a function is built to apply that data to a Row. We then pull the larger table and use Apply to add the new element to each row.

const n = 100

table := os.Getenv("table")
zipQry := fmt.Sprintf("SELECT distinct zip3 from %s ORDER BY zip3", table)

db := newConnectCH()
defer db.Close()

// pull the unique zip3 (3-digit zip codes)
r, _ := db.Query(zipQry)
rows, _ := ReadSQL(r)
zipMap := make(map[string]int)

// build a simple map of zip3 to int.  We'll apply this to the data adding the "int" value to it.
for j, zip := range rows.Iter() {
	zipMap[zip["zip3"].(string)] = j
}

qry := fmt.Sprintf("SELECT lnId, fico, cltv, numBorr, purpose, propType, units, occ, state, msa, zip3 FROM %s ORDER BY lnId limit %v", table, n)
r, _ = db.Query(qry)

rows, _ = ReadSQL(r)

// rf applys the map to its input
rf := func(zip3 string) int {
	if v, ok := zipMap[zip3]; ok {
		return v
	}

	return -1
}

fs, _ := NewFnSpec("zipInt", rf, []string{"zip3"}, "zipInt")

_ = rows.Apply(fs)

fmt.Println(rows)
Output:

cltv,fico,lnId,msa,numBorr,occ,propType,purpose,state,units,zip3,zipInt
82,765,"000097473062","47900",2,"P","PU","P","MD",1,"207",194
95,766,"000097473063","42660",2,"P","SF","P","WA",1,"980",914
95,769,"000097473064","40140",1,"P","PU","P","CA",1,"923",858
95,687,"000097473065","35380",1,"P","SF","P","LA",1,"704",662
85,779,"000097473066","19820",1,"P","SF","R","MI",1,"480",459
Example (Simple)

Using Apply to run a function over *Rows

table := os.Getenv("table")
qry := fmt.Sprintf("SELECT lnId, state, ltv, cltv FROM %s WHERE cltv < ltv ORDER BY lnId LIMIT 10", table)

db := newConnectCH()
defer db.Close()

r, _ := db.Query(qry)
rows, _ := ReadSQL(r)

fixCLTV := func(ltv, cltv float64) float64 {
	if cltv < ltv {
		return ltv
	}

	return cltv
}

fs, _ := NewFnSpec("cltvFixed", fixCLTV, []string{"ltv", "cltv"}, "cltvFixed")

_ = rows.Apply(fs)

fmt.Println(rows)
Output:

cltv,cltvFixed,lnId,ltv,state
-1,88,"000106024156",88,"NC"
-1,27,"000106024193",27,"CA"
-1,84,"000106024226",84,"CA"
-1,100,"000106024234",100,"ME"
-1,90,"000106024278",90,"GA"
Example (Slice)

Using Apply to run a function that returns a slice

table := os.Getenv("table")
qry := fmt.Sprintf("SELECT lnId, state, ltv, cltv FROM %s ORDER BY lnId LIMIT 10", table)

db := newConnectCH()
defer db.Close()

r, _ := db.Query(qry)
rows, _ := ReadSQL(r)

fn := func(ltv, cltv float64) []float64 {
	if cltv < ltv {
		cltv = ltv
	}

	sltv := cltv - ltv
	return []float64{ltv, sltv, cltv}
}

fs, _ := NewFnSpec("ltvs", fn, []string{"ltv", "cltv"}, "ltvs")

_ = rows.Apply(fs)

fmt.Println(rows)
Output:

cltv,lnId,ltv,ltvs,state
82,"000097473062",82,[82 0 82],"MD"
95,"000097473063",95,[95 0 95],"WA"
95,"000097473064",95,[95 0 95],"CA"
95,"000097473065",95,[95 0 95],"LA"
85,"000097473066",85,[85 0 85],"MI"

func (*Rows) Column

func (rs *Rows) Column(name string) ([]any, error)

Column returns one element from all the rows as a slice

name - name of the element to return

func (*Rows) DropColumn

func (rs *Rows) DropColumn(name string) error

func (*Rows) Equal

func (rs *Rows) Equal(rs2 *Rows) bool

Equal returns true if rs and rs2 have equal values throughout

func (*Rows) Iter

func (rs *Rows) Iter() iter.Seq2[int, Row]

Iter ranges over the rows

func (*Rows) Names

func (rs *Rows) Names() []string

Names returns the element names

func (*Rows) Row

func (rs *Rows) Row(index int) Row

Row returns the row at row number index.

func (*Rows) RowCount

func (rs *Rows) RowCount() int

func (*Rows) String

func (rs *Rows) String() string

func (*Rows) Write

func (rs *Rows) Write(w io.WriteCloser) error

Write writes all rows as comma-separated values, including a header, to w, closing when done.

Example

This example shows how to create a CSV from a *Rows object.

table := os.Getenv("table")
qry := fmt.Sprintf("SELECT state, count(*) As n FROM %s GROUP BY state ORDER BY n DESC", table)

db := newConnectCH()
defer db.Close()

// run query
r, _ := db.Query(qry)

// load data
rows, _ := ReadSQL(r)

fmt.Println(rows)
outFile := os.TempDir() + "/states.csv"

f, _ := os.Create(outFile)
defer f.Close()

rows.Write(f)
Output:

state,n
"CA",10968569
"FL",4679574
"TX",4544139
"IL",3328800
"MI",2683424

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL