Documentation
¶
Overview ¶
Dass - Data Assembly ¶
Why Dass ¶
If you have a process that regularly pulls data, you may find the SQL is hard to create and maintain. There are lots of fields calculated. There are multiple joins. It's a mess. Want to add something new? Expect days of tweaking and testing.
The idea of Dass is to move data manipulation away from SQL and into Go where it can be more easily created and maintained.
Concept ¶
Dass is a container for data. It is similar to a dataframe construct in that it stores the data in memory. Unlike most dataframes, the data is stored by row.
Each row is a map which simplifies the code. The value of the map is type *any*. This does add overhead, but is very flexible. Individual elements can be any type, including slices and maps.
Dass is lightweight in the sense that it supplies a small, core set of methods. These methods mostly support data construction and access.
Key, though, is that it supports applying user functions to the rows via the Apply method. You can move those calculations out of SQL and even joins through these functions.
Dass can read data from SQL, or CSV and XLSX files either on the web or locally.
Index ¶
- func FetchCSV(source string) ([][]string, error)
- func FetchXLSX(source string) ([][]string, error)
- func Save(data, localFile string) error
- func WebFetch(url string) (string, error)
- type FnSpec
- type Row
- type RowFn
- type Rows
- func (rs *Rows) AddColumn(name string, data []any) error
- func (rs *Rows) Append(r Row) error
- func (rs *Rows) Apply(fs *FnSpec) error
- func (rs *Rows) Column(name string) ([]any, error)
- func (rs *Rows) DropColumn(name string) error
- func (rs *Rows) Equal(rs2 *Rows) bool
- func (rs *Rows) Iter() iter.Seq2[int, Row]
- func (rs *Rows) Names() []string
- func (rs *Rows) Row(index int) Row
- func (rs *Rows) RowCount() int
- func (rs *Rows) String() string
- func (rs *Rows) Write(w io.WriteCloser) error
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func FetchCSV ¶ added in v0.0.3
FetchCSV returns the contents of source CSV
- source -- either a web address or local file
func FetchXLSX ¶ added in v0.0.3
FetchXLSX returns the contents of source XLSX
- source -- either a web address or local file
Types ¶
type FnSpec ¶
type FnSpec struct {
// contains filtered or unexported fields
}
FnSpec manages converting an arbitrary user-supplied function to a RowFn.
func NewFnSpec ¶
NewFnSpec creates a new FnSpec. Note that it cannot be applied to a Row until Make is run.
fnName - function name
rawFn - the raw function which is converted to a RowFn and then applied to a Row
args - the key values in Row that are the arguments to rawFn
assignTo - the key in row to assign the output to
func (*FnSpec) Make ¶
Make creates a RowFn from the user's raw function.
r - an example row that will serve as the argument to the created RowFn.
func (*FnSpec) Return2 ¶
Return2 is the secondary return type. If the return is a slice, Return1 is reflect.Slice and Return2 is the type of the slice elements. If the return is simple (e.g. Return1 = reflect.Int), Return2 has value reflect.Invalid.
func (*FnSpec) UpdateArgs ¶
func (*FnSpec) UpdateAssignTo ¶
type Row ¶
Row is the base type: a single row of the data
func ParseRow ¶ added in v0.0.3
ParseRow will parse a row of []string into a Row.
inRow - slice to parse template - expected type of each element: float, int, date, CCYYMMDD or string miss - when a missing value is encountered:
skip - skip row fail - throw an error return - return anyway with the value substituted
missing elements are set to math.Maxfloat64, math.MaxInt, 1/1/1900.
type RowFn ¶
RowFn is a function that can be applied to a row. The user supplies aribitrary function that is converted to a RowFn via FnSpec.Make. The function can return most data types including slices.
type Rows ¶
type Rows struct {
// contains filtered or unexported fields
}
Rows collect a slice of row. Each row must have the same set of keys, though the underlying data can differ.
func ParseRows ¶ added in v0.0.3
ParseRows will parse rows of []string
rows - data to parse template - expected type of each element: float, int, date, CCYYMMDD or string miss - when a missing value is encountered:
skip - skip row
fail - throw an error
return - return anyway with the value substituted
missing elements are set to math.Maxfloat64, math.MaxInt, 1/1/1900.
skip - number of rows to skip before parsing
func ReadSQL ¶
ReadSQL creates a *Rows object from r.
Example ¶
This example shows out to load data from a query
table := os.Getenv("table")
qry := fmt.Sprintf("SELECT distinct state from %s ORDER BY state", table)
db := newConnectCH()
defer db.Close()
r, _ := db.Query(qry)
// load the data into rows
rows, _ := ReadSQL(r)
fmt.Println(rows)
Output: state "AK" "AL" "AR" "AZ" "CA"
Example (Slice) ¶
This example shows that you can return a slice from the query
table := os.Getenv("table")
db := newConnectCH()
defer db.Close()
qry := fmt.Sprintf("SELECT lnId, monthly.upb FROM %s WHERE length(monthly.upb) == 3 ORDER BY lnId limit 10", table)
r, _ := db.Query(qry)
rows, _ := ReadSQL(r)
fmt.Println(rows)
Output: lnId,monthly.upb "000097473420",[304379.06 303970.03 0] "000097473926",[427500 426962.12 0] "000097474438",[206160.44 205878.55 0] "000097474496",[165000 164782.7 0] "000097474584",[386250 385729.72 0]
func (*Rows) Apply ¶
Apply applies fs to all rows
Example (Map) ¶
This example shows how you can replace a SQL join with a RowFn. First, the data that would be the smaller table in the join is created. Then a function is built to apply that data to a Row. We then pull the larger table and use Apply to add the new element to each row.
const n = 100
table := os.Getenv("table")
zipQry := fmt.Sprintf("SELECT distinct zip3 from %s ORDER BY zip3", table)
db := newConnectCH()
defer db.Close()
// pull the unique zip3 (3-digit zip codes)
r, _ := db.Query(zipQry)
rows, _ := ReadSQL(r)
zipMap := make(map[string]int)
// build a simple map of zip3 to int. We'll apply this to the data adding the "int" value to it.
for j, zip := range rows.Iter() {
zipMap[zip["zip3"].(string)] = j
}
qry := fmt.Sprintf("SELECT lnId, fico, cltv, numBorr, purpose, propType, units, occ, state, msa, zip3 FROM %s ORDER BY lnId limit %v", table, n)
r, _ = db.Query(qry)
rows, _ = ReadSQL(r)
// rf applys the map to its input
rf := func(zip3 string) int {
if v, ok := zipMap[zip3]; ok {
return v
}
return -1
}
fs, _ := NewFnSpec("zipInt", rf, []string{"zip3"}, "zipInt")
_ = rows.Apply(fs)
fmt.Println(rows)
Output: cltv,fico,lnId,msa,numBorr,occ,propType,purpose,state,units,zip3,zipInt 82,765,"000097473062","47900",2,"P","PU","P","MD",1,"207",194 95,766,"000097473063","42660",2,"P","SF","P","WA",1,"980",914 95,769,"000097473064","40140",1,"P","PU","P","CA",1,"923",858 95,687,"000097473065","35380",1,"P","SF","P","LA",1,"704",662 85,779,"000097473066","19820",1,"P","SF","R","MI",1,"480",459
Example (Simple) ¶
Using Apply to run a function over *Rows
table := os.Getenv("table")
qry := fmt.Sprintf("SELECT lnId, state, ltv, cltv FROM %s WHERE cltv < ltv ORDER BY lnId LIMIT 10", table)
db := newConnectCH()
defer db.Close()
r, _ := db.Query(qry)
rows, _ := ReadSQL(r)
fixCLTV := func(ltv, cltv float64) float64 {
if cltv < ltv {
return ltv
}
return cltv
}
fs, _ := NewFnSpec("cltvFixed", fixCLTV, []string{"ltv", "cltv"}, "cltvFixed")
_ = rows.Apply(fs)
fmt.Println(rows)
Output: cltv,cltvFixed,lnId,ltv,state -1,88,"000106024156",88,"NC" -1,27,"000106024193",27,"CA" -1,84,"000106024226",84,"CA" -1,100,"000106024234",100,"ME" -1,90,"000106024278",90,"GA"
Example (Slice) ¶
Using Apply to run a function that returns a slice
table := os.Getenv("table")
qry := fmt.Sprintf("SELECT lnId, state, ltv, cltv FROM %s ORDER BY lnId LIMIT 10", table)
db := newConnectCH()
defer db.Close()
r, _ := db.Query(qry)
rows, _ := ReadSQL(r)
fn := func(ltv, cltv float64) []float64 {
if cltv < ltv {
cltv = ltv
}
sltv := cltv - ltv
return []float64{ltv, sltv, cltv}
}
fs, _ := NewFnSpec("ltvs", fn, []string{"ltv", "cltv"}, "ltvs")
_ = rows.Apply(fs)
fmt.Println(rows)
Output: cltv,lnId,ltv,ltvs,state 82,"000097473062",82,[82 0 82],"MD" 95,"000097473063",95,[95 0 95],"WA" 95,"000097473064",95,[95 0 95],"CA" 95,"000097473065",95,[95 0 95],"LA" 85,"000097473066",85,[85 0 85],"MI"
func (*Rows) Column ¶
Column returns one element from all the rows as a slice
name - name of the element to return
func (*Rows) DropColumn ¶
func (*Rows) Write ¶
func (rs *Rows) Write(w io.WriteCloser) error
Write writes all rows as comma-separated values, including a header, to w, closing when done.
Example ¶
This example shows how to create a CSV from a *Rows object.
table := os.Getenv("table")
qry := fmt.Sprintf("SELECT state, count(*) As n FROM %s GROUP BY state ORDER BY n DESC", table)
db := newConnectCH()
defer db.Close()
// run query
r, _ := db.Query(qry)
// load data
rows, _ := ReadSQL(r)
fmt.Println(rows)
outFile := os.TempDir() + "/states.csv"
f, _ := os.Create(outFile)
defer f.Close()
rows.Write(f)
Output: state,n "CA",10968569 "FL",4679574 "TX",4544139 "IL",3328800 "MI",2683424