hashfile

A lightweight, efficient tool for adding CRC32-based integrity comments to source code files. hashfile helps you detect unintended changes to your source files by embedding a checksum comment that can be quickly verified.
Features
- Efficient streaming algorithm - Single buffer allocation with sliding window, processes files of any size
- No-op on unchanged files - Only modifies files when the integrity comment is missing or incorrect
- Multiple language support - Auto-detects comment style based on file extension (Go, Python, C/C++, SQL, HTML, Shell, Ruby, JavaScript, CSS, Templ)
- Preserves file attributes - Maintains permissions and ownership when updating files
- Line ending aware - Preserves CRLF vs LF line endings
- Zero dependencies - Uses only Go standard library
Use Cases
- Verify source code hasn't been modified during deployment
- Detect accidental edits to configuration files
- Ensure generated code remains unchanged
- Quick integrity checks for code reviews
- Build system verification step
Note: This is NOT a security tool. It uses CRC32 for fast integrity checking, not cryptographic security. Do not use this to detect malicious modifications.
Installation
From Source
go install github.com/dmoose/hashfile/cmd/hashfile@latest
Build from Repository
git clone https://github.com/dmoose/hashfile.git
cd hashfile
make build
# Binary will be in bin/hashfile
As a Library
go get github.com/dmoose/hashfile
Command Line Usage
Add integrity comments to one or more files:
# Single file
hashfile add main.go
# Multiple files
hashfile add main.go handler.go utils.go
# Using glob patterns
hashfile add src/*.go
# Specify comment style explicitly
hashfile add -style=python script.txt
What happens:
- Calculates CRC32 of file content (excluding the integrity comment itself)
- Adds a comment line at the end:
// FileIntegrity: ABCD1234
- If comment already exists and is correct, file is not modified (no-op)
- If comment exists but is wrong, it's updated with the correct hash
Verify File Integrity
Check if files have been modified:
# Verify files (silent mode, use exit code)
hashfile verify *.go
echo $? # 0 = all valid, 1 = invalid or error
# Quiet mode (no output at all)
hashfile verify -q *.go
Exit codes:
0 - All files verified successfully
1 - One or more files invalid or errors occurred
Check with Output
Display human-readable verification status:
hashfile check src/*.go
Example output:
✓ src/main.go
✓ src/handler.go
✗ src/utils.go (integrity check failed)
Total: 3 files, 2 valid, 1 invalid, 0 errors
Library Usage
Basic Example
package main
import (
"fmt"
"log"
"github.com/dmoose/hashfile"
)
func main() {
// Add integrity comment (auto-detects .go extension)
if err := hashfile.ProcessFile("main.go"); err != nil {
log.Fatal(err)
}
// Verify integrity
valid, err := hashfile.VerifyFile("main.go")
if err != nil {
log.Fatal(err)
}
if valid {
fmt.Println("File integrity verified!")
} else {
fmt.Println("File has been modified!")
}
}
Custom Configuration
package main
import (
"github.com/dmoose/hashfile"
)
func main() {
// Create custom configuration
config := hashfile.Config{
CommentStyle: hashfile.PythonStyle,
BufferSize: 128 * 1024, // 128KB buffer
}
// Use custom config
writer := hashfile.NewWriter(config)
writer.ProcessFile("script.py")
reader := hashfile.NewReader(config)
valid, _ := reader.VerifyFile("script.py")
}
hashfile.GoStyle // FileIntegrity: ABCD1234
hashfile.PythonStyle # FileIntegrity: ABCD1234
hashfile.CStyle // FileIntegrity: ABCD1234
hashfile.SQLStyle -- FileIntegrity: ABCD1234
hashfile.HTMLStyle <!-- FileIntegrity: ABCD1234 -->
hashfile.ShellStyle # FileIntegrity: ABCD1234
hashfile.RubyStyle # FileIntegrity: ABCD1234
hashfile.JSStyle // FileIntegrity: ABCD1234
hashfile.CSSStyle /* FileIntegrity: ABCD1234 */
hashfile.TemplStyle const FileIntegrity = "ABCD1234"
Note: TemplStyle uses a Go constant declaration instead of a comment. Since templ files compile to Go code, this allows the integrity hash to be embedded in generated HTML comments for traceability (e.g., <!-- Template Integrity: { FileIntegrity } -->).
Auto-Detection by Extension
The library automatically selects the appropriate comment style:
| Extension |
Comment Style |
.go |
// ... |
.py |
# ... |
.c, .h, .cpp, .java, .js, .ts |
// ... |
.sql |
-- ... |
.html, .xml |
<!-- ... --> |
.sh, .bash |
# ... |
.rb |
# ... |
.css, .scss, .sass |
/* ... */ |
.templ |
const FileIntegrity = "..." |
How It Works
Algorithm Overview
-
Streaming with Sliding Window
- Reads file in chunks (default 64KB buffer)
- Maintains a small sliding window (≈30 bytes) at the end
- Calculates CRC32 of all content except the window
- Single buffer allocation for efficiency
-
At End of File
- Examines the window for existing integrity comment
- If found and correct: writes temp file, compares to original, no-op if identical
- If found but wrong: updates comment with new CRC
- If not found: adds new comment
-
Content Hashing
- CRC32 includes all file content EXCEPT the integrity comment line itself
- All content including trailing newlines gets CRC'd
- If a file doesn't end with a newline, one is added and CRC'd before adding the comment
-
No-Op Optimization
- Always writes to temp file during streaming
- If existing integrity comment has matching CRC, deletes temp and leaves original untouched
- When file is modified, preserves attributes (permissions, ownership) during replacement
Example
Before:
package main
func main() {
println("Hello")
}
After adding integrity:
package main
func main() {
println("Hello")
}
// FileIntegrity: A1B2C3D4
Example with CSS:
body {
color: red;
}
/* FileIntegrity: E5F6A7B8 */
Example with Templ:
package components
templ Hello() {
<div>Hello</div>
}
const FileIntegrity = "C9D0E1F2"
If you modify the code:
package main
func main() {
println("Hello, World!") // Changed
}
// FileIntegrity: A1B2C3D4 // Now invalid!
Verification will fail until you run hashfile add again to update the hash.
The streaming algorithm is optimized for efficiency:
- Memory: Single 64KB buffer allocation (configurable)
- I/O: Buffered reads/writes, minimal system calls
- Processing: Each byte processed exactly once for CRC
- Allocations: Minimal heap allocations (30 for process, 7 for verify on ~26KB file)
Benchmark results (Apple M1 Max):
BenchmarkProcessFile-10 3615 304277 ns/op 71972 B/op 30 allocs/op
BenchmarkVerifyFile-10 66835 17837 ns/op 66386 B/op 7 allocs/op
On a ~26KB test file: ~3,600 process operations/sec, ~66,000 verify operations/sec.
Testing
Run the test suite:
# All tests
make test
# With coverage report
make test-coverage
# Benchmarks
make bench
Building
# Build binary
make build
# Build and test
make all
# Install to $GOPATH/bin
make install
# Clean artifacts
make clean
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass:
make test
- Run linter:
make lint (requires golangci-lint)
- Submit a pull request
License
MIT License - see LICENSE file for details.
FAQ
Q: Why CRC32 instead of SHA256 or other cryptographic hash?
A: This tool is designed for detecting accidental changes, not malicious tampering. CRC32 is extremely fast and sufficient for this purpose. If you need cryptographic security, use proper code signing tools.
Q: What if I need to edit a file with an integrity comment?
A: Just edit it normally. The integrity comment will show the file has changed. Run hashfile add again to update the comment with the new hash.
Q: Does this work with binary files?
A: Technically yes, but don't. Will append text to binary file which is almost certainly not what you wanted.
Q: Can I use this in CI/CD pipelines?
A: Yes! Use hashfile verify -q *.go and check the exit code. It's silent and fast.
Q: What happens if I manually change the hash in the comment?
A: Verification will fail. The hash must match the actual CRC32 of the file content.
Q: Does it modify the file's timestamp?
A: Only if the file actually needs to be updated. If the hash is already correct (no-op case), the file is not touched and timestamps are preserved.