same relies on a caching strategy that ensures a task already executed with exactly the same inputs is never re-run. This is achieved through a combination of a Content Addressable Store (CAS) and a strict hashing protocol.
Content Addressable Store (CAS)
The build system persists build metadata and artifacts in a local Content Addressable Store, located at .same/store.
Unlike traditional filesystems where files are retrieved by path, a CAS retrieves data based on its content digest. However, to efficiently map tasks to their results, same uses a Build Info index strategy:
- Task Identity: The store indexes build information using the SHA-256 hash of the Task Name.
- Build Info: This JSON metadata contains the computed Input Hash and Output Hash for a specific execution of a task.
- Artifact Storage: The actual output files (artifacts) are tracked and verified against their Output Hashes.
Hashing Strategy
Central to the caching mechanism is the Hasher (implemented in internal/adapters/fs/hasher.go), which uses XXHash (v2) for high-performance, non-cryptographic hashing.
The system distinguishes between two critical types of hashes:
Input Hash
The Input Hash is a fingerprint of everything that can affect the outcome of a task. If the Input Hash changes, the task must be re-run. The ComputeInputHash method aggregates the following components into a single hash:
- Task Definition: The task’s name, command arguments, and explicit dependencies.
- Tools: The resolved versions of all tools used by the task (sorted by key for determinism).
- Environment Variables: All environment variables injected into the task (sorted by key).
- Input Files: The content of all file paths specified in the input section.
  - If an input is a directory, the hasher recursively walks the directory and hashes every file.
  - File names and their contents are both included in the hash.
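The aggregation can be sketched in Go using only the standard library. This is not the code in internal/adapters/fs/hasher.go: FNV-1a (hash/fnv) stands in for XXHash (v2), and the Task type, the length-prefix encoding, and the helper names are assumptions. What the sketch shows is the shape of the computation: a fixed field order, maps hashed in sorted key order, and a recursive walk that mixes in both file names and file contents.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// Task is a hypothetical, simplified task definition for this sketch only.
type Task struct {
	Name   string
	Args   []string
	Deps   []string
	Tools  map[string]string // tool name -> resolved version
	Env    map[string]string // injected environment variables
	Inputs []string          // files or directories from the input section
}

// writeField length-prefixes s so that ("ab", "c") and ("a", "bc") differ.
func writeField(h hash.Hash, s string) {
	fmt.Fprintf(h, "%d:%s", len(s), s)
}

// writeList records the element count, then each element in order.
func writeList(h hash.Hash, items []string) {
	fmt.Fprintf(h, "%d;", len(items))
	for _, s := range items {
		writeField(h, s)
	}
}

// writeSortedMap hashes entries in sorted key order (Go map order is random).
func writeSortedMap(h hash.Hash, m map[string]string) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Fprintf(h, "%d;", len(keys))
	for _, k := range keys {
		writeField(h, k)
		writeField(h, m[k])
	}
}

// hashInputPath mixes in the name and a content digest of every regular
// file at or under root; WalkDir visits entries in lexical order.
func hashInputPath(h hash.Hash, root string) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		fh := fnv.New64a()
		fh.Write(data)
		writeField(h, filepath.ToSlash(path))
		return binary.Write(h, binary.BigEndian, fh.Sum64())
	})
}

// ComputeInputHash combines name, args, deps, tools, env and input files.
func ComputeInputHash(t Task) (uint64, error) {
	h := fnv.New64a() // stand-in for XXHash (v2)
	writeField(h, t.Name)
	writeList(h, t.Args)
	writeList(h, t.Deps)
	writeSortedMap(h, t.Tools)
	writeSortedMap(h, t.Env)
	for _, in := range t.Inputs {
		if err := hashInputPath(h, in); err != nil {
			return 0, err
		}
	}
	return h.Sum64(), nil
}

func main() {
	dir, err := os.MkdirTemp("", "same-inputs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "main.go")
	os.WriteFile(src, []byte("package main\n"), 0o644)
	task := Task{Name: "build", Args: []string{"go", "build"}, Inputs: []string{dir}}
	before, _ := ComputeInputHash(task)
	os.WriteFile(src, []byte("package main // edited\n"), 0o644)
	after, _ := ComputeInputHash(task)
	fmt.Println("input hash changed:", before != after)
}
```

Length-prefixing each field keeps adjacent values from blurring together, so moving a character from one argument to the next still changes the hash. Sorting map keys matters in Go specifically because map iteration order is deliberately randomized between runs.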
Output Hash
The Output Hash represents the state of the artifacts produced by the task. The ComputeOutputHash method computes it by:
- Scanning the directories or files specified in the output section.
- Sorting the list of output files to ensure deterministic ordering (since filesystem traversal order is not guaranteed).
- Recursively hashing the content of every output file.
Determining Cache Hits
When the scheduler prepares to run a task, it performs the following check:
- Compute Current Input Hash: The system calculates the Input Hash from the current state of files and configuration.
- Lookup Store: It checks .same/store for the record corresponding to the task (indexed by the SHA-256 hash of the task name).
- Compare: It reads the stored BuildInfo JSON.
  - Hit: If Stored.InputHash == Current.InputHash, the task is skipped, and the system assumes the existing artifacts (outputs) are valid.
  - Miss: If the hashes differ (or no record exists), the task is executed.