No black box.
No vapor.
How the Evidence Tracer is built, end to end. What we call, what we infer, what we store, and — at the bottom — what we're not yet claiming.
- 01 — Architecture
- 02 — Evidence collection
- 03 — Control mapping
- 04 — Reasoning layer
- 05 — Scoring
- 06 — Traceability
- 07 — Honesty note
- 08 — Your data & deletion
01 Architecture
Stateless Cloudflare Worker. No persistent server, no VM, no container. Every scan request obtains temporary AWS credentials via STS AssumeRole, runs the evidence collection, and exits. State lives in Cloudflare D1 (SQLite at the edge) for 30 days and is then deleted automatically. Reports are stored in R2 object storage.
The architecture was chosen for two reasons: it makes zero-credential-persistence easy to verify (no long-running process means no credential caching), and Cloudflare's infrastructure means your data doesn't pass through a server you have to trust us to secure.
- You POST an IAM Role ARN + ExternalId to the Worker
- Worker calls STS AssumeRole for short-lived session credentials (1 hour TTL)
- Evidence collection fans out across AWS services using SigV4-signed requests
- Evidence items are written to D1, chunked per scan
- Claude Sonnet 4.6 runs per-control analysis against structured prompts
- Report is assembled from control results and stored in R2
- All evidence data is deleted 30 days after report delivery
No long-lived AWS credentials. No write permissions. Nothing installed in your account beyond the read-only IAM role you provision via CloudFormation.
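The credential step above can be sketched as the form-encoded body of an STS `AssumeRole` call. This is a minimal illustration, not the Worker's actual code: the function name, the session name, and the example ARN are hypothetical. `Action`, `Version`, `RoleArn`, `RoleSessionName`, `ExternalId`, and `DurationSeconds` are the standard STS query parameters, and 3600 seconds matches the 1-hour TTL stated above.

```python
from urllib.parse import urlencode, parse_qs

def build_assume_role_body(role_arn: str, external_id: str,
                           session_name: str = "evidence-scan",  # hypothetical name
                           duration_seconds: int = 3600) -> str:
    """Return the form-encoded body for an STS AssumeRole request."""
    if not (role_arn.startswith("arn:") and ":role/" in role_arn):
        raise ValueError("expected an IAM role ARN")
    params = {
        "Action": "AssumeRole",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
        "DurationSeconds": str(duration_seconds),  # 1-hour session
    }
    return urlencode(params)

# Example values are placeholders, not real identifiers.
body = build_assume_role_body(
    "arn:aws:iam::123456789012:role/ReadOnlyAudit", "example-external-id"
)
```

The body would still need to be SigV4-signed and sent to the STS endpoint; only the parameter shape is shown here.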
02 Evidence collection
All AWS API calls are SigV4-signed from scratch (no SDK). Calls run in parallel with a concurrency cap to avoid rate limits. Raw responses — XML or JSON — are truncated to fit within analysis context windows, but CRITICAL-severity findings are never truncated. Each response is stored as a discrete evidence item with its source endpoint, timestamp, and content hash.
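To make "SigV4-signed from scratch" concrete, here is a minimal sketch of Signature Version 4 using only the Python standard library. It is an illustration under assumptions, not the scanner's code: the function names are hypothetical, the query string is assumed to be already sorted and URI-encoded, and only the `host`, `x-amz-date`, and `x-amz-security-token` headers are signed.

```python
import hashlib
import hmac
from datetime import datetime, timezone

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> aws4_request."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def sign_request(method: str, host: str, path: str, query: str, body: str,
                 region: str, service: str, access_key: str, secret_key: str,
                 session_token: str, now: datetime) -> dict:
    """Return request headers including the SigV4 Authorization header."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    headers = {
        "host": host,
        "x-amz-date": amz_date,
        "x-amz-security-token": session_token,  # temporary STS credentials
    }
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        [method, path, query, canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers

# Placeholder credentials; a real call would use the STS session values.
example = sign_request(
    "GET", "iam.amazonaws.com", "/", "Action=GetAccountSummary&Version=2010-05-08",
    "", "us-east-1", "iam", "AKIDEXAMPLE", "example-secret", "example-token",
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```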
IAM
- GetAccountSummary
- ListUsers
- GetAccountPasswordPolicy
- ListGroups
- GetCredentialReport
- ListMFADevices
- ListPolicies
- ListRoles
S3
- ListBuckets
- GetBucketEncryption
- GetBucketPolicy
- GetBucketVersioning
- GetPublicAccessBlock
CloudTrail
- DescribeTrails
- GetTrailStatus
- GetEventSelectors
- ListTrails
AWS Config
- DescribeConfigRules
- DescribeConfigurationRecorders
- DescribeDeliveryChannels
EC2 / VPC
- DescribeSecurityGroups
- DescribeVpcs
- DescribeFlowLogs
- DescribeInstances
CloudWatch + SNS
- DescribeAlarms
- ListMetrics
- ListTopics
- ListSubscriptions
Additional services: KMS (ListKeys, GetKeyRotationStatus), GuardDuty (ListDetectors), SecurityHub (GetFindings, GetEnabledStandards), Secrets Manager (ListSecrets metadata only), WAF (ListWebACLs), RDS (DescribeDBInstances, DescribeDBSnapshots), Lambda, SSO.
03 Control mapping
Mapping from evidence to SOC 2 Trust Services Criteria is deterministic — the same evidence set will always produce the same control mapping. There is no learned model making this connection; it is a hand-coded rule table updated as the API coverage expands.
Because the mapping is deterministic, every gap finding traces directly to a specific API call and response field. There is no "the AI decided" — there is "this API call returned Encrypted: false, here it is."
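A hand-coded rule table of this kind might look like the sketch below. Everything in it is illustrative: the `Rule` type, the two rules, the control labels, and the evidence item shape are assumptions made for the example, not the tool's actual table. The point is that a finding is a pure function of the evidence, so the same input always yields the same output.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Rule:
    api_call: str                  # API call the evidence came from
    field: str                     # response field inspected
    control: str                   # hypothetical control label
    severity: str
    fails: Callable[[Any], bool]   # True when the field value is a gap

# Illustrative rules only; the real table is not reproduced here.
RULES = (
    Rule("rds:DescribeDBSnapshots", "Encrypted", "CC6.1", "HIGH", lambda v: v is False),
    Rule("s3:GetBucketVersioning", "Status", "A1.2", "MEDIUM", lambda v: v != "Enabled"),
)

def map_evidence(items: list) -> list:
    """Apply every rule to every evidence item; no randomness, no model."""
    findings = []
    for item in items:
        for rule in RULES:
            if item["api_call"] != rule.api_call:
                continue
            value = item["response"].get(rule.field)
            if rule.fails(value):
                findings.append({
                    "control": rule.control,
                    "severity": rule.severity,
                    "evidence_id": item["id"],
                    "field": rule.field,
                    "value": value,
                })
    # Sort so the output does not depend on collection order.
    return sorted(findings, key=lambda f: (f["control"], f["evidence_id"], f["field"]))
```

Each finding carries the evidence ID, field name, and observed value, which is what lets a gap be traced back to a specific response.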
04 Reasoning layer
After evidence is collected and heuristic scores are computed, each of the 12 controls is analyzed individually by Claude Sonnet 4.6. Each call receives a structured prompt containing: the control definition (AICPA criteria), the relevant evidence items for that control, and a JSON schema for the expected output.
The model is instructed to: cite specific evidence IDs in every finding, avoid hallucinating controls or permissions not present in the evidence, flag when evidence is insufficient rather than guessing, and produce copy-pasteable AWS CLI remediation commands anchored to real resource ARNs from the evidence.
The output contract is enforced: if the model returns malformed JSON or omits required fields, the call is retried. If a call fails after retries (including rate-limit backoff via Cloudflare Queues), the control is marked inconclusive rather than silently dropped.
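The validate-retry-or-mark-inconclusive behavior can be sketched as follows. The required field names, the `evidence_ids` key, the retry count, and the `call_model` callable are all hypothetical stand-ins; the real JSON schema and retry policy are not specified here.

```python
import json
from typing import Callable, Optional

# Hypothetical required fields; the actual output schema is not shown in this document.
REQUIRED_FIELDS = ("control_id", "status", "findings")

def parse_result(raw: str) -> Optional[dict]:
    """Return the parsed result, or None if it violates the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(field not in data for field in REQUIRED_FIELDS):
        return None
    if not isinstance(data["findings"], list):
        return None
    for finding in data["findings"]:
        # Every finding must cite at least one evidence ID.
        if not isinstance(finding, dict) or not finding.get("evidence_ids"):
            return None
    return data

def analyze_control(control_id: str, call_model: Callable[[str], str],
                    max_attempts: int = 3) -> dict:
    """Retry on malformed output; never drop a control silently."""
    for _ in range(max_attempts):
        result = parse_result(call_model(control_id))
        if result is not None:
            return result
    return {"control_id": control_id, "status": "inconclusive", "findings": []}
```

Returning an explicit inconclusive record keeps the control visible in the report even when every attempt fails.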
Each control analysis runs in its own queue message with a 30-second wall-time limit. All 12 messages are enqueued at once and processed two at a time to stay within Anthropic rate limits. Total analysis time: 60–120 seconds for a typical account.
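The concurrency cap and per-control time limit can be illustrated in-process with `asyncio`. This is an analogy rather than the production mechanism, which uses Cloudflare Queues: `run_all` and the `analyze` coroutine are hypothetical names, while the cap of 2 and the 30-second limit come from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

async def run_all(controls: List[str],
                  analyze: Callable[[str], Awaitable[str]],
                  concurrency: int = 2,
                  timeout_s: float = 30.0) -> Dict[str, str]:
    """Run every control analysis, at most `concurrency` at a time."""
    sem = asyncio.Semaphore(concurrency)

    async def one(control_id: str):
        async with sem:  # blocks while `concurrency` analyses are in flight
            try:
                status = await asyncio.wait_for(analyze(control_id), timeout_s)
            except asyncio.TimeoutError:
                status = "inconclusive"  # timed out: mark it, do not drop it
            return control_id, status

    pairs = await asyncio.gather(*(one(c) for c in controls))
    return dict(pairs)
```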
05 Scoring
Two scores are computed for each scan. Both are 0–100. Neither maps to a binary "audit-ready" claim.
Gap Score — percentage of checkpoints meeting their thresholds, weighted by severity of failing findings:
| Finding severity | Score deduction |
|---|---|
| CRITICAL | −25 points |
| HIGH | −15 points |
| MEDIUM | −5 points |
| LOW / INFO | −1 point |

| Score range | Interpretation |
|---|---|
| 80–100 | Low audit risk — known gaps, manageable |
| 60–79 | Moderate — auditor will likely raise findings |
| <60 | High — remediate before scheduling audit |
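As a worked example of the two tables, the sketch below assumes the Gap Score starts at 100, subtracts the listed deduction for each failing finding, and is floored at 0. That starting point and floor are assumptions made for illustration; the exact weighting formula is not spelled out here.

```python
# Deductions taken from the severity table above.
DEDUCTIONS = {"CRITICAL": 25, "HIGH": 15, "MEDIUM": 5, "LOW": 1, "INFO": 1}

def gap_score(severities: list) -> int:
    """Assumed formula: 100 minus summed deductions, never below 0."""
    return max(0, 100 - sum(DEDUCTIONS[s] for s in severities))

def interpret(score: int) -> str:
    """Bands taken from the interpretation table above."""
    if score >= 80:
        return "Low audit risk"
    if score >= 60:
        return "Moderate"
    return "High"

# One CRITICAL and two MEDIUM findings: 100 - 25 - 5 - 5 = 65.
score = gap_score(["CRITICAL", "MEDIUM", "MEDIUM"])
band = interpret(score)  # "Moderate"
```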
Freshness Score — recency of the evidence underpinning the analysis. Inputs: IAM access key age vs 90-day rotation policy, credential last-used dates, CloudTrail log delivery recency, Config rule last-evaluation timestamps. Below 70 indicates configurations that auditors commonly flag for staleness.
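One of those inputs, access key age against the 90-day rotation policy, can be checked directly from the IAM credential report. The sketch below covers only that single input and does not compute the Freshness Score itself, whose formula is not given here. The function name and sample rows are hypothetical; `user`, `access_key_N_active`, and `access_key_N_last_rotated` are columns of the credential report CSV.

```python
import csv
import io
from datetime import datetime, timezone

ROTATION_DAYS = 90  # rotation policy named above

def stale_keys(report_csv: str, now: datetime) -> list:
    """Return (user, key, age_in_days) for active keys older than the policy."""
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for n in ("1", "2"):
            if row.get(f"access_key_{n}_active") != "true":
                continue
            rotated = row.get(f"access_key_{n}_last_rotated") or "N/A"
            if rotated == "N/A":
                continue
            # Normalize a trailing "Z" so older Python versions can parse it.
            rotated_at = datetime.fromisoformat(rotated.replace("Z", "+00:00"))
            age = (now - rotated_at).days
            if age > ROTATION_DAYS:
                stale.append((row["user"], f"access_key_{n}", age))
    return stale

# Made-up sample rows for illustration.
SAMPLE = (
    "user,access_key_1_active,access_key_1_last_rotated,"
    "access_key_2_active,access_key_2_last_rotated\n"
    "alice,true,2023-01-01T00:00:00+00:00,false,N/A\n"
    "bob,true,2023-12-01T00:00:00+00:00,false,N/A\n"
)
flagged = stale_keys(SAMPLE, datetime(2024, 1, 1, tzinfo=timezone.utc))
```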
06 Traceability
Every finding in the paid report is anchored to the evidence that produced it. Each evidence item carries:
- The exact AWS API endpoint called (e.g. iam.amazonaws.com/GetAccountSummary)
- The request timestamp in ISO 8601 UTC
- The raw response body (truncated if >50KB, but CRITICAL findings are never truncated)
- A SHA-256 hash of the evidence item for tamper-evidence
- The AWS region the call was issued against
Because the scanner is open-source, an auditor can clone the repo, point it at their client's account, run the exact same calls, and verify that our evidence matches what they collect independently. The hash creates a chain of custody between the raw response and the finding that cited it.
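The hash check an auditor would perform can be sketched like this. The field names, the canonical JSON serialization, and the reading of 50KB as 50 × 1024 bytes are assumptions for illustration; what exactly the scanner hashes is defined by its source, not by this example.

```python
import hashlib
import json

MAX_BODY_BYTES = 50 * 1024  # assumed reading of the 50KB limit

def item_hash(item: dict) -> str:
    """SHA-256 over a canonical JSON form of every field except the hash."""
    payload = {k: v for k, v in item.items() if k != "sha256"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def make_evidence_item(endpoint: str, region: str, timestamp: str,
                       body: str, critical: bool = False) -> dict:
    """Build an evidence item, truncating oversized non-critical bodies."""
    raw = body.encode("utf-8")
    truncated = len(raw) > MAX_BODY_BYTES and not critical
    if truncated:
        raw = raw[:MAX_BODY_BYTES]
    item = {
        "endpoint": endpoint,
        "region": region,
        "timestamp": timestamp,  # ISO 8601 UTC
        "body": raw.decode("utf-8", errors="ignore"),
        "truncated": truncated,
    }
    item["sha256"] = item_hash(item)
    return item

def verify(item: dict) -> bool:
    """True if the stored hash still matches the item's contents."""
    return item["sha256"] == item_hash(item)
```

Any change to the stored body, endpoint, region, or timestamp after collection makes `verify` return `False`.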
This is why the scan is open-source. Not as a values statement — as a trust mechanism. You can check our work because the work is checkable.
07 What we're not yet claiming
The SOC 2 framework covers nine criteria series. AWS API calls can surface evidence for roughly 15–20% of those criteria: the parts with API endpoints. The remaining 80–85% (governance processes, risk assessments, written policies, access reviews, vendor risk, HR controls, incident response exercises) have no API. We don't assess those.
"Pre-audit readiness" means the AWS infrastructure layer is assessed. It does not mean your auditor will have no findings. It means you'll have fewer surprises on the technical side, and the ones you do have will come with traceable evidence and copy-pasteable CLI commands to fix them.
We don't publish an accuracy number. We don't have one that's meaningful enough to publish yet.
This is a Type I tool. Continuous monitoring (Type II evidence collection over time) is on the roadmap, not shipped.
08 Your data & deletion
Here is exactly what we store, where, and for how long — no buried clauses.
| What | Where | How long |
|---|---|---|
| AWS API responses (evidence) | Cloudflare D1 | 30 days, then auto-deleted |
| Gap scores & control analysis | Cloudflare D1 | 30 days, then auto-deleted |
| Generated report (HTML + JSON) | Cloudflare R2 | 30 days, then auto-deleted |
| Finding edits & resolved marks | Cloudflare D1 | 30 days, deleted with scan |
| Data access log | Cloudflare D1 | 30 days, deleted with scan |
| Payment records | Stripe | As required by law (~7 years) |
What we never store: AWS credentials, secret values, application data, customer data, code, or anything your application stores about your users. IAM role sessions have a 1-hour TTL and are never persisted after the scan completes.
Delete anytime. Every scan page has a "Delete all my scan data" button. One click wipes all D1 rows and R2 objects for that scan immediately — evidence, report, edits, access log, everything. No email required. Payment records remain with Stripe as required by law but contain no scan data.
Access log. Every time your scan data is accessed — by you or anyone with your token — it is recorded and visible to you on the scan page under "Access log." You can see every download, every Gideon query, every results view. Nothing happens to your data without it appearing there.
What we share: Stripe processes payments. Anthropic's Claude API analyzes your evidence for the paid report. Neither receives your AWS account ID or org name. That's the complete list of third parties.
If you want deletion before the 30-day window and can't access the scan page, email mehta.arja@northeastern.edu with your scan ID.