Methodology

No black box.
No vapor.

How the Evidence Tracer is built, end to end: what we call, what we infer, what we store, and, near the end, what we're not yet claiming.

Pilot version: EVT v0.4 · Last updated: 5 May 2026 · Sonnet 4.6

01 Architecture

Stateless Cloudflare Worker. No persistent server, no VM, no container. Every scan request obtains temporary AWS credentials via STS AssumeRole, runs the evidence collection, and exits. Scan state lives in Cloudflare D1 (SQLite at the edge) for 30 days and is then deleted automatically. Reports are stored in R2 object storage.

The architecture was chosen for two reasons: it makes zero-credential-persistence easy to verify (no long-running process means no credential caching), and Cloudflare's infrastructure means your data doesn't pass through a server you have to trust us to secure.

  1. You POST an IAM Role ARN + ExternalId to the Worker
  2. Worker calls STS AssumeRole for short-lived session credentials (1 hour TTL)
  3. Evidence collection fans out across AWS services using SigV4-signed requests
  4. Evidence items are written to D1, chunked per scan
  5. Claude Sonnet 4.6 runs per-control analysis against structured prompts
  6. Report is assembled from control results and stored in R2
  7. All evidence data is deleted 30 days after report delivery

No long-lived AWS credentials. No write permissions. Nothing installed in your account beyond the read-only IAM role you provision via CloudFormation.
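Step 2 of the flow above can be sketched as the form-encoded body of an STS AssumeRole request. This is a minimal illustration in Python, not the Worker's actual code: the function name, the `evt-scan-` session-name prefix, and the `scan_id` argument are assumptions. The parameter names (`RoleArn`, `ExternalId`, `RoleSessionName`, `DurationSeconds`) are those of the STS Query API. Signing and sending the request are out of scope here.

```python
from urllib.parse import urlencode

SESSION_TTL_SECONDS = 3600  # the 1-hour TTL stated in step 2


def build_assume_role_body(role_arn: str, external_id: str, scan_id: str) -> str:
    """Form-encoded body for an STS AssumeRole call (Query API version 2011-06-15)."""
    params = {
        "Action": "AssumeRole",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "ExternalId": external_id,
        # STS requires a RoleSessionName; the "evt-scan-" prefix is an assumption.
        "RoleSessionName": f"evt-scan-{scan_id}",
        "DurationSeconds": str(SESSION_TTL_SECONDS),
    }
    return urlencode(params)
```

The ExternalId travels with every AssumeRole call, so a role ARN on its own is not enough to start a session when the role's trust policy requires that ID.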

02 Evidence collection

All AWS API calls are SigV4-signed from scratch (no SDK). Calls run in parallel with a concurrency cap to avoid rate limits. Raw responses (XML or JSON) are truncated to fit within analysis context windows, except that evidence behind CRITICAL-severity findings is never truncated. Each response is stored as a discrete evidence item with its source endpoint, timestamp, and content hash.
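For readers unfamiliar with what "SigV4-signed from scratch" involves, here is a minimal Python sketch of the signing steps using only the standard library. It is an illustration under assumptions, not the scanner's code: the function names and the `creds` dictionary layout are hypothetical, and only the `host`, `x-amz-date`, and `x-amz-security-token` headers are signed.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """SigV4 key derivation: chained HMACs over date, region, service, terminator."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(method, host, path, query, body, creds, region, service, now):
    """Return the headers (including Authorization) for one signed request."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")  # `now` must be in UTC
    date = amz_date[:8]
    headers = {"host": host, "x-amz-date": amz_date}
    if creds.get("session_token"):  # temporary STS credentials carry a token
        headers["x-amz-security-token"] = creds["session_token"]

    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(query.items())
    )
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in sorted(headers))
    signed_headers = ";".join(sorted(headers))
    canonical_request = "\n".join([
        method, quote(path, safe="/-_.~"), canonical_query,
        canonical_headers, signed_headers, hashlib.sha256(body).hexdigest(),
    ])

    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = derive_signing_key(creds["secret_key"], date, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={creds['access_key']}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

A call such as `sign_request("GET", "iam.amazonaws.com", "/", {"Action": "ListUsers", "Version": "2010-05-08"}, b"", creds, "us-east-1", "iam", datetime.now(timezone.utc))` returns the headers to attach to the HTTP request.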

IAM

  • GetAccountSummary
  • ListUsers
  • GetAccountPasswordPolicy
  • ListGroups
  • GetCredentialReport
  • ListMFADevices
  • ListPolicies
  • ListRoles

S3

  • ListBuckets
  • GetBucketEncryption
  • GetBucketPolicy
  • GetBucketVersioning
  • GetPublicAccessBlock

CloudTrail

  • DescribeTrails
  • GetTrailStatus
  • GetEventSelectors
  • ListTrails

AWS Config

  • DescribeConfigRules
  • DescribeConfigurationRecorders
  • DescribeDeliveryChannels

EC2 / VPC

  • DescribeSecurityGroups
  • DescribeVpcs
  • DescribeFlowLogs
  • DescribeInstances

CloudWatch + SNS

  • DescribeAlarms
  • ListMetrics
  • ListTopics
  • ListSubscriptions

Additional services: KMS (ListKeys, GetKeyRotationStatus), GuardDuty (ListDetectors), Security Hub (GetFindings, GetEnabledStandards), Secrets Manager (ListSecrets, metadata only), WAF (ListWebACLs), RDS (DescribeDBInstances, DescribeDBSnapshots), Lambda, SSO.

03 Control mapping

Mapping from evidence to SOC 2 Trust Services Criteria is deterministic — the same evidence set will always produce the same control mapping. There is no learned model making this connection; it is a hand-coded rule table updated as the API coverage expands.

CC6.1 Logical Access: Restricted Access
Sources: IAM users, password policy, MFA status, credential report, KMS key policies

CC6.2 System Access Provisioning
Sources: IAM user creation dates, group memberships, SSO configuration, access key metadata

CC6.3 Role-Based Access & Segregation
Sources: IAM roles, policy attachments, trust relationships, cross-account access

CC6.6 External Threat Boundary
Sources: VPC configuration, security groups, WAF Web ACLs, flow logs status, EC2 instances

CC6.7 Restricted Data Movement & Encryption
Sources: S3 encryption & public access, KMS keys & rotation, RDS encryption, Secrets Manager

CC7.1 Configuration & Vulnerability Management
Sources: AWS Config recorders, delivery channels, Config rules compliance

CC7.2 Security Event Monitoring
Sources: CloudTrail trails, multi-region flag, log validation, event selectors, CloudWatch alarms

CC7.3 Anomaly Detection
Sources: GuardDuty detectors, CloudWatch alarms, SNS topics, CloudWatch metrics

CC7.4 Incident Response
Sources: Security Hub findings, GuardDuty status, SNS subscriptions, Security Hub standards

CC8.1 Change Management
Sources: CloudTrail event selectors, Config rules, Lambda function inventory

CC5.2 Technology Controls
Sources: AWS Config, Security Hub standards, GuardDuty, Config rules

CC9.2 Business Continuity & Recovery
Sources: RDS snapshots and backup retention, S3 versioning & replication status

Because the mapping is deterministic, every gap finding traces directly to a specific API call and response field. There is no "the AI decided" — there is "this API call returned Encrypted: false, here it is."
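A hand-coded rule table of this kind can be pictured as a plain lookup. The Python sketch below is illustrative only: `CONTROL_RULES` is a hypothetical three-control excerpt, the pairing of API calls to controls is inferred from the "Sources" lines above, and the `id` / `api_call` field names are assumptions.

```python
# Hypothetical excerpt of the rule table: control ID -> API calls that feed it.
CONTROL_RULES = {
    "CC6.1": ("GetAccountPasswordPolicy", "GetCredentialReport", "ListMFADevices", "ListUsers"),
    "CC6.7": ("GetBucketEncryption", "GetKeyRotationStatus", "GetPublicAccessBlock"),
    "CC7.2": ("DescribeAlarms", "DescribeTrails", "GetEventSelectors", "GetTrailStatus"),
}


def map_evidence(evidence_items):
    """Return {control_id: sorted evidence IDs}. Pure lookup, no model involved."""
    mapping = {control: [] for control in CONTROL_RULES}
    for item in evidence_items:
        for control, api_calls in CONTROL_RULES.items():
            if item["api_call"] in api_calls:
                mapping[control].append(item["id"])
    # Sorting makes the output independent of the order evidence arrived in.
    return {control: sorted(ids) for control, ids in mapping.items()}
```

Because the function depends only on its input and a static table, feeding it the same evidence set in any order yields the same mapping.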

04 Reasoning layer

After evidence is collected and heuristic scores are computed, each of the 12 controls is analyzed individually by Claude Sonnet 4.6. Each call receives a structured prompt containing: the control definition (AICPA criteria), the relevant evidence items for that control, and a JSON schema for the expected output.

The model is instructed to: cite specific evidence IDs in every finding, avoid hallucinating controls or permissions not present in the evidence, flag when evidence is insufficient rather than guessing, and produce copy-pasteable AWS CLI remediation commands anchored to real resource ARNs from the evidence.

The output contract is enforced: if the model returns malformed JSON or omits required fields, the call is retried. If a call fails after retries (including rate-limit backoff via Cloudflare Queues), the control is marked inconclusive rather than silently dropped.
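The retry-then-inconclusive behavior described above can be sketched as follows. This is a simplified Python illustration: the required field names, the attempt budget of 3, and the `call_model` callable are all assumptions, and the real pipeline validates against a fuller JSON schema and backs off via Cloudflare Queues rather than looping in-process.

```python
import json

REQUIRED_FIELDS = ("control_id", "status", "findings")  # hypothetical schema fields
MAX_ATTEMPTS = 3  # assumed; the retry budget is not stated above


def parse_result(raw: str):
    """Return the parsed result, or None if it is malformed or incomplete."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(f not in data for f in REQUIRED_FIELDS):
        return None
    return data


def analyze_control(control_id, call_model):
    """Call the model until the output contract is met, else mark inconclusive."""
    for _ in range(MAX_ATTEMPTS):
        try:
            result = parse_result(call_model(control_id))
        except Exception:  # e.g. a rate-limit or network error from the API client
            result = None
        if result is not None:
            return result
    # Never drop a control silently: surface it as inconclusive instead.
    return {"control_id": control_id, "status": "inconclusive", "findings": []}
```

Returning an explicit inconclusive record keeps the report at 12 controls even when one analysis fails.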

Each control analysis runs in its own queue message with a 30-second wall-time limit. The 12 messages are processed at most 2 at a time to stay within Anthropic rate limits. Total analysis time: 60–120 seconds for a typical account.
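The combination of a concurrency cap and a per-control time limit can be modelled with a semaphore and a timeout. The sketch below uses Python's `asyncio` as a stand-in for the queue consumer; the production system uses Cloudflare Queues, and `run_all` and `analyze` are hypothetical names.

```python
import asyncio

MAX_CONCURRENT = 2      # concurrency cap stated above
WALL_TIME_LIMIT = 30.0  # seconds per control, stated above


async def run_all(control_ids, analyze, limit=MAX_CONCURRENT, timeout=WALL_TIME_LIMIT):
    """Run `analyze(control_id)` for every control, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(control_id):
        async with semaphore:
            try:
                return await asyncio.wait_for(analyze(control_id), timeout)
            except asyncio.TimeoutError:
                # A control that exceeds its wall-time limit is reported, not dropped.
                return {"control_id": control_id, "status": "inconclusive"}

    # gather preserves input order, so results line up with control_ids.
    return await asyncio.gather(*(run_one(c) for c in control_ids))
```

With 12 controls and 2 slots, the work proceeds in roughly six waves, which is consistent with a total in the range of one to two minutes when each call takes 10 to 20 seconds.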

05 Scoring

Two scores are computed for each scan. Both are 0–100. Neither maps to a binary "audit-ready" claim.

Gap Score — percentage of checkpoints meeting their thresholds, weighted by severity of failing findings:

Finding severity    Score deduction
CRITICAL            −25 points
HIGH                −15 points
MEDIUM              −5 points
LOW / INFO          −1 point

Score range    Interpretation
80–100         Low audit risk: known gaps, manageable
60–79          Moderate: auditor will likely raise findings
<60            High: remediate before scheduling audit

Freshness Score — recency of the evidence underpinning the analysis. Inputs: IAM access key age vs 90-day rotation policy, credential last-used dates, CloudTrail log delivery recency, Config rule last-evaluation timestamps. Below 70 indicates configurations that auditors commonly flag for staleness.
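As a worked example of the Gap Score deductions and score ranges listed above, the Python sketch below assumes the simplest reading: start at 100, subtract the deduction for each failing finding, and clamp at 0. The exact formula, including how the "percentage of checkpoints" term enters, is not specified here, so treat this as an assumption rather than the scoring implementation.

```python
# Deductions taken from the Gap Score severity table.
DEDUCTIONS = {"CRITICAL": 25, "HIGH": 15, "MEDIUM": 5, "LOW": 1, "INFO": 1}


def gap_score(finding_severities):
    """Assumed formula: 100 minus the summed deductions, never below 0."""
    total = sum(DEDUCTIONS[severity] for severity in finding_severities)
    return max(0, 100 - total)


def interpret(score):
    """Map a score to the interpretation bands from the score-range table."""
    if score >= 80:
        return "Low audit risk"
    if score >= 60:
        return "Moderate"
    return "High"
```

Under this reading, one CRITICAL, one HIGH, and two MEDIUM findings deduct 25 + 15 + 5 + 5 = 50 points, giving a score of 50, which falls in the "High" band.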

06 Traceability

Every finding in the paid report is anchored to the evidence that produced it. Each evidence item carries its source endpoint, a collection timestamp, and a content hash of the raw response.

Because the scanner is open-source, an auditor can clone the repo, point it at their client's account, run the exact same calls, and verify that our evidence matches what they collect independently. The hash creates a chain of custody between the raw response and the finding that cited it.
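The chain of custody can be illustrated with a small Python sketch. Per the evidence collection section, each item records a source endpoint, timestamp, and content hash. The field names, the `EvidenceItem` class, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    evidence_id: str      # hypothetical ID that findings cite
    source_endpoint: str  # the API call that produced the response
    collected_at: str     # ISO 8601 timestamp
    content_hash: str     # hex SHA-256 of the raw response (algorithm assumed)


def make_item(evidence_id, source_endpoint, collected_at, raw_response: bytes):
    """Record a raw API response as an evidence item with its content hash."""
    digest = hashlib.sha256(raw_response).hexdigest()
    return EvidenceItem(evidence_id, source_endpoint, collected_at, digest)


def verify(item: EvidenceItem, raw_response: bytes) -> bool:
    """True if the raw response is byte-for-byte what the item was built from."""
    return hashlib.sha256(raw_response).hexdigest() == item.content_hash
```

A reviewer holding the stored raw response can recompute the hash and confirm that the finding cites exactly that response; changing a single byte makes `verify` return `False`.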

This is why the scan is open-source. Not as a values statement — as a trust mechanism. You can check our work because the work is checkable.

07 What we're not yet claiming

The SOC 2 framework covers nine criteria series. AWS API calls can surface evidence for roughly 15–20% of those criteria: the parts with API endpoints. The remaining 80–85% (governance processes, risk assessments, written policies, access reviews, vendor risk, HR controls, incident response exercises) have no API. We don't assess those.

"Pre-audit readiness" means the AWS infrastructure layer is assessed. It does not mean your auditor will have no findings. It means you'll have fewer surprises on the technical side, and the ones you do have will come with traceable evidence and copy-pasteable CLI commands to fix them.

We don't publish an accuracy number. We don't have one that's meaningful enough to publish yet.

This is a Type I tool. Continuous monitoring (Type II evidence collection over time) is on the roadmap, not shipped.

08 Your data & deletion

Here is exactly what we store, where, and for how long — no buried clauses.

What                              Where            How long
AWS API responses (evidence)      Cloudflare D1    30 days, then auto-deleted
Gap scores & control analysis     Cloudflare D1    30 days, then auto-deleted
Generated report (HTML + JSON)    Cloudflare R2    30 days, then auto-deleted
Finding edits & resolved marks    Cloudflare D1    30 days, deleted with scan
Data access log                   Cloudflare D1    30 days, deleted with scan
Payment records                   Stripe           As required by law (~7 years)

What we never store: AWS credentials, secret values, application data, customer data, code, or anything your application stores about your users. IAM role sessions have a 1-hour TTL and are never persisted after the scan completes.

Delete anytime. Every scan page has a "Delete all my scan data" button. One click wipes all D1 rows and R2 objects for that scan immediately — evidence, report, edits, access log, everything. No email required. Payment records remain with Stripe as required by law but contain no scan data.
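The automatic 30-day window reduces to a date comparison. The Python sketch below assumes the clock starts at report delivery, as stated in the architecture steps; the function names are hypothetical and the actual cleanup job is not described here.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention window stated in the table


def expires_at(delivered_at: datetime) -> datetime:
    """Moment at which a scan's D1 rows and R2 objects become due for deletion."""
    return delivered_at + RETENTION


def is_expired(delivered_at: datetime, now: datetime) -> bool:
    return now >= expires_at(delivered_at)
```

For a report delivered on 5 May 2026 (UTC), `expires_at` returns 4 June 2026. A manual "Delete all my scan data" request bypasses this check and removes the data immediately.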

Access log. Every time your scan data is accessed — by you or anyone with your token — it is recorded and visible to you on the scan page under "Access log." You can see every download, every Gideon query, every results view. Nothing happens to your data without it appearing there.

What we share: Stripe processes payments. Anthropic's Claude API analyzes your evidence for the paid report. Neither receives your AWS account ID or org name. That's the complete list of third parties.

If you want deletion before the 30-day window and can't access the scan page, email mehta.arja@northeastern.edu with your scan ID.

Questions, corrections, or methodology challenges — talk to the founder directly.