Customer Data Intake

HITRUST/HIPAA-compliant upload channel for customers to stream PHI documents directly into the workspace, where they are logged, retained, and projected as signal events for downstream pipelines.

For customers who need to upload files without API credentials or technical setup, operators can generate shareable upload links. Each link maps to a specific workspace and customer, and grants the holder access to a drag-and-drop upload page - no login required.

How It Works

  1. Generate a link - An operator creates an upload link for a specific customer through the API. The link has a configurable expiration (up to 30 days) and a maximum upload count (up to 10,000 files).

  2. Share the URL - The generated URL points to a self-contained upload page. Send it to the customer via email, chat, or any other channel.

  3. Customer uploads files - The customer opens the link in any browser and drags files onto the page, or clicks to browse. No account, API key, or technical knowledge required.

  4. Files land in the intake pipeline - Uploaded files follow the same intake pipeline as API-submitted files, including metadata tracking and audit logging.

Upload links have four possible states:

| Status | Meaning |
| --- | --- |
| Active | Link is valid and accepting uploads |
| Expired | The link's expiration time has passed |
| Revoked | An operator manually revoked the link |
| Exhausted | The link reached its maximum upload count |

Operators can revoke links at any time. Revoked and expired links show a clear error message to anyone who tries to use them.
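The four states can be derived from a handful of link attributes. The following is a minimal sketch, not the service's implementation: the `UploadLink` class, its field names, the lower bound of one upload, and the order in which the states are checked are all assumptions. Only the four states and the 30-day and 10,000-file limits come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_LIFETIME = timedelta(days=30)  # documented upper bound on expiration
MAX_UPLOADS = 10_000               # documented upper bound on upload count


@dataclass
class UploadLink:
    # Hypothetical model: field names are illustrative only.
    created_at: datetime
    expires_at: datetime
    max_uploads: int
    upload_count: int = 0
    revoked: bool = False

    def __post_init__(self) -> None:
        # Assumption: the 30-day limit is measured from creation time.
        if self.expires_at - self.created_at > MAX_LIFETIME:
            raise ValueError("expiration may be at most 30 days after creation")
        if not 1 <= self.max_uploads <= MAX_UPLOADS:
            raise ValueError("max_uploads must be between 1 and 10,000")


def link_status(link: UploadLink, now: Optional[datetime] = None) -> str:
    """Return one of the four documented states for a link."""
    now = now or datetime.now(timezone.utc)
    # Assumed precedence: revoked, then expired, then exhausted.
    if link.revoked:
        return "revoked"
    if now >= link.expires_at:
        return "expired"
    if link.upload_count >= link.max_uploads:
        return "exhausted"
    return "active"
```

Passing `now` explicitly keeps the function deterministic, which makes the state transitions easy to test.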

Supported File Types

The upload page accepts PDF, Word documents, PowerPoint presentations, CSV files, JPEG, and PNG files up to 100 MB each. Uploaded files are validated against their declared content type using magic byte detection - if the file content does not match the declared type, the upload is rejected.
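Magic byte detection compares the first bytes of a file against well-known format signatures. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, only the modern ZIP-based Office formats are covered, and the CSV rule (no known binary signature, valid UTF-8) is a guess, since CSV has no signature of its own. It does not describe how the upload page actually performs the check.

```python
import codecs
from typing import Optional

# Well-known leading signatures. DOCX and PPTX are ZIP containers, so they
# share the ZIP signature and cannot be told apart from the prefix alone.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]

_ZIP_BASED = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def sniff(head: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, if recognized."""
    for magic, mime in _SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def matches_declared_type(head: bytes, declared: str) -> bool:
    """Check the first bytes of a file against its declared content type."""
    detected = sniff(head)
    if declared in _ZIP_BASED:
        return detected == "application/zip"
    if declared == "text/csv":
        # Assumption: CSV must carry no binary signature and decode as UTF-8.
        if detected is not None:
            return False
        try:
            # final=False tolerates a multi-byte character cut off at the end.
            codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
        except UnicodeDecodeError:
            return False
        return True
    return detected == declared
```

A stricter check for Office files would open the ZIP container and inspect its contents. The prefix alone only proves the file is a ZIP archive.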

CSV support enables bulk data intake workflows where customers upload structured data (patient rosters, appointment lists, insurance records) through the same shareable link mechanism used for document uploads.

Download

Operators can download files that were submitted through intake links. Downloads are scoped through the intake link - an operator can only download uploads visible on the corresponding upload listing. If the upload or the underlying file no longer exists (for example, after a right-to-be-forgotten deletion), the download returns the same not-found error as a request for an upload that never existed, so the response leaks no information about deleted records. Downloads are logged as PHI access events in the audit trail.
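The uniform not-found behavior can be sketched as a single error path shared by every failure case. This is an illustration only: `download`, `UploadNotFound`, and the in-memory stand-ins for the upload listing and storage are hypothetical, and the audit logging of downloads is omitted.

```python
from typing import Dict, Set


class UploadNotFound(LookupError):
    """Raised for every case in which a download cannot be served."""


def download(upload_id: str, link_upload_ids: Set[str], storage: Dict[str, bytes]) -> bytes:
    # link_upload_ids: IDs visible on the link's upload listing (hypothetical).
    # storage: stand-in for workspace storage, keyed by upload ID (hypothetical).
    # Out-of-scope, never-existed, and deleted uploads all take the same branch,
    # so the caller cannot distinguish between them.
    if upload_id not in link_upload_ids or upload_id not in storage:
        raise UploadNotFound("upload not found")
    return storage[upload_id]
```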

Duplicate Detection

When an uploaded file has the same content hash as a previously uploaded file in the same workspace, the upload response includes the ID and timestamp of the original upload. The duplicate file is still accepted and stored - this is informational only, letting integrators and the upload UI surface duplicate warnings without blocking the upload.
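A minimal sketch of this informational check, assuming SHA-256 as the content hash and an in-memory index keyed by workspace and hash. The `DuplicateIndex` and `UploadResult` names and fields are hypothetical and do not reflect the actual response schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UploadResult:
    upload_id: str
    duplicate_of: Optional[str] = None             # ID of the original upload
    duplicate_uploaded_at: Optional[datetime] = None


class DuplicateIndex:
    """Remembers the first upload seen for each (workspace, content hash)."""

    def __init__(self) -> None:
        self._first_seen: Dict[Tuple[str, str], Tuple[str, datetime]] = {}

    def record(self, workspace_id: str, upload_id: str, content: bytes,
               now: Optional[datetime] = None) -> UploadResult:
        now = now or datetime.now(timezone.utc)
        key = (workspace_id, hashlib.sha256(content).hexdigest())
        original = self._first_seen.get(key)
        if original is None:
            self._first_seen[key] = (upload_id, now)
            return UploadResult(upload_id)
        # The duplicate is still accepted; the result only carries a pointer
        # back to the original upload.
        return UploadResult(upload_id, original[0], original[1])
```

Because the key includes the workspace, identical files uploaded to different workspaces are never reported as duplicates of each other.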

Security

The link token itself is the authentication mechanism - no API key is exposed to the customer. Links are scoped to a single workspace and customer, time-limited, and usage-limited. The upload page enforces strict content security policies and does not embed third-party resources.

API-Based Intake

For integrations that need programmatic upload (automated pipelines, EHR exports, partner systems), the intake API accepts files directly with HMAC-signed authentication. Each file is authenticated, checksum-verified, written to a per-customer storage path, recorded as a tamper-evident audit row, and projected into a signal event that downstream pipelines can react to.

This complements the connector system, which pulls data from EHR platforms, FHIR stores, and CRMs on a schedule. Intake covers the cases connectors cannot: bulk historical loads, one-off document drops from partners who have no queryable API, and upstream systems that prefer pushing data on their own cadence rather than exposing an endpoint for Amigo to poll.

When to Use Intake vs a Connector

| Scenario | Use |
| --- | --- |
| The source system has a queryable API (FHIR, SMART, REST, CRM) | Connector |
| A partner drops files into shared cloud storage on a schedule | Connector |
| The source system prefers to push bulk PHI on its own cadence | Intake |
| An operations team needs to backfill historical documents | Intake |
| A referral partner sends clinical summaries as discrete uploads | Intake |

Intake does not replace connectors. Most workspaces will run both: connectors for the steady-state sync, intake for bulk and ad-hoc pushes.

Supported File Types

A single intake call accepts one file at a time. Supported formats include:

  • Clinical documents (PDF, CDA / CCDA)

  • FHIR bundles (JSON, NDJSON)

  • Structured exports (CSV, NDJSON)

  • Arbitrary binary payloads (attachments, scanned forms)

There is no hard format restriction at the transport layer - parsing and normalization happen downstream, and the intake channel itself is format-agnostic. The supported format list reflects what downstream parsing currently understands.

How an Upload Works

  1. Sign and send. The customer integration computes a SHA-256 over the file body and signs a canonical request string with a per-customer HMAC secret. The file bytes are sent as the raw request body so large payloads stream without buffering.

  2. Authenticate. The endpoint verifies the API key, the workspace binding, the HMAC signature, and that the request timestamp is within a short freshness window. Any mismatch rejects the upload before any bytes are retained.

  3. Stream and hash. Bytes are streamed into a per-customer path in workspace storage. The SHA-256 is computed as the body flows through; if the client's hash disagrees with what actually arrived, the partial object is deleted and the caller receives a validation error.

  4. Log and emit. A row is written to the workspace's intake audit log with the file identifier, customer slug, storage path, size, hash, and timestamp. The upload is then projected as a signal event that downstream pipelines can subscribe to.
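The stream-and-hash step (step 3) can be sketched as follows. This is an illustration under stated assumptions, not the service's code: a local file path stands in for workspace storage, and the `stream_to_path` and `HashMismatch` names are hypothetical.

```python
import hashlib
import os
from typing import BinaryIO


class HashMismatch(ValueError):
    """The bytes that arrived do not match the hash the client claimed."""


def stream_to_path(body: BinaryIO, dest_path: str, claimed_sha256: str,
                   chunk_size: int = 64 * 1024) -> int:
    """Copy body to dest_path in chunks, hashing as the bytes flow through.

    Returns the number of bytes written. On any failure, including a hash
    mismatch, the partial object is deleted before the error propagates.
    """
    digest = hashlib.sha256()
    size = 0
    try:
        with open(dest_path, "wb") as out:
            # Read fixed-size chunks so large payloads are never fully buffered.
            for chunk in iter(lambda: body.read(chunk_size), b""):
                digest.update(chunk)
                out.write(chunk)
                size += len(chunk)
        if digest.hexdigest() != claimed_sha256.lower():
            raise HashMismatch("body does not match the claimed SHA-256")
    except Exception:
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
    return size
```

Hashing inside the copy loop means the body is read exactly once. The cost is that a mismatch is only detectable after the last byte arrives, which is why the cleanup step is needed.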

Authentication

Intake stacks two layers of authentication:

  • API key. The request carries a workspace API key. This scopes the caller to a workspace and enforces role-based permissions.

  • Per-customer HMAC. On top of the API key, every upload is signed with a secret that belongs to a specific customer slug. The slug identifies which upstream entity the file belongs to, and the HMAC proves the request came from that entity and has not been replayed or tampered with.

The two-layer design lets a single workspace accept uploads from multiple upstream systems without giving any one of them access to the others' secrets. Rotating a customer's HMAC secret revokes their upload access without affecting the workspace's API key.
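The per-customer HMAC layer can be sketched with Python's standard `hmac` module. The canonical string layout, the function names, and the 300-second freshness window below are assumptions made for illustration. The text above specifies only that a canonical request string is signed with a per-customer secret and that the timestamp must fall within a short window.

```python
import hashlib
import hmac
import time
from typing import Optional

FRESHNESS_WINDOW_SECONDS = 300  # assumption: the text only says "short"


def canonical_string(method: str, path: str, customer_slug: str,
                     timestamp: int, body_sha256: str) -> str:
    # Hypothetical layout: one field per line, in a fixed order.
    return "\n".join([method.upper(), path, customer_slug, str(timestamp), body_sha256])


def sign(secret: bytes, canonical: str) -> str:
    """Client side: HMAC-SHA256 over the canonical string, hex-encoded."""
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(secret: bytes, canonical: str, signature: str, timestamp: int,
           now: Optional[float] = None) -> bool:
    """Server side: reject stale requests, then compare in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > FRESHNESS_WINDOW_SECONDS:
        return False
    expected = sign(secret, canonical)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Including the body hash and timestamp in the signed string is what ties the signature to one specific file and one short time window. `hmac.compare_digest` avoids leaking how many leading characters of a guessed signature were correct.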

Audit Log

Every accepted upload writes an immutable row containing:

  • Unique upload identifier

  • Workspace and customer slug

  • Storage path the bytes were written to

  • Original filename and content type

  • SHA-256 and size in bytes

  • Actor identifier of the API key that submitted the request

  • Received-at timestamp

  • Scan status and any scan findings

  • Processed-at timestamp and processing error, if any

The log is the authoritative record of what was received from whom and when. It is queryable through the same data-access surface as the rest of the workspace, and it is retained according to the workspace's compliance policy.
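The row contents listed above could be modeled as an immutable record. The sketch below is illustrative: the class name, field names, and types are assumptions rather than the actual schema, and the text does not describe how the scan and processing fields are populated after the initial write.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen mirrors the "immutable row" property
class IntakeAuditRow:
    upload_id: str
    workspace_id: str
    customer_slug: str
    storage_path: str
    original_filename: str
    content_type: str
    sha256: str
    size_bytes: int
    actor_id: str                             # API key that submitted the request
    received_at: datetime
    scan_status: Optional[str] = None
    scan_findings: Tuple[str, ...] = ()
    processed_at: Optional[datetime] = None
    processing_error: Optional[str] = None
```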

Downstream Projection

Uploads do not enter the world model directly. They first land in storage and the audit log, then emit a signal event that downstream pipelines pick up. This shape preserves the world model's invariant that every fact is sourced from an event with a known provenance, confidence, and timestamp.

What a downstream pipeline does with the event depends on the file type and the workspace's configuration:

  • A CCDA or FHIR bundle can be parsed into patient, condition, encounter, and observation events and resolved against existing world model entities.

  • A CSV export can be mapped field-by-field into structured events by a workspace-specific transform.

  • A scanned form or PDF can be routed into a document-understanding pipeline before any structured events are emitted.

The intake channel itself does not prescribe a parser. Parsing and entity resolution are handled by the same pipelines that process data from connectors, so intake files benefit from the same unification, deduplication, and conflict-resolution behavior as data arriving from an EHR poll.
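A downstream consumer might select a pipeline from the content type carried on the signal event. The sketch below is hypothetical throughout: the pipeline names, the routing table, the choice of `application/xml` for CCDA, and the `unrouted` fallback are all assumptions, and actual routing depends on the workspace's configuration.

```python
from typing import Dict

# Hypothetical routing table: content type -> pipeline name.
ROUTES: Dict[str, str] = {
    "application/fhir+json": "clinical_parser",
    "application/xml": "clinical_parser",       # assumption: CCDA arrives as XML
    "text/csv": "csv_transform",
    "application/pdf": "document_understanding",
    "image/png": "document_understanding",
    "image/jpeg": "document_understanding",
}


def select_pipeline(content_type: str) -> str:
    """Pick a pipeline for an upload event, ignoring parameters such as charset."""
    base = content_type.split(";", 1)[0].strip().lower()
    # Assumption: unrecognized types stay in storage without being parsed.
    return ROUTES.get(base, "unrouted")
```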

Compliance Posture

The intake path is built to HITRUST and HIPAA requirements:

  • Transport is TLS-only and authenticated at two layers.

  • Storage is workspace-isolated - bytes written for one workspace are never visible to another.

  • The audit log is append-only at the application layer and protected by row-level security at the database layer.

  • Uploads are retained according to the workspace's data residency and retention policy.

Availability

Streaming ingest, audit logging, signal-event projection, and file download are live. Malware scanning and built-in document parsing for the full format list are on the roadmap and will ship without changes to the upload contract.
