# Customer Data Intake

## Shareable Upload Links

For customers who need to upload files without API credentials or technical setup, operators can generate shareable upload links. Each link maps to a specific workspace and customer, and grants the holder access to a drag-and-drop upload page - no login required.

### How It Works

1. **Generate a link** - An operator creates an upload link for a specific customer through the API. The link has a configurable expiration (up to 30 days) and a maximum upload count (up to 10,000 files).
2. **Share the URL** - The generated URL points to a self-contained upload page. Send it to the customer via email, chat, or any other channel.
3. **Customer uploads files** - The customer opens the link in any browser and drags files onto the page, or clicks to browse. No account, API key, or technical knowledge required.
4. **Files land in the intake pipeline** - Uploaded files follow the same intake pipeline as API-submitted files, including metadata tracking and audit logging.
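
The operator call in step 1 is not spelled out on this page. As a rough sketch, the request body for a hypothetical create-link call might be assembled like this; the field names, defaults, and endpoint contract are assumptions for illustration, not the published API. Only the documented limits (30-day expiration, 10,000-file cap) are taken from the text above:

```python
import json
from datetime import datetime, timedelta, timezone

def build_link_request(customer_slug: str, ttl_days: int = 7,
                       max_uploads: int = 1000) -> dict:
    """Build the body for a hypothetical create-upload-link call.
    The caps mirror the documented limits: 30-day expiry, 10,000 files."""
    if ttl_days > 30:
        raise ValueError("expiration may be at most 30 days")
    if max_uploads > 10_000:
        raise ValueError("maximum upload count is 10,000 files")
    expires_at = datetime.now(timezone.utc) + timedelta(days=ttl_days)
    return {
        "customer": customer_slug,          # which customer the link maps to
        "expires_at": expires_at.isoformat(),
        "max_uploads": max_uploads,
    }

# The operator would POST this JSON to the link-creation endpoint:
payload = json.dumps(build_link_request("acme-health", ttl_days=7))
```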

### Link Lifecycle

Upload links have four possible states:

| Status        | Meaning                                   |
| ------------- | ----------------------------------------- |
| **Active**    | Link is valid and accepting uploads       |
| **Expired**   | The link's expiration time has passed     |
| **Revoked**   | An operator manually revoked the link     |
| **Exhausted** | The link reached its maximum upload count |

Operators can revoke links at any time. Revoked and expired links show a clear error message to anyone who tries to use them.

### Supported File Types

The upload page accepts PDF, Word documents, PowerPoint presentations, JPEG, and PNG files up to 100 MB each.

### Security

The link token itself is the authentication mechanism - no API key is exposed to the customer. Links are scoped to a single workspace and customer, time-limited, and usage-limited. The upload page enforces strict content security policies and does not embed third-party resources.

Customer Data Intake is a direct upload channel that lets an approved customer integration stream healthcare documents straight into their workspace. Each file is authenticated, checksum-verified, written to a per-customer storage path, logged to a tamper-evident audit row, and projected into a signal event that downstream pipelines can react to.

It complements the [connector system](https://docs.amigo.ai/data/connectors-and-ehr), which pulls data from EHR platforms, FHIR stores, and CRMs on a schedule. Intake covers the cases connectors cannot: bulk historical loads, one-off document drops from partners who have no queryable API, and upstream systems that prefer pushing data on their own cadence rather than exposing an endpoint for Amigo to poll.

## When to Use Intake vs a Connector

| Scenario                                                        | Use                                                                             |
| --------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| The source system has a queryable API (FHIR, SMART, REST, CRM)  | [Connector](https://docs.amigo.ai/data/connectors-and-ehr)                      |
| A partner drops files into shared cloud storage on a schedule   | [File Drop connector](https://docs.amigo.ai/connectors-and-ehr#connector-types) |
| The source system prefers to push bulk PHI on its own cadence   | **Intake**                                                                      |
| An operations team needs to backfill historical documents       | **Intake**                                                                      |
| A referral partner sends clinical summaries as discrete uploads | **Intake**                                                                      |

Intake does not replace connectors. Most workspaces will run both: connectors for the steady-state sync, intake for bulk and ad-hoc pushes.

## Supported File Types

Each intake call accepts a single file. Supported formats include:

* Clinical documents (PDF, CDA / CCDA)
* FHIR bundles (JSON, NDJSON)
* Structured exports (CSV, NDJSON)
* Arbitrary binary payloads (attachments, scanned forms)

There is no hard format restriction at the transport layer - parsing and normalization happen downstream, and the intake channel itself is format-agnostic. The supported format list reflects what downstream parsing currently understands.

## How an Upload Works

{% @mermaid/diagram content="flowchart LR
A\[Customer System] -- HMAC-signed stream --> B\[Intake Endpoint]
B -- verify signature --> B
B -- stream bytes --> C\[Workspace Storage]
B -- write audit row --> D\[Intake Audit Log]
C --> E\[Signal Event]
D --> E
E --> F\[Downstream Pipelines]
F --> G\[(World Model)]" %}

1. **Sign and send.** The customer integration computes a SHA-256 over the file body and signs a canonical request string with a per-customer HMAC secret. The file bytes are sent as the raw request body so large payloads stream without buffering.
2. **Authenticate.** The endpoint verifies the API key, the workspace binding, the HMAC signature, and that the request timestamp is within a short freshness window. Mismatches reject the upload before any bytes are retained.
3. **Stream and hash.** Bytes are streamed into a per-customer path in workspace storage. The SHA-256 is computed as the body flows through; if the client's hash disagrees with what actually arrived, the partial object is deleted and the caller receives a validation error.
4. **Log and emit.** A row is written to the workspace's intake audit log with the file identifier, customer slug, storage path, size, hash, and timestamp. The upload is then projected as a signal event that downstream pipelines can subscribe to.

## Authentication

Intake stacks two layers of authentication:

* **Platform API key.** The request carries the standard platform API key or JWT. This scopes the caller to a workspace and enforces role-based permissions.
* **Per-customer HMAC.** On top of the API key, every upload is signed with a secret that belongs to a specific customer slug. The slug identifies which upstream entity the file belongs to, and the HMAC proves the request came from that entity and has not been replayed or tampered with.

The two-layer design lets a single workspace accept uploads from multiple upstream systems without giving any one of them access to the others' secrets. Rotating a customer's HMAC revokes their upload access without affecting the workspace's API key.

## Audit Log

Every accepted upload writes an immutable row containing:

* Unique upload identifier
* Workspace and customer slug
* Storage path the bytes were written to
* Original filename and content type
* SHA-256 and size in bytes
* Actor identifier of the API key that submitted the request
* Received-at timestamp
* Scan status and any scan findings
* Processed-at timestamp and processing error, if any

The log is the authoritative record of what was received from whom and when. It is queryable through the same data-access surface as the rest of the workspace, and it is retained according to the workspace's compliance policy.

## Downstream Projection

Uploads do not enter the world model directly. They first land in storage and the audit log, then emit a signal event that downstream pipelines pick up. This shape preserves the world model's invariant that every fact is sourced from an event with a known provenance, confidence, and timestamp.

What a downstream pipeline does with the event depends on the file type and the workspace's configuration:

* A CCDA or FHIR bundle can be parsed into patient, condition, encounter, and observation events and resolved against existing world model entities.
* A CSV export can be mapped field-by-field into structured events by a workspace-specific transform.
* A scanned form or PDF can be routed into a document-understanding pipeline before any structured events are emitted.

The intake channel itself does not prescribe a parser. Parsing and entity resolution are handled by the same pipelines that process data from connectors, so intake files benefit from the same unification, deduplication, and conflict-resolution behavior as data arriving from an EHR poll.
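
A minimal routing sketch for those three cases, assuming hypothetical pipeline names and a simple content-type and extension heuristic (real routing is workspace-configured, not hard-coded like this):

```python
def route_upload(content_type: str, filename: str) -> str:
    """Pick a downstream pipeline from an upload's content type and name.
    Pipeline names are illustrative placeholders."""
    ct = content_type.lower()
    name = filename.lower()
    if ct in ("application/fhir+json", "application/fhir+ndjson"):
        return "fhir-bundle-parser"
    if ct in ("application/xml", "text/xml") or name.endswith((".cda", ".ccda")):
        return "ccda-parser"
    if ct == "text/csv" or name.endswith(".csv"):
        return "csv-transform"          # workspace-specific field mapping
    if ct == "application/pdf" or name.endswith(".pdf"):
        return "document-understanding"  # OCR / layout analysis first
    return "raw-attachment-store"        # retained, no structured events yet
```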

## Compliance Posture

The intake path is built to HITRUST and HIPAA requirements:

* Transport is TLS-only and authenticated at two layers.
* Storage is workspace-isolated - bytes written for one workspace are never visible to another.
* The audit log is append-only at the application layer and protected by row-level security at the database layer.
* Uploads are retained according to the workspace's data residency and retention policy.

## Availability

Customer Data Intake is rolling out in stages. Streaming ingest, audit logging, and signal-event projection are live today. Malware scanning, built-in document parsing for the full format list, and a self-service customer portal for rotating secrets are on the near-term roadmap and will ship without changes to the upload contract.

Workspaces that want early access should coordinate with their Amigo solutions contact to be provisioned a customer slug and HMAC secret.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.amigo.ai/data/customer-data-intake.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
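
A minimal sketch of forming such a query URL with the standard library; only the encoding step is shown, since the response shape is described above rather than modeled here:

```python
from urllib.parse import urlencode

def ask_url(page_url: str, question: str) -> str:
    """Append a URL-encoded `ask` parameter to a documentation page URL."""
    return f"{page_url}?{urlencode({'ask': question})}"

# Example: a specific, self-contained natural-language question.
url = ask_url(
    "https://docs.amigo.ai/data/customer-data-intake.md",
    "what file types are supported",
)
```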
