clinical genetics

Raw DNA Data Privacy: What Happens After You Upload

Genetic data is the most personal data you have. Here is exactly what GenoSight does with your raw DNA file — encryption, access, what is sent to AI, and deletion.

Sebastian Thorp · May 1, 2026 · 6 min read


In short

Your raw DNA file is the most personal data you have. Before you upload it anywhere, you should know exactly what happens to it. This page covers what GenoSight does with your file — how it's stored (encrypted at rest, access restricted by row-level security), what gets sent to the AI (a small set of structured findings, never the raw file itself), what isn't shared with third parties, and how deletion and portability work. The goal is full transparency: nothing about how we handle your data should be a surprise after you read this.

Why this matters more than for other data

A genetic file is unique in three ways that change how it should be handled.

It's permanent. Your password can be changed; your SSN can be reissued in extreme cases; your DNA cannot. A genetic file leaked today is leaked for the rest of your life.

It's familial. Your DNA contains information about your siblings, parents, children, and more distant relatives — none of whom consented to the upload. Carrier status, ancestry composition, and shared variants all leak family information from a single individual's file.

It's deeply personal. Your file contains markers for traits and conditions you may not know about yourself, including things you might never want to know. Trustworthy handling means leaving you in control of what surfaces and what stays untouched.

Those three properties drive the design choices below.

What GenoSight does with your raw file

The privacy story below sits inside the broader picture of what GenoSight is and what the report contains: every choice about storage, AI handling, and deletion is made before any report synthesis can run.

Storage

When you upload a raw genotype file, it's stored in encrypted-at-rest object storage. Encryption keys are managed by the storage layer, separate from the application database.

Database access is restricted by row-level security — every query against your data is automatically filtered so that no other user (and no developer running ad-hoc queries) can read your records. This is enforced at the database layer, not in application code, so a bug in the application can't accidentally bypass it.
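To make the "enforced at the data layer, not in application code" idea concrete, here is a toy model in Python. It is not GenoSight's implementation and not how a database implements row-level security (in PostgreSQL that is done with row security policies); the `ReportStore` class and its fields are hypothetical. It only illustrates the design point: the ownership filter lives inside the store, so a caller cannot forget it or skip it.

```python
class ReportStore:
    """Toy stand-in for a table with an ownership policy (hypothetical)."""

    def __init__(self):
        self._rows = []

    def insert(self, owner_id, payload):
        self._rows.append({"owner_id": owner_id, "payload": payload})

    def select(self, current_user_id, predicate=lambda row: True):
        # The ownership check is applied here, inside the store, before any
        # caller-supplied predicate. Callers have no way to turn it off.
        return [
            dict(row)
            for row in self._rows
            if row["owner_id"] == current_user_id and predicate(row)
        ]


store = ReportStore()
store.insert("alice", "report A")
store.insert("bob", "report B")

# Even a "select everything" query only returns the caller's own rows.
alice_rows = store.select("alice", predicate=lambda row: True)
```

The contrast is with a design where every application query has to remember its own `owner_id` filter: one forgotten filter there exposes other users' rows, whereas here the filter cannot be omitted.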

What gets sent to the AI

This is the question most people care about, so to be precise: your raw genotype file is never sent to the LLM.

When the analysis pipeline runs, the engines (lifestyle SNPs, PharmGKB, ClinVar) read your file from encrypted storage and extract a small set of relevant variant calls — typically a few hundred matches from a file that contains hundreds of thousands of positions. (Walk through the full five-stage pipeline.) Only those structured findings, plus the health profile you completed during onboarding, are sent to Anthropic Claude for synthesis.
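The extraction step can be sketched in a few lines of Python. This is a minimal illustration, not the actual engine code: it assumes a tab-separated raw file with `rsid`, `chromosome`, `position`, and `genotype` columns and `#` comment lines (the layout 23andMe exports use), and the `PANEL` set and sample genotypes are made up for the example.

```python
import io

# Hypothetical panel of rsIDs an engine looks for (illustrative only).
PANEL = {"rs1801133", "rs4149056", "rs429358"}


def extract_findings(raw_file, panel):
    """Return only the variant calls whose rsID is in the panel."""
    findings = []
    for line in raw_file:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip malformed rows
        rsid, _chromosome, _position, genotype = fields
        if rsid in panel:
            findings.append({"rsid": rsid, "genotype": genotype})
    return findings


# Made-up sample rows standing in for a real raw file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs1801133\t1\t11856378\tAG\n"
    "rs4149056\t12\t21331549\tTT\n"
)
findings = extract_findings(sample, PANEL)
```

Everything not in the panel, here the `rs4477212` row, is dropped at this stage and never reaches later steps.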

The synthesis prompt looks roughly like: "Given these structured findings (variant + genotype + cited effect) and this user's profile (age, diagnoses, medications, family history, labs), produce a synthesized report." Claude never sees the rsIDs the engines didn't surface, never sees the rest of your file, and never sees raw nucleotide data.
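One way to keep that boundary tight is to build the LLM payload from an explicit allowlist of fields, so anything else attached to a finding is dropped before it leaves the pipeline. The sketch below assumes this approach; the function name, field names, and example values are hypothetical and are not taken from GenoSight's code.

```python
import json

# Assumed field names for a structured finding (variant + genotype + cited effect).
ALLOWED_FINDING_KEYS = {"rsid", "genotype", "effect", "source"}
# Assumed profile fields, mirroring the list in the prompt description.
ALLOWED_PROFILE_KEYS = {"age", "diagnoses", "medications", "family_history", "labs"}


def build_synthesis_payload(findings, profile):
    """Serialize only allowlisted fields; everything else is discarded."""
    safe_findings = [
        {k: v for k, v in finding.items() if k in ALLOWED_FINDING_KEYS}
        for finding in findings
    ]
    safe_profile = {k: v for k, v in profile.items() if k in ALLOWED_PROFILE_KEYS}
    return json.dumps(
        {"findings": safe_findings, "profile": safe_profile}, sort_keys=True
    )


# Example input with an extra field that must not cross the boundary.
example_findings = [
    {
        "rsid": "rs1801133",
        "genotype": "AG",
        "effect": "example cited effect text",
        "source": "ClinVar",
        "raw_line": "rs1801133\t1\t11856378\tAG",  # should be stripped
    }
]
example_profile = {"age": 42, "medications": [], "email": "user@example.com"}

payload = build_synthesis_payload(example_findings, example_profile)
```

An allowlist fails closed: a new internal field added to a finding later stays out of the prompt until someone deliberately adds it to the list.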

The model also operates under a strict constraint: it can only cite findings provided in the prompt. Hallucinated variants would be caught by post-synthesis validators that re-check every claim against the engine outputs.
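A simplified version of such a validator might look like the following. It is a sketch under stated assumptions, not the production check: it covers only one kind of claim (rsIDs mentioned in the report text), and the function name and sample report are hypothetical.

```python
import re

# rsIDs are written as "rs" followed by digits.
RSID_PATTERN = re.compile(r"\brs\d+\b")


def find_unsupported_rsids(report_text, findings):
    """Return rsIDs cited in the report that no engine finding supports."""
    allowed = {finding["rsid"] for finding in findings}
    cited = set(RSID_PATTERN.findall(report_text))
    return sorted(cited - allowed)


engine_findings = [{"rsid": "rs1801133", "genotype": "AG"}]
draft_report = (
    "Your rs1801133 genotype is AG. "
    "You also carry a variant at rs9999999."  # not in the engine outputs
)

unsupported = find_unsupported_rsids(draft_report, engine_findings)
```

A non-empty result would mean the draft cites a variant the engines never surfaced, so the report can be rejected or regenerated rather than shown to the user.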

What isn't shared with third parties

GenoSight doesn't sell genetic data to research partners, pharmaceutical companies, or anyone else. There's no opt-in dropdown that quietly enrolls your file in research databases. The only third-party services your data touches are infrastructure providers (Supabase for database and storage; Anthropic for LLM synthesis) operating under their respective privacy commitments — and even there, only structured findings (not the raw file) cross the boundary to the LLM provider.

If that ever changes, it would happen with explicit prior notification and an opt-in choice — not buried in updated terms.

Data flow showing only structured findings sent to AI, raw file stays in encrypted storage

Deletion

You can delete your account and your data at any time. Deletion removes:

Deletion is full removal, not a "soft delete" flag: the data is purged from live systems, and any copies held in operational backups (kept for disaster recovery) are purged within the backup retention window. Backups follow standard infrastructure retention policies and are also encrypted at rest.

If you want a copy of your data before you delete, the export endpoint returns your raw file, profile, and report contents in a portable format.

What we log (and why)

For operational reasons we log application-level events (a report was generated; a chat message was sent). These logs include identifiers (user ID, report ID) but not raw genotype data. Chat message content is not logged either; only the minimal metadata needed for billing (credit usage) is recorded.

Cost-tracking logs record API spend per LLM call. They don't contain prompt or response content.
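As an illustration of what a content-free cost log can look like, here is a minimal Python sketch. The record shape is an assumption made for the example (the token-count fields in particular are hypothetical); the point is that the type has no field in which prompt or response text could be stored.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LlmCostLogEntry:
    """Hypothetical cost-tracking record: identifiers and spend only."""

    user_id: str
    report_id: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


# Example values are made up for illustration.
entry = LlmCostLogEntry(
    user_id="user-123",
    report_id="report-456",
    input_tokens=1800,
    output_tokens=950,
    cost_usd=0.02,
)
record = asdict(entry)
```

Defining the log entry as a fixed, frozen type means adding content to the log requires a visible schema change rather than an extra argument slipped into a logging call.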

How this compares to industry norms

Three reference points worth knowing.

23andMe and AncestryDNA offer research opt-in by default at signup (you can opt out). Aggregated, de-identified data is shared with research partners or, in 23andMe's case, has been subject to commercial deals with pharmaceutical partners. Both companies have suffered notable data incidents in the last several years. (How GenoSight compares functionally to other DNA analysis services.)

Smaller analysis tools (Promethease, SelfDecode, others) have varied policies: some store the raw file long-term, some don't. The practical step is the same for all of them: read the privacy policy and look for explicit answers to three questions. What gets stored? What gets shared with third parties? What gets sent to AI providers?

GenoSight's posture: encrypted storage, no third-party data sales, raw file never sent to LLMs, deletion fully removes data within standard backup retention windows. That's the design intent and the operational practice.

Real GenoSight onboarding chat capturing personal health context, with the 'GenoSight does not diagnose or prescribe' footer visible

Practical recommendations before any genetic upload

Three habits worth adopting regardless of which service you use:

  1. Read the privacy policy specifically for "research opt-in" and "third-party sharing." If research sharing is enabled by default and you missed the toggle at signup, turn it off.
  2. Use a unique strong password and 2FA. Account takeover is the most likely compromise vector.
  3. Decide what you don't want to know. Some services, GenoSight included, let you scope what gets surfaced. If there are conditions you don't want included in your report, configure that during onboarding.

