Repack Download Human-g1k-v37-decoy.fasta Jun 2026
Different sources name the same decoy or spike-in sequence inconsistently (for example, >phiX174 vs. >gi|9626372|ref|NC_001422.1|). Aligners and downstream tools treat differently named contigs as different sequences, so results produced against differently labeled references cannot be compared or merged directly.
The "decoy" sequences added to this FASTA file (derived from the HuRef assembly, human BAC and fosmid clones, and NA12878) serve as "sinks" for reads that do not belong to the standard human chromosomes.
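One way to reconcile inconsistent names is to rewrite the FASTA headers from a lookup table before indexing. The sketch below is illustrative only: rename_contigs and the two-column map file (old name, new name, whitespace-separated) are hypothetical helpers, not part of any toolkit. Header descriptions after the first token are dropped.

```shell
# Hypothetical helper: rename FASTA contigs using a two-column map file.
# Usage: rename_contigs map.tsv input.fa > renamed.fa
rename_contigs() {
    awk '
        # First file: load old-name -> new-name pairs.
        NR == FNR { map[$1] = $2; next }
        # Header lines: keep the first token only and rename it if mapped.
        /^>/ {
            name = substr($1, 2)
            if (name in map) name = map[name]
            print ">" name
            next
        }
        # Sequence lines pass through unchanged.
        { print }
    ' "$1" "$2"
}
```

Any index built from the original file (samtools faidx, bwa index, sequence dictionary) has to be rebuilt after renaming.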
bwa mem -M human_g1k_v37_decoy.fasta sample_R1.fastq sample_R2.fastq
(Or use a bedtools subtract command with a decoy BED file.)
bwa mem -M -t 8 \
    human_g1k_v37_decoy.fasta \
    sample_R1.fastq.gz \
    sample_R2.fastq.gz \
    | samtools view -bS - \
    | samtools sort -o sample_aligned.bam -
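To see how many reads the decoy actually absorbed, the per-contig counts from samtools idxstats can be summarized. This is a minimal sketch: decoy_fraction is a hypothetical helper, and it assumes the decoy contig is named hs37d5 (pass a different name as the first argument if your reference labels it otherwise).

```shell
# Hypothetical helper: read `samtools idxstats` output on stdin
# (columns: contig name, length, mapped reads, unmapped reads) and print
# the fraction of mapped reads that landed on the decoy contig.
decoy_fraction() {
    decoy_name="${1:-hs37d5}"   # assumed decoy contig name
    awk -F '\t' -v decoy="$decoy_name" '
        { total += $3 }
        $1 == decoy { hit += $3 }
        END {
            if (total > 0) printf "%.4f\n", hit / total
            else print "NA"
        }
    '
}

# Usage (requires samtools and an indexed BAM):
#   samtools index sample_aligned.bam
#   samtools idxstats sample_aligned.bam | decoy_fraction
```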
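Do not confuse this file with hs37d5.fa, which is not an NCBI/Genome Reference Consortium release but the 1000 Genomes Project Phase 2 reference assembly (GRCh37 primary assembly plus decoys). human_g1k_v37_decoy.fasta is the b37-plus-decoy build as distributed in the Broad Institute's GATK resource bundle. The two are compatible but not identical files, so compare contig names and lengths before mixing outputs from both.
If you are starting a new project, use GRCh38 with decoys (GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna; the plain no_alt_analysis_set file does not include the hs38d1 decoys). For reproducing prior research, use human_g1k_v37_decoy.fasta.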
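A quick way to check whether two references are interchangeable is to compare contig names and lengths from their samtools faidx indexes. This is only a sketch: compare_fai is a hypothetical helper, and it looks only at the first two columns of each .fai file (name and length), not at the sequence content itself.

```shell
# Hypothetical helper: compare contig names and lengths of two .fai indexes.
# Usage: compare_fai hs37d5.fa.fai human_g1k_v37_decoy.fasta.fai
compare_fai() {
    a=$(cut -f1,2 "$1")
    b=$(cut -f1,2 "$2")
    if [ "$a" = "$b" ]; then
        echo "contigs match"
    else
        echo "contigs differ"
        return 1
    fi
}
```

Matching names and lengths are a necessary condition, not proof of identical sequence; comparing the M5 tags in the two sequence dictionaries is a stricter check.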
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
# Verify the file integrity (MD5sum)
After downloading, you must prepare the file for use in bioinformatics pipelines like GATK or BWA.

Decompress
gunzip hs37d5.fa.gz

Index for SAMtools
samtools faidx hs37d5.fa

Create a Sequence Dictionary (for GATK/Picard)
gatk CreateSequenceDictionary -R hs37d5.fa

Index for BWA (if aligning)
bwa index hs37d5.fa

4. Key Differences to Note
If your sample contains reads from human papillomavirus, hepatitis B virus, or Mycobacterium tuberculosis and the reference has no matching sequence, some of those reads will map to the human genome with low mapping quality, skewing variant calls.
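The integrity check could be scripted along these lines. This is a sketch that assumes GNU md5sum is available; verify_md5 is a hypothetical helper, and the expected checksum must come from the download site itself (no value is given here).

```shell
# Hypothetical helper: compare a file's MD5 digest against an expected value.
# Usage: verify_md5 hs37d5.fa.gz "<checksum published alongside the file>"
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Run the check on the compressed file as downloaded, since published checksums normally refer to the .gz archive rather than the decompressed FASTA.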



