Snakemake
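The Snakemake workflow management system is a tool for creating reproducible and scalable data analyses. Workflows are described in a human-readable, Python-based language. They can be scaled seamlessly to server, cluster, grid and cloud environments without modifying the workflow definition. Finally, Snakemake workflows can include a description of the required software, which is automatically deployed to any execution environment.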
Quick Start
Snakemake workflows are essentially Python scripts extended by declarative code that defines rules. Rules describe how to create output files from input files.
rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    shell:
        "somecommand {input} {output}"
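- Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.
- For each target and intermediate file, you create rules that define how it is created from input files.
- Snakemake determines the rule dependencies by matching file names.
- Input and output files can contain multiple named wildcards.
- Rules can use shell commands, plain Python code, or external Python or R scripts to create output files from input files.
- Snakemake workflows can be executed on workstations, clusters, the grid, and in the cloud without modification. Job scheduling can be constrained by arbitrary resources such as available CPU cores, memory or GPUs.
- Snakemake can automatically deploy the software dependencies a workflow requires using Conda or Singularity.
- Snakemake can access input or output files via Amazon S3, Google Storage, Dropbox, FTP, WebDAV, SFTP and iRODS, and can additionally access input files via HTTP and HTTPS.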
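The idea that Snakemake determines rule dependencies by matching file names can be illustrated with a small Python sketch. It is a simplified model, not Snakemake's implementation: `Rule`, `pattern_to_regex`, `match_rule` and `resolve` are hypothetical names, each wildcard is assumed to match any non-empty text, and wildcard constraints, competing rules and file timestamps are ignored. Starting from the inputs of the `targets` pseudo-rule in the example workflow, it works backwards to the jobs of the `plot` rule and the raw files they need.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for a Snakemake rule: a name plus input/output patterns."""
    name: str
    inputs: list[str]
    outputs: list[str]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile e.g. 'plots/{dataset}.pdf' into a regex with one named group per wildcard."""
    regex, seen = "", set()
    # With a capturing group, re.split alternates literal text and wildcard names.
    for i, part in enumerate(re.split(r"\{(\w+)\}", pattern)):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text
        elif part in seen:
            regex += f"(?P={part})"         # a repeated wildcard must match the same value
        else:
            regex += f"(?P<{part}>.+)"      # assumption: a wildcard matches any non-empty text
            seen.add(part)
    return re.compile(regex)


def match_rule(rules: list[Rule], path: str):
    """Return (rule, wildcard values) for the first rule whose output pattern matches path."""
    for rule in rules:
        for out in rule.outputs:
            m = pattern_to_regex(out).fullmatch(path)
            if m:
                return rule, m.groupdict()
    return None, {}


def resolve(rules: list[Rule], path: str, plan: list, done: set) -> None:
    """Append the jobs needed to create path to plan, dependencies first."""
    if path in done:
        return
    rule, wildcards = match_rule(rules, path)
    if rule is None:
        return  # no rule produces this file: treat it as an existing source file
    inputs = [p.format(**wildcards) for p in rule.inputs]  # fill wildcards into input patterns
    for inp in inputs:
        resolve(rules, inp, plan, done)
    plan.append((rule.name, path, inputs))
    done.add(path)


# The `plot` rule from the example; the inputs of the `targets` pseudo-rule are the requested files.
rules = [Rule("plot", inputs=["raw/{dataset}.csv"], outputs=["plots/{dataset}.pdf"])]
targets = ["plots/dataset1.pdf", "plots/dataset2.pdf"]

plan, done = [], set()
for target in targets:
    resolve(rules, target, plan, done)
for job in plan:
    print(job)
```

Running it prints one job per dataset, `('plot', 'plots/dataset1.pdf', ['raw/dataset1.csv'])` followed by the corresponding tuple for `dataset2`. Because `raw/*.csv` matches no output pattern, the sketch simply assumes those files already exist.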
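For the option of an external Python script, Snakemake's `script:` directive runs the script with a `snakemake` object in scope whose attributes, such as `snakemake.input` and `snakemake.output`, carry the rule's files. The sketch below assumes a hypothetical rule that counts the rows of a CSV file; `count_rows` and the command-line fallback are illustrative additions so the same file can also be tried outside a workflow.

```python
import csv
import sys


def count_rows(in_path: str, out_path: str) -> int:
    """Count the rows of a CSV file after the header and write the count to out_path."""
    with open(in_path, newline="") as f:
        n = max(sum(1 for _ in csv.reader(f)) - 1, 0)  # subtract the header row
    with open(out_path, "w") as f:
        f.write(f"{n}\n")
    return n


if "snakemake" in globals():
    # Run via a rule's `script:` directive: Snakemake provides the `snakemake` object,
    # whose `input` and `output` hold the rule's file paths.
    count_rows(snakemake.input[0], snakemake.output[0])  # noqa: F821
elif __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical standalone fallback: python count_rows.py in.csv out.txt
    count_rows(sys.argv[1], sys.argv[2])
```

A rule would then point to the file with something like `script: "scripts/count_rows.py"`, where that path is likewise hypothetical.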
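Resource-constrained job scheduling can be pictured as choosing, among the jobs whose inputs are ready, a set whose combined requirements stay within the available limits. The sketch below uses a naive first-fit greedy pass; it is not Snakemake's actual scheduler, and the function name `select_jobs`, the job names and the resource names (`cores`, `mem_mb`, `gpu`) are illustrative assumptions. Resources that do not appear in `available` are treated as unlimited.

```python
from typing import Dict, List


def select_jobs(ready: List[dict], available: Dict[str, int]) -> List[str]:
    """First-fit greedy selection of ready jobs under resource limits (illustrative only).

    Each job is a dict with a 'name' and a 'resources' mapping. Only resources listed
    in `available` are limited; a job that does not mention one needs 0 of it.
    """
    remaining = dict(available)  # copy so the caller's limits are not mutated
    selected = []
    for job in ready:  # jobs are considered in the given order
        needs = job["resources"]
        if all(needs.get(r, 0) <= left for r, left in remaining.items()):
            for r in remaining:
                remaining[r] -= needs.get(r, 0)  # reserve the job's share
            selected.append(job["name"])
    return selected


# Hypothetical jobs whose inputs are already available.
ready = [
    {"name": "align_a", "resources": {"cores": 4, "mem_mb": 8000}},
    {"name": "align_b", "resources": {"cores": 4, "mem_mb": 8000}},
    {"name": "train", "resources": {"cores": 1, "gpu": 1}},
    {"name": "plot", "resources": {"cores": 1}},
]
available = {"cores": 8, "mem_mb": 12000, "gpu": 1}
print(select_jobs(ready, available))  # ['align_a', 'train', 'plot']
```

With 8 cores, 12000 MB of memory and one GPU, `align_b` has to wait because `align_a` already claimed 8000 MB, while the cheaper `train` and `plot` jobs still fit. A greedy pass like this is easy to reason about but can leave resources unused compared with an optimal selection.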
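Getting Started
To get a first impression of Snakemake, see the introductory slides or watch the demo video. News about Snakemake is published via Twitter. To learn Snakemake, work through the Snakemake Tutorial and then consult the FAQ.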
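Support
- For releases, see the Changelog.
- Check the frequently asked questions (FAQ).
- If you have questions, please post them on Stack Overflow.
- Use the mailing list to discuss with other Snakemake users. Please do not post questions there; use Stack Overflow for questions.
- For bugs and feature requests, please use the issue tracker.
- For contributions, visit Snakemake on Bitbucket and read the guidelines.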
Resources
- Snakemake Wrappers Repository
  - The Snakemake Wrappers Repository is a collection of reusable wrappers that make it quick to use popular tools from Snakemake rules and workflows.
- Snakemake Workflows Project
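  - This project provides a collection of high-quality, modular and reusable workflows. The provided code is also meant to serve as best practice for building production workflows with Snakemake. Every user is invited to contribute.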
- Snakemake Profiles Project
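  - This project provides Snakemake configuration profiles for various execution environments. If the one you need is missing, please consider contributing it.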
- Bioconda
  - By defining the software versions used and providing binaries, Bioconda lets Snakemake create fully reproducible workflows.
Publications Using Snakemake
The following is an incomplete list of publications that use Snakemake for their analyses. Please consider adding your own.
- Doris et al. 2018. Spt6 is required for the fidelity of promoter selection. Molecular Cell.
- Karlsson et al. 2018. Four evolutionary trajectories underlie genetic intratumoral variation in childhood cancer. Nature Genetics.
- Planchard et al. 2018. The translational landscape of Arabidopsis mitochondria. Nucleic Acids Research.
- Schult et al. 2018. Effect of UV irradiation on Sulfolobus acidocaldarius and involvement of the general transcription factor TFB3 in the early UV response. Nucleic Acids Research.
- Goormaghtigh et al. 2018. Reassessing the Role of Type II Toxin-Antitoxin Systems in Formation of Escherichia coli Type II Persister Cells. mBio.
- Ramirez et al. 2018. Detecting macroecological patterns in bacterial communities across independent studies of global soils. Nature Microbiology.
- Amato et al. 2018. Evolutionary trends in host physiology outweigh dietary niche in structuring primate gut microbiomes. The ISME Journal.
- Uhlitz et al. 2017. An immediate–late gene expression module decodes ERK signal duration. Molecular Systems Biology.
- Akkouche et al. 2017. Piwi Is Required during Drosophila Embryogenesis to License Dual-Strand piRNA Clusters for Transposon Repression in Adult Ovaries. Molecular Cell.
- Beatty et al. 2017. Giardia duodenalis induces pathogenic dysbiosis of human intestinal microbiota biofilms. International Journal for Parasitology.
- Meyer et al. 2017. Differential Gene Expression in the Human Brain Is Associated with Conserved, but Not Accelerated, Noncoding Sequences. Molecular Biology and Evolution.
- Lonardo et al. 2017. Priming of soil organic matter: Chemical structure of added compounds is more important than the energy content. Soil Biology and Biochemistry.
- Beisser et al. 2017. Comprehensive transcriptome analysis provides new insights into nutritional strategies and phylogenetic relationships of chrysophytes. PeerJ.
- Piro et al. 2017. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling. Microbiome.
- Dimitrov et al. 2017. Successive DNA extractions improve characterization of soil microbial communities. PeerJ.
- de Bourcy et al. 2016. Phylogenetic analysis of the human antibody repertoire reveals quantitative signatures of immune senescence and aging. PNAS.
- Bray et al. 2016. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology.
- Etournay et al. 2016. TissueMiner: a multiscale analysis toolkit to quantify how cellular processes create tissue dynamics. eLife Sciences.
- Townsend et al. 2016. The Public Repository of Xenografts Enables Discovery and Randomized Phase II-like Trials in Mice. Cancer Cell.
- Burrows et al. 2016. Genetic Variation, Not Cell Type of Origin, Underlies the Majority of Identifiable Regulatory Differences in iPSCs. PLOS Genetics.
- Ziller et al. 2015. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods.
- Li et al. 2015. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biology.
- Schmied et al. 2015. An automated workflow for parallel processing of large multiview SPIM recordings. Bioinformatics.
- Chung et al. 2015. Whole-Genome Sequencing and Integrative Genomic Analysis Approach on Two 22q11.2 Deletion Syndrome Family Trios for Genotype to Phenotype Correlations. Human Mutation.
- Kim et al. 2015. TUT7 controls the fate of precursor microRNAs by using three different uridylation mechanisms. The EMBO Journal.
- Park et al. 2015. Ebola Virus Epidemiology, Transmission, and Evolution during Seven Months in Sierra Leone. Cell.
- Břinda et al. 2015. RNF: a general framework to evaluate NGS read mappers. Bioinformatics.
- Břinda et al. 2015. Spaced seeds improve k-mer-based metagenomic classification. Bioinformatics.
- Spjuth et al. 2015. Experiences with workflows for automating data-intensive bioinformatics. Biology Direct.
- Schramm et al. 2015. Mutational dynamics between primary and relapse neuroblastomas. Nature Genetics.
- Berulava et al. 2015. N6-Adenosine Methylation in MiRNAs. PLOS ONE.
- The Genome of the Netherlands Consortium 2014. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics.
- Patterson et al. 2014. WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads. Journal of Computational Biology.
- Fernández et al. 2014. H3K4me1 marks DNA regions hypomethylated during aging in human stem and differentiated cells. Genome Research.
- Köster et al. 2014. Massively parallel read mapping on GPUs with the q-group index and PEANUT. PeerJ.
- Chang et al. 2014. TAIL-seq: Genome-wide Determination of Poly(A) Tail Length and 3′ End Modifications. Molecular Cell.
- Althoff et al. 2013. MiR-137 functions as a tumor suppressor in neuroblastoma by downregulating KDM1A. International Journal of Cancer.
- Marschall et al. 2013. MATE-CLEVER: Mendelian-Inheritance-Aware Discovery and Genotyping of Midsize and Long Indels. Bioinformatics.
- Rahmann et al. 2013. Identifying transcriptional miRNA biomarkers by integrating high-throughput sequencing and real-time PCR data. Methods.
- Martin et al. 2013. Exome sequencing identifies recurrent somatic mutations in EIF1AX and SF3B1 in uveal melanoma with disomy 3. Nature Genetics.
- Czeschik et al. 2013. Clinical and mutation data in 12 patients with the clinical diagnosis of Nager syndrome. Human Genetics.
- Marschall et al. 2012. CLEVER: Clique-Enumerating Variant Finder. Bioinformatics.