Saturate

A program for evaluating the degree of saturation in mutant screens

(as well as the density of essential genes and the species diversity in novel environments)

David D. Pollock

 

 

Description

The SatMut package is designed to predict the number of undiscovered loci remaining in a mutant screen. It applies Poisson, gamma, and mixture ("hot spots") models and provides maximum likelihood estimates along with the mean and 95% credible interval of the posterior distributions. Some routines have also been written for the R statistical package to more quickly evaluate the posterior probabilities and the burn-in of the likelihood and parameters.
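As a rough illustration of the simplest (Poisson) case, the sketch below estimates how many loci remain undetected from the number of alleles recovered at each detected locus. It is not the package's own code: it assumes a single mutation rate shared by all loci and uses a zero-truncated Poisson maximum likelihood estimate, and the function names and allele counts are hypothetical.

```python
import math


def truncated_poisson_rate(mean_alleles):
    """Solve lam / (1 - exp(-lam)) = mean_alleles for lam by bisection.

    mean_alleles is the mean number of alleles per *detected* locus, so it
    is at least 1. The left-hand side is increasing in lam and exceeds lam,
    so for mean_alleles > 1 the solution lies in (0, mean_alleles).
    """
    if mean_alleles <= 1.0:
        return 0.0  # only single hits: the rate is not bounded away from zero
    lo, hi = 1e-9, mean_alleles
    for _ in range(200):  # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < mean_alleles:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_undetected(alleles_per_locus):
    """Expected number of loci with zero alleles under one shared Poisson rate."""
    n_detected = len(alleles_per_locus)
    mean_alleles = sum(alleles_per_locus) / n_detected
    lam = truncated_poisson_rate(mean_alleles)
    if lam == 0.0:
        return math.inf  # no upper bound on the number of missed loci
    p_zero = math.exp(-lam)  # probability that a locus is never hit
    return n_detected * p_zero / (1.0 - p_zero)


# Hypothetical screen: alleles recovered at each of ten detected loci.
counts = [1, 1, 1, 2, 2, 3, 5, 1, 4, 2]
print(f"estimated undetected loci: {estimate_undetected(counts):.2f}")
```

Loci hit only once push the estimate up. If every detected locus has exactly one allele, the rate cannot be bounded away from zero and the sketch reports an unbounded estimate.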

Downloading

This code was written a decade ago and has not been maintained for a long time; I have not run it myself in about six years. There has been a recent resurgence of interest, however. If you download it and get it running, please let me know. The package to download is here.

Manual

A rudimentary explanation of the command file is available as part of the zip package. Graphs of sample data, models, and output (Poisson, gamma, multimodel) are also available.

If you successfully get this running, please let me know. Also, if you develop more detailed instructions to help others, I would be more than happy to put them up on the web with the program.

You'll want the commands file and a sample input file. (Note: for no good reason, a file named "infile" currently needs to exist in the folder where you run the program; it should not be named "infile.txt", even if the ".txt" ending is hidden by the file browser.)
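A small pre-flight check along these lines can catch the naming problem before a run. It only looks for the two file names mentioned above; the function name and return values are hypothetical and not part of the package.

```python
from pathlib import Path


def check_infile(folder="."):
    """Report whether a file named exactly 'infile' exists in folder."""
    folder = Path(folder)
    if (folder / "infile").is_file():
        return "ok"
    if (folder / "infile.txt").is_file():
        # Likely saved with a hidden ".txt" ending; it needs renaming.
        return "rename"
    return "missing"


print(check_infile("."))
```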

Credits

Original concept by D Pollock and J Larkin, programming by D Pollock.

References

  • D. D. Pollock and J. S. Larkin, “Estimating the degree of saturation in mutant screens.” Genetics, 168(1):489-502 (2004).

Large-scale screens for loss-of-function mutants have played a significant role in recent advances in developmental biology and other fields. In such mutant screens, it is desirable to estimate the degree of “saturation” of the screen (i.e., what fraction of the possible target genes have been identified). We applied Bayesian and maximum likelihood methods for estimating the number of loci remaining undetected in large-scale screens, and produced credibility intervals to assess the uncertainty of these estimates. Since different loci may mutate to alleles with detectable phenotypes at different rates, we also incorporated variation in the degree of mutability among genes, using either gamma-distributed mutation rates or multiple discrete mutation rate classes. We examined eight published data sets from large-scale mutant screens and found that credibility intervals are much broader than implied by previous assumptions about the degree of saturation of screens. The likelihood methods presented here are a significantly better fit to data from published experiments than estimates based on the Poisson distribution, which implicitly assumes a single mutation rate for all loci. The results are reasonably robust to different models of variation in the mutability of genes. We tested our methods against mutant allele data from a region of the Drosophila melanogaster genome for which there is an independent genomics-based estimate of the number of undetected loci, and found that the number of such loci falls within the predicted credibility interval for our models. The methods we have developed may also be useful for estimating the degree of saturation in other types of genetic screens in addition to classical screens for simple loss-of-function mutants, including genetic modifier screens and screens for protein-protein interactions using the yeast two-hybrid method.
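To see why variation in mutability matters, the following sketch compares the expected fraction of loci with no recovered alleles under a single shared rate (Poisson) and under gamma-distributed rates with the same mean. The function names and numerical values are illustrative assumptions, not values from the paper.

```python
import math


def p_zero_poisson(mean_rate):
    """Probability that a locus yields no alleles when all loci share one rate."""
    return math.exp(-mean_rate)


def p_zero_gamma(mean_rate, shape):
    """Probability of no alleles when per-locus rates are gamma distributed.

    With rates drawn from a gamma distribution with the given mean and shape,
    allele counts are negative binomial and P(0) = (1 + mean/shape) ** -shape.
    """
    return (1.0 + mean_rate / shape) ** (-shape)


mean_rate = 2.0  # hypothetical mean number of alleles per locus
for shape in (0.5, 2.0, 50.0):  # small shape = strong variation among loci
    print(
        f"shape={shape:5.1f}  "
        f"missed (gamma)={p_zero_gamma(mean_rate, shape):.3f}  "
        f"missed (Poisson)={p_zero_poisson(mean_rate):.3f}"
    )
```

For the same mean, the gamma model always leaves a larger fraction of loci undetected than the Poisson model, and the two agree as the shape parameter grows large. A Poisson-only analysis can therefore overstate how saturated a screen is.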

  • S. O. Suh, J. V. McHugh, D. D. Pollock, B. Liu, and M. Blackwell, “The beetle gut: a hyperdiverse source of novel yeasts.” Mycological Research, 109(Pt 3):261-5 (2005).

  • K. E. Hentges, D. D. Pollock, B. Liu, and M. J. Justice, “Regional variation in the density of essential genes in mice.” PLoS Genetics, 3(5):e72 (2007).

In most species, and particularly in vertebrates, the percentage of genes absolutely required for survival, the essential genes, has not been estimated. To obtain this estimation, we used the mouse as an experimental model to carry out high-efficiency N-ethyl-N-nitrosourea (ENU) mutagenesis screens in two balancer chromosome regions, and compared our results to a third previously published screen. The number of essential genes in each region was predicted based on allele frequencies. We determined that the density of essential genes differs by up to an order of magnitude among genomic regions. This indicates that extrapolating from regional estimates to genome-wide estimates of essential genes has a huge variance. A particularly high density of essential genes on mouse Chromosome 11 coincides with a high degree of regional linkage conservation, providing a possible causal explanation for the density variation. This is the first demonstration of regional variation in essential gene density in the mouse genome.