The P-clouds package is designed to identify repeat structure in large eukaryotic genomes using oligonucleotide counts. It works efficiently on a single desktop computer with 1 GB of memory. The basic program is described in Gu et al. (2008), listed below; de Koning et al. (2011) gives more details, an analysis of the human genome, and a description of element-specific P-clouds.
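To give a feel for what "oligonucleotide counts" means here, the following is a minimal Python sketch that tallies overlapping oligos of a fixed length in a sequence and keeps the ones seen repeatedly. It is only an illustration of the counting step, not the P-clouds algorithm or its code: the function names, the `k` and `min_count` parameters, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count overlapping oligos of length k (hypothetical helper).

    Windows containing anything other than A, C, G, T (for example N)
    are skipped; this is an assumption of the sketch.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        oligo = seq[i:i + k]
        if set(oligo) <= set("ACGT"):
            counts[oligo] += 1
    return counts


def high_copy_oligos(counts, min_count):
    """Keep only oligos observed at least min_count times (hypothetical helper)."""
    return {oligo: n for oligo, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    counts = count_oligos("ACGTACGTAC", k=4)
    print(high_copy_oligos(counts, min_count=2))
```

On a real genome the oligo length would be much larger than in this toy input, and the counts would be the starting point for grouping related oligos rather than the final result; see the papers listed below for the actual method.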





Version 1.0

An even newer version that includes a full pipeline for running vertebrate genomes is in preparation.

Version 0.9 (modifications and scripts by Jason de Koning and Kathryn Hall)

A new manual is included in the zip file, along with the source code and some pre- and post-processing code.

Version 0.1.1 (Wanjun Gu)

A Linux executable is available here (includes manual and sample control file).

Sample files include input and output (P-cloud assignment, annotation, oligo counts, P-cloud information) for the human X chromosome.

All files are compressed in .zip format.

Source code is available in the new version.




New manual here. Old manual here. The documentation is pretty brief at this time, so please contact us with questions or requests.




Original concept by W. Gu and D. Pollock, original programming by W. Gu, with additions and scripts by A. P. J. de Koning and Kathryn Hall.



  • W. Gu, T. A. Castoe, D. J. Hedges, M. A. Batzer, and D. D. Pollock, “Identification of repeat structure in large genomes using repeat probability clouds.” Anal Biochem. 2008 Sep 1;380(1):77-83. Epub 2008 May 20.

  • A. P. J. de Koning, W. Gu, T. A. Castoe, M. A. Batzer, and D. D. Pollock, “Repetitive elements may comprise over two-thirds of the human genome.” PLoS Genetics, 7(12):e1002384 (2011).


