PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures, counted from the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

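| Year | Number of New Protein Sequences | Total Number of Protein Sequences |
|------|---------------------------------|-----------------------------------|
| 1976 | 13 | 13 |
| 1977 | 10 | 23 |
| 1978 | 3 | 26 |
| 1979 | 6 | 32 |
| 1980 | 4 | 36 |
| 1981 | 10 | 46 |
| 1982 | 18 | 64 |
| 1983 | 11 | 75 |
| 1984 | 11 | 86 |
| 1985 | 12 | 98 |
| 1986 | 9 | 107 |
| 1987 | 11 | 118 |
| 1988 | 25 | 143 |
| 1989 | 46 | 189 |
| 1990 | 52 | 241 |
| 1991 | 56 | 297 |
| 1992 | 66 | 363 |
| 1993 | 232 | 595 |
| 1994 | 461 | 1,056 |
| 1995 | 343 | 1,399 |
| 1996 | 407 | 1,806 |
| 1997 | 561 | 2,367 |
| 1998 | 757 | 3,124 |
| 1999 | 896 | 4,020 |
| 2000 | 1,004 | 5,024 |
| 2001 | 1,043 | 6,067 |
| 2002 | 1,110 | 7,177 |
| 2003 | 1,557 | 8,734 |
| 2004 | 2,118 | 10,852 |
| 2005 | 2,338 | 13,190 |
| 2006 | 2,644 | 15,834 |
| 2007 | 2,963 | 18,797 |
| 2008 | 2,757 | 21,554 |
| 2009 | 2,818 | 24,372 |
| 2010 | 2,873 | 27,245 |
| 2011 | 2,636 | 29,881 |
| 2012 | 2,888 | 32,769 |
| 2013 | 3,098 | 35,867 |
| 2014 | 3,798 | 39,665 |
| 2015 | 3,142 | 42,807 |
| 2016 | 3,724 | 46,531 |
| 2017 | 4,008 | 50,539 |
| 2018 | 3,709 | 54,248 |
| 2019 | 4,114 | 58,362 |
| 2020 | 4,982 | 63,344 |
| 2021 | 4,526 | 67,870 |
| 2022 | 5,462 | 73,332 |
| 2023 | 5,156 | 78,488 |
| 2024 | 5,482 | 83,970 |
| 2025 | 6,107 | 90,077 |
| 2026 | 1,164 | 91,241 |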
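
Each total in the table is the running sum of the yearly counts up to and including that year. The short Python sketch below illustrates that relationship using the first five rows of the table. The function name `cumulative_totals` and the hard-coded `yearly` list are illustrative assumptions, not part of any PDB tool or API.

```python
from itertools import accumulate

# (year, new sequences) copied from the first five rows of the table.
# Hard-coded here for illustration only.
yearly = [(1976, 13), (1977, 10), (1978, 3), (1979, 6), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, running_total) tuples from (year, new) pairs."""
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly):
    print(year, new, total)
```

The same function can be run over the full yearly column to check that each reported total matches the sum of the yearly counts.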