# TREC (disks 4–5)

## About this network

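This is the bipartite network of about 556,000 text documents from Disks 4 and 5 of the Text REtrieval Conference (TREC) collection and the roughly 1.17 million distinct words they contain. Each edge represents one inclusion of a word in a document, so a word that appears several times in the same document contributes several parallel edges.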
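
To make the edge definition concrete, the following minimal sketch builds document-word inclusion edges for a toy corpus. The two example documents, the `inclusion_edges` helper, and the lowercase whitespace tokenization are assumptions for illustration only; the preprocessing actually used for this dataset is not described here.

```python
from collections import Counter

# Toy corpus (hypothetical); the real dataset's tokenization is not
# documented here, so lowercase whitespace splitting is an assumption.
docs = {
    "d1": "the cat saw the dog",
    "d2": "the dog barked",
}


def inclusion_edges(docs):
    """Yield one (document, word) edge per word occurrence."""
    for doc_id, text in docs.items():
        for word in text.lower().split():
            yield doc_id, word


# Counter maps each distinct (document, word) pair to its multiplicity.
edges = Counter(inclusion_edges(docs))

volume = sum(edges.values())   # counts parallel edges
unique_volume = len(edges)     # counts distinct (document, word) pairs

print(volume, unique_volume)
```

In this toy example the repeated "the" in `d1` produces two parallel edges, which is why the volume exceeds the unique volume, just as it does in the statistics for the full network.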

## Network info

| Property | Value |
| --- | --- |
| Code | TR |
| Category | Text |
| Data source | http://www.nist.gov/tac/data/data_desc.html#TREC |
| Vertex type | Document, word |
| Edge type | Inclusion |
| Format | Bipartite |
| Edge weights | Multiple unweighted |
| Size | 1,729,302 = 556,077 + 1,173,225 vertices (documents + words) |
| Volume | 151,632,178 edges (inclusions) |
| Unique volume | 83,629,405 edges (inclusions) |
| Average degree (overall) | 175.37 edges / vertex |
| Average document degree | 272.68 edges / vertex |
| Average word degree | 129.24 edges / vertex |
| Fill | 0.00012819 edges / vertex² |
| Maximum degree | 457,437 edges |
| Wedge count | 1,604,790,310,718 |
| Claw count | 5.8090382597882208 × 10¹⁶ |
| Power law exponent (estimated) | 1.5037 ± 0.0004 |
| Gini coefficient | 85.4% |
| Relative edge distribution entropy | 80.9% |
| Diameter | 7 edges |
| 90-percentile effective diameter | 3.81 edges |
| Mean shortest path length | 3.40 edges |
| Spectral norm | 44117. |
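
The derived statistics can be cross-checked against the raw counts. The sketch below assumes the usual definitions for a bipartite network: overall average degree is twice the volume divided by the total vertex count, each side's average degree is the volume divided by that side's vertex count, and fill is the unique volume divided by the number of possible document-word pairs. It takes 1,729,302 as the total vertex count and 556,077 as the document count, the values that reproduce the listed averages.

```python
# Raw counts taken from the network statistics.
n_total = 1_729_302          # all vertices
n_docs = 556_077             # document vertices
n_words = n_total - n_docs   # word vertices
volume = 151_632_178         # edges, counting parallel edges
unique_volume = 83_629_405   # distinct (document, word) pairs

# Each edge has one document endpoint and one word endpoint.
avg_degree = 2 * volume / n_total
avg_doc_degree = volume / n_docs
avg_word_degree = volume / n_words

# Fraction of all possible document-word pairs that are present.
fill = unique_volume / (n_docs * n_words)

print(f"words:               {n_words:,}")
print(f"average degree:      {avg_degree:.2f}")
print(f"avg document degree: {avg_doc_degree:.2f}")
print(f"avg word degree:     {avg_word_degree:.2f}")
print(f"fill:                {fill:.8f}")
```

Under these assumptions the results agree with the listed values to the precision shown, which is a quick way to confirm which vertex count belongs to which side of the bipartition.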

## Downloads

TSV file: gottron-trec.tar.bz2 (282.55 MiB)
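
As a rough sketch of how the extracted edge list might be summarized, the code below counts edges from whitespace-separated lines. Several things here are assumptions rather than documented facts: that each data line holds a document ID followed by a word ID, that lines starting with `%` are comments, and the helper names `read_edge_list` and `degree_summary`. The name of the file inside the archive is not given here, so the example runs on a small in-memory sample instead.

```python
import io
from collections import Counter


def read_edge_list(lines):
    """Count (document, word) pairs from whitespace-separated lines.

    Assumed layout: first column is the document ID, second column is
    the word ID, and lines starting with '%' are comments.
    """
    edges = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        doc, word = line.split()[:2]
        edges[(int(doc), int(word))] += 1
    return edges


def degree_summary(edges):
    """Compute basic size and degree statistics from edge multiplicities."""
    doc_deg, word_deg = Counter(), Counter()
    for (doc, word), mult in edges.items():
        doc_deg[doc] += mult
        word_deg[word] += mult
    return {
        "documents": len(doc_deg),
        "words": len(word_deg),
        "volume": sum(edges.values()),
        "unique_volume": len(edges),
        "max_degree": max(max(doc_deg.values()), max(word_deg.values())),
    }


# Tiny hypothetical sample: document 1 contains word 1 twice.
sample = io.StringIO("% sample comment line\n1 1\n1 1\n1 2\n2 2\n")
summary = degree_summary(read_edge_list(sample))
print(summary)
```

For the full network, holding tens of millions of distinct pairs in a `Counter` needs a lot of memory, so a streaming or array-based approach is likely a better fit than this sketch.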

## References

[1] TREC (disks 4–5) network dataset, KONECT, October 2016.

[2] National Institute of Standards and Technology. Text REtrieval Conference (TREC) English documents, Volumes 4 and 5. http://trec.nist.gov/data/docs_eng.html, August 2010.