KONECT

TREC (disks 4–5)

About this network

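This is the bipartite document-word network of Disks 4 and 5 of the Text Retrieval Conference (TREC) collection. It links 556,077 text documents to about 1.17 million distinct words. Each edge represents one inclusion of a word in a document; because multiple edges are allowed, a word that occurs several times in the same document contributes several edges.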
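
As a rough illustration of how such a network arises from text, the sketch below turns a toy corpus into document-word edges. The two sample documents, the function name `document_word_edges`, and the lowercase alphabetic tokenizer are all assumptions made for this example; the tokenization actually used to build the dataset is not described here.

```python
import re
from collections import Counter

def document_word_edges(documents):
    """Yield one (doc_id, word) edge per word occurrence in each document."""
    for doc_id, text in documents.items():
        # Assumed tokenizer: lowercase runs of ASCII letters.
        for word in re.findall(r"[a-z]+", text.lower()):
            yield doc_id, word

# Hypothetical toy corpus, not taken from TREC.
docs = {"d1": "the cat saw the dog", "d2": "the dog"}

edges = list(document_word_edges(docs))
multiplicity = Counter(edges)      # how often each (document, word) pair occurs

volume = len(edges)                # all edges, repeats included
unique_volume = len(multiplicity)  # distinct (document, word) pairs
print(volume, unique_volume)
```

In this toy corpus the word "the" occurs twice in `d1`, so that pair carries two parallel edges, which is why the volume exceeds the unique volume.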

Network info

Code: TR
Category: Text
Data source: http://www.nist.gov/tac/data/data_desc.html#TREC
Vertex type: Document, word
Edge type: Inclusion
Format: Bipartite (edges connect two types of nodes)
Edge weights: Multiple unweighted (multiple edges are possible)
Size: 1,729,302 = 556,077 + 1,173,225 vertices (documents + words)
Volume: 151,632,178 edges (inclusions)
Unique volume: 83,629,405 edges (inclusions)
Average degree (overall): 175.37 edges / vertex
Average document degree: 272.68 edges / vertex
Average word degree: 129.24 edges / vertex
Fill: 0.00012819 edges / vertex²
Maximum degree: 457,437 edges
Wedge count: 1,604,790,310,718
Claw count: 5.8090382597882208 × 10^16
Power law exponent (estimated): 1.5037 ± 0.0004
Gini coefficient: 85.4%
Relative edge distribution entropy: 80.9%
Diameter: 7 edges
90-percentile effective diameter: 3.81 edges
Mean shortest path length: 3.40 edges
Spectral norm: 44117.
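
The degree and fill figures above follow from the vertex and edge counts by simple arithmetic, which the sketch below reproduces. It assumes 556,077 documents and 1,729,302 vertices in total, so that the word count is the difference (1,173,225); this split is inferred because it is the one that reproduces the reported averages. The formulas used (average degree as twice the volume over the vertex count, fill as unique volume over documents times words) are the usual definitions for a bipartite network and are stated here as assumptions.

```python
# Counts taken from the statistics listed above.
vertices = 1_729_302
documents = 556_077
words = vertices - documents   # inferred: 1,173,225
volume = 151_632_178           # edges, counting repeated inclusions
unique_volume = 83_629_405     # distinct (document, word) pairs

# Every edge touches exactly one document and one word.
avg_degree = 2 * volume / vertices
avg_document_degree = volume / documents
avg_word_degree = volume / words

# Fraction of possible (document, word) pairs that are present.
fill = unique_volume / (documents * words)

# Mean number of parallel edges per distinct pair (not listed in the table).
avg_multiplicity = volume / unique_volume

print(f"average degree          {avg_degree:.2f}")
print(f"average document degree {avg_document_degree:.2f}")
print(f"average word degree     {avg_word_degree:.2f}")
print(f"fill                    {fill:.8f}")
print(f"average multiplicity    {avg_multiplicity:.2f}")
```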

Plots

Edge multiplicity distribution
Cumulative edge multiplicity distribution
Document degree distribution
Word degree distribution
Degree distribution
Distance distribution
Distance distribution on a logistic scale
Spectral distribution of the eigenvalues of A
Spectral distribution of the eigenvalues of N
Spectral distribution of the eigenvalues of L
Cumulative spectral distribution of A
Cumulative spectral distribution of N
Cumulative spectral distribution of L

Downloads

TSV file: gottron-trec.tar.bz2 (282.55 MiB)
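
Once the archive is unpacked, the edge list can be read with a few lines of Python. The sketch below is a minimal reader under stated assumptions: comment lines start with `%`, each remaining line holds whitespace-separated integer IDs for a document and a word, and an optional third column, if present, gives the multiplicity of that pair. The function name `read_edge_list` and the in-memory sample are hypothetical; check the layout of the extracted file before relying on these assumptions.

```python
from collections import Counter

def read_edge_list(lines):
    """Return a Counter mapping (document, word) pairs to their multiplicity."""
    multiplicity = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):   # skip blank and comment lines
            continue
        fields = line.split()
        doc, word = int(fields[0]), int(fields[1])
        # Assumption: a third column, when present, is the edge multiplicity.
        count = int(fields[2]) if len(fields) > 2 else 1
        multiplicity[(doc, word)] += count
    return multiplicity

# Hypothetical sample in the assumed layout, standing in for the real file.
sample = [
    "% comment line",
    "1 1",
    "1 1",
    "1 2",
    "2 2 3",
]

multiplicity = read_edge_list(sample)
volume = sum(multiplicity.values())   # edges, repeats included
unique_volume = len(multiplicity)     # distinct (document, word) pairs
print(volume, unique_volume)
```

For the real data, an open file handle can be passed in place of `sample`, since the function only iterates over lines. The full network has over 150 million edges, so a dictionary-based counter of this kind may need several gigabytes of memory.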

References

[1] TREC (disks 4–5) network dataset. KONECT, April 2017.
[2] National Institute of Standards and Technology. Text REtrieval Conference (TREC) English documents, Volumes 4 and 5. http://trec.nist.gov/data/docs_eng.html, August 2010.
