The networks of KONECT are classified into categories, for instance social networks, interaction networks and rating networks.
- Affiliation networks are bipartite networks denoting the membership of actors in groups. Groups can be defined as narrowly as individual online communities in which users have been active (FG) or as broadly as countries (CN). The actors are mainly persons, but can also be other actors such as musical groups. Note that in all affiliation networks we consider, each actor can be in more than one group, as otherwise the network cannot be connected.
- Animal networks are networks of contacts between animals. They are the animal equivalent to human social networks. Note that datasets of websites such as Dogster (Sd) are not included here but in the Online social network category, since the networks are generated by humans.
- Authorship networks are unweighted bipartite networks consisting of links between authors and their works. In some authorship networks such as that of scientific literature (Pa), works typically have only a few authors, whereas works in other authorship networks may have many authors, as in Wikipedia articles (en).
- Citation networks consist of documents that reference each other. The primary example is scientific publications, but the category also allows patents and other types of documents that reference each other.
- Coauthorship networks are unipartite networks connecting authors who have written works together, for instance academic literature, but also other types of works such as music or movies.
- Communication networks contain edges that represent individual messages between persons. Communication networks are directed and allow multiple edges. Examples of communication networks are those of emails (EN) and those of Facebook messages (Ow). Note that in some instances, edge directions are not known and KONECT can only provide an undirected network.
- Computer networks are networks of connected computers. Nodes in them are computers, and edges are connections. When speaking about networks in a computer science context, one often means only computer networks. An example is the internet topology network (TO).
- Feature networks are bipartite, and denote any kind of feature assigned to entities. Feature networks are unweighted and have edges that are not annotated with edge creation times. Examples are songs and their genres (GE).
- Folksonomies consist of tag assignments connecting a user, an item and a tag. For folksonomies, we follow the 3-bipartite projection approach and consider the three possible bipartite networks, i.e., the user–item, user–tag and item–tag networks. This allows us to apply methods for bipartite graphs to hypergraphs, which is not possible otherwise. Items that are tagged in folksonomies include bookmarks (Dui), scientific publications (Cui) and movies (Mui).
- Human contact networks are unipartite networks of actual contact between persons, i.e., talking with each other, spending time together, or at least being physically close. Usually, these datasets are collected by giving people RFID tags whose chips record which other people are in the vicinity. Determining when an actual contact has happened (as opposed to, for instance, two persons standing back to back) is a nontrivial research problem. An example is the Reality Mining dataset (RM).
- Human social networks are real-world social networks between humans. The ties must be offline, and not from an online social network. Also, the ties represent a state, as opposed to human contact networks, in which each edge represents an event.
- Hyperlink networks are the networks of web pages connected by hyperlinks.
- Infrastructure networks are networks of physical infrastructure. Examples are road networks (RO), airline connection networks (OF), and power grids (UG).
- Interaction networks are bipartite networks consisting of people and items, where each edge represents an interaction. In interaction networks, we always allow multiple edges between the same person–item pair. Examples are people writing in forums (UF), commenting on movies (Fc) or listening to songs (Ls).
- Lexical networks consist of words from natural languages and the relationships between them. Relationships can be semantic (i.e., related to the meaning of words) such as the synonym relationship (WO), associative such as when two words are associated with each other by people in experiments (EA), or denote cooccurrence, i.e., the fact that two words co-occur in text (SB). Note that lexical cooccurrence networks are explicitly not included in the broader Cooccurrence category.
- Metabolic networks model metabolic pathways.
- Miscellaneous networks are any networks that do not fit into one of the other categories.
- Online contact networks consist of people and interactions between them. Contact networks are unipartite and allow multiple edges, i.e., there can always be multiple interactions between the same two persons. They can be either directed or undirected. Examples are people that meet each other (RM), or scientists that write a paper together (Pc).
- Physical networks represent physically existing network structures in the broadest sense. This category covers such diverse data as physical computer networks (TO), transport networks (OF) and biological food networks (FD).
- Rating networks consist of assessments given to items by users, weighted by a rating value. Rating networks are bipartite. Networks in which users can rate other users are not included here, but in the Social category instead. If only a single type of rating is possible, for instance the “favorite” relationship, then rating networks are unweighted. Examples of items that are rated are movies (M3), songs (YS), jokes (JE), and even sexual escorts (SX).
- Online social networks represent ties between persons in online social networking platforms. Certain social networks allow negative edges, which denote enmity, distrust or dislike. Examples are Facebook friendships (FSG), the Twitter follower relationship (TF), and friends and foes on Slashdot (SZ). Note that some social networks can be argued to be rating networks, for instance the user–user rating network of a dating site (LI). These networks are all included in the Social category.
- Software networks are networks of interacting software components. Nodes can be software packages connected by their dependencies, source files connected by includes, or classes connected by imports.
- Text networks consist of text documents containing words. They are bipartite and their nodes are documents and words. Each edge represents the occurrence of a word in a document. Document types are for instance newspaper articles (TR) and Wikipedia articles (EX).
- Trophic networks consist of biological species connected by edges denoting which pairs of species are subject to carbon exchange, i.e., which species eats which. The term food chain describes such relationships, but note that in the general case, a trophic network is not a chain, i.e., it is not linear. Trophic networks are directed.
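The 3-bipartite projection mentioned for folksonomies above can be illustrated with a minimal Python sketch. The function name `project_folksonomy` and the toy tag assignments are hypothetical and not part of KONECT. The sketch also assumes that repeated pairs are kept as multiple edges (stored here as counts), which the text does not specify.

```python
from collections import Counter

def project_folksonomy(assignments):
    """Split (user, item, tag) triples into three bipartite edge multisets.

    Each returned Counter maps an edge (a pair of nodes) to the number of
    tag assignments that produced it. Keeping multiplicities is an
    assumption of this sketch, not something stated in the text.
    """
    user_item = Counter()
    user_tag = Counter()
    item_tag = Counter()
    for user, item, tag in assignments:
        user_item[(user, item)] += 1
        user_tag[(user, tag)] += 1
        item_tag[(item, tag)] += 1
    return user_item, user_tag, item_tag

# Hypothetical toy data: each triple is one tag assignment.
assignments = [
    ("alice", "paper1", "graphs"),
    ("alice", "paper1", "networks"),
    ("bob", "paper1", "graphs"),
    ("bob", "paper2", "graphs"),
]

user_item, user_tag, item_tag = project_folksonomy(assignments)
for name, edges in [("user-item", user_item),
                    ("user-tag", user_tag),
                    ("item-tag", item_tag)]:
    print(name, dict(edges))
```

Each of the three results is an ordinary bipartite edge list, so methods for bipartite graphs can be applied to it directly. Which of the three nodes is dropped determines which projection is obtained.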
Note that the category system of KONECT is in flux. As networks are added to the collection, large categories are split into smaller ones.
We do not include certain kinds of networks that lack a complex structure. This includes networks without a giant connected component, in which most nodes are not reachable from each other, and trees, in which there is only a single path between any two nodes. Note that bipartite relationships extracted from n-to-1 relationships are therefore excluded, as they lead to a disjoint network. For instance, a bipartite person–city network containing was-born-in edges would not be included, as each city would form its own component disconnected from the rest of the network. On the other hand, a band–country network where edges denote the country of origin of individual band members is included, as members of a single band can have different countries of origin. In fact, the Countries network (CN) is of this form. Another example is a bipartite song–genre network, which would only be included in KONECT when songs can have multiple genres. As an example of the lack of complex structure when only a single genre is allowed, the degree distribution in such a song–genre network is skewed because all song nodes have degree one, the diameter cannot be computed since the network is disconnected, and each connected component trivially has a diameter of two or less.
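The effect of n-to-1 relationships on connectivity can be checked with a short Python sketch. The function names and the two toy edge lists are hypothetical. The text gives no numeric threshold for what counts as a giant component, so the sketch only reports the fraction of nodes in the largest component.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of a bipartite edge list.

    Nodes are tagged with their side ("L" or "R") so that equal labels
    on the two sides are not merged by accident.
    """
    adj = defaultdict(set)
    for left, right in edges:
        u, v = ("L", left), ("R", right)
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in list(adj):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def largest_component_fraction(edges):
    """Fraction of all nodes that lie in the largest component."""
    components = connected_components(edges)
    total = sum(len(c) for c in components)
    return max(len(c) for c in components) / total

# Hypothetical n-to-1 data: every person has exactly one birth city.
born_in = [("ann", "Paris"), ("bob", "Paris"), ("cy", "Rome"), ("di", "Oslo")]

# Hypothetical n-to-m data: a band can be linked to several countries.
band_country = [("band1", "UK"), ("band1", "US"),
                ("band2", "US"), ("band2", "DE"), ("band3", "DE")]

print(len(connected_components(born_in)), largest_component_fraction(born_in))
print(len(connected_components(band_country)),
      largest_component_fraction(band_country))
```

In the n-to-1 example each city forms its own star-shaped component, whereas the band–country example is connected because bands with members from several countries link the countries together.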