Term definitions
- IDE
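Integrated Development Environment: an application that combines a code editor with tools for running, building and debugging programs. Read more on Wikipedia.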
- API
Application Programming Interface: a defined set of functions or endpoints through which one program can use the services of another. Read more on Wikipedia.
- N-gram
An n-gram is a sequence of linguistic units, typically words or characters.
Depending on the length n of the sequence, we call them unigrams for single tokens, bigrams for sequences of two, trigrams for sequences of three, and so on.
Counting the occurrences of these n-grams across a large body of text, called a corpus, reveals patterns of language use in that corpus. If the texts in the corpus are also dated, the counts can be followed over time to show how language use develops in that time frame.
Read more on Wikipedia.
- Corpus
A collection of texts. The DH-lab Corpus is represented by a list of the publication metadata for each document in the collection.
- Concordance
A list of occurrences for a given word (or word sequence), with its immediate context.
- Collocation
For a given word, its collocations are a list of other words that are strongly associated with it. The list is retrieved by comparing how often the word occurs together with each other word in a given text selection against how often it occurs with those same words “in general”, that is, in a wider reference corpus.
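The n-gram, concordance and collocation definitions above can be illustrated with a small sketch in plain Python. This is not how the DH-lab services compute these results. The function names (`tokenize`, `ngrams`, `concordance`, `collocations`), the `window` parameter and the sample sentence are all hypothetical, and the collocation score is a simplifying assumption: the relative frequency of a word near the target, divided by its relative frequency in a reference token list.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase."""
    tokens = []
    for raw in text.split():
        word = raw.strip(PUNCTUATION).lower()
        if word:
            tokens.append(word)
    return tokens


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, word, window=2):
    """Return (left context, word, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def collocations(tokens, reference_tokens, word, window=2):
    """Rank words found near `word` by an assumed relevance ratio:
    relative frequency near `word` / relative frequency in the reference."""
    near = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            near.update(tokens[max(0, i - window):i])
            near.update(tokens[i + 1:i + 1 + window])
    reference = Counter(reference_tokens)
    near_total = sum(near.values())
    reference_total = sum(reference.values())
    scores = {}
    for other, count in near.items():
        if reference[other]:  # skip words absent from the reference
            scores[other] = (count / near_total) / (
                reference[other] / reference_total
            )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample text, used here as both selection and reference corpus.
tokens = tokenize("The cat sat on the mat. The cat ate the fish.")

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))   # [(('the', 'cat'), 2)]
print(concordance(tokens, "cat"))
print(collocations(tokens, tokens, "cat"))
```

In this toy example the word "the" occurs often near "cat" but is also frequent everywhere else, so it scores lower than "sat", which is the effect the comparison with a reference corpus is meant to achieve.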