dhlab.text.conc_coll
Module Contents
Classes
Concordance | Wrapper for concordance function
Collocations | Collocations
Counts | Provide counts for a corpus - shouldn’t be too large
Functions
make_link
find_hits
API
- dhlab.text.conc_coll.make_link(row)
- dhlab.text.conc_coll.find_hits(x)
- class dhlab.text.conc_coll.Concordance(corpus=None, query=None, window=20, limit=500)
Bases: dhlab.text.dhlab_object.DhlabObj
Wrapper for concordance function
Initialization
Get concordances for word(s) in corpus
- Parameters:
corpus – Target corpus, defaults to None
query – word or list of words, defaults to None
window – how many tokens to consider around the target word, defaults to 20
limit – limit returned hits, defaults to 500
- show(n=10, style=True)
- classmethod from_df(df)
Typecast DataFrame to Concordance
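The roles of window and limit can be illustrated with a minimal pure-Python sketch. It does not call dhlab and is not the library's implementation: the helper name `concordance`, the in-memory token list, and the reading of window as a per-side token count are all assumptions made for illustration.

```python
def concordance(tokens, query, window=20, limit=500):
    """Return up to `limit` (left, hit, right) triples for `query` in `tokens`.

    Hypothetical helper: `window` is taken here to mean the number of
    tokens kept on each side of the hit (an assumption).
    """
    # Accept a single word or a list of words, as the query parameter does.
    targets = {query} if isinstance(query, str) else set(query)
    hits = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
            if len(hits) >= limit:
                break  # stop once the hit limit is reached
    return hits


text = "the cat sat on the mat and the dog sat on the rug".split()
for left, hit, right in concordance(text, "sat", window=2):
    print(f"{left:>10} [{hit}] {right}")
```

Lowering limit truncates the result list rather than changing the context width, which is the distinction the two parameters are meant to express.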
- class dhlab.text.conc_coll.Collocations(corpus=None, words=None, before=10, after=10, reference=None, samplesize=20000, alpha=False, ignore_caps=False)
Bases: dhlab.text.dhlab_object.DhlabObj
Collocations
Initialization
Create collocations object
- Parameters:
corpus (dh.Corpus, optional) – target corpus, defaults to None
words (str or list, optional) – target word(s), defaults to None
before (int, optional) – words to include before, defaults to 10
after (int, optional) – words to include after, defaults to 10
reference (pd.DataFrame, optional) – reference frequency list, defaults to None
samplesize (int, optional) – description, defaults to 20000
alpha (bool, optional) – Only include alphabetical tokens, defaults to False
ignore_caps (bool, optional) – Ignore capitalized letters, defaults to False
- show(sortby='counts', n=20)
- keywordlist(top=200, counts=5, relevance=10)
- classmethod from_df(df)
Typecast DataFrame to Collocations
- Parameters:
df – DataFrame
- Returns:
Collocations
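As a rough illustration of what before, after and alpha control, the sketch below counts the tokens that occur near a target word in a plain token list. It is not dhlab's implementation and does not model reference, samplesize or ignore_caps; the helper name `collocates` and the use of `str.isalpha()` for the alpha filter are assumptions.

```python
from collections import Counter


def collocates(tokens, word, before=10, after=10, alpha=False):
    """Count tokens within `before`/`after` positions of each `word` hit.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        # Context window: `before` tokens to the left, `after` to the right.
        context = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
        if alpha:
            # Keep only purely alphabetical tokens (drops punctuation, digits).
            context = [t for t in context if t.isalpha()]
        counts.update(context)
    return counts


tokens = "strong tea , strong coffee and weak tea in 1999".split()
print(collocates(tokens, "tea", before=1, after=1, alpha=True).most_common())
```

Raw co-occurrence counts like these favour frequent words, which is presumably why the class also accepts a reference frequency list.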
- class dhlab.text.conc_coll.Counts(corpus=None, words=None, cutoff=0, sparse=True)
Bases: dhlab.text.dhlab_object.DhlabObj
Provide counts for a corpus - shouldn’t be too large
Initialization
Get frequency list for Corpus
- Parameters:
corpus – target Corpus, defaults to None
words – list of words to be counted, defaults to None
cutoff – frequency cutoff; words with frequency < cutoff are not included, defaults to 0
sparse – return a sparse matrix for memory efficiency, defaults to True
- is_sparse()
Function to report sparsity of counts frame
- sum()
Summarize Corpus frequencies
- Returns:
frequency list for Corpus
- display_names()
Display data with record names as column titles.
- display_rel_names()
Display relfreq data with record names as column titles.
- classmethod from_df(df)
- property counts
Legacy property for freq
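The words and cutoff parameters describe a filtered frequency list. A minimal sketch of that idea on an in-memory token list, assuming a hypothetical helper `frequency_list` and ignoring the per-document and sparse-matrix aspects of the real class, might look like this:

```python
from collections import Counter


def frequency_list(tokens, words=None, cutoff=0):
    """Return {word: frequency}, optionally restricted to `words`.

    Hypothetical helper: words with frequency < cutoff are dropped,
    mirroring the description of the cutoff parameter.
    """
    counts = Counter(tokens)
    if words is not None:
        # Restrict the list to the requested words that actually occur.
        counts = Counter({w: counts[w] for w in words if w in counts})
    return {w: c for w, c in counts.items() if c >= cutoff}


tokens = "a b a c a b".split()
print(frequency_list(tokens, cutoff=2))
print(frequency_list(tokens, words=["a", "c"]))
```

With the default cutoff of 0 nothing is filtered, so the cutoff only matters once it is raised above the lowest observed frequency.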
- class dhlab.text.conc_coll.WordConc(frame: pandas.DataFrame = None, urn: list = None, dhlabid: list = None, words: List[str] = None, before: int = 12, after: int = 12, limit: int = 100, samplesize: int = 50000)
- class dhlab.text.conc_coll.ConcCounts(frame: pandas.DataFrame = None, urn: list = None, dhlabid: list = None, words: str = None, window: int = 25, limit: int = 100)