dhlab.text.conc_coll

Module Contents

Classes

Concordance

Wrapper for concordance function

Collocations

Collocations

Counts

Provide word counts for a corpus. The corpus should not be too large.

WordConc

ConcCounts

Functions

find_hits

API

dhlab.text.conc_coll.find_hits(x)

class dhlab.text.conc_coll.Concordance(corpus: dhlab.text.corpus.Corpus | None = None, query: str | None = None, window: int = 20, limit: int = 500)

Bases: dhlab.text.dhlab_object.DhlabObj

Wrapper for concordance function

Initialization

Get concordances for word(s) in corpus

Parameters:
  • corpus – Target corpus, defaults to None

  • query – word or list of words, defaults to None

  • window – how many tokens to consider around the target word, defaults to 20

  • limit – limit returned hits, defaults to 500
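To make the roles of window and limit concrete, here is a minimal pure-Python sketch of a keyword-in-context search. It is not dhlab's implementation: the function name concordance and the sample sentence are made up for illustration. It also assumes that window counts tokens on each side of the hit, which the parameter description does not state explicitly.

```python
def concordance(tokens, query, window=20, limit=500):
    """Return up to `limit` (left, hit, right) tuples for `query`.

    Illustrative only; assumes `window` is the number of tokens
    kept on each side of the hit.
    """
    hits = []
    for i, token in enumerate(tokens):
        if token != query:
            continue
        left = tokens[max(0, i - window):i]      # context before the hit
        right = tokens[i + 1:i + 1 + window]     # context after the hit
        hits.append((" ".join(left), token, " ".join(right)))
        if len(hits) >= limit:                   # stop once enough hits are collected
            break
    return hits


text = "the cat sat on the mat while the dog slept".split()
hits = concordance(text, "the", window=2, limit=2)
for left, word, right in hits:
    print(f"{left:>10} [{word}] {right}")
```

A smaller window gives shorter context lines, and limit caps the number of hits regardless of how often the query occurs.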

show(n: int = 10, style: bool = True)

classmethod from_df(df: pandas.DataFrame)

Typecast DataFrame to Concordance

class dhlab.text.conc_coll.Collocations(corpus: dhlab.text.corpus.Corpus | None = None, words: str | list[str] | None = None, before: int = 10, after: int = 10, reference: pandas.DataFrame | None = None, samplesize: int = 20000, alpha: bool = False, ignore_caps: bool = False)

Bases: dhlab.text.dhlab_object.DhlabObj

Collocations

Initialization

Create collocations object

Parameters:
  • corpus (dh.Corpus, optional) – target corpus, defaults to None

  • words (str or list, optional) – target word(s), defaults to None

  • before (int, optional) – words to include before, defaults to 10

  • after (int, optional) – words to include after, defaults to 10

  • reference (pd.DataFrame, optional) – reference frequency list, defaults to None

  • samplesize (int, optional) – sample size, defaults to 20000

  • alpha (bool, optional) – Only include alphabetical tokens, defaults to False

  • ignore_caps (bool, optional) – Ignore capitalized letters, defaults to False
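The parameters before, after, alpha and ignore_caps can be illustrated with a small pure-Python sketch that counts the tokens around each occurrence of a target word. This is not how dhlab computes collocations: the function collocates and the sample text are hypothetical, and ignore_caps is interpreted here as case-folding, which is an assumption about the parameter's meaning.

```python
from collections import Counter


def collocates(tokens, word, before=10, after=10, alpha=False, ignore_caps=False):
    """Count tokens within `before`/`after` positions of each hit of `word`."""
    if ignore_caps:
        # Assumption: "ignore capitalized letters" is read as case-folding.
        tokens = [t.lower() for t in tokens]
        word = word.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != word:
            continue
        context = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
        # With alpha=True, keep only purely alphabetical tokens.
        counts.update(t for t in context if not alpha or t.isalpha())
    return counts


tokens = "The bank raised rates . The river bank flooded in 1995 .".split()
coll = collocates(tokens, "bank", before=1, after=3, alpha=True, ignore_caps=True)
print(coll.most_common())
```

With alpha=True the punctuation mark and the year are dropped from the counts. With ignore_caps=True, "The" is counted as "the".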

show(sortby: str = 'counts', n: int = 20)

keywordlist(top: int = 200, counts: int = 5, relevance: float = 10)

classmethod from_df(df: pandas.DataFrame)

Typecast DataFrame to Collocations

Parameters:

df – DataFrame

Returns:

Collocations

class dhlab.text.conc_coll.Counts(corpus: dhlab.text.corpus.Corpus | None = None, words: list[str] | None = None, cutoff: int = 0, sparse: bool = True)

Bases: dhlab.text.dhlab_object.DhlabObj

Provide word counts for a corpus. The corpus should not be too large.

Initialization

Get frequency list for Corpus

Parameters:
  • corpus – target Corpus, defaults to None

  • words – list of words to be counted, defaults to None

  • cutoff – frequency cutoff; words with frequency < cutoff are excluded, defaults to 0

  • sparse – return a sparse matrix for memory efficiency, defaults to True
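The effect of words and cutoff can be sketched in plain Python. The helper frequency_list below is hypothetical and only mirrors the documented rule that words with frequency < cutoff are left out. It does not touch dhlab or reproduce its sparse-matrix handling.

```python
from collections import Counter


def frequency_list(tokens, words=None, cutoff=0):
    """Count tokens, optionally restricted to `words`, dropping rare entries."""
    counts = Counter(tokens)
    if words is not None:
        # Keep only the requested words that actually occur.
        counts = Counter({w: counts[w] for w in words if w in counts})
    # Exclude words whose frequency is below the cutoff.
    return {w: c for w, c in counts.items() if c >= cutoff}


tokens = "a b a c a b".split()
print(frequency_list(tokens, cutoff=2))
print(frequency_list(tokens, words=["b", "c"]))
```

With the default cutoff of 0 nothing is filtered out, since every observed word has a frequency of at least 1.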

is_sparse()

Function to report sparsity of counts frame

sum()

Summarize Corpus frequencies

Returns:

frequency list for Corpus
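Conceptually, summarizing corpus frequencies means adding up the per-document counts for each word. The sketch below shows that idea with the standard library only. The document identifiers and counts are invented, and the helper corpus_frequencies is not part of dhlab.

```python
from collections import Counter

# Hypothetical per-document counts, keyed by made-up record identifiers.
per_document = {
    "doc_1": Counter({"fjord": 3, "hav": 1}),
    "doc_2": Counter({"fjord": 2, "fjell": 4}),
}


def corpus_frequencies(doc_counts):
    """Add up word counts across all documents into one frequency list."""
    total = Counter()
    for counts in doc_counts.values():
        total.update(counts)
    return total


totals = corpus_frequencies(per_document)
print(totals.most_common())
```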

display_names()

Display data with record names as column titles.

display_rel_names()

Display relfreq data with record names as column titles.
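Assuming relfreq stands for relative frequency, that is, each count divided by the total number of tokens counted in the same document, the computation can be sketched as follows. This reading of relfreq is an assumption, and relative_frequencies is a hypothetical helper rather than a dhlab function.

```python
def relative_frequencies(counts):
    """Convert absolute counts to shares of the document's total count."""
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid division by zero for empty documents
    return {word: n / total for word, n in counts.items()}


rel = relative_frequencies({"fjord": 3, "hav": 1})
print(rel)
```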

classmethod from_df(df: pandas.DataFrame)

property counts

Legacy property for freq

class dhlab.text.conc_coll.WordConc(frame: pandas.DataFrame | None = None, urn: list | None = None, dhlabid: list | None = None, words: List[str] | None = None, before: int = 12, after: int = 12, limit: int = 100, samplesize: int = 50000)

Bases: dhlab.text.dhlab_object.DhlabObj

class dhlab.text.conc_coll.ConcCounts(frame: pandas.DataFrame | None = None, urn: list | None = None, dhlabid: list | None = None, words: str | None = None, window: int = 25, limit: int = 100)

Bases: dhlab.text.dhlab_object.DhlabObj