dhlab.text.conc_coll

Module Contents

Classes

Concordance

Wrapper for concordance function

Collocations

Collocations

Counts

Provide counts for a corpus; the corpus should not be too large

WordConc

ConcCounts

Functions

API

dhlab.text.conc_coll.find_hits(x)

class dhlab.text.conc_coll.Concordance(corpus=None, query=None, window=20, limit=500)

Bases: dhlab.text.dhlab_object.DhlabObj

Wrapper for concordance function

Initialization

Get concordances for word(s) in corpus

Parameters:
  • corpus – Target corpus, defaults to None

  • query – word or list of words, defaults to None

  • window – how many tokens to consider around the target word, defaults to 20

  • limit – limit returned hits, defaults to 500
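
The parameters above can be illustrated with a minimal, library-independent sketch of a keyword-in-context search. This is not dhlab's implementation: the `concordance` function, the plain token list standing in for a corpus, and the even split of `window` between left and right context are all assumptions made for illustration.

```python
def concordance(tokens, query, window=20, limit=500):
    """Return up to `limit` (left, hit, right) rows for `query` in `tokens`.

    Hypothetical helper: `tokens` is a plain list of strings standing in
    for a corpus, not a dhlab Corpus object.
    """
    # Accept a single word or a list of words, like the `query` parameter.
    targets = {query} if isinstance(query, str) else set(query)
    # Assumption of this sketch: the window is split evenly around the hit.
    half = window // 2
    rows = []
    for i, token in enumerate(tokens):
        if token in targets:
            left = tokens[max(0, i - half):i]
            right = tokens[i + 1:i + 1 + half]
            rows.append((" ".join(left), token, " ".join(right)))
            # Stop as soon as the requested number of hits is reached.
            if len(rows) >= limit:
                break
    return rows


text = "the cat sat on the mat and the dog sat on the rug".split()
hits = concordance(text, "sat", window=4, limit=500)
for left, hit, right in hits:
    print(f"{left:>10} [{hit}] {right}")
```

A smaller `window` shortens the context on both sides, and a smaller `limit` truncates the result after the first matching rows.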

show(n=10, style=True)

classmethod from_df(df)

Typecast DataFrame to Concordance

class dhlab.text.conc_coll.Collocations(corpus=None, words=None, before=10, after=10, reference=None, samplesize=20000, alpha=False, ignore_caps=False)

Bases: dhlab.text.dhlab_object.DhlabObj

Collocations

Initialization

Create collocations object

Parameters:
  • corpus (dh.Corpus, optional) – target corpus, defaults to None

  • words (str or list, optional) – target word(s), defaults to None

  • before (int, optional) – words to include before, defaults to 10

  • after (int, optional) – words to include after, defaults to 10

  • reference (pd.DataFrame, optional) – reference frequency list, defaults to None

  • samplesize (int, optional) – sample size, defaults to 20000

  • alpha (bool, optional) – Only include alphabetical tokens, defaults to False

  • ignore_caps (bool, optional) – Ignore capitalized letters, defaults to False
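
To make the `before`, `after`, and `alpha` parameters concrete, here is a small sketch of window-based collocation counting in plain Python. It is an illustration under stated assumptions, not dhlab's code: the `collocations` function and the token list are hypothetical, and the `reference`, `samplesize`, and `ignore_caps` options are left out.

```python
from collections import Counter


def collocations(tokens, words, before=10, after=10, alpha=False):
    """Count tokens that occur within the window around each target word.

    Hypothetical helper: `tokens` is a plain list of strings standing in
    for a corpus.
    """
    targets = {words} if isinstance(words, str) else set(words)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in targets:
            # `before` tokens to the left and `after` tokens to the right.
            context = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
            counts.update(context)
    if alpha:
        # Keep only purely alphabetical tokens, dropping punctuation and digits.
        counts = Counter({w: c for w, c in counts.items() if w.isalpha()})
    return counts


tokens = "the cat sat on the mat , and the dog sat on the rug".split()
coll = collocations(tokens, "sat", before=2, after=2)
print(coll.most_common(3))
```

Widening `before` or `after` pulls in more distant neighbours, which raises counts for frequent function words; comparing against a reference frequency list is the usual way to offset that effect.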

show(sortby='counts', n=20)

keywordlist(top=200, counts=5, relevance=10)

classmethod from_df(df)

Typecast DataFrame to Collocations

Parameters:
  • df – DataFrame

Returns:

Collocations

class dhlab.text.conc_coll.Counts(corpus=None, words=None, cutoff=0, sparse=True)

Bases: dhlab.text.dhlab_object.DhlabObj

Provide counts for a corpus; the corpus should not be too large

Initialization

Get frequency list for Corpus

Parameters:
  • corpus – target Corpus, defaults to None

  • words – list of words to be counted, defaults to None

  • cutoff – frequency cutoff, will not include words with frequency < cutoff, defaults to 0

  • sparse – return a sparse matrix for memory efficiency, defaults to True
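
The effect of `words` and `cutoff`, and the idea behind a sparse result, can be sketched without the library. The code below is an assumption-laden illustration rather than dhlab's implementation: `count_words` and `total` are hypothetical helpers, documents are plain token lists keyed by an identifier, the cutoff is applied per document, and "sparse" is modelled as a dict of dicts that stores only non-zero counts.

```python
from collections import Counter


def count_words(documents, words=None, cutoff=0):
    """Build a per-document frequency table that stores only non-zero counts.

    Hypothetical helper: `documents` maps a document id to a list of tokens.
    """
    table = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        if words is not None:
            # Restrict the table to the requested words that actually occur.
            counts = Counter({w: counts[w] for w in words if counts[w] > 0})
        # Drop words whose frequency is below the cutoff.
        table[doc_id] = {w: c for w, c in counts.items() if c >= cutoff}
    return table


def total(table):
    """Sum the per-document counts into one frequency list."""
    summed = Counter()
    for counts in table.values():
        summed.update(counts)
    return summed


docs = {"doc1": "a b a c a".split(), "doc2": "b b c".split()}
table = count_words(docs, cutoff=2)
totals = total(table)
print(table)
print(totals)
```

Storing only the non-zero entries keeps memory use proportional to the words that actually occur in each document, which is the same motivation as returning a sparse matrix.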

is_sparse()

Function to report sparsity of counts frame

sum()

Summarize Corpus frequencies

Returns:

frequency list for Corpus

display_names()

Display data with record names as column titles.

display_rel_names()

Display relfreq data with record names as column titles.

classmethod from_df(df)

property counts

Legacy property for freq

class dhlab.text.conc_coll.WordConc(frame: pandas.DataFrame = None, urn: list = None, dhlabid: list = None, words: List[str] = None, before: int = 12, after: int = 12, limit: int = 100, samplesize: int = 50000)

Bases: dhlab.text.dhlab_object.DhlabObj

class dhlab.text.conc_coll.ConcCounts(frame: pandas.DataFrame = None, urn: list = None, dhlabid: list = None, words: str = None, window: int = 25, limit: int = 100)

Bases: dhlab.text.dhlab_object.DhlabObj