dhlab class demo


import dhlab as dh

Corpus

dh.Corpus??
Init signature:
dh.Corpus(
    doctype=None,
    author=None,
    freetext=None,
    fulltext=None,
    from_year=None,
    to_year=None,
    from_timestamp=None,
    to_timestamp=None,
    title=None,
    ddk=None,
    subject=None,
    lang=None,
    limit=10,
    order_by='random',
)
korpus = dh.Corpus(doctype="digibok", title="Dracula")
korpus.frame.iloc[:5, [0,1,2,3,9]]
  | dhlabid   | urn                                 | title                                             | authors                                           | year
0 | 100384102 | URN:NBN:no-nb_digibok_2018121368009 | Dracula                                           | Arnesson , Malin / Stoker , Bram / Store , Gur... | 2004
1 | 100219857 | URN:NBN:no-nb_digibok_2014071506008 | Dracula                                           | Stoker , Bram / Carling , Bjørn                   | 1980
2 | 100569450 | URN:NBN:no-nb_digibok_2010020103031 | Bram Stoker ' s Dracula                           | Mignola , Mike / Thomas , Roy / Stoker , Bram     | 1993
3 | 100540112 | URN:NBN:no-nb_digibok_2011052408015 | Dracula                                           | Stoker , Bram / Carling , Bjørn                   | 2004
4 | 100359518 | URN:NBN:no-nb_digibok_2017122268008 | Dracula : av Lars Saabye Christensen : fritt e... | Christensen , Lars Saabye / Stoker , Bram         | 2000

Concordance

dh.Concordance??
dh.Concordance(corpus=None, query=None, window=20, limit=500)
korpus.conc("Dracula").show()
    | link                                | concordance
202 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
146 | URN:NBN:no-nb_digibok_2014071506008 | « Grev Dracula ? »
332 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
300 | URN:NBN:no-nb_digibok_2013062438048 | ... Lucy blir et lett bytte for Dracula som gjør henne til sitt første offer etter ankomsten til Whitby , og...
155 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
132 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
203 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
274 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
72  | URN:NBN:no-nb_digibok_2011071108102 | DRACULA
52  | URN:NBN:no-nb_digibok_2013062438048 | ... Ringeren i Notre Dame , Gi sel le , Dracula og A Christmas Carol for Northern Ballet Theatre . Som...
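
The concordance output above shows each hit with some surrounding context. As a rough illustration of the idea only, here is a minimal keyword-in-context sketch in plain Python. It is not dhlab's implementation: the `concordance` helper and the sample sentence handling are hypothetical, and `window` here counts whitespace-separated tokens on each side, which may differ from how `dh.Concordance` interprets its `window` argument.

```python
def concordance(tokens, query, window=3):
    """Return (left, keyword, right) context tuples for each exact match of query."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == query:
            # Up to `window` tokens before and after the match
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Sample text taken from one of the concordance lines above
text = "Lucy blir et lett bytte for Dracula som gjør henne til sitt første offer"
for left, kw, right in concordance(text.split(), "Dracula", window=3):
    print(f"... {left} [{kw}] {right} ...")
```

Matching here is exact and case-sensitive, so "DRACULA" and "Dracula" would be treated as different tokens in this sketch.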

Frequency

dh.Counts??
dh.Counts(corpus=None, words=None)
korpus.count().display_names()
               | Dracula : fritt etter Bram Stokers roman | Bram Stoker ' s Dracula | Dracula | Dracula | Dracula : et dansedrama i tre akter basert på Bram Stokers roman | Dracula | Dracula | Dracula
.              | 3268.0 | 578.0 | 8099.0 | 8495.0 | 99.0 | 8459.0 | 127.0 | 30.0
:              | 1435.0 | 36.0 | 666.0 | 669.0 | 4.0 | 666.0 | 7.0 | 0.0
,              | 1384.0 | 368.0 | 9574.0 | 9133.0 | 144.0 | 9636.0 | 80.0 | 23.0
er             | 751.0 | 171.0 | 2423.0 | 2428.0 | 26.0 | 2438.0 | 0.0 | 3.0
?              | 646.0 | 135.0 | 519.0 | 514.0 | 1.0 | 522.0 | 41.0 | 0.0
...            | ... | ... | ... | ... | ... | ... | ... | ...
Elvestad       | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0
Halgeir        | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0
Heien          | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0
Sandvær        | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0
utgangspunktet | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0

20528 rows × 8 columns
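
The table above has one row per word and one column per document, with 0 where a word does not occur in a document. A minimal sketch of how such a word-by-document count table can be built in plain Python follows. The toy documents and the names `docs`, `counts`, and `table` are hypothetical, and this does not reflect how dhlab computes its counts.

```python
from collections import Counter

# Hypothetical toy documents, already tokenised with spaces
docs = {
    "doc_a": "Dracula er en greve . Dracula er gammel .",
    "doc_b": "Greven er ikke hjemme .",
}

# One Counter per document
counts = {name: Counter(text.split()) for name, text in docs.items()}

# Shared vocabulary across all documents
vocab = sorted(set().union(*counts.values()))

# Row per word, one count per document (a Counter returns 0 for absent words)
table = {word: [counts[name][word] for name in docs] for word in vocab}

# Print rows sorted by the count in the first document, highest first
for word in sorted(table, key=lambda w: table[w][0], reverse=True):
    print(word, table[word])
```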


from dhlab import totals
tot = totals()
tot.freq
.               7655423257
,               5052171514
i               2531262027
og              2520268056
-               1314451583
                   ...    
tidspunkter         110667
dirigenter          110660
ondartet            110652
kulturtilbud        110652
trassig             110651
Name: freq, Length: 50000, dtype: int64
(korpus.coll("Dracula").frame.counts / tot.freq).sort_values(ascending=False).to_frame().head(10)
           | 0
Dracula    | 0.000198
grev       | 0.000055
Grev       | 0.000046
uverdige   | 0.000026
vedlagte   | 0.000024
hungersnød | 0.000024
tyrkerne   | 0.000022
bukket     | 0.000018
adressert  | 0.000018
Hopkins    | 0.000018
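
The expression above divides the collocation counts by the reference frequencies in `tot.freq` and keeps the ten largest ratios. The same idea can be sketched in plain Python. The `relative_to_reference` helper and all numbers below are hypothetical toy values, not dhlab output. In the pandas version, words missing from the 50000-word reference list come out as NaN and are sorted last; this sketch skips them instead.

```python
def relative_to_reference(local_counts, reference_freq, top=10):
    """Rank words by local count divided by reference frequency."""
    ratios = {
        word: count / reference_freq[word]
        for word, count in local_counts.items()
        if reference_freq.get(word)  # skip words absent from the reference list
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical toy numbers for illustration only
local = {"Dracula": 40, "grev": 10, ",": 500, "xyzzy": 3}
reference = {"Dracula": 200_000, "grev": 180_000, ",": 5_000_000_000}

for word, ratio in relative_to_reference(local, reference):
    print(f"{word}\t{ratio:.6f}")
```

Dividing by a reference frequency pushes very common tokens such as "," down the list even when their local count is high, which is why content words like "Dracula" and "grev" rise to the top.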

Ngram

??dh.Ngram
dh.Ngram(
    words=None,
    from_year=None,
    to_year=None,
    doctype='bok',
    mode='relative',
    lang='nob',
    **kwargs,
)
dh.Ngram(["Dracula", "Frankenstein"], from_year=1880, to_year=2020)
[Figure: Ngram output for "Dracula" and "Frankenstein", 1880 to 2020]