dhlab class demo

import dhlab as dh

Corpus

dh.Corpus??
Init signature:
dh.Corpus(
    doctype=None,
    author=None,
    freetext=None,
    fulltext=None,
    from_year=None,
    to_year=None,
    from_timestamp=None,
    to_timestamp=None,
    title=None,
    ddk=None,
    subject=None,
    lang=None,
    limit=10,
    order_by='random',
)
korpus = dh.Corpus(doctype="digibok", title="Dracula")
korpus.frame.iloc[:5, [0,1,2,3,9]]
   dhlabid    urn                                  title                                              authors                                            year
0  100163812  URN:NBN:no-nb_digibok_2013062438048  Dracula : et dansedrama i tre akter basert på ...  Levin , Mona / Stoker , Bram                       2002
1  100346414  URN:NBN:no-nb_digibok_2017091805047  Dracula                                            MacDonald , Eric / Stoker , Bram                   1983
2  100462841  URN:NBN:no-nb_digibok_2008090904123  Dracula                                            Fletcher-Watson , Jo / Kolstad , Henning / Bac...  1998
3  100547952  URN:NBN:no-nb_digibok_2011071108102  Dracula                                            Stoker , Bram / Carling , Bjørn                    2006
4  100276465  URN:NBN:no-nb_digibok_2016011248064  Dracula                                            Mucci , Michael / Valgermo , Finn / Stoker , B...  2009
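The positional selection `[0,1,2,3,9]` above depends on the column order of `korpus.frame`. As a minimal sketch, the same five columns can be picked by name instead. The frame below is a hand-built stand-in with two rows copied from the output above, not a real dhlab result, and it reproduces only the five columns shown.

```python
import pandas as pd

# Stand-in for korpus.frame: two rows copied from the output above.
# The real frame has more columns; only the five displayed ones are reproduced.
frame = pd.DataFrame(
    {
        "dhlabid": [100346414, 100547952],
        "urn": [
            "URN:NBN:no-nb_digibok_2017091805047",
            "URN:NBN:no-nb_digibok_2011071108102",
        ],
        "title": ["Dracula", "Dracula"],
        "authors": [
            "MacDonald , Eric / Stoker , Bram",
            "Stoker , Bram / Carling , Bjørn",
        ],
        "year": [1983, 2006],
    }
)

# Select by column name rather than by position.
columns = ["dhlabid", "urn", "title", "authors", "year"]
subset = frame.loc[:, columns].head(5)
print(subset)
```

Selecting by name keeps working if the column order changes, at the cost of failing loudly when a column is renamed.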

Concordance

dh.Concordance??
dh.Concordance(corpus=None, query=None, window=20, limit=500)
korpus.conc("Dracula").show()
     link                                 concordance
97   URN:NBN:no-nb_digibok_2008090904123  DRACULA
410  URN:NBN:no-nb_digibok_2011071108102  DRACULA
107  URN:NBN:no-nb_digibok_2008090904123  Jakten Morris og Seward fulgte etter Dracula ved å ri langs bredden av Bistritza . De red opp til Borgopasset...
49   URN:NBN:no-nb_digibok_2013013108024  Bram Stoker Dracula
120  URN:NBN:no-nb_digibok_2014071506008  ... hans egen jord ! Det var så sannelig en Dracula . Det var han hvis egen uverdige bror , etter...
414  URN:NBN:no-nb_digibok_2011071108102  DRACULA
109  URN:NBN:no-nb_digibok_2008090904123  ... ^ j | ^ 9 9 j stearinlys representerer Dracula var The Vampyre ( 1819 ) , en / /...
24   URN:NBN:no-nb_digibok_2010020103031  K Dracula ... ;
94   URN:NBN:no-nb_digibok_2008090904123  Vlad Dracula
119  URN:NBN:no-nb_digibok_2014071506008  I mellomtiden må jeg prøve å finne ut alt jeg kan om grev Dracula , for det kan være nyttig...
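To make the idea of a concordance window concrete, here is a small keyword-in-context sketch in plain Python. It is not dhlab's implementation: the helper `concordance` is hypothetical, and the assumption that `window` counts context tokens split evenly on both sides of the hit is ours, since the source does not show how dhlab interprets the parameter.

```python
def concordance(tokens, query, window=20):
    """Return one context string per exact occurrence of `query` in `tokens`.

    Assumption: `window` is the total number of context tokens,
    split evenly before and after the hit.
    """
    half = window // 2
    hits = []
    for i, token in enumerate(tokens):
        if token == query:
            start = max(0, i - half)  # do not run past the start of the text
            hits.append(" ".join(tokens[start : i + half + 1]))
    return hits


# Sentence copied from the concordance output above, already tokenized by spaces.
text = (
    "I mellomtiden må jeg prøve å finne ut alt jeg kan om grev Dracula , "
    "for det kan være nyttig"
)
hits = concordance(text.split(), "Dracula", window=6)
print(hits)
```

With `window=6` the hit keeps three tokens on each side; a larger window gives more context per row at the cost of longer lines.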

Frequency

dh.Counts??
dh.Counts(corpus=None, words=None)
korpus.count().display_names()
Columns (document titles):
  (1) Dracula
  (2) Dracula : fritt etter Bram Stokers roman
  (3) Bram Stoker's Dracula
  (4) Dracula
  (5) Dracula
  (6) Dracula : et dansedrama i tre akter basert på Bram Stokers roman
  (7) Dracula
  (8) Dracula
  (9) Dracula

            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
.         646.0  3268.0   578.0  8495.0  8447.0    99.0  8459.0   127.0   832.0
,         500.0  1384.0   368.0  9133.0  9659.0   144.0  9636.0    80.0   678.0
og        288.0   524.0   147.0  6206.0  6350.0   110.0  6326.0     0.0   261.0
i         265.0   449.0   250.0  2137.0  3092.0    77.0  3066.0    45.0   187.0
^         154.0     0.0   189.0     0.0     2.0     0.0     0.0    48.0     1.0
...         ...     ...     ...     ...     ...     ...     ...     ...     ...
forts.      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    10.0
Nu          0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    13.0
Pause       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    13.0
Ton         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    17.0
onathan     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    25.0

22582 rows × 9 columns
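The table above has one row per word and one column per document, with `0.0` where a word does not occur in a document. A minimal sketch of how such a table can arise in pandas, assuming per-document counts are aligned on the union of all words and the gaps are filled with zero. Whether dhlab builds it this way is an assumption, and the two toy documents are invented for illustration.

```python
from collections import Counter

import pandas as pd

# Toy documents, invented for illustration.
docs = {
    "doc_a": "grev Dracula og Jonathan",
    "doc_b": "Mina og Jonathan og Mina",
}

# One column of word counts per document; pandas aligns the rows on the
# union of all words and leaves NaN where a word is absent.
counts = pd.DataFrame(
    {name: dict(Counter(text.split())) for name, text in docs.items()}
)

# The NaN gaps force a float dtype, which is why the filled-in zeros
# show up as 0.0 rather than 0.
counts = counts.fillna(0)
print(counts)
```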


from dhlab import totals
tot = totals()
tot.freq
.               7655423257
,               5052171514
i               2531262027
og              2520268056
-               1314451583
                   ...    
tidspunkter         110667
dirigenter          110660
ondartet            110652
kulturtilbud        110652
trassig             110651
Name: freq, Length: 50000, dtype: int64
(korpus.coll("Dracula").frame.counts / tot.freq).sort_values(ascending=False).to_frame().head(10)
0
Dracula 0.000289
grev 0.000093
Grev 0.000059
Jonathan 0.000046
tyrkerne 0.000030
uverdige 0.000026
vedlagte 0.000024
hungersnød 0.000024
Helsing 0.000023
Mina 0.000021
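The ranking above divides collocation counts by reference frequencies, and pandas aligns the two Series on their word index before dividing. A minimal sketch of that alignment with invented numbers: any word missing from the reference list (here `tot.freq` holds 50000 words) ends up as `NaN` and sorts last.

```python
import pandas as pd

# Hypothetical counts, for illustration only.
colloc_counts = pd.Series({"Dracula": 50, "grev": 20, "og": 30, "Transylvania": 5})
reference = pd.Series({"og": 3_000_000, "grev": 200_000, "Dracula": 100_000})

# Division aligns on the index; "Transylvania" is not in `reference`,
# so its ratio is NaN.
ratio = (colloc_counts / reference).sort_values(ascending=False)

# Drop the words that had no reference frequency before ranking.
ranked = ratio.dropna()
print(ranked)
```

A very common word such as `og` gets a small ratio even with a fairly high raw count, which is why the ratio surfaces words that are characteristic of the context rather than merely frequent.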

Ngram

??dh.Ngram
dh.Ngram(
    words=None,
    from_year=None,
    to_year=None,
    doctype='bok',
    mode='relative',
    lang='nob',
    **kwargs,
)
dh.Ngram(["Dracula", "Frankenstein"], from_year=1880, to_year=2020)
[Plot: ngram frequencies for "Dracula" and "Frankenstein", 1880 to 2020]
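The signature above defaults to `mode='relative'`. The exact normalisation dhlab applies is not shown in the source; one common reading, sketched here with invented numbers, is that each yearly count is divided by the total number of words for that year, so that years with very different corpus sizes become comparable.

```python
import pandas as pd

# Invented yearly counts and corpus sizes, for illustration only.
years = [1900, 1950, 2000]
absolute = pd.DataFrame(
    {"Dracula": [2, 10, 60], "Frankenstein": [4, 10, 30]},
    index=years,
)
totals_per_year = pd.Series([1000, 5000, 20000], index=years)

# Divide every row by that year's total (axis=0 aligns on the year index).
relative = absolute.div(totals_per_year, axis=0)
print(relative)
```

In this toy data the absolute count of "Dracula" grows thirtyfold between 1900 and 2000, while its relative frequency grows only from 0.002 to 0.003.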