dhlab class demo
import dhlab as dh
Corpus
dh.Corpus??
Init signature:
dh.Corpus(
    doctype=None,
    author=None,
    freetext=None,
    fulltext=None,
    from_year=None,
    to_year=None,
    from_timestamp=None,
    to_timestamp=None,
    title=None,
    ddk=None,
    subject=None,
    lang=None,
    limit=10,
    order_by='random',
)
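# Build a corpus of digitized books ("digibok") with "Dracula" in the title
korpus = dh.Corpus(doctype="digibok", title="Dracula")
# Show the first five rows, keeping only columns 0-3 and 9 of the metadata frame
korpus.frame.iloc[:5, [0, 1, 2, 3, 9]]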
| | dhlabid | urn | title | authors | year |
---|---|---|---|---|---|
0 | 100384102 | URN:NBN:no-nb_digibok_2018121368009 | Dracula | Arnesson , Malin / Stoker , Bram / Store , Gur... | 2004 |
1 | 100219857 | URN:NBN:no-nb_digibok_2014071506008 | Dracula | Stoker , Bram / Carling , Bjørn | 1980 |
2 | 100569450 | URN:NBN:no-nb_digibok_2010020103031 | Bram Stoker ' s Dracula | Mignola , Mike / Thomas , Roy / Stoker , Bram | 1993 |
3 | 100540112 | URN:NBN:no-nb_digibok_2011052408015 | Dracula | Stoker , Bram / Carling , Bjørn | 2004 |
4 | 100359518 | URN:NBN:no-nb_digibok_2017122268008 | Dracula : av Lars Saabye Christensen : fritt e... | Christensen , Lars Saabye / Stoker , Bram | 2000 |
Concordance
dh.Concordance??
dh.Concordance(corpus=None, query=None, window=20, limit=500)
# Find occurrences of "Dracula" in the corpus and display them in context
korpus.conc("Dracula").show()
| | link | concordance |
---|---|---|
202 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
146 | URN:NBN:no-nb_digibok_2014071506008 | « Grev Dracula ? » |
332 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
300 | URN:NBN:no-nb_digibok_2013062438048 | ... Lucy blir et lett bytte for Dracula som gjør henne til sitt første offer etter ankomsten til Whitby , og... |
155 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
132 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
203 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
274 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
72 | URN:NBN:no-nb_digibok_2011071108102 | DRACULA |
52 | URN:NBN:no-nb_digibok_2013062438048 | ... Ringeren i Notre Dame , Gi sel le , Dracula og A Christmas Carol for Northern Ballet Theatre . Som... |
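To illustrate what a concordance with a `window` parameter produces, here is a minimal keyword-in-context sketch in plain Python. It is not dhlab's implementation: the function `concordance` is hypothetical, whitespace tokenization is a simplification, and treating `window` as a token count on each side of the hit is an assumption.

```python
def concordance(tokens, query, window=3):
    """Return (left, hit, right) context tuples for every token equal to `query`.

    Hypothetical helper for illustration; `window` is assumed to be the
    number of tokens kept on each side of the hit.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == query:
            left = tokens[max(0, i - window):i]      # up to `window` tokens before the hit
            right = tokens[i + 1:i + 1 + window]     # up to `window` tokens after the hit
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


# Sample sentence taken from the concordance output above
text = "Lucy blir et lett bytte for Dracula som gjør henne til sitt første offer"
rows = concordance(text.split(), "Dracula", window=3)
for left, hit, right in rows:
    print(f"... {left} {hit} {right} ...")
```

A larger `window` simply widens the slices on both sides of each hit.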
Frequency
dh.Counts??
dh.Counts(corpus=None, words=None)
# Count word frequencies per document, with titles as column labels
korpus.count().display_names()
| | Dracula : fritt etter Bram Stokers roman | Bram Stoker ' s Dracula | Dracula | Dracula | Dracula : et dansedrama i tre akter basert på Bram Stokers roman | Dracula | Dracula | Dracula |
---|---|---|---|---|---|---|---|---|
. | 3268.0 | 578.0 | 8099.0 | 8495.0 | 99.0 | 8459.0 | 127.0 | 30.0 |
: | 1435.0 | 36.0 | 666.0 | 669.0 | 4.0 | 666.0 | 7.0 | 0.0 |
, | 1384.0 | 368.0 | 9574.0 | 9133.0 | 144.0 | 9636.0 | 80.0 | 23.0 |
er | 751.0 | 171.0 | 2423.0 | 2428.0 | 26.0 | 2438.0 | 0.0 | 3.0 |
? | 646.0 | 135.0 | 519.0 | 514.0 | 1.0 | 522.0 | 41.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Elvestad | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
Halgeir | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
Heien | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
Sandvær | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
utgangspunktet | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
20528 rows × 8 columns
# Load a reference list of overall word frequencies (50,000 words in the output below)
from dhlab import totals
tot = totals()
tot.freq
. 7655423257
, 5052171514
i 2531262027
og 2520268056
- 1314451583
...
tidspunkter 110667
dirigenter 110660
ondartet 110652
kulturtilbud 110652
trassig 110651
Name: freq, Length: 50000, dtype: int64
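# Divide the collocation counts for "Dracula" by the reference frequencies and list the ten highest ratios
(korpus.coll("Dracula").frame.counts / tot.freq).sort_values(ascending=False).to_frame().head(10)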
| | 0 |
---|---|
Dracula | 0.000198 |
grev | 0.000055 |
Grev | 0.000046 |
uverdige | 0.000026 |
vedlagte | 0.000024 |
hungersnød | 0.000024 |
tyrkerne | 0.000022 |
bukket | 0.000018 |
adressert | 0.000018 |
Hopkins | 0.000018 |
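The ranking above comes from dividing each collocate's count by its overall reference frequency, so that words which are common everywhere drop down the list. The arithmetic can be sketched with plain dictionaries. The collocation counts and the reference values for "Dracula" and "grev" below are made-up toy numbers; only the values for "og" and "i" are taken from the `tot.freq` output above. `relative_ratio` is a hypothetical helper, not part of dhlab.

```python
# Toy collocation counts (made up for illustration)
colloc_counts = {"Dracula": 120, "grev": 40, "og": 300}

# Reference frequencies: "og" and "i" as in the tot.freq output; the others are toy values
reference_freq = {
    "Dracula": 600_000,
    "grev": 800_000,
    "og": 2_520_268_056,
    "i": 2_531_262_027,
}


def relative_ratio(counts, reference):
    """Divide each count by its reference frequency, skipping words absent from the reference."""
    return {w: c / reference[w] for w, c in counts.items() if w in reference}


# Sort by ratio, highest first, mirroring sort_values(ascending=False)
ranked = sorted(
    relative_ratio(colloc_counts, reference_freq).items(),
    key=lambda kv: kv[1],
    reverse=True,
)
for word, ratio in ranked:
    print(f"{word}\t{ratio:.6f}")
```

The pandas expression aligns the two series on their word index, so a word missing from either side yields NaN; the dictionary version drops such words instead.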
Ngram
??dh.Ngram
dh.Ngram(
    words=None,
    from_year=None,
    to_year=None,
    doctype='bok',
    mode='relative',
    lang='nob',
    **kwargs,
)
# Ngram frequencies for two words over the years 1880 to 2020
dh.Ngram(["Dracula", "Frankenstein"], from_year=1880, to_year=2020)