Architecture

Texts

Text objects are defined in prosodic.text.

Reading texts

Loading by string

You can load any text with a string:

# import
import prosodic

sonnetV = prosodic.Text(
"""Those hours, that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel;
For never-resting time leads summer on
To hideous winter, and confounds him there;
Sap checked with frost, and lusty leaves quite gone,
Beauty o’er-snowed and bareness every where:
Then were not summer’s distillation left,
A liquid prisoner pent in walls of glass,
Beauty’s effect with beauty were bereft,
Nor it, nor no remembrance what it was:
But flowers distill’d, though they with winter meet,
Leese but their show; their substance still lives sweet."""
)

Loading texts by filename

You can also read texts (especially larger ones) by filename:

import os

shakespeare_sonnets_filename = os.path.join(
  prosodic.PATH_REPO, 
  'corpora','corppoetry_en','en.shakespeare.txt'
)

# read a text by filename
sonnets = prosodic.Text(filename=shakespeare_sonnets_filename)

Displaying texts

In a notebook environment, a text object displays as a by-syllable dataframe of the text structure it contains. The same dataframe is stored at text.df:

# these will display the same, but the former actually points to the dataframe
sonnetV.df          
sonnetV
word_num_forms syll_is_stressed syll_is_heavy syll_is_strong syll_is_weak word_is_punc
stanza_num line_num line_txt word_lang wordtoken_num wordtoken_txt wordform_num syll_num syll_txt syll_ipa
1 1 Those hours, that with gentle work did frame en 1 Those 1 1 Those ðoʊz 1 0 1
2 hours 1 1 ho 'aʊ 2 1 1 1 0
2 urs ɛːz 2 0 1 0 1
2 1 hours 'aʊrz 2 1 1
3 , 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
14 Leese but their show; their substance still lives sweet. en 7 substance 1 2 tance stəns 1 0 1 0 1
8 still 1 1 still 'stɪl 1 1 1
9 lives 1 1 lives 'lɪvz 1 1 1
10 sweet 1 1 sweet 'swiːt 1 1 1
11 . 0 0 0 1

195 rows × 6 columns

Stanzas

Accessing stanzas

Stanza separations are detected by two line breaks in the input text. You can access stanza objects through a text object:

assert len(sonnets.stanzas) == 154    # number of shakespeare sonnets

# can iterate over them simply by iterating over text object:
for stanza in sonnets:
  pass

# you can also reach stanzas by .stanza###
sonnets.stanza154.df
word_num_forms syll_is_stressed syll_is_heavy syll_is_strong syll_is_weak word_is_punc
stanza_num line_num line_txt word_lang wordtoken_num wordtoken_txt wordform_num syll_num syll_txt syll_ipa
154 1 The little Love-god lying once asleep en 1 The 1 1 The ðə 1 0 0
2 little 1 1 lit 'lɪ 1 1 0 1 0
2 tle təl 1 0 1 0 1
3 Love 1 1 Love 'lʌv 1 1 1
4 - 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
14 Love's fire heats water, water cools not love. en 7 cools 1 1 cools 'kuːlz 1 1 1
8 not 1 1 not nɑt 2 0 1
2 1 not 'nɑt 2 1 1
9 love 1 1 love 'lʌv 1 1 1
10 . 0 0 0 1

183 rows × 6 columns
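
As a rough illustration of the stanza-detection rule described above (two line breaks separate stanzas), here is a minimal standalone sketch. The function name split_stanzas is hypothetical and this is not prosodic's own implementation; it only shows the splitting idea on a plain string.

```python
import re

def split_stanzas(text):
    """Split raw text into stanzas (lists of lines) wherever a blank line occurs."""
    # A blank line (two line breaks, possibly with stray whitespace) ends a stanza
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        [line.strip() for line in block.splitlines() if line.strip()]
        for block in blocks
        if block.strip()
    ]

poem = """Shall I compare thee to a summer's day?
Thou art more lovely and more temperate:

Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date:"""

stanzas = split_stanzas(poem)
print(len(stanzas))      # 2
print(stanzas[1][0])     # Rough winds do shake the darling buds of May,
```

A file of sonnets separated by blank lines would, under this rule, yield one stanza per sonnet, which is consistent with the 154 stanzas asserted above.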

Displaying stanzas

By default, stanzas display with their metrical parse:

sonnets.stanza154
  1. The little Love-god lying once asleep
  2. Laid by his side his heart-inflaming brand,
  3. Whilst many nymphs that vow'd chaste life to keep
  4. Came tripping by; but in her maiden hand
  5. The fairest votary took up that fire
  6. Which many legions of true hearts had warm'd;
  7. And so the general of hot desire
  8. Was sleeping by a virgin hand disarm'd.
  9. This brand she quenched in a cool well by,
  10. Which from Love's fire took heat perpetual,
  11. Growing a bath and healthful remedy
  12. For men diseased; but I, my mistress' thrall,
  13. Came there for cure, and this by that I prove,
  14. Love's fire heats water, water cools not love.

In the rendered display, red marks metrical violations, which makes it easy to compare a parse with those of other poems. You can display the same thing for a text with text.render():

sonnetV.render()
  1. Those hours, that with gentle work did frame
  2. The lovely gaze where every eye doth dwell,
  3. Will play the tyrants to the very same
  4. And that unfair which fairly doth excel;
  5. For never-resting time leads summer on
  6. To hideous winter, and confounds him there;
  7. Sap checked with frost, and lusty leaves quite gone,
  8. Beauty o'er-snowed and bareness every where:
  9. Then were not summer's distillation left,
  10. A liquid prisoner pent in walls of glass,
  11. Beauty's effect with beauty were bereft,
  12. Nor it, nor no remembrance what it was:
  13. But flowers distill'd, though they with winter meet,
  14. Leese but their show; their substance still lives sweet.

Lines

Lines are important objects because (at present) they are the only unit that the metrical parser operates on.

You can access them in a few ways:

# you can reach lines by line number
sonnetV.line14

# line numbers are relative to the stanza
sonnets.stanza5.line14
word_num_forms syll_is_stressed syll_is_heavy word_is_punc syll_is_strong syll_is_weak
stanza_num line_num line_txt word_lang wordtoken_num wordtoken_txt wordform_num syll_num syll_txt syll_ipa
5 14 Leese but their show; their substance still lives sweet. en 1 Leese 1 1 Leese 'liːs 1 1 1
2 but 1 1 but bət 1 0 1
3 their 1 1 their ðɛr 2 0 1
2 1 their 'ðɛr 2 1 1
4 show 1 1 show 'ʃoʊ 1 1 1
... ... ... ... ... ... ... ... ... ... ... ...
7 substance 1 2 tance stəns 1 0 1 0 1
8 still 1 1 still 'stɪl 1 1 1
9 lives 1 1 lives 'lɪvz 1 1 1
10 sweet 1 1 sweet 'swiːt 1 1 1
11 . 0 0 0 1

14 rows × 6 columns
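
To make the numbered-attribute pattern (.stanza5, .line14) more concrete, here is a minimal sketch of how such lookups could be resolved with `__getattr__`, with line numbers counted relative to the enclosing stanza. The classes MiniText and MiniStanza are hypothetical stand-ins written for this illustration; they are not prosodic's classes and make no claim about how prosodic implements this internally. The sketch also does not model line access directly on a text object.

```python
import re

class MiniStanza:
    """Hypothetical stand-in for a stanza: holds its own lines."""
    def __init__(self, lines):
        self.lines = list(lines)

    def __getattr__(self, name):
        # Called only when normal lookup fails; resolve names like "line2"
        match = re.fullmatch(r"line(\d+)", name)
        if match:
            num = int(match.group(1))
            if 1 <= num <= len(self.lines):
                return self.lines[num - 1]   # numbering is 1-indexed
        raise AttributeError(name)

class MiniText:
    """Hypothetical stand-in for a text: holds stanzas and iterates over them."""
    def __init__(self, stanzas):
        self.stanzas = [MiniStanza(s) for s in stanzas]

    def __iter__(self):
        return iter(self.stanzas)

    def __getattr__(self, name):
        # Resolve names like "stanza2" to the second stanza
        match = re.fullmatch(r"stanza(\d+)", name)
        if match:
            num = int(match.group(1))
            if 1 <= num <= len(self.stanzas):
                return self.stanzas[num - 1]
        raise AttributeError(name)

text = MiniText([
    ["first line of stanza one", "second line of stanza one"],
    ["first line of stanza two"],
])
print(text.stanza2.line1)   # first line of stanza two
```

Because each stanza keeps its own list of lines, line1 of the second stanza is that stanza's first line, not the first line of the whole text.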

You can also create them directly:

line = prosodic.Line("A horse, a horse, my kingdom for a horse!")
line
word_num_forms syll_is_stressed syll_is_heavy word_is_punc syll_is_strong syll_is_weak
line_txt sent_num sentpart_num word_lang wordtoken_num wordtoken_txt wordform_num syll_num syll_txt syll_ipa
A horse, a horse, my kingdom for a horse! 1 1 en 1 A 1 1 A 1 0 1
2 horse 1 1 horse 'hɔːrs 1 1 1
3 , 0 0 0 1
4 a 1 1 a 1 0 1
5 horse 1 1 horse 'hɔːrs 1 1 1
... ... ... ... ... ... ... ... ... ... ... ...
8 kingdom 1 2 dom dəm 1 0 1 0 1
9 for 1 1 for fɔːr 1 0 1
10 a 1 1 a 1 0 1
11 horse 1 1 horse 'hɔːrs 1 1 1
12 ! 0 0 0 1

13 rows × 6 columns

Words

Tokens

Types

Forms

Syllables

Phonemes