Prosodic
Prosodic is a metrical-phonological parser written in Python. It currently parses English and Finnish text, and support for other languages can be added easily with a pronunciation dictionary or a custom Python function. Prosodic was built by Ryan Heuser, Josh Falk, and Arto Anttila. Josh also maintains another repository in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. Sam Bowman has also contributed to the codebase, adding several new metrical constraints.
This version, Prosodic 2.x, is a near-total rewrite of the original Prosodic.
Supports Python>=3.9.
Demo
A web app demo of the current version of Prosodic is available at prosodic.dev.
Install
1. Install python package
Install from pypi:
pip install prosodic
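After installing, a quick sanity check can confirm the Python version requirement stated above and that pip actually registered the package. This sketch uses only the standard library (`importlib.metadata`); `installed_version` is a helper name made up for this example and is not part of Prosodic.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "prosodic") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


# Prosodic 2.x supports Python >= 3.9
python_ok = sys.version_info >= (3, 9)
print("Python", ".".join(map(str, sys.version_info[:3])), "OK" if python_ok else "too old")
print("prosodic:", installed_version() or "not installed")
```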
2. Install espeak
Install espeak, a free text-to-speech (TTS) program, which Prosodic uses to ‘sound out’ unknown words.
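To confirm that an espeak executable is reachable before running Prosodic, you can look it up on your `PATH`. This is a minimal sketch using `shutil.which`; it only checks that a binary exists and does not test how Prosodic itself locates espeak. Also checking for `espeak-ng` (the maintained fork) is an assumption, since this README doesn't say whether Prosodic accepts it.

```python
import shutil
from typing import Optional


def find_espeak() -> Optional[str]:
    """Return the path of the first espeak-style executable found on PATH, or None."""
    # "espeak-ng" is checked too, on the assumption that the maintained fork may be
    # installed instead; whether Prosodic accepts it is not covered here.
    for name in ("espeak", "espeak-ng"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


print(find_espeak() or "espeak not found on PATH")
```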
Usage
Web app
Prosodic has a new GUI (graphical user interface) in a web app. After installing, run:
prosodic web
Then navigate to http://127.0.0.1:8181/ in a web browser.
Python
Read texts
# import prosodic
import prosodic
# load a text
sonnet = prosodic.Text("""
Those hours, that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel;
For never-resting time leads summer on
To hideous winter, and confounds him there;
Sap checked with frost, and lusty leaves quite gone,
Beauty o’er-snowed and bareness every where:
Then were not summer’s distillation left,
A liquid prisoner pent in walls of glass,
Beauty’s effect with beauty were bereft,
Nor it, nor no remembrance what it was:
But flowers distill’d, though they with winter meet,
Leese but their show; their substance still lives sweet.
""")
# can also load by filename
shaksonnets = prosodic.Text(fn='corpora/corppoetry_en/en.shakespeare.txt')
Stanzas, lines, words, syllables, phonemes
Texts in Prosodic are organized into a tree structure. The .children of a Text object is a list of Stanza objects, whose .parent attributes point back to the Text. In turn, each stanza's .children is a list of Line objects, whose .parent attributes point back to the stanza, and so on down the tree through word tokens, word types, word forms, syllables, and phonemes.
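To make the two-way `.children`/`.parent` linking concrete, here is a minimal, self-contained sketch of the same pattern. The README doesn't show what else Prosodic's own classes provide, so the `Node` class and its `add`, `ancestors`, and `descendants` methods are hypothetical, written only to illustrate how such a tree can be walked in both directions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Prosodic entity such as Text, Stanza or Line."""

    kind: str
    txt: str = ""
    children: List[Node] = field(default_factory=list)
    # Excluded from repr/eq so printing or comparing a node never recurses upward
    parent: Optional[Node] = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        """Attach child below self, linking both directions, and return the child."""
        child.parent = self
        self.children.append(child)
        return child

    def ancestors(self) -> List[str]:
        """Follow .parent pointers up to the root, collecting each node's kind."""
        kinds = []
        node = self.parent
        while node is not None:
            kinds.append(node.kind)
            node = node.parent
        return kinds

    def descendants(self, kind: str) -> List[Node]:
        """Depth-first walk down .children, collecting nodes of the given kind."""
        found = []
        for child in self.children:
            if child.kind == kind:
                found.append(child)
            found.extend(child.descendants(kind))
        return found


# Build a tiny Text > Stanza > Line > WordToken tree
text = Node("Text")
stanza = text.add(Node("Stanza"))
line = stanza.add(Node("Line", "Those hours, that with gentle work did frame"))
for word in ["Those", "hours"]:
    line.add(Node("WordToken", word))

print([w.txt for w in text.descendants("WordToken")])  # ['Those', 'hours']
print(line.children[0].ancestors())  # ['Line', 'Stanza', 'Text']
```

Keeping a back-pointer on every node is what lets you start anywhere in the tree (say, a single syllable) and climb back up to its line or stanza without searching from the root.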
# Take a peek at this tree structure
# and the features particular entities have
sonnet.show(maxlines=30, incl_phons=True)
Text()
| Stanza(num=1)
    | Line(num=1, txt='Those hours, that with gentle work did frame')
        | WordToken(num=1, txt='Those', sent_num=1, sentpart_num=1)
            | WordType(num=1, txt='Those', lang='en', num_forms=1)
                | WordForm(num=1, txt='Those')
                    | Syllable(ipa='ðoʊz', num=1, txt='Those', is_stressed=False, is_heavy=True)
                        | Phoneme(num=1, txt='ð', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=3, txt='o', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=1, round=1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
        | WordToken(num=2, txt=' hours', sent_num=1, sentpart_num=1)
            | WordType(num=1, txt='hours', lang='en', num_forms=2)
                | WordForm(num=1, txt='hours')
                    | Syllable(ipa="'aʊ", num=1, txt='ho', is_stressed=True, is_heavy=True, is_strong=True, is_weak=False)
                        | Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
                    | Syllable(ipa='ɛːz', num=2, txt='urs', is_stressed=False, is_heavy=True, is_strong=False, is_weak=True)
                        | Phoneme(num=2, txt='ɛː', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=-1, long=1, hitone=0, hireg=0)
                        | Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
                | WordForm(num=2, txt='hours')
                    | Syllable(ipa="'aʊrz", num=1, txt='hours', is_stressed=True, is_heavy=True)
                        | Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=4, txt='r', syl=-1, son=1, cons=1, cont=1, delrel=0, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=0, lo=0, back=0, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
                        | Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
        | WordToken(num=3, txt=',', sent_num=1, sentpart_num=1)
            | WordType(num=1, txt=',', lang='en', num_forms=0, is_punc=True)
        | WordToken(num=4, txt=' that', sent_num=1, sentpart_num=1)
            | WordType(num=1, txt='that', lang='en', num_forms=3)
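Each Phoneme line above carries a set of distinctive features. The feature names (syl, son, cons, voi, and so on) appear to match the 24 ternary features of the panphon library, where 1 means +, -1 means −, and 0 means unspecified; treat that reading as an assumption, since this README doesn't document the encoding. The sketch below copies four of those features for the phonemes of "Those" from the output and uses them to pick out natural classes. `describe` and `natural_class` are hypothetical helpers, not Prosodic functions.

```python
from typing import Dict, List, Tuple

# Four of the features shown for the phonemes of "Those" in the output above
THOSE: List[Tuple[str, Dict[str, int]]] = [
    ("ð", {"syl": -1, "son": -1, "cons": 1, "voi": 1}),
    ("o", {"syl": 1, "son": 1, "cons": -1, "voi": 1}),
    ("ʊ", {"syl": 1, "son": 1, "cons": -1, "voi": 1}),
    ("z", {"syl": -1, "son": -1, "cons": 1, "voi": 1}),
]

# Assumed ternary convention: 1 = plus, -1 = minus, 0 = unspecified
SIGN = {1: "+", -1: "-", 0: "0"}


def describe(feats: Dict[str, int]) -> str:
    """Render a feature dict as a string like '+syl -cons'."""
    return " ".join(f"{SIGN[value]}{name}" for name, value in feats.items())


def natural_class(phonemes: List[Tuple[str, Dict[str, int]]], **spec: int) -> List[str]:
    """Return the phonemes whose features match every name=value pair in spec."""
    return [
        txt
        for txt, feats in phonemes
        if all(feats.get(name) == value for name, value in spec.items())
    ]


print(natural_class(THOSE, syl=1))          # ['o', 'ʊ']
print(natural_class(THOSE, syl=-1, voi=1))  # ['ð', 'z']
print(describe(THOSE[0][1]))                # -syl -son +cons +voi
```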
# take a peek at it in dataframe form
# by-syllable dataframe representation
sonnet.df

# ...which will also be shown when the text object is displayed (in a notebook)
sonnet
| stanza_num | line_num | line_txt | sent_num | sentpart_num | wordtoken_num | wordtoken_txt | word_lang | wordform_num | syll_num | syll_txt | syll_ipa | word_num_forms | syll_is_stressed | syll_is_heavy | syll_is_strong | syll_is_weak | word_is_punc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Those hours, that with gentle work did frame | 1 | 1 | 1 | Those | en | 1 | 1 | Those | ðoʊz | 1 | 0 | 1 |  |  |  |
|  |  |  |  |  | 2 | hours | en | 1 | 1 | ho | 'aʊ | 2 | 1 | 1 | 1 | 0 |  |
|  |  |  |  |  |  |  |  |  | 2 | urs | ɛːz | 2 | 0 | 1 | 0 | 1 |  |
|  |  |  |  |  |  |  |  | 2 | 1 | hours | 'aʊrz | 2 | 1 | 1 |  |  |  |
|  |  |  |  |  | 3 | , | en | 0 | 0 |  |  | 0 |  |  |  |  | 1 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
|  | 14 | Leese but their show; their substance still lives sweet. | 1 | 1 | 7 | substance | en | 1 | 2 | tance | stəns | 1 | 0 | 1 | 0 | 1 |  |
|  |  |  |  |  | 8 | still | en | 1 | 1 | still | 'stɪl | 1 | 1 | 1 |  |  |  |
|  |  |  |  |  | 9 | lives | en | 1 | 1 | lives | 'lɪvz | 1 | 1 | 1 |  |  |  |
|  |  |  |  |  | 10 | sweet | en | 1 | 1 | sweet | 'swiːt | 1 | 1 | 1 |  |  |  |
|  |  |  |  |  | 11 | . | en | 0 | 0 |  |  | 0 |  |  |  |  | 1 |

195 rows × 6 columns
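Because the table is indexed down to the syllable, a word with more than one possible pronunciation contributes rows for each of its forms: "hours" has two WordForms, one disyllabic (ho-urs) and one monosyllabic (hours). The sketch below collapses such rows into one stress string per word form. To stay runnable without Prosodic or pandas, it works on plain tuples copied from the first rows of the table rather than on the DataFrame itself; `ROWS` and `stress_patterns` are hypothetical names for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (wordtoken_num, wordtoken_txt, wordform_num, syll_num, syll_is_stressed),
# copied from the first rows of the table above
ROWS = [
    (1, "Those", 1, 1, 0),
    (2, "hours", 1, 1, 1),
    (2, "hours", 1, 2, 0),
    (2, "hours", 2, 1, 1),
]


def stress_patterns(rows) -> Dict[Tuple[int, str, int], str]:
    """Map (token number, token text, form number) to a stress string such as '10'."""
    sylls: Dict[Tuple[int, str, int], List[Tuple[int, int]]] = defaultdict(list)
    for tok_num, tok_txt, form_num, syll_num, stressed in rows:
        sylls[(tok_num, tok_txt, form_num)].append((syll_num, stressed))
    # Sort by syllable number so each string reads left to right through the word
    return {key: "".join(str(s) for _, s in sorted(vals)) for key, vals in sylls.items()}


for (tok_num, txt, form_num), pattern in stress_patterns(ROWS).items():
    print(tok_num, txt, form_num, pattern)
# 1 Those 1 0
# 2 hours 1 10
# 2 hours 2 1
```

Keying on the form number as well as the token keeps the competing pronunciations of "hours" apart instead of merging their syllables into one string.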