
Five Collections: Where the Proverbs Come From

2026-06-24

The verba corpus is a digital curation project. Rather than collecting proverbs from scratch, it acts as a unified paremiological registry that aggregates and standardizes the work of generations of folklorists.

Our database incorporates five major compilations published over the past 180 years. Each reflects the historical era, regional dialects, and typographic standards of its time.

Let's look at each source collection in detail:


1. Hryhoriy Ilkevych — "Galician Proverbs and Riddles" (1841)

An example from this collection: Proverb №48787, «Ѣхала Хима з Єрусалима, во̂зок скрегоче, Хима ся регоче.» (translated: «Chyma traveled from Jerusalem, the cart squeaks, Chyma laughs.»).



2. Matviy Nomys — "Ukrainian Sayings, Proverbs, and the Like" (1864)

An example from this collection: Proverb №1, «По парі пізнати, чим серце кипить.» (translated: «By matching pair, one knows what boils in the heart.»).



3. Ivan Franko — "Galician-Ruthenian Folk Proverbs" (1901–1910)

An example with Franko's notes: Proverb №126, «"А ви з віхті?" - "А здуло би ті!"» (Franko notes this as a wordplay mocking the dialectal "z vikhti" meaning "from where", met with a playful curse). Another example: Proverb №5000, «Верхове галузя вітри ломлят.» (modern standard spelling: «Верхове гілля вітри ломлять.», meaning «Winds break the topmost branches first.»).



4. V. Bobkova (ed. M. Rylsky) — "Ukrainian Folk Proverbs and Sayings" (1961)


5. Valeriy Mlodzynskyi — "Practical Russian-Ukrainian Dictionary of Proverbs" (2009)


OCR and AI Alignment Limitations

Digitizing historical literature is a complex technical process. When researching the verba corpus, please keep these data limitations in mind:

1. OCR Accuracy: Scans of 19th-century publications were processed using OCR (Tesseract). For the oldest texts (such as Nomys 1864), the character recognition accuracy is estimated at 75–80%. Occasional typos or scanning artifacts may appear in the original text column.

2. AI Modernization & Tags: To facilitate search, modern spellings and category classifications were generated using Large Language Models (LLMs). This automated tagging has an estimated accuracy of 95% for modern spelling conversions and 85% for thematic tagging. The first (primary) theme in the list is always the most accurate.
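For readers who work with the data programmatically, here is a minimal Python sketch of how these limitations might shape a query. The record layout (`number`, `original`, `modern`, `themes`), the helper names, and the theme labels are all assumptions made for illustration and are not the actual verba schema. The idea is simply to search the modernized spelling rather than the raw OCR text, and to rely on the first theme only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proverb:
    # Hypothetical record layout; the real verba schema may differ.
    number: int
    original: str  # OCR output, may contain scanning artifacts
    modern: str    # LLM-generated modern spelling
    themes: list[str] = field(default_factory=list)  # primary theme first


def primary_theme(p: Proverb) -> Optional[str]:
    """Return the first (most reliable) theme, or None if untagged."""
    return p.themes[0] if p.themes else None


def find_by_primary_theme(proverbs: list[Proverb], theme: str) -> list[Proverb]:
    """Match on the primary theme only, ignoring less reliable secondary tags."""
    return [p for p in proverbs if primary_theme(p) == theme]


def search_modern(proverbs: list[Proverb], needle: str) -> list[Proverb]:
    """Case-insensitive substring search over the modernized spelling."""
    needle = needle.casefold()
    return [p for p in proverbs if needle in p.modern.casefold()]


# Sample records. The texts are quoted from the examples in this post;
# the theme labels are invented placeholders, and the modern spelling of
# proverb 1 is assumed to equal its original text.
sample = [
    Proverb(
        number=5000,
        original="Верхове галузя вітри ломлят.",
        modern="Верхове гілля вітри ломлять.",
        themes=["nature", "status"],
    ),
    Proverb(
        number=1,
        original="По парі пізнати, чим серце кипить.",
        modern="По парі пізнати, чим серце кипить.",
        themes=[],
    ),
]
```

With this sample, `search_modern(sample, "гілля")` finds proverb 5000 even though the original text uses the dialectal «галузя», while `find_by_primary_theme(sample, "status")` returns nothing because "status" is only a secondary tag in this made-up data.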