Machine-readable Greek lexicon w/variant forms?

Peitho
Textkit Neophyte
Posts: 16
Joined: Fri Jun 03, 2016 12:36 am

Machine-readable Greek lexicon w/variant forms?

Post by Peitho »

As part of my effort to memorize Greek words, including all the different moods/tenses/cases/etc. (see here: http://www.textkit.com/greek-latin-foru ... =2&t=65103 ), I’m trying to program some kind of flashcard thingy for my computer.
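For the flashcard side, one common approach is a simplified Leitner-style drill. This is a hypothetical sketch (Card, pick_card and grade are illustrative names, not from any library). It does not use the classic fixed review intervals. Instead, it draws cards at random, weighting lower boxes more heavily, and sends a missed card back to box 0.

```python
import random
from dataclasses import dataclass

@dataclass
class Card:
    prompt: str   # e.g. an inflected form to parse
    answer: str   # e.g. lemma plus parse
    box: int = 0  # 0 = new or just missed; higher = better known

def pick_card(cards, rng=random):
    """Draw one card (deck must be non-empty), favouring lower boxes.

    Weight is 2 ** (max_box - box): each box below the top doubles the odds.
    """
    max_box = max(c.box for c in cards)
    weights = [2 ** (max_box - c.box) for c in cards]
    return rng.choices(cards, weights=weights, k=1)[0]

def grade(card, correct, top_box=4):
    """Promote on a correct answer (capped at top_box); reset to box 0 on a miss."""
    card.box = min(card.box + 1, top_box) if correct else 0
```

A card might look like Card("λύεις", "λύω: present active indicative, 2nd singular"). The deck itself would be built from whatever form data you end up finding.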

I’ve found this GitHub project extremely helpful: https://github.com/gcelano/LSJ_GreekUnicode It’s the entire Liddell-Scott-Jones Greek lexicon, in TEI XML format. It’s great to have the LSJ as something machine-readable, but I’d like to go a step further: where can I find machine-readable data that points to each variant form of a Greek word? (Or, at least, what the variant forms look like. It’s okay if standard variants are inherited via word stems or something, so long as nonstandard variants are specified.) The Perseus word tools seem to have this capability, so this kind of data must exist in the wild somewhere. Where can I find it? Thanks!
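To pull headwords out of a TEI lexicon, a minimal sketch with Python's standard ElementTree follows. It assumes entries are `entryFree` elements carrying a `key` attribute, as in the Perseus LSJ markup. Check this against the actual files in the repository, since the element name is an assumption here. The sample XML is hypothetical and not copied from LSJ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of the Perseus LSJ TEI markup (not real LSJ text)
SAMPLE = """<TEI.2><text><body><div0>
<entryFree id="n1" key="λύω" type="main"><orth>λύω</orth>, loose.</entryFree>
<entryFree id="n2" key="λόγος" type="main"><orth>λόγος</orth>, word.</entryFree>
</div0></body></text></TEI.2>"""

def local_name(tag):
    # Strip a "{namespace}" prefix so namespaced TEI P5 and older TEI both match
    return tag.rsplit("}", 1)[-1]

def headwords(xml_text, entry_tag="entryFree"):
    """Map each entry's key attribute to its flattened plain text."""
    root = ET.fromstring(xml_text)
    entries = {}
    for el in root.iter():
        if local_name(el.tag) == entry_tag and el.get("key"):
            # itertext() walks nested markup (orth, sense, cit, ...) in document order
            entries[el.get("key")] = " ".join("".join(el.itertext()).split())
    return entries
```

For a whole file, use ET.parse(path).getroot() instead of fromstring. For very large files, ET.iterparse keeps memory down. Note that this only gives you headwords and definitions. It does not give you the inflected forms, which is the part the lexicon lacks.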

User avatar
jeidsath
Textkit Zealot
Posts: 5332
Joined: Mon Dec 30, 2013 2:42 pm
Location: Γαλεήπολις, Οὐισκόνσιν

Re: Machine-readable Greek lexicon w/variant forms?

Post by jeidsath »

You may want to take a look at the Perseus Morpheus tool: https://github.com/PerseusDL/morpheus

Morpheus is old C code, and the program has a number of limitations and errors, but may be the best bet for anything free and comprehensive.
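As far as I know, Morpheus works with Greek in Beta Code, an ASCII transliteration in which λόγος is written lo/gos. So if you pair it with Unicode data such as the LSJ repository above, you will need a converter. Here is a minimal sketch of the Beta Code to Unicode direction. It handles lowercase letters, breathings, accents, iota subscript, diaeresis and final sigma. It deliberately ignores capitals (the * prefix), numbered sigmas (s1, s2, s3) and punctuation codes, which pass through unchanged.

```python
import re
import unicodedata

# Beta Code letters -> Greek lowercase letters
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw",
                   "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritics -> Unicode combining characters
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}

def beta_to_unicode(beta):
    """Convert simple lowercase Beta Code to NFC Unicode Greek."""
    out = []
    for ch in beta.lower():  # accept TLG-style uppercase letters too
        out.append(LETTERS.get(ch) or MARKS.get(ch) or ch)
    # NFC folds letter + combining marks into precomposed characters
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma at the end of a word becomes final sigma
    return re.sub(r"σ\b", "ς", text)
```

For example, beta_to_unicode("a)/nqrwpos") gives ἄνθρωπος.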

There is also www.lexigram.gr

It's actually not easy to scrape the above. They have an offline version, distributed as a Windows binary. I was able to download it and run it (but only on the Greek version of Windows). But I haven't yet taken the time to figure out how to query the local databases programmatically, or how exactly their lemma database works.

The user here "opoudjis" has written some (all?) of the TLG's parsing code, which is much better than the Perseus tools, and may have some comments. I imagine that their lemma database would have exactly what you are asking for, but you'd need some sort of sharing permission from them to get it.

I wrote some fairly detailed code for generating the verb forms a few months ago, but it's not yet in a finished enough state for me to make public.
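The code mentioned above is not public, so what follows is not that method. It is a hypothetical sketch of the data layout asked for in the first post: regular forms are inherited from a stem plus an ending table, and irregular verbs list their forms explicitly. Inverting the result gives a form-to-parse lookup. Only one ending table is shown, and accent shifts, augment and contraction are ignored. A real generator has to handle all of these.

```python
# Regular endings, inherited by any verb that supplies a stem (illustrative subset)
PRESENT_ACTIVE_INDICATIVE = {
    "1sg": "ω", "2sg": "εις", "3sg": "ει",
    "1pl": "ομεν", "2pl": "ετε", "3pl": "ουσι(ν)",
}

# Irregular verbs give their forms explicitly instead of a stem
OVERRIDES = {
    "εἰμί": {"1sg": "εἰμί", "2sg": "εἶ", "3sg": "ἐστί(ν)",
             "1pl": "ἐσμέν", "2pl": "ἐστέ", "3pl": "εἰσί(ν)"},
}

def paradigm(lemma, stem=None):
    """Return {slot: form}, preferring an explicit override over stem + ending."""
    if lemma in OVERRIDES:
        return dict(OVERRIDES[lemma])
    if stem is None:
        raise ValueError(f"no stem or override for {lemma!r}")
    return {slot: stem + ending for slot, ending in PRESENT_ACTIVE_INDICATIVE.items()}

def reverse_index(lexicon):
    """Map each generated form to every (lemma, slot) that produces it."""
    index = {}
    for lemma, stem in lexicon.items():
        for slot, form in paradigm(lemma, stem).items():
            index.setdefault(form, []).append((lemma, slot))
    return index
```

With reverse_index({"λύω": "λύ", "εἰμί": None}), looking up "λύεις" returns [("λύω", "2sg")]. Ambiguous forms simply collect more than one entry in the list.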
“One might get one’s Greek from the very lips of Homer and Plato." "In which case they would certainly plough you for the Little-go. The German scholars have improved Greek so much.”

Joel Eidsath -- jeidsath@gmail.com

User avatar
opoudjis
Textkit Member
Posts: 116
Joined: Tue Oct 03, 2017 2:54 am

Re: Machine-readable Greek lexicon w/variant forms?

Post by opoudjis »

The TLG parser is closed source, and has not been shared with anyone; now that I no longer work at the TLG, I don't have access to it either. But it is the Morpheus code (and oh, what horrid code it was), plus 15 years of debugging, enhancement, prioritisation of competing analyses, validation against texts, digitisation and manual entry of lemmata, manual conflation of lemmata, and manual additions to inflection. In raw wordform count, I took recognition up from 72% to 98% of the TLG corpus. (And the remaining 2% is long tail stuff: proper names, geometric lines, and relatively few unrecorded lemmata.)
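The recognition figures above (72% to 98% by raw wordform count) are a coverage ratio: the number of words the parser recognizes divided by the total number of words. The post does not say whether "raw wordform count" means running tokens or distinct forms, so this hypothetical sketch computes both. It does not reproduce the TLG numbers. It only shows how such a figure could be measured against your own corpus and form list.

```python
from collections import Counter

def coverage(tokens, recognized):
    """Return (token_coverage, type_coverage) for a corpus against a set of known forms."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Token coverage: every occurrence counts, so frequent words dominate
    hits = sum(n for form, n in counts.items() if form in recognized)
    token_cov = hits / total if total else 0.0
    # Type coverage: each distinct form counts once, exposing the long tail
    known_types = sum(1 for form in counts if form in recognized)
    type_cov = known_types / len(counts) if counts else 0.0
    return token_cov, type_cov
```

Type coverage is usually much lower than token coverage. The unrecognized remainder tends to be rare words such as proper names, which matches the long tail described above.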

As of 2003, Perseus' morpheus was Not. Good. At. All; there were some real breakages there. (And oh, what horrid code it was.) Lots of variants of Morpheus have been improved on and open sourced since; I haven't evaluated them.

There is work on a Classics NLTK going on: http://cltk.org . Greg Crane, whose C code morpheus is (and oh, what horrid code it was), is advisor to them. They would be the likely place to coordinate any open source work.

User avatar
Peitho
Textkit Neophyte
Posts: 16
Joined: Fri Jun 03, 2016 12:36 am

Re: Machine-readable Greek lexicon w/variant forms?

Post by Peitho »

opoudjis wrote: There is work on a Classics NLTK going on: http://cltk.org . Greg Crane, whose C code morpheus is (and oh, what horrid code it was), is advisor to them. They would be the likely place to coordinate any open source work.

Thanks! I checked it out and it looks amazing! http://docs.cltk.org/en/latest/greek.html Hopefully it will work on my system; I will give it a try.

I’m curious if Perseus itself has any well-documented API, so I would be able to make queries without having the whole thing installed on my machine. I’ve been trying to learn Emacs lately, hoping I could maybe add some functionality to do Perseus-like things (read ancient Greek texts, look up words, follow citations, etc.), so if I could query Perseus easily that would be excellent. I will try CLTK first, though!

Post Reply