bedwere wrote: ↑Tue Apr 23, 2019 4:42 pm
It seems to work, although you might want to give more detailed instruction if you want more people to test it.
Thank you for the test. Do you use my other dictionaries, such as Dumesnil or Shumway? Don't you think they would benefit from being converted into this format?
bedwere wrote: ↑Tue Apr 23, 2019 4:42 pm
Given a dictionary pdf file downloaded from Google or archive.org, could you tell me step by step how to build a dictionary?
In theory it can be done manually, but in practice it requires programming because of the large amount of data. Two simple steps:
- make a list of keywords: you need pairs "keyword - page";
- convert them into XDXF or StarDict format.
Step 1 is the most laborious. Working on Popma's dictionary, I used the OCR.Space service because it deliberately marks the beginning of each page. Before that I tried to recognize page numbers from the text, but that is very unreliable. Then I used a regexp to catch the headwords of the articles. Finally, the resulting index has to be checked manually, preferably by several people independently.
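Step 1 can be sketched roughly as follows. This is only an illustration, not the script I actually used: it assumes the OCR output has already been split into one text string per page, and that a headword is a run of capital letters at the start of a line. The names `HEADWORD_RE`, `build_index` and `sample_pages` are made up for this example, and the regexp will need tuning for each dictionary.

```python
import re

# Assumption: a headword is two or more capital letters at the start of a line.
# Real dictionaries need a pattern tuned to their own typography.
HEADWORD_RE = re.compile(r"^([A-Z]{2,})\b", re.MULTILINE)

def build_index(pages, first_page=1):
    """Return (keyword, page) pairs from a list of per-page OCR texts."""
    index = []
    for page_no, text in enumerate(pages, start=first_page):
        for match in HEADWORD_RE.finditer(text):
            index.append((match.group(1), page_no))
    return index

# Invented sample text, only to show the shape of the input.
sample_pages = [
    "ACCVMBERE est ad mensam\nsecond line\nACCVSARE differt",
    "continued text\nADIRE est",
]

for keyword, page in build_index(sample_pages):
    print(keyword, "-", page)
```

The printed "keyword - page" list is what then needs to be proofread by hand.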
Step 2. I used XDXF because I have worked with it before and know it well (and I was thinking of transcribing the full text in the distant future). However, I am relying here on undocumented formatting that works in GoldenDict but may be incompatible with other dictionary shells. The most correct choice would be the StarDict format. To embed images, the text should contain:
<img src="/res/000093.png" height="100px"/>
Images have to be stored in the folder "res" located next to the dictionary file (res.zip is not supported). The "height" attribute produces a kind of thumbnail preview; the user then has to double-click it to see the full image.
My addition to this scheme is showing the full image on hover, which is why I had to write `article-style.css`. My format is as follows:
<img src="/res/000093.png" height="100px" alt="preview"/>
<span class="full"><img src="/res/000093.png"/></span>
Here the <span> with the full image is hidden by default, and a CSS rule makes it visible on hover (display: none/unset).
Example of XDXF dictionary with 1 record:
https://www.dropbox.com/s/f4y0hux6y1pgc ... w.zip?dl=0 .
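For those who prefer to see it in code, here is a minimal sketch of how such a file could be generated from the "keyword - page" pairs. It is not the script behind the linked sample. It assumes the classic XDXF "visual" layout (`<xdxf>` root, `<full_name>`, `<description>`, one `<ar>` with a `<k>` keyword per article), the language code "LAT", and image files named after the page number padded to six digits. The function name `build_xdxf` and the title are placeholders.

```python
import xml.etree.ElementTree as ET

def build_xdxf(index, title="Sample dictionary"):
    """Build an XDXF document (as a string) from (keyword, page) pairs."""
    # Assumption: root attributes follow the old XDXF "visual" format.
    root = ET.Element("xdxf", {"lang_from": "LAT", "lang_to": "LAT", "format": "visual"})
    ET.SubElement(root, "full_name").text = title
    ET.SubElement(root, "description").text = "Index of scanned pages"
    for keyword, page in index:
        # Assumption: page 93 is stored as res/000093.png.
        src = f"/res/{page:06d}.png"
        ar = ET.SubElement(root, "ar")
        k = ET.SubElement(ar, "k")
        k.text = keyword
        k.tail = "\n"
        # Thumbnail preview, then the hidden full-size image.
        thumb = ET.SubElement(ar, "img", {"src": src, "height": "100px", "alt": "preview"})
        thumb.tail = "\n"
        full = ET.SubElement(ar, "span", {"class": "full"})
        ET.SubElement(full, "img", {"src": src})
    return ET.tostring(root, encoding="unicode")

xml_text = build_xdxf([("accumbo", 93)])
print(xml_text)
```

The result would be saved as the dictionary file, with the page images in the "res" folder next to it. Since the image markup is undocumented, test the output in GoldenDict before generating a whole dictionary.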
Step 3*. Keywords should be normalized. This is optional, but it improves the experience of working with your dictionary. For example, Popma uses ijv-spelling, whereas the Latin hunspell dictionary used in GoldenDict comes with iuv-spelling. Verbs should be in the present active indicative 1st sg form, so "ACCVMBERE" needs to be changed to "accumbo", and so on.
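A rough sketch of such a normalization is below. It is an assumption-laden illustration, not a finished tool: it lowercases the headword, replaces j with i, and guesses that a v not followed by a vowel stands for u (capital headwords print V for both). That guess is wrong for some words (for example it yields "seruus"), so the output still needs checking. Changing a verb to its 1st sg form cannot be done mechanically here, so the sketch uses a hand-made lookup table; `LEMMA_OVERRIDES` and `normalize` are hypothetical names.

```python
import re

# Hypothetical hand-made table: normalized spelling -> preferred dictionary form.
LEMMA_OVERRIDES = {
    "accumbere": "accumbo",
}

def normalize(headword):
    """Lowercase a headword and move it towards iuv-spelling."""
    word = headword.lower().replace("j", "i")
    # Heuristic (assumption): v before a consonant or at word end is really u.
    word = re.sub(r"v(?![aeiouy])", "u", word)
    # Manual corrections take precedence over the mechanical result.
    return LEMMA_OVERRIDES.get(word, word)

for sample in ["ACCVMBERE", "JVS", "VINVM"]:
    print(sample, "->", normalize(sample))
```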
Two verses he could recollect // Of the Æneid, but incorrect.