How can I extract text from image respecting the distance between words?
Mostra commenti meno recenti
I am trying to extract text from an image of printed text. Along with the text itself, I am interested in detecting the spaces between the words as well. The spaces between words are not consistent (deliberate) and that is something I want detected.
To achieved this I first had to extract the text lines. I achieved that using the projection profile code attached (code copied from one of the answers by ImageAnalyst). The attached image shows the output of this code.

One way I thought of achieving this was by counting the number of white pixels between the words, if I know the number of pixels taken by a single space (say n), I could just determine the number of spaces by dividing the white pixels between the words by this 'n' to get the number of spaces.
I tried that but it did not go as planned, the results are very conflicting, even when compared against known ground truth values. Determining a baseline of every text line is proving to be difficult, for a single space between two words I am getting different pixel count. This is because as counting the white pixels from letter d to b is different from counting the white pixels from c to s (the white pixels within the curve of c is also sometimes counted.)
Any guidance or suggestions would be greatly appreciated.
Thank you
Risposta accettata
Più risposte (0)
Categorie
Scopri di più su Language Support in Centro assistenza e File Exchange
Prodotti
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!