I have a number of black-and-white scanned pages. To prepare them for
I have to split them into columns and rows. Additionally, somewhere in
are pictures, which also need to be separated.
So, in a page that might look like this:
Text1 Text4 Text6
Text2 Pict1 Text7
Text3 Text5 Pict2
I’d like to find the largest blocks of white which separate the texts
and pictures, both horizontally and vertically.
Right now, I would use RMagick with export_pixels_to_str and then
regular expressions to find the zeros, but I am not sure whether
there is a more effective way to do this.
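
To make the idea concrete, here is a minimal sketch of the search for white gaps, in plain Ruby without RMagick. It assumes the page has already been turned into an array of rows in which 0 is a white pixel and 1 is an ink pixel (invert the test if the export gives the opposite). The names row_profile, column_profile and longest_white_run are made up for this illustration and are not part of any library.

```ruby
# Hypothetical helpers for illustration only.
# Assumption: page is an array of rows, 0 = white pixel, 1 = ink pixel.

# Number of ink pixels in each row.
def row_profile(page)
  page.map { |row| row.sum }
end

# Number of ink pixels in each column.
def column_profile(page)
  page.transpose.map { |col| col.sum }
end

# Returns [start_index, length] of the longest run of empty
# (all-white) rows or columns in a profile, or nil if there is none.
def longest_white_run(profile)
  best = nil
  start = nil
  profile.each_with_index do |count, i|
    if count.zero?
      start ||= i
      length = i - start + 1
      best = [start, length] if best.nil? || length > best[1]
    else
      start = nil
    end
  end
  best
end

# Tiny made-up page: two white rows and two white columns in the middle.
page = [
  [1, 1, 0, 0, 1],
  [1, 0, 0, 0, 1],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [1, 1, 0, 0, 1],
]

row_gap = longest_white_run(row_profile(page))    # widest horizontal band of white
col_gap = longest_white_run(column_profile(page)) # widest vertical band of white
```

On a real scan the page margins would also show up as white runs, so they would probably have to be trimmed first, and the same search could then be repeated inside each block that results from a split.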
Do you have any suggestions?
Thank you very much,