Batch-converting JPEG files for OCR

Saturday, 8 March 2008

I was sent a whole bunch of .jpg files of scanned documents with text that I wanted to extract.

I have Microsoft Office Document Imaging (MODI) installed, so I was keen to use that to perform the OCR (instead of re-typing all the text!). The only problem is that MODI only understands TIFF and MDI formats.

I used ImageMagick to do the conversion. Its convert tool might sound like the best candidate, but mogrify did the job for me.

You can convert a whole lot of files using the following command:

mogrify -format tiff *.jpg

This creates a new TIFF file for each JPEG file. The only problem is that MODI doesn't like the particular flavour of TIFF generated. Fortunately, ImageMagick has 1001 options to configure exactly what you want to happen.

A bit of experimentation and I've found that the following extra options generate TIFF files that can be read without problems:

mogrify -format tiff -colorspace RGB -compress RLE *.jpg
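If you'd rather drive this from a script, here is a minimal sketch in Python that wraps the same command. The function names, the dry-run default and the explicit file list are my own assumptions for illustration; the mogrify options are exactly the ones shown above. It assumes mogrify is on the PATH, and only prints the command if it isn't.

```python
import glob
import shutil
import subprocess


def build_mogrify_command(jpeg_files):
    """Return the mogrify argument list from the post for the given files."""
    # Same options as the command above: TIFF output, RGB colorspace, RLE compression.
    return ["mogrify", "-format", "tiff", "-colorspace", "RGB",
            "-compress", "RLE", *jpeg_files]


def convert_jpegs(pattern="*.jpg", dry_run=True):
    """Build the command for every file matching pattern; run it unless dry_run."""
    files = sorted(glob.glob(pattern))
    if not files:
        return None  # nothing to convert
    command = build_mogrify_command(files)
    if dry_run or shutil.which("mogrify") is None:
        # Show what would be executed without touching any files.
        print(" ".join(command))
    else:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    convert_jpegs()
```

Call convert_jpegs(dry_run=False) once the printed command looks right. mogrify writes the .tiff files alongside the originals and leaves the JPEGs untouched.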

All good, except that I then discovered that the scanning was at such a low DPI that the OCR wasn't able to find any text :-(

Something else that sounds interesting is that MODI can be programmed against. Maybe I could automate this even more!

10 Years!

Friday, 7 March 2008

[Photo: David kissing Narelle on their wedding day]

It's hard to believe, but it really is 10 years ago today that this gorgeous woman said "I do"! Yes, Narelle and I are celebrating our 10th wedding anniversary.

In a lot of ways, it seems to have gone very quickly. It's like only yesterday we were going out and then engaged, flying interstate once a month to see each other (I maintain I kept Ansett afloat that year). Long distance relationships are very difficult, but the wait was worth it.

We were married in Sydney (which is where Narelle and her family were living at the time), so that meant there was a fair bunch of Adelaide family and friends who made the trek across. Not to mention Cathy from the USA. Then as a complete surprise, Mr & Mrs Rush and Mr & Mrs Badger decided they'd drive to Sydney just to see the wedding, then drive all the way home again on the same weekend (1,400km each way).

After the honeymoon, it was back to Adelaide. As if getting married isn't enough, Narelle also moved interstate!

Now Narelle's Mum & Dad have moved here to retire and help share the babysitting with my parents. Very handy, especially with G3 due in May.

It's not all beer and skittles mind you. Well I don't drink beer for one thing (strawberry milkshakes are more my cup of tea) though I think the kids do have some skittles in the toy cupboard. Seriously, there have been some tough times, but we've got through them, and they've been more than made up for by the good ones.

Well all I can say is if the first 10 years are any indication, I'm looking forward to the next 10, and the 10 after that, and the 10 after that...

TyTN II frozen

Thursday, 6 March 2008

I'm not sure what happened. It was time to leave work, so I un-cradled my HTC TyTN II phone. I noticed it was part-way through synching but that shouldn't matter.

I walked out to the bus-stop and waited, and waited, and waited. I had a meeting in the City, and when the bus eventually arrived I realised I'd be late, so I tried to ring ahead.

The phone was stuck.

I did a soft-reset, and after it restarted it got as far as saying it couldn't do a network connection (not sure why it was trying to do that), and then it would refuse to do anything else.

Further soft-resets didn't help either. Narelle tried to ring me, but I couldn't even answer her call, as none of the buttons had any effect, and the only way I could stop it ringing was to soft-reset it again.

I finally got home, and was able to look up the manual online. For future reference, to do a hard-reset:

  1. Press and hold the left SOFT KEY and the right SOFT KEY, and at the
    same time, use the stylus to press the RESET button at the bottom of
    your device.
  2. Release the stylus, but continue pressing the two SOFT KEYs until you
    see a special message on the screen.
  3. Release the two SOFT KEYs, and then press the button on your
    device.

That seems to have done the trick, though it does mean I'll have to re-install everything else again.

Hot Cross Buns

Monday, 3 March 2008

[Photo: Hot Cross Buns]

Here's something a bit different!

We're having a few people over for Hot Cross Buns on Good Friday. If you know me, then you are welcome to join us. Just drop me an email and I'll forward you the details.

Meeting with Paul Sherlock

Monday, 3 March 2008

I'd been thinking over the weekend that it would be polite to contact Paul Sherlock, the Director of UniSA's ISTS, to let him know that I'm not planning to move over to their new Learning and Teaching Systems team. However, he beat me to it! He rang to ask if he could drop by and have a chat. Paul has always been friendly and cordial when I've had anything to do with him previously, and today was no different.

I told him that I was planning to leave UniSA, and not move out to the new position at Mawson Lakes. I've already mentioned here previously that, given the way I see ISTS function, it is not a place that I see myself fitting into.

He asked about my plans for the future, and also enquired about handover plans.

Handover won't be easy, as the new team doesn't exist yet, and I'd guess it would be at least another month by the time they advertise and fill the other positions. The best I can do is try and get RJ up to speed on as many things as I can think of (as he's indicated interest in the new team), and maybe make use of that nice Wiki software we now have.

It isn't easy picking up other people's work. I experienced this first hand when Roger left our team last year. Picking up after me will be harder, as I've got 13 years of "legacy" applications behind me :-)

The meeting ended positively, with Paul wishing me well for the future. I appreciate that. You never know when your paths will cross again, so I think it's preferable to leave on good terms.