November 20, 2006

Technorati vs. Unicode

Believe it or not, the over-hyped Technorati still doesn't support Unicode queries, despite its claims to be the single best source on the state of blogging in the world.

Six months ago I wrote in this blog:

Just go and search for these three widely used words in Persian,
Arabic and Hebrew. You'll get ZERO results:
امروز
عراق
הארץ

How can Mr. Sifry's Technorati be "the authority on what's going on in the world of weblogs," when they can't even show a single result for queries in at least these three languages whose blogospheres could easily be as big as one million blogs in total?

David Sifry then replied:

Actually, it is because our parsing of farsi, Arabic, and Hebrew aren't very good right now. We're actually working on building out our search support for more languages, and these 3 are important. But it'll still be a while until things are much improved. Sorry...

Six months on, there has been no improvement. So I wonder how Technorati can allow itself to say anything about blogs written in non-Latin scripts such as these.

Posted by hoder at November 20, 2006 5:12 PM

Comments
Why don't you just make your own Persian-specific Technorati and reap the profits? The parsing would probably be rather easy to figure out. Hell, you could even add support for searching based on a romanized system (so امروز and emrooz would return the same results, for example).
- By: Thomas J. Webb on November 26, 2006