|
building a better local search tool [1102.1999]
I built the powerreporting.com search tool by modifying some code from Dr. Dobbs. (Not sure where to find it at the moment, but it was more or less like the engine described at webmonkey.) The problem is that it's an incredibly simplistic engine, and (at least in my version) it doesn't work very well.
I wish I could have the power and speed of something like ht://Dig or swish++ for searching, but still index the content in my databases without going through HTTP or the web. In fact, it would be really nice if I could just gather the data I want to index (who cares where it came from?) and then pass the actual work off to a "real" engine.
Well, perhaps I can do just that. ht://Dig has got the right idea: breaking the system into separate programs. In that system, "htdig" fetches (spiders?) web pages, "htmerge" or "htfuzzy" creates the index, and then "htsearch" retrieves results from the index and displays them on the web. (Of course, it would be even better if "htsearch" were broken into two pieces: one for the search, and one for the display.)
In fact, that's exactly what I want: generic interfaces for indexing and searching, independent of the search engine. I ought to be able to write a script to index a particular set of data, have it call any one of several drivers, all with identical interfaces, and thus use whatever indexer is available on the host machine. Then, later, when I go to do the search, there ought to be another isomorphic set of drivers to look up what I want. Furthermore, a search should return something like an ActiveX recordset or phplib's database object so that I can treat it just like any other query results (like, say, the results of a query against a database). That would give tools like Zebra the equivalent of reusable components like JavaBeans or ActiveX or whatever. They just wouldn't crash your computer!
I guess for right now I'll stick with a basic Perl indexer, but design it so that I can change the indexer and search engine later on with relative ease. |
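Here is a rough sketch of what that driver idea might look like. It's in Python rather than Perl, purely to keep it short, and every name in it (`IndexDriver`, `SearchDriver`, `MemoryEngine`, `get_driver`, `index_records`) is made up for illustration. The toy in-memory engine only stands in for a "real" one like ht://Dig or swish++; it doesn't reflect how either of those actually works. Search results come back as a plain list of rows, which is about as close to a recordset as a sketch needs to get.

```python
from abc import ABC, abstractmethod
import re


class IndexDriver(ABC):
    """Hypothetical interface every indexing backend would implement."""

    @abstractmethod
    def add(self, doc_id, fields):
        """Index one record; `fields` maps field names to text."""


class SearchDriver(ABC):
    """Hypothetical matching interface for lookups."""

    @abstractmethod
    def search(self, query):
        """Return a list of row dicts, like rows from a database query."""


class MemoryEngine(IndexDriver, SearchDriver):
    """Toy in-memory inverted index standing in for a real engine."""

    def __init__(self):
        self.docs = {}      # doc_id -> stored fields
        self.postings = {}  # word -> set of doc_ids containing it

    @staticmethod
    def _words(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, fields):
        self.docs[doc_id] = dict(fields)
        for text in fields.values():
            for word in self._words(text):
                self.postings.setdefault(word, set()).add(doc_id)

    def search(self, query):
        words = self._words(query)
        if not words:
            return []
        # AND semantics: keep only documents containing every query word.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [{"id": d, **self.docs[d]} for d in sorted(hits)]


# Registry of available drivers; a wrapper around an external engine
# would be registered here under its own name.
DRIVERS = {"memory": MemoryEngine}


def get_driver(name):
    """Look up a driver by name and return a fresh instance."""
    return DRIVERS[name]()


def index_records(driver, records):
    """Feed (doc_id, fields) pairs to any driver; the data's origin is irrelevant."""
    for doc_id, fields in records:
        driver.add(doc_id, fields)


if __name__ == "__main__":
    engine = get_driver("memory")
    index_records(engine, [
        (1, {"title": "Campaign finance data"}),
        (2, {"title": "Census data tips"}),
    ])
    for row in engine.search("census data"):
        print(row["id"], row["title"])
```

The point of the split is that the indexing script only ever talks to `add` and the search page only ever talks to `search`, so swapping engines later should mean writing one new driver class and changing the name passed to `get_driver`.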