Strange XML Issue
2009-07-24 14:35:54 UTC
I've got index server cataloguing a directory containing XML files. If
I query the catalogue using a phrase containing the word "of", for
example "Raiders of the Lost Ark", it will not find an XML file
containing this phrase. However, if I rename the XML file to .txt
or .html it has no problem finding the file. It appears to be the same
for any phrase containing "of".

I've tried removing "of" from the noise words list, but that has no
effect.

I checked in the registry to see what filter DLL was being used
for .txt and .xml files, and they both use the same one.

I've replicated this on two servers running Windows Server 2003 and a
PC running XP.

I've no idea what the cause could be. It doesn't make much sense.

Anyone any ideas?
Marc J. Cawood
2009-08-06 05:38:55 UTC
Perhaps the filter can't parse the XML file. Have you tried a search
by the filename to see if it's in the catalog at all?
Next, I would validate the XML, and then I would check that the words
you are searching for aren't in a comment or an XML attribute.
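
Both checks can be scripted outside Index Server. The following is only a
rough sketch in Python (3.8 or later), not anything Index Server provides:
the function name locate_phrase and the sample document are made up for
illustration. It parses the XML, which raises an error if the file is not
well-formed, and then reports whether the phrase occurs in element text, in
an attribute value, or in a comment. Note that this checks well-formedness
only, not validity against a schema or DTD, and that comments outside the
root element are not inspected.

```python
import xml.etree.ElementTree as ET


def _norm(value):
    # Lower-case and collapse whitespace so line breaks inside the XML
    # do not prevent a phrase from matching.
    return " ".join((value or "").lower().split())


def locate_phrase(xml_text, phrase):
    """Return a set drawn from {"text", "attribute", "comment"} naming
    where the phrase occurs. Raises ET.ParseError if the XML is not
    well-formed. (Hypothetical helper, for illustration only.)"""
    # insert_comments=True keeps comments inside the root element in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)

    needle = _norm(phrase)
    found = set()
    for node in root.iter():
        is_comment = node.tag is ET.Comment
        if needle in _norm(node.text):
            found.add("comment" if is_comment else "text")
        # Tail text follows a node but belongs to the parent's content.
        if needle in _norm(node.tail):
            found.add("text")
        for value in node.attrib.values():
            if needle in _norm(value):
                found.add("attribute")
    return found


# Made-up sample document: the phrase appears only in an attribute.
sample = (
    '<movie title="Raiders of the Lost Ark">'
    "<!-- king of the hill -->"
    "<plot>A tale of adventure</plot>"
    "</movie>"
)
print(locate_phrase(sample, "Raiders of the Lost Ark"))
```

To run it against a real file, read the file's contents into a string and
pass that in place of sample. If the result contains only "attribute" or
"comment", that would be consistent with the filter not indexing the phrase
as body text; an empty set means the phrase was not found at all.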