One 'funky' catalog!
Bob
2006-09-26 15:21:47 UTC
This problem will surely tax the brains of all of the experts on this list.
We have had an Index Server catalog running for three years on Windows 2000
SP4. It has slowly built up over time to 600,000+ documents, all in a dated
folder scheme like 2006/09/30, i.e. by year, then by month, then by day. Each
daily folder contains about 300 HTML and JPG files.

We index the html documents using metatags and the 'htmlprop.dll' tag
filter.

Up until a couple of months ago, it ran smoothly. Each day a new folder
was dropped in, IS would come in, take about two minutes to index it, and
that day's HTML files were then available to our web application.

Then, one day, it started to act very funny: when we added a day to the
catalog tree, IS would take hours (it's set as a 'dedicated server') to do
'master merge' after master merge, non-stop. If you stopped and restarted
IS, it would go back to normal.


After a couple of weeks of this, I tore down the catalog completely and
re-created the web catalog using just a few days' worth of data, then
gradually added files back in to be indexed again. After only a month of
data had been added, it started going 'funky' again.

You would add a day's worth of files, then IS (cisvc.exe) would start going
back over ALL the files in the catalog before it came back and answered
queries, sometimes a couple of hours later. Then we found that, for the
month of days that we had added, IS had NOT actually indexed all of the
content. Now, whenever you add a few days, IS indexes normally, but
sooner or later it goes into this mode where it seemingly 're-indexes' ALL
of the files. It doesn't really, though: it misses some anyway.
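
One way to pin down exactly which files were missed is to compare what is on disk with a list of paths the catalog actually returns. The following is only a minimal sketch, not anything Index Server provides: it assumes you have already exported the indexed paths to a plain list by some means (for example from a catalog query), and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

HTML_EXTS = {".html", ".htm"}  # assumption: only these extensions are indexed


def normalize(path):
    # Windows paths are case-insensitive, so compare lower-cased
    # with forward slashes.
    return str(path).replace("\\", "/").lower()


def html_files_on_disk(root):
    # Walk the yyyy/mm/dd tree and return HTML paths relative to the root.
    root = Path(root)
    return [
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_EXTS
    ]


def missing_from_index(disk_paths, indexed_paths):
    # Files present on disk that do not appear in the exported index list.
    indexed = {normalize(p) for p in indexed_paths}
    return sorted(p for p in disk_paths if normalize(p) not in indexed)


def missing_per_day(missing):
    # Group the missing files by their yyyy/mm/dd folder.
    return Counter("/".join(normalize(p).split("/")[:3]) for p in missing)
```

Running `missing_per_day(missing_from_index(html_files_on_disk(root), indexed))` gives a per-day count, which should show whether the gaps cluster in particular daily folders or are spread across the whole month.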

I have been using NT Filemon to watch cisvc.exe open files, so I can see which
files it is hammering on at any point in time.


To demonstrate even funkier behaviour: if you throw the catalog into 'Query
Only' mode when it goes into one of its funks, then 'start' it again later,
it starts back at the beginning of the catalog, looking at ALL the files all
over again.

Does anybody have any suggestions at all? I am basically at wit's end and
have run out of things to try. Thanks in advance to anyone who answers.
Hilary Cotter
2006-09-26 21:07:36 UTC
Are there any messages in the event viewer from cisvc, such as catalog
corruption? Also, are you setting the allowenumeration switch to false?
There is a condition where, if you don't set this to true, the catalog hangs
in a manner similar to what you describe.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
Bob
2006-09-27 16:56:31 UTC
No, strangely enough there is no indication of anything like that.
isallowennum is set to 1, which I think is true. Right now, for example,
the catalog is 'scanning' again even though it's been through all the
documents several times. It's almost as if a virus has crept into the file
system. I did find a couple of 'Backdoor agent' viruses, but am not sure
whether they are serious enough to cause this?