Discussion:
Indexing service ignores most files in directory
cushlomokree
2006-10-29 21:07:54 UTC
Hi,
I am running Windows Server 2003 SP1.
I'm developing a reference website with approximately 12000 .htm files in
the directory.
I created a new catalog in Computer Management under Indexing Service.
Added the directory on my local drive containing the development website and
excluded some subdirectories that contain unwanted material.
When I restart the service it only indexes 43 out of 12110 *.htm files.
I have "tuned" the service, loosened the folder's permissions until almost
anyone on earth has access, and deleted the catalog and retried...
still only 43 files get indexed (~1 MB).
Any help would be greatly appreciated.
Mike
WenJun Zhang[msft]
2006-10-30 07:49:54 UTC
Hi Mike,

If a large number of files need to be indexed when the catalog is first
created, Indexing Service may take quite a long time to finish scanning
all of them. This is expected behavior. Sometimes the full contents show up
only a couple of hours later.

Please open Computer Management and expand the Indexing Service snap-in. Check
the 'Docs to Index' number to see whether a large number of documents are
still waiting to be indexed. Also check whether the status is 'Indexing Paused
(User Active)'. Indexing Service generally scans at full speed only when
neither the interactive user nor the system is busy. Therefore, to let it
finish indexing as soon as possible, you can stop and restart Indexing
Service and then leave the mouse and keyboard alone. Wait for the 'Docs to
Index' number to decrease to 0. After that, all the web pages should be
returned properly by queries.

Please feel free to let me know if the problem still persists.

Have a nice day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support

==================================================

Get notification to my posts through email? Please refer to:
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at:

http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.
cushlomokree
2006-10-30 15:45:43 UTC
Hi WenJun,

Nothing has changed overnight.
Total docs=43; Docs to Index=0, not paused, Saved Indexes=1, Word Lists=0,
Size=1Mb, Status=Started
Please note that the folder in question is a web site in FrontPage 2003 as
well as in Visual Studio 2005, and it is listed as a web site (not the default
one) in the IIS snap-in in Computer Management.
On the IIS Properties dialog for this web site, on the 'Home Directory' tab,
the checkboxes 'Read' and 'Index this resource' are checked.
I think there may be some conflict that I am not aware of.
As an experiment, I just copied all the content from the web site folder
into a new temporary folder and successfully created a new, complete
catalog (docs=12111) there with the Indexing Service.
The indexing will ultimately be used as a "search function" for this site,
so I need to get the indexer working for the web site.
Thanks for your help,
Mike
AC
2006-10-31 01:59:09 UTC

Could changing the archive attribute of the files force it to re-index the
files?

Regards
WenJun Zhang[msft]
2006-10-31 11:41:12 UTC
Hi Mike,

The question is why the total document count is only 43. Please compare the
scope (directory) configuration of the problematic catalog with that of the
new test catalog that works. Is there any obvious difference between them?

If you cannot figure out where the problem is, please export the following
registry key and send it to me at: ***@online.microsoft.com
(remove.). I will help review it for clues.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs
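
Besides exporting the key with regedit, the same information can be read with Python's standard `winreg` module. A minimal sketch, assuming it runs on the Windows server itself with read access to HKEY_LOCAL_MACHINE; it collects each catalog's `Scopes` values (the path/data pairs that appear in Mike's dump) and does not try to interpret the data strings.

```python
import sys
from typing import Dict, List, Tuple

# Key path quoted in the post above, relative to HKEY_LOCAL_MACHINE.
CATALOGS_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs"


def dump_catalog_scopes() -> Dict[str, List[Tuple[str, str]]]:
    """Return {catalog name: [(scope path, value data), ...]}.

    Windows only; needs read access to HKEY_LOCAL_MACHINE.
    """
    if sys.platform != "win32":
        raise OSError("this registry key only exists on Windows")
    import winreg  # standard library, Windows builds only

    result: Dict[str, List[Tuple[str, str]]] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CATALOGS_KEY) as catalogs:
        index = 0
        while True:
            try:
                name = winreg.EnumKey(catalogs, index)
            except OSError:  # raised when there are no more subkeys
                break
            index += 1
            scopes: List[Tuple[str, str]] = []
            try:
                with winreg.OpenKey(catalogs, name + r"\Scopes") as key:
                    value_index = 0
                    while True:
                        try:
                            path, data, _kind = winreg.EnumValue(key, value_index)
                        except OSError:  # no more values
                            break
                        scopes.append((path, str(data)))
                        value_index += 1
            except FileNotFoundError:
                pass  # catalog without a Scopes subkey
            result[name] = scopes
    return result


if __name__ == "__main__" and sys.platform == "win32":
    for catalog, scopes in dump_catalog_scopes().items():
        print(catalog)
        for path, data in scopes:
            print(f"    {path} = {data}")
```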

I look forward to your message. Have a good day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support

cushlomokree
2006-10-31 18:47:55 UTC
Hi WenJun,
Here are the registry entries. There are 3 catalogs: "rh06", "system", and
"test"
"rh06" is the website that is not indexing properly, and "test" is the copy
in another directory that is indexing correctly.

Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:21 AM

Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\rh06
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:21 AM
Value 0
Name: Location
Type: REG_SZ
Data: D:\MV\rh06\html\htmlhelp\searchcat

Value 1
Name: IsIndexingW3Svc
Type: REG_DWORD
Data: 0

Value 2
Name: IsIndexingNNTPSvc
Type: REG_DWORD
Data: 0


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\rh06\Scopes
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:22 AM
Value 0
Name: D:\MV\rh06\html\htmlhelp\htmlROBO
Type: REG_SZ
Data: ,,5


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\System
Class Name: <NO CLASS>
Last Write Time: 12/7/2004 - 2:26 PM
Value 0
Name: Location
Type: REG_SZ
Data: C:\System Volume Information

Value 1
Name: IsIndexingW3Svc
Type: REG_DWORD
Data: 0

Value 2
Name: IsIndexingNNTPSvc
Type: REG_DWORD
Data: 0


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\System\Scopes
Class Name: <NO CLASS>
Last Write Time: 10/30/2006 - 9:20 AM
Value 0
Name: D:\Documents and Settings
Type: REG_SZ
Data: ,,4

Value 1
Name: C:\
Type: REG_SZ
Data: ,,4

Value 2
Name: D:\
Type: REG_SZ
Data: ,,4

Value 3
Name: D:\Documents and Settings\*\Application Data\*
Type: REG_SZ
Data: ,,4

Value 4
Name: D:\Documents and Settings\*\Local Settings\*
Type: REG_SZ
Data: ,,4

Value 5
Name: D:\MV\rh06\html\htmlhelp\htmlROBO
Type: REG_SZ
Data: ,,5


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\test
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:18 AM
Value 0
Name: Location
Type: REG_SZ
Data: D:\MV\rh06\html\backup\searchcat

Value 1
Name: IsIndexingW3Svc
Type: REG_DWORD
Data: 0

Value 2
Name: IsIndexingNNTPSvc
Type: REG_DWORD
Data: 0


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\test\Scopes
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:19 AM
Value 0
Name: D:\MV\rh06\html\backup\RH06
Type: REG_SZ
Data: ,,5

thanks,
Mike
WenJun Zhang[msft]
2006-11-01 09:01:29 UTC
Hi Mike,

I see that the problematic catalog and your test catalog have different
paths as their scopes:

rh06 Catalog:

D:\MV\rh06\html\htmlhelp\htmlROBO

test Catalog:

D:\MV\rh06\html\backup\RH06

This looks like the cause of the problem. Please double-check their
directory settings. Also, does the \htmlhelp\htmlROBO directory contain only
43 documents? If you also add the \backup\RH06 directory to the rh06 catalog
and restart Indexing Service, is the document count reasonable?
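
The scope comparison can also be done mechanically. A minimal sketch, assuming each catalog's Scopes values have been collected into a dict of {path: value data} (for example from a registry export); it compares paths case-insensitively, as Windows does, and does not try to interpret data such as `,,5`, it only reports where the two catalogs differ.

```python
import ntpath
from typing import Dict, List


def compare_scopes(first: Dict[str, str],
                   second: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two catalogs' scopes given as {scope path: registry value data}."""
    # ntpath.normcase lowercases and uses backslashes on any platform.
    a = {ntpath.normcase(p): d for p, d in first.items()}
    b = {ntpath.normcase(p): d for p, d in second.items()}
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "same_path_different_data": sorted(
            p for p in a.keys() & b.keys() if a[p] != b[p]
        ),
    }
```

Fed the two Scopes entries from Mike's dump (`D:\MV\rh06\html\htmlhelp\htmlROBO` and `D:\MV\rh06\html\backup\RH06`, both `,,5`), it reports one path unique to each catalog and identical value data, which is consistent with Mike's reply that the directory settings for both are the same.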

Have a nice day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support

cushlomokree
2006-11-01 14:43:09 UTC
Hi WenJun,
D:\MV\rh06\html\htmlhelp\htmlROBO (rh06 Catalog) is the problem directory.
It contains 12113 files, but the indexer only "sees" 43 files.
D:\MV\rh06\html\backup\RH06 (test Catalog) is a backup copy of the above
directory. It also contains 12113 files, but the indexer "sees" all 12113
files.
Directory settings for both are the same.

--> If you also include \backup\RH06 directory to rh06 catalog
and restart IS service, will the doc number be reasonable?<--

Yes, if I add that directory to the rh06 catalog, the indexer "sees" all the
files.

However that just brings us back to the original problem:
Why does the indexer not see all the files in the original directory?

Thanks,
Mike
WenYuan Wang
2006-11-03 12:12:11 UTC
Hi Mike

I'm Wen-Jun's backup.
Wen-Jun is on vacation these days.
He will be back next week and will reply here.
If you have any further concerns, please feel free to post here.

Sincerely,
WenYuan
WenJun Zhang[msft]
2006-11-07 12:44:31 UTC
Hi Mike,

Since the current situation is a little strange, my suggestion is to
rebuild the old catalog and directory as a test. Please follow the steps
below:

1) Stop Indexing Service.

2) Rename the directory htmlROBO and create a new one under
D:\MV\rh06\html\htmlhelp\.

3) Copy all the files from the renamed directory to the new
D:\MV\rh06\html\htmlhelp\htmlROBO\

4) Select all files in the new directory and open
Properties->General->Advanced. Make sure the option 'File is ready for
archiving' is not selected and 'For fast searching, allow Indexing Service
to index this file' is selected.

5) Delete the problematic catalog in Indexing Service snap-in. Also make
sure its corresponding hidden catalog.wci directory is deleted.

6) Recreate a new catalog with the same name and include the directory:
D:\MV\rh06\html\htmlhelp\htmlROBO\

7) Start Indexing Service to allow it to crawl the new data folder.
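
Step 4 can be checked for thousands of files at once instead of through the Properties dialog. A minimal sketch, assuming Python on the server itself and assuming the usual mapping: 'File is ready for archiving' is the archive attribute, and 'allow Indexing Service to index this file' is checked when FILE_ATTRIBUTE_NOT_CONTENT_INDEXED is clear. `os.stat` only exposes `st_file_attributes` on Windows, so elsewhere every file reads as 0 (indexable).

```python
import os
import stat
from typing import Dict


def step4_status(attributes: int) -> Dict[str, bool]:
    """Map a Windows attribute mask to the two checkboxes named in step 4."""
    return {
        # 'File is ready for archiving' -> archive bit (0x20).
        "ready_for_archiving": bool(attributes & stat.FILE_ATTRIBUTE_ARCHIVE),
        # 'Allow Indexing Service to index this file' -> NOT_CONTENT_INDEXED
        # bit (0x2000) is clear.
        "allow_indexing": not (attributes & stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED),
    }


def count_excluded_files(root: str) -> int:
    """Count files under root whose indexing checkbox is cleared."""
    excluded = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
            attrs = getattr(st, "st_file_attributes", 0)
            if not step4_status(attrs)["allow_indexing"]:
                excluded += 1
    return excluded
```

Running `count_excluded_files(r"D:\MV\rh06\html\htmlhelp\htmlROBO")` before and after the rebuild would show whether the per-file setting is what keeps most documents out of the catalog.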

Let's see if it works this time. If it still does not succeed, I will ping our
Indexing Service product group for further troubleshooting suggestions.

Have a great day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support

Alec MacLean
2007-01-17 19:07:02 UTC
I had been experiencing the same problem as Mike, but now have results
coming through after following this change...
Post by WenJun Zhang[msft]
4) Select all files in the new directory and open
Properties->General->Advanced, make sure option 'File is ready for
archiving' isn't selected and 'For fast searching, allow Indexing Service
to index this file' is selected.
What really caught my eye was the "For fast searching, allow Indexing
Service to index this file".

I had already specified "Index this folder" in IIS Manager, as well as
setting up the catalog in Index Server. I had seen some change in the word
lists and saved indexes after forcing a rescan of the directory, but this
hadn't helped the search results.

Only after going to Windows Explorer and selecting the folder the site is
physically located in, e.g. D:\website (which is not the default of the
C:\InetPub due to use of RAID and separate drive volumes, etc.) and checking
that box did the catalog start to behave properly.
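
Ticking that box on a folder in Explorer (and applying it to subfolders and files) can also be scripted. A minimal sketch, assuming Python on the Windows server and assuming the checkbox corresponds to clearing FILE_ATTRIBUTE_NOT_CONTENT_INDEXED; it calls the Win32 `SetFileAttributesW` function through `ctypes`, and the root you pass (for example `D:\website`) is up to you.

```python
import os
import stat
import sys

NOT_INDEXED = stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED  # 0x2000

# Attributes SetFileAttributesW accepts; others (e.g. DIRECTORY) are dropped.
SETTABLE = (stat.FILE_ATTRIBUTE_ARCHIVE | stat.FILE_ATTRIBUTE_HIDDEN
            | stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | stat.FILE_ATTRIBUTE_OFFLINE
            | stat.FILE_ATTRIBUTE_READONLY | stat.FILE_ATTRIBUTE_SYSTEM
            | stat.FILE_ATTRIBUTE_TEMPORARY)


def allow_indexing_mask(attributes: int) -> int:
    """Attribute mask to write so that indexing is allowed (0x2000 cleared)."""
    new = attributes & SETTABLE & ~NOT_INDEXED
    # FILE_ATTRIBUTE_NORMAL is only valid on its own, as the "nothing set" value.
    return new or stat.FILE_ATTRIBUTE_NORMAL


def allow_indexing_recursively(root: str) -> int:
    """Clear NOT_CONTENT_INDEXED on root and everything below it.

    Windows only. Returns how many files and folders were changed.
    """
    if sys.platform != "win32":
        raise OSError("this sketch only works on Windows")
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)

    changed = 0
    for path in paths:
        current = os.stat(path).st_file_attributes
        if not current & NOT_INDEXED:
            continue  # indexing already allowed
        if not kernel32.SetFileAttributesW(path, allow_indexing_mask(current)):
            raise ctypes.WinError(ctypes.get_last_error())
        changed += 1
    return changed
```

Only the bit is changed here; the catalog still has to rescan the directory afterwards before the files show up in queries.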

Thanks to WenJun Zhang for this snippet of absolutely crucial info! But why
was it required at all? IMHO the other, more obvious settings should either
have controlled it or overridden it!

Al