Discussion:
searching for "4.1" works, but "4.1.1" returns no results
Kevin Blount
2007-03-27 18:32:29 UTC
For my company's site I have removed all numbers from the "noise.enu"
file so that we can search through our documents for version numbers
of our software.

The problem is that I can search for the major and first minor version
numbers, such as 3.3 or 4.1, but if I try to search for second minor
numbers as well, e.g. 4.1.1 or 3.2.3, the search returns no results. I
know documents exist on our site with 4.1.0 and 4.1.1 in them, but
they won't appear in the results.

I'm wondering if there is a 2 period/fullstop limit or some other
combination of rules that "4.1.1" breaks? Any ideas?

(btw, this is a repeat post, as the only reply I got last time (Nov
2005) was basically an advertisement for 'Coveo' and I'm really
looking for a solution or workaround for Index Server at this stage
(though I have tasked one of my team to look at Coveo))
Gang_Warily
2007-03-29 11:38:01 UTC
Hi Kevin

Try searching for "NN4d1" & "NN4d1d1" ?
Not sure how ...

I actually wanted to find 'NN1' because Northamptonshire UK postcodes are
like 'NN1 4JQ' and stumbled on the article below.

It seems it stores numbers as words as well.

Eric

http://msdn2.microsoft.com/en-gb/library/ms693168.aspx

When you create a word breaker, it is recommended that the word breaker
normalize numbers to a canonical representation by using the pattern
"NNddDcc," where "NN" is the literal sequence "NN," dd is the integer portion
of the number, "D" is the literal "D," and cc is the fractional portion of
the number. Word breakers do not restrict the number of digits for either the
integer or the fraction portion of the number. It is recommended that word
breakers recognize numerical patterns that are delimited by both periods (.)
and commas (,). For example, Indexing Service represents both "1,000.2" and
"1.000,2" as "NN1000D2."
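
As a rough illustration of the canonical form the article describes, here is a minimal Python sketch. The function name normalize_number is hypothetical, and this is not Indexing Service's actual word-breaker code. It assumes the last period or comma in the input is the decimal separator and that a number with no separator is emitted as "NN" plus its digits; neither assumption is confirmed by the article.

```python
import re


def normalize_number(text: str) -> str:
    """Return an 'NN<integer>D<fraction>' token for a numeric string.

    Assumption (not from the article): the LAST '.' or ',' is the
    decimal separator; any earlier separators are digit grouping.
    """
    if not re.fullmatch(r"\d+(?:[.,]\d+)*", text):
        raise ValueError(f"not a plain number: {text!r}")

    # Index of the last separator, or -1 if the number has none.
    last_sep = max(text.rfind("."), text.rfind(","))
    if last_sep == -1:
        # Assumption: an integer is emitted without a 'D' part.
        return "NN" + text

    # Drop grouping separators from the integer portion.
    integer_part = re.sub(r"[.,]", "", text[:last_sep])
    fraction_part = text[last_sep + 1:]
    return f"NN{integer_part}D{fraction_part}"


print(normalize_number("1,000.2"))  # NN1000D2
print(normalize_number("1.000,2"))  # NN1000D2
print(normalize_number("4.1"))      # NN4D1
```

The first two calls reproduce the article's own example, where both "1,000.2" and "1.000,2" map to "NN1000D2".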
Kevin Blount
2007-03-29 19:16:38 UTC
Thanks for the reply. I did a test using "NN4d1" and it returned the
same results as a search for "4.1", which is a good start. However,
searching for "NN4d1d1" returned no results, which, unfortunately, is
the same as searching for "4.1.1".

I guess this means that the word breakers are working, but that a
double period/full stop still isn't usable.

You've given me something else to work on, and for that I'm grateful.

Anyone else have any ideas I can check out as well?
Gang_Warily
2007-03-30 08:52:00 UTC
Post by Gang_Warily
Indexing Service represents both "1,000.2" and "1.000,2" as "NN1000D2."
One might guess that 4.1.1 would be represented as NN41d1 ?
There could be conflicts:
Do you ever get to 11.1.1 = NN111d1 = 1.11.1 ?
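
To make the possible conflict concrete, here is a small Python sketch of the guess above. The rule it encodes (only the last period counts as the decimal point, earlier periods are dropped) is purely hypothetical and has not been confirmed against Indexing Service; guessed_token and find_collisions are made-up names, and every input is assumed to contain at least one period.

```python
from collections import defaultdict


def guessed_token(version: str) -> str:
    # Hypothetical rule: keep only the last period as the decimal
    # point and drop the earlier ones, e.g. "4.1.1" -> "NN41d1".
    head, _, tail = version.rpartition(".")
    return "NN" + head.replace(".", "") + "d" + tail


def find_collisions(versions):
    """Group version strings by guessed token; return shared tokens."""
    by_token = defaultdict(list)
    for version in versions:
        by_token[guessed_token(version)].append(version)
    return {tok: vs for tok, vs in by_token.items() if len(vs) > 1}


print(find_collisions(["4.1.1", "11.1.1", "1.11.1", "3.2.3"]))
# {'NN111d1': ['11.1.1', '1.11.1']}
```

If the guess were right, a search for either 11.1.1 or 1.11.1 would return documents containing the other one as well.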

You might have to write your own script to search the contents of the
returned files, to 'disambiguate'!
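
A minimal sketch of such a post-filter in Python, assuming the search has already returned a list of file paths and that the files can be read as plain text (Word or PDF documents would need a text-extraction step first). The names version_pattern, text_mentions_version and filter_hits are hypothetical.

```python
import re
from pathlib import Path


def version_pattern(version: str) -> re.Pattern:
    # Match the exact version only: not preceded by a digit or period
    # (rules out "14.1.1" and "3.4.1.1") and not followed by another
    # digit or ".digit" (rules out "4.1.10" and "4.1.1.2").
    return re.compile(r"(?<![\d.])" + re.escape(version) + r"(?!\.?\d)")


def text_mentions_version(text: str, version: str) -> bool:
    return version_pattern(version).search(text) is not None


def filter_hits(paths, version):
    """Keep only the files whose text contains the exact version."""
    kept = []
    for path in paths:
        # Assumption: files are plain text; undecodable bytes are skipped.
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        if text_mentions_version(text, version):
            kept.append(path)
    return kept
```

For example, filter_hits(paths_from_search, "4.1.1") would keep a file containing "Fixed in 4.1.1." but drop one that only mentions "4.1.10".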

I don't know how easy it is to write your own word-breaker,
or get one written for you ?
Some instructions exist around the article linked above.

I'm guessing you can't change the version code to 4.a.1 ?

There are diagnostic tools ifilttst.exe & filtdump.exe available
(Windows 2003 Platform SDK) -- search for 'test an ifilter' ...

Good luck !
