Discussion:
Display links in search results?
Gang_Warily
2006-11-01 17:36:12 UTC
Hi

I would like to retrieve the URLs that each page links to and display them
in a page of search results.
http://windowssdk.msdn.microsoft.com/en-us/library/ms692942.aspx
says 'A_HRef - corresponds to the Indexing Service property name HtmlHRef.
Can be queried but not retrieved.'

Is there a way to change it so that it can be retrieved?
I've tried adding it to the secondary cache as 'VT_LPWSTR 25' and
'VT_VARIANT 40', but I just get null values back [VarType()=1, i.e. Null].
(The same approach works for keywords ...)

Would changing the Registry value
for Properties : Modifiability to 1 help?
http://windowssdk.msdn.microsoft.com/en-us/library/ms692124.aspx
describes it as:
'Modifiability of the cached property.
A value of 0 signifies that the property entry can't be modified once cached,
and a value of 1 signifies that the other parts of the cached property
(such as SizeInBytes)
can be modified once the property is cached.'

Thanks for your help!
Gang_Warily
2006-11-02 11:19:02 UTC
Hi again

There are other fields that I would like to retrieve:
70eb7a10-55d9-11cf-b75b-00aa0051fe20 img.alt
c82bf597-b831-11d0-b733-00aa00a1ebd2 img.src
c82bf597-b831-11d0-b733-00aa00a1ebd2 table.background
c82bf597-b831-11d0-b733-00aa00a1ebd2 tr.background
c82bf597-b831-11d0-b733-00aa00a1ebd2 td.background
All of them come back as null in the retrieved recordset, even though I can
search on these fields and find matching entries!
The background attribute on TABLE, TR and TD is non-standard HTML,
so I presume the IFilter is just scraping whatever attributes it finds, in
an ad-hoc manner.

Strangely,
c82bf597-b831-11d0-b733-00aa00a1ebd2 meta.url
does retrieve values!

Even custom properties
<META content="test" name=xkeywords>
are retrievable as
d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 xkeywords

Can anyone suggest why some properties are not retrievable?
'Content' is understandable: it's a compound property, constructed from
other text properties.

I should perhaps mention that I am trying to use the Indexing Service
catalog as a data source, to import content from static pages into a CMS
database ...

I would rather not have to open every file and extract the tags with a
regular expression ...
Hilary Cotter
2006-11-08 02:21:14 UTC
Some properties are only searchable and not retrievable. For example, you can
test whether their value matches the one you are looking for, but you can't
display it.
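
If those values are needed for the CMS import, one fallback is to read them
from the files directly. The following is only a minimal sketch, assuming
Python and its standard html.parser module rather than anything in Indexing
Service; the names WANTED, TagValueCollector and extract_values are made up
for illustration. It uses an HTML parser instead of a regular expression and
collects the attributes mentioned in this thread.

```python
from html.parser import HTMLParser

# (tag, attribute) pairs discussed in this thread; extend as needed.
# HTMLParser lowercases tag and attribute names before calling handlers.
WANTED = {
    ("a", "href"),
    ("img", "alt"),
    ("img", "src"),
    ("table", "background"),
    ("tr", "background"),
    ("td", "background"),
}


class TagValueCollector(HTMLParser):
    """Hypothetical helper: gathers the wanted attribute values per page."""

    def __init__(self):
        super().__init__()
        self.values = {}  # e.g. {"a.href": ["page2.htm", ...]}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attributes without a value (e.g. <td nowrap>) arrive as None.
            if (tag, name) in WANTED and value is not None:
                self.values.setdefault(f"{tag}.{name}", []).append(value)


def extract_values(html_text):
    """Return a dict mapping 'tag.attribute' to the list of values found."""
    collector = TagValueCollector()
    collector.feed(html_text)
    collector.close()
    return collector.values


if __name__ == "__main__":
    sample = '<TABLE background="bg.gif"><TR><TD><A HREF="page2.htm">next</A></TD></TR></TABLE>'
    print(extract_values(sample))
```

The catalog can still be used to find which files match a query; a script
like this would then only need to open the files that the query returns.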
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com