Search
Overview
The Netscape Web Publisher search function provides
you with the ability to search the file information and contents of documents
on a remote server. Server documents can be in a variety of formats, such
as HTML, Microsoft Word, Adobe PDF, and WordPerfect. The server converts
many types of non-HTML documents into HTML as it indexes them so that you
can use your web browser to view the documents that are found for your
search.
You can search through server documents for a
specific word or attribute value, obtaining a set of search results that
list all documents that match the query. You can then select a document
from the list to browse it in its entirety. This provides easy access to
server content.
There are four parts to text searching:
-
making a query--you enter your search criteria.
-
displaying search results--the server displays a list of the documents
that match your criteria.
-
viewing a document--you can view a specific highlighted document from the
search results list.
-
viewing the contents of a document information collection--you can look
at the information that is maintained for each of your collections.
Search home page
There is a search home page, at http://search-ui/examples,
that provides individual links to the search query interfaces, samples
of search input and output, and a brief tutorial on how your server administrator
can customize the interface.
Preparing data for searching
To enable searching capability on your server, the
server administrator begins by identifying the documents that you want
to be able to search. Before you can execute searches, you need a database
of searchable data against which you can target your searches. Your server
administrator has to create a document information database, called a collection,
that indexes and stores the content and file properties for each of the
documents you want to be able to search.
In the case of Web Publisher, there is a default
web publishing collection that contains all the documents that you have
published, uploaded, or otherwise manipulated through Web Publisher. Your
server administrator can also do bulk indexing of web publishing data for
you, for example, by indexing all the documents in the document directory
defined for Web Publisher.
Collections contain such information as the format
of the documents, the language they are in, their searchable attributes,
the number of documents in the collection, the collection's status, and
a brief description of the collection. For more details, see the section
"Displaying collection contents."
About collection attributes
Server documents can be in a variety of formats,
such as HTML, Microsoft Excel, Adobe PDF, and WordPerfect. If there is
a conversion filter available for a particular file format, the server
converts the documents into HTML as it indexes them so that you can use
your web browser to view the documents that are found for your search.
There are conversion filters for documents in
these formats:
-
MS Rich Text Format (RTF)
-
Interleaf 5.2-6.0
-
MS Word (DOS) 3.0-6.0
-
MS Word (Macintosh) 3-6
-
MS Word (Windows) 2.0, 6.0, 7.0
-
MS Excel 2-5
-
MS Excel (Macintosh) 3-4
-
MS PowerPoint 7.0
-
Adobe PDF (to ASCII)
-
Adobe FrameMaker (MIF) 3.0-5.0
-
Ami Pro 1.x-3.1
-
WordPerfect (Macintosh) 2-3.5
-
WordPerfect (Windows) 5.x-6.1
-
news and mail file formats (to ASCII)
Note
If a PDF file is password-protected or contains special graphical
navigation icons, the conversion filter cannot index the file.
Certain file formats have a default set of attributes
that are indexed for files of that type, as shown in Table
5-1. Note that ASCII files have no default attributes.
Table 5-1: The default attributes indexed
for each file format
File format | Attribute | Type | Description |
ASCII | - | - | - |
HTML | Title | text | The user-defined title of the file. |
| SourceType | text | The original format of the document. |
NEWS | From | text | The source userID of the news item. |
| Subject | text | The text from the subject field of the news item. |
| Keywords | text | Any keywords defined for the news item. |
| Date | date | The date the news item was created. |
EMAIL | From | text | The source userID of the email. |
| To | text | The destination userID of the email. |
| Subject | text | The text from the email's subject field. |
| Date | date | The date the email was created. |
PDF | InstanceID | text | An internal ID number. |
| PermanentID | text | An internal ID number. |
| NumPages | integer | The number of pages in the document. |
| DirID | text | The directory where the PDF file exists. |
| FTS_ModificationDate | date | The document's last modification date. |
| FTS_CreationDate | date | The document's creation date. |
| WXEVersion | integer | The version of Adobe Word Finder used to extract the text from the PDF document. |
| FileName | text | The Adobe filename specification. |
| FTS_Title | text | The document's title. |
| FTS_Subject | text | The document's subject. |
| FTS_Author | text | The document's author. |
| FTS_Creator | text | The document's creator. |
| FTS_Producer | text | The document's producer. |
| FTS_Keywords | text | The document's keywords. |
| PageMap | text | The page map, describing the word instances for the page. |
META-tagged attributes
By default, HTML collections only have Title
and SourceType attributes, but they can be set up to also permit
searching and sorting by up to 30 file attributes tagged with the HTML
<META> tag.
For example, a document could have these META-tagged
attributes:
<META NAME="Writer" CONTENT="J. S. Smith">
<META NAME="PubDate" CONTENT="07-24-97">
<META NAME="Product" CONTENT="Communicator">
If this document had been indexed with its META tags
extracted, you could search it for specific values in the writer, publication
date, or product fields. For example, you could enter this query: Writer
<contains> Smith or PubDate > 1/1/97.
Note
Any attribute values in META-tagged fields are text strings
only, which means that dates and numbers are sorted as text, not as dates
or numbers. Also, illegal HTML characters in a META-tagged attribute are
replaced with a hyphen.
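To make the idea of META-tagged attributes concrete, here is a minimal Python sketch that pulls NAME/CONTENT pairs out of a document. It is only an illustration of the concept, not the server's indexing code; the class and function names are hypothetical, and it does not attempt the hyphen replacement mentioned in the Note.

```python
from html.parser import HTMLParser


class MetaAttributeParser(HTMLParser):
    """Collects NAME/CONTENT pairs from <META> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name is not None and content is not None:
            # Every value is kept as a plain text string.
            self.attributes[name] = content


def extract_meta_attributes(html_text):
    """Return a dict of META attribute names to their text values."""
    parser = MetaAttributeParser()
    parser.feed(html_text)
    return parser.attributes


sample = """
<META NAME="Writer" CONTENT="J. S. Smith">
<META NAME="PubDate" CONTENT="07-24-97">
<META NAME="Product" CONTENT="Communicator">
"""
print(extract_meta_attributes(sample))
```

Note that PubDate comes back as the string "07-24-97", which is why it sorts and compares as text unless it is converted.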
Performing a search: the basics
Users are primarily concerned with querying the data
in the search collections and getting a list of documents in return. The
default installation of the Enterprise server includes a set of search
query and result pages to allow users a quick and easy way of doing searches.
Creating a search query
There are three default search query pages: standard
and advanced HTML forms and a Java-based guided applet.
On the standard search form, you select a collection
to search against and type in a word or phrase to search for using the
query language operators.
On the advanced HTML form, you have the additional
options of selecting multiple collections to search through, establishing
a sort sequence for the results, and defining how many documents are to
be displayed on a page at a time (clicking the Prev and Next arrows moves
you through the pages of results).
In the guided Java-based search applet, the applet
uses several drop-down lists to guide you through constructing a query.
You must have Java enabled for your browser to use this applet.
The standard search query form
To perform a standard search, follow these steps:
-
Type this URL in the location field in your web browser:
http://yourServer/search
Figure 5-1: The standard search query page
-
In the search query page that appears, choose the collection you want to
search through from the drop-down list in the Search In field.
-
Enter the word or phrase for your search query in the For field. You can
create complex queries by combining operators. See
"Query operators: a reference" for details about the search operators.
-
Click the Search button to execute your query.
The advanced HTML search query page
You can choose to use the advanced HTML search form,
which helps you construct the query. This form is especially useful if
you want to search through more than one collection or want the results
sorted by a specific attribute value.
To access advanced HTML search through the standard
search query page, follow these steps:
-
Go to the standard search query page by typing this URL in the location
field in your web browser:
http://yourServer/search
-
Disable Java for your browser. To do this, use the Languages option preferences
menu command.
-
Click Guided Search on the standard search form to display the advanced
HTML query page.
Figure 5-2: The advanced HTML search query page
-
In the For field, type in the word or phrase you want to search for. You
can create complex queries by combining operators. See
"Query operators: a reference" for details about the search operators.
-
You can type in one or more attributes to sort the results by. The default
is an ascending sort order, but you can indicate a descending sort order
with a minus sign, as in -PubDate. (See
"Sorting the results" for more information about sorting.)
-
Depending on how many fields are listed for each document in the search
results page or how many you want to see at a time, you can expand or limit
the number of matching documents you want the search to return at a time.
The Prev and Next buttons allow you access to additional pages of documents
if there are too many to fit on a page at once.
-
Use the drop-down list in the Search In field to choose the collection
you want to search through. You can select more than one collection by
holding down the Control key as you click on another collection. All collections
in a query must be in the same language.
-
Click the Search button to execute your query.
The guided search applet
You can choose to use the Java-based guided search
interface, which helps you construct the query. This is especially useful
if you want to build a query that has several parts, say searching for
a word in the documents' content as well as a specific attribute value.
Note
Make sure Java is enabled for your browser. To do this, use
the Languages option preferences menu command.
To access guided search from the standard search
query page, follow these steps:
-
Obtain the standard search query page by using this URL:
http://yourServer/search
-
Click Guided Search on the standard search page to display the guided Java-based
query page.
Figure 5-3: The guided search query applet
-
Choose the collection you want to search through from the drop-down list
in the Search In field.
-
Use the For drop-down list to select the type of element you wish to search
for. In this example, choose Words.
-
In the blank text field, type in the word you want to search for. See
"Query operators: a reference" for details about the search operators.
-
Click Add Line to add the first part of the query. The word appears in
the large text display box at the bottom of the form.
-
To add to your query, choose another element from the drop-down list. In
this example, choose Attribute.
-
A new drop-down list appears on the right side of the form, listing all
attributes that are available for the chosen collection. Choose the attribute
you want to search against.
-
From the drop-down list above the text input field, choose a query operator
(Contains, Starts, Ends, Matches, Has a substring) or logical operator
(=, <, >, <=, >=) for your query.
-
In the blank text field, type in the attribute value you want to search
for.
-
Click Add Line to add another line for your query. You can click Undo Line
to remove the last line you added or Clear to remove the entire query.
-
Click the Search button to execute the search.
Getting search results
There are two standard types of search results: a
list of all documents that match the search criteria and the text of a
single document that you selected from the list of matching documents.
Access permission checking
Which documents you get for your search results depends
on the access control rules set for each of the documents and collections
involved. The server does an access check when you perform these actions:
-
search on the Web Publishing collection
-
search on any other collection that is defined to the server as checking
for permissions before displaying search results (set by your server administrator
when the collection is created)
-
click the URL for a document listed as part of the search results
-
click the icon displayed for a document in the search results, which displays
the version of the document that highlights the query word or phrase.
If the server encounters an access control rule that
restricts your access to a document that matches your query, the document
is not listed as part of the search results. If you do not have permission
to view a document listed in the search results, the server does not display
it.
Listing matched documents
In the default installation of the Netscape Enterprise
Server, when you execute a search from either the standard or advanced search
query pages, you obtain a list of the documents that match your search
criteria. The list gives some standard information about each file, depending
on the collection's format. For example, the default results page for email
collections gives subject, to, from, and date for each entry, and the page
for news collections gives subject, from, and date for each entry.
Figure 5-4: Sample search results
The file format of the collection determines
which default attributes are available for searching. See
"About collection attributes" and Table 5-1
for information about the attributes for each format.
When a search ranks its matches, for example by
how close the query words are to each other or by how exactly they match,
each entry's ranking can be shown as a score.
If there are more matching documents than can
fit on a page, click Next to see the next batch. You can always execute
a new search by entering new query data and clicking Search.
Sorting the results
By default, or if you don't enter anything in the
Sort By field on the advanced HTML query page, all documents matching the
search are output according to their relevance ranking (for queries that
consider this) or their position in the server file database (for other
queries).
If you enter an attribute name in the Sort By
field, the documents are displayed in an ascending sort sequence. You can
list the documents in a descending sort sequence by adding a minus sign
(-) prefix to the attribute, as in -keywords or -title.
You can do a multiple sort, by typing in more than one field, as in Author,-PubDate.
For a query that returns only a few matches, sort order usually isn't
critical, but in queries that result in a great many matches, you may want
to set a sort value in order to obtain useful search results. Note, however,
that using a special sort sequence may slow the search.
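The Sort By rules above (ascending by default, a minus sign prefix for descending, commas for a multiple sort) can be sketched in a few lines of Python. This is an illustration of the ordering only, under the assumption that the first attribute listed is the primary sort key; the function names and sample records are hypothetical and are not part of the server.

```python
def parse_sort_spec(spec):
    """Turn 'Author,-PubDate' into [('Author', False), ('PubDate', True)]."""
    keys = []
    for field in spec.split(","):
        field = field.strip()
        if not field:
            continue
        descending = field.startswith("-")
        name = field[1:] if descending else field
        keys.append((name, descending))
    return keys


def sort_results(documents, spec):
    """Sort a list of dicts according to a Sort By specification."""
    ordered = list(documents)
    # Apply the keys from last to first. Python's sort is stable, so the
    # first attribute in the specification ends up as the primary key.
    for name, descending in reversed(parse_sort_spec(spec)):
        ordered.sort(key=lambda doc: doc.get(name, ""), reverse=descending)
    return ordered


# Made-up records; the dates are written year-first so that sorting them
# as text gives chronological order.
docs = [
    {"Author": "Smith", "PubDate": "1997-01-15"},
    {"Author": "Jones", "PubDate": "1996-12-01"},
    {"Author": "Smith", "PubDate": "1997-07-24"},
]
for doc in sort_results(docs, "Author,-PubDate"):
    print(doc["Author"], doc["PubDate"])
```

With this specification the Jones entry comes first, followed by the two Smith entries with the most recent publication date first.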
Note
Attribute values in META-tagged fields are text strings, which
means that dates and numbers are sorted as text, not as dates or numbers.
To convert the value into a date or number, you can create a new property
in the Web Publishing|Add Custom Property form and check the box that marks
this property as a META-tagged attribute.
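A short Python sketch shows why text sorting of dates can surprise you. It assumes the values use the MM-DD-YY form seen in the earlier META example (07-24-97); the sample values themselves are made up for illustration.

```python
from datetime import datetime

pub_dates = ["07-24-97", "12-01-96", "01-15-97"]

# Sorted as text strings: characters are compared left to right, so the
# month digits decide the order and the year barely matters.
as_text = sorted(pub_dates)

# Sorted as dates: parse MM-DD-YY first, then compare the real dates.
as_dates = sorted(pub_dates, key=lambda s: datetime.strptime(s, "%m-%d-%y"))

print(as_text)   # ['01-15-97', '07-24-97', '12-01-96']
print(as_dates)  # ['12-01-96', '01-15-97', '07-24-97']
```

The December 1996 value sorts last as text but first as a date, which is the difference that defining the attribute as a date property removes.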
Displaying a document
In the default installation of Netscape Enterprise
Server, when you obtain a list of the documents that match your search
criteria, you can select a single document to display in your web browser.
The browser can display the original document or you can choose to display
the document with additional formatting so that your search query word
or phrase is highlighted with such text attributes as color, boldface,
or blinking.
To view the original document, click on the hypertext
link containing the document's URL. In the case of documents that have
been converted into HTML, the URL points you to the original document.
Clicking on this link spawns an external viewer to display the document
in its original format.
To view a highlighted document, click on the graphical
element next to the document's entry in the search results.
Displaying collection contents
You can display the contents of your collection database
to see which attributes are set for each collection. Your server administrator
may have defined some collections as non-displayable, in which case they
are not included in the output. The collection contents typically include
these items:
-
collection name, label, and description
-
collection format
-
number of attributes in the collection and a list of their names
-
number of documents in the collection
-
collection size and status
-
language and character set
-
input and output date formats
To display your collection database contents, type
this line in the web browser's URL location field (be sure not to include
any spaces):
http://yourServer/search?NS-search-page=c
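If you build this URL in a script rather than typing it, letting a library assemble the query string is one way to avoid stray spaces. The sketch below uses Python's standard urllib; the function name is hypothetical, and yourServer is the same placeholder used above.

```python
from urllib.parse import urlencode


def collection_contents_url(server):
    """Build the URL that displays the collection database contents."""
    query = urlencode({"NS-search-page": "c"})
    return f"http://{server}/search?{query}"


print(collection_contents_url("yourServer"))
```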
Using the query operators
To perform an effective search, you need to know
how to use the query operators. You can only do Boolean searches, so all
the subsequent information is based on Boolean search rules.
Note
The query language is not case-sensitive. The examples use
uppercase for clarity only.
The search engine interprets the search query based
on a set of syntax rules. For example, by entering the word region,
the actual word region and all its stemmed variations (such as regions
and regional) are found. The search results are ranked for "importance,"
which means how close the matched word comes to the originally input search
criteria. In the example above, region would rank higher than any
of the stemmed variants.
Not all queries rank their results. Only those
queries that can have varying degrees of matching can be ranked. For example,
<CONTAINS> queries either do or do not contain the given string, but
<NEAR> queries can be ranked according to how close the words are to
each other: words found closer together are listed at the top of the search
results, while those that are far apart are put at the bottom of the results.
Default assumptions
The search query language has some implicit defaults
and assumptions that dictate how it interprets your input. In some cases,
you can circumvent the defaults, but here is how the search engine decides
what you want as the search results:
<STEM>--Search finds all documents
that contain any stemmed variant of the search word or phrase. The search
engine looks at the meaning of the word, not just its spelling. For example,
if you want to search on plan, the results would include documents
that contain planning and plans, but not those that contain
plane or planet.
<MANY>--Search considers how often
the search word or phrase appears in the found documents and ranks the results
for frequency (or relevancy).
<PHRASE>--Search considers words separated
by spaces to be part of a phrase. For example, Monterey otter is
interpreted as a phrase and both must be present and together to be found.
Such a search would not find documents containing sea otter or Monterey
Bay.
Note
In any case where it's not clear that two words are to be considered
as a phrase, you can use parentheses for clarity. For example,
<PHRASE> (rise "and" fall).
OR--Search considers each word or phrase
in the query separated by a comma to be optional, although at least one
must be present. In effect, this is an implicit OR operation. For example,
Monterey, otter is interpreted as searching for documents that contain
either Monterey or otter. Note that angle brackets are not
required for OR.
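The phrase and comma defaults can be pictured with a toy Python sketch: commas separate alternatives (the implicit OR), and words separated only by spaces stay together as one phrase. This is a deliberately simplified illustration, not the search engine's parser; it ignores stemming, quotation marks, parentheses, and explicit operators, and the function name is hypothetical.

```python
def split_implicit_query(query):
    """Split a simple query into OR alternatives, each a list of phrase words."""
    alternatives = []
    for part in query.split(","):
        words = part.split()
        if words:
            alternatives.append(words)
    return alternatives


# One alternative made of a two-word phrase.
print(split_implicit_query("Monterey otter"))   # [['Monterey', 'otter']]

# Two alternatives; a document needs to match only one of them.
print(split_implicit_query("Monterey, otter"))  # [['Monterey'], ['otter']]
```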
Search rules
To create complex searches, you can combine query
operators, manipulate the query syntax, and include wildcard characters.
Angle brackets
With the exception of the AND, OR,
NOT, and the date and numeric comparison operators, you need to
enclose query operators in angle brackets, as in <CONTAINS>
and <WILDCARD>.
Combining operators
You can combine several query operators into a single
query to obtain precise results. For example, you can input the following
query to limit your search to those documents that have Bay and
Monterey but to exclude those that mention Aquarium:
Monterey AND Bay NOT <CONTAINS> Aquarium
You can achieve even greater precision by including
some implicit phrases, as in the following query that finds documents that
refer to the Monterey Bay Aquarium by its full name and also mention
otters but do not refer to sharks:
Monterey Bay Aquarium AND otter AND NOT shark
Using query operators as search words
You can use any of the query operators as a search
word, but you must enclose the word in quotation marks. For example, you
could search for documents about the ebb and flow of the tides with
the following query:
<CONTAINS> ebb "and" flow
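If you generate queries from user input, a small helper can add the quotation marks for you. The Python sketch below assumes that only the operators written without angle brackets (AND, OR, and NOT) can be confused with ordinary words; that list and the function name are assumptions made for illustration.

```python
RESERVED_WORDS = {"and", "or", "not"}  # assumed list, see the note above


def quote_operator_words(phrase):
    """Put quotation marks around words that would be read as operators."""
    quoted = []
    for word in phrase.split():
        if word.lower() in RESERVED_WORDS:
            quoted.append(f'"{word}"')
        else:
            quoted.append(word)
    return " ".join(quoted)


print("<CONTAINS> " + quote_operator_words("ebb and flow"))
# <CONTAINS> ebb "and" flow
```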
Canceling stemming
You can cancel the implicit stemming by using quotation
marks around a word. For example, you can be exact by using a query such
as this:
"plan"
This search only results in documents that contain
the exact word plan. It ignores documents with plans or planning.
Modifying operators
You can use AND, OR, and NOT
to modify other operators. For example, you may want to exclude documents
with titles that contain the phrase theme park. A query such as
this would solve this problem:
Title NOT <CONTAINS> theme park
Determining which operators to use
Use the following reference to help determine which
operators to use. Note that the query language is not case-sensitive, so
<starts> and <STARTS> are equivalent. This document uses uppercase
for clarity only.
Table 5-2: Deciding which operator to use
Type of Search |
Valid Operators |
Examples |
Finding documents by date or numeric value comparison. |
is equal to (=),
greater than (>),
greater than or equal to (>=),
less than (<),
less than or equal to (<=) |
DATE >= 06-30-96
Finds documents created on or after June 30, 1996. |
Finding words or phrases in specific document
fields or in specific locations in the field. |
<STARTS>,
<CONTAINS>,
<ENDS>,
is equal to (=) |
Title <STARTS> Help
Finds documents with titles that start with Help. |
Finding two or more words in a document. |
AND,
<NEAR/1> |
specifications AND review
Finds documents that contain both specifications
and review. |
Query operators: a reference
The following table describes some commonly used
operators and provides examples of how to use each one. All are relevance
ranked except where explicitly noted.
Note
You can only perform date and number comparison searches on web publishing
attributes that have been defined as date or number fields to the collection.
HTML attributes for dates and numbers that have been tagged as META attributes
are treated as text strings. Your server administrator needs to add custom
properties for these META attributes to convert them to actual dates and
numbers before the comparison operators (<, >, =, <=, and >=) can
work correctly.
Table 5-3: Query language operators
Operator |
Description |
Examples |
AND
|
Adds mandatory criteria to the search. Finds
documents that have all of the specified words. |
Antarctica AND mountain climb
Finds only documents containing both Antarctica
and mountain climb plus all the stemmed variants, such as mountain
climbing. |
<CONTAINS>
|
Finds documents containing the specified words
in a document field. The words must be in the exact same sequential and
contiguous order.
You can use wildcards. Only alphanumeric values.
Does not rank documents for relevance. |
Title <CONTAINS> higher profit
Finds documents containing the phrase higher
profit in the title. Ignores documents with profits higher in
the title. |
<ENDS>
|
Finds documents in which a document field ends
with a certain string of characters.
Does not rank documents for relevance. |
Title <ENDS> draft
Finds documents with titles ending in draft. |
equals (=) |
Finds documents in which a document field matches
a specific date or numeric value. |
Created = 6-30-96
Finds documents created on June 30, 1996. |
greater than (>) |
Finds documents in which a document field is
greater than a specific date or numeric value. |
Created > 6-30-96
Finds documents created after June 30, 1996. |
greater than or equal to (>=) |
Finds documents in which a document field is
greater than or equal to a specific date or numeric value. |
Created >= 6-30-96
Finds documents created on or after June 30, 1996. |
less than (<) |
Finds documents in which a document field is
less than a specific date or numeric value. |
Created < 6-30-96
Finds documents created before June 30, 1996. |
less than or equal to (<=) |
Finds documents in which a document field is
less than or equal to a specific date or numeric value. |
Created <= 6-30-96
Finds documents created on or before June 30,
1996. |
<MATCHES>
|
Finds documents in which a string in a document
field matches the character string you specify.
Ignores documents that contain partial matches.
Does not rank documents for relevance. |
<MATCHES> employee
Finds documents in which a string in the field matches employee
exactly. Ignores documents with only partial matches, such as employees. |
<NEAR>
|
Finds documents that contain the specified words.
The closer the terms are to each other in the document, the higher the
document's score. |
stock <NEAR> purchase
Finds any document containing both stock
and purchase, but gives a higher score to a document that has stock
purchase than to one that has purchase supplies and stock up. |
<NEAR/N>
|
Finds documents in which two or more specified
words are within N number of words from each other. N can be an integer
up to 1000. Also ranks the documents for relevance based on the words'
proximity to each other. |
stock <NEAR/1> purchase
Finds documents containing the phrases stock
purchase and purchase stock.
Ignores documents containing phrases like purchase
supplies and stock up because stock and purchase do not
appear next to each other.
When N is 2 or greater, finds documents that contain
the words within the range and gives a higher score for documents which
have the words closer together. |
NOT
|
Finds documents that do not contain a specific
word or phrase.
Note: You can use NOT to modify
the OR or the AND operator. |
surf AND NOT beach
Finds documents containing the word surf
but not the word beach. |
OR
|
Adds optional criteria to the search. Finds any
document that contains at least one of the search values. |
apples OR oranges
Finds documents containing either apples
or oranges. |
<PHRASE>
|
Finds documents that contain the specified phrase.
A phrase is a grouping of two or more words that occur in a specific order. |
<PHRASE> (rise "and" fall)
Finds documents that include the entire phrase
rise and fall. The and is in quotes to force the search to
interpret it as a literal, not as an operator. |
<STARTS>
|
Finds documents in which a document field starts
with a certain string of characters.
Does not rank documents for relevance. |
Title <STARTS> Corp
Finds documents with titles starting with Corp,
such as Corporate and Corporation. |
<STEM>
(English only) |
Finds documents that contain the specified word
and its variants. |
<STEM> plan
Finds documents that contain plan, plans,
planned, planning, and other variants with the same meaning
stem. Ignores similarly spelled words such as planet and plane
that don't come from the same stem. |
<SUBSTRING>
|
Finds documents in which part or all of a string
in a document field matches the character string you specify.
Similar to <MATCHES>, but can match
on a partial string.
Does not work with wildcards.
Does not rank documents for relevance. |
<SUBSTRING> employ
Finds documents that can match on all or part
of employ, so it can succeed with ploy.
Note: This works with literals only. If
you input web*, the asterisk does not work as a wildcard, so the
search succeeds only with the exact "web*" string. |
<WILDCARD>
|
Finds documents that contain the wildcard characters
in the search string. You can use this to get words that have some similar
spellings but which would not be found by stemming the word.
Some characters, such as * and ?, automatically
indicate a wildcard-based search, so you don't have to include the word
<WILDCARD>. |
<WILDCARD> plan*
Finds documents that contain plan, plane,
and planet as well as any word that begins with plan, such
as planned, plans, and planetopolis.
See the next section for more details and examples. |
<WORD>
|
Finds documents that contain the specified word. |
<WORD> theme
Finds documents that contain theme, thematic,
themes, and other words that stem from theme. |
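The proximity idea behind <NEAR> and <NEAR/N> can be sketched in Python as follows. The sketch assumes that "within N words" means the word positions differ by at most N, so that N = 1 means the words are adjacent; it does no stemming, and the function names are hypothetical rather than part of the search engine.

```python
def word_positions(text, word):
    """Positions of word in text, ignoring case and simple punctuation."""
    tokens = [t.strip('.,;:!?"()').lower() for t in text.split()]
    return [i for i, t in enumerate(tokens) if t == word.lower()]


def closest_distance(text, first, second):
    """Smallest gap in word positions between the two words, or None."""
    gaps = [abs(i - j)
            for i in word_positions(text, first)
            for j in word_positions(text, second)]
    return min(gaps) if gaps else None


def near(text, first, second, n):
    """True if the two words occur within n word positions of each other."""
    gap = closest_distance(text, first, second)
    return gap is not None and gap <= n


print(near("stock purchase", "stock", "purchase", 1))                 # True
print(near("purchase supplies and stock up", "stock", "purchase", 1)) # False
print(closest_distance("purchase supplies and stock up", "stock", "purchase"))  # 3
```

A smaller closest distance corresponds to the higher score that the table describes for words found closer together.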
Using wildcards
You can use wildcards to obtain special results.
For example, you can find documents that contain words that have similar
spellings but are not stemmed variants. For example, plan stems
into plans and planning but not plane or planet.
With wildcards, you can find all of these words.
Some characters, such as * and ?, automatically
indicate a wildcard-based search and do not require you to use the <WILDCARD> operator
as part of the expression.
Table 5-4: Wildcard operators
Character |
Description |
* |
Specifies 0 or more alphanumeric characters.
For example, air* finds documents that contain air, airline,
and airhead.
Cannot use this wildcard as the first character
in an expression.
This wildcard is ignored in a set ([ ]) or
in an alternative pattern ({ }).
With this wildcard, the <WILDCARD>
operator is implicit. |
? |
Specifies a single alphanumeric character, although
you can use more than one ? to indicate multiple characters. For example,
?at finds documents that contain cat and hat, while
??at finds documents that contain that and chat.
This wildcard is ignored in a set ([ ]) or
in an alternative pattern ({ }).
With this wildcard, the <WILDCARD>
operator is implicit. |
{} |
Specifies a series of alternative patterns, separated
by commas; a word matches if it fits any one of them. For example,
<WILDCARD> `Chat{s,ting,ty}`
finds documents that contain chats, chatting, and chatty.
You must enclose the entire string in back quotes
and you cannot have any embedded spaces. |
[ ] |
A set that specifies a series of characters that can be
used to find a match. For example,
<WILDCARD> `[chp]at`
finds documents that contain cat, hat, and pat.
You must enclose the entire string in back quotes
and you cannot have any embedded spaces. |
^ |
Specifies one or more characters to exclude from
a set. For example, <WILDCARD> `C[^io]t` finds documents that
contain cat and cut, but not cot.
The caret (^) must be the first character after
the left bracket. |
- |
Specifies a range of characters in a set. For
example, <WILDCARD> `Ch[a-j]t` finds documents that contain
any four-letter word from chat to chjt. |
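One way to get a feel for these wildcard characters is to translate them into regular expressions and try them on sample words. The Python sketch below does that under a few assumptions: matching ignores case, a pattern must match a whole word, and the pattern is given without its back quotes. It only approximates the behavior described in Table 5-4, and the function names are hypothetical.

```python
import re

ALNUM = "[A-Za-z0-9]"


def wildcard_to_regex(pattern):
    """Translate a Table 5-4 style pattern (without back quotes) to a regex."""
    pieces = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            pieces.append(ALNUM + "*")       # zero or more alphanumerics
        elif ch == "?":
            pieces.append(ALNUM)             # exactly one alphanumeric
        elif ch == "[":
            end = pattern.index("]", i)      # ValueError if the set is unclosed
            body = pattern[i + 1:end]
            negated = body.startswith("^")
            if negated:
                body = body[1:]
            # Keep "-" so ranges such as a-j still work; * and ? are
            # ordinary characters inside a regex character class.
            safe = "".join("\\" + c if c in "\\^[" else c for c in body)
            pieces.append("[" + ("^" if negated else "") + safe + "]")
            i = end
        elif ch == "{":
            end = pattern.index("}", i)      # ValueError if the braces are unclosed
            options = pattern[i + 1:end].split(",")
            pieces.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            pieces.append(re.escape(ch))
        i += 1
    return "".join(pieces)


def matches(pattern, word):
    """Return True if the whole word fits the pattern, ignoring case."""
    return re.fullmatch(wildcard_to_regex(pattern), word, re.IGNORECASE) is not None


print(matches("air*", "airline"))             # True
print(matches("?at", "that"))                 # False
print(matches("Chat{s,ting,ty}", "chatting")) # True
print(matches("C[^io]t", "cot"))              # False
```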
Wildcards as literals
Sometimes you may want to search on characters that
are normally used as wildcards, such as * or ?. To
use a wildcard as a literal, you must precede it with a backslash. In the
case of asterisks, you must use two backslashes. For example, to search
on a magazine with a title of Zine***, you would type:
<WILDCARD>Zine\\*\\*\\*
Several characters have special meaning for the search
engine and require you to use back quotes to be interpreted as literals.
The special search characters are listed here:
-
comma ,
-
left and right parentheses ( )
-
double quotation mark "
-
backslash \
-
at sign @
-
left curly brace {
-
left bracket [
-
back quote ` (Note: You can only search on back quotes as literals
if your server administrator has set this up.)
For example, to search for the string "a{b", you
would type
<WILDCARD>`a{b`
For another example, if you wanted to search on the
string "c`t", which contains a back quote, you would type
<WILDCARD>`c``t`
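The two examples above suggest a simple rule: wrap the string in back quotes and double any back quote inside it. A Python helper for that rule might look like the sketch below. The doubling rule is inferred from the single c`t example, and the function name is hypothetical.

```python
def quote_literal(text):
    """Wrap text in back quotes, doubling any back quote it contains."""
    return "`" + text.replace("`", "``") + "`"


print("<WILDCARD>" + quote_literal("a{b"))  # <WILDCARD>`a{b`
print("<WILDCARD>" + quote_literal("c`t"))  # <WILDCARD>`c``t`
```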