Date: 95-02-05 14:07:48 EST
Sender: Vanagon Mailing List <vanagon@vanagon.com>
From: netnews@DB.Stanford.EDU (Net News Filter)
Subject:
Stanford Netnews Filtering Service Info
Tak W. Yan (tyan@cs.stanford.edu)
Department of Computer Science
Stanford University
September 1994
( A ) W H O A R E W E
As part of the Electronic Library project at Stanford, we are providing a
filtering service for USENET News (Netnews) articles. A user sends his
profiles to the service, and will receive news articles relevant to his
interests periodically. Communication to and from the service is via email
messages. (A WWW interface for accessing the server is available at
http://sift.stanford.edu.) It is an experiment in large-scale information
filtering/dissemination. Please feel free to send suggestions, comments, or
bug reports to tyan@cs.stanford.edu.
( B ) W H A T I S N E T N E W S
Netnews, or USENET News, is a bulletin board system on the Internet. It is
organized into discussion groups (called newsgroups) covering a wide variety
of topics, e.g., from robotics to video game tips, from food recipes to
politics. Its total readership is in millions and daily traffic in tens of
MBs. One problem with Netnews is the volume and diversity of information.
Our filtering service allows the user to express her interests in finer
granularity (using profiles) than newsgroups, and hopefully can provide a
better match of interests.
( C ) A S I M P L E E X A M P L E
First we describe a simple example to show how the service works. We then
talk about how to pick a good profile.
( C . 1 ) E x a m p l e
Suppose a user subscribes to the service with these settings via the email
interface (a WWW interface is also available at http://sift.stanford.edu):
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
subscribe online information services
period 5
end
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
She will first receive an acknowledgement message from netnews, specifying
the email address the subscription is associated with. The user should keep
a record of this information, as she may need it in the future to remove
herself from the service.
After the subscription is successfully submitted, the user will receive email
messages like this every 5 days:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>From netnews@db.stanford.edu Mon Jan 24 10:40:35 1994
Subject: Netnews - online information services
Subscription 1: online information services
Article: misc.activism.progressive.11965
From: hn0003@handsnet.org
Subject: HandsNet WEEKLY DIGEST 1/15-21
Score: 100
First 20 lines:
HANDSNET WEEKLY DIGEST January 15 - 21, 1994
News from HandsNet's Information Forums
HandsNet is a national, nonprofit network connecting organizations working
on social and economic justice issues. Members use HandsNet to make new
contacts, work collaboratively and to find and publish information, news
....
Article: ca.politics.38420
From: rlm@helen.surfcty.com (Robert L. McMillin)
Subject: GOV-ACCESS #5:Cal.Emergency Svcs.online + Net-fax + MINN Pub Info Net
Score: 100
First 20 lines:
Jan. 22, 1994
CALIFORNIA OFFICE OF EMERGENCY SERVICES INFO AVAILABLE ONLINE
<a recent exchange of messages>
The state Emergency Digitial Information Service is working fine
Telnet to telnet oes1.oes.ca.gov 5501
....
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If the excerpt does not show an article in its entirety, the user can request
that the whole article be sent to her. She does this by sending an email
message to netnews@db.stanford.edu with the word "get" followed by one or
more article names in the message body. For example,
mail netnews@db.stanford.edu
Subject: you can leave this blank
get misc.activism.progressive.11965 ca.politics.38420
end
To remove individual profiles, the user may use the CANCEL command discussed
below. To remove all profiles, the user may use the UNSUBSCRIBE command, as
follows:
mail netnews@db.stanford.edu
Subject: you can leave this blank
user joe@cs.oceanview.edu
unsubscribe
end
Here the user needs to specify what email address the profiles are associated
with, using the USER command. This email address is the one from which the
user submits her subscribe requests.
( C . 2 ) H o w t o P i c k a G o o d P r o f i l e
As mentioned already, Netnews is a very diverse source of information. To
effectively filter the information, the user must be careful in choosing the
profile. Say a user is interested in traveling and she submits a profile
consisting of just one word "travel." This is too broad a topic and the
profile is likely to match many irrelevant articles that just happen to have
an occurrence of the word. If she describes her interests in more detail,
such as "travel hawaii," the results will be far better.
The service supports two kinds of profile: boolean and weighted. The default
mode is boolean. In this mode, you specify words that must appear in articles
received. For example, the profile "travel hawaii" in the boolean mode
matches articles that contain both words "travel" and "hawaii." The number of
occurrences of the words is not considered. In the boolean mode, you can also
use the "not" operator to screen out articles. For example, say a user is
interested in underwater fishing but does not want to receive anything from
the alt.* newsgroups. She can send the profile "underwater fishing not alt."
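The boolean rule described above can be sketched in a few lines of Python. This is only an illustration, not the service's actual matcher: the function names are made up, the tokenization (lowercased runs of letters and digits) is an assumption, and so is the idea that the newsgroup name appears in the text being matched.

```python
import re


def parse_boolean_profile(profile: str) -> tuple[set[str], set[str]]:
    """Split a profile into (required words, excluded words) at the first "not"."""
    words = profile.lower().split()
    if "not" in words:
        cut = words.index("not")
        return set(words[:cut]), set(words[cut + 1:])
    return set(words), set()


def matches(profile: str, article_text: str) -> bool:
    """True if every required word occurs and no excluded word occurs."""
    required, excluded = parse_boolean_profile(profile)
    # Assumption: a "word" is a run of letters or digits, compared
    # case-insensitively; the number of occurrences is ignored.
    tokens = set(re.findall(r"[a-z0-9]+", article_text.lower()))
    return required <= tokens and not (excluded & tokens)
```

For example, matches("travel hawaii", "Cheap travel deals to Hawaii") is true, while matches("food recipe not fish", "A recipe for fish food") is false because the excluded word is present.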
The weighted mode provides some kind of "similarity" matching. You specify a
number of words; for each article, a score is computed based on the number of
occurrences of these words in the article. An article with a score higher
than a certain threshold is returned.
In the weighted mode, it is important to set the threshold appropriately. A
good way to pick the appropriate threshold is to use the test run (for WWW)
or search (for email) function. This will perform a search over the previous
day's news articles, and by looking at the scores of the articles returned,
you can decide what the minimum score for a relevant article should be. A
basic rule of thumb is that if you set a high threshold (say 90), then only
articles that contain most of the words in the profile will be returned, and
if you set a low threshold (say 10), then articles that match any of the
words in the profile will be returned. The default threshold is 60.
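To make the role of the threshold concrete, here is a toy scoring sketch in Python. It is not the formula the service uses (that formula is not given here). It assumes the raw score is simply the total number of occurrences of the profile words, scaled so that the best article in the batch gets 100, and it keeps articles whose score is at or above the threshold. The function names are hypothetical.

```python
import re
from collections import Counter


def raw_score(profile: str, text: str) -> int:
    """Total occurrences of the distinct profile words in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[word] for word in set(profile.lower().split()))


def filter_weighted(profile: str, articles: dict[str, str],
                    threshold: int = 60) -> dict[str, int]:
    """Return {article id: scaled score} for articles that reach the threshold."""
    raw = {aid: raw_score(profile, text) for aid, text in articles.items()}
    best = max(raw.values(), default=0)
    if best == 0:
        return {}  # no profile word occurs anywhere
    # Assumption: scores are scaled so the best article in the batch gets 100.
    scaled = {aid: round(100 * score / best) for aid, score in raw.items()}
    return {aid: score for aid, score in scaled.items() if score >= threshold}
```

Running the same batch with a threshold of 60 and then 10 shows the effect described above: the lower threshold lets through articles that mention only one of the profile words.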
( D ) U S E R M E S S A G E F O R M A T
User messages should be sent to netnews@db.stanford.edu. The subject field of
the message is ignored. Each message is a request to the service. Each
request consists of a number of commands. Each command must start on a new
line with no leading spaces. Continuation lines begin with a space or a tab.
All commands are case-insensitive.
Requests are associated with the return address of the user message. Service
replies and deliveries will be sent to that address.
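The format rules above (one command per line with no leading spaces, continuation lines starting with a space or tab, case-insensitive command names, END stopping the request) can be sketched as a small Python parser. This is an illustration of the rules as stated, not the server's code; parse_request is a hypothetical name, and the sketch does not check that command names are valid.

```python
def parse_request(body: str) -> list[tuple[str, list[str]]]:
    """Parse a request body into (COMMAND, [arguments]) pairs."""
    commands: list[tuple[str, list[str]]] = []
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t":
            # Continuation line: its words extend the previous command.
            if commands:
                commands[-1][1].extend(line.split())
            continue
        name, *args = line.split()
        name = name.upper()  # command names are case-insensitive
        if name == "END":
            break  # ignore everything after END, e.g. a signature
        commands.append((name, args))
    return commands
```

Given the body "Subscribe food recipe", a continuation line " not fish", then "lines 10", "END", and a signature, the sketch yields a SUBSCRIBE command with four words and a LINES command, and the signature is never read.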
The usages of the commands are as follows.
( D . 1 ) S u b s c r i b i n g
To subscribe for articles, use these commands:
SUBSCRIBE word word ...   Subscribe for articles relevant to the profile
                          specified by the <word>s. Two types of profiles
                          are supported: weighted and boolean. For
                          boolean profiles, you first specify words that
                          must all appear in a matching article, and
                          then words that must not appear, separated by
                          the word "not." For example,
                              subscribe food recipe not fish
                          You may also skip the "not" portion, e.g.,
                              subscribe world cup soccer
                          For weighted profiles, just type in some plain
                          text that describes your interest, such as:
                              subscribe nba golden state warriors
                          News articles will be given a score based on
                          the number of occurrences of the profile words,
                          and articles whose scores are higher than the
                          threshold (see below) will be returned to you.
                          The SUBSCRIBE command may optionally be
                          followed by TYPE, LINES, PERIOD, EXPIRE, and
                          THRESHOLD commands.
TYPE type                 (Optional - default boolean) Specify <type>
                          as the type of the profile. <type> must be
                          either the string "weighted" or "boolean."
LINES lines               (Optional - default 20 lines) Specify the
                          number of lines from the beginning of an
                          article to be included in the news article
                          abstract sent to you. <lines> must be an
                          integer from 1 to 60.
PERIOD period             (Optional - default 1 day) Specify <period>
                          as the period between notifications (in days).
EXPIRE days               (Optional - default 9999 days) Specify
                          <days> as the length (in days) for which
                          the subscription is valid.
THRESHOLD score           (Optional - default 60) Only applicable for
                          weighted profiles. Specify <score> as the
                          minimum score for an article to be relevant.
                          The most relevant article is given a score of
                          100. <score> must be an integer from 1 to 100.
For example, to request a subscription for articles related to "food recipe"
but not "fish," that is valid for 200 days and gives 10 lines in the news
article excerpts, send this:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
subscribe food recipe not fish
lines 10
expire 200
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The service will acknowledge your subscription, returning a subscription
identifier (sid).
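A client could assemble such a request programmatically. The Python sketch below builds the message body only and does not send mail. It checks the ranges documented in D.1 (LINES from 1 to 60, THRESHOLD from 1 to 100, TYPE either "boolean" or "weighted"); PERIOD and EXPIRE are passed through unchecked because no range is documented for them. The function name and the choice to always write every option explicitly are assumptions made for illustration.

```python
def build_subscribe_request(profile: str, profile_type: str = "boolean",
                            lines: int = 20, period: int = 1,
                            expire: int = 9999, threshold: int = 60) -> str:
    """Return the body of a SUBSCRIBE request, ending with an END line."""
    words = profile.split()
    if not words:
        raise ValueError("profile must contain at least one word")
    if profile_type not in ("boolean", "weighted"):
        raise ValueError('type must be "boolean" or "weighted"')
    if not 1 <= lines <= 60:
        raise ValueError("lines must be an integer from 1 to 60")
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be an integer from 1 to 100")
    body = [
        "subscribe " + " ".join(words),
        f"type {profile_type}",
        f"lines {lines}",
        f"period {period}",
        f"expire {expire}",
    ]
    if profile_type == "weighted":
        # THRESHOLD only applies to weighted profiles.
        body.append(f"threshold {threshold}")
    body.append("end")  # keeps a mail signature from being processed
    return "\n".join(body) + "\n"
```

Calling build_subscribe_request("food recipe not fish", lines=10, expire=200) reproduces the request shown above, with the default TYPE and PERIOD spelled out and an END line appended.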
( D . 2 ) G e t t i n g A r t i c l e s
After receiving notifications of articles that may be relevant to your
interests, you may decide to see an article in its entirety. You can get the
whole article with the GET command:
GET article article ...   Get the articles specified (by their article
                          ids).
For example,
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject:
get news.announce.conferences.3670
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
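Article ids can also be pulled out of a notification message automatically and turned into a GET request. The Python sketch below assumes the notification layout shown in C.1, where each article is introduced by an "Article:" line and its score by a "Score:" line; the function names are hypothetical, and an excerpt line that itself begins with "Article:" would be misread as a new article.

```python
import re
from typing import Optional

ARTICLE_RE = re.compile(r"^Article:\s*(\S+)\s*$")
SCORE_RE = re.compile(r"^Score:\s*(\d+)\s*$")


def extract_articles(notification: str) -> list[tuple[str, Optional[int]]]:
    """Return (article id, score) pairs in the order they appear."""
    found: list[list] = []
    for line in notification.splitlines():
        article = ARTICLE_RE.match(line)
        if article:
            found.append([article.group(1), None])
            continue
        score = SCORE_RE.match(line)
        # Attach the first Score line that follows an Article line.
        if score and found and found[-1][1] is None:
            found[-1][1] = int(score.group(1))
    return [(aid, sc) for aid, sc in found]


def build_get_request(article_ids: list[str]) -> str:
    """Return the body of a GET request for the given article ids."""
    if not article_ids:
        raise ValueError("at least one article id is required")
    return "get " + " ".join(article_ids) + "\nend\n"
```

The ids returned by extract_articles can be filtered by score before being passed to build_get_request, so that only the most promising articles are fetched in full.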
( D . 3 ) R e l e v a n c e F e e d b a c k
After reading the articles, you may find some that you like. You can provide
feedback using these commands:
FEEDBACK sid              Provide feedback to subscription <sid>.
LIKE article article ...  Specify relevant article(s) by their ids.
For example, this message says that article news.announce.conferences.3670 is
relevant to subscription 1:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
feedback 1
like news.announce.conferences.3670
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
With feedback information, the service may be able to better match future
articles against your subscriptions.
( D . 4 ) M a n a g i n g S u b s c r i p t i o n s
You can manage your subscriptions with these commands:
UPDATE sid                Update subscription with id <sid>. Must be
                          followed by one or more of PERIOD, EXPIRE,
                          THRESHOLD, LINES, TYPE (see D.1), or PROFILE
                          commands to specify the parameter(s) to be
                          updated.
PROFILE word word ...     Specify the new profile for the UPDATE
                          command.
CANCEL sid                Cancel subscription <sid>.
LIST                      List all your subscriptions.
For example, to update the period and the threshold of a subscription:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
update 3
period 1
threshold 60
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
( D . 5 ) S e a r c h F o r P a s t A r t i c l e s
Besides providing the subscription service, the service also allows you to
search for recent articles that are already in the database:
SEARCH word word ...      Do a search of the database with the given
                          query. May optionally be followed by a
                          THRESHOLD command to specify the minimum score
                          for an article to be retrieved.
For example, to search for articles related to "information filtering":
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
search information filtering
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
( D . 6 ) U n s u b s c r i b i n g
To remove all profiles, the user may use the USER and UNSUBSCRIBE commands:
USER address              Specify the email address with which the
                          profiles are associated.
UNSUBSCRIBE               Delete all profiles for the specified email
                          address.
For example,
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you can leave this blank
user joe@cs.oceanview.edu
unsubscribe
end
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
( D . 7 ) O t h e r C o m m a n d s
END                       End the request message. Useful for
                          preventing the processing of signatures.
HELP                      Get help information on the server.
( E ) F r e q u e n t l y A s k e d Q u e s t i o n s
(1) Is the service free?
Yes, it's absolutely FREE!
(2) How do I access the service?
Email access: send a message to netnews@db.stanford.edu with the word "help"
in the message body
WWW access: URL http://sift.stanford.edu
(3) Is there any similar service you know of?
There is a companion service at elib@cs.stanford.edu for filtering
computer science technical reports. A search server is also available
at URL http://elib.stanford.edu.
(4) Can I tell my friends about the service?
Yes, please do. The server should still be able to handle more
subscriptions :-) The current number of subscriptions is 9800+.
(5) How do I prevent the processing of "signatures" at the end of
email requests?
I have added a new command, "end." You can include the word "end"
on a line by itself at the end of the request message (before
the signature). (Also see D.7 above.)
(9) Any papers written on the server?
A paper is available by anonymous ftp at
ftp://db.stanford.edu/sift/sift.ps
Some related papers on information filtering are available at
ftp://db.stanford.edu/pub/yan
(10) What newsgroups are covered?
adass       alt         atl         aus         ba
bionet      bit         biz         ca          calstate
can         chi         comp        control     ee
fj          fl          general     gnu         ieee
in.coming   info        junk        misc        ne
news        no          nz          ont         out.going
phl         rec         sci         scruz       soc
talk        test        trial       triangle    tx
u3b         ucb         uiuc        uk          vmsnet
za
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
NetNews Filtering Server
netnews@db.stanford.edu
For help information, send email with word 'help' in message body
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=