Date:         95-02-05 14:07:48 EST
Sender:       Vanagon Mailing List <vanagon@vanagon.com>
From:         netnews@DB.Stanford.EDU (Net News Filter)
Subject:      

Stanford Netnews Filtering Service Info

Tak W. Yan (tyan@cs.stanford.edu) Department of Computer Science Stanford University

September 1994

( A ) W H O A R E W E

As part of the Electronic Library project at Stanford, we are providing a filtering service for USENET News (Netnews) articles. A user sends his profiles to the service, and will receive news articles relevant to his interests periodically. Communication to and from the service is via email messages. (A WWW interface for accessing the server is available at http://sift.stanford.edu.) It is an experiment in large-scale information filtering/dissemination. Please feel free to send suggestions, comments, or bug reports to tyan@cs.stanford.edu.

( B ) W H A T I S N E T N E W S

Netnews, or USENET News, is a bulletin board system on the Internet. It is organized into discussion groups (called newsgroups) covering a wide variety of topics, e.g., from robotics to video game tips, from food recipes to politics. Its total readership is in the millions and its daily traffic is in the tens of megabytes. One problem with Netnews is the volume and diversity of information. Our filtering service allows the user to express her interests at a finer granularity (using profiles) than newsgroups, and hopefully can provide a better match of interests.

( C ) A S I M P L E E X A M P L E

First we describe a simple example to show how the service works. We then talk about how to pick a good profile.

( C . 1 ) E x a m p l e

Suppose a user subscribes to the service with these settings via the email interface (a WWW interface is also available at http://sift.stanford.edu):

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
subscribe online information services
period 5
end
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

She will first receive an acknowledgement message from netnews, specifying the email address the subscription is associated with. The user should keep a record of this information, as she may need it in the future to remove herself from the service.

After the subscription is successfully submitted, the user will receive email messages like this every 5 days:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>From netnews@db.stanford.edu Mon Jan 24 10:40:35 1994

=========================================================================
Date:
Sender: Vanagon Mailing List <vanagon@vanagon.com>
From:
Subject: Netnews - online information services

Subscription 1: online information services

Article: misc.activism.progressive.11965
From: hn0003@handsnet.org
Subject: HandsNet WEEKLY DIGEST 1/15-21
Score: 100
First 20 lines:
HANDSNET WEEKLY DIGEST January 15 - 21, 1994 News from HandsNet's Information Forums HandsNet is a national, nonprofit network connecting organizations working on social and economic justice issues. Members use HandsNet to make new contacts, work collaboratively and to find and publish information, news ....

Article: ca.politics.38420
From: rlm@helen.surfcty.com (Robert L. McMillin)
Subject: GOV-ACCESS #5:Cal.Emergency Svcs.online + Net-fax + MINN Pub Info Net
Score: 100
First 20 lines:
Jan. 22, 1994 CALIFORNIA OFFICE OF EMERGENCY SERVICES INFO AVAILABLE ONLINE <a recent exchange of messages> The state Emergency Digitial Information Service is working fine Telnet to telnet oes1.oes.ca.gov 5501 ....
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

In case an article does not fit in its entirety, the user can request the whole article be sent to her. She does this by sending an email message to netnews@db.stanford.edu with the word get followed by one or more article names in the message body. For example,

mail netnews@db.stanford.edu
Subject: you can leave this blank
get misc.activism.progressive.11965 ca.politics.38420
end
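As a rough illustration, the article names in a notification could be collected and turned into a GET request body with a few lines of Python. This is only a sketch: the helper names are hypothetical, and it assumes each article entry in the notification begins with a line starting with "Article:", as in the sample above.

```python
import re

# Assumption: every article entry starts with a line of the form
# "Article: <article name>", as in the sample notification.
ARTICLE_LINE = re.compile(r"^Article:\s*(\S+)", re.MULTILINE)


def extract_article_ids(notification: str) -> list[str]:
    """Return the article names found on 'Article:' lines, in order."""
    return ARTICLE_LINE.findall(notification)


def build_get_request(article_ids: list[str]) -> str:
    """Build a message body holding one GET command followed by END."""
    if not article_ids:
        raise ValueError("at least one article name is required")
    return "get " + " ".join(article_ids) + "\nend\n"


sample = """Subscription 1: online information services

Article: misc.activism.progressive.11965
From: hn0003@handsnet.org
Score: 100

Article: ca.politics.38420
Score: 100
"""

ids = extract_article_ids(sample)
print(build_get_request(ids))
```

The resulting text would be sent as the body of a message to netnews@db.stanford.edu.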

To remove individual profiles, the user may use the CANCEL command discussed below. To remove all profiles, the user may use the UNSUBSCRIBE command, as follows:

mail netnews@db.stanford.edu
Subject: you can leave this blank
user joe@cs.oceanview.edu
unsubscribe
end

Here the user needs to specify what email address the profiles are associated with, using the USER command. This email address is the one from which the user submits her subscribe requests.

( C . 2 ) H o w t o P i c k a G o o d P r o f i l e

As mentioned already, Netnews is a very diverse source of information. To effectively filter the information, the user must be careful in choosing the profile. Say a user is interested in traveling and she submits a profile consisting of just one word "travel." This is too broad a topic and the profile is likely to match many irrelevant articles that just happen to have an occurrence of the word. If she describes her interests in more detail, such as "travel hawaii," the results will be far better.

The service supports two kinds of profile: boolean and weighted. The default mode is boolean. In this mode, you specify words that must appear in articles received. For example, the profile "travel hawaii" in the boolean mode matches articles that contain both words "travel" and "hawaii." The number of occurrences of the words is not considered. In the boolean mode, you can also use the "not" operator to screen out articles. For example, say a user is interested in underwater fishing but does not want to receive anything from the alt.* newsgroups. She can send the profile "underwater fishing not alt."
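The boolean rule can be pictured with a small Python sketch. It is not the service's implementation: the function names are made up, and the tokenization (lowercased runs of letters and digits, applied to headers and body alike) is an assumption chosen so that a profile such as "not alt" can screen on the newsgroup name.

```python
import re


def parse_boolean_profile(profile: str) -> tuple[set[str], set[str]]:
    """Split a profile at the word 'not' into (required, excluded) words."""
    words = profile.lower().split()
    if "not" in words:
        i = words.index("not")
        return set(words[:i]), set(words[i + 1:])
    return set(words), set()


def matches(profile: str, article_text: str) -> bool:
    """True if every required word occurs and no excluded word occurs.

    How often a word occurs is deliberately ignored, as in boolean mode.
    """
    required, excluded = parse_boolean_profile(profile)
    # Assumption: words are runs of letters/digits, compared case-insensitively.
    tokens = set(re.findall(r"[a-z0-9]+", article_text.lower()))
    return required <= tokens and not (excluded & tokens)


print(matches("travel hawaii", "Cheap travel deals to Hawaii this spring"))
print(matches("underwater fishing not alt",
              "Newsgroups: alt.fishing\nUnderwater fishing gear for sale"))
```

Under these assumptions the first call matches, while the second is rejected because "alt" appears in the newsgroup line.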

The weighted mode provides some kind of "similarity" matching. You specify a number of words; for each article, a score is computed based on the number of occurrences of these words in the article. An article with a score higher than a certain threshold is returned.

In the weighted mode, it is important to set the threshold appropriately. A good way to pick the appropriate threshold is to use the test run (for WWW) or search (for email) function. This will perform a search over the previous day's news articles, and by looking at the scores of the articles returned, you can decide what the minimum score for an article to be relevant should be. A basic rule of thumb is that if you set a high threshold (say 90), then only articles that contain most of the words in the profile will be returned. If you set a low threshold (say 10), then articles that match any of the words in the profile will be returned. The default threshold is 60.
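To make the role of the threshold concrete, here is a toy Python model of weighted filtering. The text does not give the service's scoring formula, so everything here is an assumption made for illustration: the raw score is simply the total number of occurrences of the profile words, scores are rescaled so the best article in the batch gets 100, and the function names are hypothetical.

```python
import re
from collections import Counter


def raw_score(profile_words: set[str], text: str) -> int:
    """Count how many times any profile word occurs in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[w] for w in profile_words)


def filter_weighted(profile: str, articles: dict[str, str],
                    threshold: int = 60) -> list[tuple[str, int]]:
    """Return (article id, score) pairs scoring at least `threshold`.

    Toy model only: scores are scaled so the top article gets 100.
    """
    words = set(profile.lower().split())
    raw = {aid: raw_score(words, text) for aid, text in articles.items()}
    best = max(raw.values(), default=0)
    if best == 0:
        return []  # no article mentions any profile word
    scored = [(aid, round(100 * r / best)) for aid, r in raw.items()]
    kept = [(aid, s) for aid, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])


articles = {
    "a.1": "warriors warriors nba golden state",
    "a.2": "nba scores tonight",
    "a.3": "gardening tips",
}
print(filter_weighted("nba golden state warriors", articles, threshold=60))
print(filter_weighted("nba golden state warriors", articles, threshold=10))
```

With the default-like threshold of 60 only the first article survives in this toy model; lowering the threshold to 10 also lets through the article that matches a single profile word.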

( D ) U S E R M E S S A G E F O R M A T

User messages should be sent to netnews@db.stanford.edu. The subject field of the message is ignored. Each message is a request to the service. Each request consists of a number of commands. Each command must start on a new line with no leading spaces. Continuation lines begin with a space or a tab. All commands are case-insensitive.
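The format rules above can be sketched as a small Python parser. This is a hypothetical reading of the rules, not the server's code: the function name is made up, and skipping blank lines and leaving argument case untouched are assumptions. Stopping at END follows the description of that command later in this document.

```python
def parse_request(body: str) -> list[tuple[str, list[str]]]:
    """Split a request body into (command, arguments) pairs."""
    commands: list[tuple[str, list[str]]] = []
    for line in body.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if line[0] in " \t":
            # Continuation line: its words extend the previous command.
            if commands:
                commands[-1][1].extend(line.split())
            continue
        name, *args = line.split()
        name = name.lower()  # commands are case-insensitive
        if name == "end":
            break  # nothing after END (e.g. a signature) is processed
        commands.append((name, args))
    return commands


body = (
    "SUBSCRIBE food recipe\n"
    "\tnot fish\n"
    "Lines 10\n"
    "expire 200\n"
    "end\n"
    "-- \n"
    "Joe's signature\n"
)
for command in parse_request(body):
    print(command)
```

Here the tab-indented second line is folded into the SUBSCRIBE command, and the signature after "end" never reaches the command list.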

Requests are associated with the return address of the user message. Service replies and deliveries will be sent to that address.

The usages of the commands are as follows.

( D . 1 ) S u b s c r i b i n g

To subscribe for articles, use these commands:

SUBSCRIBE word word ...
    Subscribe for articles relevant to the profile specified by <word>'s. Two types of profiles are supported: weighted and boolean. For boolean profiles, you first specify words that must all appear in a matching article, and then words that must not be in, separated by the word "not." For example,

subscribe food recipe not fish

You may also skip the "not" portion, e.g.,

subscribe world cup soccer

For weighted profiles, just type in some plain text that describes your interest, such as:

subscribe nba golden state warriors

News articles will be given a score based on the number of occurrences of the profile words, and articles whose scores are higher than the threshold (see below) will be returned to you.

The SUBSCRIBE command may be optionally followed by TYPE, LINES, PERIOD, EXPIRE, and THRESHOLD commands.

TYPE type
    (Optional - default boolean) Specify <type> as the type of the profile. <type> must be either the string "weighted" or "boolean."

LINES lines
    (Optional - default 20 lines) Specify the number of lines from the beginning of an article to be included in the news article abstract sent to you. <lines> must be an integer from 1 to 60.

PERIOD period
    (Optional - default 1 day) Specify <period> as the period between notifications (in days).

EXPIRE days
    (Optional - default 9999 days) Specify <days> as the length (in days) for which the subscription is valid.

THRESHOLD score
    (Optional - default 60) Only applicable for weighted profiles. Specify <score> as the minimum score for an article to be relevant. The most relevant article is given a score of 100. <score> must be an integer from 1 to 100.

For example, to request a subscription for articles related to "food recipe" but not "fish," that is valid for 200 days and gives 10 lines in the news article excerpts, send this:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
subscribe food recipe not fish
lines 10
expire 200
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The service will acknowledge your subscription, returning a subscription identifier (sid).
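A client-side helper could check the documented option ranges before a request is mailed. The Python sketch below is only an illustration: the function name and its keyword arguments are hypothetical, the requirement that PERIOD and EXPIRE be at least 1 is an assumption (the text gives no range for them), and options left as None are simply omitted so the service defaults apply.

```python
from typing import Optional, Sequence


def build_subscribe_request(
    words: Sequence[str],
    profile_type: Optional[str] = None,
    lines: Optional[int] = None,
    period: Optional[int] = None,
    expire: Optional[int] = None,
    threshold: Optional[int] = None,
) -> str:
    """Return a request body for SUBSCRIBE plus any options given."""
    if not words:
        raise ValueError("a profile needs at least one word")
    out = ["subscribe " + " ".join(words)]
    if profile_type is not None:
        if profile_type not in ("weighted", "boolean"):
            raise ValueError("type must be 'weighted' or 'boolean'")
        out.append(f"type {profile_type}")
    if lines is not None:
        if not 1 <= lines <= 60:
            raise ValueError("lines must be an integer from 1 to 60")
        out.append(f"lines {lines}")
    if period is not None:
        if period < 1:  # assumption: at least one day
            raise ValueError("period must be at least 1 day")
        out.append(f"period {period}")
    if expire is not None:
        if expire < 1:  # assumption: at least one day
            raise ValueError("expire must be at least 1 day")
        out.append(f"expire {expire}")
    if threshold is not None:
        if profile_type != "weighted":
            raise ValueError("threshold applies only to weighted profiles")
        if not 1 <= threshold <= 100:
            raise ValueError("threshold must be an integer from 1 to 100")
        out.append(f"threshold {threshold}")
    out.append("end")  # stops processing before any signature
    return "\n".join(out) + "\n"


print(build_subscribe_request(["food", "recipe", "not", "fish"],
                              lines=10, expire=200))
```

The call shown reproduces the "food recipe" request above, with a trailing END added.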

( D . 2 ) G e t t i n g A r t i c l e s

After receiving notifications of articles that may be relevant to your interests, you may decide to see an article in its entirety. You can get the whole article with the GET command:

GET article article ...
    Get the articles specified (by their article ids).

For example,

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject:
get news.announce.conferences.3670
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

( D . 3 ) R e l e v a n c e F e e d b a c k

After reading the articles, you may find some that you like. You can provide feedback using these commands:

FEEDBACK sid
    Provide feedback to subscription <sid>.

LIKE article article ...
    Specify relevant article(s) by their ids.

For example, this message says that article news.announce.conferences.3670 is relevant to subscription 1:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
feedback 1
like news.announce.conferences.3670
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

With feedback information, the service may be able to better match future articles against your subscriptions.
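For readers who script their mail, a feedback request might be assembled with Python's standard email package, as in the sketch below. The function name is hypothetical and no Subject header is set, since the service ignores the subject field. Only the message is built here; actually sending it depends on your local mail setup.

```python
from email.message import EmailMessage

SERVICE_ADDRESS = "netnews@db.stanford.edu"  # address given in this document


def build_feedback_message(sender: str, sid: int,
                           article_ids: list[str]) -> EmailMessage:
    """Build a FEEDBACK/LIKE request for subscription `sid`."""
    if not article_ids:
        raise ValueError("at least one article id is required")
    msg = EmailMessage()
    # Requests are associated with the return address of the message.
    msg["From"] = sender
    msg["To"] = SERVICE_ADDRESS
    body = f"feedback {sid}\nlike {' '.join(article_ids)}\nend\n"
    msg.set_content(body)
    return msg


msg = build_feedback_message("joe@cs.oceanview.edu", 1,
                             ["news.announce.conferences.3670"])
print(msg.get_content())
```

The message could then be handed to whatever mail transport is available, for example smtplib's `SMTP.send_message`, assuming an SMTP server you are allowed to use.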

( D . 4 ) M a n a g i n g S u b s c r i p t i o n s

You can manage your subscriptions with these commands:

UPDATE sid
    Update subscription with id <sid>. Must be followed by one or more of PERIOD, EXPIRE, THRESHOLD, LINES, TYPE (see D.1.), or PROFILE commands to specify the parameter(s) to be updated.

PROFILE word word ...
    Specify the new profile for the UPDATE command.

CANCEL sid
    Cancel subscription <sid>.

LIST
    List all your subscriptions.

For example, to update the period and the threshold of a subscription:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
update 3
period 1
threshold 60
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

( D . 5 ) S e a r c h F o r P a s t A r t i c l e s

Besides providing the subscription service, the service also allows you to search for recent articles that are already in the database:

SEARCH word word ...
    Do a search of the database with the given query. May optionally be followed by a THRESHOLD command to specify the minimum score for an article to be retrieved.

For example, to search for articles related to "information filtering":

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you may leave this blank
search information filtering
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

( D . 6 ) U n s u b s c r i b i n g

To remove all profiles, the user may use the USER and UNSUBSCRIBE commands:

USER address
    Specify the email address with which the profiles are associated.

UNSUBSCRIBE
    Delete all profiles for the specified email address.

For example,

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
mail netnews@db.stanford.edu
Subject: you can leave this blank
user joe@cs.oceanview.edu
unsubscribe
end
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

( D . 7 ) O t h e r C o m m a n d s

END
    End the request message. Useful for preventing the processing of signatures.

HELP
    Get help information on the server.

( E ) F r e q u e n t l y A s k e d Q u e s t i o n s

(1) Is the service free?
    Yes, it's absolutely FREE!

(2) How do I access the service?
    Email access: send a message to netnews@db.stanford.edu with the word "help" in the message body.
    WWW access: URL http://sift.stanford.edu

(3) Is there any similar service you know of?
    There is a companion service at elib@cs.stanford.edu for filtering computer science technical reports. A search server is also available at URL http://elib.stanford.edu.

(4) Can I tell my friends about the service?
    Yes, please do. The server should still be able to handle more subscriptions :-) The current number of subscriptions is 9800+.

(5) How to prevent the processing of "signatures" at the end of email requests?
    I have added a new command "end." You can include the word "end" on a line by itself at the end of the request message (before the signature). (Also see D.7 above.)

(9) Any papers written on the server?
    A paper is available by anonymous ftp at ftp://db.stanford.edu/sift/sift.ps
    Some related papers on information filtering are available at ftp://db.stanford.edu/pub/yan

(10) What newsgroups are covered?
    adass alt atl aus ba bionet bit biz ca calstate can chi comp control ee fj fl general gnu ieee in.coming info junk misc ne news no nz ont out.going phl rec sci scruz soc talk test trial triangle tx u3b ucb uiuc uk vmsnet za

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
NetNews Filtering Server
netnews@db.stanford.edu
For help information, send email with word 'help' in message body

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



