Filtering in tin

0. Status

This is an overview of the new filtering capabilities of tin. This document
will be absorbed into the main documentation at some point.

1. Introduction

Tin's filtering mechanism has changed significantly since version 1.3beta.
Originally there were only two possibilities:

1) kill an article matching a rule.
2) hot-select an article matching a rule.

This led to constant confusion, as it seemed important which rule came
first in the filter file, but it wasn't. Also, once an article was selected
for whatever reason, it couldn't be killed, even if it was Craig Shergold
telling you how to make money fast in a crosspost to alt.test.

This binary concept isn't modern anyway, so a much more up-to-date fuzzy
mechanism was necessary: scoring.

When using tin's new scoring mechanism you assign a "score" to each filter
rule. The scores of all rules matching the current article are added up, and
the final score of the article decides whether it is regular, marked hot or
killed.

The standard "kill" and standard "select" rules already in your filter-file
have the scores SCORE_KILL and SCORE_SELECT respectively (see section 4).

2. Changes to the filter-file format

Tin now understands the additional "score" command in the filter-file.

Old style rule:

scope=*
type=0
case=0
subj=*$$$*

New style rule:

group=*
type=0
case=0
score=-100
subj=*$$$*
#####

So you can give each individual rule a weight, based on your opinion of
the rule. E.g. if you want to be sure never to read a certain individual
again, you may give the rule a score of (-)9000.

The sign is not important for now; the direction (+ or -) is still
determined by the "type" of a rule. So the following

group=*
type=0
case=0
score=100
subj=*$$$*
#####

is equivalent to the rule above for now. When tin writes its filter-file,
the signs are rewritten to match the types. This is intended for backwards
*and* future compatibility. A future version of tin will delete the "type"
line and write only the score.
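
As a rough illustration of the scoring described in sections 1 and 2, the
following C sketch adds up the scores of all matching rules and classifies
the article. It is not taken from tin's sources: the types, the function
names (effective_score, classify) and the inclusive comparison against the
limits are assumptions made for this example. The constants use the values
recommended in section 4.

```c
#include <stddef.h>

/* Values recommended in section 4; the real definitions live in tin.h. */
#define SCORE_MAX       10000
#define SCORE_DEFAULT   100
#define SCORE_KILL      (-SCORE_DEFAULT)
#define SCORE_SELECT    SCORE_DEFAULT
#define SCORE_LIM_KILL  (-50)
#define SCORE_LIM_SEL   50

/* Hypothetical types for this sketch, not tin's own data structures. */
enum rule_type { RULE_KILL = 0, RULE_SELECT = 1 };  /* mirrors type=0 / type=1 */
enum art_state { ART_KILLED, ART_REGULAR, ART_HOT };

struct rule {
    enum rule_type type;
    int score;    /* as written in the filter file; the sign is ignored */
    int matches;  /* nonzero if the rule matched the current article */
};

/* The rule type decides the direction; only the magnitude of score counts. */
static int effective_score(const struct rule *r)
{
    int mag = r->score < 0 ? -r->score : r->score;

    if (mag > SCORE_MAX)
        mag = SCORE_MAX;
    return r->type == RULE_KILL ? -mag : mag;
}

/* Sum the scores of all matching rules, clamp, then compare with the limits.
 * Assumption: reaching a limit exactly already counts as crossing it. */
static enum art_state classify(const struct rule *rules, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (rules[i].matches)
            total += effective_score(&rules[i]);

    if (total > SCORE_MAX)
        total = SCORE_MAX;
    if (total < -SCORE_MAX)
        total = -SCORE_MAX;

    if (total <= SCORE_LIM_KILL)
        return ART_KILLED;
    if (total >= SCORE_LIM_SEL)
        return ART_HOT;
    return ART_REGULAR;
}
```

With a select rule of 100 and a kill rule of 9000 both matching, the sum is
-8900 and the article is killed, which the old binary kill/select scheme
could not express.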

If you want only "classical" filtering and don't want to mess around with
score values, you can use the magic words "kill" and "hot" as score values
in your filter file. Example:

group=*
type=0
case=0
score=kill
subj=*$$$*
#####

These are handled as default values at program initialization time and may
be somewhat easier to remember.

You might have noticed from the examples above that tin now inserts a line
of hashes between two rules. This is *not* required, it just improves
readability.

3. Changes in the filter menu

The on-screen filter menu is now more compact and fits easily on small
terminals such as a small xterm or a 640x200 CON: window. It has been
enhanced to allow you to enter a score for the rule you are adding. The
score should be in the range from 1 to SCORE_MAX (see section 4); otherwise
it defaults to SCORE_DEFAULT (see section 4).

4. Internal defaults

There are some constants defined in tin.h; search for SCORE in your
favorite editor.

SCORE_MAX is the maximum score an article can reach. Any value above this
is cut to SCORE_MAX; the same goes for negative scores.
recommended: 10000

SCORE_DEFAULT is the default score, also used for quick kills and selects.
recommended: 100

SCORE_KILL is the default score given to any kill rule if no other is
specified.
recommended: -SCORE_DEFAULT

SCORE_SELECT is the default score given to any auto-selection rule if no
other is specified.
recommended: SCORE_DEFAULT

SCORE_LIM_KILL and SCORE_LIM_SEL are the limits that must be crossed to
mark an article as killed or selected.
WARNING: These will only be used as defaults in future releases; the limits
will be configurable at runtime.
recommended when used with values given above: -50/+50.

5. Overview of "filter"-commands

Everything here is also described in the file ~/.tin/filter, albeit more
concisely.

All lines are of the form:
command=value

Valid "command"s are:

scope selection:

  scope=pattern
  group=newsgroup
    was the "old" format.
    It is still understood when tin reads the filter-file. The
    subdivision into local and global filters was removed, so when tin
    writes the file, the entry is rewritten to:

  group=newsgroup_pattern_list
    newsgroup_pattern_list is a comma-separated list of
    newsgroup_patterns. A newsgroup_pattern can be a pattern
    (wildmat-style) or !pattern, negating the match of pattern. This is
    the same format used for the AUTO(UN)SUBSCRIBE environment variable.
    Tin doesn't rework your filter file; the new pattern matching is only
    used when you enter new entries by hand.

additional info:

  type=num                num: 0=kill, 1=autoselect
  case=num                num: 0=casesensitive, 1=caseinsensitive
  score=num               num: score value of rule; can now also be one of
                          the magic words "kill" or "hot", which are
                          equivalent to SCORE_KILL and SCORE_SELECT
                          respectively.
  time=num                num: time_t value; when rule expires.

matches:                  matched to:

  subj=pattern            Subject:
  from=pattern            From:
  msgid=pattern           Message-Id: *AND* full References:
  msgid_last=pattern      Message-Id: *AND* last References: entry only
  msgid_only=pattern      Message-Id:
  refs_only=pattern       References: line (e.g. <123@ether.net>) without
                          Message-Id:
  lines=num               Lines: ; <num matches less than, >num matches
                          more than.
  xref=pattern            Xref: ; filter crossposts to groups matching
                          pattern
  xref_max=num            } Xref: ; potentially obsolete, collects
  xref_score=num,pattern  } scores for crossposting in other
                          } newsgroups

When you are using wildmat pattern-matching, patterns in ~/.tin/filter
should be delimited with "*"; verbatim wildcards in patterns must be
escaped with "\". When using the built-in filter-file functions, tin tries
to take care of this by itself, except when you are entering text in the
built-in kill/hot menu. There you have to quote manually, because tin
doesn't know whether e.g. "\[" is already quoted or not.

6. EXAMPLES

6.1 WILDMAT EXAMPLES

none given, too simple, find out yourself ,-)

6.2 REGEXP EXAMPLES

Be sure to change the Wildcard setting from WILDMAT (default) to REGEX to
make the following examples work properly.
This can be done using the internal configuration menu or in the file
~/.tin/tinrc.

### this kills all articles about CNews, DNEWS or diablo
### in news.software.* but not in news.software.readers
group=news.software.*,!news.software.readers
type=0
case=1
score=kill
subj=([cd]news|diablo)

### this should mark all articles about tin, rtin, tind, ktin or cdtin as hot
group=*
type=1
case=1
score=hot
subj=\b((cd)|[rk]?)?tin(d|pre)?[-\.0-9]*\b

### mark own articles and followups to own articles as hot in all groups
### except local ones
### match From: (a bit complex) and/or
### Message-ID: (I'm the only user who's posting on this server)
group=*,!akk.*,!tin.*
type=1
case=1
score=hot
from=urs@(.*\.)?((akk\.uni-karlsruhe|dana)\.de|(karlsruhe|tin|akk)\.org|ka\.nu|argh\.net)
msgid=@akk3\.akk\.uni-karlsruhe\.de>

### stupid ppl. sometimes read control.cancel to see if there are any
### forged cancels around... the next rule helps you a bit
### ignore known despammers and net.* cancels
group=control.cancel
type=0
case=1
score=kill
from=(news@news\.msfc\.nasa\.gov|clewis@ferret\.ocunix\.on\.ca|jem@xpat\.com|(jeremy|lysander)@exit109\.com|howardk@iswest\.com|cosmo.roadkill.*rauug\.mil\.wi\.us|spamless@pacbell\.net|cwilkins@.*\.clark\.net)

### msgid_only= : only f'ups to own articles keep marked hot
group=*
type=0
case=1
score=-200
msgid_only=doeblitz\.ts\.rz\.tu-bs\.de

7. TODO

- make the time value in the filter file more human readable.
- convert SCORE_LIM_* to global variables
- rewrite filtering order to get optimal performance
- filtering on arbitrary header lines
- convert .tin/filter, make "type" obsolete, maybe completely new file format
- throw out "xref*" when it is obsolete for sure
- maybe autoconf'able values?