fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_br

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
          extended by fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
              extended by fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_br
All Implemented Interfaces:
Analyzer, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, java.io.Serializable, org.apache.excalibur.xml.sax.XMLizable

public class Analyzer_br
extends DefaultAnalyzer

Analyzer for the Brazilian Portuguese language. Supports an external list of stopwords (words that will not be indexed at all) and an external list of exclusions (words that will not be stemmed, but will still be indexed).
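The difference between the two lists can be illustrated with a small, self-contained sketch. It is not the Analyzer_br code: the class WordListDemo is hypothetical, whitespace splitting stands in for the real tokenizer, and toyStem (which only strips a trailing "s") stands in for a real Brazilian Portuguese stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the stopword and exclusion lists.
// Not the Analyzer_br implementation.
class WordListDemo {

    // Toy stand-in for a real stemmer (assumption): drops a trailing "s".
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    static List<String> analyze(String text, Set<String> stopWords, Set<String> exclusions) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // stopword: dropped, so it is never indexed
            }
            // exclusion: indexed unchanged; any other word is stemmed
            terms.add(exclusions.contains(token) ? token : toyStem(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("os", "de"));
        Set<String> excl = new HashSet<>(Arrays.asList("lucas"));
        System.out.println(analyze("os livros de lucas", stop, excl)); // [livro, lucas]
    }
}
```

Without the exclusion entry, "lucas" would have been indexed by this toy stemmer as "luca".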

Version:
$Id: BrazilianAnalyzer.java,v 1.0 2001/02/13 21:29:04
Author:
João Kramer
See Also:
Serialized Form

Field Summary
protected static java.lang.String ANALYZER_TYPE
           
static java.lang.String[] BRAZILIAN_STOP_WORDS
          List of typical Brazilian Portuguese stopwords.
 
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
ATTRIBUTE_EXCLUDE_STEMS, ATTRIBUTE_USE_STOP_WORDS, DEFAULT_STOP_WORDS, EXCLUDE_STEM_ELEMENT, EXCLUDE_STEMS_ELEMENT, excludeTable, stopTable
 
Fields inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
logger
 
Constructor Summary
Analyzer_br(java.io.File stopwords)
          Builds an analyzer with the given stop words.
Analyzer_br(java.util.Hashtable stopwords)
          Builds an analyzer with the given stop words.
Analyzer_br(java.lang.String[] stopwords)
          Builds an analyzer with the given stop words.
 
Method Summary
protected  java.lang.String getAnalyzerType()
           
 void setStemExclusionTable(java.io.File exclusionlist)
          Builds an exclusion list from the words contained in the given file.
 void setStemExclusionTable(java.util.Hashtable exclusionlist)
          Builds an exclusion list from a Hashtable.
 void setStemExclusionTable(java.lang.String[] exclusionlist)
          Builds an exclusion list from an array of Strings.
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
          Creates a TokenStream which tokenizes all the text in the provided Reader.
 
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.DefaultAnalyzer
buildExcludeTable, buildStopTable, configure, getDefaultStopWords
 
Methods inherited from class fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
enableLogging, toSAX
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface fr.gouv.culture.sdx.search.lucene.analysis.Analyzer
tokenStream
 

Field Detail

ANALYZER_TYPE

protected static final java.lang.String ANALYZER_TYPE
See Also:
Constant Field Values

BRAZILIAN_STOP_WORDS

public static final java.lang.String[] BRAZILIAN_STOP_WORDS
List of typical Brazilian Portuguese stopwords.

Constructor Detail

Analyzer_br

public Analyzer_br(java.lang.String[] stopwords)
Builds an analyzer with the given stop words.


Analyzer_br

public Analyzer_br(java.util.Hashtable stopwords)
Builds an analyzer with the given stop words.


Analyzer_br

public Analyzer_br(java.io.File stopwords)
            throws java.io.IOException
Builds an analyzer with the given stop words.
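The three constructors accept the same stopword list in different containers. A minimal sketch of how the String[] and Hashtable forms could be normalized into a single lookup set is shown here; StopTableSketch and its methods are hypothetical names, not part of SDX or Lucene, and treating a Hashtable by its keys alone is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// Hypothetical normalization of the String[] and Hashtable constructor
// arguments into one set; not the Analyzer_br source.
class StopTableSketch {

    static Set<String> fromArray(String[] words) {
        // Duplicates collapse; order is irrelevant for membership tests.
        return new HashSet<>(Arrays.asList(words));
    }

    // Assumption: only the keys of the Hashtable are significant.
    static Set<String> fromHashtable(Hashtable<String, ?> table) {
        return new HashSet<>(table.keySet());
    }

    public static void main(String[] args) {
        Hashtable<String, String> legacy = new Hashtable<>();
        legacy.put("para", "para");
        legacy.put("com", "com");
        Set<String> fromArr = fromArray(new String[] {"para", "com", "para"});
        System.out.println(fromArr.equals(fromHashtable(legacy))); // true
    }
}
```

Reducing every input to one set keeps the per-token check a single constant-time lookup, whichever constructor was used.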

Method Detail

setStemExclusionTable

public void setStemExclusionTable(java.lang.String[] exclusionlist)
Builds an exclusion list from an array of Strings.


setStemExclusionTable

public void setStemExclusionTable(java.util.Hashtable exclusionlist)
Builds an exclusion list from a Hashtable.


setStemExclusionTable

public void setStemExclusionTable(java.io.File exclusionlist)
                           throws java.io.IOException
Builds an exclusion list from the words contained in the given file.

Throws:
java.io.IOException
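The File overloads of this method and of the constructor throw java.io.IOException because they read the word list from disk. The file format is not documented on this page; the sketch below assumes one word per line in UTF-8, with blank lines ignored, and WordFileSketch.load is a hypothetical helper rather than the SDX implementation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader for a word-list file. The format (one word per line,
// UTF-8, blank lines ignored) is an assumption, not documented SDX behavior.
class WordFileSketch {

    static Set<String> load(File file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file.toPath(), StandardCharsets.UTF_8)) {
            String word = line.trim(); // tolerate stray surrounding spaces
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("exclusions", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "lucas\n\n  brasil \n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(tmp).size()); // 2
    }
}
```

Stating the charset explicitly matters for Portuguese word lists, since accented words would otherwise depend on the platform's default encoding.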

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                                java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.

Specified by:
tokenStream in interface Analyzer
Overrides:
tokenStream in class DefaultAnalyzer
Returns:
A TokenStream built from a StandardTokenizer filtered with StandardFilter, StopFilter, GermanStemFilter and LowerCaseFilter.
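The order of the filters matters. The following sketch chains plain-Java stand-ins in the order listed for the returned TokenStream: tokenize, remove stopwords, stem (skipping exclusions), then lowercase. All names are hypothetical: PipelineSketch does not use Lucene classes, the tokenizer simply splits on non-letter, non-digit characters, StandardFilter is omitted, and toyStem only strips a trailing "s" in place of a real stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the documented filter chain; not Lucene code.
class PipelineSketch {

    // Simplified tokenizer: split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy stemmer (assumption): drops a trailing "s" from longer words.
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    static List<String> analyze(String text, Set<String> stop, Set<String> excl) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stop.contains(token)) {
                continue; // 1. stop filter (sees the token's original case)
            }
            String t = excl.contains(token) ? token : toyStem(token); // 2. stemming
            out.add(t.toLowerCase(Locale.ROOT)); // 3. lowercasing runs last
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("os", "de"));
        Set<String> excl = new HashSet<>(Arrays.asList("lucas"));
        System.out.println(analyze("os livros de lucas", stop, excl)); // [livro, lucas]
        System.out.println(analyze("Os livros de Lucas", stop, excl)); // [os, livro, luca]
    }
}
```

In this sketch, "Os" and "Lucas" slip past the lowercase lists because lowercasing runs after the lookups. Whether Analyzer_br behaves the same depends on whether its stop and exclusion lookups are case-sensitive, which this page does not state.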

getAnalyzerType

protected java.lang.String getAnalyzerType()
Overrides:
getAnalyzerType in class DefaultAnalyzer


Copyright © 2000-2003 Ministère de la culture et de la communication / AJLSM. All Rights Reserved.