fr.gouv.culture.sdx.search.lucene.analysis
Class Analyzer_ar
java.lang.Object
org.apache.lucene.analysis.Analyzer
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
fr.gouv.culture.sdx.search.lucene.analysis.Analyzer_ar
- All Implemented Interfaces:
- Analyzer, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, java.io.Serializable, org.apache.excalibur.xml.sax.XMLizable
- public final class Analyzer_ar
- extends AbstractAnalyzer
Analyzer for the Arabic language. This analyzer uses Tim Buckwalter's algorithm
(available from the LDC Catalog) to identify the morphological category of Arabic tokens.
The set of semantically relevant categories is not yet final, but the current list gives
good results.
The final tokens are romanized, canonical forms of the words.
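The romanization scheme is not named on this page. Buckwalter's analyzer stores its lexicon in Buckwalter transliteration, so the following is a minimal sketch that assumes the tokens use that scheme. It maps each Arabic letter, the tatweel, and each diacritic to one ASCII symbol and passes all other characters through unchanged. The class and method names are hypothetical and are not part of SDX. Any further canonicalization the analyzer performs is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of Buckwalter transliteration (Arabic script -> ASCII).
// Assumption: the romanized tokens follow this scheme. BuckwalterSketch and
// transliterate() are hypothetical names, not part of SDX.
final class BuckwalterSketch {

    private static final Map<Character, Character> TABLE = new HashMap<>();

    static {
        // Arabic letters, tatweel and diacritics, in code point order.
        String arabic = "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062A"
                + "\u062B\u062C\u062D\u062E\u062F\u0630\u0631\u0632\u0633\u0634"
                + "\u0635\u0636\u0637\u0638\u0639\u063A\u0640\u0641\u0642\u0643"
                + "\u0644\u0645\u0646\u0647\u0648\u0649\u064A\u064B\u064C\u064D"
                + "\u064E\u064F\u0650\u0651\u0652\u0670\u0671";
        // Buckwalter ASCII symbol for each character above, position by position.
        String latin = "'|>&<}Abpt"
                + "vjHxd*rzs$"
                + "SDTZEg_fqk"
                + "lmnhwYyFNK"
                + "aui~o`{";
        if (arabic.length() != latin.length()) {
            throw new IllegalStateException("transliteration table is misaligned");
        }
        for (int i = 0; i < arabic.length(); i++) {
            TABLE.put(arabic.charAt(i), latin.charAt(i));
        }
    }

    private BuckwalterSketch() {}

    // Replaces each mapped Arabic character with its ASCII symbol;
    // unmapped characters (spaces, digits, Latin letters) pass through unchanged.
    static String transliterate(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(TABLE.getOrDefault(c, c).charValue());
        }
        return out.toString();
    }
}
```

For example, transliterating the word for "book" (كتاب) yields ktAb. Because each character maps to exactly one ASCII symbol, the mapping is reversible, which keeps romanized index terms unambiguous.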
- Author:
- Pierrick Brihaye, 2003
- See Also:
- Serialized Form
Method Summary
Modifier and Type | Method | Description
void | configure(org.apache.avalon.framework.configuration.Configuration configuration) | Configure the glosser.
void | enableLogging(org.apache.avalon.framework.logger.Logger logger) | Passes the superclass logger (super.getLog()) to this class.
protected java.lang.String | getAnalyzerType() |
org.apache.lucene.analysis.TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader) | Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.
Methods inherited from class org.apache.lucene.analysis.Analyzer
tokenStream
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface fr.gouv.culture.sdx.search.lucene.analysis.Analyzer
tokenStream
ANALYZER_TYPE
protected static final java.lang.String ANALYZER_TYPE
- See Also:
- Constant Field Values
Analyzer_ar
public Analyzer_ar()
configure
public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
throws org.apache.avalon.framework.configuration.ConfigurationException
- Configure the glosser.
- Specified by:
configure
in interface org.apache.avalon.framework.configuration.Configurable
- Overrides:
configure
in class AbstractAnalyzer
- Parameters:
configuration
- The configuration object
- Throws:
org.apache.avalon.framework.configuration.ConfigurationException
- If a problem occurs during configuration
enableLogging
public void enableLogging(org.apache.avalon.framework.logger.Logger logger)
- Passes the superclass logger (super.getLog()) to this class.
- Specified by:
enableLogging
in interface org.apache.avalon.framework.logger.LogEnabled
- Overrides:
enableLogging
in class AbstractAnalyzer
- Parameters:
logger
- The logger returned by super.getLog()
tokenStream
public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
java.io.Reader reader)
- Returns a token stream of romanized Arabic words whose morphological categories are found to be semantically relevant.
- Parameters:
reader
- The reader
fieldName
- The field name
- Returns:
- The token stream
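The filtering that tokenStream describes (keep only words whose morphological category is relevant, emit their romanized form) can be sketched as follows. This is a hypothetical illustration, not the SDX implementation. It works on input that is already romanized, uses tiny invented prefix, suffix and stem dictionaries, treats NOUN as the only relevant category, and omits the prefix/stem/suffix compatibility checks that Buckwalter's analyzer also performs. The segmentation length bounds (prefix up to 4 characters, suffix up to 6) are an assumption modeled on Buckwalter's analyzer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: segment each romanized word into prefix + stem + suffix,
// look the stem up in a toy lexicon, and keep stems whose category is relevant.
// All dictionaries and names below are invented for illustration.
final class RelevantStemFilterSketch {

    static final Set<String> PREFIXES = Set.of("", "w", "Al", "wAl");
    static final Set<String> SUFFIXES = Set.of("", "p", "At");
    static final Map<String, String> STEMS = Map.of(
            "ktAb", "NOUN",
            "byt", "NOUN",
            "fy", "PREP");
    // Hypothetical list of semantically relevant categories.
    static final Set<String> RELEVANT = Set.of("NOUN");

    private RelevantStemFilterSketch() {}

    // Returns the relevant stems found in a whitespace-separated romanized text.
    static List<String> relevantStems(String text) {
        List<String> result = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // A word may admit several valid segmentations; keep each distinct stem once.
            Set<String> stems = new LinkedHashSet<>();
            // Assumed bounds: prefix 0-4 chars, stem at least 1 char, suffix 0-6 chars.
            for (int p = 0; p <= Math.min(4, word.length() - 1); p++) {
                for (int s = 0; s <= Math.min(6, word.length() - p - 1); s++) {
                    String prefix = word.substring(0, p);
                    String stem = word.substring(p, word.length() - s);
                    String suffix = word.substring(word.length() - s);
                    String category = STEMS.get(stem);
                    if (PREFIXES.contains(prefix) && SUFFIXES.contains(suffix)
                            && category != null && RELEVANT.contains(category)) {
                        stems.add(stem);
                    }
                }
            }
            result.addAll(stems);
        }
        return result;
    }
}
```

With these toy dictionaries, relevantStems("wAlktAb fy Albyt") returns [ktAb, byt]: the prefixes wAl ("and the") and Al ("the") are stripped, and the preposition fy is dropped because PREP is not in the relevant set.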
getAnalyzerType
protected java.lang.String getAnalyzerType()
- Specified by:
getAnalyzerType
in class AbstractAnalyzer
Copyright © 2000-2003 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.