fr.gouv.culture.sdx.search.lucene.analysis
Class Glosser_ar_en
java.lang.Object
org.apache.lucene.analysis.Analyzer
fr.gouv.culture.sdx.search.lucene.analysis.AbstractAnalyzer
fr.gouv.culture.sdx.search.lucene.analysis.Glosser_ar_en
- All Implemented Interfaces:
- Analyzer, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, java.io.Serializable, org.apache.excalibur.xml.sax.XMLizable
- public final class Glosser_ar_en
- extends AbstractAnalyzer
An English glosser for the Arabic language. This glosser uses Tim Buckwalter's algorithm
(available from the LDC Catalog) to identify the morphological category of Arabic tokens and then returns their glosses.
The set of meaningful morphological categories is still to be determined, but the current list gives
good results.
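The idea described above (look up each token, keep it only if its morphological category is considered meaningful, and emit its gloss) can be illustrated with a minimal, self-contained sketch. This is not Buckwalter's algorithm and not the SDX implementation: the class name GlossSketch, the three-entry lexicon in Buckwalter-style transliteration, and the category names NOUN, VERB and PREP are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of SDX.
class GlossSketch {
    // Toy lexicon (assumption): transliterated token -> {category, English gloss}.
    private static final Map<String, String[]> LEXICON = new HashMap<String, String[]>();
    // Categories treated as semantically meaningful (assumption).
    private static final Set<String> MEANINGFUL = new HashSet<String>();

    static {
        LEXICON.put("ktAb", new String[] {"NOUN", "book"});
        LEXICON.put("byt", new String[] {"NOUN", "house"});
        LEXICON.put("fy", new String[] {"PREP", "in"});
        MEANINGFUL.add("NOUN");
        MEANINGFUL.add("VERB");
    }

    // Returns the glosses of the tokens whose category is in MEANINGFUL.
    // Unknown tokens and tokens in other categories are dropped.
    static List<String> gloss(List<String> tokens) {
        List<String> glosses = new ArrayList<String>();
        for (String token : tokens) {
            String[] entry = LEXICON.get(token);
            if (entry != null && MEANINGFUL.contains(entry[0])) {
                glosses.add(entry[1]);
            }
        }
        return glosses;
    }

    public static void main(String[] args) {
        // The preposition "fy" is filtered out; the two nouns are glossed.
        System.out.println(gloss(Arrays.asList("ktAb", "fy", "byt"))); // prints [book, house]
    }
}
```

In the real class, the category of a token comes from morphological analysis rather than from a direct table lookup, so a single Arabic token may yield several candidate analyses and glosses.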
- Author:
- Pierrick Brihaye, 2003
- See Also:
- Serialized Form
Field Summary |
protected static java.lang.String |
ANALYZER_TYPE
|
static java.lang.String[] |
STOP_WORDS
An array containing some common English words that are usually not
useful for searching. |
Method Summary |
void |
configure(org.apache.avalon.framework.configuration.Configuration configuration)
Configures the glosser. |
void |
enableLogging(org.apache.avalon.framework.logger.Logger logger)
Passes the logger to the class. |
protected java.lang.String |
getAnalyzerType()
|
org.apache.lucene.analysis.TokenStream |
tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a token stream of glosses of Arabic words whose morphological categories are found to be semantically meaningful. |
Methods inherited from class org.apache.lucene.analysis.Analyzer |
tokenStream |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface fr.gouv.culture.sdx.search.lucene.analysis.Analyzer |
tokenStream |
ANALYZER_TYPE
protected static final java.lang.String ANALYZER_TYPE
- See Also:
- Constant Field Values
STOP_WORDS
public static final java.lang.String[] STOP_WORDS
- An array containing some common English words that are usually not
useful for searching.
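How the stop-word array is applied inside the glosser is not documented on this page. As a rough sketch of the usual technique, the following drops any gloss word that appears in a stop list, ignoring case. The class name StopWordSketch and the four-word list are hypothetical; the actual contents of STOP_WORDS are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of SDX.
class StopWordSketch {
    // Stand-in stop list (assumption); the real STOP_WORDS contents may differ.
    static final String[] SAMPLE_STOP_WORDS = {"a", "the", "of", "to"};

    // Returns the words that are not in the stop list, compared case-insensitively.
    static List<String> removeStopWords(List<String> words, String[] stopWords) {
        Set<String> stopSet = new HashSet<String>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<String>();
        for (String word : words) {
            if (!stopSet.contains(word.toLowerCase(Locale.ENGLISH))) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> gloss = Arrays.asList("The", "house", "of", "books");
        System.out.println(removeStopWords(gloss, SAMPLE_STOP_WORDS)); // prints [house, books]
    }
}
```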
Glosser_ar_en
public Glosser_ar_en()
getAnalyzerType
protected java.lang.String getAnalyzerType()
- Specified by:
getAnalyzerType
in class AbstractAnalyzer
configure
public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
throws org.apache.avalon.framework.configuration.ConfigurationException
- Configures the glosser.
- Specified by:
configure
in interface org.apache.avalon.framework.configuration.Configurable
- Overrides:
configure
in class AbstractAnalyzer
- Parameters:
configuration
- The configuration object
- Throws:
org.apache.avalon.framework.configuration.ConfigurationException
- If a problem occurs during configuration
enableLogging
public void enableLogging(org.apache.avalon.framework.logger.Logger logger)
- Passes the logger to the class.
- Specified by:
enableLogging
in interface org.apache.avalon.framework.logger.LogEnabled
- Overrides:
enableLogging
in class AbstractAnalyzer
- Parameters:
logger
- The logger
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
java.io.Reader reader)
- Returns a token stream of glosses of Arabic words whose morphological categories are found to be semantically meaningful.
- Parameters:
fieldName
- The name of the field being analyzed
reader
- The reader
- Returns:
- The token stream
Copyright © 2000-2003 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.