java.lang.Object
fr.gouv.culture.sdx.utils.AbstractSdxObject
fr.gouv.culture.sdx.utils.database.DatabaseBacked
fr.gouv.culture.sdx.documentbase.AbstractDocumentBase
fr.gouv.culture.sdx.documentbase.SDXDocumentBase
fr.gouv.culture.sdx.documentbase.LuceneDocumentBase
A document base within an SDX application.
A document base is a central concept in SDX development. It is where documents are searched and retrieved, and therefore also where they are added (indexed), deleted, or updated. A search cannot target a unit smaller than the document base; to exclude parts of a document base, use query constructions or filters.
A document base has a structure, which is essentially a list of fields. An application may have many document bases, and these may have different structures. Indexable documents (XML, HTML, or the like) with different structures can still be indexed within a single document base.
Most applications will have only one document base, but in some cases more than one is useful. For example, when different kinds of documents are never searched at the same time, separating them into different document bases can speed up searching and indexing.
A document base uses an indexer to index documents. It uses repositories to store the documents, both indexable ones and attached ones (i.e. non-indexable documents, such as images, that logically depend on the indexable documents). An application can get a searcher to perform searches within this document base, possibly together with other document bases.
To work properly, a document base must be instantiated in the following sequence: 1) creation, 2) setting the logger (optional, but recommended for error messages), 3) configuration, 4) initialization.
AbstractSdxObject.enableLogging(org.apache.avalon.framework.logger.Logger), configure(org.apache.avalon.framework.configuration.Configuration), init()
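The instantiation sequence described above can be sketched as follows. This is a minimal, self-contained model of the call order only: `LifecycleSketch`, its string-typed parameters, and the `run()` helper are hypothetical stand-ins, not the SDX or Avalon classes. With the real library, the same order would apply to `enableLogging(Logger)`, `configure(Configuration)`, and `init()` on a `LuceneDocumentBase`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that only models the documented call order. */
class LifecycleSketch {
    private final List<String> steps = new ArrayList<>();
    private boolean configured;
    private boolean initialized;

    // Step 2 (optional): attach a logger before configuring.
    void enableLogging(String loggerName) {
        steps.add("enableLogging");
    }

    // Step 3: configuration must happen before initialization.
    void configure(String configuration) {
        if (initialized) {
            throw new IllegalStateException("configure() called after init()");
        }
        configured = true;
        steps.add("configure");
    }

    // Step 4: initialization fails fast if configuration was skipped.
    void init() {
        if (!configured) {
            throw new IllegalStateException("init() called before configure()");
        }
        initialized = true;
        steps.add("init");
    }

    List<String> steps() {
        return steps;
    }

    // Runs the four documented steps in order and returns the recorded calls.
    static List<String> run() {
        LifecycleSketch base = new LifecycleSketch(); // Step 1: creation
        base.enableLogging("sdx");
        base.configure("<sdx:documentBase/>");
        base.init();
        return base.steps();
    }
}
```

The guard in `init()` is a design choice of this sketch; the source only states that `init()` must be called after configuration, not how a violation is reported.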
Nested Class Summary |
Nested classes inherited from class fr.gouv.culture.sdx.documentbase.SDXDocumentBaseTarget |
SDXDocumentBaseTarget.ConfigurationNode |
Nested classes inherited from class fr.gouv.culture.sdx.documentbase.DocumentBase |
DocumentBase.ConfigurationNode |
Field Summary | |
protected FieldList |
_fieldList
The (Lucene) fields that are to be handled by the index. |
protected java.util.HashMap |
_xmlFieldList
The list of fields with an XML type |
static java.lang.String |
DBELEM_ATTRIBUTE_REMOTE_ACCESS
The implied attribute stating whether this document base is to be exposed to remote access or not. |
static java.lang.String |
ELEMENT_NAME_LUCENE_SDX_INTERNAL_FIELDS
The element used to define system fields in sdx.xconf. |
protected java.lang.String |
INDEX_DIR_CURRENT
Directory names for indexes |
protected java.lang.String |
INDEX_DIR_MAIN
|
protected long |
lastDocCount
Number of documents indexed since the last split |
protected LuceneIndex |
luceneActiveIndex
The active index for this document base |
protected LuceneIndex |
luceneCurrentIndex
The temporary index for this document base |
protected java.util.Vector |
luceneSearchIndexList
The sub-indexes for this document base (first entry is the activeIndex) |
protected java.lang.String |
SEARCH_INDEX_DIRECTORY_NAME
The directory name for the index that stores documents' indexation. |
protected int |
subIndexCount
Number of subindexes |
Fields inherited from class fr.gouv.culture.sdx.utils.database.DatabaseBacked |
_database, CLASS_NAME_SUFFIX, DATABASE_DIR_NAME, databaseConf, dbLocation, dbPath, DEFAULT_DATABASE_TYPE |
Fields inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject |
_context, _description, _encoding, _id, _locale, _logger, _manager, _xmlizable_objects, _xmlLang, isToSaxInitialized |
Fields inherited from interface fr.gouv.culture.sdx.documentbase.DocumentBase |
CLASS_NAME_SUFFIX, PACKAGE_QUALNAME |
Fields inherited from interface fr.gouv.culture.sdx.utils.Encodable |
DEFAULT_ENCODING |
Fields inherited from interface fr.gouv.culture.sdx.utils.save.Saveable |
ALL_SAVE_ATTRIB, PATH_ATTRIB, SAVE_DIRECTORY_PARAM |
Constructor Summary | |
LuceneDocumentBase()
Creates the document base. |
Method Summary | |
protected void |
addSubIndex()
Add a split sub-index and update the configuration afterwards |
protected void |
addSubIndex(LuceneIndex index)
Add a split sub-index and update the configuration afterwards |
protected void |
addToSearchIndex(java.lang.Object indexationDoc,
boolean batchIndex)
Writes a document to the search index |
void |
backup(SaveParameters save_config)
Save the DocumentBase data objects |
protected void |
backupIndexes(SaveParameters save_config)
Save the index files |
protected void |
backupTimeStamp(SaveParameters save_config)
Save the timestamp files |
void |
close()
Close document base |
protected void |
compactSearchIndex()
|
void |
configure(org.apache.avalon.framework.configuration.Configuration configuration)
Sets the configuration options for this document base. |
protected void |
configureDocumentBase(org.apache.avalon.framework.configuration.Configuration configuration)
|
protected void |
configureFieldList(org.apache.avalon.framework.configuration.Configuration configuration)
|
protected void |
configureOAIHarvester(org.apache.avalon.framework.configuration.Configuration configuration)
|
protected void |
configureOAIRepository(org.apache.avalon.framework.configuration.Configuration configuration)
Configure the OAIRepository |
protected void |
configureSearchIndex()
|
OAIRepository |
createOAIRepository()
Creates the OAIRepository for the documentbase, using the older configuration |
OAIRepository |
createOAIRepository(org.apache.avalon.framework.configuration.Configuration configuration)
Creates the OAIRepository for the documentbase |
java.util.Date |
creationDate()
|
void |
delete(Document[] docs,
org.xml.sax.ContentHandler handler)
Overrides the parent method only to add Lucene index optimization |
protected void |
deleteFromSearchIndex(java.lang.String docId)
|
int |
docCount()
TODO - This needs to be periodically written to a .properties file. TODO - We need a configurable generic mechanism to save such information (certain queries, terms, etc.) to a .properties file, which should be updated after indexation/deletion |
protected java.lang.String |
getFormatedSubIndexId(int subIndexNumber)
Get the formatted sub-index number (used for directory names) |
Index |
getIndex()
Gets the Index object for indexing and searching. |
protected java.lang.Object |
getIndexationDocument(IndexableDocument doc,
java.lang.String storeDocId,
java.lang.String repoId,
IndexParameters params)
|
org.apache.lucene.index.IndexReader |
getIndexReader()
Return the index reader for all of this document base's indexes |
protected long |
getIndexSize(LuceneIndex index)
Return the index size |
LuceneIndex |
getLuceneIndex()
|
org.apache.lucene.search.Searcher |
getSearcher()
Return the index searcher for all of this document base's indexes |
java.util.HashMap |
getXMLFieldList()
Returns the list of XML type fields |
void |
index(IndexableDocument[] docs,
Repository repository,
IndexParameters params,
org.xml.sax.ContentHandler handler)
Adds some documents. |
void |
indexModified()
Modifies the last-modification timestamp file |
void |
init()
Initializes the document base. |
protected void |
initializeVectorizedIndex()
Initialize the index vector by searching for all sub-indexes in its directory. NB: working as intended |
protected boolean |
initToSax()
Init the LinkedHashMap _xmlizable_objects with the objects in order to describe them in XML |
protected void |
initVolatileObjectsToSax()
Init the LinkedHashMap _xmlizable_volatile_objects with the objects in order to describe them in XML. Some objects need to be refreshed each time toSAX is called |
java.util.Date |
lastModificationDate()
|
void |
mergeBatch()
Merges a batch of documents (in memory) into the physical index on the file system. |
void |
mergeCurrentBatch()
Merges a batch of documents (in memory) into the physical index on the file system and optimizes it if necessary (depending on the autoOptimize attribute of the current document base) |
void |
optimize()
Optimizes the indexes, repositories, and system databases |
void |
reloadFieldList(java.lang.String appConfString)
Reload the fieldList of an application |
protected void |
removeSubIndex()
Remove a split sub-index and update the configuration afterwards. Currently unused, as there is no plan to do so; kept as a reminder for future functionality |
protected void |
renewKeyIndex()
Refresh data for the main and current indexes |
void |
replaceFieldList(FieldList fieldList)
Replace the current fieldList by the new one |
void |
restore(SaveParameters save_config)
Restore the DocumentBase data objects |
protected void |
restoreIndexes(SaveParameters save_config)
Restore the index files |
protected void |
restoreTimeStamp(SaveParameters save_config)
Restore the timestamp files |
protected IndexParameters |
setBaseParameters(IndexParameters params)
Sets the default pipeline parameters and ensures the params have a pipeline |
protected void |
setSearchIndexParameters(LuceneIndexParameters params)
Sets the search index parameters for indexation performance |
boolean |
splitCheck(boolean currentIndex)
Return true when the splitting conditions are reached; if true, this should be followed by a splitIndex() call |
void |
splitIndex(boolean currentIndex)
Split the current large index into two smaller ones |
Methods inherited from class fr.gouv.culture.sdx.documentbase.AbstractDocumentBase |
addOaiDeletedRecord, configurePipeline, createEntityForDocMetaData, delete, deletePhysicalDocument, getDefaultHitsPerPage, getDefaultMaxSort, getDefaultRepository, getIdGenerator, getIndexationPipeline, getMimeType, getOAIHarvester, getOAIRepository, getPooledRepositoryConnection, getRepository, getSourceValidity, isDefault, isUseMetadata, optimizeDatabase, optimizeRepositories, releasePooledRepositoryConnections, removeOaiDeletedRecord |
Methods inherited from class fr.gouv.culture.sdx.utils.database.DatabaseBacked |
configure, getClassNameSuffix, getDatabase |
Methods inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject |
configureDescription, contextualize, enableLogging, getBaseAttributes, getContext, getDescription, getEncoding, getId, getLocale, getLog, getServiceManager, getXmlLang, service, setDescription, setEncoding, setId, setLocale, setUpSdxObject, setUpSdxObject, setXmlLang, toSAX, verifyConfigurationResources |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface fr.gouv.culture.sdx.utils.SdxObject |
getLog |
Methods inherited from interface org.apache.avalon.framework.logger.LogEnabled |
enableLogging |
Methods inherited from interface org.apache.avalon.framework.context.Contextualizable |
contextualize |
Methods inherited from interface org.apache.avalon.framework.service.Serviceable |
service |
Methods inherited from interface fr.gouv.culture.sdx.utils.Identifiable |
getId, setId |
Methods inherited from interface fr.gouv.culture.sdx.utils.Describable |
getDescription, setDescription |
Methods inherited from interface fr.gouv.culture.sdx.utils.Encodable |
getEncoding, setEncoding |
Methods inherited from interface fr.gouv.culture.sdx.utils.Localizable |
getLocale, getXmlLang, setLocale, setXmlLang |
Methods inherited from interface org.apache.excalibur.xml.sax.XMLizable |
toSAX |
Methods inherited from interface fr.gouv.culture.sdx.search.Searchable |
getId |
Field Detail |
protected java.util.Vector luceneSearchIndexList
protected LuceneIndex luceneActiveIndex
protected LuceneIndex luceneCurrentIndex
protected FieldList _fieldList
protected java.util.HashMap _xmlFieldList
protected int subIndexCount
protected long lastDocCount
protected final java.lang.String INDEX_DIR_CURRENT
protected final java.lang.String INDEX_DIR_MAIN
protected final java.lang.String SEARCH_INDEX_DIRECTORY_NAME
public static final java.lang.String DBELEM_ATTRIBUTE_REMOTE_ACCESS
public static final java.lang.String ELEMENT_NAME_LUCENE_SDX_INTERNAL_FIELDS
Constructor Detail |
public LuceneDocumentBase()
AbstractSdxObject.enableLogging(org.apache.avalon.framework.logger.Logger), configure(org.apache.avalon.framework.configuration.Configuration), init()
Method Detail |
public void configure(org.apache.avalon.framework.configuration.Configuration configuration) throws org.apache.avalon.framework.configuration.ConfigurationException
configure
in interface org.apache.avalon.framework.configuration.Configurable
configure
in class SDXDocumentBase
configuration
- The configuration object from which to build a document base.
Sample configuration entry:
<sdx:documentBase sdx:id = "myDocumentBaseName" sdx:type = "lucene">
    <sdx:fieldList xml:lang = "fr-FR" sdx:variant = "" sdx:analyzerConf = "" sdx:analyzerClass = "">
        <sdx:field code = "fieldName" type = "word" xml:lang = "fr-FR" sdx:analyzerClass = "" sdx:analyzerConf = ""/>
        <sdx:field code = "fieldName2" type = "field" xml:lang = "fr-FR" brief = "true"/>
        <sdx:field code = "fieldName3" type = "date" xml:lang = "fr-FR"/>
        <sdx:field code = "fieldName4" type = "unindexed" xml:lang = "fr-FR"/>
    </sdx:fieldList>
    <sdx:index>
        <sdx:pipeline sdx:id = "sdxIndexationPipeline">
            <sdx:transformation src = "path to stylesheet, can be absolute or relative to the directory containing this file" sdx:id = "step2" sdx:type = "xslt"/>
            <sdx:transformation src = "path to stylesheet, can be absolute or relative to the directory containing this file" sdx:id = "step3" sdx:type = "xslt" keep = "true"/>
        </sdx:pipeline>
    </sdx:index>
    <sdx:repositories>
        <sdx:repository baseDirectory = "blah4" depth = "3" extent = "100" sdx:type = "FS" sdx:default = "true" sdx:id = "blah4"/>
        <sdx:repository ref = "blah2"/>
    </sdx:repositories>
</sdx:documentBase>
org.apache.avalon.framework.configuration.ConfigurationException
we should link to this in the future when we have better documentation capabilities
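To make the shape of the field list in the sample configuration entry concrete, the sketch below reads a trimmed copy of that entry with the JDK's DOM parser and collects each field's `code` and `type`. It is an illustration only: `FieldListSketch` and `fieldTypes` are hypothetical names, and this is not how `configure` itself processes its Avalon `Configuration` argument.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists field codes and types from a sample entry. */
class FieldListSketch {
    // Trimmed copy of the sample entry: only the field list is kept.
    static final String SAMPLE =
          "<sdx:documentBase sdx:id=\"myDocumentBaseName\" sdx:type=\"lucene\">"
        + "<sdx:fieldList xml:lang=\"fr-FR\">"
        + "<sdx:field code=\"fieldName\" type=\"word\" xml:lang=\"fr-FR\"/>"
        + "<sdx:field code=\"fieldName2\" type=\"field\" xml:lang=\"fr-FR\" brief=\"true\"/>"
        + "<sdx:field code=\"fieldName3\" type=\"date\" xml:lang=\"fr-FR\"/>"
        + "<sdx:field code=\"fieldName4\" type=\"unindexed\" xml:lang=\"fr-FR\"/>"
        + "</sdx:fieldList>"
        + "</sdx:documentBase>";

    // Returns field code -> field type, in document order.
    static Map<String, String> fieldTypes(String xml) {
        try {
            // The factory is not namespace-aware by default, so the
            // undeclared "sdx:" prefix is treated as part of the tag name.
            NodeList fields = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("sdx:field");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                result.put(field.getAttribute("code"), field.getAttribute("type"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }
}
```

Calling `FieldListSketch.fieldTypes(FieldListSketch.SAMPLE)` yields one entry per `sdx:field` element, keyed by its `code` attribute.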
protected void configureDocumentBase(org.apache.avalon.framework.configuration.Configuration configuration) throws org.apache.avalon.framework.configuration.ConfigurationException
configureDocumentBase
in class SDXDocumentBase
org.apache.avalon.framework.configuration.ConfigurationException
protected void configureFieldList(org.apache.avalon.framework.configuration.Configuration configuration) throws org.apache.avalon.framework.configuration.ConfigurationException
org.apache.avalon.framework.configuration.ConfigurationException
public void reloadFieldList(java.lang.String appConfString) throws SDXException
appConfString
- The path of the configuration file which contains the new fieldList (e.g. file:///myFiles/application.xconf, cocoon://myApplication/conf/application.xconf)
SDXException
public void replaceFieldList(FieldList fieldList) throws org.apache.avalon.framework.configuration.ConfigurationException
fieldList
- The new fieldList which replaces the old one
org.apache.avalon.framework.configuration.ConfigurationException
protected void configureSearchIndex() throws org.apache.avalon.framework.configuration.ConfigurationException
org.apache.avalon.framework.configuration.ConfigurationException
public OAIRepository createOAIRepository()
createOAIRepository
in interface DocumentBase
createOAIRepository
in class AbstractDocumentBase
public OAIRepository createOAIRepository(org.apache.avalon.framework.configuration.Configuration configuration)
configuration
- The configuration
protected void configureOAIRepository(org.apache.avalon.framework.configuration.Configuration configuration) throws org.apache.avalon.framework.configuration.ConfigurationException
configureOAIRepository
in class SDXDocumentBase
configuration
- The configuration
org.apache.avalon.framework.configuration.ConfigurationException
SDXDocumentBase.configureOAIRepository(org.apache.avalon.framework.configuration.Configuration)
protected void configureOAIHarvester(org.apache.avalon.framework.configuration.Configuration configuration) throws org.apache.avalon.framework.configuration.ConfigurationException
configureOAIHarvester
in class SDXDocumentBase
org.apache.avalon.framework.configuration.ConfigurationException
public void index(IndexableDocument[] docs, Repository repository, IndexParameters params, org.xml.sax.ContentHandler handler) throws SDXException, org.xml.sax.SAXException, org.apache.cocoon.ProcessingException
SDXDocumentBase
index
in interface DocumentBase
index
in class SDXDocumentBase
docs
- The documents to add.
repository
- The repository where to store the documents. If null is passed, the default repository will be used.
params
- The parameters for this adding action.
handler
- A content handler where to send information about the process (may be null)
TODO: what kind of "information"? -pb
SDXException
org.xml.sax.SAXException
org.apache.cocoon.ProcessingException
public void delete(Document[] docs, org.xml.sax.ContentHandler handler) throws SDXException, org.xml.sax.SAXException, org.apache.cocoon.ProcessingException
delete
in interface DocumentBase
delete
in class SDXDocumentBase
docs
- The documents to delete.
handler
- A content handler to feed with information.
SDXException
org.xml.sax.SAXException
org.apache.cocoon.ProcessingException
protected IndexParameters setBaseParameters(IndexParameters params)
setBaseParameters
in class SDXDocumentBase
params
- The params object provided by the user at indexation time
public java.util.HashMap getXMLFieldList()
SDXDocumentBase
getXMLFieldList
in class SDXDocumentBase
public Index getIndex()
public LuceneIndex getLuceneIndex()
protected void setSearchIndexParameters(LuceneIndexParameters params)
params
- The Lucene-specific params to use
protected void addToSearchIndex(java.lang.Object indexationDoc, boolean batchIndex) throws SDXException
addToSearchIndex
in class SDXDocumentBase
indexationDoc
- The Document to add
batchIndex
-
SDXException
protected void deleteFromSearchIndex(java.lang.String docId) throws SDXException
deleteFromSearchIndex
in class SDXDocumentBase
SDXException
protected void compactSearchIndex() throws SDXException
compactSearchIndex
in class SDXDocumentBase
SDXException
protected java.lang.Object getIndexationDocument(IndexableDocument doc, java.lang.String storeDocId, java.lang.String repoId, IndexParameters params) throws SDXException
getIndexationDocument
in class SDXDocumentBase
SDXException
public java.util.Date lastModificationDate()
public java.util.Date creationDate()
public void init() throws SDXException
DocumentBase
This method must be called after the logger has been set and the configuration done.
init
in interface DocumentBase
init
in class SDXDocumentBase
SDXException
protected boolean initToSax()
AbstractSdxObject
initToSax
in class SDXDocumentBase
protected void initVolatileObjectsToSax()
initVolatileObjectsToSax
in class SDXDocumentBase
public void optimize()
optimize
in interface DocumentBase
optimize
in class SDXDocumentBase
public void mergeBatch() throws SDXException
SDXDocumentBase
mergeBatch
in class SDXDocumentBase
SDXException
public void mergeCurrentBatch()
mergeCurrentBatch
in class SDXDocumentBase
public void indexModified()
indexModified
in class SDXDocumentBase
public void splitIndex(boolean currentIndex) throws java.io.IOException, SDXException
splitIndex
in class SDXDocumentBase
java.io.IOException
SDXException
protected void initializeVectorizedIndex() throws org.apache.avalon.framework.configuration.ConfigurationException
org.apache.avalon.framework.configuration.ConfigurationException
protected void addSubIndex()
protected void removeSubIndex()
public boolean splitCheck(boolean currentIndex) throws SDXException
splitCheck
in class SDXDocumentBase
SDXException
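The method summary describes splitCheck as returning true when the splitting conditions are reached, in which case a splitIndex() call should follow. The check-then-split pattern can be sketched as below. This is a simplified, self-contained model: `SplitSketch`, `MAX_DOCS_PER_INDEX`, and `indexBatch` are hypothetical, and the actual splitting condition used by `LuceneDocumentBase` is not stated in this page (it may depend on index size rather than document count).

```java
/** Hypothetical model of the check-then-split pattern. */
class SplitSketch {
    // Assumed threshold for illustration; not taken from SDX.
    static final long MAX_DOCS_PER_INDEX = 1000;

    long lastDocCount;  // documents indexed since the last split
    int subIndexCount;  // number of sub-indexes created so far

    // True when the (assumed) splitting condition is reached.
    boolean splitCheck() {
        return lastDocCount >= MAX_DOCS_PER_INDEX;
    }

    // Models a split: a new sub-index is counted and the counter restarts.
    void splitIndex() {
        subIndexCount++;
        lastDocCount = 0;
    }

    // After each batch, check first and split only when required.
    void indexBatch(long documentsInBatch) {
        lastDocCount += documentsInBatch;
        if (splitCheck()) {
            splitIndex();
        }
    }
}
```

Keeping the check separate from the split lets a caller decide when the potentially expensive split happens, which matches the two-method shape of the documented API.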
protected long getIndexSize(LuceneIndex index)
public org.apache.lucene.search.Searcher getSearcher() throws SDXException
SDXException
public org.apache.lucene.index.IndexReader getIndexReader() throws SDXException
SDXException
protected java.lang.String getFormatedSubIndexId(int subIndexNumber)
protected void addSubIndex(LuceneIndex index)
protected void renewKeyIndex()
public void backup(SaveParameters save_config) throws SDXException
backup
in interface Saveable
backup
in class SDXDocumentBase
SDXException
Saveable.backup(fr.gouv.culture.sdx.utils.save.SaveParameters)
protected void backupIndexes(SaveParameters save_config) throws SDXException
backupIndexes
in class SDXDocumentBase
SDXException
protected void backupTimeStamp(SaveParameters save_config) throws SDXException
backupTimeStamp
in class SDXDocumentBase
SDXException
public void restore(SaveParameters save_config) throws SDXException
restore
in interface Saveable
restore
in class SDXDocumentBase
SDXException
Saveable.restore(fr.gouv.culture.sdx.utils.save.SaveParameters)
protected void restoreIndexes(SaveParameters save_config) throws SDXException
restoreIndexes
in class SDXDocumentBase
SDXException
protected void restoreTimeStamp(SaveParameters save_config) throws SDXException
restoreTimeStamp
in class SDXDocumentBase
SDXException
public int docCount()
public void close()
DocumentBase
close
in interface DocumentBase
close
in class SDXDocumentBase