CLUCENE-0.9.10: list of bugs and how to fix them

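IndexWriter::close() in IndexWriter.cpp deletes the directory object even when the directory is not supposed to be closed on exit (closeDir is false). In that case the directory object must not be deleted.

Fix: change
if ( closeDir ){
directory->close();
}
_CLDECDELETE(directory);
to
if ( closeDir ){
directory->close();
_CLDECDELETE(directory);
}
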
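To make the ownership rule concrete, here is a minimal toy model. ToyDirectory and ToyWriter are hypothetical stand-ins, not CLucene classes, and the plain `delete` stands in for _CLDECDELETE, assuming that macro releases the writer's reference and frees the object. It shows that a writer told not to close its directory (closeDir is false) must leave the object alive for the caller who still uses it.

```cpp
#include <cassert>

// Hypothetical stand-in for a CLucene Directory; counts live instances
// so the effect of close() can be observed.
struct ToyDirectory {
    static int alive;
    bool closed = false;
    ToyDirectory() { ++alive; }
    ~ToyDirectory() { --alive; }
    void close() { closed = true; }
};
int ToyDirectory::alive = 0;

// Hypothetical writer mirroring the corrected IndexWriter::close():
// the directory is closed *and* released only when closeDir is true.
struct ToyWriter {
    ToyDirectory* directory;
    bool closeDir;

    void close() {
        if (closeDir) {
            directory->close();
            delete directory;   // stands in for _CLDECDELETE(directory)
        }
        directory = nullptr;    // the writer must not touch it afterwards
    }
};
```

With closeDir false, the caller's directory is still open and alive after the writer closes; with closeDir true, nothing is left behind.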
IndexWriter::optimize() in IndexWriter.cpp throws an exception when segmentInfos->size() == 0.

Fix: change
flushRamSegments();
to
flushRamSegments();
if(segmentInfos->size() == 0) return;
 
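Fix IndexWriter::addIndexes(Directory** dirs). A SegmentInfos object deletes all the objects it holds when it is destroyed, so the function leaves invalid (dangling) pointers behind when it returns.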
 
Change
// start with zero or 1 seg so optimize the current
optimize();
 
//Iterate through the directories
int32_t i = 0;
while ( dirs[i] != NULL ) {
// DSR: Changed SegmentInfos constructor arg (see bug discussion below).
SegmentInfos sis(false);
sis.read( dirs[i]);
 
for (int32_t j = 0; j < sis.size(); j++) {
/* DSR:CL_BUG:
** In CLucene 0.8.11, the next call placed a pointer to a SegmentInfo
** object from stack variable $sis into the vector this->segmentInfos.
** Then, when the call to optimize() is made just before exiting this
** function, $sis had already been deallocated (and has deleted its
** member objects), leaving dangling pointers in this->segmentInfos.
** I added a SegmentInfos constructor that allowed me to order it not
** to delete its members, invoked the new constructor form above for
** $sis, and the problem was solved. */
segmentInfos->add(sis.info(j)); // add each info
}
i++;
}
to
//Iterate through the directories
int32_t i = 0;
SegmentInfo *si;
while ( dirs[i] != NULL ) {
// DSR: Changed SegmentInfos constructor arg (see bug discussion below).
SegmentInfos sis(false);
sis.read( dirs[i]);
 
for (int32_t j = 0; j < sis.size(); j++) {
/* DSR:CL_BUG:
** In CLucene 0.8.11, the next call placed a pointer to a SegmentInfo
** object from stack variable $sis into the vector this->segmentInfos.
** Then, when the call to optimize() is made just before exiting this
** function, $sis had already been deallocated (and has deleted its
** member objects), leaving dangling pointers in this->segmentInfos.
** I added a SegmentInfos constructor that allowed me to order it not
** to delete its members, invoked the new constructor form above for
** $sis, and the problem was solved. */
si = sis.info(j);
// Copy the entry so this->segmentInfos owns its own SegmentInfo, independent of $sis
segmentInfos->add( _CLNEW SegmentInfo(si->name, si->docCount, si->getDir())); // add each info
}
i++;
}
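The bug pattern is a container that frees its members in its destructor while another container still points at them. The sketch below models the situation the paragraph describes with hypothetical ToyInfo/ToyInfos types (not CLucene's SegmentInfo/SegmentInfos): the temporary owns and frees what it read, so the target must receive independent copies, as the corrected loop does.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SegmentInfo; counts live instances.
struct ToyInfo {
    static int alive;
    std::string name;
    int docCount;
    ToyInfo(std::string n, int d) : name(std::move(n)), docCount(d) { ++alive; }
    ToyInfo(const ToyInfo&) = delete;
    ~ToyInfo() { --alive; }
};
int ToyInfo::alive = 0;

// Hypothetical stand-in for SegmentInfos: when deleteMembers is true,
// the destructor deletes every object the container holds.
struct ToyInfos {
    bool deleteMembers;
    std::vector<ToyInfo*> items;
    explicit ToyInfos(bool del) : deleteMembers(del) {}
    ToyInfos(const ToyInfos&) = delete;
    ToyInfos& operator=(const ToyInfos&) = delete;
    ~ToyInfos() {
        if (deleteMembers)
            for (ToyInfo* p : items) delete p;
    }
    void add(ToyInfo* p) { items.push_back(p); }
};

// Copies every entry of a temporary, owning container into 'target'.
// Adding the raw pointer instead (target.add(si)) would leave 'target'
// pointing at objects that 'sis' frees when it goes out of scope.
void addIndexesByCopy(ToyInfos& target, int segmentsInDir) {
    ToyInfos sis(true);  // temporary that owns what it "read"
    for (int j = 0; j < segmentsInDir; ++j)
        sis.add(new ToyInfo("_" + std::to_string(j), 10));
    for (ToyInfo* si : sis.items)
        target.add(new ToyInfo(si->name, si->docCount));  // independent copy
}   // sis is destroyed here and frees only its own objects
```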
Fix both IndexWriter::IndexWriter constructors in IndexWriter.cpp; as written they leak memory.
 
Change
segmentInfos (_CLNEW SegmentInfos),
to
segmentInfos (_CLNEW SegmentInfos(true)),
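A plausible reading of this leak, assumed here rather than stated in the text, is that without the `true` argument the writer's SegmentInfos does not delete the SegmentInfo objects it holds. The hypothetical OwningList below models that flag and counts live elements, so a leak shows up as a number.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts live instances.
struct Elem {
    static int alive;
    Elem() { ++alive; }
    ~Elem() { --alive; }
};
int Elem::alive = 0;

// Hypothetical container modelling the deleteMembers constructor flag.
struct OwningList {
    bool deleteMembers;
    std::vector<Elem*> items;
    explicit OwningList(bool del) : deleteMembers(del) {}
    OwningList(const OwningList&) = delete;
    OwningList& operator=(const OwningList&) = delete;
    ~OwningList() {
        if (deleteMembers)
            for (Elem* e : items) delete e;
    }
};

// Fills a heap-allocated list (as a writer would over its lifetime),
// destroys it, and returns how many elements were left behind.
int leakedAfterDestroy(bool deleteMembers, int n) {
    int before = Elem::alive;
    OwningList* list = new OwningList(deleteMembers);
    for (int i = 0; i < n; ++i) list->items.push_back(new Elem());
    delete list;
    return Elem::alive - before;
}
```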
 
 
Rewrite IndexWriter::addIndexes(IndexReader** readers) in IndexWriter.cpp for better efficiency.
The modified function is:
void IndexWriter::addIndexes(IndexReader** readers){
SCOPED_LOCK_MUTEX(addIndexesReaders_LOCK);
 
char* mergedName = newSegmentName();
SegmentMerger* merger = _CLNEW SegmentMerger(directory, mergedName, false);
 
for(int i = 0; i < segmentInfos->size(); i ++) // add existing index, if any
merger->add(_CLNEW SegmentReader(segmentInfos->info(i)));
 
int32_t readersLength = 0;
while ( readers[readersLength] != NULL )
merger->add((SegmentReader*) readers[readersLength++]);
 
int32_t docCount = merger->merge(); // merge 'em
 
// pop old infos & add new
segmentInfos->clearto(0);
segmentInfos->add(_CLNEW SegmentInfo(mergedName, docCount, directory));
 
LuceneLock* lock = directory->makeLock("commit.lock");
IndexWriterLockWith with ( lock,LUCENE_COMMIT_LOCK_TIMEOUT,this,true);
 
LOCK_MUTEX(directory->DIR_OBJ); // in- & inter-process sync
with.run();
UNLOCK_MUTEX(directory->DIR_OBJ);
 
_CLDELETE(lock);
}
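The text does not spell out where the gain comes from. One plausible reading, assuming the unmodified function began with optimize() the way Java Lucene 1.4's IndexWriter.addIndexes(IndexReader[]) does, is that the rewrite feeds every existing segment and every new reader to a single SegmentMerger pass instead of first optimizing the existing index into one segment. The cost model below is an assumption, not a measurement: a merge rewrites each document it reads once, and optimize() on an index with at most one segment rewrites nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Documents rewritten under each strategy (hypothetical cost model).
struct MergeCost {
    std::int64_t withOptimizeFirst;   // optimize(), then one merge (a lower bound)
    std::int64_t singleMerge;         // the rewritten function: one merge
};

MergeCost docsRewritten(const std::vector<std::int64_t>& existingSegmentDocs,
                        std::int64_t newDocs) {
    std::int64_t existing = 0;
    for (std::int64_t d : existingSegmentDocs) existing += d;
    // optimize() folds several existing segments into one first; at best
    // this is a single pass, since mergeFactor may force more passes.
    std::int64_t optimizeCost = existingSegmentDocs.size() > 1 ? existing : 0;
    return {optimizeCost + existing + newDocs, existing + newDocs};
}
```

For three existing segments of 1,000 documents plus 500 new ones, the model gives at least 6,500 rewrites with optimize() first versus 3,500 in one pass; with a single existing segment both come out equal.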
 
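Fix SegmentReader::files() in SegmentReader.cpp: because of this bug, the .cfs file is not deleted correctly when unused files are removed.
 
Change
_ADD_SEGMENT(".cfs " );
to
_ADD_SEGMENT(".cfs" ); //the original had an extra space after .cfs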
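Why one stray space matters: file names are compared character for character, so `_1.cfs ` never matches `_1.cfs` on disk. The sketch simulates the directory with a std::set; segmentFiles, deleteListed, and the segment name `_1` are hypothetical, and treating _ADD_SEGMENT as "segment name + extension" is an assumption about what the macro contributes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Builds a segment's file names by appending each extension to the
// segment name (a simplified, hypothetical model of SegmentReader::files()).
std::vector<std::string> segmentFiles(const std::string& segment,
                                      const std::vector<std::string>& extensions) {
    std::vector<std::string> names;
    for (const std::string& ext : extensions) names.push_back(segment + ext);
    return names;
}

// Deletes every listed name that exists in the simulated directory and
// returns how many files were actually removed. A name with a trailing
// space never matches, so the real file is silently left behind.
std::size_t deleteListed(std::set<std::string>& directory,
                         const std::vector<std::string>& names) {
    std::size_t removed = 0;
    for (const std::string& name : names) removed += directory.erase(name);
    return removed;
}
```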
 
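Fix util/StringIntern.h and util/StringIntern.cpp: when several threads use the CLStringIntern class at the same time, its memory pointers get corrupted.
Method 1: protect every StringIntern operation with a mutex.
In util/StringIntern.h, find these lines: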
//internalise an ucs2 string and return an iterator for fast un-iteration
static __wcsintrntype::iterator internitr(const TCHAR* str CL_FILELINEPARAM);
and add the following line before them:
STATIC_DEFINE_MUTEX(m_mutex);
 
In util/StringIntern.cpp, find these lines:
__wcsintrntype CLStringIntern::stringPool(true);
__strintrntype CLStringIntern::stringaPool(true);
and add the following line after them:
DEFINE_MUTEX(CLStringIntern::m_mutex);
 
In util/StringIntern.cpp, also add the following line at the very start of every member function's implementation:
SCOPED_LOCK_MUTEX(m_mutex);
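These edits follow a common C++ pattern: a static mutex declared inside the class, defined once in the .cpp file, and locked by a scoped guard at the top of each member function. A minimal standard-library sketch of that pattern follows, with std::mutex standing in for STATIC_DEFINE_MUTEX/DEFINE_MUTEX and std::lock_guard for SCOPED_LOCK_MUTEX. ToyIntern is a hypothetical, simplified reference-counted pool, not CLStringIntern's real code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class ToyIntern {
public:
    // Returns the pooled copy of 'str' and bumps its reference count.
    static const std::string* intern(const std::string& str) {
        std::lock_guard<std::mutex> guard(m_mutex);  // first line of every member
        auto it = pool.emplace(str, 0).first;
        ++it->second;
        return &it->first;
    }
    // Drops one reference; erases the entry when the count reaches zero.
    static void unintern(const std::string& str) {
        std::lock_guard<std::mutex> guard(m_mutex);
        auto it = pool.find(str);
        if (it != pool.end() && --it->second == 0) pool.erase(it);
    }
    static int refCount(const std::string& str) {
        std::lock_guard<std::mutex> guard(m_mutex);
        auto it = pool.find(str);
        return it == pool.end() ? 0 : it->second;
    }
private:
    static std::map<std::string, int> pool;  // declared in the "header"
    static std::mutex m_mutex;               // like STATIC_DEFINE_MUTEX(m_mutex)
};

// Out-of-class definitions, the counterpart of DEFINE_MUTEX(CLStringIntern::m_mutex).
std::map<std::string, int> ToyIntern::pool;
std::mutex ToyIntern::m_mutex;

// Runs 'threads' workers that each intern and release 'field' 'rounds' times.
void hammer(const std::string& field, int threads, int rounds) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&field, rounds] {
            for (int i = 0; i < rounds; ++i) {
                ToyIntern::intern(field);
                ToyIntern::unintern(field);
            }
        });
    for (auto& w : workers) w.join();
}
```

Because every member locks the same mutex, the balanced intern/unintern calls from all threads leave the reference count exactly where it started.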

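Method 2: before the program starts its real work, add every field name it will use to StringIntern, and remove those initial entries just before the program exits.
While the program runs, the list that holds the interned strings is then only ever read. This method is better than the first and somewhat faster, but every program has to implement it itself.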
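A minimal sketch of method 2: the pool is filled in the constructor before any worker starts and never modified afterwards, so concurrent lookups need no lock. FrozenFieldPool and concurrentLookupAgrees are hypothetical names. The scheme depends on the run being truly read-only; if each lookup still updated a shared reference count, that update would itself need protection.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical pool that is filled once up front and then only read.
class FrozenFieldPool {
public:
    explicit FrozenFieldPool(std::vector<std::string> names)
        : pool_(names.begin(), names.end()) {}
    // Read-only lookup: returns the pooled string, or nullptr if the name
    // was not registered up front (a program bug under this scheme).
    const std::string* lookup(const std::string& name) const {
        auto it = pool_.find(name);
        return it == pool_.end() ? nullptr : &*it;
    }
private:
    const std::set<std::string> pool_;  // filled once, never changed
};

// Looks up 'name' from several threads at once; returns true if every
// thread got the same non-null pooled pointer. No lock: nothing writes.
bool concurrentLookupAgrees(const FrozenFieldPool& pool,
                            const std::string& name, int threads) {
    std::vector<const std::string*> results(threads, nullptr);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&pool, &name, &results, t] {
            results[t] = pool.lookup(name);  // each thread writes its own slot
        });
    for (auto& w : workers) w.join();
    for (const std::string* r : results)
        if (r == nullptr || r != results[0]) return false;
    return true;
}
```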

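Method 3: do not use the string pool at all; allocate memory directly every time a string is needed, and free it directly when it is released.
This improves concurrency but uses more memory: when many documents are held in memory during indexing, the extra memory is considerable.
Frequent allocation and freeing also fragments memory.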
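A back-of-envelope estimate of the extra memory in method 3 can be written as a small function. The assumptions are ours, not the author's: every buffered field carries its own NUL-terminated copy of its name instead of sharing one pooled copy, charSize is sizeof(TCHAR) (2 for UTF-16 builds on Windows, commonly 4 for wchar_t on Linux), and per-allocation overhead is ignored. extraNameBytes is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Bytes spent on private copies of field names when no string pool is used.
std::size_t extraNameBytes(std::size_t bufferedDocs,
                           std::size_t fieldsPerDoc,
                           std::size_t avgNameLength,
                           std::size_t charSize) {
    std::size_t perCopy = (avgNameLength + 1) * charSize;  // +1 for the terminator
    std::size_t copies = bufferedDocs * fieldsPerDoc;       // one per field instance
    // A pooled scheme keeps roughly one copy per distinct name, which is
    // negligible next to 'copies', so it is not subtracted here.
    return copies * perCopy;
}
```

For example, 10,000 buffered documents with 10 fields whose names average 7 characters, at 2 bytes per character, come to 1,600,000 bytes of name copies before allocator overhead, which also hints at how many small allocations feed the fragmentation mentioned above.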
http://spaces.msn.com/chenjm/blog/
