In IndexWriter::close in IndexWriter.cpp, if the directory is specified not to be closed on exit, the directory object must not be deleted either.

Change:
if ( closeDir ){
directory->close();
}
_CLDECDELETE(directory);
to
if ( closeDir ){
directory->close();
_CLDECDELETE(directory);
}
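The ownership rule behind this change can be sketched in isolation. This is a minimal model and not CLucene code: `Dir`, `Writer` and `decDelete` are hypothetical stand-ins, and `decDelete` only assumes that `_CLDECDELETE` drops one reference and deletes the object when the count reaches zero.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted directory object.
struct Dir {
    int refs = 1;      // one reference, held by whoever created the directory
    bool open = true;
    void close() { open = false; }
};

// Assumed behaviour of _CLDECDELETE: drop one reference, delete at zero,
// and null the pointer so it cannot be used again.
inline void decDelete(Dir*& d) {
    if (d != nullptr && --d->refs <= 0) delete d;
    d = nullptr;
}

// Hypothetical writer that mirrors the patched close() logic.
struct Writer {
    Dir* directory;
    bool closeDir;
    Writer(Dir* d, bool own) : directory(d), closeDir(own) {
        assert(d != nullptr);
    }
    void close() {
        if (closeDir) {
            directory->close();
            decDelete(directory);
        }
        // When closeDir is false the caller still owns the directory,
        // so it is neither closed nor released here.
    }
};
```

With `closeDir == false` the caller can keep using the directory after the writer is closed, and remains responsible for releasing it.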
In IndexWriter::optimize in IndexWriter.cpp, an exception occurs if segmentInfos->size() == 0.

Change:
flushRamSegments();
to
flushRamSegments();
if(segmentInfos->size() == 0) return;

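Modify IndexWriter::addIndexes(Directory** dirs). A SegmentInfos object deletes all of its elements in its destructor, so sharing its SegmentInfo pointers with this->segmentInfos leaves invalid pointers behind when the function exits. The fix copies each SegmentInfo instead of sharing the pointer.

Change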
// start with zero or 1 seg so optimize the current
optimize();

//Iterate through the directories
int32_t i = 0;
while ( dirs[i] != NULL ) {
// DSR: Changed SegmentInfos constructor arg (see bug discussion below).
SegmentInfos sis(false);
sis.read( dirs[i]);

for (int32_t j = 0; j < sis.size(); j++) {
/* DSR:CL_BUG:
** In CLucene 0.8.11, the next call placed a pointer to a SegmentInfo
** object from stack variable $sis into the vector this->segmentInfos.
** Then, when the call to optimize() is made just before exiting this
** function, $sis had already been deallocated (and has deleted its
** member objects), leaving dangling pointers in this->segmentInfos.
** I added a SegmentInfos constructor that allowed me to order it not
** to delete its members, invoked the new constructor form above for
** $sis, and the problem was solved. */
segmentInfos->add(sis.info(j)); // add each info
}
i++;
}
to
//Iterate through the directories
int32_t i = 0;
SegmentInfo *si;
while ( dirs[i] != NULL ) {
// DSR: Changed SegmentInfos constructor arg (see bug discussion below).
SegmentInfos sis(false);
sis.read( dirs[i]);

for (int32_t j = 0; j < sis.size(); j++) {
/* DSR:CL_BUG:
** In CLucene 0.8.11, the next call placed a pointer to a SegmentInfo
** object from stack variable $sis into the vector this->segmentInfos.
** Then, when the call to optimize() is made just before exiting this
** function, $sis had already been deallocated (and has deleted its
** member objects), leaving dangling pointers in this->segmentInfos.
** I added a SegmentInfos constructor that allowed me to order it not
** to delete its members, invoked the new constructor form above for
** $sis, and the problem was solved. */
si = sis.info(j);
segmentInfos->add( new SegmentInfo(si->name, si->docCount, si->getDir())); // add each info
}
i++;
}
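The difference between sharing a pointer and copying the object can be shown with a small stand-alone model. Everything here is an assumption made for illustration: `Info` and `InfoList` are hypothetical stand-ins for SegmentInfo and SegmentInfos, and `appendCopies` is a hypothetical helper that does what the patched loop does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for SegmentInfo.
struct Info {
    std::string name;
    int docCount;
};

// Hypothetical stand-in for SegmentInfos: a pointer list that optionally
// deletes its elements when it is destroyed.
class InfoList {
public:
    explicit InfoList(bool deleteMembers) : deleteMembers_(deleteMembers) {}
    InfoList(const InfoList&) = delete;
    InfoList& operator=(const InfoList&) = delete;
    ~InfoList() {
        if (deleteMembers_)
            for (Info* p : items_) delete p;
    }
    void add(Info* p) {
        assert(p != nullptr);
        items_.push_back(p);
    }
    Info* info(int i) const { return items_[static_cast<std::size_t>(i)]; }
    int size() const { return static_cast<int>(items_.size()); }
private:
    std::vector<Info*> items_;
    bool deleteMembers_;
};

// Copy every element of src into dst, so dst owns independent objects
// that stay valid after src has been destroyed.
inline void appendCopies(InfoList& dst, const InfoList& src) {
    for (int j = 0; j < src.size(); ++j) {
        const Info* si = src.info(j);
        dst.add(new Info{si->name, si->docCount});
    }
}
```

Because each list deletes only the objects it allocated or was handed exclusively, destroying the temporary list cannot invalidate anything in the long-lived one.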
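Modify both IndexWriter::IndexWriter constructors in IndexWriter.cpp; the original code leaks memory. Passing true makes segmentInfos delete its elements when it is destroyed.

Change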
segmentInfos (_CLNEW SegmentInfos),
to
segmentInfos (_CLNEW SegmentInfos(true)),


Modify IndexWriter::addIndexes(IndexReader** readers) in IndexWriter.cpp to improve efficiency.
The modified function is as follows:
void IndexWriter::addIndexes(IndexReader** readers){
SCOPED_LOCK_MUTEX(addIndexesReaders_LOCK);

char* mergedName = newSegmentName();
SegmentMerger* merger = _CLNEW SegmentMerger(directory, mergedName, false);

for(int i = 0; i < segmentInfos->size(); i ++) // add existing index, if any
merger->add(_CLNEW SegmentReader(segmentInfos->info(i)));

int32_t readersLength = 0;
while ( readers[readersLength] != NULL )
merger->add((SegmentReader*) readers[readersLength++]);

int32_t docCount = merger->merge(); // merge 'em

// pop old infos & add new
segmentInfos->clearto(0);
segmentInfos->add(_CLNEW SegmentInfo(mergedName, docCount, directory));

LuceneLock* lock = directory->makeLock("commit.lock");
IndexWriterLockWith with ( lock,LUCENE_COMMIT_LOCK_TIMEOUT,this,true);

LOCK_MUTEX(directory->DIR_OBJ); // in- & inter-process sync
with.run();
UNLOCK_MUTEX(directory->DIR_OBJ);

_CLDELETE(lock);
}

Modify SegmentReader::files() in SegmentReader.cpp; with the original code, .cfs files are not deleted correctly when unused files are removed.

Change
_ADD_SEGMENT(".cfs " );
to
_ADD_SEGMENT(".cfs" ); // the original had an extra space after .cfs
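A short illustration of why one trailing space is enough to break deletion: the generated name no longer matches the name on disk. `segmentFiles` and `isListed` are hypothetical helpers written for this sketch, not CLucene functions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build the list of file names a segment claims to own.
inline std::vector<std::string> segmentFiles(const std::string& segment,
                                             const std::vector<std::string>& extensions) {
    assert(!segment.empty());
    std::vector<std::string> files;
    for (const std::string& ext : extensions) files.push_back(segment + ext);
    return files;
}

// True when the name found on disk matches one of the listed names exactly.
inline bool isListed(const std::vector<std::string>& files, const std::string& onDisk) {
    return std::find(files.begin(), files.end(), onDisk) != files.end();
}
```

With the extension ".cfs " the list contains "_1.cfs " (with a space), so a lookup for "_1.cfs" fails and the file is never matched.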

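Modify util/StringIntern.h and util/StringIntern.cpp: when multiple threads operate on class CLStringIntern, memory pointers become corrupted. Method 1: protect every StringIntern operation with a mutex.
Modify util/StringIntern.h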
//internalise an ucs2 string and return an iterator for fast un-iteration
static __wcsintrntype::iterator internitr(const TCHAR* str CL_FILELINEPARAM);
Before this line, add
STATIC_DEFINE_MUTEX(m_mutex);

Modify util/StringIntern.cpp
__wcsintrntype CLStringIntern::stringPool(true);
__strintrntype CLStringIntern::stringaPool(true);
After these lines, add
DEFINE_MUTEX(CLStringIntern::m_mutex);

In util/StringIntern.cpp, add the following as the first line of every member function implementation:
SCOPED_LOCK_MUTEX(m_mutex);
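The same locking pattern can be sketched with the standard library. This is not the CLucene implementation: `InternPool` is a hypothetical class, and `std::mutex` with `std::lock_guard` is used here in place of the `DEFINE_MUTEX` and `SCOPED_LOCK_MUTEX` macros.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical reference-counted string pool guarded by a single mutex.
class InternPool {
public:
    // Return a stable pointer to the pooled copy of s, adding it if needed.
    const std::string* intern(const std::string& s) {
        std::lock_guard<std::mutex> guard(mutex_);  // first line of every member
        auto it = pool_.find(s);
        if (it == pool_.end()) it = pool_.emplace(s, 0).first;
        ++it->second;
        return &it->first;
    }
    // Drop one reference; returns true when the entry was removed.
    bool unintern(const std::string& s) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = pool_.find(s);
        if (it == pool_.end()) return false;
        assert(it->second > 0);
        if (--it->second > 0) return false;
        pool_.erase(it);
        return true;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return pool_.size();
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, int> pool_;  // map nodes keep their address on insert
};
```

Every member takes the lock before touching the map, so lookups, insertions and removals from different threads are serialized, at the cost of contention on a single mutex.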

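Method 2: before the program starts its real work, add every field name it will need to StringIntern, and remove those initial entries just before the program exits. While the program runs, the list that holds the interned strings is then only read. This is better than Method 1 and runs somewhat faster, but it has to be written again in every program.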
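A simplified model of Method 2 is a pool that is filled once before any worker thread starts and only read afterwards. `PreloadedPool` is a hypothetical class for illustration; it leaves out reference counting, which the real CLStringIntern performs, and assumes that all field names are known up front.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical pool that is populated at startup and read-only afterwards.
class PreloadedPool {
public:
    // Call once, before any worker thread starts.
    void preload(const std::vector<std::string>& fieldNames) {
        assert(names_.empty());
        for (const std::string& n : fieldNames) names_.insert(n);
    }
    // Read-only lookup: returns the pooled string, or nullptr if the name
    // was not preloaded. Concurrent calls are safe as long as nobody writes.
    const std::string* lookup(const std::string& name) const {
        auto it = names_.find(name);
        return it == names_.end() ? nullptr : &*it;
    }
    // Call once, after all worker threads have finished.
    void release() { names_.clear(); }
    std::size_t size() const { return names_.size(); }
private:
    std::set<std::string> names_;
};
```

No lock is needed during the run because the container is never modified between `preload` and `release`; the price is that each program must list its field names itself.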

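Method 3: do not use the string pool at all; allocate memory directly every time and free it directly on release. This improves concurrency but uses more memory. When a large number of documents is held in memory during indexing, the extra memory is considerable. Frequent allocation and deallocation can also cause memory fragmentation.
http://spaces.msn.com/chenjm/blog/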