Re: edismax available in solr 3.1?

2011-05-07 Thread Ahmet Arslan
> is edismax available in solr 3.1?  I don't see any
> documentation about it.
> 
> if it is, does it support the prefix and fuzzy query?

Yes and yes. See the snippet taken from changes.txt below.

New Features
--

* SOLR-1553: New dismax parser implementation (accessible as "edismax")
  that supports full lucene syntax, improved reserved char escaping,
  fielded queries, improved proximity boosting, and improved stopword
  handling. Note: status is experimental for now. (yonik)
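
For example, with edismax both prefix and fuzzy terms should parse (a quick
sketch; the qf fields below are just placeholders for whatever is in your
own schema):

  defType=edismax
  qf=title^2 body
  q=luce* OR solr~0.8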


Field Cache

2011-05-07 Thread samarth s
Hi,

I have read that the Lucene field cache is used in faceting and sorting. Is it
also populated/used when only selected fields are retrieved, via the 'fl' or
the 'included fields in collapse' parameters? Is it also used for collapsing?

-- 
Regards,
Samarth


Whole unfiltered content in response document field

2011-05-07 Thread solrfan
Hi, I have a question about the content of the document fields. My configuration
is OK so far: I index a database with DIH and have configured an index
analyzer as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
        />
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

...

<field name="..." type="..." indexed="true" stored="true" required="true"/>
<field name="..." type="..." indexed="true" stored="true"/>

On the analysis view, my filters work properly. At the end of the filter
chain I have only the tokens of interest. But when I search with Solr, I get
back as a response the whole content of the indexed database field. The field
contains stopwords, whitespace, uppercase characters and so on. I search for
stopwords, and I can find them. I would expect to find in the response
document only the filtered content of the field, not the original raw
content that I wanted to index.

Is this normal behaviour? Do I understand Solr right?

Many thanks! 



Re: uima fieldMappings and solr dynamicField

2011-05-07 Thread Koji Sekiguchi
I've opened https://issues.apache.org/jira/browse/SOLR-2503 .

Koji
-- 
http://www.rondhuit.com/en/

(11/05/06 20:15), Koji Sekiguchi wrote:
> Hello,
> 
> I'd like to use dynamicField in the feature-field mapping of the uima update
> processor. It doesn't seem to be accepted currently. Is it a bad idea
> in terms of how uima is used? If it is not so bad, I'd like to try a patch.
> 
> Background:
> 
> Because my uima annotator can generate many types of named entity from
> a text, I don't want to implement so many types, but one type "NamedEntity":
> 
> <typeDescription>
>   <name>com.rondhuit.uima.next.NamedEntity</name>
>   <description/>
>   <supertypeName>uima.tcas.Annotation</supertypeName>
>   <features>
>     <featureDescription>
>       <name>name</name>
>       <description/>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>     <featureDescription>
>       <name>entity</name>
>       <description/>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>   </features>
> </typeDescription>
> 
> 
> sample extracted named entities:
> 
> name="PERSON", entity="Barack Obama"
> name="TITLE", entity="the President"
> 
> Now, I'd like to map these named entities to Solr fields like this:
> 
> PERSON_S:"Barack Obama"
> TITLE_S:"the President"
> 
> Because there can be many values of name (PERSON, TITLE, etc.),
> I'd like to use the dynamicField *_s, where * is replaced by the name
> feature of NamedEntity.
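> 
> For illustration, a hypothetical mapping entry (made-up syntax, just to show
> the shape of the mapping I have in mind, not something the current config
> accepts):
> 
>   <!-- hypothetical: the "name" feature picks the *_s field,
>        the "entity" feature supplies the field value -->
>   <lst name="fieldMapping">
>     <str name="type">com.rondhuit.uima.next.NamedEntity</str>
>     <str name="fieldNameFeature">name</str>
>     <str name="dynamicField">*_s</str>
>     <str name="feature">entity</str>
>   </lst>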
> 
> I think this is a natural requirement from the Solr point of view, but I'm
> not sure whether my uima annotator implementation is correct or not. In other
> words, should I implement a separate type for each entity type?
> (e.g. PersonEntity, TitleEntity, ... instead of NamedEntity)
> 
> Thank you!
> 
> Koji




Re: Whole unfiltered content in response document field

2011-05-07 Thread Ahmet Arslan

> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.StopFilterFactory"
>         ignoreCase="true"
>         words="stopwords.txt"
>         enablePositionIncrements="true"
>         />
> <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1"
>         catenateWords="1" catenateNumbers="1" catenateAll="0"
>         splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> 
> ...
> 
> <field name="..." type="..." indexed="true" stored="true" required="true"/>
> <field name="..." type="..." indexed="true" stored="true"/>
> 
> On the analysis view, my filters work properly. At the end of the filter
> chain I have only the tokens of interest. But when I search with Solr, I get
> back as a response the whole content of the indexed database field. The field
> contains stopwords, whitespace, uppercase characters and so on. I search for
> stopwords, and I can find them. I would expect to find in the response
> document only the filtered content of the field, not the original raw
> content that I wanted to index.
> 
> Is this normal behaviour? Do I understand Solr right?

In the response, Solr shows the raw (stored) content. So you want to see the
analyzed/indexed content of a document in the response?

Searching for stop-words and finding them is not normal. Maybe you need to move
the StopFilter below the WordDelimiterFilter. Some punctuation may cause this.
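
If you do not want the raw text coming back in responses, keep the field
indexed but not stored, e.g. (just a sketch, the field name here is an example):

  <field name="content" type="text" indexed="true" stored="false"/>

The analyzer chain only changes what gets indexed; whatever is stored is
returned verbatim in the response.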


Re: Replication Clarification Please

2011-05-07 Thread Bill Bell
I did not see any answers... I am not an authority, but I will tell you what I
think.

Did you get some answers?


On 5/6/11 2:52 PM, "Ravi Solr"  wrote:

>Hello,
>Pardon me if this has already been answered somewhere, and I
>apologize for a lengthy post. I was wondering if anybody could help me
>understand replication internals a bit more. We have a single
>master-slave setup (Solr 1.4.1) with the configurations shown
>below. Our environment is quite commit heavy (almost 100s of docs
>every 5 minutes), and all indexing is done on the master and all searches
>go to the slave. We are seeing that the slave replication performance
>gradually decreases, the speed drops below 1 KBps, and ultimately
>it gets backed up. Once we reload the core on the slave it will work fine
>for some time and then it gets backed up again. We have mergeFactor set
>to 10, ramBufferSizeMB is set to 32MB, Solr itself is running
>with 2GB memory, and lockType is simple on both master and slave.

How big is your index? How many rows and GB?

Every time you replicate, there are several cache resets. So if you are
constantly indexing, you need to be careful about how that performance impact
will apply.

>
>I am hoping that the following questions might help me understand the
>replication performance issue better (Replication Configuration is
>given at the end of the email)
>
>1. Does the Slave get the whole index every time during replication or
>just the delta since the last replication happened ?


It depends. If you do an OPTIMIZE every time you index, then you will be
sending the whole index down.
If more than mergeFactor (10) segments have turned over since the last poll,
I believe that might also trigger a whole-index copy, since you have cycled
through all the segments.
In that case I think you might want to increase the mergeFactor.
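
For example (a sketch; in Solr 1.4 this lives in solrconfig.xml under
<indexDefaults>/<mainIndex>):

  <mainIndex>
    <mergeFactor>20</mergeFactor>
    <ramBufferSizeMB>32</ramBufferSizeMB>
  </mainIndex>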


>
>2. If there are a huge number of queries being done on the slave, will it
>affect the replication ? How can I improve the performance ? (see the
>replication details at the bottom of the page)

It seems that might be one way that you get the index.* directories. At
least I see it more frequently when there is huge load and you are trying
to replicate.
You could replicate less frequently.

>
>3. Will the segment names be the same on master and slave after
>replication ? I see that they are different. Is this correct ? If it
>is correct, how does the slave know what to fetch the next time, i.e.
>the delta?

Yes, they had better be. In the old days you could just rsync the data
directory from the master to the slave and reload the core; that worked fine.

>
>4. When and why does the index.<timestamp> folder get created ? I see
>this type of folder getting created only on the slave, and the slave
>instance is pointing to it.

I would love to know all the conditions... I believe it is supposed to
replicate to index.*, then reload to point to it. But sometimes it gets
stuck in index.* land and never goes back to straight index.

There are several bug fixes for this in 3.1.

>
>5. Does the replication process copy both the index and the index.<timestamp>
>folders ?

I believe it is supposed to copy the segments, or the whole index/, from the
master to index.* on the slave.

>
>6. what happens if the replication kicks off even before the previous
>invocation has completed ? will the 2nd invocation block or will
>it go through causing more confusion ?

That is not supposed to happen: if a replication is in progress, it should
not copy again until that one is complete.
Try it: just delete data/*, restart Solr, and force a replication;
while it is syncing, force it again. The second one does not seem to work for me.
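
You can force and inspect replication on the slave over HTTP (replace host,
port and core name with your own):

  http://slave_host:8983/solr/core/replication?command=fetchindex
  http://slave_host:8983/solr/core/replication?command=details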
>
>7. If I have to prep a new master-slave combination is it OK to copy
>the respective contents into the new master-slave and start solr ? or
>do I have to wipe the new slave and let it replicate from its new
>master ?

If you shut down the slave, copy the data/* directory, and restart, you
should be fine. That is how we fix the data/ dir when
there is corruption.
>
>8. Doing an 'ls | wc -l' on the index folder of master and slave gave 194
>and 17968 respectively... The slave has a lot of segments_xxx files. Is
>this normal ?

Several bugs were fixed in 3.1 for this one. Not a good thing. You are
getting leftover segments or index.* directories.
>
>MASTER
>
><requestHandler name="/replication" class="solr.ReplicationHandler">
>  <lst name="master">
>    <str name="replicateAfter">startup</str>
>    <str name="replicateAfter">commit</str>
>    <str name="replicateAfter">optimize</str>
>    <str name="confFiles">schema.xml,stopwords.txt</str>
>    <str name="commitReserveDuration">00:00:10</str>
>  </lst>
></requestHandler>
>
>SLAVE
>
><requestHandler name="/replication" class="solr.ReplicationHandler">
>  <lst name="slave">
>    <str name="masterUrl">master core url</str>
>    <str name="pollInterval">00:03:00</str>
>    <str name="compression">internal</str>
>    <str name="httpConnTimeout">5000</str>
>    <str name="httpReadTimeout">1</str>
>  </lst>
></requestHandler>
>
>REPLICATION DETAILS FROM PAGE
>
>Master master core url
>Poll Interval 00:03:00
>Local Index Index Version: 1296217104577, Generation: 20190
>Location: /data/solr/core/search-data/index.20110429042508
>Size: 2.1 GB
>Times Replicated Since Startup: 672
>Previous Replication Done At: Fri May 06 15:41:01 EDT 2011
>Config Files Replicated At: null
>Config Files Replicated: null
>Times Config Files Replicated Since Startup: null
>Next Replication Cycle At: Fri May 06 15:44:00 EDT 2011
>C