Reg: Multicore vs singlecore.

2011-10-03 Thread karan . jindal1988
hi,
I am new to solr, so I just want to clarify a few points.
I ran the test on a machine with the following specification: RAM: 5GB, dual
core: 2.66 GHz, index size: 10GB.
1) Is hard disk read time a bottleneck for Solr performance? I ran a few
tests using "jmeter" and analyzed the system resources using "dstat". What I
found is that the CPU is 50% idle, 40% waiting and 10% in actual user
processing all the time. Why is the CPU in a constant waiting state?
2) At most how many cores will give reasonable performance (under 2 seconds)
on an index of 10GB?
Hoping for a quick response. Thanks, Karan

Re: Reg: Multicore vs singlecore.

2011-10-03 Thread Gora Mohanty
On Mon, Oct 3, 2011 at 12:12 PM,   wrote:
> hi,
> I am new to solr, so I just want to clarify a few points.
> I ran the test on a machine with the following specification: RAM: 5GB, dual
> core: 2.66 GHz, index size: 10GB.

* How much of the RAM is given to Solr, and how much is left
  for the OS disk cache?
* Where is your indexing data coming from? If it is a database,
  is that running on the same machine? Is network speed a factor?

> 1) Is hard disk read time a bottleneck for Solr performance? I ran a few
> tests using "jmeter" and analyzed the system resources using "dstat". What I
> found is that the CPU is 50% idle, 40% waiting and 10% in actual user
> processing all the time. Why is the CPU in a constant waiting state?

Disk read speed can be a factor, especially if there is not enough
RAM left for the OS disk cache.

> 2) At most how many cores will give reasonable performance (under 2 seconds)
> on an index of 10GB?

This depends on a lot of factors, including RAM, your data sources,
etc., but with plenty of RAM, databases on separate machines, and
assuming that the index can be sharded (merging after indexing is
also possible), we were seeing the indexing speed increase linearly
with the number of cores, up to at least 10 cores.
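(If you do shard at indexing time, the post-indexing merge can be done with
the CoreAdmin "mergeindexes" command; a rough sketch, with illustrative core
names and paths:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/core1/data/index&indexDir=/path/to/core2/data/index

The target core must already exist, and the source indexes should not
receive updates while the merge runs.)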

Regards,
Gora


Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-10-03 Thread Geert-Jan Brits
Interesting! Reading your previous blog posts, I gather that the to-be-posted
'implementation approaches' post includes a way of making the SpanQueries
available within SOLR?
Also, with your approach, would (numeric) RangeQueries be possible, as
Hoss suggests?

Looking forward to that 'implementation post'
Cheers,
Geert-Jan

On 1 October 2011 19:57, Mikhail Khludnev wrote:

> I agree about SpanQueries. It's a viable measure against "false-positive
> matches on multivalue fields".
>  we've implemented this approach some time ago. Pls find details at
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> and
>
> http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
> we are going to publish the third post about the implementation approaches.
>
> --
> Mikhail Khludnev
>
>
> > On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter wrote:
>
> >
> > : Another, faulty, option would be to model opening/closing hours in 2
> > : multivalued date-fields, i.e: open, close. and insert open/close for
> each
> > : day, e.g:
> > :
> > : open: 2011-11-08:1800 - close: 2011-11-09:0300
> > : open: 2011-11-09:1700 - close: 2011-11-10:0500
> > : open: 2011-11-10:1700 - close: 2011-11-11:0300
> > :
> > : And queries would be of the form:
> > :
> > : 'open < now && close > now+3h'
> > :
> > : But since there is no way to indicate that 'open' and 'close' are
> > pairwise
> > : related I will get a lot of false positives, e.g the above document
> would
> > be
> > : returned for:
> >
> > This isn't possible out of the box, but the general idea of "position
> > linked" queries is possible using the same approach as the
> > FieldMaskingSpanQuery...
> >
> >
> >
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> > https://issues.apache.org/jira/browse/LUCENE-1494
> >
> > ..implementing something like this that would work with
> > (Numeric)RangeQueries however would require some additional work, but it
> > should certainly be doable -- i've suggested this before but no one has
> > taken me up on it...
> > http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
> >
> > If we take it as a given that you can do multiple ranges "at the same
> > position", then you can imagine supporting all of your "regular" hours
> > using just two fields ("open" and "close") by encoding the day+time of
> > each range of open hours into them -- even if a store is open for
> multiple
> > sets of ranges per day (ie: closed for siesta)...
> >
> >  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
> >  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
> >
> > then asking for "stores open now and for the next 3 hours" on "wed" at
> > "2:13PM" becomes a query for...
> >
> > sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
> >
> > For the special case part of your problem when there are certain dates
> > that a store will be open atypical hours, i *think* that could be solved
> > using some special docs and the new "join" QParser in a filter query...
> >
> >https://wiki.apache.org/solr/Join
> >
> > imagine you have your "regular" docs with all the normal data about a
> > store, and the open/close fields i describe above.  but in addition to
> > those, for any store that you know is "closed on dec 25" or "only open
> > 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> > the information about the store's closures on that special date - so that
> > each special case would be its own doc, even if one store had 5 days
> > where there was a special case...
> >
> >  specialdoc1:
> >store_id: 42
> >special_date: Dec-25
> >status: closed
> >  specialdoc2:
> >store_id: 42
> >special_date: Jan-01
> >status: irregular
> >open: 09_30
> >close: 13_00
> >
> > then when you are executing your query, you use an "fq" to constrain to
> > stores that are (normally) open right now (like i mentioned above) and
> you
> > use another fq to find all docs *except* those resulting from a join
> > against these special case docs based on the current date.
> >
> > so if your query is "open now and for the next 3 hours" and "now" ==
> > "sunday, 2011-12-25 @ 10:17AM" your query would be something like...
> >
> > q=...user input...
> > time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> > fq={!v=time}
> > fq={!join from=store_id to=unique_key v=$vv}
> > vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{v=$time}"))
> >
> > That join based approach for dealing with the special dates should work
> > regardless of whether someone implements a way to do pairwise
> > "sameposition()" rangequeries ... so if you can live w/o the multiple
> > open/close pairs per day, you can just use the "one field per day of the
> > week" type approach you mentioned combined with the "join" for special
> > case days of the year and everything you need should already work w/o any
> > code (on trunk).
> >

Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-10-03 Thread Geert-Jan Brits
Thanks Hoss for that in-depth walkthrough.

I like your solution of using (something akin to) FieldMaskingSpanQuery.
Conceptually the Join-approach looks like it would work on paper, although
I'm not a big fan of introducing a lot of complexity to the frontend /
querying part of the solution.

As an alternative, what about using your FieldMaskingSpanQuery-approach
solely (without the Join-approach) and encoding open/close on a per-day
basis?
I didn't mention it, but I 'only' need 100 days of data, which would lead to
100 open and 100 close values, not counting the POIs with multiple
opening hours per day, which are pretty rare.
The index is rebuilt each night, refreshing the date data.

I'm not sure what the performance implications would be, but somehow
that feels doable. Perhaps it even offsets the extra time needed for doing
the Joins; only one way to find out, I guess.
A disadvantage would be fewer cache hits when using fq.

Data then becomes:

open: 20111020_12_30, 20111021_12_30, 20111022_07_30, ...
close: 20111020_20_00, 20111021_26_30, 20111022_12_30, ...

Notice the 20111021_26_30, which indicates closing at 2:30AM the next day,
which would work (in contrast to encoding it as 20111022_02_30).

Alternatively, how would you compare your suggested approach with the
approach by David Smiley using either SOLR-2155 (Geohash prefix query
filter) or LSP:
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244.
That would work right now, and the LSP-approach seems pretty elegant to me.
FQ-style caching is probably not possible though.

Geert-Jan

On 1 October 2011 04:25, Chris Hostetter wrote:

>
> : Another, faulty, option would be to model opening/closing hours in 2
> : multivalued date-fields, i.e: open, close. and insert open/close for each
> : day, e.g:
> :
> : open: 2011-11-08:1800 - close: 2011-11-09:0300
> : open: 2011-11-09:1700 - close: 2011-11-10:0500
> : open: 2011-11-10:1700 - close: 2011-11-11:0300
> :
> : And queries would be of the form:
> :
> : 'open < now && close > now+3h'
> :
> : But since there is no way to indicate that 'open' and 'close' are
> pairwise
> : related I will get a lot of false positives, e.g the above document would
> be
> : returned for:
>
> This isn't possible out of the box, but the general idea of "position
> linked" queries is possible using the same approach as the
> FieldMaskingSpanQuery...
>
>
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> https://issues.apache.org/jira/browse/LUCENE-1494
>
> ..implementing something like this that would work with
> (Numeric)RangeQueries however would require some additional work, but it
> should certainly be doable -- i've suggested this before but no one has
> taken me up on it...
> http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
>
> If we take it as a given that you can do multiple ranges "at the same
> position", then you can imagine supporting all of your "regular" hours
> using just two fields ("open" and "close") by encoding the day+time of
> each range of open hours into them -- even if a store is open for multiple
> sets of ranges per day (ie: closed for siesta)...
>
>  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
>  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
>
> then asking for "stores open now and for the next 3 hours" on "wed" at
> "2:13PM" becomes a query for...
>
> sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
>
> For the special case part of your problem when there are certain dates
> that a store will be open atypical hours, i *think* that could be solved
> using some special docs and the new "join" QParser in a filter query...
>
>https://wiki.apache.org/solr/Join
>
> imagine you have your "regular" docs with all the normal data about a
> store, and the open/close fields i describe above.  but in addition to
> those, for any store that you know is "closed on dec 25" or "only open
> 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> the information about the store's closures on that special date - so that
> each special case would be its own doc, even if one store had 5 days
> where there was a special case...
>
>  specialdoc1:
>store_id: 42
>special_date: Dec-25
>status: closed
>  specialdoc2:
>store_id: 42
>special_date: Jan-01
>status: irregular
>open: 09_30
>close: 13_00
>
> then when you are executing your query, you use an "fq" to constrain to
> stores that are (normally) open right now (like i mentioned above) and you
> use another fq to find all docs *except* those resulting from a join
> against these special case docs based on the current date.
>
> so if your query is "open now and for the next 3 hours" and "now" ==
> "sunday, 2011-12-25 @ 10:17AM" your query would be something like...
>
> q=...us

Re: Suggestions on how to perform infrastructure migration from 1.4 to 3.4?

2011-10-03 Thread Erick Erickson
Well, other than testing the heck out of 3.4 before putting it into
production, the answer is "how long will it take you to install new
software and replicate?"

Note, I'm not saying that I know of any instabilities in 3.4, just that
thoroughly testing a major upgrade is *always* called for. 3.4 is better
tested than 1.4 ever was in terms of unit tests.

Here's a suggested cut-over path, assuming you have a window
in which you can handle your full search load on one slave.

1> create a full index on 3.4. You'll be better off re-creating the entire
 index with 3.4. This can be on your separate server.
2> take your master out of service/disable replication (commands sketched
  below), move the index over to it and verify it's operational.
3> take one slave out of service, install 3.4 and point it at your master.
4> Once replication is complete, put this slave back in service and
 take the other slave out of service.
5> install 3.4 on the slave, replicate, and put it back in service.

Really, you should be able to predict this pretty reliably by trying it on
a spare machine to work the kinks out.
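A sketch of the replication toggles for steps 2-5, assuming the standard
ReplicationHandler is mounted at /replication (host names illustrative):

http://master:8983/solr/replication?command=disablereplication
http://master:8983/solr/replication?command=enablereplication
http://slave:8983/solr/replication?command=disablepoll
http://slave:8983/solr/replication?command=enablepoll

The first pair stops/starts the master handing out new index versions; the
second pair stops/starts a slave polling for them.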

Best
Erick

On Fri, Sep 30, 2011 at 5:54 AM, Pranav Prakash  wrote:
> Hi List,
>
> We have our production search infrastructure as - 1 indexing master, 2
> serving identical twin slaves. They are all Solr 1.4 beasts. Apart from this
> we have 1 beast on Solr 3.4, which we have benchmarked against our
> production setup (against performance and relevancy) and would like to
> upgrade our production setup. Something like this has not happened before in
> our organization. I'd like to know opinions from the community about what
> are ways in which this migration can be performed? Will there be any
> downtimes, if so for how many hours? What are some of the common issues that
> might come along?
>
> *Pranav Prakash*
>
> "temet nosce"
>
> Twitter  | Blog  |
> Google 
>


Re: heap size problem when indexing files with solrj

2011-10-03 Thread Erick Erickson
You're saying that the file you're indexing is 500M? Pretty big...

First, I'd ask if you really want to index it as a single file or whether
you can break it up into sub-files. It depends upon what it is I guess.

Second, you can certainly index something this big, you just need
enough memory. Sounds like a 64-bit machine is in order.

Third, make sure you're committing before you try to index this
thing, so you're sure you have as many resources available
as possible.

Fourth, where is your error coming from? The SolrJ program, which
I presume is running on your local machine, or the Solr server,
which is running where?

Best
Erick

On Fri, Sep 30, 2011 at 7:55 AM, hadi  wrote:
> I wrote a simple program with solrj that indexes files, but after a minute
> passed it crashed and
> *java.lang.OutOfMemoryError: Java heap space* appeared
>
> I used Eclipse, my memory is about 2GB, and I set
> -Xms1024M -Xmx2048M for both the VM args of tomcat and my application in Debug
> Configuration, and uncommented maxBufferedDocs in solrconfig and set it to
> 100, then ran my application again, but it crashed soon when it reached
> files greater than 500MB.
>
> Is there any config to index large files with solrj?
> The details of my solrj code are below:
>
> String urlString = "http://localhost:8983/solr/file";
> CommonsHttpSolrServer solr = new CommonsHttpSolrServer(urlString);
>
> ContentStreamUpdateRequest req = new
> ContentStreamUpdateRequest("/update/extract");
>
> req.addFile(file);
> req.setParam("literal.id", file.getAbsolutePath());
> req.setParam("literal.name", file.getName());
> req.setAction(ACTION.COMMIT, true, true);
>
> solr.request(req);
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/heap-size-problem-when-indexinf-files-with-solrj-tp3382115p3382115.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr stopword problem in Query

2011-10-03 Thread Isan Fulia
Thanks Erick.

On 29 September 2011 18:31, Erick Erickson  wrote:

> I think your problem is that you've set
>
> omitTermFreqAndPositions="true"
>
> It's not real clear from the Wiki page, but
> the tricky little phrase
>
> "Queries that rely on position that are issued
> on a field with this option will silently fail to
> find documents."
>
> And phrase queries rely on position information
>
> Best
> Erick
>
> On Tue, Sep 27, 2011 at 11:00 AM, Rahul Warawdekar
>  wrote:
> > Hi Isan,
> >
> > The schema.xml seems OK to me.
> >
> > Is "textForQuery" the only field you are searching in ?
> > Are you also searching on any other non text based fields ? If yes,
> please
> > provide schema description for those fields also.
> > Also, provide your solrconfig.xml file.
> >
> >
> > On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia  >wrote:
> >
> >> Hi Rahul,
> >>
> >> I also tried searching "Coke Studio MTV" but no documents were returned.
> >>
> >> Here is the snippet of my schema file.
> >>
> >>   >> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >>
> >>  
> >>
> >>
> >> >>ignoreCase="true"
> >>
> >>words="stopwords_en.txt"
> >>enablePositionIncrements="true"
> >>
> >>/>
> >> >> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>
> >>
> >>
> >> >> protected="protwords.txt"/>
> >>
> >>
> >>  
> >>
> >>  
> >>
> >>
> >> >> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>
> >> >>ignoreCase="true"
> >>
> >>words="stopwords_en.txt"
> >>enablePositionIncrements="true"
> >>
> >>/>
> >> >> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >>
> >>
> >>
> >> >> protected="protwords.txt"/>
> >>
> >>
> >>  
> >>
> >>
> >>
> >>
> >> * >> multiValued="false"/>
> >>  >> multiValued="false"/>
> >>
> >> ** >> multiValued="true" omitTermFreqAndPositions="true"/>**
> >>
> >> 
> >> *
> >>
> >>
> >> Thanks,
> >> Isan Fulia.
> >>
> >>
> >> On 26 September 2011 21:19, Rahul Warawdekar <
> rahul.warawde...@gmail.com
> >> >wrote:
> >>
> >> > Hi Isan,
> >> >
> >> > Does your search return any documents when you remove the 'at' keyword
> >> and
> >> > just search for "Coke studio MTV" ?
> >> > Also, can you please provide the snippet of schema.xml file where you
> >> have
> >> > mentioned this field name and its "type" description ?
> >> >
> >> > On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia  >> > >wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I have a text field named* textForQuery* .
> >> > > Following content has been indexed into solr in field textForQuery
> >> > > *Coke Studio at MTV*
> >> > >
> >> > > when i fired the query as
> >> > > *textForQuery:("coke studio at mtv")* the results showed 0 documents
> >> > >
> >> > > After runing the same query in debugMode i got the following results
> >> > >
> >> > > 
> >> > > 
> >> > > textForQuery:("coke studio at mtv")
> >> > > textForQuery:("coke studio at mtv")
> >> > > PhraseQuery(textForQuery:"coke studio ?
> >> > mtv")
> >> > > textForQuery:"coke studio *?
> >> *mtv"
> >> > >
> >> > > Why the query did not matched any document even when there is a
> >> document
> >> > > with value of textForQuery as *Coke Studio at MTV*?
> >> > > Is this because of the stopword *at* present in stopwordList?
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Thanks & Regards,
> >> > > Isan Fulia.
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and Regards
> >> > Rahul A. Warawdekar
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Isan Fulia.
> >>
> >
> >
> >
> > --
> > Thanks and Regards
> > Rahul A. Warawdekar
> >
>



-- 
Thanks & Regards,
Isan Fulia.


Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-10-03 Thread Mikhail Khludnev
On Mon, Oct 3, 2011 at 3:09 PM, Geert-Jan Brits  wrote:

> Interesting! Reading your previous blog posts, I gather that the to-be-posted
> 'implementation approaches' post includes a way of making the SpanQueries
> available within SOLR?
>

It's going to be posted in two days. But please don't expect much from it,
it's just a proof of concept. It's not code for production nor for
contribution, e.g. we've chosen the 'quick hack' way of converting boolean
queries instead of XmlQuery, SurroundParser or contrib's query parser, etc.,
i.e. we can share only the core ideas, some of which are possibly wrong.


> Also, with your approach, would (numeric) RangeQueries be possible, as
> Hoss suggests?
>

Basically, for numbers, range queries are just term disjunctions (sometimes
that's not great at all). If you encode your terms in a sortable manner,
e.g. A0715 for Monday 7:15 am, you'll be able to build the span merging
'disjunction' - new SpanOrQuery(new SpanTermQuery(..), ...).
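A rough sketch of that merging with the Lucene 3.x span API (the field
names, the A-prefixed encoding and the two enumerated terms per field are
illustrative, and the slop -1 / unordered combination follows the
FieldMaskingSpanQuery javadoc example):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.*;

// "open" terms enumerated from [* TO now]
SpanQuery openUpToNow = new SpanOrQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("open", "A0700")),
    new SpanTermQuery(new Term("open", "A0715")) /* ... */ });

// "close" terms enumerated from [now+3h TO *]
SpanQuery closeAfter = new SpanOrQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("close", "A1015")),
    new SpanTermQuery(new Term("close", "A1030")) /* ... */ });

// mask the "close" spans as if they came from "open", then require both
// clauses to land on the same position
SpanQuery samePosition = new SpanNearQuery(new SpanQuery[] {
    openUpToNow,
    new FieldMaskingSpanQuery(closeAfter, "open") }, -1, false);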

Regards

Mikhail


> Looking forward to that 'implementation post'
> Cheers,
> Geert-Jan
>
> On 1 October 2011 19:57, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>
> > I agree about SpanQueries. It's a viable measure against "false-positive
> > matches on multivalue fields".
> >  we've implemented this approach some time ago. Pls find details at
> >
> >
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
> >
> > and
> >
> >
> http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
> > we are going to publish the third post about the implementation
> > approaches.
> >
> > --
> > Mikhail Khludnev
> >
> >
> > On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >
> > >
> > > : Another, faulty, option would be to model opening/closing hours in 2
> > > : multivalued date-fields, i.e: open, close. and insert open/close for
> > each
> > > : day, e.g:
> > > :
> > > : open: 2011-11-08:1800 - close: 2011-11-09:0300
> > > : open: 2011-11-09:1700 - close: 2011-11-10:0500
> > > : open: 2011-11-10:1700 - close: 2011-11-11:0300
> > > :
> > > : And queries would be of the form:
> > > :
> > > : 'open < now && close > now+3h'
> > > :
> > > : But since there is no way to indicate that 'open' and 'close' are
> > > pairwise
> > > : related I will get a lot of false positives, e.g the above document
> > would
> > > be
> > > : returned for:
> > >
> > > This isn't possible out of the box, but the general idea of "position
> > > linked" queries is possible using the same approach as the
> > > FieldMaskingSpanQuery...
> > >
> > >
> > >
> >
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> > > https://issues.apache.org/jira/browse/LUCENE-1494
> > >
> > > ..implementing something like this that would work with
> > > (Numeric)RangeQueries however would require some additional work, but it
> > > should certainly be doable -- i've suggested this before but no one has
> > > taken me up on it...
> > > http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
> > >
> > > If we take it as a given that you can do multiple ranges "at the same
> > > position", then you can imagine supporting all of your "regular" hours
> > > using just two fields ("open" and "close") by encoding the day+time of
> > > each range of open hours into them -- even if a store is open for
> > multiple
> > > sets of ranges per day (ie: closed for siesta)...
> > >
> > >  open: mon_12_30, tue_12_30, wed_07_30, wed_3_30, ...
> > >  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
> > >
> > > then asking for "stores open now and for the next 3 hours" on "wed" at
> > > "2:13PM" becomes a query for...
> > >
> > > sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
> > >
> > > For the special case part of your problem when there are certain dates
> > > that a store will be open atypical hours, i *think* that could be
> solved
> > > using some special docs and the new "join" QParser in a filter query...
> > >
> > >https://wiki.apache.org/solr/Join
> > >
> > > imagine you have your "regular" docs with all the normal data about a
> > > store, and the open/close fields i describe above.  but in addition to
> > > those, for any store that you know is "closed on dec 25" or "only open
> > > 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> > > the information about the store's closures on that special date - so that
> > > each special case would be its own doc, even if one store had 5 days
> > > where there was a special case...
> > >
> > >  specialdoc1:
> > >store_id: 42
> > >special_date: Dec-25
> > >status: closed
> > >  specialdoc2:
> > >store_id: 42
> > >special_date: Jan-01
> > >status: irregular
> > >open: 09_30
> > >close: 13_00
> > >
> > > then when you are executing your query, you use an "fq" to constrain to
> > > stores that are (normally) open right now (like i mentioned above

Faceted query performance problem when group.truncate set to true

2011-10-03 Thread dkundo
Hi,

in my (test) setup I have 200K distinct documents, with each document having
5 historical versions of it (so in total there are 1M documents).
In order to retrieve the latest (or a historical) version of the documents
I'm using the grouping functionality:

   id:[0 TO N]&group=true&group.field=objid&group.limit=1&group.sort=id desc

In addition I need to provide faceted search:

  facet=on&facet.field=feature&facet.field=tag&facet.field=folder

For faceted search to provide correct results I should add
*group.truncate=true* to my query. But when I do so the query time increases
significantly: from ~70ms without this option to ~1700ms with this option
set to true.

Am I doing something wrong? 
Is there another way of doing faceted search combined with result grouping?

Regards,
Dmitry


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-query-performance-problem-when-group-truncate-set-to-true-tp3389690p3389690.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr searching for special characters?

2011-10-03 Thread Steven A Rowe
Yes.

> -Original Message-
> From: vighnesh [mailto:svighnesh...@gmail.com]
> Sent: Monday, October 03, 2011 2:22 AM
> To: solr-user@lucene.apache.org
> Subject: solr searching for special characters?
> 
> Hi all,
> 
> I need to search for special characters in solr, so:
> is it possible to search special characters in solr?
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/solr-
> searching-for-special-characters-tp3388974p3388974.html
> Sent from the Solr - User mailing list archive at Nabble.com.
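If "search" here just means matching them literally, the usual first step is
escaping them for the query parser; a sketch with SolrJ's
ClientUtils.escapeQueryChars (input string illustrative):

String escaped = ClientUtils.escapeQueryChars("C++");  // yields C\+\+
// pass 'escaped' as the q parameter

Whether the characters survive indexing still depends on the field's
analyzer, though.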


UniqueKey field length exceeds

2011-10-03 Thread kiran.bodigam
I have defined a timestamp as the unique key, but when I try to search with
it, it throws an error. Is there any alternative to StrField, since I can't
increase its length? Or can we apply an analyzer to it?
My unique key: YYYY-MM-DD 13:54:11.414632

--
View this message in context: 
http://lucene.472066.n3.nabble.com/UniqueKey-filed-length-exceeds-tp3389759p3389759.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Automate startup/shutdown of SolrCloud Shards

2011-10-03 Thread Jamie Johnson
Something like this would be excellent.  Right now I am starting up
the main server and then doing a wget to see if the server has started
successfully before firing off any secondary servers.  If we could do
what you're saying then everything would be much cleaner.

On Mon, Oct 3, 2011 at 1:20 AM, Mark Miller  wrote:
> I could use an easier way to do this myself:
> https://issues.apache.org/jira/browse/SOLR-2805
>
> I'm going to add a main method to ZkController that will make this simpler -
> I've got an early version that works something like: java -classpath .:*
> org.apache.solr.cloud.ZkController 127.0.0.1:9983 127.0.0.1 8983 solr
> /home/mark/workspace/SolrCloud/solr/example/solr/conf conf1
>
>
> On Fri, Sep 30, 2011 at 3:13 PM, Mark Miller  wrote:
>
>>
>> On Sep 29, 2011, at 1:59 PM, Jamie Johnson wrote:
>>
>> > I am trying to automate the startup/shutdown of SolrCloud shards and
>> > have noticed that there is a bit of a timing issue where if the server
>> > which is to bootstrap ZK with the configs does not complete its
>> > process (i.e. there is no data at the Conf yet) the other servers will
>> > fail to start.  An obvious solution is to just start the solr instance
>> > responsible for bootstraping first, is there some way that folks are
>> > handling this now?
>>
>>
>> That's pretty much the deal - you have to get the configs in there for the
>> other shards to use as step one.
>>
>> Normally you would do this by starting one shard first, pointing to the
>> configs. Then start the other shards.
>>
>> Other options:
>>
>> use the zk cmd line program to manually create the config nodes and upload
>> the files
>> use the GUI program out there to do the same
>> use the zk library to write up something that does it
>>
>> write a simple java program (you'll need the solr libs on the classpath)
>> that does it with ZkController
>>
>> eg
>>
>> ZkController zkController = new ZkController(zkAddress,
>>          zkClientTimeout, zkConnectTimeout, "localhost", "8983", "solr");
>> zkController.uploadConfigDir(directory, configName);
>> zkController.close();
>>
>> - Mark Miller
>> lucidimagination.com
>> 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>


Re: Documents Indexed, SolrJ see nothing before long time

2011-10-03 Thread darul
Or any Idea to see cache updated more quickly, I do not understand well how
caches are working in Solr.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3389864.html
Sent from the Solr - User mailing list archive at Nabble.com.


Broken Simple html Form

2011-10-03 Thread Peter Rochford
Hi Solr Users,

 

I’m new to using Solr and trying to implement it as a search engine for our
company web site using the simple html form provided by
donovan@gmail.com at
http://code.google.com/p/solr-php-client/wiki/ExampleUsage. The example
provided via this web page fails to work for me. From what I can determine
the php bails when trying to access any items in the object returned by
$solr->search. For example, the php file stops when executing the second
statement in the following two lines of code:

 

$response = $solr->search( $query, $offset, $limit);

$test = $response->response->numFound;

 

The problem appears to be due to the private protections placed on the
object returned from $solr->search. Enclosed between the horizontal lines
below is some code I’ve inserted within the first try statement to provide
some output.

 



  try

  {

print("query = $query "); //debug

$results = $solr->search($query, 0, $limit);

echo "Executing print_r(\$results) "; //debug

echo "result contents: ", print_r($results), ""; //debug

print "";

   echo "http status: ", $results->getHttpStatusMessage(), "";

   $raw_response = $results->getRawResponse();

   echo "Executing print_r(\$results->getRawResponse()) "; //debug

echo "Raw Response: ", print_r($raw_response), ""; //returns 800
char string

print "Executing: echo \$results->response->numFound; ";

   echo $results->response->numFound;

print " Passed new test load ";

  }

  catch (Exception $e)

  {

// in production you'd probably log or email this error to an admin

// and then show a special message to the user but for this example

// we're going to show the full exception

die("SEARCH EXCEPTION{$e
>__toString()}");

  }



 

The output obtained is shown between the horizontal lines below. From the
output it is clear that Solr is working fine and that the php file
encounters a problem when executing $results->response->numFound  because it
never gets to the “Passed new test load” statement.

 



query = rochford

Executing print_r($results)

result contents: Apache_Solr_Response Object ( [_response:protected] =>
Apache_Solr_HttpTransport_Response Object ( [_statusCode:private] => 200
[_statusMessage:private] => OK [_mimeType:private] => text/plain
[_encoding:private] => UTF-8 [_responseBody:private] =>
{"responseHeader":{"status":0,"QTime":1,"params":{"start":"0","q":"rochford"
,"json.nl":"map","wt":"json","rows":"10"}},"response":{"numFound":1,"start":
0,"docs":[{"id":"SOLR1000","name":"Solr, the Enterprise Search
Server","manu":"Apache Software
Foundation","price":0.0,"popularity":10,"inStock":true,"incubationdate_dt":"
2006-01-17T00:00:00Z","cat":["software","search"],"features":["Peter
Rochford added text here!","Advanced Full-Text Search Capabilities using
Lucene","Optimized for High Volume Web Traffic","Standards Based Open
Interfaces - XML and HTTP","Comprehensive HTML Administration
Interfaces","Scalability - Efficient Replication to other Solr Search
Servers","Flexible and Adaptable with XML configuration and Schema","Good
unicode support: héllo (hello with an accent over the e)"]}]}} )
[_isParsed:protected] => [_parsedData:protected] =>
[_createDocuments:protected] => 1 [_collapseSingleValueArrays:protected] =>
1 ) 1

 

http status: OK

Executing print_r($results->getRawResponse())

Raw Response:
{"responseHeader":{"status":0,"QTime":1,"params":{"start":"0","q":"rochford"
,"json.nl":"map","wt":"json","rows":"10"}},"response":{"numFound":1,"start":
0,"docs":[{"id":"SOLR1000","name":"Solr, the Enterprise Search
Server","manu":"Apache Software
Foundation","price":0.0,"popularity":10,"inStock":true,"incubationdate_dt":"
2006-01-17T00:00:00Z","cat":["software","search"],"features":["Peter
Rochford added text here!","Advanced Full-Text Search Capabilities using
Lucene","Optimized for High Volume Web Traffic","Standards Based Open
Interfaces - XML and HTTP","Comprehensive HTML Administration
Interfaces","Scalability - Efficient Replication to other Solr Search
Servers","Flexible and Adaptable with XML configuration and Schema","Good
unicode support: héllo (hello with an accent over the e)"]}]}}1

 

Executing: echo $results->response->numFound;



 

Clearly the search for "Rochford" is successful. The problem is I cannot
access the item in the $results object. 

 

Note that I am using the following versions of Solr, PHP, and Java:

 

Apache-Solr-3.3.0

 

PHP 5.3.8 (cli) (built: Aug 29 2011 21:03:55) 

Copyright (c) 1997-2011 The PHP Group

Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies

 

java version "1.6.0_27"

Java(TM) SE Runtime Environment (build 1.6.0_27-b07)

Java HotSpot(TM) Server VM (build 20.2-b06, mixed mode) 

 

Any suggestions on the cause of the problem and what is the 

Debugging misbehaving spellchecker search....

2011-10-03 Thread Mark Swinson

Hi,

I'm trying to configure solr to perform a 'Did you mean this' style
search using the SpellCheckComponent and the standard search handler.
Unfortunately I am having problems getting results from my test
search ... basically, when I search using a misspelling of a word I know
to be in the source index, I get no results.

I have built a standard index from a mysql table using the 'dataimport'
plugin. This works successfully and I am able to make standard text
queries on this with the expected results. I have then performed a
spellchecker index rebuild using the following uri

/select?spellcheck=true&spellcheck.build=true&q=*

(If I try without the q parameter, which I don't think is necessary in
this particular situation, I get a null pointer exception.)

Does anyone know if there is a way of confirming that the spellchecker
index has been correctly written? I want to isolate whether it is my query
that is at fault or my spellchecker configuration.



For reference, below are the key aspects of my solr configuration relating
to this issue -


Thanks


Mark



schema.xml:

























solrconfig.xml:



textSpell


  test
  spell
  true
  ./spellchecker


  
  
  

 
   explicit
   
false

false

1

  spellcheck

  
  




http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



SolrJ Annotation for multiValued field

2011-10-03 Thread darul
Hello again,

Is it possible to persist simple list values in the index using an annotated bean?

 

And in my Pojo :

@Field("one")
String myString
.


@Field("mymultivaluedfield")
List items;

Actually, nothing happens; the content of this collection (an
ArrayList<String>) is not persisted into the mymultivaluedfield field.

Is it the nested-field problem I have seen in many threads?

Any solutions to persist a multivalued field with an annotated bean?

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Annotation-for-multiValued-field-tp3390255p3390255.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Weird issues when upgrading from 1.4 to 3.4

2011-10-03 Thread Jaeger, Jay - DOT
I have no idea what might be causing your memory to increase like that (we 
haven't run 3.4, and our index so far has been at most 28 million rows with 
maybe 40 fields), but just as an aside, depending upon what you meant by "we 
drop the whole index", I'd think it might work better to do an  right 
after deleting everything, and then a second one right after all of the 
updates.  And if you did it that way, I would not expect you to need an 
 at the end, either.

If you are trying to keep the old one available for query while the new ones 
are being added, might it be better to do that with two cores, and then swap 
them when complete?

JRJ


-Original Message-
From: Willem Basson [mailto:willem.bas...@gmail.com] 
Sent: Friday, September 30, 2011 7:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Weird issues when upgrading from 1.4 to 3.4

Just to clarify, I'm not worried about the virtual memory getting bigger,
the issue is that after doing a lot of adds without a commit the performance
dramatically decreases until we do the commit.
This didn't use to be a problem with 1.4

Willem

On Fri, Sep 30, 2011 at 10:20 AM, Willem Basson wrote:

> Hi there
>
> We are currently upgrading from solr 1.4 to 3.4 and have seen some issues
> with our specific use-case.
> As background we drop the whole index and then add all our documents in one
> big build before we commit and then optimise.
> This way we can revert the build if there are any issues and this won't be
> replicated out to our slave instances.
>
> We use the following flags: -Xmx2g -Xms2g
> With 1.4 it works pretty well, and the process doesn't use much more than
> 2GB of memory as the index is being built. Garbage collector kicks in quite
> a bit but performance are pretty decent throughout the build.
> For 3.4 though the process builds up to use a lot more memory, about 20 to
> 30 GB and it starts swapping and grinding to a bit of a halt. This means it
> takes about 10 times longer to complete the build, but it does complete.
> We have about a 100k documents and they aren't massive, but they aren't
> small either. They do have a lot of fields though, some have 6000+ fields.
>
> Monitoring the heap size I can see that it stays under 2GB and garbage
> collector seems to kick in just like with 1.4
>
> While we could increase the memory and do a few other things to make this
> less of an issue I would really like to know if anyone has any idea what the
> problem could be and what we could do to try and change the behaviour in
> config. Our solrconfig.xml file is the same for 1.4 and 3.4, we haven't made
> any changes.
>
> Thanks
>
> Willem Basson
>


Documents Indexed, SolrJ see nothing before long time

2011-10-03 Thread darul
Hello,

While documents are indexed, I mean I can retrieve them with the solr
administration console, it takes too long, about 5 minutes, before I can
see them using the SolrJ API.

Do you have any idea how to resolve this, please?

The only specific configuration I have is:

  <autoCommit>
    <maxTime>30000</maxTime>
  </autoCommit>

Thanks,

Jul

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3389721.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documents Indexed, SolrJ see nothing before long time

2011-10-03 Thread Christopher Gross
See:
http://wiki.apache.org/solr/SolrConfigXml

The example in the wiki is:

  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>86000</maxTime>
  </autoCommit>

So since you have yours set to 30000, that translates to 30,000 ms,
which is 5 minutes.  If you want the autocommit feature to trigger
more often, you could decrease the number.  Dropping it to 6000 would
make it auto commit every minute, but I don't know if that is too
often for you.

-- Chris



On Mon, Oct 3, 2011 at 10:00 AM, darul  wrote:
> Or any Idea to see cache updated more quickly, I do not understand well how
> caches are working in Solr.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3389864.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Lucene Grid question

2011-10-03 Thread sol myr
Thank you very much (sorry for the delayed reply).




From: Chris Hostetter 
To: solr-users ; sol myr 
Sent: Wednesday, September 21, 2011 4:15 AM
Subject: Re: Lucene Grid question


: E.g. say I have a chain of book-stores, in different countries, and I'm 
aiming for the following:
: - Each country has its own index file, on its own machine (e.g. books from 
Japan are indexed on machine "japan1")
: - Most users search only within their own country (e.g. search only the 
"japan1" index)
: - But sometimes, they might ask to search the entire chain (all countries), 
meaning some sort of "map/reduce" (=collect data from all countries).

what you're describing is one possible usecase of "Distributed Search"

http://wiki.apache.org/solr/DistributedSearch

as long as each of the individual "country" indexes have schemas that 
overlap (ie: share some common fields) and have the same uniqueKey field, 
with an id space that does *not* overlap between countries (ie: document 
"1" can only be in one index, not in any others) then you can do a 
distributed query that is distributed out to all of the individual
indexes, and then merged together to generate aggregate results.
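For example, a single request fanned out to every country's index could look
like this (host names illustrative):

http://japan1:8983/solr/select?q=tolkien&shards=japan1:8983/solr,usa1:8983/solr,uk1:8983/solr

The node receiving the request queries each shard and merges the responses.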


-Hoss

Re: Implementing a custom ResourceLoader

2011-10-03 Thread Chris Hostetter

: As part of writing a solr plugin I need to override the ResourceLoader. My
: plugin is intended stop word analyzer filter factory and I need to change
: the way stop words are being fetched. My assumption is overriding
: ResourceLoader->getLines() will help me to meet my target of fetching stop
: word data from an external webservice.
: Is this feasible? Or should I go about overriding
: Factory->inform(ResourceLoader) method. Kindly let me know how to achieve
: this.

I would not approach your problem by trying to customize the
ResourceLoader ... if your goal is to get the list of words from some
location other than a file, i would just make your TokenFilterFactory read
the stopwords from wherever you want to read them.

ResourceLoader.getLines() is just a convenience method for dealing with
the local paths and classpaths and stripping out comment lines -- trying
to subclass ResourceLoader when you don't need 90% of that functionality
is a bad idea.
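As a rough sketch of that direction (3.x-era APIs; the class name, the
stopwordsUrl argument and the fetch-over-HTTP logic are all illustrative,
not an existing Solr feature):

import java.io.*;
import java.net.URL;
import java.util.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.util.Version;
import org.apache.solr.analysis.BaseTokenFilterFactory;

// a stop filter factory that loads its word list from a URL instead of a file
public class WebServiceStopFilterFactory extends BaseTokenFilterFactory {
  private CharArraySet stopWords;

  @Override
  public void init(Map<String,String> args) {
    super.init(args);
    try {
      List<String> words = new ArrayList<String>();
      BufferedReader in = new BufferedReader(new InputStreamReader(
          new URL(args.get("stopwordsUrl")).openStream(), "UTF-8"));
      for (String line; (line = in.readLine()) != null; )
        if (line.trim().length() > 0) words.add(line.trim());
      in.close();
      stopWords = new CharArraySet(Version.LUCENE_34, words, true);
    } catch (IOException e) {
      throw new RuntimeException("could not load stop words", e);
    }
  }

  public TokenStream create(TokenStream input) {
    return new StopFilter(Version.LUCENE_34, input, stopWords);
  }
}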

-Hoss


Re: Bad Request accessing solr on linux

2011-10-03 Thread Chris Hostetter

: Below is the error:
: 
: Bad Request

we need a lot more to go on then that.

what errors are you seeing in the logs?  what do your configs look like? 
etc...

https://wiki.apache.org/solr/UsingMailingLists

-Hoss


Re: Documents Indexed, SolrJ see nothing before long time

2011-10-03 Thread darul
3ms = 5 minutes ? Are you sure you are not mistaking...
3ms = 30s

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3390515.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Getting facet counts for 10,000 most relevant hits

2011-10-03 Thread Burton-West, Tom
Thanks so much for your reply Hoss,

I didn't realize how much more complicated this gets with distributed search. 
Do you think it's worth opening a JIRA issue for this?
Is there already some ongoing work on the faceting code that this might fit in 
with?

In the meantime, I think I'll go ahead and do some performance tests on my 
kludge.  That might work for us as an interim measure until I have time to dive 
into the Solr/Lucene distributed faceting code.

Tom

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, September 30, 2011 9:20 PM
To: solr-user@lucene.apache.org
Subject: RE: Getting facet counts for 10,000 most relevant hits


: I figured out how to do this in a kludgey way on the client side but it 
: seems this could be implemented much more efficiently at the Solr/Lucene 
: level.  I described my kludge and posted a question about this to the 

It can, and I have -- but only for the case of a single node...

In general the faceting code in solr just needs a DocSet.  the default
impl uses the DocSet computed as a side effect when executing the main
search, but a custom SearchComponent could pick any DocSet it wants.

A few years back I wrote a custom faceting plugin that computed a "score" 
for each constraint based on:
 * Editorially assigned weights from a config file
 * the number of matching documents (ie: normal constraint count)
 * the number of matching documents from hte first N results

...where the last number was determined by internally executing the search
with "rows" of N, to generate a DocList object, and then converting that
DocList into a DocSet, and using that as the input to SimpleFacetCounts.
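Roughly, inside a custom SearchComponent, that looks like this (a sketch
against 3.x-era APIs; error handling and the constraint weighting omitted):

  SolrIndexSearcher searcher = req.getSearcher();
  // the first N (here 10000) docs by relevance
  DocList top = searcher.getDocList(query, filters, null, 0, 10000, 0);
  int[] ids = new int[top.size()];
  int i = 0;
  for (DocIterator it = top.iterator(); it.hasNext(); ) ids[i++] = it.nextDoc();
  DocSet topSet = new HashDocSet(ids, 0, ids.length);
  // facet over just those top docs
  NamedList<Object> counts = new SimpleFacets(req, topSet, req.getParams()).getFacetCounts();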

Ignoring the "Editorial weights" part of the above, the logic for 
"scoring" constraints based on the other two factors is general enough 
thta it could be implemented in solr, we just need a way to configure "N" 
and what kind of function should be applied to the two counts.

...But...

This approach really breaks down in a distributed model.  You can't do the 
same quick and easy DocList->DocSet transformation on each node, you have 
to do more complicated federating logic like the existing FacetComponent 
code does, and even there we don't have anything that would help with the 
"only the first N" type logic.  My best idea would be to do the same thing 
you describe in your "kludge" approach to solving this in the client...

: 
(http://lucene.472066.n3.nabble.com/Solr-should-provide-an-option-to-show-only-most-relevant-facet-values-tc3374285.html).
  

...the coordinator would have to query all of the shards for their top N, 
and then tell each one exactly which of those docs to include in the 
"weighted facets constraints" count ... which would make for some relaly 
big requests if N is large.

the only sane way to do this type of thing efficiently in a distributed 
setup would probably be to treat the "top N" part of the goal as a 
"guideline" for a sampling problem, telling each shard to consider only 
*their* top N results when computing the top facets in shardReq #1, and 
then do the same "give me an exact count" type logic in shardReq #2 
that we already do.  So the constraints picked may not actually be
the top constraints for the first N docs across the whole collection (just 
like right now they aren't guaranteed to be the top constraints for all
docs in the collection in a long tail situation), but they would be
representative of the "first-ish" docs across the whole collection.

-Hoss


Re: Multithreaded JdbcDataStore and CachedSqlEntityProcessor

2011-10-03 Thread Maria Vazquez
When I'm debugging, if it is single threaded,
CachedSqlEntityProcessor.getAllNonCachedRows is called only once, all the
rows are cached, and the next time it requests a row it gets it from the
cached data. In the logs I see the SQL call only once.

If I use multiple threads, it calls
CachedSqlEntityProcessor.getAllNonCachedRows multiple times, so it creates a
new cache per thread.

The cache should be shared among threads and be safe, right?

Thanks!
Maria




On 10/1/11 8:00 PM, "pulkitsing...@gmail.com" 
wrote:

> What part of the source code in debug mode behaved in a fashion such as to
> make it seem like it is not thread-safe?
> 
> If it feels difficult to put into words then you can always make a small 5 min
> screencast to demo the issue and talk about it. I do that for really complex
> stuff with Jing by techsmith (free version).
> 
> Or you can put up a small and simplified test case together to demo the issue
> and then paste the link for that hosted attachment :)
> 
> Sent from my iPhone
> 
> On Sep 30, 2011, at 1:28 PM, Maria Vazquez  wrote:
> 
>> Hi,
>> I'm using threads with JdbcDataStore and CachedSqlEntityProcessor.
>> I noticed that if I make it single threaded CachedSqlEntityProcessor behaves
>> as expected (it only queries the db once and caches all the rows). If I make
>> it multi threaded it seems to make multiple db queries and when I debug the
>> source code it looks like it is not thread safe.
>> Any ideas?
>> Thanks,
>> Maria
>> 



Re: Boost Exact matches on Specific Fields

2011-10-03 Thread Balaji S
Hi

   One more question here. For example, if I do a search for "Agriculture
Foods" without quotes, it tries to find only the ones which have both the
words together, instead of splitting and checking for individual results. On
removing the qf params it seems to work. Is it a problem with the qf params?

I am able to see it split and triggered by checking the SOLR Analysis page.


Thanks
Balaji

On Thu, Sep 29, 2011 at 6:11 AM, Balaji S  wrote:

> Yeah I will change the weight for str_category and make it higher . I
> converted it to lowercase  because we cannot expect users to type them in
> the correct case
>
> Thanks
> Balaji
>
> On Thu, Sep 29, 2011 at 3:52 AM, Way Cool  wrote:
>
>> I will give str_category more weight than ts_category because we want
>> str_category to win if they have "exact" matches ( you converted to
>> lowercase).
>>
>> On Mon, Sep 26, 2011 at 10:23 PM, Balaji S  wrote:
>>
>> > Hi
>> >
>> >   You mean to say copy the String field to a Text field or the reverse .
>> > This is the approach I am currently following
>> >
>> > Step 1: Created a FieldType
>> >
>> >
>> > > > sortMissingLast="true" omitNorms="true">
>> >
>> >
>> >
>> >
>> >
>> > 
>> >
>> > Step 2 : > > stored="true"/>
>> >
>> > Step 3 : 
>> >
>> > And in the SOLR Query planning to q=hospitals&qf=body^4.0 title^5.0
>> > ts_category^10.0 str_category^8.0
>> >
>> >
>> > The one question I have here is: all the above mentioned fields will have
>> > "Hospital" present in them; will the above approach work to get the exact
>> > match on the top and bring "Hospitalization" below in the results?
>> >
>> >
>> > Thanks
>> > Balaji
>> >
>> >
>> > On Tue, Sep 27, 2011 at 9:38 AM, Way Cool 
>> wrote:
>> >
>> > > If I were you, probably I will try defining two fields:
>> > > 1. ts_category as a string type
>> > > 2. ts_category1 as a text_en type
>> > > Make sure copy ts_category to ts_category1.
>> > >
>> > > You can use the following as qf in your dismax:
>> > > qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
>> > > or something like that.
>> > >
>> > > YH
>> > > http://thetechietutorials.blogspot.com/
>> > >
>> > >
>> > > On Mon, Sep 26, 2011 at 2:06 PM, balaji  wrote:
>> > >
>> > > > Hi all
>> > > >
>> > > >I am new to SOLR and have a doubt on Boosting the Exact Terms to
>> the
>> > > top
>> > > > on a Particular field
>> > > >
>> > > > For ex :
>> > > >
>> > > > I have a text field names ts_category and I want to give more
>> boost
>> > > to
>> > > > this field rather than other fields, SO in my Query I pass the
>> > following
>> > > in
>> > > > the QF params "qf=body^4.0 title^5.0 ts_category^21.0" and also sort
>> on
>> > > > SCORE desc
>> > > >
>> > > > When I do a search against "Hospitals" . I get "Hospitalization
>> > > > Management , Hospital Equipment & Supplies " on Top rather than the
>> > exact
>> > > > matches of "Hospitals"
>> > > >
>> > > >  So It would be great , If I could be helped over here
>> > > >
>> > > >
>> > > > Thanks
>> > > > Balaji
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > Thanks in Advance
>> > > > Balaji
>> > > >
>> > > > --
>> > > > View this message in context:
>> > > >
>> > >
>> >
>> http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
>> > > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > > >
>> > >
>> >
>>
>
>


sorting using function query results are notin order

2011-10-03 Thread abhayd
hi,
I am trying to sort results from solr using the sum(Count,score) function.
Basically it's not adding things correctly.
For example here is partial sample response
"Count":54,
"UserQuery":"how to",
"score":1.2550932,
"query({!dismax qf=UserQuery v='how'})":1.2550932,
"sum(Count,query({!dismax qf=UserQuery v='how'}))":1.2550932},

how come the addition of 54+1.2550932 is equal to 1.2550932?

What am I doing wrong?
here is my complete query

http://localhost:10101/solr/autosuggest/select?q=how&start=0&indent=on&wt=json&rows=5&sort=sum%28Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29%29%20desc&fl=UserQuery,score,Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29,sum%28Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29%29&debug=true

{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "sort":"sum(Count,query({!dismax qf=UserQuery v='how'})) desc",
  "wt":"json",
  "rows":"5",
  "indent":"on",
  "fl":"UserQuery,score,Count,query({!dismax qf=UserQuery
v='how'}),sum(Count,query({!dismax qf=UserQuery v='how'}))",
  "debug":"true",
  "start":"0",
  "q":"how"}},
  "response":{"numFound":2628,"start":0,"maxScore":1.2550932,"docs":[
  {
"Count":54,
"UserQuery":"how to",
"score":1.2550932,
"query({!dismax qf=UserQuery v='how'})":1.2550932,
"sum(Count,query({!dismax qf=UserQuery v='how'}))":1.2550932},
  {
"Count":51,
"UserQuery":"how to text",
"score":0.8964951,
"query({!dismax qf=UserQuery v='how'})":0.8964951,
"sum(Count,query({!dismax qf=UserQuery v='how'}))":0.8964951},
  {
"Count":117,
"UserQuery":"how to block calls",
"score":0.7171961,
"query({!dismax qf=UserQuery v='how'})":0.7171961,
"sum(Count,query({!dismax qf=UserQuery v='how'}))":0.7171961},
  {
"Count":109,
"UserQuery":"how to call forward",
"score":0.7171961,
"query({!dismax qf=UserQuery v='how'})":0.7171961,
"sum(Count,query({!dismax qf=UserQuery v='how'}))":0.7171961},
  {
"Count":79,
"UserQuery":"how do I pay my bill?",
"score":0.7171961,
"query({!dismax qf=UserQuery v='how'})":0.7171961,
"sum(Count,query({!dismax qf=UserQuery v='how'}))":0.7171961}]
  },


--
View this message in context: 
http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3390926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Annotation for multiValued field

2011-10-03 Thread darul
I will check tomorrow, but for a test case, I have put this code to verify
Field List mapping :

@Field("mymultivaluedfield") 
List items; 

public List getItems() {
   if (items == null) {
 items = new ArrayList();
 items.add("value1");
 
   }
  return items;
}

Is it normal that it does not persist with this syntax?

Would it be better to use myBean.setItems()?
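For reference, here is the pattern I would expect to work (a sketch, assuming
SolrJ's DocumentObjectBinder reads the annotated field directly through
reflection, so a lazy getter like the one above is never called and items
stays null; MyBean and its setter are illustrative):

MyBean bean = new MyBean();
bean.setItems(Arrays.asList("value1", "value2")); // populate the field itself
server.addBean(bean);  // server: e.g. a CommonsHttpSolrServer
server.commit();

The schema field would also need multiValued="true".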

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Annotation-for-multiValued-field-tp3390255p3390930.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Documents Indexed, SolrJ see nothing before long time

2011-10-03 Thread darul
Any SolrJ cache? I am a newbie ;)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3390942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documents Indexed, SolrJ see nothing before long time

2011-10-03 Thread Christopher Gross
Sorry, lack of sleep made me see an extra "0" in there.

I haven't had this issue -- but after every batch of items that I post
into Solr with SolrJ I run the commit() routine on my instance of the
CommonsHttpSolrServer, so they show up immediately.  You could try
altering your code to do that, or changing the maxDocs setting to a
smaller number that would trigger more often.
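A minimal sketch of that explicit-commit pattern (3.x-era SolrJ; the URL
and document collection are illustrative):

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
server.add(docs);   // docs: a Collection<SolrInputDocument> built elsewhere
server.commit();    // makes the whole batch searchable immediately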

I don't know of a cache in SolrJ... but I know that after I insert
records I can immediately search for them in my JUnit tests.

-- Chris



On Mon, Oct 3, 2011 at 3:50 PM, darul  wrote:
> Any SolrJ cache ? I am newbie ;)
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3390942.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How to obtain the Explained output programmatically ?

2011-10-03 Thread David Ryan
Hi,

I need to use some of the detailed information from the explain output of a
Solr search.

Here is one example:

http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score


0.18314168 = (MATCH) sum of:
  0.18314168 = (MATCH) weight(text:gb in 1), product of:
0.35845062 = queryWeight(text:gb), product of:
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15502669 = queryNorm
0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
  1.4142135 = tf(termFreq(text:gb)=2)
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15625 = fieldNorm(field=text, doc=1)


I could see the explained result by clicking the "toggle explain" button in
the web browser.   Is there a way to access the explained output
programmatically?


Regards,
David


Re: How to obtain the Explained output programmatically ?

2011-10-03 Thread Chris Hostetter

: 
http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
...
: the web browser.   Is there a way to access the explained output
: programmatically?

https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured


-Hoss


schema changes changes 3.3 to 3.4?

2011-10-03 Thread jo
Hi, I have the following issue in my test environment:
when I do a query with the full word, the reply no longer contains the
attr_meta fields, e.g.:
http://solr1:8983/solr/core_1/select/?q=stegosaurus


ISO-8859-1
en


but if I remove just one letter it shows the expected response
ex:
http://solr1:8983/solr/core_1/select/?q=stegosauru


ISO-8859-1

stream_source_info: document
stream_content_type: text/plain
stream_size: 81
Content-Encoding: ISO-8859-1
stream_name: filex123.txt
Content-Type: text/plain
resourceName: dinosaurs5.txt



For troubleshooting I copied the schema.xml from 3.3 into 3.4 and it works
just fine. I can't find what change in the schema would cause this. Any
clues?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/schema-changes-changes-3-3-to-3-4-tp3391019p3391019.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to obtain the Explained output programmatically ?

2011-10-03 Thread David Ryan
Thanks Hoss!

debug.explain.structured is definitely helpful. It adds some structure to the
plain explain output. Is there a way to access these structured outputs in
Java code (e.g., via a Solr plugin class)?
We could write an HTML parser to examine the output in the browser, but that
is probably not the best way to do it.
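
As a side note, SolrJ already exposes the plain explain text on the client
without any HTML scraping; a minimal sketch (the URL is a placeholder):

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ExplainFetcher {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("GB");
        query.set("debugQuery", "true");

        QueryResponse response = server.query(query);
        // Explain text keyed by each matching document's uniqueKey value.
        Map<String, String> explainMap = response.getExplainMap();
        for (Map.Entry<String, String> entry : explainMap.entrySet()) {
            System.out.println(entry.getKey() + " => " + entry.getValue());
        }
    }
}

In a server-side plugin, the same data should be reachable through the
"debug" section of the response NamedList.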



On Mon, Oct 3, 2011 at 2:11 PM, Chris Hostetter wrote:

>
> :
> http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
>...
> : the web browser.   Is there a way to access the explained output
> : programmatically?
>
> https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured
>
>
> -Hoss
>


Re: Sort five random "Top Offers" to the top

2011-10-03 Thread Sujit Pal
Hi Mouli,

I was looking at the code here, not sure why you even need to do the
sort...

After you get the DocList, couldn't you do something like this?

List<Integer> topofferDocIds = new ArrayList<Integer>();
for (DocIterator it = ergebnis.iterator(); it.hasNext();) {
  topofferDocIds.add(it.next());
}
Collections.shuffle(topofferDocIds);
// getContext() returns a Map, so use put() rather than set()
rb.req.getContext().put(TOPOFFERS, topofferDocIds);

So in the first-component, you have identified the top 5 offers for the
query and client, and stuffed them into the context.

Then you define a last-component which takes the topofferDocIds, places
them at the top of the search results, and removes them from the main
result if they are already there (a rough skeleton follows below).

Would that not work?
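
A rough, untested skeleton of such a last-component (the class name and
context key are made up, and the DocSlice rebuild is just one way to do it;
scores and maxScore are dropped for simplicity):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocSlice;

public class TopOffersLastComponent extends SearchComponent {
    public static final String TOPOFFERS = "topoffers.ids";

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to do here; the first-component fills the context
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        @SuppressWarnings("unchecked")
        List<Integer> topIds = (List<Integer>) rb.req.getContext().get(TOPOFFERS);
        if (topIds == null || rb.getResults() == null
                || rb.getResults().docList == null) {
            return;
        }
        DocList current = rb.getResults().docList;
        // top offers first, then the remaining hits minus any duplicates
        List<Integer> reordered = new ArrayList<Integer>(topIds);
        for (DocIterator it = current.iterator(); it.hasNext();) {
            int id = it.nextDoc();
            if (!topIds.contains(id)) {
                reordered.add(id);
            }
        }
        int[] docs = new int[reordered.size()];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = reordered.get(i);
        }
        rb.getResults().docList =
                new DocSlice(0, docs.length, docs, null, current.matches(), 0.0f);
    }

    @Override
    public String getDescription() { return "moves top offers to the front"; }
    @Override
    public String getSource() { return "n/a"; }
    @Override
    public String getSourceId() { return "n/a"; }
    @Override
    public String getVersion() { return "1.0"; }
}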

Alternatively (kind of a hybrid way), you could define your own (single)
component that takes the query, sends two queries to the underlying Solr
(one with the top offers and one without), and merges the results before
sending them back. This would replace the component that does the search.

-sujit

On Wed, 2011-09-28 at 07:15 -0700, MOuli wrote:
> Hey Community.
>
> I wrote my first component and now I have a problem. Here is my code:
> 
> @Override
> public void prepare(ResponseBuilder rb) throws IOException {
> try {
> rb.req.getParams().getBool("topoffers.show", true);
> String client = rb.req.getParams().get("client", "1");
> BooleanQuery[] queries = new BooleanQuery[2];
> queries[0] = (BooleanQuery) DisMaxQParser.getParser(
> rb.req.getParams().get("q"),
> DisMaxQParserPlugin.NAME,
> rb.req)
> .getQuery();
> queries[1] = new BooleanQuery();
> Occur occur = BooleanClause.Occur.MUST;
> queries[1].add(QueryParsing.parseQuery("ups_topoffer_" + client
> + ":true", rb.req.getSearcher().getSchema()), occur);
> 
> Query q = Query.mergeBooleanQueries(queries[0], queries[1]);
> 
> DocList ergebnis = rb.req.getSearcher().getDocList(q, null,
> null, 0, 5, 0);
> 
> String[] machineIds = new String[5];
> int position = 0;
> DocIterator iter = ergebnis.iterator();
> while (iter.hasNext()) {
> int docID = iter.nextDoc();
> Document doc =
> rb.req.getSearcher().getReader().document(docID);
> for (String value : doc.getValues("machine_id")) {
> machineIds[position++] = value;
> }
> }
> 
> Sort sort = rb.getSortSpec().getSort();
> if (sort == null) {
> rb.getSortSpec().setSort(new Sort());
> sort = rb.getSortSpec().getSort();
> }
> 
> SortField[] newSortings = new SortField[sort.getSort().length +
> 5];
> int count = 0;
> for (String machineId : machineIds) {
> SortField sortMachineId = new SortField("map(machine_id," +
> machineId + "," + machineId + ",1,0) desc", SortField.DOUBLE);
> newSortings[count++] = sortMachineId;
> }
> 
> SortField[] sortings = sort.getSort();
> for (SortField sorting : sortings) {
> newSortings[count++] = sorting;
> }
> 
> sort.setSort(newSortings);
> 
> rb.getSortSpec().setSort(sort);
> 
> } catch (ParseException e) {
> LoggerFactory.getLogger(Topoffers.class).error( "Fehler bei den
> Topoffers!", this);
> LoggerFactory.getLogger(Topoffers.class).error(e.toString(),
> this);
> }
> 
> }
> 
> Why can't I manipulate the sort? Is there something I misunderstand?
>
> This search component is added under "first-components" in the solrconfig.xml.
>
> Can anyone please help me?
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Sort-five-random-Top-Offers-to-the-top-tp3355469p3376166.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Determining master/slave from ZK in SolrCloud

2011-10-03 Thread Jamie Johnson
Is it possible to determine if a solr instance is a master or a slave
in replication terms based on the information that is placed in ZK in
SolrCloud?


Selective Result Grouping

2011-10-03 Thread entdeveloper
I'd like to suggest the ability to collapse results in a way more similar to
the old SOLR-236 patch, something the current grouping functionality doesn't
provide. I need the ability to collapse only certain results based on the
value of a field, leaving all other results intact.

As an example, consider the following documents:
ID TYPE
1   doc
2   image
3   image
4   doc

My desired behavior is to collapse results where TYPE:image, producing a
result set like the following:
1
2 (collapsed, count=2)
4

Currently, when using the Result Grouping feature, I only have the ability
to produce the result set below
1 (grouped, count=2)
2 (grouped, count=2)

I'd like to propose repurposing the 'group.query' parameter to achieve this
behavior. Currently, the group.query parameter behaves exactly like an 'fq'
(at least in terms of the results that are produced). I have yet to come up
with a scenario where the group.query could not be accomplished by using the
other group params and fq.
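
For comparison, this is roughly how group.query is used today (reusing the
TYPE field from the example above):

http://localhost:8983/solr/select?q=*:*&group=true&group.query=TYPE:image&group.query=TYPE:doc

Each group.query produces one fixed group containing the top documents that
match it, which is why the same documents could just as well be fetched with
a separate request using fq.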

I'm hoping to collect some thoughts on the subject before submitting a
ticket to jira. Thoughts?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3391538.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to skip current document to index data from DIH

2011-10-03 Thread Erick Erickson
You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

You've given us a config, but no idea what you've
tried. What you observe. What error you're seeing.

What do your logs show? Are you seeing any error
stack traces? Have you tried backing this up and
trying only one addition at a time? For instance,
take out the whole TikaEntityProcessor and see if
you can just connect to your DB and index something
from there.

It appears you have a custom Transformer. Take that
out. Or put logging messages in there to see if you
even get that far.

In other words, try stuff and tell us what the results
are. But just saying "it doesn't work" gives us very
little to go on.

Best
Erick

On Sun, Oct 2, 2011 at 11:00 PM, scorpking  wrote:
> Hi, thanks for your reply.
> But when I set the attribute onError="skip", no data gets imported at all.
> This is my config:
> <dataConfig>
>   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>       url="jdbc:sqlserver://myip;databaseName=VTC_Edu" user="ac" password="ps"
>       name="dsdb"/>
>   <document>
>     <entity name="VTCEduDocument" dataSource="dsdb" pk="pk_document_id"
>         query="select pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]"
>         onError="skip">
>       <field column="s_path_origin" name="s_path_origin" />
>       <entity processor="TikaEntityProcessor" format="text"
>           url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}"
>           transformer="com.vtc.search.Converter" onError="skip">
>         <field column="..." meta="true"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> Thanks
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-skip-current-document-to-index-data-from-DIH-tp3381894p3388700.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Memory managment and JVM setting for solr

2011-10-03 Thread Erick Erickson
I think you're going at this from the wrong end. There's
no magical way to give your process more memory
than you have on your machine. But, there's no
correlation between the size of your data, especially
when it contains video files and the size of your Solr
index. You're not going to index the binary stuff,
just the meta-data. How do you intend to extract it
anyway? Tika? If so, what format? I think Tika only
supports flash, but that's just a vague memory.

I think you need to back up and explain
a bit more about what you intend to accomplish,
this feels like an XY problem, see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Mon, Oct 3, 2011 at 3:41 AM, hadi  wrote:
>
> I have a large data store (about 500GB of pdf and video files) and my machine
> has 4GB of RAM. I want to index these files with the solrj APIs. What are
> the necessary solrconfig and JVM settings to avoid heap-size problems and
> other memory crashes during indexing? Is there any configuration to force
> the garbage collector to deallocate memory during indexing?
>
> thanks
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Memory-managment-and-JVM-setting-for-solr-tp3389093p3389093.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SOLR HttpCache Qtime

2011-10-03 Thread Erick Erickson
Why do you want to? QTime is the time Solr
spends searching. The cached value will,
indeed, be from the query that filled
in the HTTP cache. But what are you doing
with that information that you want to "correct"
it?

That said, I have no clue how you'd attempt to
do this.

Best
Erick

On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han  wrote:
> Hi,
>
> Is there any way to get the correct QTime when we use HTTP caching? I think
> Solr is caching the QTime too, so it returns the same QTime in the response
> no matter how long the request actually takes. How can I set QTime correctly
> from Solr when HTTP caching is on?
>
> thanks
>


Re: Enabling the right logs for DIH

2011-10-03 Thread Erick Erickson
Hmm, you know, I don't even know what
a "row" means when importing XML. But
let's talk about importing XML. As far as
I know, unless you use XSLT to perform
a transformation, Solr doesn't import XML
except as well-formed Solr documents,
some form like:

<add>
  <doc>
    <field name="fieldname">value</field>
  </doc>
</add>

If you're importing anything else, I don't think
Solr understands it at all... So what does
your "funky XML document" look like?
What, if any, errors are reported in your Solr
logs?

Also, it's surprisingly easy to debug Solr
while it runs. In IntelliJ, all it involves
is creating a "remote" run configuration,
which gives you the parameters you need to
specify when you start your Solr. From there
you just invoke your Solr instance with those
parameters and connect remotely. I took the
entire source tree for the Solr I was using
and compiled it (ant example) and it was
easy. So you might get more mileage
out of debugging in Solr rather than logging, but
that's a guess.

Best
Erick

On Sat, Oct 1, 2011 at 6:17 PM, Pulkit Singhal  wrote:
> 
> The Problem:
> 
> When using DIH with trunk 4.x, I am seeing some very funny numbers
> with a particularly large XML file that I'm trying to import. Usually
> there are bound to be more rows than documents indexed in DIH because
> of the foreach property but my other xm lfiles have maybe 1.5 times
> the rows compared to the # of docs indexed.
>
> This particular funky file ends up with something like:
> 25614008
> 1048
> That's 25 million rows fetched before even a measly 1000 docs are indexed!
> Something has to be wrong here.
> I checked the xml for well-formed-ness in vim by running ":!xmllint
> --noout %" so I think there are no issues there.
>
> 
> The Question:
> 
> For those intimately familiar with DIH code/behaviour: What is the
> appropriate log-level that will let me see the rows & docs printed out
> to log as each one is fetched/created? I don't want to make the logs
> explode because then I won't be able to read through them. Is there
> some gentle balance here that I can leverage?
>
> Thanks!
> - Pulkit
>


Re: Reg: Multicore vs singlecore.

2011-10-03 Thread Erick Erickson
Describe your test more. If you're asking why
your CPU isn't pegged, I'd have to ask "Are
you firing queries at it rapidly enough?"

Best
Erick

On Mon, Oct 3, 2011 at 2:42 AM,   wrote:
> hi,
> I am new to solr, so just want to clarify on few points
> I ran the test on the machine with following specification:-Ram : 5GBdual 
> core : 2.66 GHzIndex Size : 10GB
> 1) Is the hard disk read time is a bottle neck on solr performance? I ran few 
> tests using "jmeter" and analysis the system resources using "dstat". what I 
> found is that cpu is 50% idle, 40% waiting and 10% actual user processing all 
> the time. why cpu is in constant waiting state?
> 2) Atmost how many cores will give reasonable performance (under 2 second ) 
> on the index of  10GB?
> Hoping for a quick respone..ThanksKaran


Re: UniqueKey filed length exceeds

2011-10-03 Thread Erick Erickson
I really don't understand what you're trying to convey.
Let's see your schema definition for your
unique key. I really wouldn't imagine that fields
of that form *could* be anything except string. It's
not in the correct format for a Solr date, and it's
certainly not a numeric type.

If that doesn't make sense, please show us the error
(stack trace) you're seeing. But the schema
for the field in question is the important part.

Best
Erick

On Mon, Oct 3, 2011 at 9:28 AM, kiran.bodigam  wrote:
> I have defined a timestamp as the unique key, but when I try to search with
> it, it throws an error. Is there any alternative to StrField? I can't
> increase its length, and can't we apply an analyzer to it?
> My unique key: YYYY-MM-DD 13:54:11.414632
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/UniqueKey-filed-length-exceeds-tp3389759p3389759.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Boost Exact matches on Specific Fields

2011-10-03 Thread Erick Erickson
I'm not sure what you're asking here. Could you
show the results of appending &debugQuery=on
to your query? Along with what you expect to
happen and what is in the fields.

But KeywordTokenizer (in your "string_lower")
type is suspicious when you start using
multiple words. Your "agriculture foods"
query string won't match without quotes because
it'll get parsed into field:agriculture field:foods,
i.e. two tokens. But there will only be a single
token in the index because of the KeywordAnalyzer.
You might want WhitespaceTokenizer here.
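
Something like this, for instance (the type name is made up, untested):

<fieldType name="text_lower_ws" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- split on whitespace so multi-word queries produce matching tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>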

Best
Erick

On Mon, Oct 3, 2011 at 2:13 PM, Balaji S  wrote:
> Hi
>
>   One more question here. For example, if I do a search for "Agriculture
> Foods" without quotes, it tries to find documents containing both words
> together instead of splitting them and checking for individual matches. On
> removing the qf params it seems to work. Is it a problem with the qf params?
>
>    I am able to see it split and trigger by checking the SOLR Analysis page
>
>
> Thanks
> Balaji
>
> On Thu, Sep 29, 2011 at 6:11 AM, Balaji S  wrote:
>
>> Yeah I will change the weight for str_category and make it higher . I
>> converted it to lowercase  because we cannot expect users to type them in
>> the correct case
>>
>> Thanks
>> Balaji
>>
>> On Thu, Sep 29, 2011 at 3:52 AM, Way Cool  wrote:
>>
>>> I will give str_category more weight than ts_category because we want
>>> str_category to win if they have "exact" matches ( you converted to
>>> lowercase).
>>>
>>> On Mon, Sep 26, 2011 at 10:23 PM, Balaji S  wrote:
>>>
>>> > Hi
>>> >
>>> >   You mean to say copy the String field to a Text field or the reverse .
>>> > This is the approach I am currently following
>>> >
>>> > Step 1: Created a FieldType
>>> >
>>> >
>>> > <fieldType name="string_lower" class="solr.TextField"
>>> > sortMissingLast="true" omitNorms="true">
>>> >   <analyzer>
>>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> >     <filter class="solr.LowerCaseFilterFactory"/>
>>> >   </analyzer>
>>> > </fieldType>
>>> >
>>> > Step 2 : <field name="str_category" type="string_lower" indexed="true"
>>> > stored="true"/>
>>> >
>>> > Step 3 : <copyField source="ts_category" dest="str_category"/>
>>> >
>>> > And in the SOLR Query planning to q=hospitals&qf=body^4.0 title^5.0
>>> > ts_category^10.0 str_category^8.0
>>> >
>>> >
>>> > The One Question I have here is All the above mentioned fields will have
>>> > "Hospital" present in them , will the above approach work to get the
>>> exact
>>> > match on the top and bring "Hospitalization" below in the results
>>> >
>>> >
>>> > Thanks
>>> > Balaji
>>> >
>>> >
>>> > On Tue, Sep 27, 2011 at 9:38 AM, Way Cool 
>>> wrote:
>>> >
>>> > > If I were you, probably I will try defining two fields:
>>> > > 1. ts_category as a string type
>>> > > 2. ts_category1 as a text_en type
>>> > > Make sure copy ts_category to ts_category1.
>>> > >
>>> > > You can use the following as qf in your dismax:
>>> > > qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
>>> > > or something like that.
>>> > >
>>> > > YH
>>> > > http://thetechietutorials.blogspot.com/
>>> > >
>>> > >
>>> > > On Mon, Sep 26, 2011 at 2:06 PM, balaji  wrote:
>>> > >
>>> > > > Hi all
>>> > > >
>>> > > >    I am new to SOLR and have a doubt on Boosting the Exact Terms to
>>> the
>>> > > top
>>> > > > on a Particular field
>>> > > >
>>> > > > For ex :
>>> > > >
>>> > > >     I have a text field names ts_category and I want to give more
>>> boost
>>> > > to
>>> > > > this field rather than other fields, SO in my Query I pass the
>>> > following
>>> > > in
>>> > > > the QF params "qf=body^4.0 title^5.0 ts_category^21.0" and also sort
>>> on
>>> > > > SCORE desc
>>> > > >
>>> > > >     When I do a search against "Hospitals" . I get "Hospitalization
>>> > > > Management , Hospital Equipment & Supplies " on Top rather than the
>>> > exact
>>> > > > matches of "Hospitals"
>>> > > >
>>> > > >      So It would be great , If I could be helped over here
>>> > > >
>>> > > >
>>> > > > Thanks
>>> > > > Balaji
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > Thanks in Advance
>>> > > > Balaji
>>> > > >
>>> > > > --
>>> > > > View this message in context:
>>> > > >
>>> > >
>>> >
>>> http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
>>> > > > Sent from the Solr - User mailing list archive at Nabble.com.
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Re: How to skip current document to index data from DIH

2011-10-03 Thread scorpking
Hi Erick Erickson,
Thank you for the reply. In my config, I successfully indexed data fetched
over HTTP using Tika: I combine a database field with a base URL in the Tika
entity to fetch each file. But during indexing, some URLs do not exist, and
I see errors like:

Caused by: java.io.FileNotFoundException:
http://media.gox.vn/edu/document/original/1/2704201010071760_Bai25.ppt

It means that this file does not exist on the server. I want to skip such
files (documents) and index the next ones. I tried onError="skip" to continue
indexing the rich documents, but it doesn't work and the import stops there.
Is there a way to overcome this problem?

Best Regard
Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-skip-current-document-to-index-data-from-DIH-tp3381894p3392055.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr searching for special characters?

2011-10-03 Thread vighnesh
Thanks for the reply.

Please explain how this is possible, and which tokenizer class supports
searching for special characters.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-searching-for-special-characters-tp3388974p3392157.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query failing because of omitTermFreqAndPositions

2011-10-03 Thread Isan Fulia
Hi Mike,

Thanks for the information. But why is it that once a field has omitted
positions in the past, it will always omit positions, even if
omitTermFreqAndPositions is set back to false?

Thanks,
Isan Fulia.

On 29 September 2011 17:49, Michael McCandless wrote:

> Once a given field has omitted positions in the past, even for just
> one document, it "sticks" and that field will forever omit positions.
>
> Try creating a new index, never omitting positions from that field?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Sep 29, 2011 at 1:14 AM, Isan Fulia 
> wrote:
> > Hi All,
> >
> > My schema consisted of a field textForQuery which was defined as
> > <field name="textForQuery" ... multiValued="true"/>
> >
> > After indexing 10 lakh (1,000,000) documents I changed the field to
> > <field name="textForQuery" ... multiValued="true" omitTermFreqAndPositions="true"/>
> >
> > So documents that were indexed after that omitted the position
> > information of the terms.
> > As a result I was not able to search the text which rely on position
> > information for eg. "coke studio at mtv" even though its present in some
> > documents.
> >
> > So I again changed the field textForQuery to
> >  > multiValued="true"/>
> >
> > But now, even for newly added documents, the query requiring position
> > information is still failing.
> > For example, I reindexed certain documents that contain "coke studio at
> > mtv", but the query still returns no documents when searching for
> > textForQuery:"coke studio at mtv"
> >
> > Can anyone please help me out why this is happening
> >
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
> >
>



-- 
Thanks & Regards,
Isan Fulia.


Deploy Solritas as a separate application?

2011-10-03 Thread jwang
Solritas is a nice search UI integrated with Solr, with many features we could
use. However, we do not want to build our UI into our Solr instance; we will
have a front-end web app interfacing with Solr. Is there an easy way to deploy
Solritas as a separate application (e.g., Solritas with SolrJ to query a
backend Solr instance)?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deploy-Solritas-as-a-separate-application-tp3392326p3392326.html
Sent from the Solr - User mailing list archive at Nabble.com.