Re: [Announce] Solr 3.5 with RankingAlgorithm 1.3, NRT support

2011-12-27 Thread Dmitry Kan
Hello Nagendra,

Congratulations on the new release!

In terms of downloading: does one need to be registered on the site to
download the bundle? The download links lead to
http://solr-ra.tgels.org/solr-ra.jsp.

Regards,

Dmitry Kan

On Tue, Dec 27, 2011 at 4:30 PM, Nagendra Nagarajayya <
nnagaraja...@transaxtions.com> wrote:

> Hi!
>
> I am very excited to announce the availability of Solr 3.5 with
> RankingAlgorithm 1.3 (NRT support). The performance to add 1 million docs
> in NRT to the MBArtists index with 1 concurrent request thread executing
> *:* is about 5000 docs in 498 ms. The query performance is about 168K query
> requests at 4.2 ms / request.
>
> RankingAlgorithm 1.3 supports the entire Lucene Query Syntax, +/- and/or
> boolean queries.
> RankingAlgorithm is very fast and allows you to query a 10m wikipedia index
> (complete index) in <50 ms.
>
> You can get more information about NRT performance from here:
> http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x
>
> You can download Solr 3.5 with RankingAlgorithm 1.3 from here:
> http://solr-ra.tgels.org
>
> Please download and give the new version a try.
>
>
> Regards,
>
> Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
>


[Solr 3.5] Facets and stats become a lot slower during concurrent inserts

2011-12-27 Thread Lyuba Romanchuk
Hi,

I am testing facets and stats in Solr 3.5, and I see that queries run a
lot slower during inserts into an index with more than 15M documents.
If I stop inserting new documents, facet/stats queries run 10-1000 times
faster than with concurrent inserts.
I don't see this degradation in Lucene.

Could you please explain what may cause this?
Is it Solr related issue only?

Thank you for your help.

Best regards,
Lyuba


How to run the solr dedup for the document which match 80% or match almost.

2011-12-27 Thread vibhoreng04
Hi,

I am doing dedup for my Solr instance, which works on the content and url
fields. My question is: if I want to eliminate the records which are 80%
or 90% matching in the content field, how should I proceed?
I have already changed the part of my solrconfig.xml which is required for
the dedup (the update request processor chain), and that part is working
fine.


Regards,

Vibhor




Re: Solr - Mutivalue field search on different elements

2011-12-27 Thread Gora Mohanty
On Tue, Dec 27, 2011 at 6:11 PM, meghana  wrote:
> Hi iorixxx,
>
> I have changed my multiValued field to a single-valued field, and now my
> field appears as below
> -
> 1s: This is very nice day. 3s: Christmas is about come and christmas
> 4s:preparation is just on
> -

Your question is not very clear. What is meant by the above: Is this
the value of the single-valued field in one document in your index?
What is "1s", "3s", "4s" above? Are they part of the field value?

> but by doing this, my search for "christmas preparation" does not get
> matched, although I had set my positionIncrementGap to 0. Any ideas why
> it is not matching?

positionIncrementGap has no effect on a single-valued field.

It might be easier if you explained what you are trying to achieve.

Regards,
Gora


[Announce] Solr 3.5 with RankingAlgorithm 1.3, NRT support

2011-12-27 Thread Nagendra Nagarajayya

Hi!

I am very excited to announce the availability of Solr 3.5 with 
RankingAlgorithm 1.3 (NRT support). The performance to add 1 million 
docs in NRT to the MBArtists index with 1 concurrent request thread 
executing *:* is about 5000 docs in 498 ms. The query performance is 
about 168K query requests at 4.2 ms / request.


RankingAlgorithm 1.3 supports the entire Lucene Query Syntax, +/- and/or 
boolean queries.
RankingAlgorithm is very fast and allows you to query a 10m wikipedia index 
(complete index) in <50 ms.


You can get more information about NRT performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x

You can download Solr 3.5 with RankingAlgorithm 1.3 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.


Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



Re: Solr - Mutivalue field search on different elements

2011-12-27 Thread meghana
Hi iorixxx,

I have changed my multiValued field to a single-valued field, and now my
field appears as below
-
1s: This is very nice day. 3s: Christmas is about come and christmas
4s:preparation is just on
-
but by doing this, my search for "christmas preparation" does not get
matched, although I had set my positionIncrementGap to 0. Any ideas why it
is not matching?

Please help me.
Meghana



Re: Configuring Replication

2011-12-27 Thread Ahson Iqbal
Hi Ahmet

Thank you for your response. Both of the following URLs

http://localhost:8983
http://localhost:8983/solr

are working, and it is not a multi-core setup.

Regards
Ahsan




 From: Ahmet Arslan 
To: solr-user@lucene.apache.org; Ahson Iqbal  
Sent: Tuesday, December 27, 2011 12:44 PM
Subject: Re: Configuring Replication
 
> I just configured the master server as specified on the
> Solr replication wiki page; nothing is indexed yet on the master
> nor on the slave.
> And the Solr replication wiki page mentions that,
> after configuring the master server, if you hit the following URL
> in a web browser
> 
> http://localhost:8983/solr/replication
> [http://localhost:8983/solr is master server's url ]
> 
> you should get response OK
> 
> but unfortunately, I am getting a 404 error with the following
> message:
> 
> HTTP Status 404 - /solr/replication

Maybe you have a multi-core setup? Then you should add the core name to your URL, 
e.g. http://localhost:8983/coreName/replication

What happens when you hit the following URLs?

http://localhost:8983
http://localhost:8983/solr

Re: How can I check if a more complex query condition matched?

2011-12-27 Thread Ahmet Arslan
> I have a more complex query condition like this:
> 
> (city:15 AND country:60)^4 OR city:15^2 OR country:60^2
> 
> What I want to achieve with this query is basically: if a document has
> city = 15 AND country = 60, it is more important than another document
> which only has city = 15 OR country = 60.
> 
> Furthermore I want to show in my results view why a certain document
> matched, something like "matched city and country" or "matched city
> only" or "matched country only".
> 
> This is a bit of a simplified example, but the question remains: how
> can Solr tell me which of the conditions in the query matched? If I
> match against a simple field only, I can get away with highlight
> fields, but conditions spanning multiple fields seem much more tricky.

Looks like you can extract this information from the output of &debugQuery=on.
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery
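For example (a sketch, reusing the query from the original mail and assuming the
default /select handler):

http://localhost:8983/solr/select?q=(city:15 AND country:60)^4 OR city:15^2 OR country:60^2&debugQuery=on

The debug section of the response contains a per-document "explain" entry showing
which clauses contributed to the score, which you could parse to derive
"matched city and country" vs. "matched city only".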



Re: Solr - Mutivalue field search on different elements

2011-12-27 Thread Ahmet Arslan
> I have changed my multiValued field to a single-valued field, and now my
> field appears as below
> -
> 1s: This is very nice day. 3s: Christmas is about come and christmas
> 4s:preparation is just on
> -
> but by doing this, my search for "christmas preparation" does not get
> matched, although I had set my positionIncrementGap to 0. Any ideas why
> it is not matching?

So you concatenated your sentences. If you could delete 1s, 2s, .. 4s too, both 
phrase search and highlighting would work.

In the end, your field value would be:
-
This is very nice day. Christmas is about come and christmas preparation is 
just on
-


Re: [Solr 3.5] Facets and stats become a lot slower during concurrent inserts

2011-12-27 Thread Yonik Seeley
On Tue, Dec 27, 2011 at 10:43 AM, Lyuba Romanchuk
 wrote:
> I am testing facets and stats in Solr 3.5, and I see that queries run a
> lot slower during inserts into an index with more than 15M documents.

Are you also doing commits (or have autocommit enabled)?
The first time a facet command is used for a field after a commit,
certain data structures need to be constructed.
To avoid slow first requests like this, you can add a request that
does the faceting as a static warming query that will be run before
any live queries use the new searcher.

-Yonik
http://www.lucidimagination.com
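A minimal sketch of such a warming query in solrconfig.xml (the facet field
name below is a placeholder; substitute the field you facet on):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">your_facet_field</str>
    </lst>
  </arr>
</listener>

This runs the facet request against every new searcher before it serves live
traffic, so the per-field data structures are built ahead of the first user query.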


Re: Configuring Replication

2011-12-27 Thread Ahson Iqbal
Hi Ahmet

Same issue. One more thing: I am using Solr 1.4.1 with Tomcat 7.0.

Regards
Ahsan



 From: Ahmet Arslan 
To: solr-user@lucene.apache.org; Ahson Iqbal  
Sent: Tuesday, December 27, 2011 2:51 PM
Subject: Re: Configuring Replication
 

> Thank you for your response. Both of the following URLs
> 
> http://localhost:8983
> http://localhost:8983/solr
> 
> are working, and it is not a multi-core setup.

What happens when you use class="solr.ReplicationHandler" instead of 
class="org.apache.solr.ReplicationHandler" in your solrconfig.xml?

Re: How to run the solr dedup for the document which match 80% or match almost.

2011-12-27 Thread vibhoreng04
Hi iorixxx,

Thanks for the quick update. I hope I can take it from here!


Regards,

Vibhor



Re: Solr - Mutivalue field search on different elements

2011-12-27 Thread meghana
Hi iorixxx,

Sorry for confusion in my question...

yes , "1s", "3s", "4s" are part of my field value.. i have my data in this
format. and the field is non-multivalued field (single valued). 

so as PositionIncrementGap is only work for multivalued field ,  in my
search i always have to apply slop in my search.  

Thanks for reply.
Meghana. 





Re: How to run the solr dedup for the document which match 80% or match almost.

2011-12-27 Thread Ahmet Arslan
> I am doing dedup for my Solr instance, which works on the content and
> url fields. My question is: if I want to eliminate the records which
> are 80% or 90% matching in the content field, how should I proceed?
> I have already changed the part of my solrconfig.xml which is required
> for the dedup (the update request processor chain), and that part is
> working fine.

You can use TextProfileSignature, which is a fuzzy hashing implementation, 
instead of Lookup3Signature. 
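A sketch of such a chain in solrconfig.xml, following the Solr Deduplication
wiki (field names and overwriteDupes are assumptions; adapt to your existing
dedup config):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Note that TextProfileSignature catches near-duplicates, but it is tuned through
its own quantization parameters rather than an exact "80% similar" threshold.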


Re: [Announce] Solr 3.5 with RankingAlgorithm 1.3, NRT support

2011-12-27 Thread Nagendra Nagarajayya

Yes, you will need to register to download the bundle or the war file.

Regards,
 Nagendra Nagarajayya
http://solr-ra.tgels.org  

http://rankingalgorithm.tgels.org  


--

Hello Nagendra,

Congratulations on the new release!

In terms of downloading: does one need to be registered on the site to
download the bundle? The download links lead to
http://solr-ra.tgels.org/solr-ra.jsp.

Regards,

Dmitry Kan


Re: [Solr 3.5] Facets and stats become a lot slower during concurrent inserts

2011-12-27 Thread Lyuba Romanchuk
autoCommit is disabled in solrconfig.xml and I use
SolrServer::addBeans(beans, 100) for inserts.
I need to insert new documents continually at a high rate, with concurrent
queries running.

Best regards,
Lyuba

On Tue, Dec 27, 2011 at 6:15 PM, Yonik Seeley wrote:

> On Tue, Dec 27, 2011 at 10:43 AM, Lyuba Romanchuk
>  wrote:
> > I am testing facets and stats in Solr 3.5, and I see that queries run a
> > lot slower during inserts into an index with more than 15M documents.
>
> Are you also doing commits (or have autocommit enabled)?
> The first time a facet command is used for a field after a commit,
> certain data structures need to be constructed.
> To avoid slow first requests like this, you can add a request that
> does the faceting as a static warming query that will be run before
> any live queries use the new searcher.
>
> -Yonik
> http://www.lucidimagination.com
>


Re: solr keep old docs

2011-12-27 Thread Alexander Aristov
Hi

I am not using a database. All needed data is in the Solr index; that's why
I want to skip excessive checks.

I will check DIH but not sure if it helps.

I am fluent in Java and it's not a problem for me to write a class or so,
but I want to check first whether there are any ways (workarounds) to make
it work without coding, just by playing around with configuration and
params. I don't want to go away from the default Solr implementation.

Best Regards
Alexander Aristov


On 27 December 2011 09:33, Mikhail Khludnev wrote:

> On Tue, Dec 27, 2011 at 12:26 AM, Alexander Aristov <
> alexander.aris...@gmail.com> wrote:
>
> > Hi people,
> >
> > I urgently need your help!
> >
> > I have Solr 3.3 configured and running. I do incremental indexing 4
> > times a day using bulk updates. Some documents are identical to some extent and I
> > wish to skip them, not to index.
> > But here is the problem: I could not find a way to tell Solr to ignore
> > new duplicate docs and keep the old indexed docs. I don't care that it's
> > new. Just determine by ID that such a document is in the index already,
> > and that's it.
> >
> > I use SolrJ for indexing. I have tried setting overwrite=false and the
> > dedupe approach, but nothing helped me. Either a newer doc overwrites the
> > old one, or I get a duplicate.
> >
> > I think it's a very simple and basic feature and it must exist. What did
> > I do wrong, or what didn't I do?
> >
>
> I guess because the mainstream approach is delta-import, where you have
> "updated" timestamps in your DB and a "last-import" timestamp stored
> somewhere. You can check how it works in DIH.
>
>
> >
> > Tried Google but I couldn't find a solution there, although many people
> > have encountered this problem.
> >
> >
> It definitely can be done by overriding
> o.a.s.update.DirectUpdateHandler2.addDoc(AddUpdateCommand), but I suggest
> starting by implementing your own
> http://wiki.apache.org/solr/UpdateRequestProcessor - search for the PK, and
> bypass the chain call if it's found. Then, if you meet performance issues
> querying your PKs one by one (but only after that), you can batch your
> searches; there are a couple of optimization techniques for huge
> disjunction queries like PK:(2 OR 4 OR 5 OR 6).
>
>
> > I am starting to consider that I must query the index to check whether a
> > doc to be added is already in the index, and not add it to the array, but
> > I have so many docs that I am afraid it's not a good solution.
> >
> > Best Regards
> > Alexander Aristov
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
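A minimal sketch of the UpdateRequestProcessor Mikhail describes (assuming
Solr 3.x and that the uniqueKey field is "id"; note the searcher only sees
committed docs, so duplicates within one uncommitted batch are not caught):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SkipExistingUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object id = doc.getFieldValue("id");
        // if a doc with this id is already indexed, keep the old one by
        // not passing the add down the chain
        if (id != null
            && req.getSearcher().getFirstMatch(new Term("id", id.toString())) != -1) {
          return;
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml, ahead of solr.RunUpdateProcessorFactory.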


Re: Configuring Replication

2011-12-27 Thread Erick Erickson
I suspect you haven't enabled the replication handler in solrconfig.xml.

Look in solrconfig.xml for a line like:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
by default, I believe it's commented out. Have you uncommented it?

Best
Erick

On Tue, Dec 27, 2011 at 5:38 AM, Ahson Iqbal  wrote:
> Hi Ahmet
>
> Same issue, one more thing i am using solr 1.4.1 with tomcat 7.0
>
> Regards
> Ahsan
>
>
> 
>  From: Ahmet Arslan 
> To: solr-user@lucene.apache.org; Ahson Iqbal 
> Sent: Tuesday, December 27, 2011 2:51 PM
> Subject: Re: Configuring Replication
>
>
>> Thank you for your response. Both of the following URLs
>>
>> http://localhost:8983
>> http://localhost:8983/solr
>>
>> are working, and it is not a multi-core setup.
>
> What happens when you use class="solr.ReplicationHandler" instead of 
> class="org.apache.solr.ReplicationHandler" in your solrconfig.xml?


Re: Configuring Replication

2011-12-27 Thread Ahmet Arslan

> Thank you for your response. Both of the following URLs
> 
> http://localhost:8983
> http://localhost:8983/solr
> 
> are working, and it is not a multi-core setup.

What happens when you use class="solr.ReplicationHandler" instead of 
class="org.apache.solr.ReplicationHandler" in your solrconfig.xml?


How can I check if a more complex query condition matched?

2011-12-27 Thread Max
I have a more complex query condition like this:

(city:15 AND country:60)^4 OR city:15^2 OR country:60^2

What I want to achieve with this query is basically: if a document has
city = 15 AND country = 60, it is more important than another document
which only has city = 15 OR country = 60.

Furthermore I want to show in my results view why a certain document
matched, something like "matched city and country" or "matched city
only" or "matched country only".

This is a bit of a simplified example, but the question remains: how
can Solr tell me which of the conditions in the query matched? If I
match against a simple field only, I can get away with highlight
fields, but conditions spanning multiple fields seem much more tricky.

Thanks for any ideas on this!


Re: solr keep old docs

2011-12-27 Thread Erick Erickson
Mikhail is right as far as I know: the assumption built into Solr is that
duplicate IDs (when <uniqueKey> is defined) should trigger the old
document to be replaced.

what is your system-of-record? By that I mean what does your SolrJ
program do to send data to Solr? Is there any way you could just
*not* send documents that are already in the Solr index based on,
for instance, any timestamp associated with your system-of-record
and the last time you did an incremental index?

Best
Erick

On Tue, Dec 27, 2011 at 6:38 AM, Alexander Aristov
 wrote:
> Hi
>
> I am not using a database. All needed data is in the Solr index; that's
> why I want to skip excessive checks.
>
> I will check DIH but not sure if it helps.
>
> I am fluent in Java and it's not a problem for me to write a class or so,
> but I want to check first whether there are any ways (workarounds) to make
> it work without coding, just by playing around with configuration and
> params. I don't want to go away from the default Solr implementation.
>
> Best Regards
> Alexander Aristov
>


Re: How to run the solr dedup for the document which match 80% or match almost.

2011-12-27 Thread Shashi Kant
You can also look at cosine similarity (or related metrics) to measure
document similarity.

On Tue, Dec 27, 2011 at 6:51 AM, vibhoreng04  wrote:
> Hi iorixxx,
>
> Thanks for the quick update. I hope I can take it from here!
>
>
> Regards,
>
> Vibhor


Re: Storing only unique terms in index

2011-12-27 Thread Chris Hostetter

: I have a catchall "text" field, and use it for searching. This field
: stores the non-unique terms. For example, this field stores the
: following terms: test test search
: Is it possible to store non-unique terms in the following way:
: "term"|"number of terms", i.e. test|2 search?
: I guess it should reduce the size of index
: 
: And if yes - is it possible to use this number of terms when
: calculating the relevance?

what you are describing is exactly how an inverted index like Lucene/Solr 
works -- the original raw text can optionally be "stored" for retrieval, 
but the index that is *searched* contains each term a single time, along 
with pointers referring to which documents and where in those documents the 
term exists.  the number of times a term exists in a document is the term 
frequency (or "tf") and is one of the two primary components used in 
the basic scoring formula (TF/IDF)

https://lucene.apache.org/java/3_5_0/fileformats.html
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
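For reference, the classic Lucene 3.x scoring formula (DefaultSimilarity)
that uses tf this way is roughly:

\mathrm{score}(q,d) = \mathrm{coord}(q,d)\,\mathrm{queryNorm}(q) \sum_{t \in q} \mathrm{tf}(t,d)\,\mathrm{idf}(t)^2\,\mathrm{boost}(t)\,\mathrm{norm}(t,d)

with \mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)} and
\mathrm{idf}(t) = 1 + \log\big(\mathrm{numDocs} / (\mathrm{docFreq}(t)+1)\big).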



-Hoss

Re: [Solr 3.5] Facets and stats become a lot slower during concurrent inserts

2011-12-27 Thread Chris Hostetter
: autoCommit is disabled in solrconfig.xml and I use
: SolrServer::addBeans(beans, 100) for inserts.

have you looked at the javadocs for that method?

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html#addBean%28java.lang.Object,%20int%29

  public UpdateResponse addBean(Object obj,
int commitWithinMs)
  Parameters:
obj - the input bean
commitWithinMs - max time (in ms) before a commit will happen

...so you are in fact asking solr to do a commit within 0.1 second of 
every document you add.
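A sketch of the fix in SolrJ 3.x (the bean list type and the 60-second window
are assumptions; tune the window to your latency needs):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class BulkIndexer {
  public static void index(List<Object> beans) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // commitWithin of 60000 ms: Solr commits at most about once a minute,
    // instead of within 100 ms of every batch as addBeans(beans, 100) asks
    server.addBeans(beans, 60000);
  }
}

Fewer commits means fewer searcher re-opens, so the facet data structures are
rebuilt far less often while indexing.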

: > Are you also doing commits (or have autocommit enabled)?
: > The first time a facet command is used for a field after a commit,
: > certain data structures need to be constructed.
: > To avoid slow first requests like this, you can add a request that
: > does the faceting as a static warming query that will be run before
: > any live queries use the new searcher.


-Hoss


Re: VelocityResponseWriter's future

2011-12-27 Thread Jan Høydahl
Hi,

I think a "/browse" type of experience is crucial for newcomers to quickly get 
familiar with Solr.
Whether it's Velocity based, AJAX based or another technology is less important.
I personally like VRW and frequently use it as the first step in prototyping in 
a project. I've also contributed patches to fix bugs and make it more usable.
So unless a new and better alternative is already in place (I love the idea 
of AJAX-ifying things), I vote for keeping VRW, but lazy-loading it so as not 
to annoy people copying example/ around.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9. des. 2011, at 22:30, Erik Hatcher wrote:

> So I thought that Solr having a decent HTML search UI out of the box was a 
> good idea.  I still do.  But it's been a bit of a pain to maintain 
> (originally it was a contrib module, then core, then folks didn't want it as 
> a core dependency, and now it is back as a contrib), and the UI has 
> accumulated a fair bit of cruft/ugliness as folks have tacked on "the kitchen 
> sink" into it compared to my idealistic generic (not specific to the example 
> data) lean and clean sensibilities.
> 
> What should be done?  Who actually cares about VRW or the /browse interface?  
> And if you do care, what do you like or dislike about it?  And if you really 
> really care, patches welcome! ;)
> 
> Perhaps, as I'm starting to feel in general about open source pet projects, 
> add-ons, and "monkey patches" to open source software, it should be moved out 
> of Solr's repo altogether and maintained elsewhere (say my personal or Lucid's 
> github).
> 
> I appreciate your candid thoughts on this.
> 
>   Erik
> 



Using sort_values (fsv=true parameter) and Field Collapsing (group=true) at the same time

2011-12-27 Thread Jose Aguilar
Hi all,

I am using Solr 4.0 trunk with the Field Collapsing feature 
(http://wiki.apache.org/solr/FieldCollapsing) and I notice that when used at 
the same time as the fsv=true parameter, the sort_values in the response is 
gone. I haven't found much information about the fsv parameter, so I turned to 
the list to see if someone here can help us out, or shed some light on whether 
there is any incompatibility between the two features (which is what I think is 
happening, because of the field collapse implementation). Or maybe give us some 
pointers on how to achieve a similar effect.

We use fsv=true to help in debugging why one document was sorted on top 
of another when using certain sort orders in our application, so this is a 
great way to visualize this and saves us debugging time.

To clarify further, we send in this query to Solr expecting the sort_values 
tag to be in the response, with its arrays corresponding to the first element 
of each group:

http://localhost:8983/solr/select?wt=xml&fl=*&q=solr+memory&group=true&group.field=manu_exact&fsv=true&debugQuery=on…

But we don't get the sort_values part back; we only get the other usual 
top-level tags in the response.

If we don't use Field Collapsing, and instead send in something like this:

http://localhost:8983/solr/select?wt=xml&fl=*&q=solr+memory&fsv=true&debugQuery=on…

Then we do get the sort_values element in the response.

Is there some incompatibility between the two features? Any other way to 
retrieve this information in a way that would be compatible with field 
collapsing?

Thanks,

Jose Aguilar


Re: Looking for a good commit/merge strategy

2011-12-27 Thread Jan Høydahl
Have a look at http://wiki.apache.org/solr/NearRealtimeSearch which will help 
you (in TRUNK/4.0) with an efficient in-memory handling of NRT changes. Combine 
this with CommitWithin for persisting to disk: 
http://wiki.apache.org/solr/CommitWithin.
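For example, with XML update messages a per-request commitWithin looks like
this (the field values and the 60-second window are placeholders; pick
whatever lag your facet counts can tolerate):

<add commitWithin="60000">
  <doc>
    <field name="id">doc-1</field>
    <field name="title">an example document</field>
  </doc>
</add>

Solr then guarantees a commit within that window, batching many adds into one
commit instead of committing per request.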

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. des. 2011, at 14:34, peter_solr wrote:

> Hi all,
> 
> we are indexing real-time documents from various sources. Since we have
> multiple sources, we encounter quite a number of duplicates which we delete
> from the index. This mostly occurs within a short timeframe; deletes of
> older documents may happen, but they do not have a high priority. Search
> results do not need to be exactly realtime (they can be 1 minute or so
> behind), but facet counts should be correct as we use them to visualize
> frequencies in the data. We are now looking for a good commit/merge
> strategy. Any advice?
> 
> Thanks and best,
> Peter
> 



Re: Custom content extractor for Solr Cell

2011-12-27 Thread Jan Høydahl
Hi John,

See discussion about the issue of indexing contents of ZIP files: 
https://issues.apache.org/jira/browse/SOLR-2416

Depending on your use case, you may be able to write a Tika parser which 
handles your specific case, such as uncompressing a GZIP file and using 
AutoDetect on its contents or similar. If you want to override the behaviour of 
Tika's parsing of certain MIME types, you can do this by specifying 
-Dtika.config= when starting Solr (3.5 or later), and 
it will obey your config. See Tika's web page for how to write your own parsers.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5. des. 2011, at 22:02, John Bartak wrote:

> Is it possible to extract content for file types that Tika doesn’t support
> without changing and rebuilding Tika?  Do I need to specify a tika.config
> file in the solrconfig.xml file, and if so, what is the format of that file?
> 
> 
> 
> One example that I’m trying to solve is for a document management system
> where the files are compressed – so I’d like to have a content extractor
> that first decompresses the file and then delegates to the standard Solr
> content extraction mechanism.   Perhaps writing a custom extractor is more
> trouble than it is worth for this use case and I should just decompress the
> data before sending it to Solr?



Re: lower score for synonyms

2011-12-27 Thread Jan Høydahl
Hi,

Also see discussion in https://issues.apache.org/jira/browse/LUCENE-3130 for 
possible future way to do this with one field.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6. des. 2011, at 13:47, Marc SCHNEIDER wrote:

> Hello,
> 
> You could create another field and attach the synonym analyzer to it. When
> querying, set a lower boost for this field.
> 
> Marc.
> 
> On Tue, Dec 6, 2011 at 11:31 AM, Robert Brown  wrote:
> 
>> is it possible to lower the score for synonym matches?
>> 
>> we setup...
>> 
>> admin => administration
>> 
>> but if someone searches specifically for "admin", we want those specific
>> matches to rank higher than matches for "administration"
>> 
>> 
>> 
>> --
>> 
>> IntelCompute
>> Web Design & Local Online Marketing
>> 
>> http://www.intelcompute.com
>> 
>> 
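A sketch of Marc's two-field approach (field names are hypothetical): index
the text into both a plain field and a synonym-expanded field, then query both
with different boosts, e.g. with dismax:

q=admin&defType=dismax&qf=text^2.0 text_syn^0.5

An exact "admin" match then scores through both fields, while a document
containing only "administration" matches only the lower-boosted text_syn field.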



Re: best practice to introducing singletons inside of Solr (IoC)

2011-12-27 Thread Mikhail Khludnev
Colleagues,

Don't hesitate to share your opinion. Please!

Regards

On Wed, Dec 21, 2011 at 11:06 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> I need to introduce several singletons inside of Solr and make them
> available for my own SearchHandlers, Components, and even QParsers, etc.
>
> Right now I use some kind of fake SolrRequestHandler which loads on init()
> and is available everywhere through
> solrCore.getRequestHandler("wellknownName"). Then I downcast it everywhere
> and access the required methods. The same is possible with a fake
> SearchComponent.
> Particularly my singletons are some additional fields schema (pretty
> sophisticated), and kind of request/response encoding facility.
> The typical Java hammer for such pins is Spring, but I've found it
> puzzling to use
> http://static.springframework.org/spring/docs/3.0.x/javadoc-api/org/springframework/web/context/support/WebApplicationContextUtils.html
>
> What's the best way to do that?
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> 
>  
>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: How to run the solr dedup for the document which match 80% or match almost.

2011-12-27 Thread vibhoreng04
Hi Shashi,

That's correct! But I need something for index-time comparison. Can cosine
similarity compare the incrementally indexed files against the already
indexed documents?



Regards,


Vibhor 



Re: Solr - Mutivalue field search on different elements

2011-12-27 Thread meghana
I can't delete 1s, 2s, etc. from my field value; I have to keep the text in
this format, so I'll apply slop in my search to get the search I need.
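For example, a sloppy phrase query (assuming the catch-all field is named
text; the ~5 allows up to five position moves, enough to jump over a
"3s:"/"4s:" marker between the words):

q=text:"christmas preparation"~5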



hl.boundaryScanner and hl.bs.chars

2011-12-27 Thread meghana
Hi all,

I have seen the hl.boundaryScanner and hl.bs.chars parameters in the Solr
highlighting feature, but I didn't get their meaning exactly - what are they
for, and how can I use them in my search?

My need is something like this: I want to set my fragment to start and end
at a special character / string that I can specify, and to set the fragment
length dynamically (if possible).

Can I do this in any way?
Meghana




Re: solr keep old docs

2011-12-27 Thread Alexander Aristov
I get docs from external sources, and the only place I keep them is the Solr
index. I have no database or other means to track indexed docs (my
personal opinion is that it would be a huge headache).

Some docs might change slightly in their original sources, but I don't need
those changes. In fact I need the original data only.

So I have no other way but to either check whether a document is already in
the index before I put it into the SolrJ array (read: query Solr), or develop
my own update chain processor, implement an ID check there, and skip such docs.

Maybe this is the wrong place to argue, and it's probably been discussed
before, but I wonder why simply the overwrite parameter doesn't work here.

In my opinion it suits perfectly here. In combination with the unique ID it
can cover all possible variants.

cases:

1. overwrite=true and uniqueID exists: the newer doc should overwrite the
old one.

2. overwrite=false and uniqueID exists: the newer doc must be skipped since
the old one exists.

3. uniqueID doesn't exist: the newer doc just gets added, regardless of
whether an old one exists or not.


Best Regards
Alexander Aristov


On 27 December 2011 22:53, Erick Erickson  wrote:

> Mikhail is right as far as I know: the assumption built into Solr is that
> duplicate IDs (when <uniqueKey> is defined) should trigger the old
> document to be replaced.
>
> what is your system-of-record? By that I mean what does your SolrJ
> program do to send data to Solr? Is there any way you could just
> *not* send documents that are already in the Solr index based on,
> for instance, any timestamp associated with your system-of-record
> and the last time you did an incremental index?
>
> Best
> Erick


Re: hl.boundaryScanner and hl.bs.chars

2011-12-27 Thread Koji Sekiguchi

(11/12/28 15:29), meghana wrote:

Hi all,

I have seen the hl.boundaryScanner and hl.bs.chars parameters in the Solr
highlighting feature, but I didn't get their meaning exactly - what are they
for, and how can I use them in my search?

My need is something like this: I want to set my fragment to start and end
at a special character / string that I can specify, and to set the fragment
length dynamically (if possible).


See solrconfig.xml in example:

  <boundaryScanner name="default" default="true"
                   class="solr.highlight.SimpleBoundaryScanner">
    <lst name="defaults">
      <str name="hl.bs.maxScan">10</str>
      <str name="hl.bs.chars">.,!?&#9;&#10;&#13;</str>
    </lst>
  </boundaryScanner>

hl.bs.chars is effective only when SimpleBoundaryScanner is used.
SimpleBoundaryScanner scans the stored data backward and forward when creating
a snippet, until it finds a character listed in hl.bs.chars.

Those features are effective for FastVectorHighlighter only.

koji
--
http://www.rondhuit.com/en/


Custom Solr FunctionQuery Error

2011-12-27 Thread Parvin Gasimzade
Hi all,

I have created a custom Solr FunctionQuery in Solr 3.4.
I extended the ValueSourceParser, ValueSource, Query and QParserPlugin classes.

I set the name parameter to "graph" inside the GraphQParserPlugin class.

But when I try to search I get an error. The search queries are:

http://localhost:8080/solr/select/?q={!graph}test
http://localhost:8080/solr/select/?q=test&defType=graph

I also added the queryParser registration into
solrConfig.xml, but I got the same error...

The error message is:

Dec 27, 2011 7:05:20 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Unknown query type 'graph'
 at org.apache.solr.core.SolrCore.getQueryPlugin(SolrCore.java:1517)
at org.apache.solr.search.QParser.getParser(QParser.java:316)
 at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:80)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
 at
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:185)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
 at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:151)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
 at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:269)
 at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:679)

Thank you for your help.

Best Regards,
Parvin
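A minimal registration sketch for solrConfig.xml (the package prefix and the
ValueSourceParser class name are placeholders based on the names in the mail;
a missing registration, or the plugin jar not being on Solr's lib path, is a
common cause of "Unknown query type"):

<queryParser name="graph" class="com.example.GraphQParserPlugin" />
<valueSourceParser name="graph" class="com.example.GraphValueSourceParser" />

With the queryParser registered and the jar in the core's lib directory,
q={!graph}test should resolve, and defType=graph selects the same parser for
the main query.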