Re: Results after using Field Collapsing are not matching the results without using Field Collapsing

2009-12-21 Thread Varun Gupta
Hi Martijn,

Yes, it is working after making these changes.

--
Thanks
Varun Gupta

On Sun, Dec 20, 2009 at 5:54 PM, Martijn v Groningen <
martijn.is.h...@gmail.com> wrote:

> Hi Varun,
>
> Yes, after going over the code I think you are right. If you change
> the following if block in SolrIndexSearcher.getDocSet(Query query,
> DocSet filter, DocSetAwareCollector collector):
> if (first==null) {
>     first = getDocSetNC(absQ, null);
>     filterCache.put(absQ, first);
> }
> with:
> if (first==null) {
>     first = getDocSetNC(absQ, null, collector);
>     filterCache.put(absQ, first);
> }
> It should work then. Let me know if this solves your problem.
>
> Martijn
>
>
> 2009/12/18 Varun Gupta :
> > After a lot of debugging, I finally found why the order of collapsed
> > results is not matching the uncollapsed results. I can't say if it is a
> > bug in the implementation of fieldcollapse or not.
> >
> > *Explanation:*
> > I am querying fieldcollapse with some filters, to restrict the
> > collapsing to some particular categories only, by appending the
> > parameter: fq=ctype:(1+2+8+6+3).
> >
> > In: NonAdjacentDocumentCollapser.doQuery()
> > Line: DocSet filter = searcher.getDocSet(filterQueries);
> >
> > Here, the filter DocSet is obtained without any scores (since I have a
> > filter in my query, this line actually gets executed) and it is also
> > stored in the filter cache. In the next line of the code, the actual
> > uncollapsed DocSet is obtained by passing the DocSetScoreCollector.
> >
> > Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter,
> > DocSetAwareCollector collector)
> > Line: if (filterCache != null)
> > Because the filter cache is not null, and there is no result for the
> > query in the cache, the line: first = getDocSetNC(absQ,null); gets
> > executed. Notice that the DocSetScoreCollector is not passed here.
> > Hence, results are collected without any scores.
> >
> > This makes the uncollapsedDocSet score-less, and hence the sorting is
> > not done based on score.
> >
> > @Martijn: Is what I am saying right, or should I use field collapsing
> > in some other way? Otherwise, what is the ideal fix for this problem?
> > (I am not an active developer, so I can't say that a fix I make will
> > not break anything.)
> >
> > --
> > Thanks,
> > Varun Gupta
> >
> >
> > On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta wrote:
> >
> >> When I used collapse.threshold=1, out of the 5 categories 4 had the same
> >> top result, but 1 category had a different result (it was the 3rd result
> >> coming for that category when I used threshold as 3).
> >>
> >> --
> >> Thanks,
> >> Varun Gupta
> >>
> >>
> >>
> >> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen <
> >> martijn.is.h...@gmail.com> wrote:
> >>
> >>> I would not expect that Solr 1.4 build is the cause of the problem.
> >>> Just out of curiosity does the same happen when collapse.threshold=1?
> >>>
> >>> 2009/12/11 Varun Gupta :
> >>> > Here is the field type configuration of ctype:
> >>> > >>> > omitNorms="true" />
> >>> >
> >>> > In solrconfig.xml, this is how I am enabling field collapsing:
> >>> > >>> > class="org.apache.solr.handler.component.CollapseComponent"/>
> >>> >
> >>> > Apart from this, I made no changes in solrconfig.xml for field
> >>> > collapse. I am currently not using the field collapse cache.
> >>> >
> >>> > I have applied the patch on the Solr 1.4 build. I am not using the
> >>> > latest Solr nightly build. Can that cause any problem?
> >>> >
> >>> > --
> >>> > Thanks
> >>> > Varun Gupta
> >>> >
> >>> >
> >>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen <
> >>> > martijn.is.h...@gmail.com> wrote:
> >>> >
> >>> >> I tried to reproduce a similar situation here, but I got the
> >>> >> expected and correct results. Those three documents that you saw in
> >>> >> your first search result should be the first in your second search
> >>> >> result (unless the index changes or the sort changes) when you fq on
> >>> >> that specific category. I'm not sure what is causing this problem.
> >>> >> Can you give me some more information, like the field type
> >>> >> configuration for the ctype field and how you have configured field
> >>> >> collapsing?
> >>> >>
> >>> >> I did find another problem to do with field collapse caching. The
> >>> >> collapse.threshold or collapse.maxdocs parameters are not taken into
> >>> >> account when caching, which is of course wrong because they do
> >>> >> matter when collapsing. Based on the information you have given me,
> >>> >> this caching problem is not the cause of the situation you have. I
> >>> >> will update the patch that fixes this problem shortly.
> >>> >>
> >>> >> Martijn
> >>> >>
> >>> >> 2009/12/10 Varun Gupta :
> >>> >> > Hi Martijn,
> >>> >> >
> >>> >> > I am not sending the collapse parameters for the second query.
> >>> >> > Here are the queries I am using:
> >>> >> >
> >>> >> > *When using field collapsing (searching over all categ

Re: SOLR Performance Tuning: Disable INFO Logging.

2009-12-21 Thread Andrew McCombe
Hi

Can you quickly explain what you did to disable INFO-Level?

I am from a PHP background and am not so well versed in Tomcat or
Java.  Is this a section in solrconfig.xml or did you have to edit
Solr Java source and recompile?

Thanks In Advance
Andrew


2009/12/20 Fuad Efendi :
> After researching how to configure default SOLR & Tomcat logging, I finally
> disabled INFO-level for SOLR.
>
> And performance improved at least 7 times!!! ('at least 7' because I
> restarted server 5 minutes ago; caches are not prepopulated yet)
>
> Before that, I had 300-600 ms in HTTPD log files on average, and 4%-8% I/O
> wait whenever the "top" command shows SOLR on top.
>
> Now, I have 50ms-100ms on average (total response time logged by HTTPD).
>
>
> P.S.
> Of course, I am limited in RAM, and I use slow SATA... server is moderately
> loaded, 5-10 requests per second.
>
>
> P.P.S.
> And surprisingly, synchronous I/O by the Java/Tomcat logger slows down
> performance much more than the read-only I/O of Lucene.
>
>
>
> Fuad Efendi
> +1 416-993-2060
> http://www.linkedin.com/in/liferay
>
> Tokenizer Inc.
> http://www.tokenizer.ca/
> Data Mining, Vertical Search
>
>
>
>
>


trie fields and sortMissingLast

2009-12-21 Thread Marc Sturlese

Should sortMissingLast param be working on trie-fields?

-- 
View this message in context: 
http://old.nabble.com/tire-fields-and-sortMissingLast-tp26873134p26873134.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: get field values from solr and highlight text?

2009-12-21 Thread Shalin Shekhar Mangar
On Sun, Dec 20, 2009 at 1:50 AM, Faire Mii  wrote:

> I've got the following code.
>
>    $params = array('defType' => 'dismax', 'qf' => 'threads.title posts.body
> tags.name', 'hl' => 'true');
>
>    $results = $solr->search($query, $offset, $limit, $params);
>
> So the keywords will be highlighted. What I don't know how to do is pull
> the data out of $results. How do I get a document's field values and then
> show the body and highlight it like Google/SO search? I'm using the solr
> client php, but I find it difficult to understand how to use it. There are
> so few code examples.
>

The highlighting response comes as a node separate from the main results, but
items in both of them are presented in the same order. You'd need to match
the highlighting snippet with the current document either through the
uniqueKey or through position. So one way to do it would be to read the
snippets out of the response completely and put them in a map, with the key
being the unique key, and then for each document look up the unique key in
the map and print out the highlighted snippet. The other way would be to go
through the result set and highlighting response one item at a time.
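The map-based approach can be sketched in plain Java (no Solr client involved; the key name "id" and the stored-body fallback are stand-ins for whatever your schema uses):

```java
import java.util.HashMap;
import java.util.Map;

public class SnippetMatcher {
    // Read all snippets out of the highlighting node into a map keyed by
    // uniqueKey, then walk the main results in rank order and look each
    // document's key up in that map, falling back to the stored body.
    static String[] displayTexts(String[] ids, String[] bodies,
                                 String[] hlIds, String[] hlSnippets) {
        Map<String, String> byKey = new HashMap<>();
        for (int i = 0; i < hlIds.length; i++)
            byKey.put(hlIds[i], hlSnippets[i]);

        String[] out = new String[ids.length];
        for (int i = 0; i < ids.length; i++)
            out[i] = byKey.getOrDefault(ids[i], bodies[i]);
        return out;
    }

    public static void main(String[] args) {
        String[] texts = displayTexts(
                new String[]{"doc1", "doc2"},
                new String[]{"solr in action", "no match here"},
                new String[]{"doc1"},
                new String[]{"<em>solr</em> in action"});
        System.out.println(String.join(" | ", texts));
        // prints: <em>solr</em> in action | no match here
    }
}
```

The same shape works in PHP with an associative array in place of the HashMap.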

-- 
Regards,
Shalin Shekhar Mangar.


Re: trie fields and sortMissingLast

2009-12-21 Thread Shalin Shekhar Mangar
On Mon, Dec 21, 2009 at 5:37 PM, Marc Sturlese wrote:

>
> Should sortMissingLast param be working on trie-fields?
>
>
Nope, trie fields do not support sortMissingFirst or sortMissingLast.

-- 
Regards,
Shalin Shekhar Mangar.


Re: query log

2009-12-21 Thread Chris Hostetter

: Subject: query log
: References: <83ec2c9c0912201238he4c9sf23b03e750de2...@mail.gmail.com>
: In-Reply-To: <83ec2c9c0912201238he4c9sf23b03e750de2...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss



Re: payload queries running slow

2009-12-21 Thread Grant Ingersoll

On Dec 20, 2009, at 3:41 AM, Raghuveer Kancherla wrote:

> Hi Grant,
> My queries are about 5 times slower when using payloads as compared to
> queries that don't use payloads on the same index. I have not done any
> profiling yet; I am trying out Lucid Gaze now.

How do they compare to just doing SpanQueries?  Would be interesting to see the 
three:
1. "Normal" queries
2. Span Queries
3. Payloads


> I do all the load testing after warming up.
> Since my index is small (~1 GB), I was wondering if a RAMDirectory would
> help instead of the default Directory implementation for the IndexReader?
> 

I suppose, but probably not that big of a difference on a properly warmed index.


> Thanks,
> Raghu
> 
> 
> 
> On Thu, Dec 17, 2009 at 6:58 PM, Grant Ingersoll wrote:
> 
>> 
>> On Dec 17, 2009, at 4:52 AM, Raghuveer Kancherla wrote:
>> 
>>> Hi,
>>> With help from the group here, I have been able to set up a search
>>> application with payloads enabled. However, there is a noticeable
>>> increase in query response times with payloads as compared to the same
>>> queries without payloads. I am also seeing a lot more disk IO (I have a
>>> 7200 rpm disk) and comparatively less cpu usage.
>>>
>>> I am guessing this is because of the use of PayloadTermQuery and
>>> PayloadNearQuery, both of which extend SpanQuery formats. SpanQueries
>>> read the positions index, which will be much larger than the index
>>> accessed by a simple TermQuery.
>>>
>>> Is there any way of making this system faster without having to
>>> distribute the index? My index size is hardly 1GB (~200k documents and
>>> only one field to search in). I am experiencing query times as high as
>>> 2 seconds (average).
>>>
>>> Any indications on the direction in which I can experiment will also be
>>> very helpful.
>>>
>>> 
>> 
>> Yeah, payloads are going to be slower, but how much slower are they for
>> you? Are you warming up those queries?
>> 
>> Also, have you done any profiling?
>> 
>> 
>>> I looked at the HathiTrust digital library articles. The methods
>>> indicated there talk about avoiding reading the positions index
>>> (converting PhraseQueries to TermQueries). That will not work in my
>>> case because I still have to read the positions index to get the
>>> payload information during scoring. Let me know if my understanding is
>>> incorrect.
>>> 
>>> 
>>> Thanks,
>>> -Raghu
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 



Re: Documents are indexed but not searchable

2009-12-21 Thread krosan

When searching for *:* I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

I'm guessing this means the documents aren't really in the index?
However, I do get this reply when using the data-config debugger (with
commit on):
http://pastebin.com/m7a460711
And that obviously states "Indexing completed. Added/Updated: 2 documents.
Deleted 0 documents."

Do you have any ideas why the index doesn't really have those documents?

Thanks in advance!

Andreas Evers 



Noble Paul wrote:
> 
> just search for *:* and see if the docs are indeed there in the index.
> --Noble
> 
> -- 
> -
> Noble Paul | Systems Architect| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26875427.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: store content only of documents

2009-12-21 Thread Chris Hostetter

: 
: 
:   content
:   content
: 
:   
: 
: I want to store only "content" into this field, but it stores other metadata
: of the document, e.g. "Author", "timestamp", "document type" etc. How can I
: ask Solr to store only the body of the document into this field and not the
: other metadata?

change your defaultField?



-Hoss



RE: Documents are indexed but not searchable

2009-12-21 Thread Ankit Bhatnagar


Try using Luke to view the contents of the index.

Ankit


-Original Message-
From: krosan [mailto:kro...@gmail.com] 
Sent: Monday, December 21, 2009 10:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Documents are indexed but not searchable


When searching for *:* I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

I'm guessing this means the documents aren't really in the index?
However, I do get this reply when using the data-config debugger (with
commit on):
http://pastebin.com/m7a460711
And that obviously states "Indexing completed. Added/Updated: 2 documents.
Deleted 0 documents."

Do you have any ideas why the index doesn't really have those documents?

Thanks in advance!

Andreas Evers 



Noble Paul wrote:
> 
> just search for *:* and see if the docs are indeed there in the index.
> --Noble
> 
> -- 
> -
> Noble Paul | Systems Architect| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26875427.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: SOLR Performance Tuning: Disable INFO Logging.

2009-12-21 Thread Fuad Efendi
> Can you quickly explain what you did to disable INFO-Level?
> 
> I am from a PHP background and am not so well versed in Tomcat or
> Java.  Is this a section in solrconfig.xml or did you have to edit
> Solr Java source and recompile?


1. Create a file called logging.properties with the following content (I
created it in the /home/tomcat/solr folder):

.level=INFO
handlers= java.util.logging.ConsoleHandler, java.util.logging.FileHandler

java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.level = INFO

java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level = ALL

org.apache.solr.level=SEVERE


2. Modify the file tomcat_installation/bin/catalina.sh to include the
following (as the first line in the script):

JAVA_OPTS="... ... ... 
-Djava.util.logging.config.file=/home/tomcat/solr/logging.properties"

(this line may include more parameters such as -Xmx8196m for memory, 
-Dfile.encoding=UTF8 -Dsolr.solr.home=/home/tomcat/solr 
-Dsolr.data.dir=/home/tomcat/solr for SOLR, etc.)


With these settings, SOLR (and Tomcat) will use the standard Java 5/6 logging
capabilities. Log output will default to the standard /logs folder of Tomcat.

You may find additional logging configuration settings by googling for "Java 5
Logging" etc.
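As a side note, the same threshold can be demonstrated with plain java.util.logging calls (a sketch only; for Solr itself the file-based configuration above is the practical route):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SolrLogLevelDemo {
    // Mirrors the org.apache.solr.level=SEVERE line in logging.properties:
    // raise the threshold on the Solr logger tree, then report whether
    // INFO messages would still be logged.
    static boolean infoStillLoggable() {
        Logger solr = Logger.getLogger("org.apache.solr");
        solr.setLevel(Level.SEVERE);
        return solr.isLoggable(Level.INFO);
    }

    public static void main(String[] args) {
        System.out.println(infoStillLoggable()); // prints false
    }
}
```

Child loggers such as org.apache.solr.core inherit the level, which is why a single line silences all of Solr's INFO output.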


> 
> 
> 2009/12/20 Fuad Efendi :
> > After researching how to configure default SOLR & Tomcat logging, I
> > finally disabled INFO-level for SOLR.
> >
> > And performance improved at least 7 times!!! ('at least 7' because I
> > restarted server 5 minutes ago; caches are not prepopulated yet)
> >
> > Before that, I had 300-600 ms in HTTPD log files on average, and 4%-8%
> > I/O wait whenever the "top" command shows SOLR on top.
> >
> > Now, I have 50ms-100ms on average (total response time logged by HTTPD).
> >
> >
> > P.S.
> > Of course, I am limited in RAM, and I use slow SATA... server is
> > moderately loaded, 5-10 requests per second.
> >
> >
> > P.P.S.
> > And surprisingly, synchronous I/O by the Java/Tomcat logger slows down
> > performance much more than the read-only I/O of Lucene.
> >
> >
> >
> > Fuad Efendi
> > +1 416-993-2060
> > http://www.linkedin.com/in/liferay
> >
> > Tokenizer Inc.
> > http://www.tokenizer.ca/
> > Data Mining, Vertical Search
> >
> >
> >
> >
> >


Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay

Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search





Re: solr perf

2009-12-21 Thread Licinio Fernández Maurelo
not bad advice ;-)

2009/12/20 Walter Underwood 

> Here is an idea. Don't make one core per user.  Use a field with a user id.
>
> wunder
>
> On Dec 20, 2009, at 12:38 PM, Matthieu Labour wrote:
>
> > Hi
> > I have a Solr instance in which I created 700 cores, 1 core per user of
> > my application.
> > The total size of the data indexed on disk is 35GB, with Solr cores
> > going from 100KB and a few documents to 1.2GB and 50,000 documents.
> > Searching seems very slow, and indexing as well.
> > This is running on an EC2 extra-large instance (6 CPU, 15GB memory,
> > RAID0 disk).
> > I would appreciate it if anybody has some tips, articles, etc. on what
> > to do to understand and improve performance.
> > Thank you
>
>


-- 
Lici
~Java Developer~


Calculate term vector

2009-12-21 Thread Licinio Fernández Maurelo
Hi folks,

How can I get the term vector from a custom Solr query via an HTTP request?
Is this possible?

-- 
Lici
~Java Developer~


RE: Calculate term vector

2009-12-21 Thread Ankit Bhatnagar
What version of Solr are you using?

Ankit

-Original Message-
From: Licinio Fernández Maurelo [mailto:licinio.fernan...@gmail.com] 
Sent: Monday, December 21, 2009 1:40 PM
To: solr-user@lucene.apache.org
Subject: Calculate term vector

Hi folks,

How can I get the term vector from a custom Solr query via an HTTP request?
Is this possible?

-- 
Lici
~Java Developer~


Re: Calculate term vector

2009-12-21 Thread Grant Ingersoll
See http://wiki.apache.org/solr/TermVectorComponent
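For reference, a sketch of what that wiki page describes for Solr 1.4 (the handler name here is illustrative, and the field you want vectors for must be indexed with termVectors="true"):

```xml
<!-- solrconfig.xml: register the component and hook it into a handler -->
<searchComponent name="tvComponent"
                 class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="/tvrh"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

A request such as /tvrh?q=text:solr&tv.tf=true&tv.df=true should then return term vectors (with term and document frequencies) alongside the normal results.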

On Dec 21, 2009, at 1:39 PM, Licinio Fernández Maurelo wrote:

> Hi folks,
> 
> How can I get the term vector from a custom Solr query via an HTTP request?
> Is this possible?
> 
> -- 
> Lici
> ~Java Developer~



RE: Documents are indexed but not searchable

2009-12-21 Thread krosan

Hey,

I just found out that my index is stored in the tomcat/solr dir, while my
-Dsolr.solr.home parameter is set to a different place (E drive). The
indexing is sent to the tomcat/solr dir, while the searching is done on my E
drive. How can I make sure the index is built on the E drive as well?

Thanks in advance!

Andreas Evers


ANKITBHATNAGAR wrote:
> 
> 
> 
> Try using luke - to view contents of index
> 
> Ankit
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26878531.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Documents are indexed but not searchable

2009-12-21 Thread Erik Hatcher
solrconfig.xml controls where the index is built. Set it there to the
absolute path of where you want the index.


Erik
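A minimal sketch of that setting (the path below is only a placeholder):

```xml
<!-- solrconfig.xml: point the index/data directory at the E drive -->
<dataDir>E:/solr/data</dataDir>
```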

On Dec 21, 2009, at 2:26 PM, krosan wrote:



Hey,

I just found out that my index is stored in the tomcat/solr dir, while my
-Dsolr.solr.home parameter is set to a different place (E drive). The
indexing is sent to the tomcat/solr dir, while the searching is done on my
E drive. How can I make sure the index is built on the E drive as well?

Thanks in advance!

Andreas Evers


ANKITBHATNAGAR wrote:




Try using luke - to view contents of index

Ankit




--
View this message in context: 
http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26878531.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: trie fields and sortMissingLast

2009-12-21 Thread Yonik Seeley
On Mon, Dec 21, 2009 at 7:06 AM, Marc Sturlese  wrote:
>
> Should sortMissingLast param be working on trie-fields?

Eventually.  It's currently not supported though.  Here's the comment
from the example schema.xml:




-Yonik
http://www.lucidimagination.com


Re: Adaptive search?

2009-12-21 Thread Ian Holsman

On 12/18/09 2:46 AM, Siddhant Goel wrote:

Let's say we have a search engine (a simple front end - web app kind of a
thing - responsible for querying Solr and then displaying the results in a
human readable form) based on Solr. If a user searches for something, gets
quite a few search results, and then clicks on one such result - is there
any mechanism by which we can notify Solr to boost the score/relevance of
that particular result in future searches? If not, then any pointers on how
to go about doing that would be very helpful.
   


Hi Siddhant.
Solr can't do this out of the box.
You would need to use an external field and a custom scoring function to
do something like this.


regards
Ian

Thanks,

On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht  wrote:

   

What can it mean to "adapt to user clicks" ? Quite many things in my head.
Do you have maybe a citation that inspires you here?

paul


On 17 Dec 2009 at 13:52, Siddhant Goel wrote:

Does Solr provide adaptive searching? Can it adapt to user clicks within
the search results it provides? Or does that have to be done externally?



Re: Document model suggestion

2009-12-21 Thread Lance Norskog
Yes, you would have 'role' as a multi-valued field. When you add
someone to a role, you don't have to re-index. That's all.
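A toy sketch of the visibility test this implies (plain Java; the role names are invented, and in Solr the equivalent would be a filter query such as fq=role:(admins OR editors) against the multi-valued field):

```java
public class RoleCheck {
    // A document is visible when the user holds at least one of the roles
    // stored in its multi-valued 'role' field -- i.e. an OR'ed filter.
    static boolean visible(String[] docRoles, String[] userRoles) {
        for (String dr : docRoles)
            for (String ur : userRoles)
                if (dr.equals(ur)) return true;
        return false;
    }

    public static void main(String[] args) {
        String[] doc = {"editors", "admins"};
        System.out.println(visible(doc, new String[]{"admins"}));  // true
        System.out.println(visible(doc, new String[]{"viewers"})); // false
    }
}
```

Adding a user to a role only changes the filter query you send, not the stored documents, which is why no re-indexing is needed.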

On Thu, Dec 17, 2009 at 12:55 PM, caman  wrote:
>
> Are you suggesting that roles should be maintained in the index? We do
> manage our authentication based on roles, but at a granular level user
> rights play a big role as well.
> I know we need to compromise, just need to find a balance.
>
> Thanks
>
>
> Lance Norskog-2 wrote:
>>
>> Role-based authentication is one level of sophistication up from
>> user-based authentication. Users can have different roles, and
>> authentication goes against roles. Documents with multiple viewers
>> would be assigned special roles. All users would also have their own
>> matching role.
>>
>> On Tue, Dec 15, 2009 at 10:01 AM, caman 
>> wrote:
>>>
>>> Erick,
>>> I know what you mean.
>>> Wonder if it is actually cleaner to keep the authorization  model out of
>>> solr index and filter the data at client side based on the user access
>>> rights.
>>> Thanks all for help.
>>>
>>>
>>>
>>> Erick Erickson wrote:

 Yes, that should work. One hard part is what happens if your
 authorization model has groups, especially when membership
 in those groups changes. Then you have to go in and update
 all the affected docs.

 FWIW
 Erick

 On Tue, Dec 15, 2009 at 12:24 PM, caman
 wrote:

>
> Shalin,
>
> Thanks, much appreciated.
> Question about:
> "That is usually what people do. The hard part is when some documents
> are shared across multiple users."
>
> What do you recommend when documents have to be shared across multiple
> users? Can't I just multivalue a field with all the users who have
> access to the document?
>
>
> thanks
>
> Shalin Shekhar Mangar wrote:
> >
> > On Tue, Dec 15, 2009 at 7:26 AM, caman
> > wrote:
> >
> >>
> >> Appreciate any guidance here please. Have a master-child relation
> >> between two tables 'TA' and 'TB', where the former is the master
> >> table. Any row in TA can have multiple rows in TB.
> >> e.g. row in TA
> >>
> >> id---name
> >> 1---tweets
> >>
> >> TB:
> >> id|ta_id|field0|field1|field2.|field20|created_by
> >> 1|1|value1|value2|value2.|value20|User1
> >>
> >> 
> >
> >>
> >> This works fine and indexes the data. But all the data for a row in TA
> >> gets combined into one document (not desirable).
> >> I am not clear on how to
> >>
> >> 1) separate a particular row from the search results.
> >> e.g. If I search for 'Android' and there are 5 rows for android in TB
> >> for a particular instance in TA, I would like to show them separately
> >> to the user and, if the user clicks on any of the rows, point them to
> >> an attached URL in the application. Should a separate index be
> >> maintained for each row in TB? TB can have millions of rows.
> >>
> >
> > The easy answer is that whatever you want to show as results should be
> > the thing that you index as documents. So if you want to show tweets
> > as results, one document should represent one tweet.
> >
> > Solr is different from relational databases and you should not think
> > about both the same way. De-normalization is the way to go in Solr.
> >
> >
> >> 2) How to protect one user's data from another user. I guess I can
> >> keep a column for a user_id in the schema and append that filter
> >> automatically when I search through SOLR. Any better alternatives?
> >>
> >>
> > That is usually what people do. The hard part is when some documents
> > are shared across multiple users.
> >
> >
> >> Bear with me if these are newbie questions please; this is my first
> >> day with SOLR.
> >>
> >>
> > No problem. Welcome to Solr!
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Document-model-suggestion-tp26784346p26834798.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Document model suggestion

2009-12-21 Thread caman

Lance,
Makes sense. We are playing around with keeping the security model
completely out of the index. We will filter out results before data
display, based on access rights. But the approach you suggested is not
ruled out completely.
thanks

Lance Norskog-2 wrote:
> 
> Yes, you would have 'role' as a multi-valued field. When you add
> someone to a role, you don't have to re-index. That's all.
> 

Re: Adaptive search?

2009-12-21 Thread Lance Norskog
Solr does have the ExternalFileField available. You could track
existing clicks from the container search log and generate a file to
be used with ExternalFileField.

http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

In the solr source, trunk/src/test/test-files/solr/conf/schema11.xml
and schema-trie.xml show how to use it.
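A hedged sketch of what such a setup can look like in schema.xml (the type and field names here are made up; see the schema11.xml example mentioned above for the authoritative form):

```xml
<!-- schema.xml: a float value per document, read from an external file -->
<fieldType name="clickBoost" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"
           stored="false" indexed="false"/>
<field name="clicks" type="clickBoost"/>
```

The values then live in a file named external_clicks in the index data directory, one key=value line per document, which can be regenerated from click logs without re-indexing and used in function queries for boosting.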

On Mon, Dec 21, 2009 at 12:39 PM, Ian Holsman  wrote:
> On 12/18/09 2:46 AM, Siddhant Goel wrote:
>>
>> Let's say we have a search engine (a simple front end - web app kind of a
>> thing - responsible for querying Solr and then displaying the results in a
>> human readable form) based on Solr. If a user searches for something, gets
>> quite a few search results, and then clicks on one such result - is there
>> any mechanism by which we can notify Solr to boost the score/relevance of
>> that particular result in future searches? If not, then any pointers on
>> how to go about doing that would be very helpful.
>>
>
> Hi Siddhant.
> Solr can't do this out of the box.
> You would need to use an external field and a custom scoring function to do
> something like this.
>
> regards
> Ian
>>
>> Thanks,
>>
>> On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht
>>  wrote:
>>
>>
>>>
>>> What can it mean to "adapt to user clicks" ? Quite many things in my
>>> head.
>>> Do you have maybe a citation that inspires you here?
>>>
>>> paul
>>>
>>>
>>> On 17 Dec 2009 at 13:52, Siddhant Goel wrote:
>>>
>>>
>>> Does Solr provide adaptive searching? Can it adapt to user clicks within
>>> the search results it provides? Or does that have to be done externally?


>>>
>>>
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: solr perf

2009-12-21 Thread didier deshommes
Have you tried loading solr instances as you need them and unloading
those that are not being used? I wish I could help more, but I don't know
many people running that many cores.

didier

On Sun, Dec 20, 2009 at 2:38 PM, Matthieu Labour  wrote:
> Hi
> I have a Solr instance in which I created 700 cores, one core per user of
> my application.
> The total size of the data indexed on disk is 35GB, with cores ranging
> from 100KB and a few documents to 1.2GB and 50,000 documents.
> Both searching and indexing seem very slow.
> This is running on an EC2 extra-large instance (6 CPUs, 15GB memory,
> RAID0 disk).
> I would appreciate any tips, articles, etc. on how to understand and
> improve performance.
> Thank you
>
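Didier's suggestion can be driven through Solr's CoreAdmin HTTP API. A small sketch of how an application might build the requests involved, assuming the default admin/cores handler, a localhost Solr, and an illustrative per-user core naming scheme (check your version's CoreAdmin documentation for the exact actions it supports):

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port


def core_admin_url(action: str, **params: str) -> str:
    """Build a CoreAdmin request URL for the given action (CREATE, UNLOAD, ...)."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/cores?{query}"


# Load a user's core when they become active ...
create = core_admin_url("CREATE", name="user-42", instanceDir="cores/user-42")

# ... and unload it after some idle period to free heap and file handles.
unload = core_admin_url("UNLOAD", core="user-42")
```

An application-side LRU of active cores would decide when to issue the UNLOAD; with 700 cores, keeping only the recently active ones loaded bounds the number of open searchers and their caches.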


Solr replication 1.3 issue

2009-12-21 Thread Maduranga Kannangara
Hi All,

We're trying to replicate indexes on Solr 1.3 across
Dev->QA->Staging->Prod etc.
So each stage other than Dev and Prod acts as both a master and a slave
at a given time.

We hit a bottleneck (maybe?) when we try to run rsyncd-start on the master
from the slave machine.

Commands used:

ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 
/solr/SolrHome/bin/rsyncd-enable
ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 
/solr/SolrHome/bin/rsyncd-start -p 18003

On slave following error is displayed:

@RSYNCD: 29
@ERROR: protocol startup error

On master logs following were found:

2009/12/21 22:46:05 enabled by admin
2009/12/21 22:46:05 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:05 ended (elapsed time: 0 sec)
2009/12/21 22:46:09 started by admin
2009/12/21 22:46:09 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:46:09 [16964] forward name lookup for devserver002 failed: 
ai_family not supported
2009/12/21 22:46:09 [16964] connect from UNKNOWN (localhost)
2009/12/21 22:46:29 [16964] rsync: connection unexpectedly closed (0 bytes 
received so far) [receiver]
2009/12/21 22:46:29 [16964] rsync error: error in rsync protocol data stream 
(code 12) at io.c(463) [receiver=2.6.8]
2009/12/21 22:46:44 rsyncd not accepting connections, exiting
2009/12/21 22:46:57 enabled by admin
2009/12/21 22:46:57 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:57 rsyncd already currently enabled
2009/12/21 22:46:57 exited (elapsed time: 0 sec)
2009/12/21 22:47:00 started by admin
2009/12/21 22:47:00 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:47:00 [17115] forward name lookup for devserver002 failed: 
ai_family not supported
2009/12/21 22:47:00 [17115] connect from UNKNOWN (localhost)
2009/12/21 22:49:18 rsyncd not accepting connections, exiting


Is it not possible to start the rsync daemon on master from the slave?
The user that we use is on the sudoers list as well.

Thanks
Madu





Multi Solr

2009-12-21 Thread Olala

Hi all!

I have deployed Solr on Tomcat, but now I want to run multiple Solr
instances on a single Tomcat server. Can this be done?
-- 
View this message in context: 
http://old.nabble.com/Multi-Solr-tp26884086p26884086.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi Solr

2009-12-21 Thread Raghuveer Kancherla
Based on your need you can choose one of the options listed at

http://wiki.apache.org/solr/MultipleIndexes


- Raghu


On Tue, Dec 22, 2009 at 10:46 AM, Olala  wrote:

>
> Hi all!
>
> I have deployed Solr on Tomcat, but now I want to run multiple Solr
> instances on a single Tomcat server. Can this be done?
> --
> View this message in context:
> http://old.nabble.com/Multi-Solr-tp26884086p26884086.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
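One of the options on that wiki page is running several Solr web apps on a single Tomcat, each with its own solr/home. A sketch using per-context deployment descriptors (the paths below are assumptions, not from the thread):

```xml
<!-- $CATALINA_HOME/conf/Catalina/localhost/solr1.xml
     (a sibling solr2.xml pointing at its own solr/home gives a second
     instance under /solr2) -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home1" override="true"/>
</Context>
```

Each context gets its own URL path and its own index directory, while sharing one Tomcat JVM.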


Re: Adaptive search?

2009-12-21 Thread Ryan Kennedy
On Mon, Dec 21, 2009 at 3:36 PM, Lance Norskog  wrote:
> Solr does have the ExternalFileField available. You could track
> existing clicks from the container search log and generate a file to
> be used with ExternalFileField.
>
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>
> In the solr source, trunk/src/test/test-files/solr/conf/schema11.xml
> and schema-trie.xml show how to use it.

This approach will be limited to applying a "global" rank to all the
documents, which may have some unintended consequences: the most popular
document in your index will be boosted for every query, even queries for
which it was never clicked. We've been working on this problem ourselves
and implemented it using a FunctionQuery
(http://wiki.apache.org/solr/FunctionQuery). We create a ValueSourceParser
and hook it into our Solr config:

[solrconfig.xml snippet; the XML tags were stripped by the mail archive,
leaving only the configured file path:]

/path/to/popularity_file.xml

Then we use the new function in our request handler(s):


[requestHandler snippet, tags likewise stripped; the surviving fragment
shows the boost function:]

...
qpop(id)


The QueryPopularity class takes the current (normalized) query and
indexes into popularity_file.xml to find out which document IDs are
popular for that query (it uses the "id" field because that's what we
specified in the arguments to "qpop"; you could use any field you want).
Documents which are popular get a score greater than zero, proportional
to their popularity. We do offline processing every night to build the
mappings of query -> popular ID and push that file to our machines.
QueryPopularity has a background thread which periodically refreshes the
in-memory copy of the XML file's contents.

The main difference is that this is a two-level hash (query -> id ->
score), whereas the ExternalFileField appears to be a one-level hash
(id -> score).

Ryan
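For readers of the archive: the XML in Ryan's message was stripped by the list software. A hypothetical reconstruction of what the two snippets likely looked like -- the class name and request-handler layout are guesses; only the name "qpop" and the file path appear in the original:

```xml
<!-- solrconfig.xml: register the custom value source under the name "qpop" -->
<valueSourceParser name="qpop"
                   class="com.example.QueryPopularityValueSourceParser">
  <str name="file">/path/to/popularity_file.xml</str>
</valueSourceParser>

<!-- ... then use it as a boost function in a request handler's defaults -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="bf">qpop(id)</str>
  </lst>
</requestHandler>
```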