Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Hi Martijn,

Yes, it is working after making these changes.

--
Thanks
Varun Gupta

On Sun, Dec 20, 2009 at 5:54 PM, Martijn v Groningen <martijn.is.h...@gmail.com> wrote:

> Hi Varun,
>
> Yes, after going over the code I think you are right. If you change
> the following if block in SolrIndexSearcher.getDocSet(Query query,
> DocSet filter, DocSetAwareCollector collector):
>
>     if (first==null) {
>       first = getDocSetNC(absQ, null);
>       filterCache.put(absQ,first);
>     }
>
> with:
>
>     if (first==null) {
>       first = getDocSetNC(absQ, null, collector);
>       filterCache.put(absQ,first);
>     }
>
> It should work then. Let me know if this solves your problem.
>
> Martijn
>
> 2009/12/18 Varun Gupta :
> > After a lot of debugging, I finally found why the order of collapse results
> > are not matching the uncollapsed results. I can't say if it is a bug in the
> > implementation of fieldcollapse or not.
> >
> > *Explanation:*
> > Actually, I am querying the fieldcollapse with some filters to restrict the
> > collapsing to some particular categories only by appending the parameter:
> > fq=ctype:(1+2+8+6+3).
> >
> > In: NonAdjacentDocumentCollapser.doQuery()
> > Line: DocSet filter = searcher.getDocSet(filterQueries);
> >
> > Here, filter docset is got without any scores (since I have filter in my
> > query, this line actually gets executed) and also stored in the filter
> > cache. In the next line in the code, the actual uncollapsed DocSet is got
> > passing the DocSetScoreCollector.
> >
> > Now, in: SolrIndexSearcher.getDocSet(Query query, DocSet filter,
> > DocSetAwareCollector collector)
> > Line: if (filterCache != null)
> > Because of the filter cache not being null, and no result for the query in
> > the cache, the line: first = getDocSetNC(absQ,null); gets executed. Notice,
> > over here the DocSetScoreCollector is not passed. Hence, results are
> > collected without any scores.
> > > > This makes the uncollapsedDocSet to be without any scores and hence the > > sorting is not done based on score. > > > > @Martijn: Is what I am right or I should use field collapsing in some > other > > way. Else, what is the ideal fix for this problem (I am not an active > > developer, so can't say the fix that I do will not break anything). > > > > -- > > Thanks, > > Varun Gupta > > > > > > On Mon, Dec 14, 2009 at 10:35 AM, Varun Gupta >wrote: > > > >> When I used collapse.threshold=1, out of the 5 categories 4 had the same > >> top result, but 1 category had a different result (it was the 3rd result > >> coming for that category when I used threshold as 3). > >> > >> -- > >> Thanks, > >> Varun Gupta > >> > >> > >> > >> On Mon, Dec 14, 2009 at 2:56 AM, Martijn v Groningen < > >> martijn.is.h...@gmail.com> wrote: > >> > >>> I would not expect that Solr 1.4 build is the cause of the problem. > >>> Just out of curiosity does the same happen when collapse.threshold=1? > >>> > >>> 2009/12/11 Varun Gupta : > >>> > Here is the field type configuration of ctype: > >>> > >>> > omitNorms="true" /> > >>> > > >>> > In solrconfig.xml, this is how I am enabling field collapsing: > >>> > >>> > class="org.apache.solr.handler.component.CollapseComponent"/> > >>> > > >>> > Apart from this, I made no changes in solrconfig.xml for field > collapse. > >>> I > >>> > am currently not using the field collapse cache. > >>> > > >>> > I have applied the patch on the Solr 1.4 build. I am not using the > >>> latest > >>> > solr nightly build. Can that cause any problem? > >>> > > >>> > -- > >>> > Thanks > >>> > Varun Gupta > >>> > > >>> > > >>> > On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen < > >>> > martijn.is.h...@gmail.com> wrote: > >>> > > >>> >> I tried to reproduce a similar situation here, but I got the > expected > >>> >> and correct results. 
Those three documents that you saw in your > first > >>> >> search result should be the first in your second search result > (unless > >>> >> the index changes or the sort changes ) when fq on that specific > >>> >> category. I'm not sure what is causing this problem. Can you give me > >>> >> some more information like the field type configuration for the > ctype > >>> >> field and how have configured field collapsing? > >>> >> > >>> >> I did find another problem to do with field collapse caching. The > >>> >> collapse.threshold or collapse.maxdocs parameters are not taken into > >>> >> account when caching, which is off course wrong because they do > matter > >>> >> when collapsing. Based on the information you have given me this > >>> >> caching problem is not the cause of the situation you have. I will > >>> >> update the patch that fixes this problem shortly. > >>> >> > >>> >> Martijn > >>> >> > >>> >> 2009/12/10 Varun Gupta : > >>> >> > Hi Martijn, > >>> >> > > >>> >> > I am not sending the collapse parameters for the second query. > Here > >>> are > >>> >> the > >>> >> > queries I am using: > >>> >> > > >>> >> > *When using field collapsing (searching over all categ
Re: SOLR Performance Tuning: Disable INFO Logging.
Hi Can you quickly explain what you did to disable INFO-Level? I am from a PHP background and am not so well versed in Tomcat or Java. Is this a section in solrconfig.xml or did you have to edit Solr Java source and recompile? Thanks In Advance Andrew 2009/12/20 Fuad Efendi : > After researching how to configure default SOLR & Tomcat logging, I finally > disabled INFO-level for SOLR. > > And performance improved at least 7 times!!! ('at least 7' because I > restarted server 5 minutes ago; caches are not prepopulated yet) > > Before that, I had 300-600 ms in HTTPD log files in average, and 4%-8% I/O > wait whenever "top" commands shows SOLR on top. > > Now, I have 50ms-100ms in average (total response time logged by HTTPD). > > > P.S. > Of course, I am limited in RAM, and I use slow SATA... server is moderately > loaded, 5-10 requests per second. > > > P.P.S. > And suddenly synchronous I/O by Java/Tomcat Logger slows down performance > much higher than read-only I/O of Lucene. > > > > Fuad Efendi > +1 416-993-2060 > http://www.linkedin.com/in/liferay > > Tokenizer Inc. > http://www.tokenizer.ca/ > Data Mining, Vertical Search > > > > >
trie fields and sortMissingLast
Should sortMissingLast param be working on trie-fields? -- View this message in context: http://old.nabble.com/tire-fields-and-sortMissingLast-tp26873134p26873134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: get field values from solr and highlight text?
On Sun, Dec 20, 2009 at 1:50 AM, Faire Mii wrote:

> I've got the following code.
>
>     $params = array('defType' => 'dismax', 'qf' => 'threads.title posts.body tags.name', 'hl' => 'true');
>     $results = $solr->search($query, $offset, $limit, $params);
>
> So the keywords will be highlighted. What I don't know how to do is pull
> the data out of $results. How do I get a document's field values and then
> show the body and highlight it like Google/SO search? I'm using the Solr PHP
> client but I find it difficult to understand how to use it. There are so few
> code examples.

The highlighting response comes as a node separate from the main results, but items in both of them are presented in the same order. You'd need to match the highlighting snippet with the current document either through the uniqueKey or through position.

So one way to do it would be to read the snippets out of the response completely and put them in a map with the key being the unique key, and then, for each document, look up the unique key in the map and print out the highlighted snippet. The other way would be to go through the result set and the highlighting response one item at a time.

--
Regards,
Shalin Shekhar Mangar.
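The map-based approach described above can be sketched roughly as follows. This is only an illustration in Python rather than the PHP client from the question: the dictionary is a hand-written stand-in for a decoded JSON response, and the field names "id" and "body" as well as the helper attach_snippets are assumptions, not part of any Solr client API.

```python
# Hand-written stand-in for a decoded Solr response (wt=json). The field
# names "id" and "body" are assumptions made for this illustration.
response = {
    "response": {
        "docs": [
            {"id": "post-1", "body": "Solr is a search server built on Lucene."},
            {"id": "post-2", "body": "Highlighting marks the matched terms."},
        ]
    },
    "highlighting": {
        "post-1": {"body": ["<em>Solr</em> is a search server"]},
        "post-2": {},  # no snippet was produced for this document
    },
}


def attach_snippets(resp, key_field="id", text_field="body"):
    """Pair each document with its highlight snippet via the unique key."""
    snippets_by_key = resp.get("highlighting", {})
    rows = []
    for doc in resp["response"]["docs"]:
        fragments = snippets_by_key.get(doc[key_field], {}).get(text_field, [])
        # Fall back to the stored field value when nothing was highlighted.
        display = " ... ".join(fragments) if fragments else doc.get(text_field, "")
        rows.append((doc[key_field], display))
    return rows


rows = attach_snippets(response)
for doc_id, text in rows:
    print(doc_id, "->", text)
```

Looking snippets up by key rather than by position keeps the pairing correct even if a document has no snippet at all, which is why the sketch falls back to the stored field value in that case.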
Re: trie fields and sortMissingLast
On Mon, Dec 21, 2009 at 5:37 PM, Marc Sturlese wrote: > > Should sortMissingLast param be working on trie-fields? > > Nope, trie fields do not support sortMissingFirst or sortMissingLast. -- Regards, Shalin Shekhar Mangar.
Re: query log
: Subject: query log : References: <83ec2c9c0912201238he4c9sf23b03e750de2...@mail.gmail.com> : In-Reply-To: <83ec2c9c0912201238he4c9sf23b03e750de2...@mail.gmail.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: payload queries running slow
On Dec 20, 2009, at 3:41 AM, Raghuveer Kancherla wrote: > Hi Grant, > My queries are about 5 times slower when using payloads as compared to > queries that dont use payloads on the same index. I have not done any > profiling yet, I am trying out lucid gaze now. How do they compare to just doing SpanQueries? Would be interesting to see the three: 1. "Normal" queries 2. Span Queries 3. Payloads > I do all the load testing after warming up. > Since my index is small ~1 GB, was wondering if a ramDirectory will help > instead of the default Directory implementation for the indexReader? > I suppose, but probably not that big of a difference on a properly warmed index. > Thanks, > Raghu > > > > On Thu, Dec 17, 2009 at 6:58 PM, Grant Ingersoll wrote: > >> >> On Dec 17, 2009, at 4:52 AM, Raghuveer Kancherla wrote: >> >>> Hi, >>> With help from the group here, I have been able to set up a search >>> application with payloads enabled. However, there is a noticeable >> increase >>> in query response times with payloads as compared to the same queries >>> without payloads. I am also seeing a lot more disk IO (I have a 7200 rpm >>> disk) and comparatively lesser cpu usage. >>> >>> I am guessing this is because of the use of payloadTermQuery and >>> payloadNearQuery both of which extend SpanQuery formats. SpanQueries >> read >>> the positions index which will be much larger than the index accessed by >> a >>> simple TermQuery. >>> >>> Is there any way of making this system faster without having to >> distribute >>> the index. My index size is hardly 1GB (~200k documents and only one >> field >>> to search in). I am experiencing query times as high as 2 seconds >> (average). >>> >>> Any indications on the direction in which I can experiment will also be >> very >>> helpful. >>> >> >> Yeah, payloads are going to be slower, but how much slower are they for >> you? Are you warming up those queries? >> >> Also, have you done any profiling? 
>> >> >>> I looked at HathiTrust digital library articles. The methods indicated >> there >>> talk about avoiding reading the positions index (converting PhraseQueries >> to >>> TermQueries). That will not work in my case because, I still have to read >>> the positions index to get the payload information during scoring. Let me >>> know if my understanding is incorrect. >>> >>> >>> Thanks, >>> -Raghu >> >> -- >> Grant Ingersoll >> http://www.lucidimagination.com/ >> >> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using >> Solr/Lucene: >> http://www.lucidimagination.com/search >> >>
Re: Documents are indexed but not searchable
When searching for *:* I get this response: 0 9 *:* I'm guessing this means the documents aren't really in the index? However, I do get this reply when using the data-config debugger (with commit on): http://pastebin.com/m7a460711 And that obviously states "Indexing completed. Added/Updated: 2 documents. Deleted 0 documents." Do you have any ideas why the index doesn't really have those documents? Thanks in advance! Andreas Evers Noble Paul wrote: > > just search for *:* and see if the docs are indeed there in the index. > --Noble > > -- > - > Noble Paul | Systems Architect| AOL | http://aol.com > > -- View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26875427.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: store content only of documents
: : : content : content : : : : I want to store only "content" into this field but it store other meta data : of a document e.g. "Author", "timestamp", "document type" etc. how can I ask : solr to store only body of document into this field and not other meta data? change your defaultField? -Hoss
RE: Documents are indexed but not searchable
Try using luke - to view contents of index Ankit -Original Message- From: krosan [mailto:kro...@gmail.com] Sent: Monday, December 21, 2009 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Documents are indexed but not searchable When searching for *:* I get this response: 0 9 *:* I'm guessing this means the documents aren't really in the index? However, I do get this reply when using the data-config debugger (with commit on): http://pastebin.com/m7a460711 And that obviously states "Indexing completed. Added/Updated: 2 documents. Deleted 0 documents." Do you have any ideas why the index doesn't really have those documents? Thanks in advance! Andreas Evers Noble Paul wrote: > > just search for *:* and see if the docs are indeed there in the index. > --Noble > > -- > - > Noble Paul | Systems Architect| AOL | http://aol.com > > -- View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26875427.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR Performance Tuning: Disable INFO Logging.
> Can you quickly explain what you did to disable INFO-Level?
>
> I am from a PHP background and am not so well versed in Tomcat or
> Java. Is this a section in solrconfig.xml or did you have to edit
> Solr Java source and recompile?

1. Create a file called logging.properties with the following content (I created it in the /home/tomcat/solr folder):

    .level=INFO
    handlers= java.util.logging.ConsoleHandler, java.util.logging.FileHandler

    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
    java.util.logging.FileHandler.level = INFO

    java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
    java.util.logging.ConsoleHandler.level = ALL

    org.apache.solr.level=SEVERE

2. Modify the file tomcat_installation/bin/catalina.sh to include the following (as the first line in the script):

    JAVA_OPTS="... ... ... -Djava.util.logging.config.file=/home/tomcat/solr/logging.properties"

(this line may include more parameters such as -Xmx8196m for memory, -Dfile.encoding=UTF8 -Dsolr.solr.home=/home/tomcat/solr -Dsolr.data.dir=/home/tomcat/solr for SOLR, etc.)

With these settings, SOLR (and Tomcat) will use the standard Java 5/6 logging capabilities. Log output will default to the standard /logs folder of Tomcat. You may find additional logging configuration settings by googling for "Java 5 Logging" etc.

>
> 2009/12/20 Fuad Efendi :
> > After researching how to configure default SOLR & Tomcat logging, I finally
> > disabled INFO-level for SOLR.
> >
> > And performance improved at least 7 times!!! ('at least 7' because I
> > restarted server 5 minutes ago; caches are not prepopulated yet)
> >
> > Before that, I had 300-600 ms in HTTPD log files in average, and 4%-8% I/O
> > wait whenever "top" commands shows SOLR on top.
> >
> > Now, I have 50ms-100ms in average (total response time logged by HTTPD).
> >
> > P.S.
> > Of course, I am limited in RAM, and I use slow SATA... server is moderately
> > loaded, 5-10 requests per second.
> >
> > P.P.S.
> > And suddenly synchronous I/O by Java/Tomcat Logger slows down > performance > > much higher than read-only I/O of Lucene. > > > > > > > > Fuad Efendi > > +1 416-993-2060 > > http://www.linkedin.com/in/liferay > > > > Tokenizer Inc. > > http://www.tokenizer.ca/ > > Data Mining, Vertical Search > > > > > > > > > > Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search
Re: solr perf
Not bad advice ;-)

2009/12/20 Walter Underwood

> Here is an idea. Don't make one core per user. Use a field with a user id.
>
> wunder
>
> On Dec 20, 2009, at 12:38 PM, Matthieu Labour wrote:
>
> > Hi
> > I have a slr instance in which i created 700 core. 1 Core per user of my
> > application.
> > The total size of the data indexed on disk is 35GB with solr cores going
> > from 100KB and few documents to 1.2GB and 50 000 documents.
> > Searching seems very slow and indexing as well
> > This is running on a EC2 xtra large instance (6CPU, 15GB Memory, Raid0 disk)
> > I would appreciate if anybody has some tips, articles etc... as what to do
> > to understand and improve performance
> > Thank you

--
Lici
~Java Developer~
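The suggestion of one shared index with a user id field could look roughly like this on the query side. It is a minimal sketch, assuming a field named user_id (a made-up name, not something from the thread) and numeric user ids; the helper user_scoped_params is hypothetical as well.

```python
from urllib.parse import urlencode


def user_scoped_params(user_query, user_id, field="user_id"):
    """Build select parameters that restrict a search to one user's documents."""
    return {
        "q": user_query,
        # fq is a filter query: it narrows the result set without affecting scores.
        "fq": f"{field}:{user_id}",
    }


params = user_scoped_params("invoice 2009", 42)
query_string = urlencode(params)
print(query_string)
```

The application would append the filter itself on every request, so that a user can never drop it. String-valued ids would additionally need quoting or escaping before being placed in the filter.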
Calculate term vector
Hi folks, how can i get term vector from a custom solr query via http request? is this possible? -- Lici ~Java Developer~
RE: Calculate term vector
What version of Solr are you using? Ankit -Original Message- From: Licinio Fernández Maurelo [mailto:licinio.fernan...@gmail.com] Sent: Monday, December 21, 2009 1:40 PM To: solr-user@lucene.apache.org Subject: Calculate term vector Hi folks, how can i get term vector from a custom solr query via http request? is this possible? -- Lici ~Java Developer~
Re: Calculate term vector
See http://wiki.apache.org/solr/TermVectorComponent On Dec 21, 2009, at 1:39 PM, Licinio Fernández Maurelo wrote: > Hi folks, > > how can i get term vector from a custom solr query via http request? is this > possible? > > -- > Lici > ~Java Developer~
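For what it is worth, a request against the TermVectorComponent might be assembled like this. It is a sketch under assumptions: Solr listening on localhost:8983, a request handler registered under the name "tvrh" with the component attached (the name used in the wiki example, but it depends on your solrconfig.xml), and a field indexed with termVectors="true". The query and field names are made up.

```python
from urllib.parse import urlencode

# Assumptions: host, port and the handler name "tvrh" depend on the local
# setup; "title" and "id" are made-up field names.
base_url = "http://localhost:8983/solr/select"

params = {
    "q": "title:solr",  # any ordinary query; vectors come back per matching doc
    "qt": "tvrh",       # route the request to the term-vector-enabled handler
    "tv": "true",       # switch the component on
    "tv.tf": "true",    # include term frequencies
    "tv.df": "true",    # include document frequencies
    "fl": "id",
    "rows": "5",
}
url = base_url + "?" + urlencode(params)
print(url)
```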
RE: Documents are indexed but not searchable
Hey, I just found out that my index is stored in the tomcat/solr dir, while my -Dsolr.solr.home parameter is set to a different place (E drive). The indexing is sent to the tomcat/solr dir, while the searching is done in my E drive. How can I make sure the index is done in the E dir as well? Thanks in advance! Andreas Evers ANKITBHATNAGAR wrote: > > > > Try using luke - to view contents of index > > Ankit > > -- View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26878531.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documents are indexed but not searchable
solrconfig.xml controls where the index is built. Set it there to the absolute path of where you want the index. Erik On Dec 21, 2009, at 2:26 PM, krosan wrote: Hey, I just found out that my index is stored in the tomcat/solr dir, while my -Dsolr.solr.home parameter is set to a different place (E drive). The indexing is sent to the tomcat/solr dir, while the searching is done in my E drive. How can I make sure the index is done in the E dir as well? Thanks in advance! Andreas Evers ANKITBHATNAGAR wrote: Try using luke - to view contents of index Ankit -- View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26878531.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: trie fields and sortMissingLast
On Mon, Dec 21, 2009 at 7:06 AM, Marc Sturlese wrote: > > Should sortMissingLast param be working on trie-fields? Eventually. It's currently not supported though. Here's the comment from the example schema.xml: -Yonik http://www.lucidimagination.com
Re: Adaptive search?
On 12/18/09 2:46 AM, Siddhant Goel wrote:
> Let's say we have a search engine (a simple front end - web app kind of a
> thing - responsible for querying Solr and then displaying the results in a
> human readable form) based on Solr. If a user searches for something, gets
> quite a few search results, and then clicks on one such result - is there
> any mechanism by which we can notify Solr to boost the score/relevance of
> that particular result in future searches? If not, then any pointers on how
> to go about doing that would be very helpful.

Hi Siddhant.
Solr can't do this out of the box.
You would need to use an external field and a custom scoring function to do something like this.

regards
Ian

> Thanks,
>
> On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht wrote:
>
>> What can it mean to "adapt to user clicks"? Quite many things in my head.
>> Do you have maybe a citation that inspires you here?
>>
>> paul
>>
>> On 17 Dec 2009, at 13:52, Siddhant Goel wrote:
>>
>>> Does Solr provide adaptive searching? Can it adapt to user clicks within
>>> the search results it provides? Or that has to be done externally?
Re: Document model suggestion
Yes, you would have 'role' as a multi-valued field. When you add someone to a role, you don't have to re-index. That's all. On Thu, Dec 17, 2009 at 12:55 PM, caman wrote: > > Are you suggesting that roles should be maintained in the index? We do manage > out authentication based on roles but at granular level, user rights play a > big role as well. > I know we need to compromise, just need to find a balance. > > Thanks > > > Lance Norskog-2 wrote: >> >> Role-based authentication is one level of sophistication up from >> user-based authentication. Users can have different roles, and >> authentication goes against roles. Documents with multiple viewers >> would be assigned special roles. All users would also have their own >> matching role. >> >> On Tue, Dec 15, 2009 at 10:01 AM, caman >> wrote: >>> >>> Erick, >>> I know what you mean. >>> Wonder if it is actually cleaner to keep the authorization model out of >>> solr index and filter the data at client side based on the user access >>> rights. >>> Thanks all for help. >>> >>> >>> >>> Erick Erickson wrote: Yes, that should work. One hard part is what happens if your authorization model has groups, especially when membership in those groups changes. Then you have to go in and update all the affected docs. FWIW Erick On Tue, Dec 15, 2009 at 12:24 PM, caman wrote: > > Shalin, > > Thanks. much appreciated. > Question about: > "That is usually what people do. The hard part is when some documents > are > shared across multiple users. " > > What do you recommend when documents has to be shared across multiple > users? > Can't I just multivalue a field with all the users who has access to > the > document? > > > thanks > > Shalin Shekhar Mangar wrote: > > > > On Tue, Dec 15, 2009 at 7:26 AM, caman > > wrote: > > > >> > >> Appreciate any guidance here please. Have a master-child table > between > >> two > >> tables 'TA' and 'TB' where form is the master table. Any row in TA > can > >> have > >> multiple row in TB. > >> e.g. 
row in TA > >> > >> id---name > >> 1---tweets > >> > >> TB: > >> id|ta_id|field0|field1|field2.|field20|created_by > >> 1|1|value1|value2|value2.|value20|User1 > >> > >> > > > >> > >> This works fine and index the data.But all the data for a row in TA > gets > >> combined in one document(not desirable). > >> I am not clear on how to > >> > >> 1) separate a particular row from the search results. > >> e.g. If I search for 'Android' and there are 5 rows for android in > TB > for > >> a > >> particular instance in TA, would like to show them separately to > user > and > >> if > >> the user click on any of the row,point them to an attached URL in > the > >> application. Should a separate index be maintained for each row in > TB?TB > >> can > >> have millions of rows. > >> > > > > The easy answer is that whatever you want to show as results should > be > the > > thing that you index as documents. So if you want to show tweets as > > results, > > one document should represent one tweet. > > > > Solr is different from relational databases and you should not think > about > > both the same way. De-normalization is the way to go in Solr. > > > > > >> 2) How to protect one user's data from another user. I guess I can > keep > a > >> column for a user_id in the schema and append that filter > automatically > >> when > >> I search through SOLR. Any better alternatives? > >> > >> > > That is usually what people do. The hard part is when some documents > are > > shared across multiple users. > > > > > >> Bear with me if these are newbie questions please, this is my first > day > >> with > >> SOLR. > >> > >> > > No problem. Welcome to Solr! > > > > -- > > Regards, > > Shalin Shekhar Mangar. > > > > > > -- > View this message in context: > http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html > Sent from the Solr - User mailing list archive at Nabble.com. 
> > >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/Document-model-suggestion-tp26784346p26799016.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> >> -- >> Lance Norskog >> goks...@gmail.com >> >> > > -- > View this message in context: > http://old.nabble.com/Document-model-suggestion-tp26784346p26834798.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Lance Norskog goks...@gmail.com
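The multi-valued role field discussed in this thread could be used at query time roughly as follows. This is a sketch only: the field name "role", the per-user role naming ("user:alice") and the helper role_filter are assumptions for illustration, and role membership is kept in the application, not in the index.

```python
def role_filter(roles, field="role"):
    """Build a filter query matching documents visible to any of the roles."""
    if not roles:
        raise ValueError("a user needs at least one role to see anything")
    # Quote each value so roles containing spaces or colons stay one term.
    quoted = " OR ".join('"{}"'.format(r.replace('"', '\\"')) for r in sorted(roles))
    return f"{field}:({quoted})"


# Role membership lives outside the index (hypothetical data).
roles_by_user = {"alice": {"user:alice", "editors"}, "bob": {"user:bob"}}

fq = role_filter(roles_by_user["alice"])
print(fq)

# Adding bob to "editors" changes only this mapping; documents tagged with
# the role "editors" do not need to be re-indexed.
roles_by_user["bob"].add("editors")
fq_bob = role_filter(roles_by_user["bob"])
print(fq_bob)
```

Documents are re-indexed only when the set of roles allowed to see a document changes, not when a person joins or leaves a role, which is the point made above.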
Re: Document model suggestion
Lance, Makes sense. We are playing around with keeping the security model completely out of Index. We will filter out results before data display based on access rights. But approach you suggested is not ruled out completely. thanks Lance Norskog-2 wrote: > > Yes, you would have 'role' as a multi-valued field. When you add > someone to a role, you don't have to re-index. That's all. > > On Thu, Dec 17, 2009 at 12:55 PM, caman > wrote: >> >> Are you suggesting that roles should be maintained in the index? We do >> manage >> out authentication based on roles but at granular level, user rights play >> a >> big role as well. >> I know we need to compromise, just need to find a balance. >> >> Thanks >> >> >> Lance Norskog-2 wrote: >>> >>> Role-based authentication is one level of sophistication up from >>> user-based authentication. Users can have different roles, and >>> authentication goes against roles. Documents with multiple viewers >>> would be assigned special roles. All users would also have their own >>> matching role. >>> >>> On Tue, Dec 15, 2009 at 10:01 AM, caman >>> wrote: Erick, I know what you mean. Wonder if it is actually cleaner to keep the authorization model out of solr index and filter the data at client side based on the user access rights. Thanks all for help. Erick Erickson wrote: > > Yes, that should work. One hard part is what happens if your > authorization model has groups, especially when membership > in those groups changes. Then you have to go in and update > all the affected docs. > > FWIW > Erick > > On Tue, Dec 15, 2009 at 12:24 PM, caman > wrote: > >> >> Shalin, >> >> Thanks. much appreciated. >> Question about: >> "That is usually what people do. The hard part is when some >> documents >> are >> shared across multiple users. " >> >> What do you recommend when documents has to be shared across multiple >> users? >> Can't I just multivalue a field with all the users who has access to >> the >> document? 
>> >> >> thanks >> >> Shalin Shekhar Mangar wrote: >> > >> > On Tue, Dec 15, 2009 at 7:26 AM, caman >> > wrote: >> > >> >> >> >> Appreciate any guidance here please. Have a master-child table >> between >> >> two >> >> tables 'TA' and 'TB' where form is the master table. Any row in TA >> can >> >> have >> >> multiple row in TB. >> >> e.g. row in TA >> >> >> >> id---name >> >> 1---tweets >> >> >> >> TB: >> >> id|ta_id|field0|field1|field2.|field20|created_by >> >> 1|1|value1|value2|value2.|value20|User1 >> >> >> >> >> > >> >> >> >> This works fine and index the data.But all the data for a row in >> TA >> gets >> >> combined in one document(not desirable). >> >> I am not clear on how to >> >> >> >> 1) separate a particular row from the search results. >> >> e.g. If I search for 'Android' and there are 5 rows for android in >> TB >> for >> >> a >> >> particular instance in TA, would like to show them separately to >> user >> and >> >> if >> >> the user click on any of the row,point them to an attached URL in >> the >> >> application. Should a separate index be maintained for each row in >> TB?TB >> >> can >> >> have millions of rows. >> >> >> > >> > The easy answer is that whatever you want to show as results should >> be >> the >> > thing that you index as documents. So if you want to show tweets as >> > results, >> > one document should represent one tweet. >> > >> > Solr is different from relational databases and you should not >> think >> about >> > both the same way. De-normalization is the way to go in Solr. >> > >> > >> >> 2) How to protect one user's data from another user. I guess I can >> keep >> a >> >> column for a user_id in the schema and append that filter >> automatically >> >> when >> >> I search through SOLR. Any better alternatives? >> >> >> >> >> > That is usually what people do. The hard part is when some >> documents >> are >> > shared across multiple users. 
>> > >> > >> >> Bear with me if these are newbie questions please, this is my >> first >> day >> >> with >> >> SOLR. >> >> >> >> >> > No problem. Welcome to Solr! >> > >> > -- >> > Regards, >> > Shalin Shekhar Mangar. >> > >> > >> >> -- >> View this message in context: >> http://old.nabble.com/Document-model-suggestion-tp26784346p26798445.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://old.nabble.com/Document-model-suggestion-
Re: Adaptive search?
Solr does have the ExternalFileField available. You could track existing clicks from the container search log and generate a file to be used with ExternalFileField. http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html In the solr source, trunk/src/test/test-files/solr/conf/schema11.xml and schema-trie.xml show how to use it. On Mon, Dec 21, 2009 at 12:39 PM, Ian Holsman wrote: > On 12/18/09 2:46 AM, Siddhant Goel wrote: >> >> Let say we have a search engine (a simple front end - web app kind of a >> thing - responsible for querying Solr and then displaying the results in a >> human readable form) based on Solr. If a user searches for something, gets >> quite a few search results, and then clicks on one such result - is there >> any mechanism by which we can notify Solr to boost the score/relevance of >> that particular result in future searches? If not, then any pointers on >> how >> to go about doing that would be very helpful. >> > > Hi Siddhant. > Solr can't do this out of the box. > you would need to use a external field and a custom scoring function to do > something like this. > > regards > Ian >> >> Thanks, >> >> On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht >> wrote: >> >> >>> >>> What can it mean to "adapt to user clicks" ? Quite many things in my >>> head. >>> Do you have maybe a citation that inspires you here? >>> >>> paul >>> >>> >>> Le 17-déc.-09 à 13:52, Siddhant Goel a écrit : >>> >>> >>> Does Solr provide adaptive searching? Can it adapt to user clicks within >>> the search results it provides? Or that has to be done externally? >>> >>> >> >> > > -- Lance Norskog goks...@gmail.com
Re: solr perf
Have you tried loading solr instances as you need them and unloading those that are not being used? I wish I could help more, I don't know many people running that many use cores. didier On Sun, Dec 20, 2009 at 2:38 PM, Matthieu Labour wrote: > Hi > I have a slr instance in which i created 700 core. 1 Core per user of my > application. > The total size of the data indexed on disk is 35GB with solr cores going > from 100KB and few documents to 1.2GB and 50 000 documents. > Searching seems very slow and indexing as well > This is running on a EC2 xtra large instance (6CPU, 15GB Memory, Raid0 disk) > I would appreciate if anybody has some tips, articles etc... as what to do > to understand and improve performance > Thank you >
Solr replication 1.3 issue
Hi All,

We're trying to replicate indexes on Solr 1.3 across from Dev->QA->Staging->Prod etc. So at each stage other than Dev and Prod, each would live as a master and a slave at a given time.

We hit a bottleneck (maybe?) when we try to start rsyncd-start on the master from the slave machine.

Commands used:

ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 /solr/SolrHome/bin/rsyncd-enable
ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 /solr/SolrHome/bin/rsyncd-start -p 18003

On slave following error is displayed:

@RSYNCD: 29
@ERROR: protocol startup error

On master logs following were found:

2009/12/21 22:46:05 enabled by admin
2009/12/21 22:46:05 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:05 ended (elapsed time: 0 sec)
2009/12/21 22:46:09 started by admin
2009/12/21 22:46:09 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:46:09 [16964] forward name lookup for devserver002 failed: ai_family not supported
2009/12/21 22:46:09 [16964] connect from UNKNOWN (localhost)
2009/12/21 22:46:29 [16964] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
2009/12/21 22:46:29 [16964] rsync error: error in rsync protocol data stream (code 12) at io.c(463) [receiver=2.6.8]
2009/12/21 22:46:44 rsyncd not accepting connections, exiting
2009/12/21 22:46:57 enabled by admin
2009/12/21 22:46:57 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:57 rsyncd already currently enabled
2009/12/21 22:46:57 exited (elapsed time: 0 sec)
2009/12/21 22:47:00 started by admin
2009/12/21 22:47:00 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:47:00 [17115] forward name lookup for devserver002 failed: ai_family not supported
2009/12/21 22:47:00 [17115] connect from UNKNOWN (localhost)
2009/12/21 22:49:18 rsyncd not accepting connections, exiting

Is it not possible to start the rsync daemon on master from the slave? The user that we use is on the sudoers list as well.

Thanks
Madu
Multi Solr
Hi all! I have deployed Solr on Tomcat, but now I want to run several Solr instances on a single Tomcat server. Can that be done? -- View this message in context: http://old.nabble.com/Multi-Solr-tp26884086p26884086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multi Solr
Based on your needs, you can choose one of the options listed at http://wiki.apache.org/solr/MultipleIndexes - Raghu On Tue, Dec 22, 2009 at 10:46 AM, Olala wrote: > > Hi all! > > I have deployed Solr on Tomcat, but now I want to run several Solr instances on > a single Tomcat server. Can that be done? >
Re: Adaptive search?
On Mon, Dec 21, 2009 at 3:36 PM, Lance Norskog wrote: > Solr does have the ExternalFileField available. You could track > existing clicks from the container search log and generate a file to > be used with ExternalFileField. > > http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html > > In the solr source, trunk/src/test/test-files/solr/conf/schema11.xml > and schema-trie.xml show how to use it. This approach will be limited to applying a "global" rank to all the documents, which may have some unintended consequences. The most popular document in your index will rank as the most popular even for queries for which it was never clicked on. We have been working on this problem ourselves and implemented it using a FunctionQuery (http://wiki.apache.org/solr/FunctionQuery). We create a ValueSourceParser and hook it into our Solr config: /path/to/popularity_file.xml Then we use the new function in our request handler(s): ... qpop(id) The QueryPopularity class takes the current (normalized) query and looks it up in popularity_file.xml to find out which document IDs are popular for that query. It uses the "id" field because that's what we specified in the arguments to "qpop"; you could use any field you want. Documents that are popular get a score greater than zero, proportional to their popularity. We do offline processing every night to build the mappings of query -> popular ID and push that file to our machines. QueryPopularity has a background thread that periodically refreshes the in-memory copy of the XML file's contents. The main difference is that this is a two-level hash (query -> id -> score), whereas the ExternalFileField appears to be a one-level hash (id -> score). Ryan
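The two-level lookup described above (query -> id -> score) can be sketched in plain Java. This is not the QueryPopularity class from the message, whose source is not shown. The class name QueryPopularityTable, the normalize() rules (trim, lowercase, collapse whitespace), and the replace() method are assumptions made for illustration, and the Solr ValueSourceParser wiring and the XML file parsing are left out entirely.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the in-memory table; does not use any Solr API.
final class QueryPopularityTable {
    // query -> (document id -> popularity score); volatile so readers see a whole new table.
    private volatile Map<String, Map<String, Float>> table = Collections.emptyMap();

    // Assumed normalization: trim, lowercase, collapse runs of whitespace.
    static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Swap in a freshly loaded table in one step, as a background refresh thread might.
    void replace(Map<String, Map<String, Float>> fresh) {
        Map<String, Map<String, Float>> copy = new HashMap<>();
        for (Map.Entry<String, Map<String, Float>> e : fresh.entrySet()) {
            copy.put(normalize(e.getKey()), new HashMap<>(e.getValue()));
        }
        table = copy;
    }

    // Returns 0 unless this document was popular for this particular query.
    float score(String query, String docId) {
        Map<String, Float> byId = table.get(normalize(query));
        if (byId == null) {
            return 0f;
        }
        Float s = byId.get(docId);
        return s == null ? 0f : s;
    }

    public static void main(String[] args) {
        Map<String, Float> byId = new HashMap<>();
        byId.put("doc42", 0.8f);
        Map<String, Map<String, Float>> fresh = new HashMap<>();
        fresh.put("ipod nano", byId);

        QueryPopularityTable t = new QueryPopularityTable();
        t.replace(fresh);
        System.out.println(t.score("iPod  Nano", "doc42")); // popular for this query
        System.out.println(t.score("zune", "doc42"));       // same document, different query
    }
}
```

A one-level table keyed only by id would return the same score for both calls in main, which is the "global rank" limitation raised at the start of the reply.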