Re: The best way to exclude "seen" results from search queries

2015-06-11 Thread Upayavira
On Thu, Jun 11, 2015, at 07:20 PM, amid wrote: > Thanks Charles, > > We thought of using a multi-valued field but got the feeling it will not be > small as our data will grow. > Another issue with a multi-valued field is that you can't create a complex > join > query, while using a different collection

Re: The best way to exclude "seen" results from search queries

2015-06-11 Thread Upayavira
It is the number of recommendations for a single user that matter. The more there are, the worse the performance. Try it and see is the best way though. I personally would have one doc per recommendation. It will reduce the amount of churn in your index as updating a multivalued field will involve

KeepwordFilter issue

2015-06-11 Thread vineet yadav
Hi, I am using keepword filter to identify key phrases. I have made following schema changes in schema.xml When I am using facet query on keyphrase field( http://localhost:8983/solr/core1/select?q=*%3A

Increase the suggester len size

2015-06-11 Thread Zheng Lin Edwin Yeo
Hi, I'm facing some issues with my suggester for the content field. As my content is indexed from rich text documents, which are quite large, I got the following error when I tried to build the suggester using /suggesthandler?suggest.build=true: len must be <= 32767; got 35578. Is there any way to
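
One possible client-side workaround for the "len must be <= 32767" error is to feed the suggester a shortened copy of the content. The sketch below is only an illustration: it assumes the limit applies to the UTF-8 byte length of the value, and the helper name truncate_utf8 is hypothetical, not part of Solr.

```python
MAX_SUGGEST_BYTES = 32767  # limit quoted in the error message above


def truncate_utf8(text: str, max_bytes: int = MAX_SUGGEST_BYTES) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting on a byte boundary may split a multi-byte character;
    # errors="ignore" drops that trailing partial sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

The truncated value would then be indexed into a separate field used only by the suggester, leaving the full content field untouched for search.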

Re: Show all fields in Solr highlighting output

2015-06-11 Thread Zheng Lin Edwin Yeo
Thank you for the info, Will try to implement it. Regards, Edwin On 12 June 2015 at 01:32, Reitzel, Charles wrote: > Moving the highlighted snippets to the main response is a bad thing for > some applications. E.g. if you do any sorting or searching on the returned > fields, you need to use th

RE: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Reitzel, Charles
Yes. Typically, the content file is used to populate a single field in each document, e.g. "content". Typically, this field is the primary target for searches. Sometimes, additional metadata (title, author, etc.) can be extracted from the source files. But the idea remains the same: the t
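
As a rough illustration of the idea in this thread, the sketch below combines database metadata and extracted file text into one document per file, joined on the file path that both sources share. The function name build_solr_docs and the field names other than "content" are assumptions made for the example; text extraction (e.g. with Tika outside Solr) and the actual submission to Solr are assumed to happen elsewhere.

```python
def build_solr_docs(db_rows, file_texts):
    """Join database metadata and extracted file text on the shared file path.

    db_rows:    list of dicts, each with a "filepath" key plus metadata columns
    file_texts: dict mapping file path -> extracted plain text
    """
    docs = []
    for row in db_rows:
        path = row["filepath"]
        # Use the file path as the unique key; copy the remaining metadata.
        doc = {"id": path}
        doc.update({k: v for k, v in row.items() if k != "filepath"})
        # Only add "content" when text was extracted for this path.
        if path in file_texts:
            doc["content"] = file_texts[path]
        docs.append(doc)
    return docs
```

Each resulting dict holds the metadata and the searchable text together, which is what makes a single query-able item per file possible.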

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
The filepath is the key in both the filesystem and the database -- View this message in context: http://lucene.472066.n3.nabble.com/Merging-Sets-of-Data-from-Two-Different-Sources-tp4211166p4211253.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
Both sources, the filesystem and the database, contain the file path for each individual file

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
So you're saying I could merge both the metadata in the database and their files in the file system into one query-able item in solr by just customizing the DIH correctly and getting the right schema? (I'm sorry this sounds like a redundant question but I've been trying to find an answer for the

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Jack Krupansky
One question is which source defines the key - do you crawl the files and then look up the file name in the database, or scan the database and there is a field to specify the file name? IOW, given a database key, is there a fixed method to determine the file name path? And vice versa. -- Jack Kru

Lucene/Solr Revolution 2015 Voting

2015-06-11 Thread Yonik Seeley
Hey Folks, If you're interested in going to Lucene/Solr Revolution this year in Austin, please vote for the sessions you would like to see! https://lucenerevolution.uservoice.com/ -Yonik

RE: The best way to exclude "seen" results from search queries

2015-06-11 Thread amid
Thanks Charles, We thought of using a multi-valued field but got the feeling it will not be small as our data will grow. Another issue with a multi-valued field is that you can't create a complex join query, while using a different collection with document with more than one field (e.g. recommendation_da

How to index/search without whitespace but hightlight with whitespace?

2015-06-11 Thread Travis
Hey everyone! I'm trying to setup a Solr instance on some free text clinical data. This data has a lot of white space formatting, for example, I might have a document that contains unstructured bulleted lists or section titles. For example, blah blah blah... MEDICATIONS: * Xanax * Phenobritrol

RE: Show all fields in Solr highlighting output

2015-06-11 Thread Reitzel, Charles
Moving the highlighted snippets to the main response is a bad thing for some applications. E.g. if you do any sorting or searching on the returned fields, you need to use the original values. The same is true if any of the values are used as a key into some other system or table lookup. Spe

RE: The best way to exclude "seen" results from search queries

2015-06-11 Thread Reitzel, Charles
So long as the fields are indexed, I think performance should be ok. Personally, I would also look at using a single document per user with a multi-valued field for recommendation ID. Assuming only a small fraction of all recommendation IDs are ever presented to any single user, this schema wo

RE: The best way to exclude "seen" results from search queries

2015-06-11 Thread amid
Thanks a lot Charles, This seems to be what I'm looking for. Do you know if a join over this number of documents and users will still have good query performance? Also, are there any limitations for the Solr architecture once using the "join" method (i.e. sharding)? Many thanks, Ami
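
For readers unfamiliar with the "join" method mentioned here, the sketch below builds request parameters that exclude items a user has already seen by negating a {!join} query. It is a sketch under assumptions, not the schema used in this thread: the field names recommendation_id, id and user_id, and the optional from_index collection name, are hypothetical placeholders.

```python
def seen_exclusion_params(user_query, user_id,
                          from_field="recommendation_id",
                          to_field="id",
                          from_index=None):
    """Build Solr params that filter out documents already seen by user_id."""
    local_params = f"from={from_field} to={to_field}"
    if from_index:
        # fromIndex points the join at another core/collection.
        local_params += f" fromIndex={from_index}"
    join = f"{{!join {local_params}}}user_id:{user_id}"
    # Match everything, then subtract the joined "seen" set.
    return {"q": user_query, "fq": f'*:* -_query_:"{join}"'}
```

Whether this performs well at a given scale, and how it interacts with sharding, is exactly the open question raised in the message above, so it should be measured rather than assumed.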

Exact phrase search not working

2015-06-11 Thread Mike Thomsen
This is my field definition: Then I query for this exact phrase (which I can see in various documents) and get no results... my_field: "baltimore po

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Alessandro Benedetti
I agree with all the ideas explained so far, but I would actually have suggested the DIH (Data Import Handler) as a first plan. It already allows out-of-the-box indexing from different datasources. It supports JDBC datasources with extensive processors and it also supports a file system d

Re: DocValues memory consumption thoughts

2015-06-11 Thread Alessandro Benedetti
DocValues is actually an un-inverted index that is built as part of the segment. This means that it behaves the same way as the other segment files. Assuming you are indexing not a compound segment file but a classic multi-file segment in an NRTCachingDirectory, the segment is built in mem

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Erick Erickson
Here's a skeleton that uses Tika from a SolrJ client. It mixes in a database too, but the parts are pretty separate. https://lucidworks.com/blog/indexing-with-solrj/ Best, Erick On Thu, Jun 11, 2015 at 7:14 AM, Paden wrote: > You were very VERY helpful. Thank you very much. If I could bug you f

DocValues memory consumption thoughts

2015-06-11 Thread adfel70
I am using DocValues and I am wondering how to configure the Java heap size of Solr's processes: does DocValues use the system cache (off-heap memory) or heap memory? Should I take DocValues into consideration when I calculate heap parameters (xmx, xmn, xms...)?

Re: Adding applicative cache to SolrSearcher

2015-06-11 Thread adfel70
Works great, thanks guys! Missed the leafReader because I looked at IndexSearcher instead of SolrIndexSearcher...

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
You were very VERY helpful. Thank you very much. If I could bug you for one last question. Do you know where the documentation is that would help me write my own indexer?

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Charlie Hull
On 11/06/2015 14:57, Paden wrote: So you're saying that Tika can parse the text OUTSIDE of Solr. So I would still be able to process my PDF's with Tika just outside of Solr specifically correct? Yes. Charlie -- View this message in context: http://lucene.472066.n3.nabble.com/Merging-Sets

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
So you're saying that Tika can parse the text OUTSIDE of Solr. So I would still be able to process my PDF's with Tika just outside of Solr specifically correct?

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Charlie Hull
On 11/06/2015 14:38, Paden wrote: I do have a link between both sets of data and that would be the filepath that could be indexed by both. Great. I do, however, have large PDF's that do need to be indexed. So just for clarification, I could write an indexer that used both the DIH and SolrCell

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
I do have a link between both sets of data and that would be the filepath that could be indexed by both. I do, however, have large PDF's that do need to be indexed. So just for clarification, I could write an indexer that used both the DIH and SolrCell to submit a combined record to Solr or would

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Charlie Hull
On 11/06/2015 14:19, Paden wrote: I'm trying to figure out if Solr is a good fit for my project. I have two sets of data. On the one hand there is a bunch of files sitting in a local file system in a Linux file system. On the other is a set of metadata FOR the files that is located in a MySQL da

Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
I'm trying to figure out if Solr is a good fit for my project. I have two sets of data. On the one hand there is a bunch of files sitting in a local file system in a Linux file system. On the other is a set of metadata FOR the files that is located in a MySQL database. I need a program that can

Re: Separate network interfaces for inter-node communication and update/search requests?

2015-06-11 Thread Anirudha Jadhav
Modern network interfaces are pretty capable. I doubt this optimization would yield any performance improvements. I would love to see some test results which prove me wrong. Is performance the primary reason for this, or do you have any other reasons? -Ani On Thu, Jun 11, 2015 at 9:04 AM,

Re: Problem with german hyphenated words not being found

2015-06-11 Thread Thomas Michael Engelke
Thank you for your input. Here's how the query looks with debugQuery=true: "rawquerystring": "name:industrie-anhänger", "querystring": "name:industrie-anhänger", "parsedquery": "MultiPhraseQuery(name:"(industrie-anhang industri) (anhang industrieanhang)")", "parsedquery_toString": "name:"(indu

Re: Separate network interfaces for inter-node communication and update/search requests?

2015-06-11 Thread Shawn Heisey
On 6/11/2015 6:47 AM, MOIS Martin (MORPHO) wrote: > is it possible to separate the network interface for inter-node communication > from the network interface for update/search requests? If so I could put two > network cards in each machine and route the index and search traffic over the > first

Separate network interfaces for inter-node communication and update/search requests?

2015-06-11 Thread MOIS Martin (MORPHO)
Hello, is it possible to separate the network interface for inter-node communication from the network interface for update/search requests? If so I could put two network cards in each machine and route the index and search traffic over the first interface and the traffic for the inter-node comm

Re: Phrase Highlighter + Surround Query Parser

2015-06-11 Thread Salman Akram
Picking up this thread again... When you said 'stock one', did you mean the built-in surround query parser or the customized one? We already use usePhrasehighlighter=true. On Mon, Aug 4, 2014 at 10:38 AM, Ahmet Arslan wrote: > Hi, > > You are using a customized surround query parser, right? > > Did you check/

Re: DocTransformers for restructuring output, e.g. Highlighting

2015-06-11 Thread Upayavira
Yes! It only needs to be done! On Thu, Jun 11, 2015, at 11:38 AM, Ahmet Arslan wrote: > Hi Upayavira, > > I was going to suggest SOLR-3479 to Edwin, I saw your old post. > > Regarding your suggestion, there is an existing ticket : > https://issues.apache.org/jira/browse/SOLR-3479 > > I think S

Re: Problem with german hyphenated words not being found

2015-06-11 Thread Upayavira
The next thing to do is add debugQuery=true to your URL (or enable it in the query pane of the admin UI). Then look for the parsed query info. On the standard text_en field which includes an English stop word filter, I ran a query on "Jack and Jill's House" which showed this output: "rawquery
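
A small sketch of the step described above: building a select URL with debugQuery=true so the parsed query can be inspected in the response. The base URL and core name in the usage note are placeholders, and the helper name debug_query_url is hypothetical.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Return a Solr select URL for query with debugQuery enabled."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    # urlencode percent-encodes the query, including non-ASCII characters.
    return f"{base_url}/select?{urlencode(params)}"
```

For example, debug_query_url("http://localhost:8983/solr/core1", "name:industrie-anhänger") gives a URL whose response should include the rawquerystring and parsedquery entries to look at.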

Re: Show all fields in Solr highlighting output

2015-06-11 Thread Ahmet Arslan
Hi Edwin, I think the highlighting behaviour of those types shifts over time. Maybe we should do the reverse and move the snippets to the main response: https://issues.apache.org/jira/browse/SOLR-3479 Ahmet On Thursday, June 11, 2015 11:23 AM, Zheng Lin Edwin Yeo wrote: Hi Ahmet, I've tried that, but i

Re: DocTransformers for restructuring output, e.g. Highlighting

2015-06-11 Thread Ahmet Arslan
Hi Upayavira, I was going to suggest SOLR-3479 to Edwin, I saw your old post. Regarding your suggestion, there is an existing ticket : https://issues.apache.org/jira/browse/SOLR-3479 I think SOLR-7665 is also relevant to your question. Ahmet On Sunday, June 23, 2013 9:54 PM, Upayavira wr

Re: Problem with german hyphenated words not being found

2015-06-11 Thread Upayavira
Have you used the analysis tab in the admin UI? You can type in sentences for both index and query time and see how they would be analysed by various fields/field types. Once you have got index time and query time to result in the same tokens at the end of the analysis chain, you should start seei

Problem with german hyphenated words not being found

2015-06-11 Thread Thomas Michael Engelke
Hey, in German, you can string most nouns together by using hyphens, like this: Industrie = industry Anhänger = trailer Industrie-Anhänger = trailer for industrial use Here [1], you can see me querying "Industrieanhänger" from the "name" field (name:Industrieanhänger), to make sure the index a

Re: Indexing issue - index get deleted

2015-06-11 Thread Alessandro Benedetti
Hi Chris, Amazing analysis! I actually did not investigate the log, because I was first trying to get more information from the user. "We are running full import and delta import crons. Full index: once a day, delta index: every 10 mins. Last night my index was automatically deleted (numdocs=0).

Re: Show all fields in Solr highlighting output

2015-06-11 Thread Zheng Lin Edwin Yeo
Hi Ahmet, I've tried that, but it's still not able to show. Those fields are actually of type=float, type=date and type=int. Are those field types not able to be highlighted by default? Regards, Edwin On 11 June 2015 at 15:03, Ahmet Arslan wrote: > Hi Edwin, > > hl.alternateField is probabl

Re: Indexing issue - index get deleted

2015-06-11 Thread Midas A
Thanks for replying. Please find the data-config. On Thu, Jun 11, 2015 at 6:06 AM, Chris Hostetter wrote: > > : The guys was using delta import anyway, so maybe the problem is > : different and not related to the clean. > > that's not what the logs say. > > Here's what i see... > > Log beg

Re: Show all fields in Solr highlighting output

2015-06-11 Thread Ahmet Arslan
Hi Edwin, hl.alternateField is probably what you are looking for. Ahmet On Thursday, June 11, 2015 5:38 AM, Zheng Lin Edwin Yeo wrote: Hi, Is it possible to list all the fields in the highlighting portion in the output? Currently, even when I *, it only shows fields where highlighting is po