On Thu, Jun 11, 2015, at 07:20 PM, amid wrote:
> Thanks Charles,
>
> We thought of using multi-valued field but got the feeling it will not be
> small as our data will grow.
> Another issue with multi-valued field is that you can't create complex
> join
> query, while using a different collection
It is the number of recommendations for a single user that matters. The
more there are, the worse the performance. Trying it and seeing is the best
way, though.
I personally would have one doc per recommendation. It will reduce the
amount of churn in your index as updating a multivalued field will
involve
Hi,
I am using the keepword filter to identify key phrases. I have made the following
schema changes in schema.xml
When I am using a facet query on the keyphrase field (
http://localhost:8983/solr/core1/select?q=*%3A
Hi,
I'm facing some issues with my suggester for the content field.
As my content is indexed from rich text documents, which are quite large, I
got the following error when I tried to build the suggester using
/suggesthandler?suggest.build=true
len must be <= 32767; got 35578
Is there any way to
Thank you for the info, Will try to implement it.
Regards,
Edwin
On 12 June 2015 at 01:32, Reitzel, Charles
wrote:
> Moving the highlighted snippets to the main response is a bad thing for
> some applications. E.g. if you do any sorting or searching on the returned
> fields, you need to use th
Yes. Typically, the content file is used to populate a single field in each
document, e.g. "content". Typically, this field is the primary target for
searches. Sometimes, additional metadata (title, author, etc.) can be
extracted from the source files. But the idea remains the same: the t
The filepath is the key in both the filesystem and the database
--
View this message in context:
http://lucene.472066.n3.nabble.com/Merging-Sets-of-Data-from-Two-Different-Sources-tp4211166p4211253.html
Sent from the Solr - User mailing list archive at Nabble.com.
Both sources, the filesystem and the database, contain the file path for each
individual file
--
View this message in context:
http://lucene.472066.n3.nabble.com/Merging-Sets-of-Data-from-Two-Different-Sources-tp4211166p4211251.html
Sent from the Solr - User mailing list archive at Nabble.com.
So you're saying I could merge both the metadata in the database and their
files in the file system into one query-able item in solr by just
customizing the DIH correctly and getting the right schema?
(I'm sorry this sounds like a redundant question but I've been trying to
find an answer for the
One question is which source defines the key - do you crawl the files and
then look up the file name in the database, or scan the database and there
is a field to specify the file name? IOW, given a database key, is there a
fixed method to determine the file name path? And vice versa.
-- Jack Kru
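The key question above can be illustrated with a small sketch. It assumes the file path is the shared key (as stated elsewhere in this thread) and that file text and database rows have already been loaded into plain dictionaries. The function name `merge_by_path` and the field names `id`, `content`, `title`, and `author` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def merge_by_path(file_text: Dict[str, str],
                  db_rows: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine extracted file text and database metadata, one record per file path."""
    docs = []
    for path in sorted(file_text):
        # The file path doubles as the unique key of the combined record.
        doc = {"id": path, "content": file_text[path]}
        # Metadata is optional: a file without a database row is still kept.
        doc.update(db_rows.get(path, {}))
        docs.append(doc)
    return docs


# Example data is made up for the sketch.
docs = merge_by_path(
    {"/data/a.pdf": "text of a", "/data/b.pdf": "text of b"},
    {"/data/a.pdf": {"title": "Report A", "author": "Example Author"}},
)
```

This sketch drives the merge from the file system side; driving it from the database side instead would mean iterating over `db_rows` and looking up the file text by path.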
Hey Folks,
If you're interested in going to Lucene/Solr Revolution this year in Austin,
please vote for the sessions you would like to see!
https://lucenerevolution.uservoice.com/
-Yonik
Thanks Charles,
We thought of using multi-valued field but got the feeling it will not be
small as our data will grow.
Another issue with multi-valued field is that you can't create complex join
query, while using a different collection with document with more than one
field (e.g. recommendation_da
Hey everyone!
I'm trying to setup a Solr instance on some free text clinical data.
This data has a lot of white space formatting, for example, I might have a
document that contains unstructured bulleted lists or section titles.
For example,
blah blah blah...
MEDICATIONS:
* Xanax
* Phenobritrol
Moving the highlighted snippets to the main response is a bad thing for some
applications. E.g. if you do any sorting or searching on the returned fields,
you need to use the original values. The same is true if any of the values
are used as a key into some other system or table lookup. Spe
So long as the fields are indexed, I think performance should be ok.
Personally, I would also look at using a single document per user with a
multi-valued field for recommendation ID. Assuming only a small fraction of
all recommendation IDs are ever presented to any single user, this schema wo
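The two document layouts discussed in this thread (one document per user with a multi-valued recommendation field, versus one document per recommendation) can be sketched as plain dictionaries. This is only an illustration of the shapes: the field names `user_id`, `recommendation_ids`, and `recommendation_id`, the composite `id` format, and the helper `explode_user_doc` are all assumptions, not taken from the original posts.

```python
from typing import Dict, List


def explode_user_doc(user_doc: Dict) -> List[Dict]:
    """Turn one per-user document into one document per recommendation."""
    user_id = user_doc["user_id"]
    return [
        # Hypothetical composite key: user ID plus recommendation ID.
        {"id": f"{user_id}_{rec_id}", "user_id": user_id, "recommendation_id": rec_id}
        for rec_id in user_doc["recommendation_ids"]
    ]


# Layout A: a single document per user with a multi-valued field.
per_user = {"id": "u1", "user_id": "u1", "recommendation_ids": ["r7", "r9"]}

# Layout B: one document per (user, recommendation) pair.
per_recommendation = explode_user_doc(per_user)
```

Which layout performs better depends on how many recommendations each user has, so measuring both on real data is the safer route.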
Thanks a lot Charles,
This seems to be what I'm looking for.
Do you know if a join over this number of documents and users will still have good
query performance? Also, are there any limitations on the Solr architecture
when using the "join" method (e.g. sharding)?
Many thanks,
Ami
--
View this mess
This is my field definition:
Then I query for this exact phrase (which I can see in various documents)
and get no results...
my_field: "baltimore po
I agree with all the ideas so far explained, but actually I would have
suggested the DIH (Data Import Handler) as a first plan.
It already allows out-of-the-box indexing from different data sources.
It supports JDBC data sources with extensive processors, and it also supports
a file system d
DocValues is actually an un-inverted index that is built as part of
the segment.
This means that it behaves the same way as the other segment files.
Assuming you are indexing not a compound segment file but a classic multi-file
segment in a NRTCachingDirectory,
The segment is built in mem
Here's a skeleton that uses Tika from a SolrJ client. It mixes in
a database too, but the parts are pretty separate.
https://lucidworks.com/blog/indexing-with-solrj/
Best,
Erick
On Thu, Jun 11, 2015 at 7:14 AM, Paden wrote:
> You were very VERY helpful. Thank you very much. If I could bug you f
I am using DocValues and I am wondering how to configure the Java heap size of
Solr's processes: does DocValues use the system cache (off-heap memory) or heap
memory? Should I take DocValues into consideration when I calculate heap
parameters (xmx, xmn, xms...)?
--
View this message in context:
http:/
Works great, thanks guys!
Missed the leafReader because I looked at IndexSearcher instead of
SolrIndexSearcher...
--
View this message in context:
http://lucene.472066.n3.nabble.com/Adding-applicative-cache-to-SolrSearcher-tp4211012p4211183.html
Sent from the Solr - User mailing list archive at
You were very VERY helpful. Thank you very much. If I could bug you for one
last question. Do you know where the documentation is that would help me
write my own indexer?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Merging-Sets-of-Data-from-Two-Different-Sources-tp42111
On 11/06/2015 14:57, Paden wrote:
So you're saying that Tika can parse the text OUTSIDE of Solr. So I would
still be able to process my PDF's with Tika just outside of Solr
specifically correct?
Yes.
Charlie
--
View this message in context:
http://lucene.472066.n3.nabble.com/Merging-Sets
So you're saying that Tika can parse the text OUTSIDE of Solr. So I would
still be able to process my PDF's with Tika just outside of Solr
specifically correct?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Merging-Sets-of-Data-from-Two-Different-Sources-tp4211166p421117
On 11/06/2015 14:38, Paden wrote:
I do have a link between both sets of data and that would be the filepath
that could be indexed by both.
Great.
I do, however, have large PDF's that do need to be indexed. So just for
clarification, I could write an indexer that used both the DIH and SolrCell
I do have a link between both sets of data and that would be the filepath
that could be indexed by both.
I do, however, have large PDF's that do need to be indexed. So just for
clarification, I could write an indexer that used both the DIH and SolrCell
to submit a combined record to Solr or would
On 11/06/2015 14:19, Paden wrote:
I'm trying to figure out if Solr is a good fit for my project.
I have two sets of data. On the one hand there is a bunch of files sitting
in a local Linux file system. On the other is a set of
metadata FOR the files that is located in a MySQL da
I'm trying to figure out if Solr is a good fit for my project.
I have two sets of data. On the one hand there is a bunch of files sitting
in a local Linux file system. On the other is a set of
metadata FOR the files that is located in a MySQL database.
I need a program that can
Modern network interfaces are pretty capable. I would doubt this
optimization would yield any performance improvements.
I would love to see some test results which prove me wrong.
Is performance the primary reason for this, or do you have any other
reasons?
-Ani
On Thu, Jun 11, 2015 at 9:04 AM,
Thank you for your input. Here's how the query looks with
debugQuery=true:
"rawquerystring": "name:industrie-anhänger",
"querystring": "name:industrie-anhänger",
"parsedquery": "MultiPhraseQuery(name:"(industrie-anhang industri)
(anhang industrieanhang)")",
"parsedquery_toString": "name:"(indu
On 6/11/2015 6:47 AM, MOIS Martin (MORPHO) wrote:
> is it possible to separate the network interface for inter-node communication
> from the network interface for update/search requests? If so I could put two
> network cards in each machine and route the index and search traffic over the
> first
Hello,
is it possible to separate the network interface for inter-node communication
from the network interface for update/search requests? If so I could put two
network cards in each machine and route the index and search traffic over the
first interface and the traffic for the inter-node comm
Picking up this thread again...
When you said 'stock one', did you mean the built-in surround query parser or a
customized one? We already use usePhraseHighlighter=true.
On Mon, Aug 4, 2014 at 10:38 AM, Ahmet Arslan
wrote:
> Hi,
>
> You are using a customized surround query parser, right?
>
> Did you check/
Yes! It only needs to be done!
On Thu, Jun 11, 2015, at 11:38 AM, Ahmet Arslan wrote:
> Hi Upayavira,
>
> I was going to suggest SOLR-3479 to Edwin, I saw your old post.
>
> Regarding your suggestion, there is an existing ticket :
> https://issues.apache.org/jira/browse/SOLR-3479
>
> I think S
The next thing to do is add debugQuery=true to your URL (or enable it in
the query pane of the admin UI). Then look for the parsed query info.
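As a minimal sketch of that first step, the request URL could be built with Python's standard library as shown here. The base URL (a local core named `core1`), the helper name `debug_url`, and the example query are assumptions for illustration; only the `debugQuery=true` parameter comes from the advice above.

```python
from urllib.parse import urlencode


def debug_url(base: str, query: str) -> str:
    """Build a select URL that asks Solr to include debug output."""
    # urlencode percent-encodes the query string (UTF-8 by default).
    params = {"q": query, "debugQuery": "true"}
    return f"{base}?{urlencode(params)}"


# Hypothetical core location and query, used only as an example.
url = debug_url("http://localhost:8983/solr/core1/select", "name:industrie-anhänger")
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the normal response plus a debug section containing the parsed query.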
On the standard text_en field which includes an English stop word
filter, I ran a query on "Jack and Jill's House" which showed
this output:
"rawquery
Hi Edwin,
I think the highlighting behaviour of those types shifts over time. Maybe we
should do the reverse.
Move snippets to main response: https://issues.apache.org/jira/browse/SOLR-3479
Ahmet
On Thursday, June 11, 2015 11:23 AM, Zheng Lin Edwin Yeo
wrote:
Hi Ahmet,
I've tried that, but i
Hi Upayavira,
I was going to suggest SOLR-3479 to Edwin, I saw your old post.
Regarding your suggestion, there is an existing ticket :
https://issues.apache.org/jira/browse/SOLR-3479
I think SOLR-7665 is also relevant to your question.
Ahmet
On Sunday, June 23, 2013 9:54 PM, Upayavira wr
Have you used the analysis tab in the admin UI? You can type in
sentences for both index and query time and see how they would be
analysed by various fields/field types.
Once you have got index time and query time to result in the same tokens
at the end of the analysis chain, you should start seei
Hey,
in German, you can string most nouns together by using hyphens, like
this:
Industrie = industry
Anhänger = trailer
Industrie-Anhänger = trailer for industrial use
Here [1], you can see me querying "Industrieanhänger" from the "name"
field (name:Industrieanhänger), to make sure the index a
Hi Chris,
Amazing analysis!
I did not actually investigate the log, because I was first trying to get
more information from the user.
"We are running full import and delta import crons .
Full index once a day
delta index : every 10 mins
last night my index automatically deleted(numdocs=0).
Hi Ahmet,
I've tried that, but it's still not able to show.
Those fields are actually of type=float, type=date and type=int.
By default, are those field types not able to be highlighted?
Regards,
Edwin
On 11 June 2015 at 15:03, Ahmet Arslan wrote:
> Hi Edwin,
>
> hl.alternateField is probabl
Thanks for replying.
Please find the data-config
On Thu, Jun 11, 2015 at 6:06 AM, Chris Hostetter
wrote:
>
> : The guys was using delta import anyway, so maybe the problem is
> : different and not related to the clean.
>
> that's not what the logs say.
>
> Here's what i see...
>
> Log beg
Hi Edwin,
hl.alternateField is probably what you are looking for.
ahmet
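A rough sketch of how `hl.alternateField` might be passed along with the usual highlighting parameters is shown here. The helper name `highlight_params` and the field names `content` and `title` are hypothetical, and whether this covers the numeric and date fields asked about is not something this sketch can confirm.

```python
from typing import Dict, List


def highlight_params(query: str, fields: List[str], fallback_field: str) -> Dict[str, str]:
    """Assemble request parameters for highlighting with a fallback field."""
    return {
        "q": query,
        "hl": "true",
        # Comma-separated list of fields to highlight.
        "hl.fl": ",".join(fields),
        # Field whose stored value is returned when no snippet can be generated.
        "hl.alternateField": fallback_field,
    }


# Hypothetical field names, used only as an example.
params = highlight_params("solr", ["content", "title"], "content")
```

The resulting dictionary can be URL-encoded and appended to a select request in the usual way.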
On Thursday, June 11, 2015 5:38 AM, Zheng Lin Edwin Yeo
wrote:
Hi,
Is it possible to list all the fields in the highlighting portion in the
output?
Currently, even when I *, it only shows fields where
highlighting is po
44 matches