I am trying to find duplicate records based on distance and phonetic
algorithms. Can I use Solr for that? I have the following fields and
conditions to identify exact or possible duplicates (a rough matching
sketch follows the field list).
1. Fields
prefix
suffix
firstname
lastname
email(primary_email1, email2, email3)
phone(primary_phone1,
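A rough, standalone Java sketch of the kind of matching being asked about: a
phonetic comparison (Double Metaphone from Apache commons-codec) combined with
a plain edit-distance check. Only the field names come from the post; the
class name, helper methods, and thresholds are illustrative assumptions, and
inside Solr itself the phonetic part would normally be handled by indexing the
name fields through a phonetic filter.

import org.apache.commons.codec.language.DoubleMetaphone;

// Illustrative duplicate check combining a phonetic match on names with an
// edit-distance check on the primary email. Field names come from the post;
// the helper names and thresholds are hypothetical.
public class DuplicateCheck {

    private static final DoubleMetaphone METAPHONE = new DoubleMetaphone();

    /** True when two name strings encode to the same Double Metaphone key. */
    static boolean soundsAlike(String a, String b) {
        return METAPHONE.isDoubleMetaphoneEqual(a, b);
    }

    /** Plain Levenshtein distance, used here as the "distance" part. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** A record is a possible duplicate when both names sound alike and the
     *  primary emails are within a small edit distance of each other. */
    static boolean possibleDuplicate(String first1, String last1, String email1,
                                     String first2, String last2, String email2) {
        return soundsAlike(first1, first2)
                && soundsAlike(last1, last2)
                && levenshtein(email1.toLowerCase(), email2.toLowerCase()) <= 2;
    }

    public static void main(String[] args) {
        System.out.println(possibleDuplicate(
                "Jon", "Smith", "jon.smith@example.com",
                "John", "Smyth", "john.smith@example.com")); // expected: true
    }
}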
bq: This is problematic because some portion of user activity will fail;
queries that are in transit will not complete
This is always interesting to think about, but is it a serious enough
problem to spend resources trying to anticipate? I can imagine situations
where even losing the queries in tr
bq. But tons of people on this mailing list do not recommend AggressiveOpts
It's up to you to decide - that is why it's an option. It will enable more
aggressive options that will tend to perform better. On the other hand,
these more aggressive options and optimizations have a history of being
mor
All,
At my current customer we have developed a custom federator that will
federate queries between Endeca and Solr to ease the transition from an
extremely large (TBs of data) Endeca index to Solr. (Endeca is similar to
Solr in terms of search/faceted navigation/etc).
During this transition pl
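The federator code itself is not shown in the thread; purely as a sketch of
the general pattern, something like the Java below fans the same query out to
Solr (via SolrJ) and to the legacy engine behind a hypothetical wrapper
interface, then concatenates the results. Everything except the SolrJ classes
is invented for illustration.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

// Rough sketch of a query federator: the same user query goes to Solr (via
// SolrJ) and to a second engine hidden behind a hypothetical LegacyEngine
// interface, and the two result lists are merged for the caller.
public class FederatedSearcher {

    /** Hypothetical wrapper around the legacy (e.g. Endeca) search API. */
    interface LegacyEngine {
        List<String> search(String userQuery, int rows);
    }

    private final HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
    private final LegacyEngine legacy;

    public FederatedSearcher(LegacyEngine legacy) {
        this.legacy = legacy;
    }

    public List<String> search(String userQuery, int rows) throws SolrServerException {
        List<String> merged = new ArrayList<String>();

        // Side 1: Solr
        SolrQuery q = new SolrQuery(userQuery);
        q.setRows(rows);
        for (SolrDocument doc : solr.query(q).getResults()) {
            merged.add(String.valueOf(doc.getFieldValue("id")));
        }

        // Side 2: the legacy engine
        merged.addAll(legacy.search(userQuery, rows));
        return merged;
    }
}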
It's a single Solr instance, and in my files I used 'doc_key' everywhere,
but I changed it to "id" in the email I sent out to make it easier to
read. Sorry, I didn't mean to confuse you :)
On Fri, Jan 2, 2015 at 4:06 PM, Alexandre Rafalovitch wrote:
On 2 January 2015 at 15:43, wrote:
> id
Your uniqueKey does not seem to be the 'doc_key' that the URP is asked
to generate. I wonder if that is causing the issue. Are you
deliberately generating a field different from the one defined as the
unique id?
Regards,
Alex.
Is this SolrCloud or a single Solr instance?
On Jan 2, 2015 3:44 PM, wrote:
> Happy New Year Everyone :)
>
> I am trying to automatically generate document Id when indexing a csv
> file that contains multiple lines of documents. The desired case: if the
> csv file contains 2 lines (each line is a d
Happy New Year Everyone :)
I am trying to automatically generate a document id when indexing a CSV
file that contains multiple lines of documents. The desired case: if the
csv file contains 2 lines (each line is a document), then the index
should contain 2 documents.
What I observed: If the csv fi
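One possible way to send such a CSV from Java is sketched below with SolrJ's
ContentStreamUpdateRequest; the core URL, file name, and update.chain value
are placeholders, and as the rest of the thread works out, whatever field the
id-generating processor writes to has to match the schema's uniqueKey.

import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Sketch: post a CSV file to Solr from SolrJ. The chain named here is a
// placeholder for whichever update chain generates the document ids; that
// chain must write into the field declared as the schema's uniqueKey.
public class CsvPost {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("docs.csv"), "text/csv");
        req.setParam("update.chain", "generate-id");  // placeholder chain name
        req.setParam("commit", "true");

        server.request(req);
        server.shutdown();
    }
}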
Really impossible to say; assuming you're generating correctly-formed
documents, I don't see how this would fail. So, here's how I'd approach it.
You're assuming that:
1> you're getting all the docs back from server A that you have in there, and
2> you're correctly sending them all to server B
So my
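A minimal SolrJ sketch for checking those two assumptions by comparing
document counts on the two servers; the host names and core name are
placeholders.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Quick sanity check: count the documents on each server with the same query
// and compare. URLs are placeholders.
public class CountCheck {
    public static void main(String[] args) throws SolrServerException {
        HttpSolrServer serverA = new HttpSolrServer("http://hostA:8983/solr/collection1");
        HttpSolrServer serverB = new HttpSolrServer("http://hostB:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);  // only the count is needed

        long countA = serverA.query(q).getResults().getNumFound();
        long countB = serverB.query(q).getResults().getNumFound();

        System.out.println("A=" + countA + " B=" + countB +
                (countA == countB ? " (match)" : " (mismatch)"));
    }
}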
Shawn Heisey [apa...@elyograg.org] wrote:
> All indications are that you should probably turn off the "transparent huge
> pages" feature in the OS if you use them, though.
> https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
Very interesting. We had severe performance p
On 1/1/2015 10:12 PM, William Bell wrote:
> Do you think setting aside 2GB for UseLargePages would generally help
> indexing or not?
>
> I can imagine it might help
Allocating part of your operating system memory as huge pages and then
turning on UseLargePages probably will help with general
I have a SolrCloud setup with two collections A & B with different
schemas (although the majority of fields are identical).
Collection A has ~3.6 million documents.
Using *solrj 4.7.0*
As per a requirement, my application
- reads documents from collection A in batches of 10k (a cursor-based
sketch of this step follows below)
- creates docs of ty
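A sketch of the read-in-batches step using SolrJ's cursorMark deep paging
(available as of Solr 4.7), pulling 10k documents at a time from collection A
and re-adding transformed documents to collection B. The URLs are
placeholders (a CloudSolrServer per collection would work the same way), the
uniqueKey is assumed to be 'id', and the field mapping between the two schemas
is left as a comment.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

// Sketch: read collection A in 10k batches with a cursor and write
// transformed docs to collection B. The transform step is a placeholder.
public class BatchCopy {
    public static void main(String[] args) throws Exception {
        HttpSolrServer source = new HttpSolrServer("http://host:8983/solr/collectionA");
        HttpSolrServer target = new HttpSolrServer("http://host:8983/solr/collectionB");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(10000);
        q.setSort("id", SolrQuery.ORDER.asc);   // cursors need a sort on the uniqueKey

        String cursor = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = source.query(q);

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (SolrDocument doc : rsp.getResults()) {
                SolrInputDocument out = new SolrInputDocument();
                out.setField("id", doc.getFieldValue("id"));
                // ... copy / transform the remaining fields for B's schema ...
                batch.add(out);
            }
            if (!batch.isEmpty()) {
                target.add(batch);
            }

            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break;      // cursor did not advance: done
            cursor = next;
        }
        target.commit();
    }
}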
On 1/1/2015 1:09 PM, Meraj A. Khan wrote:
> When running SolrCloud, do you even have to include the shards parameter?
> Shouldn't only the shards.qt parameter suffice?
If you are using SolrCloud, no shards parameter is required ... all
queries sent to either the collection or any shard replica will
auto
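In SolrJ terms, a CloudSolrServer pointed at the collection needs no shards
parameter at all; a minimal sketch with placeholder ZooKeeper hosts:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// With SolrCloud the client only names the collection; the request is routed
// across the shards automatically, so no shards parameter is set here.
public class CloudQuery {
    public static void main(String[] args) throws Exception {
        CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        cloud.setDefaultCollection("collection1");

        QueryResponse rsp = cloud.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());

        cloud.shutdown();
    }
}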
On 1/1/2015 6:35 PM, William Bell wrote:
> But tons of people on this mailing list do not recommend AggressiveOpts
>
> Why do you recommend it?
I haven't done any comparisons with and without it. To call it a
"recommendation" is a little bit strong. I use it, and I am seeing good
results.
My r
Hi folks,
There will be an Open source search devroom[1] at this year's FOSDEM in
Brussels, 31st of January & 1st of February.
I don't know if there will be a Lucene/Solr presence (there's no
schedule for the dev room yet), but this seems like a good place to meet up
and talk shop.
I'll be th