Hi
Does SolR provide a way to describe synonym relationships such as
"equivalent to", "narrower than", "broader than"?
It turns out both postgres and oracle do, but I can't find any related
information in the documentation.
This is useful to allow generalizing the search terms or not.
Hi
SolR pagination is incredible: you can provide the end user a small set
of results together with the total number of documents found (numFound).
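The start/rows arithmetic behind that pagination can be sketched as follows (a minimal illustration; the page size and counts are made-up values):

```python
import math

def page_params(page: int, rows: int = 10) -> dict:
    """Build Solr pagination parameters for a 1-based page number."""
    return {"start": (page - 1) * rows, "rows": rows}

def total_pages(num_found: int, rows: int = 10) -> int:
    """Number of result pages, given Solr's numFound."""
    return math.ceil(num_found / rows)

print(page_params(3))   # parameters for page 3
print(total_pages(42))  # pages needed for numFound=42
```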
I am wondering if both "parallel SQL" and "Graph Traversal" feature
also provide a pagination mechanism.
As an example, with the SQL below:
SELECT id
FR
Hi
I have read here [1] and here [2] that it is possible to highlight only
parent documents in block join queries, but I haven't succeeded yet.
So here is my nested document example:
[
{
"id": "2",
"type_s": "parent",
"content_txt": ["apache"],
"_childDocuments_":
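For reference, a hedged sketch of the kind of request involved: a block-join parent query with highlighting enabled, built here with Python's urlencode. The field names (type_s, content_txt) follow the example above; the exact highlighting parameters are an assumption, not a confirmed recipe:

```python
from urllib.parse import urlencode

# Block-join parent query: match children on content_txt, return parents.
# hl.fl targets a parent field; whether highlights land on the parent
# depends on the Solr version and highlighter, as discussed in [1][2].
params = {
    "q": '{!parent which="type_s:parent"}content_txt:apache',
    "hl": "true",
    "hl.fl": "content_txt",
}
query_string = urlencode(params)
print(query_string)
```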
hi
This question is highly related to a previous one found on the
mailing-list archive [1].
I have this document:
"content_txt":["001 first","002 second"]
I'd like the query below to return nothing:
> q=content_txt:(first AND second)
The method proposed ([1]) by Erick works ok to look for a single
On Sun, Dec 16, 2018 at 09:30:33AM -0800, Erick Erickson wrote:
> Have you looked at ComplexPhraseQueryParser here?
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html
Sure. However, I am using multi-word synonyms and so far complexphrase
does not handle them (maybe soon?).
> Depen
On Sun, Dec 16, 2018 at 05:44:30PM -0800, Erick Erickson wrote:
> No, the idea is that you have N single valued fields, one for each of
> the MV entries you have. The copyField dest would be MV, and only used
> in those cases you wanted to match across values. Not saying that's a
> great solution,
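A toy illustration of the difference Erick describes, using the values from this thread: searching each value separately (as with N single-valued fields) versus searching the concatenation (as with a multi-valued copyField destination):

```python
# Hedged simplification: real Solr matching works on analyzed tokens and
# positions, not raw string membership as below.
values = ["001 first", "002 second"]

def matches_any_single_value(terms, values):
    """True only if a single value contains all terms (per-field search)."""
    return any(all(t in v.split() for t in terms) for v in values)

def matches_across_values(terms, values):
    """True if the terms appear anywhere across values (MV field search)."""
    joined = " ".join(values).split()
    return all(t in joined for t in terms)

print(matches_any_single_value(["first", "second"], values))  # per-value: no match
print(matches_across_values(["first", "second"], values))     # cross-value: match
```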
Hi
It turns out that MoreLikeThis handler does not use queryTime synonyms
expansion.
It is only compatible with indexTime synonyms.
However, multiword synonyms are only compatible with queryTime synonym
expansion. For this reason, multiword synonyms cannot be used together
with the MoreLikeThis handler.
On Wed, Dec 26, 2018 at 09:09:02PM -0800, Erick Erickson wrote:
> bq. However multiword synonyms are only compatible with queryTime synonyms
> expansion.
>
> Why do you say this? What version of Solr? Query-time mult-word
> synonyms were _added_, but AFAIK the capability of multi-word synonyms
> w
Hi
I have a numeric field (say "weight") and I'd like to be able to get
results sorted.
q=kind:animal weight:50
pf=kind^2 weight^3
would return:
name=dog, kind=animal, weight=51
name=tiger, kind=animal,weight=150
name=elephant, kind=animal,weight=2000
In other terms how to deal with numeric fie
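One common approach (an assumption here, not something confirmed in this thread) is to rank by closeness to the target value with a function query such as recip(abs(sub(weight,50)),1,1000,1000). A toy simulation of that ranking over the example documents:

```python
# Solr's recip(x, m, a, b) = a / (m*x + b): it decays as the distance x
# from the target grows, so closer weights score higher.
docs = [
    {"name": "dog", "weight": 51},
    {"name": "tiger", "weight": 150},
    {"name": "elephant", "weight": 2000},
]

def recip(x, m=1, a=1000, b=1000):
    return a / (m * x + b)

target = 50
ranked = sorted(docs, key=lambda d: recip(abs(d["weight"] - target)), reverse=True)
print([d["name"] for d in ranked])  # closest weight first
```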
tanding your question here. if your query is
> > q=kind:animal weight:50 you will get no results, as nothing matches
> > (assuming a q.op of AND)
> >
> >
> > On Thu, Feb 14, 2019 at 4:06 PM Nicolas Paris
> > wrote:
> >
> > > Hi
> > >
> &g
which is a whole different animal and something I don’t
> > think many have experience with, including myself
> >
> >> On Feb 16, 2019, at 10:10 AM, Nicolas Paris
> >> wrote:
> >>
> >> Hi
> >>
> >> Thanks.
> >> To clarify, I
Hello
I wonder if there is a plain text query syntax to say:
give me all document that match:
wonderful pizza NOT peperoni
all of those within a 5-word distance window
then
pizza are wonderful -> would match
I made a wonderful pasta and pizza -> would match
Peperoni pizza are so wonderful -> would not match
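A toy sketch of the intended semantics, checking a 5-word window (a deliberate simplification: real query parsing works on analyzed tokens and positions):

```python
def window_match(text, required, forbidden, slop=5):
    """True if some window of `slop` words contains all required terms
    and none of the forbidden ones."""
    words = [w.lower().strip(".,!") for w in text.split()]
    for i in range(len(words)):
        window = words[i:i + slop]
        if all(r in window for r in required):
            return not any(f in window for f in forbidden)
    return False

req, forb = ["wonderful", "pizza"], ["peperoni"]
print(window_match("pizza are wonderful", req, forb))
print(window_match("I made a wonderful pasta and pizza", req, forb))
print(window_match("Peperoni pizza are so wonderful", req, forb))
```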
1. Query terms containing other than just letters or digits may be placed
>> within double quotes so that those other characters do not separate a term
>> into many terms. A dot (period) and white space are neither letter nor
>> digit. Examples: "Now is the time for all good men" (spaces, quote
Hello Markus
Thanks !
The ComplexPhraseQueryParser syntax:
q={!complexphrase inOrder=false}collector:"wonderful pizza -peperoni"~5
answers my needs.
BTW,
Apparently it accepts both leading and trailing wildcards; that looks
like a powerful feature.
Any chance it would support the "sow=false" in order to co
Hi
Not really a direct answer - I have never used it, however this feature
was attractive to me when first looking at UIMA.
Right now, I would say UIMA connectors in general are by design
a pain to maintain. Source and target systems often have optimised
ways to bulk export/import data. For example, usi
Sorry, I thought I was on the UIMA mailing list.
That being said, my position is the same:
let UIMA folks load data into SolR using the most optimized way.
(What would be the best way? Loading JSONs?)
2018-06-19 22:48 GMT+02:00 Nicolas Paris :
> Hi
>
> Not realy a direct answer - Neve
--
nicolas paris
faster performance for the brute-force task, I guess I
could artificially limit the FQ to under 2M documents for all queries by
taking a sample (I don't really care about having more than 2M documents
to build the word cloud).
I am wondering how I could filter the documents to get approximate facets?
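One hedged sketch of such a filter, along the lines of the "extra random field" idea: index a random float per document (the field name rnd_f is an assumption) and restrict queries to a range sized to the desired sample fraction:

```python
# Sampling filter: with a random float in [0,1) indexed per document,
# an fq over a sub-range keeps roughly that fraction of the corpus, so
# facet counts on the sample approximate the full counts.
corpus_size = 10_000_000   # assumed corpus size, for illustration
sample_size = 2_000_000    # the 2M cap mentioned above
fraction = sample_size / corpus_size

fq = f"rnd_f:[0 TO {fraction}]"
print(fq)
```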
Thanks !
--
nicolas paris
maybe better than
subsetting with extra random fields
--
nicolas paris
https://lucene.apache.org/solr/guide/8_4/the-stats-component.html#local-parameters-with-the-stats-component
is about HLL and facets, but I am not sure that really meets the use
case. I also have to admit that part is quite cryptic to me.
--
nicolas paris
Hi
solr doc [1] says it's only compatible with hdfs 2.x
is that true ?
[1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html
--
nicolas
.
>
> Kevin Risden
>
>
> On Thu, May 2, 2019 at 9:32 AM Nicolas Paris
> wrote:
>
> > Hi
> >
> > solr doc [1] says it's only compatible with hdfs 2.x
> > is that true ?
> >
> >
> > [1]: http://lucene.apache.org/solr/guide/7_7/running-solr-on-hdfs.html
> >
> > --
> > nicolas
> >
--
nicolas
Hi
I am looking for a way to speed up the update of documents.
In my context, the update replaces one of the many existing indexed
fields and keeps the others as is.
Right now, I am building the whole document, and replacing the existing
one by id.
I am wondering if **atomic update feature** woul
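For reference, a hedged sketch of what an atomic update payload looks like (the field names are made up): only the changed field is sent, with a "set" modifier; Solr rebuilds the rest of the document internally, which is why the other fields must be stored or have docValues.

```python
import json

# Atomic update: replace one field of an existing document by id,
# leaving the other fields untouched.
doc_update = {"id": "doc-42", "title_txt": {"set": "new title"}}
payload = json.dumps([doc_update])
print(payload)
```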
Hi
I have several large collections that cannot fit in a standalone solr
instance. They are split over multiple shards in solr-cloud mode.
Those collections are supposed to be joined to another collection to
retrieve a subset. Because I am using distributed collections, I am not
able to use the so
a of that shard must be co-located with every
> replica of the “to” collection.
>
> Have you looked at streaming and “streaming expressions"? It does not have
> the same problem, although it does have its own limitations.
>
> Best,
> Erick
>
> > On Oct 15, 2019,
is 12M
or 1 document in size. So the performance of the join looks correlated
to the size of the joined collection and not to the kind of filter
applied to it.
I will explore streaming expressions.
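A hedged sketch of what such a streaming expression could look like (the collection and field names are assumptions, not taken from this thread; both sides must be sorted on the join key, and /export is typical for full-result streams):

```python
# innerJoin over two collections on a shared "id" key; each inner
# search streams tuples sorted on that key.
expr = (
    'innerJoin('
    'search(big_collection, q="*:*", fl="id,txt", sort="id asc", qt="/export"),'
    'search(other_collection, q="*:*", fl="id,tag", sort="id asc", qt="/export"),'
    'on="id")'
)
print(expr)
```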
On Wed, Oct 16, 2019 at 08:00:43AM +0200, Nicolas Paris wrote:
> > You can certainly replicate the
ing score=none as a local param. Turns another algorithm dragging
> by from side join.
>
> On Wed, Oct 16, 2019 at 11:37 AM Nicolas Paris
> wrote:
>
> > Sadly, the join performances are poor.
> > The joined collection is 12M documents, and the performances are 6k ms
&
Hi community,
Any advice to speed up updates?
Is there any advice on commit, memory, docvalues, stored fields, or any
tips to speed things up?
Thanks
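One knob usually involved is commit policy. A hedged solrconfig.xml sketch (the values are illustrative assumptions to tune, not recommendations from this thread):

```xml
<!-- Hard autoCommit with openSearcher=false bounds the transaction log
     without the cost of opening a new searcher on every commit; a longer
     soft commit interval controls when updates become visible. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>300000</maxTime>
</autoSoftCommit>
```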
On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> Hi
>
> I am looking for a way to faster the update of documents.
>
>
instances, sharding, replication,
> commit timing etc.
>
> > Am 19.10.2019 um 21:52 schrieb Nicolas Paris :
> >
> > Hi community,
> >
> > Any advice to speed-up updates ?
> > Is there any advice on commit, memory, docvalues, stored or any tips to
>
s, Merge Policies)? We, at
> Auto-Suggest, also do atomic updates daily and specifically changing merge
> factor gave us a boost of ~4x during indexing. At current configuration,
> our core atomically updates ~423 documents per second.
>
> On Sun, 20 Oct 2019 at 02:07, Nicolas Paris
>
for indexing that converges to 60 million unique
> documents after atomic updates (full indexing).
>
>
>
> > Would you say atomical update is faster than regular replacement of
> > documents?
>
>
> No, I don't say that. Either of the two configs (autoCommit, Merg
dates might be faster.
The documents are stored within parquet files without any processing
needed. In this case, the atomic update is not likely to speed things up.
Thanks
On Wed, Oct 23, 2019 at 07:49:44AM -0600, Shawn Heisey wrote:
> On 10/22/2019 1:12 PM, Nicolas Paris wrote:
> > &
what is your current performance?
>
> Once this is clear further architecture aspects can be derived, such as
> number of spark executors, number of Solr instances, sharding, replication,
> commit timing etc.
>
> > Am 19.10.2019 um 21:52 schrieb Nicolas Paris :
> >
Also, we are using the Stanford POS tagger for French. The processing
time is mitigated by the spark-corenlp package, which distributes the
process over multiple nodes.
Also, I am interested in the way you use POS information within solr
queries, or solr fields.
Thanks,
On Fri, Oct 25, 2019 at 10:42:43A
, and also a query-side POS tagger (must be fast).
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
> On 10/25/19, 11:57 AM, "Nicolas Paris" wrote:
>
> Also we are using stanford POS tagger for french. The p
solr/guide/7_3/language-analysis.html#opennlp-part-of-speech-filter
On Fri, Oct 25, 2019 at 06:25:36PM +0200, Nicolas Paris wrote:
> > Do you use the POS tagger at query time, or just at index time?
>
> I have the POS tagger pipeline ready but nothing done yet on the solr
> part. Rig
> If you are someone who wishes the PDF would continue, please share your
> feedback.
I have not particularly explored the documentation format, only the
content. However, here are my thoughts on this:
The PDF version of the solr documentation has two advantages:
1. readable offline
2. makes searching easier than
Hello,
I am having trouble with basic auth on a solrcloud instance. When the
collection is only one shard, there is no problem. When the collection
has multiple shards, there is no problem until I send multiple queries
concurrently: I get 401 errors asking for credentials for concurrent
queries.
I
2 - both of those
> bugs are fixed in that version.
>
> Hope that helps,
>
> Jason
>
>
> On Mon, Nov 18, 2019 at 8:26 AM Nicolas Paris
> wrote:
> >
> > Hello,
> >
> > I am having trouble with basic auth on a solrcloud instance. When the
>
Hi Mark,
Have you shared with the community all the weaknesses of solrcloud you
have in mind, and your advice to overcome them?
Apparently you wrote most of that code, and your feedback would be
helpful for the community.
Regards
On Sat, Nov 30, 2019 at 09:31:34PM -0600, Mark Miller wrote:
> I’d also
Hi
From my understanding, copy fields create new indexes from the
copied fields.
From my tests, I copied 1k textual fields into _text_ with copyFields.
As a result there is no increase in the size of the collection. All the
source fields are indexed and stored. The _text_ field is indexed bu
On Tue, Dec 24, 2019 at 10:59:03AM -0700, Shawn Heisey wrote:
> On 12/24/2019 10:45 AM, Nicolas Paris wrote:
> > From my understanding, copy fields creates an new indexes from the
> > copied fields.
> > From my tests, I copied 1k textual fields into _text_ with copyFields.
&g
e same ! (while the _text_ field is
working correctly)
On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> > Do you mean "copy fields" is only an action of changing the schema ?
> > I was thinking it was adding a
ith/without the _text_ field
>
> > On Dec 25, 2019, at 3:07 AM, Nicolas Paris wrote:
> >
> >
> >>
> >> If you are redoing the indexing after changing the schema and
> >> reloading/restarting, then you can ignore me.
> >
> > I am s
Anyway, that's good news: copy field does not increase index size in
some circumstances:
- the copied fields and the target field share the same datatype
- the target field is not stored
This was tested on text fields.
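A hedged schema sketch matching those circumstances (the dynamic-field pattern and the type name are assumptions, not taken from this thread):

```xml
<!-- Many source fields copied into a catch-all _text_ field that is
     indexed but not stored; both sides share the same text type. -->
<dynamicField name="section_*" type="text_general" indexed="true" stored="true"/>
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="section_*" dest="_text_"/>
```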
On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
>
> On We
parate part of the relevant files (.tim, .pos,
> etc). Term frequencies are kept on a _per field_ basis for instance.
>
> So this pretty much has to be small sample size or other measurement error.
>
> Best,
> Erick
>
> > On Dec 26, 2019, at 9:27 AM, Nicolas Paris wrote:
behavior is perfect for
my needs.
On Fri, Dec 27, 2019 at 05:28:25PM -0700, Shawn Heisey wrote:
> On 12/26/2019 1:21 PM, Nicolas Paris wrote:
> > Below a part of the managed-schema. There is 1k section* fields. The
> > second experience, I removed the copyField, droped the collect
> We have implemented the content ingestion and processing pipelines already
> in python and SPARK, so most of the data will be pushed in using APIs.
I use the spark-solr library in production and have looked at the ES
equivalent; the solr connector looks much more advanced for both
loading and