Hello,
I'm using solr 4.4. I have a solr core with a schema defining a bunch of
different fields, and among them, a date field:
- date: indexed and stored // the date used at search time
In practice it's a TrieDateField, but I don't think that's relevant to the
issue.
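(For reference, a declaration along these lines; names are mine :
<field name="date" type="tdate" indexed="true" stored="true"/>
with <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"/>.)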
It also has a multi
Hello Andrea,
I think you face a rather common issue involving keyword tokenization and query
parsing in Lucene:
The query parser splits the input query on white spaces, and then each token is
analysed according to your configuration.
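For example, against a keyword-tokenized field (names made up) :
q=city:New York   ->  parsed as city:New plus York on the default field
q=city:"New York" ->  a single query on city, analysed as a whole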
So those queries with a whitespace won't behave as expected be
Hello Adrien,
Looking quickly at your schema, I suspect that the suggestions field isn't
populated, so the suggester dictionary is empty.
How is input sent to that field? Providing a few sample documents you are
indexing could help us understand what is going on.
If you intended to copy content
Hello,
Did you try q=-geodata:[* TO *] ? (Note the '-' (minus))
This reads as "documents without any value for field named geodata".
Also, if you plan to use this intensively, you'd better declare a boolean
field telling whether geodata is set, and give it a value on each doc,
because the -field_nam
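A sketch of that alternative (field name made up) :
<field name="has_geodata" type="boolean" indexed="true" stored="false"/>
and, at query time :
fq=has_geodata:false
A plain boolean filter like this is cheap and caches well.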
Hello,
I don't think you have that many tuning possibilities using only the
schema.xml file.
You will have to write some custom Java code (subclasses of
UpdateRequestProcessor and UpdateRequestProcessorFactory), build a Java jar
containing your custom code, and put that jar in one of the paths declared
to do that...
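Here's a minimal sketch of such a processor (class and field names are
mine, just to show the plumbing) :

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class MyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // Adjust the document here, before it reaches the index.
        doc.setField("processed_b", true);
        super.processAdd(cmd);
      }
    };
  }
}

It then gets declared in an updateRequestProcessorChain in solrconfig.xml.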
--
Tanguy
2012/9/26 韦震宇
> Hi, Tanguy
> I would do as your suggestion.
> Best Regards!
> Monton
> - Original Message -
> From: "Tanguy Moal"
> To: ;
> Sent: Tuesday, September 25, 2012 11:05 PM
> Subject: Re: How can I create about
That is an interesting issue...
I was wondering if relying on dynamic fields could be an option...
Something like :
* :
* customer : string
* *_field_a1 : type_a
* *_field_a2 : type_a
* *_field_b1 : type_b
* ...
And then prefix each field with the customer name, so for customer1, indexed
documents
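In schema.xml terms, a sketch (reusing the type names above) :
<dynamicField name="*_field_a1" type="type_a" indexed="true" stored="true"/>
<dynamicField name="*_field_a2" type="type_a" indexed="true" stored="true"/>
<dynamicField name="*_field_b1" type="type_b" indexed="true" stored="true"/>
Documents for customer1 would then carry customer1_field_a1,
customer1_field_a2, and so on.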
Hi,
Did you try issuing a query like "+Yoga Teacher" (without the
double quotes)?
See http://lucene.apache.org/core/3_6_1/queryparsersyntax.html#Boolean
operators for more details on Lucene's query parser syntax.
Hope this helps,
--
Tanguy
2012/9/13 veena rani
> Hi ,
>
> In solr, If i m
Hi Peter,
Yes if you want to do complex things in suggest mode, you'd better rely on
the SearchComponent...
For example, this blog post is a good read
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ ,
if you have complex requirements on the searched fields.
(Although y
If your interest is focusing on the real textual content of a web page, you
could try this : JReadability (https://github.com/ifesdjeen/jReadability ,
Apache 2.0 license), which wraps JSoup (as Lance suggested) and applies a
set of predefined rules to scrap crap (nav, headers, footers, ...) off of
I don't think it's possible to combine pivots with facet queries, nor with
facet ranges (or facet dates); someone please correct me if I'm wrong...
I think only "standard" fields are "pivotable" :)
That said, if you always use the same ranges for your DateTime field, you
*could* have a "string" ver
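For instance (field name made up), you could feed a month_s string field
with values like "2012-05" at indexing time, and then pivot on it :
facet.pivot=month_s,otherField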
You are correct, it doesn't work :
Queries like :
http://localhost:8983/solr/collection1/select?q=*:*&facet=on&facet.pivot={!ex=a_tag}field1,field2&facet.limit=5&rows=0&fq={!tag=a_tag}field3:"filter";
result in the following response :
  status : 400
  QTime : 1
  facet : on
  q : *:*
  facet.limit : 5
  facet.pivot : {!ex=a_tag}field1,field
Hello Kiran,
I think you can try turning grouping on (group=true) and asking Solr to
group on the "Category" field.
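Something like (values made up) :
...&group=true&group.field=Category&group.limit=10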
Nevertheless, this will *not* ensure that groups are returned in facet
count order, nor will it ensure a mincount per group.
Hope this helps,
--
Tanguy
201
code into a jar and make that jar
accessible to solr, see http://wiki.apache.org/solr/SolrPlugins for how to
plug your custom code into Solr.
The main drawback of that approach is that it will be activated for all
queries and all fields...
--
Tanguy
2012/8/7 Tanguy Moal
> Maybe it wasn
Maybe it wasn't clear in my response, sorry!
You can use a different field for searching (qf parameter for dismax) than
the one for highlighting (hl.fl) :
q="a phrase
query"&qf="text_without_termFreqs"&hl=on&hl.fl="text_with_termFreqs".
Scoring will be based on qf's fields only (i.e. those without term
frequencies).
Dear Alexander,
A few questions on stemming support in Solr 3.6.1:
> - Can you do non-English stemming?
>
With solr, many languages are supported, see
http://wiki.apache.org/solr/LanguageAnalysis
- We're using solr.PorterStemFilterFactory on the "text_en" field type. We
> will index a ton of PD
I think you could use a field without the term frequencies for searching,
that will solve your relevancy issues.
You can then have the exact same content in another field (using a
copyField directive in your schema), with term frequencies and positions
turned on, and use this particular field for highlighting.
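A sketch of the schema side (field names as above, attributes to adapt) :
<field name="text_without_termFreqs" type="text" indexed="true"
       stored="false" omitTermFreqAndPositions="true"/>
<field name="text_with_termFreqs" type="text" indexed="true"
       stored="true"/>
<copyField source="text_with_termFreqs" dest="text_without_termFreqs"/>
(The highlighted field has to be stored.)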
Hi,
I've not tested it myself, but I think you can take advantage of Solr
4's pseudo-fields, by adding something like :
&fl=*,geodist(),score
I think you could even pass several geodist() calls with different
parameters if you want to have the distance wrt several POIs ^-^
SOLR 4 only.
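Something like (field and coordinates made up) :
&fl=*,score,d1:geodist(latlong,48.85,2.35),d2:geodist(latlong,45.76,4.83)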
--
Ta
> problem.
>
> On Mon, Jun 11, 2012 at 10:55 AM, Tanguy Moal
> wrote:
> > There is definitely something interesting to do around geohashes.
> >
> > I'm wondering how one could map the N by N requested tiles to a range
> > of geohashes. (Where the ga
There is definitely something interesting to do around geohashes.
I'm wondering how one could map the N by N requested tiles to a range
of geohashes (where the gap would be a function of N).
What I mean is that I don't know if a bijective function exists
between tiles and geohash rang
Hello,
I think you have to issue a phrase query in such a case because otherwise
each "token" is searched independently in the merchant field : the query
parser splits the query on spaces!
Check the difference between debug outputs when you search for "Jones New
York"; you'd get what you expected.
Hello Elisabeth,
Wouldn't it be simpler to have a custom component in the front-end to your
search server that would transform a query like <<hotel de ville paris>>
into <<"hotel de ville" paris>> (i.e. turning each occurrence of the
sequence "hotel de ville" into a phrase query)?
Concerning protections in
It all depends on the frequency at which you refresh your data, on your
deployment (master/slave setup), ...
Many things need to be taken into account!
Did you face any performance issue while building your index?
If you didn't, rebuilding it shouldn't be more problematic.
--
Tanguy
2012/5/22 So
Hello,
Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
be a combination of the master product id and the child product id ?
That way, whenever you update your master product DB entry, you simply need
to reindex documents depending on the master product entry.
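A sketch (id scheme made up) : with <uniqueKey>id</uniqueKey> in the
schema, give child entries an id like masterId + "_" + childId (e.g.
"master42_child7"), so refreshing master42 boils down to re-pushing
every document whose id starts with "master42_".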
You can ev
> FrenchLightStemmer is the only one of them that does this arbitrary
> duplicate sequence compression. (FinnishLightStemmer does repetition
> compression too, but restricts the operation to chars 'k', 'p', and 't'.)
>
> Thanks,
> Steve
>
>
Any idea someone ?
I think this is important since this could produce weird results on
collections with numbers mixed in text.
From my understanding, there are a few options to address the issue :
1) Make *LightStemmer token-type aware so it doesn't try to stem things
that are not text (alpha/alp
Hello,
From the response you pasted here, it looks like the field
"itemNoExactMatchStr"
never matched.
Can you try matching in that field only and ensure you have matches? Given
the ^30 boost, you should have high scores on this field...
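Something like (value made up) :
q=itemNoExactMatchStr:"12345"&debugQuery=on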
Hope this helps,
--
Tanguy
2012/5/15 geeky2
> Hello a
Dear list,
I recently figured out that the FrenchLightStemFilterFactory performs
some interesting, undocumented normalization on tokens...
There's a norm() helper called for each produced token that performs,
amongst other things, deletions on repeated characters... Only for
tokens with mor
Hello Franck,
I've had the same issue in the past.
I addressed that by adding a random value to each document.
I use this value in the "bf" parameter, so that the random value alters
each document's score by a varying amount.
This results in a natural shuffling of documents which had the same
score
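A sketch (field name made up) : fill a rand_f float field with a random
number at feed time, then add something like
&defType=dismax&bf=product(rand_f,10)
and tune the multiplier so it only shuffles documents with tied scores.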
I think you're using PHP to request solr.
You can ask solr to respond in several different formats (xml, json,
php, ...), see http://wiki.apache.org/solr/QueryResponseWriter .
Depending on how you connect to solr from php, you may want to use
html_entity_decode before using mb_substr.
--
Ta
That's because of the space.
If you want to include the space in the search query (performing exact
match), then use double quotes around your search terms :
q=multiplex_name:"Agent Vinod"
Online documentation :
* http://wiki.apache.org/solr/SolrQuerySyntax
*
http://lucene.apache.org/core/ol
Hello Roberto,
Exact match needs extra " (double-quotes) surrounding the exact
thing you want to query in the id field.
Give a try to a query like this :
id:"http://127.0.0.1:/my/personal/testuser/Personal
Documents/cal9.pdf"
See this wiki page :
Hi all,
I think that depending on the language detector implementation, things may
vary...
For Tika, it performs better with longer inputs than shorter ones (as it
seems to depend on the probabilistic distribution of ngrams -- of
different sizes -- to perform distance computations with precomput
Hi Gian Marco,
I don't know if it's possible to exploit documents' boost values from
function queries (see http://wiki.apache.org/solr/FunctionQuery), but if
you store your boost in a searchable numeric field, you could either do :
q=*:* AND _val_:"your_boost_field"
if you're using default
How are you sending documents to solr ?
If you push solr input documents via HTTP (which is what SolrJ does),
you could increase CPU consumption (and therefore reduce indexing time)
by sending your update requests asynchronously, using multiple updating
threads, to your single solr core.
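For illustration, a minimal SolrJ sketch (URL, queue size and thread
count are made up; tune them to your hardware) :

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelFeeder {
  public static void main(String[] args) throws Exception {
    // 100-document internal queue, drained by 4 concurrent threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("name", "document " + i);
      server.add(doc); // returns quickly while the queue has room
    }
    server.commit();
  }
}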
Some
Dear ML,
I'm doing some development relying on the spatial capabilities of Solr.
I'm using Solr 3.5, have been reading
http://wiki.apache.org/solr/SpatialSearch#Spatial_Query_Parameters and
have the basic behaviours I wanted working.
I use geofilt on a latlong field, with geodist() in the b
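For reference, the kind of request involved (field and values made up) :
q=*:*&sfield=latlong&pt=48.85,2.35&d=5&fq={!geofilt}&sort=geodist() asc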
get the result set you expected.
You might then want to sort on that field, and this time my previous
answer could help ;-).
Sorry for confusing you!
On 04/01/2012 09:32, Tanguy Moal wrote:
Hello,
If the number stored is not in a string field, you will need solr >=
3.5 to perf
Hello,
If the number stored is not in a string field, you will need solr >= 3.5
to perform what you want.
Since solr 3.5 it's possible to set the attribute sortMissingLast or
sortMissingFirst to true, within the field definition (an example is
available in the schema.xml provided with solr 3
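A sketch (names made up) :
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
           sortMissingLast="true"/>
A sort=price_d asc on a field of that type then puts documents without a
value at the end of the results.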
Hello Alexander,
I don't know much about your requirements in terms of size and
performances, but I've had a similar use case and found a pretty simple
workaround.
If your duplicate rate is not too high, you can have the
SignatureProcessor generate fingerprints of documents (you already did
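For reference, the dedupe chain from
http://wiki.apache.org/solr/Deduplication looks like this (the "fields"
list is the wiki's example, to adapt) :
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>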
Dear list,
I'd like to bounce back on that issue...
IMHO, configuration parsing could be a little stricter... At least,
what counts as a "severe" configuration error could be user-defined.
Let me give some examples that are common errors and that don't trigger
the "abortOnConfigurationError"
On 21/12/2011 23:49, Koji Sekiguchi wrote:
(11/12/21 22:28), Tanguy Moal wrote:
Dear all,
[...]
I tried using both legacy highlighter and FVH but the same issue occurs.
The issue only triggers when relying on hl.q.
Thank you very much for any help,
--
Tanguy
Tanguy,
Thank you for
Hello,
I think that the positionIncrementGap attribute of your field has to be
changed to 0 (instead of the default 100).
(See
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html
)
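i.e. something like :
<fieldType name="text" class="solr.TextField" positionIncrementGap="0">
  ...
</fieldType>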
Hope this helps,
--
Tanguy
On 21/12/2011 15:39, meghana wrote:
Hi all,
i
Dear all,
I'm trying to get highlighting working, and I'm almost done, but it's not
perfect yet...
Basically my documents have a title and a description.
I have two kinds of text fields :
text :
<filter class="solr.WordDelimiterFilterFactory"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange
Hello,
Usually, when such an error occurs, there are good hints about what's
wrong with your new configuration in the Solr logs.
Depending on how you setup your solr instance and configured logging for
solr (http://wiki.apache.org/solr/SolrLogging), log files may be located
at different places.
Hello,
Quoting http://wiki.apache.org/solr/SolrCaching#filterCache :
The filter cache stores the results of any filter queries ("fq"
parameters) that Solr is explicitly asked to execute. (Each filter is
executed and cached separately. When it's time to use them to limit
the number of results re
Dear list,
I've experienced a weird (unexpected?) behaviour concerning core reload
on a master instance.
My setup :
master/slave on separate hosts.
On the master, I update the schema.xml file, adding a dynamic field of
type random sort field.
I reload the master using core admin.
The new f
Hi again,
Since you have a custom high availability solution over your solr
instances, I can't help much I guess... :-)
I usually rely on master/slave replication to separate index build and
index search processes.
The fact is that resources consumption at build time and search time are
not
Hi,
If you only need to sum over "displayed" results, go with the
post-processing of hits solution, that's fast and easy.
If you sum over the whole data set (i.e. your sum is not query-dependent),
have it computed at indexing time, depending on your
indexing workflow.
Otherwise, (sum over the
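The StatsComponent can compute such sums over a query's whole result
set; a sketch, with a made-up field name :
...&stats=true&stats.field=price_f
(the response includes sum, min, max, mean, stddev and count).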
overwriteDupes to
false and set the signature key to be the id. That way solr
will manage updates?
from the wiki
http://wiki.apache.org/solr/Deduplication
HTH
Lee
On 30 May 2011 08:32, Tanguy Moal wrote:
Hello,
Sorry for re-posting this but it seems my message got lost in the mailin
yone a few hints on how to optimize the handling of index time
deduplication ?
More details on my setup and the state of my understanding are in my
previous message here-after.
Thank you very much in advance.
Regards,
Tanguy
On 05/25/11 15:35, Tanguy Moal wrote:
Dear list,
I'm posting
Hi Romi,
A simple way to do so is to define in your schema.xml the union of all
the columns you need plus a "type" field to distinguish your entities.
eg, In your DB
table1 :
- col1 : varchar
- col2 : int
- col3 : float
table2 :
- col1 : int
- col2 : varchar
- col3 : int
- col4 : varchar
in your index, something like this sketch (field names are mine) :
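<field name="type"   type="string" indexed="true" stored="true"/>
<field name="col1_s" type="string" indexed="true" stored="true"/>
<field name="col2_i" type="int"    indexed="true" stored="true"/>
... one field per distinct column, and then filter per entity with
fq=type:table1 at query time.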
Dear list,
I'm posting here after some unsuccessful investigations.
In my setup I push documents to Solr using the StreamingUpdateSolrServer.
I'm sending a comfortable initial amount of documents (~250M) and wished
to perform overwriting of duplicated documents at index time, during the
update
Hello,
Have you tried reading :
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
From that page I would try something like :
http://host:port/solr/select?q=sony&sort=min(min(priceCash,priceCreditCard),priceCoupon)+asc&rows=10&indent=on&debugQuery=on
Is that of any help ?
--
Tanguy
Hello Ravish, Erick,
I'm facing the same issue with solr-trunk (as of r1071282)
- Field configuration :
a field type with positionIncrementGap="100"
- Schema configuration :
In my test index, I have documents with sparse values : Some documents
may or may not have a value for f1, f2 and/or f3
The
Hello,
You could try taking advantage of Solr's facetization feature : provided
that you have the amount stored in the amount field and the currency stored
in the currency field, try the following request :
http://host:port
/solr/select?q=YOUR_QUERY&stats=on&stats.field=amount&f.amount.stats.facet
Hi Dennis,
Not particular to the client you use (solr-php-client) for sending
documents, think of update as an overwrite.
This means that if you update a particular document, the previous
version indexed is lost.
Therefore, when updating a document, make sure that all the fields to
be indexed and
To do so, you have several possibilities, I don't know if there is a best one.
It depends pretty much on the format of the input file(s), your
affinities with a given programming language, some libraries you might
need and the time you're ready to spend on this task.
Consider having a look at SolrJ
Satya,
In fact the highlighter will select the relevant part of the whole
text and return it with the matched terms highlighted.
If you do so for a whole book, you will face the issue spotted by Dave
(too long text).
To address that issue, you have the possibility to split your book into
chapters,
Hi Satya,
I think what you're looking for is called "highlighting" in the sense
of "highlighting" the query terms in their matching context.
You could start by googling "solr highlight", surely the first results
will make sense.
Solr's wiki results are usually a good entry point :
http://wiki.apa
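To get started, the basic request parameters look like this (field name
made up) :
q=foo&hl=true&hl.fl=content&hl.snippets=3&hl.fragsize=120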
Kind of : their suggestions are based on users' queries, with some filtering.
You can have a little read there :
http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=106230
They perform "little" filtering to remove offending content such as
"hate speech, violence and pornography" (quot
this would be to see if you can index
> your content in a way to avoid these expensive queries. But this is
> just a suggestion, what you are doing should still work fine.
>
> On Fri, Dec 3, 2010 at 6:56 AM, Robert Muir wrote:
>> On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Mo
rskog :
> Please add a JIRA issue requesting this. A bunch of things are not
> supported for functions: returning as a field value, for example.
>
> On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal wrote:
>> Dear solr-user folks,
>>
>> I would like to use the stats modu
Dear solr-user folks,
I would like to use the stats module to perform very basic statistics
(mean, min and max) which is actually working just fine.
Nevertheless, I found a little limitation that bothers me a tiny bit :
how to perform the exact same statistics, but on the result of a
function que