Wouldn’t it be easier if the site ensured that only terms are submitted
to the actual search? In an app I worked on some time ago, the default behavior
of the JavaScript component used for autocompletion was to first autocomplete
the term in the input and then submit the query against the backen
The whole idea behind Solr is to solve the problem that you just explained. In
particular, what you need is to define the title field as a solr.TextField and
then define a tokenizer. The tokenizer essentially transforms the initial
text into tokens. Solr has several tokenizers, each with its s
How would you measure which snippet is the best?
On Nov 9, 2014, at 1:59 PM, SolrUser1543 wrote:
> Let's say that for some query there are several results, with several hits
> for each one, which are shown in the highlight section of the response.
>
> Is it possible to select only one best hit for e
When you fire a query against Solr with wt=csv, the response coming from
Solr is *already* in CSV. The CSVResponseWriter is responsible for translating
SolrDocument instances into CSV on the server side, so I don’t see any
reason to do it yourself; Solr already does the heavy lifting
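
A minimal sketch of consuming such a response on the client side, in Python.
Only the wt=csv parameter comes from the message above; the core URL, the
field names and the sample payload are assumptions made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical URL of a Solr core; adjust to your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_csv_query_url(query, fields):
    """Build a select URL that asks Solr to answer in CSV (wt=csv)."""
    params = {"q": query, "fl": ",".join(fields), "wt": "csv"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def parse_csv_response(body):
    """Turn the CSV text returned by Solr into one dict per document."""
    return list(csv.DictReader(io.StringIO(body)))


# Made-up payload standing in for a real HTTP response body.
sample_body = "id,title\n1,Solr in Action\n2,Apache Solr Beginner's Guide\n"
rows = parse_csv_response(sample_body)
```

The point is that the client only has to read the CSV; no conversion step is
needed on its side.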
If you’re talking about a generic web crawl you could use something like Nutch
[1]. Keep in mind that this is a full web crawler and it does a pretty good job.
I’ve been using it for more than 2 years now and I’m very happy, although
I don’t crawl just a couple of sites but a wider spectrum
I see you’re defining a default value for “rows”. This could be overridden on
the request, and requesting a lot of documents from Solr can stress your
server/cluster, provided of course that the client in question has that many
documents. If this is a fixed value and the clients can’t request more docum
Don’t worry, the way Hoss explained it is indeed the way I know it works,
but the example provided in the book piqued my curiosity, hence the question
in this thread.
Regards,
On Sep 30, 2014, at 5:59 PM, Timothy Potter wrote:
> Indeed - Hoss is correct ... it's a problem with the example
Perhaps instead of the suggester component you could use the EdgeNGramFilter
and provide partial matches, so you will be able to configure a custom request
handler that will “suggest” terms or phrases for you. I’m using this approach
to provide query suggestions, of course I’m indexing the quer
Krupansky wrote:
> I am not aware of any such feature! That doesn't mean it doesn't exist, but I
> don't recall seeing it in the Solr source code.
>
> -- Jack Krupansky
>
> -Original Message- From: Jorge Luis Betancourt Gonzalez
> Sent: Wednesday, Septem
I’ve done something similar to this using the EdgeNGram filter, not the
spellchecker component. I don’t know if this fits your requirements:
The relevant portion of my fieldType config:
class="solr.SpellCheckComponent">
>
>
. See
> solrconfig.xml:
>
>
>
>
>explicit
>10
>text
>
> ...
>
> -- Jack Krupansky
>
> -Original Message- From: Jorge Luis Betancourt Gonzalez
> Sent: Tuesday, September 23, 2014 11:02 AM
> To: solr-user@lucene.apache.org
> Subject: Ch
Hi:
I’m trying to change the default configuration for the query component of a
SearchHandler. Basically I want to set a default value for the rows parameter
and have this value shared by all my SearchHandlers. As stated in the
solrconfig.xml comments, this could be accomplished by redeclaring
You could create a bunch of dynamic fields (according to your needs),
basically one dynamic field for each type of data (and several
combinations), and then create a small wrapper around SolrJ that will
wrap the patterns defined in your schema.xml in a more understandab
Which crawler are you using?
On Sep 18, 2014, at 10:14 AM, keeblerh wrote:
> eShard wrote
>> Good afternoon,
>> I'm using solr 4.0 Final
>> I need movies "hidden" in zip files that need to be excluded from the
>> index.
>> I can't filter movies on the crawler because then I would have to exclude
What are you developing? A custom search component? An update processor? A
different class for one of the zillion moving parts of Solr?
If you have access to a SolrCore instance you could use it to get access of,
essentially using the SolrCore instance specific to the current core will cause
the l
In one of his talks, Trey Grainger (author of Solr in Action) touches on how
CareerBuilder deals with multilingual content using payloads. It’s a little
more work but I think it would pay off.
On Sep 8, 2014, at 7:58 AM, Jack Krupansky wrote:
> You also need to take a stance as to whether
Perhaps what you’re trying to do could be addressed by using the
EdgeNGramFilterFactory filter? For query suggestions I’m using a very similar
approach, this is an extract of the configuration I’m using:
Basically this allows you to get partial matches from any part of the string,
let’s s
Hi all:
We have a small installation of Solr 3.6 on our hands. Right now we have 3
physical servers (1 master and 2 slaves); the ingestion process is done on the
master, which replicates through Solr's internal mechanism to the slaves, which
handle all the queries. We are trying to update to Solr 4
query string. So I'd suggest that if the website has
> the appropriate and good data it should come on the first page, so it's better
> to come on the first page rather than finding the position.
>
> With Regards
> Aman Tandon
>
>
> On Tue, Jun 24, 2014 at 10:35 AM,
; way to do it. But you only need to fetch the URL field. You can ignore
> everything else.
>
> wunder
>
> On Jun 23, 2014, at 9:32 PM, Jorge Luis Betancourt Gonzalez
> wrote:
>
>> Basically given a few search terms (query) the idea is to know given one or
>>
Aman Tandon
>
>
> On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
>
>> I’m using Solr for an analytic use case, one of the requirements is
>> basically given a search query get the position of the first hit. I’m
>
I’m using Solr for an analytic use case. One of the requirements is, basically,
given a search query, to get the position of the first hit. I’m indexing web
pages, so given a search criteria the client wants to know the position (first
occurrence) of his webpage in the result set (if it appears at al
I’d certainly go for the 2nd option. Depending on what you need, you won’t have
to modify Solr itself but extend it using different plugins.
You’ll need to write different components depending on your specific
requirements. I definitely recommend the talks from Trey Grainger, f
Is there some workaround in the Solr ecosystem to get something similar to the
percolator feature offered by Elasticsearch?
Greetings!
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing
new Solr plugins; perhaps it would be a good place to start. There is also a
page about this in the wiki, but it’s a light introduction. I’ve found that
a very good starting point is to just browse through the code o
A spreadsheet has been mentioned previously on the list. Taking into account
that you already have documents in an index, you could extract the needed
information from your index and feed it into the spreadsheet, and it will
probably give you a rough approximation of the hardware you’ll be needing
I’ve some experience using Solarium and it has been great so far. In particular
we use the NelmioSolariumBundle to integrate with Symfony2.
Greetings!
On Jan 28, 2014, at 1:54 PM, Felipe Dantas de Souza Paiva
wrote:
> Hi Folks,
>
> I would like to know what is the best way to integrate PHP an
Q1: Nutch doesn’t only handle the parsing of HTML files; it also uses Hadoop to
achieve large-scale crawling using multiple nodes. It fetches the content of
the HTML file, and yes, it also parses its content.
Q2: In our case we use sold to crawl some website, store the content in one
“main” solr core.
I believe that you are looking for something similar to the percolator feature
present in Elasticsearch. I remember something about a Solr implementation
being discussed here some time ago. Does anyone know if there has been any
progress in this area?
On Jan 27, 2014, at 8:18 AM, Furkan KAMACI w
If I remember correctly, Trey Grainger explained in one of his talks
a few techniques that could be of use. If the equivalency is not dynamic
you could just use synonyms. Otherwise some kind of offline processing should
be used to compute the similarity between your queries (given
Happy new year!
I’ve developed some custom update request processors to accomplish some custom
logic needed in some use cases. I’m trying to write tests for these processors,
but I’d like to test them in a way very similar to how the built-in processors
are tested in the Solr source code. Is there any
Is it possible to export the doc into markdown?
- Original Message -
From: "Chris Hostetter"
To: solr-user@lucene.apache.org
Sent: Monday, December 9, 2013 14:00:34
Subject: Re: ANNOUNCE: Apache Solr Reference Guide 4.6
: Can we please give some thought to producing these manuals
Hi:
I'm using Solr 3.6 with the dismax query parser. I've found that docs that
don't have all the query terms get ranked above others that contain all the
terms in the search query. Using debugQuery I could see that most of the score
in these cases comes from the coord(q,d) factor. Is there
+1 on this.
- Original Message -
From: "Otis Gospodnetic"
To: solr-user@lucene.apache.org
Sent: Friday, December 6, 2013 9:35:25
Subject: Re: Introducing Luwak for high-performance stored Lucene queries
Hi Charlie,
Very nice - thanks!
I'd love to see a side-by-side comparison wi
I think that some experience in this area could be provided by Trey Grainger,
author of Solr in Action. I believe that some of his work at CareerBuilder
involves the creation of something (somehow) similar to what you're trying to
accomplish. I must say that I'm also interested in this topic, but
Perhaps what you want is a transparent proxy? You could use nginx, squid,
varnish, etc. We've been evaluating varnish as a possibility to run in front of
our Solr server and take advantage of the HTTP caching that varnish does so
well.
Greetings!
- Original Message -
From: "Markus Jelsma"
Hi everybody:
Is there any way of forcing a UTF-8 conversion on the queries that are written
to the log? I've deployed Solr in Tomcat 7. The file appears to be a UTF-8
file but I'm seeing this in the logs:
INFO: [] webapp=/solr path=/select
params={fl=*,score&start=0&q=disñemos+el+mundo&hl.
I'm seeing a strange behavior of the gap fragmenter on Solr 3.6. Right now this
is my configuration for the gap fragmenter:
150
This is the basic configuration, just tweaked the fragsize parameter to get
shorter fragments. The thing is that for 1 particula
Sorry, I forgot the link:
[1] - http://wiki.apache.org/solr/SolrRelevancyFAQ
- Original Message -
From: "Ing. Jorge Luis Betancourt Gonzalez"
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 13:34:03
Subject: Re: Auto Suggest - Time decay
For that core
For that core just use a boost factor as explained in [1]:
You could use a query like this to see (before making any change) how your
suggestions will be retrieved. In this case a query for "goog" has been made,
and recent documents will be boosted (an extra bonus will be given for the
newer docu
Are you using the suggester component, or a separate core? I've used a
separate core to store suggestions and order these suggestions (queries
performed on the frontend) using a time decay function, and it works great for
me.
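
To illustrate what a time decay function does to the ordering, here is a small
Python sketch. It mirrors the shape of Solr's recip(x,m,a,b) function query,
which computes a / (m*x + b); in Solr this would typically be written as
something like recip(ms(NOW,last_used),3.16e-11,1,1). The field name last_used
and the constants are assumptions for illustration, not the configuration
actually used in that core.

```python
from datetime import datetime, timedelta, timezone

# Roughly 1 / (number of milliseconds in one year).
INVERSE_MS_PER_YEAR = 3.16e-11


def recip(x, m, a, b):
    """Same shape as Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def time_decay_boost(last_used, now):
    """Return 1.0 for a brand-new suggestion and about 0.5 for a year-old one."""
    age_ms = (now - last_used).total_seconds() * 1000.0
    return recip(age_ms, INVERSE_MS_PER_YEAR, 1.0, 1.0)


now = datetime(2013, 10, 1, tzinfo=timezone.utc)
fresh = time_decay_boost(now, now)
year_old = time_decay_boost(now - timedelta(days=365), now)
```

Multiplying the relevance score by such a factor keeps older suggestions in the
results but pushes recently used ones towards the top.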
Regards,
- Original Message -
From: "SolrLover"
To: solr-u
users become a query, this
query should already be in the cache. These are just thoughts, but I hope they
could be useful to you.
Regards,
- Original Message -
From: "Ing. Jorge Luis Betancourt Gonzalez"
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 19:44
ith some arbitrary number?
On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:
> Great!! I haven't seen your message yet, perhaps you could create a PR to
that GitHub repository, so it will be in sync with current versions of
Solr.
I think you could use boosting queries: for group A you boost one category and
for group B some other category.
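
A minimal sketch of that idea in Python, assuming the (e)dismax bq parameter is
used for the boost. The group names, the category field, its values and the
boost weights are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical mapping: user group -> (field, preferred value, boost weight).
GROUP_BOOSTS = {
    "A": ("category", "sports", 5),
    "B": ("category", "politics", 5),
}


def build_params(user_query, group):
    """Build request parameters, adding a boost query for known groups."""
    params = [("q", user_query), ("defType", "edismax")]
    boost = GROUP_BOOSTS.get(group)
    if boost is not None:
        field, value, weight = boost
        # bq only influences ranking; it does not filter documents out.
        params.append(("bq", f"{field}:{value}^{weight}"))
    return params


query_string = urlencode(build_params("election results", "B"))
```

Users outside the known groups simply get the unboosted ranking.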
- Original Message -
From: "Snubbel"
To: solr-user@lucene.apache.org
Sent: Thursday, September 26, 2013 8:01:36
Subject: Sorting dependent on user preferences with Function
ame for
>> ""#{url_for_solr}" > src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js">
>>
>>
>> On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez <
>> jlbetanco...@uci.cu> wrote:
>>
>>> Try querying the cor
Sent: Wednesday, September 25, 2013 15:40:00
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)
Not yet but I do see the "$" not found in console.
On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:
> As
plementing Solr Suggester for Autocomplete (multiple columns)
That seems to work. I get back an xml containing a bunch of suggestions.
Can we agree that it's jquery that's the problem?
On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wro
Try querying the core where the data has been imported, something like:
http://localhost:8983/solr/suggestions/select?q=uc
In the previous URL, suggestions is the name I gave to the core, so this should
change. If you get results, then the problem could be the jquery dependency. I
don't remember
g Solr Suggester for Autocomplete (multiple columns)
A simple query through admin (*:*) confirms the data exists. The version
I'm working with is solr 4.4.0. The autocomplete manual refers to 3.x. I
wonder if this is the problem?
On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betanc
the expected response?
On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:
> I've used a separated core for storing suggestions, based on what I see
> in: https://github.com/cominvent/autocomplete. You can check the blog
>
I've used a separate core for storing suggestions, based on what I see in:
https://github.com/cominvent/autocomplete. You can check the blog post on
www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is
really flexible; on the downside it does not use the suggester compo
If query suggestion is what you are looking for, what we've done is store the
user queries in a separate core and pull the suggestions from there.
- Original Message -
From: "Brendan Grainger"
To: solr-user@lucene.apache.org
Sent: Thursday, June 13, 2013 19:43:03
Subject: Sugge
will turn your
single + into "+" , it will be considered as a token (rather than being
part of the query syntax) by the parser.
Providing you're using the edismax parser, it should be just fine for any
other queries, like '+ foo' , 'foo +', '++' ...
J.
d as it will turn your
single + into "+" , it will be considered as a token (rather than being
part of the query syntax) by the parser.
Providing you're using the edismax parser, it should be just fine for any
other queries, like '+ foo' , 'foo +', '++'
ed to escape that char in search terms.
Special chars are + - ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.
The %2B is just the url encoding, but it will still be a + for Solr, so just
put a \ in front of the chars I mentioned.
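
A small Python sketch of that escaping, covering exactly the characters listed
above. The function name is made up for illustration, and it is meant for user
supplied terms, not for queries that use the syntax on purpose.

```python
# The special characters listed in the message above.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/')


def escape_term(term):
    """Put a backslash in front of every special character in a search term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


escaped_plus = escape_term("+")
escaped_mixed = escape_term("C++ (beginner)")
```

The escaping happens before URL encoding, so a lone + is first turned into \+
and only then encoded for the request.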
Cheers,
Kai
On Apr 23, 2013, at 15:41, Jorge Luis Betancourt G
Hi!
Currently I'm working on a basic search engine. During some tests a problem
was detected: if a user searches for only the "+" or "-" term, or the "+"
string, it causes an exception in my application. The problem is caused by an
org.apache.l
two or three characters or so.
-- Jack Krupansky
-----Original Message-----
From: Jorge Luis Betancourt Gonzalez
Sent: Friday, March 29, 2013 10:34 PM
To: solr-user@lucene.apache.org
Subject: Getting better snippets in highlighting component
Hi all:
I'm building a document search platform, basica
Hi all:
I'm building a document search platform, basically indexing a lot of PDF
files. Some of these files have an index, which means that when I query for
"normativos" in my application (built using Symfony2+PHP+Solarium) I get a few
results like this:
would use leading wildcard query.
&q=*@gmail.com
There was a similar question recently:
http://search-lucene.com/m/XF2ejnM6Vi2
--- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez wrote:
> From: Jorge Luis Betancourt Gonzalez
> Subject: Question about email search
> To: solr-user@l
I'm using Solr 3.6.2 to crawl some data using Nutch. In my schema I have one
field with all the content extracted from the page, which could possibly
include email addresses. This is the configuration of my schema:
Currently I'm using a separate core to query suggestions; for this I
started from: https://github.com/cominvent/autocomplete. Basically I'm only
using the suggester component for term suggestions based on a
tokenized field in my schema (all of this in Solr 3.6), perhaps instead of
us
Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 + PHP
(Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent results.
Even Solarium as a PHP library is great. Right now it lacks Solr 4 support,
but for Solr 3.6 it's great.
- Original Message -
From: "D
Hi:
I'm trying to build a custom update handler to accomplish one specific task. In
our app we do query suggestions based on previous queries passed into our
frontend app. The thing is that instead of getting these queries from the Solr
logs, we store them in a separate core. So far so good, but on
fields and will break quickly.
The best way to do it is to index pages as documents. You can use field
collapsing to group pages from the same document together.
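
As a rough sketch of what such a request could look like, here is a Python
helper that builds the field collapsing (result grouping) parameters. The field
name doc_id, which would hold the identifier of the PDF each page belongs to,
is an assumption for illustration.

```python
from urllib.parse import urlencode


def build_grouped_query(text, group_field="doc_id", pages_per_doc=3):
    """Build parameters that group matching pages by their parent document."""
    return {
        "q": text,
        "group": "true",                    # turn on field collapsing
        "group.field": group_field,         # one group per source document
        "group.limit": str(pages_per_doc),  # matching pages kept per group
    }


params = build_grouped_query("normativos")
query_string = urlencode(params)
```

Each group in the response then corresponds to one document, and the entries
inside it are the pages that matched.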
Upayavira
On Tue, Feb 5, 2013, at 02:00 PM, Jorge Luis Betancourt Gonzalez wrote:
> Hi:
>
> I'm working on a search eng
Hi:
I'm working on a search engine for several PDF documents. Right now one of the
requirements is that we provide not only the documents matching the search
criteria but also the page that matches the criteria. Normally Tika only
extracts the text content and does not make this distinction, but usi
iculty via the
XML/HTTP interface.
Your mileage may vary, but for that particular app, that is what it
took.
Note, 4.0 can work in a 3.x way (old style replication, etc). You don't
need to use SolrCloud etc when using 4.0.
Upayavira
On Sat, Jan 5, 2013, at 08:20 AM, Jorge Luis Betancourt Gonzale
Hi:
I'm currently working with Solr 3.6.1, but Solr 4 has great features like the
ones bundled with SolrCloud. The content in the index is really not the problem
for the transition; the thing is that I have a large app written in PHP +
Solarium that interacts with the index in Solr 3. As far as I
thing. In order for it to work, you need to store
every field, as what it does behind the scenes is retrieve the stored
fields, rebuild the document, and then post the whole document back.
Upayavira
On Sat, Dec 15, 2012, at 04:52 PM, Jorge Luis Betancourt Gonzalez wrote:
> Is this updatable fie
"cat" : { "add" : "fantasy"},
"ISBN_s": { "set" : "0-380-97365-0"}
"remove_s" : { "set" : null } }
]'
/* example stolen from Yonik's ApacheCon talk */
Upayavira
On Sat, Dec 15, 2012, at
Hi all:
I'm trying to build a query suggestion system using Solr (also used to index
all the data in the app). I have a separate core dedicated only to this purpose
(along with some others for images, etc.). In the main app, written in
Symfony2 + Solarium Bundle, we store the queries in this core
Hi Guillaume:
I beg to differ. It's true that the native Solr support has been a big aid to
developers' use of Solr from many programming languages. But making all the
queries "by hand" is not wise and in any case is hard to maintain; it's easier
using some OO library to interact with Solr. For
Any news on the Solarium project? It's the one I'm using with Solr 3.6!
- Original Message -
From: "Bill Au"
To: solr-user@lucene.apache.org, "Arkadi Colson"
Sent: Friday, December 7, 2012 13:40:20
Subject: Re: PHP client
I have not used the pecl Solr client. I have been using SolrPhp
Hi:
Is there any way that I can prevent a document from being indexed? I have a
separate core only for query suggestions. These queries are stored right from
the frontend app, so I'm trying to prevent ill-intended queries from
being stored in my core, while keeping the logic of what I cons
I'm trying to use it to search through news websites, but I was interested in
classification at index time. Is there any available solution for this?
Greetings!
On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski wrote:
>> I mean measuring the similarity between the document in each cluster.
>> Also,
to change your tokenisation anyhow, as a search for
> 'universidad' will not match your term 'universidad,'
>
> But you are on the right track - to improve suggestions, improve what is
> in your index.
>
> Upayavira
>
> On Mon, Nov 26, 2012, at 07:54 PM
Hi:
I've configured my Solr setup to use the suggester component and to get term
suggestions from a PHP application. The thing is that I'm getting results like
"universidad," (note the punctuation sign). Is there any way I can get rid of
this? Or do I need to create a separate field and strip all
I'm currently using solarium with solr 3.6, perhaps you can tweak solarium as
needed? I suppose that pull requests are welcome into solarium for solr 4.
Greetings!
On Nov 12, 2012, at 2:56 PM, Bill Au wrote:
> Anyone know of a PHP client that is compatible with Solr 4.0.0? I am using
> an old
I think that Solr by itself doesn't store the queries (correct me if I'm
wrong about this), but you can accomplish what you want by processing the Solr
log (it's the only way, I think). From the Solr log you can get the queries and
then process the queries according to your needs, and change the
SearchComponent that logs queries in another format,
> should the existing log format not be sufficient for you.
>
> Upayavira
>
> On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote:
>> Hi!
>>
>> I was wondering if there are any built-in mecha
Hi!
I was wondering if there is any built-in mechanism that allows me to store the
queries made to a Solr server inside the index itself. I know that the
suggester module exists, but as far as I know it only works for terms existing
in the index, and not with queries. I remember reading about us
Thanks a lot for all the replies, Chris it worked out with this mm value:
If this version of Solr is affected by the bug you pointed out, shouldn't
it fail with this value as well?
Greetings!
On Oct 4, 2012, at 8:48 PM, Jorge Luis Betancourt Gonzalez wrote:
> Hi Chris:
>
>
Hi Chris:
I'm using solr 3.6.1, is the bug present in this version?
Greetings!
On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:
>
> : GRAVE: java.lang.NumberFormatException: For input string: "
> : 100
> : "
> : at
> java.lang.NumberFormatException.forInputString(NumberFormatExc
Thanks for the quick response, I got the same response. What I'm trying to
accomplish is to get a straight OR between all the clauses or terms in my query;
the value I should use is 0, right?
;s the error Jorge?
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Thu, Oct 4, 2012 at 1:36 PM, Jorge Luis Betancourt Gonzalez
> wrote:
>> Hi:
>>
>
2, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote:
> Hi:
>
> I'm having an issue with Solr 3.6.1 and I sense it is due to a lack of
> understanding. I'm building a search engine, using of course Solr to store
> the inverted index, so far so good. When I search for a
Hi:
I'm having an issue with Solr 3.6.1 and I sense it is due to a lack of
understanding. I'm building a search engine, using of course Solr to store the
inverted index, so far so good. When I search for a term, let's say "java", I
get 761 results, then querying the index with a "php" term give