Never mind, I found a solution. I created an excerpt field in the schema.xml,
then I used copyField with the maxChars parameter to
copy the content into it, limited to the number of characters
I wanted.
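A minimal sketch of what this looks like in schema.xml (the field names and types here are illustrative assumptions, not taken from the thread):

```xml
<!-- Full text extracted from the binary file -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<!-- Preview field; stored only, since it is not searched directly -->
<field name="excerpt" type="string" indexed="false" stored="true"/>

<!-- Copy at most the first 200 characters of content into excerpt -->
<copyField source="content" dest="excerpt" maxChars="200"/>
```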
Thanks anyways.
Hi, I am new to Solr and I am extracting metadata from binary files through
URLs stored in my database. I would like to know what fields are available
for indexing from PDFs (the ones that would be specified as in column="").
For example, how would I extract something like file size, format or f
Hi Jack, thanks a lot for your reply. I did that. However, when I run Solr it gives me a
bunch of errors. It actually displays the content of my files on my command
line and shows some logs like this:
org.apache.solr.common.SolrException: Document is missing mandatory
uniqueKey field: id
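This error means every document sent to Solr must include a value for the field declared as the uniqueKey in schema.xml. A minimal sketch of the relevant declarations (the field name "id" follows the stock example schema):

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```

With DIH this usually means the entity query must select, or map a column to, the id field.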
Hi Gora, thank you for your reply. I am not using any commands; I just go to
the Solr dashboard, db > Dataimport, and execute a full-import.
*My schema.xml looks like this:*
Sorry, Gora. It is ${fileSourcePaths.urlpath} actually.
*My complete schema.xml is this:*
Hi Gora,
Yes, my urlpath points to a URL like that. I do not get why uncommenting
the catch-all dynamic field ("*") does not work for me.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-indexing-binary-files-tp4047470p4048542.html
Sent from the Solr - User mailing li
Hi,
I am using solr to index data from binary files using BinURLDataSource. I
was wondering if anyone knows how to extract an excerpt of the indexed data
during search. For example if someone made a search it would return 200
characters as a preview of the whole text content. I read online that
Hello,
Solr is trying to process non-existent child/nested entities. By
non-existent I mean that they exist in the DB but should not exist on the Solr
side because they don't match the conditions in the query I use to fetch them.
I have the below solr data configuration. The relationship between tables
is c
" where "h" does not match the table names in the FROM
> statement. Perhaps, that's the problem?
>
> Regards,
>Alex.
>
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 15 August 2016 a
ething else to
> trigger it.
>
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 15 August 2016 at 23:54, Luis Sepúlveda wrote:
> > Thanks for the prompt reply.
> >
> > h.enabled=true is a typo. It
Hi Salman,
I was interested in something similar, take a look at the following thread:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCADSoL-i04aYrsOo2%3DGcaFqsQ3mViF%2Bhn24ArDtT%3D7kpALtVHzA%40mail.gmail.com%3E#archives
I never followed through, however.
-Luis
On
- A possible hack that I never followed through -
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCANGii8eaSouePGxa7JfvOBhrnJUL++Ct4rQha2pxMefvaWhH=g...@mail.gmail.com%3E
Maybe one of those will help you? If they do, make sure to report back!
-Luis
On Tue, Apr 1
ntly slows down the first queries launched because of
cache warm-up. There must be a solution for this scenario - I think it
should be very common. Do you think that disabling caches will improve this?
Thanks a lot!
- Luis Cappa
> On 05/11/2013, at 23:29, Shawn Heisey wrote:
>
ote:
> That's what faceting does. The facets are only tabulated
> for documents that satisfy the query, including all of
> the filter queries and any other criteria.
>
> Otherwise, facet counts would be the same no matter
> what the query was.
>
> Or I'm complete
ield constraining query" to include
only terms from documents that pass this query.
Thanks,
Luis
processing
the old request, correct?
Is there any way to cancel that first request?
Thanks,
Luis
trings in StrField that I
can query against?
Thanks,
Luis
Update: It seems I get the bad behavior (no documents returned) when the
length of a value in the StrField is greater than or equal to 32,767
(2^15). Is this some type of bit overflow somewhere?
On Wed, Feb 5, 2014 at 12:32 PM, Luis Lebolo wrote:
> Hi All,
>
> It seems that I can
e? Do you have any thoughts on how we might better accomplish this
functionality?
Thanks!
On Wed, Feb 5, 2014 at 1:42 PM, Yonik Seeley wrote:
> On Wed, Feb 5, 2014 at 1:04 PM, Luis Lebolo wrote:
> > Update: It seems I get the bad behavior (no documents returned) when the
> > length
This page never came up on any of my Google searches, so thanks for the
heads up! Looks good.
-Luis
On Tue, Jun 25, 2013 at 12:32 PM, Learner wrote:
> I just came across a wonderful online reference wiki for SOLR and thought
> of
> sharing it with the community..
>
now if you need more info. Any ideas appreciated!
Thanks,
Luis
ul 30, 2013 at 11:04 AM, Luis Lebolo wrote:
> Hi All,
>
> I'm trying to use CachedSqlEntityProcessor in one of my sub-entities, but
> the field never gets populated. I'm using Solr 4.4. The field is a
> multi-valued field:
>
> The relevant p
t entities. If I fill in row with 1627,
indexing occurs at about 10 docs per second. If I leave it blank, it occurs
at about 1 doc per second.
Thanks,
Luis
ields fieldName_someToken (e.g.
fieldName_1, fieldName_2, fieldName_3), can I construct a query like
fieldName_*:someValue?
The query itself doesn't work, but is there a way to query numerous dynamic
fields without explicitly listing them?
Thanks,
Luis
...
In short, I am querying for an ID throughout multiple dynamically created
fields (mutation_prot_mt_#_#).
Any thoughts on how to further debug?
Thanks in advance,
Luis
--
SEVERE: Servlet.service() for servlet [X] in context with path [/x] thre
What if you try
city:(*:* -H*) OR zip:30*
Sometimes Solr requires a list of documents to subtract from (think of "*:*
-someQuery" as "all documents without someQuery").
You can also try looking at your query with debugQuery = true.
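A sketch of the two forms, using the field names above:

```
q=city:(-H*) OR zip:30*       the negative clause alone has no document set to subtract from
q=city:(*:* -H*) OR zip:30*   subtracts the H* matches from all documents, as intended
```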
-Luis
On Mon, Apr 15, 2013 at
Sorry, spoke too soon. Turns out I was not sending the query via POST.
Changing the method to POST solved the issue. Apologies for the spam!
-Luis
On Mon, Apr 15, 2013 at 11:47 AM, Luis Lebolo wrote:
> Hi All,
>
> I'm using Solr 4.1 and am receiving an
> org.apache.solr.com
Turns out I spoke too soon. I was *not* sending the query via POST.
Changing the method to POST solved the issue for me (maybe I was hitting a
GET limit somewhere?).
-Luis
On Tue, Apr 16, 2013 at 7:38 AM, Marc des Garets wrote:
> Did you find anything? I have the same problem but it
Hi All,
Does SolrJ have an option for a custom RowMapper or BeanPropertyRowMapper
(I'm using Spring/JDBC terms).
I know the QueryResponse has a getBeans method, but I would like to create
my own mapping and plug it in.
Any pointers?
Thanks,
Luis
en SolrDocument?
Thanks,
Luis
Apologies, I wasn't storing these dynamic fields.
On Fri, Apr 26, 2013 at 11:01 AM, Luis Lebolo wrote:
> Hi All,
>
> I'm using SolrJ's QueryResponse to retrieve all SolrDocuments from a
> query. When I use SolrDocument's getFieldNames(), I get back a list o
Hello.
You can also develop an application yourself that uses SolrJ to retrieve all
the documents from your index, process them and add all the new information (fields)
desired, and then index them into another Solr index. It's easy.
Goodbye!
On 16/09/2011, at 17:39, "Olson, Ron" wrote:
Using "fl=*" returns all the data from the
fields that start with "$" but it severely increases the size of the
response.
--
Luis Neves
t just use them?
Yes, that's what I ended up doing, but it involved a reindex. I was
trying to avoid that.
Thanks!
--
Luis Neves
e changed code I see this:
if (facetFieldType.isTokenized() || facetFieldType.isMultiValued()) {
throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
"Stats can only facet on single-valued fields, not: " + facetField
+ "[" + facetFieldType + "]");
}
this seem to also "fix" SOLR-1782.
--
Luis Neves
"false" TrieLongField with
precisionStep="0")
I will try to set up a reproducible test case.
Thanks!
--
Luis Neves.
warm all possible permutations of ~2000 Oranges =)).
Hope the generalization isn't too stupid, and thanks in advance!
Cheers,
Luis
Oranges
that fit this query and the user says "select them all". We then store all
600 IDs in our database.
For the data availability filter, we get the list of Orange IDs from the
database first then use SolrJ to create the facet query.
-Luis
On Tue, Feb 5, 2013 at 12:03 PM, Mikhai
ult?
Am I missing something? Is there any way to include a computed value in
the search results and sort by it?
Thanks in advance.
--
/**
* Luis Neves
* @e-mail: luis.ne...@co.sapo.pt
* @xmpp: lfs_ne...@sapo.pt
* @web: <http://technotes.blogs.sapo.pt/>
* @tlm: +351 962 057 656
*/
the first search after a commit return in a reasonable time?
Thanks!
--
Luis Neves
StackTrace:
Error during auto-warming of
key:[EMAIL PROTECTED]:java.lang.OutOfMemoryError:
GC overhead limit exceeded
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:104)
at org.
Yonik Seeley wrote:
On 7/25/07, Luis Neves <[EMAIL PROTECTED]> wrote:
We are having some issues with one of our Solr instances when
autowarming is
enabled. The index has about 2.2M documents and 2GB of size, so it's not
particularly big. Solr runs with "-Xmx1024M -Xms10
Luis Neves wrote:
The objective is to boost the documents by "freshness" ... this is
probably the cause of the memory abuse since all the "EntryDate" values
are unique.
I will try to use something like:
EntryDate:[* TO NOW/DAY-3MONTH]^1.5
This turned out to be a bad idea
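For reference, the usual function-query alternative to a recency boost query is a recip() over the document age; the parameter values below are illustrative assumptions, not from this thread:

```
bf=recip(ms(NOW,EntryDate),3.16e-11,1,1)
```

Unlike a boost query over a range of unique dates, this computes a smoothly decaying boost per document without enumerating date terms.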
Yonik Seeley wrote:
On 7/25/07, Luis Neves <[EMAIL PROTECTED]> wrote:
This turned out to be a bad idea ... for some reason using the
BoostQuery instead
of the BoostFunction slows the search to a crawl.
Dismax throws bq in with the main query, so it can't really be cached
sep
max query handler, can
anyone point me in the right direction.
Thanks!
--
Luis Neves
Nevermind, I got it ... Somehow I missed the javadoc.
--
Luis Neves
Luis Neves wrote:
Hello all,
Using the standard query handler I can search for a term excluding a category
and sort descending by price, e.g.:
http://localhost/solr/select/?q=book+-Category:Adults;Price+desc&sta
ticular case.
Can anyone point me in the right direction?
Thanks!
Luis Neves
Mental note: think before posting ... this is a simple job for a Servlet filter.
Sorry for the noise.
--
Luis Neves
Luis Neves wrote:
Hello all.
We have a product catalog that is searchable via Solr, by default we
want to exclude results from the "Adult" category unless the sea
search platform and
they call this feature "Field collapsing":
<http://www.fastsearch.com/glossary.aspx?m=48&amid=299>
I like the syntax they use:
"&collapseon=&collapsenum=N" -> Collapse, but keep N number of
collapsed documents
For some reason they can only collapse on numeric fields (int32).
Regards,
Luis Neves
Rafeek Raja wrote:
I am a beginner to Solr and Lucene. Is search possible while indexing?
Yes... that is just one of the cool features of Solr/Lucene.
<http://incubator.apache.org/solr/features.html>
--
Luis Neves
'm beginning to think that this is a little too complex for a first project with
Lucene. In my particular case all I want is to group results by category (from a
predetermined - and small - category list), I think I will just make a request
by category and accept the latency.
--
Luis Neves
like:
fq=xmlField:/book/content/text()
This way only the "/book/content/" element was searched.
Did I make sense? Is this possible?
--
Luis Neves
Hi!
Thorsten Scherler wrote:
On Mon, 2007-01-15 at 12:23 +, Luis Neves wrote:
Hello.
What I do now to index XML documents is to use a Filter to strip the markup;
this works, but it's impossible to know where in the document the match is located.
What would it take to make p
Hi,
Thorsten Scherler wrote:
On Mon, 2007-01-15 at 13:42 +, Luis Neves wrote:
I think you should explain your use case a wee bit more.
What I do now to index XML documents is to use a Filter to strip
the markup;
this works, but it's impossible to know where in the docum
right?), but I can't figure out the syntax.
--
Luis Neves
rieve the number of comments,
increment it and update the index, because the "actual" value might be
uncommitted... Is there any other alternative to this problem?
Thanks in advance for any help.
--
Luis Neves
open with the vendor, but they are not what we could call
agile.
--
Luis Neves
Luis Neves wrote:
Hello all,
We have a Solr/Lucene index for newspaper articles, those articles have
associated comments. When searching for articles we want to present the
number of comments per article.
What we
Hi,
I have a Solr instance using the clustering component (with the Lingo
algorithm) working perfectly. However when I get back the cluster results
only the ID's of these come back with it. What is the easiest way to
retrieve full documents instead? Should I parse these IDs into a new query
to Sol
bug (because maybe it is the expected
behavior, but after some years using Solr I think it is not) I can create
the JIRA issue and debug it more deeply to apply a patch with the aim of
helping.
Regards,
--
- Luis Cappa
Ehem, *_target ---> *_facet.
2015-05-14 16:47 GMT+02:00 Luis Cappa Banda :
> Hi Yonik,
>
> Yes, they are the targets of copyFields in the schema.xml. These *_target
> fields are supposed to be used in some specific searchable (thus, tokenized)
> fields that in the future ar
get are dynamic,
indexed and stored values. The only difference is that the *_target one is
multivalued. Does that make sense?
Regards
- Luis Cappa
2015-05-14 16:42 GMT+02:00 Yonik Seeley :
> Are the _facet fields the target of a copyField in the schema?
> Realtime get either gets the values
nces between them are:
- Regular expression: i18n* VS *_facet
- Multivalued: *_facet are multivalued.
Regards,
- Luis Cappa
2015-05-14 18:32 GMT+02:00 Yonik Seeley :
> On Thu, May 14, 2015 at 10:47 AM, Luis Cappa Banda
> wrote:
> > Hi Yonik,
> >
> > Yes, they are the tar
, 2015 at 12:49 PM, Luis Cappa Banda
> wrote:
> > If you don't mark an indexed, 'facetable' field as stored, I was
> > expecting not to be able to return its values, so faceting makes no
> sense.
>
> Faceting does not use or retrieve stored field values.
scale horizontally and start up new Tomcat + Solr instances, from 4 to N nodes.
Best,
- Luis Cappa
2015-05-19 15:57 GMT+02:00 Michael Della Bitta :
> Are you sure the requests are getting queued because the LB is detecting
> that Solr won't handle them?
>
> The reason why I'm asking
t. In other words, SolrA shows newer segments and SolrB/SolrC appear
to see just the old ones.
Is that normal? Any idea or suggestion to solve this?
Thank you in advance, :-)
Best regards,
--
- Luis Cappa
ectoryReader or
FSDirectoyReader) that always read the current segments when a commit
happens?
2014-03-12 11:35 GMT+01:00 Luis Cappa Banda :
> Hey guys,
>
> I've doing some tests sharing the same index between three Solr servers:
>
> *SolrA*: is allowed to both read and index. The
le?
Thanks in advance!
Best,
2014-03-12 12:10 GMT+01:00 Luis Cappa Banda :
> I've seen that StandardDirectoryReader appears in the commit logs. Maybe
> this DirectoryReader type is somehow caching the old segments in SolrB and
> SolrC even if they have been committed previously. If that
ts/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data<http://solrclusterd.buguroo.dev:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solrclusterd.buguroo.dev:8080/events/data,solrclusterc.buguroo.dev:8080/events/data>
Any idea of what I'm doing wrong?
Thank you very much in advance!
Best regards,
--
- Luis Cappa
t=/sugges*t&wt=json&q=*:*<http://solrclusterd.buguroo.dev:8080/events/data/select?qt=/suggest&wt=json&q=*:*>
2013/10/23 Luis Cappa Banda
> Hello!
>
> I've been trying to enable spellchecking using sharding following the
> steps from
Any idea?
2013/10/23 Luis Cappa Banda
> More info:
>
> When executing the Query to a single Solr server it works:
> http://solr1:8080/events/data/suggest?q=m&wt=json<http://solrclusterd.buguroo.dev:8080/events/data/suggest?q=m&wt=json>
>
> {
>
>- resp
; James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Luis Cappa Banda [mailto:luisca...@gmail.com]
> Sent: Thursday, October 24, 2013 6:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Spellcheck with Distributed Search (sharding).
cument dates), and I don't use "fq" to execute those queries.
In this scenario, do you recommend disabling caches?
Thank you very much in advance!
Best,
--
- Luis Cappa
Against --> again, :-)
2013/11/5 Luis Cappa Banda
> Hi guys!
>
> I have a master-slave replication (Solr 4.1 version) with a 30 seconds
> polling interval and continuously new documents are indexed, so after 30
> seconds always new data must be replicated. My test index is n
Hello!
Also check your application server logs. Maybe you're trying to index
documents with a syntax error and they are being skipped.
Regards,
- Luis Cappa
2013/11/26 Alejandro Marqués Rodríguez
> Hi,
>
> In Lucene you are supposed to be able to index up to 274 billion docu
",
- q: "tagsValues:"sucks"",
- facet.limit: "-1",
- facet.field: "tagsValues",
- wt: "json"
}
Any idea of what's happening here? I'm confused, :-/
Regards,
--
- Luis Cappa
Will slave servers "lose" index identifiers that allow
them to replicate delta documents from the master after optimizing them? Will
the next replication update the slaves' indexes, overriding the optimized index?
Thank you very much in advance.
Regards,
--
- Luis Cappa
ool to prevent weird production
situations.
Best,
- Luis Cappa
2014-02-05 Chris Hostetter :
>
> : I've got an scenario where I index very frequently on master servers and
> : replicate to slave servers with one minute polling. Master indexes are
> : growing fast and I would like
cipal worry was about optimizing search speed as much as possible
through index optimizing, mergeFactor tuning, cache setup, etc.
Thanks a lot!
2014-02-06 Toke Eskildsen :
> On Thu, 2014-02-06 at 10:22 +0100, Luis Cappa Banda wrote:
> > I knew some performance tips to improve search and I c
easons. Was there some issue reported
related to elevated memory consumption by the field cache?
any help would be greatly appreciated.
regards,
--
Luis Carlos Guerrero
about.me/luis.guerrero
p the GC work better for
> you (which is not to say there isn't a leak somewhere):
>
> -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40
>
> This should lead to a nice up-and-down GC profile over time.
>
> On Thu, Sep 11, 2014 at 10:52 AM, Luis Carlos Guerre
n the same field
not that common?
On Tue, Sep 16, 2014 at 11:06 AM, Luis Carlos Guerrero <
lcguerreroc...@gmail.com> wrote:
> Thanks for the response, I've been working on solving some of the most
> evident issues and I also added your garbage collector parameters. First of
> al
Hi,
Does Lucene support syllabification of words out of the box? If so, is there
support for Brazilian Portuguese? I'm trying to set up a readability score
for short text descriptions and this would be really helpful.
thanks,
--
Luis Carlos Guerrero
about.me/luis.guerrero
Given the following Solr data:
1008rs1cz0icl2pk
2014-10-07T14:18:29.784Z
h60fmtybz0i7sx87
1481314421768716288
u42xyz1cz0i7sx87
h60fmtybz0i7sx87
1481314421768716288
u42xyz1cz0i7sx87
h60fmtybz0i7sx87
1481314421448900608
I would like to know how to *DELETE docum
OLR-6357
>
> I can't think of any other queries at the moment. You might consider using
> the above query (which should work as a normal select query) to get the
> IDs, then delete them in a separate query.
>
>
> On 10 October 2014 07:31, Luis Festas Matos wrote:
>
y different from the classic RegExp
syntax, so that may be the reason why they didn't work for me, and maybe
someone more expert can help me.
The syntax is the following:
*E-mail: *
text:/[a-z0-9_\|-]+(\.[a-z0-9_\|-]|)*@[a-z0-9-]|(\.[a-z0-9-]|)*\.([a-z]{2,4})/
Thank you very much in advance!
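As the poster notes, Lucene's regexp syntax differs from the classic (PCRE-style) syntax, so patterns written for other engines often need adjusting. For comparison, a minimal sketch of matching a simplified e-mail pattern with plain Java regular expressions; the pattern below is an illustrative assumption (not the one above, and deliberately not RFC-complete):

```java
import java.util.regex.Pattern;

public class EmailMatch {
    // Simplified, illustrative e-mail pattern; intentionally not RFC 5322 complete.
    static final Pattern EMAIL = Pattern.compile(
            "[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,4}",
            Pattern.CASE_INSENSITIVE);

    // True when the whole string looks like a plausible e-mail address.
    public static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("user@example.com"));   // prints: true
        System.out.println(isEmail("no-at-sign.example")); // prints: false
    }
}
```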
at you're trying to reinvent the wheel?
>
> -- Jack Krupansky
>
> -Original Message- From: Luis Cappa Banda
> Sent: Tuesday, July 30, 2013 10:53 AM
> To: solr-user@lucene.apache.org
> Subject: Email regular expression.
>
>
> Hello everyone!
>
> Unfort
/select?q=emails:[* TO
*]&start=0&rows=10&sort=mydate desc
And I don't like it, to be honest.
Regards,
2013/7/30 Luis Cappa Banda
> Hello, Jack, Steve,
>
> Thank you for your answers. I've never used UAX29URLEmailTokenizerFactory,
> but I've read about it before trying Reg
I've tried this kind of query in the past, but I found it has poor
performance and is incredibly slow. That's just my experience, though;
maybe someone can share another opinion.
2013/7/30 Raymond Wiker
> On Jul 30, 2013, at 22:05 , Luis Cappa Banda wr
owever, it may be possible to create a field called 'flagEmails' that will
be true if the field 'emails' is filled via UAX29URLEmailTokenizerFactory.
Has anyone implemented this kind of behavior at index time? Is it
possible?
Regards,
2013/7/30 Luis Cappa Banda
> I
e the same number of numFound documents,
but I would like to know the internal behavior of Solr.
Best regards,
- Luis Cappa
2013/7/30 Smiley, David W.
> Steve,
> The FieldCache and DocValues are irrelevant to this problem. Solr's
> FilterCache is, and Lucene has no counterpart. Perhap
Thank you very much, David. That was a great explanation!
Regards,
- Luis Cappa
2013/7/30 Smiley, David W.
> Luis,
>
> field:* and field:[* TO *] are semantically equivalent -- they have the
> same effect. But they internally work differently depending on the field
> type.
xception:
*2013-07-31 09:50:49,409 5189 [main] ERROR
com.buguroo.solr.index.WriteIndex - No such core: core*
Either I am sleepy, which is possible, or there is some kind of bug
here.
Best regards,
--
- Luis Cappa
gards,
2013/7/31 Alan Woodward
> Hi Luis,
>
> You need to call coreContainer.load() after construction for it to load
> the cores. Previously the CoreContainer(solrHome, configFile) constructor
> also called load(), but this was the only constructor to do that.
>
> I prob
":56475.0,
> "query":{
> "time":935.0},
> "facet":{
> "time":0.0},
> "mlt":{
> "time":55442.0},
> "highlight":{
> "time":0.0},
> "stats":{
> "time":0.0},
> "spellcheck":{
> "time":0.0},
> "debug":{
> "time":98.0}
>
> Is there anything for me to do other than file an issue?
>
> Thanks,
> Shawn
>
--
- Luis Cappa
Hi,
The UUID, which is used as the id of a document, is generated by
Solr using an update chain.
I just use the recommended method to generate UUIDs.
I think an atomic update is not suitable for me, because I want Solr to
index the feeds, not me. I don't want to send information to
n a
> separate UUID field. That doesn't change by definition. What advantage do
> you think you get from the UUID field over just using your
> field?
>
> Best,
> Erick
>
>
> On Sat, Aug 24, 2013 at 6:26 AM, Luis Portela Afonso <
> meligalet...@gmail.co
Hi,
I'm having a problem when Solr indexes.
It is updating documents that are already indexed. Is this normal behavior?
If a document with the same key already exists, is it supposed to be updated?
I was thinking that it is supposed to update only if the information in the
RSS has changed.
Appreciate your h
So I'm indexing RSS feeds.
I'm running the data import full-import command with a cron job. It runs
every 15 minutes and indexes a lot of RSS feeds from many sources.
With cron job, I do a http request using curl, to the address
http://localhost:port/solr/core/dataimport/?command=full-import&clean
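Such a setup might look like this as a crontab entry (the port, core name and clean parameter value below are illustrative assumptions):

```
# Run a DIH full-import of the RSS feeds every 15 minutes
*/15 * * * * curl -s "http://localhost:8983/solr/core/dataimport?command=full-import&clean=true"
```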
But with atomic updates I need to send the information, right?
I want Solr to index it automatically. And it is doing that. Can you look
at the Solr example in the source?
There is an example on example-DIH folder.
Imagine that you run the URL to import the data every 15 minutes. If the
same info
POJO again and again.
--
- Luis Cappa
is not the way to do that:
you have to change the Java class, compile it again and relaunch whatever
process uses that Java class.
Regards,
- Luis Cappa
2013/5/13 Jack Krupansky
> Do your POJOs follow a simple flat data model that is 100% compatible with
> Solr?
>
> If so,