Hello,
I created a data-config.xml file where I define a datasource and an entity
with 12 fields.
In my use case I have 2 databases with the same schema, so I want to
combine the 2 databases into one index.
I defined a second dataSource tag and duplicated the entity with its
fields (changed the name
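For reference, a minimal data-config.xml along those lines might look like the
sketch below (driver, URLs, credentials, and table/field names are all
hypothetical, not from this thread):

    <dataConfig>
      <!-- two JDBC sources with the same schema; URLs/credentials are placeholders -->
      <dataSource name="ds1" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://host1/mydb" user="user" password="pass"/>
      <dataSource name="ds2" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://host2/mydb" user="user" password="pass"/>
      <document>
        <entity name="itemDb1" dataSource="ds1" query="SELECT * FROM item">
          <field column="ID" name="id"/>
          <!-- ... the remaining fields ... -->
        </entity>
        <entity name="itemDb2" dataSource="ds2" query="SELECT * FROM item">
          <field column="ID" name="id"/>
          <!-- ... the remaining fields ... -->
        </entity>
      </document>
    </dataConfig>

As the thread later points out, the ids coming from the two databases must not
collide, or documents from one database will overwrite the other's.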
Hi,
It's funny that if you try "fóruns" it matches:
http://bhakta.casadomato.org:8982/solr/select/?q=f%C3%B3runs&version=2.2&start=0&rows=10&indent=on
But when you try "foruns", it does not.
Check this out...
http://bhakta.casadomato.org:8982/solr/admin/analysis.jsp?nt=type&name=text&verbose
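One common way to make the accented and unaccented forms match (the thread
itself doesn't confirm this is the chosen fix, so treat it as a sketch) is to
add an ASCIIFoldingFilterFactory to the field type's analyzer, so both indexed
and queried terms are folded:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- folds accented characters: "fóruns" is indexed and queried as "foruns" -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

After a change like this the index must be rebuilt, since already-indexed
terms keep their accents.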
Hi all,
I have a problem configuring PDF indexing from a directory in my Solr
with DIH:
with this data-config
I obtain this result:
full-import
idle
0:0:2.44
0
43
0
2012-02-12 19:06:00
Indexing failed. Rolled back all changes.
Hello.
Which RequestHandler do you use to find typing errors, like "goolge" => did
you mean "google"?!
I want to use my "EdgeNGram" autosuggestion together with a clever
autocorrection! What do you use?
--- System
One Server, 12 GB RAM, 2
I kept the old schema files and solrconfig file, but there were some errors due
to which Solr was not loading. I don't know what those things are. We have a
few of our own custom plugins developed with 1.4.1
We have both stored=true and stored=false fields in the schema, so we can't
reindex the way you said; we tried that earlier.
Please find my replies inline.
On Thu, Feb 16, 2012 at 10:30 AM, Alexey Verkhovsky <
alexey.verkhov...@gmail.com> wrote:
> Hi, all,
>
> I'm new here. Used Solr on a couple of projects before, but didn't need to
> dive deep into anything until now. These days, I'm doing a spike for a
> "yellow pages" type sear
1. Do you see any errors / exceptions in the logs?
2. Could you have duplicates?
On Thu, Feb 16, 2012 at 10:15 AM, Radu Toev wrote:
> Hello,
>
> I created a data-config.xml file where I define a datasource and an entity
> with 12 fields.
> In my use case I have 2 databases with the same schema,
Hi William,
Thanks for the feedback.
I will try the group query and see how the performance with 2 queries is.
Best Regards
Ericz
On Thu, Feb 16, 2012 at 4:06 AM, William Bell wrote:
> One way to do it is to group by city and then sort=geodist() asc
>
> select?group=true&group.field=city&sort
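Spelled out in full, such a request might look like the following sketch (the
sfield and pt values are placeholders; geodist() needs a location field and a
reference point):

    http://localhost:8983/solr/select?q=*:*
        &group=true&group.field=city
        &sfield=store_location&pt=39.7,-104.9
        &sort=geodist()+asc
        &fl=id,city,score

With grouping enabled, sort=geodist() asc orders the groups by the distance of
each group's top document, so you get one nearest entry per city.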
I have a helpdesk application developed in PHP/MySQL. I want to implement
real-time full-text search, and I have shortlisted Solr. The MySQL database
will store all the tickets and their updates, and that data will be imported
to build the Solr index. All search requests will be handled by Solr.
What I w
1. Nothing in the logs
2. No.
On Thu, Feb 16, 2012 at 12:44 PM, Dmitry Kan wrote:
> 1. Do you see any errors / exceptions in the logs?
> 2. Could you have duplicates?
>
> On Thu, Feb 16, 2012 at 10:15 AM, Radu Toev wrote:
>
> > Hello,
> >
> > I created a data-config.xml file where I define a da
It sounds a bit as if SOLR stopped processing data once it queried all
from the smaller dataset. That's why you have 2000. If you just have a
handler pointed to the bigger data set (6k), do you manage to get all 6k db
entries into solr?
On Thu, Feb 16, 2012 at 1:46 PM, Radu Toev wrote:
> 1. Not
I tried running with just one datasource(the one that has 6k entries) and
it indexes them ok.
The same if I do the 1k database separately: it indexes OK.
On Thu, Feb 16, 2012 at 2:11 PM, Dmitry Kan wrote:
> It sounds a bit as if SOLR stopped processing data once it queried all
> from the smal
OK, maybe you can show the db-data-config.xml just in case?
Also, in schema.xml, does your uniqueKey correspond to the unique field in
the db?
On Thu, Feb 16, 2012 at 2:13 PM, Radu Toev wrote:
> I tried running with just one datasource(the one that has 6k entries) and
> it indexes them ok.
> The same, if I
I've removed the connection params
The unique key is id.
On Thu, Feb 16, 2012 at 2:27 PM, Dmitry Kan wrote:
> OK, maybe you can show the db-data-config.xml just in case?
> Also, in schema.xml, does your uniqueKey correspond to the unique fi
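For context, the check being discussed is the <uniqueKey> declaration in
schema.xml; it must name a field that carries the database's unique column,
for example:

    <!-- schema.xml: Solr overwrites documents that share this key -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <uniqueKey>id</uniqueKey>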
Hi Solr community,
I'm new to Solr and DataImportHandler. I have a requirement to fetch the data
from a database table and pass it to Solr.
Parts of the existing data-config.xml and Solr schema.xml are given below:
data-config.xml
On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote:
> Not sure if this is
> expected or not.
Nope - should be already resolved or will be today though.
- Mark Miller
lucidimagination.com
Ok, great. Just wanted to make sure someone was aware. Thanks for
looking into this.
On Thu, Feb 16, 2012 at 8:26 AM, Mark Miller wrote:
>
> On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote:
>
>> Not sure if this is
>> expected or not.
>
> Nope - should be already resolved or will be today th
PatternReplaceFilterFactory has no option to select the group to replace.
Is there a reason for this, or could this be a nice feature?
--
View this message in context:
http://lucene.472066.n3.nabble.com/PatternReplaceFilterFactory-group-tp3750201p3750201.html
Sent from the Solr - User mailing l
Hello all:
We'd like to score the matching documents using a combination of SOLR's IR
score with another application-specific score that we store within the
documents themselves (i.e. a float field containing the app-specific
score). In particular, we'd like to calculate the final score doing some
On 16 February 2012 14:33, alessio crisantemi
wrote:
> Hi all,
> I have a problem to configure a pdf indexing from a directory in my solr
> wit DIH:
>
> with this data-config
>
>
>
>
>
> name="tika-test"
> processor="FileListEntityProcessor"
> baseDir="D:\gioconews_archivio\marzo20
Hi Baranee,
Some time ago I played with
http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer - it was
pretty good stuff.
Regards
On Thu, Feb 16, 2012 at 3:53 PM, K, Baraneetharan wrote:
> To avoid that, we don't want to mention the column names in the field tag,
> but want to writ
The slaves will be able to replicate from the master as before, but not
in NRT, depending on your commit interval. The commit interval can be set
higher, as with NRT it is not needed for searches, only for consolidating
the index changes on the master, and it can be an hour or even more. It may be
easier to
I will test it with my big production indexes first, if it works I
will port to Java and add to contrib I think.
On Wed, Feb 15, 2012 at 10:03 PM, Li Li wrote:
> great. I think you could make it a public tool. maybe others also need such
> functionality.
>
> On Thu, Feb 16, 2012 at 5:31 AM, Rober
Hello,
I already posted this question, but for some reason it was attached to a
thread with a different topic.
Is there the possibility of performing an 'exact search' in a payload field?
I have to index text with auxiliary info for each word. In particular, to
each word is associated the bounding box co
I think the problem here is that initially you are trying to create separate
documents for two different tables, while your config is aiming to create
only one document. Here is one solution (not tried by me):
--
You can have multiple documents generated by the same data-config:
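The XML example that followed did not survive in this archive; the idea (with
hypothetical entity and table names) is two root entities under one
<document>, each producing its own stream of documents:

    <document>
      <entity name="users" dataSource="ds1"
              query="SELECT id, name FROM users"/>
      <entity name="tickets" dataSource="ds2"
              query="SELECT id, subject FROM tickets"/>
    </document>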
I'm not sure I follow.
The idea is to have only one document. Do the multiple documents have the
same structure then(different datasources), and if so how are they actually
indexed?
Thanks.
On Thu, Feb 16, 2012 at 4:40 PM, Dmitry Kan wrote:
> I think the problem here is that initially you tryin
Yes, but if I use TikaEntityProcessor, the result of my full-import is:
0
1
0
Indexing failed. Rolled back all changes.
2012/2/16 alessio crisantemi
> Hi all,
> I have a problem to configure a pdf indexing from a directory in my solr
> wit DIH:
>
> with this data-config
>
>
>
>
>
>
Each document in SOLR will correspond to one db record and since both
databases have the same schema, you can't index two records from two
databases into the same SOLR document.
So after indexing, you should have 7k different documents, each of which
holds data from a db record.
Also one problem
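One way to keep the ids from the two databases distinct (a sketch using DIH's
TemplateTransformer; the entity and column names are hypothetical) is to
prefix each id with a per-source marker:

    <entity name="itemDb1" dataSource="ds1" transformer="TemplateTransformer"
            query="SELECT * FROM item">
      <!-- "db1-42" and "db2-42" remain separate documents in the index -->
      <field column="id" template="db1-${itemDb1.ID}"/>
    </entity>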
Hello carlos,
could you show us how your Solr-call looks like?
Regards,
Em
On 16.02.2012 14:34, Carlos Gonzalez-Cadenas wrote:
> Hello all:
>
> We'd like to score the matching documents using a combination of SOLR's IR
> score with another application-specific score that we store within the
>
Really good point on the ids, I completely overlooked that matter.
I will give it a try.
Thanks again.
On Thu, Feb 16, 2012 at 5:00 PM, Dmitry Kan wrote:
> Each document in SOLR will correspond to one db record and since both
> databases have the same schema, you can't index two records from two
no problem, hope it helps, you're welcome.
On Thu, Feb 16, 2012 at 5:03 PM, Radu Toev wrote:
> Really good point on the ids, I completely overlooked that matter.
> I will give it a try.
> Thanks again.
>
> On Thu, Feb 16, 2012 at 5:00 PM, Dmitry Kan wrote:
>
> > Each document in SOLR will corre
Hey everyone,
we're running into some operational problems with our SOLR production
setup here and were wondering if anyone else is affected or has even
solved these problems before. We're running a vanilla SOLR 3.4.0 in
several Tomcat 6 instances, so nothing out of the ordinary, but after
a day o
Hi O.,
PatternReplaceFilter(Factory) uses Matcher.replaceAll() or replaceFirst(), both
of which take in a string that can include any or all groups using the syntax
"$n", where n is the group number. See the Matcher.appendReplacement()
javadocs for an explanation of the functionality and syntax.
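In other words, rather than selecting a group to keep, you phrase the
replacement in terms of groups. An illustrative filter (the pattern and data
are made up):

    <!-- "SKU-12345" becomes "12345": only group 2 survives the replacement -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([A-Z]+)-(\d+)" replacement="$2" replace="all"/>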
Hello Em:
The URL is quite large (w/ shards, ...), maybe it's best if I paste the
relevant parts.
Our "q" parameter is:
"q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)\"",
The subqueries q8, q7, q4 and q3 are regular queries, for example:
"q7
If your script turns out too complex to maintain, and you are developing
in Java, anyway, you could extend EntityProcessor and handle the data in
a custom way. I've done that to transform a datamart like data structure
back into a row based one.
Basically you override the method that gets the data
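A minimal skeleton of that approach against the 3.x DIH API (the class name
and the actual pivoting logic are hypothetical):

    import java.util.Map;
    import org.apache.solr.handler.dataimport.EntityProcessorBase;

    // Pivots datamart-style rows back into one row per logical record.
    public class PivotEntityProcessor extends EntityProcessorBase {
        @Override
        public Map<String, Object> nextRow() {
            Map<String, Object> row = getNext(); // next raw row from the data source
            if (row == null) {
                return null; // no more data for this entity
            }
            // ... merge or reshape columns here before the row is indexed ...
            return row;
        }
    }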
Make sure your Tomcat instances are each started with a max heap size such
that, added up, they stay a lot lower than the complete RAM of your
system.
Frequent garbage collection means that your applications request more
RAM but your Java VM has no more resources, so it requires the Garbage
Collector t
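For example (the sizes are purely illustrative), the cap can be set per
instance in Tomcat's bin/setenv.sh:

    # bin/setenv.sh -- cap the heap of this Tomcat/Solr instance
    export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx2g"

With several instances on one box, the sum of the -Xmx values should leave
room for the OS page cache.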
steve_rowe wrote
>
> Hi O.,
>
> PatternReplaceFilter(Factory) uses Matcher.replaceAll() or replaceFirst(),
> both of which take in a string that can include any or all groups using
> the syntax "$n", where n is the group number. See the
> Matcher.appendReplacement() javadocs for an explanation
On 2/15/2012 11:26 PM, nagarjuna wrote:
Hi all,
I am new to Solr. Can anybody explain delta-import and deltaQuery to me? I
also have the questions below:
1. Is it possible to run delta-import without a deltaQuery?
2. Is it possible to write a deltaQuery without having a last_modifi
Here is the log:
org.apache.solr.handler.dataimport.DataImporter doFullImport
Grave: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' is
a required attribute Processing Document # 1
at
org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileLis
On 16 February 2012 21:37, alessio crisantemi
wrote:
> here the log:
>
>
> org.apache.solr.handler.dataimport.DataImporter doFullImport
> Grave: Full Import failed
> org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' is
> a required attribute Processing Document # 1
[...]
Th
On 2/3/2012 1:12 PM, Shawn Heisey wrote:
Is the following a reasonable approach to setting a connection timeout
with SolrJ?
queryCore.getHttpClient().getHttpConnectionManager().getParams()
.setConnectionTimeout(15000);
Right now I have all my solr server objects sharing
Yes, I read it, but I don't know the cause.
What's more: I work on Windows, so I configured Tika and Solr manually,
because I don't have Maven...
2012/2/16 Gora Mohanty
> On 16 February 2012 21:37, alessio crisantemi
> wrote:
> > here the log:
> >
> >
> > org.apache.solr.handler.dataimport.Data
There may be issues with your solrconfig. Kindly post the exception that you
are receiving.
There is a good example of how to do a delta update using
"command=full-import&clean=false" on the wiki, here:
http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta
This can be advantageous if you are updating a ton of data at once and do not
want it executing as many queries to the
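A concrete request following that FAQ entry would look roughly like this
(host, port, and handler path are the usual defaults, not confirmed here):

    http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true

clean=false keeps the existing documents, so the run only adds or overwrites
whatever the queries return.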
You can enable the spellcheck component and add it to your default request
handler.
This might be of use:
http://wiki.apache.org/solr/SpellCheckComponent
It could be used both during autosuggest as well as did you mean.
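A minimal sketch of that wiring in solrconfig.xml (the component, dictionary
field, and handler names follow the wiki's examples and are assumptions, not
this user's actual setup):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

    <requestHandler name="/select" class="solr.SearchHandler">
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>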
Hi All,
I am using the edismax SearchHandler in my search, and I have some issues with
the search results. As I understand it, if the "defaultOperator" is set to OR,
the search query will implicitly be passed as -> The OR quick OR brown OR fox.
However, if I search for The quick brown fox, I get fewer result
Hello,
I want to copy all data from a multivalued field, joined together, into a
single-valued field.
Is there any opportunity to do this using Solr standards?
kind regards
On Thu, Feb 16, 2012 at 11:35 AM, flyingeagle-de
wrote:
> Hello,
>
> I want to copy all data from a multivalued field joined together in a single
> valued field.
>
> Is there any opportunity to do this by using solr-standards?
There is not currently, but it certainly makes sense.
Anyone know of
I am attempting to execute a query with the following parameters
q=*:*
distrib=true
facet=true
facet.limit=10
facet.field=manu
f.manu.facet.mincount=1
f.manu.facet.limit=10
f.manu.facet.sort=index
rows=10
When doing this I get the following exception
null java.lang.ArrayIndexOutOfBoundsExceptio
Hello Carlos,
well, you must take into account that you are executing up to 8 queries
per request instead of one query per request.
I am not totally sure about the details of the implementation of the
max-function-query, but I guess it first iterates over the results of
the first max-query, after
Please ignore this; it has nothing to do with the faceting component.
I was able to disable a custom component that I had, and it worked
perfectly fine.
On Thu, Feb 16, 2012 at 12:42 PM, Jamie Johnson wrote:
> I am attempting to execute a query with the following parameters
>
> q=*:*
> distrib=tr
Hi Jamie,
what version of Solr/SolrJ are you using?
Regards,
Em
On 16.02.2012 18:42, Jamie Johnson wrote:
> I am attempting to execute a query with the following parameters
>
> q=*:*
> distrib=true
> facet=true
> facet.limit=10
> facet.field=manu
> f.manu.facet.mincount=1
> f.manu.facet.limit
Hi Jamie,
nice to hear.
Maybe you can share what kind of bug you ran into, so that other
developers with similarly buggy components can benefit from your
experience. :)
Regards,
Em
On 16.02.2012 19:23, Jamie Johnson wrote:
> please ignore this, it has nothing to do with the faceting component.
>
Chantal,
if you prefer Java, here is http://wiki.apache.org/solr/DIHCustomTransform
On Thu, Feb 16, 2012 at 7:24 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:
> If your script turns out too complex to maintain, and you are developing
> in Java, anyway, you could extend EntityP
Hello Em:
Thanks for your answer.
Yes, we initially also thought that the excessive increase in response time
was caused by the several queries being executed, and we did another test.
We executed one of the subqueries that I've shown to you directly in the
"q" parameter and then we tested this s
Hello Carlos,
> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own
QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQue
Hello Em:
1) Here's a printout of an example DisMax query (as you can see, mostly MUST
terms, except for some SHOULD terms used for boosting scores for stopwords):
*((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
: > I want to copy all data from a multivalued field joined together in a single
: > valued field.
: >
: > Is there any opportunity to do this by using solr-standards?
:
: There is not currently, but it certainly makes sense.
Part of it has just recently been committed to trunk, actually...
https
https://issues.apache.org/jira/browse/SOLR-3138
On Feb 9, 2012, at 4:16 PM, Jamie Johnson wrote:
> per SOLR-2765 we can add roles to specific cores such that it's
> possible to give custom roles to solr instances, is it possible to
> specify this when adding a core through curl
> 'http://host:por
On Thu, Feb 16, 2012 at 3:37 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:
> Everybody start from daily bounce, but end up with UPDATED_AT column and
> delta updates , just consider urgent content fix usecase. Don't think it's
> worth to rely on daily bounce as a cornerstone of archite
Still digging ;) Once I figure it out I'll be happy to share.
On Thu, Feb 16, 2012 at 1:32 PM, Em wrote:
> Hi Jamie,
>
> nice to hear.
> Maybe you can share in what kind of bug you ran, so that other
> developers with similar bugish components can benefit from your
> experience. :)
>
> Regards,
I have the same problem; it seems that there is a bug in the SolrZkServer
class (parseProperties method) that doesn't work well when you have an external
zookeeper ensemble.
Thanks,
arin
On Thu, Feb 16, 2012 at 3:03 PM, Alexey Verkhovsky
wrote:
>> 5. All Solr caching is switched off.
>
>> But why?
>>
>
> Because (a) I shouldn't need to cache documents, if they are all in memory
> anyway;
You're making many assumptions about how Solr works internally.
One example of many:
Solr
: We'd like to score the matching documents using a combination of SOLR's IR
: score with another application-specific score that we store within the
: documents themselves (i.e. a float field containing the app-specific
: score). In particular, we'd like to calculate the final score doing some
:
The issue appears to be that I put an empty array into the doc scores
instead of null in DocSlice. DocSlice then just checks if scores is
null when hasScore is called which caused a further issue down the
line. I'll follow up with anything else that I find along the way.
On Thu, Feb 16, 2012 at
On Thu, Feb 16, 2012 at 8:34 AM, Carlos Gonzalez-Cadenas
wrote:
> Hello all:
>
> We'd like to score the matching documents using a combination of SOLR's IR
> score with another application-specific score that we store within the
> documents themselves (i.e. a float field containing the app-specifi
On Thu, Feb 16, 2012 at 1:32 PM, Yonik Seeley wrote:
> You're making many assumptions about how Solr works internally.
>
True that. If this spike turns into a project, digging through the source
code will come. Meantime, we have to start somewhere, and the default
configuration may not be the gr
On Thu, Feb 16, 2012 at 4:06 PM, Alexey Verkhovsky
wrote:
> ly need ids, scores and total number of results out of Solr. Presentation of
> selected entities will have to include some write-heavy data (from RDBMS
> and/or memcached), therefore won't be Solr's business anyway.
It depends on if you'
On Feb 16, 2012, at 2:53 PM, arin g wrote:
> i have the same problem, it seems that there is a bug in SolrZkServer class
> (parseProperties method), that doesn't work well when you have an external
> zookeeper ensemble.
>
This issue was around using an embedded ensemble - an external ensemble m
Hello Carlos,
I think we misunderstood each other.
As an example:
BooleanQuery (
  clauses: (
    MustMatch(
      DisjunctionMaxQuery(
        TermQuery("stopword_field", "barcelona"),
        TermQuery("stopword_field", "hoteles")
      )
    ),
    Sh
Hi all,
I was loading a big (60 million docs) CSV into Solr 4 when something odd
happened.
I got a Solr error in the log saying that it could not write the file.
du -s indicated I had used 30GB of the 50GB available, but df -k indicated
that the disk was 100% used.
du and df giving different results c
I just modified some TestCases a little bit to see how the FunctionQuery
behaves.
Given that you have an index containing 14 docs, where 13 of them contain
the term "batman" and two contain the term "superman", a
search for
q=+text:superman _val_:"query($qq)"&qq=text:superman
Leads to two hits
On Thu, Feb 16, 2012 at 5:56 PM, Paulo Magalhaes
wrote:
> I was loading a big (60 million docs) csv in solr 4 when something odd
> happened.
> I got a solr error in the log saying that it could not write the file.
> du -s indicated I had used 30Gb of a 50Gb available but df -k indicated
> that th
I'm not sure that timeout will help you here - I believe it's the timeout on
'creating' the connection.
Try setting the socket timeout (setSoTimeout) - that should let you try
sooner.
It looks like perhaps the server is timing out and closing the connection.
I guess all you can do is timeout reas
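Mirroring the HttpClient 3.x style snippet from earlier in the thread, both
timeouts would be set like this (a sketch; 15 seconds is just the value used
above):

    // connection timeout: limit on establishing the TCP connection
    queryCore.getHttpClient().getHttpConnectionManager().getParams()
            .setConnectionTimeout(15000);
    // socket timeout: limit on waiting for data on an established connection
    queryCore.getHttpClient().getHttpConnectionManager().getParams()
            .setSoTimeout(15000);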
The delta instructions from
https://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command
work for me in Solr 1.4 but crash in 3.5.0 (error: "deltaQuery has no
column to resolve to declared primary key pk='ITEM_ID, CATEGORY_ID'" issue:
https://issues.apache.org/jira/browse/SOLR-2907
On 2/16/2012 6:28 PM, Mark Miller wrote:
I'm not sure that timeout will help you here - I believe it's the timeout on
'creating' the connection.
Try setting the socket timeout (setSoTimeout) - that should let you try
sooner.
It looks like perhaps the server is timing out and closing the connecti
Hi,
I'm looking for a way to sort results by the number of matching terms.
Being able to sort by the coord() value or by the overlap value that gets
passed into the coord() function would do the trick. Is there a way I can
expose those values to the sort function?
I'd appreciate any help that poi
On 2/16/2012 6:31 PM, AdamLane wrote:
The delta instructions from
https://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command
works for me in solr 1.4 but crashes in 3.5.0 (error: "deltaQuery has no
column to resolve to declared primary key pk='ITEM_ID, CATEGORY_ID'" issue:
https:/
You can fool the Lucene scoring function: override each function, such as idf,
queryNorm, and lengthNorm, and let them simply return 1.0f.
I don't know whether Lucene 4 will expose more details, but for 2.x/3.x Lucene
can only score with the vector space model, and the formula can't be replaced
by users.
On Fri, Feb 17, 2012
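A sketch of that idea against the Lucene 3.x API (in 3.2+ the length norm is
produced by computeNorm rather than lengthNorm; registering the class takes a
<similarity class="..."/> element in schema.xml):

    import org.apache.lucene.index.FieldInvertState;
    import org.apache.lucene.search.DefaultSimilarity;

    // Neutralizes idf, queryNorm and the length norm so they contribute
    // nothing, leaving term frequency as the dominant scoring signal.
    public class FlatSimilarity extends DefaultSimilarity {
        @Override
        public float idf(int docFreq, int numDocs) {
            return 1.0f;
        }

        @Override
        public float queryNorm(float sumOfSquaredWeights) {
            return 1.0f;
        }

        @Override
        public float computeNorm(String field, FieldInvertState state) {
            return state.getBoost(); // drop length normalization, keep boosts
        }
    }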
Here’s my use case. I expect to set up a Solr index that is approximately
1.4GB (this is a real number from the proof-of-concept using the real data,
which consists of about 10 million documents, many of significant size, and
making use of the FastVectorHighlighter to do highlighting on the body te
A couple of thoughts:
We wound up doing a bunch of tuning on the Java garbage collection.
However, the pattern we were seeing was periodic very extreme slowdowns,
because we were then using the default garbage collector, which blocks
when it has to do a major collection. This doesn't sound like yo
Yup - deletes are fine.
On Thu, Feb 16, 2012 at 8:56 PM, Jamie Johnson wrote:
> With solr-2358 being committed to trunk do deletes and updates get
> distributed/routed like adds do? Also when a down shard comes back up are
> the deletes/updates forwarded as well? Reading the jira I believe the
I want to leave the score intact so I can sort by matching term frequency
and then by score. I don't think I can do that if I modify all the
similarity functions, but I think your solution would have worked otherwise.
It would be great if there was a way I could expose this information
through a f
So suppose I have a multivalued field for categories. Let's say we have 3
items with these categories:
Item 1: category ids [1,2,5,7,9]
Item 2: category ids [4,8,9]
Item 3: category ids [1,4,9]
I now run a filter query for any of the following category ids [1,4,9]. I
should get all of them back a
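Assuming the multivalued field is named category_ids (a hypothetical name),
that filter query would look like:

    q=*:*&fq=category_ids:(1 OR 4 OR 9)

A multivalued field matches when any one of its values matches a clause, and a
document is returned at most once no matter how many of its values match, so
all three items come back exactly once.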
If I want to write a complex UpdateRequestHandler should I do it on
trunk or the 3.x branch? The criteria are a stable, debugged,
full-featured environment.
--
Lance Norskog
goks...@gmail.com
Thanks Em!
What if we use a threshold value in the suggest configuration, like 0.005?
I assume the dictionary size will then be smaller than the total number of
distinct terms; is there any way to determine what that size is?
Thanks,
Mike
On Wednesday, February 15, 2012 at 4:39 PM, Em
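For reference, that threshold goes into the suggester's spellchecker block in
solrconfig.xml (the component and field names here are assumptions following
the wiki's Suggester examples):

    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="field">suggest_field</str>
        <!-- a term must occur in at least 0.5% of documents to enter the dictionary -->
        <float name="threshold">0.005</float>
      </lst>
    </searchComponent>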
> One thing that could fit the pattern you describe would be Solr caches
> filling up and getting you too close to your JVM or memory limit
This [uncommitted] issue would solve that problem by allowing the GC
to collect caches that become too large, though in practice, the cache
setting would need