On Thu, Aug 12, 2010 at 5:42 AM, harrysmith wrote:
>
> To follow up on my own question, it appears this is only an issue when
> using
> the DataImport console debugging tools. It looks like when submitting the
> debugging request, the data-config.xml is sent via a GET request, which
> would fail.
Try to define the image Solr fields <-> DB columns mapping explicitly in
the "image" entity, i.e.
See
http://www.lucidimagination.com/search/document/c8f2ed065ee75651/dih_and_multivariable_fields_problems
On Thu, Aug 12, 2010 at 2:30 AM, Manali Joshi wrote:
> I tried making the schema
Hi,
When indexing large amounts of data I hit a problem whereby Solr
becomes unresponsive
and doesn't recover (even when left overnight!). I think I've hit some
GC problems (some GC tuning is required) and I wanted to know if anyone
has ever hit this problem.
I can replicate this error (albeit taking
Thanks - Splunk looks like overkill.
We're extremely small scale - we're hoping for something open source :-)
- Original Message
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
Sent: Wed, August 11, 2010 11:14:37 PM
Subject: Re: Analysing SOLR logfiles
Have a look at www.splunk
Hi Robert!
> Since the example given was "http" being slow, it's worth mentioning that if
> queries are "one word" urls [for example http://lucene.apache.org] these
> will actually form slow phrase queries by default.
>
Do you mean that http://lucene.apache.org will be split up into "http
luce
Hi Tom,
I tried again with:
and even now the hitratio is still 0. What could be wrong with my setup?
('free -m' shows that the cache has over 2 GB free.)
Regards,
Peter.
> Hi Peter,
>
> Can you give a few more examples of slow queries?
> Are they phrase queries? Boolean queries? prefix or
Hi Tom!
> Hi Peter,
>
> Can you give a few more examples of slow queries?
> Are they phrase queries? Boolean queries? prefix or wildcard queries?
>
I am experimenting with one word queries only at the moment.
> If one word queries are your slow queries, than CommonGrams won't help.
> Comm
We've just started using awstats, as suggested by the Solr 1.4 book.
It's open source!:
http://awstats.sourceforge.net/
On 12 August 2010 18:18, Jay Flattery wrote:
> Thanks - Splunk looks like overkill.
> We're extremely small scale - we're hoping for something open source :-)
>
>
> - Original Me
exactly!
On Thu, Aug 12, 2010 at 5:26 AM, Peter Karich wrote:
> Hi Robert!
>
> > Since the example given was "http" being slow, it's worth mentioning that
> if
> > queries are "one word" urls [for example http://lucene.apache.org] these
> > will actually form slow phrase queries by default.
> >
On 05/08/2010 09:59, Raphaël Droz wrote:
Hi,
I saw this post :
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html
I didn't see work in progress or plans about this feature on the list
and bugtracker.
Has someone already created a patch, pof, ...? I wouldn't have been
able
Hi,
I'm having OOME (OutOfMemoryError) problems with Solr. From random browsing
I'm getting an impression that a lot of memory fixes happened
recently in solr and lucene.
Could you give me a quick summary how (un)stable are different
lucene / solr branches and how much improvement I can expect?
I'm surprised, too, that there isn't a dedicated tool which analyzes Solr
logfiles (e.g. parses QTime and the parameters q, fq, ...),
because there are some other open source log analyzers out there:
http://yaala.org/ http://www.mrunix.net/webalizer/
Another free tool is newrelic.com (you will submit
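Lacking a dedicated tool, a few lines of code go a long way. A minimal sketch in Java (the log-line shape below is an assumption modeled on typical Solr 1.4 request logs; adjust the regex to your servlet container's actual format):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SolrLogParser {
    // Assumed request-log shape:
    //   ... path=/select params={q=foo&fq=type:bar&rows=10} hits=5 status=0 QTime=42
    private static final Pattern LINE =
        Pattern.compile("params=\\{([^}]*)\\}.*?QTime=(\\d+)");

    /** Returns the request parameters plus QTime, or null if the line doesn't match. */
    public static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) return null;
        Map<String, String> out = new HashMap<>();
        for (String pair : m.group(1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) out.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        out.put("QTime", m.group(2));
        return out;
    }
}
```

From there it is easy to aggregate QTime percentiles or count the most frequent q/fq values.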
Hi all,
The indexing part of Solr is going well, but I got an error indexing
a single PDF file. When I searched for the error in the mailing list I found
that the error was due to the copyright of that file. Can't we index a file
which has copyrights or any digital rights?
regards,
satya
Hi,
I'm trying to index a txt file (~150MB) using Solr Cell/Tika.
The curl command aborts due to a java.lang.OutOfMemoryError.
*
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
(10/08/12 21:06), Tomasz Wegrzanowski wrote:
Hi,
I'm having oome problems with solr. From random browsing
I'm getting an impression that a lot of memory fixes happened
recently in solr and lucene.
Could you give me a quick summary how (un)stable are different
lucene / solr branches and how much
One way I've handled this, and it works only for some types of data,
is to put the searchable part of the sub-doc in a search field
(indexed=true) and put an xml or json representation of the sub-doc in a
stored only field. Then if the main doc is hit via search I can grab the xml
or json,
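A sketch of what that schema arrangement might look like (field names are hypothetical, not from the original message):

```xml
<!-- Searchable text extracted from the sub-document: indexed, not stored. -->
<field name="subdoc_text" type="text" indexed="true" stored="false" multiValued="true"/>
<!-- Raw XML/JSON representation of the sub-document: stored only, returned with the hit. -->
<field name="subdoc_raw" type="string" indexed="false" stored="true" multiValued="true"/>
```

The search hits match on `subdoc_text`, and the application then deserializes `subdoc_raw` from the stored response.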
I am a little confused - how did 180k documents become 100m index documents?
We have over 20 indices (for different content sets), one with 5m
documents (about a couple of pages each) and another with 100k+ docs.
We can index the 5m collection in a couple of days (limitation is in
the source) w
On 12 August 2010 13:46, Koji Sekiguchi wrote:
> (10/08/12 21:06), Tomasz Wegrzanowski wrote:
>>
>> Hi,
>>
>> I'm having oome problems with solr. From random browsing
>> I'm getting an impression that a lot of memory fixes happened
>> recently in solr and lucene.
>>
>> Could you give me a quick su
On Thu, 12 Aug 2010 14:32:19 +0200
Lannig Carina wrote:
> Hi,
>
> I'm trying to index a txt-File (~150MB) using Solr Cell/Tika.
> The curl command aborts due to a java.lang.OutOfMemoryError.
[...]
> AFAIK Tika keeps the whole file in RAM and posts it as one single
> string to Solr. I'm using JVM
Sorry -- I used the term "documents" too loosely!
180k scientific articles with between 500-1000 sentences each,
and we index sentence-level index documents,
so I'm guessing about 100 million Lucene index documents in total.
An update on my progress:
I used GC settings of:
-XX:+UseConcMarkSweepGC
I'm doing deletes with the DIH but getting mixed results. Sometimes the
documents get deleted, other times I can still find them in the index. What
would prevent a doc from getting deleted?
For example, I delete 594039 and get this in the logs;
2010-08-12 14:41:55,625 [Thread-210] INFO [DataImp
I wrote a simple Java program to import a PDF file. I can get a result when I
do a *:* search from the admin page. I get nothing if I search a word. I wonder
if I did something wrong or missed setting something.
Here is part of result I get when do *:* search:
*
To help you we need the description of your fields in your schema.xml and
the query that you do when you search only a single word.
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42
2010/8/12 Ma, Xiao
Thanks so much. I didn't know how to make any changes in schema.xml for PDF
files. I used Solr's default schema.xml. Please tell me what I need to do in
schema.xml.
The simple Java program I use is below. I also attached the PDF file. I
really appreciate your help!
*
Please excuse this newbie question, but:
I want to upgrade Solr, but not to the latest version in
trunk (because there are so many changes that I would have to test against,
and modify my custom classes for, and behavior changes, and the
Lucene index change to deal with, etc.).
My t
1) I assume you are doing batching interspersed with commits
2) Why do you need sentence level Lucene docs?
3) Are your custom handlers/parsers part of the Solr JVM? I would not be
surprised if you have a memory/connection leak there (or something is not
releasing some resource explicitly)
In general, we have NEV
Another option is the 3x branch - that should still be able to read
indexes from Solr 1.4/Lucene 2.9
I personally don't expect a 1.5 release to ever materialize.
There will eventually be a Lucene/Solr 3.1 release off of the 3x
branch, and a Lucene/Solr 4.0 release off of trunk.
-Yonik
http://www.l
no help ? =(
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Doc-Lucene-Doc-tp995922p1114172.html
Sent from the Solr - User mailing list archive at Nabble.com.
Thanks Yonik but
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt
says that the lucene index has changed
"
Upgrading from Solr 1.4
--
* The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no lo
hi,
> 1) I assume you are doing batching interspersed with commits
As each file I crawl is article-level, each contains all the
sentences for the article, so they are naturally batched into about
500 documents per post in LCF.
I use auto-commit in Solr:
50
90
>
On Thu, Aug 12, 2010 at 12:24 PM, solr-user wrote:
> Thanks Yonik but
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt
> says that the lucene index has changed
Right - but it will be able to read your older index.
Do you need Solr 1.4 to be able to read the new ind
Short summary:
Is there any way I can specify that I want a lot
of phrase slop for the "pf" parameter, but none
at all for the "pf2" parameter?
I find the 'pf' parameter with a pretty large 'ps' to do a very
nice job for providing a modest boost to many documents that are
quite well rela
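For what it's worth, later Solr releases added per-field phrase slop for the shingled phrase fields (`ps2`/`ps3` on edismax); assuming a version that supports them, the request would look something like this (field names illustrative):

```
defType=edismax&qf=text&pf=text^10&ps=100&pf2=text^5&ps2=0
```

Here `ps=100` gives the full-phrase boost generous slop while `ps2=0` requires the bigram phrases to be exact.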
No, once upgraded I wouldn't need to have an older Solr read the indexes.
I misunderstood the note.
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1115694.html
Sent from the Solr - User mailing list arch
Does anyone know if I need to define fields in schema.xml for indexing PDF files?
If I do, please tell me how.
I defined fields in schema.xml and created a data-configuration file using
XPath for XML files. Would you please tell me if I need to do the same for PDF
files, and how?
T
Hi Peter,
If hits aren't showing up, and you aren't getting any queryResultCache hits
even with the exact query being repeated, something is very wrong. I'd suggest
first getting the query result cache working, and then moving on to look at
other possible bottlenecks.
What are your settings
Maybe this helps:
http://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
Cheers,
Stefan
Am 12.08.2010 19:45, schrieb Ma, Xiaohui (NIH/NLM/LHC) [C]:
Does anyone know if I need define fields in schema.xml for indexing pdf files?
If I need, please tell me how I can do
Are you just trying to learn the tiny details of how Solr and DIH work? Is
this just an intellectual curiosity? Or are you having some specific problem
that you are trying to solve? If you have a problem, could you describe the
symptoms of the problem? I am using Solr, DIH, and several other relat
Hello users,
I tried to get results from more than one core,
but I don't know how.
Maybe you have an idea?
I need it in PHP.
King
Thanks so much for your help! I defined a dynamic field in schema.xml as
follows:
But I wonder what I should put for .
I really appreciate your help!
-Original Message-
From: Stefan Moises [mailto:moi...@shoptimax.de]
Sent: Thursday, August 12, 2010 1:58 PM
To: solr-user@lucene.ap
I'm writing a little thesis about this, and I need to know how Solr uses
Lucene - in which way, for example when using DIH and searching. Just for my
better understanding... ;-)
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Doc-Lucene-Doc-tp995922p1118089.html
Sent from t
Thanks so much. I got it working now. I really appreciate your help!
Xiaohui
-Original Message-
From: Stefan Moises [mailto:moi...@shoptimax.de]
Sent: Thursday, August 12, 2010 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files
Maybe this helps:
http://www.packtpub.com/a
I was looking at the ability to sort by function that was added to Solr.
For the most part it seems to work. However, Solr doesn't seem to like
sorting by certain functions.
For example, this sum works:
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(1,Latitude,Longitude,sum(Latitude,Longitu
Small typo in my last email: the second sum should have been hsin, but I notice
that the problem also occurs when I leave it as sum.
--
View this message in context:
http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1118260.html
Sent from the Solr - User mailing list arc
I'm attempting to make use of PatternReplaceCharFilterFactory, but am running
into issues on both 1.4.1 ( I ported it) and on nightly (4.0-2010-07-27). It
seems that on a real query the charFilter isn't executed prior to the
tokenizer.
I modified the example configuration included in the dis
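For reference, a char filter has to be declared inside the analyzer, before the tokenizer, and for both the index and query chains if you want it applied at query time too. A minimal sketch (field type name and pattern are illustrative only, not from the original report):

```xml
<fieldType name="text_repl" class="solr.TextField">
  <analyzer>
    <!-- Applied to the raw character stream, before tokenization. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(\d+)-(\d+)" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If separate `<analyzer type="index">` and `<analyzer type="query">` blocks are used, the charFilter must appear in each one, or queries will bypass it.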
Hello,
I'm customizing my XML response using the XSLTResponseWriter with
"&wt=xslt&tr=transform.xsl". Because I have a few use-cases to support, I
wanted to break up the common bits and import/include them from multiple top
level xslt files, but it appears that the base directory of the tran
Hi,
I am new to text search and mining and have been doing research for
different available products. My application requires reading a SMS message
(unstructured) and finding out entities such as person name, area, zip ,
city and skills associated with the person. SMS would be in form of free
text.
I got the following error when I index some PDF files. I wonder if anyone has
had this issue before and how to fix it. Thanks so much in advance!
***
Error 500
HTTP ERROR: 500org.apache.tika.exception.TikaException:
Unexpected RuntimeException from org.apache.tik
Here's perhaps the coolest webinar we've done to date, IMO :)
I attended Tyler's presentation at Lucene EuroCon* and thoroughly
enjoyed it. Search UI/UX is a fascinating topic to me, and really
important to do well for the applications most of us are building.
I'm pleased to pass along the
The problem could be related to some oddity in sum()? Some more examples
(note: Latitude and Longitude are fields of type double):
works:
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(1,1.0))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(Latitude,Latitude)%20asc
http://10.0.11.54:8
Solr is a search engine, not an entity extraction tool.
While there are some decent open source entity extraction tools, they are
focused on processing sentences and paragraphs. The structural differences in
text messages means you'd need to do a fair amount of work to get decent entity
extrac
I tried some time ago to use SOLR-788. Ultimately I was able to get
both patch versions to apply (separately), but neither worked. The
suggestion I received when I commented on the issue was to download the
specific release mentioned in the patch and then update, but the patch
was created be
Issue resolved. The problem was that solr.war was silently not being
overwritten by the new version.
I will try to spend more time debugging before posting.
--
View this message in context:
http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1121349.html
Sent from the Solr -
On 8/11/2010 3:27 PM, JohnRodey wrote:
1) Is there any information on preferred maximum sizes for a single solr
index. I've read some people say 10 million, some say 80 million, etc...
Is there any official recommendation or has anyone experimented with large
datasets into the tens of billions?
Try this,
http://viewer.opencalais.com/
They have an open API for that data. With your text message of :
"John Mayer Mumbai 411004 Juhu, car driver, also capable of body guard"
It gives back:
People: John Mayer Mumbai
Positions: body guard, car driver.
It's not perfect but it's not bad eithe
Hey thanks Stanislaw! I'm going to try this against the current trunk
tonight and see what happens.
Matt
On Wed, Jul 28, 2010 at 8:41 AM, Stanislaw Osinski <
stanislaw.osin...@carrotsearch.com> wrote:
> > The patch should also work with trunk, but I haven't verified it yet.
> >
>
> I've just add
Hey all,
I am doing a search on hierarchical data, and I have a hard time
getting my head around the following problem.
I want a result as follows, in one single query only:
USA (3)
> California (2)
> Arizona (1)
Europe (4)
> Norway (3)
>> Oslo (3)
> Sweden (1)
How it looks in the XML/JSON resp
: I'm trying to match "Apple 2" but not "Apple2" using phrase search, this is
why I have it quoted.
: I was under the impression --when I use phrase search-- all the
: analyzer magic would not apply, but it is!!! Otherwise, how would I
: search for a phrase?!
well .. yes ... even with phras
: please help - how can I calculate queryresultcache size (how much RAM should
: be dedicated for that). I have 1,5 index size, 4 mio docs.
: QueryResultWindowSize is 20.
: Could I use "expire" property on the documents in this cache?
There is no "expire" property, items are automatically removed f
: I'm trying to extend the writer used by solrj
: (org.apache.solr.response.BinaryResponseWriter), i have declared it in
...
: I see that it is initialized, but when i try to set the 'wt' param to
: 'myWriter'
:
: solrQuery.setParam("wt","myWriter"), nothing happen, it's still using the
:
Collection myFL =
searcher.getReader().getFieldNames(IndexReader.FieldOption.ALL);
will return all fields in the schema (i.e. indexed, stored, and
indexed+stored).
Collection myFL =
searcher.getReader().getFieldNames(IndexReader.FieldOption.INDEXED );
likely returns all fields that are indexed (I
Thanks Alexey. That solved the issue. I am now able to get all images
information in the index.
On Thu, Aug 12, 2010 at 12:47 AM, Alexey Serba wrote:
> Try to define image solr fields <-> db columns mapping explicitly in
> "image" entity, i.e.
>
>
>
>
>
>
>
> See
> http://www.luci
: Is it possible to duplicate a core? I want to have one core contain only
: documents within a certain date range (ex: 3 days old), and one core with
: all documents that have ever been in the first core. The small core is then
: replicated to other servers which do "real-time" processing on it
Hi there,
I have a problem querying Solr for a specific field with a query string that
contains spaces. I added the following lines in the schema.xml to add my own
defined fields. The fields are: ap_name, ap_address, ap_dob, ap_desg, ap_sec.
Since all these fields begin with ap_, I included the th
: Furthermore, I would like to add its not just the highlight matches
: functionality that is horribly broken here, but the output of the analysis
: itself is misleading.
:
: lets say i take 'textTight' from the example, and add the following synonym:
:
: this is broken => broke
:
: the query t
On Thu, Aug 12, 2010 at 7:55 PM, Chris Hostetter
wrote:
>
>
> You say it's bogus because the qp will divide on whitespace first -- but
> you're assuming you know what query parser will be used ... the "field"
> query parser (to name one) doesn't split on whitespace first. That's my
> point: analys
: It returns in around a second. When I execute the attached code it takes just
: over three minutes. The optimal for me would be able get closer to the
: performance I'm seeing with curl using Solrj.
I think your problem may be that StreamingUpdateSolrServer buffers up
commands and sends them
: > You say it's bogus because the qp will divide on whitespace first -- but
: > you're assuming you know what query parser will be used ... the "field"
: > query parser (to name one) doesn't split on whitespace first. That's my
: > point: analysis.jsp doesn't make any assumptions about what quer
:
: I'm trying to set different autocommit settings to 2 separate request
: handlers...I would like a requesthandler to use an update handler and a
: second requesthandler use another update handler...
:
: can I have more than one update handler in the same solrconfig?
: how can I configure a req
We were able to get the hierarchy faceting working with a work around
approach.
e.g. if you have Europe//Norway//Oslo as an entry
1. Create a new multivalued field with string type
2. Index the field for "Europe//Norway//Oslo" with values
0//Europe
1//Europe//Norway
2//Europe//Norway//Oslo
3
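The token-building step for those depth-prefixed paths can be sketched as follows (a hypothetical helper, not part of Solr; it assumes `//` as the path separator used in the example):

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchyTokens {
    /**
     * Expands a path like "Europe//Norway//Oslo" into depth-prefixed tokens:
     * ["0//Europe", "1//Europe//Norway", "2//Europe//Norway//Oslo"].
     * Each token is then indexed into the multivalued string field.
     */
    public static List<String> tokens(String path) {
        String[] parts = path.split("//");
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) prefix.append("//");
            prefix.append(parts[i]);
            out.add(i + "//" + prefix);
        }
        return out;
    }
}
```

Faceting on that field with a prefix such as `facet.prefix=1//Europe` then returns exactly the children of Europe with their counts.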
: >
: > That should still be true in the official 4.0 release (I really should
: > have said "When 4.0 can no longer read Solr 1.4 indexes"), ...
: > I haven't been following the details closely, but I suspect that tool
: > hasn't been written yet because there isn't much point until the full
:
We pretty much had the same issue, ended up customizing the ExtendedDismax
code.
In your case it's just a change of a single line
addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
tiebreaker, pslop);
to
addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
Please add a JIRA issue for this.
https://issues.apache.org/jira/secure/BrowseProject.jspa
On Tue, Aug 10, 2010 at 6:59 PM, kenf_nc wrote:
>
> Glad I could help. I also would think it was a very common issue. Personally
> my schema is almost all dynamic fields. I have unique_id, content,
> last_u
Please add a JIRA issue for this.
On Wed, Aug 11, 2010 at 6:24 AM, Sascha Szott wrote:
> Sorry, there was a mistake in the stack trace. The correct one is:
>
> SEVERE: Full Import failed
> org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir'
> value: /home/doe/foo is not a dir
On Thu, Aug 12, 2010 at 8:07 PM, Chris Hostetter
wrote:
>
> : > You say it's bogus because the qp will divide on whitespace first --
> but
> : > you're assuming you know what query parser will be used ... the "field"
> : > query parser (to name one) doesn't split on whitespace first. That's
> my
: > Does not return document as expected:
: > id:1234 AND (-indexid:1 AND -indexid:2) AND -indexid:3
: >
: > Has anyone else experienced this? The exact placement of the parens isn't
: > key, just adding a level of nesting changes the query results.
...
: I could be wrong but I think this
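For context, with the standard Lucene query parser a parenthesized sub-clause containing only negated terms matches nothing on its own; the usual workaround is to anchor the negations with a match-all query:

```
id:1234 AND (*:* -indexid:1 -indexid:2) AND -indexid:3
```

The `*:*` gives the sub-clause a positive set to subtract from, so the nesting no longer changes the results.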
: Subject: index pdf files
: References:
: <4c63ed43.4030...@r.email.ne.jp>
:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh
: Subject: Indexing large files using Solr Cell causes OutOfMemory error
: References:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a
There was a major Lucene change in filter handling from Solr 1.3 to
Solr 1.4. They are much much faster in 1.4. Really Lucene 2.4.1 to
Lucene 2.9.2. The filter is now consulted much earlier in the search
process, thus weeding out many more documents early.
It sounds like in Solr 1.3, you should on
: Subject: PDF file
: References: <20100729152139.321c4...@ibis>
:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even
: In-Reply-To:
: References:
:
: Subject: In multicore env, can I make it access core0 by default
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a f
: Subject: hl.usePhraseHighlighter
: References: <1281125904548-1031951.p...@n3.nabble.com>
: <960560.55971...@web52904.mail.re2.yahoo.com>
: In-Reply-To: <960560.55971...@web52904.mail.re2.yahoo.com>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When startin
This is probably true about Luke. The trunk has a new Lucene format
and does not read any previous format. The trunk is a busy code base.
The 3.1 branch is slated to be the next Solr release, and is probably
a better base for your testing. Best of all is to use the Solr 1.4.1
binary release.
On W
Which version of Solr is this? How many documents are there in the
index? Etc. It is hard for us to help you without more details.
On Thu, Aug 12, 2010 at 8:32 AM, Qwerky wrote:
>
> I'm doing deletes with the DIH but getting mixed results. Sometimes the
> documents get deleted, other times I can
Can you provide more details? What is the error you're receiving?
What do you "think" is going on?
It might be helpful if you reviewed:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Thu, Aug 12, 2010 at 8:21 AM, satya swaroop wrote:
> Hi all,
> The indexing part of solr is
There is no information to go on here. Please review
http://wiki.apache.org/solr/UsingMailingLists
and add some more details...
Best
Erick
On Thu, Aug 12, 2010 at 2:09 PM, Jörg Agatz wrote:
> Hallo Users...
>
> I tryed to get results from more then one Cores..
> But i dont know how..
>
> Maby y
You'll get a lot of insight into what's actually happening if you append
&debugQuery=true to your queries, or check the "debug" checkbox
in the solr admin page.
But I suspect (and it's a guess since you haven't included your schema)
that your problem is that you're mixing explicit and default fiel
On Thu, Aug 12, 2010 at 8:29 PM, Chris Hostetter
wrote:
>
> It was a big part of the proposal regarding the creation of the 3x
> branch ... that index format compatibility between major versions would
> no longer be supported by silently converting on first write -- instead
> there would be
Win XP, Solr 1.4.1 out-of-the-box install, using Jetty. If I add a greater-than
or less-than (i.e. < or >) in any XML field and attempt to load or run from
the DataImport console, I receive a SAXParseException. Example follows:
If I don't have a 'less than' it works just fine. I know this must work,
be
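The usual fix: a raw `<` is illegal inside an XML attribute value, so comparison operators in data-config.xml queries must be escaped as entities (table and column names below are made up):

```xml
<entity name="item"
        query="SELECT id, name FROM items WHERE price &lt; 100 AND stock &gt; 0"/>
```

The XML parser unescapes `&lt;`/`&gt;` before the query string reaches the JDBC driver, so the database still sees plain `<` and `>`.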
I tried ap_address:(tom+cruise) and that worked. I am sure it's the same
problem as you suspected!
Thanks a lot Erick(& users!) for your time.
Moiz
On Thu, Aug 12, 2010 at 8:51 PM, Erick Erickson wrote:
> You'll get a lot of insight into what's actually happening if you append
> &debugQuery=true
On Thu, Aug 12, 2010 at 7:05 AM, Girish wrote:
> Hi,
>
> I did a load of the data with DIH, and now the data is loaded. I want to
> load the records dynamically as and when I receive them.
>
> Use cases:
>
> 1. I did load of 7MM records and now everything is working fine.
> 2. A new record is re