I did some research on the schema and the DIH config file, and I created my
own DIH. I'm getting this error when I run it:
[DIH status response, collapsed by the archive. Recoverable details: command
full-import, config file try.xml, status idle, time taken 0:0:0.163, and the
failure message "Indexing failed. Rolled back all changes." at 2011-01-25
13:56:48, followed by the standard note that this response format is
experimental.]
Hey Eric,
On Mon, Jan 24, 2011 at 7:23 PM, Eric Angel wrote:
> * Or you can typecast before you concat:
Casting before or after concatenating: both work, as we've seen two weeks
ago in a similar thread (
http://search.lucidimagination.com/search/document/250975238eaeb9e0/solr_4_0_spat
Rich,
I played around for a few minutes with ScriptTransformers, but I don't have
enough knowledge to get anything done right now :/
My idea was: loop over the given row, which should be a Java HashMap or
something like that, and do something like this (pseudo-code):
var row_data = []
for( var key in row ) { ... }
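A runnable sketch of that idea as a DIH ScriptTransformer, assuming the row
arrives as a java.util.Map and the entity declares
transformer="script:transform" (entity, file, and column names here are
illustrative):

  <dataConfig>
    <dataSource type="FileDataSource" />
    <script><![CDATA[
      function transform(row) {
        // row is a Java Map, so iterate its keys via the Java API
        var keys = row.keySet().toArray();
        for (var i = 0; i < keys.length; i++) {
          var key = keys[i];
          // example: mirror every value into a prefixed extra column
          row.put('copy_' + key, row.get(key));
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="line" processor="LineEntityProcessor"
              url="sample.txt" transformer="script:transform">
        <field column="rawLine" name="raw_line" />
      </entity>
    </document>
  </dataConfig>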
Cam,
the examples with the provided inline documentation should help you, no?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
The backslash \ in that context is an escape character, to keep the
=> from being interpreted as the mapping ("assign") operator.
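For example, a synonyms.txt entry along these lines (the tokens are
illustrative):

  # maps the literal token "a=>a" onto "b=>b"; without the backslashes,
  # each => would itself be read as the mapping operator
  a\=>a => b\=>b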
Regards
Stef
Hi,
I am facing performance issues in three types of queries (and their
combination). Some of the queries take more than 2-3 mins. Index size is
around 150GB.
- Wildcard
- Proximity
- Phrases (with common words)
I know CommonGrams and Stop words are a good way to resolve such issues bu
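For reference, a hedged sketch of how CommonGrams is typically wired into an
index-time analysis chain (type and file names illustrative):

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- glues common words to their neighbours (e.g. "the_quick"),
           making phrase queries containing stop words much cheaper -->
      <filter class="solr.CommonGramsFilterFactory"
              words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>

The query side would then use CommonGramsQueryFilterFactory so that only the
grams are searched.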
Caused by: org.xml.sax.SAXParseException: Element type "field" must be
followed by either attribute specifications, ">" or "/>".
Sounds like invalid XML in your dataimport config?
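That particular SAXParseException usually means a <field ...> tag is opened
but never closed; in a DIH config each field element needs to be
self-closing. An illustrative before/after:

  <!-- broken: the tag is never closed -->
  <field column="id" name="id"

  <!-- fixed -->
  <field column="id" name="id" />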
On Tue, Jan 25, 2011 at 5:41 AM, Dinesh wrote:
>
> http://pastebin.com/tjCs5dHm
>
> this is the log produced by t
Yeah, even after correcting it, it is still throwing an exception.
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
On Tue, Jan 25, 2011 at 10:05 AM, Dinesh wrote:
>
> http://pastebin.com/CkxrEh6h
>
> this is my sample log
[...]
And, which portions of the log text do you want to preserve?
Does it go into Solr as a single error message, or do you want
to separate out parts of it?
Regards,
Gora
i want to take the month, time, DHCPMESSAGE, from_mac, gateway_ip, net_ADDR
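A hedged sketch of how those fields could be pulled out with DIH's
RegexTransformer, assuming LineEntityProcessor feeds one log line at a time
(the regexes are placeholders and would need adapting to the real log
format):

  <entity name="line" processor="LineEntityProcessor"
          url="sample.txt" transformer="RegexTransformer">
    <!-- LineEntityProcessor exposes each line as the rawLine column -->
    <field column="month"       regex="^(\S+)"        sourceColName="rawLine" />
    <field column="time"        regex="^\S+\s+(\S+)"  sourceColName="rawLine" />
    <field column="DHCPMESSAGE" regex="(DHCP\S+)"     sourceColName="rawLine" />
    <field column="from_mac"    regex="from\s+(\S+)"  sourceColName="rawLine" />
  </entity>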
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2327738.html
this thread explains my problem
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
On Tue, Jan 25, 2011 at 11:44 AM, Dinesh wrote:
>
> I don't even know whether the regex expression that I'm using for my log is
> correct or not...
If it is the same try.xml that you posted earlier, it is very likely not
going to work. You seem to have just cut and pasted entries from
the Hathi Tru
No, I actually changed the directory to mine, where I stored the log files: it
is /home/exam/apa..solr/example/exampledocs
I specified it in the Solr schema. I created a DataImportHandler for that in
try.xml, and in it I changed the file name to sample.txt.
That new try.xml is:
http://pastebin
Hi,
I posted a question in November last year about indexing content from
multiple binary files into a single Solr document and Jayendra responded
with a simple solution to zip them up and send that single file to Solr.
I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't
curre
Hi All,
I need to index the documents present in my file system at various
locations (e.g. C:\docs, D:\docs).
Is there any way I can specify this in my DIH
configuration?
Here is my configuration:
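A hedged sketch of one way to cover several roots: one FileListEntityProcessor
entity per location (names illustrative; the inner entity that actually parses
each file is left out):

  <dataConfig>
    <document>
      <!-- one entity per root directory, recursing into subfolders -->
      <entity name="docsC" processor="FileListEntityProcessor"
              baseDir="C:/docs" fileName=".*" recursive="true"
              rootEntity="false">
        <!-- child entity that reads ${docsC.fileAbsolutePath} goes here -->
      </entity>
      <entity name="docsD" processor="FileListEntityProcessor"
              baseDir="D:/docs" fileName=".*" recursive="true"
              rootEntity="false">
        <!-- child entity that reads ${docsD.fileAbsolutePath} goes here -->
      </entity>
    </document>
  </dataConfig>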
There seems to be a bug with the current 1.4.1 release. You cannot
extract any content at all, regardless of content type.
Try to get a fresh version from the SVN repository. I did that earlier
today and can verify that Tika will now extract the content. I'm not
sure about zip files.
Tika
On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
> Cache warming is a good option too but the index get updated every hour so
> not sure how much would that help.
What is the time difference between queries with a warmed index and a
cold one? If the warmed index performs satisfactorily, then o
Hi,
recently we're experiencing OOMEs (GC overhead limit exceeded) in our
searches. Therefore I want to get some clarification on heap and cache
configuration.
This is the situation:
- Solr 1.4.1 running on tomcat 6, Sun JVM 1.6.0_13 64bit
- JVM Heap Params: -Xmx8G -XX:MaxPermSize=256m -XX:NewSiz
Hi Chris,
On 24/01/11 21:18, Chris Hostetter wrote:
: I notice that in the schema, it is only possible to specify an Analyzer class,
: but not a Factory class as for the other elements (Tokenizer, Filter, etc.).
: This limits the use of this feature, as it is impossible to specify parameters
: fo
Hi,
as the biggest part of our JVM heap is used by Solr caches, I asked myself
whether it wouldn't make sense to run the Solr caches backed by Terracotta's
BigMemory (http://www.terracotta.org/bigmemory).
The goal is to reduce the time needed for full / stop-the-world GC cycles,
as with our 8GB heap the l
By warmed index do you only mean warming the Solr cache, or the OS cache too?
As I said, our index is updated every hour, so I am not sure how much the Solr
cache would help, but the OS cache should still be helpful, right?
I haven't compared the results with a proper script, but from manual testing
here are some o
Hi,
Are you sure you need CMS incremental mode? It's only advised when running on
a machine with one or two processors. If you have more you should consider
disabling the incremental flags.
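For reference, the flags in question would look something like this
(illustrative; the exact set depends on your startup scripts):

  # incremental CMS, generally only a win on one or two cores
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

Dropping the two incremental flags leaves plain concurrent CMS, which usually
behaves better on multi-core machines.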
Cheers,
On Monday 24 January 2011 19:32:38 Simon Wistow wrote:
> We have two slaves replicating off one
Frankly, this puzzles me. It *looks* like it should be OK. One warning: the
analysis page is sometimes a bit misleading, so beware of that.
But the output of your queries makes it look like the query is parsing as you
expect, which leaves the question of whether your index contains what
you think i
On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote:
> Hi,
>
> recently we're experiencing OOMEs (GC overhead limit exceeded) in our
> searches. Therefore I want to get some clarification on heap and cache
> configuration.
>
> This is the situation:
> - Solr 1.4.1 running on tomcat 6, Sun JV
Hi Siva,
try using the Solr Stats Component
http://wiki.apache.org/solr/StatsComponent
similar to
select/?&q=*:*&stats=true&stats.field={your-weight-field}&stats.facet={your-facet-field}
and get the sum field from the response. You may need to re-sort the weighted
facet counts to get a descending
Hi Eric,
You are right, there is a copyField to EdgeNgram. I tried the configuration,
but it is not working as expected.
Configuration I tried:
[schema snippet collapsed by the archive; the field involved was edgy_user_query]
When I search for th
I would just use Nutch and specify the -solr param on the command line. That
will add the extracted content to your instance of Solr.
Adam
Sent from my iPhone
On Jan 25, 2011, at 5:29 AM, pankaj bhatt wrote:
> Hi All,
> I need to index the documents presents in my file system at various
On Tue, Jan 25, 2011 at 2:06 PM, Markus Jelsma
wrote:
> On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote:
> > Hi,
> >
> > recently we're experiencing OOMEs (GC overhead limit exceeded) in our
> > searches. Therefore I want to get some clarification on heap and cache
> > configuration.
> >
On 25.01.11 11.30, Erlend Garåsen wrote:
Tika version 0.8 is not included in the latest release/trunk from SVN.
Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.
And to clarify, by "content" I mean the main content of a Word file.
Title and other kinds of metadata are succes
On Tue, Jan 25, 2011 at 3:46 PM, Dinesh wrote:
>
> no i actually changed the directory to mine where i stored the log files.. it
> is /home/exam/apa..solr/example/exampledocs
>
> i specified it in a solr schema.. i created an DataImportHandler for that in
> try.xml.. then in that i changed that fi
Hi Martin,
are you sure that your GC is well tuned?
A request that needs more than a minute isn't standard, even when I
consider all the other postings about response performance...
Regards
Thanks Erlend.
I've not used SVN before, but I have managed to download and build the latest
trunk code.
Now I'm getting an error when trying to access the admin page (via
Jetty) because I specify HTMLStripStandardTokenizerFactory in my
schema.xml, but this appears to be no longer supplied as part of t
I use a lot of dynamic fields, so looking at my schema isn't a good way to
see all the field names that may be indexed across all documents. Is there a
way to query solr for that information? All field names that are indexed, or
stored? Possibly a count by field name? Is there any other metadata a
OK, got past the schema.xml problem, but now I'm back to square one.
I can index the contents of binary files (Word, PDF etc...), as well as
text files, but it won't index the content of files inside a zip.
As an example, I have two txt files - doc1.txt and doc2.txt. If I index
either of the
You can query all the indexed or stored fields (including dynamic fields)
using the LukeRequestHandler: http://localhost:8983/solr/example/admin/luke
See also: http://wiki.apache.org/solr/LukeRequestHandler
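If you also want term counts, the numTerms parameter controls how many top
terms (with frequencies) come back per field, e.g. (illustrative URL):
http://localhost:8983/solr/example/admin/luke?numTerms=10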
Regards,
Juan G. Grande
-- Solr Consultant @ http://www.plugtree.com
-- Blog @ http:/
Thanks Adam. It seems like Nutch would solve most of my concerns.
It would be great if you could share some resources for Nutch with us.
/ Pankaj Bhatt.
On Tue, Jan 25, 2011 at 7:21 PM, Estrada Groups <
estrada.adam.gro...@gmail.com> wrote:
> I would just use Nutch and specify the -solr param on t
Hi Gary,
The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.
Tested again with sample url and works fine -
curl "
http://localhost:8080/solr
Hi,
I have written a Lucene custom filter.
I could not figure out how to configure Solr to pick up this custom filter
for search.
How do I configure Solr to pick up my custom filter?
Will the Solr standard search handler pick up this custom filter?
Thanks,
Valiveti
So, the index is a list of tokens per column, right?
There's a table per column that lists the analyzed tokens?
And the tokens per column are represented as what, system integers? 32/64 bit
unsigned ints?
Dennis Gearon
Signature Warning
It is always a good idea to learn from
Why does it matter? You can't really get at them unless you store them.
I don't know what "table per column" means; there's nothing in Solr
architecture called a "table" or a "column", although by "column" you
probably mean more or less a Solr "field". There is nothing like a
"table" in Solr.
Let's back up here, because now I'm not clear what you actually want.
EdgeNGrams are a way of matching substrings, which is what's happening here.
Of course searching "apple" matches any of the three examples, just as it
would match without grams; that's the expected behavior.
So
Anyone?
On Tue, Jan 25, 2011 at 12:57 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:
> Just to add one thing, in case it makes a difference.
>
> Max document size on which highlighting needs to be done is a few hundred
> KB (in the file system). In the index it's compressed so should be muc
Presumably your custom filter is in a jar file. Drop that jar file in
<solr home>/lib
and refer to it from your schema.xml file by its full name
(e.g. com.yourcompany.filter.yourcustomfilter) just like the other filters,
and it should work fine.
You can also put your jar anywhere you'd like and alter solrconfig.
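A minimal sketch of the schema.xml side, assuming your filter ships with a
TokenFilterFactory wrapper (class names illustrative):

  <fieldType name="text_custom" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- your factory, referenced by its fully qualified class name -->
      <filter class="com.yourcompany.filter.YourCustomFilterFactory"/>
    </analyzer>
  </fieldType>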
That's exactly what I wanted, thanks. Any idea what
1294513299077
refers to under the section? I have 2 cores on one Tomcat instance,
and 1 on a second instance (different server) and all 3 have different
numbers for "version", so I don't think it's the version of Luke.
The index version. Can be used in replication to determine whether to
replicate or not.
On Tuesday 25 January 2011 20:30:21 kenf_nc wrote:
> refers to under the section? I have 2 cores on one Tomcat instance,
> and 1 on a second instance (different server) and all 3 have different
> numbers for
Hi Eric,
What I want here is: let's say I have 3 documents like
["pineapple vers apple", "milk with apple", "apple milk shake"]
and if I search for "apple", it should return only "apple milk shake",
because that term alone starts with the word "apple" that I typed in. It
should not bring oth
There are a few tutorials out there.
1. http://wiki.apache.org/nutch/RunningNutchAndSolr (not the most practical)
2. http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ (similar to 1.)
3. Build the latest from branch
http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/ and read
this
I take that back... I am currently using version 1.2; make sure
that the latest versions of Tika and PDFBox are in the contrib folder.
1.3 is structured a bit differently and it doesn't look like there is
a contrib directory. Maybe one of the Nutch contributors can comment
on this?
Adam
On Tue
This is to announce Berlin Buzzwords 2011, the second edition of the
successful conference on scalable and open search, data processing and data
storage in Germany, taking place in Berlin.
Call for Presentations Berlin Buzzwords
http://berlinbuz
Hi Eric,
Thanks for the reply.
I did see some entries in solrconfig.xml for adding custom
responseHandlers, queryParsers and queryResponseWriters,
but I could not find one for adding a custom filter.
Could you point me to the exact location or syntax to be used?
Thanks,
Valiveti
I haven't figured out any way to achieve that AT ALL without making a
separate Solr index just to serve autosuggest queries. At least when you
want to auto-suggest on a multi-valued field. Someone posted a crazy
tricky way to do it with a single-valued field a while ago. If you
can/are willing
Then you don't need NGrams at all. A wildcard will suffice, or you can use the
TermsComponent.
If these strings are indexed as single tokens (KeywordTokenizer with
LowerCaseFilter) you can simply do field:app* to retrieve the "apple milk
shake". You can also use the string field type, but then yo
Oh, I should perhaps mention that EdgeNGrams will yield results a lot quicker
than using wildcards, at the cost of a larger index. You should, of course, use
EdgeNGrams if you worry about performance and have a huge index and a high
number of queries per second.
> Then you don't need NGrams at all. A
The index contains around 1.5 million documents. As this is used for an
autosuggest feature, performance is an important factor.
So it looks like, using EdgeNgram, it is difficult to achieve the
following:
results should only return those terms where the search letters match
the first word
Hi,
I am searching for a way to specify optional terms in a query (terms that
don't need to match, but that should influence the scoring when they do).
Using the dismax parser, a query like this:
[request parameters collapsed by the archive: q=+lorem ipsum dolor amet,
qf=content, defType=dismax, plus the bare values 2 and on]
will be parsed into something like this:
+((+(content:lor) (conte
Ah, sorry, I got confused about your requirements. If you just want to
match at the beginning of the field, it may be more possible, using
edge grams or a wildcard, if you have a single-valued field. Do you have a
single-valued or a multi-valued field? That is, does each document have
just one v
With the 'lucene' query parser?
Include &q.op=OR and then put a "+" ("mandatory") in front of every term
in the 'q' that is NOT optional; the rest will be optional. I think
that will do what you want.
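For instance (illustrative; in a raw HTTP request the leading + must be
URL-encoded as %2B):

  q=%2Blorem ipsum dolor amet&q.op=OR

Here lorem is mandatory, while the remaining terms only influence the score
when they match.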
Jonathan
On 1/25/2011 5:07 PM, Daniel Pötzinger wrote:
Hi
I am searching for a way to specif
Right now our configuration says multiValued=true, but that need not be
"true" in our case. I will make it false, try it out, and update this thread
with more details.
Thank you Markus. I have added a few more fields to schema.xml.
Now it looks like the products are getting indexed, but there are no search
results. In Magento, if I configure Solr as the search engine, search does not
return any results. If I change the search engine to Magento's
inbuilt MySQL, se
Hello list,
Apologies if this was already asked; I haven't found the answer in the archive.
I've been out of this list for quite some time now, hence.
I am looking for a good way to package a project based on maven2 that would
create a solr-based webapp for me.
I would expect such projects as the ve
I am saying there is a list of tokens that have been parsed (a table of them)
for each column? Or one for the whole index?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so
First, let's be sure we're talking about the same thing. My response was for
adding
a filter to your analysis chain for a field in Schema.xml. Are you talking
about a different
sort of filter?
Best
Erick
On Tue, Jan 25, 2011 at 4:09 PM, Valiveti wrote:
>
> Hi Eric,
>
> Thanks for the reply.
>
>
This should shed some light on the matter
http://lucene.apache.org/java/2_9_0/fileformats.html
> I am saying there is a list of tokens that have been parsed (a table of
> them) for each column? Or one for the whole index?
>
> Dennis Gearon
>
>
> Signature Warning
>
> It is alw
Dear Stefan,
thank you for your help!
Well, I wrote a small script; even if it's not JSON, it works:
There aren't any tables involved. There's basically one list (per field) of
unique tokens for the entire index, and also a list, for each token, of which
documents contain that token. This is efficiently encoded, but I don't know
the details of that encoding; maybe someone who does can tell you,
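Illustratively, with made-up data, the per-field structure looks roughly like:

  token  -> documents containing it
  apple  -> 1, 3, 7
  milk   -> 3, 7
  shake  -> 7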
OK, try this.
Use some analysis chain for your field like:
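(The snippet itself was lost by the archive; a minimal sketch, assuming the
whole field value should become one lowercased token so prefixes match from
the first word:)

  <fieldType name="autosuggest" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>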
This can be a multiValued field, BTW.
Now use the TermsComponent to fetch your data. See:
http://wiki.apache.org/solr/TermsComponent
and specify terms.prefix=apple e.g.
http://localhost:8983/solr/terms?terms.prefix=app&terms.fl=bli
There's almost no information to go on here. Please review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Tue, Jan 25, 2011 at 6:13 PM, Sandhya Padala wrote:
> Thank you Markus. I have added few more fields to schema.xml.
>
> Now looks like the products are getting indexed. But no
I am not sure I really understand what is meant by clean=false.
In my understanding, for a full-import with the default clean=true, it will
blow away all documents in the existing index, then do a full import of the
data from a table into the index. Is that right?
Then for clean=false, my understanding is that