Example data:
01/23/2011 05:12:34 [Test] a=1; hello_there=50; data=[1,5,30%];
I would love to be able to just "grep" the data - i.e., if I search for "ello",
it finds and returns "ello", and if I search for "hello_there=5", it would
match too.
Here's what I'm using now:
Solr 4.0 (11/1 snapshot)
Data: 80k files, average size 2.5MB, largest is 750MB;
Solr: Each document is max 256k; total docs = 800k
Machine: Early 2009 Mac Pro, 6GB RAM, 1GB min / 2GB max heap given to Solr's
Java process; Admin UI shows 30% memory usage
I originally tried injecting the entire file into a single Solr doc
Hi,
I'm using SolrCloud and I wanted to add the replication feature to it.
I followed the steps in the Solr wiki, but when the client tried to poll for
data from the server I got the error message below.
In the master log:
Nov 3, 2011 8:34:00 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.
> Example data:
> 01/23/2011 05:12:34 [Test] a=1; hello_there=50;
> data=[1,5,30%];
>
> I would love to be able to just "grep" the data - ie. if I
> search for "ello", it finds and returns "ello", and if I
> search for "hello_there=5", it would match too.
>
> Here's what I'm using now:
>
> c
Right, I'm not sure how to ask this question, or what the terminology is, but
hopefully my explanation will help...
We are chucking data into Solr for queries. I can't mention the exact data,
but the closest thing I can think of is as follows:
- Unique ID for the Solr record (DB ID in this case)
- A
Hi List,
I have a Solr index where I want to include numerical fields in my ranking
function as well as keyword relevance. For example, each document has a
document view count, and I'd like to increase the relevancy of documents
that are read often and penalize documents with a very low view count.
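A common way to do this with dismax/edismax is a boost function over the
numeric field; a minimal sketch, assuming a hypothetical integer viewCount
field:

    q=ranking&defType=edismax&qf=title^2 body&bf=log(sum(viewCount,1))

log(sum(viewCount,1)) grows slowly, so frequently read documents get a modest
additive boost without swamping keyword relevance, and the +1 avoids log(0)
for documents that have never been viewed.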
Hi, when is the SolrCloud version planned to be released/stable, and what are
your thoughts on using it in a serious production environment?
Br,
Toni
Hi Thomas,
Do you always need ordered proximity search by default?
You may want to check SpanNearQuery, described at
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/.
We are using the edismax query parser provided by Solr.
I had a similar type of requirement in our project; here is how
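For reference, a minimal sketch of an ordered proximity query using Lucene's
SpanNearQuery directly (the field and terms are made up for illustration):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    // Terms must appear in this order ("ordered" before "proximity"),
    // at most 3 positions apart.
    SpanQuery[] clauses = new SpanQuery[] {
        new SpanTermQuery(new Term("body", "ordered")),
        new SpanTermQuery(new Term("body", "proximity"))
    };
    SpanNearQuery query = new SpanNearQuery(clauses, 3, true); // slop=3, inOrder=true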
Thanks for your reply; I will check out this advice.
-- Forwarded message --
From: NDIAYE Bacar
Date: Fri, Nov 4, 2011 at 12:05 PM
Subject: Assist please
To: d...@tika.apache.org, u...@tika.apache.org
Hi,
I need your assistance, please, with configuring Apache Tika for Solr
attachment indexing in Drupal 7.
I have tried to confi
How are you crawling your info? Somewhere you have to inject the
source into the document; copying from an existing field won't do the trick
because there's no source field available.
If you're crawling the data yourself, you can just add the source
to the document.
If you're using DIH, you can specify the field as a constant, as sketched below.
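If memory serves, the DIH way to do that is the TemplateTransformer in
data-config.xml; a minimal sketch, where the entity and field names are
placeholders:

    <entity name="doc" transformer="TemplateTransformer" ...>
      <!-- writes the literal string "wiki" into the source field of every row -->
      <field column="source" template="wiki"/>
    </entity>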
Try this with &debugQuery=on. I suspect you're not getting the query you
think you are and I'd straighten that out before worrying about highlighting.
Usually, for instance, AND should be capitalized to be an operator.
So try with &debugQuery=on and see what happens. The highlighter, I
believe, w
Yes. Just try it with &debugQuery=on and you can see the parsed
form of the query.
Best
Erick
On Wed, Nov 2, 2011 at 6:20 PM, Jamie Johnson wrote:
> Is it possible to do Proximity queries using edismax? I saw I could
> do the following
>
> q="batman movie"&qs=100
>
> but I wanted to be able to
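For reference, a complete request of that shape, with debugQuery enabled so
you can see the parsed form (host and core path are assumptions):

    http://localhost:8983/solr/select?defType=edismax&q="batman movie"&qs=100&debugQuery=on

The quotes and spaces would need URL-encoding in practice; debugQuery=on adds
a section to the response showing exactly what the phrase and slop parsed to.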
Your Nutch indexes the site and host fields. If that is not enough you can use
its subcollection plugin to write values for URL patterns.
On Wednesday 02 November 2011 15:52:37 Fred Zimmerman wrote:
> I want to be able to limit some searches to particular sources, e.g. "wiki
> only", "crawled only
I think that 1-2 second requirement is unreasonable. The first thing I'd
do is push back and understand whether this is actually a requirement or
just somebody picking numbers out of thin air.
Committing often enough for this to work is just *asking* for trouble
with 3.3. I'd
take a look at the Ne
Please define "sum of fields". The total number of unique terms for
all the fields?
The sum of some values of some fields for each document?
The count of the number of fields in the index?
Other???
Best
Erick
On Thu, Nov 3, 2011 at 11:43 AM, stockii wrote:
> i am searching for the best way to ge
Let's see...
1> Committing every second, even with commitWithin, is probably going
to be a problem. I think 1-second latency is usually overkill, but
that's up to your product manager. Look at the NRT (Near Real Time)
stuff if you really need this.
I thought that NRT was
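If you do stay on periodic commits, commitWithin from SolrJ is one option; a
minimal sketch, assuming a local Solr and a hypothetical id field:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "42");
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(15000); // let Solr commit within 15s rather than per add
    req.process(server);

This batches the actual commit on the server side instead of forcing one per
update request.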
Hi Spark,
In 2009 there was a monitoring utility from Lucid Imagination:
http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene
A colleague of mine calls the Sematext monitor a "trojan" because "SPM phone
home":
"Easy in, easy out -
Hello all,
I would like to handle German accents (Umlaute) by replacing the accented char
with its two-letter substitute (e.g. ä => ae). For this reason I use the
char filter solr.MappingCharFilterFactory configured with a mapping file
containing entries like “ä” => “ae”. I also want to use the
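For context, that char filter sits in front of the tokenizer in the field
type's analyzer in schema.xml, roughly like this (the field type and mapping
file names are placeholders):

    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- runs before tokenization, so "ä" is already "ae" when tokens are cut -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-umlauts.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

where mapping-umlauts.txt contains lines such as "ä" => "ae".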
SolrMeter is useful too; it can be plugged into a production server just
to watch the evolution of cache usage:
http://code.google.com/p/solrmeter/wiki/Screenshots#CacheHistoryStatistic
André
Yes -- how do I specify the field as a constant in DIH?
On Fri, Nov 4, 2011 at 11:17 AM, Erick Erickson wrote:
> How are you crawling your info? Somewhere you have to inject the
> source into the document; copying from an existing field won't do the trick
> because there's no source field available.
>
> If you're crawling the da
This is a code fragment of how I am doing a ContentStreamUpdateRequest
using CommonsHttpSolrServer:
// csur is a ContentStreamUpdateRequest created earlier in this method
ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url);
InputStream is = csbu.getStream();
FastInputStream fis = new FastInputStream(is); // note: fis is never used below
csur.addContentStream(csbu);
c
Hi list,
I'm working on improving the performance of the Solr scheme for Cascading.
This supports generating a Solr index as the output of a Hadoop job. We use
SolrJ to write the index locally (via EmbeddedSolrServer).
There are mentions of using overwrite=false with the CSV request handler, as
Dynamic fields are just fields, man. There's really no overhead that I know of.
I tend to prefer non-dynamic fields whenever possible to reduce
hard-to-find errors where, say, I've made a typo and the dynamic
pattern matches anyway, but that's largely a personal preference.
Best
Erick
On Thu, Nov 3, 20
It should be supported in SolrJ; I'm surprised it's been lopped out.
Bulk indexing is extremely common.
On Fri, Nov 4, 2011 at 1:16 PM, Ken Krugler wrote:
> Hi list,
>
> I'm working on improving the performance of the Solr scheme for Cascading.
>
> This supports generating a Solr index as the out
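For what it's worth, one way to pass overwrite=false from SolrJ without a
dedicated setter is to set it as a raw parameter on an UpdateRequest (a
sketch; not verified against every SolrJ version):

    import org.apache.solr.client.solrj.request.UpdateRequest;

    UpdateRequest req = new UpdateRequest();
    req.add(docs);                      // docs: a Collection<SolrInputDocument>
    req.setParam("overwrite", "false"); // skip uniqueKey-based overwrite checks
    req.process(server);                // server: e.g. an EmbeddedSolrServer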
I am using this reference link:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg26389.html
However, the article is a bit old, and when I try to compile the class (using
the newest Solr 3.4 / java version "1.7.0_01" / Java(TM) SE Runtime Environment
(build 1.7.0_01-b08) / Java HotSpot(TM)
First of all, thanks a lot for your answer.
1) I could use 5 to 15 seconds between each commit and give it a try. Is
this an acceptable configuration? I'll take a look at NRT.
2) Currently I'm using a single core, the simplest setup. I don't expect to
have an overwhelming quantity of records, but
If the URL being sent to Solr is too long to be completely displayed in
the jetty request log, the next log entry is recorded on the same line.
The following line from my log is actually three separate requests:
10.100.0.240 - - [04/Nov/2011:00:00:00 +] "GET
/solr/s1live/select?qt=lbche
Gustavo -
Even with the most basic requirements, I'd recommend setting up a multi-core
configuration so you can RELOAD the main core you will be using when you make
simple changes to config files. This is much cleaner than bouncing Solr each
time. There are other benefits to doing it, but thi
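Once multiple cores are set up, the reload is a single CoreAdmin request,
along these lines (host and core name are placeholders):

    http://localhost:8983/solr/admin/cores?action=RELOAD&core=main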
Hi Brian,
I'll take a look at what you mentioned. I didn't think about that. I'll
finish the implementation at the app level and then I'll read a little more
about multi-core setups. Maybe I don't know yet all the benefits it has.
Thanks a lot for your advice.
2011/11/4 Brian Gerby
>
> Gustav
Answering my own question.
ContentStreamUpdateRequest (csur) needs to be within the while loop not
outside as I had it. Still not seeing any dramatic performance
improvements over perl though (the point of this exercise). Indexing
locks after about 30-45 minutes of activity, even a commit wo
Calling SolrQuery.setSortField("field1", ORDER.asc) on SolrQuery is not
adding the sort parameter to the Solr query. Has anyone faced this issue?
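For reference, the call in question in a minimal form (the query and field
name are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrQuery.ORDER;

    SolrQuery query = new SolrQuery("*:*");
    query.setSortField("field1", ORDER.asc); // expected to emit sort=field1 asc
    System.out.println(query);               // inspect the generated parameters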
Wow, I tried with minGramSize=1 and maxGramSize=1000 (I want someone to be able
to search on any substring, just like "grep"), and the index is multiple orders
of magnitude larger than my data!
There's got to be a better way to support full grep-like searching?
Thanks!
Pete
On Nov 4, 2011, at
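For context, the analyzer being described presumably looks something like
this in schema.xml (the field type name is a placeholder):

    <fieldType name="text_grep" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- emits every substring of 1 to 1000 chars: term count explodes -->
        <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="1000"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

With minGramSize=1, a document of length n produces on the order of
n * maxGramSize grams, which is why the index dwarfs the raw data.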
Hi,
I have a (dismax) request handler which has the following 3 scoring
components (1 qf & 2 bf) :
qf = "field1^2 field2^3"
bf = func1(field3)^2 func2(field4)^3
Both func1 & func2 return scores between 0 & 1. The score returned by
textual match (qf) ranges from 0 to
To allow
:To allow better combination of text match & my functions, I want the text
: score to be normalized between 0 & 1. Is there any way I can achieve that
: here?
It is achievable, but it is not usually meaningful...
https://wiki.apache.org/lucene-java/ScoresAsPercentages
-Hoss
: /solr/ftf/dismax/?q=libya
: &debugQuery=off
: &hl=true
: &start=
: &rows=10
: --
:
: I am trying to factor in created to the SCORE. (boost) I have tried a million
: ways to do this, no success. I know the dates are populating correctly because
: I can
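For what it's worth, the usual recipe for folding a date field into the score
(assuming the field is named created) is a reciprocal-of-age function, e.g. as
a multiplicative boost with edismax, or via bf with dismax:

    boost=recip(ms(NOW,created),3.16e-11,1,1)

recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly 1/(milliseconds
per year), so a year-old document gets about half the boost of a brand-new one.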
Really helpful, thanks so much.
Spark
2011/11/4
> Hi Spark,
>
> 2009 there was a monitor from lucidimagination:
>
> http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene
>
> A colleague of mine calls the sematext-
Thank you for the information.
2011/11/5 yu shen
> Really helpful, thanks so much.
>
> Spark
>
> 2011/11/4
>
> Hi Spark,
>>
>> 2009 there was a monitor from lucidimagination:
>>
>> http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open
Yes, the xpath support is a custom lightweight implementation built for speed.
There is a separate full XSL processor:
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
I think this lets you run real XSL on input files. I assume it lets you
throw in your favorite XSL implem
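If that wiki section is the one I'm thinking of, the hook is the xsl
attribute on XPathEntityProcessor in data-config.xml, roughly like this (the
URL and stylesheet path are placeholders):

    <entity name="records"
            processor="XPathEntityProcessor"
            url="file:///data/input.xml"
            xsl="xslt/to-solr-add.xsl"
            useSolrAddSchema="true"/>

With useSolrAddSchema="true", the stylesheet is expected to transform the
input into Solr's standard <add><doc>...</doc></add> format.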