Hello,
We have one multicore webapp for every 50 cores. Currently there are 3 multicore
webapps, with 150 cores distributed across them.
When we restarted the server (Tomcat), we noticed that the solr.xml was
wiped out and we could not see any cores in webapp1 and webapp3, but only
a few cores in webapp2.
The query seems fine, as far as the URL being UTF-8 goes. It seems that the
documents are not being passed to Solr with UTF-8 encoding. The document is
not part of the URL; it is HTTP POST data.
Try an explicit curl command to add a document and see if it is indexed with
the accents.
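A minimal sketch of such a curl test; the core name "coreFR" and the URL are assumptions, so adjust them to your setup:

```shell
# Write a small test document containing accented characters.
cat > doc.xml <<'EOF'
<add>
  <doc>
    <field name="id">utf8-test-1</field>
    <field name="content">présentation déjà</field>
  </doc>
</add>
EOF

# Sanity-check that the file really is UTF-8 before sending it.
iconv -f UTF-8 -t UTF-8 doc.xml > /dev/null && echo "doc.xml is valid UTF-8"

# Post it with an explicit charset so the container does not guess the encoding.
curl 'http://localhost:8983/solr/coreFR/update?commit=true' \
     -H 'Content-Type: text/xml; charset=UTF-8' \
     --data-binary @doc.xml \
  || echo "no Solr reachable at localhost:8983"
```

If the accents survive this round trip, the problem is in how your client posts documents, not in Solr.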
-- Jack Krupansky
"... is this limitation documented anywhere..."
Kind of, but not very well, at least at the Lucene level.
The Lucene File Formats page says "Lucene uses a Java int to refer to
document numbers, and the index file format uses an Int32 on-disk to store
document numbers. This is a limitation of both the index file format and the
current implementation."
Hi,
The reason I use useFastVectorHighlighter is that I want to set
stored="false", with additional settings like termVectors="true",
termPositions="true", and termOffsets="true". If stored="true", what is the
difference between normal highlighting and useFastVectorHighlighter? What is
the right
Add &debugQuery=true to your query and look at the scores of the older vs.
newer docs compared to the boost. Maybe the boost needs to be increased.
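A sketch of such a request; the core URL and query term are placeholders, not from the original thread:

```shell
# The "explain" section of the debug output breaks down each document's score,
# including the contribution of any boost, so you can compare old vs. new docs.
URL='http://localhost:8983/solr/select?q=titanic&debugQuery=true&wt=json&indent=true'
curl "$URL" || echo "no Solr reachable at localhost:8983"
```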
-- Jack Krupansky
-Original Message-
From: Jonty Rhods
Sent: Monday, May 28, 2012 5:51 AM
To: solr-user@lucene.apache.org
Subject: boost
Please suggest something; I am stuck here.
On Mon, May 28, 2012 at 3:21 PM, Jonty Rhods wrote:
> Hi
>
> I am facing a problem boosting on a date field.
> I have following field in schema
>
>
> solr version 3.4
> I don't want to sort by date but want to give a 50 to 60% boost to those
> results which have the latest date...
You can use the document id and timestamp as a compound unique id.
Then the search would also sort by id, then by timestamp. Result
grouping might let you pick the most recent document from each of the
sorted docs.
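A sketch of what such a grouped query might look like; the core URL and the field names "docid" and "timestamp" are hypothetical:

```shell
# Group results by the document id and keep only the newest row in each group.
URL='http://localhost:8983/solr/select?q=*:*&group=true&group.field=docid&group.sort=timestamp+desc&group.limit=1'
curl "$URL" || echo "no Solr reachable at localhost:8983"
```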
On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball
wrote:
>
> Hello all,
>
> For the f
Try adding rootEntity="false" to the FilePath entity. The DIH code ends up
ignoring your rootEntity="true" on the XPathEntityProcessor entity if the
parent does not have rootEntity="false". I'm not sure if that is really
correct, but that's the way the code is.
-- Jack Krupansky
-Original Message-
Hello all,
For the first step of the distributed snapshot isolation system I'm
developing for Solr, I'm going to need to have a MVCC mechanism as opposed
to the single-version concurrency control mechanism already developed
(DistributedUpdateProcessor class). I'm trying to find the very best way
You went over the maximum limit on the number of docs.
On Monday, May 28, 2012, tosenthu wrote:
> Hi
>
> I have an index of size 1 TB, and I prepared this by setting up a
> background
> script to index records. The index was fine for the last 2 days, and I have
> not disturbed the process. Suddenly, when I queried
I think 100 million documents is a realistic number for a single shard,
maybe 250 million depending on your data. But I would say that beyond that
is unrealistic. In some cases, even 50 million might be too much for a
single shard, depending on the data and query usage. Sure, maybe dependi
I have XML files that need to be imported into Solr.
The XML looks like the below:
1
albert
LA
2
john
NY
The XML file path is stored in a SQL database, so I have created a
dataimporthandler file as below
The RAM is about 14.5 GB, allocated for Tomcat.
I now have 2 shards, but I was under the impression I could handle it with a
couple of shards. In this case I need to have shards which can only grow up to
2^31-1 records, and many such shards to support 12 billion records.
I will try to have more cores and
And it might make sense to have a "multi-value flattening" attribute for
Solr itself rather than in SolrCell.
-- Jack Krupansky
-Original Message-
From: Raphaël
Sent: Monday, May 28, 2012 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: UpdateRequestProcessor : flattened values
On Mon, May 28, 2012 at 10:30:03AM -0400, Jack Krupansky wrote:
> "... the access to individual literal fields seems (currently) very limited
> as they appear to be flattened."
>
> That is a "feature" of SolrCell, to flatten multiple values for a
> non-multi-valued field into a string concatenation of the values.
numFound="-390662429"
That suggests that you have at least two shards which each have > 2G docs
(2^31-1).
How many shards do you have and how big do you think they should be in terms
of number of documents?
Are you being careful to distribute your update requests between shards so
that n
OOM is a problem.
You need more RAM and more machines, and maybe more shards.
-- Jack Krupansky
-Original Message-
From: tosenthu
Sent: Monday, May 28, 2012 11:29 AM
To: solr-user@lucene.apache.org
Subject: Re: Negative value in numFound
There was an Out Of Memory, but still the indexing was happening further.
Hi
It is multicore, but even when I ran the shards query I get this
response, which is again a negative value.
The total number of records may be > 2147483647 (2^31-1), but is this
limitation documented anywhere? What is the strategy to overcome this
situation? Expectation
In some cases multi-shard architecture might significantly slow down the
search process at this index size...
By the way, how much RAM do you use?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Negative-value-in-numFound-tp3986398p3986438.html
Sent from the Solr - User mailing list archive at Nabble.com.
There was an Out Of Memory, but still the indexing was happening further.
On 28 May 2012 20:12, Jack Krupansky wrote:
> Ah, okay. Here's some PHP regexp code for parsing a raw tweet to get user
> names and hash tags:
>
> http://saturnboy.com/2010/02/parsing-twitter-with-regexp/
[...]
One could also use the Solr DataImportHandler and
RegexTransformer to do the job:
htt
2012/5/28 Jack Krupansky :
> Ah, okay. Here's some PHP regexp code for parsing a raw tweet to get user
> names and hash tags:
>
> http://saturnboy.com/2010/02/parsing-twitter-with-regexp/
Awesome!
Thank you very much, Jack.
GGhh
Ah, okay. Here's some PHP regexp code for parsing a raw tweet to get user
names and hash tags:
http://saturnboy.com/2010/02/parsing-twitter-with-regexp/
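If you only need the hash tags and user mentions, a simple grep sketch (a rough approximation of what the linked PHP code does, not Twitter's official entity rules) can pull them out:

```shell
# Extract #hashtags and @mentions from a raw tweet with extended regexes.
# This is a simplification: real Twitter entities have extra edge cases.
tweet='Loving #solr for search, thanks @jack! #lucene'
echo "$tweet" | grep -oE '#[[:alnum:]_]+'   # hash tags, one per line
echo "$tweet" | grep -oE '@[[:alnum:]_]+'   # user mentions, one per line
```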
-- Jack Krupansky
-Original Message-
From: Giovanni Gherdovich
Sent: Monday, May 28, 2012 10:35 AM
To: solr-user@lucene.apache.org
Is this for a single-shard or multi-shard index?
There is a 2^31-1 limit for a single Lucene index since document numbers are
"int" (32-bit signed in Java) in Lucene, but with Solr shards you can have a
multiple of that, based on number of shards.
If you are multi-shard, maybe one of the shar
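A sketch of a distributed query across two shards; the host names are placeholders. Solr fans the request out and sums each shard's numFound, which is where a per-shard overflow would surface:

```shell
# rows=0 returns only the merged numFound, no documents.
URL='http://host1:8983/solr/select?q=*:*&rows=0&shards=host1:8983/solr,host2:8983/solr'
curl "$URL" || echo "no Solr reachable"
```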
Hello Jack and Anuj,
2012/5/28 Jack Krupansky :
> The Twitter API extracts hash tag and user mentions for you, in addition to
> giving you the full raw text. You'll have to read up on the Twitter API.
That's what I thought just after hitting "send" on the message above ;-)
I am pretty sure the Tw
"... the access to individual literal fields seems (currently) very limited
as they appear to be flattened."
That is a "feature" of SolrCell, to flatten multiple values for a
non-multi-valued field into a string concatenation of the values.
All you need to do is add "multiValued="true"" to th
Hm... Have you any errors in logs? During search, during indexing?
The Twitter API extracts hash tag and user mentions for you, in addition to
giving you the full raw text. You'll have to read up on the Twitter API.
-- Jack Krupansky
-Original Message-
From: Giovanni Gherdovich
Sent: Monday, May 28, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subje
Hello Jack, hi all,
2012/5/28 Jack Krupansky :
> Other obvious metadata from the Twitter API to index would be hashtags, user
> mentions (both the user id/screen name and user name), date/time, urls
> mentioned (expanded if a URL shortener is used), and possibly coordinates
> for spatial search.
This is a bit old but provides good information for schema design:
http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php
Found this link as well- https://gist.github.com/702360
The types of the field may depend on the search requirements.
Regards,
Anuj
On Mon, May 28, 2012 at
Hi, Jack.
First of all thank you for your help.
Well, I tried again and then realized that my problem is not really with Solr.
I ran this query against Solr after starting it up with the command "java
-jar start.jar":
http://localhost:8983/solr/coreFR/spell?q=content:pr%C3%A9senta&spellcheck=true&sp
Other obvious metadata from the Twitter API to index would be hashtags, user
mentions (both the user id/screen name and user name), date/time, urls
mentioned (expanded if a URL shortener is used), and possibly coordinates
for spatial search.
You would have to add all these fields and values yo
The details are below:
Solr: 3.5
Using a schema file with 53 fields, 8 of which are indexed.
OS: CentOS 5.4 64-bit
Java: 1.6.0 64-bit
Apache Tomcat: 7.0.22
Intel(R) Xeon(R) CPU L5518 @ 2.13GHz (16 processors)
/dev/mapper/index 5.9T 1.9T 4.0T 33% /Index
Had around 2 billion records
I don't recall anyone being able to get acceptable performance with a
single index that large with solr/lucene. The conventional wisdom is
that parallel searching across cores (or shards in SolrCloud) is the
best way to handle index sizes in the "illions". So it's of great
interest how you did it.
Any
Hi!
Can you please show your hardware parameters, the version of Solr that
you're using, and your schema.xml file?
thanks.
Hi
I have an index of size 1 TB, and I prepared this by setting up a background
script to index records. The index was fine for the last 2 days, and I have not
disturbed the process. Suddenly, when I queried the index, I got this
response, where the value of numFound is negative. Can anyone say why/how
th
It is a single node. I am trying to find out whether this performance can
serve as a reference.
Regarding information on Solr with RankingAlgorithm, you can find all
the information here:
http://solr-ra.tgels.org
On RankingAlgorithm:
http://rankingalgorithm.tgels.org
Regards,
- NN
On 5/27/2012 4:50 PM
Hello Dmitry and David,
2012/5/28 Dmitry Kan :
> [...] If you just want to
> index the text contents of tweets (including web links etc), using just
> off-the-shelf Solr is enough. You'll have to wrap your text input (per each
> tweet I would assume) into an xml [...]
> So design your schema firs
Hey,
I think you might be over-thinking this. Tweets are structured: you
have the content (the tweet), the user who tweeted it, and various other
metadata. So your 'document' might look like this:
ABCD1234
I bought some apples
JohnnyBoy
To get this structure, you can use any programming
Hi,
You want to use Tika, if you have your data in some binary format, like pdf
or excel. It extracts text from the binary for you. If you just want to
index the text contents of tweets (including web links etc), using just
off-the-shelf Solr is enough. You'll have to wrap your text input (per eac
Hi all.
I am in the process of setting up Solr for my application,
which is full text search on a bunch of tweets from twitter.
I am afraid I am missing something.
From the books I am reading, "Apache Solr 3 Enterprise Search Server",
it looks like Solr works with structured input, like XML or C
On Sun, May 27, 2012 at 11:54:02PM -0400, Jack Krupansky wrote:
> You can create your own "update processor" that gets control between the
> output of Tika and the indexing of the document.
>
> See:
> http://wiki.apache.org/solr/UpdateRequestProcessor
Seems to be exactly what I was looking for,
Indexing performance is mostly about the number of docs,
but when you are optimizing, a large index takes a bit more time.
On Mon, May 28, 2012 at 12:48 PM, Aditya wrote:
> Hi Ivan,
>
> It depends on the number of terms it has to load. If you index a small
> amount of data but store a large amount of data
Hi
I am facing a problem boosting on a date field.
I have following field in schema
solr version 3.4
I don't want to sort by date but want to give a 50 to 60% boost to those
results which have the latest date...
The query is as follows:
http://localhost:8083/solr/movie/select/?defType=dismax&q=titanic&f
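One common way to do this with dismax (not necessarily what the original poster tried) is a reciprocal function boost on the date field; the field name "release_date" is a placeholder:

```shell
# bf adds a boost that decays with document age; 3.16e-11 is roughly
# 1/(one year in milliseconds), so a year-old document gets about half
# the boost of a brand-new one.
URL='http://localhost:8083/solr/movie/select/?defType=dismax&q=titanic&bf=recip(ms(NOW,release_date),3.16e-11,1,1)'
curl "$URL" || echo "no Solr reachable"
```

This boosts recent documents without sorting by date, so relevance still dominates.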
Hi Ivan,
It depends on the number of terms it has to load. If you index a small
amount of data but store a large amount, then your index size may be big
while the actual number of terms is small.
They are not directly proportional.
Regards
Aditya
www.findbestopensource.com
On Mon, May 28, 2012 at 3:00 PM, Ivan
For example we know that cache warming is executed during startup.
Are any other processes executed during Solr startup?
Thank you, Ivan
> I had a schema field defined as <field ... indexed="true" stored="false"
> termVectors="true" termPositions="true" termOffsets="true"/>
You need to mark your text field as stored="true" to use
&hl.useFastVectorHighlighter=true
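With the field stored, a highlight request might look like this; the core URL and the field name "text" are placeholders:

```shell
# FastVectorHighlighter requires stored="true" plus termVectors, termPositions,
# and termOffsets on the highlighted field.
URL='http://localhost:8983/solr/select?q=apples&hl=true&hl.fl=text&hl.useFastVectorHighlighter=true'
curl "$URL" || echo "no Solr reachable at localhost:8983"
```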