Dear Solr users,
My company would like to use Solr to index around 80 000 000 documents
(XML files of around 5~10 KB each).
My program (a robot) will connect to this Solr instance with Boolean queries.
Number of users: around 1000
Number of requests per user per day: 300
Number of users per day:
My mistake - I did not research whether the data above is stored as
strings. The hashcode has to be stored as strings for this trick to
work.
On Sun, May 20, 2012 at 8:25 PM, Otis Gospodnetic
wrote:
> I'd be curious about this, too!
> I suspect the answer is: not doable, patches welcome. :)
> But I
On 22 May 2012 12:07, KP Sanjailal wrote:
> Hi,
>
> Thank you so much for replying.
>
> The MySQL database server is running on a Fedora Core 12 Machine with Hindi
> Language Support enabled. Details of the database are - ENGINE=MyISAM and
> DEFAULT CHARSET=utf8
>
> Data is imported using th
We use fsv=true to help debug sorting, which works great for
non-distributed searches. However, it's not working (no sort_values in the
response) for multi-shard queries. Any idea how to get this fixed?
thanks,
XJ
Hi,
I have a very basic question and hopefully there is a simple answer to
this. We are trying to index a simple product catalog which has a master
product and child products. Each master product can have multiple child
products. A master product can be assigned one or more product categories.
Now
Hi,
I was using Windows 7, but it is fine with Chrome on Windows Web Server 2008 R2.
I also asked a colleague with Windows 7 and it is fine for him too, so I'm really
sorry, but I think it was a 'works on my machine' thing.
Of course, if I track down the cause I will reply to this email again.
Thanks
You need to explain your case in much more detail to get precise help. Please
read http://wiki.apache.org/solr/UsingMailingLists
If your problem is that you have a URL and want to know the domain for it, e.g.
www.company.com/foo/bar/index.html, and you want only www.company.com, you can
use the U
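For illustration, a minimal plain-Java sketch of pulling the host part out of a
full URL on the client side (this is only one option, not necessarily what the
truncated reply above was pointing to):

import java.net.URL;

public class DomainExtractor {
    public static void main(String[] args) throws Exception {
        // java.net.URL parses the URL and exposes the host/domain part.
        URL url = new URL("http://www.company.com/foo/bar/index.html");
        System.out.println(url.getHost()); // prints www.company.com
    }
}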
A dedicated server may not be required. If you want to cut down cost, then
prefer a shared server.
How much RAM do you have?
Regards
Aditya
www.findbestopensource.com
On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina wrote:
> Dear Solr users,
>
> My company would like to use solr to index around 80 000 000
Hello,
Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
be a combination of the master product id and the child product id?
Therefore, whenever you update your master product DB entry, you simply need
to reindex the documents that depend on that master product entry.
You can ev
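For illustration, a minimal SolrJ sketch of that combined uniqueKey (the field
names are made up):

import org.apache.solr.common.SolrInputDocument;

public class CatalogDocBuilder {
    // Builds a child-product document whose uniqueKey combines the master and child ids.
    public static SolrInputDocument build(String masterId, String childId, String name) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", masterId + "_" + childId); // uniqueKey = master id + child id
        doc.addField("master_id", masterId);          // kept separately so all children of a master can be reindexed
        doc.addField("child_id", childId);
        doc.addField("name", name);
        return doc;
    }
}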
Hi Bruno,
will you use facets and result sorting?
What is the update frequency/volume?
This could impact the amount of memory and the number of servers.
Ludovic.
-
Jouve
France.
--
View this message in context:
http://lucene.472066.n3.nabble.com/System-requirements-in-my-case-tp3985309p3985327.html
That's how de-normalization works. You need to update all child products.
If you just need the count and you are using facets, then maintain a map
between category and main product, and between main product and child product.
The Lucene DB has no schema; you can retrieve the data based on its type.
Category reco
Hi all,
greetings from my end. This is my first post on this mailing list. I have a
few questions on multicore Solr. For background, we want to create a core
for each user logged in to our application. In that case it may be 50, 100,
1000, or N cores. Each core will be used to write and search index i
My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
24 GB DDR3
On 22/05/2012 10:26, findbestopensource wrote:
A dedicated server may not be required. If you want to cut down cost, then
prefer a shared server.
How much RAM do you have?
Regards
Aditya
www.findbestopensource.com
On Tue, May
Hi,
Facets, I don't know yet, because I don't know exactly what facets are (sorry).
Sorting: yes
Scoring: yes
Concerning update frequency: every week
Volume: around 1 GB of data per year
Thank you very much :)
Aix En Provence
France
On 22/05/2012 10:35, lboutros wrote:
Hi Bruno,
will you use facets
Having a core per user is not a good idea. The count is too high. Keep
everything in a single core. You can filter the data based on the user name or
user id.
Regards
Aditya
www.findbestopensource.com
On Tue, May 22, 2012 at 2:29 PM, Shanu Jha wrote:
> Hi all,
>
> greetings from my end. This is my f
Seems to be fine. Go ahead.
Before hosting, have you tried / tested your application in a local setup?
RAM usage is what matters in terms of Solr. Just benchmark your app for 100
000 documents, log the memory used, and calculate the RAM required for 80 000 000
documents.
Regards
Aditya
www.findbestopensourc
Yes. Lucene / Solr supports multi-threaded environments. You can commit
from two different threads to the same core or to different cores.
Regards
Aditya
www.findbestopensource.com
On Tue, May 22, 2012 at 12:35 AM, jame vaalet wrote:
> hi,
> my use case here is to search all the incoming documents
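For illustration, a minimal SolrJ 3.6 sketch of committing from two threads to
the same core (the core URL is made up):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TwoThreadCommit {
    public static void main(String[] args) {
        final SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
        Runnable committer = new Runnable() {
            public void run() {
                try {
                    server.commit(); // commit() can safely be called from several threads
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };
        new Thread(committer).start();
        new Thread(committer).start();
    }
}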
Thank you for the quick replies.
Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
be a combination of the master product id and the child product id ?
-- We do not need it as each child is already a unique key.
Therefore whenever you update your master product db entry, yo
Hi,
I would probably use (e)DisMax.
Index your url and metadata fields as text without stemming, e.g. text_general
Then query as &q=mycompany&defType=edismax&qf=title^10 content^1 url^5
If you like to give higher weight to the domain/site part of the URL, apply
UrlClassifyProcessor and search the
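For illustration, the same query issued through SolrJ (the Solr URL is made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("mycompany");
        q.set("defType", "edismax");             // use the extended DisMax parser
        q.set("qf", "title^10 content^1 url^5"); // field weights as suggested above
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}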
It all depends on the frequency at which you refresh your data, on your
deployment (master/slave setup), ...
Many things need to be taken into account!
Did you face any performance issue while building your index?
If you didn't, rebuilding it shouldn't be more problematic.
--
Tanguy
2012/5/22 So
We are still in the design phase, so we haven't hit any performance issues. We
do not want to discover performance issues too late during QA :) We would
rather account for any issues during the design phase.
The refresh rate on the fields that we are using from the master table will be
rare. Maybe three or f
Thanks Jan. *It worked perfectly*. That's all I needed.
May God bless you.
Regards
Shameema
On Tue, May 22, 2012 at 4:57 PM, Jan Høydahl wrote:
> Hi,
>
> I would probably use (e)DisMax.
> Index your url and metadata fields as text without stemming, e.g.
> text_general
> Then query as &q=mycomp
Hi,
This is the install process I used in my shell script to try and get Tomcat
running with Solr (debian server):
I swear this used to work, but currently only Tomcat works. The Solr page
just comes up with "The requested resource (/solr/admin) is not available."
Can anyone give me some insig
Hi,
It is impossible to guess the required HW size without more knowledge about
the data and usage. 80 million docs is a fair amount.
Here's how I would approach sizing the setup:
1) Get your schema in shape, removing unnecessary stored/indexed fields
2) Do a test index locally of a part of the dataset
Hi,
Could you please tell me what you mean by filtering data by users? I would like
to know whether there is a real problem with creating a core per user, i.e.
resource utilization, CPU usage, etc.
AJ
On Tue, May 22, 2012 at 4:39 PM, findbestopensource <
findbestopensou...@gmail.com> wrote:
> Having cores per use
Hi Darren,
Thanks very much for your reply.
The reason I want to control core indexing/searching is that I want to
use one core to store one customer's data (all customers share the same
config): for example, customer 1 uses coreForCustomer1 and customer 2
uses coreForCustomer2.
Is there any better way tha
I installed a temp server at my university with 12 000 docs (Ubuntu + Solr
3.6.0).
Maybe I can estimate the amount of memory I need?
Q: How can I check the memory used?
On 22/05/2012 13:14, findbestopensource wrote:
Seems to be fine. Go ahead.
Before hosting, have you tried / tested your applic
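One simple way to watch the memory used is to look at the Solr JVM heap, e.g.
with jconsole, or with a tiny snippet like the sketch below if it can run inside
the same JVM (this only measures the Java heap, not the OS file-system cache):

public class HeapUsage {
    public static void main(String[] args) {
        // Rough heap-in-use measurement; run a GC first for a steadier number.
        System.gc();
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("Heap in use: " + usedMb + " MB");
    }
}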
> > The text may contain "FooBar".
> >
> > When I do a wildcard search like this: "Foo*" - no hits.
> > When I do a wildcard search like this: "foo*" - doc is
> > found.
>
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
Well, it works in 3.6. With one exception: if I use German
Hi Jan,
Thanks for all these details !
Answers are below.
Sincerely,
Bruno
On 22/05/2012 13:58, Jan Høydahl wrote:
Hi,
It is impossible to guess the required HW size without more knowledge about
the data and usage. 80 million docs is a fair amount.
Here's how I would approach sizing the setup
Hi Bruno,
Just to confirm -- are you seeing the clusters array in the result at all?
To get reasonable clusters, you should request at
least 30-50 documents (rows), but even with smaller values, you should see
an empty clusters array.
Staszek
On Sun, May 20, 2012 at 9:20 PM, Bruno Mannina wr
Hi Lance,
Could you provide more details about implementing this using
SignatureUpdateProcessor?
An example would be helpful.
-
Rita
--
View this message in context:
http://lucene.472066.n3.nabble.com/Question-about-sampling-tp3984103p3985379.html
Sent from the Solr - User mailing list archive
Arfff
The clusters are at the end of my XML answer
..
..
OK, all works fine now!
On 22/05/2012 15:33, Stanislaw Osinski wrote:
Hi Bruno,
Just to confirm -- are you seeing the clusters array in the result at all?
To get reasonable clusters, you should request at
least 30-
Hi,
I use solr-solrj 3.6.0 and solr-core 3.6.0:
I have overridden the handleError method of the ConcurrentUpdateSolrServer
class:
final ConcurrentUpdateSolrServer newSolrServer = new
ConcurrentUpdateSolrServer(url, client, 100, 10){
@Override
public void handleError(Throwable ex) {
I'm curious what the SolrCloud experts say, but my suggestion is to try not to
over-engineer the search architecture on SolrCloud. For example, what is
the benefit of managing which cores are indexed and searched? Having to know
those details, in my mind, works against the automation in
You should find some clues in the Tomcat log.
On 2012-5-22 at 7:49 PM, "Spadez" wrote:
> Hi,
>
> This is the install process I used in my shell script to try and get Tomcat
> running with Solr (debian server):
>
>
>
> I swear this used to work, but currently only Tomcat works. The Solr page
> just comes up wi
I think the key is this: you want to think of a SolrCore on a single-node Solr
installation as a collection on a multi-node SolrCloud installation.
So if you would use multiple SolrCores with a standard Solr setup, you should be
using multiple collections in SolrCloud. If you were going to try to do
It would help if you provided your use case. What are you indexing for each
user, and why would you need a separate core for indexing each user? How do
you decide the schema for each user? It might be better to describe your use
case and desired results. People on the list will be able to advise on the
b
Hello Elisabeth,
Wouldn't it be simpler to have a custom component inside of the
front-end to your search server that would transform a query like <<hotel de
ville paris>> into <<"hotel de ville" paris>> (i.e. turning each
occurrence of the sequence "hotel de ville" into a phrase query)?
Concerning protections in
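For illustration, a minimal sketch of such a front-end rewrite (the phrase list
is made up):

import java.util.Arrays;
import java.util.List;

public class PhraseRewriter {
    // Known multi-word expressions that should be searched as phrases.
    private static final List<String> PHRASES = Arrays.asList("hotel de ville", "gare du nord");

    public static String rewrite(String query) {
        for (String phrase : PHRASES) {
            query = query.replace(phrase, "\"" + phrase + "\"");
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hotel de ville paris")); // -> "hotel de ville" paris
    }
}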
Does anyone have the slides or sample code from:
Building Query Auto-Completion Systems with Lucene 4.0
Presented by Sudarshan Gaikaiwari, Software Engineer,Yelp
We want to implement WFST with GEO boosting.
--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Hi all,
I am facing the following issue...
I have an application which is feeding a Solr 3.6 index with document
updates via SolrJ 3.6. I use a binary request writer, because of the
issue with XML when sending inserts and deletes at once (
https://issues.apache.org/jira/browse/SOLR-1752 )
Now, I have n
Hi all,
I have a field(s) in a schema which I need to be able to specify in a
filter query. The field is not mandatory, therefore it can be empty. I
need to be able to run a query with a filter: "return only docs which
do not have a value for the field"...
What would be the optimal recommended
> I have a field(s) in a schema which I need to be able to
> specify in a
> filter query. The field is not mandatory, therefore it can
> be empty. I
> need to be able to run a query with a filter: "return only
> docs which
> do not have a value for the field" ...
>
> What would be the optimal re
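The reply above is cut off; one common approach (shown here as a SolrJ sketch,
with a made-up field name) is a negated open-ended range in a filter query:

import org.apache.solr.client.solrj.SolrQuery;

public class EmptyFieldFilter {
    public static SolrQuery buildQuery() {
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("-myfield:[* TO *]"); // keep only docs that have no value in myfield
        return q;
    }
}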
Hi All,
I'm quite a new user of both Lucene and Solr. I want to ask if faceted search
can be used to group multiple fields' values based on similarity.
I have looked at faceting so far, but from my understanding it
only works on exact single values and definite range values.
For examp
>
> 3) Measure the size of the index folder, multiply by 8 to get a clue of the
>> total index size
>>
> With 12 000 docs my index folder size is: 33 MB
> ps: I use "solr.clustering.enabled=true"
Clustering is performed at search time; it doesn't affect the size of the
index (but obviously it does a
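As a rough back-of-the-envelope check, a linear extrapolation of the numbers
quoted above (33 MB for 12 000 docs, scaled to 80 000 000 docs) looks like the
sketch below; real growth is rarely perfectly linear, so treat it only as a
ballpark figure:

public class IndexSizeEstimate {
    public static void main(String[] args) {
        double sampleSizeMb = 33.0;   // measured index folder size for the sample
        long sampleDocs = 12000L;     // documents in the sample index
        long targetDocs = 80000000L;  // target corpus size
        double estimatedGb = sampleSizeMb * targetDocs / sampleDocs / 1024.0;
        System.out.printf("Estimated index size: ~%.0f GB%n", estimatedGb); // ~215 GB
    }
}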
Hey Emma,
Thanks for reporting this, I opened SOLR-3478 and will commit this soon
Stefan
On Monday, May 21, 2012 at 10:47 PM, Emma Bo Liu wrote:
> Hi,
>
> I want to index emails using solr. I put the user name, password, hostname
> in data-config.xml under mail folder. This is a valid email
Hi,
I want to display a clickable link to the document if a search
matches, along with the number of times the search query matched.
What should I be looking at?
I am fairly new to Solr and don't know how I can achieve this.
Thanks for the help!
--
View this message in context:
http://
That worked!
Thanks!
I did
--
View this message in context:
http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985507.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hello all,
Can I use the technique described on the wiki at:
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
if I am populating my core using a DIH?
Looking at the posts on this subject and the wiki docs leads me to believe
that you can only use this when you are using the XML
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands and the
$docBoost pseudo-field name.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-Original Message-
From: geeky2 [mailto:gee...@hotmail.com]
Sent: Tuesday, May 22, 2012 2:12 PM
To: solr-user@lucene.
: That is the default response format. If you would like to change that,
: you could extend the search handler or post process the XML data.
: Another option would be to use the javabin (if your app is java based)
: and build xml the way your app would need.
there is actually a more straight f
what does your results.xsl look like? or more specifically: can you post a
very small example XSL that has this problem?
you mentioned you are using xsl:include and that doesn't seem to work ...
is that a separate problem, or does removing/adding the xsl:include
fix/cause this problem?
what d
Hi Everyone,
This is what worked in Solr 1.4 and did not work in Solr 3.6.
Actually, Solr 3.6 requires all the XSL files to be present in the conf/xslt
directory, and all paths leading to an XSL file should be relative to the conf
directory.
But before, this was not the case.
Thanks,
--Pramila Thakur
__
Hi Alexandre,
Can you please let me know how you fixed this issue. I am also getting this
error when I pass a very large query to Solr.
A reply is highly appreciated.
Thanks,
Sai
Thanks for the reply.
So to use the $docBoost pseudo-field name, would you do something like below,
and would this technique likely increase my total index time?
...
--
View this message in context:
http://lucene.472066.n3.nabble.com/index-tim
You need to add the $docBoost pseudo-field to the document somehow. A
transformer is one way to do it. You could just add it to a SELECT statement,
which is especially convenient if the boost value somehow is derived from the
data:
SELECT case when SELL_MORE_FLAG='Y' then 999 ELSE null E
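For the transformer route mentioned above, a sketch of a custom DIH transformer
that sets $docBoost (the class name is made up; the SELL_MORE_FLAG column
follows the SQL fragment above):

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class BoostTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // Mirror the SQL idea: boost documents flagged as better sellers.
        if ("Y".equals(row.get("SELL_MORE_FLAG"))) {
            row.put("$docBoost", 999.0f); // DIH special command: index-time document boost
        }
        return row;
    }
}

It would then be referenced from data-config.xml on the <entity> element via
transformer="com.example.BoostTransformer" (package name is hypothetical).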
Thank you, James, for the feedback - I appreciate it.
Ultimately, I was trying to decide if I was missing the boat by ONLY using
query-time boosting, and whether I should really be using index-time boosting.
But after your reply, reading the Solr book, and looking at the Lucene docs,
it looks like index-t
We're testing a snapshot of Solr4 and I'm looking at some of the responses
from the Luke request handler. Everything looks good so far, with the
exception of the "distinct" attribute which (in Solr3) shows me the
distinct number of terms for a given field.
Given the request below, I'm consistentl
Hi All,
I'm trying to figure out how to index polygons in Solr (trunk). I'm using LSP
right now, as the Solr integration of the new spatial module hasn't been
completed. I have searching for points using a polygon working, but I'm also
looking to search for polygons using a point.
I've seen so
Take a look at the clustering component
http://wiki.apache.org/solr/ClusteringComponent
Consider clustering offline and indexing the pre-calculated group memberships.
I might be wrong, but I don't think there is any faceting mileage here.
Depending upon the use case,
you might get some use out of
: This is what worked in solr 1.4 and did not work in solr 3.6.
:
: Actually solr 3.6 requires all the xsl to be present in conf/xslt directory
: All paths leading to xsl should be relative to conf directory.
:
: But before this was not the case.
Right ... this was actually a bug (in how all re
Hi,
Thanks for your advice.
It is basically a meta-search application. Users can perform a search on N
data sources at a time. We broadcast a parallel search to each
selected data source and write the data to Solr using a custom-built API (the
API and Solr are deployed on separate machines; the API jo
Hello, I have configured Solr on Tomcat 7 in Windows. When I
manually start the Tomcat server and hit Solr, it searches very well
in my browser.
And when I write a Java class with a main method as follows, the results are
fetched and shown on the console.
public class Code{
public stat
Hello Bernd,
Thanks for your advice.
I have one question: how did you manage to map one word to a multi-word
synonym?
I've tried (in synonyms.txt)
mairie, hotel de ville
mairie, hotel\ de\ ville
mairie => mairie, hotel de ville
mairie => mairie, hotel\ de\ ville
but nothing prevents mairi
Hello Tanguy,
I guess you're right, maybe this shouldn't be done in Solr but inside of
the front-end.
Thanks a lot for your answer.
Elisabeth
2012/5/22 Tanguy Moal
> Hello Elisabeth,
>
> Wouldn't it be more simple to have a custom component inside of the
> front-end to your search server that