System requirements in my case?

2012-05-22 Thread Bruno Mannina
Dear Solr users, My company would like to use Solr to index around 80 000 000 documents (XML files of around 5~10 KB each). My program (robot) will connect to this Solr instance with boolean requests. Number of users: around 1000 Number of requests by user and by day: 300 Number of users by day:
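
A rough sketch of the arithmetic behind these figures. It assumes every one of the 1000 users is active each day (the "users by day" figure is cut off in this preview), so the request count is an upper bound, and it treats 5 to 10 kilobytes per document as decimal units. The function names are illustrative only.

```python
# Back-of-the-envelope estimates from the numbers quoted in the post.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_requests(users, requests_per_user):
    """Upper bound on requests per day if every user is active."""
    return users * requests_per_user


def average_qps(requests_per_day):
    """Average queries per second, ignoring peaks."""
    return requests_per_day / SECONDS_PER_DAY


def raw_corpus_gb(docs, kb_per_doc):
    """Raw (unindexed) corpus size in GB, using 1 GB = 1 000 000 KB."""
    return docs * kb_per_doc / 1_000_000


total = daily_requests(1000, 300)
print(total)                          # 300000
print(round(average_qps(total), 2))   # 3.47
print(raw_corpus_gb(80_000_000, 5))   # 400.0
print(raw_corpus_gb(80_000_000, 10))  # 800.0
```

The average rate says nothing about peak load, which is usually what sizing has to cover.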

Re: Question about sampling

2012-05-22 Thread Lance Norskog
My mistake: I did not research whether the data above is stored as strings. The hashcode has to be stored as strings for this trick to work. On Sun, May 20, 2012 at 8:25 PM, Otis Gospodnetic wrote: > I'd be curious about this, too! > I suspect the answer is: not doable, patches welcome. :) > But I

Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread Gora Mohanty
On 22 May 2012 12:07, KP Sanjailal wrote: > Hi, > > Thank you so much for replying. > > The MySQL database server is running on a Fedora Core 12 Machine with Hindi > Language Support enabled.  Details of the database are - ENGINE=MyISAM and >  DEFAULT CHARSET=utf8 > > Data is imported using th

fsv=true not returning sort_values for distributed searches

2012-05-22 Thread XJ
We use fsv=true to help debug sorting, which works great for non-distributed searches. However, it's not working (no sort_values in response) for multi-shard queries. Any idea how to get this fixed? thanks, XJ

Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
Hi, I have a very basic question and hopefully there is a simple answer to this. We are trying to index a simple product catalog which has a master product and child products. Each master product can have multiple child products. A master product can be assigned one or more product categories. Now

RE: trunk cloud ui not working

2012-05-22 Thread Phil Hoy
Hi, I was using Windows 7, but it is fine with Chrome on Windows Web Server 2008 R2. I also asked a colleague with Windows 7 and it is fine for him too, so really sorry, but I think it was a 'works on my machine' thing. Of course if I track down the cause I will reply to this email again. Than

Re: How can i search site name

2012-05-22 Thread Jan Høydahl
You need to explain your case in much more detail to get precise help. Please read http://wiki.apache.org/solr/UsingMailingLists If your problem is that you have a URL and want to know the domain for it, e.g. www.company.com/foo/bar/index.html and you want only www.company.com you can use the U

Re: System requirements in my case?

2012-05-22 Thread findbestopensource
Dedicated Server may not be required. If you want to cut down cost, then prefer shared server. How much the RAM? Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina wrote: > Dear Solr users, > > My company would like to use solr to index around 80 000 000

Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Tanguy Moal
Hello, Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data) be a combination of the master product id and the child product id? Therefore whenever you update your master product db entry, you simply need to reindex documents depending on the master product entry. You can ev
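
A minimal sketch of the composite-key idea, assuming one denormalized document per child product. The field names (`master_id`, `categories`, `child_name`) and the `_` separator are hypothetical and not taken from the thread.

```python
def composite_id(master_id, child_id, sep="_"):
    """Join master and child ids into a single uniqueKey value."""
    return f"{master_id}{sep}{child_id}"


def docs_for_master(master, children):
    """Build one denormalized document per child, copying master fields.

    Re-running this for a changed master yields documents with the same
    ids, so re-adding them overwrites the stale copies in the index.
    """
    return [
        {
            "id": composite_id(master["id"], child["id"]),
            "master_id": master["id"],
            "categories": master["categories"],
            "child_name": child["name"],
        }
        for child in children
    ]


master = {"id": "M1", "categories": ["shoes"]}
children = [{"id": "C1", "name": "red"}, {"id": "C2", "name": "blue"}]
for doc in docs_for_master(master, children):
    print(doc["id"])
```

Keeping `master_id` as its own field also makes it possible to find every document that depends on a given master entry.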

Re: System requirements in my case?

2012-05-22 Thread lboutros
Hi Bruno, will you use facets and result sorting ? What is the update frequency/volume ? This could impact the amount of memory/server count. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/System-requirements-in-my-case-tp3985309p3985327.html

Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread findbestopensource
That's how de-normalization works. You need to update all child products. If you just need the count and you are using facets then maintain a map between category and main product, main product and child product. Lucene db has no schema. You could retrieve the data based on its type. Category reco

Multicore Solr

2012-05-22 Thread Shanu Jha
Hi all, greetings from my end. This is my first post on this mailing list. I have few questions on multicore solr. For background we want to create a core for each user logged in to our application. In that case it may be 50, 100, 1000, N-numbers. Each core will be used to write and search index i

Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina
My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml 24 GB DDR3 On 22/05/2012 10:26, findbestopensource wrote: Dedicated Server may not be required. If you want to cut down cost, then prefer shared server. How much the RAM? Regards Aditya www.findbestopensource.com On Tue, May

Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina
Hi, facets: I don't know yet, because I don't know exactly what facets are (sorry). Sorting: yes Scoring: yes Concerning updates, Frequency: every week Volume: around 1 GB of data per year Thank you very much :) Aix En Provence France On 22/05/2012 10:35, lboutros wrote: Hi Bruno, will you use facets

Re: Multicore Solr

2012-05-22 Thread findbestopensource
Having cores per user is not a good idea. The count is too high. Keep everything in a single core. You could filter the data based on user name or user id. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 2:29 PM, Shanu Jha wrote: > Hi all, > > greetings from my end. This is my f
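
One way the single-core suggestion could look from the client side: every document carries the owner's id, and each search adds a filter query on it. This is only a sketch; the field name `user_id` is an assumption, and the id is assumed to be a plain number that needs no query escaping.

```python
from urllib.parse import urlencode


def user_search_params(query, user_id):
    """Build Solr request parameters restricted to one user's documents.

    `fq` is Solr's filter-query parameter; `user_id` is a hypothetical
    field that each indexed document would need to carry.
    """
    return urlencode({"q": query, "fq": f"user_id:{user_id}"})


print(user_search_params("solr", 42))  # q=solr&fq=user_id%3A42
```

The resulting string would be appended to the core's select URL.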

Re: System requirements in my case?

2012-05-22 Thread findbestopensource
Seems to be fine. Go ahead. Before hosting, have you tried / tested your application in a local setup? RAM usage is what matters in terms of Solr. Just benchmark your app for 100 000 documents, log the memory used. Calculate the RAM required for 80 000 000 documents. Regards Aditya www.findbestopensourc

Re: is commit a sequential process in solr indexing

2012-05-22 Thread findbestopensource
Yes. Lucene / Solr supports multi threaded environment. You could do commit from two different threads to same core or different core. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 12:35 AM, jame vaalet wrote: > hi, > my use case here is to search all the incoming documents

Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
Thank you for quick replies. Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data) be a combination of the master product id and the child product id ? -- We do not need it as each child is already a unique key. Therefore whenever you update your master product db entry, yo

Re: How can i search site name

2012-05-22 Thread Jan Høydahl
Hi, I would probably use (e)DisMax. Index your url and metadata fields as text without stemming, e.g. text_general Then query as &q=mycompany&defType=edismax&qf=title^10 content^1 url^5 If you like to give higher weight to the domain/site part of the URL, apply UrlClassifyProcessor and search the
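
The query string above, assembled programmatically so that the `^` boosts and spaces are URL-encoded. This is a sketch only: the field names and boosts are the ones from the message, while the helper name is made up for illustration.

```python
from urllib.parse import urlencode


def edismax_params(q, boosts):
    """Build eDisMax request parameters from a {field: boost} mapping."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return urlencode({"q": q, "defType": "edismax", "qf": qf})


print(edismax_params("mycompany", {"title": 10, "content": 1, "url": 5}))
# q=mycompany&defType=edismax&qf=title%5E10+content%5E1+url%5E5
```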

Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Tanguy Moal
It all depends on the frequency at which you refresh your data, on your deployment (master/slave setup), ... Many things need to be taken into account! Did you face any performance issue while building your index? If you didn't, rebuilding it shouldn't be more problematic. -- Tanguy 2012/5/22 So

Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
We are still in design phase, so we haven't hit any performance issues. We do not want to discover performance issues too late during QA :) We would rather account for any issues during the design phase. The refresh rate on fields that we are using from master table will be rare. May be three or f

Re: How can i search site name

2012-05-22 Thread Shameema Umer
Thanks Jan. It worked perfectly. That's all I needed. May God bless you. Regards Shameema On Tue, May 22, 2012 at 4:57 PM, Jan Høydahl wrote: > Hi, > > I would probably use (e)DisMax. > Index your url and metadata fields as text without stemming, e.g. > text_general > Then query as &q=mycomp

Installing Solr on Tomcat using Shell - Code wrong?

2012-05-22 Thread Spadez
Hi, This is the install process I used in my shell script to try and get Tomcat running with Solr (debian server): I swear this used to work, but currently only Tomcat works. The Solr page just comes up with "The requested resource (/solr/admin) is not available." Can anyone give me some insig

Re: System requirements in my case?

2012-05-22 Thread Jan Høydahl
Hi, It is impossible to guess the required HW size without more knowledge about data and usage. 80 mill docs is a fair amount. Here's how I would approach sizing the setup: 1) Get your schema in shape, removing unnecessary stored/indexed fields 2) Do a test index locally of a part of the dataset

Re: Multicore Solr

2012-05-22 Thread Shanu Jha
Hi, Could you please tell me what you mean by filtering data by users? I would like to know whether there is a real problem with creating a core per user, i.e. resource utilization, CPU usage etc. AJ On Tue, May 22, 2012 at 4:39 PM, findbestopensource < findbestopensou...@gmail.com> wrote: > Having cores per use

Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Yandong Yao
Hi Darren, Thanks very much for your reply. The reason I want to control core indexing/searching is that I want to use one core to store one customer's data (all customer share same config): such as customer 1 use coreForCustomer1 and customer 2 use coreForCustomer2. Is there any better way tha

Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina
I installed a temp server at my university with 12 000 docs (Ubuntu + Solr 3.6.0). Maybe I can estimate the amount of memory I need? Q: How can I check the memory used? On 22/05/2012 13:14, findbestopensource wrote: Seems to be fine. Go ahead. Before hosting, Have you tried / tested your applic

RE: Wildcard-Search Solr 3.5.0

2012-05-22 Thread spring
> > The text may contain "FooBar". > > > > When I do a wildcard search like this: "Foo*" - no hits. > > When I do a wildcard search like this: "foo*" - doc is > > found. > > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one exception: If I use german

Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina
Hi Jan, Thanks for all these details! Answers are below. Sincerely, Bruno On 22/05/2012 13:58, Jan Høydahl wrote: Hi, It is impossible to guess the required HW size without more knowledge about data and usage. 80 mill docs is a fair amount. Here's how I would approach sizing the setup

Re: Newbie with Carrot2?

2012-05-22 Thread Stanislaw Osinski
Hi Bruno, Just to confirm -- are you seeing the clusters array in the result at all ()? To get reasonable clusters, you should request at least 30-50 documents (rows), but even with smaller values, you should see an empty clusters array. Staszek On Sun, May 20, 2012 at 9:20 PM, Bruno Mannina wr

Re: Question about sampling

2012-05-22 Thread rita
Hi Lance, Could you provide more details about implementing this using SignatureUpdateProcessor? Example can be helpful. - Rita -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-sampling-tp3984103p3985379.html Sent from the Solr - User mailing list archive

Multicore solr

2012-05-22 Thread Shanu Jha
Hi all, greetings from my end. This is my first post on this mailing list. I have few questions on multicore solr. For background we want to create a core for each user logged in to our application. In that case it may be 50, 100, 1000, N-numbers. Each core will be used to write and search index i

Re: Newbie with Carrot2?

2012-05-22 Thread Bruno Mannina
Arfff, the clusters are at the end of my XML answer .. .. ok, all works fine now! On 22/05/2012 15:33, Stanislaw Osinski wrote: Hi Bruno, Just to confirm -- are you seeing the clusters array in the result at all ()? To get reasonable clusters, you should request at least 30-

Uncatchable Exception on solrj3.6.0

2012-05-22 Thread Jamel ESSOUSSI
Hi, I use solr-solrj 3.6.0 and solr-core 3.6.0: I have reimplemented the handleError of the ConcurrentUpdateSolrServer class: final ConcurrentUpdateSolrServer newSolrServer = new ConcurrentUpdateSolrServer(url, client, 100, 10){ @Override public void handleError(Throwable ex) {

RE: Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Darren Govoni
I'm curious what the SolrCloud experts say, but my suggestion is to try not to over-engineer the search architecture on SolrCloud. For example, what is the benefit of managing which cores are indexed and searched? Having to know those details, in my mind, works against the automation in

Re: Installing Solr on Tomcat using Shell - Code wrong?

2012-05-22 Thread Li Li
You should find some clues in the Tomcat log. On 2012-5-22 at 7:49 PM, "Spadez" wrote: > Hi, > > This is the install process I used in my shell script to try and get Tomcat > running with Solr (debian server): > > > > I swear this used to work, but currently only Tomcat works. The Solr page > just comes up wi

Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Mark Miller
I think the key is this: you want to think of a SolrCore on a single node Solr installation as a collection on a multi node SolrCloud installation. So if you would use multiple SolrCore's with a std Solr setup, you should be using multiple collections in SolrCloud. If you were going to try to do

Re: Multicore solr

2012-05-22 Thread Sohail Aboobaker
It would help if you provide your use case. What are you indexing for each user and why would you need a separate core for indexing each user? How do you decide schema for each user? It might be better to describe your use case and desired results. People on the list will be able to advise on the b

Re: solr tokenizer not splitting unbreakable expressions

2012-05-22 Thread Tanguy Moal
Hello Elisabeth, Wouldn't it be simpler to have a custom component inside of the front-end to your search server that would transform a query like <> into <<"hotel de ville" paris>> (i.e. turning each occurrence of the sequence "hotel de ville" into a phrase query)? Concerning protections in
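
A small sketch of such a front-end rewrite, assuming the list of "unbreakable" expressions is known in advance. The function name and the expression list are illustrative; overlapping or nested expressions are not handled.

```python
import re


def quote_expressions(query, expressions):
    """Wrap each known multi-word expression in double quotes.

    Occurrences that are already directly surrounded by quotes are
    left alone, so applying the function twice does not double-quote.
    """
    for expr in expressions:
        pattern = re.compile(
            r'(?<!")\b' + re.escape(expr) + r'\b(?!")', re.IGNORECASE
        )
        query = pattern.sub(lambda m: f'"{m.group(0)}"', query)
    return query


print(quote_expressions("hotel de ville paris", ["hotel de ville"]))
# "hotel de ville" paris
```

The rewritten string can then be sent to Solr as the user query, where the quoted part is parsed as a phrase.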

WFST with autosuggest/geo

2012-05-22 Thread William Bell
Does anyone have the slides or sample code from: Building Query Auto-Completion Systems with Lucene 4.0 Presented by Sudarshan Gaikaiwari, Software Engineer,Yelp We want to implement WFST with GEO boosting. -- Bill Bell billnb...@gmail.com cell 720-256-8076

Binary updates handler does not propagate failures?

2012-05-22 Thread Jozef Vilcek
Hi all, I am facing following issue ... I have an application which is feeding Solr 3.6 index with document updates via Solrj 3.6. I use a binary request writer, because of the issue with XML when sending insert and deletes at once ( https://issues.apache.org/jira/browse/SOLR-1752 ) Now, I have n

How to handle filter query against empty fields

2012-05-22 Thread Jozef Vilcek
Hi all, I have a field(s) in a schema which I need to be able to specify in a filter query. The field is not mandatory, therefore it can be empty. I need to be able to run a query with a filter: "return only docs which do not have a value for the field" ... What would be the optimal recommended

Re: How to handle filter query against empty fields

2012-05-22 Thread Ahmet Arslan
> I have a field(s) in a schema which I need to be able to > specify in a > filter query. The field is not mandatory, therefore it can > be empty. I > need to be able to run a query with a filter: "return only > docs which > do not have a value for the field" ... > > What would be the optimal re
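
The reply is cut off in this preview, so the following is not necessarily what was recommended in the thread. It sketches a commonly used Solr idiom for the question: a negated open-ended range, `-field:[* TO *]`, used as a filter query. The field name `price` is only an example.

```python
from urllib.parse import urlencode


def missing_field_filter(field):
    """Filter-query string matching documents with no value in `field`."""
    return f"-{field}:[* TO *]"


def params_without_field(query, field):
    """Request parameters for `query`, limited to docs lacking `field`."""
    return urlencode({"q": query, "fq": missing_field_filter(field)})


print(missing_field_filter("price"))  # -price:[* TO *]
print(params_without_field("*:*", "price"))
```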

Faceted on Similarity ?

2012-05-22 Thread Robby
Hi All, I'm quite a new user of both Lucene and Solr. I want to ask if faceted search can be used to group multiple field values based on similarity? I have looked at the faceted index so far, but from my understanding it only works on exact single and definite range values. For examp

Re: System requirements in my case?

2012-05-22 Thread Stanislaw Osinski
> > 3) Measure the size of the index folder, multiply with 8 to get a clue of >> total index size >> > With 12 000 docs my index folder size is: 33 MB > ps: I use "solr.clustering.enabled=true" Clustering is performed at search time, it doesn't affect the size of the index (but obviously it does a
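
For a first feel of scale, the measured sample can be extrapolated linearly to the full collection. This is a naive sketch: it assumes index size grows in proportion to document count, which real indexes only roughly follow, and it uses decimal units (1 GB = 1000 MB).

```python
def extrapolate_index_mb(sample_mb, sample_docs, target_docs):
    """Scale a measured index size linearly to a larger document count."""
    return sample_mb * target_docs / sample_docs


# Figures from the thread: 33 MB on disk for 12 000 docs, target 80 000 000.
size_mb = extrapolate_index_mb(33, 12_000, 80_000_000)
print(round(size_mb))         # 220000
print(round(size_mb / 1000))  # 220
```

That works out to roughly 220 GB, which is best read as an order of magnitude and not a prediction.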

Re: Solr mail dataimporter cannot be found

2012-05-22 Thread Stefan Matheis
Hey Emma, thanks for reporting this, i opened SOLR-3478 and will commit this soon Stefan On Monday, May 21, 2012 at 10:47 PM, Emma Bo Liu wrote: > Hi, > > I want to index emails using solr. I put the user name, password, hostname > in data-config.xml under mail folder. This is a valid email

clickable links as results?

2012-05-22 Thread 12rad
Hi, I want to display a clickable link to the document if a search matches, along with the number of times the search query matched. What should I be looking at? I am fairly new to Solr and don't know how I can achieve this. Thanks for the help! -- View this message in context: http://

Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-22 Thread 12rad
That worked! Thanks! I did -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985507.html Sent from the Solr - User mailing list archive at Nabble.com.

index-time boosting using DIH

2012-05-22 Thread geeky2
hello all, can i use the technique described on the wiki at: http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts if i am populating my core using a DIH? looking at the posts on this subject and the wiki docs - leads me to believe that you can only use this when you are using the xml

RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands and the $docBoost pseudo-field name. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Tuesday, May 22, 2012 2:12 PM To: solr-user@lucene.

Re: Highlight feature

2012-05-22 Thread Chris Hostetter
: That is the default response format. If you would like to change that, : you could extend the search handler or post process the XML data. : Another option would be to use the javabin (if your app is java based) : and build xml the way your app would need. there is actually a more straight f

Re: Solr 3.6 fails when using XSLT

2012-05-22 Thread Chris Hostetter
what does your results.xsl look like? or more specifically: can you post a very small example XSL that has this problem? you mentioned you are using xsl:include and that doesn't seem to work ... is that a separate problem, or does removing/adding the xsl:include fix/cause this problem? what d

RE: Solr 3.6 fails when using XSLT

2012-05-22 Thread pramila_tha...@ontla.ola.org
Hi Everyone, This is what worked in solr 1.4 and did not work in solr 3.6. Actually solr 3.6 requires all the xsl to be present in conf/xslt directory All paths leading to xsl should be relative to conf directory. But before this was not the case. --> Thanks, --Pramila Thakur __

Re: Jetty rerturning HTTP error code 413

2012-05-22 Thread Sai
Hi Alexandre, Can you please let me know how you fixed this issue? I am also getting this error when I pass a very large query to Solr. A reply would be highly appreciated. Thanks, Sai

RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thanks for the reply, so to use the $docBoost pseudo-field name, would you do something like below - and would this technique likely increase my total index time? ... -- View this message in context: http://lucene.472066.n3.nabble.com/index-tim

RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
You need to add the $docBoost pseudo-field to the document somehow. A transformer is one way to do it. You could just add it to a SELECT statement, which is especially convenient if the boost value somehow is derived from the data: SELECT case when SELL_MORE_FLAG='Y' then 999 ELSE null E

RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thank you james for the feedback - i appreciate it. ultimately - i was trying to decide if i was missing the boat by ONLY using query time boosting, and i should really be using index time boosting. but after your reply, reading the solr book, and looking at the lucene dox - it looks like index-t

always getting distinct count of -1 in luke response (solr4 snapshot)

2012-05-22 Thread Mike Hugo
We're testing a snapshot of Solr4 and I'm looking at some of the responses from the Luke request handler. Everything looks good so far, with the exception of the "distinct" attribute which (in Solr3) shows me the distinct number of terms for a given field. Given the request below, I'm consistentl

Indexing Polygons

2012-05-22 Thread Young, Cody
Hi All, I'm trying to figure out how to index polygons in solr (trunk). I'm using LSP right now as the solr integration of the new spatial module hasn't completed. I have searching for a point using a polygon working, but I'm also looking for searching for a polygon using a point. I've seen so

Re: Faceted on Similarity ?

2012-05-22 Thread Lee Carroll
Take a look at the clustering component http://wiki.apache.org/solr/ClusteringComponent Consider clustering offline and indexing the precalculated group memberships. I might be wrong but I don't think there is any faceting mileage here. Depending upon the use case you might get some use out of

RE: Solr 3.6 fails when using XSLT

2012-05-22 Thread Chris Hostetter
: This is what worked in solr 1.4 and did not work in solr 3.6. : : Actually solr 3.6 requires all the xsl to be present in conf/xslt directory : All paths leading to xsl should be relative to conf directory. : : But before this was not the case. Right ... this was actually a bug (in how all re

Re: Multicore solr

2012-05-22 Thread Amit Jha
Hi, Thanks for your advice. It is basically a meta search application. Users can perform a search on N number of data sources at a time. We broadcast Parallel search to each selected data sources and write data to solr using custom build API(API and solr are deployed on separate machine API jo

apache query

2012-05-22 Thread ketan kore
Hello, I have configured Solr on Tomcat 7 on Windows. When I manually start the Tomcat server and hit Solr, it searches very well in my browser, and when I write a Java class with a main method as follows, the results are fetched and shown on the console. public class Code{ public stat

Re: Multi-words synonyms matching

2012-05-22 Thread elisabeth benoit
Hello Bernd, Thanks for your advice. I have one question: how did you manage to map one word to a multiwords synonym??? I've tried (in synonyms.txt) mairie, hotel de ville mairie, hotel\ de\ ville mairie => mairie, hotel de ville mairie => mairie, hotel\ de\ ville but nothing prevents mairi

Re: solr tokenizer not splitting unbreakable expressions

2012-05-22 Thread elisabeth benoit
Hello Tanguy, I guess you're right, maybe this shouldn't be done in Solr but inside of the front-end. Thanks a lot for your answer. Elisabeth 2012/5/22 Tanguy Moal > Hello Elisabeth, > > Wouldn't it be more simple to have a custom component inside of the > front-end to your search server that