Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Hi, please comment on whether I should consider moving to the old LogByteSize MP on moving to 3.6.1 from 1.3, as I see improvements in query performance on optimization. Just to mention, we have a lot of indexes in multi cores as well as multiple webapps, and that's the reason we went for CFS in

Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Any comments on this? On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun wrote: > Thanks Jack. > > so Qtime = Sum of all prepare components + sum of all process components - > Debug comp process/prepare time > > In 3.6.1 the process part of Query component for the following query seems > to take

UIMA for lemmatization

2012-09-24 Thread abhayd
Hi, I'm new to UIMA. Solr does not have a lemmatization component, so I was thinking of using UIMA for this. Is this a correct choice, and if so, any idea how I would go about it? I see a couple of links for Solr UIMA integration but don't know how that can be used for lemmatization. Any thoughts? --

Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-24 Thread sam fang
Hi Mark, If it can be supported in the future, I think that would be great. It's a really useful feature. For example, a user can use it to refresh with a totally new core: build the index on one core and, after the build is done, swap the old core and the new core to get a totally new core for search. It can also be used for backup. I

How can I create about 100000 independent indexes in Solr?

2012-09-24 Thread 韦震宇
Dear all, The company I'm working in has a website to serve more than 10 customers, and every customer should have its own search category. So I should create an independent index for every customer. The site http://wiki.apache.org/solr/MultipleIndexes gives some solutions to create m

Re: How to more gracefully handle field format exceptions?

2012-09-24 Thread Aaron Daubman
Hi Otis, I was just looking at how to implement that, but was hoping for a cleaner method - it seems like I will have to actually parse the error as text to find the field that caused it, then remove/mangle that field and attempt re-adding the document - which seems less than ideal. I would think

Re: How to more gracefully handle field format exceptions?

2012-09-24 Thread Otis Gospodnetic
Hi Aaron, You could catch the error on the client, fix/clean/remove, and retry, no? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman wrote: > Greetings, > > Is t
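
A minimal sketch of the client-side catch, clean, and retry idea suggested above, written in Python rather than SolrJ. The `add_fn` callable, the `float_fields` list, and the use of `ValueError` to signal a rejected document are assumptions for illustration only; a real client would catch whatever exception its Solr library raises.

```python
def clean_float_fields(doc, float_fields):
    """Return a copy of doc without float fields whose values do not parse."""
    cleaned = dict(doc)
    for name in float_fields:
        if name in cleaned:
            try:
                float(cleaned[name])
            except (TypeError, ValueError):
                # Empty strings and other unparseable values are dropped.
                del cleaned[name]
    return cleaned


def add_with_retry(doc, add_fn, float_fields):
    """Try to add doc; on failure, drop bad float fields and retry once."""
    try:
        add_fn(doc)
        return doc
    except ValueError:
        cleaned = clean_float_fields(doc, float_fields)
        add_fn(cleaned)  # a second failure propagates to the caller
        return cleaned
```

Validating the known float fields directly avoids parsing the error text to find the offending field, at the cost of keeping the field list in the client.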

How to more gracefully handle field format exceptions?

2012-09-24 Thread Aaron Daubman
Greetings, Is there a way to configure more graceful handling of field formatting exceptions when indexing documents? Currently, there is a field being generated in some documents that I am indexing that is supposed to be a float but some times slips through as an empty string. (I know, fix the d

Re: SolrJ - IOException

2012-09-24 Thread roz dev
I have seen this happening. We retry and that works. Is your Solr server stalled? On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi wrote: > Hi, > > I am encountering this error randomly (under load) when posting to Solr > using SolrJ. > > Has anyone encountered a similar error? > > org.apache.solr.

Re: memory leak in pdfbox--SolrCel needs to call COSName.clearResources?

2012-09-24 Thread Chris Hostetter
: We've been struggling with solr hangs in the solr process that indexes : incoming PDF documents. TLDR; summary is that I'm thinking that : PDFBox needs to have COSName.clearResources() called on it if the solr : indexer expects to be able to keep running indefinitely. Is that I don't know muc

Admin-UI: multiple facet

2012-09-24 Thread Alexandre Rafalovitch
Hello, Is there a way to provide multiple facet field names in the Admin UI? I have tried spaces, commas and semicolons to no effect. Would have been nice to be able to push the UI just a tiny bit further before switching to the URL query string directly. Or is single facet field a limitation of

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Shawn Heisey
On 9/24/2012 11:37 AM, Daisy wrote: One thing I would like to know: what is the difference between PatternReplaceFilter and PatternReplaceCharFilter? The CharFilter version gets applied before anything else, including the Tokenizer. The Filter version gets applied in the order specified in the
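
The ordering difference described above can be modelled in a few lines of Python. This is only a conceptual sketch, not Solr's implementation: whitespace splitting stands in for the tokenizer, and the function names and sample pattern are assumptions.

```python
import re


def char_filter_then_tokenize(text, pattern):
    """Model of a char filter: edit the raw text first, then tokenize."""
    return re.sub(pattern, "", text).split()


def tokenize_then_filter(text, pattern):
    """Model of a token filter: tokenize first, then edit each token."""
    return [re.sub(pattern, "", token) for token in text.split()]


sample = "foo ( bar"
print(char_filter_then_tokenize(sample, r"[()]"))  # no empty token
print(tokenize_then_filter(sample, r"[()]"))       # "(" becomes an empty token
```

In this model the token-filter variant leaves an empty string where the lone "(" token used to be, which matches the empty-token behaviour reported elsewhere in this thread.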

Re: Solr Cell Questions

2012-09-24 Thread Erick Erickson
If you're concerned about throughput, consider moving all the SolrCell (Tika) processing off the server. SolrCell is way cool for showing what can be done, but its downside is you're moving all the processing of the structured documents to the same machine doing the indexing. Pretty soon, especiall

Re: Range operator problems in Chef ( automating framework)

2012-09-24 Thread Erick Erickson
Be a little careful, spaces here can mess you up. Particularly around the hyphen in -1hour. I.e. NOW -1HOUR is invalid but NOW-1HOUR is ok (note the space between W and -). There aren't any in your example, but just to be sure One other note: you may get better performance out of making this
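
A small sketch of a client-side guard for the whitespace pitfall described above. The helper name and the `timestamp` field are hypothetical; the only rule encoded is the one stated in the message, that the date math expression must not contain spaces.

```python
def date_range_query(field, lower, upper="NOW"):
    """Build a range query string, rejecting date math that contains spaces."""
    for expr in (lower, upper):
        if any(ch.isspace() for ch in expr):
            raise ValueError("whitespace in date math expression: %r" % expr)
    return f"{field}:[{lower} TO {upper}]"


print(date_range_query("timestamp", "NOW-1HOUR"))
```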

Re: /solr/dataimport not found

2012-09-24 Thread Chris Hostetter
: database. I've got the admin page up, but I can't get : localhpst:8080/solr/dataimport/ to work. It returns a 404 error. 1) which version of solr are you using? 2) did you try localhost:8080/solr/dataimport (no trailing slash) ? 3) does anything in the admin UI work? -Hoss

Re: /solr/dataimport not found

2012-09-24 Thread Michael Della Bitta
Hello, John, Assuming this is a single core instance of Solr, does "/solr/admin/dataimport.jsp" work? Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor | New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Mon,

/solr/dataimport not found

2012-09-24 Thread johnohod
I've been trying to set up Solr with Tomcat, in order to connect to a MySQL database. I've got the admin page up, but I can't get localhpst:8080/solr/dataimport/ to work. It returns a 404 error. Been Googling high and low, without finding the answer. I've put this in my solrconfig.xml

CQL instead of SQL in Solr data-config

2012-09-24 Thread PeterKerk
Please see this post here: http://stackoverflow.com/questions/12324837/apache-cassandra-integration-with-apache-solr/12326329#comment16936430_12326329 Does anyone have experience with or know if it's possible with the Solr data-config combined with Cassandra JDBC drivers (http://code.google.com/a/

Re: solrcloud and csv import hangs

2012-09-24 Thread Yonik Seeley
https://issues.apache.org/jira/browse/SOLR-3883 -Yonik http://lucidworks.com On Mon, Sep 24, 2012 at 11:42 AM, Yonik Seeley wrote: > On Mon, Sep 24, 2012 at 11:03 AM, dan sutton wrote: >> Hi, >> >> This appears to happen in trunk too. >> >> It appears that the add command request parameters ge

Solved: Re: omit tf using per-field CustomSimilarity?

2012-09-24 Thread Carrie Coy
My problem was that I specified the per-field similarity class INSIDE the analyzer instead of outside it. On 09/24/2012 02:56 PM, Carrie Coy wrote: I'm trying to configure per-field similarity to disregard term frequency (omitTf) in a 'title' field. I'm trying to follow the example docs

Persisting dataimport.properties in ZooKeeper directory

2012-09-24 Thread balaji.gandhi
Hi, We are working on a DIH for our project and we are persisting the last_modified_date in the ZooKeeper directory. Our understanding is that the properties are uploaded to ZooKeeper when the first SOLR node comes up. When the SOLR nodes are restarted whatever is persisted in the properties is lo

memory leak in pdfbox--SolrCel needs to call COSName.clearResources?

2012-09-24 Thread Kevin Goess
We've been struggling with solr hangs in the solr process that indexes incoming PDF documents. TLDR; summary is that I'm thinking that PDFBox needs to have COSName.clearResources() called on it if the solr indexer expects to be able to keep running indefinitely. Is that likely? Is there anybody

omit tf using per-field CustomSimilarity?

2012-09-24 Thread Carrie Coy
I'm trying to configure per-field similarity to disregard term frequency (omitTf) in a 'title' field. I'm trying to follow the example docs without success: my custom similarity doesn't seem to have any effect on 'tf'. Is the NoTfSimilarity function below written correctly? Any advice is

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Using "solr.LengthFilterFactory" was great and also solved the problem of using PatternReplaceFilter. So now I have two solutions. Thanks all for helping me. One thing I would like to know: what is the difference between PatternReplaceFilter and PatternReplaceCharFilter? -- View this message in con

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Walter Underwood
I've had problems with empty tokens. You can remove those with this as a step in the analyzer chain. wunder On Sep 24, 2012, at 10:07 AM, Jack Krupansky wrote: > I tried it and PRFF is indeed generating an empty token. I don't know how > Lucene will index or query an empty term. I me

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jonathan Rochkind
When I do things like this and want to avoid empty tokens even though previous analysis might result in some--I just throw one of these at the end of my analysis chain: A charfilter to filter raw characters can certainly still result in an empty token, if an initial token wa

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Thanks. Finally it works using I wonder what is the reason for that, and what is the difference between the filter and the charFilter? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009918.html Sent from the Solr - User

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
How could I know which query parser I am using? Here is the part of my schema that I am using As shown even if I tried to remove "(" the same happened for parsed query and for numFound. -- View this message in context: htt

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jack Krupansky
I tried it and PRFF is indeed generating an empty token. I don't know how Lucene will index or query an empty term. I mean, what it "should" do. In any case, it is best to avoid them. You should be using a "charFilter" to simply filter raw characters before tokenizing. So, try: It has the

Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Thanks Jack. so Qtime = Sum of all prepare components + sum of all process components - Debug comp process/prepare time In 3.6.1 the process part of Query component for the following query seems to take 8 times more time? anything missing? For most queries the process part of the Querycomponent
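
The formula stated above can be written out as a short Python sketch. The nested dictionary only approximates the shape of the timing section returned with debugQuery=true; the component keys and all numbers below are made up for illustration and are not taken from the thread.

```python
def estimate_qtime(timing, debug_key="debug"):
    """Sum prepare and process times of all components except the debug one."""
    total = 0.0
    for phase in ("prepare", "process"):
        for name, entry in timing[phase].items():
            if name == "time":      # per-phase total, not a component
                continue
            if name == debug_key:   # debug component time is excluded
                continue
            total += entry["time"]
    return total


# Hypothetical timings in milliseconds.
sample_timing = {
    "prepare": {"time": 1.0, "query": {"time": 1.0}, "debug": {"time": 0.0}},
    "process": {"time": 12.0, "query": {"time": 8.0},
                "facet": {"time": 2.0}, "debug": {"time": 2.0}},
}
print(estimate_qtime(sample_timing))
```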

Re: Problem indexing CSV files using post.jar with multivalue fields

2012-09-24 Thread rudywilkjr
Never fails. Take the time to post this message, only to discover the answer on my own a few minutes later. The solution is to surround the -Durl value in double quotes. For example: java -Durl="http://localhost:8983/solr/contacts/update/csv?f.address.split=true&f.address.separator=%7C" -Dtype

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jack Krupansky
1. Which query parser are you using? 2. I see the following comment in the Java 6 doc for regex "\p{Punct}": "POSIX character classes (US-ASCII only)", so if any of the punctuation is some higher Unicode character code, it won't be matched/removed. 3. It seems very odd that the parsed query has
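
Point 2 above can be illustrated with a rough Python analogue. Python's `string.punctuation` covers the same ASCII-only set as the POSIX punctuation class, while `unicodedata` knows the Unicode punctuation categories. The helper names are hypothetical, and this models the character-class issue only, not Solr's analysis chain.

```python
import string
import unicodedata


def strip_ascii_punct(text):
    """Remove only ASCII punctuation, like a US-ASCII-only character class."""
    return "".join(ch for ch in text if ch not in string.punctuation)


def strip_unicode_punct(text):
    """Remove every character in a Unicode punctuation category (P*).

    ASCII symbols such as '+' or '$' are in symbol categories and are kept.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


sample = "a,b\u060Cc"  # ASCII comma, then ARABIC COMMA (U+060C)
print(strip_ascii_punct(sample))    # the Arabic comma survives
print(strip_unicode_punct(sample))  # both commas are removed
```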

Re: need best solution for indexing and searching multiple, related database tables

2012-09-24 Thread Jack Krupansky
Could you supply some sample user queries and some sample data the queries should match? In other words, how do your users expect to "view" the data? If you are simply trying to replicate full SQL queries in Solr, you're probably going to be disappointed, but if you look at what queries your users

At a high level how does faceting in SolrCloud work?

2012-09-24 Thread Jamie Johnson
I'd like to wrap my head around how faceting in SolrCloud works. Does Solr ask each shard for its maximum value and then use that to determine what else should be asked for from other shards, or does it ask for all values and do the aggregation on the requesting server?

Re: solrcloud and csv import hangs

2012-09-24 Thread Yonik Seeley
On Mon, Sep 24, 2012 at 11:03 AM, dan sutton wrote: > Hi, > > This appears to happen in trunk too. > > It appears that the add command request parameters get sent to the > nodes. If I comment these out like so for add and commit: > > core/src/java/org/apache/solr/update/processor/DistributedUpdate

Re: need best solution for indexing and searching multiple, related database tables

2012-09-24 Thread jimtronic
I'm not sure if this will be relevant for you, but this is roughly what I do. Apologies if it's too basic. I have a complex view that normalizes all the data that I need to be together -- from over a dozen different tables. For one to many and many to many relationships, I have sql turn the data

solrcloud and csv import hangs

2012-09-24 Thread dan sutton
Hi, This appears to happen in trunk too. It appears that the add command request parameters get sent to the nodes. If I comment these out like so for add and commit: core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java - params = new ModifiableSolrParams(req.getPa

Re: Items disappearing from Solr index

2012-09-24 Thread Kissue Kissue
Hi Erick, Thanks for your reply. Yes, I am using delete by query. I am currently logging the number of items to be deleted before handing off to Solr. And from the Solr logs I can see it deleted exactly that number. I will verify further. Thanks. On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote: >

Re: Solrcloud not reachable and after restart just a "no servers hosting shard"

2012-09-24 Thread Mark Miller
Right - we need logs, admin->cloud dump to clipboard info, anything else to go on. On Mon, Sep 24, 2012 at 4:36 AM, Sami Siren wrote: > hi, > > Can you share a little bit more about your configuration: how many > shards, # of replicas, how does your clusterstate.json look like, > anything suspici

Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Jack Krupansky
Run a query on both old and new with &debugQuery=true on your query request and look at the component timings for possible insight. -- Jack Krupansky From: Sujatha Arun Sent: Monday, September 24, 2012 7:26 AM To: solr-user@lucene.apache.org Subject: Performance Degradation on Migrating from 1

Solr Cell Questions

2012-09-24 Thread Johannes . Schwendinger
Hi, I'm currently experimenting with Solr Cell to index files to Solr. During this, some questions came up. 1. Is it possible (and wise) to connect to Solr Cell with multiple threads at the same time to index several documents at the same time? This question came up because my program takes abo

Re: Understanding autoSoftCommit

2012-09-24 Thread Mark Miller
autoCommit (hard commit) is basically just to reduce how much RAM is needed for the transaction log. You should generally use it with openSearcher=false and don't need to use it for visibility. It's also not required for durability due to the transaction log. Soft commit should be used for visibi

Re: Qtime Vs DebugComponent Timing

2012-09-24 Thread Jack Krupansky
And QTime doesn't include the time spent in the container (e.g., Tomcat or Jetty) or network latency. Usually a query benchmark would be from the time the client sent the query request until the time the client received the query results. The debug timing will help you understand which Solr co
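
A sketch of the client-side benchmark described above: time the whole request from the client and compare it with the QTime reported in the response header. The `send_query` callable is a placeholder for whatever client code issues the request and returns the parsed response; it is an assumption, not a real library call.

```python
import time


def timed_query(send_query):
    """Return (client_ms, qtime_ms, overhead_ms) for one query."""
    start = time.perf_counter()
    response = send_query()
    client_ms = (time.perf_counter() - start) * 1000.0
    qtime_ms = response["responseHeader"]["QTime"]
    # Overhead covers container time, network latency, and response parsing.
    return client_ms, qtime_ms, client_ms - qtime_ms
```

Averaging over many queries gives a steadier picture than any single measurement.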

Re: solrcloud without realtime

2012-09-24 Thread Mark Miller
On Mon, Sep 24, 2012 at 9:21 AM, Radim Kolar wrote: > and what about solr.NRTCachingDirectoryFactory? Is solr.MMapDirectoryFactory > faster if there is no NRT search requirements? NRTCachingDirectoryFactory is a wrapping directory - it's generally going to use solr.MMapDirectoryFactory as its d

Re: Range operator problems in Chef ( automating framework)

2012-09-24 Thread Jack Krupansky
That looks like a valid Solr date math expression, but you need to make sure that the field type is actually a Solr "DateField" as opposed to simply an integer Unix time value. -- Jack Krupansky -Original Message- From: Christian Bordis Sent: Monday, September 24, 2012 7:16 AM To: so

Re: Return only matched multiValued field

2012-09-24 Thread Dotan Cohen
On Mon, Sep 24, 2012 at 9:47 AM, Mikhail Khludnev wrote: > Hi > It seems like highlighting feature. Thank you Mikhail. I actually do need the entire matched single entry, not a snippet of it. Looking at the example in the OP, with highlighting on "gold" I would get glitters is gold Whereas I ne

RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
I tried & and it solved the 500 error code. But it could still find punctuation marks. Although the parsed query didn't contain the punctuation mark, "{" "{" text: text: numFound still gives 1 and the highlight shows the result of punctuation mark { The steps I did: 1- editing the sc

Re: solrcloud without realtime

2012-09-24 Thread Radim Kolar
On 24.9.2012 14:05, Erick Erickson wrote: I'm pretty sure all you need to do is disable autoSoftCommit. Or rather don't un-comment it from solrconfig.xml and what about solr.NRTCachingDirectoryFactory? Is solr.MMapDirectoryFactory faster if there is no NRT search requirements?

Re: Return only matched multiValued field

2012-09-24 Thread Dotan Cohen
On Mon, Sep 24, 2012 at 2:16 PM, Erick Erickson wrote: > Hmmm, works for me. What is your entire response packet? > > And you've covered the bases with indexed and stored so this > seems like it _should_ work. > I'm sorry, reducing the output to rows=1 helped me notice that the highlighted sectio

RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Markus Jelsma
-Original message- > From:Daisy > Sent: Mon 24-Sep-2012 15:09 > To: solr-user@lucene.apache.org > Subject: RE: Solr - Remove specific punctuation marks > > Yes I am trying to index Arabic document. There is a problem that the && > regex couldn't be understood in the solr schema and

RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Yes, I am trying to index Arabic documents. There is a problem that the && regex couldn't be understood in the Solr schema and it gives a 500 error code. Here is an example: input: هذا مثال: للتوضيح (مثال علي علامات الترقيم) انتهي. I tried also the regex: pattern="([\(\)\}\{\,[^.:\s+\S+]])" but I

RE: Solr - Remove specific punctuation marks

2012-09-24 Thread Steven A Rowe
Hi Daisy, I can't see anything wrong with the regex or the XML syntax. One possibility: if it's Arabic you're matching against, you may want to add ARABIC FULL STOP U+06D4 to the set you subtract from \p{Punct}. If you give an example of your input and your expected output, I might be able to

Splitting up a location to make it searchable

2012-09-24 Thread Spadez
I am using Google for location input. *It often splits out something like this:* Shorewood, Seattle, Wa *Since I am using this index analyzer:* It means that if I search for "Sho" or "Shorew" I get the result I want. However, if I search for “Sea” or “Seatt” I get no results. I guess I need

Solr is not Indexing after Mysql Upgradation

2012-09-24 Thread Rahul Paul
Indexing is not happening after 'x' documents. I am using Bitnami and had upgraded the MySQL server from MySQL 5.1.* to MySQL 5.5.*. After the upgrade, when I ran indexing on Solr, nothing got indexed. I am using a procedure in which I am finding the parent of a child and inserting it in a

Re: Items disappearing from Solr index

2012-09-24 Thread Erick Erickson
How do you delete items? By ID or by query? My guess is that one of two things is happening: 1> your delete process is deleting too much data. 2> your index process isn't indexing what you think. I'd add some logging to the SolrJ program to see what it thinks it has deleted or added to the index

Re: Return only matched multiValued field

2012-09-24 Thread Erick Erickson
Hmmm, works for me. What is your entire response packet? And you've covered the bases with indexed and stored so this seems like it _should_ work. Best Erick On Mon, Sep 24, 2012 at 6:12 AM, Dotan Cohen wrote: >> > indexed="true" >> multiValued="true" /> >> >> d

Re: solrcloud without realtime

2012-09-24 Thread Erick Erickson
I'm pretty sure all you need to do is disable autoSoftCommit. Or rather don't un-comment it from solrconfig.xml Best Erick On Mon, Sep 24, 2012 at 5:44 AM, Radim Kolar wrote: > its possible to use solrcloud but without real-time features? In my > application I do not need realtime features a

Re: Help with new Join Functionallity in Solr 4.0

2012-09-24 Thread Erick Erickson
NP, good luck! On Sun, Sep 23, 2012 at 3:41 PM, wrote: > Hello Erick, > > Thanks a lot for your reply! Your suggestion is actually exactly the > alternative solution we are thinking about and with your clarification on > Solr's performance we are going to go for it! Many thanks again! > > Mile

Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Hi, On migrating from 1.3 to 3.6.1, I see the query performance degrading by nearly 2 times for all types of query. Indexing performance shows a slight degradation over 1.3. For indexing we use our custom scripts that post XML over HTTP. Is there anything that I might have missed? I am thinking that this migh

Range operator problems in Chef ( automating framework)

2012-09-24 Thread Christian Bordis
Hi Everyone! We are doing some nice stuff with Chef (http://wiki.opscode.com/display/chef/Home). It uses Solr for search, but range queries don't work as expected. Maybe Chef or Solr is just buggy, or I am doing it wrong ;-) In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search nodes

Understanding autoSoftCommit

2012-09-24 Thread Trym R. Møller
Hi On my windows workstation I have tried to index a document into a SolrCloud instance with the following "special" configuration: 120 60 ... ${solr.data.dir:} That is commit every 20 minutes and soft commit every 10 minutes. Rig

Items disappearing from Solr index

2012-09-24 Thread Kissue Kissue
Hi, I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer to index and delete items from solr. I basically index items from the db into solr every night. Existing items can be marked for deletion in the db and a delete request sent to solr to delete such items. My process runs a

Re: Return only matched multiValued field

2012-09-24 Thread Dotan Cohen
> indexed="true" > multiValued="true" /> > > doctest Note that in anonymizing the information, I introduced a typo. The above "doctest" should be "doctext". In any case, the field names in the production application and in production schema do in fact match! --

RE: Nodes cannot recover and become unavailable

2012-09-24 Thread Markus Jelsma
It seems my clusterstate.json is still old. Is there a method to recreate it without taking all nodes down at the same time? -Original message- > From:Markus Jelsma > Sent: Thu 20-Sep-2012 10:14 > To: solr-user@lucene.apache.org > Subject: RE: Nodes cannot recover and become unavail

solrcloud without realtime

2012-09-24 Thread Radim Kolar
Is it possible to use SolrCloud without real-time features? In my application I do not need real-time features, and old-style processing should be more efficient.

Solr - Remove specific punctuation marks

2012-09-24 Thread Daisy
Hi; I am working with apache-solr-3.6.0 on windows machine. I would like to remove all punctuation marks before indexing except the colon and the full-stop. I tried: But it didn't work. Any Ideas? -- View this message in context: http://lucene.472066.n3

Re: Solrcloud not reachable and after restart just a "no servers hosting shard"

2012-09-24 Thread Sami Siren
hi, Can you share a little bit more about your configuration: how many shards, # of replicas, how does your clusterstate.json look like, anything suspicious in the logs? -- Sami Siren On Mon, Sep 24, 2012 at 11:13 AM, Daniel Brügge wrote: > Hi, > > I am running Solrcloud 4.0-BETA and during th

Solrcloud not reachable and after restart just a "no servers hosting shard"

2012-09-24 Thread Daniel Brügge
Hi, I am running SolrCloud 4.0-BETA and during the weekend it 'crashed' somehow, so that it wasn't reachable. CPU load was 100%. After a restart I couldn't access the data; it just told me: "no servers hosting shard" Is there a way to get the data back? Thanks & regards Daniel

Re: Return only matched multiValued field

2012-09-24 Thread Mikhail Khludnev
Hi, It seems like the highlighting feature. On 24.09.2012 0:51, user "Dotan Cohen" wrote: > Assuming a multivalued, stored and indexed field with name "comment". > When performing a search, I would like to return only the values of > "comment" which contain the match. For example: > > When searc