Re: I found a sorting bug in solr/lucene

2011-07-29 Thread Chris Hostetter
: According to that bug list, there are other characters that break the : sorting function. Is there a list of safe characters I can use as a : delimiter? The safest field names to use (and most efficient to parse when sorting) are things that follow the "id" semantics in Java (not includi
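
A minimal sketch of the advice above, assuming "Java identifier semantics" can be approximated as ASCII letters, digits, and underscores, with no leading digit. The reply is cut off before it lists its exclusions, so leaving out `$` here is a conservative assumption, and `is_safe_field_name` is a hypothetical helper, not part of Solr.

```python
import re

# Assumption: a conservative ASCII subset of Java identifiers.
# "$" is legal in Java identifiers but is deliberately excluded here,
# because the original reply is truncated before its exclusion list.
SAFE_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_field_name(name: str) -> bool:
    """Return True if the whole name matches the conservative pattern."""
    return SAFE_FIELD_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("price_usd", "price-usd", "2nd_field", "title"):
        print(candidate, is_safe_field_name(candidate))
```

Names containing a hyphen or starting with a digit are rejected, which matches the spirit of the advice even if the exact rule in the full message differs.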

Re: dih fetching but not adding records to index

2011-07-29 Thread abhayd
Quick question: if I want to load just the document with id=2, how would that work? I tried an XPath expression that works with XPath tools but not in Solr. How would I do this? -- View this message in context: http://lucene.

Looking for a senior search engineer

2011-07-29 Thread Michael Economy
Hi, Sorry if this isn't the right place for this message, but it's a very specific role we're looking for and I'm not sure where else to find solr experts! I was wondering if anyone would be interested, or knew anyone who would be interested in working on goodreads.com's search: We're using So

Re: Disabling Coord on Solr queries

2011-07-29 Thread Chris Hostetter
: I am looking for the simplest way to disable coord in Solr queries. I have : found out Lucene allows this by construction of a BooleanQuery with : disableCoord=false: : public *BooleanQuery*(boolean disableCoord) : : Is there any way to activate this functionality directly from a Solr query? N

Re: Display term frequency / phrase frequency for documents

2011-07-29 Thread Chris Hostetter
: I'd like to expose the termFrequency / phraseFrequency to the end user in my : application. For example I would like to be able to say "Your search term : appears X times in this document". : : I can see these figures exposed via debugQuery=on, where I get output like ... : Is there any

Re: omitNorms

2011-07-29 Thread Chris Hostetter
: my field category (string) has omitNorms=True and omitTermFreqAndPositions=True. : i have indexed all docs but when i do a search like: : http://xxx:xxx/solr/select/?q=category:A&debugQuery=on : i see there's normalization and idf and tf. Why? i can't understand the reason. those options en

Re: Solr versioning policy

2011-07-29 Thread Chris Hostetter
: 1. Is this the plan moving forward (to aim for a new minor release : approximately every couple of months)? The goal is to release minor versions "more frequently" as features and low priority bug fixes are available. If there is a high priority bug fix available, and no likelihood of a

Re: I can't pass the unit test when compile from apache-solr-3.3.0-src

2011-07-29 Thread Chris Hostetter
: I find that the junit test will always fail, and told me 'BUILD FAILED' : : but if I type 'ant dist', I can get an apache-solr-3.3-SNAPSHOT.war : with no warning. : : Is it a problem just for me? Can you please be specific... * which test(s) fail for you? * what are the failures? Any time a tes

Re: Auto-Commit and failures / schema violations

2011-07-29 Thread Chris Hostetter
: sure that the add was successful, as (for example) schema violations : seem to be detected on commit, which is therefore too late, as the I have no idea what that statement means -- if you are getting an error, can you be specific as to what type of error you are getting? (ie: what is retu

Error with Extracting PDF metadata

2011-07-29 Thread sabman
I am using Solr 3.3 and I am trying to extract and index metadata from PDF files. I am using the DataImportHandler with the TikaEntityProcessor to add the documents. Here are the fields as defined in my schema.xml file: So I suppose the metadata information should b

Re: slow highlighting because of stemming

2011-07-29 Thread Mike Sokolov
I'm not sure I would identify stemming as the culprit here. Do you have very large documents? If so, there is a patch for FVH committed to limit the number of phrases it looks at; see hl.phraseLimit, but this won't be available until 3.4 is released. You can also limit the amount of each doc

Re: Exact match not the first result returned

2011-07-29 Thread Brian Lamb
I implemented both solutions Hoss suggested and was able to achieve the desired results. I would like to go with defType=dismax & qf=myname & pf=myname_str^100 & q=Frank but that doesn't seem to work if I have a query like myname:Frank otherfield:something. So I think I will go with q=+myname:F
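
The dismax parameters quoted above can be sketched as a URL-encoded query string. This only shows how the four parameters from the message (`defType`, `qf`, `pf`, `q`) would be encoded; the helper name `dismax_params` is hypothetical, and no host, core, or handler path is assumed.

```python
from urllib.parse import urlencode


def dismax_params(user_query: str) -> dict:
    """Build the parameter set quoted in the message for a plain user query."""
    return {
        "defType": "dismax",     # use the dismax query parser
        "qf": "myname",          # field the query terms are matched against
        "pf": "myname_str^100",  # phrase field with a boost of 100
        "q": user_query,
    }


# "^" is percent-encoded as %5E by urlencode.
query_string = urlencode(dismax_params("Frank"))
print(query_string)
```

Encoding through `urlencode` rather than by hand avoids sending a raw `^` or spaces in the request.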

dealing with so many different sorting options

2011-07-29 Thread Jason Toy
As I'm using Solr more and more, I'm finding that I need to do searches and then order by new criteria. So I am constantly adding new fields into Solr and then reindexing everything. I want to know if adding in all this data into Solr is the normal way to deal with sorting. I'm finding that I have

RE: embedded solrj doesn't refresh index

2011-07-29 Thread Jianbin Dai
Thanks Marc. Guess I was not clear about my previous statement. So let me rephrase. I use DIH to import data into solr and do indexing. Everything works fine. I have another embedded solr server setting to the same index files. I use embedded solrj to search the index file. So the first solr i

Solr Incremental Indexing

2011-07-29 Thread Mohammed Lateef Hussain
Hi, I need some help with a Solr incremental indexing approach. I have built my Solr index using the SolrJ API and now want to update the index whenever any changes have been made in the database. My requirement is not to use DB triggers to call any update events. I want to update my index on the fly whenever my

Re: Combine XML data with DIH

2011-07-29 Thread O. Klein
Yeah, but how do I combine the two based on the value in ? -- View this message in context: http://lucene.472066.n3.nabble.com/Combine-XML-data-with-DIH-tp3209413p3209983.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Combine XML data with DIH

2011-07-29 Thread abhayd
Hi, I have never done this with XML files, but you can have multiple data sources in the DIH config: http://wiki.apache.org/solr/DataImportHandler#multipleds abhay

combining xml and nutch index in solr

2011-07-29 Thread abhayd
Hi, I have an XML file which has url, category, subcategory, title kind of details, and we crawl the URLs in the XML using Nutch. Any way for us to merge both? Like, the schema will look like url category subcategory title crawl_data_summary_from_nutch crawl_data_body_content_from_nutch Any solution for

Re: convert date format at indexing time

2011-07-29 Thread O. Klein
If you use DIH with TikaEntityProcessor you get the dates in Solr compatible format if you use the dates stored in the meta-data.

RE: Index time boosting with DIH

2011-07-29 Thread Bürkle , David
Thanks for the answer. I want to share the configuration that worked for me (see the follow up question at the end): (Boosting a document on the basis of a field value at index time.) It took me some time to figure out, that for the row.get to work, I had to use the column name (the one in the s

RE: Updating opinion

2011-07-29 Thread Dyer, James
Although, now that I think more, you could probably get away with the commit-at-midnight option provided it doesn't take much time to warm a new searcher. Another thing is if you set a low merge factor you likely won't need to optimize. The optimize usually would take a lot longer than the com

RE: Updating opinion

2011-07-29 Thread Dyer, James
I would imagine if you're doing updates all day the commit might take a long time. You could try it though and see if it works for you. Another option, which will use more disk & memory is to replicate all your data to another core just after midnight. Then update the data all day long as you

Re: Combine XML data with DIH

2011-07-29 Thread O. Klein
To make it easier, I included example config: O. Klein wrote: > > I have folder with XML files > > 1.xml contains: > http://www.site.com/1.html > http://www.othersite.com/2.html > bla1 > > 2.xml contains: > http://www.othersite.com/2.html > bla2 > > I

Re: Dealing with keyword stuffing

2011-07-29 Thread Pranav Prakash
Cool, So I used SweetSpotSimilarity with default params and I see some improvements. However, I could still see some of the 'stuffed' documents coming up in the results. I feel that SweetSpotSimilarity alone is not enough. Going through http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

segment.gen file is not replicated

2011-07-29 Thread Bernd Fehling
Dear list, is there a deeper logic behind why the segment.gen file is not replicated with solr 3.2? Is it obsolete because I have a single segment? Regards, Bernd

Combine XML data with DIH

2011-07-29 Thread O. Klein
I have folder with XML files 1.xml contains: http://www.site.com/1.html http://www.othersite.com/2.html bla1 2.xml contains: http://www.othersite.com/2.html bla2 I want to create document in Solr: http://www.site.com/1.html bla2 Can this be done with DIH? And how? -- Vi

Query on multi valued field

2011-07-29 Thread rajini maski
Hi All, I have a specific requirement in the multi-valued field type. The requirement is as follows: There is a multivalued field in each document which can have multiple elements or a single element. For Eg: Consider that following are the documents matched for say q= *:* *DOC1* 1 * *

Updating opinion

2011-07-29 Thread roySolr
Hello, I want some opinions on the updating process of my application. Users can edit their own data. This data will be validated and must be updated every 24 hours. I want to do this at night (0:00). Now let's say 50.000 documents are edited. The delta import will take ~20 minutes. So the index

AUTO: Ryan J Minniear is out of the office. (returning 08/01/2011)

2011-07-29 Thread Ryan J Minniear
I am out of the office until 08/01/2011. I will respond to your message when I return. Please contact Robert Guthrie for any urgent issues. Note: This is an automated response to your message "Solr 3.2.0 is not writing log" sent on 7/29/11 2:08:07. This is the only notification you will rec

Re: [WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7

2011-07-29 Thread Sanne Grinovero
Hello, thanks for the warning, that's a pretty nasty bug. A patch was made for OpenJDK, if anybody is interested to try it out that would be great: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/4e761e7e6e12 Regards, Sanne 2011/7/28 Uwe Schindler : > Hello Apache Lucene & Apache Solr us

slow highlighting because of stemming

2011-07-29 Thread Orosz György
Dear all, I am quite new to using Solr, but would like to ask your help. I am developing an application which should be able to highlight the results of a query. For this I am using regex fragmenter: 500 0.5 true [-\w ,/\n\"']{20,300}[.?!] dok
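
To see what the fragment pattern quoted above accepts, here is a small sketch that applies the same regular expression with Python's `re` module. It only illustrates the pattern itself; Solr's regex fragmenter applies it with its own fragment-size and slop logic, which this does not reproduce, and the sample text is invented for illustration.

```python
import re

# The pattern from the message: 20 to 300 characters drawn from word
# characters, hyphen, space, comma, slash, newline and quotes, followed
# by one sentence-ending punctuation mark.
FRAGMENT_PATTERN = re.compile(r"[-\w ,/\n\"']{20,300}[.?!]")

# Hypothetical sample text, not taken from the original message.
sample = (
    "Solr highlighting splits long documents into fragments. Short. "
    "Each fragment should end at sentence punctuation, like this one!"
)

fragments = FRAGMENT_PATTERN.findall(sample)
for fragment in fragments:
    print(repr(fragment))
```

The very short sentence is skipped because it has fewer than 20 characters before its period, so only the two longer sentences are returned as fragments.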

Auto-Commit and failures / schema violations

2011-07-29 Thread Dirk Högemann
Hello, we are running a large CMS with multiple customers and we are now going to use Solr for our search and indexing tasks. As we have a lot of users working simultaneously on the CMS we decided not to commit our changes programmatically (we use StreamingUpdateSolrServer) on each add. Instead

Re: convert date format at indexing time

2011-07-29 Thread PacoPeralta
Please, is there any suggestion on this? Thanks

Solr 3.2.0 is not writing log

2011-07-29 Thread Ruixiang Zhang
I'm using Solr 1.4 with jetty for my site, it writes log into files in example/logs. Now I'm testing Solr 3.2.0 with jetty on another server, but no log is written into this folder: example/logs. It is always empty. Do I need to do something to turn on the log? Any hint will be appreciated. Rui