Re: Sharded Index Creation Magic?
On Tue, Jul 14, 2009 at 2:00 AM, Nick Dimiduk wrote:
> However, when I search across all deployed shards using the &shards= query parameter
> (http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body\%3A%3Aterm),
> I get a NullPointerException:
>
> java.lang.NullPointerException
>   at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
>   at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>
> Debugging into the QueryComponent.mergeIds() method reveals that the instance
> sreq.responses (line 356) contains one response for each shard specified,
> each with the number of results received by the independent queries. The
> problems begin down at line 370 because the SolrDocument instance has only a
> score field -- which proves problematic in the following line where the id
> is requested. The SolrDocument, only containing a score, lacks the
> designated ID field (from my schema) and thus the document cannot be added
> to the results queue.
>
> Because the example on the wiki works by loading the documents directly into
> Solr for indexing, I have come to the conclusion that there is some extra
> magic happening in this index generation process which my process lacks.

Do you have a uniqueKey defined in your schema.xml?

--
Regards,
Shalin Shekhar Mangar.
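For reference, a uniqueKey is declared in schema.xml next to the field it names. A minimal sketch (the field name and type here are illustrative, not taken from Nick's schema):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    ...
    <uniqueKey>id</uniqueKey>

Distributed search relies on this field when merging per-shard results, so every document needs a value in it.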
Re: Availability during merge
On Tue, Jul 14, 2009 at 2:30 AM, Charlie Jackson wrote:
> The wiki page for merging solr cores
> (http://wiki.apache.org/solr/MergingSolrIndexes) mentions that the cores
> being merged cannot be indexed to during the merge. What about the core
> being merged *to*? In terms of the example on the wiki page, I'm asking
> if core0 can add docs while core1 and core2 are being merged into it.

A merge operation acquires the index writer lock, so any add operations sent during the merge will wait till the merge completes. So, even though you can send add/delete commands to core0, they'll wait for the merge to finish.

--
Regards,
Shalin Shekhar Mangar.
Re: Can't limit return fields in custom request handler
Thank you very much Chris. Regards.

On Mon, Jul 13, 2009 at 4:30 AM, Chris Hostetter wrote:
>
> : Query filter = new TermQuery(new Term("inStores", "true"));
>
> that will work if "inStores" is a TextField or a StrField and it's got the
> term "true" indexed in it ... but if it's a BoolField like in the
> example schema then the values that appear in the index are "T" and "F"
>
> When you write custom Solr plugins you *HAVE* to leverage the FieldType of
> the fields you deal with when building queries programmatically. This is
> what the "FieldType.toInternal" method is for.
>
> -Hoss

--
Osman İZBAT
Custom functionality in SolrIndexSearcher
Hey there. I needed functionality similar to adjacent-field collapsing, but instead of making the docs disappear I just wanted to put them at the end of the list (the ids array). At the moment I am just experimenting to find the shortest response time; I probably will not be able to use my solution as it's a pretty big core hack. I would just like to hear advice on "cleaner" ways to do this, or what you think of it.

I don't want this algorithm applied to the whole index, as that makes responses slower, and I have no interest in results after page 30, for example. I just want it applied to the first 3000 or 5000 results. Due to performance issues (request speed and index size) I couldn't use the collapsing patch, so what I have done is apply the algorithm directly in getDocListAndSetNC and getDocListNC.

Basically what I do is: if the user asks for fewer than "considerHowMany" docs, I ask for that number (or for all of them if there are fewer) when topCollector.topDocs(...) is called. Then I apply the adjacent-field-collapse algorithm, but instead of making the docs disappear I send them to the end of the queue. I mean, say a query has 1,357,534 hits; I just want to apply the algorithm to the first 5000 results. So if the 2nd result must be collapsed, it goes to position 5000; if the 3rd must be collapsed, it goes to 4999... After the 5000th result the pseudo-collapse algorithm stops being applied.

I have added two parameters to QueryCommand, used to decide whether the algorithm has to be applied and for how many documents. I repeat, it's just testing; I know it's not good to modify these classes. I just want to hear any advice that could help me do something similar without messing with the code that much, or what people think. I leave here my getDocListAndSetNC (I have done the same for getDocListNC):

  private DocSet getDocListAndSetNC(QueryResult qr, QueryCommand cmd) throws IOException {
    int len = cmd.getSupersetMaxDoc();
    DocSet filter = cmd.getFilter() != null ? cmd.getFilter() : getDocSet(cmd.getFilterList());
    int last = len;
    if (last < 0 || last > maxDoc()) last = maxDoc();
    final int lastDocRequested = last;
    int nDocsReturned;
    int totalHits;
    float maxScore;
    int[] ids;
    float[] scores;
    DocSet set;

    // extra vars
    boolean considerMoreDocs = cmd.getConsiderMoreDocs();
    int considerHowMany = cmd.getConsiderHowMany();

    boolean needScores = (cmd.getFlags() & GET_SCORES) != 0;
    int maxDoc = maxDoc();
    int smallSetSize = maxDoc >> 6;
    Query query = QueryUtils.makeQueryable(cmd.getQuery());
    final long timeAllowed = cmd.getTimeAllowed();
    final Filter luceneFilter = filter == null ? null : filter.getTopFilter();

    // handle zero case...
    if (lastDocRequested <= 0) {
      final float[] topscore = new float[] { Float.NEGATIVE_INFINITY };
      Collector collector;
      DocSetCollector setCollector;

      if (!needScores) {
        collector = setCollector = new DocSetCollector(smallSetSize, maxDoc);
      } else {
        collector = setCollector = new DocSetDelegateCollector(smallSetSize, maxDoc, new Collector() {
          Scorer scorer;
          public void setScorer(Scorer scorer) throws IOException {
            this.scorer = scorer;
          }
          public void collect(int doc) throws IOException {
            float score = scorer.score();
            if (score > topscore[0]) topscore[0] = score;
          }
          public void setNextReader(IndexReader reader, int docBase) throws IOException {
          }
        });
      }

      if (timeAllowed > 0) {
        collector = new TimeLimitingCollector(collector, timeAllowed);
      }
      try {
        super.search(query, luceneFilter, collector);
      } catch (TimeLimitingCollector.TimeExceededException x) {
        log.warn("Query: " + query + "; " + x.getMessage());
        qr.setPartialResults(true);
      }

      set = setCollector.getDocSet();
      nDocsReturned = 0;
      ids = new int[nDocsReturned];
      scores = new float[nDocsReturned];
      totalHits = set.size();
      maxScore = totalHits > 0 ? topscore[0] : 0.0f;
    } else {
      TopDocsCollector topCollector;

      // This is how it was:
      /*
      if (cmd.getSort() == null) {
        topCollector = TopScoreDocCollector.create(len, true);
      } else {
        topCollector = TopFieldCollector.create(cmd.getSort(), len, false, needScores, needScores, true);
      }
      */

      if (cmd.getSort() == null) {
        if (len < considerHowMany && considerMoreDocs) {
          // ask for more docs than requested so the pseudo-collapse
          // can shuffle within the first considerHowMany hits
          topCollector = TopScoreDocCollector.create(considerHowMany, true);
        } else {
          topCollector = TopScoreDocCollector.create(len, true);
        }
      } else {
        if (len < considerHowMany && considerMoreDocs) {
          topCollector = TopFieldCol
Using Multiple fields in UniqueKey
Is there any possibility of adding multiple fields to the uniqueKey in schema.xml (an implementation similar to a compound primary key)?
Re: Implementing Solr for the first time
On Tue, Jul 14, 2009 at 1:33 AM, Kevin Miller wrote:
> I am new to Solr and trying to get it set up to index files from a
> directory structure on a server. I have a few questions.
>
> 1.) Is there an application that will return the search results in a
> user friendly format?

isn't the xml response format user friendly?

> 2.) How do I move Solr from the example environment into a production
> environment?

> 3.) Will Solr search through multiple folders when indexing and if so
> can I specify which folders to index from?

Solr does not search any folders. You will have to index the contents of your folder into Solr.

> I have looked through the tutorial, the Docs, and the FAQ and am still
> having problems making sense of it.
>
> Kevin Miller
> Oklahoma Tax Commission
> Web Services

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Faceting
Well, I had a bit of a facepalm moment when thinking about it a little more. I'll just show a "more countries [Y selected]" link, where Y is the number of selected countries which are not in the top X. If you want a nice concise interface you'll just have to enable javascript.

With my earlier adventures in numerical range selection (SOLR-1240) I became wary of just adding facet.query parameters, as Solr seemed to crash when adding a lot of facet.queries of the form facet.query=price:[* TO 10]&facet.query=price:[10 TO 20] etc.

Thanks for your help,

Regards,

Gijs

Shalin Shekhar Mangar wrote:
> On Mon, Jul 13, 2009 at 7:56 PM, gwk wrote:
> > Is there a good way to select the top X facets and include some terms you
> > want to include as well, something like
> > facet.field=country&f.country.facet.limit=X&f.country.facet.includeterms=Narnia,Guilder
> > or is there some other way to achieve this?
>
> You can use facet.query for each of the terms you want to include. You may
> need to remove such terms from appearing in the facet.field=country results
> in the client. e.g.
> facet.field=country&f.country.facet.limit=X&facet.query=country:Narnia&facet.query=country:Guilder
Re: Distributed Search in Solr
Hi Grant,

What I have got from your comments is:

1. We will have to add support for a BoostingTermQuery which extends SpanTermQuery, like the payload support in Lucene. In our current world we anyway have another class which extends SpanTermQuery. Where should I put this class or a newly built BoostingTermQuery, and how can I use it?

2. I have not quite got why we require a TokenFilterFactory. In our application (our own search server) we already have payload-related search, for which we are using something like searcher.setSimilarity(Similarity). Don't we require this in Solr payload search?

Now can you please explain a little more how we can do payload search using Solr? I mean, we will need to set some payload term using BoostingTermQuery; how will we be doing it in Solr, and how will we be passing such a search to Solr?

- Sumit

On Fri, Jul 10, 2009 at 8:54 PM, Grant Ingersoll wrote:
>
> On Jul 9, 2009, at 11:58 PM, Sumit Aggarwal wrote:
>
> > Hi,
> > 1. Calls made to multiple shards are made in some concurrent fashion or
> > serially?
>
> Concurrent
>
> > 2. Any idea of algorithm followed for merging data? I mean how efficient
> > it is?
>
> Not sure, but given that Yonik implemented it, I suspect it is highly
> efficient. ;-)
>
> > 3. Lucene provides payload concept. How can we make search using that in
> > solr. My application store payloads and use search using our custom search
> > server.
>
> Not currently, but this would be a welcome patch. I added a new
> DelimitedPayloadTokenFilter to Lucene that should make it really easy to
> send in payloads "inline" in Solr XML, so what remains to be done, I think
> is:
>
> 1. Create a new TokenFilterFactory for the TokenFilter
> 2. Hook in some syntax support for creating a BoostingTermQuery in the
> Query Parsers.
>
> Patches welcome!
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search

--
Cheers
Sumit
9818621804
Data Import ID Problem
Hi All,

I have a problem when importing data using the data import handler. I import documents from multiple tables, so table.id is not unique. To get round this I concatenate the type like this:

When searching, it seems the CONCATted string is turned into some sort of character array(?):

1
[...@108759d

Everything is OK if I add a document via SolrJ:

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
  doc.addField("dbid", myThing.getId());

Obviously this will cause problems as I remove documents by constructing the ID and using deleteById. Any ideas?

Thanks, rotis
RE: Implementing Solr for the first time
I need to index primarily .doc files, but also need it to look at .pdf and .xls files. I am currently looking at the Tika project for this functionality.

Kevin Miller
Web Services

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, July 14, 2009 1:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Implementing Solr for the first time

On Tue, Jul 14, 2009 at 1:33 AM, Kevin Miller <kevin.mil...@oktax.state.ok.us> wrote:
> I am new to Solr and trying to get it set up to index files from a
> directory structure on a server. I have a few questions.
>
> 1.) Is there an application that will return the search results in a
> user friendly format?

I'm not sure. There is a ruby application called flare but I haven't used it myself. People usually build their own applications and use Solr as a search server.

> 2.) How do I move Solr from the example environment into a production
> environment?

If you mean how do you change the example schema/config, then that depends entirely on the kind of data you want to search. Some good starting points on deciding the schema are:
http://wiki.apache.org/solr/SchemaDesign
http://wiki.apache.org/solr/UniqueKey

> 3.) Will Solr search through multiple folders when indexing and if so
> can I specify which folders to index from?

Solr does not search through folders. Solr is only a server. You can either write a program to push data to Solr or use a plugin like DataImportHandler to do this.
http://wiki.apache.org/solr/DataImportHandler

What are the kind of files you are indexing?

--
Regards,
Shalin Shekhar Mangar.
support for Payload Feature of lucene in solr
Hi,

I am new to Solr and trying to explore payloads in it, but I haven't had any success so far. In one of the threads Grant mentioned that Solr has a DelimitedPayloadTokenFilter which can store payloads at index time. But to search on it we will require an implementation of BoostingTermQuery extending SpanTermQuery, and possibly other things as well.

My questions:
1. What will I have to do for this?
2. How will I do it? I mean, even if I add some classes and rebuild the Solr jars, how will I prepare a document to index with payloads, and how will I build my search query to do a payload search? Do we need to add a new RequestHandler for making such custom searches?

Please provide sample code if you have any...

--
Cheers
Sumit
TooManyOpenFiles: indexing in one core, doing many searches at the same time in another
Hi,

We are having a TooManyOpenFiles exception in our indexing process. We are reading data from a database and indexing this data into one of the two cores of our Solr instance. Each of the cores has a different schema, as they are used for different purposes. While we index in the first core, we do many searches in the second core, as it contains data to "enrich" what we index (the second core is never modified - read only). After indexing about 50,000 documents (about 300 fields each) we get the exception. If we run the same process without the "enrichment" (not doing queries in the second core), everything goes all right.

We are using spring batch, and we only commit+optimize at the very end, as we don't need to search anything in the data that is being indexed.

I have seen recommendations ranging from committing+optimizing more often to lowering the merge factor. How does the merge factor affect this scenario?

Thanks,

Bruno
Re: Data Import ID Problem
Sorry - the SolrJ snippet should read:

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
  doc.addField("dbid", myThing.getId());

- Original Message
From: Chris Masters
To: solr-user@lucene.apache.org
Sent: Tuesday, July 14, 2009 12:16:06 PM
Subject: Data Import ID Problem

Hi All,

I have a problem when importing data using the data import handler. I import documents from multiple tables, so table.id is not unique. To get round this I concatenate the type like this:

When searching, it seems the CONCATted string is turned into some sort of character array(?):

1
[...@108759d

Everything is OK if I add a document via SolrJ (see the corrected snippet above).

Obviously this will cause problems as I remove documents by constructing the ID and using deleteById. Any ideas?

Thanks, rotis
Re: Spell checking: Is there a way to exclude words known to be wrong?
Use the stopwords feature with a custom mispeled_words.txt and a StopFilterFactory on the spell check field ;)

Erik

On Jul 13, 2009, at 8:27 PM, Jay Hill wrote:

We're building a spell index from a field in our main index with the following configuration:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

This works great and re-builds the spelling index on commits as expected. However, we know there are misspellings in the "spell" field of our main index. We could remove these from the spelling index using Luke, however they will be added again on commits. What we need is something similar to how the protwords.txt file is used, so that when we notice misspelled words such as "beginnning" being pulled from our main index, we could add them to an exclusion file so they are not added to the spelling index again. Any tricks to make this possible?

-Jay
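A minimal sketch of what Erik is suggesting: a field type for the spell field whose analyzer drops the known-bad words before they reach the spelling index. The type name is from Jay's config; the file name and filter ordering are illustrative assumptions:

  <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- known misspellings listed in mispeled_words.txt never get indexed
           into the spell field, so rebuilds on commit stay clean -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="mispeled_words.txt"/>
    </analyzer>
  </fieldType>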
Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another
Setting:

  <mergeFactor>2</mergeFactor>

may help. Have you tried it? Indexing will be a bit slower, but optimizing will be faster. You can check with lsof to see how many files jetty/tomcat (or whichever server you are using) is holding open.

Bruno Aranda wrote:
>
> Hi,
>
> We are having a TooManyOpenFiles exception in our indexing process. We
> are reading data from a database and indexing this data into one of
> the two cores of our solr instance. Each of the cores has a different
> schema as they are used for a different purpose. While we index in the
> first core, we do many searches in the second core as it contains data
> to "enrich" what we index (the second core is never modified - read
> only). After indexing about 50,000 documents (about 300 fields each)
> we get the exception. If we run the same process, but without the
> "enrichment" (not doing queries in the second core), everything goes
> all right.
> We are using spring batch, and we only commit+optimize at the very
> end, as we don't need to search anything in the data that is being
> indexed.
>
> I have seen recommendations that go from committing+optimize more
> often or lowering the merge factor? How is the merge factor affecting
> in this scenario?
>
> Thanks,
>
> Bruno
Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another
What merge factor are you using now? The merge factor will influence the number of files that are created as the index grows. Lower = fewer file descriptors needed, but also slower bulk indexing.

You could up the Max Open Files setting on your OS.

You could also use:

  <useCompoundFile>true</useCompoundFile>

which writes multiple segments to one file and requires *way* fewer file handles (slightly slower indexing).

It would normally be odd to hit something like that after only 50,000 documents, but a doc with 300 fields is certainly not the norm ;) Anything else special about your setup?

--
- Mark

http://www.lucidimagination.com

On Tue, Jul 14, 2009 at 12:49 PM, Bruno Aranda wrote:
> Hi,
>
> We are having a TooManyOpenFiles exception in our indexing process. We
> are reading data from a database and indexing this data into one of
> the two cores of our solr instance. Each of the cores has a different
> schema as they are used for a different purpose. While we index in the
> first core, we do many searches in the second core as it contains data
> to "enrich" what we index (the second core is never modified - read
> only). After indexing about 50,000 documents (about 300 fields each)
> we get the exception. If we run the same process, but without the
> "enrichment" (not doing queries in the second core), everything goes
> all right.
> We are using spring batch, and we only commit+optimize at the very
> end, as we don't need to search anything in the data that is being
> indexed.
>
> I have seen recommendations that go from committing+optimize more
> often or lowering the merge factor? How is the merge factor affecting
> in this scenario?
>
> Thanks,
>
> Bruno
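For reference, both settings live in the <indexDefaults> (or <mainIndex>) section of solrconfig.xml. A minimal sketch with illustrative values:

  <indexDefaults>
    <!-- lower mergeFactor means fewer segments (and open files) at the
         cost of slower bulk indexing -->
    <mergeFactor>2</mergeFactor>
    <!-- compound file format packs each segment into a single file -->
    <useCompoundFile>true</useCompoundFile>
  </indexDefaults>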
Re: Implementing Solr for the first time
On Jul 14, 2009, at 8:00 AM, Kevin Miller wrote:
> I am needing to index primarily .doc files but also need it to look at
> .pdf and .xls files. I am currently looking at the Tika project for this
> functionality.

This is now built into trunk (aka Solr 1.4): http://wiki.apache.org/solr/ExtractingRequestHandler

Erik
Anyone working on adapting AnalyzingQueryParser to solr?
The lucene class AnalyzingQueryParser does exactly what I need it to do, but I need to do it in Solr. I took a look at trying to subclass QParser, and it's clear I'm not smart enough. :-) Is anyone else looking at this? -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
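For anyone else poking at this, the subclassing is mostly glue once you see the shape of it. A rough, untested sketch of wrapping Lucene's AnalyzingQueryParser in a QParserPlugin; the default field name is an assumption, and the AnalyzingQueryParser constructor signature varies across Lucene versions:

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class AnalyzingQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        public Query parse() throws ParseException {
          // run wildcard/prefix terms through the schema's query analyzer
          AnalyzingQueryParser parser = new AnalyzingQueryParser(
              "text",                                   // assumed default field
              getReq().getSchema().getQueryAnalyzer());
          return parser.parse(getString());
        }
      };
    }
  }

It would then be registered in solrconfig.xml with <queryParser name="analyzing" class="AnalyzingQParserPlugin"/> and selected per request with defType=analyzing.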
wt=json Not setting application/json response headers but text/plain. How to fix?
Hi folks,

I see that when calling wt=json I get a JSON response, but the headers are text/plain, which totally bugs me. I would rather expect application/json response headers. Any pointers on how I can fix this are more than welcome.
Re: Spell checking: Is there a way to exclude words known to be wrong?
On Tue, Jul 14, 2009 at 6:37 PM, Erik Hatcher wrote: > Use the stopwords feature with a custom mispeled_words.txt and a > StopFilterFactory on the spell check field ;) > > Very cool! :) -- Regards, Shalin Shekhar Mangar.
Re: Implementing Solr for the first time
On Jul 14, 2009, at 5:35 AM, Noble Paul നോബിള് नोब्ळ् wrote:

> On Tue, Jul 14, 2009 at 1:33 AM, Kevin Miller wrote:
> > I am new to Solr and trying to get it set up to index files from a
> > directory structure on a server. I have a few questions.
> >
> > 1.) Is there an application that will return the search results in a
> > user friendly format?
>
> isn't the xml response format user friendly ?

LOL!

> > 3.) Will Solr search through multiple folders when indexing and if so
> > can I specify which folders to index from?
>
> Solr does not search any folders. You will have to index the contents
> of your folder into Solr.

Fairly straightforward to have some script that loops over a directory and sends files (or file paths/URLs) to the extracting request handler.

Erik
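A rough sketch of such a loop in SolrJ, sending each file to the extracting request handler. Untested; the Solr URL, the directory, and the choice of the file path as the uniqueKey value are all assumptions for illustration:

  import java.io.File;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  public class DirectoryIndexer {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      for (File f : new File("/data/docs").listFiles()) {        // assumed folder
        if (!f.isFile()) continue;
        // /update/extract is the ExtractingRequestHandler's default path
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(f);
        up.setParam("literal.id", f.getAbsolutePath());          // path as uniqueKey
        server.request(up);
      }
      server.commit();
    }
  }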
Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another
Hi, my process is: I index 60 docs in the secondary core (each doc has 5 fields). No problem with that. After this core is indexed (and optimized) it will be used only for searches during the main core's indexing. Currently I am using a mergeFactor of 10 for the main core. I will try with 2 to see if it changes anything, and the useCompoundFile set to true. I guess I don't need to modify anything in the secondary core, as it is only used for searches. Thanks for your answers, Bruno

2009/7/14 Mark Miller:
> What merge factor are you using now? The merge factor will influence the
> number of files that are created as the index grows. Lower = fewer file
> descriptors needed, but also slower bulk indexing.
> You could up the Max Open Files setting on your OS.
>
> You could also use:
>
> <useCompoundFile>true</useCompoundFile>
>
> which writes multiple segments to one file and requires *way* fewer file
> handles (slightly slower indexing).
>
> It would normally be odd to hit something like that after only 50,000
> documents, but a doc with 300 fields is certainly not the norm ;) Anything
> else special about your setup?
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> On Tue, Jul 14, 2009 at 12:49 PM, Bruno Aranda wrote:
>
>> Hi,
>>
>> We are having a TooManyOpenFiles exception in our indexing process. We
>> are reading data from a database and indexing this data into one of
>> the two cores of our solr instance. Each of the cores has a different
>> schema as they are used for a different purpose. While we index in the
>> first core, we do many searches in the second core as it contains data
>> to "enrich" what we index (the second core is never modified - read
>> only). After indexing about 50,000 documents (about 300 fields each)
>> we get the exception. If we run the same process, but without the
>> "enrichment" (not doing queries in the second core), everything goes
>> all right.
>> We are using spring batch, and we only commit+optimize at the very
>> end, as we don't need to search anything in the data that is being
>> indexed.
>>
>> I have seen recommendations that go from committing+optimize more
>> often or lowering the merge factor? How is the merge factor affecting
>> in this scenario?
>>
>> Thanks,
>>
>> Bruno
Guide to using SolrQuery object
Hi,

It seems that SolrQuery is a better API than the basic ModifiableSolrParams, but I can't make it work.

Constructing params with:

  final ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("q", queryString);

...results in a successful search. Constructing SolrQuery with:

  final SolrQuery solrQuery = new SolrQuery();
  solrQuery.setQuery(queryString);

...doesn't (with the same unit test driving the search). I'm sure I'm missing some basic option, but the javadoc is a little terse, and I don't see what I'm missing. Ideas?

Also, are there enums or constants around the various param names that can be passed in, or do people tend to define those themselves?

Thanks!
Reuben
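For what it's worth, the two forms are meant to be equivalent; a minimal sketch assuming a working CommonsHttpSolrServer (if one form succeeds and the other doesn't, comparing the request URLs Solr logs for each is usually the quickest diagnostic). On the constants question: Solr ships param names in org.apache.solr.common.params.CommonParams:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.ModifiableSolrParams;

  public class QueryForms {
    static QueryResponse bothWays(SolrServer server, String queryString) throws Exception {
      // low-level params form
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set(CommonParams.Q, queryString);   // CommonParams.Q == "q"
      QueryResponse a = server.query(params);

      // SolrQuery convenience form; should produce the same request
      SolrQuery solrQuery = new SolrQuery();
      solrQuery.setQuery(queryString);
      QueryResponse b = server.query(solrQuery);
      return b;
    }
  }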
Re: support for Payload Feature of lucene in solr
> I am new to Solr and trying to explore payloads in it, but I haven't had
> any success so far. In one of the threads Grant mentioned that Solr has a
> DelimitedPayloadTokenFilter which can store payloads at index time. But to
> search on it we will require an implementation of BoostingTermQuery
> extending SpanTermQuery, and possibly other things as well.

This looks about the same as the approach I'm about to use for our research. We're looking into using payloads to improve relevance for stemmed terms, using the payload to store the unstemmed term and boosting the term if there's an exact match with the payloads.

> My questions:
> 1. What will I have to do for this?
> 2. How will I do it? I mean, even if I add some classes and rebuild the
> Solr jars, how will I prepare a document to index with payloads, and how
> will I build my search query to do a payload search? Do we need to add a
> new RequestHandler for making such custom searches?
>
> Please provide sample code if you have any...
>
> --
> Cheers
> Sumit

I'm starting work on this in the next few days; I'll let you know how I get on. If anyone else has any experience with payloads in solr please chip in :)

--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.
Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/
Re: support for Payload Feature of lucene in solr
right now Solr does not support indexing/retrieving payloads. Probably this can be taken up as an issue On Tue, Jul 14, 2009 at 5:41 PM, Sumit Aggarwal wrote: > Hi, > As i am new to solr and trying to explore payloads in solr but i haven't got > any success on that. In one of the thread Grant mentioned solr have > DelimitedPayloadTokenFilter which > can store payloads at index time. But to make search on it we will > require implementation of BoostingTermQuery extending SpanTermQuery . And > if any other thing also we require. > > My Question: > 1. What all i will have to do for this. > 2. How i will do this. I mean even if by adding some classes and rebuilding > solr jars and then how i will prepare Document to index to store payloads > and how i will build my search query to do payload search. Do we need to add > a new Requesthandler for making such custom searches? Please provide a > sample code if have any... > > -- > Cheers > Sumit > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Data Import ID Problem
DIH is getting the field as a byte[]? Which db and which driver are you using?

On Tue, Jul 14, 2009 at 4:46 PM, Chris Masters wrote:
>
> Hi All,
>
> I have a problem when importing data using the data import handler. I import
> documents from multiple tables, so table.id is not unique. To get round
> this I concatenate the type like this:
>
> When searching, it seems the CONCATted string is turned into some sort of
> character array(?):
>
> 1
> [...@108759d
>
> Everything is OK if I add a document via SolrJ:
>
>   SolrInputDocument doc = new SolrInputDocument();
>   doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
>   doc.addField("dbid", myThing.getId());
>
> Obviously this will cause problems as I remove documents by constructing
> the ID and using deleteById. Any ideas?
>
> Thanks, rotis

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: support for Payload Feature of lucene in solr
That doesn't require payloads. I was doing that with Solr 1.1. Define two fields, stemmed and exact, with different analyzer chains. Use copyfield to load the same info into both. With the dismax handler, search both fields with a higher boost on the exact field. wunder On 7/14/09 7:39 AM, "Toby Cole" wrote: > We're looking into using payloads to improve relevance for stemmed > terms, using the payload to store the unstemmed term, boosting the > term if there's an exact match with the payloads.
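A sketch of that two-field setup; the field names are illustrative, and textTight stands in for whatever tighter, unstemmed analysis chain fits your data. In schema.xml:

  <field name="title_stemmed" type="text"      indexed="true" stored="false"/>
  <field name="title_exact"   type="textTight" indexed="true" stored="false"/>
  <copyField source="title" dest="title_stemmed"/>
  <copyField source="title" dest="title_exact"/>

and in the dismax handler's defaults in solrconfig.xml:

  <str name="qf">title_exact^4 title_stemmed</str>

An unstemmed match then scores in both fields and floats above stem-only matches, with no payloads involved.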
Re: support for Payload Feature of lucene in solr
Hi Walter,
I do have a search server where I have implemented things using the payload feature itself. These days I am evaluating Solr to get rid of my own search server, and for that I need the payloads feature in Solr itself. I raised a related question and got a message from Grant:

"I added a new DelimitedPayloadTokenFilter to Lucene that should make it really easy to send in payloads "inline" in Solr XML, so what remains to be done, I think is:
1. Create a new TokenFilterFactory for the TokenFilter
2. Hook in some syntax support for creating a BoostingTermQuery in the Query Parsers."

Now can anyone provide any custom code to do what Grant mentioned?

Thanks,
Sumit

On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood wrote:
> That doesn't require payloads. I was doing that with Solr 1.1. Define two
> fields, stemmed and exact, with different analyzer chains. Use copyfield to
> load the same info into both. With the dismax handler, search both fields
> with a higher boost on the exact field.
>
> wunder
>
> On 7/14/09 7:39 AM, "Toby Cole" wrote:
>
> > We're looking into using payloads to improve relevance for stemmed
> > terms, using the payload to store the unstemmed term, boosting the
> > term if there's an exact match with the payloads.
Re: Data Import ID Problem
MySQL -> com.mysql.jdbc.Driver (mysql-connector-java-5.1.7.jar).

mysql concat -> http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_concat

Fix is to use CAST like:

  SELECT CONCAT(CAST(THING.ID AS CHAR), TYPE) AS INDEX_ID...

Thanks for the nudge 'Noble Paul'!

- Original Message
From: Noble Paul നോബിള് नोब्ळ्
To: solr-user@lucene.apache.org
Sent: Tuesday, July 14, 2009 3:53:44 PM
Subject: Re: Data Import ID Problem

DIH is getting the field as a byte[]? Which db and which driver are you using?

On Tue, Jul 14, 2009 at 4:46 PM, Chris Masters wrote:
>
> Hi All,
>
> I have a problem when importing data using the data import handler. I import
> documents from multiple tables, so table.id is not unique. To get round
> this I concatenate the type like this:
>
> When searching, it seems the CONCATted string is turned into some sort of
> character array(?):
>
> 1
> [...@108759d
>
> Everything is OK if I add a document via SolrJ:
>
>   SolrInputDocument doc = new SolrInputDocument();
>   doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
>   doc.addField("dbid", myThing.getId());
>
> Obviously this will cause problems as I remove documents by constructing
> the ID and using deleteById. Any ideas?
>
> Thanks, rotis

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
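For reference, the CASTed query goes in the entity definition inside data-config.xml. A sketch with assumed table and column names, standing in for the config the archive stripped from the original mail:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
    <document>
      <entity name="thing"
              query="SELECT CONCAT(CAST(THING.ID AS CHAR), TYPE) AS INDEX_ID,
                            THING.ID AS DBID FROM THING">
        <field column="INDEX_ID" name="id"/>
        <field column="DBID" name="dbid"/>
      </entity>
    </document>
  </dataConfig>

Without the CAST, MySQL's CONCAT over a numeric column yields a binary string, which the JDBC driver hands to DIH as byte[]; that byte array's toString() is the [...@108759d seen in the stored field.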
Re: support for Payload Feature of lucene in solr
Hey Noble,
Any comments on Grant's suggestion?

Thanks,
-Sumit

On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal wrote:
> Hi Walter,
> I do have a search server where I have implemented things using the payload
> feature itself. These days I am evaluating Solr to get rid of my own search
> server, and for that I need the payloads feature in Solr itself. I raised a
> related question and got a message from Grant:
>
> "I added a new DelimitedPayloadTokenFilter to Lucene that should make it
> really easy to send in payloads "inline" in Solr XML, so what remains to
> be done, I think is:
> 1. Create a new TokenFilterFactory for the TokenFilter
> 2. Hook in some syntax support for creating a BoostingTermQuery in the
> Query Parsers."
>
> Now can anyone provide any custom code to do what Grant mentioned?
>
> Thanks,
> Sumit
>
> On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood wrote:
> > That doesn't require payloads. I was doing that with Solr 1.1. Define two
> > fields, stemmed and exact, with different analyzer chains. Use copyfield
> > to load the same info into both. With the dismax handler, search both
> > fields with a higher boost on the exact field.
> >
> > wunder
> >
> > On 7/14/09 7:39 AM, "Toby Cole" wrote:
> >
> > > We're looking into using payloads to improve relevance for stemmed
> > > terms, using the payload to store the unstemmed term, boosting the
> > > term if there's an exact match with the payloads.
Re: Sharded Index Creation Magic?
I do, but you raise an interesting point. I had named the field incorrectly. I'm a little puzzled as to why individual search worked with the broken field name, but now all is well!

On Tue, Jul 14, 2009 at 12:03 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Tue, Jul 14, 2009 at 2:00 AM, Nick Dimiduk wrote:
>
> > However, when I search across all deployed shards using the &shards= query parameter
> > (http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body\%3A%3Aterm),
> > I get a NullPointerException:
> >
> > java.lang.NullPointerException
> >   at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
> >   at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
> >   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >
> > Debugging into the QueryComponent.mergeIds() method reveals that the instance
> > sreq.responses (line 356) contains one response for each shard specified,
> > each with the number of results received by the independent queries. The
> > problems begin down at line 370 because the SolrDocument instance has only a
> > score field -- which proves problematic in the following line where the id
> > is requested. The SolrDocument, only containing a score, lacks the
> > designated ID field (from my schema) and thus the document cannot be added
> > to the results queue.
> >
> > Because the example on the wiki works by loading the documents directly into
> > Solr for indexing, I have come to the conclusion that there is some extra
> > magic happening in this index generation process which my process lacks.
>
> Do you have a uniqueKey defined in your schema.xml?
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Sharded Index Creation Magic?
On Tue, Jul 14, 2009 at 10:30 PM, Nick Dimiduk wrote:

> I do, but you raise an interesting point. I had named the field
> incorrectly. I'm a little puzzled as to why individual search worked with
> the broken field name, but now all is well!

An individual Solr uses uniqueKey only for replacing documents during indexing. During a search the uniqueKey is used only for associating certain pieces of information with documents, e.g. highlighting info is written in the response per uniqueKey. Solr will complain only if you don't specify a uniqueKey during indexing.

If you forgot to include uniqueKeys in some documents, changed the schema to add a uniqueKey, and then didn't reindex the whole bunch, there will be some documents in the index without a value in the unique key field. In such a case, if you use distributed search, it will blow up because it expects all documents to have a value for the uniqueKey field. These values are used to merge responses from the shards.

--
Regards,
Shalin Shekhar Mangar.
Re: wt=json Not setting application/json response headers but text/plain. How to fix?
Take a look at https://issues.apache.org/jira/browse/SOLR-1123 Don't stop yourself from voting for the issue :) Cheers Avlesh On Tue, Jul 14, 2009 at 7:01 PM, Julian Davchev wrote: > Hi folks > I see that when calling wt=json I get json response but headers are > text/plain which totally bugs me. > I rather expect application/json response headers. > > Any pointers are more than welcome how I can fix this. >
Re: Availability during merge
Kind of regrettable, I think we can look at changing this in Lucene. On Tue, Jul 14, 2009 at 12:08 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Tue, Jul 14, 2009 at 2:30 AM, Charlie Jackson < > charlie.jack...@cision.com > > wrote: > > > The wiki page for merging solr cores > > (http://wiki.apache.org/solr/MergingSolrIndexes) mentions that the cores > > being merged cannot be indexed to during the merge. What about the core > > being merged *to*? In terms of the example on the wiki page, I'm asking > > if core0 can add docs while core1 and core2 are being merged into it. > > > > > A merge operation acquires the index writer lock, so any add operations > sent > during the merge, will wait till the merge completes. So, even though you > can send add/delete commands to core0, they'll wait for the merge to > finish. > > -- > Regards, > Shalin Shekhar Mangar. >
Wikipedia or reuters like index for testing facets?
Is there a standard index like what Lucene uses for contrib/benchmark for executing faceted queries over? Or maybe we can randomly generate one that works in conjunction with wikipedia? That way we can execute real world queries against faceted data. Or we could use the Lucene/Solr mailing lists and other data (ala Lucid's faceted site) as a standard index?
Re: support for Payload Feature of lucene in solr
It may be nice to tell us why you need payloads? There may be other ways of solving your problem than adding payload support to Solr? Anyway, I don't see payload support before 1.5 On Tue, Jul 14, 2009 at 10:07 PM, Sumit Aggarwal wrote: > Hey Nobel, > Any comments on Grant suggestion. > > Thanks, > -Sumit > > On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal > wrote: > > > Hi Walter, > > I do have a search server where i have implemented things using payload > > feature itself. These days i am evaluating solr to get rid of my own > search > > server. For that i need payloads feature in solr itself. I raised a > related > > question and got a message from *Grant* as > > * "**I added a new DelimitedPayloadTokenFilter to Lucene that should make > > it really easy to send in payloads "inline" in Solr XML, so what remains > to > > be done, I think is:* > > *1. Create a new TokenFilterFactory for the TokenFilter > > **2. Hook in some syntax support for creating a BoostingTermQuery in the > > Query Parsers.**"* > > > > Now can any one provide any custom code to do what grant mentioned. > > > > Thanks, > > Sumit > > > > On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood < > wunderw...@netflix.com>wrote: > > > >> That doesn't require payloads. I was doing that with Solr 1.1. Define > two > >> fields, stemmed and exact, with different analyzer chains. Use copyfield > >> to > >> load the same info into both. With the dismax handler, search both > fields > >> with a higher boost on the exact field. > >> > >> wunder > >> > >> On 7/14/09 7:39 AM, "Toby Cole" wrote: > >> > >> > We're looking into using payloads to improve relevance for stemmed > >> > terms, using the payload to store the unstemmed term, boosting the > >> > term if there's an exact match with the payloads. > >> > >> > > > -- Regards, Shalin Shekhar Mangar.
Re: Wikipedia or reuters like index for testing facets?
On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Is there a standard index like what Lucene uses for contrib/benchmark for > executing faceted queries over? Or maybe we can randomly generate one that > works in conjunction with wikipedia? That way we can execute real world > queries against faceted data. Or we could use the Lucene/Solr mailing lists > and other data (ala Lucid's faceted site) as a standard index? > I don't think there is any standard set of docs for solr testing - there is not a real benchmark contrib - though I know more than a few of us have hacked up pieces of Lucene benchmark to work with Solr - I think I've done it twice now ;) Would be nice to get things going. I was thinking the other day: I wonder how hard it would be to make Lucene Benchmark generic enough to accept Solr impls and Solr algs? It does a lot that would suck to duplicate. -- -- - Mark http://www.lucidimagination.com
Multicore Solr (trunk) creates extra dirs
Hello,

I just built solr.war from trunk and deployed it to a multicore solr server whose solr.xml looks like this:

Each core has conf and data/index dirs under its instanceDir, e.g.

  $ tree /mnt/solrhome/cores/core0
  cores/core0
  |-- conf
  |   |-- schema.xml -> ../../../conf/schema-foo.xml
  |   `-- solrconfig.xml -> ../../../conf/solrconfig-foo.xml
  `-- data
      `-- index

I noticed that when I start the container with this brand new Solr, all of a sudden the '/mnt' directory shows up in /mnt/solrhome! (This /mnt/solrhome is also the directory from which I happened to start the container, though I'm not sure if that matters.)

This is what this /mnt/solrhome/mnt looks like:

  $ tree /mnt/solrhome/mnt
  mnt
  `-- solrhome
      `-- cores
          |-- core0
          |   `-- data
          |       `-- index
          |           |-- segments.gen
          |           `-- segments_1
          |-- core1
          |   `-- data
          |       `-- index
          |           |-- segments.gen
          |           `-- segments_1

So it looks like Solr decides to create the index dir and the full path to it there. It looks almost like Solr is looking at my instanceDirs in solr.xml and decides that it needs to create those directories, but under the Solr home dir (I use -Dsolr.solr.home=/mnt/solrhome).

I switched back to the old solr.war and this stopped happening. Is this a bug or a new feature that I missed?

Thank you,
Otis
Re: support for Payload Feature of lucene in solr
The TokenFilterFactory side is trivial for the DelimitedPayloadTokenFilter. That could be in for 1.4. In fact, there is an automated way to generate the stubs that should be run in preparing for a release. I'll see if I can find a minute or two to make that happen.

For query support, I've never hooked into the query parser, so I have no clue. Yonik seems to crank out new query capabilities pretty fast, so maybe it isn't too bad, even if it isn't done as fast as Yonik. Bigger picture, it would be great to have spans support too.

On Jul 14, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:

> It may be nice to tell us why you need payloads? There may be other ways of
> solving your problem than adding payload support to Solr? Anyway, I don't
> see payload support before 1.5
>
> On Tue, Jul 14, 2009 at 10:07 PM, Sumit Aggarwal wrote:
>
> > Hey Noble,
> > Any comments on Grant's suggestion?
> >
> > Thanks,
> > -Sumit
> >
> > On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal wrote:
> >
> > > Hi Walter,
> > > I do have a search server where I have implemented things using the
> > > payload feature itself. These days I am evaluating Solr to get rid of
> > > my own search server, and for that I need the payloads feature in Solr
> > > itself. I raised a related question and got a message from Grant:
> > >
> > > "I added a new DelimitedPayloadTokenFilter to Lucene that should make
> > > it really easy to send in payloads "inline" in Solr XML, so what remains
> > > to be done, I think is:
> > > 1. Create a new TokenFilterFactory for the TokenFilter
> > > 2. Hook in some syntax support for creating a BoostingTermQuery in the
> > > Query Parsers."
> > >
> > > Now can anyone provide any custom code to do what Grant mentioned?
> > >
> > > Thanks,
> > > Sumit
> > >
> > > On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood wrote:
> > > > That doesn't require payloads. I was doing that with Solr 1.1. Define
> > > > two fields, stemmed and exact, with different analyzer chains. Use
> > > > copyfield to load the same info into both. With the dismax handler,
> > > > search both fields with a higher boost on the exact field.
> > > >
> > > > wunder
> > > >
> > > > On 7/14/09 7:39 AM, "Toby Cole" wrote:
> > > > > We're looking into using payloads to improve relevance for stemmed
> > > > > terms, using the payload to store the unstemmed term, boosting the
> > > > > term if there's an exact match with the payloads.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
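On point 1, a rough sketch of what such a factory might look like. Untested; it assumes Lucene's DelimitedPayloadTokenFilter and FloatEncoder, and hard-wires float payloads:

  import java.util.Map;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
  import org.apache.lucene.analysis.payloads.FloatEncoder;
  import org.apache.lucene.analysis.payloads.PayloadEncoder;
  import org.apache.solr.analysis.BaseTokenFilterFactory;

  public class DelimitedPayloadTokenFilterFactory extends BaseTokenFilterFactory {
    private char delimiter = DelimitedPayloadTokenFilter.DEFAULT_DELIMITER;
    private PayloadEncoder encoder = new FloatEncoder(); // assumes float payloads

    public void init(Map<String, String> args) {
      super.init(args);
      // optional fieldType attribute, e.g. delimiter="|"
      String d = args.get("delimiter");
      if (d != null && d.length() == 1) delimiter = d.charAt(0);
    }

    public TokenStream create(TokenStream input) {
      return new DelimitedPayloadTokenFilter(input, delimiter, encoder);
    }
  }

Registered in a fieldType's analyzer chain, a token like solr|2.5 would then index the term solr carrying a payload of 2.5.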
Re: Wikipedia or reuters like index for testing facets?
At a min, it is trivial to use the EnWikiDocMaker and then send the doc over SolrJ... On Jul 14, 2009, at 4:07 PM, Mark Miller wrote: On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: Is there a standard index like what Lucene uses for contrib/ benchmark for executing faceted queries over? Or maybe we can randomly generate one that works in conjunction with wikipedia? That way we can execute real world queries against faceted data. Or we could use the Lucene/Solr mailing lists and other data (ala Lucid's faceted site) as a standard index? I don't think there is any standard set of docs for solr testing - there is not a real benchmark contrib - though I know more than a few of us have hacked up pieces of Lucene benchmark to work with Solr - I think I've done it twice now ;) Would be nice to get things going. I was thinking the other day: I wonder how hard it would be to make Lucene Benchmark generic enough to accept Solr impls and Solr algs? It does a lot that would suck to duplicate. -- -- - Mark http://www.lucidimagination.com -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Wikipedia or reuters like index for testing facets?
You think enwiki has enough data for faceting? On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote: > At a min, it is trivial to use the EnWikiDocMaker and then send the doc over > SolrJ... > > On Jul 14, 2009, at 4:07 PM, Mark Miller wrote: > >> On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen < >> jason.rutherg...@gmail.com> wrote: >> >>> Is there a standard index like what Lucene uses for contrib/benchmark for >>> executing faceted queries over? Or maybe we can randomly generate one >>> that >>> works in conjunction with wikipedia? That way we can execute real world >>> queries against faceted data. Or we could use the Lucene/Solr mailing >>> lists >>> and other data (ala Lucid's faceted site) as a standard index? >>> >> >> I don't think there is any standard set of docs for solr testing - there >> is >> not a real benchmark contrib - though I know more than a few of us have >> hacked up pieces of Lucene benchmark to work with Solr - I think I've done >> it twice now ;) >> >> Would be nice to get things going. I was thinking the other day: I wonder >> how hard it would be to make Lucene Benchmark generic enough to accept >> Solr >> impls and Solr algs? >> >> It does a lot that would suck to duplicate. >> >> -- >> -- >> - Mark >> >> http://www.lucidimagination.com > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search > >
JMX monitoring for multiple SOLR instances
Hi,

If I want to run multiple SOLR war files in Tomcat, is it possible to monitor each of the SOLR instances individually through JMX? Has anyone attempted this before?

Also, what are the implications (e.g. performance) of running multiple SOLR instances in the same Tomcat server?

Thanks.
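For what it's worth, per-core MBeans can be enabled by adding the <jmx/> element to each core's solrconfig.xml; since each core registers its beans under its own name, multiple instances should remain distinguishable in JConsole. This is a pointer, not a tested multi-webapp recipe:

  <!-- in solrconfig.xml: expose this core's statistics via JMX -->
  <jmx/>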
Re: Multicore Solr (trunk) creates extra dirs
Hi, Paul and Shalin will know about this. What I'm seeing looks a lot like what Walter reported in March: * http://markmail.org/thread/dfsj7hqi5buzhd6n And this commit from Paul seems possibly related: * http://markmail.org/message/cjvjffrfszlku3ri ...because of things like: -cores = new CoreContainer(new SolrResourceLoader(instanceDir)); +cores = new CoreContainer(new SolrResourceLoader(solrHome)); ... if (!idir.isAbsolute()) { - idir = new File(loader.getInstanceDir(), dcore.getInstanceDir()); + idir = new File(solrHome, dcore.getInstanceDir()); ... I don't have dataDir in my solr.xml, only absolute paths to my cores. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Otis Gospodnetic > To: solr-user@lucene.apache.org > Sent: Tuesday, July 14, 2009 5:49:19 PM > Subject: Multicore Solr (trunk) creates extra dirs > > > Hello, > > I just built solr.war from trunk and deployed it to a multicore solr server > whose solr.xml looks like this: > > > > > > > > > Each core has conf and data/index dirs under its instanceDir. > e.g. > > $ tree /mnt/solrhome/cores/core0 > cores/core0 > |-- conf > | |-- schema.xml -> ../../../conf/schema-foo.xml > | `-- solrconfig.xml -> ../../../conf/solrconfig-foo.xml > `-- data > `-- index > > I noticed that when I start the container with this brand new Solr all of a > sudden the '/mnt' directory shows up in /mnt/solrhome ! > (this /mnt/solrhome is also the directory from which I happened to start the > container, though I'm not sure if that matters). > > This is what this /mnt/solrhome/mnt looks like: > > $ tree /mnt/solrhome/mnt > mnt > `-- solrhome > `-- cores > |-- core0 > | `-- data > | `-- index > | |-- segments.gen > | `-- segments_1 > |-- core1 > | `-- data > | `-- index > | |-- segments.gen > | `-- segments_1 > > > So it looks like Solr decides to create the index dir and the full path ot it > there. It looks almost like Solr is looking at my instanceDirs in solr.xml > and > decides that it needs to create those directories, but under Solr home dir (I > use -Dsolr.solr.home=/mnt/solrhome). > > I switched back to the old solr.war and this stopped happening. > Is this a bug or a new feature that I missed? > > Thank you, > Otis
Re: Using Multiple fields in UniqueKey
Some ideas:
- Use copyField to copy fields to the field designated as the uniqueKey (not sure if this will work)
- Create the field from existing data before sending docs to Solr
- Create a custom UpdateRequestProcessor that adds a field for each document it processes and stuffs it with other fields' values (see the sketch after this message)
- Try http://wiki.apache.org/solr/Deduplication

I'd be curious to know which of these you will choose.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Anand Kumar Prabhakar
> To: solr-user@lucene.apache.org
> Sent: Tuesday, July 14, 2009 5:13:47 AM
> Subject: Using Multiple fields in UniqueKey
>
> Is there any possibility of adding multiple fields to the uniqueKey in
> schema.xml (an implementation similar to a compound primary key)?
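A rough sketch of the third idea: an UpdateRequestProcessor that builds the uniqueKey from two other fields. Untested, written against the Solr 1.4-era API; the field names are illustrative:

  import java.io.IOException;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class CompoundKeyProcessorFactory extends UpdateRequestProcessorFactory {
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
        SolrQueryResponse rsp, UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          // build the compound key from two assumed source fields
          doc.addField("id", doc.getFieldValue("type") + "-" + doc.getFieldValue("dbid"));
          super.processAdd(cmd);
        }
      };
    }
  }

It would be wired into an <updateRequestProcessorChain> in solrconfig.xml ahead of solr.RunUpdateProcessorFactory.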
Re: Solr 1.4 Release Date
I just looked at Solr JIRA today and saw some 40 open issues marked for 1.4, so...

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: pof
> To: solr-user@lucene.apache.org
> Sent: Tuesday, July 14, 2009 12:37:33 AM
> Subject: Re: Solr 1.4 Release Date
>
> Any updates on this?
>
> Cheers.
>
> Gurjot Singh wrote:
> >
> > Hi, I am curious to know when is the scheduled/tentative release date of
> > Solr 1.4.
> >
> > Thanks,
> > Gurjot
Re: Wikipedia or reuters like index for testing facets?
Probably not as generated by the EnwikiDocMaker, but the WikipediaTokenizer in Lucene can pull out richer syntax which could then be Teed/Sinked to other fields. Things like categories, related links, etc. Mostly, though, I was just commenting on the fact that it isn't hard to at least use it for getting docs into Solr. -Grant On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote: You think enwiki has enough data for faceting? On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote: At a min, it is trivial to use the EnWikiDocMaker and then send the doc over SolrJ... On Jul 14, 2009, at 4:07 PM, Mark Miller wrote: On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: Is there a standard index like what Lucene uses for contrib/ benchmark for executing faceted queries over? Or maybe we can randomly generate one that works in conjunction with wikipedia? That way we can execute real world queries against faceted data. Or we could use the Lucene/Solr mailing lists and other data (ala Lucid's faceted site) as a standard index? I don't think there is any standard set of docs for solr testing - there is not a real benchmark contrib - though I know more than a few of us have hacked up pieces of Lucene benchmark to work with Solr - I think I've done it twice now ;) Would be nice to get things going. I was thinking the other day: I wonder how hard it would be to make Lucene Benchmark generic enough to accept Solr impls and Solr algs? It does a lot that would suck to duplicate. -- -- - Mark http://www.lucidimagination.com -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Wikipedia or reuters like index for testing facets?
Why don't you just randomly generate the facet data? That's probably the best way, right? You can control the uniques and ranges.

On Wed, Jul 15, 2009 at 1:21 AM, Grant Ingersoll wrote:

> Probably not as generated by the EnwikiDocMaker, but the WikipediaTokenizer
> in Lucene can pull out richer syntax which could then be Teed/Sinked to
> other fields. Things like categories, related links, etc. Mostly, though,
> I was just commenting on the fact that it isn't hard to at least use it for
> getting docs into Solr.
>
> -Grant
>
> On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:
>
> > You think enwiki has enough data for faceting?
> >
> > On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote:
> >
> > > At a min, it is trivial to use the EnWikiDocMaker and then send the doc
> > > over SolrJ...
> > >
> > > On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:
> > >
> > > > On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen wrote:
> > > >
> > > > > Is there a standard index like what Lucene uses for contrib/benchmark
> > > > > for executing faceted queries over? Or maybe we can randomly generate
> > > > > one that works in conjunction with wikipedia? That way we can execute
> > > > > real world queries against faceted data. Or we could use the
> > > > > Lucene/Solr mailing lists and other data (ala Lucid's faceted site)
> > > > > as a standard index?
> > > >
> > > > I don't think there is any standard set of docs for solr testing -
> > > > there is not a real benchmark contrib - though I know more than a few
> > > > of us have hacked up pieces of Lucene benchmark to work with Solr - I
> > > > think I've done it twice now ;)
> > > >
> > > > Would be nice to get things going. I was thinking the other day: I
> > > > wonder how hard it would be to make Lucene Benchmark generic enough to
> > > > accept Solr impls and Solr algs?
> > > >
> > > > It does a lot that would suck to duplicate.

--
- Mark

http://www.lucidimagination.com
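A quick sketch of that random-generation approach with SolrJ. Everything here is an assumption for illustration (field names, cardinalities, document count); the point is that fixing the seed and the number of unique values per field gives a repeatable faceting benchmark:

  import java.util.Random;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class FacetDataGenerator {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      Random rnd = new Random(42);                 // fixed seed = repeatable runs
      for (int i = 0; i < 1000000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", i);
        doc.addField("category", "cat" + rnd.nextInt(50));        // low cardinality
        doc.addField("manufacturer", "mfr" + rnd.nextInt(5000));  // medium cardinality
        doc.addField("price", rnd.nextInt(10000) / 100.0f);       // for range facets
        server.add(doc);
      }
      server.commit();
    }
  }

Faceting cost is driven largely by the number of unique terms in the faceted field, so the rnd.nextInt(...) bounds are the main knobs to vary.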
Re: support for Payload Feature of lucene in solr
Hi Shalin,

Our requirement is a rolling window of popularity for catalog items, say over 3 months. What we do today: we index term,value pairs as tokens, where the term is a unique string for each day and the value is the popularity count for that day. Having indexed this data as a token stream, at query time we build a query with one term per day (or per 7 days, etc.), and we extend SpanTermQuery to sum the payload values over that duration (we have also extended SpanScorer for this task). Results are then ranked by this new score. Is it possible to do this in Solr somehow?

Thanks,
Sumit

On Wed, Jul 15, 2009 at 3:25 AM, Grant Ingersoll wrote:

> The TokenFilterFactory side is trivial for the DelimitedPayloadTokenFilter.
> That could be in for 1.4. In fact, there is an automated way to generate
> the stubs that should be run in preparing for a release. I'll see if I can
> find a minute or two to make that happen.
>
> For query support, I've never hooked into the query parser, so I have no
> clue. Yonik seems to crank out new query capabilities pretty fast, so
> maybe it isn't too bad, even if it isn't done as fast as Yonik. Bigger
> picture, it would be great to have spans support too.
>
> On Jul 14, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:
>
>> It may be nice to tell us why you need payloads? There may be other ways
>> of solving your problem than adding payload support to Solr. Anyway, I
>> don't see payload support before 1.5.
>>
>> On Tue, Jul 14, 2009 at 10:07 PM, Sumit Aggarwal wrote:
>>
>>> Hey Noble,
>>> Any comments on Grant's suggestion?
>>>
>>> Thanks,
>>> -Sumit
>>>
>>> On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal wrote:
>>>
>>>> Hi Walter,
>>>> I have a search server where I have implemented things using the
>>>> payload feature itself. These days I am evaluating Solr to get rid of
>>>> my own search server. For that I need the payloads feature in Solr
>>>> itself. I raised a related question and got a message from Grant:
>>>>
>>>> "I added a new DelimitedPayloadTokenFilter to Lucene that should make
>>>> it really easy to send in payloads "inline" in Solr XML, so what
>>>> remains to be done, I think, is:
>>>> 1. Create a new TokenFilterFactory for the TokenFilter
>>>> 2. Hook in some syntax support for creating a BoostingTermQuery in the
>>>> Query Parsers."
>>>>
>>>> Now can anyone provide custom code to do what Grant mentioned?
>>>>
>>>> Thanks,
>>>> Sumit
>>>>
>>>> On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood
>>>> <wunderw...@netflix.com> wrote:
>>>>
>>>>> That doesn't require payloads. I was doing that with Solr 1.1. Define
>>>>> two fields, stemmed and exact, with different analyzer chains. Use
>>>>> copyField to load the same info into both. With the dismax handler,
>>>>> search both fields with a higher boost on the exact field.
>>>>>
>>>>> wunder
>>>>>
>>>>> On 7/14/09 7:39 AM, "Toby Cole" wrote:
>>>>>
>>>>>> We're looking into using payloads to improve relevance for stemmed
>>>>>> terms, using the payload to store the unstemmed term, boosting the
>>>>>> term if there's an exact match with the payloads.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
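For reference, the TokenFilterFactory half that Grant calls trivial might look like the sketch below. It assumes Lucene's DelimitedPayloadTokenFilter and FloatEncoder (in org.apache.lucene.analysis.payloads) and Solr's BaseTokenFilterFactory; the package name and the '|' delimiter are arbitrary choices, not anything from the thread.

    package com.example.analysis;  // hypothetical package

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
    import org.apache.lucene.analysis.payloads.FloatEncoder;
    import org.apache.solr.analysis.BaseTokenFilterFactory;

    /**
     * Exposes Lucene's DelimitedPayloadTokenFilter to Solr so that a token
     * such as "day20090714|42.0" is indexed as "day20090714" with the float
     * 42.0 stored as the term's payload.
     */
    public class DelimitedPayloadTokenFilterFactory extends BaseTokenFilterFactory {
      public TokenStream create(TokenStream input) {
        return new DelimitedPayloadTokenFilter(input, '|', new FloatEncoder());
      }
    }

This would be wired into a fieldType's analyzer chain in schema.xml; the query side (hooking a BoostingTermQuery, or a payload-summing span query like Sumit's, into the query parser) is the part that would still need custom work.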
Re: grouping and sorting by facet?
: Is there a way to group and sort by facet count? I have a large set of
: images, each of which is part of a different "collection." I am performing
: a faceted search:
:
: /solr/select/?q=my+term&max=30&version=2.2&rows=30&start=0&facet=true&facet.field=collection&facet.sort=true
:
: I would like to group the results by collection count.
:
: So all of the images in the collection with the most image "hits" comes
: first.
:
: Not sure how to do that

there isn't really any way to do that. you could make two requests: in the
first get the facet counts, and then in the second augment the query to
boost documents that match the various facet field values (where the boost
is determined by the count)

bear in mind: this probably isn't going to be as useful a user experience
as it sounds. This will cause the results that have a lot in common with
the other results to appear near the top -- but users rarely need UI
assistance for finding the "common" stuff; the fact that it's common means
it's already pretty prevalent. There's also no reason to think that docs
matching the facet value with the highest count are more relevant to the
original query. If I'm searching products for "apple ipod", the category
with the highest facet count is probably going to be something like
"accessories" or "cases" and not "mp3 players", because there are a lot
more chargers, cases, and headphones in the world that show up when you
search for ipod than there are mp3 players -- that doesn't mean those
accessory products should appear first in the list of matches.

-Hoss
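Hoss's two-request approach might be sketched with SolrJ like this. The field name "collection" comes from the question; using the raw facet count as the boost and building a single catch-all query string are assumptions made for illustration.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetCountBoost {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Request 1: fetch only the facet counts for "collection".
        SolrQuery q1 = new SolrQuery("my term");
        q1.setFacet(true);
        q1.addFacetField("collection");
        q1.setRows(0);  // no documents needed yet
        QueryResponse r1 = server.query(q1);

        // Request 2: same required query, plus optional clauses boosting
        // each collection by its count. Under the default OR operator the
        // optional clauses do not change the result set, only the scores.
        StringBuilder q = new StringBuilder("+(my term)");
        FacetField collections = r1.getFacetField("collection");
        for (FacetField.Count c : collections.getValues()) {
          if (c.getCount() > 0) {
            q.append(" collection:\"").append(c.getName())
             .append("\"^").append(c.getCount());
          }
        }
        QueryResponse r2 = server.query(new SolrQuery(q.toString()));
        System.out.println(r2.getResults().getNumFound());
      }
    }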
Segments_2 and segments.gen under Index folder and spellchecker1, spellchecker2, spellcheckerFile folder
I just upgraded our Solr to 1.3.0.

After I deployed the Solr apps, I noticed there are segments_2 and
segments.gen files, and three folders: spellchecker1, spellchecker2, and
spellcheckerFile.

What are these for? When I delete them, I need to bounce the apps again,
and they are regenerated.

Thanks

Francis
Re: Segments_2 and segments.gen under Index folder and spellchecker1, spellchecker2, spellcheckerFile folder
On Wed, Jul 15, 2009 at 8:46 AM, Francis Yakin wrote:

> I just upgraded our Solr to 1.3.0.
>
> After I deployed the Solr apps, I noticed there are segments_2 and
> segments.gen files, and three folders: spellchecker1, spellchecker2, and
> spellcheckerFile.
>
> What are these for? When I delete them, I need to bounce the apps again,
> and they are regenerated.

segments.gen used to be created by older versions of Lucene. Since Solr
1.3, a file named segments_N (N=1,2,3...) is created. Both exist because
the new Solr version is pointing to an index created by the earlier Solr
version. There's no harm in keeping things as they are; however, if you
want, you can clean the index directory and re-index all documents to get
rid of the segments.gen file.

The spellchecker directories are created by the SpellCheckComponent. You
can comment out all the sections related to SpellCheckComponent in your
solrconfig.xml and delete these directories.

--
Regards,
Shalin Shekhar Mangar.
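The directories correspond to the spellchecker definitions shipped in the Solr 1.3 example solrconfig.xml; commenting out the whole searchComponent (and removing "spellcheck" from any request handler's last-components list) keeps them from being created. Abridged from the example config as a sketch; your names and fields may differ:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <!-- index-based checker; builds ./spellchecker1 -->
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker1</str>
      </lst>
      <!-- same field, different distance measure; builds ./spellchecker2 -->
      <lst name="spellchecker">
        <str name="name">jarowinkler</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker2</str>
      </lst>
      <!-- file-based checker; builds ./spellcheckerFile -->
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="spellcheckIndexDir">./spellcheckerFile</str>
      </lst>
    </searchComponent>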
DefaultSearchField ? "important"
Hello users, and good morning (in Germany it is morning :-)).

I have a really important problem: my field names are really bad, like
"CUPS_EBENE1_EBENE2_TASKS_CATEGORIE", and I have no content field or
anything like it. So when I want to search for something, I need to search
in ALL fields. But searching "*:test" doesn't work, and putting "*" in the
defaultSearchField doesn't work either.

How can I search in ALL fields?
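The usual workaround for this is a catch-all field populated via copyField in schema.xml, since Solr does not accept "*" as a default search field. A minimal sketch, assuming an example-style schema; the field name "alltext" is illustrative:

    <!-- Catch-all field: everything is copied here at index time. -->
    <field name="alltext" type="text" indexed="true" stored="false"
           multiValued="true"/>

    <!-- Wildcard copyField: copies the content of every field into alltext. -->
    <copyField source="*" dest="alltext"/>

    <!-- Queries without a field prefix now search across all content. -->
    <defaultSearchField>alltext</defaultSearchField>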
spellcheck with misspelled words in index
Hi,

I'm having some trouble getting the correct results from the spellcheck
component. I'd like to use it to suggest correct product titles on our
site; however, some of our products have misspellings in them outside of
our control. For example, there are 2 products with the misspelled word
"cusine" (and 25k with the correct spelling "cuisine"). So if someone
searches for the word "cusine" on our site, I would like to show the 2
misspelled products along with a "Did you mean cuisine?" suggestion.
However, I can't ever seem to get any spelling suggestions when I search
for the word "cusine", and correctlySpelled is always true. Misspelled
words that don't appear in the index work fine.

I noticed that setting onlyMorePopular to true will return suggestions for
the misspelled word, but I've found that it doesn't work well for other
words and produces suggestions too often for correctly spelled ones.

I had incorrectly thought that by setting thresholdTokenFrequency higher on
my spelling dictionary, these misspellings would not appear in my spelling
index and thus I would get suggestions for them; as I see now, the
spellcheck doesn't quite work like that.

Is there any way to get spelling suggestions to work for these
low-frequency misspellings in my index?

Thanks in advance,
Chris
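For reference, the thresholdTokenFrequency setting Chris mentions lives on the spellchecker definition in solrconfig.xml. A sketch of that configuration, with illustrative field and directory names; it prunes terms below the given document-frequency fraction from the spelling dictionary:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">title</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
        <!-- Only index terms appearing in at least 1% of documents, so
             low-frequency misspellings stay out of the dictionary. -->
        <float name="thresholdTokenFrequency">.01</float>
      </lst>
    </searchComponent>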