Boost, weight, proximity, ranking which one?
I am using Solr 1.4. I have a requirement to show first all documents that matched the most words from a free-text search string. For example, if a user searches for two words with no quotes, "connectivity breakup", my search results should display all documents where both words matched first, and then those docs where only one of the two search words matched. The two search words could also be present in two separate fields of the Solr index; in that case, too, I need that document ranked higher. Can someone explain how this can be achieved in Solr? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-weight-proximity-ranking-which-one-tp1414512p1414512.html Sent from the Solr - User mailing list archive at Nabble.com.
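The dismax handler covers this case: qf spreads the query over several fields, and a low mm (minimum-should-match) keeps single-word matches in the result set while documents matching both words naturally accumulate a higher score. A minimal solrconfig.xml sketch; the field names title and description are assumptions, not from the post:

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- search both fields; a match in either (or both) qualifies -->
    <str name="qf">title description</str>
    <!-- require only 1 of the query words, so partial matches still appear -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With this, a query like q=connectivity breakup returns both-word matches ahead of single-word matches, because documents satisfying more query clauses score higher, and a word may match in either configured field.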
Numeric search in text field
Hello, I have a string "Marsh 1" (no quotes while searching). If I put "Marsh 1" in the search box with no quotes I get the expected results back, but when I search for just "1" (again no quotes) I don't get any results back. I use WordDelimiterFilterFactory as follows. Any idea? -- View this message in context: http://lucene.472066.n3.nabble.com/Numeric-search-in-text-field-tp1633741p1633741.html
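The filter configuration itself did not survive in the message, but the usual cause is that number parts are not generated (or are catenated away) at index or query time. A hedged sketch of a WordDelimiterFilterFactory setup under which the standalone "1" stays searchable:

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        splitOnCaseChange="1"/>
```

With generateNumberParts="1" on both the index-time and query-time analyzers, "Marsh 1" is indexed as the tokens marsh and 1, so a bare query for 1 can match. Also worth checking that no LengthFilterFactory or stopword step in the same chain is discarding single-character tokens.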
Partial search extremely slow
Since my users wanted partial-search functionality I had to introduce the following. I have declared two EdgeNGram filters, one with side "back" and one with side "front", since they wanted partial search working from either end. When executing a search (which brings back 4K-plus records from the index), response time is extremely slow. The two db columns which I index and search against are huge, and one of them is of type CLOB. This is to give you an idea that this CLOB column is being indexed with "edgyText" and also searched upon. From the documentation I understand partial search is slow due to its "gram" nature. What's the best way to implement this functionality and still get good response time? -- View this message in context: http://lucene.472066.n3.nabble.com/Partial-search-extremly-slow-tp2572861p2572861.html
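Front-plus-back edge grams on a large CLOB generate an enormous number of terms. Two common mitigations are to bound the gram length and to apply the gram filter only at index time, querying with plain tokens. A hedged fieldType sketch (the name edgyText comes from the post; the tokenizer choice and size bounds are assumptions to tune):

```xml
<fieldType name="edgyText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- NGramFilterFactory covers substrings from both ends (and the
         middle), replacing the back+front EdgeNGram pair; bounded gram
         sizes produce far fewer terms than minGramSize="1" -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no gram filter at query time: the user's substring is matched
         directly against the indexed grams -->
  </analyzer>
</fieldType>
```

The index-only gram analyzer is the key change: gramming the query as well multiplies the clause count and is rarely what partial search needs.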
Smart Pagination queries
For example, there are 4,000 Solr documents found for a particular word search. My app applies entitlement rules to those 4,000 documents, and it's quite possible the user is only eligible to view 3,000 of the 4K results. This is achieved through post-filtering application logic. My question related to Solr pagination is: in order to paint "Next" links, the app would have to know the total number of records the user is eligible to read. getNumFound() tells me Solr returned 4K records in total. If there weren't any entitlement rules it would be easy to determine how many "Next" links to paint and, when the user clicks "Next", to pass the "start" position appropriately in the Solr query. Since I have to apply a post filter as results are fetched from Solr, is there a better way to achieve this? Because of post filtering I can't know whether to paint a "Next" link until the results for the next page are pre-fetched and filtered, and pre-fetching everything would kill performance and defeat the point of Solr pagination. Any better suggestion? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Smart-Pagination-queries-tp2652273p2652273.html
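One workable compromise is to over-fetch lazily: pull Solr results in chunks, post-filter each chunk, and stop as soon as you have the requested page plus one extra entitled document. The extra document tells you whether a "Next" link is needed, without counting the full result set. A sketch, where fetch and is_entitled are stand-ins for the Solr call and the entitlement rule:

```python
def entitled_page(fetch, is_entitled, page, page_size, chunk=100):
    """Return (docs_for_page, has_next) for the page-th page of entitled docs.

    fetch(start, rows) -> list of Solr docs; is_entitled(doc) -> bool.
    """
    # Entitled docs we must see: everything up to the end of the requested
    # page, plus one extra to decide whether a "Next" link is needed.
    needed = (page + 1) * page_size + 1
    entitled = []
    start = 0
    while len(entitled) < needed:
        batch = fetch(start, chunk)
        if not batch:              # Solr result set exhausted
            break
        entitled.extend(d for d in batch if is_entitled(d))
        start += chunk
    lo = page * page_size
    return entitled[lo:lo + page_size], len(entitled) > lo + page_size
```

For page 0 this fetches only until page_size + 1 entitled documents are found, so the worst case (deep pages) degrades gracefully instead of always pre-fetching everything.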
Custom request handler/plugin
I have a requirement to filter Solr results, based on my application's entitlement rules, before they are returned from Solr to the application. To do this, I have to implement a custom plugin which gets the entire Solr result set and applies my application rules (which will be exposed via a web service from the app for Solr to consume). This way results will be completely filtered inside Solr itself before the application gets the response. Currently my search uses dismax. Now I will need to use dismax first and then the custom plugin to filter the results. What's the best way to implement this? FYI: I have gone through the SolrPlugin wiki page but need more info on how chaining (if possible) of handlers can be used, e.g. first dismax and then the custom plugin/handler. Please advise. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-request-handler-plugin-tp2673822p2673822.html
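Request handlers themselves don't chain, but SearchHandler runs a chain of SearchComponents, so a custom component registered as a last-component runs after the dismax query component has produced results and can prune them. A hedged solrconfig.xml sketch (the component class name is hypothetical):

```xml
<searchComponent name="entitlement"
                 class="com.example.EntitlementFilterComponent"/>

<requestHandler name="/entitled-search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
  </lst>
  <!-- runs after the standard query/facet/highlight components -->
  <arr name="last-components">
    <str>entitlement</str>
  </arr>
</requestHandler>
```

The component's process() method would call out to the entitlement web service and remove non-viewable documents from the response, though note this re-introduces the pagination-count problem discussed in the Smart Pagination thread.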
Re: Custom request handler/plugin
Below are the reasons why I thought it wouldn't be feasible to have pre-filtered results with filter queries; please comment. Since I can't write down the actual business requirements due to a confidentiality contract with the client, I'll mock up the scenario with an example.
- There is a parent entity called "Quiz", which has multiple "assignees". Quiz has a one-to-many with another entity called "School", and "School" has multiple "Assessors".
- There are two separate indexes: 1) the Quiz index, where quiz title, details, and status are the searchable attributes; 2) the School index, which has school name and some other school-related searchable attributes.
Now, for example, when someone searches against quiz title, search results are returned to the application. Before displaying the results, the following access rules kick in from the application. They are plain Java rules which decide whether a person can view a particular quiz document (returned from Solr).
- The rule first checks against an external entitlement service, which has an authorization policy defining entitlement roles. This external service first returns "true" or "false" for whether the person can "view" the "quiz" entity at all.
- If it returned false, that document is thrown out of the result set.
- If it returned true, the further check in the rule is: look up the assignees of the quiz (from the database); if the logged-in person is one of the assignees of the quiz document s/he is trying to view, then s/he can view it, otherwise not.
- There is a super-admin role in the entitlement service (again, external to our app). If the logged-in person is a super admin, s/he can view the document; no need to check quiz assignments.
If I store quiz assignees in the index, then on-the-fly updates of a document in the index would be too slow (yes, we do update/insert index documents on the fly when a record is created or updated in the db).
Plus, as I mentioned, for the super-admin role there aren't any assignments. If I decide to store super admins in the index, then some asynchronous thread would have to run against the entitlement service and monitor the "super admin" role, and every time a new person is added to that role, add him to each indexed document's assignment list? The whole thing looked too convoluted. Hence I thought it would be cleaner to leverage the existing application logic (rules) and continue post-processing. That said, pagination can only work correctly if the Solr-returned result set is also filtered; if it isn't, pagination won't work correctly, as the application can't know how many documents will be kicked out (and so can't know how many "Next" links to paint). Sorry for the lengthy post, but I thought describing the entire scenario would make things clear from a requirements and infrastructure point of view. -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-request-handler-plugin-tp2673822p2674913.html
Re: Custom request handler/plugin
Thanks for the response. Finally I have decided to build the access intelligence into Solr to pre-filter the results, by storing the attributes required to determine access in the index. -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-request-handler-plugin-tp2673822p2696319.html
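With the access attributes stored on each document, entitlement becomes an ordinary filter query. A sketch of building such an fq string; the field name assignees and the super-admin handling are assumptions, not details from the thread:

```python
def entitlement_fq(user, is_super_admin):
    """Build a Solr fq restricting results to documents the user may view."""
    # Super admins see everything: a match-all filter.
    if is_super_admin:
        return "*:*"
    # Otherwise restrict to documents listing the user as an assignee
    # (assumes a multi-valued "assignees" field in the schema).
    return 'assignees:"%s"' % user
```

The returned string is passed as the fq parameter of the Solr request, so numFound is already the entitled count and pagination works normally.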
Solr performance
I have some 25-odd fields with stored="true" in schema.xml. Retrieving 5,000 records takes a few seconds. I also tried passing "fl" to include only one field in the response, but the response time is the same. What are the things to look at to tune performance? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-tp2926836p2926836.html
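When many large stored fields live on the same documents, an fl restriction alone may not help unless lazy field loading is enabled; otherwise Solr still reads every stored field of each hit off disk. A solrconfig.xml fragment worth checking (the cache sizes are illustrative, not recommendations):

```xml
<enableLazyFieldLoading>true</enableLazyFieldLoading>

<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
```

With lazy loading on, only the fields named in fl are materialized, which is usually the difference between seconds and milliseconds when 25 stored fields include large text.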
Re: Solr performance
Alright. It turned out that defaultSearchField=title, where title is of a custom fieldType edgyText, so if no value is passed in the "q" parameter, Solr picks up the default field, which is title of type "edgyText", and takes a very long time to return results. Is there a way to IGNORE the default field (which is set in schema.xml) dynamically when I only want to search a filter list on keys (e.g. fl=keys)? The gram search is slowing things down extremely. Crazy clients want minimum word length = 1, which is kind of insane, but that's how it is. Any idea? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-tp2926836p2935175.html
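The schema's defaultSearchField only applies to query clauses with no explicit field, so one way around it is to always qualify the field in the request instead of relying on the default. Two hedged request sketches (field name keys from the post, the value is a placeholder):

```
q=keys:12345
defType=dismax&qf=keys&q=12345
```

Either form keeps the query off the grammed title field entirely; the dismax variant also lets you steer a multi-field search with qf per request.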
can't find solr.xml
I have downloaded apache-solr-1.3.0.tgz for Linux and don't see solr.xml. Can someone assist? -- View this message in context: http://old.nabble.com/can%27t-find-solr.xml-tp26136630p26136630.html
Index documents with Solr
Wanted to find out how people are using Solr's ExtractingRequestHandler to index different types of documents from a configuration file, in an import fashion. I want to use this handler in a similar way to how DataImportHandler works, where you can issue an "import" command from the URL to create an index by reading database table(s). For documents, I have a db table which stores file paths. I want to read each file's location from the db table, then create an index after reading the document content using ExtractingRequestHandler. Again, trying to see if all this can be done just from configuration, the same way DataImportHandler handles it. -- View this message in context: http://old.nabble.com/Index-documents-with-Solr-tp26205991p26205991.html
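Absent a pure-configuration route, a common workaround is a small driver that reads the paths from the table and posts each file to /update/extract. A sketch with a hypothetical docs(path) table; the actual HTTP posting step is only indicated, since it depends on your client library:

```python
import sqlite3

def paths_to_index(conn):
    """Read the file locations stored in a hypothetical "docs" table."""
    return [row[0] for row in conn.execute("SELECT path FROM docs")]

# Each returned path would then be POSTed to Solr's /update/extract
# handler (e.g. via curl or SolrJ's ContentStreamUpdateRequest), with a
# literal.* parameter carrying the row's primary key so the document and
# its database row stay linked.
```

This keeps the DIH-like "read table, import" shape, with the extraction delegated to ExtractingRequestHandler per file.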
how to search against multiple attributes in the index
I want to build an AND search query against field1 AND field2, etc. Both these fields are stored in an index. I am migrating Lucene code to Solr. Following is my existing Lucene code:

BooleanQuery currentSearchingQuery = new BooleanQuery();
currentSearchingQuery.add(titleDescQuery, Occur.MUST);
highlighter = new Highlighter(new QueryScorer(titleDescQuery));
TermQuery searchTechGroupQuery = new TermQuery(new Term("techGroup", searchForm.getTechGroup()));
currentSearchingQuery.add(searchTechGroupQuery, Occur.MUST);
TermQuery searchProgramQuery = new TermQuery(new Term("techProgram", searchForm.getTechProgram()));
currentSearchingQuery.add(searchProgramQuery, Occur.MUST);

What's the equivalent Solr code for the above Lucene code? Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
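In Solr query syntax, the three MUST clauses translate to mandatory (+) clauses in a single query string, which SolrJ can pass via SolrQuery.setQuery. A sketch with placeholder values (foo, bar, baz are illustrative; the field names come from the code above):

```
+(title:foo description:foo) +techGroup:bar +techProgram:baz
```

The + prefix marks each clause as required, matching Occur.MUST in the Lucene BooleanQuery.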
Re: how to search against multiple attributes in the index
I already did dive in before. I am using the solrj API and the SolrQuery object to build the query, but it's not clear/written how to build a BooleanQuery ANDing a bunch of different attributes in the index. Any samples please?

Avlesh Singh wrote:
> Dive in - http://wiki.apache.org/solr/Solrj
>
> Cheers
> Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
Re: how to search against multiple attributes in the index
I think I found the answer; needed to read more API documentation :-) You can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters.

Avlesh Singh wrote:
> For a starting point, this might be a good read -
> http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
>
> Cheers
> Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html
Re: how to search against multiple attributes in the index
great. thanks. that was helpful

Avlesh Singh wrote:
>> you can do it using solrQuery.setFilterQueries() and build AND queries
>> of multiple parameters.
>
> Nope. You would need to read more -
> http://wiki.apache.org/solr/FilterQueryGuidance
>
> For your impatience, here's a quick starter -
>
> #and between two fields
> solrQuery.setQuery("+field1:foo +field2:bar");
>
> #or between two fields
> solrQuery.setQuery("field1:foo field2:bar");
>
> Cheers
> Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26340776.html
Re: How to use DataImportHandler with ExtractingRequestHandler?
Did you extend DIH to do this work? Can you share code samples? I have a similar requirement where I need to index database records, and each record has a column with a document path, so I need to create another index for documents (we allow users to search both indexes separately) in parallel with reading some metadata of the documents from the database as well. I have all sorts of different document formats to index. FYI, I am on Solr 1.4.0. Any pointers would be appreciated. Thanks,

Sascha Szott wrote:
> Hi Khai,
>
> a few weeks ago, I was facing the same problem.
>
> In my case, this workaround helped (assuming you're using Solr 1.3):
> For each row, extract the content from the corresponding pdf file using
> a parser library of your choice (I suggest Apache PDFBox or Apache Tika
> in case you need to process other file types as well), put it between
> <foo> and </foo>, and store it in a text file. To keep the relationship
> between a file and its corresponding database row, use the primary key
> as the file name.
>
> Within data-config.xml use the XPathEntityProcessor as follows (replace
> dbRow and primaryKey respectively):
>
> <entity processor="XPathEntityProcessor"
>         forEach="/foo"
>         url="${dbRow.primaryKey}.xml">
>   ...
> </entity>
>
> And, by the way, in Solr 1.4 you do not have to put your content between
> xml tags: use the PlainTextEntityProcessor instead of
> XPathEntityProcessor.
>
> Best,
> Sascha
>
> Khai Doan schrieb:
>> Hi all,
>>
>> My name is Khai. I have a table in a relational database. I have
>> successfully used DataImportHandler to import this data into Apache Solr.
>> However, one of the columns stores the location of a PDF file. How can I
>> configure DataImportHandler to use ExtractingRequestHandler to extract
>> the content of the PDF?
>>
>> Thanks!
>>
>> Khai Doan

-- View this message in context: http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26443544.html
RE: Index documents with Solr
Glock, did you get this approach to work? Let me know. Thanks,

Glock, Thomas wrote:
> I have a similar situation but am not expecting any easy setup. Currently
> the tables contain both a url to the file and quite a bit of additional
> metadata about the file. I'm planning one initial load to Solr by
> creating xml in my own utility which posts the xml. The data is messy, so
> DIH is not a good choice for this situation. After the initial load (only
> ~12K documents - takes 10 minutes tops) I plan to perform a second pass
> which will use the ExtractingRequestHandler. I know how the id will map,
> but am not clear yet how to get that id to ExtractingRequestHandler.
> Would be good to see different examples on the Wiki. Have not yet had a
> first attempt - hoping to in a day or so.

-- View this message in context: http://old.nabble.com/Index-documents-with-Solr-tp26205991p26443551.html
Very busy search screen
I have a client who wants to search on almost every attribute of an object (nearly 15 attributes) on the search screen. The search screen looks very crazy/busy. I was wondering if there are better ways to address these requirements and build intelligent, categorized/configurable searches, including allowing the user to choose whether to AND or OR attributes, etc.? Any pointers would be appreciated. thanks, -- View this message in context: http://old.nabble.com/Very-busy-search-screen-tp26482092p26482092.html
Re: How to use DataImportHandler with ExtractingRequestHandler?
Anyone any idea?

javaxmlsoapdev wrote:
> did you extend DIH to do this work? can you share code samples?

-- View this message in context: http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26485245.html
ExternalRequestHandler and ContentStreamUpdateRequest usage
The following code is from my test case, where it tries to index a file (of type .txt):

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978"); // key is the uniqueId
up.setParam("ext.literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);

The test case doesn't give me any error, and "I think" it's indexing the file, but when I search for text which was part of the .txt file the search doesn't return anything. Following is the config from solrconfig.xml, where I have mapped the extracted content to the "description" field (the default search field) in the schema:

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">description</str>
    <str name="defaultField">description</str>
  </lst>
</requestHandler>

Clearly it seems I am missing something. Any idea? Thanks, -- View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26486817.html
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
*:* returns me 1 count, but when I search for a specific word (which was part of the .txt file I indexed before) it doesn't return anything. I don't have Luke set up on my end; let me see if I can set that up quickly. But otherwise, do you see anything I am missing in the solrconfig mapping, or something which maps the document "content" to the wrong attribute?

thanks,

Grant Ingersoll-6 wrote:
> What do your logs show? Else, what does Luke show, or doing a *:* query
> (assuming this is the only file you added)?
>
> Also, I don't think you need ext.literal anymore, just literal.
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search

-- View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
FYI: weirdly, it's returning me the following when I run rsp.getResults().get(0).getFieldValue("description"):

[702, text/plain, doc123.txt, ]

So it seems like it's storing the literal parameters (e.g. up.setParam("ext.literal.docName", "doc123.txt")) in "description", rather than the file content. Any idea? Thanks,

javaxmlsoapdev wrote:
> *:* returns me 1 count but when I search for a specific word (which was
> part of the .txt file I indexed before) it doesn't return me anything.

-- View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487409.html
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
http://machinename:port/solr/admin/luke gives me a 404 error, so it seems it's not able to find Luke.

I am reusing a schema which is used for indexing another entity from the database, which has no relevance to documents. That was my next question: what do I put in a schema if my documents don't need any column mappings or anything? Plus, I want to keep the file-documents index separate from the database-entity index. What's the best way to do this?

thanks,

Grant Ingersoll-6 wrote:
>> *:* returns me 1 count but when I search for specific word (which was
>> part of .txt file I indexed before) it doesn't return me anything. I
>> don't have luke setup on my end.
>
> http://localhost:8983/solr/admin/luke should give you some info.
>
>> let me see if I can set that up quickly but otherwise do you see
>> anything I am missing in solrconfig mapping or something?
>
> What's your schema look like and how are you querying?

-- View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26497295.html
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
I was able to configure the /docs index separately from my db data index. I am still seeing the same behavior, where it only puts the docName & its size in the "content" field (I have renamed the field to "content" in this new schema). Below are the only two fields I have in schema.xml. Following is the updated code from the test case: File fileToIndex = new File("file.txt"); ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); up.addFile(fileToIndex); up.setParam("literal.key", "8978"); up.setParam("literal.docName", "doc123.txt"); up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); NamedList list = server.request(up); assertNotNull("Couldn't upload .txt", list); QueryResponse rsp = server.query(new SolrQuery("*:*")); assertEquals(1, rsp.getResults().getNumFound()); System.out.println(rsp.getResults().get(0).getFieldValue("content")); Also, from the Solr admin UI, only when I search for "doc123.txt" does it return the following response: 702 text/plain doc123.txt 8978 Not sure why it's not indexing the file's content into the "content" attribute. Any idea? Thanks, -- View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498552.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Following is the Luke response. is empty. Can someone help figure out why the file content isn't being indexed? 0 0 0 0 0 1259085661332 false true false org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index 2009-11-24T18:01:01Z Document Frequency (df) is not updated when a document is marked for deletion. df values include deleted documents.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Grant, can you assist? I am clueless as to why it's not indexing the content of the file. I have provided the schema and code info in the previous threads. Do I need to explicitly add a "content" param to the ContentStreamUpdateRequest? I don't think that's the right thing to do. Please advise, and let me know if you need anything else. Appreciate your help. Thanks,
Where to put ExternalRequestHandler and Tika jars
My SOLR_HOME = /home/solr_1_4_0/apache-solr-1.4.0/example/solr/conf in tomcat.sh. POI, PDFBox, Tika and related jars are under /home/solr_1_4_0/apache-solr-1.4.0/lib. When I try to index files using the SolrJ API as follows, I don't see the content of the file being indexed. It only indexes the file size (bytes) and file type into the "content" field. See the schema definition below as well. ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); up.addFile(file); up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); server.request(up); schema.xml has the following content, and solrconfig.xml has: content content The Luke response is below; it displays the correct count (7) of indexed documents but no "content" in the index. In the tomcat logs I don't see any errors. Unless I am going blind, I don't see anything missing in the setup. Can anyone advise? Do I need to include the Tika jars in tomcat's deployed solr/lib or under /example/lib in SOLR_HOME? - - 0 28 - 7 7 25 1259164190261 false true false org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index 2009-11-25T15:50:03Z - - text ITSM-- ITS-- 7 18 - 3 3 3 3 2 2 1 1 1 1 - 12 2 4 - slong I-SO-l I-SO- 7 7 - 1 1 1 1 1 1 1 - 7 - - Document Frequency (df) is not updated when a document is marked for deletion. df values include deleted documents. -- View this message in context: http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26515579.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Where to put ExternalRequestHandler and Tika jars
I had to include the Tika and related parsing jars in tomcat/webapps/solr/WEB-INF/lib. This was an embarrassing mistake; apologies for all the noise. Thanks, -- View this message in context: http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26518100.html Sent from the Solr - User mailing list archive at Nabble.com.
Batch file upload using solrJ API
Is there an API to upload files over one connection, versus looping through all the files and creating a new ContentStreamUpdateRequest for each file? The latter, as expected, doesn't work when there are a large number of files; it quickly runs into memory problems. Please advise. Thanks, -- View this message in context: http://old.nabble.com/Batch-file-upload-using-solrJ-API-tp26518167p26518167.html Sent from the Solr - User mailing list archive at Nabble.com.
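There is no dedicated batch-upload API in this SolrJ version that I know of, but one common workaround is to bound memory by sending files in fixed-size batches and committing once at the end rather than per file. The sketch below shows only the batching logic; the commented-out upload call, the server object, and the /update/extract path are assumptions carried over from the earlier snippets, not a tested SolrJ recipe.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchUpload {
    // Split a list of file names into fixed-size batches so that only one
    // batch's worth of content streams is held in memory at a time.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        for (List<String> batch : partition(files, 2)) {
            // For each batch you would build one request, add the batch's
            // files, and send it, committing only after the final batch
            // (hypothetical upload call; needs a SolrJ server instance):
            //   ContentStreamUpdateRequest up =
            //       new ContentStreamUpdateRequest("/update/extract");
            //   for (String f : batch) up.addFile(new File(f));
            //   server.request(up);
            System.out.println(batch);
        }
    }
}
```

The batch size caps peak memory regardless of how many files there are in total.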
Re: Where to put ExternalRequestHandler and Tika jars
Yes, the code I posted in the first thread does work, and I am able to retrieve data from the document index. Did you include all the required jars in the deployed Solr application's lib folder? What errors are you seeing? Juan Pedro Danculovic wrote: > > Hi! Does your example finally work? I index the data with solrj and I have > the same problem and could not retrieve file data. > -- View this message in context: http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26576242.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch file upload using solrJ API
Any suggestion/pointers on this? javaxmlsoapdev wrote: > > Is there an API to upload files over one connection versus looping through > all the files and creating new ContentStreamUpdateRequest for each file. > This, as expected, doesn't work if there are large number of files and > quickly run into memory problems. Please advise. > > Thanks, > > > -- View this message in context: http://old.nabble.com/Batch-file-upload-using-solrJ-API-tp26518167p26576268.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr plugin or something else for custom work?
I have a requirement where I am indexing attachments. Attachments hang off of a database entity (table). I also need to include some metadata from the database table as part of the index. I am trying to find the best way to implement this: a custom handler or something else? The custom handler would get all the required db records (which include the document path) by consuming a web service (I can expose a method from my application as a web service), then iterate through the list returned by the web service and index the required metadata along with the attachments (the attachment path is part of an entity's metadata). Has anyone tried something like this, or have suggestions on how best to implement this requirement? -- View this message in context: http://old.nabble.com/Solr-plugin-or-something-else-for-custom-work--tp26577014p26577014.html Sent from the Solr - User mailing list archive at Nabble.com.
dismax query syntax to replace standard query
I have configured the dismax handler to search against both the "title" & "description" fields. Now I have some other attributes on the page, e.g. "status", "name", etc. On the search page I have three fields for the user to input search values: 1) a free-text search field (which searches against both "title" & "description"), 2) Status (multi-select dropdown), 3) Name (single-select dropdown). I want to form a query like textField1:value AND status:(Male OR Female) AND name:"abc". I know the first part (textField1:value) searches against both "title" & "description", as that's how I have configured dismax, but I am not sure how I can AND the other attributes (in my case "status" & "name"). Note: the standard query looks like the following (without using the dismax handler): title:"test" description:"test" name:"Joe" statusName:(Male OR Female) -- View this message in context: http://old.nabble.com/dismax-query-syntax-to-replace-standard-query-tp26631725p26631725.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dismax query syntax to replace standard query
Thanks. When I do it that way it gives me the following query: params={indent=on&start=0&q=risk+test&qt=dismax&fq=statusName:(Male+OR+Female)+name:"Joe"&hl=on&rows=10&version=2.2} hits=63 status=0 QTime=54 I typed 'Risk test' (no quotes) into the text field in the UI. I want the search to AND the "statusName" and "name" attributes (all attributes in the fq param). Following is my dismax configuration in solrconfig.xml: dismax explicit 0.01 title^2 description title description 2<-1 5<-2 6<90% 100 *:* title description 10 title regex And schema.xml has title. When I change this to AND, it does AND all the params in fq, but it also ANDs the words in the text field, e.g. "risk+test", and doesn't return me results. Basically, I want OR between the words in the "q" list and AND between the params in the "fq" list. Any pointers would be appreciated. Thanks, isugar wrote: > > I believe you need to use the fq parameter with dismax (not to be confused > with qf) to add a "filter query" in addition to the q parameter. > > So your text search value goes in the q parameter (which searches on the > fields you configure) and the rest of the query goes in the fq. > > Would that work? -- View this message in context: http://old.nabble.com/dismax-query-syntax-to-replace-standard-query-tp26631725p26635928.html Sent from the Solr - User mailing list archive at Nabble.com.
how to set multiple fq while building a query in solrj
How do I create a query string with multiple fq params using the solrj SolrQuery API? E.g. I want to build a query as follows: http://servername:port/solr/issues/select/?q=testing&fq=statusName:(Female OR Male)&fq=name="Joe" I am using the solrj client API to build the query, using SolrQuery as follows: solrQuery.setParam("fq", statusString); solrQuery.setParam("fq", nameString); It only sets the last "fq" (fq=nameString) in the string. If I switch the above setParam order it sets fq=statusString. How do I set multiple fq params in the SolrQuery object? Thanks, -- View this message in context: http://old.nabble.com/how-to-set-multiple-fq-while-building-a-query-in-solrj-tp26638650p26638650.html Sent from the Solr - User mailing list archive at Nabble.com.
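For what it's worth, setParam is a set-style operation, so the second call replaces the first value. If memory serves, SolrJ's SolrQuery also exposes addFilterQuery(...), which appends an fq instead of replacing it; verify that against your SolrJ version. The stdlib sketch below illustrates the append-versus-replace distinction by collecting repeated parameters into lists before building the query string (the class and method names here are illustrative, not SolrJ API, and URL-encoding is omitted):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiFq {
    // Collect repeated parameters instead of overwriting them the way a
    // set-style call does; each add() appends another value under the key.
    static Map<String, List<String>> params = new LinkedHashMap<>();

    static void add(String name, String value) {
        params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    static String toQueryString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : params.entrySet()) {
            for (String v : e.getValue()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(e.getKey()).append('=').append(v); // URL-encoding omitted for clarity
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        add("q", "testing");
        add("fq", "statusName:(Female OR Male)");
        add("fq", "name:\"Joe\"");
        System.out.println(toQueryString());
        // -> q=testing&fq=statusName:(Female OR Male)&fq=name:"Joe"
    }
}
```

Separate fq parameters are intersected by Solr, which also answers the earlier dismax question: keep the free-text terms OR'd in q and put each AND'd restriction in its own fq.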
store content only of documents
I store documents in a "content" field defined as follows in schema.xml, and the following in solrconfig.xml: content content I want to store only the document body in this field, but it stores other document metadata too, e.g. "Author", "timestamp", "document type". How can I ask Solr to store only the body of the document in this field and not the other metadata? Thanks, -- View this message in context: http://old.nabble.com/store-content-only-of-documents-tp26803101p26803101.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: store content only of documents
Anyone? -- View this message in context: http://old.nabble.com/store-content-only-of-documents-tp26803101p26834525.html Sent from the Solr - User mailing list archive at Nabble.com.
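One approach, sketched from the extraction defaults that ship with the Solr 1.4 example configs (treat the exact parameter and field names as assumptions to verify against your version): map Tika's extracted body to your content field with fmap.content, and send every metadata field the schema doesn't know about to an ignored_* dynamic field that is neither indexed nor stored.

```xml
<!-- solrconfig.xml: sketch based on the Solr 1.4 example defaults -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Tika's extracted body goes to the "content" field -->
    <str name="fmap.content">content</str>
    <!-- lowercase incoming metadata field names -->
    <str name="lowernames">true</str>
    <!-- any field not in the schema gets this prefix and is caught by
         the ignored_* dynamicField below -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: swallow unmapped metadata -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false"/>
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```

With this in place the Author, timestamp, and content-type metadata land in ignored_* fields and never reach the content field.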
Re: Searching .msg files
1) Use Tika to index the .msg files (Tika does support the Microsoft Outlook format, and I am already using Tika: http://lucene.apache.org/tika/formats.html). 2) While indexing, you'll have to write a handler to extract the To, Cc and Bcc values and store them in separate fields in the index. 3) When a user searches the .msg files, check whether s/he is in the To, Cc or Bcc field before returning a result to the page, and filter the results accordingly. Abhishek Srivastava-2 wrote: > > Hello Everyone, > > In my company, we store a lot of old emails (.msg files) in a database > (done > for the purpose of legal compliance). > > The users have been asking us to give search functionality on the old > emails. > > One of the primary requirements is that when people search, they should > only > be able to search in their own emails (emails in which they were in the > to, > cc or bcc list). > > How can solr be used? > > from what I know about this product is that it only searches xml > content... > so I will have to extract the body of the email and convert it to xml > right? > > How will I limit the search results to only those emails where the user > who > is searching was in the to, cc or bcc list? > > Please do recommend me an approach for providing a solution to our > requirement. > -- View this message in context: http://old.nabble.com/Searching-.msg-files-tp26788199p26835015.html Sent from the Solr - User mailing list archive at Nabble.com.
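Step 3 above can also be pushed down into the query itself: attach a filter query restricting results to messages where the searching user appears in one of the address fields, so Solr does the filtering instead of post-processing. A minimal sketch (the to/cc/bcc field names are assumptions for illustration; real addresses would also need proper query escaping):

```java
public class MailAclFilter {
    // Build a Solr filter query that only matches messages where the
    // given user appears in the to, cc, or bcc field (hypothetical
    // field names from step 2 above).
    static String aclFilter(String userEmail) {
        String quoted = "\"" + userEmail + "\"";
        return "to:" + quoted + " OR cc:" + quoted + " OR bcc:" + quoted;
    }

    public static void main(String[] args) {
        System.out.println(aclFilter("jane@example.com"));
        // -> to:"jane@example.com" OR cc:"jane@example.com" OR bcc:"jane@example.com"
    }
}
```

The resulting string would be passed as an fq parameter alongside the user's q, so a user can never see a message they weren't addressed on.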
Build index by consuming web service
I am in need of a handler which consumes a web service and builds the index from the results returned by the service. Until now I was building the index by reading data directly from a database query using DataImportHandler. There are new functional requirements to index calculated fields and allow search on them. I have exposed an application API as a web service, which returns all the attributes for indexing. How can I ask Solr to consume this service and index the attributes it returns? Any pointers would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/Build-index-by-consuming-web-service-tp26970642p26970642.html Sent from the Solr - User mailing list archive at Nabble.com.
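One way to sketch this without a custom Solr handler: have a small client consume the web service and post the returned attributes to Solr's /update handler as an add-document payload. The sketch below shows only the record-to-XML step; the field names are made up for illustration, the service records are modeled as plain maps, and with SolrJ you would more likely build document objects than raw XML. Escaping of XML special characters is left as a comment.

```java
import java.util.List;
import java.util.Map;

public class ServiceToSolrXml {
    // Turn records returned by a web service (modeled here as maps) into
    // the <add><doc>...</doc></add> XML payload that Solr's /update
    // handler accepts. Field names are illustrative assumptions.
    static String toAddXml(List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> rec : records) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : rec.entrySet()) {
                sb.append("<field name=\"").append(f.getKey()).append("\">")
                  .append(f.getValue()) // escape XML special chars in real use
                  .append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) {
        // A calculated field from the service sits alongside plain columns.
        String xml = toAddXml(List.of(Map.of("key", "1")));
        System.out.println(xml);
    }
}
```

The calculated fields the database can't provide are just more map entries here, which is the point of fronting the data with the service.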
weird text stripping issue
I am observing a very weird text stripping issue. When I search for the word "Search" I get the following: Issue 18 Search String 4688 Issue 18 Search String2 And the highlighting node: Issue 18 Search String2 Issue 18 Search String My actual description string is "Issue 18 Search String2"; the "2" isn't coming back in the description attribute in my search results. Note: both title & description are the fields Solr searches against; that's how my default config is. Also note description is of type "clob" in my config, as below. However, when I search on other attributes (excluding title & description) the returned result brings back the full description text including the "2". Any idea what's going wrong here? Thanks, -- View this message in context: http://old.nabble.com/weird-text-stripping-issue-tp27363086p27363086.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: weird text stripping issue
The analyzers are the defaults. Anything in particular to look for? ANKITBHATNAGAR wrote: > > > Check your analyzers > > Ankit > -- View this message in context: http://old.nabble.com/weird-text-stripping-issue-tp27363086p27365621.html Sent from the Solr - User mailing list archive at Nabble.com.
Any idea what could be wrong with this fq value?
Following is my Solr URL: http://hostname:port/solr/entities/select/?version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open OR Cancelled)&debugQuery=true&q=dev&fq=groupName:"Infrastructure" "groupName" is one of the attributes I create a fq (filter query) on. This field (groupName) is being indexed and stored. When I search for anything other than "Infrastructure" in the fq on groupName, Solr brings back correct results. When I pass "Infrastructure" in fq=groupName:"Infrastructure" it never brings anything back. If I remove the "fq" completely it brings back all results, including records with groupName:"Infrastructure". Something is wrong only with this "Infrastructure" value in the fq. Any idea what could be going wrong? Clearly this is only related to the value "Infrastructure" in the filter query. Thanks, -- View this message in context: http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Any idea what could be wrong with this fq value?
Thanks Erik for the pointer. I had this field as "text" and after changing it to "string" it started working as expected. I am still not sure why this particular value ("Infrastructure") failed to bring back results; other values like "Network", "Information", etc. worked fine when the field was of type "text" as well. I tried (while groupName was still of type "text") &q=*:*&facet=on&facet.field=groupName and it brought back "Infrastructure" correctly. Can you explain how Solr internally indexed this attribute differently, and why changing from "text" to "string" made it work?

Thanks,

Erik Hatcher-4 wrote:
>
> is groupName a "string" field? If not, it probably should be. My
> hunch is that you're analyzing that field and it is lowercased in the
> index, and maybe even stemmed.
>
> Try &q=*:*&facet=on&facet.field=groupName to see all the *indexed*
> values of the groupName field.
>
> Erik
>
> On Feb 3, 2010, at 10:05 AM, javaxmlsoapdev wrote:
>
>> Following is my Solr URL:
>>
>> http://hostname:port/solr/entities/select/?version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open OR Cancelled)&debugQuery=true&q=dev&fq=groupName:"Infrastructure"
>>
>> "groupName" is one of the attributes I build an fq (filter query) on.
>> This field (groupName) is indexed and stored.
>>
>> When I search for anything other than "Infrastructure" in the groupName
>> fq, Solr brings back correct results. When I pass "Infrastructure" in
>> fq=groupName:"Infrastructure", it never brings anything back. If I
>> remove the "fq" completely, it brings back all results, including
>> records with groupName:"Infrastructure". Something is wrong only with
>> this "Infrastructure" value in the fq.
>>
>> Any idea what could be going wrong? Clearly this is related only to the
>> value "Infrastructure" in the filter query.
>>
>> Thanks,
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- View this message in context: http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27439279.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler can't understand query
I have a complex query (it runs fine in the database) that I am trying to include in a DataImportHandler query. The query has CASE statements containing the < > (not-equal) operator, e.g.

case when (ASSIGNED_TO < > '' and TRANSLATE(ASSIGNED_TO, '', '0123456789')='')

DataImportHandler fails to parse the query with the following error, complaining about the "<" symbol. How do I go about this? Note: the query is valid and runs fine in the database.

[Fatal Error] :26:26: The value of attribute "query" associated with an element type "entity" must not contain the '<' character.
Feb 8, 2010 6:02:09 PM org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)

Thanks,
-- View this message in context: http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507918p27507918.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler can't understand query
Note: I already tried escaping the < character with \<, but it still throws the same error. Any idea?

Thanks,

javaxmlsoapdev wrote:
>
> I have a complex query (it runs fine in the database) that I am trying to
> include in a DataImportHandler query. The query has CASE statements
> containing the < > (not-equal) operator, e.g.
>
> case when (ASSIGNED_TO < > '' and TRANSLATE(ASSIGNED_TO, '',
> '0123456789')='')
>
> DataImportHandler fails to parse the query with the following error,
> complaining about the "<" symbol. How do I go about this? Note: the query
> is valid and runs fine in the database.
>
> [Fatal Error] :26:26: The value of attribute "query" associated with an
> element type "entity" must not contain the '<' character.
> Feb 8, 2010 6:02:09 PM
> org.apache.solr.handler.dataimport.DataImportHandler inform
> SEVERE: Exception while loading DataImporter
> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
> occurred while initializing context
> at
> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)
>
> Thanks,
>
-- View this message in context: http://old.nabble.com/DataImportHandler-can%27t-understand-query-tp27507918p27508214.html Sent from the Solr - User mailing list archive at Nabble.com.
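[Editor's note:] The error here is an XML parsing error, not a SQL one: data-config.xml is parsed as XML, so a literal "<" inside the query attribute must be written as the predefined entity &lt; (and ">" as &gt;) — backslash escaping has no meaning in XML. A hedged data-config.xml sketch; only the CASE fragment comes from the thread, while the entity name and surrounding SQL are hypothetical:

```xml
<!-- data-config.xml sketch: the < > operator from the thread's CASE
     expression written with XML entities so the attribute value parses.
     Entity name and the enclosing SELECT are illustrative assumptions. -->
<entity name="issue"
        query="SELECT * FROM ISSUES
               WHERE (CASE WHEN (ASSIGNED_TO &lt; &gt; ''
                      AND TRANSLATE(ASSIGNED_TO, '', '0123456789') = '')
                      THEN 1 ELSE 0 END) = 1"/>
```

The XML parser substitutes the entities before the query string is handed to the JDBC driver, so the database still receives the original < > characters.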
Good literature on search basics
Does anyone know of good literature (web resources, books, etc.) on the basics of search? I do have the Solr 1.4 and Lucene books, but I wanted to go into more detail on the fundamentals. Thanks, -- View this message in context: http://old.nabble.com/Good-literature-on-search-basics-tp27562021p27562021.html Sent from the Solr - User mailing list archive at Nabble.com.
HttpDataSource consume REST API with Authentication required
I have to use HttpDataSource (http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource) to have Solr consume my REST service and index the data returned from that service. My application/service requires authentication/authorization, so when Solr invokes this service it MUST present valid credentials. How/where do I configure or write the authentication part before Solr consumes my REST service? Any pointers would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/HttpDataSource-consume-REST-API-with-Authentication-required-tp27785340p27785340.html Sent from the Solr - User mailing list archive at Nabble.com.
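[Editor's note, a minimal sketch:] The stock HttpDataSource in Solr 1.4 has no authentication hooks of its own, so one common route is a small custom data source (or a proxy in front of the service) that attaches credentials to each request. The snippet below only shows the generic HTTP Basic part in plain Java; the class name, endpoint, and credentials are all hypothetical, and java.util.Base64 assumes a modern JRE rather than the Java 5/6 typical of the Solr 1.4 era.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    /** Build the value of an HTTP Basic "Authorization" header. */
    public static String basicAuth(String user, String pass) {
        String credentials = user + ":" + pass;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Open a connection to the REST endpoint with the header attached. */
    public static HttpURLConnection open(String url, String user, String pass)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Authorization", basicAuth(user, pass));
        return conn;
    }
}
```

A custom DataSource subclass could call something like open(...) when fetching, or the same header logic could live in a reverse proxy so the Solr config stays untouched; which fits better depends on how the service does authorization.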