Re: Extract info from parent node during data import
On Sat, Sep 12, 2009 at 12:24 PM, Fergus McMenemie wrote:
>>On Fri, Sep 11, 2009 at 6:48 AM, venn hardy wrote:
>>>
>>> Hi Fergus,
>>>
>>> When I debugged in the development console
>>> http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport
>>>
>>> I had no problems. Each category/item seems to be only indexed once, and no
>>> parent fields are available (except the category name).
>>>
>>> I am not entirely sure how the forEach statement works, but my
>>> interpretation of forEach="/document/category/item | /document/category" is
>>> something like this:
>>>
>>> 1. Whenever DIH encounters a document/category it will extract the
>>> /document/category/name field as a common field
>>> 2. Whenever DIH encounters a document/category/item it will extract all of
>>> the item fields.
>>> 3. When all fields have been encountered, save the document in solr and go
>>> to the next category/item
>>
>>/document/category/item | /document/category
>>
>>means there are two paths which trigger a new doc (it is possible to
>>have more). Whenever it encounters the closing tag of that xpath, it
>>emits all the fields it collected since the opening of the same tag.
>>After that it clears all the fields it collected since the opening of
>>the tag.
>>
>>If there are fields it collected before the opening of the same tag, it
>>retains them.
>
> Nice and clear, but that is not what I see.
>
> With my test case with forEach="/record | /record/mediaBlock"
> I see that each /record/mediaBlock "document" indexed contains all fields
> from the parent "/record" document as well. A search over mediaBlocks
> returns lots of extra fields from the parent which did not have the
> commonField attribute. I will try and produce a testcase

Yes, it does. /record/mediaBlock will have all the fields collected from
/record as well. It is by design.
Date: Thu, 10 Sep 2009 14:19:31 +0100
To: solr-user@lucene.apache.org
From: fer...@twig.me.uk
Subject: RE: Extract info from parent node during data import

>Hi Paul,
>The forEach="/document/category/item | /document/category/name" didn't
>work (no categoryname was stored or indexed).
>However forEach="/document/category/item | /document/category" seems to
>work well. I am not sure why category on its own works, but not
>category/name...
>But thanks for the tip. It wasn't as painful as I thought it would be.
>Venn

Hmmm, I had bother with this. Although each occurrence of
/document/category/item causes a new Solr document to be indexed, that
document contained all the fields from the parent element as well. Did you
see this?

>
>> From: noble.p...@corp.aol.com
>> Date: Thu, 10 Sep 2009 09:58:21 +0530
>> Subject: Re: Extract info from parent node during data import
>> To: solr-user@lucene.apache.org
>>
>> try this
>>
>> add two xpaths in your forEach
>>
>> forEach="/document/category/item | /document/category/name"
>>
>> and add a field as follows
>>
>> commonField="true"/>
>>
>> Please try it out and let me know.
>>
>> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy wrote:
>> >
>> > Hello,
>> >
>> > I am using SOLR 1.4 (from a nightly build) and its URLDataSource in
>> > conjunction with the XPathEntityProcessor. I have successfully
>> > imported XML content, but I think I may have found a limitation when
>> > it comes to the commonField attribute in the DataImportHandler.
>> >
>> > Before writing my own parser to read in a whole XML document, I
>> > thought I'd post the question here (since I got some great advice
>> > last time).
>> >
>> > The bulk of my content is contained within each tag. However,
>> > each item has a parent called and each category has a name
>> > which I would like to import. In my forEach loop I specify the
>> > /document/category/item as the collection of items I am interested
>> > in.
>> > Is there any way to extract an element from underneath a parent
>> > node? To be more specific (see e.g. the XML below), I would like to
>> > index the following:
>> >
>> > - category: Category 1; id: 1; author: Author 1
>> >
>> > - category: Category 1; id: 2; author: Author 2
>> >
>> > - category: Category 2; id: 3; author: Author 3
>> >
>> > - category: Category 2; id: 4; author: Author 4
>> >
>> > Any ideas on how I can get to a parent node from within a child
>> > during data import? If it can't be done, what do you suggest would be
>> > the best way so I can keep using the DataImportHandler... would XSLT
>> > be a good idea to 'flatten out' the structure a bit?
>> >
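
The emit-and-clear behaviour described in this thread can be sketched as a small stand-alone model. This is not DataImportHandler code: the class `ForEachModel`, the method `emitDocs`, and the rule "every leaf element becomes a field named after its tag" are assumptions made purely for illustration. It only tries to show why a document emitted for a child path also carries the fields collected from its parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/** Toy model of the forEach behaviour described in the thread; hypothetical, not DIH code. */
public class ForEachModel {

    public static List<Map<String, String>> emitDocs(String xml, Set<String> forEachPaths)
            throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Deque<String> path = new ArrayDeque<>();                 // element names, root first
        Deque<Map<String, String>> scopes = new ArrayDeque<>();  // innermost scope on top
        scopes.push(new LinkedHashMap<>());                      // scope outside any forEach element
        List<Map<String, String>> docs = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                path.addLast(reader.getLocalName());
                // Opening tag of a forEach path: start collecting fields in a new scope.
                if (forEachPaths.contains("/" + String.join("/", path))) {
                    scopes.push(new LinkedHashMap<>());
                }
                text.setLength(0);
            } else if (event == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String current = "/" + String.join("/", path);
                if (forEachPaths.contains(current)) {
                    // Closing tag of a forEach path: emit everything collected so far,
                    // outer scopes first, then clear only this element's own fields.
                    Map<String, String> doc = new LinkedHashMap<>();
                    Iterator<Map<String, String>> outerFirst = scopes.descendingIterator();
                    while (outerFirst.hasNext()) {
                        doc.putAll(outerFirst.next());
                    }
                    docs.add(doc);
                    scopes.pop();
                } else {
                    // Any other element with text becomes a field in the innermost open scope.
                    String value = text.toString().trim();
                    if (!value.isEmpty()) {
                        scopes.peek().put(reader.getLocalName(), value);
                    }
                }
                path.removeLast();
                text.setLength(0);
            }
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><category><name>Category 1</name>"
                + "<item><id>1</id><author>Author 1</author></item>"
                + "<item><id>2</id><author>Author 2</author></item>"
                + "</category></document>";
        Set<String> forEach = Set.of("/document/category/item", "/document/category");
        for (Map<String, String> doc : emitDocs(xml, forEach)) {
            System.out.println(doc);
        }
    }
}
```

In this model, parent fields only reach a child document if they were read before the child's closing tag, which is a property of the streaming sketch and may or may not match the real processor.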
Re: Single Core or Multiple Core?
+1
Can you add a JIRA issue for that so we can vote for it?

Chris Hostetter wrote:

: > For the record: even if you're only going to have one SolrCore, using the
: > multicore support (ie: having a solr.xml file) might prove handy from a
: > maintenance standpoint ... the ability to configure new "on deck cores" with
...
: Yeah, it is a shame that single-core deployments (no solr.xml) does not have
: a way to enable CoreAdminHandler. This is something we should definitely
: look at in Solr 1.5.

I think the most straightforward starting point is to switch how we
structure the examples so that all of the examples use a solr.xml with
multicore support.

Then we can move forward on deprecating the specification of "Solr Home"
using JNDI/systemvars and switch to having the location of the solr.xml be
the one master config option with everything else coming after that.

-Hoss
Re: Facet Response Structure
As to point 1 - this is not a problem with the response structure I've
outlined. This is exactly the problem I'm trying to solve. NULL is not a
value in the field, it is a placeholder to indicate how many documents the
field does not exist for. In my example response structure above, 'missing'
is placed outside of the 'facets' list, clearing up the confusion.
'missing' could indeed be a facet value without any collisions.

To point 2 - I understand it would cause compatibility issues, that is why I
was suggesting it be incorporated into the next SOLR release. I'd also be
willing to work

Regarding the stats component, it does not do what you think it does. It
reports a count of all values, not distinct values. The stats component
also strictly works on numeric fields, which would make it impossible to use
in a lot of cases where the FacetComponent does work.

Shalin Shekhar Mangar wrote:
>
> On Sat, Sep 12, 2009 at 1:20 AM, smock wrote:
>
>> I'd like to propose a change to the facet response structure. Currently, it
>> looks like:
>>
>> {'facet_fields':{'field1':[('value1',count1),('value2',count2),(null,missingCount)]}}
>>
>> My immediate problem with this structure is that null is not of the same
>> type as the 'value's. Also, the meaning of the (null,missingCount) tuple is
>> not the same as the meaning of the ('value',count) tuples, it is a special
>> case to represent the documents for which the field has no value. I'd like
>> to propose changing the response to:
>>
>> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount}}}
>>
> Well, there are two problems:
> 1. 'missing' can be a value in the field
> 2. Facet support has been there for a long time. This would break
> compatibility with existing clients.
>
>> In addition to cleaning up the 'null' issue mentioned above, I think this
>> will allow for greater flexibility moving forward with the facet component.
>> For instance, it would be great if the FacetComponent could add an optional
>> count of the 'hits', or number of distinct facet values contained in the
>> query result. If the facet request has a limit on it, this number is not
>> available via a count of the returned facet values. The response structure
>> I've outlined above could accommodate this piece of metadata very easily:
>>
>> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount,'hits':hitsCount}}}
>>
> Have you looked at StatsComponent? It gives counts for total distinct values
> and count of documents missing a value among other things:
>
> http://wiki.apache.org/solr/StatsComponent
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
View this message in context: http://www.nabble.com/Facet-Response-Structure-tp25407363p25414267.html
Sent from the Solr - User mailing list archive at Nabble.com.
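
The proposed restructuring can be sketched on the client side as a conversion from the current flat list to the suggested per-field shape. This is only an illustration of the proposal, not anything Solr or SolrJ provides: the class `FacetFieldSketch`, the method `toProposed`, and the use of plain Java maps for the response are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the proposed facet response shape. */
public class FacetFieldSketch {

    /**
     * Converts the current shape, a list of (value, count) pairs in which a
     * null value stands for "documents without this field", into the proposed
     * shape: real values under "facets", the placeholder count under "missing".
     */
    public static Map<String, Object> toProposed(List<Map.Entry<String, Integer>> legacy) {
        List<Map.Entry<String, Integer>> facets = new ArrayList<>();
        Integer missing = null;
        for (Map.Entry<String, Integer> entry : legacy) {
            if (entry.getKey() == null) {
                missing = entry.getValue();   // the special-case tuple
            } else {
                facets.add(entry);            // an ordinary facet value
            }
        }
        Map<String, Object> proposed = new LinkedHashMap<>();
        proposed.put("facets", facets);
        if (missing != null) {
            proposed.put("missing", missing);
        }
        return proposed;
    }
}
```

Because the placeholder count sits next to the `facets` list rather than inside it, a field whose real values include the string "missing" would not collide with it, which is the point made in the reply above.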
Re: "standard" requestHandler components
Hi Jay,

I got it from reading your response. I did browse around in solrconfig.xml
but could not find any components configured for 'standard'; I didn't
realize that there are 'defaults' hardwired.

Thanks for your quick & detailed response and also your additional tip on
spellcheck config. You saved me lots of time on trial-&-error.

Regards,
Michael

Jay Hill wrote:
>
> RequestHandlers are configured in solrconfig.xml. If no components are
> explicitly declared in the request handler config then the defaults are
> used. They are:
> - QueryComponent
> - FacetComponent
> - MoreLikeThisComponent
> - HighlightComponent
> - StatsComponent
> - DebugComponent
>
> If you wanted to have a custom list of components (either omitting defaults
> or adding custom) you can specify the components for a handler directly:
>
> <arr name="components">
>   <str>query</str>
>   <str>facet</str>
>   <str>mlt</str>
>   <str>highlight</str>
>   <str>debug</str>
>   <str>someothercomponent</str>
> </arr>
>
> You can add components before or after the main ones like this:
>
> <arr name="first-components">
>   <str>mycomponent</str>
> </arr>
>
> <arr name="last-components">
>   <str>myothercomponent</str>
> </arr>
>
> and that's how the spell check component can be added:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
>
> Note that a component (except the defaults) must be configured in
> solrconfig.xml with the name used in the str element as well.
>
> Have a look at the solrconfig.xml in the example directory
> (".../example/solr/conf/") for examples on how to set up the spellcheck
> component, and on how the request handlers are configured.
>
> -Jay
> http://www.lucidimagination.com
>
>
> On Fri, Sep 11, 2009 at 3:04 PM, michael8 wrote:
>
>> Hi,
>>
>> I have a newbie question about the 'standard' requestHandler in
>> solrconfig.xml. What I'd like to know is where the config information for
>> this requestHandler is kept. When I go to http://localhost:8983/solr/admin,
>> I see the following info, but am curious where the supposedly 'chained'
>> components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent)
>> are configured for this requestHandler. I see timing and process debug
>> output from these components with "debugQuery=true", so somewhere these
>> components must have been configured for this 'standard' requestHandler.
>>
>> name:standard
>> class: org.apache.solr.handler.component.SearchHandler
>> version:$Revision: 686274 $
>> description:Search using components:
>> org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent,
>> stats: handlerStart : 1252703405335
>> requests : 3
>> errors : 0
>> timeouts : 0
>> totalTime : 201
>> avgTimePerRequest : 67.0
>> avgRequestsPerSecond : 0.015179728
>>
>> What I'd like to do from understanding this is to properly integrate the
>> spellcheck component into the standard requestHandler as suggested in a
>> solr spellcheck example.
>>
>> Thanks for any info in advance.
>> Michael
Re: Solr SVN build problem
Should be fixed in trunk. Try updating and see if it works for you.

See: https://issues.apache.org/jira/browse/SOLR-1424

On Sep 9, 2009, at 8:12 PM, Allahbaksh Asadullah wrote:

Hi,
I am building Solr from source. During building it from source I am getting
following error.

generate-maven-artifacts:
    [mkdir] Created dir: c:\Downloads\solr_trunk\build\maven
    [mkdir] Created dir: c:\Downloads\solr_trunk\dist\maven
     [copy] Copying 1 file to c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven

BUILD FAILED
c:\Downloads\solr_trunk\build.xml:741: The following error occurred while executing this line:
c:\Downloads\solr_trunk\common-build.xml:261: Failed to copy c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template to c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template due to java.io.FileNotFoundException c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template (The filename, directory name, or volume label syntax is incorrect)

Regards,
Allahbaksh
Re: Single Core or Multiple Core?
What do you mean by "single-core deployments does not have a way to enable
CoreAdminHandler"? I'm just trying to understand the feature that you are
talking about.

On Sat, Sep 12, 2009 at 6:44 AM, Uri Boness wrote:

> +1
> Can you add a JIRA issue for that so we can vote for it?
>
> Chris Hostetter wrote:
>
>> : > For the record: even if you're only going to have one SolrCore, using the
>> : > multicore support (ie: having a solr.xml file) might prove handy from a
>> : > maintenance standpoint ... the ability to configure new "on deck cores" with
>> ...
>> : Yeah, it is a shame that single-core deployments (no solr.xml) does not have
>> : a way to enable CoreAdminHandler. This is something we should definitely
>> : look at in Solr 1.5.
>>
>> I think the most straightforward starting point is to switch how we
>> structure the examples so that all of the examples use a solr.xml with
>> multicore support.
>>
>> Then we can move forward on deprecating the specification of "Solr Home"
>> using JNDI/systemvars and switch to having the location of the solr.xml be
>> the one master config option with everything else coming after that.
>>
>> -Hoss
Re: Highlighting in SolrJ?
Will do Shalin.

-Jay
http://www.lucidimagination.com

On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> Jay, it would be great if you can add this example to the Solrj wiki:
>
> http://wiki.apache.org/solr/Solrj
>
> On Fri, Sep 11, 2009 at 5:15 AM, Jay Hill wrote:
>
> > Set up the query like this to highlight a field named "content":
> >
> >     SolrQuery query = new SolrQuery();
> >     query.setQuery("foo");
> >
> >     // set other highlighting params as needed
> >     query.setHighlight(true).setHighlightSnippets(1);
> >     query.setParam("hl.fl", "content");
> >
> >     QueryResponse queryResponse = getSolrServer().query(query);
> >
> > Then to get back the highlight results you need something like this:
> >
> >     Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
> >
> >     while (iter.hasNext()) {
> >       SolrDocument resultDoc = iter.next();
> >
> >       String content = (String) resultDoc.getFieldValue("content");
> >       String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field
> >
> >       // highlighting is keyed by uniqueKey, then by field name
> >       if (queryResponse.getHighlighting().get(id) != null) {
> >         List<String> highlightSnippets =
> >             queryResponse.getHighlighting().get(id).get("content");
> >       }
> >     }
> >
> > Hope that gets you what you need.
> >
> > -Jay
> > http://www.lucidimagination.com
> >
> > On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin wrote:
> >
> > > Can somebody point me to some sample code for using highlighting in
> > > SolrJ? I understand the highlighted versions of the field come in a
> > > separate NamedList? How does that work?
> > >
> > > --
> > > http://www.linkedin.com/in/paultomblin
> > >
> >
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
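
As a stand-alone illustration of the "separate structure" asked about in the original question: SolrJ's `QueryResponse.getHighlighting()` returns a `Map<String, Map<String, List<String>>>`, keyed first by the document's uniqueKey and then by field name. The sketch below only models that shape with plain collections so it runs without SolrJ; the class `HighlightLookup` and the method `snippetsFor` are hypothetical names, not part of the library.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical null-safe lookup over a highlighting map of the shape SolrJ returns. */
public class HighlightLookup {

    /**
     * Returns the snippets for one field of one document, or an empty list
     * when the document or the field has no highlighting entry.
     */
    public static List<String> snippetsFor(
            Map<String, Map<String, List<String>>> highlighting, String docId, String field) {
        if (highlighting == null) {
            return Collections.emptyList();
        }
        Map<String, List<String>> byField = highlighting.get(docId);  // outer key: uniqueKey value
        if (byField == null) {
            return Collections.emptyList();
        }
        List<String> snippets = byField.get(field);                   // inner key: field name
        return snippets == null ? Collections.emptyList() : snippets;
    }
}
```

With a real response, the first argument would be `queryResponse.getHighlighting()` and the second the value of the uniqueKey field read from each `SolrDocument`.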
Re: Single Core or Multiple Core?
On Sat, Sep 12, 2009 at 9:45 PM, Jonathan Ariel wrote:

> What do you mean by "single-core deployments does not have a way to enable
> CoreAdminHandler"? I'm just trying to understand the feature that you are
> talking about
>

I'm talking about the core related commands described here:

http://wiki.apache.org/solr/CoreAdmin

--
Regards,
Shalin Shekhar Mangar.
Re: Facet Response Structure
On Sat, Sep 12, 2009 at 6:29 PM, smock wrote:

>
> As to point 1 - this is not a problem with the response structure I've
> outlined. This is exactly the problem I'm trying to solve. NULL is not a
> value in the field, it is a placeholder to indicate how many documents the
> field does not exist for. In my example response structure above, 'missing'
> is placed outside of the 'facets' list, clearing up the confusion.
> 'missing' could indeed be a facet value without any collisions.
>

You are right, I missed that.

> To point 2 - I understand it would cause compatibility issues, that is why I
> was suggesting it be incorporated into the next SOLR release. I'd also be
> willing to work
>

I'm not convinced that it is something that needs to be changed. I'm also
not sure about the right way to deprecate a widely used response format. Go
ahead and raise an issue if you want and we can collect thoughts from
others.

> Regarding the stats component, it does not do what you think it does. It
> reports a count of all values, not distinct values. The stats component
> also strictly works on numeric fields, which would make it impossible to use
> in a lot of cases where the FacetComponent does work.
>

Yes, my bad. Though it does report the count of missing values.

--
Regards,
Shalin Shekhar Mangar.
Re: Highlighting in SolrJ?
Thanks Jay!

--
Regards,
Shalin Shekhar Mangar.