Re: Indexing an XML file in Apache Solr
You might be interested in trying Lux, which is a Solr extension that indexes XML documents using the element and attribute names and the contents of those nodes in your document. It also allows you to define XPath indexes (like DIH, I think, but with the full XPath 2.0 syntax), and to query your document collection using XQuery 1.0 (in combination with standard Lucene searches at the document level). See http://luxdb.org/

-Mike Sokolov

On 8/16/2013 8:55 AM, Abhiroop wrote:
> I am very new to Solr. I am looking to index an XML file and search its
> contents. Its structure resembles something like this:
>
>   ((1,6)-alpha-glucosyl)poly((1,4)-alpha-glucosyl)glycogenin =>
>   poly{(1,4)-alpha-glucosyl} glycogenin + alpha-D-glucose
>
>   This event has been computationally inferred from an event that has been
>   demonstrated in another species. The inference is based on the homology
>   mapping in Ensembl Compara. Briefly, reactions for which all involved
>   PhysicalEntities (in input, output and catalyst) have a mapped
>   orthologue/paralogue (for complexes at least 75% of components must have
>   a mapping) are inferred to the other species. High level events are also
>   inferred for these events to allow for easier navigation. More details
>   and caveats of the event inference in Reactome. For details on the
>   Ensembl Compara system see also: Gene orthology/paralogy prediction method.
>
>   Saccharomyces cerevisiae
>
> Is it essential to use the DIH to import this data into Solr? Isn't there
> any simpler way to accomplish the task? Can it be done through SolrJ, as I
> am fine with outputting the result through the console too. It would be
> really helpful if someone could point me to some useful examples or
> resources on this apart from the official documentation.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-an-XML-file-in-Apache-Solr-tp4085053.html
> Sent from the Solr - User mailing list archive at Nabble.com.
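[Editor's note] Since the question asks whether SolrJ can do this without DIH: one simple route is to flatten the XML into field/value pairs with stock JAXP and hand those to SolrJ. A minimal sketch follows; the sample XML and field names are made up, and the SolrJ calls (Solr 4.x `HttpSolrServer`) are shown only in comments so the runnable part needs nothing beyond the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlToSolrFields {

    // Collect element-name -> text-content pairs from the leaf elements
    // of an XML document. A real mapping would pick specific XPaths; this
    // just demonstrates the flattening step.
    static Map<String, String> flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> fields = new LinkedHashMap<>();
        walk(doc.getDocumentElement(), fields);
        return fields;
    }

    static void walk(Element el, Map<String, String> fields) {
        NodeList children = el.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                walk((Element) children.item(i), fields);
            }
        }
        if (!hasElementChild) {
            fields.put(el.getTagName(), el.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical miniature of the document in the question.
        String xml = "<event><name>glycogenin reaction</name>"
                   + "<species>Saccharomyces cerevisiae</species></event>";
        Map<String, String> fields = flatten(xml);
        fields.forEach((k, v) -> System.out.println(k + " = " + v));
        // prints: name = glycogenin reaction
        //         species = Saccharomyces cerevisiae

        // With SolrJ 4.x on the classpath, the same map becomes a document:
        //   SolrInputDocument sdoc = new SolrInputDocument();
        //   fields.forEach(sdoc::addField);
        //   SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        //   server.add(sdoc);
        //   server.commit();
    }
}
```

The fields still need to exist in (or match dynamic fields of) your schema; the core name above is the stock example core, not anything from the original message.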
Re: Problems installing Solr4 in Jetty9
On Aug 17, 2013, at 9:01 AM, Robert Muir wrote:

> I think this is only a "test dependency" ?

Right - it's only for the hdfs 'test' setup. I thought that when Steve moved it from the test module to the core, he handled it so that it would not go out in the dist.

- mark
Re: More on topic of Meta-search/Federated Search with Solr
The lack of global TF/IDF has been answered in the past, in the sharded case, by "usually you have similar enough stats that it doesn't matter". This presupposes a fairly evenly distributed set of documents. But if you're talking about federated search across different types of documents, then what would you "rescore" with? How would you even consider scoring docs that are somewhat/totally different? Think magazine articles and meta-data associated with pictures.

What I've usually found is that one can use grouping to show the top N of a variety of results. Or show tabs with different types. Or have the app intelligently combine the different types of documents in a way that "makes sense". But I don't know how you'd just get "the right thing" to happen with some kind of scoring magic.

Best
Erick

On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis wrote:
> I've thought about it, and I have no time to really do a meta-search during
> evaluation. What I need to do is to create a single core that contains
> both of my data sets, and then describe the architecture that would be
> required to do blended results, with liberal estimates.
>
> From the perspective of evaluation, I need to understand whether any of the
> solutions to better ranking in the absence of global IDF have been
> explored? I suspect that one could retrieve a much larger than N set of
> results from a set of shards, re-score in some way that doesn't require
> IDF, e.g. storing both results in the same priority queue and *re-scoring*
> before *re-ranking*.
>
> The other way to do this would be to have a custom SearchHandler that works
> differently - it performs the query, retrieves all results deemed relevant by
> another engine, adds them to the Lucene index, and then performs the query
> again in the standard way. This would be quite slow, but perhaps useful
> as a way to evaluate my method.
>
> I still welcome any suggestions on how such a SearchHandler could be
> implemented.
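[Editor's note] The "same priority queue, re-score before re-rank" idea Dan describes can be sketched concretely. The rescoring choice below (per-shard min-max normalization) is hypothetical — it is one way to make scores from heterogeneous shards comparable without global IDF, not anything Solr provides out of the box.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class BlendedMerge {

    record Hit(String id, double score) {}

    // Merge per-shard result lists into one top-N list. Each shard's raw
    // scores are min-max normalized to [0,1] first, so shards with very
    // different scoring regimes (magazine articles vs. picture metadata)
    // contribute on a comparable scale; then everything goes through one
    // max-heap priority queue.
    static List<Hit> merge(List<List<Hit>> shardResults, int topN) {
        PriorityQueue<Hit> pq =
                new PriorityQueue<>((a, b) -> Double.compare(b.score(), a.score()));
        for (List<Hit> shard : shardResults) {
            double min = shard.stream().mapToDouble(Hit::score).min().orElse(0);
            double max = shard.stream().mapToDouble(Hit::score).max().orElse(1);
            double range = (max - min) == 0 ? 1 : max - min;
            for (Hit h : shard) {
                pq.add(new Hit(h.id(), (h.score() - min) / range));
            }
        }
        List<Hit> out = new ArrayList<>();
        for (int i = 0; i < topN && !pq.isEmpty(); i++) {
            out.add(pq.poll());
        }
        return out;
    }
}
```

As Erick notes, whether any such normalization actually produces a *meaningful* blended ranking is exactly the open question; this only shows the mechanics.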
Re: Problems installing Solr4 in Jetty9
bq. I thought that when Steve moved it from the test module to the core, he handled it so that it would not go out in the dist.

Mea culpa.

@Chris Collins, I think you're talking about Maven dependencies, right? As a workaround, you can exclude dependencies you don't need, including hadoop-hdfs, hadoop-auth, and hadoop-annotations - this will also exclude the indirect jetty 6 dependency/ies. hadoop-common is a compile-time dependency, though, so I'm not sure if it's safe to exclude.

The problems, as far as I can tell, are:

1) The ivy configuration puts three test-only dependencies (hadoop-hdfs, hadoop-auth, and hadoop-annotations) in solr/core/lib/, rather than where they belong, in solr/core/test-lib/. (hadoop-common is required for solr-core compilation to succeed.)

2) The Maven configuration makes the equivalent mistake in marking these test-only hadoop dependencies as compile-scope rather than test-scope dependencies.

3) The Solr .war, which packages everything under solr/core/lib/, includes these three test-only hadoop dependencies (though it does not include any jetty 6 jars).

4) The license files for jetty and jetty-util v6.1.26, but not the jar files corresponding to them, are included in the Solr distribution.

I have working (tests pass) local Ant and Maven configurations that treat the three hadoop test-only dependencies properly; as a result, the .war will no longer contain them - this will cover problems #1-3 above. I think we can just remove the jetty and jetty-util 6.1.26 license files from solr/licenses/, since we don't ship those jars.

I'll open an issue.

Steve

On Sun, Aug 18, 2013 at 1:58 PM, Mark Miller wrote:
>
> On Aug 17, 2013, at 9:01 AM, Robert Muir wrote:
>
> > I think this is only a "test dependency" ?
>
> Right - it's only for the hdfs 'test' setup. I thought that when Steve
> moved it from the test module to the core, he handled it so that it would
> not go out in the dist.
>
> - mark
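[Editor's note] The Maven workaround Steve describes uses standard `<exclusions>` on the solr-core dependency. A sketch of what that might look like in a consuming pom.xml (the version shown is an example; use whatever Solr 4.x version you depend on):

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.4.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Per the message above, leave hadoop-common alone, since it is a genuine compile-scope dependency of solr-core.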
Giving OpenSearcher as false
Hi,

1. What is the impact / use of giving openSearcher as true?

   <autoCommit>
     <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
     <openSearcher>true</openSearcher>
   </autoCommit>

2. Giving the value as "false", does this create the index in the temp file and then commit?

Regards,
Prasi
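[Editor's note] For context on the question: with `openSearcher=false`, a hard commit still flushes index segments to stable storage (not a temp file), but does not open a new searcher, so the commit is cheap and the new documents stay invisible to queries until something else (a soft commit or an explicit commit) opens a searcher. A common solrconfig.xml shape, shown here as a sketch:

```xml
<!-- Hard commit every 15s: flush segments to disk for durability, but
     skip opening a new searcher. Visibility is then handled separately,
     e.g. by autoSoftCommit or explicit commits. -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

With `openSearcher=true`, every hard commit also pays the cost of warming and opening a new searcher, making the changes immediately visible.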