Re: Indexing an XML file in Apache Solr

2013-08-18 Thread Michael Sokolov
You might be interested in trying Lux, which is a Solr extension that 
indexes XML documents using the element and attribute names and the 
contents of those nodes in your document.  It also allows you to define 
XPath indexes (like DIH, I think, but with the full XPath 2.0 syntax), 
and to query your document collection using XQuery 1.0 (in combination 
with standard lucene searches at the document level).  See http://luxdb.org/


-Mike Sokolov

On 8/16/2013 8:55 AM, Abhiroop wrote:

I am very new to Solr. I am looking to index an XML file and search its
contents. Its structure resembles something like this:


((1,6)-alpha-glucosyl)poly((1,4)-alpha-glucosyl)glycogenin =>
poly{(1,4)-alpha-glucosyl} glycogenin + alpha-D-glucose

This event has been computationally inferred from an event that has been
demonstrated in another species. The inference is based on the homology
mapping in Ensembl Compara. Briefly, reactions for which all involved
PhysicalEntities (in input, output and catalyst) have a mapped
orthologue/paralogue (for complexes at least 75% of components must have a
mapping) are inferred to the other species. High level events are also
inferred for these events to allow for easier navigation. More details and
caveats of the event inference in Reactome. For details on the Ensembl
Compara system see also: Gene orthology/paralogy prediction method.

Saccharomyces cerevisiae

Is it essential to use the DIH to import this data into Solr? Isn't there
any simpler way to accomplish the task? Can it be done through SolrJ? I am
fine with outputting the results to the console. It would be really helpful
if someone could point me to some useful examples or resources on this,
apart from the official documentation.
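
For what it's worth, here is a minimal SolrJ sketch (Solr 4.x API) that parses
such a file with the JDK DOM parser and indexes a few fields. The core URL, the
XML element names, and the field names (name_t, summary_t, species_s via dynamic
fields) are all assumptions for illustration, not taken from the file above:

import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;

public class SimpleXmlIndexer {
  public static void main(String[] args) throws Exception {
    // Point at a running Solr core; the URL is an assumption.
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Parse the XML file with the JDK's built-in DOM parser.
    Document xml = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new File("reaction.xml"));

    // Pull out the elements you care about; these element names are hypothetical.
    String name    = xml.getElementsByTagName("name").item(0).getTextContent();
    String summary = xml.getElementsByTagName("summation").item(0).getTextContent();
    String species = xml.getElementsByTagName("species").item(0).getTextContent();

    // Build a Solr document and send it.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "reaction-1");   // any unique key
    doc.addField("name_t", name);       // *_t / *_s assume the example schema's dynamic fields
    doc.addField("summary_t", summary);
    doc.addField("species_s", species);
    solr.add(doc);
    solr.commit();

    // Query it back and print the results to the console.
    SolrQuery q = new SolrQuery("species_s:\"Saccharomyces cerevisiae\"");
    System.out.println(solr.query(q).getResults());
    solr.shutdown();
  }
}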







Re: Problems installing Solr4 in Jetty9

2013-08-18 Thread Mark Miller

On Aug 17, 2013, at 9:01 AM, Robert Muir  wrote:

> I think this is only a "test dependency" ?

Right - it's only for the hdfs 'test' setup. I thought that when Steve moved it 
from the test module to the core, he handled it so that it would not go out in 
the dist.

- mark



Re: More on topic of Meta-search/Federated Search with Solr

2013-08-18 Thread Erick Erickson
The lack of global TF/IDF has been answered in the past,
in the sharded case, by "usually you have similar enough
stats that it doesn't matter". This pre-supposes a fairly
evenly distributed set of documents.

But if you're talking about federated search across different
types of documents, then what would you "rescore" with?
How would you even consider scoring docs that are somewhat/
totally different? Think magazine articles and meta-data associated
with pictures.

What I've usually found is that one can use grouping to show
the top N of a variety of results. Or show tabs with different
types. Or have the app intelligently combine the different types
of documents in a way that "makes sense". But I don't know
how you'd just get "the right thing" to happen with some kind
of scoring magic.
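
A rough SolrJ sketch of that grouping approach (the doc_type field is
hypothetical; group, group.field and group.limit are the standard result
grouping parameters):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedSearch {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("some query terms");
    q.set("group", true);              // collapse results...
    q.set("group.field", "doc_type");  // ...by document type (hypothetical field)
    q.set("group.limit", 3);           // show the top 3 hits per type

    QueryResponse rsp = solr.query(q);
    // One Group per doc_type value, each holding its own top-N result list.
    for (Group g : rsp.getGroupResponse().getValues().get(0).getValues()) {
      System.out.println(g.getGroupValue() + ": " + g.getResult().getNumFound() + " hits");
    }
    solr.shutdown();
  }
}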

Best
Erick


On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:

> I've thought about it, and I have no time to really do a meta-search during
> evaluation.  What I need to do is to create a single core that contains
> both of my data sets, and then describe the architecture that would be
> required to do blended results, with liberal estimates.
>
> From the perspective of evaluation, I need to understand whether any of the
> solutions to better ranking in the absence of global IDF have been
> explored? I suspect that one could retrieve a much larger than N set of
> results from a set of shards, re-score in some way that doesn't require
> IDF, e.g. storing both results in the same priority queue and *re-scoring*
> before *re-ranking*.
>
> The other way to do this would be to have a custom SearchHandler that works
> differently - it performs the query, retrieves all results deemed relevant by
> another engine, adds them to the Lucene index, and then performs the query
> again in the standard way.   This would be quite slow, but perhaps useful
> as a way to evaluate my method.
>
> I still welcome any suggestions on how such a SearchHandler could be
> implemented.
>
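
A bare-bones skeleton of such a handler, for whatever it's worth (just a
sketch; the external-engine and re-indexing steps are left as comments since
they depend entirely on the other engine):

import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

// Registered in solrconfig.xml with something like:
//   <requestHandler name="/blended" class="com.example.BlendedSearchHandler"/>
public class BlendedSearchHandler extends SearchHandler {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    // 1) (sketch) query the external engine, add its relevant results to the
    //    local Lucene index, and commit so they are visible to the second pass.
    // 2) Then run the normal search components against the combined index.
    super.handleRequestBody(req, rsp);
  }
}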


Re: Problems installing Solr4 in Jetty9

2013-08-18 Thread Steve Rowe
bq. I thought that when Steve moved it from the test module to the core, he
handled it so that it would not go out in the dist.

Mea culpa.

@Chris Collins, I think you're talking about Maven dependencies, right?  As
a workaround, you can exclude dependencies you don't need, including
hadoop-hdfs, hadoop-auth, and hadoop-annotations - this will also exclude
the indirect jetty 6 dependency/ies.  hadoop-common is a compile-time
dependency, though, so I'm not sure if it's safe to exclude.
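
For example, something along these lines in the pom.xml (a sketch; the version
is just a placeholder for whichever 4.x release you depend on):

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.4.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>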

The problems, as far as I can tell, are:

1) The ivy configuration puts three test-only dependencies (hadoop-hdfs,
hadoop-auth, and hadoop-annotations) in solr/core/lib/, rather than where
they belong, in solr/core/test-lib/.  (hadoop-common is required for
solr-core compilation to succeed.)

2) The Maven configuration makes the equivalent mistake in marking these
test-only hadoop dependencies as compile-scope rather than test-scope
dependencies.

3) The Solr .war, which packages everything under solr/core/lib/, includes
these three test-only hadoop dependencies (though it does not include any
jetty 6 jars).

4) The license files for jetty and jetty-util v6.1.26, but not the jar
files corresponding to them, are included in the Solr distribution.

I have working (tests pass) local Ant and Maven configurations that treat
the three hadoop test-only dependencies properly; as a result, the .war will
no longer contain them - this will cover problems #1-3 above.

I think we can just remove the jetty and jetty-util 6.1.26 license files
from solr/licenses/, since we don't ship those jars.

I'll open an issue.

Steve



On Sun, Aug 18, 2013 at 1:58 PM, Mark Miller  wrote:

>
> On Aug 17, 2013, at 9:01 AM, Robert Muir  wrote:
>
> > I think this is only a "test dependency" ?
>
> Right - it's only for the hdfs 'test' setup. I thought that when Steve
> moved it from the test module to the core, he handled it so that it would
> not go out in the dist.
>
> - mark
>
>


Giving OpenSearcher as false

2013-08-18 Thread Prasi S
Hi,
1. What is the impact/use of setting openSearcher to true, as below?

 
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

2. If the value is given as "false", does this create the index in a temp file
and then commit?
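
For comparison, a sketch of the same block with openSearcher set to false; the
autoSoftCommit element alongside it is an assumption about how new documents
are usually made visible in that setup, not part of the original question:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <!-- flush and fsync the index files, but do not open a new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <!-- a soft commit makes recently added documents visible to searches -->
  <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
</autoSoftCommit>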


Regards,
Prasi