Re: Can't limit return fields in custom request handler

2009-07-08 Thread Osman İZBAT
I'll look at SolrPluginUtils.setReturnFields.

I'm running the same query:
http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3
I get a non-empty result when the filter parameter is null, but when I pass
the "inStores" filter to getDocListAndSet I get an empty result.

SolrParams solrParams = req.getParams();
Query q = QueryParsing.parseQuery(solrParams.get("q"), req.getSchema());
Query filter = new TermQuery(new Term("inStores", "true"));
DocListAndSet results = req.getSearcher().getDocListAndSet(q, filter,
        (Sort) null, solrParams.getInt("start"), solrParams.getInt("limit"));
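One thing that may be worth ruling out (an assumption on my part, not a confirmed diagnosis): a raw TermQuery bypasses the field's analyzer, so "true" has to match the indexed token exactly; if the field is analyzed, the indexed token may differ. If the handler honors the standard filter-query parameter, passing the filter as fq lets Solr parse it against the schema instead:

```
http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&fq=inStores:true&start=0&limit=3
```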

Thanks.


On Tue, Jul 7, 2009 at 11:45 PM, Chris Hostetter
wrote:

>
> : But I have a problem like this;  when i call
> :
> http://localhost:8983/solr/select/?qt=cfacet&q=%2BitemTitle:nokia%20%2BcategoryId:130&start=0&limit=3&fl=id
> ,
> : itemTitle
> : I'm getting all fields instead of only id and itemTitle.
>
> Your custom handler is responsible for checking the fl and setting what
> you want the response fields to be on the response object.
>
> SolrPluginUtils.setReturnFields can be used if you want this to be done in
> the "normal" way.
>
> : Also i'm gettting no result when i give none null filter parameter in
> : getDocListAndSet(...).
>...
> : DocListAndSet results = req.getSearcher().getDocListAndSet(q,
> : (Query)null, (Sort)null, solrParams.getInt("start"),
> : solrParams.getInt("limit"));
>
> ...that should work.  What does your query look like?  What are you
> passing for the "start" and "limit" params (is it possible you are getting
> results, but limit=0 so there aren't any results on the current page of
> pagination)?  What does the debug output look like?
>
>
> -Hoss
>
>


-- 
Osman İZBAT


RE: Browse indexed terms in a field

2009-07-08 Thread Pierre-Yves LANDRON

Thanks!

It seems that it can do the trick...

> Date: Tue, 7 Jul 2009 11:10:15 -0400
> Subject: Re: Browse indexed terms in a field
> From: bill.w...@gmail.com
> To: solr-user@lucene.apache.org
> 
> You can use facet.prefix to match the beginning of a given word:
> 
> http://wiki.apache.org/solr/SimpleFacetParameters#head-579914ef3a14d775a5ac64d2c17a53f3364e3cf6
> 
> Bill
> 
> On Tue, Jul 7, 2009 at 11:02 AM, Pierre-Yves LANDRON
> wrote:
> 
> >
> > Hello,
> >
> > Here is what I would like to achieve : in an indexed document there's a
> > fulltext indexed field ; I'd like to browse the terms in this field, ie. get
> > all the terms that match the begining of a given word, for example.
> > I can get all the field's facets for this document, but that's a lot of
> > terms to process ; is there a way to constraint the returned facets ?
> >
> > Thank you for your highlights.
> > Kind regards,
> > Pierre.
> >

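For reference, a facet.prefix request along the lines Bill suggests could look like this (the field name "fulltext" and the prefix are made up for illustration):

```
http://localhost:8983/solr/select?q=id:1234&rows=0&facet=true&facet.field=fulltext&facet.prefix=beg&facet.limit=30
```

This returns only the facet terms of the fulltext field that begin with "beg", instead of the full term list.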

Change in DocListAndSetNC not messing everything

2009-07-08 Thread Marc Sturlese

Hey there,
I had to implement something similar to field collapsing, but couldn't use the
patch as it hurts performance a lot with an index of about 4 GB.
For testing, what I have done is make some hacks to SolrIndexSearcher's
getDocListAndSetNC function. I fill the ids array in my own order, or I just
don't add some doc ids (and so change the id array's size). I have been
testing it and the performance is dramatically better than using the patch.
Can anyone tell me which is the best way to hack DocListAndSetNC? I mean, I
know this change can make me go mad in the future, when I decide to update to
the trunk version or to new releases.
My hack is probably too specific for my use case, but I could upload the
source in case someone can advise me what to do.
Thanks in advance,

-- 
View this message in context: 
http://www.nabble.com/Change-in-DocListAndSetNC-not-messing-everything-tp24387830p24387830.html
Sent from the Solr - User mailing list archive at Nabble.com.



how to do the distributed search with sort using solr?

2009-07-08 Thread shb
In my project, I am trying to do a distributed search sorted by some field
using solr. The test code is
as follows:

SolrQuery query = new SolrQuery();
query.set("q", "id:[1 TO *]");
query.setSortField("id",SolrQuery.ORDER.asc);
query.setParam("shards", "localhost:8983/solr, localhost:7574/solr");
QueryResponse response = server.query(query);

I get the following error. It seems that Solr doesn't support sorting
while doing a distributed search. Do you have any suggestions to solve this
problem? Thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection refused
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
... 3 more
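One detail that may be worth checking (an assumption, not a confirmed cause): the shards parameter contains a space after the comma, and each entry is used as a host address, so " localhost:7574/solr" may fail to connect. A version without the space:

```
shards=localhost:8983/solr,localhost:7574/solr
```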


Re: Is there any other way to load the index beside using "http" connection?

2009-07-08 Thread Norberto Meijome
On Tue, 7 Jul 2009 13:54:07 -0700
Francis Yakin  wrote:
[...]
> much on our setup.
> 
> Like said we have file name "test.xml" which come from SQL output , we put it
> locally on the solr server under "/opt/test.xml"
> 
> So, I need to execute the commands from solr system to add and update this to
> the solr data/indexes.
> 
> What commands do I have to use, for example the xml file
> named" /opt/test.xml" ?
> 

Francis,
as much as we could tell you the answer, have you tried reading the
documentation in the wiki, and the example setup bundled with Solr?

Most, if not all, of your questions are answered there.

Good luck,
B
_
{Beto|Norberto|Numard} Meijome

Computers are like air conditioners; they can't do their job properly if you 
open windows.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: how to do the distributed search with sort using solr?

2009-07-08 Thread shb
Sorry, the error is as follows. I have read the Solr wiki carefully and
googled it, but I haven't found any related question or solution. Can anyone
help me? Thanks!

org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)
Caused by: org.apache.solr.common.SolrException:


Re: Updating Solr index from XML files

2009-07-08 Thread Norberto Meijome
On Tue, 7 Jul 2009 22:16:04 -0700
Francis Yakin  wrote:

> 
> I have the following "curl" cmd to update and doing commit to Solr ( I have
> 10 xml files just for testing)

[...]

hello,
DIH supports XML, right? 

Not sure if it works with n files... but it's worth looking at.
Alternatively, you can write a relatively simple Java app that will pick each
file up and post it for you using SolrJ.
b

_
{Beto|Norberto|Numard} Meijome

"Mix a little foolishness with your serious plans;
it's lovely to be silly at the right moment."
   Horace



Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
Hello.

I posted recently in this ML a script to transform any XML file into Solr's
XML format.
Anyway,
I've got a problem when I want to index my file: the indexing script from
the demonstration works perfectly, but now the only problem is that I can't
do any search on this document.

I added



 and
 

In schema.xml file.


Did I forget something?

-- 
Saeli Mathieu.


All in one index, or multiple indexes?

2009-07-08 Thread Tim Sell
Hi,
I am wondering if it is common to have just one very large index, or
multiple smaller indexes specialized for different content types.

We currently have multiple smaller indexes, although one of them is
much larger than the others. We are considering merging them, to allow
the convenience of searching across multiple types at once and getting
them back in one list. The largest of the current indexes has a couple
of types that belong together; it has just one text field, which is
usually quite short and similar to product names (words like "The"
matter). Another index I would merge with this one has multiple text
fields (also quite short).

We of course would still like to be able to get specific types. Is
filtering on just one type a big performance hit compared to
just querying it from its own index? Bear in mind all these indexes
run on the same machine (we replicate them all to three machines and
do load balancing).

There are a number of considerations. From an application standpoint,
when querying across all types we may split the results out into the
separate types anyway once we have the list back. If we always do
this, is it silly to have them in one index, rather than querying
multiple indexes at once? Are multiple HTTP requests less significant
than the time to post-split the results?

In some ways it is easier to maintain a single index, although it has
felt easier to optimize the results for the type of content if they
are in separate indexes. My main concern of putting it all in one
index is that we'll make it harder to work with. We will definitely
want to do filtering on types sometimes, and if we go with a mashed up
index I'd prefer not to maintain separate specialized indexes as well.

Any thoughts?

~Tim.
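On the filtering question, a note from general Solr usage: restricting to one type is normally done with a filter query, which is cached separately from the main query and is cheap after the first request. A sketch, assuming the merged schema gains a "type" field (a name I'm making up for illustration):

```
q=some+product+name&fq=type:product
```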


Re: how to do the distributed search with sort using solr?

2009-07-08 Thread Mark Miller
On Wed, Jul 8, 2009 at 6:45 AM, shb  wrote:

> Sorry, the error is as follows. I have read the solr wiki carefully and
> google it, but I haven't founded
> any related question or solution,  any one can help me, thanks!
>
> org.apache.solr.client.solrj.SolrServerException: Error executing query
>at
>
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
>at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
>at test.MainClassTest.searchTest(MainClassTest.java:88)
>at test.MainClassTest.main(MainClassTest.java:48)
> Caused by: org.apache.solr.common.SolrException:
>


java.net.ConnectException: Connection refused
   at
org.apache.solr.client.solrj.
impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)

Are you sure both servers are running properly? You can hit them
individually?


-- 
-- 
- Mark

http://www.lucidimagination.com


Re: Updating Solr index from XML files

2009-07-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jul 8, 2009 at 4:19 PM, Norberto Meijome wrote:
> On Tue, 7 Jul 2009 22:16:04 -0700
> Francis Yakin  wrote:
>
>>
>> I have the following "curl" cmd to update and doing commit to Solr ( I have
>> 10 xml files just for testing)
>
> [...]
>
> hello,
> DIH supports XML, right?
Yes.
It supports multiple files too (use FileListEntityProcessor).
>
> not sure if it works with n files...but it's worth looking at it. 
> alternatively, u can write a relatively simple java app that will pick each 
> file up and post it for you using SolrJ
> b
>
> _
> {Beto|Norberto|Numard} Meijome
>
> "Mix a little foolishness with your serious plans;
> it's lovely to be silly at the right moment."
>   Horace
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
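A minimal data-config sketch for the FileListEntityProcessor approach; the paths, entity names, and XPaths here are illustrative assumptions, not a tested setup:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- the outer entity lists the files; the inner entity parses each one -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/opt/xmlfiles" fileName=".*\.xml" rootEntity="false">
      <entity name="doc" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}" forEach="/record">
        <field column="id" xpath="/record/id"/>
        <field column="title" xpath="/record/title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```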


Re: Updating Solr index from XML files

2009-07-08 Thread Erik Hatcher


On Jul 8, 2009, at 6:49 AM, Norberto Meijome wrote:
alternatively, u can write a relatively simple java app that will  
pick each file up and post it for you using SolrJ


Note that Solr ships with post.jar.  So one could post a bunch of Solr
XML files like this:


java -jar post.jar *.xml

  Erik




Re: facets and stopwords

2009-07-08 Thread JCodina



hossman wrote:
> 
> 
> but are you sure that example would actually cause a problem?
> I suspect if you index that exact sentence as-is you wouldn't see the 
> facet count for "si" or "que" increase at all.
> 
> If you do a query for "{!raw field=content}que" you bypass the query 
> parsers (which is respecting your stopwords file) and see all docs that 
> contain the raw term "que" in the content field.
> 
> if you look at some of the docs that match, and paste their content field 
> into the analysis tool, i think you'll see that the problem comes from 
> using the whitespace tokenizer, and is masked by using the WDF 
> after the stop filter ... things like "Que?" are getting ignored by the 
> stopfilter, but ultimately winding up in your index as "que"
> 
> 
> -Hoss
> 
> 

Yes, you are right: "Que?", "que", "que"... I need to change the analyzer.
They are not caught by the stop filter because I use the whitespace
tokenizer; I will use the StandardTokenizer.

Thanks Hoss

-- 
View this message in context: 
http://www.nabble.com/facets-and-stopwords-tp23952823p24390157.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Erik Hatcher


On Jul 8, 2009, at 7:06 AM, Saeli Mathieu wrote:


Hello.

I posted recently in this ML a script to transform any XML files into
Solr's XML files.
Anyway.
I've got a problem when I want to index my file: the indexing script from
the demonstration works perfectly, but now the only problem is that I can't
do any search on this document.

I added


stored="true"

multiValued="true" omitNorms="true" termVectors="true" />
and


In schema.xml file.


Did I forget something?


your field name is not the same as your copyField source (note the
"entry" in the source attribute).


Erik
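A matching pair would look like this; "entry" is the source name visible in this thread, while the type and the other attributes are illustrative:

```xml
<field name="entry" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="entry" dest="text"/>
```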



Re: how to do the distributed search with sort using solr?

2009-07-08 Thread shb
>java.net.ConnectException: Connection refused
>  at
>org.apache.solr.client.solrj.
>impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)

The "Connection refused" error occurred because the servers had been
stopped.

> Are you sure both servers are running properly? You can hit them
> individually?

I start both servers and if I comment out "query.setParam("shards",
"localhost:8983/solr, localhost:7574/solr");  " or
"query.setSortField("id",SolrQuery.ORDER.asc);", it will both
work correctly.

However, if I keep them both in the program, I got the error as follows:

  org.apache.solr.client.solrj.SolrServerException: Error executing query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:115)
at test.MainClassTest.searchTest(MainClassTest.java:88)
at test.MainClassTest.main(MainClassTest.java:48)


Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
Yep, I know that; I added more than 60 lines to this file :)
It was just an example.

Do you have any idea why, when I try to search for something, Solr
returns 0 results?

Looking forward to your reply.

-- 
Saeli Mathieu.


Re: Question regarding ExtractingRequestHandler

2009-07-08 Thread Grant Ingersoll
For metadata, you can add the ext.metadata.prefix field and then use a  
dynamic field that maps that prefix, such as:


&ext.metadata.prefix=metadata_

 stored="true"/>



Note, some of this is currently under review to be changed.  See 
https://issues.apache.org/jira/browse/SOLR-284

-Grant

On Jul 7, 2009, at 10:49 AM, ahammad wrote:



Hello,

I've recently started using this handler to index MS Word and PDF files.

When I set ext.extract.only=true, I get back all the metadata that is
associated with that file.

If I want to index, I need to set ext.extract.only=false. If I want to index
all that metadata along with the contents, what inputs do I need to pass to
the HTTP request? Do I have to specifically define all the fields in the
schema, or can Solr dynamically generate those fields?

Thanks.
--
View this message in context: 
http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search
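The dynamic-field mapping Grant describes would look something like this (the original mail's markup was stripped by the archive, so the type and attributes here are assumptions):

```xml
<dynamicField name="metadata_*" type="text" indexed="true" stored="true"/>
```

With ext.metadata.prefix=metadata_, any extracted metadata key then lands in a field whose name starts with metadata_ and is caught by this pattern.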



Placing a CSV file into SOLR Server

2009-07-08 Thread Anand Kumar Prabhakar

Is there any way to place a CSV file in the Solr server so that
the file can be indexed and searched? If so, please let me know the location
in which we have to place the file. We are looking for a workaround to avoid
the HTTP request to the Solr server, as it is taking too much time.
-- 
View this message in context: 
http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Erik Hatcher


On Jul 8, 2009, at 8:10 AM, Saeli Mathieu wrote:


Yep I know that, I almost add more than 60 lines in this file :)
It's just an example.

Do you have any idea why when I'm trying to search something, the  
result of

Solr is equal to 0 ?


The first place I start with a general question like this is to add
&debugQuery=true and see what the query expression is parsed to, then
go from there to find out if that is the actually intended query
(proper fields being used, etc.), and then work back into the analysis
process and the data that was indexed.  analysis.jsp comes in real
handy for troubleshooting these things.


Erik



Re: Placing a CSV file into SOLR Server

2009-07-08 Thread Yonik Seeley
from: http://wiki.apache.org/solr/UpdateCSV
"""
The following request will cause Solr to directly read the input file:

curl 
http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8
#NOTE: The full path, or a path relative to the CWD of the running
solr server must be used.
"""

So you can put it anywhere local and give solr the full path to
directly read it.

-Yonik
http://www.lucidimagination.com



On Wed, Jul 8, 2009 at 8:34 AM, Anand Kumar
Prabhakar wrote:
>
> Is there any way to Place the CSV file to index in the SOLR Server so that
> the file can be indexed and searched. If so please let me know the location
> in which we have to place the file. We are looking for a workaround to avoid
> the HTTP request to the SOLR server as it is taking much time.
> --
> View this message in context: 
> http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
The search debug output is a bit weird...

I'll give you a typical example.

I want to find this word: "Cycle"

The field in my XML file is this one:
"


..
Cycle 2


"

This field is referred to in my schema.xml this way.

"


"

and
" "

Here is my search in debug mode with this request:
http://localhost:8983/solr/select?indent=on&version=2.2&q=Cycle&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl.fl=


"
(response XML with its tags stripped by the list archive; the recoverable
debug information is:)

responseHeader: status=0, QTime=0
params: indent=on, version=2.2, q=Cycle, start=0, rows=10, fl=*,score,
qt=standard, wt=standard, debugQuery=on

rawquerystring: Cycle
querystring: Cycle
parsedquery: text:cycl
parsedquery_toString: text:cycl
QParser: OldLuceneQParser

(all debug timing entries are 0.0)
"

I don't know what I'm missing :/

Because I think I added all the necessary information in schema.xml.

-- 
Saeli Mathieu.


Re: Preparing the ground for a real multilang index

2009-07-08 Thread Paul Libbrecht

Can't the copy field use a different analyzer?
Both for query and indexing?
Otherwise you need to craft your own analyzer which reads the language
from the field name... there are several classes ready for this.


paul

Le 08-juil.-09 à 02:36, Michael Lackhoff a écrit :


On 08.07.2009 00:50 Jan Høydahl wrote:


itself and do not need to know the query language. You may then want
to do a copyfield from all your text_ -> text for convenient  
one-

field-to-rule-them-all search.


Would that really help? As I understand it, copyField takes the raw, not
yet analyzed field value. I cannot see yet the advantage of this
"text" field over the current situation with no text_ fields at all.
The copied-to text field has to be language agnostic with no stemming at
all, so it would miss many hits. Or is there a way to combine many
differently stemmed variants into one field, to be able to search against
all of them at once? That would be great indeed!

-Michael
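One way to read Jan's suggestion as schema.xml (the field type names here are assumptions): each per-language field keeps its own analyzer, and the raw values are also copied into a catch-all field whose own, language-agnostic analyzer is applied at index time:

```xml
<field name="text_en" type="text_en" indexed="true" stored="false"/>
<field name="text_de" type="text_de" indexed="true" stored="false"/>
<field name="text"    type="textgen" indexed="true" stored="false" multiValued="true"/>
<copyField source="text_en" dest="text"/>
<copyField source="text_de" dest="text"/>
```

As Michael notes, copyField hands over the raw value, so "text" sees unstemmed input; the catch-all trades stemmed recall for cross-language convenience.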






Re: Placing a CSV file into SOLR Server

2009-07-08 Thread Anand Kumar Prabhakar

Thank you for the input, Yonik. However, we are still sending an HTTP request
to the server, and my requirement is to skip the HTTP request to the Solr
server. Is there any way to avoid these HTTP requests?



Yonik Seeley-2 wrote:
> 
> from: http://wiki.apache.org/solr/UpdateCSV
> """
> The following request will cause Solr to directly read the input file:
> 
> curl
> http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8
> #NOTE: The full path, or a path relative to the CWD of the running
> solr server must be used.
> """
> 
> So you can put it anywhere local and give solr the full path to
> directly read it.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Wed, Jul 8, 2009 at 8:34 AM, Anand Kumar
> Prabhakar wrote:
>>
>> Is there any way to Place the CSV file to index in the SOLR Server so
>> that
>> the file can be indexed and searched. If so please let me know the
>> location
>> in which we have to place the file. We are looking for a workaround to
>> avoid
>> the HTTP request to the SOLR server as it is taking much time.
>> --
>> View this message in context:
>> http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24390648.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Placing-a-CSV-file-into-SOLR-Server-tp24390648p24391630.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr's MLT query call doesn't work

2009-07-08 Thread SergeyG

Hi,

Recently, while implementing the MoreLikeThis search, I've run into a
situation where Solr's mlt query calls don't work.

More specifically, the following query:

http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
5&mlt.interestingTerms=details&fl=title+author+score

brings back just the doc with id=10 and nothing else. While using the
GetMethod approach (putting /mlt explicitly into the URL), I got back some
results.

I've been trying to solve this problem for more than a week with no luck. If
anybody has any hint, please help.

Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
GetMethod (/select).

Thanks a lot.

Regards,
Sergey Goldberg


Here're the logs: 

a) Solr (http://localhost:8080/solr/select)
08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=
true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1
status=0 QTime=172

INFO MLTSearchRequestProcessor:49 - SolrServer url:
http://localhost:8080/solr
INFO MLTSearchRequestProcessor:67 - solrQuery>
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
5&mlt.interestingTerms=details&fl=title+author+score
INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612


b) GetMethod (http://localhost:8080/solr/mlt)
08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/mlt
params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.max
qt=5&mlt.interestingTerms=details} status=0 QTime=15

INFO MLT2SearchRequestProcessor:76 - (response XML tags stripped by the
archive; in summary:)

status=0, QTime=0; match: "SG_Book" (S.G.), score 2.098612
moreLikeThis (numFound=4, start=0, maxScore=0.28923997):
  "Four Million, The" (O. Henry, S.G.), score 0.28923997
  "The Season of Lillian Dawes" (Katherine Mosby), score 0.08667877
  "Three Men in a Boat" (Jerome K. Jerome), score 0.07947738
  "ABC's of Science" (Charles Oliver, S.G.), score 0.047219563
interestingTerms: content_mlt:ye, content_mlt:tobin, content_mlt:a,
content_mlt:i, content_mlt:his (all 1.0)



c) GetMethod (http://localhost:8080/solr/select)
08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.
maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16

INFO MLT2SearchRequestProcessor:80 - (response XML tags stripped by the
archive; in summary:)

status=0, QTime=16
params: fl=title author score, mlt.fl=content_mlt, q=id:10, mlt.maxqt=5,
mlt.interestingTerms=details
response (numFound=1): "SG_Book" (S.G.), score 2.098612
rawquerystring / parsedquery: id:10
explain:
2.098612 = (MATCH) weight(id:10 in 3), product of:
  0.9994 = queryWeight(id:10), product of:
    2.0986123 = idf(docFreq=1, numDocs=5)
    0.47650534 = queryNorm
  2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
    1.0 = tf(termFreq(id:10)=1)
    2.0986123 = idf(docFreq=1, numDocs=5)
    1.0 = fieldNorm(field=id, doc=3)
QParser: OldLuceneQParser; timing: 16.0 total, all components 0.0 except the
DebugComponent (16.0)



And here are the relevant entries from solrconfig.xml (the XML tags were
stripped by the archive; the surviving default values are):

 
   

  explicit
  id,title,author,score
  on

 

 

  1
  10

 
-- 
View this message in context: 
http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391843.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Placing a CSV file into SOLR Server

2009-07-08 Thread Yonik Seeley
On Wed, Jul 8, 2009 at 9:33 AM, Anand Kumar
Prabhakar wrote:
> Thank you for the input Yonik, anyway again we are sending an HTTP request to
> the server, my requirement is to skip the HTTP request to the SOLR server.
> Is there any way to avoid these HTTP requests?

You're sending a tiny HTTP request to the server that tells Solr to
directly read the big CSV file from disk... that should satisfy the
requirement which seemed to stem from the desire to avoid network
overhead, no?

-Yonik
http://www.lucidimagination.com


Re: Solr's MLT query call doesn't work

2009-07-08 Thread Yao Ge

A couple of things: your mlt.fl value must be part of fl. In this case,
content_mlt is not included in fl.
Also, the fl parameter value needs to be comma-separated. Try
fl=title,author,content_mlt,score

-Yao

SergeyG wrote:
> 
> Hi,
> 
> Recently, while implementing the MoreLikeThis search, I've run into the
> situation when Solr's mlt query calls don't work. 
> 
> More specifically, the following query:
> 
> http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> 5&mlt.interestingTerms=details&fl=title+author+score
> 
> brings back just the doc with id=10 and nothing else. While using the
> GetMethod approach (putting /mlt explicitely into the url), I got back
> some results.
> 
> I've been trying to solve this problem for more than a week with no luck.
> If anybody has any hint, please help.
> 
> Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
> GetMethod (/select).
> 
> Thanks a lot.
> 
> Regards,
> Sergey Goldberg
> 
> 
> Here're the logs: 
> 
> a) Solr (http://localhost:8080/solr/select)
> 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=
> true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2}
> hits=1 status=0 QTime=172
> 
> INFO MLTSearchRequestProcessor:49 - SolrServer url:
> http://localhost:8080/solr
> INFO MLTSearchRequestProcessor:67 - solrQuery>
> q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
>   5&mlt.interestingTerms=details&fl=title+author+score
> INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
> INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
> 
> 
> b) GetMethod (http://localhost:8080/solr/mlt)
> 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/mlt
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.max
> qt=5&mlt.interestingTerms=details} status=0 QTime=15
> 
> INFO MLT2SearchRequestProcessor:76 -  encoding="UTF-8"?>
> 
> 0 name="QTime">0 maxScore="2.098612">2.098612S.G. name="title">SG_Book umFound="4" start="0" maxScore="0.28923997"> name="score">0.28923997O.
> HenryS.G.Four Million,
> The0.08667877 name="author">Katherine MosbyThe Season
> of Lillian Dawes name="score">0.07947738Jerome K.
> JeromeThree Men in a
> Boat name="score">0.047219563Charles
> OliverS.G.ABC's of
> Science name="content_mlt:ye">1.0 name="content_mlt:tobin">1.0 name="content_mlt:a">1.0 name="content_mlt:i">1.0 name="content_mlt:his">1.0
> 
> 
> 
> c) GetMethod (http://localhost:8080/solr/select)
> 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.
> maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
> 
> INFO MLT2SearchRequestProcessor:80 -  encoding="UTF-8"?>
> 
> 0 name="QTime">16title author
> scorecontent_mlt name="q">id:105 name="mlt.interestingTerms">details name="response" numFound="1" start="0" maxScore="2.098612"> name="score">2.098612S.G. name="title">SG_Book name="rawquerystring">id:10id:10 name="parsedq
> uery">id:10id:10 name="explain">
> 2.098612 = (MATCH) weight(id:10 in 3), product of:
>   0.9994 = queryWeight(id:10), product of:
> 2.0986123 = idf(docFreq=1, numDocs=5)
> 0.47650534 = queryNorm
>   2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
> 1.0 = tf(termFreq(id:10)=1)
> 2.0986123 = idf(docFreq=1, numDocs=5)
> 1.0 = fieldNorm(field=id, doc=3)
> OldLuceneQParser name="timing">16.0 name="time">0.0 name="org.apache.solr.handler.component.QueryComponent"> name="time">0.0 name="org.apache.solr.handler.component.FacetComponent"> name="time">0.00.0 name="org.apache.solr.handler.component.HighlightComponent"> name="time">0.0 name="org.apache.solr.handler.component.DebugComponent"> name="time">0.0 name="time">16.0 name="org.apache.solr.handler.component.QueryComponent"> name="time">0.0 name="org.apache.solr.handler.component.FacetComponent"> name="time">0.0 name="org.apache.solr.handler.component.MoreLikeThisComponent"> name="time">0.0 name="org.apache.solr.handler.component.HighlightComponent"> name="time">0.0 name="org.apache.solr.handler.component.DebugComponent"> name="time">16.0
> 
> 
> 
> And here're the relevant entries from solrconfig.xml:
> 
>   default="true">
>
> 
>   explicit
>   id,title,author,score
>   on
> 
>  
> 
>  
> 
>   1
>   10
> 
>  
> 

-- 
View this message in context: 
http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391918.html
Sent from the Solr - User mailing list archive at Nabble.com.
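Putting Yao's two fixes together, the corrected request would look like this (a sketch, keeping the original's other parameters):

```
http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title,author,content_mlt,score
```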



Re: about defaultSearchField

2009-07-08 Thread Yang Lin
Thanks for your reply, but it doesn't work.

Yang

2009/7/8 Yao Ge 

>
> Try with fl=* or fl=*,score added to your request string.
> -Yao
>
> Yang Lin-2 wrote:
> >
> > Hi,
> > I have some problems.
> > For my Solr program, I want to type only the query string and get
> > results from all fields that include the query string. But now I can't
> > get any result without a specified field. For example, a query for
> > "tina" gets nothing, but "Sentence:tina" works.
> >
> > I have adjusted the *schema.xml* like this:
> >
> > 
> >> >> stored="true" multiValued="true"/>
> >> >> stored="true" multiValued="true"/>
> >> >> stored="true" multiValued="true"/>
> >> >> multiValued="true"/>
> >>
> >> >> multiValued="true"/>
> >> 
> >>
> >> Sentence
> >>
> >>  
> >>  allText
> >>
> >>  
> >>  
> >>
> >> 
> >> 
> >> 
> >> 
> >
> >
> > I think the problem is in , but I don't know how to
> > fix
> > it. Could anyone help me?
> >
> > Thanks
> > Yang
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
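A catch-all setup of the kind Yao is hinting at usually looks like this in schema.xml (a sketch using the field names from the thread; the exact type name is an assumption):

```xml
<!-- hypothetical sketch: copy each searchable field into one catch-all -->
<field name="Sentence" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="allText" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="Sentence" dest="allText"/>

<!-- unqualified queries such as q=tina now search allText -->
<defaultSearchField>allText</defaultSearchField>
```

Note that copyField is applied at index time, so documents must be re-indexed after a change like this.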


Using relevance scores for psuedo-random-probabilistic ordenation

2009-07-08 Thread Raimon Bosch


Hi,

I've just implemented my PseudoRandomFieldComparator (migrated from
PseudoRandomComparatorSource). The problem that I see is that I don't have
acces to the relevance's scores in the deprecated
PseudoRandomComparatorSource. I'm trying to fill the scores from my
PseudoRandomComponent (in the process() method).

I don't know if use a PseudoRandomComparator that extends from
QueryComponent and then repeat the query or sth similar like reorder my
doclist, or if use two diferent components QueryComponent and
PseudoComponent (extends from SearchComponent) and look for a good
combination.

How can I have my relevance scores on my PseudoRandomFieldComparator? Any
ideas?


Regards,
Raimon Bosch.
-- 
View this message in context: 
http://www.nabble.com/Using-relevance-scores-for-psuedo-random-probabilistic-ordenation-tp24392432p24392432.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
I don't really know how to solve my problem :/

On Wed, Jul 8, 2009 at 3:16 PM, Saeli Mathieu wrote:

> The search debug output is a bit weird...
>
> I'll give you a typical example.
>
> I want to find this word: "Cycle"
>
> the field in my xml file is this one
> "
> 
> 
> ..
> Cycle 2
> 
> 
> "
>
> This field is referred to in my schema.xml this way.
>
> "
> 
>  indexed="true" stored="true" multiValued="true" omitNorms="true"
> termVectors="true" />
> "
>
> and
> "  dest="text"/>"
>
> Here is my search, run in debug mode, with this request
> http://localhost:8983/solr/select?indent=on&version=2.2&q=Cycle&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl.fl=
>
>
> "
> [debug response stripped by the archive; the recoverable parts show the
> query "Cycle" parsed as "text:cycl" by OldLuceneQParser, with all timing
> values at 0.0]
> "
>
> I don't know what I'm missing :/
>
> Because I think I added all the necessary information in schema.xml.
>
> --
> Saeli Mathieu.
>



-- 
Saeli Mathieu.


Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread ahammad

Hello,

I can index rich documents like pdf for instance that are on the filesystem.
Can we use ExtractingRequestHandler to index files that are accessible on a
website?

For example, there is a file that can be reached like so:
http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

How would I go about indexing that file? I tried using the following
combinations. I will put the errors in brackets:

stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The
filename, directory name, or volume label syntax is incorrect)
stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system
cannot find the path specified)
stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format of
the specified network name is invalid)
stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot
find the path specified)
stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network path
was not found)

I sort of understand why I get those errors. What are the alternative
methods of doing this? I am guessing that the stream.file attribute doesn't
support web addresses. Is there another attribute that does?
-- 
View this message in context: 
http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding new Fields ?

2009-07-08 Thread Jon Gorman
I think at least you need to review your import process.  If nothing
indexed, there's going to be nothing that matched.  We need a little
more information.  Stuff like a short but concise test sample of what
you're trying to index, how you're submitting the http request and the
commit request (you did commit, right?), what messages you're getting
when you do index and then commit.

I didn't look too closely at your last code example, but I would
recommend using some XML libraries; if I remember correctly, it didn't use any.

Most folks seem to process xml files for indexing by using the source
xml files to create new files just for indexing.  There's an
identifier, which is usually used to link back to the source xml file
in the application you design.
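Jon's "use an XML library" point can be sketched like this (a hypothetical helper, not from the thread; the "keywords" field name is made up), so that characters such as & are escaped automatically instead of producing invalid add documents:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AddDocBuilder {
    // Builds an <add><doc>...</doc></add> message for /solr/update.
    public static String build(String id, String keywords) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element add = doc.createElement("add");
            doc.appendChild(add);
            Element docEl = doc.createElement("doc");
            add.appendChild(docEl);
            docEl.appendChild(field(doc, "id", id));
            docEl.appendChild(field(doc, "keywords", keywords));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static Element field(Document doc, String name, String value) {
        Element f = doc.createElement("field");
        f.setAttribute("name", name);
        f.setTextContent(value); // the DOM layer escapes &, <, > for us
        return f;
    }

    public static void main(String[] args) {
        System.out.println(build("42", "Cycle 2"));
    }
}
```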

Jon Gorman


Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Glen Newton
Try putting all the PDF URLs into a file, download with something like
'wget' then index locally.

Glen Newton
http://zzzoot.blogspot.com/

2009/7/8 ahammad :
>
> Hello,
>
> I can index rich documents like pdf for instance that are on the filesystem.
> Can we use ExtractingRequestHandler to index files that are accessible on a
> website?
>
> For example, there is a file that can be reached like so:
> http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf
>
> How would I go about indexing that file? I tried using the following
> combinations. I will put the errors in brackets:
>
> stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The
> filename, directory name, or volume label syntax is incorrect)
> stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system
> cannot find the path specified)
> stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format of
> the specified network name is invalid)
> stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot
> find the path specified)
> stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network path
> was not found)
>
> I sort of understand why I get those errors. What are the alternative
> methods of doing this? I am guessing that the stream.file attribute doesn't
> support web addresses. Is there another attribute that does?
> --
> View this message in context: 
> http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>





Re: Adding new Fields ?

2009-07-08 Thread Erick Erickson
Have you thought about looking at your index with Luke to see if what you
expect to be there is actually there?

Best
Erick

On Wed, Jul 8, 2009 at 11:28 AM, Jon Gorman wrote:

> I think at least you need to review your import process.  If nothing
> indexed, there's going to be nothing that matched.  We need a little
> more information.  Stuff like a short but concise test sample of what
> you're trying to index, how you're submitting the http request and the
> commit request (you did commit, right?), what messages you're getting
> when you do index and then commit.
>
> I didn't look too closely at your last code example, but I would
> recommend using some XML libraries; if I remember correctly, it didn't use any.
>
> Most folks seem to process xml files for indexing by using the source
> xml files to create new files just for indexing.  There's an
> identifier, which is usually used to link back to the source xml file
> in the application you design.
>
> Jon Gorman
>


SolrException - Lock obtain timed out, no leftover locks

2009-07-08 Thread danben

Hi,

I'm running Solr 1.3.0 in multicore mode and feeding it data from which the
core name is inferred from a specific field.  My service extracts the core
name and, if it has not seen it before, issues a create request for that
core before attempting to add the document (via SolrJ).  I have a pool of
MyIndexers that run in parallel, taking documents from a queue and adding
them via the add method on the SolrServer instance corresponding to that
core (exactly one per core exists).  Each core is in a separate data
directory.  My timeouts are set as such:

15000
25000

I remove the index directories, start the server, check that no locks exist,
and generate ~500 documents spread across 5 cores for the MyIndexers to
handle.  Each time, I see one or more exceptions with a message like 

Lock_obtain_timed_out_SimpleFSLockmulticoreNewUser3dataindexlucenebd4994617386d14e2c8c29e23bcca719writelock__orgapachelucenestoreLockObtainFailedException_Lock_obtain_timed_out_...

When the indexers have completed, no lock is left over.  There is no
discernible pattern as far as when the exception occurs (ie, it does not
tend to happen on the first or last or any particular document).

Interestingly, this problem does not happen when I have only a single
MyIndexer, or if I have a pool of MyIndexers and am running in single core
mode.  

I've looked at the other posts from users getting this exception but it
always seemed to be a different case, such as the server having crashed
previously and a lock file being left over.
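As an aside, the "exactly one SolrServer per core" invariant described above is easy to violate when cores are created lazily from parallel threads, and two writers on one index directory is exactly what produces this lock error. A minimal sketch of one way to hold the invariant (a hypothetical helper, not from the post, using plain Object in place of SolrServer to stay self-contained):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CoreServers {
    private final Map<String, Object> servers = new ConcurrentHashMap<>();
    private final Function<String, Object> factory;

    public CoreServers(Function<String, Object> factory) {
        this.factory = factory;
    }

    // computeIfAbsent runs the factory at most once per core name, even
    // when several indexer threads ask for the same core concurrently.
    public Object serverFor(String coreName) {
        return servers.computeIfAbsent(coreName, factory);
    }

    public static void main(String[] args) {
        CoreServers cs = new CoreServers(name -> new Object());
        System.out.println(cs.serverFor("core1") == cs.serverFor("core1")); // true
    }
}
```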

-- 
View this message in context: 
http://www.nabble.com/SolrException---Lock-obtain-timed-out%2C-no-leftover-locks-tp24393255p24393255.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: about defaultSearchField

2009-07-08 Thread Jay Hill
Just to be sure: You mentioned that you "adjusted" schema.xml - did you
re-index after making your changes?

-Jay


On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin  wrote:

> Thanks for your reply, but it doesn't work.
>
> Yang
>
> 2009/7/8 Yao Ge 
>
> >
> > Try with fl=* or fl=*,score added to your request string.
> > -Yao
> >
> > Yang Lin-2 wrote:
> > >
> > > Hi,
> > > I have some problems.
> > > For my Solr program, I want to type only the query string and get all
> > > field results that include the query string. But now I can't get any
> > > result without a specified field. For example, a query for "tina" gets
> > > nothing, but "Sentence:tina" does.
> > >
> > > I have adjusted the *schema.xml* like this:
> > >
> > > 
> > >> > >> stored="true" multiValued="true"/>
> > >> > >> stored="true" multiValued="true"/>
> > >> > >> stored="true" multiValued="true"/>
> > >> > >> multiValued="true"/>
> > >>
> > >> > >> multiValued="true"/>
> > >> 
> > >>
> > >> Sentence
> > >>
> > >>  
> > >>  allText
> > >>
> > >>  
> > >>  
> > >>
> > >> 
> > >> 
> > >> 
> > >> 
> > >
> > >
> > > I think the problem is in , but I don't know how to
> > > fix
> > > it. Could anyone help me?
> > >
> > > Thanks
> > > Yang
> > >
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>


Re: Adding new Fields ?

2009-07-08 Thread Saeli Mathieu
Here is my result when I'm adding a file to solr

{...@framboise.}:java -jar post.jar
FinalParsing.xml
[18:37]#25
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file FinalParsing.xml
SimplePostTool: COMMITting Solr index changes..
{...@framboise.}:


Here is my typical xml file.

[add-doc XML markup stripped by the archive: one field value "0" followed
by roughly sixty field values of "TEXT"]

here is my schema.xml configuration.

[schema.xml markup stripped by the archive]
-- 
Saeli Mathieu.


expand synonyms without tokenizing stream?

2009-07-08 Thread Don Clore
I'm pretty new to solr; my apologies if this is a naive question, and my
apologies for the verbosity:
I'd like to take keywords in my documents, and expand them as synonyms; for
example, if the document gets annotated with a keyword of 'sf', I'd like
that to expand to 'San Francisco'.  (San Francisco,San Fran,SF is a line in
my synonyms.txt file).

But I also want to be able to display facets with counts for these keywords;
I'd like them to be suitable for display.

So, if I define the keywords field as 'text', I use the following pipeline
(from my schema.xml):

  


  




Faceting on this field, I get return values (when I query specifically
for the single document in question):

  
1
1
1
1
  

I've also done a copyfield to a 'KeywordsString' field, which is
defined as "string". i.e.



Faceting on *that* field (when querying for just this 1 document,
which has a keyword of 'sf'), results in:

  
1
  

I guess what I'd like to see is the ability to stamp keywords like
'sf', 'san fran', 'san francisco', and 'mlb' (with a synonyms.txt file
entry of mlb => Major League Baseball, and see all the documents that
are inscribed with all those synonym variants, come back as:

  
1

   1




But, I don't know how to define a processing pipeline that expands
synonyms that doesn't tokenize them, breaking 'San Francisco' into
'san' and 'francisco', and presenting those as separate facets.

Thanks for any help,

Don
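One commonly suggested direction here is a field type whose analyzer never splits the value, so multi-word entries survive as single facet terms. This is a sketch, not a tested recipe: whether SynonymFilterFactory matches a multi-word synonym against the single token that KeywordTokenizer emits depends on the Solr version and on how the synonym file is parsed. With expand="false", every variant on a line collapses to the first entry ("San Francisco" in the synonyms line from the thread).

```xml
<!-- hypothetical field type: the whole field value stays one token, so
     values like "San Francisco" are never split into "san"/"francisco" -->
<fieldType name="keywordSynonyms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```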


Re: Multiple values for custom fields provided in SOLR query

2009-07-08 Thread Otis Gospodnetic

Suryasnat,

I suggest you go to your Solr Admin page and run a few searches from there, 
using Lucene query syntax (link on Lucene site).
e.g.
fieldID:111 AND fieldID:222 AND fieldID:333 AND foo:product

then replace ANDs with ORs where appropriate

That should give you an idea/feel about which query you need.
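One concrete way to combine the two requirements from the thread (restrict to a set of fileIDs while still applying the search terms) can be sketched as a small query builder. This is a hypothetical helper, with the fileID field name taken from the thread:

```java
import java.util.Arrays;
import java.util.List;

public class FileIdQuery {
    // fileID:(111 OR 222 OR 333) matches documents whose multivalued
    // fileID field contains any of the given ids; the leading "+" marks
    // each clause as required, so both must match.
    public static String buildQuery(String userQuery, List<String> fileIds) {
        String idClause = "fileID:(" + String.join(" OR ", fileIds) + ")";
        return "+" + idClause + " +(" + userQuery + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("product", Arrays.asList("111", "222", "333")));
        // prints: +fileID:(111 OR 222 OR 333) +(product)
    }
}
```

The resulting string goes into the q parameter (URL-encoded) of the select request.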

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Suryasnat Das 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, July 7, 2009 12:16:30 PM
> Subject: Re: Multiple values for custom fields provided in SOLR query
> 
> Hi Otis,
> 
> Thanks for replying to my query.
> 
> My query is, if multiple values are provided for a custom field then how can
> it be represented in a SOLR query. So if my field is fileID and its values
> are 111, 222 and 333 and my search string is ‘product’ then how can this be
> represented in a SOLR query? I want to perform the search on basis of
> fileIDs *and* search string provided.
> 
> If i provide the query in the format,
> q=fileID:111+fileID:222+fileID:333+product, then how will it actually
> search? Can you please provide me the correct format of the query?
> 
> Regards
> 
> Suryasnat Das
> 
> On Mon, Jul 6, 2009 at 10:05 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
> 
> >
> > I actually don't fully understand your question.
> > q=+fileID:111+fileID:222+fileID:333+apple looks like a valid query to me.
> > (not sure what that space encoded as + is, though)
> >
> > Also not sure what you mean by:
> > > Basically the requirement is , if fileIDs are provided as search
> > parameter
> > > then search should happen on the basis of fileID.
> >
> >
> > Do you mean "apple" should be ignored if a term (field name:field value) is
> > provided?
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> > > From: Suryasnat Das 
> > > To: solr-user@lucene.apache.org
> > > Sent: Monday, July 6, 2009 11:31:10 AM
> > > Subject: Multiple values for custom fields provided in SOLR query
> > >
> > > Hi,
> > > I have a requirement in which i need to have multiple values in my custom
> > > fields while forming the search query to SOLR. For example,
> > > fileID is my custom field. I have defined the fileID in schema.xml as
> > > name="fileID" type="string" indexed="true" stored="true" required="true"
> > > multiValued="true"/>.
> > > Now fileID can have multiple values like 111,222,333 etc. So will my
> > query
> > > be of the form,
> > >
> > > q=+fileID:111+fileID:222+fileID:333+apple
> > >
> > > where apple is my search query string. I tried with the above query but
> > it
> > > did not work. SOLR gave invalid query error.
> > > Basically the requirement is , if fileIDs are provided as search
> > parameter
> > > then search should happen on the basis of fileID.
> > >
> > > Is my approach correct or i need to do something else? Please, if
> > immediate
> > > help is provided then that would be great.
> > >
> > > Regards
> > > Suryasnat Das
> > > Infosys.
> >
> >



Re: Solr's MLT query call doesn't work

2009-07-08 Thread Otis Gospodnetic

Sergey,

What about 
http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt

?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: SergeyG 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, July 8, 2009 9:44:20 AM
> Subject: Solr's MLT query call doesn't work
> 
> 
> Hi,
> 
> Recently, while implementing the MoreLikeThis search, I've run into the
> situation when Solr's mlt query calls don't work. 
> 
> More specifically, the following query:
> 
> http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> 5&mlt.interestingTerms=details&fl=title+author+score
> 
> brings back just the doc with id=10 and nothing else. While using the
> GetMethod approach (putting /mlt explicitly into the URL), I got back some
> results.
> 
> I've been trying to solve this problem for more than a week with no luck. If
> anybody has any hint, please help.
> 
> Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
> GetMethod (/select).
> 
> Thanks a lot.
> 
> Regards,
> Sergey Goldberg
> 
> 
> Here're the logs: 
> 
> a) Solr (http://localhost:8080/solr/select)
> 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=
> true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2} hits=1
> status=0 QTime=172
> 
> INFO MLTSearchRequestProcessor:49 - SolrServer url:
> http://localhost:8080/solr
> INFO MLTSearchRequestProcessor:67 - solrQuery>
> q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> 5&mlt.interestingTerms=details&fl=title+author+score
> INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
> INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
> 
> 
> b) GetMethod (http://localhost:8080/solr/mlt)
> 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/mlt
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.max
> qt=5&mlt.interestingTerms=details} status=0 QTime=15
> 
> INFO MLT2SearchRequestProcessor:76 - 
> 
> 0
> name="QTime">0
> maxScore="2.098612">2.098612S.G.
> name="title">SG_Book
> umFound="4" start="0" maxScore="0.28923997">
> name="score">0.28923997O.
> HenryS.G.Four Million,
> The0.08667877
> name="author">Katherine MosbyThe Season
> of Lillian Dawes0.07947738
> name="author">Jerome K. JeromeThree Men
> in a Boat
> name="score">0.047219563Charles
> OliverS.G.ABC's of
> Science
> name="content_mlt:ye">1.0
> name="content_mlt:tobin">1.0
> name="content_mlt:a">1.0
> name="content_mlt:i">1.0
> name="content_mlt:his">1.0
> 
> 
> 
> c) GetMethod (http://localhost:8080/solr/select)
> 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.
> maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
> 
> INFO MLT2SearchRequestProcessor:80 - 
> 
> 0
> name="QTime">16title author
> scorecontent_mltid:10
> name="mlt.maxqt">5
> name="mlt.interestingTerms">details
> numFound="1" start="0" maxScore="2.098612">
> name="score">2.098612S.G.
> name="title">SG_Book
> name="rawquerystring">id:10id:10
> name="parsedq
> uery">id:10id:10
> name="explain">
> 2.098612 = (MATCH) weight(id:10 in 3), product of:
>   0.9994 = queryWeight(id:10), product of:
> 2.0986123 = idf(docFreq=1, numDocs=5)
> 0.47650534 = queryNorm
>   2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
> 1.0 = tf(termFreq(id:10)=1)
> 2.0986123 = idf(docFreq=1, numDocs=5)
> 1.0 = fieldNorm(field=id, doc=3)
> OldLuceneQParser
> [timing debug output stripped by the archive]
> 
> 
> 
> And here're the relevant entries from solrconfig.xml:
> 
> 
>   
> 
>   explicit
>   id,title,author,score
>   on
> 
> 
> 
> 
> 
>   1
>   10
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Solr%27s-MLT-query-call-doesn%27t-work-tp24391843p24391843.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Jay Hill
I haven't tried this myself, but it sounds like what you're looking for is
enabling remote streaming:
http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf

As the link above shows you should be able to enable remote streaming like
this: [the config element was stripped by the archive] and then something like this might work:
stream.url=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

So you use stream.url instead of stream.file.
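Concretely, the two pieces might look like this (a sketch; the upload limit value and literal.id are made-up examples, and the document URL is the one from the question):

```xml
<!-- solrconfig.xml: allow Solr to fetch content from a remote URL -->
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048"/>

<!-- the extract request then passes the document's URL, e.g.
  /solr/update/extract?literal.id=testfile&commit=true
      &stream.url=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf
-->
```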

Hope this helps.

-Jay


On Wed, Jul 8, 2009 at 7:40 AM, ahammad  wrote:

>
> Hello,
>
> I can index rich documents like pdf for instance that are on the
> filesystem.
> Can we use ExtractingRequestHandler to index files that are accessible on a
> website?
>
> For example, there is a file that can be reached like so:
> http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf
>
> How would I go about indexing that file? I tried using the following
> combinations. I will put the errors in brackets:
>
> stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The
> filename, directory name, or volume label syntax is incorrect)
> stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The system
> cannot find the path specified)
> stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf (The format
> of
> the specified network name is invalid)
> stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf (The system cannot
> find the path specified)
> stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf (The network
> path
> was not found)
>
> I sort of understand why I get those errors. What are the alternative
> methods of doing this? I am guessing that the stream.file attribute doesn't
> support web addresses. Is there another attribute that does?
> --
> View this message in context:
> http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr's MLT query call doesn't work

2009-07-08 Thread Bill Au
You definitely need "mlt=true" if you are not using /solr/mlt.

Bill

On Wed, Jul 8, 2009 at 2:14 PM, Otis Gospodnetic  wrote:

>
> Sergey,
>
> What about
> http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> 5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt
>
> ?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: SergeyG 
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, July 8, 2009 9:44:20 AM
> > Subject: Solr's MLT query call doesn't work
> >
> >
> > Hi,
> >
> > Recently, while implementing the MoreLikeThis search, I've run into the
> > situation when Solr's mlt query calls don't work.
> >
> > More specifically, the following query:
> >
> >
> http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> > 5&mlt.interestingTerms=details&fl=title+author+score
> >
> > brings back just the doc with id=10 and nothing else. While using the
> > GetMethod approach (putting /mlt explicitly into the URL), I got back
> some
> > results.
> >
> > I've been trying to solve this problem for more than a week with no luck.
> If
> > anybody has any hint, please help.
> >
> > Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
> > GetMethod (/select).
> >
> > Thanks a lot.
> >
> > Regards,
> > Sergey Goldberg
> >
> >
> > Here're the logs:
> >
> > a) Solr (http://localhost:8080/solr/select)
> > 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
> > INFO: [] webapp=/solr path=/select
> > params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=
> > true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2}
> hits=1
> > status=0 QTime=172
> >
> > INFO MLTSearchRequestProcessor:49 - SolrServer url:
> > http://localhost:8080/solr
> > INFO MLTSearchRequestProcessor:67 - solrQuery>
> > q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> > 5&mlt.interestingTerms=details&fl=title+author+score
> > INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
> > INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
> >
> >
> > b) GetMethod (http://localhost:8080/solr/mlt)
> > 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
> > INFO: [] webapp=/solr path=/mlt
> > params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.max
> > qt=5&mlt.interestingTerms=details} status=0 QTime=15
> >
> > INFO MLT2SearchRequestProcessor:76 -
> >
> > 0
> > name="QTime">0
> > maxScore="2.098612">2.098612S.G.
> > name="title">SG_Book
> > umFound="4" start="0" maxScore="0.28923997">
> > name="score">0.28923997O.
> > HenryS.G.Four Million,
> > The0.08667877
> > name="author">Katherine MosbyThe Season
> > of Lillian Dawes0.07947738
> > name="author">Jerome K. JeromeThree Men
> > in a Boat
> > name="score">0.047219563Charles
> > OliverS.G.ABC's of
> > Science
> > name="content_mlt:ye">1.0
> > name="content_mlt:tobin">1.0
> > name="content_mlt:a">1.0
> > name="content_mlt:i">1.0
> > name="content_mlt:his">1.0
> >
> >
> >
> > c) GetMethod (http://localhost:8080/solr/select)
> > 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
> > INFO: [] webapp=/solr path=/select
> > params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.
> > maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
> >
> > INFO MLT2SearchRequestProcessor:80 -
> >
> > 0
> > name="QTime">16title author
> > scorecontent_mltid:10
> > name="mlt.maxqt">5
> > name="mlt.interestingTerms">details
> > numFound="1" start="0" maxScore="2.098612">
> > name="score">2.098612S.G.
> > name="title">SG_Book
> > name="rawquerystring">id:10id:10
> > name="parsedq
> > uery">id:10id:10
> > name="explain">
> > 2.098612 = (MATCH) weight(id:10 in 3), product of:
> >   0.9994 = queryWeight(id:10), product of:
> > 2.0986123 = idf(docFreq=1, numDocs=5)
> > 0.47650534 = queryNorm
> >   2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
> > 1.0 = tf(termFreq(id:10)=1)
> > 2.0986123 = idf(docFreq=1, numDocs=5)
> > 1.0 = fieldNorm(field=id, doc=3)
> > OldLuceneQParser
> > [timing debug output stripped by the archive]

Re: about defaultSearchField

2009-07-08 Thread Yang Lin
Yes, I have deleted whole "index" directory and re-index after making
changes.

Yang


2009/7/8 Jay Hill 

> Just to be sure: You mentioned that you "adjusted" schema.xml - did you
> re-index after making your changes?
>
> -Jay
>
>
> On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin  wrote:
>
> > Thanks for your reply, but it doesn't work.
> >
> > Yang
> >
> > 2009/7/8 Yao Ge 
> >
> > >
> > > Try with fl=* or fl=*,score added to your request string.
> > > -Yao
> > >
> > > Yang Lin-2 wrote:
> > > >
> > > > Hi,
> > > > I have some problems.
> > > > For my Solr program, I want to type only the query string and get all
> > > > field results that include the query string. But now I can't get any
> > > > result without a specified field. For example, a query for "tina" gets
> > > > nothing, but "Sentence:tina" does.
> > > >
> > > > I have adjusted the *schema.xml* like this:
> > > >
> > > > 
> > > >> > > >> stored="true" multiValued="true"/>
> > > >> > > >> stored="true" multiValued="true"/>
> > > >> > > >> stored="true" multiValued="true"/>
> > > >> > > >> multiValued="true"/>
> > > >>
> > > >> > > >> multiValued="true"/>
> > > >> 
> > > >>
> > > >> Sentence
> > > >>
> > > >>  
> > > >>  allText
> > > >>
> > > >>  
> > > >>  
> > > >>
> > > >> 
> > > >> 
> > > >> 
> > > >> 
> > > >
> > > >
> > > > I think the problem is in , but I don't know how
> to
> > > > fix
> > > > it. Could anyone help me?
> > > >
> > > > Thanks
> > > > Yang
> > > >
> > > >
> > >
> > > --
> > > View this message in context:
> > >
> http://www.nabble.com/about-defaultSearchField-tp24382105p24384615.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> > >
> >
>


Re: Solr's MLT query call doesn't work

2009-07-08 Thread SergeyG

Many thanks to everybody who replied to my message. 

1. "A couple of things, your mlt.fl value, must be part of fl. In this case,
content_mlt is not included in fl.
I think the fl parameter value needs to be comma separated. Try
fl=title,author,content_mlt,score"

Yao, 

Although I don't understand why mlt.fl must be part of fl (at least, I
didn't see this mentioned anywhere), I included this field into fl. But this
didn't change anything. As to the syntax, both
"fl=title,author,content_mlt,score" and "fl=title author content_mlt score"
produced the same output (which, again, was exactly the same as the one
with "fl=title author score").

2. "You definitely need "mlt=true" if you are not using /solr/mlt."

Bill, 

"mlt=true" was included in the query while making the Solr call from the
very beginning.

3. What about 
http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=%0A5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt

Otis,

I tried that too and got this:

INFO MLTSearchRequestProcessor:69 - solrQuery>
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
5&mlt.interestingTerms=details&fl=title+author+score&qt=mlt
ERROR MLTSearchRequestProcessor:88 - Error executing query

INFO MLTSearchRequestProcessor:69 - solrQuery>
q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+content_mlt+score&qt=mlt
ERROR MLTSearchRequestProcessor:88 - Error executing query


Well, I didn't expect this to be such a hurdle. And I'm sure that hundreds
of people before me have already done something similar, haven't they? This
really looks bizarre.

Thank you all. (Otis, when I saw your name I got a feeling that it was just
a matter of seconds till these stubborn calls would start doing their job. :)
)

Sergey 



SergeyG wrote:
> 
> Hi,
> 
> Recently, while implementing the MoreLikeThis search, I've run into the
> situation when Solr's mlt query calls don't work. 
> 
> More specifically, the following query:
> 
> http://localhost:8080/solr/select?q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
> 5&mlt.interestingTerms=details&fl=title+author+score
> 
> brings back just the doc with id=10 and nothing else. While using the
> GetMethod approach (putting /mlt explicitely into the url), I got back
> some results.
> 
> I've been trying to solve this problem for more than a week with no luck.
> If anybody has any hint, please help.
> 
> Below, I put logs & outputs from 3 runs: a) Solr; b) GetMethod (/mlt); c)
> GetMethod (/select).
> 
> Thanks a lot.
> 
> Regards,
> Sergey Goldberg
> 
> 
> Here're the logs: 
> 
> a) Solr (http://localhost:8080/solr/select)
> 08.07.2009 15:50:33 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt=
> true&mlt.interestingTerms=details&mlt.maxqt=5&wt=javabin&version=2.2}
> hits=1 status=0 QTime=172
> 
> INFO MLTSearchRequestProcessor:49 - SolrServer url:
> http://localhost:8080/solr
> INFO MLTSearchRequestProcessor:67 - solrQuery>
> q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=
>   5&mlt.interestingTerms=details&fl=title+author+score
> INFO MLTSearchRequestProcessor:73 - Number of docs found = 1
> INFO MLTSearchRequestProcessor:77 - title = SG_Book; score = 2.098612
> 
> 
> b) GetMethod (http://localhost:8080/solr/mlt)
> 08.07.2009 16:55:44 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/mlt
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.max
> qt=5&mlt.interestingTerms=details} status=0 QTime=15
> 
> INFO MLT2SearchRequestProcessor:76 - [XML tags stripped by the mail
> archive; recovered values:] status=0, QTime=0
>   match: numFound=1, start=0, maxScore=2.098612
>     score=2.098612, S.G., SG_Book
>   response: numFound=4, start=0, maxScore=0.28923997
>     score=0.28923997, O. Henry / S.G., Four Million, The
>     score=0.08667877, Katherine Mosby, The Season of Lillian Dawes
>     score=0.07947738, Jerome K. Jerome, Three Men in a Boat
>     score=0.047219563, Charles Oliver / S.G., ABC's of Science
>   interestingTerms: content_mlt:ye=1.0, content_mlt:tobin=1.0,
>     content_mlt:a=1.0, content_mlt:i=1.0, content_mlt:his=1.0
> 
> 
> c) GetMethod (http://localhost:8080/solr/select)
> 08.07.2009 17:06:45 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={fl=title+author+score&mlt.fl=content_mlt&q=id:10&mlt.
> maxqt=5&mlt.interestingTerms=details} hits=1 status=0 QTime=16
> 
> INFO MLT2SearchRequestProcessor:80 - [XML tags stripped by the mail
> archive; recovered values:] status=0, QTime=16; fl=title author score,
> mlt.fl=content_mlt, q=id:10, mlt.maxqt=5, mlt.interestingTerms=details;
> response: numFound=1, start=0, maxScore=2.098612; doc: score=2.098612,
> S.G., SG_Book; rawquerystring=id:10, parsedquery=id:10; explain:
> 2.098612 = (MATCH) weight(id:10 in 3), product of:
>   0.9994 = queryWeight(id:10), product of:
> 2.0986123 = idf(docFreq=1, numDocs=5)
> 0.47650534 = queryNorm
>   2.0986123 = (MATCH) fieldWeight(id:10 in 3),

boosting MoreLikeThis results

2009-07-08 Thread Bill Au
I have a need to boost MoreLikeThis results.  So far the only boost that I
have come across is boosting query field by using mlt.qf.  But what I really
need is to use boost query and boost function like those in the
DisMaxRequestHandler.  Is that possible at all either out-of-the-box or by
writing a custom handler, or are we limited by the MoreLikeThis of Lucene?

Bill


Best way to integrate custom functionality

2009-07-08 Thread Andrew Nguyen

Hello all,

I am working on a project that involves searching through free-text  
fields and would like to add the ability to filter out negative  
expressions at a very simple level.  For example, the field may  
contain the text, "person has no cars."  If the user were to search  
for "cars," I would like to be able to intercept the results and  
return only those without the word "no" in front of the search term.   
While this is a very simple example, it's pretty much my end goal.


I've been reading up on the various hooks provided within Solr but  
wanted to get some guidance on the best way to proceed.


Thanks!

--Andrew


Boosting for most recent documents

2009-07-08 Thread vivek sar
Hi,

  I'm trying to find a way to get the most recent entry for the
searched word. For ex., if I have a document with field name "user".
If I search for user:vivek, I want to get the document that was
indexed most recently. Two ways I could think of,

1) Sort by some time stamp field - but with millions of documents this
becomes a huge memory problem as we have seen OOM with sorting before
2) Boost the most recent document - I'm not sure how to do this.
Basically, we want to have the most recent document score higher than
any other and then we can retrieve just 10 records and sort in the
application by time stamp field to get the most recent document
matching the keyword.

Any suggestion on how can this be done?

Thanks,
-vivek


Re: Boosting for most recent documents

2009-07-08 Thread Otis Gospodnetic

Sort by the internal Lucene document ID and pick the highest one.  That might 
do the job for you.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user 
> Sent: Wednesday, July 8, 2009 8:34:16 PM
> Subject: Boosting for most recent documents
> 
> Hi,
> 
>   I'm trying to find a way to get the most recent entry for the
> searched word. For ex., if I have a document with field name "user".
> If I search for user:vivek, I want to get the document that was
> indexed most recently. Two ways I could think of,
> 
> 1) Sort by some time stamp field - but with millions of documents this
> becomes a huge memory problem as we have seen OOM with sorting before
> 2) Boost the most recent document - I'm not sure how to do this.
> Basically, we want to have the most recent document score higher than
> any other and then we can retrieve just 10 records and sort in the
> application by time stamp field to get the most recent document
> matching the keyword.
> 
> Any suggestion on how can this be done?
> 
> Thanks,
> -vivek
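
As request URLs, the two approaches discussed here would look roughly like the following. The `_docid_` sort follows Otis's internal-ID suggestion (internal IDs grow with indexing order); the second line is an alternative reciprocal-boost recipe that assumes a date field named `timestamp` and requires the `ms()` function of Solr 1.4 trie date fields, neither of which is confirmed by this thread:

```
# newest internal doc id first (Otis's suggestion)
http://localhost:8983/solr/select?q=user:vivek&sort=_docid_+desc&rows=1

# alternative: recency boost via function query on a dismax handler
http://localhost:8983/solr/select?qt=dismax&q=vivek&bf=recip(ms(NOW,timestamp),3.16e-11,1,1)
```

The `_docid_` route avoids the FieldCache memory cost of sorting on a timestamp field, which is what caused the OOM vivek mentions.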



Re: Best way to integrate custom functionality

2009-07-08 Thread Otis Gospodnetic

How about, for example

+cars -"no cars" -"nothing cars"

 
In other words, the basic query is the original query, and then loop over all 
negative words and append exclude phrase clauses like in the above example.
That will find documents that have the word cars in them, but any documents 
with "no cars" phrase or "nothing cars" phrase will be excluded.

Just make sure your negative words are not stopwords.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Andrew Nguyen 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, July 8, 2009 7:17:09 PM
> Subject: Best way to integrate custom functionality
> 
> Hello all,
> 
> I am working on a project that involves searching through free-text fields 
> and 
> would like to add the ability to filter out negative expressions at a very 
> simple level.  For example, the field may contain the text, "person has no 
> cars."  If the user were to search for "cars," I would like to be able to 
> intercept the results and return only those without the word "no" in front of 
> the search term.  While this is a very simple example, it's pretty much my end 
> goal.
> 
> I've been reading up on the various hooks provided within Solr but wanted to 
> get 
> some guidance on the best way to proceed.
> 
> Thanks!
> 
> --Andrew



Re: reindexed data on master not replicated to slave

2009-07-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jul 8, 2009 at 10:14 PM, solr jay wrote:
> Thanks. The patch looks good, and I now see the new index directory and it
> is in sync with the one on master. I'll do more testing.
>
> It is probably not important, but I am just curious why we switch index
> directory. I thought it would be easier to just rename index to index.*, and
> rename the new index directory to index.

It is for consistency across OSes. Windows would not let me do a rename.
>
> 2009/7/7 Noble Paul നോബിള്‍ नोब्ळ् 
>>
>> jay,
>> Thanks. The testcase was not enough. I have given a new patch . I
>> guess that should solve this
>>
>> On Wed, Jul 8, 2009 at 3:48 AM, solr jay wrote:
>> > I guess in this case it doesn't matter whether the two directories
>> > tmpIndexDir and indexDir are the same or not. It looks that the index
>> > directory is switched to tmpIndexDir and then it is deleted inside
>> > "finally".
>> >
>> > On Tue, Jul 7, 2009 at 12:31 PM, solr jay  wrote:
>> >
>> >> In fact, I saw the directory was created and then deleted.
>> >>
>> >>
>> >> On Tue, Jul 7, 2009 at 12:29 PM, solr jay  wrote:
>> >>
>> >>> Ok, Here is the problem. In the function, the two directories
>> >>> tmpIndexDir
>> >>> and indexDir are the same (in this case only?), and then at the end of
>> >>> the
>> >>> function, the directory tmpIndexDir is deleted, which deletes the new
>> >>> index
>> >>> directory.
>> >>>
>> >>>
>> >>>       } finally {
>> >>>         delTree(tmpIndexDir);
>> >>>
>> >>>       }
>> >>>
>> >>>
>> >>> On Tue, Jul 7, 2009 at 12:17 PM, solr jay  wrote:
>> >>>
>>  I see. So I tried it again. Now index.properties has
>> 
>>  #index properties
>>  #Tue Jul 07 12:13:49 PDT 2009
>>  index=index.20090707121349
>> 
>>  but there is no such directory index.20090707121349 under the data
>>  directory.
>> 
>>  Thanks,
>> 
>>  J
>> 
>> 
>>  On Tue, Jul 7, 2009 at 11:50 AM, Shalin Shekhar Mangar <
>>  shalinman...@gmail.com> wrote:
>> 
>> > On Tue, Jul 7, 2009 at 11:50 PM, solr jay  wrote:
>> >
>> > > It seemed that the patch fixed the symptom, but not the problem
>> > itself.
>> > >
>> > > Now the log messages looks good. After one download and installed
>> > > the
>> > > index,
>> > > it printed out
>> > >
>> > > *Jul 7, 2009 10:35:10 AM org.apache.solr.handler.SnapPuller
>> > > fetchLatestIndex
>> > > INFO: Slave in sync with master.*
>> > >
>> > > but the files inside index directory did not change. Both
>> > index.properties
>> > > and replication.properties were updated though.
>> > >
>> >
>> > Note that in this case, Solr would have created a new index
>> > directory.
>> > Are
>> > you comparing the files on the slave in the new index directory? You
>> > can
>> > get
>> > the new index directory's name from index.properties.
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>> >
>> 
>> 
>> >>>
>> >>
>> >
>> >
>> > --
>> > J
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>
>
>
> --
> J
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
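

When checking whether a slave picked up the new index directory, the active directory can be resolved from index.properties as discussed above. A minimal sketch (the helper is mine; it only mirrors the lookup the replication code performs, under the assumption that the property key is `index` and the default directory is `index`):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ActiveIndexDir {

    /** Returns the index directory named in data/index.properties, or "index" if absent. */
    static File resolveIndexDir(File dataDir) throws IOException {
        File propsFile = new File(dataDir, "index.properties");
        String name = "index";                        // default when no switch has happened
        if (propsFile.exists()) {
            Properties props = new Properties();
            InputStream in = new FileInputStream(propsFile);
            try {
                props.load(in);
                // e.g. index=index.20090707121349, as in jay's file above
                name = props.getProperty("index", name);
            } finally {
                in.close();
            }
        }
        return new File(dataDir, name);
    }
}
```

Comparing this resolved directory on master and slave (rather than the plain `data/index` path) avoids the confusion Shalin points out, where the files appear unchanged because a new directory is now active.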