wildcard search: Update

2011-07-02 Thread Thomas Fischer
Hello,

I'm still struggling with wildcard search in Solr.
I installed the ComplexPhraseQueryParser, which essentially accomplishes what 
I'm looking for: I can search in my field "GOK" using phrases with wildcards, 
e.g. GOK:"POF 15?".
This works with either Solr 1.4.2 or 3.3.
What irritates me is that this kind of search throws an exception when there 
is *no* space, e.g. for GOK:"POF15?" (useless) or DDC:"942.?" (meaningful). On 
the other hand, the search will work if the quotes are omitted: DDC:942.? 
yields the expected results.

An additional source of irritation is the error message:

The server encountered an internal error (Unknown query type 
"org.apache.lucene.search.WildcardQuery" found in phrase query string "POF15?"):

java.lang.IllegalArgumentException: Unknown query type 
"org.apache.lucene.search.WildcardQuery" found in phrase query string "POF1??"
	at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite

I don't understand why the query type "org.apache.lucene.search.WildcardQuery" 
is unknown (it is contained in lucene-core-2.9.3.jar), nor what it means that 
it is 'found in phrase query string "POF15?"'.

Can anybody give me a hint on how to handle this problem (apart from erasing 
the quotes when no whitespace is present)?
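
For reference, a minimal sketch of that quote-erasing workaround (the class 
name and regex are illustrative, not from this thread): it strips quotes only 
around single-token phrases and leaves real phrases such as GOK:"POF 15?" 
untouched.

    public class QuoteStripper {
        // Drop quotes around phrases that contain no whitespace,
        // e.g. GOK:"POF15?" -> GOK:POF15?, but leave GOK:"POF 15?" as-is.
        static String normalize(String q) {
            return q.replaceAll("\"([^\\s\"]+)\"", "$1");
        }

        public static void main(String[] args) {
            System.out.println(normalize("GOK:\"POF15?\""));  // GOK:POF15?
            System.out.println(normalize("GOK:\"POF 15?\"")); // unchanged
        }
    }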

Cheers
Thomas




Re: Indexing CSV data in Multicore setup

2011-07-02 Thread Ahmet Arslan
> I am trying to index CSV data in multicore setup using post.jar.
> 
> Here is what I have tried so far:
> 
> 1) Started the server using
>    "java -Dsolr.solr.home=multicore -jar start.jar"
> 
> 2a) Tried to post to "localhost:8983/solr/core0/update/csv" using
>     "java -Dcommit=no -Durl=http://localhost:8983/solr/core0/update/csv -jar post.jar test.csv"
>     Error: SimplePostTool: FATAL: Solr returned an error #404 Not Found
> 
> 2b) Tried to send CSV data to core0 using
>     "java -Durl=http://localhost:8983/solr/core0/update -jar post.jar test.csv"
>     Error: SimplePostTool: FATAL: Solr returned an error #400 Unexpected
>     character 'S' (code 83) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
> 
> I could feed in the xml files to core0 without any issues.
> 
> Am I missing something here?

post.jar is used to post XML files. You can use curl to feed CSV:
http://wiki.apache.org/solr/UpdateCSV
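
For example, an invocation along these lines should work, assuming the 
/update/csv handler is enabled in your solrconfig.xml:

    curl 'http://localhost:8983/solr/core0/update/csv?commit=true' --data-binary @test.csv -H 'Content-type:text/plain; charset=utf-8'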


Feed index with analyzer output

2011-07-02 Thread Lox
Hi,

I'm trying to achieve a cleaner separation between the analysis of a
document (tokenizing, filtering, etc.) and the indexing (storing).
Now, I would like my application to call the analyzer (/analysis/document)
via REST, which returns the various tokens in XML format, then feed this
data to the index directly without doing the analysis again.
But I would also like to retain the original non-analyzed field for
displaying purposes. 

So my question is:
is it possible to feed the Solr index with the output of the analyzer?
This can probably be achieved with a copyField, right?

Thank you.



Re: Feed index with analyzer output

2011-07-02 Thread Juan Grande
Hi Lox,

> But I would also like to retain the original non-analyzed field for
> displaying purposes.

Actually, for stored fields, Solr always retains the original non-analyzed
content, which is the one included in the response. So, if I'm not missing
something, you don't need to separate the analysis (Solr does this for
you!): just configure the analysis that you want for the indexed fields, and
the stored content will be saved verbatim.
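
For example, with a field declared like this in schema.xml (the name and type 
are just for illustration), queries run against the analyzed terms while the 
response returns the text exactly as it was submitted:

    <field name="title" type="text" indexed="true" stored="true"/>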

Regards,

*Juan*



On Sat, Jul 2, 2011 at 7:17 AM, Lox  wrote:

> Hi,
>
> I'm trying to achieve a cleaner separation between the analysis of a
> document (tokenizing, filtering, etc.) and the indexing (storing).
> Now, I would like my application to call the analyzer (/analysis/document)
> via REST, which returns the various tokens in XML format, then feed this
> data to the index directly without doing the analysis again.
> But I would also like to retain the original non-analyzed field for
> displaying purposes.
> This can probably be achieved with a copyField, right?
>
> So my question is:
> is it possible to feed the Solr index with the output of the analyzer?
>
> Thank you.


Re: Indexing CSV data in Multicore setup

2011-07-02 Thread sandeep
> post.jar is used to post XML files. You can use curl to feed CSV:
> http://wiki.apache.org/solr/UpdateCSV


I tried using curl as well to post the CSV data, using the following command:

curl http://localhost:8983/solr/core0/update/csv --data-binary @books.csv -H
'Content-type:text/plain;charset=utf-8'

It errors out saying there is a problem accessing "/solr/core0/update/csv":

"
HTTP ERROR 404

Problem accessing /solr/core0/update/csv. Reason:
    NOT_FOUND

Powered by Jetty://
"



Re: Indexing CSV data in Multicore setup

2011-07-02 Thread Stefan Matheis

Sandeep,

did you check that this handler is defined in your solrconfig?
Otherwise it will not work, and you'll get an HTTP 404.
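
For reference, the example solrconfig.xml shipped with Solr in this era 
defines it along these lines (worth checking against your version):

    <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />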

Regards
Stefan

On 02.07.2011 17:15, sandeep wrote:

>> post.jar is used to post XML files. You can use curl to feed CSV:
>> http://wiki.apache.org/solr/UpdateCSV
>
> I tried using curl as well to post the CSV data, using the following command:
>
> curl http://localhost:8983/solr/core0/update/csv --data-binary @books.csv -H
> 'Content-type:text/plain;charset=utf-8'
>
> It errors out saying there is a problem accessing "/solr/core0/update/csv":
>
> "
> HTTP ERROR 404
>
> Problem accessing /solr/core0/update/csv. Reason:
>     NOT_FOUND
>
> Powered by Jetty://
> "


Re: Feed index with analyzer output

2011-07-02 Thread Lox
Yes, from a utilitarian perspective you're absolutely right.
Mine is actually a more academic exercise.

Let me be clearer about the steps that I would like to take:
1) Call the analyzer of Solr, which returns an XML response in the
following format (just a snippet as an example; the wrapping elements are
trimmed here):

    <lst>
      <str name="text">incomingArc|1.6</str>
      <str name="type">word</str>
      <int name="start">0</int>
      <int name="end">15</int>
      <int name="position">1</int>
    </lst>
    <lst>
      <str name="text">outgoingArc|1.6</str>
      <str name="type">word</str>
      <int name="start">16</int>
      <int name="end">31</int>
      <int name="position">2</int>
    </lst>
    ...
    <lst>
      <str name="text">incomingArc</str>
      <str name="type">word</str>
      <int name="start">0</int>
      <int name="end">15</int>
      <int name="position">1</int>
      <str name="payload">org.apache.lucene.index.Payload:org.apache.lucene.index.Payload@ffe807d2</str>
    </lst>

etc.

2) Now I would like to extract the info that I need from there and tell Solr
directly which things to index, also telling it directly which are the tokens,
with their respective payloads, without performing more analysis.
I know that Solr does all those things internally, starting from the original
text, but is there a way to skip that phase by telling it immediately, for a
given field, which are the tokens with their payloads? So that they would be
stored internally as before, only that this time I would have performed the
two steps (analysis and indexing) in two different phases, with my application
orchestrating both of them.

I don't know if building the documents with SolrJ could help... maybe that's
the way to go?
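
A minimal SolrJ sketch of building and sending a document (field names 
invented; note that this still ships raw text, so Solr re-runs the analysis 
on the server side):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class FeedSketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "0001");
            // addField only accepts plain values; there is no variant that
            // takes pre-built tokens or payloads
            doc.addField("text", "this is text");
            server.add(doc);
            server.commit();
        }
    }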
Or is there a particular XML format to send to Solr? For example, something
like:

    <add>
      <doc>
        <field name="id">0001</field>
        <field name="text">
          <original>this is text</original>
          <token>this</token>
          <token>is</token>
          <token>text</token>
        </field>
      </doc>
    </add>

Does it make sense? Or maybe I'm dreaming? :)

Thank you for answering!




Re: Indexing CSV data in Multicore setup

2011-07-02 Thread Sandeep Gond
Thanks Stefan.

The /update/csv handler was not defined in my solrconfig.xml. After defining
it there I could get the CSV files indexed using the following two commands:

> java -Dcommit=no -Durl=http://localhost:8983/solr/core0/update/csv -jar post.jar books.csv
> java -Dcommit=yes -Durl=http://localhost:8983/solr/core0/update -jar post.jar

Thanks,
Sandeep


On Sat, Jul 2, 2011 at 9:15 PM, Stefan Matheis <
matheis.ste...@googlemail.com> wrote:

> Sandeep,
>
> did you check that this handler is defined in your solrconfig?
> Otherwise it will not work, and you'll get an HTTP 404
>
> Regards
> Stefan
>
> On 02.07.2011 17:15, sandeep wrote:
>
>>> post.jar is used to post XML files. You can use curl to feed CSV:
>>> http://wiki.apache.org/solr/UpdateCSV
>>
>> I tried using curl as well to post the CSV data, using the following command:
>>
>> curl http://localhost:8983/solr/core0/update/csv --data-binary @books.csv -H
>> 'Content-type:text/plain;charset=utf-8'
>>
>> It errors out saying there is a problem accessing "/solr/core0/update/csv":
>>
>> "
>> HTTP ERROR 404
>>
>> Problem accessing /solr/core0/update/csv. Reason:
>>     NOT_FOUND
>>
>> Powered by Jetty://
>> "


Re: Match only documents which contain all query terms

2011-07-02 Thread Michael Sokolov
I believe you should be able to get results ordered so that the 
documents you want will always come first, so you can truncate the 
results efficiently on the client side.


You could also try a regexp query (untested):

a b c -/~(a|b|c)/

-Mike

On 7/1/2011 7:50 PM, Spyros Kapnissis wrote:

> Hello to all,
>
> Is it possible to make Solr return only documents that contain all or
> most of my query terms for a specific field? Or will I need some
> post-processing on the results?
>
> So, for example, if I search for (a b c), I would like the following
> documents returned:
>
> a b c
> a' c b (where a' is a stem, for example)
>
> but not
>
> x y a b c z
>
> Thanks,
> Spyros




Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-07-02 Thread Yonik Seeley
OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes
(unoptimized had different numbers of segments, so I didn't try that).
3x (as of today) was 28% faster at a large filter query (300 terms in
one big disjunction, with each term matching ~1000 docs).

-Yonik
http://www.lucidimagination.com


On Thu, Jun 30, 2011 at 3:30 PM, Shawn Heisey  wrote:
> On 6/29/2011 10:16 PM, Shawn Heisey wrote:
>>
>> I was thinking perhaps I might actually decrease the termIndexInterval
>> value below the default of 128.  I know from reading the Hathi Trust blog
>> that memory usage for the tii file is much more than the size of the file
>> would indicate, but if I increase it from 13MB to 26MB, it probably would
>> still be OK.
>
> Decreasing the termIndexInterval to 64 almost doubled the tii file size, as
> expected.  It made the filterCache warming much faster, but made the
> queryResultCache warming very very slow.  Regular queries also seem like
> they're slower.
>
> I am trying again with 256.  I may go back to the default before I'm done.
>  I'm guessing that a lot of trial and error was put into choosing the
> default value.
>
> It's been fun having a newer index available on my backup servers.  I've
> been able to do a lot of trials, learned a lot of things that don't work and
> a few that do.  I might do some experiments with trunk once I've moved off
> 1.4.1.
>
> Thanks,
> Shawn
>
>
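
For context, termIndexInterval lives in the <indexDefaults> section of 
solrconfig.xml; a minimal sketch with the values discussed above:

    <indexDefaults>
      <!-- default is 128; the experiments above tried 64 and 256 -->
      <termIndexInterval>128</termIndexInterval>
    </indexDefaults>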


Re: pagination and groups

2011-07-02 Thread Yonik Seeley
2011/7/1 Tomás Fernández Löbbe :
> I'm not sure I understand what you want to do. To paginate with groups you
> can use "start" and "rows" as with ungrouped queries. With "group.ngroups"
> (something I found a couple of days ago) you can show the total number of
> groups. "group.limit" tells Solr how many (max) documents you want to see
> for each group.

Right - just be aware that requesting the total number of groups (via
group.ngroups) is pretty memory and resource intensive - that's why
there is a separate option for it.
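
For illustration, a request combining these parameters might look like this
(the field name is invented):

    http://localhost:8983/solr/select?q=foo&group=true&group.field=category&group.ngroups=true&group.limit=5&start=0&rows=10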

-Yonik
http://www.lucidimagination.com


Re: pagination and groups

2011-07-02 Thread Benson Margulies
Hey, I don't suppose you could easily tell me the rev in which ngroups arrived?

Also, how does ngroups compare to the 'matches' value inside each group?



On Sat, Jul 2, 2011 at 3:06 PM, Yonik Seeley  wrote:
> 2011/7/1 Tomás Fernández Löbbe :
>> I'm not sure I understand what you want to do. To paginate with groups you
>> can use "start" and "rows" as with ungrouped queries. With "group.ngroups"
>> (something I found a couple of days ago) you can show the total number of
>> groups. "group.limit" tells Solr how many (max) documents you want to see
>> for each group.
>
> Right - just be aware that requesting the total number of groups (via
> group.ngroups) is pretty memory and resource intensive - that's why
> there is a separate option for it.
>
> -Yonik
> http://www.lucidimagination.com
>


Re: pagination and groups

2011-07-02 Thread Yonik Seeley
On Sat, Jul 2, 2011 at 7:34 PM, Benson Margulies  wrote:
> Hey, I don't suppose you could easily tell me the rev in which ngroups 
> arrived?

1137037 I believe.  Grouping originated in Solr, was refactored to a
shared lucene/solr module, including the ability to get the total
number of groups, and then Solr's implementation was cut over to that.

> Also, how does ngroups compare to the 'matches' value inside each group?

The unit for "matches" is currently the number of documents, while the
unit for "ngroups" is the number of groups.
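
For illustration, the two counts appear side by side in a grouped response,
roughly like this (numbers invented):

    <lst name="grouped">
      <lst name="category">
        <int name="matches">1234</int>   <!-- matching documents -->
        <int name="ngroups">17</int>     <!-- distinct groups -->
        <arr name="groups">...</arr>
      </lst>
    </lst>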


-Yonik
http://www.lucidimagination.com