Re: phpnative response writer in SOLR 3.1 ?
On 14.04.2011 09:53, Ralf Kraus wrote: Hello, I just updated to Solr 3.1 and I am wondering whether the phpnative response writer plugin is part of it ( https://issues.apache.org/jira/browse/SOLR-1967 ). When I try to compile the source files I get some errors:

PHPNativeResponseWriter.java:57: org.apache.solr.request.PHPNativeResponseWriter is not abstract and does not override abstract method getContentType(org.apache.solr.request.SolrQueryRequest,org.apache.solr.response.SolrQueryResponse) in org.apache.solr.response.QueryResponseWriter
public class PHPNativeResponseWriter implements QueryResponseWriter {
^
PHPNativeResponseWriter.java:70: method does not override a method from its superclass
@Override
^

Is there a new JAR file or something I could use with Solr 3.1? Because the Solr PECL package ( http://pecl.php.net/package/solr ) only offers XML or PHPNATIVE as response writer. No hints at all? -- Greetings, Ralf Kraus
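For context, the compile errors above point at the package move of SolrQueryResponse/QueryResponseWriter from org.apache.solr.request to org.apache.solr.response: the methods in the SOLR-1967 patch no longer match the 3.1 interface signatures. A minimal sketch of the class skeleton compiled against the 3.1 packages could look roughly like this (the actual PHP-serialization body from the patch is omitted, and the returned content type is an assumption):

import java.io.IOException;
import java.io.Writer;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;

public class PHPNativeResponseWriter implements QueryResponseWriter {

  public void init(NamedList args) {
    // no configuration needed in this sketch
  }

  // Required by the 3.1 interface; without it the class fails to compile.
  public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
    return CONTENT_TYPE_TEXT_UTF8; // "text/plain; charset=UTF-8" (assumed content type)
  }

  public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
      throws IOException {
    // ... the serialized-PHP output from the SOLR-1967 patch would be produced here ...
  }
}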
Dismax Minimum Match/Stopwords Bug
A thread with this same subject from 2008/2009 is here: http://search-lucene.com/m/jkBgXnSsla We're seeing customers being bitten by this "bug" now and then, and normally my workaround is to simply not use stopwords at all. However, is there an actual fix in the 3.1 eDisMax parser which solves the problem for real? Cannot find a JIRA issue for it. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: SOLR support for unicode?
Hi, Thanks for your response. I am currently working on this issue. When I run the test_utf8.sh script, I get the following result:

Solr server is up.
HTTP GET is accepting UTF-8
HTTP POST is accepting UTF-8
HTTP POST defaults to UTF-8
ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane
ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic multilingual plane

I also placed a "TM" symbol and a "–" symbol in one of the example XML docs and indexed it with post.jar, then queried with the "wt=python" param.

Input: Good unicode support: héllo (hello with an™ accent OLB – Account over the e)
Output: Good unicode support: héllo (hello with an� accent OLB � Account over the e)

-- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-support-for-unicode-tp2790512p2824041.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search and index Result
You're possibly getting hit by server caching. Are you by chance submitting the exact same query after your commit? What happens if you change your query to one you haven't used before? Turning off http caching might help. Solr should be searching the new contents after a commit (and any attendant warmup time). Best Erick On Fri, Apr 15, 2011 at 1:43 AM, satya swaroop wrote: > Hi all, > i just made a duplication of solrdispatchfilter as > solrdispatchfilter1 and solrdispatchfilter2 such that all the /update or > /update/extract things are passed through the solrdispatchfilter1 > and all search (/select) things are passed through the > solrdispatchfilter2. It is because i need to establish a privacy concern > for > the search result. > I need to check whether the required user has access to the particular > files > or not.. it was a success in implementing the privacy of results. > one major problem i am getting is after indexing some documents and > committing them, i am not getting the committed data in the search result, i am > getting the old data that was before the commit... > But i get the result only after restarting the server.. can anyone tell me > where to modify such that the search will give the results from the recent > commit... > > > Thanks and Regards, > satya >
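Regarding the "turning off http caching" suggestion above: the switch lives in the <requestDispatcher> section of solrconfig.xml. A minimal sketch, assuming the stock example config layout, would be:

<requestDispatcher handleSelect="true">
  <!-- never emit cache-validation headers or 304 responses, so clients always re-fetch fresh results -->
  <httpCaching never304="true" />
</requestDispatcher>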
newbie - filter to only show queried field when query is free text
Hi, how can I filter a search result so that it does not return all fields (as per the default) when I don't know which field my hits will be in? This is basically for unstructured document type data, for example large HTML or DOCBOOK documents. thanks, Bryan Rasmussen
Re: newbie - filter to only show queried field when query is free text
Hi, there may be better ways, but as far as my knowledge goes I'd try to use the highlighting component. With hl.requireFieldMatch the highlighting response only includes fields where highlights were applied (a match was found), which is probably what you want. Best Marek Tichy > Hi, > > If I want to filter a search result to not return all fields as per > the default but I don't know what field my hits will be in. > > This is basically for unstructured document type data, for example > large HTML or DOCBOOK documents. > > thanks, > Bryan Rasmussen > >
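To make that concrete, a request along the lines Marek describes might look something like this (host, port and query terms are placeholders; hl.fl=* asks for highlighting on every field where highlighting is possible, which requires those fields to be stored):

http://localhost:8983/solr/select?q=your+terms&hl=true&hl.fl=*&hl.requireFieldMatch=true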
DataImportHandler - importing XML documents, undeclared general entity - DTD right there
Hi, I am importing a number of XML documents from the filesystem. The dataimporthandler finds them, but returns an undeclared general entity error - even though my DTD is present and findable by other parsers. DTD Declaration In XML file in the same folder as the DTD allartikel.dtd Thanks, Bryan Rasmussen
Using autocomplete with the new Suggest component
Hi everybody, Recently I implemented an autocomplete mechanism for my website using a custom TermsComponent. I was quite happy with that because it also enables me to do a Google-like feature where complete sentences were suggested to the user as he typed in the search field. I used shingles to search against pieces of sentences. (I have resources for French people if somebody asks) Then came Solr 3.1 and its new Suggest component. I have looked at the documentation but it's still unclear how it works exactly. So please let me ask some questions:

- Are there performance improvements over TermsComponent?
- Is it able to autosuggest sentences and not only words? If yes, how? Should I keep my shingles?
- What is this "threshold" value that I see? Is it a mandatory field to complete? I want to have suggestions no matter what the frequency is in the documents!

Thank you all. If I succeed in doing this I will try to provide a tutorial on doing it with jQuery UI autocomplete + the Suggest component, if anyone's interested. Best regards. Victor
Strange DisMax results
Hi. I've got a strange result from a DisMax search function. I might have understood the functionality wrong, but after I read the manual I understood it is used to do ranked results with simple search terms.

Solr Version 1.4.0

I've got the setup

Schema fields
--
DisMax config
--
explicit 0.01 name^1.2 shortDescription^1.0 longDescription^1.0 prodShortDescription^0.5 prodLongDescription^0.5 name^1.2 shortDescription^1.0 longDescription^1.0 prodShortDescription^0.5 prodLongDescription ^0.5 *:* 100 spellcheck

Standard config
-
explicit spellcheck

When I search for a term "q=term" I get 68 hits. But when I search for "q=term&qt=dismax" I get 0 hits.

Of course I've got more fields and search parameters. But the only difference I could see is that in one case I use dismax and in the other I don't.

What have I missed? Any suggestions?

Best regards

Daniel
Re: Strange DisMax results
If you haven't modified your schema.xml, you'll find that the defaultSearchField is set to the text field. So when you issue q=term you're going against your default search field. Assuming you've changed the default search field to "defaultSearch", then the problem is probably that your analysis chain for the default search field is different from that applied to your individual fields. Which I absolutely guarantee, since you have two different fieldTypes in your 5 fields. I'm extremely suspicious of your fieldTypes that involve the word "keyword", because if this indicates the KeywordTokenizer is being used, then everything in the input is a single token; the input stream isn't being split up... But the best way to understand this is the admin/analysis page. If you check the "verbose" box and put in some text you'll see the effects of each part of the chain. Try this for the field you expect dismax to find your term in, and also for your defaultSearch field, and I suspect you'll see what's going on. Best Erick On Fri, Apr 15, 2011 at 10:35 AM, Daniel Persson wrote: > Hi. > > I've got a strange result of a DisMax search function. I might have > understood the functionallity wrong. But after I read the manual I > understood it is used to do ranked results with simple search terms. > > Solr Version 1.4.0 > > I've got the setup > > Schema fields > -- > multiValued="false"/> > multiValued="false"/> > stored="false" multiValued="false"/> > indexed="true" stored="false" multiValued="false"/> > indexed="true" stored="false" multiValued="false"/> > > > > > > > > > DisMax config > -- > > > explicit > 0.01 > >name^1.2 shortDescription^1.0 longDescription^1.0 > prodShortDescription^0.5 prodLongDescription^0.5 > > >name^1.2 shortDescription^1.0 longDescription^1.0 > prodShortDescription^0.5 prodLongDescription ^0.5 > > *:* > 100 > > > spellcheck > > > > Standard config > - > default="true"> > > explicit > > > spellcheck > > > > > When I search for a term "q=term" I get 68 hits. But when I search for > "q=term&qt=dismax" I get 0 hits. > > Of course I got more fields and search parameters. But the only difference > I > could see is that in one case I use dismax and the other I don't. > > What have I missed? Any suggestions? > > Best regards > > Daniel >
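To make Erick's KeywordTokenizer point concrete: a field type along these lines (the type name is invented for illustration) keeps the entire input as one token, so q=term can only match documents whose whole field value is exactly "term":

<fieldType name="keywordish" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A type meant for free-text search would instead split the input into words (e.g. WhitespaceTokenizerFactory or StandardTokenizerFactory), which is why the same query can hit in one field and miss in another.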
Sort by function - 400 error
Using Solr 3.1. When I do:

sort=score desc
it works.

sort=product(typeId,2) desc (typeId is a valid attribute in the document)
it works.

sort=product(score,typeId) desc
fails with a 400 error. "sort=product(score,2) desc" fails too.

Must be something basic I'm missing? I tried adding &fl=*,score too. Thanks Mike
RE: Understanding the DisMax tie parameter
Thanks everyone. I updated the wiki. If you have a chance please take a look and check to make sure I got it right on the wiki. http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.29 Tom -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, April 14, 2011 5:41 PM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Cc: Burton-West, Tom Subject: Re: Understanding the DisMax tie parameter : Perhaps the parameter could have had a better name. It's essentially : max(score of matching clauses) + tie * (score of matching clauses that : are not the max) : : So it can be used and thought of as a tiebreak only in the sense that : if two docs match a clause (with essentially the same score), then a : small tie value will act as a tiebreaker *if* one of those docs also : matches some other fields. correct. w/o a tiebreaker value, a dismax query will only look at the maximum scoring clause for each doc -- the "tie" param is named for its ability to help break ties when multiple documents have the same score from the max scoring clause -- by adding in a small portion of the scores (based on the 0->1 ratio of the "tie" param) from the other clauses. -Hoss
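A made-up numeric illustration of the formula above (the numbers are not from this thread): suppose a document's best-matching clause scores 2.0 and two other matching clauses score 0.5 and 0.3. With tie=0.0 the dismax score is just 2.0 (pure max); with tie=0.1 it is 2.0 + 0.1 * (0.5 + 0.3) = 2.08; and with tie=1.0 it degenerates into a plain sum, 2.0 + 0.5 + 0.3 = 2.8.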
Re: Sort by function - 400 error
On Fri, Apr 15, 2011 at 11:50 AM, Michael Owen wrote: > > Using solr 3.1. > When I do: > sort=score desc > it works. > sort=product(typeId,2) desc (typeId is a valid attribute in document) > it works. > sort=product(score,typeId) desc > fails on 400 error? Also "sort=product(score,2) desc" fails too. You can't currently use "score" in function queries. You can embed another query in a function query though. Example:

sort=product($qq,typeId) desc&qq=my_query_here

In your case, when you just want to multiply the score by a field, then you can either use the edismax query parser and the "boost" parameter:

defType=edismax&q=my_query_here&boost=typeId

Or you could directly use the "boost" query parser http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

q={!boost b=typeId}my_query_here
OR
q={!boost b=typeId v=$qq}&qq=my_query_here

-Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Field compression
I know I'm late to the party, but I recently learned that field compression was removed as of Solr 1.4.1. I think a lot of sites were relying on that feature, so I'm curious what people are doing now that it's gone. Specifically, what are people doing to efficiently store *and highlight* large fulltext fields? I can think of ways to store the text efficiently (compress it myself), or highlight it (leave it uncompressed), but not both at the same time. Also, is anyone working on anything to restore compression to Solr? I understand it was removed because Lucene removed support for it, but I was hoping to upgrade my site to 3.1 soon and we rely on that feature. - Charlie
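For reference, the "compress it myself" option mentioned above is straightforward on the client side. A minimal sketch (the class name is invented, and commons-codec is assumed to be on the classpath) that gzips a fulltext value into a Base64 string for an ordinary stored field, and reverses it at display time:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.codec.binary.Base64;

// Hypothetical helper: gzip a large fulltext value into a Base64 string so it can
// be stored in a plain Solr string/text field, and decompress it again for display.
public class FieldGzip {

  public static String compress(String text) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream gz = new GZIPOutputStream(bos);
    gz.write(text.getBytes("UTF-8"));
    gz.close();
    return new String(Base64.encodeBase64(bos.toByteArray()), "UTF-8");
  }

  public static String decompress(String stored) throws IOException {
    GZIPInputStream gz = new GZIPInputStream(
        new ByteArrayInputStream(Base64.decodeBase64(stored.getBytes("UTF-8"))));
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = gz.read(buf)) != -1; ) {
      bos.write(buf, 0, n);
    }
    return bos.toString("UTF-8");
  }
}

The obvious catch, as noted above, is that Solr's highlighter cannot see into the compressed copy, so highlighting would still need the text stored uncompressed somewhere.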
Solr 3.1: Old Index Files Not Removed on Optimize?
I was just hoping someone might be able to point me in the right direction here. We just upgraded from Solr 1.4 to Solr 3.1 this past week and we're having issues running out of disk space on our Master servers. Our Master has dozens of cores. We have a script that kicks off once per day to do a rolling optimize. The script optimizes a single core, waits 5 minutes to give the server some breathing room to catch up on indexing in a non-i/o intensive state, and then moves onto the next core (repeating until done). The problem we are facing is that under Solr 1.4, the old index files were deleted very quickly after each optimize, but under Solr 3.1, the old index files hang around for hours... in many cases they don't disappear until we restart Solr completely. This is leading to us running out of disk space, as each core's index doubles in size during the optimize process and stays that way until the next solr restart. I was just wondering if anyone could point me to some specific changes or settings which may be leading to the difference between solr versions (or any other environmental issues you may know about). I see several tickets in Jira about similar issues, but they mostly appear to have been resolved in the past. Has anyone else seen this behavior under Solr 3.1, or do you think we may be missing some kind of new configuration setting? For reference, we are running on 64bit RedHat Linux. This is what I have right now: [From SolrConfig.xml]: true commit optimize startup 10 30 false 1 Thanks in advance, -Trey
Split token
Hello, I want to split my string when it contains "(". Example:

spurs (London)
Internationale (milan)

to

spurs
(london)
Internationale
(milan)

What tokenizer can i use to fix this problem? -- View this message in context: http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
thanks! It seems the file count in the index directory is the segment# * 8 in my dev environment... I see there are .fnm .frq .fdt .fdx .nrm .prx .tii .tis (8) file extensions, and each has as many files as there are segments. Is it always safe to calculate the file count as the segment number multiplied by 8? Of course this excludes the segments_N, segments.gen and xxx_del files. I found most of the cores have a file count that can be calculated using the above formula, but a few cores do not match... thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2813419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
yeah, I can figure out the segment number by going to the stats page of Solr... but my question was how to figure out the exact total number of files in the 'index' folder for each core. Like I mentioned in my previous message, I currently have 8 files per segment (.prx, .tii, etc.), but it seems this might change if I use term vectors, for example. So I need suggestions on how to accurately figure out the total file number. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2817912.html Sent from the Solr - User mailing list archive at Nabble.com.
most stable way to get facet pivoting
Hi, I want to evaluate (and probably use in production) facet pivoting - what is the best approach to get an "as-stable-as-can-be" version of Solr which is able to do facet pivoting? I was hoping to see this in Solr 3.1, but apparently it is only in the dev versions/nightlies... Is it possible to patch this feature into Solr 3.1 stable? best regards, Nik -- Nikolas Tautenhahn nikolas.tautenh...@livinglogic.de http://www.livinglogic.de LivingLogic AG Markgrafenallee 44 95448 Bayreuth Amtsgericht Bayreuth ++ HRB 3274 Aufsichtsratsvorsitzender: Achim Lindner Vorstand: Philipp Ambrosch, Alois Kastner-Maresch (Vors.)
How to combine Deduplication and Elevation
Hi, I have a question: how do I combine the Deduplication and Elevation implementations in Solr? Currently I have only managed to implement one or the other. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-combine-Deduplication-and-Elevation-tp2819621p2819621.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 3.1.0 core not reloading with RamDirectoryFactory
Hello, We just tried core reloading on a freshly installed Solr 3.1.0 with RAMDirectoryFactory. It doesn't seem to happen. With the FSDirectoryFactory everything works fine. It looks like the RAMDirectoryFactory implementation caches the directory and, if one is already available, doesn't really reopen it, so the updated index is never loaded into memory. Can anyone comment on this? Should we implement our own RAMDirectoryFactory? Here is the code snippet from Solr 3.1.0. It looks a bit confusing.

public Directory open(String path) throws IOException {
  synchronized (RAMDirectoryFactory.class) {
    RefCntRamDirectory directory = directories.get(path);
    if (directory == null || !directory.isOpen()) {
      directory = (RefCntRamDirectory) openNew(path);
      directories.put(path, directory);
    } else {
      // an already-open directory for this path is reused as-is,
      // so a core reload never re-reads the index from disk
      directory.incRef();
    }
    return directory;
  }
}

Regards, Dmitry -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-1-0-core-not-reloading-with-RamDirectoryFactory-tp2820603p2820603.html Sent from the Solr - User mailing list archive at Nabble.com.
Avoiding corrupted index
Hi everyone, We are using Solr 1.4.1 in my company and we need to take backups of the indexes. After some googling, I'm quite confused about the different ways of backing up the index. First, I tried the scripts provided in the Solr distribution, without success: I untarred apache-solr-1.4.1.tar.gz into /opt and then launched the backup script, but I get this error:

$ /opt/apache-solr-1.4.1/src/scripts/backup
/opt/apache-solr-1.4.1/src/scripts/backup: line 26: /opt/apache-solr-1.4.1/src/bin/scripts-util: No such file or directory

And that's true: there is no /opt/apache-solr-1.4.1/src/bin/scripts-util, but there is a /opt/apache-solr-1.4.1/src/scripts/scripts-util. Is it normal to distribute the scripts with a bad path? Then I discovered that these utility scripts are no longer distributed with version 3.1.0: were they not reliable? Can we get corrupted backups with these scripts? Finally, we found the page about SolrReplication on the Solr wiki and also this post http://stackoverflow.com/questions/3083314/solr-incremental-backup-on-real-time-system-with-heavy-index and in particular the answer advising to use the replication. So we tried to use this replication mechanism (and call the URL on the slave with the query parameters command="backup" and location="/backup"), but this method requires lots of i/o for a big index. Is it the best way to get an uncorrupted backup of the index? Is there another way to do the backup with Solr 3.1? Thanks in advance for your time. Regards, Laurent
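For reference, the slave-side backup call described above looks roughly like this (host, port and target directory are placeholders):

http://slave-host:8983/solr/replication?command=backup&location=/backup

The backup command snapshots the index as of the latest commit point, which is why it avoids the corruption risk of copying files while indexing; the price, as noted, is the extra i/o of copying the whole index.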
Indexing relations for sorting
Hi everybody, I have the following problem/question: In our system we have some categories and products in those categories. Our structure looks a bit like this:

product X belongs to category: cat1_subcat1 (10)
product X belongs to category: cat2_subcat1 (20)
product Y belongs to category: cat1_subcat2 (30)
product Z belongs to category: cat2_subcat1 (15)

Every product-to-category relation has its own sorting order, which we would like to index in Solr. To make the problem more complex, we have two ways of searching for a product:

We want all products of subcat1 (no matter what the parent category is) ordered by their sorting order
We want all products of cat2_subcat1 ordered by their sorting order

This probably is not what Solr is designed for, but everything else in our system is indexed and searched by Solr. So it would be very helpful if someone has an idea or suggestion to make this work. Our Solr version is 1.3.0 Many thanks! Derk -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-relations-for-sorting-tp2824223p2824223.html Sent from the Solr - User mailing list archive at Nabble.com.
how to import data from database combine with file content in solr
Hello, I am new to Solr. My requirements are:

1. At regular intervals Solr needs to fetch data from a SQL Server database and index it.
2. Fetch only those records which have not yet been indexed.
3. For each record there is one associated file, so along with the database table fields I also want to index the content of that file. E.g. there is a table "Customer" in the database and customerid is its primary key; for each customerid there is an associated customer-profile file named after the customerid.
4. As mentioned above, when Solr fetches data from the SQL Server table it should fetch only data which has not yet been indexed. (We have some older Lucene code in which there is a field in the table, isindexed, so the select clause has the condition isindexed=false, and when indexing is done the particular record is updated with isindexed=true.) Is there any mechanism in Solr for that?

How do I achieve this? Do I need to write custom code for it, or can it be done with the configuration provided by Solr?

Thanks, Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-import-data-from-database-combine-with-file-content-in-solr-tp2824749p2824749.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using autocomplete with the new Suggest component
Hi Victor, I have the same questions about the new Suggest component. I can't really help you as I didn't really manage to understand how it worked. Sometimes, I had more results, sometimes less. Even so, I would really be interested in your resources using Terms and shingles to implement auto-complete. I am myself a French student and it could help me improve the solution of one of my project. Best regards, Quentin 2011/4/15 openvictor Open > Hi everybody, > > > Recently I implemented an autocomplete mechanism for my website using a > custom TermsComponent. I was quite happy with that because it also enables > me to do a Google-like feature where complete sentences where suggested to > the user when he typed in the search field. I used Shingles to search > against pieces of sentences. > (I have resources for French people if somebody asks) > > Then came solr 3.1 and its new suggest component. I have looked at the > documentation but it's still unclear how it works exactly. So please let me > ask some questions : > > > - Is there performance improvements over TermsComponent ? > - Is it able to autosuggest sentences and not only words ? If yes, how ? > Should I keep my shingles ? > - What is this "threshold" value that I see ? Is it a mandatory field to > complete ? I want to have suggestion no matter what the frequency is in > the > document ! > > > Thank you all, if I succeed to do that I will try to provide a tutorial to > do what with Jquery UI autocomplete + Suggest component if anyone's > interested. > Best regards. > > Victor > -- Quentin Proust Email : q.pro...@gmail.com Tel : 06.78.81.15.94 http://www.linkedin.com/in/quentinproust
Re: Split token
What you've shown would be handled with WhitespaceTokenizer, but you'd have to prevent filters from stripping the parens. If you have to handle things like blah ( stuff ) WhitespaceTokenizer wouldn't work. PatternTokenizerFactory might work for you, see: http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html Best Erick On Tue, Apr 12, 2011 at 6:02 AM, roySolr wrote: > Hello, > > I want to split my string when it contains "(". Example: > > spurs (London) > Internationale (milan) > > to > > spurs > (london) > Internationale > (milan) > > What tokenizer can i use to fix this problem? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Using autocomplete with the new Suggest component
Hi Quentin, well stick in this thread, I will try to see how it works and get inputs from other people. Here is the link to my blog who shows how to do it : http://www.victorkabdebon.net/archives/16 Note that I used Tomcat + SolR, but it can easily done with PHP. Also solrj in 1.4.1 didn't have terms component so I had to find a way around that problem but it's provided. 2011/4/15 Quentin Proust > Hi Victor, > > I have the same questions about the new Suggest component. > I can't really help you as I didn't really manage to understand how it > worked. > Sometimes, I had more results, sometimes less. > > Even so, I would really be interested in your resources using Terms and > shingles to implement auto-complete. > I am myself a French student and it could help me improve the solution of > one of my project. > > Best regards, > Quentin > > 2011/4/15 openvictor Open > > > Hi everybody, > > > > > > Recently I implemented an autocomplete mechanism for my website using a > > custom TermsComponent. I was quite happy with that because it also > enables > > me to do a Google-like feature where complete sentences where suggested > to > > the user when he typed in the search field. I used Shingles to search > > against pieces of sentences. > > (I have resources for French people if somebody asks) > > > > Then came solr 3.1 and its new suggest component. I have looked at the > > documentation but it's still unclear how it works exactly. So please let > me > > ask some questions : > > > > > > - Is there performance improvements over TermsComponent ? > > - Is it able to autosuggest sentences and not only words ? If yes, how > ? > > Should I keep my shingles ? > > - What is this "threshold" value that I see ? Is it a mandatory field > to > > complete ? I want to have suggestion no matter what the frequency is in > > the > > document ! > > > > > > Thank you all, if I succeed to do that I will try to provide a tutorial > to > > do what with Jquery UI autocomplete + Suggest component if anyone's > > interested. > > Best regards. > > > > Victor > > > > > > -- > > Quentin Proust > Email : q.pro...@gmail.com > Tel : 06.78.81.15.94 > http://www.linkedin.com/in/quentinproust > >
Re: partial optimize does not reduce the segment number to maxNumSegments
Why do you care? You haven't outlined why having the precise numbers here is necessary. Perhaps with a higher-level statement of the problem you're trying to solve we could make some better suggestions Best Erick On Wed, Apr 13, 2011 at 5:23 PM, Renee Sun wrote: > yeah, I can figure out the segment number by going to stat page of solr... > but my question was how to figure out exact total number of files in > 'index' > folder for each core. > > Like I mentioned in previous message, I currently have 8 files per segment > (.prx .tii etc), but it seems this might change if I use term vector for > example. So I need suggestions on how to accurately figure out the total > file number. > > thanks > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2817912.html > Sent from the Solr - User mailing list archive at Nabble.com. >
RE: Split token
This pattern splits tokens *only* in the presence of parentheses with adjoining whitespace, and includes the parentheses with the tokens:

(?<=\))\s+|\s+(?=\()

So you'll get this kind of behavior:

Tottenham Hotspur (London) F.C.
Internationale (milan)
FC Midtjylland (Herning) (Ikast)

to

Tottenham Hotspur
(London)
F.C.
Internationale
(milan)
FC Midtjylland
(Herning)
(Ikast)

Steve > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, April 15, 2011 1:51 PM > To: solr-user@lucene.apache.org > Subject: Re: Split token > > What you've shown would be handled with WhitespaceTokenizer, but you'd > have > to > prevent filters from stripping the parens. If you have to handle things > like > blah ( stuff ) > WhitespaceTokenizer wouldn't work. > > PatternTokenizerFactory might work for you, see: > http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokeniz > erFactory.html > > Best > Erick > > On Tue, Apr 12, 2011 at 6:02 AM, roySolr wrote: > > > Hello, > > > > I want to split my string when it contains "(". Example: > > > > spurs (London) > > Internationale (milan) > > > > to > > > > spurs > > (london) > > Internationale > > (milan) > > > > What tokenizer can i use to fix this problem? > > > > -- > > View this message in context: > > http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html > > Sent from the Solr - User mailing list archive at Nabble.com. > >
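For completeness, a schema.xml field type wired up with this pattern might look roughly like the following (the type name is made up; note that "<" inside the XML attribute value has to be escaped as &lt;):

<fieldType name="parenSplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="(?&lt;=\))\s+|\s+(?=\()"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With no group attribute the tokenizer splits on the pattern (like String.split), which is the behavior shown above.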
Re: how to import data from database combine with file content in solr
Sorry if this comes through twice, but my first got rejected (this one is plain text, should come through better). Part of this is solved by the Data Import Handler (DIH) see: http://wiki.apache.org/solr/DataImportHandler And think about a "database" data source. This can be combined with the "TikaEntityParser", and maybe some transformers to assemble the file name and send it through parsing. Don't overlook the possibility of parameters (the ${ reference pattern). If you need some custom code, you can also implement a custom Transformer that gets into the transformation chain in DIH, but you should only approach that after you exhaust the above approach. Hope this helps Erick On Fri, Apr 15, 2011 at 10:24 AM, vrpar...@gmail.com wrote: > > Hello, > > I am new to solr, > > my requirements are, > > 1. at regular interval need solr to fetch data from sql server database and > do indexing on it. > 2. fetch only those records which is not yet indexed > 3. for each record there is one file associated, so with database table > fields also want to index content of that particular file > > e.g. there is one table "Customer" in database and customerid is primary key > for each customerid there is associated file of that customerprofile > named with customerid, > > 4. as i metioned above that when solr fetch data from sql server database > table , should fetch only data which is not yet indexed, (we have one older > lucene code, in which there is one field in table that isindexed so when > fetching data in select clause there is one condition that isindexed=false, > and when indexing is done update particular record of database with > isindexed=true) is there any mechanism in solr for that? > > how to achieve same ? > do i need to write custom code for that or it can be done with configuration > provided by solr? > > Thanks, > > Vishal Parekh > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/how-to-import-data-from-database-combine-with-file-content-in-solr-tp2824749p2824749.html > Sent from the Solr - User mailing list archive at Nabble.com.
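To flesh out Erick's suggestion a little: the Tika entity processor he refers to is TikaEntityProcessor (in the dataimporthandler-extras contrib), and a data-config.xml combining a database entity with a per-row file entity might look roughly like this. Everything here is a sketch: the driver, connection string, table, column names and file path layout are assumptions about the original poster's setup, not something Solr prescribes.

<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb"
              user="user" password="pass"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- only rows not yet flagged as indexed, mirroring the old Lucene approach -->
    <entity name="customer" dataSource="db"
            query="select customerid, name from Customer where isindexed = 0">
      <field column="customerid" name="id"/>
      <field column="name" name="name"/>
      <!-- one profile file per row, located via the ${...} reference pattern -->
      <entity name="profile" dataSource="bin" processor="TikaEntityProcessor"
              url="/data/profiles/${customer.customerid}.doc" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Flipping the isindexed flag back in the database after the import is not something DIH does for you; that part would still need custom code or a post-import step.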
Re: Solr 3.1: Old Index Files Not Removed on Optimize?
I can reproduce this with the example server w/ your deletionPolicy and replicationHandler configs. I'll dig further to see what's behind this behavior. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Fri, Apr 15, 2011 at 1:14 PM, Trey Grainger wrote: > I was just hoping someone might be able to point me in the right direction > here. We just upgraded from Solr 1.4 to Solr 3.1 this past week and we're > having issues running out of disk space on our Master servers. Our Master > has dozens of cores. We have a script that kicks off once per day to do a > rolling optimize. The script optimizes a single core, waits 5 minutes to > give the server some breathing room to catch up on indexing in a non-i/o > intensive state, and then moves onto the next core (repeating until done). > > The problem we are facing is that under Solr 1.4, the old index files were > deleted very quickly after each optimize, but under Solr 3.1, the old index > files hang around for hours... in many cases they don't disappear until we > restart Solr completely. This is leading to us running out of disk space, > as each core's index doubles in size during the optimize process and stays > that way until the next solr restart. > > I was just wondering if anyone could point me to some specific changes or > settings which may be leading to the difference between solr versions (or > any other environmental issues you may know about). I see several tickets > in Jira about similar issues, but they mostly appear to have been resolved > in the past. > > Has anyone else seen this behavior under Solr 3.1, or do you think we may be > missing some kind of new configuration setting? > > For reference, we are running on 64bit RedHat Linux. This is what I have > right now: [From SolrConfig.xml]: > true > > > > commit > optimize > startup > > > > > > 10 > 30 > > > > > false > 1 > > > > Thanks in advance, > > -Trey >
Re: partial optimize does not reduce the segment number to maxNumSegments
sorry, I should have elaborated on that earlier... In our production environment we have multiple cores and they ingest continuously all day long; we only optimize periodically, once a day at midnight. So sometimes we see 'too many open files' errors. To prevent that from happening, in production we maintain a script that monitors the total number of segment files across all cores and sends out warnings if that number exceeds a threshold... it is a kind of preventive measure. Currently we are using a Linux command to count the files. We are wondering if we could simply use a formula to figure out this number instead; that would be better. It seems we could use the stats URL to get the segment number and multiply it by 8 (that is what we get, given our schema). Any better way to approach this? thanks a lot! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2825736.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: QUESTION: SOLR INDEX BIG FILE SIZES
Hi John, ¿How can split the file of the solr index into multiple files? > Actually, the index is organized in a set of files called segments. It's not just a single file, unless you tell Solr to do so. That's because some "file systems are about to support a maximun > of space in a single file" for example some UNIX file systems only support > a maximun of 2GB per file. > As far as I know, Solr will never arrive to a segment file greater than 2GB, so this shouldn't be a problem. ¿What is the recommended storage strategy for a big solr index files? > I guess that it depends in the indexing/querying performance that you're having, the performance that you want, and what "big" exactly means for you. If your index is so big that individual queries take too long, sharding may be what you're looking for. To better understand the index format, you can see http://lucene.apache.org/java/3_1_0/fileformats.html Also, you can take a look at my blog (http://juanggrande.wordpress.com), in my last post I speak about segments merging. Regards, *Juan* 2011/4/15 JOHN JAIRO GÓMEZ LAVERDE > > SOLR > USER SUPPORT TEAM > > I have a quiestion about the "maximun file size of solr index", > when i have a "lot of data in the solr index", > > -¿How can split the file of the solr index into multiple files? > > That's because some "file systems are about to support a maximun > of space in a single file" for example some UNIX file systems only support > a maximun of 2GB per file. > > -¿What is the recommended storage strategy for a big solr index files? > > Thanks for the reply. > > JOHN JAIRO GÓMEZ LAVERDE > Bogotá - Colombia - South America
Re: Solr 3.1: Old Index Files Not Removed on Optimize?
Thank you, Yonik! I see the Jira issue you created and am guessing it's due to this issue. We're going to remove replicateAfter="startup" in the mean-time to see if that helps (assuming this is the issue the jira ticket described). I appreciate you taking a look at this. Thanks -Trey On Fri, Apr 15, 2011 at 2:58 PM, Yonik Seeley wrote: > I can reproduce this with the example server w/ your deletionPolicy > and replicationHandler configs. > I'll dig further to see what's behind this behavior. > > -Yonik > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May > 25-26, San Francisco > > On Fri, Apr 15, 2011 at 1:14 PM, Trey Grainger wrote: > > I was just hoping someone might be able to point me in the right > direction > > here. We just upgraded from Solr 1.4 to Solr 3.1 this past week and > we're > > having issues running out of disk space on our Master servers. Our > Master > > has dozens of cores. We have a script that kicks off once per day to do > a > > rolling optimize. The script optimizes a single core, waits 5 minutes to > > give the server some breathing room to catch up on indexing in a non-i/o > > intensive state, and then moves onto the next core (repeating until > done). > > > > The problem we are facing is that under Solr 1.4, the old index files > were > > deleted very quickly after each optimize, but under Solr 3.1, the old > index > > files hang around for hours... in many cases they don't disappear until > we > > restart Solr completely. This is leading to us running out of disk > space, > > as each core's index doubles in size during the optimize process and > stays > > that way until the next solr restart. > > > > I was just wondering if anyone could point me to some specific changes or > > settings which may be leading to the difference between solr versions (or > > any other environmental issues you may know about). I see several > tickets > > in Jira about similar issues, but they mostly appear to have been > resolved > > in the past. > > > > Has anyone else seen this behavior under Solr 3.1, or do you think we may > be > > missing some kind of new configuration setting? > > > > For reference, we are running on 64bit RedHat Linux. This is what I have > > right now: [From SolrConfig.xml]: > > true > > > > > > > >commit > >optimize > >startup > > > > > > > > > > > > 10 > > 30 > > > > > > > > > > false > > 1 > > > > > > > > Thanks in advance, > > > > -Trey > > >
Re: QUESTION: SOLR INDEX BIG FILE SIZES
Specifically to the file size support, all the file systems on current releases of linux (and unixes too) support large files with 64 bit offsets, and I am pretty sure that java VM supports 64 bit offsets in files, so there is no 2GB file size limit anymore. François On Apr 15, 2011, at 4:31 PM, JOHN JAIRO GÓMEZ LAVERDE wrote: > > SOLR > USER SUPPORT TEAM > > I have a quiestion about the "maximun file size of solr index", > when i have a "lot of data in the solr index", > > -¿How can split the file of the solr index into multiple files? > > That's because some "file systems are about to support a maximun > of space in a single file" for example some UNIX file systems only support > a maximun of 2GB per file. > > -¿What is the recommended storage strategy for a big solr index files? > > Thanks for the reply. > > JOHN JAIRO GÓMEZ LAVERDE > Bogotá - Colombia - South America
Re: Understanding the DisMax tie parameter
Looks good, thanks Tom. -Jay On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom wrote: > Thanks everyone. > > I updated the wiki. If you have a chance please take a look and check to > make sure I got it right on the wiki. > > http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.29 > > Tom > > > > -Original Message- > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > Sent: Thursday, April 14, 2011 5:41 PM > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Cc: Burton-West, Tom > Subject: Re: Understanding the DisMax tie parameter > > > : Perhaps the parameter could have had a better name. It's essentially > : max(score of matching clauses) + tie * (score of matching clauses that > : are not the max) > : > : So it can be used and thought of as a tiebreak only in the sense that > : if two docs match a clause (with essentially the same score), then a > : small tie value will act as a tiebreaker *if* one of those docs also > : matches some other fields. > > correct. w/o a tiebreaker value, a dismax query will only look at the > maximum scoring clause for each doc -- the "tie" param is named for it's > ability to help break ties when multiple documents have the same score > from the max scoring clause -- by adding in a small portion of the scores > (based on the 0->1 ratio of the "tie" param) from the other clauses. > > > -Hoss >
Re: Solr 3.1: Old Index Files Not Removed on Optimize?
On Fri, Apr 15, 2011 at 5:28 PM, Trey Grainger wrote: > Thank you, Yonik! > I see the Jira issue you created and am guessing it's due to this issue. > We're going to remove replicateAfter="startup" in the mean-time to see if > that helps (assuming this is the issue the jira ticket described). Yes, removing replicateAfter="startup" will avoid this bug. https://issues.apache.org/jira/browse/SOLR-2469 fixes the bug, if you need to replicate after startup. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
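Until that fix is in a release, a master-side replication config along these lines avoids the problem (a sketch based on the stock example solrconfig.xml, trimmed down):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <!-- omit <str name="replicateAfter">startup</str> until SOLR-2469 is applied -->
  </lst>
</requestHandler>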