Multiple Facet Dates

2010-08-05 Thread Raphaël Droz

Hi,
I saw this post:
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html
I didn't see any work in progress or plans for this feature on the list
or in the bug tracker.


Has someone already created a patch, a proof of concept, etc. that I
wasn't able to find?
From my naïve point of view, the ratio of "usefulness" to "added code
complexity" appears high.


My use-case is to provide, in one request:
- the result counts for each of several years (tag-based exclusion)
- the result counts for each month of a given year
- the result counts for each day of a given month and year

I'm pretty sure someone here has already encountered the above, hasn't anyone?
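
(A minimal sketch of the usual workaround in 1.4, assuming the date is
copied via copyField into per-granularity fields created_y and created_m --
hypothetical names -- so each copy can carry its own gap; verify that your
version honors {!ex} tag exclusion on facet.date:)

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
  &fq={!tag=dt}created:[2010-01-01T00:00:00Z TO 2011-01-01T00:00:00Z]
  &facet.date={!ex=dt}created_y
  &f.created_y.facet.date.start=2005-01-01T00:00:00Z
  &f.created_y.facet.date.end=2011-01-01T00:00:00Z
  &f.created_y.facet.date.gap=%2B1YEAR
  &facet.date=created_m
  &f.created_m.facet.date.start=2010-01-01T00:00:00Z
  &f.created_m.facet.date.end=2011-01-01T00:00:00Z
  &f.created_m.facet.date.gap=%2B1MONTH

(and similarly a created_d field with a +1DAY gap for the chosen month)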


RE: Indexing fieldvalues with dashes and spaces

2010-08-05 Thread PeterKerk

@Michael, @Erick,

You both mention interesting things that got me thinking.

@Erick:
Your referenced page is very useful. It seems the whitespace tokenizer in
the text_ws fieldtype is causing the issues.

You do mention another interesting thing:
"And do be aware that fields you get back from a request (i.e. a search) are
the stored fields, NOT what's indexed."

On the page you provided I see this under the Analyzers section: "Analyzers
are components that pre-process input text at index time and/or at search
time."

So I don't completely understand how that sentence squares with your
comment.


@Michael:
You say: "use the tokenized field to return results, but have a duplicate
field of fieldtype="string" to show the untokenized results. E.g. facet on
that field."
I think your comment applies to my requirement: "a city field is something
that I want users to search on via text input, so let's say "New Yo" would
give the results for "New York".
But also a facet "Cities" is available in which "New York" is just one of
the cities that is clickable.
The other facet is "theme", which in my example holds values like
"Gemeentehuis" and "Strand & Zee", that would not be a thing on which can be
searched via manual input but IS clickable. "

Could you please indicate (just for the above fields) what needs to be
changed in my schema.xml and, if so, how that affects the way my request is
built up?
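
(A minimal sketch of what Michael describes; the field names city,
city_facet and theme are placeholders, not taken from the schema below:)

  <!-- tokenized field, so free-text input like "New Yo" can match "New York" -->
  <field name="city" type="text" indexed="true" stored="true"/>
  <!-- untokenized duplicate of fieldtype "string": keeps "New York" as one
       intact value, suitable for a clickable facet -->
  <field name="city_facet" type="string" indexed="true" stored="true"/>
  <!-- themes are only ever clicked, never typed, so "string" alone is enough -->
  <field name="theme" type="string" indexed="true" stored="true"
         multiValued="true"/>

  <copyField source="city" dest="city_facet"/>

Requests would then search against city but facet with
&facet=true&facet.field=city_facet&facet.field=theme.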


Thanks so much ahead in getting me started!


This is my schema.xml:

[The schema.xml markup was stripped by the Nabble archive; only the
uniqueKey and default search field survive:]

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1025463.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to take a value from the query result

2010-08-05 Thread Geert-Jan Brits
you should parse the XML and extract the value. Lots of libraries
undoubtedly exist for PHP to help you with that (I don't know PHP).

Moreover, if all you want from the result is AUC_CAT, you should consider
using the fl parameter, like:
http://172.16.17.126:8983/search/select/?q=AUC_ID:607136&fl=AUC_CAT

to return a document of the form:


<doc>
  <int name="AUC_CAT">576</int>
</doc>


which is more efficient.
You still have to parse the doc as XML, though.





2010/8/5 twojah 

>
> this is my query in browser navigation toolbar
> http://172.16.17.126:8983/search/select/?q=AUC_ID:607136
>
> and this is the result in browser page:
> ...
> 
> 1
> 1.0
> 576
> 27017
> Bracket Ceiling untuk semua merk projector,
> panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta
> 
> name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html
> 607136
> Nego
> 7
> 270/27017/bracket_lcd_plasma_3a-1274291780.JPG
> 2010-05-19 17:56:45
> [UPDATE] BRACKET Projector dan LCD/PLASMA TV
> 1
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 28
> 
>
> I want to get the AUC_CAT value (576) and using it in my PHP, how can I get
> that value?
> please help
> thanks before
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-take-a-value-from-the-query-result-tp1025119p1025119.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: DIH and Cassandra

2010-08-05 Thread Shalin Shekhar Mangar
On Thu, Aug 5, 2010 at 3:07 AM, Dennis Gearon  wrote:

> If data is stored in the index, isn't the index of Solr pretty much already
> a 'Big/Cassandra Table', except with tokenized columns to make searching
> easier?
>
> How are Cassandra/Big/Couch DBs doing text/weighted searching?
>
> Seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how
> many 'Tables'/indexes one can make using Solr, I'm still a newbie.
>
>
I don't think Mark wants to "duplicate" Solr's functionality through
Cassandra. He is just asking if he can use DIH to import data from his data
sources into Cassandra.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Load cores without restarting/reloading Solr

2010-08-05 Thread Karthik K
Can someone please answer this.

Is there a way of creating/adding a core and starting it without having to
reload Solr?


RE: Re: Load cores without restarting/reloading Solr

2010-08-05 Thread Markus Jelsma
http://wiki.apache.org/solr/CoreAdmin
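
For example, a new core can be created on a running Solr with something like
(host, core name and path are placeholders):

http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=/path/to/core1

and an existing core can be reloaded with action=RELOAD&core=core1.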
 
-Original message-
From: Karthik K 
Sent: Thu 05-08-2010 12:00
To: solr-user@lucene.apache.org; 
Subject: Re: Load cores without restarting/reloading Solr

Can some one please answer this.

Is there a way of creating/adding a core and starting it without having to
reload Solr ?


Re: Auto suggest with spell check

2010-08-05 Thread Grijesh.singh

Given below are the steps for auto-suggest and spellcheck in a single query.
Make the following change to the TermsComponent part of solrconfig.xml
(reconstructed; the XML was stripped by the archive):

<searchComponent name="termsComponent" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
    <str>spellcheck</str>
  </arr>
</requestHandler>

Use the query format given below to get auto-suggest and spellcheck
suggestions:
http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=computr&spellcheck.q=computr&spellcheck=true
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-with-spell-check-tp1015114p1025688.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH and Cassandra

2010-08-05 Thread Jon Baer
That is not 100% true.  I would think RDBMS and XML would be the most common
importers, but the real flexibility is with the TikaEntityProcessor [1] that
comes with DIH ...

http://wiki.apache.org/solr/TikaEntityProcessor

I'm pretty sure it would be able to handle any type of serde (in the case of
Cassandra I believe it is Thrift) on its own with the dependency libraries.

I find the TEP to be underutilized sometimes; I think that's because the DIH
docs say little about what it can do.

[1] - http://tika.apache.org
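
(A minimal DIH sketch using TikaEntityProcessor on a single local file; the
file path and target field name are placeholders:)

<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="tika" processor="TikaEntityProcessor"
            url="/path/to/some-file.pdf" format="text">
      <!-- Tika exposes the extracted body under the "text" column -->
      <field column="text" name="content"/>
    </entity>
  </document>
</dataConfig>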

- Jon

On Aug 4, 2010, at 3:00 PM, Andrei Savu wrote:

> DIH only works with relational databases and XML files [1], you need
> to write custom code in order to index data from Cassandra.
> 
> It should be pretty easy to map documents from Cassandra to Solr.
> There are a lot of client libraries available [2] for Cassandra.
> 
> [1] http://wiki.apache.org/solr/DataImportHandler
> [2] http://wiki.apache.org/cassandra/ClientOptions
> 
> On Wed, Aug 4, 2010 at 6:41 PM, Mark  wrote:
>> Is it possible to use DIH with Cassandra either out of the box or with
>> something more custom? Thanks
>> 
> 
> 
> 
> -- 
> Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr



Using solr response json

2010-08-05 Thread Rakhi Khatwani
Hi,
I want to query Solr and convert my response object to a JSON string
using SolrJ.

When I query from my browser (with wt=json) I get the following result:
{
"responseHeader":{
"status":0,
"QTime":0},
"response":{"numFound":0,"start":0,"docs":[]
}}


 At the moment I am using google-gson (a third-party API) to directly
convert an object into a JSON string,
but somehow when I try converting a QueryResponse object into a JSON string
I get:

{"_header":{"nvPairs":["status",0,"QTime",1]},"_results":[],"elapsedTime":121,"response":{"nvPairs":["responseHeader",{"nvPairs":["status",0,"QTime",1]},"response",[]]}}

 Any pointers?

Regards
Raakhi.


Process entire result set

2010-08-05 Thread Eloi Rocha
Hi everybody,

I would like to know whether it makes sense to use Solr in the following
scenario:
  - search for a large amount of data (like 1000, 1, 10 records)
  - each record contains four or five fields (strings and integers)
  - every request will ask for the entire result set (I can paginate the
results). It would be much better to get all results at once
  - we need to process the entire set in order to decide which ones will be
returned
  - this kind of request will happen frequently on several machines (several
transactions per second)
  - Solr machines and requesting machines will be in the same cluster
  - we would like to get the entire result set in less than 500ms.

Thanks in advance,

Eloi


Re: Load cores without restarting/reloading Solr

2010-08-05 Thread Mark Miller
On 8/5/10 5:59 AM, Karthik K wrote:
> Can some one please answer this.
> 
>  Is there a way of creating/adding a core and starting it without having to
> reload Solr ?
> 

Yes, see http://wiki.apache.org/solr/CoreAdmin

- Mark
lucidimagination.com


word delimiter

2010-08-05 Thread j
I have UPPER12-lower and would like to be able to find it with queries
"UPPER" or "lower". What should break this up for the index? A
tokenizer or a filter such as WordDelimiterFilterFactory?

I have tried various combinations of parameters to
WordDelimiterFilterFactory and can't get it to split properly. Here are
the results from using the standard tokenizer followed directly by
WordDelimiterFilterFactory (from analysis.jsp; table reconstructed from the
archive's stripped output):

position:  1              2
terms:     UPPER12-lower  lower
           UPPER
           12

[the WordDelimiterFilterFactory markup itself was stripped by the archive]


Re: No "group by"? looking for an alternative.

2010-08-05 Thread kenf_nc

In the size 'facet' you have values that may not be in red, but in the size
'field' of any individual document you won't. If you searched on
q=converse&fq=color:red, the shoes returned would have appropriate sizes in
their fields. Having a facet value for size 10 means at least 1 shoe in your
potential result set has that size in red; it doesn't mean the shoe you got
back in position 1 does.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1026581.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: word delimiter

2010-08-05 Thread Ahmet Arslan
> I have UPPER12-lower and would like
> to be able to find it with queries
> "UPPER" or "lower". What should break this up for the
> index? A
> tokenizer or a filter such as WordDelimiterFilterFactory?

If that's all you want, LowerCaseTokenizer alone will be enough.
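
(Ahmet's fieldType markup was stripped by the archive; a minimal sketch of
what such a type looks like -- the name text_lc is a placeholder:)

<fieldType name="text_lc" class="solr.TextField">
  <analyzer>
    <!-- LowerCaseTokenizer splits on non-letters (here: the digits and the
         hyphen) and lowercases, indexing "upper" and "lower" -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>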


  


Re: No "group by"? looking for an alternative.

2010-08-05 Thread Mickael Magniez

I've got only one document per shoe, whatever its size or color.

My first try was to create one document per model/size/color, but when I
search for 'converse', for example, the same shoe is retrieved several
times, and I want to show only one record for each model. But I don't
succeed in grouping results by shoe model.

If you look at
http://www.amazon.com/s/ref=nb_sb_noss?url=node%3D679255011&field-keywords=Converse+All+Star+Leather+Hi+Chuck+Taylor+&x=0&y=0&ih=1_0_0_0_0_0_0_0_0_0.4136_1&fsc=-1
amazon for Converse All Star Leather Hi Chuck Taylor,
they show the shoe only one time, but if you go to the product details, it
exists in several colors and sizes. Now if you filter on color, there are
fewer sizes available.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1026618.html
Sent from the Solr - User mailing list archive at Nabble.com.


get-colt

2010-08-05 Thread Sai . Thumuluri
Hi - I am trying to compile Solr source and during "ant dist" step, the
build times out on 

get-colt:
  [get] Getting:
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
  [get] To:
/opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0.
jar

After a while, the step fails, giving the following message:

BUILD FAILED
/opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79:
java.net.ConnectException: Connection timed out

Any help is greatly appreciated!

Sai Thumuluri




RE: get-colt

2010-08-05 Thread Sai . Thumuluri
This is the message I am getting 

Error getting
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar

-Original Message-
From: sai.thumul...@verizonwireless.com
[mailto:sai.thumul...@verizonwireless.com] 
Sent: Thursday, August 05, 2010 1:15 PM
To: solr-user@lucene.apache.org
Subject: get-colt

Hi - I am trying to compile Solr source and during "ant dist" step, the
build times out on 

get-colt:
  [get] Getting:
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
  [get] To:
/opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0.
jar

After a while - the steps fails giving the following message

BUILD FAILED
/opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79:
java.net.ConnectException: Connection timed out

Any help is greatly appreciated?

Sai Thumuluri




Re: question about relevance

2010-08-05 Thread Bharat Jain
Thank you for all the help. Greatly appreciated. I have seen the related
issues and I see a lot of patches in the JIRAs mentioned. I am really confused
about which patch to use (please excuse my ignorance). Also, are the patches
production-ready? I will greatly appreciate it if you can point me to the
correct patch, or is it that I have to apply all the patches to make it
work? Can I apply the patch to Solr 1.3?

Thanks
Bharat Jain


On Sat, Jul 31, 2010 at 2:16 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> May I suggest looking at some of the related issues, say SOLR-1682
>
>
> This issue is related to:
>  SOLR-1682 Implement CollapseComponent
>  SOLR-1311 pseudo-field-collapsing
>  LUCENE-1421 Ability to group search results by field
>  SOLR-1773 Field Collapsing (lightweight version)
>  SOLR-237  Field collapsing
>
>
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Bharat Jain 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, July 30, 2010 10:40:19 AM
> > Subject: Re: question about relevance
> >
> > Hi,
> >    Thanks a lot for the info and your time. I think field collapse will
> > work
> > for us. I looked at https://issues.apache.org/jira/browse/SOLR-236 but
> > which file should I use for the patch? We use solr-1.3.
> >
> > Thanks
> > Bharat Jain
> >
> >
> > On Fri,  Jul 30, 2010 at 12:53 AM, Chris Hostetter
> > wrote:
> >
> > >
> > > : 1. There are user records of type A, B, C etc. (userId field in
> > > : index is common to all records)
> > > : 2. A user can have any number of A, B, C etc (e.g. think of A being a
> > > : language; then a user can know many languages like french, english,
> > > : german etc)
> > > : 3. Records are currently stored as a document in index.
> > > : 4. A given query can match multiple records for the user
> > > : 5. If for a user more records are matched (e.g. if he knows both
> > > : french and german) then he is more relevant and should come top in
> > > : UI. This is the reason I wanted to add lucene scores, assuming the
> > > : greater score means more relevance.
> > >
> > > if your goal is to get back "users" from each search, then you should
> > > probably change your indexing strategy so that each "user" has a single
> > > document -- fields like "language" can be multivalued, etc...
> > >
> > > then a search for "language:en language:fr" will return users who speak
> > > english or french, and the ones that speak both will score higher.
> > >
> > > if you really can't change the index structure, then essentially what
> > > you are looking for is a "field collapsing" solution on the userId
> > > field, where you want each collapsed group to get a cumulative score.
> > > i don't know if the existing field collapsing patches support this --
> > > if you are already willing/capable to do it in the client then that may
> > > be the simplest thing to support moving forward.
> > >
> > > Adding the scores is certainly one metric you could use -- it's
> > > generally suspicious to try and imply too much meaning to scores in
> > > lucene/solr, but that's because people typically try to imply broader
> > > absolute meaning. in the case of a single query the scores are relative
> > > to each other, and adding up all the scores for a given userId is
> > > approximately what would happen in my example above -- except that
> > > there is also a "coord" factor that would penalize documents that only
> > > match one clause ... it's complicated, but as an approximation adding
> > > the scores might give you what you are looking for -- only you can know
> > > for sure based on your specific data.
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> >
>


anti-words - exact match

2010-08-05 Thread Satish Kumar
Hi,

We have a requirement to NOT display search results if the user query contains
terms that are in our anti-words field. For example, if the user query is "I
have swollen foot" and some records in our index have "swollen foot" in the
anti-words field, we don't want to display those records. How do I go about
implementing this?

NOTE 1: the anti-words field can contain multiple values. Each value can be
one or more words (e.g. "swollen foot", "headache", etc.)

NOTE 2: the match must be exact. If the anti-words field contains "swollen foot"
and the user query is "I have swollen foot", the record must be excluded. If the
user query is "My foot is swollen", the record should not be excluded.

Any pointers are greatly appreciated!


Thanks,
Satish


Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-05 Thread Ravi Kiran
Hello Mr. Horsetter,
I again tried the code from trunk
'https://svn.apache.org/repos/asf/lucene/dev/trunk' on a Solr 1.4 index, and it
gave me the following IndexFormatTooOldException, which in the first place
prompted me to think the indexes are incompatible. Any ideas?

java.lang.RuntimeException:
org.apache.lucene.index.IndexFormatTooOldException: Format version is not
supported in file '_1d60.fdx': 1 (needs to be between 2 and 2). This version
of Lucene only supports indexes created with release 3.0 and later. at
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1067) at
org.apache.solr.core.SolrCore.<init>(SolrCore.java:582) at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:453) at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:308) at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:198) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:273)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:385)
at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:119)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4529)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:5348)
at com.sun.enterprise.web.WebModule.start(WebModule.java:353) at
com.sun.enterprise.web.LifecycleStarter.doRun(LifecycleStarter.java:58) at
com.sun.appserv.management.util.misc.RunnableBase.runSync(RunnableBase.java:304)
at
com.sun.appserv.management.util.misc.RunnableBase.run(RunnableBase.java:341)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at
java.util.concurrent.FutureTask.run(FutureTask.java:138) at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619) Caused by:
org.apache.lucene.index.IndexFormatTooOldException: Format version is not
supported in file '_1d60.fdx': 1 (needs to be between 2 and 2). This version
of Lucene only supports indexes created with release 3.0 and later. at
org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:109) at
org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:242)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:523) at
org.apache.lucene.index.SegmentReader.get(SegmentReader.java:494) at
org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:133) at
org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:28)
at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:98)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:92) at
org.apache.lucene.index.IndexReader.open(IndexReader.java:415) at
org.apache.lucene.index.IndexReader.open(IndexReader.java:294) at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1056) ... 21 more




Ravi Kiran Bhaskar

On Tue, Aug 3, 2010 at 11:15 AM, Ravi Kiran  wrote:

> Hello Mr.Hostetter,
> Thank you very much for the clarification. I do
> remember that when I first deployed the solr code from trunk on a test
> server I couldnt open the index (created via 1.4) even via the solr admin
> page, It kept giving me corrupted index EOF kind of exception, so I was
> curious. Let me try it out again and report to you with the exact error.
>
>
> On Mon, Aug 2, 2010 at 4:28 PM, Chris Hostetter 
> wrote:
>
>> : I am trying to use the solr code from '
>> : https://svn.apache.org/repos/asf/lucene/dev/trunk' as my design
>> warrants use
>> : of PolyType fields. My understanding is that the indexes are
>> incompatible,
>> : am I right ?. I have about a million docs in my index (indexed via solr
>> : 1.4). Is re-indexing my only option or is there a tool of some sort to
>> : convert the 1.4 index to 3.1 format ?
>>
>> a) the "trunk" is what will ultimately be Solr 4.x, not 3.x ... for the
>> 3.x line there is a 3x branch...
>>
>> http://wiki.apache.org/solr/Solr3.1
>> http://wiki.apache.org/solr/Solr4.0
>>
>> b) The 3x branch can read indexes created by Solr 1.4 -- the first time
>> you add a doc and commit the new segments wil automaticly be converted to
>> the new format.  I am fairly certian that as of this moment, the 4x trunk
>> can also read indexes created by Solr 1.4, with the same automatic
>> converstion taking place.
>>
>> c)  If/When the trunk can no longer read Solr 1.4 indexes, there will be
>> a tool provided for "upgra

RE: get-colt

2010-08-05 Thread Sai . Thumuluri
Got it working - had to manually copy the jar files under the contrib
directories

-Original Message-
From: sai.thumul...@verizonwireless.com
[mailto:sai.thumul...@verizonwireless.com] 
Sent: Thursday, August 05, 2010 2:00 PM
To: solr-user@lucene.apache.org
Subject: RE: get-colt

This is the message I am getting 

Error getting
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar

-Original Message-
From: sai.thumul...@verizonwireless.com
[mailto:sai.thumul...@verizonwireless.com] 
Sent: Thursday, August 05, 2010 1:15 PM
To: solr-user@lucene.apache.org
Subject: get-colt

Hi - I am trying to compile Solr source and during "ant dist" step, the
build times out on 

get-colt:
  [get] Getting:
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
  [get] To:
/opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0.
jar

After a while - the steps fails giving the following message

BUILD FAILED
/opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79:
java.net.ConnectException: Connection timed out

Any help is greatly appreciated?

Sai Thumuluri




Re: get-colt

2010-08-05 Thread Koji Sekiguchi

(10/08/06 2:14), sai.thumul...@verizonwireless.com wrote:

Hi - I am trying to compile Solr source and during "ant dist" step, the
build times out on

get-colt:
   [get] Getting:
http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
   [get] To:
/opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0.
jar

After a while - the steps fails giving the following message

BUILD FAILED
/opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error
occurred while executing this line:
/opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79:
java.net.ConnectException: Connection timed out

Any help is greatly appreciated?

Sai Thumuluri



   

Sai,

If there is a proxy in your environment, specify the proxy host
and port (and optionally user and password):

$ ant dist -Dproxy.host=HOST -Dproxy.port=PORT -Dproxy.user=USER
-Dproxy.password=PASSWORD


Koji

--
http://www.rondhuit.com/en/



Re: No "group by"? looking for an alternative.

2010-08-05 Thread Geert-Jan Brits
If I understand correctly:
1. products have different product variants (in the case of shoes, a
combination of color and size + some other fields).
2. Each product is shown once in the result set (so no multiple product
variants of the same product are shown).

This would solve that IMO:

1. create 1 document per product (so not a document per product-variant)
2. create a multivalued field on which to facet, containing all combinations
of <size>-<color>-<...> (the bracketed values were stripped by the archive)
3. make sure to include combinations in which the user is indifferent to a
particular filter, i.e.: "don't care about size (dc)" + "red" --> "dc-red"
4. filtering on that combination would give you all the products that
satisfy the product-variant constraints (size, color, etc.) + the extra
product constraints ("converse")
5. on the detail page, show all available product-variants not filtered by
the constraints specified. This would likely be something outside of Solr (a
simple SQL select on a single product)
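
(As a hypothetical illustration: a red size-10 shoe might carry facet values
like "10-red", "dc-red", "10-dc" and "dc-dc", so a user who picked red but no
size is matched by fq=variants:dc-red -- the field name is assumed.)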

hope that helps,
Geert-Jan

2010/8/5 Mickael Magniez 

>
> I've got only one document per shoes, whatever its size or color.
>
> My first try was to create one document per model/size/color, but when i
> searche for 'converse' for example, the same shoe is retrieved several
> times, and i want to show only one record for each model. But I don't
> succeed in grouping results by shoe model.
>
> If you look at
>
> http://www.amazon.com/s/ref=nb_sb_noss?url=node%3D679255011&field-keywords=Converse+All+Star+Leather+Hi+Chuck+Taylor+&x=0&y=0&ih=1_0_0_0_0_0_0_0_0_0.4136_1&fsc=-1
> amazon for Converse All Star Leather Hi Chuck Taylor  .
> They show the shoe only one time, but if you go on the product details, its
> exists in several colors and sizes. Now if you filter or color, there is
> less sizes available.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1026618.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: No "group by"? looking for an alternative.

2010-08-05 Thread Jonathan Rochkind

Mickael Magniez wrote:

Thanks for your response.

Unfortunately, I don't think it'll be enough. In fact, I have many other
products than shoes in my index, with many other facets fields.

I simplified my schema : in reality facets are dynamic fields.
  


You could change the way you do indexing, so every product-color-size
combo is its own "document".


Document1:
   product: running shoe
   size: 12
   color: red

Document2:
   product: running shoe
  size: 13
   color: red

That would let you do the kind of faceting drill-down you want to do.
It would of course make other things more complicated. But it's the only
way I can think of to let you do the kind of facet drill-down you want,
if I understand what you want correctly, which I may not.


Jonathan





Re: anti-words - exact match

2010-08-05 Thread Jonathan Rochkind
This is tricky. You could try doing something with the ShingleFilter 
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory) 
at _query time_ to turn the user's query:


"i have a swollen foot" into:
"i", "i have", "i have a", "i have a swollen",  "have", "have a", 
"have a swollen"... etc.


I _think_ you can get the ShingleFilter factory to do that.

But now you only want to exclude if one of those shingles matches the 
ENTIRE "anti-word". So maybe index as non-tokenized, so each of those 
shingles will somehow only match on the complete thing.  You'd want to 
normalize spacing and punctuation.


But then you need to turn that into a _negated_ element of your query. 
Perhaps by using an fq with a NOT/"-" in it? And a query which 'matches' 
(causing 'not' behavior) if _any_ of the shingles match.


I have no idea if it's actually possible to put these things together in 
that way. A non-tokenized field? Which still has its queries
shingle-ized at query time? And then works as a negated query, matching 
for negation if any of the shingles match?  Not really sure how to put 
that together in your solrconfig.xml and/or application logic if needed. 
You could try.


Another option would be doing the query-time 'shingling' in your app, 
and then it's a somewhat more normal Solr query. &fq= -"shingle one" 
-"shingle two" -"shingle three" etc.  Or put em in seperate fq's 
depending on how you want to use your filter cache. Still searching on a 
non-tokenized field, and still normalizing on white-space and 
punctuation at both index time and (using same normalization logic but 
in your application logic this time) query time.  I think that might work.
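
(Concretely -- the field name antiwords and the lowercase/single-space
normalization are assumptions -- the app-side query for "I have swollen foot"
might carry one negated clause per shingle:

  &fq=-antiwords:"i have swollen foot" -antiwords:"have swollen foot"
      -antiwords:"swollen foot" -antiwords:"i have" ...

so a record is dropped as soon as any one shingle exactly matches one of its
anti-word values.)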


So I'm not really sure, but maybe that gives you some ideas.

Jonathan



Satish Kumar wrote:

Hi,

We have a requirement to NOT display search results if user query contains
terms that are in our anti-words field. For example, if user query is "I
have swollen foot" and if some records in our index have "swollen foot" in
anti-words field, we don't want to display those records. How do I go about
implementing this?

NOTE 1: anti-words field can contain multiple values. Each value can be a
one or multiple words (e.g. "swollen foot", "headache", etc. )

NOTE 2: the match must be exact. If anti-words field contains "swollen foot"
and if user query is "I have swollen foot", record must be excluded. If user
query is "My foot is swollen", the record should not be excluded.

Any pointers is greatly appreciated!


Thanks,
Satish

  


Re: Solr searching performance issues, using large documents

2010-08-05 Thread Peter Spam
I've read through the DataImportHandler page a few times, and still can't 
figure out how to separate a large document into smaller documents.  Any hints? 
:-)  Thanks!

-Peter

On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote:

> Spanning won't work- you would have to make overlapping mini-documents
> if you want to support this.
> 
> I don't know how big the chunks should be- you'll have to experiment.
> 
> Lance
> 
> On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam  wrote:
>> What would happen if the search query phrase spanned separate document 
>> chunks?
>> 
>> Also, what would the optimal size of chunks be?
>> 
>> Thanks!
>> 
>> 
>> -Peter
>> 
>> On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote:
>> 
>>> Not that I know of.
>>> 
>>> The DataImportHandler has the ability to create multiple documents
>>> from one input stream. It is possible to create a DIH file that reads
>>> large log files and splits each one into N documents, with the file
>>> name as a common field. The DIH wiki page tells you in general how to
>>> make a DIH file.
>>> 
>>> http://wiki.apache.org/solr/DataImportHandler
>>> 
>>> From this, you should be able to make a DIH file that puts log files
>>> in as separate documents. As to splitting files up into
>>> mini-documents, you might have to write a bit of Javascript to achieve
>>> this. There is no data structure or software that implements
>>> structured documents.
>>> 
>>> On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam  wrote:
 Thanks for the pointer, Lance!  Is there an example of this somewhere?
 
 
 -Peter
 
 On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote:
 
> Ah! You're not just highlighting, you're snippetizing. This makes it 
> easier.
> 
> Highlighting does not stream- it pulls the entire stored contents into
> one string and then pulls out the snippet.  If you want this to be
> fast, you have to split up the text into small pieces and only
> snippetize from the most relevant text. So, separate documents with a
> common group id for the document it came from. You might have to do 2
> queries to achieve what you want, but the second query for the same
> query will be blindingly fast. Often <1ms.
> 
> Good luck!
> 
> Lance
> 
> On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam  wrote:
>> However, I do need to search the entire document, or else the 
>> highlighting will sometimes be blank :-(
>> Thanks!
>> 
>> - Peter
>> 
>> ps. sorry for the many responses - I'm rushing around trying to get this 
>> working.
>> 
>> On Jul 31, 2010, at 1:11 PM, Peter Spam wrote:
>> 
>>> Correction - it went from 17 seconds to 10 seconds - I was changing the 
>>> hl.regex.maxAnalyzedChars the first time.
>>> Thanks!
>>> 
>>> -Peter
>>> 
>>> On Jul 31, 2010, at 1:06 PM, Peter Spam wrote:
>>> 
 On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
 
> did you already try other values for hl.maxAnalyzedChars=2147483647
 
 Yes, I tried dropping it down to 21, but it didn't have much of an 
 impact (one search I just tried went from 17 seconds to 15.8 seconds, 
 and this is an 8-core Mac Pro with 6GB RAM - 4GB for java).
 
> ? Also regular expression highlighting is more expensive, I think.
> What does the 'fuzzy' variable mean? If you use this to query via
> "~someTerm" instead "someTerm"
> then you should try the trunk of solr which is a lot faster for fuzzy 
> or
> other wildcard search.
 
 "fuzzy" could be set to "*" but isn't right now.
 
 Thanks for the tips, Peter - this has been very frustrating!
 
 
 - Peter
 
> Regards,
> Peter.
> 
>> Data set: About 4,000 log files (will eventually grow to millions).  
>> Average log file is 850k.  Largest log file (so far) is about 70MB.
>> 
>> Problem: When I search for common terms, the query time goes from 
>> under 2-3 seconds to about 60 seconds.  TermVectors etc are enabled. 
>>  When I disable highlighting, performance improves a lot, but is 
>> still slow for some queries (7 seconds).  Thanks in advance for any 
>> ideas!
>> 
>> 
>> -Peter
>> 
>> 
>> -
>> 
>> 4GB RAM server
>> % java -Xms2048M -Xmx3072M -jar start.jar
>> 
>> -
>> 
>> schema.xml changes:
>> 
>>  
>>
>>  
>>
>>> generateWordParts="0" generate

Re: Process entire result set

2010-08-05 Thread Jonathan Rochkind

Eloi Rocha wrote:

Hi everybody,

I would like to know whether it makes sense to use Solr in the following
scenario:
  - search for a large amount of data (like 1000, 1, 10 records)
  - each record contains four or five fields (strings and integers)
  - every request will ask for the entire result set (I can paginate the
results). It would be much better to get all results at once [...]
  


Depends on what kinds of searching you're doing. Are you doing searching 
that needs an indexer like Solr?  Then Solr is a good tool for your job. 
 Are you not, and you can do what you want just as easily in an rdbms 
or non-sql store like MongoDB? Then I wouldn't use Solr.


Assuming you really do need Solr, I think this should work, but I would 
not store the actual stored fields in Solr, I'd store those fields in an 
external store (key-value store, rdbms, whatever).   You store only what 
you need to index in Solr, you do your search, you get ID's back.  You 
ask for the entire result set back, why not.  If you give Solr enough 
RAM, and set your cache settings appropriately (really big document and 
related caches), then I _think_ it should perform okay. One way to find 
out.


What you'd get back is just IDs; then you'd look up each ID in your
external store to get the actual fields you want to operate on. It _may_
not be necessary, maybe you could do it with Solr stored fields, but
making Solr do only exactly what you really need from it (an index) will
maximize its ability to do what you need in available RAM.
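
(For example -- host and field names assumed -- a single request returning
the whole ID set could look like:

  http://localhost:8983/solr/select?q=your+query&rows=100000&fl=id

with rows set at least as large as the expected result count.)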


If you don't need Solr/Lucene indexing/faceting behavior, and you can do 
just fine with an rdbms or non-sql store, use that.


Jonathan


Re: Sharing index files between multiple JVMs and replication

2010-08-05 Thread Lance Norskog
Oh yes, replication will not work for shared files. It is about making
your own copy from another machine.

There is no read-only option, but there should be. The files and
directory can be read-only; I've done it. You could use the OS
permission system to enforce read-only. Then you can just do a
RELOAD against the read-only instances, and this will reload the
index without changing it.
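
(The RELOAD goes through the CoreAdmin API; host and core name are
placeholders:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0 )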

Lance

On Wed, Aug 4, 2010 at 10:42 AM, Kelly Taylor  wrote:
> Is anybody else encountering these same issues, if you have a similar
> setup? And is there a way to configure certain Solr web-apps as read-only
> (basically dummy instances) so that index changes are not allowed?
>
>
>
> - Original Message 
> From: Kelly Taylor 
> To: solr-user@lucene.apache.org
> Sent: Tue, August 3, 2010 5:48:11 PM
> Subject: Re: Sharing index files between multiple JVMs and replication
>
> Yes, they are on a common file server, and I've been sharing the same index
> directory between the Solr JVMs. But I seem to be hitting a wall when
> attempting to use just one instance for changing the index.
>
> With Solr replication disabled, I stream updates to the one instance, and this
> process hangs whenever there are additional Solr JVMs started up with the same
> configuration in solrconfig.xml  -  So I then tried, to no avail, using a
> different configuration, solrconfig-readonly.xml, where the updateHandler was
> commented out, all /update* requestHandlers removed, mainIndex lockType of
> none, etc.
>
> And with Solr replication enabled, the Slave seems to hang, or at least report
> unusually long time estimates for the current running replication process to
> complete.
>
>
> -Kelly
>
>
>
> - Original Message 
> From: Lance Norskog 
> To: solr-user@lucene.apache.org
> Sent: Tue, August 3, 2010 4:56:58 PM
> Subject: Re: Sharing index files between multiple JVMs and replication
>
> Are these files on a common file server? If you want to share them
> that way, it actually does work just to give them all the same index
> directory, as long as only one of them changes it.
>
> On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor  wrote:
>> Is there a way to share index files amongst my multiple Solr web-apps, by
>> configuring only one of the JVMs as an indexer, and the remaining, as
> read-only
>> searchers?
>>
>> I'd like to configure in such a way that on startup of the read-only
> searchers,
>> missing cores/indexes are not created, and updates are not handled.
>>
>> If I can get around the files being locked by the read-only instances, I
> should
>> be able to scale wider in a given environment, as well as have less 
>> replicated
>> copies of my master index (Solr 1.4 Java Replication).
>>
>> Then once the commit is issued to the slave, I can fire off a RELOAD script
> for
>> each of my read-only cores.
>>
>> -Kelly
>>
>>
>>
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Support loading queries from external files in QuerySenderListener

2010-08-05 Thread Lance Norskog
You can use an XInclude in solrconfig.xml. Your external query file
has to be in the XML format.
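
(A sketch of that approach, assuming your XML parser supports XInclude and a
file named warm-queries.xml next to solrconfig.xml. In solrconfig.xml:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="warm-queries.xml"/>
</listener>

and warm-queries.xml holds the <arr name="queries"> element the listener
normally embeds inline:

<arr name="queries">
  <lst>
    <str name="q">solr</str><str name="start">0</str><str name="rows">10</str>
  </lst>
</arr>
)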

Lance

On Wed, Aug 4, 2010 at 7:57 AM, Shalin Shekhar Mangar
 wrote:
> On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw 
> wrote:
>
>> Hi all!
>> I cant load my custom queries from the external file, as written here:
>> https://issues.apache.org/jira/browse/SOLR-784
>>
>> This option is seems to be not implemented in current version 1.4.1 of
>> Solr.
>> It was deleted or it comes first with new version?
>>
>>
> That patch was never committed so it is not available in any release.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-05 Thread Chris Hostetter

: Hello Mr. Horsetter,

Please, call me Hoss.  "Mr. Horsetter" is ... well, frankly, I have no idea
who that is.

: I again tried the code from trunk '
: https://svn.apache.org/repos/asf/lucene/dev/trunk' on solr 1.4 index and it

Please note my previous comments...

: >> a) the "trunk" is what will ultimately be Solr 4.x, not 3.x ... for the
: >> 3.x line there is a 3x branch...
: >>
: >> http://wiki.apache.org/solr/Solr3.1
: >> http://wiki.apache.org/solr/Solr4.0
: >>
: >> b) The 3x branch can read indexes created by Solr 1.4 -- the first time
: >> you add a doc and commit the new segments wil automaticly be converted to
: >> the new format.  I am fairly certian that as of this moment, the 4x trunk
: >> can also read indexes created by Solr 1.4, with the same automatic
: >> converstion taking place.

...apparently I was mistaken about "trunk": it has already had the code
for reading Lucene 2.9 indexes (what's used in Solr 1.4) removed (hence
the "IndexFormatTooOldException").

But that doesn't change the fact that 3.1 will be able to read Solr 1.4
indexes.  And 4.0 will be able to read 3.1 indexes.

You should, in fact, be able to use the 3x branch code today to open your
Solr 1.4 index and add one document to have it convert to a 3x index; then
use the trunk code to open that index, add one document, and have it
convert to a "trunk" index.

Of course: there is no guarantee that the index format in the official 4.0
release will be the same as what's on trunk right now -- it hasn't
been officially released.

: >> c)  If/When the trunk can no longer read Solr 1.4 indexes, there will be
: >> a tool provided for "upgrading" index versions.

That should still be true in the official 4.0 release (I really should
have said "When 4.0 can no longer read Solr 1.4 indexes") ...
I haven't been following the details closely, but I suspect that tool
hasn't been written yet because there isn't much point until the full
details of the trunk index format are nailed down.


-Hoss



Re: Indexing fieldvalues with dashes and spaces

2010-08-05 Thread Erick Erickson
This confuses lots of people. When you index a field, it's analyzed 10
ways from Sunday. Consider "The World is an unknown Entity". When
you INDEX it, many things happen, depending upon the analyzer.
Stopwords may be removed. Each token may be lowercased. Each token
may be stemmed. It all depends on what's in your analyzer chain. Assume
a simple chain consisting of breaking up tokens on whitespace, lowercasing,
and removing stopwords. The actual tokens INDEXED would be "world",
"unknown", and "entity". That is what is searched against.

However, the string, unchanged, would be STORED if you specified it so.
So if you asked for the field to be returned as part of a search result
that matched on, say, "world", you would get back "The World is an
unknown Entity".
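
A minimal sketch of such a chain (the type name text_simple is a placeholder):

<fieldType name="text_simple" class="solr.TextField">
  <analyzer>
    <!-- break on whitespace, then lowercase, then drop stopwords -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
  </analyzer>
</fieldType>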

HTH
Erick

On Thu, Aug 5, 2010 at 4:31 AM, PeterKerk  wrote:

>
> @Michael, @Erick,
>
> You both mention interesting things that triggered me.
>
> @Erick:
> Your referenced page is very useful. It seems the whitespace tokenizer
> under
> the text_ws is causing issues.
>
> You do mention another interesting thing:
> "And do be aware that fields you get back from a request (i.e. a search)
> are
> the stored fields, NOT what's indexed."
>
> On the page you provided I see this under the Analyzers section: "Analyzers
> are components that pre-process input text at index time and/or at search
> time."
>
> So I dont completely understand how that sentence is in line with your
> comment.
>
>
> @Michael:
> You say: "use the tokenized field to return results, but have a duplicate
> field of fieldtype="string" to show the untokenized results. E.g. facet on
> that field."
> I think your comment applies on my requirement: "a city field is something
> that I want users to search on via text input, so lets say "New Yo" would
> give the results for "New York".
> But also a facet "Cities" is available in which "New York" is just one of
> the cities that is clickable.
> The other facet is "theme", which in my example holds values like
> "Gemeentehuis" and "Strand & Zee", that would not be a thing on which can
> be
> searched via manual input but IS clickable. "
>
> Could you please indicate (just for the above fields) what needs to be
> changed in my schema.xml and if so how that affects the way my request is
> build up?
>
>
> Thanks so much ahead in getting me started!
>
>
> This is my schema.xml
>
> [The quoted schema.xml markup was stripped by the archive. Only attribute
> fragments survive: omitNorms="true", sortMissingLast="true",
> positionIncrementGap="100", the WordDelimiterFilterFactory parameters
> (generateWordParts, generateNumberParts, catenateWords, catenateNumbers,
> catenateAll, splitOnCaseChange), synonym/stopwords.txt/protwords.txt
> references, and the uniqueKey "id" / defaultSearchField "text"
> declarations.]
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1025463.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Index compatibility 1.4 Vs 3.1 Trunk

2010-08-05 Thread Robert Muir
On Thu, Aug 5, 2010 at 9:07 PM, Chris Hostetter wrote:

>
> That should still be true in the the official 4.0 release (i really should
> have said "When 4.0 can no longer read SOlr 1.4 indexes"), ...
> i havne't been following the detials closely, but i suspect that tool
> hasn't been writen yet because there isn't much point until the full
> details of the trunk index format are nailed down.
>
>
This is news to me?

File formats are back-compatible between major versions. Version X.N should
be able to read indexes generated by any version after and including version
X-1.0, but may-or-may-not be able to read indexes generated by version
X-2.N.

(And personally I think there is stuff in 2.x, like modified UTF-8, that I
would object to adding support for with terms now as byte[].)

-- 
Robert Muir
rcm...@gmail.com


Re: XML Format

2010-08-05 Thread twojah

Can somebody help me please?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/XML-Format-tp1024608p1028456.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr searching performance issues, using large documents

2010-08-05 Thread Lance Norskog
You may have to write your own javascript to read in the giant field
and split it up.

On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam  wrote:
> I've read through the DataImportHandler page a few times, and still can't 
> figure out how to separate a large document into smaller documents.  Any 
> hints? :-)  Thanks!
>
> -Peter
>
> On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote:
>
>> Spanning won't work- you would have to make overlapping mini-documents
>> if you want to support this.
>>
>> I don't know how big the chunks should be- you'll have to experiment.
>>
>> Lance
>>
>> On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam  wrote:
>>> What would happen if the search query phrase spanned separate document 
>>> chunks?
>>>
>>> Also, what would the optimal size of chunks be?
>>>
>>> Thanks!
>>>
>>>
>>> -Peter
>>>
>>> On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote:
>>>
 Not that I know of.

 The DataImportHandler has the ability to create multiple documents
 from one input stream. It is possible to create a DIH file that reads
 large log files and splits each one into N documents, with the file
 name as a common field. The DIH wiki page tells you in general how to
 make a DIH file.

 http://wiki.apache.org/solr/DataImportHandler

 From this, you should be able to make a DIH file that puts log files
 in as separate documents. As to splitting files up into
 mini-documents, you might have to write a bit of Javascript to achieve
 this. There is no data structure or software that implements
 structured documents.

 On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam  wrote:
> Thanks for the pointer, Lance!  Is there an example of this somewhere?
>
>
> -Peter
>
> On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote:
>
>> Ah! You're not just highlighting, you're snippetizing. This makes it 
>> easier.
>>
>> Highlighting does not stream- it pulls the entire stored contents into
>> one string and then pulls out the snippet.  If you want this to be
>> fast, you have to split up the text into small pieces and only
>> snippetize from the most relevant text. So, separate documents with a
>> common group id for the document it came from. You might have to do 2
>> queries to achieve what you want, but the second query for the same
>> query will be blindingly fast. Often <1ms.
>>
>> Good luck!
>>
>> Lance
>>
>> On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam  wrote:
>>> However, I do need to search the entire document, or else the 
>>> highlighting will sometimes be blank :-(
>>> Thanks!
>>>
>>> - Peter
>>>
>>> ps. sorry for the many responses - I'm rushing around trying to get 
>>> this working.
>>>
>>> On Jul 31, 2010, at 1:11 PM, Peter Spam wrote:
>>>
 Correction - it went from 17 seconds to 10 seconds - I was changing 
 the hl.regex.maxAnalyzedChars the first time.
 Thanks!

 -Peter

 On Jul 31, 2010, at 1:06 PM, Peter Spam wrote:

> On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
>
>> did you already try other values for hl.maxAnalyzedChars=2147483647
>
> Yes, I tried dropping it down to 21, but it didn't have much of an 
> impact (one search I just tried went from 17 seconds to 15.8 seconds, 
> and this is an 8-core Mac Pro with 6GB RAM - 4GB for java).
>
>> ? Also regular expression highlighting is more expensive, I think.
>> What does the 'fuzzy' variable mean? If you use this to query via
>> "~someTerm" instead "someTerm"
>> then you should try the trunk of solr which is a lot faster for 
>> fuzzy or
>> other wildcard search.
>
> "fuzzy" could be set to "*" but isn't right now.
>
> Thanks for the tips, Peter - this has been very frustrating!
>
>
> - Peter
>
>> Regards,
>> Peter.
>>
>>> Data set: About 4,000 log files (will eventually grow to millions). 
>>>  Average log file is 850k.  Largest log file (so far) is about 70MB.
>>>
>>> Problem: When I search for common terms, the query time goes from 
>>> under 2-3 seconds to about 60 seconds.  TermVectors etc are 
>>> enabled.  When I disable highlighting, performance improves a lot, 
>>> but is still slow for some queries (7 seconds).  Thanks in advance 
>>> for any ideas!
>>>
>>>
>>> -Peter
>>>
>>>
>>> -
>>>
>>> 4GB RAM server
>>> % java -Xms2048M -Xmx3072M -jar start.jar
>>>
>>> -

Re: No "group by"? looking for an alternative.

2010-08-05 Thread Lance Norskog
I can see how one document per model blows up when you have many
options. But how many models of the shoe do they actually make? They
can't possibly make 5000, one for every metadata combination.

If you go with one document per model, you have to do a second search
on that product ID to get all of the models.

Field Collapsing is exactly for the 'many shoes for one product'
problem, but it is not released, so the second search is what you
want.

On Thu, Aug 5, 2010 at 4:54 PM, Jonathan Rochkind  wrote:
> Mickael Magniez wrote:
>>
>> Thanks for your response.
>>
>> Unfortunately, I don't think it'll be enough. In fact, I have many other
>> products than shoes in my index, with many other facets fields.
>>
>> I simplified my schema : in reality facets are dynamic fields.
>>
>
> You could change the way you do indexing, so every product-color-size combo
> is it's own "document".
>
> Document1:
>   product: running shoe
>   size: 12
>   color: red
>
> Document2:
>   product: running shoe
>  size: 13
>   color: red
>
> That would let you do the kind of facetting drill-down you want to do. It
> would of course make other things more complicated. But it's the only way I
> can think of to let you do the kind of facet drill-down you want, if I
> understand what you want correctly, which I may not.
>
> Jonathan
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Query Result is not updated based on the new XML files

2010-08-05 Thread twojah

hi everyone,
I run this query from the browser:
http://172.16.17.126:8983/search/select/?q=AUC_CAT:978

The query is based on cat_978.xml, which was produced by my PHP script,
and I got the correct result, like this (markup reconstructed; the archive
stripped the XML tags):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="q.op">AND</str>
      <str name="fl">AUC_ID,AUC_CAT,AUC_DESCR_SHORT</str>
      <str name="start">0</str>
      <str name="q">AUC_CAT:978</str>
      <str name="rows">1000</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="AUC_CAT">978</str>
      <str name="AUC_DESCR_SHORT">HP Compaq Presario V3700Core 2 duo webcam
wifi lan HD 160Gb DDR2 1Gb Tas original windows 7 ultimate</str>
      <str name="AUC_ID">618436123</str>
    </doc>
    <doc>
      <str name="AUC_CAT">978</str>
      <str name="AUC_DESCR_SHORT">HP Compaq Presario V3700Core 2 duo webcam
wifi lan HD 160Gb DDR2 1Gb Tas original windows 7 ultimate</str>
      <str name="AUC_ID">618436</str>
    </doc>
  </result>
</response>


Now I edit the AUC_ID field in cat_978.xml, changing 618436123 to 618436
(the first AUC_ID value above),
and I refresh the browser, but the result doesn't reflect the change I
made.
How do I make the query result follow the changes in cat_978.xml exactly?

really need your help
thanks before
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-Result-is-not-updated-based-on-the-new-XML-files-tp1028575p1028575.html
Sent from the Solr - User mailing list archive at Nabble.com.