Re: Data Import from a Queue

2011-07-20 Thread Stefan Matheis

Brandon,

i don't know how they are using it in detail, but part of Chef's
architecture is this:


Chef Server -> RabbitMQ -> Chef Solr Indexer -> Solr
http://wiki.opscode.com/download/attachments/7274878/chef-server-arch.png

Perhaps not exactly what you're looking for, but it may give you an idea?
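
To make the pattern concrete, here is a minimal consumer sketch (SolrJ plus
the JMS API; the broker URL, queue name "solr.docs" and field names are
invented for illustration, and committing when the queue drains is just one
possible strategy):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueueToSolr {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Connection conn =
            new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer =
            session.createConsumer(session.createQueue("solr.docs"));
        boolean uncommitted = false;
        while (true) {
            Message msg = consumer.receive(1000); // null once the queue is drained
            if (msg == null) {
                if (uncommitted) { solr.commit(); uncommitted = false; }
                continue;
            }
            MapMessage fields = (MapMessage) msg; // one document per message
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", fields.getString("id"));
            doc.addField("text", fields.getString("text"));
            solr.add(doc);
            uncommitted = true;
        }
    }
}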

Regards
Stefan

On 19.07.2011 19:04, Brandon Fish wrote:

Let me provide some more details to the question:

I was unable to find any example implementations where individual documents
(single document per message) are read from a message queue (like ActiveMQ
or RabbitMQ) and then added to Solr via SolrJ, an HTTP POST or another
method. Does anyone know of any available examples for this type of import?

If no examples exist, what would be a recommended commit strategy for
performance? My best guess for this would be to have a queue per core and
commit once the queue is empty.

Thanks.

On Mon, Jul 18, 2011 at 6:52 PM, Erick Erickson wrote:


This is a really cryptic problem statement.

you might want to review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fish wrote:

Does anyone know of any existing examples of importing data from a queue
into Solr?

Thank you.







Re: how to get solr core information using solrj

2011-07-20 Thread Stefan Matheis

Jiang,

what about http://wiki.apache.org/solr/CoreAdmin#STATUS ?
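
Via SolrJ, that STATUS call is wrapped by CoreAdminRequest. A minimal sketch
(server URL assumed):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

public class CoreStatus {
    public static void main(String[] args) throws Exception {
        // Point at the Solr root (not a core) so /admin/cores is reachable.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // null = status of all cores; pass "core0" for a single core.
        CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
        NamedList<NamedList<Object>> cores = status.getCoreStatus();
        for (int i = 0; i < cores.size(); i++) {
            System.out.println(cores.getName(i) + " -> " + cores.getVal(i));
        }
    }
}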

Regards
Stefan

On 20.07.2011 05:40, Jiang mingyuan wrote:

hi all,

Our Solr server contains two cores, core0 and core1, and they both work well.

Now I'm trying to find a way to get information about core0 and core1.

Can SolrJ or another API do this?


thanks very much.



suggester component from trunk throwing error

2011-07-20 Thread abhayd
hi,
I am trying to configure the suggester component. I downloaded Solr from trunk
and did a build.

here is my config (request handler and search component):

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

When I build my index, the index gets created but I get the following
exception:
Jul 20, 2011 2:32:00 AM
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener
buildSpellIndex
INFO: Building spell index for spell checker: suggest
Jul 20, 2011 2:32:00 AM org.apache.solr.spelling.suggest.Suggester build
INFO: build()
Jul 20, 2011 2:32:00 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchMethodError:
org.apache.lucene.index.IndexReader.fields()Lorg/apache/lucene/index/Fields;
at
org.apache.lucene.index.MultiFields.getFields(MultiFields.java:64)
at
org.apache.lucene.index.MultiFields.getFields(MultiFields.java:69)
at
org.apache.lucene.index.MultiFields.getTerms(MultiFields.java:142)
at
org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.<init>(HighFrequencyDictionary.java:65)
at
org.apache.lucene.search.spell.HighFrequencyDictionary.getWordsIterator(HighFrequencyDictionary.java:54)
at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:63)
at
org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:136)
at
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.buildSpellIndex(SpellCheckComponent.java:373)
at
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:358)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Any help?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/suggester-component-from-trunk-throwing-error-tp3184736p3184736.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: - character in search query

2011-07-20 Thread roySolr
Here is my complete fieldtype:
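
[The fieldType XML did not survive the archive. A representative definition
of the shape described, with the tokenizer, pattern and filter order assumed
rather than taken from the original config:]

<fieldType name="text_clean" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strips characters such as "-" out of each token -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-zA-Z0-9]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>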

In the Field Analysis page I see that the '-' is removed by the
PatternReplaceFilter. When I escape the term ($q =
SolrUtils::escapeQueryChars($q);) I see something like this in my debugQuery
(term = arsenal - london):

+((DisjunctionMaxQuery((name:arsenal)~1.0) DisjunctionMaxQuery((name:"\
london"~1.0))~2) ()

When I don't escape the query I get something like this:

+((DisjunctionMaxQuery((name:arsenal)~1.0)
-DisjunctionMaxQuery((name:london)~1.0))~1) ()

The '-' in my term is turned into a prohibited -DisjunctionMaxQuery clause.
How can I fix this problem? What is the easiest way?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/character-in-search-query-tp3168604p3184805.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Can anyone throw some light on this issue?

My problem is: give a query-time boost to certain documents which have a
field, say field1, in the range that the user chooses at query time. I
think the link below is a range query:

http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10

But, apart from that, how can I indicate a boost for the condition
field1:[10%20TO%2030]?

I tried using &bq=field1:[20 TO 25] and also &bq=field1:[20 TO 25]^10,
but I am not able to figure out what these two mean from the results,
because I get the top result as a document where field1 is 40 in this
case, after using the &bq clause. I increased the boost to 10, 20, 50, 100, but the
results don't change at all.

S.

On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B.  wrote:

> Hi
>
> Is query time boosting possible in Solr?
>
> Here is what I want to do: I want to boost the ranking of certain
> documents, which have their relevant field values, in a particular range
> (selected by user at query time)...
>
> when I do something like:
>
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10
> -I guess it is just a filter over the normal results and not exactly a
> query.
>
> I tried giving this:
>
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> -This still worked and gave me different results. But I did not quite
> understand what this second query meant. Does it mean: "Rank those documents
> with a field1 value in 10-30 higher than those without"?
>
> S
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Solr UI

2011-07-20 Thread Gora Mohanty
On Tue, Jul 19, 2011 at 7:51 PM, Erik Hatcher  wrote:
> There are several starting points for Solr UI out there, but really the best 
> choice is whatever fits your environment and the skills/resources you have 
> handy.  Here's a few off the top of my head -
[...]

Besides these excellent examples, if you are looking at Python/Django,
Haystack works well as a starting point, though:
* One does have to build a template/view architecture around it,
  but that is fairly easy to do.
* Haystack allows multiple search back-ends, and while that is
  convenient for starting out, it does not implement some Solr
  features. E.g., one big missing item is support for multi-core
  Solr.

Regards,
Gora


Re: any detailed tutorials on plugin development?

2011-07-20 Thread Gora Mohanty
On Wed, Jul 20, 2011 at 6:29 AM, deniz  wrote:
> gosh, sorry for the typo in my first msg... i just realized it now... well
> anyway...
>
> i would like to find a detailed tutorial about how to implement an analyzer
> or a request handler plugin... but all i have found is nothing in the
> documentation of the solr wiki...

Does this not help: http://wiki.apache.org/solr/SolrPlugins ?

Google also turns up multiple examples, e.g.,
http://e-mats.org/2008/06/writing-a-solr-analysis-filter-plugin/
I remember using that blog as a starting point for writing
a custom plugin.

Regards,
Gora


Re: - character in search query

2011-07-20 Thread roySolr
When I use the edismax handler the escaping works great (before, I used the
dismax handler). The debugQuery shows me this:

+((DisjunctionMaxQuery((name:arsenal)~1.0)
DisjunctionMaxQuery((name:london)~1.0))~2

The "\" is not in the parsed query, so I get the results I wanted. I don't
know why the dismax handler works this way.

Can someone tell me the difference between the dismax and edismax handlers?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/character-in-search-query-tp3168604p3184941.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any detailed tutorials on plugin development?

2011-07-20 Thread samuele.mattiuzzo
actually i'm rewriting the http://wiki.apache.org/solr/UpdateRequestProcessor
wiki page with a more detailed how-to; it will be ready and online
after i get back from work!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-detailed-tutorials-on-plugin-development-tp3177821p3184990.html
Sent from the Solr - User mailing list archive at Nabble.com.


term positions performance

2011-07-20 Thread Marco Martinez
Hi,

I am developing a new term-proximity query and I am using term positions
to get the positions of each term. I want to know if there are any tips to
increase the performance of using term positions, at index time or at query
time. All the fields to which I apply term positions are indexed.

Thanks in advance,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: term positions performance

2011-07-20 Thread Marco Martinez
Also, I developed this query via a function query; I wonder if doing it via a
normal query would increase the performance.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Marco Martinez 

> Hi,
>
> I am developing a new term-proximity query and I am using term
> positions to get the positions of each term. I want to know if there are any
> tips to increase the performance of using term positions, at index time or at
> query time. All the fields to which I apply term positions are indexed.
>
> Thanks in advance,
>
> Marco Martínez Bautista
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42
>


Re: POST VS GET and NON English Characters

2011-07-20 Thread Sujatha Arun
Paul,

I added the following line to catalina.sh and restarted the server, but this
does not seem to help.


JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"
Regards
Sujatha

On Sun, Jul 17, 2011 at 3:51 AM, Paul Libbrecht  wrote:

> If you have the option, try setting the default charset of the
> servlet-container to utf-8.
> Typically this is done by setting a system property on startup.
>
> My experience has been that the default used to be utf-8 but it is less and
> less so, and sometimes in a surprising way!
>
> paul
>
>
> On Jul 16, 2011, at 05:34, Sujatha Arun wrote:
>
> > It works fine with the GET method, but I am wondering why it does not with
> the POST
> > method.
> >
> > 2011/7/15 pankaj bhatt 
> >
> >> Hi Arun,
> >> This looks like an encoding issue to me.
> >>  Can you change your browser settings to UTF-8 and hit the search
> url
> >> via the GET method?
> >>
> >>   We faced a similar problem with Chinese and Korean languages; this
> >> solved the problem.
> >>
> >> / Pankaj Bhatt.
> >>
> >> 2011/7/15 Sujatha Arun 
> >>
> >>> Hello,
> >>>
> >>> We have implemented Solr search in several languages. Initially we used
> >> the
> >>> "GET" method for querying, but later moved to the "POST" method to
> >> accommodate
> >>> lengthy queries.
> >>>
> >>> When we moved from GET to POST, the German characters could no
> >>> longer be searched and I had to use the function utf8_decode in my
> >>> application for the search to work for German characters.
> >>>
> >>> Currently I am doing this while querying using the POST method; we are
> >>> using
> >>> the standard Request Handler
> >>>
> >>>
> >>> $this->_queryterm=iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
> >>> $this->_queryterm);
> >>>
> >>>
> >>> This makes the query work for German characters and other languages but
> >>> does
> >>> not work for certain characters in Lithuanian and Spanish. Example:
> >>> Not working:
> >>>
> >>>  - Iš
> >>>  - Estremadūros
> >>>  - sNaująjį
> >>>  - MEDŽIAGOTYRA
> >>>  - MEDŽIAGOS
> >>>  - taškuose
> >>>
> >>> Working:
> >>>
> >>>  - garbę
> >>>  - ieškoti
> >>>  - ispanų
> >>>
> >>> Any ideas /input  ?
> >>>
> >>> Regards
> >>> Sujatha
> >>>
> >>
>
>


Re: POST VS GET and NON English Characters

2011-07-20 Thread François Schiettecatte
You need to do something like this in Tomcat's ./conf/server.xml file:

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" ... />
See 'URIEncoding' in http://tomcat.apache.org/tomcat-7.0-doc/config/http.html

Note that this will assume the data is encoded in utf-8 if (and
ONLY if) the charset parameter is not set in the HTTP request Content-Type
header. The header looks like this:

Content-Type: text/plain; charset=UTF-8

Also note that most browsers encode data in ISO-8859-1 unless overridden in the
browser settings or by the content type and charset set in the HTML in case you
are using a form. You can do that either by setting it in the HTTP response
Content-Type header (like above), or as a meta tag like this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Hope this helps.

Cheers

François



On Jul 20, 2011, at 7:20 AM, Sujatha Arun wrote:

> Paul ,
> 
> I added the following line to catalina.sh and restarted the server, but this
> does not seem to help.
> 
> 
> JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"
> Regards
> Sujatha
> 
> On Sun, Jul 17, 2011 at 3:51 AM, Paul Libbrecht  wrote:
> 
>> If you have the option, try setting the default charset of the
>> servlet-container to utf-8.
>> Typically this is done by setting a system property on startup.
>> 
>> My experience has been that the default used to be utf-8 but it is less and
>> less so, and sometimes in a surprising way!
>> 
>> paul
>> 
>> 
>> On Jul 16, 2011, at 05:34, Sujatha Arun wrote:
>> 
>>> It works fine with the GET method, but I am wondering why it does not with
>> the POST
>>> method.
>>> 
>>> 2011/7/15 pankaj bhatt 
>>> 
 Hi Arun,
 This looks like an encoding issue to me.
  Can you change your browser settings to UTF-8 and hit the search
>> url
 via the GET method?
 
  We faced a similar problem with Chinese and Korean languages; this
 solved the problem.
 
 / Pankaj Bhatt.
 
 2011/7/15 Sujatha Arun 
 
> Hello,
> 
> We have implemented Solr search in several languages. Initially we used
 the
> "GET" method for querying, but later moved to the "POST" method to
 accommodate
> lengthy queries.
> 
> When we moved from GET to POST, the German characters could no
> longer be searched and I had to use the function utf8_decode in my
> application for the search to work for German characters.
> 
> Currently I am doing this while querying using the POST method; we are
> using
> the standard Request Handler
> 
> 
> $this->_queryterm=iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
> $this->_queryterm);
> 
> 
> This makes the query work for German characters and other languages but
> does
> not work for certain characters in Lithuanian and Spanish. Example:
> Not working:
> 
> - Iš
> - Estremadūros
> - sNaująjį
> - MEDŽIAGOTYRA
> - MEDŽIAGOS
> - taškuose
> 
> Working:
> 
> - garbę
> - ieškoti
> - ispanų
> 
> Any ideas /input  ?
> 
> Regards
> Sujatha
> 
 
>> 
>> 



Solr 3.3: Exception in thread "Lucene Merge Thread #1"

2011-07-20 Thread mdz-munich
Dear Devs and Users,

it is I! 

Okay, it starts with this:

Exception in thread "Lucene Merge Thread #1"
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Map
failed
at
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
... 10 more


And then it quickly moves on to this:

SEVERE: auto commit error...
java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:88)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:114)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:677)
at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:249)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3571)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3508)
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778)
at 
org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143)
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416)
at
org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:611)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:109)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:218)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:736)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
... 24 more


And then this:
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out:
NativeFSLock@/bsbsolrdata/solrindex/master01_solr33x/core.digi20/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1115)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:175)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:223)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:99)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:

Re: query time boosting in solr

2011-07-20 Thread Tomás Fernández Löbbe
Hi Sowmya, "bq" is a great way of boosting, but you have to be using the
Dismax Query Parser or the Extended Dismax (edismax) query parser; it
doesn't work with the Lucene Query Parser. If you can use either of those, then
that's the solution. If you need to use the Lucene Query Parser, for a user
query like:

scientific temper

you could create a query like:

(scientific temper) OR (scientific temper AND (field1:[10 TO 30]))^X

"X" being the boost you want for those documents.

With your query:
scientific temper field1:[10 TO 30]

you are either adding the condition of the range value for the field (if
your default operator is AND) or adding another way of matching the query
(if your default operator is OR: you can have documents in your result set
that only matched the range query, and this is not what the user wanted).

Hope this helps,

Tomás

On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B.  wrote:

> Can anyone throw some light on this issue?
>
> My problem is: give a query-time boost to certain documents, which have
> a
> field, say field1, in the range that the user chooses at query time. I
> think the link below is a range query:
>
>
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
>
> But, apart from that, how can I indicate a boost for the condition
> field1:[10%20TO%2030]?
>
> I tried using &bq=field1:[20 TO 25] and also &bq=field1:[20 TO 25]^10,
> but I am not able to figure out what these two mean from the results,
> because I get the top result as a document where field1 is 40 in this
> case, after using the &bq clause. I increased the boost to 10, 20, 50, 100, but
> the
> results don't change at all.
>
> S.
>
> On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B.  wrote:
>
> > Hi
> >
> > Is query time boosting possible in Solr?
> >
> > Here is what I want to do: I want to boost the ranking of certain
> > documents which have their relevant field values in a particular range
> > (selected by the user at query time)...
> >
> > when I do something like:
> >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10
> > -I guess it is just a filter over the normal results and not exactly a
> > query.
> >
> > I tried giving this:
> >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> > -This still worked and gave me different results. But I did not quite
> > understand what this second query meant. Does it mean: "Rank those
> documents
> > with a field1 value in 10-30 higher than those without"?
> >
> > S
> > --
> > Sowmya V.B.
> > 
> > Losing optimism is blasphemy!
> > http://vbsowmya.wordpress.com
> > 
> >
>
>
>
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>


Re: embeded solrj doesn't refresh index

2011-07-20 Thread Marco Martinez
You should send a commit to your embedded Solr.
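
For example (a minimal sketch; "server" stands for your existing
EmbeddedSolrServer instance, and the field is made up):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class AddAndCommit {
    static void addVisible(SolrServer server) throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        server.add(doc);
        server.commit(); // without this, new docs stay invisible to searchers
    }
}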

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Jianbin Dai 

> Hi,
>
>
>
> I am using embedded solrj. After I add a new doc to the index, I can see the
> changes through the Solr web interface, but not from embedded solrj. But after I restart
> the embedded solrj, I do see the changes. It works as if there were a cache.
> Does anyone know the problem? Thanks.
>
>
>
> Jianbin
>
>


Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Hi Tomás

Thanks for a quick response.

So, if I say:
http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
-will that be right?

The above query boosts the documents which match the given query
("scientific") and which have Field1 values between 20 and 25, by a factor of 10:
Is that right??

S

2011/7/20 Tomás Fernández Löbbe 

> Hi Sowmya, "bq" is a great way of boosting, but you have to be using the
> Dismax Query Parser or the Extended Dismax (edismax) query parser, it
> doesn't work with the Lucene Query Parser. If you can use any of those,
> then
> that's the solution. If you need to use the Lucene Query Parser, for a user
> query like:
>
> scientific temper
>
> you could create a query like:
>
> > (scientific temper) OR (scientific temper AND (field1:[10 TO 30]))^X
>
> being "X" the boost you want for those documents.
>
> with your query:
> > scientific temper field1:[10 TO 30]
>
> you are either adding the condition of the range value for the field (if
> your default operator is AND) or adding another way of matching the query
> (if your default operator ir OR, you can have documents in your result set
> that only matched the range query, and this is not what the user wanted).
>
> Hope this helps,
>
> Tomás
>
> On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B.  wrote:
>
> > Can anyone throw some light on this issue?
> >
> > My problem is to: give a query time boost to certain documents, which
> have
> > a
> > field, say field1, in the range that the user chooses during query time.
> I
> > think the below link indicates a range query:
> >
> >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> >
> > But, apart from that, how can I indicate a boost for the condition
> > field1:[10%20TO%2030]?
> >
> > I tried using &bq=field1:[20 TO 25] and also &bq=field1:[20 TO 25]^10,
> > but I am not able to figure out what these two mean from the results,
> > because I get the top result as a document where field1 is 40 in this
> > case, after using the &bq clause. I increased the boost to 10, 20, 50, 100, but
> > the
> > results don't change at all.
> >
> > S.
> >
> > On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B.  wrote:
> >
> > > Hi
> > >
> > > Is query time boosting possible in Solr?
> > >
> > > Here is what I want to do: I want to boost the ranking of certain
> > > documents, which have their relevant field values, in a particular
> range
> > > (selected by user at query time)...
> > >
> > > when I do something like:
> > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10
> > > -I guess, it is just a filter over the normal results and not exactly a
> > > query.
> > >
> > > I tried giving this:
> > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> > > -This still worked and gave me different results. But, I did not quite
> > > understand what this second query meant. Does it mean: "Rank those
> > documents
> > > with field1 value in 10-30 better than those without" ?
> > >
> > > S
> > > --
> > > Sowmya V.B.
> > > 
> > > Losing optimism is blasphemy!
> > > http://vbsowmya.wordpress.com
> > > 
> > >
> >
> >
> >
> > --
> > Sowmya V.B.
> > 
> > Losing optimism is blasphemy!
> > http://vbsowmya.wordpress.com
> > 
> >
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: query time boosting in solr

2011-07-20 Thread Tomás Fernández Löbbe
Yes, it should, but make sure you specify at least the "qf" parameter for
dismax. You can activate debugQuery and you'll see which documents get
boosted and which don't.

On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B.  wrote:

> Hi Tomasso
>
> Thanks for a quick response.
>
> So, if I say:
> http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2
> &defType=dismax&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
> -will that be right?
>
> The above query boosts the documents which match the given query
> ("scientific") and which have Field1 values between 20 and 25, by a factor of 10:
> Is that right??
>
> S
>
> 2011/7/20 Tomás Fernández Löbbe 
>
> > Hi Sowmya, "bq" is a great way of boosting, but you have to be using the
> > Dismax Query Parser or the Extended Dismax (edismax) query parser, it
> > doesn't work with the Lucene Query Parser. If you can use any of those,
> > then
> > that's the solution. If you need to use the Lucene Query Parser, for a
> user
> > query like:
> >
> > scientific temper
> >
> > you could create a query like:
> >
> > > (scientific temper) OR (scientific temper AND (field1:[10 TO 30]))^X
> >
> > being "X" the boost you want for those documents.
> >
> > with your query:
> > > scientific temper field1:[10 TO 30]
> >
> > you are either adding the condition of the range value for the field (if
> > your default operator is AND) or adding another way of matching the query
> > (if your default operator ir OR, you can have documents in your result
> set
> > that only matched the range query, and this is not what the user wanted).
> >
> > Hope this helps,
> >
> > Tomás
> >
> > On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B.  wrote:
> >
> > > Can anyone throw some light on this issue?
> > >
> > > My problem is to: give a query time boost to certain documents, which
> > have
> > > a
> > > field, say field1, in the range that the user chooses during query
> time.
> > I
> > > think the below link indicates a range query:
> > >
> > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> > >
> > > But, apart from that, how can I indicate a boost for the condition
> > > field1:[10%20TO%2030]?
> > >
> > > I tried using &bq=field1:[20 TO 25] and also &bq=field1:[20 TO 25]^10,
> > > but I am not able to figure out what these two mean from the results,
> > > because I get the top result as a document where field1 is 40 in this
> > > case, after using the &bq clause. I increased the boost to 10, 20, 50,
> 100, but
> > > the
> > > results don't change at all.
> > >
> > > S.
> > >
> > > On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. 
> wrote:
> > >
> > > > Hi
> > > >
> > > > Is query time boosting possible in Solr?
> > > >
> > > > Here is what I want to do: I want to boost the ranking of certain
> > > > documents, which have their relevant field values, in a particular
> > range
> > > > (selected by user at query time)...
> > > >
> > > > when I do something like:
> > > >
> > > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10
> > > > -I guess, it is just a filter over the normal results and not exactly
> a
> > > > query.
> > > >
> > > > I tried giving this:
> > > >
> > > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> > > > -This still worked and gave me different results. But, I did not
> quite
> > > > understand what this second query meant. Does it mean: "Rank those
> > > documents
> > > > with field1 value in 10-30 better than those without" ?
> > > >
> > > > S
> > > > --
> > > > Sowmya V.B.
> > > > 
> > > > Losing optimism is blasphemy!
> > > > http://vbsowmya.wordpress.com
> > > > 
> > > >
> > >
> > >
> > >
> > > --
> > > Sowmya V.B.
> > > 
> > > Losing optimism is blasphemy!
> > > http://vbsowmya.wordpress.com
> > > 
> > >
> >
>
>
>
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>


Re: Solr 3.3: Exception in thread "Lucene Merge Thread #1"

2011-07-20 Thread mdz-munich
Update.

After adding 1626 documents without doing a commit or optimize:

Exception in thread "Lucene Merge Thread #1"
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Map
failed
at
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
... 10 more

Any ideas, any suggestions?

Greetz & thank you,

Sebastian



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-3-Exception-in-thread-Lucene-Merge-Thread-1-tp3185248p3185344.html
Sent from the Solr - User mailing list archive at Nabble.com.


Curl Tika not working with blanks in literal.field

2011-07-20 Thread Peralta Gutiérrez del Álamo
Hi.
I'm trying to index binary documents with curl, using Tika to extract the text.

The problem is that when I set the value of a field that contains blank spaces
using the input parameter literal.<fieldname>=<value>, the document is not indexed.

The command I send is the following:

curl
http://localhost:8983/solr/update/extract?literal.id="doc1"\&literal.url="/mnt/windows/Ofertas/2006
 Portal
Intranet/DOCUMENTACION/datos.doc"\&uprefix=attr_\&fmap.content=text\&commit=true
 -F myfile=\@"/mnt/windows/Ofertas/DOCUMENTACION/datos.doc"

That is, literal.url="value with blanks" apparently is not working.
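
[Spaces in a URL query string normally have to be percent-encoded; a sketch
of the same call with the path encoded and the whole URL quoted:]

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.url=/mnt/windows/Ofertas/2006%20Portal%20Intranet/DOCUMENTACION/datos.doc&uprefix=attr_&fmap.content=text&commit=true" \
  -F "myfile=@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc"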

Re: defType argument weirdness

2011-07-20 Thread Yonik Seeley
On Tue, Jul 19, 2011 at 11:41 PM, Jonathan Rochkind  wrote:
> Is it generally recognized that this terminology is confusing, or is it just 
> me?
>
> I do understand what they do (at least well enough to use them), but I find 
> it confusing that it's called "defType" as a main param, but "type" in a 
> LocalParam

When used as the main param, it is still just the default (i.e. it may
be overridden).
For example defType=lucene&q={!func}1

> (and then there's 'qt', often confused with defType/type by newbies, since 
> they guess it stands for 'query type', but which should probably actually 
> have been called 'requestHandler'/'rh' instead, since that's what it actually 
> chooses, no?  It gets very confusing).

Yeah, "qt" is very historical... before the QParserPlugin framework,
and before request handlers were used for many other things (including
updates).

-Yonik
http://www.lucidimagination.com


> If it's generally recognized it's confusing and perhaps a somewhat 
> inconsistent mental model being implied, I wonder if there'd be any interest 
> in renaming these to be more clear, leaving the old ones as aliases/synonyms 
> for backwards compatibility (perhaps with a long deprecation period, or 
> perhaps existing forever). I know it was very confusing to me to keep track 
> of these parameters and what they did for quite a while, and still trips me 
> up from time to time.
>
> Jonathan
> 
> From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
> [yo...@lucidimagination.com]
> Sent: Tuesday, July 19, 2011 9:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: defType argument weirdness
>
> On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay  wrote:
>> Regardless, I thought that     defType=dismax&q=*:*   is supposed to be
>> equivalent to  q={!defType=dismax}*:*  and also equivalent to q={!dismax}*:*
>
> Not quite - there is a very subtle distinction.
>
> {!dismax}  is short for {!type=dismax}, the type of the actual query,
> and this may not be overridden.
>
> The defType local param is only the default type for sub-queries (as
> opposed to the current query).
> It's useful in conjunction with the "query"  or nested query qparser:
> http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html
>
> -Yonik
> http://www.lucidimagination.com
>


Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Hi Tomás

Here is what I was trying:

http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2030]
^10&start=0&rows=30&qf=text&fl=Field1,docid&debugQuery=on

Over here, I was trying to change the range of Field1, keeping everything
else intact. Here are my observations:

1) The number of results found stays the same; only the order of the
results varies.
2) The boost factor (10) does not seem to have any influence at all.

Here is what the debugQuery says:
+DisjunctionMaxQuery((text:scientif)) ()
Field1:[20.0 TO 30.0]^10.0
+(text:scientif) () Field1:[20.0 TO
30.0]^10.0

From these, it seems like it's just filtering the results based on the Field1
values rather than performing a boost query.

S.

2011/7/20 Tomás Fernández Löbbe 

> Yes, it should,  but make sure you specify at least the "qf" parameter for
> dismax. You can activate debugQuery and you'll see which documents get
> boosted and which aren't.
>
> On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B.  wrote:
>
> > Hi Tomasso
> >
> > Thanks for a quick response.
> >
> > So, if I say:
> > http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2*
> > &defType=dismax*&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
> > -will it be right?
> >
> > The above query: boosts the documents which suit the given query
> > ("scientific"), which has Field1 values between 20-25, by a factor of 10
> :
> > Is that right??
> >
> > S
> >
> > 2011/7/20 Tomás Fernández Löbbe 
> >
> > > Hi Sowmya, "bq" is a great way of boosting, but you have to be using
> the
> > > Dismax Query Parser or the Extended Dismax (edismax) query parser, it
> > > doesn't work with the Lucene Query Parser. If you can use any of those,
> > > then
> > > that's the solution. If you need to use the Lucene Query Parser, for a
> > user
> > > query like:
> > >
> > > scientific temper
> > >
> > > you could create a query like:
> > >
> > > (scientific temper) OR (scientific temper AND (field1:[10 TO 30]))^X
> > >
> > > being "X" the boost you want for those documents.
> > >
> > > with your query:
> > > scientific temper field1:[10 TO 30]
> > >
> > > you are either adding the condition of the range value for the field
> (if
> > > your default operator is AND) or adding another way of matching the
> query
> > > (if your default operator is OR, you can have documents in your result
> > set
> > > that only matched the range query, and this is not what the user
> wanted).
> > >
> > > Hope this helps,
> > >
> > > Tomás
> > >
> > > On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. 
> wrote:
> > >
> > > > Can anyone throw some light on this issue?
> > > >
> > > > My problem is to: give a query time boost to certain documents, which
> > > have
> > > > a
> > > > field, say field1, in the range that the user chooses during query
> > time.
> > > I
> > > > think the below link indicates a range query:
> > > >
> > > >
> > > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> > > >
> > > > But, apart from that, how can I indicate a boost for the condition
> > > > field1:[10%20TO%2030]?
> > > >
> > > > I tried using a &bq=field1:[20 TO 25] and also &bq=field1:[20 TO
> 25]^10
> > > > -But I am not able to figure out what these two mean, from the
> results.
> > > > Because I get the top result as a document where field1 is 40 in this
> > > > case, after using the &bq clause. I increased the boost to 10, 20, 50,
> > 100, but
> > > > the
> > > > results don't change at all.
> > > >
> > > > S.
> > > >
> > > > On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. 
> > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Is query time boosting possible in Solr?
> > > > >
> > > > > Here is what I want to do: I want to boost the ranking of certain
> > > > > documents, which have their relevant field values, in a particular
> > > range
> > > > > (selected by user at query time)...
> > > > >
> > > > > when I do something like:
> > > > >
> > > > >
> > > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10
> > > > > -I guess, it is just a filter over the normal results and not
> exactly
> > a
> > > > > query.
> > > > >
> > > > > I tried giving this:
> > > > >
> > > > >
> > > >
> > >
> >
> http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
> > > > > -This still worked and gave me different results. But, I did not
> > quite
> > > > > understand what this second query meant. Does it mean: "Rank those
> > > > documents
> > > > > with field1 value in 10-30 better than those without" ?
> > > > >
> > > > > S
> > > > > --
> > > > > Sowmya V.B.
> > > > > 
> > > > > Losing optimism is blasphemy!
> > > > > http://vbsowmya.wordpress.com
> > > > > 
> > > > >
> > >

Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Thanks for responding so quickly, I don't mind waiting a bit.  I'll
hang out until the updates have been  made.  Thanks again.

On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  wrote:
> Hi Jamie.
> I work on LSP; it can index polygons and query for them. Although the 
> capability is there, we have more testing & benchmarking to do, and then we 
> need to put together a tutorial to explain how to use it at the Solr layer.  
> I recently cleaned up the READMEs a bit.  Try downloading the trunk codebase, 
> and follow the README.  It points to another README which shows off a demo 
> webapp.  At the conclusion of this, you'll need to examine the tests and 
> webapp a bit to figure out how to apply it in your app.  We don't yet have a 
> tutorial as the framework has been in flux  although it has stabilized a good 
> deal.
>
> Oh... by the way, this works off of Lucene/Solr trunk.  Within the past week 
> there was a major change to trunk and LSP won't compile until we make 
> updates.  Either Ryan McKinley or I will get to that by the end of the week.  
> So unless you have access to 2-week old maven artifacts of Lucene/Solr, 
> you're stuck right now.
>
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
> On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:
>
>> I have looked at the code being shared on the
>> lucene-spatial-playground and was wondering if anyone could provide
>> some details as to its state.  Specifically I'm looking to add
>> geospatial support to my application based on a user provided polygon,
>> is this currently possible using this extension?
>
>
>
>
>
>
>


Re: Geospatial queries in Solr

2011-07-20 Thread Smiley, David W.
Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a 
"mvn clean install" and you'll be back in business.

On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:

> Thanks for responding so quickly, I don't mind waiting a bit.  I'll
> hang out until the updates have been  made.  Thanks again.
> 
> On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  wrote:
>> Hi Jamie.
>> I work on LSP; it can index polygons and query for them. Although the 
>> capability is there, we have more testing & benchmarking to do, and then we 
>> need to put together a tutorial to explain how to use it at the Solr layer.  
>> I recently cleaned up the READMEs a bit.  Try downloading the trunk 
>> codebase, and follow the README.  It points to another README which shows 
>> off a demo webapp.  At the conclusion of this, you'll need to examine the 
>> tests and webapp a bit to figure out how to apply it in your app.  We don't 
>> yet have a tutorial as the framework has been in flux, although it has 
>> stabilized a good deal.
>> 
>> Oh... by the way, this works off of Lucene/Solr trunk.  Within the past week 
>> there was a major change to trunk and LSP won't compile until we make 
>> updates.  Either Ryan McKinley or I will get to that by the end of the week. 
>>  So unless you have access to 2-week old maven artifacts of Lucene/Solr, 
>> you're stuck right now.
>> 
>> ~ David Smiley
>> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>> 
>> On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:
>> 
>>> I have looked at the code being shared on the
>>> lucene-spatial-playground and was wondering if anyone could provide
>>> some details as to its state.  Specifically I'm looking to add
>>> geospatial support to my application based on a user provided polygon,
>>> is this currently possible using this extension?
>> 
>> 
>> 
>> 
>> 
>> 
>> 



Reading Solr's JSON

2011-07-20 Thread Sowmya V.B.
Hi All

Which is the best way to read Solr's JSON output from Java code?
There seems to be a JSONParser in one of the jar files in the Solr lib
directory (org.apache.noggit...), but I don't understand how to read the
parsed output with it.

Are there any better JSON parsers for Java?

S

-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Solr suggester and spell checker

2011-07-20 Thread abhayd
hi 

I am having same issue, did you find the solution for this problem?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-suggester-and-spell-checker-tp2326907p3185680.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: suggester component from trunk throwing error

2011-07-20 Thread abhayd
i had a old jar in build.

Everything works fine now

--
View this message in context: 
http://lucene.472066.n3.nabble.com/suggester-component-from-trunk-throwing-error-tp3184736p3185681.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reading Solr's JSON

2011-07-20 Thread Yonik Seeley
On Wed, Jul 20, 2011 at 10:58 AM, Sowmya V.B.  wrote:
> Which is the best way to read Solr's JSON output from Java code?

You could use SolrJ - it handles parsing for you (and uses the most
efficient binary format by default).

> There seems to be a JSONParser in one of the jar files in the Solr lib
> directory (org.apache.noggit...), but I don't understand how to read the
> parsed output with it.

If you just want to deserialize into objects (Maps, Lists, etc) then it's easy:

ObjectBuilder.fromJSON(my_json_string)
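
For example, a tiny sketch (the JSON string stands in for a real wt=json
response):

import java.io.IOException;
import java.util.List;
import java.util.Map;
import org.apache.noggit.ObjectBuilder;

public class ParseSolrJson {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException {
        String json = "{\"response\":{\"numFound\":1,\"docs\":[{\"id\":\"doc1\"}]}}";
        Map<String, Object> root = (Map<String, Object>) ObjectBuilder.fromJSON(json);
        Map<String, Object> response = (Map<String, Object>) root.get("response");
        List<Map<String, Object>> docs = (List<Map<String, Object>>) response.get("docs");
        System.out.println(docs.get(0).get("id")); // prints: doc1
    }
}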

-Yonik
http://www.lucidimagination.com


Manipulating a Fuzzy Query's Prefix Length

2011-07-20 Thread Kyle Lee
We're performing fuzzy searches on a field possessing a large number of
unique terms. Specifying a required minimum similarity of 0.7 results in a
query execution time of 13-15 seconds, which stands in stark contrast to our
average query time of 40ms.

We suspect that the performance problem most likely emanates from the
enumeration over all the unique terms in the index. The Lucene documentation
for FuzzyQuery supports this theory with the following warning:

*"Warning:* this query is not very scalable with its default prefix length
of 0 - in this case, *every* term will be enumerated and cause an edit score
calculation."

We would therefore like to set the prefix length to one or two, mandating
that the first couple of characters match and thereby substantially reducing
the number of terms enumerated. Is this possible with Solr? I haven't yet
discovered a method, if so. Any help would be greatly appreciated.
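
At the Lucene level the knob is the third argument of the FuzzyQuery
constructor; a minimal sketch (field and term invented). As far as I can
tell, wiring this into Solr would take a custom query parser plugin, since
the stock fuzzy syntax (term~0.7) only exposes the similarity:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class PrefixedFuzzy {
    public static void main(String[] args) {
        // minimumSimilarity 0.7, prefixLength 2: only terms sharing the
        // first two characters are enumerated for the edit-distance check.
        FuzzyQuery query = new FuzzyQuery(new Term("title", "customer"), 0.7f, 2);
        System.out.println(query);
    }
}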


Tokenizer Question

2011-07-20 Thread Jamie Johnson
I have a query which starts out as something like name:"john"; I
need to expand this to something like name:("john" "johnny"). I've
implemented a custom tokenizer which gets close, but isn't quite right:
it outputs name:"john johnny". Is there a simple example of doing
what I'm attempting?
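
The usual way to get name:("john" "johnny") rather than the phrase
name:"john johnny" is to emit the extra token with a position increment of 0,
the same stacking trick synonym injection uses. A minimal sketch against the
Lucene 3.x attribute API, with the john -> johnny mapping hard-coded purely
for illustration:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class VariantInjectionFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncrAtt =
        addAttribute(PositionIncrementAttribute.class);
    private State pending; // token we still owe a stacked variant for

    public VariantInjectionFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pending != null) {
            restoreState(pending); // reuse offsets of the original token
            pending = null;
            termAtt.setEmpty().append("johnny");
            posIncrAtt.setPositionIncrement(0); // same position as "john"
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        if ("john".equals(termAtt.toString())) {
            pending = captureState();
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pending = null;
    }
}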


How can i find a document by a special id?

2011-07-20 Thread Per Newgro

Hi,

I'm new to Solr. I built an application using the standard Solr 3.3
examples as defaults.
My id field is a string and is copied to a solr.TextField ("searchtext")
for search queries.

All works fine except when I try to get documents by a special id.

Let me explain the details. Assume id = "1234567". I would like to
query this document
by using q=searchtext:AB1234567. The prefix ("AB") acts as a
pseudo-id in our
system. Users know it and search for it. But it's not findable because
the Solr index only knows
the "short id".

Adding a new document with the prefixed id as id is not an option. Then
I would have to add
many documents.

To my understanding, stemming and ngram tokenizing are not possible
because they act on tokens longer than the search token.

How can I do this?

Thanks
Per


Re: How can i find a document by a special id?

2011-07-20 Thread Kyle Lee
Perhaps I'm missing something, but if your fields are indexed as "1234567"
but users are searching for "AB1234567," is it not possible simply to strip
the prefix from the user's input before sending the request?

On Wed, Jul 20, 2011 at 10:57 AM, Per Newgro  wrote:

> Hi,
>
> I'm new to Solr. I built an application using the standard Solr 3.3
> examples as defaults.
> My id field is a string and is copied to a solr.TextField ("searchtext")
> for search queries.
> All works fine except when I try to get documents by a special id.
>
> Let me explain the details. Assume id = "1234567". I would like to query
> this document
> by using q=searchtext:AB1234567. The prefix ("AB") acts as a pseudo-id
> in our
> system. Users know it and search for it. But it's not findable because
> the Solr index only knows
> the "short id".
>
> Adding a new document with the prefixed id as id is not an option. Then I
> would have to add
> many documents.
>
> To my understanding, stemming and ngram tokenizing are not possible
> because they act on tokens longer than the search token.
>
> How can i do this?
>
> Thanks
> Per
>


Re: Tokenizer Question

2011-07-20 Thread Kyle Lee
I'm not sure how to accomplish what you're asking, but have you considered
using a synonyms file? This would also allow you to catch ostensibly
unrelated name substitutes such as Robert -> Bob and Richard -> Dick.

On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson  wrote:

> I have a query which starts out as something like name:"john"; I
> need to expand this to something like name:("john" "johnny").  I've
> implemented a custom tokenizer which gets close, but isn't quite right:
> it outputs name:"john johnny".  Is there a simple example of doing
> what I'm attempting?
>


Re: Tokenizer Question

2011-07-20 Thread Jamie Johnson
My use case really isn't names, I just used that as a simplification.
I did look at the Synonym filter to see if I could implement a similar
filter (if that was a more appropriate place to do so) but even after
doing that I ended up with the same result.

On Wed, Jul 20, 2011 at 12:07 PM, Kyle Lee  wrote:
> I'm not sure how to accomplish what you're asking, but have you considered
> using a synonyms file? This would also allow you to catch ostensibly
> unrelated name substitutes such as Robert -> Bob and Richard -> Dick.
>
> On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson  wrote:
>
>> I have a query which starts out as something like name:"john"; I
>> need to expand this to something like name:("john" "johnny").  I've
>> implemented a custom tokenizer which gets close, but isn't quite right:
>> it outputs name:"john johnny".  Is there a simple example of doing
>> what I'm attempting?
>>
>


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Thanks for the update David, I'll give that a try now.

On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W.  wrote:
> Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a 
> "mvn clean install" and you'll be back in business.
>
> On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:
>
>> Thanks for responding so quickly, I don't mind waiting a bit.  I'll
>> hang out until the updates have been  made.  Thanks again.
>>
>> On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  wrote:
>>> Hi Jamie.
>>> I work on LSP; it can index polygons and query for them. Although the 
>>> capability is there, we have more testing & benchmarking to do, and then we 
>>> need to put together a tutorial to explain how to use it at the Solr layer. 
>>>  I recently cleaned up the READMEs a bit.  Try downloading the trunk 
>>> codebase, and follow the README.  It points to another README which shows 
>>> off a demo webapp.  At the conclusion of this, you'll need to examine the 
>>> tests and webapp a bit to figure out how to apply it in your app.  We don't 
>>> yet have a tutorial as the framework has been in flux, although it has 
>>> stabilized a good deal.
>>>
>>> Oh... by the way, this works off of Lucene/Solr trunk.  Within the past 
>>> week there was a major change to trunk and LSP won't compile until we make 
>>> updates.  Either Ryan McKinley or I will get to that by the end of the 
>>> week.  So unless you have access to 2-week old maven artifacts of 
>>> Lucene/Solr, you're stuck right now.
>>>
>>> ~ David Smiley
>>> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>>>
>>> On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:
>>>
 I have looked at the code being shared on the
 lucene-spatial-playground and was wondering if anyone could provide
 some details as to its state.  Specifically I'm looking to add
 geospatial support to my application based on a user provided polygon,
 is this currently possible using this extension?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>


Re: How can i find a document by a special id?

2011-07-20 Thread Per Newgro

On 20.07.2011 18:03, Kyle Lee wrote:

Perhaps I'm missing something, but if your fields are indexed as "1234567"
but users are searching for "AB1234567," is it not possible simply to strip
the prefix from the user's input before sending the request?

On Wed, Jul 20, 2011 at 10:57 AM, Per Newgro  wrote:


Hi,

I'm new to Solr. I built an application using the standard Solr 3.3
examples as defaults.
My id field is a string and is copied to a solr.TextField ("searchtext")
for search queries.
All works fine except when I try to get documents by a special id.

Let me explain the details. Assume id = "1234567". I would like to query
this document
by using q=searchtext:AB1234567. The prefix ("AB") acts as a pseudo-id
in our
system. Users know it and search for it. But it's not findable because
the Solr index only knows
the "short id".

Adding a new document with the prefixed id as id is not an option. Then I
would have to add
many documents.

To my understanding, stemming and ngram tokenizing are not possible
because they act on tokens longer than the search token.

How can i do this?

Thanks
Per

Sorry for not being clear here. I only use a single search field. It can
contain multiple search words.

One of them is the id, so I don't really know that the search word is an id.
The use case is: we have a product database with some items. The product
has an id, name, features,
etc. They all go into the described searchtext field. We promote our
products in different media. So every
product can have a media id ("AB" is the media code, "1234567" is the id). And
users should be able to find
the product by id and by media id.

I hope I could explain myself better.

Thanks for helping me
Per


Wiki Error JSON syntax

2011-07-20 Thread Remy Loubradou
Hi,
I was writing a Solr client API for Node and I found an error on this page:
http://wiki.apache.org/solr/UpdateJSON. In the section "Update Commands" the
JSON is not valid because there are duplicate keys ("add" and "delete" each
appear twice). I tried with an array and that doesn't work either; I got a
400 error, I think because the syntax is bad.
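
For reference, that section's example is roughly of this shape; strict JSON
parsers reject it because "add" and "delete" each appear twice at the same
level:

{
  "add":    { "doc": { "id": "DOC1" } },
  "add":    { "doc": { "id": "DOC2" } },
  "commit": {},
  "delete": { "id": "ID" },
  "delete": { "query": "QUERY" }
}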

I don't really know if this is the right place to talk about that, but it's
the only place I found. Sorry if it's not.

Thanks,

And I love Solr :)


Re: Solr 3.3: Exception in thread "Lucene Merge Thread #1"

2011-07-20 Thread mdz-munich
Here we go ...

This time we tried to use the old LogByteSizeMergePolicy and
SerialMergeScheduler:

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
<mergeScheduler class="org.apache.lucene.index.SerialMergeScheduler"/>
We did this before, just to be sure ... 

~300 Documents:

SEVERE: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at org.apache.lucene.index.FieldsReader.(FieldsReader.java:129)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
at
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2714)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2709)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2705)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3509)
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778)
at 
org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143)
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:140)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:736)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
... 44 more

20.07.2011 18:07:30 org.apache.solr.core.SolrCore execute
INFO: [core.digi20] webapp=/solr path=/update params={} status=500
QTime=12302 
20.07.2011 18:07:30 org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
        at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)

Re: Wiki Error JSON syntax

2011-07-20 Thread Yonik Seeley
On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou
 wrote:
> Hi,
> I was writing a Solr Client API for Node and I found an error on this page
> http://wiki.apache.org/solr/UpdateJSON ,on the section "Update Commands" the
> JSON is not valid because there are duplicate keys and two times with "add"
> and "delete".

It's a common misconception that it's invalid JSON.  Duplicate keys
are in fact legal.

-Yonik
http://www.lucidimagination.com

> I tried with an array and it doesn't work either; I got a 400
> error, I think because the syntax is bad.
>
> I don't really know if I am at the good place to talk about that but ...
> that the only place I found. Sorry if it's not.
>
> Thanks,
>
> And I love Solr :)
>


Re: query time boosting in solr

2011-07-20 Thread Tomás Fernández Löbbe
So, what you want is to have the same exact results set as if the query was
"scientific", but the documents that also match Field1:[20 TO 30] to have
more score, right?

On Wed, Jul 20, 2011 at 10:53 AM, Sowmya V.B.  wrote:

> Hi Tomas
>
> Here is what I was trying to give.
>
>
> http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2030]
> ^10&start=0&rows=30&qf=text&fl=Field1,docid&debugQuery=on
>

This query seems OK for that purpose.

>
> Over here, I was trying to change the range of Field1, keeping everything
> else intact. Here are my observations:
>
> 1) The number of results found remain intact. Only that the order of the
> results varies.
>
Isn't this what was expected?


> 2) The boost factor (10) does not seem to throw any influence at all.
>
It's in the parsed query. Why do you think it doesn't have an influence? Can
you send the debug query output for a document that matches the bq?
I tried it with the Solr example and this is what I see:

http://localhost:8983/solr/select?defType=dismax&q=display&bq=weight:[0%20TO%2010]
^10&start=0&rows=30&debugQuery=on&qf=features%20name

This is the debug output for a document that matches the query and the boost
query:



1.137027 = (MATCH) sum of:
  0.1994111 = (MATCH) max of:
0.1994111 = (MATCH) weight(features:display in 0), product of:
  0.34767273 = queryWeight(features:display), product of:
3.7080503 = idf(docFreq=1, maxDocs=30)
0.0937616 = queryNorm
  0.57355976 = (MATCH) fieldWeight(features:display in 0), product of:
1.4142135 = tf(termFreq(features:display)=2)
3.7080503 = idf(docFreq=1, maxDocs=30)
0.109375 = fieldNorm(field=features, doc=0)
  0.937616 = (MATCH) ConstantScore(weight:[0.0 TO 10.0]^10.0)^10.0, product of:
10.0 = boost
0.0937616 = queryNorm


and this is the debug output for a document that only matches the main query:


0.4834455 = (MATCH) sum of:
  0.4834455 = (MATCH) max of:
0.4834455 = (MATCH) weight(name:display in 12), product of:
  0.34767273 = queryWeight(name:display), product of:
3.7080503 = idf(docFreq=1, maxDocs=30)
0.0937616 = queryNorm
  1.3905189 = (MATCH) fieldWeight(name:display in 12), product of:
1.0 = tf(termFreq(name:display)=1)
3.7080503 = idf(docFreq=1, maxDocs=30)
0.375 = fieldNorm(field=name, doc=12)


Do you have something similar??



> Here is what the debugQuery says:
> +DisjunctionMaxQuery((text:scientif)) ()
> Field1:[20.0 TO 30.0]^10.0
> +(text:scientif) () Field1:[20.0 TO
> 30.0]^10.0
>
> From these, it seems like it's just filtering the results based on the
> Field1
> values, rather than performing a Boost Query.
>
> S.
>
> 2011/7/20 Tomás Fernández Löbbe 
>
> > Yes, it should,  but make sure you specify at least the "qf" parameter
> for
> > dismax. You can activate debugQuery and you'll see which documents get
> > boosted and which aren't.
> >
> > On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B.  wrote:
> >
> > > Hi Tomasso
> > >
> > > Thanks for a quick response.
> > >
> > > So, if I say:
> > > http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2*
> > >
> &defType=dismax*&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
> > > -will it be right?
> > >
> > > The above query: boosts the documents which suit the given query
> > > ("scientific"), which has Field1 values between 20-25, by a factor of
> 10
> > :
> > > Is that right??
> > >
> > > S
> > >
> > > 2011/7/20 Tomás Fernández Löbbe 
> > >
> > > > Hi Sowmya, "bq" is a great way of boosting, but you have to be using
> > the
> > > > Dismax Query Parser or the Extended Dismax (edismax) query parser, it
> > > > doesn't work with the Lucene Query Parser. If you can use any of
> those,
> > > > then
> > > > that's the solution. If you need to use the Lucene Query Parser, for
> a
> > > user
> > > > query like:
> > > >
> > > > scientific temper
> > > >
> > > > you could create a query like:
> > > >
> > > > (scientific temper) OR (scientific temper AND (field1:[10 TO
> 2030]))^X
> > > >
> > > > being "X" the boost you want for those documents.
> > > >
> > > > with your query:
> > > > scientific temper field1:[10 TO 2030]
> > > >
> > > > you are either adding the condition of the range value for the field
> > (if
> > > > your default operator is AND) or adding another way of matching the
> > query
> > > > (if your default operator is OR, you can have documents in your
> result
> > > set
> > > > that only matched the range query, and this is not what the user
> > wanted).
> > > >
> > > > Hope this helps,
> > > >
> > > > Tomás
> > > >
> > > > On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. 
> > wrote:
> > > >
> > > > > Can anyone throw some light on this issue?
> > > > >
> > > > > My problem is to: give a query time boost to certain documents,
> which
> > > > have
> > > > > a
> > > > > field, say field1, in the range that the user chooses during query
> > > time.
> > > > I
> > 

Re: How can i find a document by a special id?

2011-07-20 Thread Kyle Lee
Is the mediacode always alphabetic, and is the ID always numeric?


Schema design/data import

2011-07-20 Thread Travis Low
Greetings.  I am struggling to design a schema and a data import/update
strategy for some semi-complicated data.  I would appreciate any input.

What we have is a bunch of database records that may or may not have files
attached.  Sometimes no files, sometimes 50.

The requirement is to index the database records AND the documents, and the
search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a
complicated search form with various codes and attributes for the database
records, so we need a detailed schema that will loosely correspond to boxes
on the search form.  I don't think we could easily do that if we just crawl
the site.  But with a detailed schema, I'm having trouble understanding how
we could import and index from the database, and also index the related
files, and have the same schema being populated, especially with the number
of related documents being variable (maybe index them all to one field?).

We have a lot of flexibility on how we can build this, so I'm open to any
suggestions or pointers for further reading.  I've spent a fair amount of
time on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, that I am willing to overlook for the first cut,
is that some of these files are zipped, and some of the zip files may
contain other zip files, to maybe 3 or 4 levels deep.

Help, please?

cheers,

Travis



-- 

**

*Travis Low, Director of Development*


** * *

*Centurion Research Solutions, LLC*

*14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151*

*703-956-6276 *•* 703-378-4474 (fax)*

*http://www.centurionresearch.com* 

**The information contained in this email message is confidential and
protected from disclosure.  If you are not the intended recipient, any use
or dissemination of this communication, including attachments, is strictly
prohibited.  If you received this email message in error, please delete it
and immediately notify the sender.

This email message and any attachments have been scanned and are believed to
be free of malicious software and defects that might affect any computer
system in which they are received and opened. No responsibility is accepted
by Centurion Research Solutions, LLC for any loss or damage arising from the
content of this email.


Re: How can i find a document by a special id?

2011-07-20 Thread Per Newgro

Am 20.07.2011 19:23, schrieb Kyle Lee:

Is the mediacode always alphabetic, and is the ID always numeric?


No, sadly not. We expose our products in "too" many media :-).

Per


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
So I've pulled the latest and can run the example, I've tried to move
my config over and am having a bit of an issue when executing queries,
specifically I get this:

Unable to read: POLYGON((...

looking at the code it's using the simple spatial context; how do I
specify JtsSpatialContext?

On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson  wrote:
> Thanks for the update David, I'll give that a try now.
>
> On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W.  wrote:
>> Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a 
>> "mvn clean install" and you'll be back in business.
>>
>> On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:
>>
>>> Thanks for responding so quickly, I don't mind waiting a bit.  I'll
>>> hang out until the updates have been  made.  Thanks again.
>>>
>>> On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  wrote:
 Hi Jamie.
 I work on LSP; it can index polygons and query for them. Although the 
 capability is there, we have more testing & benchmarking to do, and then 
 we need to put together a tutorial to explain how to use it at the Solr 
 layer.  I recently cleaned up the READMEs a bit.  Try downloading the 
 trunk codebase, and follow the README.  It points to another README which 
 shows off a demo webapp.  At the conclusion of this, you'll need to 
 examine the tests and webapp a bit to figure out how to apply it in your 
 app.  We don't yet have a tutorial as the framework has been in flux  
 although it has stabilized a good deal.

 Oh... by the way, this works off of Lucene/Solr trunk.  Within the past 
 week there was a major change to trunk and LSP won't compile until we make 
 updates.  Either Ryan McKinley or I will get to that by the end of the 
 week.  So unless you have access to 2-week old maven artifacts of 
 Lucene/Solr, you're stuck right now.

 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

 On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:

> I have looked at the code being shared on the
> lucene-spatial-playground and was wondering if anyone could provide
> some details as to its state.  Specifically I'm looking to add
> geospatial support to my application based on a user provided polygon,
> is this currently possible using this extension?







>>
>>
>


Re: Wiki Error JSON syntax

2011-07-20 Thread Remy Loubradou
I think I can trust you, but this is weird.
Funny thing: if you try to validate this JSON on http://jsonlint.com/,
duplicate keys are automatically removed. But then, how can you
possibly generate this JSON from a JavaScript object?
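
One way around that (a sketch, not something from this thread): since a map
or plain object can't hold duplicate keys, the update body has to be
assembled as a raw string. For example, in Java:

    // Sketch: build the duplicate-"add" body by hand, since serializing a
    // Map/object would collapse the repeated keys. The ids are made up.
    StringBuilder body = new StringBuilder();
    body.append("{");
    body.append("\"add\": {\"doc\": {\"id\": \"DOC1\"}},");
    body.append("\"add\": {\"doc\": {\"id\": \"DOC2\"}},");
    body.append("\"commit\": {}");
    body.append("}");
    // POST body.toString() to /update/json with Content-Type application/json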

It would be really nice to combine both ways that you show on the page.
Something like:

{
  "add": [
    {
      "doc": {
        "id": "DOC1",
        "my_boosted_field": {
          "boost": 2.3,
          "value": "test"
        },
        "my_multivalued_field": [
          "aaa",
          "bbb"
        ]
      }
    },
    {
      "commitWithin": 5000,
      "overwrite": false,
      "boost": 3.45,
      "doc": {
        "f1": "v2"
      }
    }
  ],
  "commit": {},
  "optimize": {
    "waitFlush": false,
    "waitSearcher": false
  },
  "delete": [
    { "id": "ID" },
    { "query": "QUERY" }
  ]
}

Thank you for your previous response, Yonik.

2011/7/20 Yonik Seeley 

> On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou
>  wrote:
> > Hi,
> > I was writing a Solr Client API for Node and I found an error on this
> page
> > http://wiki.apache.org/solr/UpdateJSON ,on the section "Update Commands"
> the
> > JSON is not valid because there are duplicate keys and two times with
> "add"
> > and "delete".
>
> It's a common misconception that it's invalid JSON.  Duplicate keys
> are in fact legal.
>
> -Yonik
> http://www.lucidimagination.com
>
> > I tried with an array and it doesn't work either; I got a 400
> > error, I think because the syntax is bad.
> >
> > I don't really know if I am at the good place to talk about that but ...
> > that the only place I found. Sorry if it's not.
> >
> > Thanks,
> >
> > And I love Solr :)
> >
>


Re: Geospatial queries in Solr

2011-07-20 Thread Smiley, David W.
You can set the system property SpatialContextProvider to 
com.googlecode.lucene.spatial.base.context.JtsSpatialContext

~ David

On Jul 20, 2011, at 2:02 PM, Jamie Johnson wrote:

> So I've pulled the latest and can run the example, I've tried to move
> my config over and am having a bit of an issue when executing queries,
> specifically I get this:
> 
> Unable to read: POLYGON((...
> 
> looking at the code it's using the simple spatial context; how do I
> specify JtsSpatialContext?
> 
> On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson  wrote:
>> Thanks for the update David, I'll give that a try now.
>> 
>> On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W.  wrote:
>>> Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do 
>>> a "mvn clean install" and you'll be back in business.
>>> 
>>> On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:
>>> 
 Thanks for responding so quickly, I don't mind waiting a bit.  I'll
 hang out until the updates have been  made.  Thanks again.
 
 On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  
 wrote:
> Hi Jamie.
> I work on LSP; it can index polygons and query for them. Although the 
> capability is there, we have more testing & benchmarking to do, and then 
> we need to put together a tutorial to explain how to use it at the Solr 
> layer.  I recently cleaned up the READMEs a bit.  Try downloading the 
> trunk codebase, and follow the README.  It points to another README which 
> shows off a demo webapp.  At the conclusion of this, you'll need to 
> examine the tests and webapp a bit to figure out how to apply it in your 
> app.  We don't yet have a tutorial as the framework has been in flux  
> although it has stabilized a good deal.
> 
> Oh... by the way, this works off of Lucene/Solr trunk.  Within the past 
> week there was a major change to trunk and LSP won't compile until we 
> make updates.  Either Ryan McKinley or I will get to that by the end of 
> the week.  So unless you have access to 2-week old maven artifacts of 
> Lucene/Solr, you're stuck right now.
> 
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
> 
> On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:
> 
>> I have looked at the code being shared on the
>> lucene-spatial-playground and was wondering if anyone could provide
>> some details as to its state.  Specifically I'm looking to add
>> geospatial support to my application based on a user provided polygon,
>> is this currently possible using this extension?
> 
> 
> 
> 
> 
> 
> 
>>> 
>>> 
>> 



Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Where do you set that?

On Wed, Jul 20, 2011 at 2:37 PM, Smiley, David W.  wrote:
> You can set the system property SpatialContextProvider to 
> com.googlecode.lucene.spatial.base.context.JtsSpatialContext
>
> ~ David
>
> On Jul 20, 2011, at 2:02 PM, Jamie Johnson wrote:
>
>> So I've pulled the latest and can run the example, I've tried to move
>> my config over and am having a bit of an issue when executing queries,
>> specifically I get this:
>>
>> Unable to read: POLYGON((...
>>
>> looking at the code it's using the simple spatial context; how do I
>> specify JtsSpatialContext?
>>
>> On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson  wrote:
>>> Thanks for the update David, I'll give that a try now.
>>>
>>> On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W.  
>>> wrote:
 Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do 
 a "mvn clean install" and you'll be back in business.

 On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:

> Thanks for responding so quickly, I don't mind waiting a bit.  I'll
> hang out until the updates have been  made.  Thanks again.
>
> On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  
> wrote:
>> Hi Jamie.
>> I work on LSP; it can index polygons and query for them. Although the 
>> capability is there, we have more testing & benchmarking to do, and then 
>> we need to put together a tutorial to explain how to use it at the Solr 
>> layer.  I recently cleaned up the READMEs a bit.  Try downloading the 
>> trunk codebase, and follow the README.  It points to another README 
>> which shows off a demo webapp.  At the conclusion of this, you'll need 
>> to examine the tests and webapp a bit to figure out how to apply it in 
>> your app.  We don't yet have a tutorial as the framework has been in 
>> flux  although it has stabilized a good deal.
>>
>> Oh... by the way, this works off of Lucene/Solr trunk.  Within the past 
>> week there was a major change to trunk and LSP won't compile until we 
>> make updates.  Either Ryan McKinley or I will get to that by the end of 
>> the week.  So unless you have access to 2-week old maven artifacts of 
>> Lucene/Solr, you're stuck right now.
>>
>> ~ David Smiley
>> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>>
>> On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:
>>
>>> I have looked at the code being shared on the
>>> lucene-spatial-playground and was wondering if anyone could provide
>>> some details as to its state.  Specifically I'm looking to add
>>> geospatial support to my application based on a user provided polygon,
>>> is this currently possible using this extension?
>>
>>
>>
>>
>>
>>
>>


>>>
>
>


Curl Tika not working with blanks in literal.field

2011-07-20 Thread Peralta Gutiérrez del Álamo

Hi. 
I'm trying to index binary documents with curl and Tika for extracting text. 

The problem is that when I set the value of a field that contains blank spaces,
using the input parameter literal.<field>=<value>, the document is not indexed. 

The command I send is the following: 

curl http://localhost:8983/solr/update/extract?literal.id="doc1"\&literal.url="/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc"\&uprefix=attr_\&fmap.content=text\&commit=true -F myfile=\@"/mnt/windows/Ofertas/DOCUMENTACION/datos.doc" 


That is, literal.url="value with blanks" apparently is not working.
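
A likely culprit (an assumption, not something confirmed in this thread) is
that the unencoded spaces break the URL before it ever reaches Solr. A
minimal Java sketch of encoding the value first:

    import java.net.URLEncoder;

    // Sketch: URL-encode the literal.url value so the embedded spaces
    // survive; the path is the one from the message above. Note that
    // URLEncoder.encode declares UnsupportedEncodingException.
    String docPath = "/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc";
    String url = "http://localhost:8983/solr/update/extract"
        + "?literal.id=doc1"
        + "&literal.url=" + URLEncoder.encode(docPath, "UTF-8")
        + "&uprefix=attr_&fmap.content=text&commit=true";

In a shell, quoting the whole URL achieves the same thing.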
  

Re: Geospatial queries in Solr

2011-07-20 Thread Smiley, David W.
The notion of a "system property" is a java concept; google it and you'll learn 
more.
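
For completeness, a sketch of the two usual ways to set one (the property
name comes from the earlier message; the rest is generic Java):

    // On the JVM command line:
    //   java -DSpatialContextProvider=com.googlecode.lucene.spatial.base.context.JtsSpatialContext ...
    // Or programmatically, before the spatial code initializes:
    System.setProperty("SpatialContextProvider",
        "com.googlecode.lucene.spatial.base.context.JtsSpatialContext");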

BTW, despite my responsiveness in helping right now, I'm pretty busy this week, 
so this won't necessarily last long.
~ David

On Jul 20, 2011, at 2:43 PM, Jamie Johnson wrote:

> Where do you set that?
> 
> On Wed, Jul 20, 2011 at 2:37 PM, Smiley, David W.  wrote:
>> You can set the system property SpatialContextProvider to 
>> com.googlecode.lucene.spatial.base.context.JtsSpatialContext
>> 
>> ~ David
>> 
>> On Jul 20, 2011, at 2:02 PM, Jamie Johnson wrote:
>> 
>>> So I've pulled the latest and can run the example, I've tried to move
>>> my config over and am having a bit of an issue when executing queries,
>>> specifically I get this:
>>> 
>>> Unable to read: POLYGON((...
>>> 
>>> looking at the code it's using the simple spatial context; how do I
>>> specify JtsSpatialContext?
>>> 
>>> On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson  wrote:
 Thanks for the update David, I'll give that a try now.
 
 On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W.  
 wrote:
> Ryan just updated LSP for Lucene/Solr trunk compatibility so you should 
> do a "mvn clean install" and you'll be back in business.
> 
> On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:
> 
>> Thanks for responding so quickly, I don't mind waiting a bit.  I'll
>> hang out until the updates have been  made.  Thanks again.
>> 
>> On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  
>> wrote:
>>> Hi Jamie.
>>> I work on LSP; it can index polygons and query for them. Although the 
>>> capability is there, we have more testing & benchmarking to do, and 
>>> then we need to put together a tutorial to explain how to use it at the 
>>> Solr layer.  I recently cleaned up the READMEs a bit.  Try downloading 
>>> the trunk codebase, and follow the README.  It points to another README 
>>> which shows off a demo webapp.  At the conclusion of this, you'll need 
>>> to examine the tests and webapp a bit to figure out how to apply it in 
>>> your app.  We don't yet have a tutorial as the framework has been in 
>>> flux  although it has stabilized a good deal.
>>> 
>>> Oh... by the way, this works off of Lucene/Solr trunk.  Within the past 
>>> week there was a major change to trunk and LSP won't compile until we 
>>> make updates.  Either Ryan McKinley or I will get to that by the end of 
>>> the week.  So unless you have access to 2-week old maven artifacts of 
>>> Lucene/Solr, you're stuck right now.
>>> 
>>> ~ David Smiley
>>> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>>> 
>>> On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:
>>> 
 I have looked at the code being shared on the
 lucene-spatial-playground and was wondering if anyone could provide
 some details as to its state.  Specifically I'm looking to add
 geospatial support to my application based on a user provided polygon,
 is this currently possible using this extension?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
 
>> 
>> 



RE: embedded solrj doesn't refresh index

2011-07-20 Thread Jianbin Dai
Hi, thanks for the response. Here is the whole picture:
I use DIH to import and index data. And use embedded solrj connecting to the
index file for search and other operations.
Here is what I found: Once data are indexed (and committed), I can see the
changes through solr web server, but not from embedded solrj. If I restart
the embedded solr server, I do see the changes.
Hope it helps. Thanks.
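
A minimal sketch of what that commit looks like (Solr 3.x-era SolrJ; the
solr home path and core name are placeholders, error handling omitted):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    // Sketch: committing through the embedded instance itself makes it open
    // a new searcher, so it sees what was indexed and committed elsewhere.
    System.setProperty("solr.solr.home", "/path/to/solr/home");
    CoreContainer cores = new CoreContainer.Initializer().initialize();
    EmbeddedSolrServer solr = new EmbeddedSolrServer(cores, "core0");
    solr.commit();  // reopens the embedded searcher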


-Original Message-
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] 
Sent: Wednesday, July 20, 2011 5:09 AM
To: solr-user@lucene.apache.org
Subject: Re: embedded solrj doesn't refresh index

You should send a commit to your embedded Solr

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Jianbin Dai 

> Hi,
>
>
>
> I am using embedded solrj. After I add new doc to the index, I can see the
> changes through solr web, but not from embedded solrj. But after I restart
> the embedded solrj, I do see the changes. It works as if there was a
cache.
> Anyone knows the problem? Thanks.
>
>
>
> Jianbin
>
>



set queryNorm to 1?

2011-07-20 Thread Elaine Li
Hi Folks,

My boost function is bf=div(product(num_clicks,0.3),sum(num_clicks,25)).
I would like to add its score directly to the final score instead of
letting it be normalized by the queryNorm value.
Is there anyway to do it?

Thanks.

Elaine


Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Hi Tomas

Yeah, I now understand it. I was confused about interpreting the output.

Thanks for the comments.

Sowmya.

2011/7/20 Tomás Fernández Löbbe 

> So, what you want is to have the same exact results set as if the query was
> "scientific", but the documents that also match Field1:[20 TO 30] to have
> more score, right?
>
> On Wed, Jul 20, 2011 at 10:53 AM, Sowmya V.B.  wrote:
>
> > Hi Tomas
> >
> > Here is what I was trying to give.
> >
> >
> >
> http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2030]
> > ^10&start=0&rows=30&qf=text&fl=Field1,docid&debugQuery=on
> >
>
> This query seems OK for that purpose.
>
> >
> > Over here, I was trying to change the range of Field1, keeping everything
> > else intact. Here are my observations:
> >
> > 1) The number of results found remain intact. Only that the order of the
> > results varies.
> >
> Isn't this what was expected?
>
>
> > 2) The boost factor (10) does not seem to throw any influence at all.
> >
> It's in the parsed query. Why do you think it doesn't have an influence?
> Can
> you send the debug query output for a document that matches the bq?
> I tried it with the Solr example and this is what I see:
>
>
> http://localhost:8983/solr/select?defType=dismax&q=display&bq=weight:[0%20TO%2010]
> ^10&start=0&rows=30&debugQuery=on&qf=features%20name
>
> This is the debug output for a document that matches the query and the boost
> query:
>
> 
>
> 1.137027 = (MATCH) sum of:
>  0.1994111 = (MATCH) max of:
>0.1994111 = (MATCH) weight(features:display in 0), product of:
>  0.34767273 = queryWeight(features:display), product of:
>3.7080503 = idf(docFreq=1, maxDocs=30)
>0.0937616 = queryNorm
>  0.57355976 = (MATCH) fieldWeight(features:display in 0), product of:
>1.4142135 = tf(termFreq(features:display)=2)
>3.7080503 = idf(docFreq=1, maxDocs=30)
>0.109375 = fieldNorm(field=features, doc=0)
>  0.937616 = (MATCH) ConstantScore(weight:[0.0 TO 10.0]^10.0)^10.0, product
> of:
>10.0 = boost
>0.0937616 = queryNorm
> 
>
> and this is the debug output for a document that only matches the main query:
>
> 
> 0.4834455 = (MATCH) sum of:
>  0.4834455 = (MATCH) max of:
>0.4834455 = (MATCH) weight(name:display in 12), product of:
>  0.34767273 = queryWeight(name:display), product of:
>3.7080503 = idf(docFreq=1, maxDocs=30)
>0.0937616 = queryNorm
>  1.3905189 = (MATCH) fieldWeight(name:display in 12), product of:
>1.0 = tf(termFreq(name:display)=1)
>3.7080503 = idf(docFreq=1, maxDocs=30)
>0.375 = fieldNorm(field=name, doc=12)
> 
>
> Do you have something similar??
>
>
>
> > Here is what the debugQuery says:
> > +DisjunctionMaxQuery((text:scientif)) ()
> > Field1:[20.0 TO 30.0]^10.0
> > +(text:scientif) () Field1:[20.0 TO
> > 30.0]^10.0
> >
> > From these, it seems like it's just filtering the results based on the
> > Field1
> > values, rather than performing a Boost Query.
> >
> > S.
> >
> > 2011/7/20 Tomás Fernández Löbbe 
> >
> > > Yes, it should,  but make sure you specify at least the "qf" parameter
> > for
> > > dismax. You can activate debugQuery and you'll see which documents get
> > > boosted and which aren't.
> > >
> > > On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B. 
> wrote:
> > >
> > > > Hi Tomasso
> > > >
> > > > Thanks for a quick response.
> > > >
> > > > So, if I say:
> > > >
> http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2*
> > > >
> > &defType=dismax*&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
> > > > -will it be right?
> > > >
> > > > The above query: boosts the documents which suit the given query
> > > > ("scientific"), which has Field1 values between 20-25, by a factor of
> > 10
> > > :
> > > > Is that right??
> > > >
> > > > S
> > > >
> > > > 2011/7/20 Tomás Fernández Löbbe 
> > > >
> > > > > Hi Sowmya, "bq" is a great way of boosting, but you have to be
> using
> > > the
> > > > > Dismax Query Parser or the Extended Dismax (edismax) query parser,
> it
> > > > > doesn't work with the Lucene Query Parser. If you can use any of
> > those,
> > > > > then
> > > > > that's the solution. If you need to use the Lucene Query Parser,
> for
> > a
> > > > user
> > > > > query like:
> > > > >
> > > > > scientific temper
> > > > >
> > > > > you could create a query like:
> > > > >
> > > > > (scientific temper) OR (scientific temper AND (field1:[10 TO
> > 2030]))^X
> > > > >
> > > > > being "X" the boost you want for those documents.
> > > > >
> > > > > with your query:
> > > > > scientific temper field1:[10 TO 2030]
> > > > >
> > > > > you are either adding the condition of the range value for the
> field
> > > (if
> > > > > your default operator is AND) or adding another way of matching the
> > > query
> > > > > (if your default operator is OR, you can have documents in your
> > result
> > > > set
> > > > > that only matched the range query, and this is not what the user wanted).

Schema Design/Data Import

2011-07-20 Thread travis

[Apologies if this is a duplicate -- I have sent several messages from my work 
email and they just vanish, so I subscribed with my personal email]
 
Greetings.  I am struggling to design a schema and a data import/update
strategy for some semi-complicated data.  I would appreciate any input.

What we have is a bunch of database records that may or may not have files
attached.  Sometimes no files, sometimes 50.

The requirement is to index the database records AND the documents, and the
search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a
complicated search form with various codes and attributes for the database
records, so we need a detailed schema that will loosely correspond to boxes on
the search form.  I don't think we could easily do that if we just crawl the
site.  But with a detailed schema, I'm having trouble understanding how we
could import and index from the database, and also index the related files,
and have the same schema being populated, especially with the number of
related documents being variable (maybe index them all to one field?).

We have a lot of flexibility on how we can build this, so I'm open to any
suggestions or pointers for further reading.  I've spent a fair amount of
time on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, that I am willing to overlook for the first cut, is
that some of these files are zipped, and some of the zip files may contain
other zip files, to maybe 3 or 4 levels deep.

Help, please?
 
cheers,

Travis

Re: How can i find a document by a special id?

2011-07-20 Thread Chris Hostetter

: Am 20.07.2011 19:23, schrieb Kyle Lee:
: > Is the mediacode always alphabetic, and is the ID always numeric?
: > 
: No sadly not. We expose our products on "too" many medias :-).

If i'm understanding you correctly, you're saying even the prefix "AB" is 
not special, that there could be any number of prefixes identifying 
different "mediacodes", and the product ids aren't all numeric?

If so, your question seems absurd.

I can only assume that I am horribly misunderstanding your situation 
(which is very easy to do when you only have a single contrived piece of 
example data to go on).

As a general rule, it's not a good idea to think about Solr in the same 
way as a relational database, but perhaps if you imagine for a moment that 
your Solr index *was* a (read only) relational database, with each 
solr field corresponding to a column in your DB, and then you described in 
pseudo-code/SQL how you would go about doing the types of id lookups you 
want to do, it might give us a better idea of your situation so we can 
suggest an approach for dealing with it.


-Hoss


Re: Tokenizer Question

2011-07-20 Thread Chris Hostetter

When the QueryParser gives hunks of text to an analyzer, and that analyzer 
produces multiple terms, the query parser has to decide how to build a 
query out of it.

if the terms have identical position information, then it always builds an 
"OR" query (this is the typical synonym situation).  If the terms have 
differing positions, then the behavior is driven by the 
autoGeneratePhraseQueries attribute of the FieldType -- the default value 
of this depends on the version attribute of your top level <schema> tag.
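
A sketch of what such a filter can look like (hypothetical class and terms,
Lucene 3.x attribute API): emitting the expanded term with a position
increment of 0 makes the parser build the "OR" query described above.

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    // Hypothetical filter: after "john" it injects "johnny" at the same
    // position, so the query parser treats the two terms as synonyms.
    public final class JohnnyFilter extends TokenFilter {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private final PositionIncrementAttribute posAtt =
          addAttribute(PositionIncrementAttribute.class);
      private boolean pendingJohnny = false;

      public JohnnyFilter(TokenStream input) { super(input); }

      @Override
      public boolean incrementToken() throws IOException {
        if (pendingJohnny) {
          termAtt.setEmpty().append("johnny");
          posAtt.setPositionIncrement(0); // same position as "john" -> OR query
          pendingJohnny = false;
          return true;
        }
        if (!input.incrementToken()) return false;
        if (termAtt.toString().equals("john")) pendingJohnny = true;
        return true;
      }
    }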


: I have a query which starts out with something like name:"john", I
: need to expand this to something like name:("john" "johnny").  I've
: implemented a custom tokenzier which gets close, but isn't quite right
: it outputs name:"john johnny".  Is there a simple example of doing
: what I'm attempting?
: 

-Hoss


RE: defType argument weirdness

2011-07-20 Thread Chris Hostetter

: I do understand what they do (at least well enough to use them), but I 
: find it confusing that it's called "defType" as a main param, but "type" 
: in a LocalParam, when to me they both seem to do the same thing -- which 

"type" as a localparam in a query string defines the type of query string 
it is -- picking hte parser.

"defType" determins the default value for "type" in the primary query 
string.

: (and then there's 'qt', often confused with defType/type by newbies, 
: since they guess it stands for 'query type', but which should probably 
: actually have been called 'requestHandler'/'rh' instead, since that's 
: what it actually chooses, no?  It gets very confusing).
: 
: If it's generally recognized it's confusing and perhaps a somewhat 
: inconsistent mental model being implied, I wonder if there'd be any 
: interest in renaming these to be more clear, leaving the old ones as 
: aliases/synonyms for backwards compatibility (perhaps with a long 

qt is historic and already being de-emphasized in favor of using 
path based names (ie: http://solr/handlername instead of 
http://solr/select?qt=/handlername) so adding yet another alias for that 
would be moving in the wrong direction.

"type" and "defType" probably make more sense when you think of 
them in that order.  I don't see a strong need to confuse/complicate the 
issue by adding more aliases for them.



-Hoss


Re: defType argument weirdness

2011-07-20 Thread Jonathan Rochkind
Huh, I'm still not completely following. I'm sure it makes sense if you 
understand the underlying implementation, but I don't understand how 
'type' and 'defType' don't mean exactly the same thing, just needing to be 
expressed differently in different locations.


Sorry for beating a dead horse, but maybe it would help if you could 
tell me what I'm getting wrong here:


defType can only go in top-level param, and determines the query parser 
for the overall &q top level param.


type can only go in a LocalParam, and determines the query parser that 
applies to whatever query (top-level or nested) that the LocalParam 
syntax lives in.  (Just as any other LocalParams apply only to the query 
that the LocalParam block lives in -- and nested queries inherit their 
query parser from the query they are nested in unless over-ridden, just 
as they inherit every other param from the query they are nested in 
unless over-ridden, nothing special here).


Therefore for instance:

&defType=dismax&q=foo

is equivalent to

&defType=lucene&q={!type=dismax}foo


Where am I straying in my mental model here? Because if all that is 
true, I don't understand how 'type' and 'defType' mean anything 
different -- they both choose the query parser, do they not? (which to 
me means I wish they were both called 'parser' instead of 'type' -- a 
'type' here is the name of a query parser, is it not?)  It's just that 
if it's in the top-level param you have to use 'defType', and if it's in 
a LocalParam you have to use 'type'.  That's been my mental model, which 
has served me well so far, but if it's wrong and it's going to trip me 
up on some as yet unencountered use cases, it would probably be good for 
me to know it!  (And probably good for some documentation to be written 
somewhere explaining it too). (And if they really are different, 
prefixing "def" to "type" is not making it very clear what the 
difference is! What's "def" supposed to stand for anyway?)


Jonathan


On 7/20/2011 3:49 PM, Chris Hostetter wrote:

: I do understand what they do (at least well enough to use them), but I
: find it confusing that it's called "defType" as a main param, but "type"
: in a LocalParam, when to me they both seem to do the same thing -- which

"type" as a localparam in a query string defines the type of query string
it is -- picking hte parser.

"defType" determins the default value for "type" in the primary query
string.

: (and then there's 'qt', often confused with defType/type by newbies,
: since they guess it stands for 'query type', but which should probably
: actually have been called 'requestHandler'/'rh' instead, since that's
: what it actually chooses, no?  It gets very confusing).
:
: If it's generally recognized it's confusing and perhaps a somewhat
: inconsistent mental model being implied, I wonder if there'd be any
: interest in renaming these to be more clear, leaving the old ones as
: aliases/synonyms for backwards compatibility (perhaps with a long

qt is historic and already being de-emphasized in favor of using
path based names (ie: http://solr/handlername instead of
http://solr/select?qt=/handlername) so adding yet another alias for that
would be moving in the wrong direction.

"type" and "defType" probably make more sense when you think of
them in that order.  I don't see a strong need to confuse/complicate the
issue by adding more aliases for them.



-Hoss



Re: Tokenizer Question

2011-07-20 Thread Jamie Johnson
Thanks, I'll try that now, I'm assuming I need to add the position
increment and offset attributes?

On Wed, Jul 20, 2011 at 3:44 PM, Chris Hostetter
 wrote:
>
> When the QueryParser gives hunks of text to an analyzer, and that analyzer
> produces multiple terms, the query parser has to decide how to build a
> query out of it.
>
> if the terms have identical position information, then it always builds an
> "OR" query (this is the typical synonym situation).  If the terms have
> differing positions, then the behavior is driven by the
> autoGeneratePhraseQueries attribute of the FieldType -- the default value
> of this depends on the version attribute of your top level <schema> tag.
>
>
> : I have a query which starts out with something like name:"john", I
> : need to expand this to something like name:("john" "johnny").  I've
> : implemented a custom tokenzier which gets close, but isn't quite right
> : it outputs name:"john johnny".  Is there a simple example of doing
> : what I'm attempting?
> :
>
> -Hoss
>


solrj and XML result sets

2011-07-20 Thread Joe Shubitowski
Does anyone have advice as to how to produce an XML result set using SolrJ? 
My Java coder says he can *only* produce result sets in javabin - which is fine 
in most cases - but we have a need for an XML output stream as well.

Thanks...
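
Two options come to mind (a sketch, assuming 3.x-era SolrJ, not something
confirmed in this thread): request the raw XML stream over plain HTTP with
wt=xml (e.g. http://localhost:8983/solr/select?q=...&wt=xml), or switch
SolrJ itself from javabin to its XML response parser:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;

    // Sketch: the constructor declares MalformedURLException.
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.setParser(new XMLResponseParser());  // wire format becomes XML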


RE: Solr 3.3: Exception in thread "Lucene Merge Thread #1"

2011-07-20 Thread Robert Petersen
Says it is caused by a Java out of memory error, no?  

-Original Message-
From: mdz-munich [mailto:sebastian.lu...@bsb-muenchen.de] 
Sent: Wednesday, July 20, 2011 9:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.3: Exception in thread "Lucene Merge Thread #1"

Here we go ...

This time we tried to use the old LogByteSizeMergePolicy and
SerialMergeScheduler:

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
<mergeScheduler class="org.apache.lucene.index.SerialMergeScheduler"/>
We did this before, just to be sure ... 

~300 Documents:

SEVERE: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
        at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
        at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2714)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2709)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2705)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3509)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143)
        at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:140)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
        at java.lang.Thread.run(Thread.java:736)
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
        ... 44 more

20.07.2011 18:07:30 org.apache.solr.core.SolrCore execute
INFO: [core.digi20] webapp=/solr path=/update params={} status=500
QTime=12302 
20.07.2011 18:07:30 org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
        at org.apache.lucene.store.MMapDirect

Re: How can i find a document by a special id?

2011-07-20 Thread Bill Bell
Why not just search the 2 fields?

q=*:*&fq=mediacode:AB OR id:123456

You could take the user input and replace it:

q=*:*&fq=mediacode:$input OR id:$input

Of course you can also use dismax and wrap with an OR.
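
A sketch of that replace-the-input idea in SolrJ (3.x-era API; the field
names are the ones from this thread, the escaping is an added precaution):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.util.ClientUtils;

    // Sketch: try the raw user input against both fields, as suggested above.
    String userInput = "AB1234567";  // example: whatever the user typed
    String input = ClientUtils.escapeQueryChars(userInput);
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("mediacode:" + input + " OR id:" + input);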

Bill Bell
Sent from mobile


On Jul 20, 2011, at 3:38 PM, Chris Hostetter  wrote:

> 
> : Am 20.07.2011 19:23, schrieb Kyle Lee:
> : > Is the mediacode always alphabetic, and is the ID always numeric?
> : > 
> : No sadly not. We expose our products on "too" many medias :-).
> 
> If i'm understanding you correctly, you're saying even the prefix "AB" is 
> not special, that there could be any number of prefixes identifying 
> different "mediacodes", and the product ids aren't all numeric?
> 
> If so, your question seems absurd.
> 
> I can only assume that I am horribly misunderstanding your situation 
> (which is very easy to do when you only have a single contrived piece of 
> example data to go on).
> 
> As a general rule, it's not a good idea to think about Solr in the same 
> way as a relational database, but perhaps if you imagine for a moment that 
> your Solr index *was* a (read only) relational database, with each 
> solr field corresponding to a column in your DB, and then you described in 
> pseudo-code/SQL how you would go about doing the types of id lookups you 
> want to do, it might give us a better idea of your situation so we can 
> suggest an approach for dealing with it.
> 
> 
> -Hoss




Re: Data Import from a Queue

2011-07-20 Thread Bill Bell
Yes this is a good reason for using a queue. I have used Amazon SQS this way 
and it was simple to set up.
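
A sketch of the pattern the thread asks about (a hypothetical queue interface
standing in for SQS/RabbitMQ/ActiveMQ; SolrJ 3.x-era client): drain messages
into Solr one document per message, committing once the queue is empty, as
proposed earlier in the thread.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    interface MessageQueue {   // hypothetical stand-in for the real queue API
      String poll();           // returns null when the queue is empty
    }

    public class QueueIndexer {
      // Sketch: one document per message, a single commit when drained.
      public static void drain(MessageQueue queue, SolrServer solr) throws Exception {
        String body;
        while ((body = queue.poll()) != null) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("body", body);  // map the payload to your schema here
          solr.add(doc);               // no per-message commit
        }
        solr.commit();                 // commit once the queue is empty
      }
    }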

Bill Bell
Sent from mobile


On Jul 20, 2011, at 2:59 AM, Stefan Matheis  
wrote:

> Brandon,
> 
> i don't know how they are using it in detail, but Part of Chef's Architecture 
> is this one:
> 
> Chef Server -> RabbitMQ -> Chef Solr Indexer -> Solr
> http://wiki.opscode.com/download/attachments/7274878/chef-server-arch.png
> 
> Perhaps not exactly, what you're looking for - but may give you an idea?
> 
> Regards
> Stefan
> 
> Am 19.07.2011 19:04, schrieb Brandon Fish:
>> Let me provide some more details to the question:
>> 
>> I was unable to find any example implementations where individual documents
>> (single document per message) are read from a message queue (like ActiveMQ
>> or RabbitMQ) and then added to Solr via SolrJ, a HTTP POST or another
>> method. Does anyone know of any available examples for this type of import?
>> 
>> If no examples exist, what would be a recommended commit strategy for
>> performance? My best guess for this would be to have a queue per core and
>> commit once the queue is empty.
>> 
>> Thanks.
>> 
>> On Mon, Jul 18, 2011 at 6:52 PM, Erick 
>> Ericksonwrote:
>> 
>>> This is a really cryptic problem statement.
>>> 
>>> you might want to review:
>>> 
>>> http://wiki.apache.org/solr/UsingMailingLists
>>> 
>>> Best
>>> Erick
>>> 
>>> On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fish
>>> wrote:
 Does anyone know of any existing examples of importing data from a queue
 into Solr?
 
 Thank you.
 
>>> 
>> 


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Thanks David.  When trying to execute queries on a complex irregular
polygon (say the shape of NJ) I'm getting results which are actually
outside of that polygon. Is there a setting which controls this
resolution?

On Wed, Jul 20, 2011 at 2:53 PM, Smiley, David W.  wrote:
> The notion of a "system property" is a java concept; google it and you'll 
> learn more.
>
> BTW, despite my responsiveness in helping right now; I'm pretty busy this 
> week so this won't necessarily last long.
> ~ David
>
> On Jul 20, 2011, at 2:43 PM, Jamie Johnson wrote:
>
>> Where do you set that?
>>
>> On Wed, Jul 20, 2011 at 2:37 PM, Smiley, David W.  wrote:
>>> You can set the system property SpatialContextProvider to 
>>> com.googlecode.lucene.spatial.base.context.JtsSpatialContext
>>>
>>> ~ David
>>>
>>> On Jul 20, 2011, at 2:02 PM, Jamie Johnson wrote:
>>>
 So I've pulled the latest and can run the example, I've tried to move
 my config over and am having a bit of an issue when executing queries,
 specifically I get this:

 Unable to read: POLYGON((...

 looking at the code it's using the simple spatial context; how do I
 specify JtsSpatialContext?

 On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson  wrote:
> Thanks for the update David, I'll give that a try now.
>
> On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W.  
> wrote:
>> Ryan just updated LSP for Lucene/Solr trunk compatibility so you should 
>> do a "mvn clean install" and you'll be back in business.
>>
>> On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:
>>
>>> Thanks for responding so quickly, I don't mind waiting a bit.  I'll
>>> hang out until the updates have been  made.  Thanks again.
>>>
>>> On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W.  
>>> wrote:
 Hi Jamie.
 I work on LSP; it can index polygons and query for them. Although the 
 capability is there, we have more testing & benchmarking to do, and 
 then we need to put together a tutorial to explain how to use it at 
 the Solr layer.  I recently cleaned up the READMEs a bit.  Try 
 downloading the trunk codebase, and follow the README.  It points to 
 another README which shows off a demo webapp.  At the conclusion of 
 this, you'll need to examine the tests and webapp a bit to figure out 
 how to apply it in your app.  We don't yet have a tutorial as the 
 framework has been in flux  although it has stabilized a good deal.

 Oh... by the way, this works off of Lucene/Solr trunk.  Within the 
 past week there was a major change to trunk and LSP won't compile 
 until we make updates.  Either Ryan McKinley or I will get to that by 
 the end of the week.  So unless you have access to 2-week old maven 
 artifacts of Lucene/Solr, you're stuck right now.

 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

 On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:

> I have looked at the code being shared on the
> lucene-spatial-playground and was wondering if anyone could provide
> some details as to its state.  Specifically I'm looking to add
> geospatial support to my application based on a user provided polygon,
> is this currently possible using this extension?







>>
>>
>
>>>
>>>
>
>


Updating fields in an existing document

2011-07-20 Thread Benson Margulies
We find ourselves in the following quandary:

At initial index time, we store a value in a field, and we use it for
faceting. So it, seemingly, has to be there as a field.

However, from time to time, something happens that causes us to want
to change this value. As far as we know, this requires us to
completely re-index the document, which is slow.

It struck me that we can't be the only people to go down this road, so
I write to inquire if we are missing something.


Re: Question on the appropriate software

2011-07-20 Thread Erick Erickson
Solr would work fine for this; your PDF files would have to be interpreted
by Tika. See the DataImportHandler, FileListEntityProcessor and
TikaEntityProcessor. I don't quite think Nutch is the tool here.

You'll be wanting to do highlighting and a couple of other things

You'll spend some time tweaking results to be what you want, but this
is certainly do-able.
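
For a rough sense of a non-DIH alternative (a sketch using SolrJ against the
ExtractingRequestHandler; the URL, file name and id are made up, and this is
not the DIH route described above):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    // Sketch: push one PDF to /update/extract so Tika extracts its text.
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("books/some-book.pdf"));  // made-up file name
    req.setParam("literal.id", "book-1");          // made-up id
    req.setParam("commit", "true");
    server.request(req);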

Best
Erick

On Tue, Jul 19, 2011 at 1:29 PM, Matthew Twomey  wrote:
> Greetings,
>
> I'm interesting in having a server based personal document library with a
> few specific features and I'm trying to determine what the most appropriate
> tools are to build it.
>
> I have the following content which I wish to include in the archive:
>
> 1. A smallish collection of technical books in PDF format (around 100)
> 2. Many years of several different magazine subscriptions in PDF format
> (probably another 100 - 200 PDFs)
> 3. Several years of personal documents which were scanned in and converted
> to searchable PDF format (300 - 500 documents)
> 4. I also have local mirrors of several HTML based reference sites
>
> I'd like to have the ability to index all of this content and search it from
> a web form (so that I and a few other can reach it from multiple locations).
> Here are two examples of the functionality I'm looking for:
>
> Scenario 1. "What was that software that has all the nutritional data and
> hooks up to some USDA database? I know I read about it in one of my Linux
> Journals last year."
>
> Now I'd like to be able to pull up the webform and search for "nutrition
> USDA". I'd like to restrict the search to the Linux Journal magazine PDFs
> (or refine the results). I'd like results to contain context snippets with
> each search result. Finally most importantly, I'd like multiple results per
> PDF (or all occurrences). The last one is important so that I can actually
> quickly find the right issue (in case there is some advertisement in every
> issue for the last year that contains those terms). When I click on the
> desired result, the PDF is downloaded by my browser.
>
> Scenario 2. "How much have I been paying for property taxes for the last
> five years again?" (the bills are all scanned in)
>
> In this case I'd like to search for my property identification number (which
> is on the bills) and the results should show all the documents that have it,
> with context. Clicking on results downloads the documents. I assume this
> example is simple to achieve if example 1 can be done.
>
> So in general, my question is - can this be done in a fairly straightforward
> manner with Solr? Is there a more appropriate tool to be using (e.g.
> Nutch?). Also, I have looked high and low for a free, already baked solution
> which can do scenario 1 but haven't been able to find something - so if
> someone knows of such a thing, please let me know.
>
> Thanks!
>
> -Matt
>


RE: Updating fields in an existing document

2011-07-20 Thread Jonathan Rochkind
Nope, you're not missing anything: there's no way to alter a document in an 
index other than by reindexing the whole document. Solr's architecture would 
make it difficult (although never say impossible) to do otherwise. But you're 
right, it would be convenient, and not just for you. 

Reindexing a single document ought not to be slow, although if you have many of 
them at once it could be, or if you end up needing to commit to an index very 
frequently, that can indeed cause problems. 
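
For the record, the "update" is just a full re-add keyed on the uniqueKey.
A minimal SolrJ sketch (the "category" field is hypothetical, and every other
stored field must be re-sent too, or it is lost):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class Reindex {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-42");          // same uniqueKey replaces the old doc
      doc.addField("category", "new-value"); // the changed facet value
      // ...re-add all remaining stored fields here...
      server.add(doc);
      server.commit(); // batch many re-adds per commit where possible
    }
  }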



RE: Solr 3.3: Exception in thread "Lucene Merge Thread #1"

2011-07-20 Thread mdz-munich
Yeah, indeed.

But since the VM is equipped with plenty of RAM (22GB), and this setup has so
far (with Solr 3.2) worked very well, I am slightly confused.

Maybe we should lower the dedicated physical memory? The remaining 10GB are
used by a second Tomcat (8GB) and the OS (SuSE). As far as I understand NIO
(mostly un-far), this package "can directly use the most efficient
operations of the underlying platform". 
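
If we do end up lowering it, I imagine something like this in Tomcat's
setenv.sh, leaving the unallocated remainder to the OS page cache that the
NIO reads go through (sizes purely illustrative, not a recommendation):

  # hypothetical split for a 22GB VM: a modest heap for Solr's own structures,
  # the rest stays free for the OS to cache index files
  JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx8g"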



Solr not returning results for some key words

2011-07-20 Thread Matthew Twomey

Greetings,

I'm having trouble getting Solr to return results for key words that I 
know for sure are in the index. As a test, I've indexed a PDF of a book 
on Java. I'm trying to search the index for 
"UnsupportedOperationException" but I get no results. I can "see" it in 
the index though:


#
[root@myhost apache-solr-1.4.1]# strings 
example/solr/data/index/_0.fdt|grep UnsupportedOperationException

UnsupportedOperationException if the iterator returned by this collec-
throw new UnsupportedOperationException();
UnsupportedOperationException Object does not support method
CHAPTER 9 EXCEPTIONS

UnsupportedOperationException, 87,
[root@myhost apache-solr-1.4.1]#
#

On the other hand, if I search the index for the word "support" (which 
is also contained in the grep above), I get a hit on this document. 
Furthermore, if I search on "support" and include highlighted snippets, 
I can see the word "UnsupportedOperationException" right in there in the 
highlight results!


#
of an object has
been detected where it is prohibited
UnsupportedOperationException Object does not support
#

So why do I get no hits when I search for it?

This happens with many different key words. Any thoughts on how I can 
troubleshoot this, or ideas on why it's not working properly?


Thanks,

-Matt


Re: Manipulating a Fuzzy Query's Prefix Length

2011-07-20 Thread Kyle Lee
Update:

Solr/Lucene 4.0 will incorporate a new fuzzy search algorithm with
substantial performance improvements.

To tide us over, we've simply rebuilt from source with a default prefix
length of 2, which will suit our needs until that release.
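
For anyone searching the archives later, the knob is the third argument to
Lucene's FuzzyQuery constructor; a minimal sketch of the behaviour we compiled
in as the default (the field name and term are made up):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.FuzzyQuery;

  public class FuzzyPrefixDemo {
    public static void main(String[] args) {
      // prefixLength=2: only terms sharing the first two characters are
      // enumerated and edit-distance scored, not every unique term
      FuzzyQuery query = new FuzzyQuery(new Term("name", "nutrtion"), 0.7f, 2);
      System.out.println(query);
    }
  }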

On Wed, Jul 20, 2011 at 10:09 AM, Kyle Lee wrote:

> We're performing fuzzy searches on a field possessing a large number of
> unique terms. Specifying a required minimum similarity of 0.7 results in a
> query execution time of 13-15 seconds, which stands in stark contrast to our
> average query time of 40ms.
>
> We suspect that the performance problem most likely emanates from the
> enumeration over all the unique terms in the index. The Lucene documentation
> for FuzzyQuery supports this theory with the following warning:
>
> *"Warning:* this query is not very scalable with its default prefix length
> of 0 - in this case, *every* term will be enumerated and cause an edit score
> calculation."
>
> We would therefore like to set the prefix length to one or two, mandating
> that the first couple of characters match and thereby substantially reduce
> the number of terms enumerated. Is this possible with Solr? If so, I haven't
> yet discovered how. Any help would be greatly appreciated.
>


Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Tal Rotbart
Hi all,

I hope you won't mind me informing the list, but I thought some
Melbourne-based members would find this relevant.

We have noticed that there is a blossoming of Apache Solr/Lucene usage
& development in Melbourne, but also a lack of an unofficial,
relaxed gathering to allow some fruitful information and experience
exchange.

We're trying to put together a laid back meet up for developers (and
other interested people) who are currently using Apache Solr (and/or
Lucene) or would like to learn more about it.  Aiming for it to be a
high signal/noise ratio group, with meet ups probably once every two
months.

The first meet up is still TBD, but please join the group if you're
keen to join us for pizza, beer, and a discussion about Solr once we
figure out the date of the first meeting.

Also, please feel free to suggest quick (15 minute) presentations -
whether it be a problem you've solved, a problem you need help solving
or a general interesting experience of using Solr.

We're keeping registrations here: http://www.meetup.com/melbourne-solr/

Feel free to pass this on to co-workers and colleagues who would be interested.

Cheers,
Tal


Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Dave Hall

Hi Tal,

On 21/07/11 14:04, Tal Rotbart wrote:

We have noticed that there is a blossoming of Apache Solr/Lucene usage
&  development in Melbourne, but also a lack of an unofficial,
relaxed gathering to allow some fruitful information and experience
exchange.

We're trying to put together a laid back meet up for developers (and
other interested people) who are currently using Apache Solr (and/or
Lucene) or would like to learn more about it.  Aiming for it to be a
high signal/noise ratio group, with meet ups probably once every two
months.


This sounds great!  I'm not sure I'll be a regular, but if I'm around 
town when it is on I will try to drop in.



The first meet up is still TBD, but please join the group if you're
keen to join us for pizza, beer, and a discussion about Solr once we
figure out the date of the first meeting.

Once a date is decided, please update the Melbourne *UG wiki page so 
others can find out about it.  The wiki has meeting times for various 
user groups around town, which might help you find a time which doesn't 
clash with other groups.  Check it out at http://perl.net.au/wiki/Melbourne


Cheers

Dave


Re: Solr not returning results for some key words

2011-07-20 Thread Matthew Twomey
OK, apparently I'm not the first to have fallen prey to the maxFieldLength 
gotcha:


http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html

All fixed now.
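
For anyone who hits this later: in the Solr 1.4 example solrconfig.xml the
limit sits under <indexDefaults>, and a second occurrence under <mainIndex>
(if present) overrides it, so raise both and reindex. A sketch:

  <indexDefaults>
    <!-- tokens past this count are silently dropped at index time;
         the example config ships with 10000 -->
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <!-- this one wins over indexDefaults if both are set -->
    <maxFieldLength>2147483647</maxFieldLength>
  </mainIndex>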

-Matt





Re: Question on the appropriate software

2011-07-20 Thread Matthew Twomey
Excellent, thanks for the confirmation, Erick. I've started working with 
Solr (just getting my feet wet at this point).


-Matt






Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Mark Mandel
Sounds great :) I'll sign up as well.

Look forward to a meeting!

Mark




-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) + Flex - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au


Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Ranveer Kumar
Hi,

I'm interested in attending, but I'm not in Australia. :-(

Regards