Indexing MultiCore

2010-03-05 Thread Suram

Hi,

How can I index an XML file in a multicore setup? When I tried to
execute the following command I got this error:

\solr\example\exampledocs>java -Ddata=args -Dcom
ocalhost:8080/solr/core0/update -jar post.jar Example.xml




Mar 5, 2010 3:37:00 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Unexpected character 'E' (code 69) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
	at java.lang.Thread.run(Unknown Source)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character 'E' (code 69) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
	at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2047)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:90)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	... 19 more

-- 
View this message in context: 
http://old.nabble.com/Indexing-MultiCore-tp27792135p27792135.html
Sent from the Solr - User mailing list archive at Nabble.com.
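
A note for archive readers: with -Ddata=args, post.jar sends its command-line
arguments themselves as the update message, so the literal string "Example.xml"
reached Solr's XML parser -- which is exactly what the "Unexpected character 'E'
... at [1,1]" error says. The URL flag is also garbled above; a typical
invocation against a named core looks roughly like this (the URL value is an
assumption, not the original poster's exact setup):

```shell
cd solr/example/exampledocs
# Post the file's contents (post.jar's default -Ddata=files behaviour),
# pointing at the core0 update handler instead of the default core.
java -Durl=http://localhost:8080/solr/core0/update -jar post.jar Example.xml
```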



Re: Clustering from anlayzed text instead of raw input

2010-03-05 Thread Stanislaw Osinski
>  I'll give a try to stopwords treatment, but the problem is that we
> perform
> POS tagging and then use payloads to keep only Nouns and Adjectives, and we
> thought that could be interesting to perform clustering only with these
> elements, to avoid senseless words.
>

POS tagging could help a lot in clustering (not yet implemented in Carrot2
though), but ideally, we'd need to have POS tags attached to the original
tokenized text (so each token would be a tuple along the lines of: raw_text
+ stemmed + POS). If we have just nouns and adjectives, cluster labels will
be most likely harder to read (e.g. because of missing prepositions). I'm
not too familiar with Solr internals, but I'm assuming this type of
representation should be possible to implement using payloads? Then, we
could refactor Carrot2 a bit to work either on raw text or on the
tokenized/augmented representation.

Cheers,

S.
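
A toy sketch (plain Python, not Carrot2 or Solr code) of the token-tuple
representation Stanislaw describes, and of why keeping only nouns and
adjectives can make cluster labels harder to read:

```python
# Each token as a (raw_text, stemmed, POS) tuple, along the lines
# suggested above. Example tokens/tags are made up for illustration.
tokens = [
    ("slice", "slice", "NN"),
    ("of", "of", "IN"),
    ("fresh", "fresh", "JJ"),
    ("pizza", "pizza", "NN"),
]

def keep_nouns_and_adjectives(tokens):
    """Keep only tokens tagged as nouns (NN*) or adjectives (JJ*)."""
    return [t for t in tokens if t[2].startswith(("NN", "JJ"))]

filtered = keep_nouns_and_adjectives(tokens)
label = " ".join(raw for raw, _, _ in filtered)
print(label)  # the preposition "of" is gone, so the label reads oddly
```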


Re: Warning : no lockType configured for...

2010-03-05 Thread Mani EZZAT

Should I file a bug?

Mani EZZAT wrote:
I tried using the default solrconfig and schema (from the example in the 1.3 
release) and I still get the same warnings


When I look at the log, the solrconfig seems correctly loaded, but 
something is strange:

newSearcher warming query from solrconfig.xml}]}
2010-03-04 10:35:32,545 DEBUG [Config] solrconfig.xml missing optional 
mainIndex/deletionPolicy/@class
2010-03-04 10:35:32,556 DEBUG [Config] solrconfig.xml 
mainIndex/unlockOnStartup=false
2010-03-04 10:35:32,563 WARN  [SolrCore] [core] Solr index directory 
'./solr/data/index' doesn't exist. Creating new index...
2010-03-04 10:35:32,589 WARN  [SolrIndexWriter] No lockType configured 
for ./solr/data/index/ assuming 'simple'


Here I can see Solr checking the properties in the Config (or maybe 
SolrConfig, not sure about the class) and the lockType property isn't 
there... and that's where the warning comes from.


I'm not sure what it means.  Maybe the information is lost somewhere, but 
everything seems fine to me when I look at the source code.


Also, it happens for the first core I create (and every core after), so 
I don't think it's related to the fact that I dynamically create several 
cores. Even if I create only one core, I'll get the warning, since I get it 
for the first one anyway.


Mani EZZAT wrote:
  
I don't know, I didn't try because I have the need to create a different 
core each time.


I'll do some tests with the default config and will report back to all 
of you

Thank you for your time

Tom Hill. wrote:
  


Hi Mani,

Mani EZZAT wrote:
  

  
I'm dynamically creating cores with a new index, using the same schema 
and solrconfig.xml

  


Does the problem occur if you use the same configuration in a single, static
core?

Tom
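
On the warning in this thread: in the Solr 1.3 example solrconfig.xml the lock
type is set explicitly under the mainIndex (and indexDefaults) section. A
minimal sketch of the relevant fragment, values illustrative:

```xml
<!-- solrconfig.xml fragment (sketch, based on the Solr 1.3 example config).
     Setting lockType explicitly avoids the
     "No lockType configured ... assuming 'simple'" warning. -->
<mainIndex>
  <!-- one of: single | native | simple | none -->
  <lockType>simple</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>
```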

  

  
  



  




Store input text after analyzers and token filters

2010-03-05 Thread JCodina


In a stored field, the content stored is the raw input text.
But when the analyzers perform some cleaning or an interesting transformation
of the text, it could be useful to store the text after the
tokenizer/filter chain.
Is there a way to do this? To be able to get back the text of the document
after it has been processed?

thanks
Joan
-- 
View this message in context: 
http://old.nabble.com/Store-input-text-after-analyzers-and-token-filters-tp27792550p27792550.html



Re: Clustering Search taking 4sec for 100 results

2010-03-05 Thread Stanislaw Osinski
Hi,

It might also be interesting to add some logging of clustering time (just
filed: https://issues.apache.org/jira/browse/SOLR-1809) to see what the
index search vs clustering proportions are.

Cheers,

S.

On Fri, Mar 5, 2010 at 03:26, Erick Erickson wrote:

> Search time is only partially dependent on the
> number of results returned. Far more important
> is the number of docs in the index, the
> complexity of the query, any sorting you do, etc.
>
> So your question isn't really very answerable,
> you need to provide many more details. Things
> like your index size, the machine you're operating
> on etc.
>
> Are you firing off warmup queries? Also, using
> debugQuery=on on your URL will provide
> significant timing output, that would help us
> diagnose your issues.
>
> HTH
> Erick
>
>
>
> On Thu, Mar 4, 2010 at 9:02 PM, Allahbaksh Asadullah <
> allahbaks...@gmail.com
> > wrote:
>
> > Hi,
> > I am using Solr for clustering. I have set the number of rows to 100 and
> > I am using the clustering handler. The problem is that the search time I
> > get for a clustering search is roughly 4 sec. I have set -Xmx1024m. What
> > is the best way to reduce the time?
> > Regards,
> > allahbaksh
> >
>


Re: Store input text after analyzers and token filters

2010-03-05 Thread Ahmet Arslan
> In an stored field, the content stored is the raw input
> text.
> But when the analyzers perform some cleaning or interesting
> transformation
> of the text, then it could be interesting to store the text
> after the
> tokenizer/Filter chain
> there is a way to do this? To be able to get back the text
> of the document
> after being processed??

You can get term vectors [1] of analyzed text.

Also you can see analyzed text in solr/admin/analysis.jsp if you copy and paste 
sample text data.

[1] http://wiki.apache.org/solr/TermVectorComponent 
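
For completeness, using the TermVectorComponent requires term vectors to be
enabled per field in schema.xml, roughly along these lines (field name and
type here are illustrative, not from the original poster's schema):

```xml
<!-- schema.xml sketch: enable term vectors on the field to inspect -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

A query can then request them with parameters such as tv=true and tv.all=true
against a handler that has the term vector component registered; the wiki page
above shows the exact handler configuration.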


  


Re: Can I used .XML files instead of .OSM files

2010-03-05 Thread mamathahl

The body field is of "string" type.  When "text" was tried, it
gives an error: there is nothing called TextParser, it's a StringParser. The
body content of a few records is really huge.  I am not sure whether string
can handle such a huge amount of data.  When ant index is done, it says
"Indexing done in [some time such as 2084 ms] for 0 docs."  I'm not sure of
the reason why indexing is not happening.  Without getting this step right,
I'm unable to move ahead.  Everything has come to a standstill.  Please help
me resolve this problem.

Lance Norskog-2 wrote:
> 
> Is the 'body' field a text type? If it is a string, searching for
> words will not work.
> 
> Does search for 'id:1' work?
> 
> On Thu, Mar 4, 2010 at 3:44 AM, mamathahl  wrote:
>>
>> I forgot to mention that I have been working on geo-spatial examples
>> downloaded from
>> http://www.ibm.com/developerworks/java/library/j-spatial/.
>> I have replaced the OSM files(data) which initially existed, with my data
>> (i.e XML file with OSM extension).  My XML file has many data records.
>>  The
>> 1st record is shown below.
>> > I use the following commands to index and retrieve the data:
>> ant index
>> ant start-solr
>> and then hit the url http://localhost:8983/solr/admin
>> But when a keyword that exists in the data file is given, I get the
>> following
>> −
>> 
>> −
>> 
>> 0
>> 0
>> −
>> 
>> on
>> 0
>> DRI
>> 2.2
>> 10
>> 
>> 
>> 
>> 
>> Since there is no error message being displayed, I'm unable to figure out
>> what is going wrong.  Kindly help me by providing an appropriate
>> solution.
>>
>> mamathahl wrote:
>>>
>>> I'm very new to Solr.  I downloaded apache-solr-1.5-dev and was trying
>>> out
>>> the example in order to first figure out how Solr is working.  I found
>>> out
>>> that the data directory consisted of .OSM files.  But I have an XML file
>>> consisting of latitude, longitude and relevant news for that location.
>>> Can I just use the XML file to index the data or is it necessary for me
>>> to
>>> convert this file to .OSM file using some tool and then proceed further?
>>> Also the attribute value from the .OSM file is being considered in that
>>> example.  Since there are no attributes for the tags in my XML file, how
>>> can I extract only the contents of my tags?Any help in this direction
>>> will
>>> be appreciated.  Thanks in advance.
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27779694.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27793567.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: get english spell dictionary

2010-03-05 Thread michaelnazaruk

Hi, all! Please tell me, where can I get a spell dictionary for Solr? 


-- 
View this message in context: 
http://old.nabble.com/english-%28american%29-spell-dictionary-tp27778741p27793939.html
Sent from the Solr - User mailing list archive at Nabble.com.



example solr xml working fine but my own xml files not working

2010-03-05 Thread venkatesh uruti

I am trying to import an XML file into Solr. It imports successfully, but it
is not showing any results when searching in Solr.

In the solr home/example docs/ directory all the example XMLs work fine, but
when I create a new XML file and try to upload it to Solr, it's not flying.

Can anyone please post the steps to import an XML file into Solr?
-- 
View this message in context: 
http://old.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp27793958p27793958.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can I used .XML files instead of .OSM files

2010-03-05 Thread Erick Erickson
I think you need to back up a step or three here. If I'm
reading your messages right, you've essentially taken
an arbitrary file, renamed it and tried to index it. This won't
work unless you make your schema match, and the xml
file has the proper tags.

SOLR doesn't magically index arbitrary XML. The tools
provided take a very specific XML format and recognize
tags that indicate "actions", like


  value
.
.
.



If the XML you try to index doesn't follow these conventions,
nothing will get indexed.
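(The XML skeleton in Erick's message was stripped by the mailing-list archive;
the specific format he is describing is Solr's standard update syntax, which
looks like this, with field names matching your schema.xml:)

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">some value</field>
  </doc>
</add>
```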

Furthermore, the schema.xml has to define
how to store the fields specified above, for instance,
you have to have a ... definition
in your schema.xml that corresponds to the
tags inside the document.

So I'd back up and work through the tutorial
*without* trying anything new until you got
that working. Then make a few changes
(add new fields, do a few searches, etc).
Or at least make your custom file follow the
same format as the example file, with
the same tags and structure.

If this is way off base, you need to
provide some examples of what you're trying to
index, your schema file, and more details
about what doesn't work.

HTH
Erick


On Fri, Mar 5, 2010 at 7:59 AM, mamathahl  wrote:

>
> The body field is of "string" type.  When it was tried giving "text", it
> gives error. There is nothing called Textparser.  Its a stringparser. The
> body content of a few records are really huge.  I am not sure whether
> string
> can handle such huge amount of data.  When ant index is done, it says
> "Indexing done in [some time such as 2084 ms] for 0 docs."  I'm not sure of
> the reason why indexing is not happening.  Without getting this step right,
> I'm unable to move ahead.  Everything has come to a standstill.  Please
> help
> me resolve this problem.
>
> Lance Norskog-2 wrote:
> >
> > Is the 'body' field a text type? If it is a string, searching for
> > words will not work.
> >
> > Does search for 'id:1' work?
> >
> > On Thu, Mar 4, 2010 at 3:44 AM, mamathahl  wrote:
> >>
> >> I forgot to mention that I have been working on geo-saptial examples
> >> downloaded from
> >> http://www.ibm.com/developerworks/java/library/j-spatial/.
> >> I have replaced the OSM files(data) which initially existed, with my
> data
> >> (i.e XML file with OSM extension).  My XML file has many data records.
> >>  The
> >> 1st record is shown below.
> >>  >> I use the following commands to index and retrieve the data:
> >> ant index
> >> ant start-solr
> >> and then hit the url http://localhost:8983/solr/admin
> >> But when a keyword that exists in the data file is given, I get the
> >> following
> >> -
> >> 
> >> -
> >> 
> >> 0
> >> 0
> >> -
> >> 
> >> on
> >> 0
> >> DRI
> >> 2.2
> >> 10
> >> 
> >> 
> >> 
> >> 
> >> Since there is no error message being displayed, I'm unable to figure
> out
> >> what is going wrong.  Kindly help me by providing an appropriate
> >> solution.
> >>
> >> mamathahl wrote:
> >>>
> >>> I'm very new to Solr.  I downloaded apache-solr-1.5-dev and was trying
> >>> out
> >>> the example in order to first figure out how Solr is working.  I found
> >>> out
> >>> that the data directory consisted of .OSM files.  But I have an XML
> file
> >>> consisting of latitude, longitude and relevant news for that location.
> >>> Can I just use the XML file to index the data or is it necessary for me
> >>> to
> >>> convert this file to .OSM file using some tool and then proceed
> further?
> >>> Also the attribute value from the .OSM file is being considered in that
> >>> example.  Since there are no attributes for the tags in my XML file,
> how
> >>> can I extract only the contents of my tags?Any help in this direction
> >>> will
> >>> be appreciated.  Thanks in advance.
> >>>
> >>
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27779694.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Can-I-used-.XML-files-instead-of-.OSM-files-tp27769082p27793567.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: example solr xml working fine but my own xml files not working

2010-03-05 Thread Erick Erickson
Does your new xml follow the same structure of the
example? That is,







?
Have you tried looking at the results with the admin page
to see what's actually in your index?

More data please. What did you do to try to index your
new data? What response did SOLR give? etc

Erick
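
For archive readers asking for concrete steps: with the stock Jetty example
from the tutorial, posting a custom XML file (in the add/doc/field update
format) is roughly the following. Directory names and port are assumptions
based on the standard example distribution:

```shell
# Sketch; paths and port assume the stock example setup from the tutorial.
cd apache-solr/example
java -jar start.jar                # start Solr and leave it running

# in a second shell:
cd apache-solr/example/exampledocs
java -jar post.jar my_docs.xml     # posts the file and issues a commit
# then query http://localhost:8983/solr/select?q=*:* to see what was indexed
```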

On Fri, Mar 5, 2010 at 8:40 AM, venkatesh uruti
wrote:

>
> I am trying to imoport xml file in solr, it is successfully importing, but
> it
> is not showing any results while sarching in solr
>
> in solr home/example docs/ directory all example xmls are working fine but
> when i create a new XML file and trying to upload to solr its not flying
>
> can any one please post the steps to import xml file in solr
> --
> View this message in context:
> http://old.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp27793958p27793958.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: highlight multi-valued field returns weird cut-off highlighted terms

2010-03-05 Thread Koji Sekiguchi

uwdanny wrote:

in this "error" case, the origin query "q=pizza"






















thanks

-
the best is yet to come~
  

What is PhraseTokenFactory in the above?
If the Tokenizer's end() method doesn't work correctly,
you may get the trouble you were facing.

Also consult:
https://issues.apache.org/jira/browse/LUCENE-2207

Koji

--
http://www.rondhuit.com/en/



Stemming

2010-03-05 Thread Suram

Hi,

How can i set Features of stemming (set for Italian) anyone can tell me
-- 
View this message in context: 
http://old.nabble.com/Stemming-tp27794521p27794521.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Documents disappearing

2010-03-05 Thread Pascal Dimassimo

Hi,

hossman wrote:
> 
> : We index using 4 processes that read from a queue of documents. Each
> process
> : send one document at a time to the /update handler.
> 
> Hmmm.. then you should have a message from the LogUpdateProcessorFactory 
> for every individual "add" command that was received ... did you crunch 
> those to see if anything odd popped up (ie: duplicated IDs)
> 
> what did the "start commit" log messages look like?
> 
> (FWIW: I have no hunches as to what caused that behavior, i'm just 
> scrounging for more data)
> 

A quick check did show me a couple of duplicates, but if I understand
correctly, even if two different processes send the same document, the last
one should update the previous. If I send the same document 10 times, in
the end it should only be in my index once, no?

The "start commit" message is always:
start
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)


hossman wrote:
> 
> : Yes, I double checked that no delete occur. Since that indexation, I
> : re-index the same set of documents twice and we always end up with 7725
> : documents, but it did not show that ~1 documents count that we saw
> the
> : first time. But the difference between the first indexation and the
> others
> : was that the first time, the indexation last a couple of hours because
> the
> : documents were not always accessible in our document queue. The others
> 
> Hmmm... what exactly does your indexing code do when the documents aren't 
> available?  ... and what happens if you forcibly commit in the middle of 
> reindexing (to see some of those counts again)
> 

If no document is available, the threads sleep. If a commit is sent
manually during re-indexing, it just commits what has been sent to the
index so far.

I will redo the test with the same documents and in the same conditions as
in our first indexation to see if the counts will be the same again.

Again, thanks a lot for your help.
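
Pascal's understanding matches Solr's documented behaviour: when schema.xml
declares a uniqueKey and documents are added with the default allowDups=false,
a later add with the same key value replaces the earlier document. In
update-XML terms (field names here are hypothetical):

```xml
<add allowDups="false">
  <doc><field name="id">42</field><field name="title">first version</field></doc>
</add>
<add allowDups="false">
  <doc><field name="id">42</field><field name="title">second version</field></doc>
</add>
<!-- after a commit, a query for id:42 returns only "second version" -->
```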


-- 
View this message in context: 
http://old.nabble.com/Documents-disappearing-tp27659047p27794641.html



Re: Store input text after analyzers and token filters

2010-03-05 Thread JCodina

Thanks,
It can be useful as a workaround, 
but I get a vector, not a "result" that I can use wherever I could use the
stored text. 
I'm thinking of clustering.


Ahmet Arslan wrote:
> 
>> In an stored field, the content stored is the raw input
>> text.
>> But when the analyzers perform some cleaning or interesting
>> transformation
>> of the text, then it could be interesting to store the text
>> after the
>> tokenizer/Filter chain
>> there is a way to do this? To be able to get back the text
>> of the document
>> after being processed??
> 
> You can get term vectors [1] of analyzed text.
> 
> Also you can see analyzed text in solr/admin/analysis.jsp if you copy and
> paste sample text data.
> 
> [1] http://wiki.apache.org/solr/TermVectorComponent 
> 
> 
>   
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Store-input-text-after-analyzers-and-token-filters-tp27792550p27794689.html



Re: Stemming

2010-03-05 Thread Grant Ingersoll
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SnowballPorterFilterFactory


On Mar 5, 2010, at 9:24 AM, Suram wrote:

> 
> Hi,
> 
> How can i set Features of stemming (set for Italian) anyone can tell me
> -- 
> View this message in context: 
> http://old.nabble.com/Stemming-tp27794521p27794521.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: SolrJ commit options

2010-03-05 Thread Jerome L Quinn
Shalin Shekhar Mangar  wrote on 02/25/2010 07:38:39
AM:

> On Thu, Feb 25, 2010 at 5:34 PM, gunjan_versata
wrote:
>
> >
> > We are using SolrJ to handle commits to our solr server.. All runs
fine..
> > But whenever the commit happens, the server becomes slow and stops
> > responding.. therby resulting in TimeOut errors on our production. We
are
> > using the default commit with waitFlush = true, waitSearcher = true...
> >
> > Can I change there values so that the requests coming to solr dont
block on
> > recent commit?? Also, what will be the impact of changing these
values??
> >
>
> Solr does not block reads during a commit/optimize. Write operations are
> queued up but they are still accepted. Are you using the same Solr server
> for reads as well as writes?

I've seen similar things with Solr 1.3 (not using SolrJ).  If I try to
optimize the
index, queries will take much longer - easily a minute or more, resulting
in timeouts.

Jerry
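
For reference, SolrJ exposes the two flags gunjan asked about directly on the
commit call. A sketch of the SolrJ 1.3-era API (the server URL is hypothetical,
and whether this avoids the slowdown depends on the deployment, per Shalin's
caveat above):

```java
// Commit without waiting for the flush or for the new searcher to be
// registered and warmed.
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
server.commit(false, false);  // waitFlush=false, waitSearcher=false
```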


SolrConfig - constructing the object

2010-03-05 Thread Kimberly Kantola

Hi All,
  I am new to using the Solr classes in development.  I am trying to
determine how to create  a SolrConfig object.  
  Is it just a matter of calling new SolrConfig with the location of the
solrconfig.xml file?
 
SolrConfig config = new SolrConfig("/path/to/solrconfig.xml");
 
Thanks for any help!
Kim
-- 
View this message in context: 
http://old.nabble.com/SolrConfig---constructing-the-object-tp27795339p27795339.html



Re: SolrConfig - constructing the object

2010-03-05 Thread Mark Miller

On 03/05/2010 10:29 AM, Kimberly Kantola wrote:

Hi All,
   I am new to using the Solr classes in development.  I am trying to
determine how to create  a SolrConfig object.
   Is it just a matter of calling new SolrConfig with the location of the
solrconfig.xml file ?

SolrConfig config = new SolrConfig("/path/to/solrconfig.xml");

Thanks for any help!
Kim
   


Sure, that's one way that will work if you are happy with all of the 
other defaults that will occur.


--
- Mark

http://www.lucidimagination.com





Re: Can I used .XML files instead of .OSM files

2010-03-05 Thread mamathahl

Thanks for your valuable suggestion.
My XML file does not contain   tags at all.
Its just of this format
 
According to me(if my understanding is right),  posts each record
into a Solr document.  This could be done, if I would be using only data
consisting of lat,lng and body.  But I need to generate 3 extra fields
namely, geohash, lat_radians, lng_radians.  I read somewhere that, these
fields should have their respective values before indexing.  The OSMhandler
file has something called "geohash utils." This will take care of generating
the appropriate geohash values.  The function to_rads converts the given lat
and lng values to radians.  These were made use of in the examples that I
had downloaded.  They also made use of the OSM files for indexing.  It was
found that files ending with any other extension were not taken for indexing
at all.  That's why the file was renamed as .OSM though it was just a .XML
file.  The schema.xml and Solrconfig.xml have been changed to match my
requirements.  Now the problem is if I use a plain XML file, I should have
pre-calculated values for geohash, lat_radians and lng_radians.  How do I
get these values? Can you please guide me, how to go about it, now that I've
given more details?

Erick Erickson wrote:
> 
> I think you need to back up a step or three here. If I'm
> reading your messages right, you've essentially taken
> an arbitrary file, renamed it and tried to index it. This won't
> work unless you make your schema match, and the xml
> file has the proper tags.
> 
> SOLR doesn't magically index arbitrary XML. The tools
> provided take a very specific XML format and recognized
> tags that indicate "actions", like
> 
> 
>   value
> .
> .
> .
> 
> 
> 
> If the XML you try to index doesn't follow these conventions,
> nothing will get indexed.
> 
> Furthermore, the schema.xml has to define
> how to store the fields specified above, for instance,
> you have to have a ... definition
> in your schema.xml that corresponds to the
> tags inside the document.
> 
> So I'd back up and work through the tutorial
> *without* trying anything new until you got
> that working. Then make a few changes
> (add new fields, do a few searches, etc).
> Or at least make your custom file follow the
> same format as the example file, with
> the same tags and structure.
> 
> If this is wy off base, you need to
> provide some examples of what you're trying to
> index, your schema file, and more details
> about what doesn't work.
> 
> HTH
> Erick
> 
> 
> On Fri, Mar 5, 2010 at 7:59 AM, mamathahl  wrote:
> 
>>
>> The body field is of "string" type.  When it was tried giving "text", it
>> gives error. There is nothing called Textparser.  Its a stringparser. The
>> body content of a few records are really huge.  I am not sure whether
>> string
>> can handle such huge amount of data.  When ant index is done, it says
>> "Indexing done in [some time such as 2084 ms] for 0 docs."  I'm not sure
>> of
>> the reason why indexing is not happening.  Without getting this step
>> right,
>> I'm unable to move ahead.  Everything has come to a standstill.  Please
>> help
>> me resolve this problem.
>>
>> Lance Norskog-2 wrote:
>> >
>> > Is the 'body' field a text type? If it is a string, searching for
>> > words will not work.
>> >
>> > Does search for 'id:1' work?
>> >
>> > On Thu, Mar 4, 2010 at 3:44 AM, mamathahl  wrote:
>> >>
>> >> I forgot to mention that I have been working on geo-saptial examples
>> >> downloaded from
>> >> http://www.ibm.com/developerworks/java/library/j-spatial/.
>> >> I have replaced the OSM files(data) which initially existed, with my
>> data
>> >> (i.e XML file with OSM extension).  My XML file has many data records.
>> >>  The
>> >> 1st record is shown below.
>> >> > >> I use the following commands to index and retrieve the data:
>> >> ant index
>> >> ant start-solr
>> >> and then hit the url http://localhost:8983/solr/admin
>> >> But when a keyword that exists in the data file is given, I get the
>> >> following
>> >> -
>> >> 
>> >> -
>> >> 
>> >> 0
>> >> 0
>> >> -
>> >> 
>> >> on
>> >> 0
>> >> DRI
>> >> 2.2
>> >> 10
>> >> 
>> >> 
>> >> 
>> >> 
>> >> Since there is no error message being displayed, I'm unable to figure
>> out
>> >> what is going wrong.  Kindly help me by providing an appropriate
>> >> solution.
>> >>
>> >> mamathahl wrote:
>> >>>
>> >>> I'm very new to Solr.  I downloaded apache-solr-1.5-dev and was
>> trying
>> >>> out
>> >>> the example in order to first figure out how Solr is working.  I
>> found
>> >>> out
>> >>> that the data directory consisted of .OSM files.  But I have an XML
>> file
>> >>> consisting of latitude, longitude and relevant news for that
>> location.
>> >>> Can I just use the XML file to index the data or is it necessary for
>> me
>> >>> to
>> >>> convert this file to .OSM file using some tool and then proceed
>> further?
>> >>> Also the attribute value from the .OSM file is being considered in
>> that
>> >>> example.  Since there are no attri

how to boost first token

2010-03-05 Thread Сергей Кашин
I have some documents in Solr index like this


toyota


shock front





toyota


front shock



If I query with 'shock' phrase the result is not sorted and may be
--
shock front
front shock
--
or
--
front shock
shock front
--
Can I boost the document with name "shock front" for the query phrase 'shock',
and the document with name "front shock" for the query phrase 'front'?


Re: highlight multi-valued field returns weird cut-off highlighted terms

2010-03-05 Thread uwdanny

Thanks a lot Koji;

I'll do some deep diving on my tokenizer modification part.

appreciate the pointers!


Koji Sekiguchi-2 wrote:
> 
> uwdanny wrote:
>> in this "error" case, the origin query "q=pizza"
>>
>> > indexed="true" stored="true" termVectors="false" omitNorms="true"/>
>>
>> > positionIncrementGap="100">
>> 
>> > class="org.apache.lucene.analysis.PhraseTokenFactory"
>> phraseSynonyms="phrase_synonyms.txt" includeSubphrases="true"/>
>> > class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
>> > synonyms="headings_synonyms.txt" ignoreCase="true" expand="true"
>> tokenizerFactory="org.apache.lucene.analysis.PhraseTokenFactory"/>
>> > synonyms="listing_name_synonyms.txt" ignoreCase="true" expand="true"
>> tokenizerFactory="org.apache.lucene.analysis.PhraseTokenFactory"/>
>> > synonyms="space_variants.txt" ignoreCase="true" expand="true"
>> tokenizerFactory="org.apache.lucene.analysis.PhraseTokenFactory"/>
>> > generateWordParts="0" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>> preserveOriginal="1"/>
>> 
>> > class="org.apache.lucene.analysis.KStemFilterFactory" cacheSize="2"/>
>> 
>> 
>> > class="org.apache.lucene.analysis.PhraseTokenFactory"/>
>> > generateWordParts="0" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>> preserveOriginal="1"/>
>> 
>> > class="org.apache.lucene.analysis.KStemFilterFactory" cacheSize="2"/>
>> 
>> 
>>
>> thanks
>>
>> -
>> the best is yet to come~
>>   
> What is PhraseTokenFactory in the above?
> If the Tokenizer's end() method doesn't work correctly,
> you may get the trouble you were facing.
> 
> Also consult:
> https://issues.apache.org/jira/browse/LUCENE-2207
> 
> Koji
> 
> -- 
> http://www.rondhuit.com/en/
> 
> 
> 


-
the best is yet to come~
-- 
View this message in context: 
http://old.nabble.com/highlight-multi-valued-field-returns-weird-cut-off-highlighted-terms-tp27785795p27797310.html



indexing a huge data

2010-03-05 Thread Mark N
What would be the fastest way to index documents? I am indexing a huge
collection of data after extracting certain metadata,
for example the author and filename of each file.

I am extracting this information and storing it in XML format.

for example :1abc 
abc.doc
  2abc 
abc1.doc

I can not index these documents directly to solr as it is not in the format
required by solr ( i can not change the format as its used in other modules)

should converting these file to CSV will be better and faster approach
compared to XML?



please  suggest




-- 
Nipen Mark


Re: indexing a huge data

2010-03-05 Thread Joe Calderon
I've found the CSV update to be exceptionally fast, though others enjoy
the flexibility of the DataImportHandler.
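As a concrete sketch of the CSV route: flatten the extracted metadata into Solr's CSV update format and post it to the CSV update handler. The field names (id, author, filename) and the URL below are illustrative assumptions; the fields must exist in your schema.xml and the host/port must match your install.

```shell
# Build a tiny CSV in the shape Solr's CSV handler expects:
# field names on the first line, one document per row.
printf 'id,author,filename\n1,abc,abc.doc\n2,abc,abc1.doc\n' > docs.csv
cat docs.csv
# Then post it (requires a running Solr; stock example port shown):
# curl 'http://localhost:8983/solr/update/csv?commit=true' \
#      --data-binary @docs.csv -H 'Content-type: text/plain; charset=utf-8'
```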

On Fri, Mar 5, 2010 at 10:21 AM, Mark N  wrote:
> what should be the fastest way to index a documents , I am indexing huge
> collection of data after extracting certain meta - data information
> for example author and filename of each files
>
> i am extracting these information and storing in XML format
>
> for example :    1abc 
> abc.doc
>                      2abc 
> abc1.doc
>
> I can not index these documents directly to solr as it is not in the format
> required by solr ( i can not change the format as its used in other modules)
>
> should converting these file to CSV will be better and faster approach
> compared to XML?
>
>
>
> please  suggest
>
>
>
>
> --
> Nipen Mark
>


Comma delimited Search Strings for Location Data

2010-03-05 Thread Kevin Penny

Hello -

article  Background: 
(on solr 1.3)


We're doing a similar thing with our location data - however we're
finding that if the 'input' string is one long string, i.e. q=

Philadelphia,PA,19103,US

we're getting 0 matches.

Instead of relying on the application to convert the input query to q=
Philadelphia, PA, 19103, US (which does work), we're hoping there is
something we could apply to the q parameter to make it treat the
commas as spaces and execute the search.


Here are the 2 results from the search:

Mar 5, 2010 1:59:20 PM org.apache.solr.core.SolrCore execute
INFO: [geocore] webapp=/solr path=/select 
params={explainOther=&fl=*,score&debug

Query=on&indent=on&start=0&q=Philadelphia,PA,19103,US&hl.fl=&qt=standard&wt=stan
dard&version=2.2&rows=10} hits=0 status=0 QTime=93

Mar 5, 2010 1:59:44 PM org.apache.solr.core.SolrCore execute
INFO: [geocore] webapp=/solr path=/select 
params={explainOther=&fl=*,score&debug

Query=on&indent=on&start=0&q=Philadelphia,+PA,+19103,+US&hl.fl=&qt=standard&wt=s
tandard&version=2.2&rows=10} hits=174736 status=0 QTime=406

And the second query's debug output:

2.1785781 = (MATCH) product of:
  2.9047709 = (MATCH) sum of:
2.59102 = (MATCH) weight(text:philadelphia in 102623), product of:
  0.5993686 = queryWeight(text:philadelphia), product of:
9.781643 = idf(docFreq=27, numDocs=182380)
0.061274834 = queryNorm
  4.322916 = (MATCH) fieldWeight(text:philadelphia in 102623), 
product of:

1.4142135 = tf(termFreq(text:philadelphia)=2)
9.781643 = idf(docFreq=27, numDocs=182380)
0.3125 = fieldNorm(field=text, doc=102623)
0.29292774 = (MATCH) weight(text:pa in 102623), product of:
  0.23966041 = queryWeight(text:pa), product of:
3.9112372 = idf(docFreq=9922, numDocs=182380)
0.061274834 = queryNorm
  1.617 = (MATCH) fieldWeight(text:pa in 102623), product of:
1.0 = tf(termFreq(text:pa)=1)
3.9112372 = idf(docFreq=9922, numDocs=182380)
0.3125 = fieldNorm(field=text, doc=102623)
0.02082298 = (MATCH) weight(text:us in 102623), product of:
  0.063898034 = queryWeight(text:us), product of:
1.0428104 = idf(docFreq=174736, numDocs=182380)
0.061274834 = queryNorm
  0.32587826 = (MATCH) fieldWeight(text:us in 102623), product of:
1.0 = tf(termFreq(text:us)=1)
1.0428104 = idf(docFreq=174736, numDocs=182380)
0.3125 = fieldNorm(field=text, doc=102623)
  0.75 = coord(3/4)


--

Kevin Penny | Application Architect/Team Lead

Jobs2Web Inc. | 10901 Red Circle Drive Suite 200 | Minnetonka, MN 55343

p: 952-697-2949 | c: 952-807-3358 | f: 952-400-5676

CONFIDENTIAL COMMUNICATION
This message (which includes any attachments) is intended only for
the designated recipient(s). It may contain confidential or proprietary
information. If you are not a designated recipient, you may not review,
use, copy or distribute this message. If you received this in error,
please notify the sender by reply email and delete this message and all
attachments, including any copies thereof. Thank you





Re: Comma delimited Search Strings for Location Data

2010-03-05 Thread Kevin Penny

With the 2 searches:

Here's the debug output:

q=Pittsburgh,PA,15222,US
parsedquery_toString
text:"pittsburgh pa 15222 us"
(returns 0 matches)

q=Pittsburgh, PA, 15222, US
parsedquery_toString
text:pittsburgh text:pa text:15222 text:us
(returns x matches)

So the first query is searching on the entire phrase, while the second
one is splitting out the words and applying the 'text' field to each term.


This is with our current configuration, but at least I see what it's doing
- I just have to figure out whether there's a way to alter that without
having to 'filter' the q term before it gets to Solr.










Searching, indexing, not matching.

2010-03-05 Thread John Ament
Hey

So I just downloaded and am trying solr 1.4, wonderful tool.

One thing I noticed, I created a data config, looks something like this:








It loads all of the entries fine.

I added the following to my schema.xml, to match the above






















Now when I load your default files, I'm able to get search results when I
run a query like:

/select/?indent=on&q=dell&wt=json

now I have text in my data import that includes product_name containing
"ski" or "copy_field" containing "warm," however, if I run a search for
either of those, I get no results.

I don't see anything different between how I index and how the XML files are
indexed, other than the data in the file being loaded via XML. What I do
find curious is that if I give Solr a hint about which field to look in, I do
get results. So this does return data:

/select/?indent=on&q=product_name:*ski*&wt=json

Any ideas?


Re: indexing a huge data

2010-03-05 Thread Otis Gospodnetic
That is indeed the fastest way in.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Joe Calderon 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 2:36:29 PM
> Subject: Re: indexing a huge data
> 
> ive found the csv update to be exceptionally fast, though others enjoy
> the flexibility of the data import handler
> 



Re: Stemming

2010-03-05 Thread Otis Gospodnetic
Suram,

You have to use Italian-specific analyzer:

   http://www.search-lucene.com/?q=italian

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Suram 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 9:24:33 AM
> Subject: Stemming
> 
> 
> Hi,
> 
> How can i set Features of stemming (set for Italian) anyone can tell me
> -- 
> View this message in context: 
> http://old.nabble.com/Stemming-tp27794521p27794521.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: get english spell dictionary

2010-03-05 Thread Otis Gospodnetic
Hi,

As in a list of (common) English words?  My Ubuntu has 
/usr/share/dict/american-english and british-english with about 100K words each.

See also: http://www.search-lucene.com/?q=%2Benglish+%2Bdictionary

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: michaelnazaruk 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 8:38:32 AM
> Subject: Re: get english spell dictionary
> 
> 
> Hi, all! Tell me please, where can I get a spell dictionary for Solr?
> 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/english-%28american%29-spell-dictionary-tp27778741p27793939.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Store input text after analyzers and token filters

2010-03-05 Thread Otis Gospodnetic
Hi Joan,

You could use the FieldAnalysisRequestHandler: 
http://www.search-lucene.com/?q=FieldAnalysisRequestHandler
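If the handler is registered in solrconfig.xml (the stock Solr 1.4 example maps it to /analysis/field), you can inspect the tokens a field produces after analysis with a request along these lines (a sketch; the field name and value are illustrative, and the host/port must match your install):

```shell
curl 'http://localhost:8983/solr/analysis/field?analysis.fieldname=text&analysis.fieldvalue=Some+raw+input+text'
```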

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: JCodina 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 6:01:37 AM
> Subject: Store input text after analyzers and token filters
> 
> 
> 
> In a stored field, the content stored is the raw input text.
> But when the analyzers perform some cleaning or an interesting transformation
> of the text, it could be useful to store the text after the
> tokenizer/filter chain.
> Is there a way to do this - to be able to get back the text of the document
> after it has been processed?
> 
> thanks
> Joan
> -- 
> View this message in context: 
> http://old.nabble.com/Store-input-text-after-analyzers-and-token-filters-tp27792550p27792550.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Searching, indexing, not matching.

2010-03-05 Thread Otis Gospodnetic
John,

Maybe your default search field is set to some field that doesn't have "dell" 
in it.
The default search field is specified in schema.xml.
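For reference, the relevant schema.xml pieces look roughly like this (a sketch; "text" and the copyField source are illustrative names - untyped queries only hit the field named in defaultSearchField, so other fields must be copied into it):

```xml
<!-- field consulted when q= has no explicit field prefix -->
<defaultSearchField>text</defaultSearchField>
<!-- route other fields into the catch-all "text" field at index time -->
<copyField source="product_name" dest="text"/>
```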

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: John Ament 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 5:19:33 PM
> Subject: Searching, indexing, not matching.
> 
> Hey
> 
> So I just downloaded and am trying solr 1.4, wonderful tool.
> 
> One thing I noticed, I created a data config, looks something like this:
> 
> 
> 
> url="jdbc:oracle:thin:@..." user="..." password="..."/>
> 
> 
> 
> 
> 
> It loads all of the entries fine.
> 
> I added the following to my schema.xml, to match the above
> 
> 
> stored="true"/>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Now when I load your default files, I'm able to get search results when I
> run a query like:
> 
> /select/?indent=on&q=dell&wt=json
> 
> now I have text in my data import that includes product_name containing
> "ski" or "copy_field" containing "warm," however, if I run a search for
> either of those, I get no results.
> 
> I don't see anything different between how I index and how the xml files are
> indexed, other than the data in the file being loaded via XML. What I do
> find curious is if I give SOLR the hint of what field to look in, I do get
> results.  So this does return data:
> 
> /select/?indent=on&q=product_name:*ski*&wt=json
> 
> Any ideas?



Re: Comma delimited Search Strings for Location Data

2010-03-05 Thread Otis Gospodnetic
If you want to treat commas as spaces, one quick and dirty way of doing that is 
this:

  s/,/, /g 


Do that to the query string before you send it to Solr and you are done. :)
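As a concrete sketch of that substitution (shown here in shell; the equivalent regex replace in any application language works the same way):

```shell
# Rewrite each comma as comma-plus-space in the raw q value before
# sending it to Solr, so the analyzer splits the terms apart.
echo 'Philadelphia,PA,19103,US' | sed 's/,/, /g'
# → Philadelphia, PA, 19103, US
```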
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Kevin Penny 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 3:02:22 PM
> Subject: Comma delimited Search Strings for Location Data
> 
> Hello -
> 
> article Background: (on solr 
> 1.3)
> 
> We're doing a similar thing with our location data - however we're finding 
> that 
> if the 'input' string is one long string i.e.: q=
> *Philadelphia,PA,19103,US*
> 
> that we're getting *0* matches -
> 
> Instead of relying on the application to convert the input query to : q=
> *Philadelphia, PA, 19103, US* (which does work), we're hoping that there 
> could 
> be something we could apply something to the q parameter make it treat the 
> commas as spaces and execute the search.
> 



Re: SolrJ commit options

2010-03-05 Thread Otis Gospodnetic
Jerry,

This is why people often do index modifications on one server (master) and 
replicate the read-only index to 1+ different servers (slaves).
If you do that, does the problem go away?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Jerome L Quinn 
> To: solr-user@lucene.apache.org
> Sent: Fri, March 5, 2010 10:13:03 AM
> Subject: Re: SolrJ commit options
> 
> Shalin Shekhar Mangar wrote on 02/25/2010 07:38:39
> AM:
> 
> > On Thu, Feb 25, 2010 at 5:34 PM, gunjan_versata
> wrote:
> >
> > >
> > > We are using SolrJ to handle commits to our solr server.. All runs
> fine..
> > > But whenever the commit happens, the server becomes slow and stops
> > > responding.. therby resulting in TimeOut errors on our production. We
> are
> > > using the default commit with waitFlush = true, waitSearcher = true...
> > >
> > > Can I change there values so that the requests coming to solr dont
> block on
> > > recent commit?? Also, what will be the impact of changing these
> values??
> > >
> >
> > Solr does not block reads during a commit/optimize. Write operations are
> > queued up but they are still accepted. Are you using the same Solr server
> > for reads as well as writes?
> 
> I've seen similar things with Solr 1.3 (not using SolrJ).  If I try to
> optimize the
> index, queries will take much longer - easily a minute or more, resulting
> in timeouts.
> 
> Jerry



Re: facet on null value

2010-03-05 Thread Lance Norskog
(I don't know where filter queries came in.)

If you get a result with
- 
 
- 
- 
 40
 60
 20
 2
 
 
 

and you want to get facets of '000' and Null, this query will include
documents that match those facets:

&q=features:000 OR -features:[* TO *]

On Thu, Mar 4, 2010 at 8:16 PM, Andy  wrote:
> My understanding is that 2 means there are 2 documents missing a 
> facet value.
>
> But how does adding  fq=-fieldName:[* TO *] enable users to click on that 
> value to filter? There was no value, only the count (2) was returned.
>
> --- On Thu, 3/4/10, Lance Norskog  wrote:
>
> From: Lance Norskog 
> Subject: Re: facet on null value
> To: solr-user@lucene.apache.org
> Date: Thursday, March 4, 2010, 10:33 PM
>
> I have added facet.limit=5 to the above to make this easier. Here is
> the  part of the response:
>
>
> - 
>   
> - 
> - 
>   0
>   0
>   0
>   0
>   0
>   2
>   
>   
>   
>   
>
> (What is the 2?)
>
> On Thu, Mar 4, 2010 at 7:30 PM, Lance Norskog  wrote:
>> Set up the out-of-the-box example Solr. Index the documents in
>> example/exampledocs.
>>
>> Run this query:
>>
>> http://localhost:8983/solr/select/?q=*:*&fq=-features:[* TO
>> *]&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=features&facet.missing=on
>>
>> Now, change facet.missing=on to =off. There is no change. You get all
>> of the 0-valued facets anyway.
>>
>> What exactly is facet.missing supposed to do with this query?
>>
>> On Thu, Mar 4, 2010 at 6:39 PM, Andy  wrote:
>>> What would the response look like with this query?
>>>
>>> Can you give an example?
>>>
>>> --- On Thu, 3/4/10, Chris Hostetter  wrote:
>>>
>>> From: Chris Hostetter 
>>> Subject: Re: facet on null value
>>> To: solr-user@lucene.apache.org
>>> Date: Thursday, March 4, 2010, 8:40 PM
>>>
>>>
>>> : > I want to find a way to let users to find those documents. One way is to
>>> : > make Null an option the users can choose, something like:
>>>
>>> : Isn't it facet.missing=on?
>>> : http://wiki.apache.org/solr/SimpleFacetParameters#facet.missing
>>>
>>> that will get you the count, but if you then want to let them click on
>>> that value to filter your query you need:  fq=-fieldName:[* TO *]
>>>
>>>
>>>
>>> -Hoss
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SolrJ commit options

2010-03-05 Thread Lance Norskog
One technique to control commit times is to do automatic commits: you
can configure a core to commit every N seconds (really milliseconds,
but less than 5 minutes becomes difficult) and/or every N documents.
This promotes a more fixed amount of work per commit.

Also, the maxMergeDocs parameter lets you force a maximum segment size
(in documents). This may cap the longest possible commit times.

http://www.lucidimagination.com/search/document/CDRG_ch08_8.1.2.3?q=maxMergeDocs
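The autocommit knobs Lance mentions live in the update handler section of solrconfig.xml; a minimal sketch (Solr 1.3/1.4-era syntax, and the numbers are illustrative, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```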

On Fri, Mar 5, 2010 at 2:57 PM, Otis Gospodnetic
 wrote:
> Jerry,
>
> This is why people often do index modifications on one server (master) and 
> replicate the read-only index to 1+ different servers (slaves).
> If you do that, does the problem go away?
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: SolrJ commit options

2010-03-05 Thread gunjan_versata

But can anyone explain the use of these parameters? I have read up on them;
what I could not understand was: if I set both params to false,
after how much time will my changes start to be reflected?

-- 
View this message in context: 
http://old.nabble.com/SolrJ-commit-options-tp27714405p27802041.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: example solr xml working fine but my own xml files not working

2010-03-05 Thread Suram



venkatesh uruti wrote:
> 
> I am trying to import an XML file into Solr. It imports successfully, but
> it does not show any results while searching in Solr.
> 
> In the solr home/exampledocs/ directory all the example XMLs work fine, but
> when I create a new XML file and try to upload it to Solr, it doesn't fly.
> 
> Can anyone please post the steps to import an XML file into Solr?
> 

Create the new fields and field types in schema.xml for what you have in your own XML.

-- 
View this message in context: 
http://old.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp27793958p27802042.html
Sent from the Solr - User mailing list archive at Nabble.com.



multiCore

2010-03-05 Thread Suram

Hi,


 How can I send the XML file to Solr after creating the multicore? When I
tried, it refused to accept it.
-- 
View this message in context: 
http://old.nabble.com/multiCore-tp27802043p27802043.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: multiCore

2010-03-05 Thread Siddhant Goel
Can you provide the error message that you got?

On Sat, Mar 6, 2010 at 11:13 AM, Suram  wrote:

>
> Hi,
>
>
>  how can i send the xml file to solr after created the multicore.i tried it
> refuse accept
> --
> View this message in context:
> http://old.nabble.com/multiCore-tp27802043p27802043.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: multiCore

2010-03-05 Thread Suram



Siddhant Goel wrote:
> 
> Can you provide the error message that you got?
> 
> On Sat, Mar 6, 2010 at 11:13 AM, Suram  wrote:
> 
>>
>> Hi,
>>
>>
>>  how can i send the xml file to solr after created the multicore.i tried
>> it
>> refuse accept
>> --
>> View this message in context:
>> http://old.nabble.com/multiCore-tp27802043p27802043.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> - Siddhant
> 
> 

I executed the command like this:

D:\solr\example\exampledocs>java -Ddata=args -Dcommit=yes -Durl=http://localhost:8080/solr/core0/update -jar post.jar Example.xml

Mar 6, 2010 12:20:36 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Unexpected character 'E' (code
69) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Unknown Source)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
character 'E' (code 69) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
at
com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
at
com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2047)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:90)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
... 19 more

If I execute the command like this:

java -jar post.jar Example.xml

SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
othe
r encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8080/solr/update..
SimplePostTool: POSTing file Example.xml
SimplePostTool: FATAL: Solr returned an error: Bad Request
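A note on the first invocation above: with -Ddata=args, SimplePostTool treats each command-line argument as literal XML to post, so the string "Example.xml" itself is sent to Solr as the request body - which is why the parser fails on the leading character 'E'. A likely fix (a sketch; adjust host, port, and core path to your setup) is to tell the tool to post the file contents with -Ddata=files:

```shell
java -Ddata=files -Dcommit=yes \
     -Durl=http://localhost:8080/solr/core0/update -jar post.jar Example.xml
```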


-- 
View this message in context: 
http://old.nabble.com/multiCore-tp27802043p27802330.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Faceted search in 2 indexes

2010-03-05 Thread Kranti™ K K Parisa
Hi,

I am also looking for a solution to this.

Case:
Index1: has the metadata and the contents of the files (basically read-only
for the end users).
Index2: will have the tags attached to the search results that a user may get
out of index1 (so read/write).

So when the user searches again, we have to display the results
along with the tags attached by that user (previously), and also display
some facets for the tags.

Please give some ideas/suggestions.

Best Regards,
Kranti K K Parisa



2010/2/23 André Maldonado 

> Hi all.
>
> I have 2 indexes with some similar fields and some distinct fields. I need
> to make a faceted search that returns the union of the same search in these
> 2 indexes.
>
> How can I make it?
>
> Thanks
>
> "Então aproximaram-se os que estavam no barco, e adoraram-no, dizendo: És
> verdadeiramente o Filho de Deus." (Mateus 14:33)
>