Re: two cores but have single result set in solr

2011-09-24 Thread hadi
I do not know how to search both cores without defining the "shards"
parameter. Could you show me a solution to my issue?

On 9/24/11, Yury Kats [via Lucene]
 wrote:
>
>
> On 9/23/2011 6:00 PM, hadi wrote:
>> I index my files with SolrJ and crawl my sites with Nutch 1.3. As you
>> know, I have to overwrite the Solr schema with the Nutch schema in order
>> to view the results in solr/browse. In this case I should define two
>> cores, but I want a single result set, or for the user to be able to
>> search both core indexes at the same time.
>
> Can you not use the 'shards' parameter and specify both cores there?



matching response and request

2011-09-24 Thread Roland Tollenaar

Hi,

sorry for this question but I am hoping it has a quick solution.

I am sending multiple GET request queries to Solr, but Solr is not
returning the responses in the sequence I send the requests.

The shortest responses arrive back first.

I am wondering whether I can add a tag to the request which will be
given back to me in the response, so that when a response comes I can
connect it to the original request and handle it in the appropriate manner.


If this is possible, how?

Help appreciated!

Regards,

Roland.


Re: levenshtein ranked results

2011-09-24 Thread Roland Tollenaar

Thanks Otis,

this helps me tremendously.

Kind regards,

Roland

Otis Gospodnetic wrote:

Hi Roland,

I did this:
http://search-lucene.com/?q=sort+by+function&fc_project=Solr&fc_type=wiki


Which took me to this:
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function


And further on that page you'll find strdist function documented:
http://wiki.apache.org/solr/FunctionQuery#strdist
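
For example, a sort-by-function query along these lines (a sketch; it assumes
a string field named 'name' and uses the 'edit', i.e. Levenshtein, distance;
URL-encode the spaces and quotes when sending it over HTTP):

q=*:*&sort=strdist("solr", name, edit) desc

strdist also supports 'jw' (Jaro-Winkler) and 'ngram' distance types.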


I hope this helps.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message -

From: Roland Tollenaar 
To: "solr-user@lucene.apache.org" 
Cc: 
Sent: Friday, September 23, 2011 1:50 AM

Subject: levenshtein ranked results

Hi,

I tried an internet search to find out how to query solr to get the results 
ranked (ordered) by levenshtein distance.


This appears to be possible, but I could not find a concrete example of how I
would have to formulate the query, or, if it's a schema setting on a particular
field, how to set up the schema.


I am new to solr, any help appreciated.

tia.

Roland.





Sending pdf files to Solr for indexing

2011-09-24 Thread ahmad ajiloo
Hi all
I want to send a PDF file to Solr for indexing. There is a command to send
Solr a file via HTTP POST:
http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example

but "*curl*" is for Linux and I want to use Solr in Windows.
thanks a lot.


Re: Sending pdf files to Solr for indexing

2011-09-24 Thread ahmad ajiloo
Also, when I use that command in Linux, I see this error:
---
Error 400 ERROR:unknown field 'ignored_meta'

HTTP ERROR 400
Problem accessing /solr/update/extract. Reason:
    ERROR:unknown field 'ignored_meta'
Powered by Jetty://
---
My command is:
 curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true";
-F "myfile=@ebrat.pdf"

On Sat, Sep 24, 2011 at 1:33 PM, ahmad ajiloo wrote:

> Hi all
> I want to send a PDF file to Solr for indexing. There is a command to send
> Solr a file via HTTP POST:
>
> http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example
>
> but "*curl*" is for Linux and I want to use Solr in Windows.
> thanks a lot.
>


Re: two cores but have single result set in solr

2011-09-24 Thread Yury Kats
On 9/24/2011 3:09 AM, hadi wrote:
> I do not know how to search both cores and not define "shard"
> parameter,could you show me some solutions for solve my issue?

See this: http://wiki.apache.org/solr/DistributedSearch


indexing an xml file

2011-09-24 Thread ahmad ajiloo
Hello
The Solr Tutorial page explains how to index an XML file, but when I try to
index an XML file with this command:
~/Desktop/apache-solr-3.3.0/example/exampledocs$ java -jar post.jar solr.xml
I get this error:
SimplePostTool: FATAL: Solr returned an error #400 ERROR:unknown field
'name'

can anyone help me?
thanks


Re: indexing an xml file

2011-09-24 Thread GR
I think the XML to be indexed has to follow a certain schema, defined
in schema.xml under the conf directory. Maybe your solr.xml is not doing
that.
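
For example, if your documents really do carry a 'name' field, a minimal
declaration to add to schema.xml would be something like this (the type is an
assumption; pick whatever matches how the field should be searched):

<field name="name" type="text_general" indexed="true" stored="true"/>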


Sent from my iPhone

On 24 Sep 2011, at 18:15, ahmad ajiloo  wrote:


Hello
The Solr Tutorial page explains how to index an XML file, but when I try
to index an XML file with this command:
~/Desktop/apache-solr-3.3.0/example/exampledocs$ java -jar post.jar  
solr.xml

I get this error:
SimplePostTool: FATAL: Solr returned an error #400 ERROR:unknown field
'name'

can anyone help me?
thanks


Re: two cores but have single result set in solr

2011-09-24 Thread hadi
I read the link, but
'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
returns an XML response that is not useful for me. I want to create the query
in solr/browse, so this requires changing the template engine. Do you
know how to change that to search both cores?  Thanks

On 9/24/11, Yury Kats [via Lucene]
 wrote:
>
>
> On 9/24/2011 3:09 AM, hadi wrote:
>> I do not know how to search both cores without defining the "shards"
>> parameter. Could you show me a solution to my issue?
>
> See this: http://wiki.apache.org/solr/DistributedSearch



Re: Sending pdf files to Solr for indexing

2011-09-24 Thread pulkitsinghal
You should get Cygwin for Windows and make sure to select curl as one of the
many packages that come with Cygwin when its installer runs.
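
Alternatively, the same upload can be done from SolrJ with no curl at all. A
minimal sketch, assuming SolrJ 3.x, the example /update/extract handler, and a
local file named ebrat.pdf:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfUpload {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("ebrat.pdf"));   // the PDF to index
    req.setParam("literal.id", "doc1");   // unique key, as in the curl example
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(req);                  // upload, extract and commit
  }
}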

Sent from my iPhone

On Sep 24, 2011, at 5:29 AM, ahmad ajiloo  wrote:

> Also, when I use that command in Linux, I see this error:
> ---
> Error 400 ERROR:unknown field 'ignored_meta'
>
> HTTP ERROR 400
> Problem accessing /solr/update/extract. Reason:
>     ERROR:unknown field 'ignored_meta'
> Powered by Jetty://
> ---
> My command is:
> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"
> -F "myfile=@ebrat.pdf"
> 
> On Sat, Sep 24, 2011 at 1:33 PM, ahmad ajiloo wrote:
> 
>> Hi all
>> I want to send a PDF file to Solr for indexing. There is a command to send
>> Solr a file via HTTP POST:
>> 
>> http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example
>> 
>> but "*curl*" is for Linux and I want to use Solr in Windows.
>> thanks a lot.
>> 


Re: strategy for post-processing answer set

2011-09-24 Thread Fred Zimmerman
OK, this is a very basic question, so please bear with me.

I see where the velocity templates are and I have looked at the
documentation and get the idea of how to write them.

It looks to me as if Solr just brings back the URLs. What I want to do is to
get the actual documents in the answer set, simplify their HTML and remove
all the javascript, ads, etc., and append them into a single document.

Now ... does Nutch already have the documents? Can I get them from its db?
Or do I have to go get the documents again with something like wget?

Fred

On Fri, Sep 23, 2011 at 16:02, Erik Hatcher  wrote:

> conf/velocity by default.  See Solr's example configuration.
>
>   Erik
>
> On Sep 23, 2011, at 12:37, Fred Zimmerman  wrote:
>
> > ok, answered my own question, found velocity rw in solrconfig.xml.  next
> > question:
> >
> > where does velocity look for its templates?
> >
> > -
> > Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
> > monthly updates
> >
> >
> >
> > On Fri, Sep 23, 2011 at 11:57, Fred Zimmerman 
> wrote:
> >
> >> This seems to be out of date. I am running Solr 3.4
> >>
> >> * the file structure of apachehome/contrib is different and I don't see
> >> velocity anywhere underneath
> >> * the page referenced below only talks about Solr 1.4 and 4.0
> >>
> >> ?
> >>
> >> On Thu, Sep 22, 2011 at 19:51, Markus Jelsma <
> markus.jel...@openindex.io>wrote:
> >>
> >>> Hi,
> >>>
> >>> Solr supports the Velocity template engine and has very good support.
> Ideal
> >>> for
> >>> generating properly formatted output from the search engine. There's a
> >>> clustering example and it's easy to format documents indexed by Nutch.
> >>>
> >>> http://wiki.apache.org/solr/VelocityResponseWriter
> >>>
> >>> Cheers
> >>>
> > Hi,
> 
>  I would like to take the HTML documents that are the result of a Solr
>  search and combine them into a single HTML document that combines the
> >>> body
>  text of each individual document.  What is a good strategy for this? I
> >>> am
>  crawling with Nutch and Carrot2 for clustering.
>  Fred
> >>>
> >>
> >>
>


Re: Is <doc> verboten?

2011-09-24 Thread Erick Erickson
Does wrapping your content in CDATAs work?

Best
Erick
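
For reference, a sketch of what that looks like in the update XML (field name
assumed):

<add>
  <doc>
    <field name="content"><![CDATA[text that contains <doc> literally]]></field>
  </doc>
</add>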

On Mon, Sep 19, 2011 at 6:39 PM, chadsteele.com  wrote:
> It seems xml docs that use <doc> fail to be indexed properly and I've
> recently discovered the following fails on my installation.
>
> /solr/update?stream.body=
>
> thoughts?
>
> I need to allow content to have <doc> in the xml.
>
>


Re: shareSchema="true" - location of schema.xml?

2011-09-24 Thread rkuris
I have 300 cores so I feel your pain :-)

What we do is use a relative path for the file.  It works if you use
../../common/schema.xml for each core, then just create a common directory
off your solr home and put your schema file there.

I found this works great with solrconfig.xml and all of its dependencies as
well.

Another choice is to look at the sharedLib parameter, which adds some
directory to your classpath.  I played with this for a little bit and
couldn't get it working, so I went with the relative path solution.
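
For example, a sketch of the per-core entries in solr.xml under this layout
(core names hypothetical; the schema path is resolved relative to each core's
conf directory):

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core001" instanceDir="core001" schema="../../common/schema.xml"/>
    <core name="core002" instanceDir="core002" schema="../../common/schema.xml"/>
  </cores>
</solr>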



Re: matching response and request

2011-09-24 Thread rkuris
I don't think you can do this.

If you are sending multiple GET requests, you are doing it across different
HTTP connections.  The web service has no way of knowing these are related.

One solution would be to pass a spare, unused parameter to your request,
like sequenceId=NNN and get the response to echo that back.  Then at least
you can tell which one is coming back and fix the order up in your program.
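
A sketch of what that looks like (the parameter name is made up; Solr ignores
parameters it does not recognize, and echoParams returns them in the response
header):

http://localhost:8983/solr/select?q=foo&sequenceId=42&echoParams=all

The responseHeader's params section then contains sequenceId=42, which lets
you match each response to its original request.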




resource to see which versions build from trunk?

2011-09-24 Thread Jason Toy
Hi all, I am testing various versions of Solr from trunk, and I am finding that
oftentimes the example doesn't build and I can't test out the version.  Is
there a resource that shows which versions build correctly so that we can
test it out?


RE: JdbcDataSource and threads

2011-09-24 Thread rkuris
My guess on this is that you're making a LOT of database requests and have a
million TIME-WAIT connections, and your port range for local ports is
running out.

You should first confirm that's true by running netstat on the machine while
the load is running.  See if it gives a lot of output.

One way to solve this problem is to use a connection pool.  Look at adding a
pooled JNDI connection into your web service and connect with that instead.

The best way is to avoid making the extra connections.  If the data in the
subqueries is really short, look into caching the results using a
CachedSqlEntityProcessor instead.  I wasn't able to use this approach
because I had a lot of data in the inner queries.  What I ended up doing
was writing my own OrderedSqlEntityProcessor which correlates an outer
ordered query with an inner ordered query.  This ran a lot faster and
reduced my load times from 20 hours to 20 minutes.  Let me know if you're
interested in that code.
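
For reference, a sketch of the caching approach in a DIH data-config.xml
(table and column names hypothetical):

<entity name="item" query="select id, name from item">
  <entity name="feature"
          query="select item_id, description from feature"
          processor="CachedSqlEntityProcessor"
          where="item_id=item.id"/>
</entity>

The inner query runs once and later lookups are served from the cache, so no
extra connections are opened per outer row.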



Re: field value getting null with special char

2011-09-24 Thread Erick Erickson
I can't imagine that the ( or ) is a problem. So I think we need to see
how you're using SolrJ. In particular, are you asking for the
field in question to be returned (e.g. SolrQuery.setFields or addField)?

Second question: Are you sure your SolrJ is connecting to the server you
connect to with the browser? You should see activity in the logs for both
cases.

Best
Erick
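
For instance, a minimal SolrJ check (server URL and field name assumed) that
explicitly asks for the field:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FieldCheck {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");
    query.addField("title");   // hypothetical: the field that comes back blank
    QueryResponse rsp = server.query(query);
    System.out.println(rsp.getResults().get(0).getFieldValue("title"));
  }
}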

On Tue, Sep 20, 2011 at 6:23 PM, Ranveer Kumar  wrote:
> Is there any help? I am unable to figure it out.
> On 20-Sep-2011 2:22 PM, "Ranveer"  wrote:
>> Hi All,
>>
>> I am facing problem to get value from solr server for a particular field.
>> My environment is :
>> Red hat 5.3
>> Solr 3.3
>> Jdk 1.6.24
>> Tomcat 6.2x
>> Fetching value using SolrJ
>>
>> When using *:* in the browser the values show, but when using SolrJ all
>> values come back except a few fields that have special characters.
>>
>> Scoring (TEST)
>> Scoring rate 3/4 (5)
>>
>> The above values appear in the browser but come back blank in SolrJ. I also
>> noticed that all fields with '(' or ')' have this kind of problem.
>>
>> If this is related to '(', how do I escape the special characters so I can
>> get the values in SolrJ?
>>
>> regards
>> Ranveer
>>
>>
>>
>


Re: q and fq in solr 1.4.1

2011-09-24 Thread Erick Erickson
Why is it important? What are you worried about that this implementation
detail is necessary to know about?

But the short answer is that the fq's are calculated against the whole index
and the results are efficiently cached. That's the only way that the fq can
be re-used against a different search term. The fq clauses are applied before
sorting.

Best
Erick

On Tue, Sep 20, 2011 at 10:55 PM, roz dev  wrote:
> Hi All
>
> I am sure that q vs fq question has been answered several times.
>
> But, I still have a question which I would like to know the answers for:
>
> if we have a solr query like this
>
> q=*&fq=field_1:XYZ&fq=field_2:ABC&sortBy=field_3+asc
>
> How does SolrIndexSearcher fire query in 1.4.1
>
> Will it fire query against whole index first because q=* then filter the
> results against field_1 and field_2 or is it in parallel?
>
> and, if we say that get only 20 rows at a time then will solr do following
> 1) get all the docs (because q is set to *) and sort them by field_3
> 2) then, filter the results by field_1 and field_2
>
> Or, will it apply sorting after doing the filter?
>
> Please let me know how Solr 1.4.1 works.
>
> Thanks
> Saroj
>


Re: JSON response with SolrJ

2011-09-24 Thread Erick Erickson
Hmmm, what advantage does JSON have over the SolrDocument
you get back? Perhaps if you describe that we can offer better
suggestions.

Best
Erick
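
If raw JSON is the goal, one option (a sketch that bypasses SolrJ's response
parsing entirely) is to request wt=json over plain HTTP:

http://localhost:8983/solr/select?q=*:*&wt=json&indent=true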

On Wed, Sep 21, 2011 at 5:01 AM, Kissue Kissue  wrote:
> Hi,
>
> I am using solr 3.3 with SolrJ. Does anybody have any idea how i can
> retrieve JSON response with SolrJ? Is it possible? It seems to be more
> focused on XML and Beans.
>
> Thanks.
>


Re: Selective values for facets

2011-09-24 Thread Erick Erickson
You don't do anything special for facet at index time unless you, say,
wanted to remove some value from the facet field, but then it would
NEVER be available. So if you're saying that at index time you have
certain documents 'New Year's Offers' that ONLY EVER want to
map to NEWA, NEWB, NEWY, you could just take care of that at
index time (don't index those values for that document).

Assuming that it isn't that deterministic, have you looked at facet queries?

Best
Erick
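
A sketch of the facet-query approach (assuming the codes live in a field named
'code'):

facet=true
&facet.query=code:(NEWA OR NEWB OR NEWY)
&facet.query=code:(EAS1 OR EAS2)

Each facet.query returns its own count, so only the groupings you define come
back.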

On Wed, Sep 21, 2011 at 7:58 AM, ntsrikanth  wrote:
> Hi,
>
>  The dataset I have got is for special offers.
> We got lot of offer codes. But I need to create few facets for specific
> conditions only.
>
> For example, I got the following codes: ABCD, AGTR, KUYH, NEWY, NEWA, NEWB,
> EAS1, EAS2
>
> And I need to create a facet like
> 'New Year Offers' mapped with NEWA, NEWB, NEWY and
> 'Easter Offers' mapped with EAS1, EAS2
>
> I dont want other codes returned in the facet when I query it. How to
> prevent other values to be ignored while creating the facet during indexing
> time?
>
> Thanks,
> Srikanth NT
>
>
>
>


Re: Production Issue: SolrJ client throwing this error even though field type is not defined in schema

2011-09-24 Thread Erick Erickson
You might want to review:

http://wiki.apache.org/solr/UsingMailingLists

There's really not much to go on here.

Best
Erick

On Wed, Sep 21, 2011 at 12:13 PM, roz dev  wrote:
> Hi All
>
> We are getting this error in our Production Solr Setup.
>
> Message: Element type "t_sort" must be followed by either attribute
> specifications, ">" or "/>".
> Solr version is 1.4.1
>
> Stack trace indicates that solr is returning malformed document.
>
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: Error
> executing query
>        at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>        at 
> com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
>        ... 15 more
> Caused by: org.apache.solr.common.SolrException: parsing error
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>        at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>        ... 17 more
> Caused by: javax.xml.stream.XMLStreamException: ParseError at
> [row,col]:[3,136974]
> Message: Element type "t_sort" must be followed by either attribute
> specifications, ">" or "/>".
>        at 
> com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
>        at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
>        ... 21 more
>


Re: NRT and commit behavior

2011-09-24 Thread Erick Erickson
No. The problem is that "number of documents" isn't a reliable
indicator of resource consumption. Consider the difference between
indexing a twitter message and a book. I can put a LOT more docs
of 140 chars on a single machine of size X than I can books.

Unfortunately, the only way I know of is to test. Use something like
jMeter or SolrMeter to fire enough queries at your machine to
determine when you're over-straining resources and shard at that
point (or get a bigger machine)...

Best
Erick

On Wed, Sep 21, 2011 at 8:24 PM, Tirthankar Chatterjee
 wrote:
> Okay, but is there any threshold (index size, total docs in the index, or
> physical memory size) at which sharding should be considered?
>
> I am trying to find the winning combination.
> Tirthankar
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, September 16, 2011 7:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NRT and commit behavior
>
> Uhm, you're putting  a lot of index into not very much memory. I really think 
> you're going to have to shard your index across several machines to get past 
> this problem. Simply increasing the size of your caches is still limited by 
> the physical memory you're working with.
>
> You really have to put a profiler on the system to see what's going on. At 
> that size there are too many things that it *could* be to definitively answer 
> it with e-mails
>
> Best
> Erick
>
> On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee 
>  wrote:
>> Erick,
>> Also, we have tried increasing the cache sizes in our solrconfig. Setting
>> the autowarmCount values below to 0 helps the commit call return within a
>> second, but that will slow us down on searches
>>
>> <filterCache
>>      class="solr.FastLRUCache"
>>      size="16384"
>>      initialSize="4096"
>>      autowarmCount="4096"/>
>>
>> <queryResultCache
>>      class="solr.LRUCache"
>>      size="16384"
>>      initialSize="4096"
>>      autowarmCount="4096"/>
>>
>> <documentCache
>>      class="solr.LRUCache"
>>      size="512"
>>      initialSize="512"
>>      autowarmCount="512"/>
>>
>> -Original Message-
>> From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
>> Sent: Wednesday, September 14, 2011 7:31 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: NRT and commit behavior
>>
>> Erick,
>> Here is the answer to your questions:
>> Our index is 267 GB
>> We are not optimizing...
>> No we have not profiled yet to check the bottleneck, but logs indicate 
>> opening the searchers is taking time...
>> Nothing except SOLR
>> Total memory is 16GB tomcat has 8GB allocated Everything 64 bit OS and
>> JVM and Tomcat
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Sunday, September 11, 2011 11:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: NRT and commit behavior
>>
>> Hmm, OK. You might want to look at the non-cached filter query stuff, it's 
>> quite recent.
>> The point here is that it is a filter that is applied only after all of the 
>> less expensive filter queries are run, One of its uses is exactly ACL 
>> calculations. Rather than calculate the ACL for the entire doc set, it only 
>> calculates access for docs that have made it past all the other elements of 
>> the query See SOLR-2429 and note that it is a 3.4 (currently being 
>> released) only.
>>
>> As to why your commits are taking so long, I have no idea given that you 
>> really haven't given us much to work with.
>>
>> How big is your index? Are you optimizing? Have you profiled the application 
>> to see what the bottleneck is (I/O, CPU, etc?). What else is running on your 
>> machine? It's quite surprising that it takes that long. How much memory are 
>> you giving the JVM? etc...
>>
>> You might want to review:
>> http://wiki.apache.org/solr/UsingMailingLists
>>
>> Best
>> Erick
>>
>>
>> On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee 
>>  wrote:
>>> Erick,
>>> What you said is correct for us the searches are based on some Active 
>>> Directory permissions which are populated in Filter query parameter. So we 
>>> don't have any warming query concept as we cannot fire for every user ahead 
>>> of time.
>>>
>>> What we do here is that when user logs in we do an invalid query(which 
>>> return no results instead of '*') with the correct filter query (which is 
>>> his permissions based on the login). This way the cache gets warmed up with 
>>> valid docs.
>>>
>>> It works then.
>>>
>>>
>>> Also, can you please let me know why commit is taking 45 mins to 1 hours on 
>>> a good resourced hardware with multiple processors and 16gb RAM 64 bit VM, 
>>> etc. We tried passing waitSearcher as false and found that inside the code 
>>> it hard coded to be true. Is there any specific reason. Can we change that 
>>> value to honor what is being passed.
>>>
>>> Thanks,
>>> Tirthankar
>>>
>>> -Original Message-
>>> From: Erick 

Re: Solr Indexing - Null Values in date field

2011-09-24 Thread Erick Erickson
Solr dates are very specific, and your parsing exception is expected. See:
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

Best
Erick
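
One common fix in DIH, sketched here (assuming the source column is 'startdate'
and arrives as 'yyyy-MM-dd HH:mm:ss.SSS'), is the DateFormatTransformer, which
converts such strings into proper Date objects before indexing:

<entity name="doc" transformer="DateFormatTransformer" query="...">
  <field column="startdate" dateTimeFormat="yyyy-MM-dd HH:mm:ss.SSS"/>
</entity>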

On Thu, Sep 22, 2011 at 6:28 AM, mechravi25  wrote:
> Hi,
>
> Thanks for the suggestions. This is the option I tried.
>
> I changed the data type in my source to date and then indexed the field once
> again.
>
> for the particular field , in my query in dataimport file, I gave the
> following condition IFNULL(startdate,NULL).
>
> The document was indexed successfully. But the field startdate was not
> present in the document.
>
> I have few other records in my source where in there is a value present in
> the startdate but when I index that I am getting this exception
>
> org.apache.solr.common.SolrException: Invalid Date String:'2011-09-21
> 18:28:32.733'
>        at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
>        at 
> org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
>        at org.apache.solr.schema.SchemaField.createField(SchemaField.java:95)
>        at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
>        at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
>        at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>        at 
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
>        at
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:618)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
>        at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
>        at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
>        at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
>
>
> Please help.
>
>
>
>
>


Re: mlt content stream help

2011-09-24 Thread Erick Erickson
What version of Solr? When you copied the default, did you set up
default values for MLT?

Showing us the request you used and the relevant portions of your
solrconifg file would help a lot, you might want to review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick
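
For what it's worth, a sketch of a dedicated handler in solrconfig.xml (field
names assumed). Note the class is solr.MoreLikeThisHandler; a renamed copy of
the default handler is still a plain search handler, which requires q and
would explain the NPE:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">name,content</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
  </lst>
</requestHandler>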

On Thu, Sep 22, 2011 at 9:08 AM, dan whelan  wrote:
> I would like to use MLT and the content stream feature in solr like on this
> page:
>
> http://wiki.apache.org/solr/MoreLikeThisHandler
>
> How should the request handler / solrconfig be setup?
>
> I enabled streaming and I set a requestHandler up by copying the default
> request handler and I changed the name to:
>
> name="/mlt"
>
> but when accessing the url like the example on the wiki I get a NPE because
> q is not supplied
>
> I'm sure I am just doing it wrong just not sure what.
>
> Thanks,
>
> dan
>


Update ingest rate drops suddenly

2011-09-24 Thread eks dev
just looking for hints where to look for...

We were testing single-threaded ingest rate on Solr, trunk version, on an
atypical collection (a lot of small documents), and we noticed
something we are not able to explain.

Setup:
We use defaults for index settings; Windows 64-bit, JDK 7 u2, on SSD, a
machine with enough memory and 8 cores. The schema has 5 stored fields,
4 of them indexed, no positions, no norms.
Average net document size (optimized index size / number of documents)
is around 100 bytes.

On a test with 40 Mio documents:
- we had update ingest rate  on first 4,4Mio documents @  incredible
34k records / second...
- then it dropped, suddenly to 20k records per second and this rate
remained stable (variance 1k) until...
- we hit 13Mio, where ingest rate dropped again really hard, from one
instant in time to another to 10k records per second.

it stayed there until we reached the end @40Mio (slightly reducing, to
ca 9k, but this is not long enough to see trend).

Nothing unusual happening with JVM memory (sawtooth 200-450M, fully
regular). CPU in turn was following the ingest rate trend, indicating
that we were waiting on something. No searches, no commits, nothing.

autoCommit was turned off. Updates were streaming directly from the database.

-
I did not expect something like this, knowing Lucene merges in the
background. Also, having such sudden drops in ingest rate is an
indication that we are not leaking something (a leak would have made the
drop much more gradual). It is some cache, but why two really significant
drops? 33k/sec to 20k and then to 10k... We would love to keep it at
34k/second :)

I am not really acquainted with the new MergePolicy and flushing
settings, but I suspect this is something there we could tweak.

Could it be Windows is somehow, hmm, quirky with the Solr default
directory on win64/jvm (I think it is MMap by default)... We did not
saturate IO with such small documents, I guess; it is just a couple
of gigs over 1-2 hours.

All in all, it works good, but is having such hard update ingest rate
drops normal?

Thanks,
eks.


Re: Production Issue: SolrJ client throwing - Element type must be followed by either attribute specifications, ">" or "/>".

2011-09-24 Thread Erick Erickson
I suspect this is an issue with, say, your servlet container truncating
the response or some such, but that's a guess...

Best
Erick

On Thu, Sep 22, 2011 at 9:09 PM, roz dev  wrote:
> Wanted to update the list with our finding.
>
We reduced the number of documents being retrieved from Solr and
this error did not appear again.
It might be the case that, due to the high number of documents, Solr was
returning incomplete documents.
>
> -Saroj
>
>
> On Wed, Sep 21, 2011 at 12:13 PM, roz dev  wrote:
>
>> Hi All
>>
>> We are getting this error in our Production Solr Setup.
>>
>> Message: Element type "t_sort" must be followed by either attribute 
>> specifications, ">" or "/>".
>> Solr version is 1.4.1
>>
>> Stack trace indicates that solr is returning malformed document.
>>
>>
>> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
>> query
>>       at 
>> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>>       at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>>       at 
>> com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
>>       ... 15 more
>> Caused by: org.apache.solr.common.SolrException: parsing error
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
>>       at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
>>       at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>>       at 
>> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>>       ... 17 more
>> Caused by: javax.xml.stream.XMLStreamException: ParseError at 
>> [row,col]:[3,136974]
>> Message: Element type "t_sort" must be followed by either attribute 
>> specifications, ">" or "/>".
>>       at 
>> com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
>>       at 
>> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
>>       ... 21 more
>>
>>
>


Re: How to map database table for facted search?

2011-09-24 Thread Erick Erickson
In general, you flatten the data when you put things into Solr. I know
that's anathema
to DB training, but this is searching ...

If you have a reasonable number of distinct column names, you could
just define your
schema to have an entry for each and index the associated values that way. Then
your facets become easy, you're just faceting on the "facet_hobby"
field in your example.

If that's impractical (say you can add arbitrary columns), you can do
something very similar
with dynamic fields.

You could also create a field with the column/name pairs (watch your
tokenizer!) in a single
field and facet by prefix, where the prefix was the column name (e.g.
index tokens like
hobby_sailing hobby_camping interest_reading then facet with
facet.prefix=hobby_).

There are tradeoffs for each that you'll have to experiment with.

Note that there is no penalty in Solr for defining fields in your
schema but not using
them.

Best
Erick
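
A sketch of the dynamic-field variant in schema.xml (names assumed), plus the
matching request:

<dynamicField name="facet_*" type="string" indexed="true" stored="false"/>

Documents then carry fields such as facet_hobby with values like 'sailing',
and the query side becomes:

...&facet=true&facet.field=facet_hobby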

On Fri, Sep 23, 2011 at 12:06 AM, Chorherr Nikolaus
 wrote:
> Hi All!
>
> We are working first time with solr and have a simple data model
>
> Entity Person(column surname) has 1:n Attribute(column name) has 1:n 
> Value(column text)
>
> We need faceted search on the content of Attribute:name not on Attribute:name 
> itself, e.g if an Attribute of person has name=hobby, we would like to have 
> something like ... "facet=true&facet.name=hobby" and get back
> all related Value with count.(We do not need a "facet.name=name" and get back 
> all distinct values of the name column of Attribute)
>
> How do we have to map our database, define or document and/or define our 
> schema?
>
> Any help is highly appreciated - Thx in advance
>
> Niki
>


Re: Solr wildcard searching

2011-09-24 Thread Erick Erickson
Really, really, get in the habit of looking at your query with
&debugQuery=on appended, it'll save you a world of pain ..

customer_name:John Do*
doesn't do what you think. It parses into
customer_name:John OR default_search_field:Do*

you want something like customer_name:(+John +Do*) or
+customer_name:John +customer_name:Do*

You want to look particularly at the parsedquery part of the return, the
scoring stuff is useful for understanding scoring...

And, watch out for your default operator. In the absence of you setting it
in your schema.xml file, it is OR, and a result (with &debugQuery=on) of
customer_name:John customer_name:Do* translates as though it is a
SHOULD clause...

Best
Erick

On Thu, Sep 22, 2011 at 7:08 PM, jaystang  wrote:
> Hey guys,
> Very new to solr.  I'm using the data import handler to pull customer data
> out of my database and index it.  All works great so far.  Now I'm trying to
> query against a specific field and I seem to be struggling with doing a
> wildcard search. See below.
>
> I have several indexed documents with a "customer_name" field containing
> "John Doe".  I have a UI that contains a listing of this indexed data as
> well has a keyword filter field (filter as you type).  So I would like when
> the user starts typing "J", "John Doe will return, and "Jo", "John Doe" will
> return, "Joh"... etc, etc...
>
> I've tried the following:
>
> Search: customer_name:Joh*
> Returns: The correct "John Doe" Record"
>
> Search: customer_name:John Do*
> Returns: No results (nothing returns w/ 2 works since I don't have the
> string in quotes.)
>
> Search: customer_name:"Joh*"
> Returns: No results
>
> Search: customer_name:"John Do*"
> Returns: No results
>
> Search: customer_NAME:"John Doe*"
> Returns: The correct "John Doe" Record"
>
> I feel like I'm close, only issue is when there are multiple words.
>
> Any advice would be appreciated.
>
> Thanks!
>
>


Re: Solrj - when a request fails

2011-09-24 Thread Erick Erickson
Hmmm. I'm a little confused. Are you sure your log is going
somewhere and that you are NOT seeing any stack traces?
Because it looks like you *are* seeing them. In which case
re-throwing an error breaks your file fetch loop and stops
your processing.

I'd actually expect that you're losing some files before as well, since
even if you do trap those errors you're not doing a commit operation,
unless perhaps autocommit is saving you.

BTW, committing after every document is probably a bad idea, it'll create
lots and lots of segments unnecessarily. I'd rely on the autocommit
features and optionally commit after the run is completed.

Best
Erick

On Fri, Sep 23, 2011 at 5:55 AM, Walter Closenfleight
 wrote:
> *
> I have a java program which sends thousands of Solr XML files up to Solr
> using the following code. It works fine until there is a problem with one of
> the Solr XML files. The code fails on the solrServer.request(up) line, but
> it does not throw an exception, my application therefore cannot catch it and
> recover, and my whole application dies.
>
> I've fixed this individual file that made it fail, but want to better trap
> these so my application does not die.
>
> Thanks for any insight you can provide. Java code and log below-
>
>
> // ... start of a loop to process each file removed ...
>
> try {
>
>   String xml = read(filename);
>   DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
>
>   solrServer.request( up );
>   solrServer.commit();
>
> } catch (SolrServerException e) {
>   log.warn("Exception: "+ e.toString());
>   throw new MyException(e);
> } catch (IOException e) {
>   log.warn("Exception: "+ e.toString());
>   throw new MyException(e);
> }
> DEBUG >> "[\n]" - (Wire.java:70)
> DEBUG Request body sent - (EntityEnclosingMethod.java:508)
> DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
> DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
> DEBUG << "Server: Apache-Coyote/1.1[\r][\n]" - (Wire.java:70)
> DEBUG << "Content-Type: text/html;charset=utf-8[\r][\n]" - (Wire.java:70)
> DEBUG << "Content-Length: 1271[\r][\n]" - (Wire.java:70)
> DEBUG << "Date: Fri, 23 Sep 2011 12:08:05 GMT[\r][\n]" - (Wire.java:70)
> DEBUG << "Connection: close[\r][\n]" - (Wire.java:70)
> DEBUG << "[\r][\n]" - (Wire.java:70)
> DEBUG << "Apache Tomcat/6.0.29 - Error
> report
> HTTP Status 400 - Unexpected character 'x' (code 120) in
> prolog; expected '<'[\n]" - (Wire.java:70)
> DEBUG << " at [row,col {unknown-source}]: [3,1] noshade="noshade">type Status reportmessage
> Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
> (Wire.java:70)
> DEBUG << " at [row,col {unknown-source}]: [3,1]description
> " - (Wire.java:84)
> DEBUG << "The request sent by the client was syntactically incorrect
> (Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" -
> (Wire.java:70)
> DEBUG << " at [row,col {unknown-source}]: [3,1]). noshade="noshade">Apache Tomcat/6.0.29" -
> (Wire.java:84)
> DEBUG Should close connection in response to directive: close -
> (HttpMethodBase.java:1008)
> *
>


Re: Solr 3.4 Problem with integrating Query Parser Plug In

2011-09-24 Thread Erick Erickson
Could you please add some details here? It's really hard to figure
out what the problem is. Perhaps you could review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Sep 23, 2011 at 9:28 AM, Ahson Iqbal  wrote:
> Hi
>
> I have indexed some 1M documents, just for performance testing. I have
> written a query parser plugin; when I add it to the Solr lib folder under the
> Tomcat webapps folder and try to load the Solr admin page, it keeps on
> loading, and when I delete the plugin's jar file from lib it works fine. But
> the jar file works fine with Solr 3.3 and also with Solr 1.4.
>
> please help.
>
> Regards
> Ahsan
>


Re: what are the disdvantages of using dynamic fields?

2011-09-24 Thread Erick Erickson
There are really no differences between dynamic and static
fields performance-wise that I know of.

Personally, though, I tend to prefer static over dynamic from
a maintenance/debugging perspective. At issue is tracking
down why results weren't as expected, then spending several
days discovering that I managed to mis-spell some field
in my docs, in my SolrJ program or in my queries. A variant
of the "fail early" notion.

Dynamic fields have great uses, but I think you're better off
using static when possible.

Best
Erick

On Fri, Sep 23, 2011 at 3:14 PM, Jason Toy  wrote:
> Hi all,
>
>  I'd like to know what the specific disadvantages are for using dynamic
> fields in my schema are? About half of my fields are dynamic, but I could
> move all of them to be static fields. WIll my searches run faster? If there
> are no disadvantages, can I just set all my fields to be dynamic?
>
> Jason
>


Re: two cores but have single result set in solr

2011-09-24 Thread Erick Erickson
I think you should step back and consider what you're asking
for as Ken pointed out. You have different schemas. And
presumably different documents in each schema. The scores
from the different cores are NOT comparable. So how could
you "combine" the meaningfully? Further, assuming that the
documents have different characteristics, the term frequencies
and document frequencies will be different.

Solr really only supports this notion if your schemas are identical
and you're indexing similar documents to shards, using shards
with this intent probably won't do what you expect.

But why not just index your SolrJ documents directly into the
same core that you use for Nutch and just search the one index?
You don't have to provide values for any fields that don't have
'required="true" ' set. And if you do this, I suspect you'll have trouble
with relevance, but at least you'll get started.

Best
Erick

On Sat, Sep 24, 2011 at 8:02 AM, hadi  wrote:
> I read the link, but
> 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
> returns an XML response that is not useful for me. I want to create the query
> in solr/browse, so this requires changing the template engine. Do you
> know how to change that to search both cores?  Thanks
>
> On 9/24/11, Yury Kats [via Lucene]
>  wrote:
>>
>>
>> On 9/24/2011 3:09 AM, hadi wrote:
>>> I do not know how to search both cores without defining the "shards"
>>> parameter. Could you show me a solution to my issue?
>>
>> See this: http://wiki.apache.org/solr/DistributedSearch
>>


Re: resource to see which versions build from trunk?

2011-09-24 Thread Erick Erickson
Hmmm, why are you doing this? Why not use the latest
successful trunk build?

You can get a series of built artifacts at:
https://builds.apache.org//view/S-Z/view/Solr/job/Solr-trunk/
but I'm not sure how far back they go. How are you getting
the trunk source code? And *how* don't they build?

But I really question how useful this is unless it's curiosity,
since none of the trunk builds will be officially supported
until the release...

Best
Erick

On Sat, Sep 24, 2011 at 11:08 AM, Jason Toy  wrote:
> Hi all, I am testing various versions of Solr from trunk, and I am finding that
> oftentimes the example doesn't build and I can't test out the version.  Is
> there a resource that shows which versions build correctly so that we can
> test it out?
>


Re: Solr wildcard searching

2011-09-24 Thread lboutros
And to complete the answer of Erick,

in this search,

customer_name:"Joh*" 

* is not considered as a wildcard, it is an exact search.

another thing, (it is not your problem...),

Words with wildcards are not analyzed, 
so, if your analyzer contains a lower case filter,
in the index, these words are stored in lower case: 

John -> john
Do -> do

so,

customer_name:Do* will not find anything.

Ludovic.

-
Jouve
France.


Re: Solr wildcard searching

2011-09-24 Thread Erick Erickson
Thanks Ludovic, you're absolutely right, I should have added that.

BTW, there are patches that haven't been committed, see:
https://issues.apache.org/jira/browse/SOLR-1604
and similar.

Best
Erick

On Sat, Sep 24, 2011 at 1:32 PM, lboutros  wrote:
> And to complete the answer of Erick,
>
> in this search,
>
> customer_name:"Joh*"
>
> * is not considered as a wildcard, it is an exact search.
>
> another thing, (it is not your problem...),
>
> Words with wildcards are not analyzed,
> so, if your analyzer contains a lower case filter,
> in the index, these words are stored in lower case:
>
> John -> john
> Do -> do
>
> so,
>
> customer_name:Do* will not find anything.
>
> Ludovic.
>
> -
> Jouve
> France.
>


Re: SOLR error with custom FacetComponent

2011-09-24 Thread Ravi Bulusu
Erik,

Unfortunately the facet fields are not static. The fields are dynamic Solr
fields and are generated by different applications.
The field names will be populated into a data store (like memcache) and
facets have to be driven from that data store.

I need to write a Custom FacetComponent which picks up the facet fields from
the data store.
Thanks for your response.

-Ravi Bulusu

Subject:
Re: SOLR error with custom FacetComponent
From:
Erik Hatcher 
Date:
2011-09-21 18:18
Why create a custom facet component for this?

Simply add lines like this to your request handler(s):

<str name="facet.field">manu_exact</str>

either in defaults or appends sections.

Erik

On Wed, Sep 21, 2011 at 2:00 PM, Ravi Bulusu  wrote:

> Hi All,
>
>
> I'm trying to write a custom SOLR facet component and I'm getting some
> errors when I deploy my code into the SOLR server.
>
> Can you please let me know what Im doing wrong? I appreciate your help on
> this issue. Thanks.
>
> *Issue*
>
> I'm getting an error saying "Error instantiating SearchComponent: <My
> Class> is not a org.apache.solr.handler.component.SearchComponent".
>
> My custom class inherits from *FacetComponent* which extends from *
> SearchComponent*.
>
> My custom class is defined as follows…
>
> I implemented the process method to meet our functionality.
>
> We have some default facets that have to be sent every time, irrespective
> of the Query request.
>
>
> /**
>
>  *
>
>  * @author ravibulusu
>
>  */
>
> public class MyFacetComponent extends FacetComponent {
>
> ….
>
> }
>


Re: resource to see which versions build from trunk?

2011-09-24 Thread Erik Hatcher
Hey, the more hammering on trunk the better!


On Sep 24, 2011, at 13:31 , Erick Erickson wrote:

> Hmmm, why are you doing this? Why not use the latest
> successful trunk build?
> 
> You can get a series of built artifacts at:
> https://builds.apache.org//view/S-Z/view/Solr/job/Solr-trunk/
> but I'm not sure how far back they go. How are you getting
> the trunk source code? And *how* don't they build?
> 
> But I really question how useful this is unless it's curiosity,
> since none of the trunk builds will be officially supported
> until the release...
> 
> Best
> Erick
> 
> On Sat, Sep 24, 2011 at 11:08 AM, Jason Toy  wrote:
>> Hi all, I am testing various versions of Solr from trunk, and I am finding that
>> oftentimes the example doesn't build and I can't test out the version.  Is
>> there a resource that shows which versions build correctly so that we can
>> test it out?
>> 



Re: resource to see which versions build from trunk?

2011-09-24 Thread Erick Erickson
Agreed, but I'd rather see hammering on latest code 

On Sat, Sep 24, 2011 at 1:53 PM, Erik Hatcher  wrote:
> Hey, the more hammering on trunk the better!
>
>
> On Sep 24, 2011, at 13:31 , Erick Erickson wrote:
>
>> Hmmm, why are you doing this? Why not use the latest
>> successful trunk build?
>>
>> You can get a series of built artifacts at:
>> https://builds.apache.org//view/S-Z/view/Solr/job/Solr-trunk/
>> but I'm not sure how far back they go. How are you getting
>> the trunk source code? And *how* don't they build?
>>
>> But I really question how useful this is unless it's curiosity,
>> since none of the trunk builds will be officially supported
>> until the release...
>>
>> Best
>> Erick
>>
>> On Sat, Sep 24, 2011 at 11:08 AM, Jason Toy  wrote:
>>> Hi all, I am testing various versions of Solr from trunk, and I am finding that
>>> oftentimes the example doesn't build and I can't test out the version.  Is
>>> there a resource that shows which versions build correctly so that we can
>>> test it out?
>>>
>
>


Re: indexing an xml file

2011-09-24 Thread Bill Bell
Send us the example "solr.xml" and "schema.xml". You are missing fields
in the schema.xml that you are referencing.

On 9/24/11 8:15 AM, "ahmad ajiloo"  wrote:

>Hello
>The Solr Tutorial page explains how to index an XML file, but when I try to
>index
>an XML file with this command:
>~/Desktop/apache-solr-3.3.0/example/exampledocs$ java -jar post.jar
>solr.xml
>I get this error:
>SimplePostTool: FATAL: Solr returned an error #400 ERROR:unknown field
>'name'
>
>can anyone help me?
>thanks




Re: Update ingest rate drops suddenly

2011-09-24 Thread Otis Gospodnetic
eks,

This is clear as day - you're using Winblows!  Kidding.

I'd:
* watch IO with something like vmstat 2 and see if the rate drops correlate to 
increased disk IO or IO wait time
* monitor the DB from which you were pulling the data - maybe the DB or the 
server that runs it had issues
* monitor the network over which you pull data from DB

If none of the above reveals the problem I'd still:
* grab all data you need to index and copy it locally
* index everything locally

Out of curiosity, how big is your ramBufferSizeMB and your -Xmx?
And on that 8-core box you have ~8 indexing threads going?
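
(For reference, a sketch of where those knobs live in a 3.x-style
solrconfig.xml; the values are illustrative only:)

<indexDefaults>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>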

Otis

Sematext is Hiring -- http://sematext.com/about/jobs.html




>
>From: eks dev 
>To: solr-user 
>Sent: Saturday, September 24, 2011 3:18 PM
>Subject: Update ingest rate drops suddenly
>
>just looking for hints where to look for...
>
>We were testing single threaded ingest rate on solr, trunk version on
>atypical collection (a lot of small documents), and we noticed
>something we are not able to explain.
>
>Setup:
>We use defaults for index settings, windows 64 bit, jdk 7 U2. on SSD,
>machine with enough memory and 8 cores.   Schema has 5 stored fields,
>4 of them indexed no positions no norms.
>Average net document size (optimized index size / number of documents)
>is around 100 bytes.
>
>On a test with 40 Mio document:
>- we had update ingest rate  on first 4,4Mio documents @  incredible
>34k records / second...
>- then it dropped, suddenly to 20k records per second and this rate
>remained stable (variance 1k) until...
>- we hit 13Mio, where ingest rate dropped again really hard, from one
>instant in time to another to 10k records per second.
>
>it stayed there until we reached the end @40Mio (slightly reducing, to
>ca 9k, but this is not long enough to see trend).
>
>Nothing unusual happening with JVM memory (sawtooth 200-450M, fully
>regular). CPU in turn was following the ingest rate trend, indicating
>that we were waiting on something. No searches, no commits, nothing.
>
>autoCommit was turned off. Updates were streaming directly from the database.
>
>-
>I did not expect something like this, knowing Lucene merges in the
>background. Also, having such sudden drops in ingest rate is an
>indication that we are not leaking something (a leak would have made the
>drop much more gradual). It is some cache, but why two really significant
>drops? 33k/sec to 20k and then to 10k... We would love to keep it at
>34k/second :)
>
>I am not really acquainted with the new MergePolicy and flushing
>settings, but I suspect this is something there we could tweak.
>
>Could it be Windows is somehow, hmm, quirky with the Solr default
>directory on win64/jvm (I think it is MMap by default)... We did not
>saturate IO with such small documents, I guess; it is just a couple
>of gigs over 1-2 hours.
>
>All in all, it works good, but is having such hard update ingest rate
>drops normal?
>
>Thanks,
>eks.
>
>
>

Re: matching response and request

2011-09-24 Thread Otis Gospodnetic
Hi Roland,

Check this:



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">solr</str>
      <str name="foo">1</str>            <=== from &foo=1
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  ...
I added &foo=1 to the request to Solr and got the above back.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




>
>From: Roland Tollenaar 
>To: solr-user@lucene.apache.org
>Sent: Saturday, September 24, 2011 4:07 AM
>Subject: matching response and request
>
>Hi,
>
>sorry for this question but I am hoping it has a quick solution.
>
>I am sending multiple GET request queries to Solr, but Solr is not returning
>the responses in the sequence I send the requests.
>
>The shortest responses arrive back first.
>
>I am wondering whether I can add a tag to the request which will be given back
>to me in the response, so that when a response comes I can connect it to the
>original request and handle it in the appropriate manner.
>
>If this is possible, how?
>
>Help appreciated!
>
>Regards,
>
>Roland.
>
>
>

Best Solr escaping?

2011-09-24 Thread Bill Bell
What is the best algorithm for escaping strings before sending to Solr? Does
someone have some code?

A few things I have witnessed in "q" using DIH handler
* Double quotes - " that are not balanced can cause several issues, from an
error (strip the double quote?) to no results.
* Should we use + or %20, and what cases make sense:
> * "Dr. Phil Smith" or "Dr.+Phil+Smith" or "Dr.%20Phil%20Smith" - also what is
> the impact of double quotes?
* Unmatched parentheses, i.e. opening ( and not closing:
> * (Dr. Holstein
> * Cardiologist+(Dr. Holstein
Regular encoding of strings does not always work for the whole string due to
several issues like white space:
* White space works better when we use a backslash escape, "Bill\ Bell",
especially when using facets.

Thoughts? Code? Ideas? Better Wikis?
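
One starting point, as a sketch: SolrJ ships an escaper for the query-syntax
special characters (unbalanced quotes, parens, +, whitespace, etc.); URL
encoding is then a separate, second step when building the GET request:

import java.net.URLEncoder;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeDemo {
  public static void main(String[] args) throws Exception {
    String raw = "Cardiologist+(Dr. Holstein";
    String escaped = ClientUtils.escapeQueryChars(raw); // backslash-escapes Solr query syntax
    String url = "http://localhost:8983/solr/select?q="
               + URLEncoder.encode(escaped, "UTF-8");   // then percent-encode for HTTP
    System.out.println(url);
  }
}

Note this escapes everything, so deliberate syntax (quoted phrases, wildcards)
has to be added after escaping only the user-entered parts.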





Re: Search query doesn't work in solr/browse panel

2011-09-24 Thread Bill Bell
Yes. It appears that "&" cannot be encoded in the URL or there are really
bad results.
For example we get an error on first request, but if we refresh it goes
away.



On 9/23/11 2:57 PM, "hadi"  wrote:

>When I create a query like "something&fl=content" in solr/browse, the "&"
>and
>"=" in the URL are converted to %26 and %3D and no results occur. But it works in
>solr/admin advanced search and also in the URL bar directly. How can I solve
>this problem?  Thanks
>




RE: JdbcDataSource and threads

2011-09-24 Thread Vazquez, Maria (STM)
Thanks a lot for your response!
I think that is exactly what's happening. It runs OK for a short time and
then starts throwing that error while some of the queries run successfully.
I had it set up with 10 threads; maybe that was too much.
I'd be very interested in that code if you don't mind sharing.
I'm migrating code from pure Lucene to Solr, and indexing time went from less
than one hour to more than 4 because it's using only one thread.
Thanks a lot again, very helpful.
Maria


Sent from my Motorola ATRIX™ 4G on AT&T

-Original message-
From: rkuris 
To: solr-user@lucene.apache.org
Sent: Sat, Sep 24, 2011 18:11:20 GMT+00:00
Subject: RE: JdbcDataSource and threads

My guess on this is that you're making a LOT of database requests and have a
million TIME-WAIT connections, and your port range for local ports is
running out.

You should first confirm that's true by running netstat on the machine while
the load is running.  See if it gives a lot of output.

One way to solve this problem is to use a connection pool.  Look at adding a
pooled JNDI connection into your web service and connect with that instead.

The best way is to avoid making the extra connections.  If the data in the
subqueries is really short, look into caching the results using a
CachedSqlEntityProcessor instead.  I wasn't able to use this approach
because I had a lot of data in the inner queries.  What I ended up doing
was writing my own OrderedSqlEntityProcessor which correlates an outer
ordered query with an inner ordered query.  This ran a lot faster and
reduced my load times from 20 hours to 20 minutes.  Let me know if you're
interested in that code.



Solr UpdateJSON - extra fields

2011-09-24 Thread msingla
If JSON being posted to the 'http://localhost:8983/solr/update/json' URL has
extra fields that are not defined in the index schema definition, will those
be silently ignored or will an error be thrown?
