Re: Solr WARN Log

2014-09-13 Thread Joseph V J
Thank you for the info Chris.

~Regards
Joe

On Thu, Sep 11, 2014 at 10:36 PM, Chris Hostetter 
wrote:

>
> : But it seems this has not been fixed in 4.10, as the issue SOLR-6179
> : is not in the changes list
> : http://lucene.apache.org/solr/4_10_0/changes/Changes.html
>
> There was a Jira glitch recording the commits, but Tim added a comment
> with the details.
>
> it is in Changes.html, in the "Other Changes" section...
>
> https://lucene.apache.org/solr/4_10_0/changes/Changes.html#v4.10.0.other_changes
>
>
> -Hoss
> http://www.lucidworks.com/
>


Solr: How to delete a document

2014-09-13 Thread FiMka
Hi guys, could you tell me how to delete a document in Solr? After I delete a
document, it still persists in the search results. For example, there is the
following document saved in Solr:
After I POST the following data to localhost:8983/solr/update/?commit=true:
Solr each time says 200 OK and responds with the following:
If I search
localhost:8983/solr/lexikos/select?q=phrase%3A+%22qwerty%22&wt=json&indent=true
for the document once again, it is still shown in the results. So how can I remove
the document from the Solr index as well, or what else should I do? Thanks in advance
for any assistance!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649.html
Sent from the Solr - User mailing list archive at Nabble.com.
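[Editor's note: the example document and the delete payload from the original post were lost in the archive. The usual approach is a delete-by-id or delete-by-query followed by a commit; a minimal SolrJ sketch, assuming the core is named "lexikos" and the id value is hypothetical:]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeleteExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the core-qualified URL (core name is an assumption)
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/lexikos");

        // Either delete by unique key ...
        server.deleteById("some-doc-id");            // hypothetical id
        // ... or delete everything matching a query
        server.deleteByQuery("phrase:\"qwerty\"");

        // Deletes only become visible to searches after a commit
        server.commit();
        server.shutdown();
    }
}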

Re: Solr: How to delete a document

2014-09-13 Thread François Schiettecatte
How about adding 'expungeDeletes=true' as well as 'commit=true'?

François

On Sep 13, 2014, at 4:09 PM, FiMka  wrote:

> Hi guys, could you tell me how to delete a document in Solr? After I delete a
> document, it still persists in the search results. For example, there is the
> following document saved in Solr:
> After I POST the following data to localhost:8983/solr/update/?commit=true:
> Solr each time says 200 OK and responds with the following:
> If I search
> localhost:8983/solr/lexikos/select?q=phrase%3A+%22qwerty%22&wt=json&indent=true
> for the document once again, it is still shown in the results. So how can I remove
> the document from the Solr index as well, or what else should I do? Thanks in advance
> for any assistance!
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649.html
> Sent from the Solr - User mailing list archive at Nabble.com.
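[Editor's note: a commit with expungeDeletes can be issued from SolrJ roughly as below. This is only a sketch (the core name is an assumption), and as noted later in the thread a plain commit is normally enough:]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletesCommit {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        UpdateRequest req = new UpdateRequest();
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
        req.setParam("expungeDeletes", "true"); // also merge away segments' deleted docs
        req.process(server);
    }
}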



Solr: Tricky exact match, unwanted search results

2014-09-13 Thread FiMka
Hi guys, could you help me with implementing exact-match search in Solr?
Say I have the following Solr documents: And my search query is:
By default, for the given documents and the search query "cat", Solr will give
all the partially matched documents ("cat", "pussy cat" and "cats"), but
I only want the "cat" and "cats" results.
To enable the desired Solr search behavior, as was suggested here, I add
prefixes and suffixes to phrases each time I add new documents (much more
detail is given here).
I can say that in most cases (like for the documents above) this solution
works (we now get only "cat" and "cats" as the results for the query
"http://localhost:8983/solr/select?q=phraseExact%3A+%22_prefix_+cat+_suffix_%22").
But if we have documents like:
If I submit the search query
"http://localhost:8983/solr/select?q=phraseExact%3A+%22_prefix_+in+case+if++_suffix_%22",
then all three documents are returned ("in case if", "worst-case condition",
"optimistic-case module").
I suppose this behavior could be caused by the preposition and the conjunction
in the searched phrase "*in* case *if*". Are these words perhaps not handled
by Solr? It is also interesting that the number of words (three) in the
original phrase ("in case if") matches the returned phrases ("worst-case
condition" and "optimistic-case module").
Do you have any ideas why I get such results and what the reasons might be?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Tricky-exact-match-unwanted-search-results-tp4158652.html
Sent from the Solr - User mailing list archive at Nabble.com.
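[Editor's note: the example documents were stripped by the archive, but the technique described wraps each phrase with marker tokens at index time and queries the wrapped form as a phrase. A rough SolrJ sketch of that idea, with the field names taken from the post and everything else (URL, id) assumed. As the reply below points out, if the phraseExact analyzer removes stopwords such as "in" and "if", the wrapped phrase query can match unrelated values of the same length.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ExactMatchSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr"); // default core assumed

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("phrase", "cat");
        // extra field holding the phrase wrapped in marker tokens
        doc.addField("phraseExact", "_prefix_ cat _suffix_");
        server.add(doc);
        server.commit();

        // exact match = phrase query against the wrapped field
        SolrQuery q = new SolrQuery("phraseExact:\"_prefix_ cat _suffix_\"");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}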

Re: Solr: How to delete a document

2014-09-13 Thread FiMka
*François*, thank you for help, it is really working now!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649p4158654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ : fieldcontent from (multiple) file(s)

2014-09-13 Thread Erick Erickson
bq: I'd ideally like to put the burden of tika-extraction into the Solr-process.

Why? That puts the entire parsing burden on the Solr
machine. Under any significant indexing load, parsing the doc
may become a bottleneck. If you do the Tika extraction on the client,
you can spread that (sometimes quite heavy) load over as
many clients as you can muster without adversely affecting
searching. And that would increase your maximum indexing rate.

Or perhaps I'm just not understanding the division you need here.

If you really would be better served by doing the extraction on the
Solr side, see the ExtractingRequestHandler (ERH) as another option
here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

So you'd have something like this:
find the metadata you care about from your system. Include
all the fields as "literals" on the call to SolrCell, and
let ERH then extract the text content from the doc and index
it along with the data you've passed as literals.
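A rough sketch of that call from SolrJ, assuming the ExtractingRequestHandler is mounted at /update/extract (the default) and with made-up field names, core name and file path:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class SolrCellSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/data/report.pdf"), "application/pdf");  // hypothetical file
        // metadata you already know, passed as literals
        req.setParam("literal.id", "doc-42");
        req.setParam("literal.source", "crm");
        // map Tika's extracted body into your content field
        req.setParam("fmap.content", "text");
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}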

One word of caution if you want to extract metadata _from the
document_. Unless you have very uniform documents, this
gets "interesting". Say you wanted to pull out the "last edited"
date _from the document_. Word might have a meta-data
field conveniently named "last_edited", which you could map
into your Solr schema. However, a PDF file might have a field
"latest_change" expressing the same concept. Note: the names
are made up; the point is there's no standard amongst different
types of docs. I find all this easier to deal with in a SolrJ program
using Tika, but I admit that's largely a matter of where my comfort
zone is.

The previous paragraph of course doesn't apply at all if all you care
about is the text.

Even so, I'm still puzzled by why you see it as advantageous to
make this split. If I understand correctly, you already have
a SolrJ program to find the docs to send to Solr. You're already
going to have to send SolrInputDocuments to Solr with certain
metadata. Adding the Tika extraction is just a few lines and has,
from my perspective, several near and long-term advantages.
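For comparison, the client-side variant is only a few lines with Tika's AutoDetectParser. A sketch, with the file path, core name and field names all assumed:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class ClientSideTikaSketch {
    public static void main(String[] args) throws Exception {
        File file = new File("/data/report.pdf");                // assumed path
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        try (InputStream in = new FileInputStream(file)) {
            parser.parse(in, handler, metadata, new ParseContext()); // extract text on the client
        }

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", file.getName());
        doc.addField("text", handler.toString()); // extracted body
        // plus whatever metadata your own system supplies

        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.add(doc);
        server.commit();
    }
}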

Which probably just means I don't understand your problem
space in sufficient depth

Best,
Erick

On Fri, Sep 12, 2014 at 11:36 PM, Clemens Wyss DEV  wrote:
> Erick, thanks for your input. You are right that the "miraculous connection"
> is not always that miraculous ;)
>
> In your example the extraction is being done on the client side. But as I
> said, I'd ideally like to put the burden of tika-extraction into the
> Solr-process. All fields except the file-content-based fields should be filled
> on the client side, and only the file-content-based fields shall be extracted
> (before indexing) in Solr. So it would "only" be the files that needed to be
> "shared".
>
> --Clemens
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, 12 September 2014 17:57
> To: solr-user@lucene.apache.org
> Subject: Re: SolrJ : fieldcontent from (multiple) file(s)
>
> bq: I could of course push in the filename(s) in a field, but this would 
> require Solr (due to field-type e.g. "filecontent") to extract the content 
> from the given file.
>
> Why? If you're already dealing with SolrJ, you do all the work you need to 
> there by adding fields to a SolrInputDocument, including any metadata and 
> content your client extracts. Here's an example that uses Tika (shipped with 
> Solr) to do just that, as well as extract DB contents etc.
>
> http://searchhub.org/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
> On Fri, Sep 12, 2014 at 5:55 AM, Clemens Wyss DEV  
> wrote:
>> Thanks Alex,
>>> Do you just care about document content?
>> content only.
>>
>> The documents (not necessarily coming from a Db) are being pushed (through 
>> Solrj). This is at least the initial idea, mainly due to the dynamic nature 
>> of our index/search architecture.
>> I could of course push in the filename(s) in a field, but this would require
>> Solr (due to a field type, e.g. "filecontent") to extract the content from the
>> given file. Is something like this possible in Solr indexing?
>>
>>> DataImportHandler
>> Would I need to write a custom DIH? Or can the DIH be used as is, i.e. just
>> configured through the data-config.xml?
>>
>>> nested entities design
>> Could you link me to this concept/idea?
>>
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Friday, 12 September 2014 14:12
>> To: solr-user
>> Subject: Re: SolrJ : fieldcontent from (multiple) file(s)
>>
>> Do you just care about document content? Not metadata, such as file name, 
>> date, author, etc?
>>
>> Does it have to be pushed into Solr or can it be pulled? If pull, DataImportHandler
>> should be able to do what you want with a nested-entities design.
>>
>> Regards,
>>Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources
>> and newsletter: http://www.solr-start.com/ 

New tiny high-performance HTTP/Servlet server for Solr

2014-09-13 Thread Jayson Minard
Instead of within an Application Server such as Jetty, Tomcat or WildFly ...
Solr can also now be run standalone on Undertow without the overhead or
complexity of a full application server. Open-sourced on
https://github.com/bremeld/solr-undertow

solr-undertow

Solr running in standalone server - High Performance, tiny, fast, easy,
standalone deployment. Requires JDK 1.7 or newer. Less than 4MB download,
faster than Jetty, Tomcat and all the others. Written in the Kotlin language
for the JVM.

Releases are available here on GitHub.

This application launches a Solr WAR file as a standalone server running a
high performance HTTP front-end based on undertow.io (the engine behind
WildFly, the new JBoss). It has no features of an application server; it does
nothing more than load the Solr servlets and serve the Admin UI. It is
production-quality for a stand-alone Solr server.


Re: SolrJ : fieldcontent from (multiple) file(s)

2014-09-13 Thread Alexandre Rafalovitch
On 13 September 2014 17:03, Erick Erickson  wrote:
> Which probably just means I don't understand your problem
> space in sufficient depth

I suspect this means the clients do not have access to the shared
drive with the files, but the Solr server does. A firewall in between
or some such.

If I am right, that would make invoking DataImportHandler a bit
complicated as well, due to the change from push to pull.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Solr: How to delete a document

2014-09-13 Thread Erick Erickson
Hmmm, it should not be necessary to add expungeDeletes, so
I'd like to understand what's happening here.

FiMka: Could you give us the exact URL you send? Because
trying the below from a browser works just fine for me on 4.x
with the sample data.

http://localhost:8983/solr/collection1/update?commit=true&stream.body=DELL

Is your "id" field a simple string type in schema.xml, or is it
analyzed in any way?

I'm a bit afraid that the expungeDeletes is masking something
else, I'd like to understand what.

I've also been fooled in the past by having the browser give me
back cached pages, so you might try slightly altering the query
when you look for the doc after deletion.

Best,
Erick

On Sat, Sep 13, 2014 at 1:43 PM, FiMka  wrote:
> *François*, thank you for help, it is really working now!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649p4158654.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: How to delete a document

2014-09-13 Thread Alexandre Rafalovitch
If "commit" was the answer, you may want to step back and review your
understanding of Solr.

The main point is that a Solr commit is not the same as an SQL transaction,
but is something that has to be triggered manually (or through timeout
specifications in the request and/or solrconfig.xml). Also, a commit
will apply to all the changes introduced to that point from all the
different clients, not just the changes of that specific client.

Regards,
   Alex.
P.S. Just a commit should have been sufficient as well; I don't think
you need expungeDeletes.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
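[Editor's note: to make the commit point concrete, a small SolrJ sketch (client URL and id value are assumptions). Changes are only searchable after some commit happens, whether issued explicitly, requested per update via commitWithin, or triggered by autoCommit in solrconfig.xml.]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");          // hypothetical document
        doc.addField("phrase", "qwerty");

        server.add(doc);   // buffered; not visible to searches yet
        server.commit();   // explicit commit: all pending changes, from every client, become visible

        // alternatively, ask Solr to commit on its own within 5 seconds of the update:
        // server.add(doc, 5000);
    }
}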


On 13 September 2014 16:43, FiMka  wrote:
> *François*, thank you for help, it is really working now!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649p4158654.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: Tricky exact match, unwanted search results

2014-09-13 Thread Erick Erickson
The easiest way to make your examples work would
be to use a copyField to an "exact match" field that
uses the KeywordTokenizer (and perhaps a lowercase filter).
Then your exact match would be satisfied by a simple
wildcard search for cat*.

You'll have to be a little careful to escape spaces for
multi-term bits, like
exact_field:pussy\ cat.
Quoting these is sometimes tricky.

Anyway, as the thread you linked to says, details matter
a lot in this case. The keyword tokenizer won't let you find
"dog has" in a field populated with "my dog has fleas", for
instance, with this technique.

As far as your question about "if" and "in", what you're probably
getting here is stopword removal, but that's a guess. Try it
again without including a stopword filter factory to test that
out.

Best,
Erick
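[Editor's note: a small SolrJ sketch of querying such a copyField target. The field name exact_field and its KeywordTokenizer/LowerCaseFilter analysis are assumptions; ClientUtils.escapeQueryChars takes care of the space escaping mentioned above.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

public class ExactFieldQuerySketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        String phrase = "pussy cat";
        // lowercase to match the assumed LowerCaseFilter, escape the space for the query parser
        String q = "exact_field:" + ClientUtils.escapeQueryChars(phrase.toLowerCase());

        QueryResponse rsp = server.query(new SolrQuery(q));
        System.out.println(rsp.getResults().getNumFound());
    }
}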


On Sat, Sep 13, 2014 at 1:33 PM, FiMka  wrote:
> Hi guys, could you help me with implementing exact-match search in Solr?
> Say I have the following Solr documents: And my search query is:
> By default, for the given documents and the search query "cat", Solr will give
> all the partially matched documents ("cat", "pussy cat" and "cats"), but
> I only want the "cat" and "cats" results.
> To enable the desired Solr search behavior, as was suggested here, I add
> prefixes and suffixes to phrases each time I add new documents (much more
> detail is given here).
> I can say that in most cases (like for the documents above) this solution
> works (we now get only "cat" and "cats" as the results for the query
> "http://localhost:8983/solr/select?q=phraseExact%3A+%22_prefix_+cat+_suffix_%22").
> But if we have documents like:
> If I submit the search query
> "http://localhost:8983/solr/select?q=phraseExact%3A+%22_prefix_+in+case+if++_suffix_%22",
> then all three documents are returned ("in case if", "worst-case condition",
> "optimistic-case module").
> I suppose this behavior could be caused by the preposition and the conjunction
> in the searched phrase "*in* case *if*". Are these words perhaps not handled
> by Solr? It is also interesting that the number of words (three) in the
> original phrase ("in case if") matches the returned phrases ("worst-case
> condition" and "optimistic-case module").
> Do you have any ideas why I get such results and what the reasons might be?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Tricky-exact-match-unwanted-search-results-tp4158652.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ : fieldcontent from (multiple) file(s)

2014-09-13 Thread Erick Erickson
Alexandre:

Hmmm, if you're correct, that pretty much shoots SolrCell in the
head too. You'd probably have to do something with a custom
UpdateRequestProcessor in that case...
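[Editor's note: loosely, such a processor could pick a server-side file path out of the incoming document and run Tika on it before indexing. This is only a skeleton under that assumption; the field names and the wiring into an updateRequestProcessorChain in solrconfig.xml are hypothetical.]

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class TikaFileFieldProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Object path = doc.getFieldValue("file_path"); // hypothetical field naming a server-side file
                if (path != null) {
                    // extract text with Tika (as in the client-side sketch above) and store it
                    doc.setField("content", extractWithTika(path.toString()));
                }
                super.processAdd(cmd); // hand the enriched document down the chain
            }
        };
    }

    private String extractWithTika(String path) {
        // Tika parsing omitted here; see the earlier AutoDetectParser sketch
        return "";
    }
}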

On Sat, Sep 13, 2014 at 2:06 PM, Alexandre Rafalovitch
 wrote:
> On 13 September 2014 17:03, Erick Erickson  wrote:
>> Which probably just means I don't understand your problem
>> space in sufficient depth
>
> I suspect this means the clients do not have access to the shared
> drive with the files, but the Solr server does. A firewall in between
> or some such.
>
> If I am right, that would make invoking DataImportHandler a bit
> complicated as well, due to change of push to pull.
>
> Regards,
>Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Solr: How to delete a document

2014-09-13 Thread FiMka
I've got the answer! The problem was not the absence of 'expungeDeletes=true';
I've double-checked and that actually doesn't matter. In fact, the first time
I sent the document-removal request to
localhost:8983/solr/update/?commit=true, I did not specify the exact Solr
core, e.g. "collection1". Solr still responds with 200 OK, but of course
nothing was removed in my specific core.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649p4158668.html
Sent from the Solr - User mailing list archive at Nabble.com.
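[Editor's note: in SolrJ terms the same mistake looks like pointing the client at the bare /solr root instead of the core. A minimal sketch; the core name and id value are assumptions.]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CoreQualifiedDeleteSketch {
    public static void main(String[] args) throws Exception {
        // include the core name in the base URL, not just http://localhost:8983/solr
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.deleteById("42");   // hypothetical unique key value
        server.commit();
    }
}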


Re: Solr: How to delete a document

2014-09-13 Thread Erick Erickson
Ahhh! Thanks for letting us know, I was wondering!

And that fact was right there in the URL you pasted and
I overlooked it totally. Siiigggh.

Erick

On Sat, Sep 13, 2014 at 2:28 PM, FiMka  wrote:
> I've got the answer! The problem was not the absence of 'expungeDeletes=true';
> I've double-checked and that actually doesn't matter. In fact, the first time
> I sent the document-removal request to
> localhost:8983/solr/update/?commit=true, I did not specify the exact Solr
> core, e.g. "collection1". Solr still responds with 200 OK, but of course
> nothing was removed in my specific core.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649p4158668.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: How to delete a document

2014-09-13 Thread Alexandre Rafalovitch
Well, I missed it as well. :-)

I usually put my URLs on their own lines to make looking at them
easier. Wonder if that would have helped in this particular case.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 13 September 2014 17:55, Erick Erickson  wrote:
> Ahhh! Thanks for letting us know, I was wondering!
>
> And that fact was right there in the URL you pasted and
> I overlooked it totally. Siiigggh.
>
> Erick
>
> On Sat, Sep 13, 2014 at 2:28 PM, FiMka  wrote:
>> I've got the answer! The problem was not the absence of 'expungeDeletes=true';
>> I've double-checked and that actually doesn't matter. In fact, the first time
>> I sent the document-removal request to
>> localhost:8983/solr/update/?commit=true, I did not specify the exact Solr
>> core, e.g. "collection1". Solr still responds with 200 OK, but of course
>> nothing was removed in my specific core.
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649p4158668.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: New tiny high-performance HTTP/Servlet server for Solr

2014-09-13 Thread William Bell
Can we get some stats? Do you have any numbers on performance?

On Sat, Sep 13, 2014 at 3:03 PM, Jayson Minard 
wrote:

> Instead of within an Application Server such as Jetty, Tomcat or WildFly ...
> Solr can also now be run standalone on Undertow without the overhead or
> complexity of a full application server. Open-sourced on
> https://github.com/bremeld/solr-undertow
>
> solr-undertow
>
> Solr running in standalone server - High Performance, tiny, fast, easy,
> standalone deployment. Requires JDK 1.7 or newer. Less than 4MB download,
> faster than Jetty, Tomcat and all the others. Written in the Kotlin language
> for the JVM.
>
> Releases are available here on GitHub.
>
> This application launches a Solr WAR file as a standalone server running a
> high performance HTTP front-end based on undertow.io (the engine behind
> WildFly, the new JBoss). It has no features of an application server, does
> nothing more than load Solr servlets and also service the Admin UI. It is
> production-quality for a stand-alone Solr server.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


AW: SolrJ : fieldcontent from (multiple) file(s)

2014-09-13 Thread Clemens Wyss DEV
Thanks for all your advice and thoughts.

The "client" in our case is/are the Tomcats - to be more precise, the webapps
running in the Tomcats. These should serve HTTP requests.

I'd also like to note that it's the batch updates that, in my opinion, cause load
(CPU and memory, depending on the PDF) which I would like to take off the
webapps, not the single-document insertions/updates.

But if I don't get a clean/stable "Solr-way-to-do-it" solution to this problem,
I will do the extraction in the webapps, as is.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 13 September 2014 23:22
To: solr-user@lucene.apache.org
Subject: Re: SolrJ : fieldcontent from (multiple) file(s)

Alexandre:

Hmmm, if you're correct, that pretty much shoots SolrCell in the head too. You'd
probably have to do something with a custom UpdateRequestProcessor in that
case...

On Sat, Sep 13, 2014 at 2:06 PM, Alexandre Rafalovitch  
wrote:
> On 13 September 2014 17:03, Erick Erickson  wrote:
>> Which probably just means I don't understand your problem space in 
>> sufficient depth
>
> I suspect this means the clients do not have access to the shared 
> drive with the files, but the Solr server does. A firewall in between 
> or some such.
>
> If I am right, that would make invoking DataImportHandler a bit 
> complicated as well, due to change of push to pull.
>
> Regards,
>Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Advice on highlighting

2014-09-13 Thread Ramkumar R. Aiyengar
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878
provides a Lucene API for what you are trying to do; it's not in yet, though.
There's a fork which has the change at
https://github.com/flaxsearch/lucene-solr-intervals
On 12 Sep 2014 21:24, "Craig Longman"  wrote:

> In order to take our Solr usage to the next step, we really need to
> improve its highlighting abilities.  What I'm trying to do is to be able
> to write a new component that can return the fields that matched the
> search (including numeric fields) and the start/end positions for the
> alphanumeric matches.
>
>
>
> I see three different approaches to take; each will require making
> some modifications to the lucene/solr parts, as it just does not appear
> to be doable as a completely stand-alone component.
>
>
>
> 1) At initial search time.
>
> This seemed like a good approach.  I can follow IndexSearcher creating
> the TermContext that parses through AtomicReaderContexts to see if it
> contains a match and then adds it to the contexts available for later.
> However, at this point, inside SegmentTermsEnum.seekExact() it seems
> like Solr is not really looking for matching terms as such, it's just
> scanning what looks like the raw index.  So, I don't think I can easily
> extract term positions at this point.
>
>
>
> 2) Write a modified HighlighterComponent.  We have managed to get phrases
> to highlight properly, but it seems like getting the full field matches
> would be more difficult in this module. However, because it does its
> highlighting oblivious to any other criteria, we can't use it as is.
> For example, this search:
>
>
>
>   (body:large+AND+user_id:7)+OR+user_id:346
>
>
>
> Will highlight "large" in records that have user_id = 346 when
> technically (for our purposes at least) it should not be considered a
> hit because the "large" was accompanied by the user_id = 7 criteria.
> It's not immediately clear to me how difficult it would be to change
> this.
>
>
>
> 3) Make a modified DebugComponent and enhance the existing explain()
> methods (in the query types we require it at least) to include more
> information such as the start/end positions of the term that was hit.
> I'm exploring this now, but I don't easily see how I can figure out what
> those positions might be from the explain() information.  Any pointers
> on how, at the point that TermQuery.explain() is being called, I can
> figure out which indexed token the actual hit was on?
>
>
>
>
>
> Craig Longman
>
> C++ Developer
>
> iCONECT Development, LLC
> 519-645-1663
>
>
>
>
>
> This message and any attachments are intended only for the use of the
> addressee and may contain information that is privileged and confidential.
> If the reader of the message is not the intended recipient or an authorized
> representative of the intended recipient, you are hereby notified that any
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, notify the sender immediately by
> return email and delete the message and any attachments from your system.
>
>
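[Editor's note: for readers who want to reproduce the behavior described in approach 2, a minimal SolrJ sketch of the standard highlighter. The query and field names come from the example above; the core name is assumed. The returned snippets for "body" will highlight "large" regardless of which boolean clause actually made the document match.]

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("(body:large AND user_id:7) OR user_id:346");
        q.setHighlight(true);
        q.addHighlightField("body");

        QueryResponse rsp = server.query(q);
        // map of docId -> (field -> highlighted snippets)
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        System.out.println(hl);
    }
}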