Re: Any way to get reference to original request object from within Solr component?

2012-03-17 Thread pravesh
Hi Sujit,

The HTTP parameter ordering is handled above the Solr level, so I don't think it
can be controlled at the Solr level.
You can append all the required values into a single HTTP parameter and then
split it back apart at your component level.
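A minimal sketch of that pack-and-split idea (the class and parameter names are illustrative, not from this thread; the only requirement is a delimiter that cannot occur inside the values):

```java
import java.util.Arrays;
import java.util.List;

// Pack the ordered values into one HTTP parameter on the client,
// then split them back apart inside the search component.
public class PackedParam {

    static final String DELIM = "|"; // must not occur in the values

    // Client side: join the values in the order the user selected them.
    static String pack(List<String> values) {
        return String.join(DELIM, values);
    }

    // Component side: split the single parameter back into ordered values.
    // The "|" delimiter must be regex-escaped for String.split().
    static List<String> unpack(String packed) {
        return Arrays.asList(packed.split("\\|", -1));
    }

    public static void main(String[] args) {
        List<String> filters = List.of("brand:acme", "color:red", "price:[0 TO 100]");
        String packed = pack(filters);      // "brand:acme|color:red|price:[0 TO 100]"
        System.out.println(unpack(packed)); // order survives the round trip
    }
}
```

Since the servlet container hands Solr a parameter map, packing everything into one value is the only way to keep an ordering guarantee end to end.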

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-way-to-get-reference-to-original-request-object-from-within-Solr-component-tp3833703p3834082.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting Index Results by User's Score

2012-03-17 Thread Mikhail Khludnev
Phill,

You can find a lot of blog posts describing how to sort results by an
external field. FWIW, LucidWorks Enterprise has something like a
relevance-feedback framework.
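One built-in option worth looking at (a sketch; field names are illustrative and attribute values vary slightly by Solr version) is Solr's ExternalFileField, which lets a function query sort or boost by per-document values loaded from a flat file next to the index, so tallies can be refreshed without reindexing:

```xml
<!-- schema.xml sketch -->
<fieldType name="clickScore" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="user_clicks" type="clickScore"/>
```

The values live in a file such as `external_user_clicks` in the index directory, one `docid=score` per line, and you can then boost with something like `q={!boost b=user_clicks}...`. Note this is per document, not per user; truly per-user tallies would need one such file per user segment, or an outside re-ranking step.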

Regards

On Sat, Mar 17, 2012 at 4:39 AM, Phill Tornroth wrote:

> I'm puzzled on whether or not Solr is the right system for solving this
> problem I've got. I'm using some Solr indexes for autocompletion, and I
> have a desire to rank the results by their value to the requesting user.
> Essentially, I'll tally the number of times the user has chosen particular
> results, and I have a need to include that value in the process of sorting
> and limiting results.
>
> This doesn't seem like an atypical request, but I'm wondering how Solr
> experts suggest it be done. It seems impractical to hold my scores elsewhere,
> ask Solr for unlimited results, and then do the ordering/limiting on my
> side... but I don't see an obvious way to do this within Solr itself, though
> the JOIN functionality and the Function Query stuff look like they might be
> part of the right solution.
>
> Any help would be greatly appreciated.
>
> Thanks!
>
> Phill
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-17 Thread Mikhail Khludnev
Sure, it does:

http://my.safaribooksonline.com/book/web-development/9781847195883/indexing-data/ch03lvl1sec03#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODE4NDcxOTU4ODMvNjg=
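Concretely, one way to let the Solr server on C fetch the content itself (a sketch; check your version's solrconfig for the exact attributes, and note the security implications) is Solr's remote streaming support:

```xml
<!-- solrconfig.xml: enable remote streaming so the server pulls the data -->
<requestDispatcher>
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"/>
</requestDispatcher>
```

With that enabled, the SolrJ client on B can send just `stream.url=file:///mnt/fileserverA/big.txt` (or an http:// URL the server on C can reach) instead of streaming the file body over the wire. Be aware this lets any client ask the server to read arbitrary URLs, so restrict access accordingly.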

On Sat, Mar 17, 2012 at 2:55 AM, vybe3142  wrote:

> Hi,
> Is there a way for SOLR / SOLRJ to index files directly, bypassing HTTP
> streaming?
>
> Use case:
> * Text Files to be indexed are on file server (A) (some potentially large -
> several 100 MB)
> * SOLRJ client is on server (B)
> * SOLR server is on server (C) running with dynamically created SOLR cores
>
> Looking at how ContentStreamUpdateRequest is typically used in SOLRJ, it
> looks like the files would be read from A to the client on B (across the
> wire) and then sent across the wire via an HTTP request (in the body) to C
> to be indexed.
>
> Is there a more efficient way to accomplish this i.e. pass a path to the
> file when making the request from B so that the SOLR server on C can read
> directly from file server A ?
>
> Thanks
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3833419.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


Re: maxClauseCount Exception

2012-03-17 Thread Darren Govoni
Thanks for the tip Hoss.

I noticed that it appears only sometimes, and varied because my index runs
would sometimes produce different numbers of docs, etc.

All I want is a simple "all docs with something in this field", with that
field highlighted.

Is the query expansion to "all possible terms in the index" really
necessary? I could have hundreds of thousands of possible terms. Why should
they all become explicit query elements? That seems like overkill, and it
performs poorly.

Is there another way to do this with Lucene, or not really?

On Thu, 2012-03-08 at 16:18 -0800, Chris Hostetter wrote:
> :   I am suddenly getting a maxClauseCount exception for no reason. I am
> : using Solr 3.5. I have only 206 documents in my index.
> 
> Unless things have changed, the reason you are seeing this is because 
> _highlighting_ a query (clause) like "type_s:[*+TO+*]" requires rewriting 
> it into a giant boolean query of all the terms in that field -- so even if 
> you only have 206 docs, if you have more than 206 values in that field in 
> your index, you're going to go over 1024 terms.
> 
> (you don't get this problem in a basic query, because it doesn't need to 
> enumerate all the terms; it rewrites to a ConstantScoreQuery)
> 
> what you most likely want to do is move some of those clauses, like 
> "type_s:[*+TO+*]" and "usergroup_sm:admin", out of your main "q" query and 
> into "fq" filters ... so they can be cached independently, won't 
> contribute to scoring (just matching) and won't be used in highlighting.
> 
> : 
> params={hl=true&hl.snippets=4&hl.simple.pre=&fl=*,score&hl.mergeContiguous=true&hl.usePhraseHighlighter=true&hl.requireFieldMatch=true&echoParams=all&hl.fl=text_t&q={!lucene+q.op%3DOR+df%3Dtext_t}+(+kind_s:doc+OR+kind_s:xml)+AND+(type_s:[*+TO+*])+AND+(usergroup_sm:admin)&rows=20&start=0&wt=javabin&version=2}
>  hits=204 status=500 QTime=166 |#]
> 
> : [#|2012-02-22T13:40:13.131-0500|SEVERE|glassfish3.1.1|
> : org.apache.solr.servlet.SolrDispatchFilter|
> : _ThreadID=22;_ThreadName=Thread-2;|org.apache.lucene.search.BooleanQuery
> : $TooManyClauses: maxClauseCount is set to 1024
> : at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
>   ...
> : at
> : org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:304)
> : at
> : 
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
> 
> -Hoss
> 
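Hoss's suggestion amounts to splitting the request roughly like this (a sketch based on the params in the log above, URL-decoded for readability):

```
# before: everything in q, so highlighting rewrites the [* TO *] range
q={!lucene q.op=OR df=text_t}(kind_s:doc OR kind_s:xml) AND (type_s:[* TO *]) AND (usergroup_sm:admin)

# after: q keeps only the scored clauses; the rest become cached filters
q={!lucene q.op=OR df=text_t}(kind_s:doc OR kind_s:xml)
fq=type_s:[* TO *]
fq=usergroup_sm:admin
```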




Re: Any way to get reference to original request object from within Solr component?

2012-03-17 Thread SUJIT PAL
Thanks Pravesh,

Yes, converting the myparam values into a single (comma-separated) field is
probably the best approach, but as I mentioned, it is probably a bit too late
for that to be practical in my case... 

The myparam parameters are facet filter queries, and so far order did not 
matter, since the filters were just AND-ed together, applied to the result 
set, and the facets were returned in count order. But now the requirement is 
to "bubble up" the selected facets so that the most recently selected one is 
on top. This was uncovered during user-acceptance testing (since the client 
shows only the top N facets, which causes the currently selected facet to 
disappear once it is no longer within the top N facets).

Asking the client to switch to a single comma-separated field is an option, but 
it's the last option at this point, so I was wondering if it was possible to 
switch to some other data structure, or at least get a handle on the original 
HTTP servlet request from within the component, so I could grab the parameters 
from there.

I noticed that the /select call does preserve the order of the parameters, but 
that is probably because it is executed by SolrServlet, which gets its 
parameters from the HttpServletRequest.

I guess I will just have to run the request through a debugger and see where 
exactly the parameter order gets scrambled... I'll update this thread if I 
find out.

Meanwhile, if any of you have simpler alternatives, I would really appreciate 
hearing them...

Thanks,
-sujit

On Mar 17, 2012, at 12:01 AM, pravesh wrote:

> Hi Sujit,
> 
> The HTTP parameter ordering is handled above the Solr level, so I don't think it
> can be controlled at the Solr level.
> You can append all the required values into a single HTTP parameter and then
> split it back apart at your component level.
> 
> Regds
> Pravesh
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Any-way-to-get-reference-to-original-request-object-from-within-Solr-component-tp3833703p3834082.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-17 Thread Mark Miller
Nodes talk to ZooKeeper as well as to each other. You can see the addresses 
they are trying to use to communicate with each other in the 'cloud' view of 
the Solr Admin UI. Sometimes you have to override these, as the detected 
default may not be an address that other nodes can reach. As a limited example: 
for some reason my mac cannot talk to my linux box with its default detected 
host address of halfmetal:8983/solr - but the mac can reach my linux box if I 
use halfmetal.Local - so I have to override the published address of my linux 
box using the host attribute if I want to set up a cluster between my macbook 
and linux box.

Each node talks to ZooKeeper to learn about the other nodes, including their 
addresses. Recovery is then done node to node using the appropriate addresses.
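For reference, the override Mark mentions looks roughly like this in a Solr 4.x-era solr.xml (a sketch; attribute names and defaults may differ slightly by version):

```xml
<!-- solr.xml: publish a host name other nodes can actually reach -->
<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="halfmetal.Local" hostPort="${jetty.port:8983}" hostContext="solr">
    <!-- core definitions ... -->
  </cores>
</solr>
```

The published address shows up in the cloud view of the admin UI, so you can verify there that each node registered something its peers can resolve.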


- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

> I'm still having issues replicating in my work environment. Can anyone
> explain how the replication mechanism works? Is it communicating across
> ports or through zookeeper to manage the process?
> 
> 
> 
> 
> On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
> 
>> All,
>> 
>> I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
>> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through Manifold
>> using its crawler, and it looks like it's replicating fine once the
>> documents are committed.
>> 
>> This must be related to my environment somehow. Thanks for your help.
>> 
>> Regards,
>> 
>> Matt
>> 
>> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson 
>> wrote:
>> 
>>> Matt:
>>> 
>>> Just for paranoia's sake, when I was playing around with this (the
>>> _version_ thing was one of my problems too) I removed the entire data
>>> directory as well as the zoo_data directory between experiments (and
>>> recreated just the data dir). This included various index.2012
>>> files and the tlog directory on the theory that *maybe* there was some
>>> confusion happening on startup with an already-wonky index.
>>> 
>>> If you have the energy and tried that it might be helpful information,
>>> but it may also be a total red-herring
>>> 
>>> FWIW
>>> Erick
>>> 
>>> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller 
>>> wrote:
> I assume the Windows configuration looked correct?
 
 Yeah, so far I cannot spot any smoking gun... I'm confounded at the
>>> moment. I'll re-read through everything once more...
 
 - Mark
>>> 
>> 
>> 



Re: Adding a 'Topics' pulldown for refined initial searches.

2012-03-17 Thread Lance Norskog
Usually these are just 'facets', with a one-to-one mapping to the
document: you add a category field and store a string in it for each
document. Then, when you do a search, add parameters to include facets.
Here is the main wiki page:
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/SimpleFacetParameters

For more information on facets, or anything else in Solr, go here:
http://www.lucidimagination.com/search/?q=facet#%2Fp%3Asolr
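As a minimal sketch (the field name `category` and the value `Networking` are illustrative), the request side looks like:

```
# show topic counts alongside the results:
/select?q=drupal&facet=true&facet.field=category&facet.mincount=1

# limit the search itself to one topic chosen from the pull-down:
/select?q=drupal&fq=category:Networking
```

The pull-down in your screenshot maps naturally onto the `fq` filter, while the facet counts can populate the pull-down's options.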

On Fri, Mar 16, 2012 at 10:47 AM, Valentin, AJ  wrote:
> Hello all,
>
> Yesterday was my first time using this (or any) email list, and I think I did 
> something wrong, so I will try this again.
>
>
> I have installed Solr search on my Drupal 7 installation.  Currently, it 
> works as an 'All' search tool.  I'd like to limit the scope of the search 
> with an available pull-down to set the topic for searching.
>
> If I've researched correctly, I think my term 'scope' or 'topic' is the same 
> as 'clustering'...I may be wrong.
>
> Here is a link to a screenshot for what I have to get implemented soon.
> http://imageupload.org/en/file/200809/solr-scope.png.html
>
> Regards,
> AJ
>
>
> CONFIDENTIALITY NOTICE:  This email constitutes an electronic communication 
> within the meaning of the Electronic Communications Privacy Act, 18 U.S.C. 
> 2510, and its disclosure is strictly limited to the named recipient(s) 
> intended by the sender of this message.  This email, and any attachments, may 
> contain confidential and/or proprietary information of Scientific Research 
> Corporation.  If you are not a named recipient, any copying, using, 
> disclosing or distributing to others the information in this email and 
> attachments is STRICTLY PROHIBITED.  If you have received this email in 
> error, please notify the sender immediately and permanently delete the email, 
> any attachments, and all copies thereof from any drives or storage media and 
> destroy any printouts or hard copies of the email and attachments.
>
> EXPORT COMPLIANCE NOTICE:  This email and any attachments may contain 
> technical data subject to U.S export restrictions under the International 
> Traffic in Arms Regulations (ITAR) or the Export Administration Regulations 
> (EAR).  Export or transfer of this technical data and/or related information 
> to any foreign person(s) or entity(ies), either within the U.S. or outside of 
> the U.S., may require advance export authorization by the appropriate U.S. 
> Government agency prior to export or transfer.  In addition, technical data 
> may not be exported or transferred to certain countries or specified 
> designated nationals identified by U.S. embargo controls without prior export 
> authorization.  By accepting this email and any attachments, all recipients 
> confirm that they understand and will comply with all applicable ITAR, EAR 
> and embargo compliance requirements.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-17 Thread Bill Bell
DIH sets the time of update to the start time, not the end time.

So when the index is rebuilt, if you run a delta import and use that update
time, you should be okay. We normally go back a few minutes as a fail-safe,
to make sure we catch everything.

Sent from my Mobile device
720-256-8076

On Mar 14, 2012, at 12:58 PM, KeesSchepers  wrote:

> Hello everybody,
> 
> I am designing a new Solr architecture for one of my clients. This Solr
> architecture is for a high-traffic website with millions of visitors, but I am
> facing some design problems where I hope you guys could help me out.
> 
> In my situation there are 4 Solr servers running, 1 server is master and 3
> are slave. They are running Solr version 1.4.
> 
> I use two cores, 'live' and 'rebuild', and I use the Solr DIH to rebuild a
> core, which goes like this:
> 
> 1. I wipe the rebuild core
> 2. I run the DIH over the complete dataset (4 million documents) in pieces of
> 20,000 records (to prevent very long MySQL locks)
> 3. After the DIH is finished (2 hours), we also have to update the
> rebuild core with changes from the last two hours; this is a problem
> 4. After updating is done, and the core is no more than a few seconds behind,
> we want to SWAP the cores.
> 
> Everything goes well except for step 3. The rebuild and the core swap are all
> okay. 
> 
> Because the website changes every minute, we cannot pause the delta-import
> on the live core and fall behind for 2 hours. The problem is that I can't
> figure out a catch-up scheme that doesn't delay the live core too long,
> while still using the DIH instead of writing a lot of code.
> 
> Did anyone face this problem before or could give me some tips?
> 
> Thanks!
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html
> Sent from the Solr - User mailing list archive at Nabble.com.