DataImport using SqlEntityProcessor running Out of Memory

2014-05-11 Thread O. Olson
I have a hierarchical data schema, i.e. an entity with a number of
attributes. For a small subset of the data - about 300 MB - I can do the
import with 3 GB of memory. But with the entire 4 GB dataset, I cannot
complete the import even with 9 GB of memory.
I am using the SqlEntityProcessor as below:

What is the best way to import this data? Doing it without a cache results
in many SQL queries; with the cache, I run out of memory.
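
For reference, a cached child-entity configuration of the kind described typically looks roughly like this. This is only an illustrative sketch - the table, column, and field names below are assumptions, not the poster's actual configuration:

```
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/catalog" user="solr" password="solr"/>
  <document>
    <entity name="product" query="SELECT id, name FROM products">
      <!-- The child entity is cached entirely in the Java heap; with 400-500
           attributes per entity this is where the import can run out of memory -->
      <entity name="attribute"
              processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="product_id"
              cacheLookup="product.id"
              query="SELECT product_id, attr_name, attr_value FROM attributes"/>
    </entity>
  </document>
</dataConfig>
```

SortedMapBackedCache holds every cached row as Java objects on the heap, so the in-memory footprint is typically several times the raw data size - which is consistent with a 4 GB dataset failing in a 9 GB heap. The usual alternatives are a disk-backed DIH cache (where available) or restructuring the import so child rows are streamed rather than cached.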

I’m curious why 4GB of data cannot entirely fit in memory. One thing I need
to mention is that I have about 400 to 500 attributes. 

Thanks in advance for any helpful advice. 
O. O. 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: not getting any mails

2014-05-11 Thread Erick Erickson
There was an infrastructure problem, _nobody_ was getting e-mails. I
think it's fixed now.

But the backlog will take a while to work through.

Erick

On Sat, May 10, 2014 at 5:30 AM, Aman Tandon  wrote:
> Hi,
>
> I am not getting any mails from this group; did my subscription just get
> ended? Is there anybody who can help?
>
> With Regards
> Aman Tandon


retreive all the fields in join

2014-05-11 Thread Aman Tandon
Hi,

Is there a way to retrieve all the fields present in both
cores (core1 and core2)?

e.g.
core1: {id:111,name: "abc" }

core2: {page:17, type: "fiction"}

What I want is that, on querying both cores, I retrieve results
containing all 4 fields: id and name from core1, and page and type from
core2. Is it possible?
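
For context, Solr's cross-core join uses the {!join} query parser; a sketch (assuming both cores share an id field, which is not stated above) looks like:

```
q={!join from=id to=id fromIndex=core2}type:fiction
```

Note, though, that a join only filters the documents of the core being queried; the returned documents and their fields come from that core alone, so getting id, name, page, and type in a single response generally requires denormalizing the data into one core at index time.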

With Regards
Aman Tandon


Replica as a "leader"

2014-05-11 Thread adfel70
Solr & Collection Info:
solr 4.8, 4 shards, 3 replicas per shard, 30-40 million docs per shard.

Process:
1. Indexing 100-200 docs per second.
2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
indexing).
3. Indexing for 10-20 minutes and doing hard commit. 
4. Doing Pkill -9 java to the leader and then starting one replica in shard
3 (while indexing).
5. After 20 minutes starting another replica in shard 3 ,while indexing (not
the leader in step 1). 

Results:
2. Only the leader is active in shard 3.
3. Thousands of docs were added to the leader in shard 3.
4. After starting the replica, its state was down, and after 10 minutes it
became the leader in the cluster state (and was still down). No servers hosting
shards for index and search requests.
5. After starting another replica, its state was recovering for 2-3 minutes
and then it became active (not leader in the cluster state).
6. Index, commit and search requests are handled by the other replicas
(*active status, not leader!!!*).


Expected:
5. To stay in down status.
*6. Not to handle index, commit and search requests - no servers hosting
shards!*

Thanks!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replica-as-a-leader-tp4135077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr, How to index scripts *.sh and *.SQL

2014-05-11 Thread Gora Mohanty
On 8 May 2014 12:25, Visser, Marc  wrote:
>
> HI All,
> Recently I have set up an image with SOLR. My goal is to index and extract 
> files on a Windows and Linux server. It is possible for me to index and 
> extract data from multiple file types. This is done by the SOLR CELL request 
> handler. See the post.jar cmd below.
>
> java -Dauto -Drecursive -jar post.jar Y:\
> SimplePostTool version 1.5
> Posting files to base url localhost:8983/solr/update..
> Entering auto mode. File endings considered are
> xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> Entering recursive mode, max depth=999, delay=0s
> 0 files indexed.
>
> Is it possible to index and extract metadata/content from file types like .sh 
> and .sql? If it is possible I would like to know how of course :)

Don't know about Windows, but on Linux these are just text files. What
metadata are you referring to? Normally, a Linux text file only has content,
unless you are talking about metadata such as obtained from:
   file cmd.sh

Regards,
Gora


is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-05-11 Thread Gharbi Mohamed
Hi,

I am using Solr for searching magento products in my project,
I want to know, is it possible for solr to calculate and give back the price
of a product based on its sub-products(items);

For instance, i have a product P1 and it is the parent of items m1, m2.
i need to get the minimal price of items and return it as a price of product
P1.

I'm wondering if that is possible, can you help me ?
I need to know if solr can do that or if there is a feature or a way to do
it ?
And finally i thank you!

regards,
Mohamed.

 



Is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-05-11 Thread gharbi mohamed
Hi,

I am using Solr for searching magento products in my project,
I want to know, is it possible for solr to calculate and give back the
price of a product based on its sub-products(items);

For instance, i have a product P1 and it is the parent of items m1, m2.
i need to get the minimal price of items and return it as a price of
product P1.

I'm wondering if that is possible ?
I need to know if solr can do that or if there is a feature or a way to do
it ?
And finally i thank you!

regards,
Mohamed.


Re: not getting any mails

2014-05-11 Thread Ahmet Arslan



Hi Aman,

It's not just you. There was a general problem with the Apache mailing lists, but it
is fixed now.
Please see for more info: https://blogs.apache.org/infra/entry/mail_outage

Ahmet


On Sunday, May 11, 2014 7:41 AM, Aman Tandon  wrote:
Hi,

I am not getting any mails from this group; did my subscription just get
ended? Is there anybody who can help?

With Regards
Aman Tandon



solr 4.2.1 spellcheck strange results

2014-05-11 Thread HL

Hi

I am querying the Solr server's spellcheck and, although at first glance the
results I get back look OK, it seems like Solr is replying as if it had made
the search with the wrong key.


So while I query the server with the word
"καρδυα",
Solr responds as if it had queried the database with the word
"καρδυ", eliminating the last char.

---



---

Ideally, Solr should properly indicate that the suggestions correspond 
with "καρδυα" rather than "καρδυ".


Is there a way to make Solr respond with the original search word from
the query in its response, instead of the one that is actually getting the
hits?


Regards,
Harry



here is the complete Solr response
---


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">23</int>
    <lst name="params">
      <str name="spellcheck">true</str>
      <str name="fl">*,score</str>
      <str name="start">0</str>
      <str name="q">καρδυα</str>
      <str name="spellcheck.q">καρδυα</str>
      <str name="qf">title_short^750 title_full_unstemmed^600 title_full^400 title^500
        title_alt^200 title_new^100 series^50 series2^30 author^300
        author_fuller^150 contents^10 topic_unstemmed^550 topic^500
        geographic^300 genre^300 allfields_unstemmed^10 fulltext_unstemmed^10
        allfields fulltext isbn issn</str>
      <str name="spellcheck.dictionary">basicSpell</str>
      <str name="defType">dismax</str>
      <str name="wt">xml</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="καρδυ">
        <int name="numFound">3</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">καρδ</str><int name="freq">5</int></lst>
          <lst><str name="word">καρδι</str><int name="freq">3</int></lst>
          <lst><str name="word">καρυ</str><int name="freq">1</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>






Re: SWF content not indexed

2014-05-11 Thread Ahmet Arslan



Hi,

Solr/Lucene only deals with text. There are some other projects that extract
text from rich documents.
Solr Cell uses http://tika.apache.org for extraction. Maybe Tika (or some other
tool) can already extract text from SWF?


On Sunday, May 11, 2014 9:40 AM, Mauro Gregorio Binetti 
 wrote:
Hi guys,
how can I make it possible to index the content of SWF files? I'm using Solr
3.6.0.

Regards,
Mauro


Re: LetterTokenizerFactory doesn't work as expected

2014-05-11 Thread Ahmet Arslan
Hi,

Are you really using the letter tokenizer? You should see "LT" as its
abbreviation on the analysis page.

      <fieldType class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.LetterTokenizerFactory"/>
        </analyzer>
      </fieldType>

Ahmet





On Sunday, May 11, 2014 12:42 PM, ienjreny  wrote:
Dears: I am applying LetterTokenizerFactory as it is mentioned at the following
link: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LetterTokenizerFactory
But when I am using the analyzer on "I can't" the results are: LCF | text | i can't.
But in the documentation it is mentioned to be "i", "can", "t". Is it by
mistake, or is there something wrong with my schema code?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/LetterTokenizerFactory-doesn-t-work-as-expected-tp4135082.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with getting tv.tf_idf values.

2014-05-11 Thread Silvia Suárez
Dear all,

I'm trying to get tv.tf_idf values for my Solr documents from the term
vector component. I am using SolrJ.

However, it is not working.

I'm setting the values to true (tv.tf_idf) in my search:

public static QueryResponse doQuery(SolrServer server, String queryEnt,
    String myCollection) throws SolrServerException {
  SolrQuery solrQuery = new SolrQuery();
  solrQuery.setQuery(queryEnt);
  solrQuery.set("collectionName", myCollection);
  solrQuery.addHighlightField("texto")
           .addHighlightField("titular")
           .setHighlightSnippets(20)
           .setHighlightFragsize(0);
  solrQuery.set("hl.useFastVectorHighlighter", true);
  solrQuery.set("hl.fragListBuilder", "single");
  solrQuery.set("tv.fl", "titular");
  solrQuery.set("tv.tf", true);
  solrQuery.set("tv.tf_idf", true);
  solrQuery.set("hl.fragmentsBuilder", "colored");
  solrQuery.setHighlightSimplePre("");
  solrQuery.setHighlightSimplePost("");
  solrQuery.set("fl", "c_noticia titular texto");
  solrQuery.setStart(start);
  solrQuery.setRows(nbDocuments);

  return server.query(solrQuery);
}

and then I am using the code in:
http://stackoverflow.com/questions/8977852/how-to-parse-the-termvectorcomponent-response-to-which-java-object,
to get the values:

public static void printOr(QueryResponse response) throws
    SolrServerException, ParseException, IOException {
  Iterator<Entry<String, Object>> termVectors =
      ((NamedList<Object>) response.getResponse().get("termVectors")).iterator();
  while (termVectors.hasNext()) {
    Entry<String, Object> docTermVector = termVectors.next();
    for (Iterator<Entry<String, Object>> fi =
             ((NamedList<Object>) docTermVector.getValue()).iterator(); fi.hasNext(); ) {
      Entry<String, Object> fieldEntry = fi.next();
      if (fieldEntry.getKey().equals("contents")) {
        for (Iterator<Entry<String, Object>> tvInfoIt =
                 ((NamedList<Object>) fieldEntry.getValue()).iterator(); tvInfoIt.hasNext(); ) {
          Entry<String, Object> tvInfo = tvInfoIt.next();
          NamedList<Object> tv = (NamedList<Object>) tvInfo.getValue();
          System.out.println("Vector Info: " + tvInfo.getKey()
              + " tf: " + tv.get("tf") + " df: " + tv.get("df")
              + " tf-idf: " + tv.get("tf-idf"));
        }
      }
    }
  }
}


But I'm getting this error:

Exception in thread "main" java.lang.ClassCastException: java.lang.String
cannot be cast to org.apache.solr.common.util.NamedList

Could anybody please help me with this problem?

thanks a lot in advance.

Silvia


solrcore.properties in solrcloud

2014-05-11 Thread Suchi Amalapurapu
Hi
Can someone clarify the role of solrcore.properties in Solr 4.3+?
We can define custom properties in solrcore.properties and use them in
solrconfig.xml
Will this functionality be retained going forward?
Suchi


Is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-05-11 Thread gharbi mohamed
Hi,

I am using Solr for searching magento products in my project,
I want to know, is it possible for solr to calculate and give back the
price of a product based on its sub-products(items);

For instance, i have a product P1 and it is the parent of items m1, m2.
i need to get the minimal price of items and return it as a price of
product P1.

I'm wondering if that is possible ?
I need to know if solr can do that or if there is a feature or a way to do
it ?
And finally i thank you!

regards,
Mohamed.

-- 

*Mohamed GHARBI*

*Software engineering intern at DECADE-Tunisia*


Re: timeAllowed in not honoring

2014-05-11 Thread Aman Tandon
Apologies for the late reply. Thanks, Toke, for a great explanation :)
I am new to Solr and unaware of DocValues - could you please explain?

With Regards
Aman Tandon


On Fri, May 2, 2014 at 1:52 PM, Toke Eskildsen wrote:

> On Thu, 2014-05-01 at 23:03 +0200, Aman Tandon wrote:
> > So can you explain how enum is faster than default.
>
> The fundamental difference is that enum iterates terms and counts how
> many of the documents associated to the terms are in the hits, while fc
> iterates all hits and updates a counter for the term associated to the
> document.
>
> A bit too simplified we have enum: terms->docs, fc: hits->terms. enum
> wins when there are relatively few unique terms and is much less
> affected by index updates than fc. As Shawn says, you are best off by
> testing.
>
> > We are planning to move to SolrCloud with the version solr 4.7.1, so does
> > this 14 GB of RAM will be sufficient? or should we increase it?
>
> Switching to SolrCloud does not change your fundamental memory
> requirements for searching. The merging part adds some overhead, but
> with a heap of 14GB, I would be surprised if that would require an
> increase.
>
> Consider using DocValues for facet fields with many unique values, for
> getting both speed and low memory usage at the cost of increased index
> size.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
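
To illustrate the DocValues suggestion: it is a per-field schema.xml attribute (available since Solr 4.2), enabled roughly like this (the field name here is an assumption) and followed by a full reindex:

```
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
```

Faceting on such a field then reads a column-oriented on-disk structure instead of building the in-memory FieldCache, trading some index size for lower heap usage.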


Re: LetterTokenizerFactory doesn't work as expected

2014-05-11 Thread Ismaeel Enjreny
I found the error based on your feedback.

I was using LetterTokenizerFactory as a <filter> inside the analyzer
instead of as a <tokenizer>. I made the change and it is working fine.

Thanks for your support


On Sun, May 11, 2014 at 3:50 PM, Ahmet Arslan  wrote:

> Hi,
>
> Are you really using letter tokenizer? You should see LT as abbreviation
> in analysis page.
>
> <fieldType class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.LetterTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> Ahmet
>
>
>
>
>
> On Sunday, May 11, 2014 12:42 PM, ienjreny 
> wrote:
> Dears: I am applying LetterTokenizerFactory as it is mentioned at the following
> link: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LetterTokenizerFactory
> But when I am using the analyzer on "I can't" the results are: LCF | text | i can't.
> But in the documentation it is mentioned to be "i", "can", "t". Is it by
> mistake, or is there something wrong with my schema code?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/LetterTokenizerFactory-doesn-t-work-as-expected-tp4135082.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Why Solrj Default Format is XML Rather than Javabin?

2014-05-11 Thread Shawn Heisey
On 5/9/2014 6:11 AM, Furkan KAMACI wrote:
> When I index documents via Solrj it sends documents as XML. Solr processes
> it with XMLLoader. Then sends response as javabin format.
> 
> Why Solrj client does not send data as javabin format as default?
> 
> PS: I use Solr 4.5.1

I do not know the history here.  There was probably a good reason at one
time, and it's stayed like that because of inertia -- when things work,
changing them is risky.

It is easy to change when you use SolrJ, if you're using HttpSolrServer.

server.setRequestWriter(new BinaryRequestWriter());

I believe that in newer versions, CloudSolrServer actually does use
BinaryRequestWriter by default.

Thanks,
Shawn



Re: is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-05-11 Thread Walter Underwood
Do that at index time. Index a document that has the product, the child 
products, and the total price.

Flatten your data to match what the search results will display. Solr is not a 
relational system, so don't design with normalized tables, joins, and 
calculated result columns.
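
As a sketch of that flattening (all field names here are assumptions), the indexed document for P1 would carry the pre-computed price, e.g. the minimum over its items:

```
{
  "id": "P1",
  "name": "Parent product",
  "child_skus": ["m1", "m2"],
  "child_prices": [24.99, 19.99],
  "price": 19.99
}
```

Here "price" is min(child_prices), computed by the indexing code before the document is sent to Solr; searching, sorting, and display then use the ordinary price field.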

wunder

On May 9, 2014, at 12:29 PM, "Gharbi Mohamed"  
wrote:

> Hi,
> 
> I am using Solr for searching magento products in my project,
> I want to know, is it possible for solr to calculate and give back the price
> of a product based on its sub-products(items);
> 
> For instance, i have a product P1 and it is the parent of items m1, m2.
> i need to get the minimal price of items and return it as a price of product
> P1.
> 
> I'm wondering if that is possible, can you help me ?
> I need to know if solr can do that or if there is a feature or a way to do
> it ?
> And finally i thank you!
> 
> regards,
> Mohamed.
> 






Re: ContributorsGroup add request

2014-05-11 Thread Shawn Heisey
On 5/10/2014 6:35 PM, Jim Martin wrote:
>Please add me to the ContributorsGroup; I've got some Solr icons I'd
> like to suggest to the community. Perhaps down the road I can contribute
> more. I'm the team lead at Overstock.Com for search, and Solr is the
> foundation of what we do.
> 
>Username: JamesMartin

I went to add you, but someone else has already done so.

It's entirely possible that because of the Apache email outage, they
have already replied, but the message hasn't made it through to the list
yet.  I'm adding you as a CC here (which I normally don't do) so that
you'll get notified faster.

Thanks,
Shawn



Re: Inconsistent response from Cloud Query

2014-05-11 Thread Shawn Heisey
On 5/9/2014 11:42 AM, Cool Techi wrote:
> We have noticed Solr returns inconsistent results during replica recovery,
> and not all replicas are in the same state; so when your query goes to a
> replica which might be recovering or still copying the index, the counts
> may differ.
> regards, Ayush

SolrCloud should never send requests to a replica that is recovering.
If that is happening (which I think is unlikely), then it's a bug.

If *you* send a request to a replica that is still recovering, I would
expect SolrCloud to redirect the request elsewhere unless distrib=false
is used.  I'm not sure whether that actually happens, though.

Thanks,
Shawn



Website running Solr

2014-05-11 Thread Olivier Austina
Hi All,
Is there a way to know if a website use Solr? Thanks.
Regards
Olivier


Re: Website running Solr

2014-05-11 Thread Ahmet Arslan
Hi,

Some site owners put themselves here :

https://wiki.apache.org/solr/PublicServers



Besides, I would try the *:* match-all-docs query.

Ahmet


On Sunday, May 11, 2014 7:55 PM, Olivier Austina  
wrote:
Hi All,
Is there a way to know if a website use Solr? Thanks.
Regards
Olivier


Re: LetterTokenizerFactory doesn't work as expected

2014-05-11 Thread Jack Krupansky
Please post your full field type analyzer. The letter tokenizer should in 
fact return "I", "can", and "t" - if it is used properly.


-- Jack Krupansky

-Original Message- 
From: ienjreny

Sent: Saturday, May 10, 2014 8:28 AM
To: solr-user@lucene.apache.org
Subject: LetterTokenizerFactory doesn't work as expected

Dears: I am applying LetterTokenizerFactory as it is mentioned at the following
link: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LetterTokenizerFactory
But when I am using the analyzer on "I can't" the results are: LCF | text | i can't.
But in the documentation it is mentioned to be "i", "can", "t". Is it by
mistake, or is there something wrong with my schema code?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/LetterTokenizerFactory-doesn-t-work-as-expected-tp4135082.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-05-11 Thread Jack Krupansky
Are these really sub-products? It sounds like you have the same product from 
multiple vendors and just want to return the lowest cost vendor, with no 
actual calculation. Either way, there is no direct method for this 
calculation/selection in Solr. Sure, you can always code a custom query 
component and/or custom function query value source, but this sort of sounds 
like an operation that you should be doing in your application layer rather 
than in Solr itself.


You can certainly sort "ascending" by your price field, then the cheapest 
vendor would be returned first.


In any case, please clarify your use case.

-- Jack Krupansky

-Original Message- 
From: Gharbi Mohamed

Sent: Friday, May 9, 2014 3:29 PM
To: solr-user@lucene.apache.org
Subject: is it possible for solr to calculate and give back the price of a 
product based on its sub-products


Hi,

I am using Solr for searching magento products in my project,
I want to know, is it possible for solr to calculate and give back the price
of a product based on its sub-products(items);

For instance, i have a product P1 and it is the parent of items m1, m2.
i need to get the minimal price of items and return it as a price of product
P1.

I'm wondering if that is possible, can you help me ?
I need to know if solr can do that or if there is a feature or a way to do
it ?
And finally i thank you!

regards,
Mohamed.





Re: Website running Solr

2014-05-11 Thread Paul Libbrecht
Not with certainty, as Solr may be working far behind another set of tools that
make the queries (and nothing in the licensing prevents it).
If you get a piece of software that may have Solr inside, I think the credits
section should include a mention of some sort.
However, there may be hints that a website uses Solr, and that would be by
following its HTTP queries (e.g. with the web inspector) and finding commonalities
with the standard Solr params (see
http://wiki.apache.org/solr/CommonQueryParameters).

paul

> Hi All,
> Is there a way to know if a website use Solr? Thanks.
> Regards
> Olivier



Re: is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-05-11 Thread Mohamed23
Thanks Jack for your response.
Actually, my use case is that the sub-products are the different sizes and
colors of the parent product.
So, in order to show the adequate price for my product in the current search,
I need to return the minimum price of the matching sub-products. In other
words, if I type a product name, I need to get the product with the minimum
price over all its sub-products; and when refining the search, e.g. by
specifying a size, I need to get my product in the results with its price
being the minimum over the sub-products matching that refinement.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-it-possible-for-solr-to-calculate-and-give-back-the-price-of-a-product-based-on-its-sub-products-tp4135212p4135215.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Too many documents Exception

2014-05-11 Thread Erick Erickson
You'll need to shard your index. This assumes that your index doesn't
have a huge percentage of deleted docs. This latter shouldn't be
happening since as you delete docs, the number of deleted docs should
drop as segments are merged.

I suspect you aren't searching on this index yet, because I'm pretty
sure performance will be poor. I've never heard of a Solr node
handling 2B+ docs _on a single node_. There are many sharded
collections that have many more docs, but not a single node.
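
For reference, sharding is chosen at collection-creation time in SolrCloud; a Collections API call of roughly this shape creates a sharded collection (the collection name and shard count here are assumptions):

```
http://localhost:8983/solr/admin/collections?action=CREATE&name=bigindex&numShards=8&replicationFactor=2
```

The 2147483647 limit applies per Lucene index, i.e. per core/shard, so each shard must individually stay below it - the collection as a whole can hold far more.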

Best,
Erick

On Tue, May 6, 2014 at 5:54 PM, [Tech Fun]山崎  wrote:
> Hello everybody,
>
> Solr 4.3.1 (and 4.7.1): Num Docs + Deleted Docs exceeds
> 2147483647 (Integer.MAX_VALUE), giving:
> Caused by: java.lang.IllegalArgumentException: Too many documents,
> composite IndexReaders cannot exceed 2147483647
>
> It seems to be similar to this unresolved e-mail thread:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/browser
>
> How can I fix this?
> Or is this a Solr limitation?
>
>
> log.
>
> ERROR org.apache.solr.core.CoreContainer  – Unable to create core: collection1
> org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.(SolrCore.java:821)
> at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> at 
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1438)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1550)
> at org.apache.solr.core.SolrCore.(SolrCore.java:796)
> ... 13 more
> Caused by: org.apache.solr.common.SolrException: Error opening Reader
> at 
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
> at 
> org.apache.solr.search.SolrIndexSearcher.(SolrIndexSearcher.java:183)
> at 
> org.apache.solr.search.SolrIndexSearcher.(SolrIndexSearcher.java:179)
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1414)
> ... 15 more
> Caused by: java.lang.IllegalArgumentException: Too many documents,
> composite IndexReaders cannot exceed 2147483647
> at 
> org.apache.lucene.index.BaseCompositeReader.(BaseCompositeReader.java:77)
> at 
> org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:368)
> at 
> org.apache.lucene.index.StandardDirectoryReader.(StandardDirectoryReader.java:42)
> at 
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71)
> at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> at 
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
> at 
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> at 
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
> ... 18 more
> ERROR org.apache.solr.core.CoreContainer  –
> null:org.apache.solr.common.SolrException: Unable to create core:
> collection1
> at 
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.

Problem when i'm trying to search something

2014-05-11 Thread Anon15
Hi everyone,

I'm using an updated version of the Solr module for Drupal 7.
I've indexed my entire site, and I have about 8000 documents in the index.

I have a problem when I'm trying to search for something.

For example:
-> "incre" => 0 results
-> "incred" => n results for the word 'incredible'
-> "incredi" => 0 results
-> "incredib" => n results for the word 'incredible'

This is just an example.

How is it possible? Can someone help me with a detailed explanation?

Thanks !




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-when-i-m-trying-to-search-something-tp4135045.html
Sent from the Solr - User mailing list archive at Nabble.com.


Payload use case

2014-05-11 Thread Leonardo Oliveira
Hi, everybody...  This is my first post

I wrote a PDF text extractor able to return text in the following format:

"The|(1,2)(3,4) quick|(5,6)(7,8) brown|(9,10)(11,12) ..."

where

each (x,y) is a two-dimensional coordinate on the page at which the
terms are positioned, i.e.:

"The"
(1,2) is the upper left coordinate of the letter 'T'
(3,4) is the lower right coordinate of the letter 'e'

"quick"
(5,6) is the upper left coordinate of the letter 'q'
(7,8) is the lower right coordinate of the letter 'k'

and so on ...

For text indexing, I plan to store each coordinate as a
payload for each word/term of the sentence. I already know how to store them
through a custom
DelimitedPayloadTokenFilter, but I don't know the best way to read
those payloads at query time: I need to read the payloads of the terms that
match the user's query, so that with this information I'll be able to
highlight the words found on the user's screen.

I don't want to apply highlighting to the text, as the default
Highlighter or
FastVectorHighlighter do, but over the image (thumbnail), i.e. I want a
2-dimensional, payload-based highlighter. This way I would not need to store
the original text, which decreases index size; moreover, it improves the user
experience with a "visual highlighted text fragment".

My question is: am I making proper use of payloads for my use case? Or
should I use another
strategy to store those coordinates so I can read them at query time?

Would I have performance issues if I need to read a lot of payloads
that match the
user's query?

Are payloads part of the Lucene cache?

Should payloads be used only for relevance purposes, with a custom
implementation of the Similarity class?

How can I use coordinates as "term offsets"? Because in this case, my
"offset" is relative to a global Cartesian axis, not based on a global
offset into the source text.

Thank you for listening.

Regards


Re: Problem when i'm trying to search something

2014-05-11 Thread Doug Turnbull
Hey Anon15

It sounds like you're encountering a heuristic known as stemming. Stemming
takes variants of words and reduces them to a base form. For example
jumped
jumping
jumper

might all get reduced to "jump"

So now when a user searches for "jump" they can retrieve all docs with the
various forms of the word "jump".

There are various algorithms that Solr/Lucene use for stemming. They all
represent heuristics that work well for most cases. Some stemmers are very
aggressive in how they reduce to a common form; others are less aggressive.
You can read more about Solr/Lucene's stemmers here:

https://wiki.apache.org/solr/LanguageAnalysis

All of these are configured in your Solr schema in the corresponding field
type's analysis chain. Do you know what field type is being used for this
field you are searching? Would you be interested in sharing, and we could
give you even more info?
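
For reference, the analysis chain lives in the field type definition in schema.xml; a typical stemming chain looks roughly like this (the type name and choice of stemmer are assumptions):

```
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the query string goes through the same chain, a prefix like "incred" can happen to stem to the same form as "incredible", while "incredi" stems to something different - which matches the hit/no-hit pattern described above.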

Cheers,
-Doug
---
Search Consultant
OpenSource Connections
http://opensourceconnections.com



On Wed, May 7, 2014 at 5:28 AM, Anon15  wrote:

> Hi everyone,
>
> I'm using an updated version of Solr module for Drupal 7.
> I've indexed my entire site, and i have about 8000 documents in index.
>
> I have a problem when I'm trying to research something.
>
> For example:
> -> "incre" => 0 results
> -> "incred" => n results for the word 'incredible'
> -> "incredi" => 0 results
> -> "incredib" => n results for the word 'incredible'
>
> This is just an example.
>
> How is it possible? Someone can help me with an detailed explanation?
>
> Thanks !
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-when-i-m-trying-to-search-something-tp4135045.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections 


Re: retreive all the fields in join

2014-05-11 Thread Aman Tandon
please help me out here!!

With Regards
Aman Tandon


On Sun, May 11, 2014 at 1:44 PM, Aman Tandon wrote:

> Hi,
>
> Is there a way possible to retrieve all the fields present in both the
> cores(core 1 and core2).
>
> e.g.
> core1: {id:111,name: "abc" }
>
> core2: {page:17, type: "fiction"}
>
> I want is that, on querying both the cores I want to retrieve the results
> containing all the 4 fields, fields id, name from core1 and page, type from
> core2. Is it possible?
>
> With Regards
> Aman Tandon
>


Re: Spatial Score by overlap area

2014-05-11 Thread quercus
Hi,

I have the same requirement for an application I'm working on, and I'm
wondering if you were able to make it work as per David's comment? Thanks.



geoport wrote
> Hi,
> i am using solr 4.6 and i´ve indexed bounding boxes. Now, i want to test
> the "area overlap sorting"
> link
> 
>   
> (slide 23), have some of you an example for me?Thanks for helping me.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-Score-by-overlap-area-tp4116439p4135076.html
Sent from the Solr - User mailing list archive at Nabble.com.