Re: Unique key error while indexing pdf files

2013-07-01 Thread archit2112
Okay. Can you please suggest a way (with an example) of assigning this unique key to a PDF file? Say, a unique number for each PDF file. How do I achieve this? -- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074592.html Sen

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Tony Mullins
I use Nutch as input datasource for my Solr. So I cannot re-run all the Nutch jobs to generate data again for Solr as it will take very long to generate that much data. I was hoping there would be an easier way inside Solr to just re-index all the existing data. Thanks, Tony On Tue, Jul 2, 2013

Re: Unique key error while indexing pdf files

2013-07-01 Thread archit2112
Can you please suggest a way (with example) of assigning this unique key to a pdf file? -- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074588.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Schema design for parent child field

2013-07-01 Thread Mikhail Khludnev
In my experience, deeply nested scopes are almost exclusively a case for SOLR-3076. On Sat, Jun 29, 2013 at 1:08 PM, Sperrink wrote: > Good day, > I'm seeking some guidance on how best to represent the following data > within > a solr schema. > I have a list of subjects which are detailed to n levels. > Each

Re: Converting nested data model to solr schema

2013-07-01 Thread Mikhail Khludnev
On Mon, Jul 1, 2013 at 5:56 PM, adfel70 wrote: > This requires me to override the solr document distribution mechanism. > I fear that with this solution I may lose some of solr cloud's > capabilities. > It's not clear whether you are aware of http://searchhub.org/2013/06/13/solr-cloud-document-rout

Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
not sure if this will help any. Here's the verbose log INFO - 2013-07-01 23:17:08.632; org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration: tika-data-config.xml INFO - 2013-07-01 23:17:08.648; org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded suc

Re: Disable Document Id from being printed in the logs...

2013-07-01 Thread Shawn Heisey
On 7/1/2013 3:24 PM, Niran Fajemisin wrote: I noticed that for Solr 4.2, when an internal call is made between two nodes Solr uses the list of matching document ids to fetch the document details. At this time, it prints out all matching document ids as a part of the query. Is there a way to su

Re: full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name "" too small ORA-22924: snapshot too old

2013-07-01 Thread Michael Della Bitta
I would say definitely investigate the performance of the query, but also since you're using CachedSqlEntityProcessor, you might want to back off on the transaction isolation to READ_COMMITTED, which I think is the lowest one that Oracle supports: http://wiki.apache.org/solr/DataImportHandler#Conf

Disable Document Id from being printed in the logs...

2013-07-01 Thread Niran Fajemisin
Hi all, I noticed that for Solr 4.2, when an internal call is made between two nodes Solr uses the list of matching document ids to fetch the document details. At this time, it prints out all matching document ids as a part of the query. Is there a way to suppress these log statements from bein

Re: Improving performance to return 2000+ documents

2013-07-01 Thread Utkarsh Sengar
Thanks Erick/Jagdish. Just to give some background on my queries. 1. All my queries are unique. A query can be: "ipod" and "ipod 8gb" (but these are unique). These are about 1.2M in total. So, I assume setting a high queryResultCache, queryResultWindowSize and queryResultMaxDocsCached won't help.

Using per-segment FieldCache or DocValues in custom component?

2013-07-01 Thread Michael Ryan
I have some custom code that uses the top-level FieldCache (e.g., FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign this to use the per-segment FieldCaches so that re-opening a Searcher is fast(er). In most cases, I've got a docId and I want to get the value for a part

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Jack Krupansky
Or, go with a commercial product that has a single-click Solr re-index capability, such as: 1. DataStax Enterprise - data is stored in Cassandra and reindexed into Solr from there. 2. LucidWorks Search - data sources are declared so that the package can automatically re-crawl the data source

Re: are fields stored or unstored by default xml

2013-07-01 Thread Jack Krupansky
Correct - the field definitions inherit the attributes of the field type, and it is the field type that has the actual default values for indexed and stored (and other attributes.) -- Jack Krupansky -Original Message- From: Yonik Seeley Sent: Monday, July 01, 2013 3:56 PM To: solr-us
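The inheritance described in this reply can be pictured with a small Python model. This is only a sketch of the lookup order stated in the thread (field, then field type, then a built-in default of true), not Solr's actual code; the dictionaries and the `effective_attribute` helper are hypothetical names made up for illustration.

```python
# Toy model of the lookup order described in the thread. Not Solr code.
# Built-in defaults as stated by the posters: both default to true.
BUILT_IN_DEFAULTS = {"indexed": True, "stored": True}


def effective_attribute(field, field_type, attr):
    """Resolve an attribute: the field wins, then its type, then the default."""
    if attr in field:
        return field[attr]
    if attr in field_type:
        return field_type[attr]
    return BUILT_IN_DEFAULTS[attr]


# Hypothetical field type that overrides 'stored' but says nothing about 'indexed'.
text_type = {"name": "text_general", "stored": False}
# Hypothetical field that specifies neither attribute.
title_field = {"name": "title", "type": "text_general"}

print(effective_attribute(title_field, text_type, "stored"))   # taken from the type
print(effective_attribute(title_field, text_type, "indexed"))  # taken from the default
```

Being explicit on the field itself, as suggested elsewhere in the thread, simply makes the first branch apply.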

Re: "Classic" 4.2 master-slave replication not completing

2013-07-01 Thread Shawn Heisey
On 7/1/2013 1:07 PM, Neal Ensor wrote: is it conceivable that there's too much traffic, causing Solr to stall re-opening the searcher (thus releasing to the new index)? I'm grasping at straws, and this is beginning to bug me a lot. The traffic logs wouldn't seem to support this (apart from peri

Re: are fields stored or unstored by default xml

2013-07-01 Thread Yonik Seeley
On Mon, Jul 1, 2013 at 3:50 PM, Jack Krupansky wrote: > "stored" and "indexed" both default to "true". > > This is legal: > > Actually, for fields I believe the defaults come from the fieldType. The fieldType defaults to true for both indexed and stored if they are not specified there. -Yoni

Re: are fields stored or unstored by default xml

2013-07-01 Thread Jack Krupansky
"stored" and "indexed" both default to "true". This is legal: This detail will be in Early Access Release #2 of my book on Friday. -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Monday, July 01, 2013 2:21 PM To: solr-user@lucene.apache.org Subject: Re: are f

Re: FileDataSource vs JdbcDataSouce (speed) Solr 3.5

2013-07-01 Thread Shawn Heisey
On 7/1/2013 12:56 PM, Mike L. wrote: Hey Ahmet / Solr User Group, I tried using the built in UpdateCSV and it runs A LOT faster than a FileDataSource DIH as illustrated below. However, I am a bit confused about the numDocs/maxDoc values when doing an import this way. Here's my Get comman

Perf. difference when the solr core is 'current' or not 'current'

2013-07-01 Thread jchen2000
in Solr's admin statistics page, there is a 'current' flag indicating whether the core index reader is 'current' or not. According to some discussions in this mailing list a few months back, it wouldn't affect anything. But my observation is completely different. When the current flag was not check

Re: "Classic" 4.2 master-slave replication not completing

2013-07-01 Thread Neal Ensor
is it conceivable that there's too much traffic, causing Solr to stall re-opening the searcher (thus releasing to the new index)? I'm grasping at straws, and this is beginning to bug me a lot. The traffic logs wouldn't seem to support this (apart from periodic health-check pings, the load is dist

Re: FileDataSource vs JdbcDataSouce (speed) Solr 3.5

2013-07-01 Thread Mike L.
Hey Ahmet / Solr User Group, I tried using the built-in UpdateCSV and it runs A LOT faster than a FileDataSource DIH as illustrated below. However, I am a bit confused about the numDocs/maxDoc values when doing an import this way. Here's my Get command against a Tab delimited file: (I remov

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Otis Gospodnetic
If all your fields are stored, you can do it with http://search-lucene.com/?q=solrentityprocessor Otherwise, just reindex the same way you indexed in the first place. *Always* be ready to reindex from scratch. Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring --

Re: are fields stored or unstored by default xml

2013-07-01 Thread Otis Gospodnetic
Haven't tried it recently, but is that even legal? Just be explicit :) Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Jul 1, 2013 at 2:16 PM, Katie McCorkell wrote: > In schema.xml I know you can label a field as stored="

Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
I'm using the Tika plugin to do so and according to http://tika.apache.org/0.5/formats.html it does *ZIP archive (application/zip) Tika uses Java's built-in Zip classes to parse ZIP files. Support for ZIP was added in Tika 0.2.* -- View this message in context: http://lucene.472066.n3.nabble.

are fields stored or unstored by default xml

2013-07-01 Thread Katie McCorkell
In schema.xml I know you can label a field as stored="false" or stored="true", but if you say neither, which is it by default? Thank you Katie

Re: dataconfig to index ZIP Files

2013-07-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
IIRC Zip files are not supported On Mon, Jul 1, 2013 at 10:30 PM, ericrs22 wrote: > To answer the previous Post: > > I was not sure what datasource="binaryFile" I took it from a PDF sample > thinking that would help. > > after setting datasource="null" I'm still gett the same errors... > > >

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Tony Mullins
Thanks Jack , it worked. Could you please provide some info on how to re-index existing data in Solr, after changing the schema.xml ? Thanks, Tony On Mon, Jul 1, 2013 at 8:21 PM, Jack Krupansky wrote: > You can write any function query in the field list of the "fl" parameter. > Sounds like you

Re: Does solr cloud required passwordless ssh?

2013-07-01 Thread Mark Miller
No, SolrCloud does not currently use ssh. - Mark On Jul 1, 2013, at 12:58 PM, adfel70 wrote: > Hi > Does solr cloud on a cluster of servers require passwordless ssh to be > configured between the servers? > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Doe

Re: cores sharing an instance

2013-07-01 Thread Roman Chyla
as for the second option: If you look inside SolrResourceLoader, you will notice that before a CoreContainer is created, a new class loader is also created line:111 this.classLoader = createClassLoader(null, parent); however, this parent object is always null, because it is called from: public

Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
To answer the previous post: I was not sure what datasource="binaryFile" does; I took it from a PDF sample, thinking that would help. After setting datasource="null" I'm still getting the same errors...

Does solr cloud required passwordless ssh?

2013-07-01 Thread adfel70
Hi Does solr cloud on a cluster of servers require passwordless ssh to be configured between the servers? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-solr-cloud-required-passwordless-ssh-tp4074398.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Daniel Collins
Regrettably, visibility is key for us :( Documents must be searchable as soon as they have been indexed (or as near as we can make it). Our old search system didn't do relevance sort, it was time-ordered (so it had a much simpler job) but it did have sub-second latency, and that is what is expect

Concurrent Modification Exception

2013-07-01 Thread adityab
Hi, I have recently upgraded from Solr 3.5 to 4.2.1. Also we have added spellcheck feature to our search query. During our performance testing we have observed that for every 2000 request, 1 request fails. The exception we observe in solr log are ConcurrentModificationException. Below is the comp

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Jack Krupansky
You can write any function query in the field list of the "fl" parameter. Sounds like you want "termfreq": termfreq(field_arg,term) fl=id,a,b,c,termfreq(a,xyz) -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Monday, July 01, 2013 10:47 AM To: solr-user@lucene.apache.o
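As a rough illustration of the `fl` value above, here is a Python sketch that assembles a select URL containing `termfreq(a,xyz)`. The base URL, the core name `collection1`, the query `a:xyz`, and the helper name are assumptions for the example; only the `fl` value itself comes from the message.

```python
from urllib.parse import urlencode


def termfreq_query_url(base_url, query, fields, tf_field, term):
    """Build a select URL whose fl list ends with termfreq(tf_field,term)."""
    fl = ",".join(list(fields) + ["termfreq({},{})".format(tf_field, term)])
    params = {"q": query, "fl": fl, "wt": "json"}
    # urlencode percent-escapes the commas and parentheses in the fl value.
    return base_url + "?" + urlencode(params)


# Hypothetical host and core; adjust to the actual installation.
url = termfreq_query_url(
    "http://localhost:8983/solr/collection1/select",
    "a:xyz",
    ["id", "a", "b", "c"],
    "a",
    "xyz",
)
print(url)
```

Each returned document should then carry the per-document frequency of `xyz` in field `a` alongside the ordinary fields.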

Re: ConcurrentUpdateSolrServer hanging

2013-07-01 Thread qungg
Hi, blockUntilFinished() sometimes blocks indefinitely. But if I send a commit from another thread to the instance, the ConcurrentUpdateSolrServer unblocks, sends the rest of the documents, and commits. So the sequence looks like this: 1. adding documents as usual... 2. finish adding documents... 3. block un

How to re-index Solr & get term frequency within documents

2013-07-01 Thread Tony Mullins
Hi, I am using Solr 4.3.0. If I change my Solr schema.xml, do I need to re-index? And if so, how? My second question: I need to find the frequency of a term per document in all documents of the search result. My field is And I am trying this query http://localhost:8080/solr/se

Re: Distinct values in multivalued fields

2013-07-01 Thread Jack Krupansky
Unfortunately, update processors only "see" the new, fresh, incoming data, not any existing document data. This is a case where your best bet may be to read the document first and then merge your new value into the existing list of values. -- Jack Krupansky -Original Message- From:
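The read-then-merge step suggested here could look roughly like the following Python sketch. It covers only the merging of values on the client side; fetching the existing document and sending the updated one back to Solr are left out, and the function name and sample values are hypothetical.

```python
def merge_distinct(existing_values, new_values):
    """Append new values to the existing list, dropping duplicates, keeping order."""
    seen = set()
    merged = []
    for value in list(existing_values) + list(new_values):
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


# Values already stored in the multivalued field (as read back from the index).
existing = ["solr", "lucene"]
# Values arriving with the new update.
incoming = ["lucene", "search"]

print(merge_distinct(existing, incoming))
```

The merged list would then be written back as the complete new value of the multivalued field.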

Re: Converting nested data model to solr schema

2013-07-01 Thread Jack Krupansky
Simply duplicate a subset of the fields that you want to query of the parent document on each child document and then you can directly query the child documents without any join. Yes, given the complexity of your data, a two-step query process may be necessary for some queries - do one query t
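A minimal Python sketch of the duplication idea, assuming documents are plain dictionaries before they are sent to Solr. The `doc_id` and `author` fields come from the original question; the attachment fields, the `parent_` prefix, and the helper name are assumptions for illustration.

```python
def denormalize(parent, children, copy_fields, prefix="parent_"):
    """Return child documents that each carry a copy of selected parent fields."""
    docs = []
    for child in children:
        doc = dict(child)  # copy so the caller's child dict is not modified
        for field in copy_fields:
            if field in parent:
                doc[prefix + field] = parent[field]
        docs.append(doc)
    return docs


parent = {"doc_id": "1", "author": "john", "content": "some long long text..."}
# Hypothetical attachment documents belonging to the parent above.
attachments = [
    {"id": "1-a", "attachment_type": "image"},
    {"id": "1-b", "attachment_type": "pdf"},
]

for doc in denormalize(parent, attachments, ["doc_id", "author"]):
    print(doc)
```

With the parent's `author` present on every child, a query such as "attachments of type pdf by john" needs no join; the cost is that a change to the parent has to be re-applied to all of its children.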

Re: Distinct values in multivalued fields

2013-07-01 Thread Upayavira
Have a look at the DedupUpdateProcessorFactory, which may help you. Although, I'm not sure if it works with multivalued fields. Upayavira On Mon, Jul 1, 2013, at 02:34 PM, tuedel wrote: > Hello everybody, > > i have tried to make use of the UniqFieldsUpdateProcessorFactory in > order to achieve

Converting nested data model to solr schema

2013-07-01 Thread adfel70
Hi, I have the following data model: 1. Document (fields: doc_id, author, content) 2. Each Document has multiple attachment types. Each attachment type has multiple instances. And each attachment type may have different fields. for example: 1 john some long long text...

Distinct values in multivalued fields

2013-07-01 Thread tuedel
Hello everybody, I have tried to make use of the UniqFieldsUpdateProcessorFactory in order to achieve distinct values in multivalued fields. Example below: title tag_type uniq_fields However, the data is being indexed one

Re: Shard tolerant partial results

2013-07-01 Thread Mark Miller
On Jul 1, 2013, at 6:56 AM, Phil Hoy wrote: > Perhaps an http header could be added or another attribute added to the solr > result node. I thought that was already done - I'm surprised that it's not. If that's really the case, please make a JIRA issue. - Mark

Re: Stemming query in Solr

2013-07-01 Thread snkar
I was just wondering if another solution might work. If we are able to extract the stem of the input search term (maybe using a C#-based stemmer, some open source implementation of the Porter algorithm) for cases where the stemming option is selected, and submit the query to Solr as a multiple ch

Re: RemoveDuplicatesTokenFilterFactory to avoid import duplicate values in multivalued field

2013-07-01 Thread Jack Krupansky
Your stated problem seems to have nothing to do with the message subject line relating to RemoveDuplicatesTokenFilterFactory. Please start a new message thread unless you really are concerned with an issue related to RemoveDuplicatesTokenFilterFactory. This kind of "thread hijacking" is inappr

Re: Stemming query in Solr

2013-07-01 Thread snkar
So the general solution is to index the field twice, once with stemming and once without, in order to have the ability to do both stemmed and exact matches. I am already indexing the text twice using the ContentSearch and ContentSearchStemming fields. But what this allows me is to return "burnin

Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
It's really 100% up to you how you want to come up with the unique key values for your documents. What would you like them to be? Just use that. Anything (within reason) - anything goes. But it also comes back to your data model. You absolutely must come up with a data model for how you expect
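Since the choice of key is left open, here is one possible scheme sketched in Python: derive the key from the file's path, so that re-indexing the same file produces the same key. This is an assumption for illustration, not something prescribed in the thread; the function name and the sample paths are hypothetical.

```python
import hashlib


def key_for_file(path):
    """Derive a stable key from a file path (same path, same key)."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


# Hypothetical PDF locations.
paths = ["/data/pdfs/report.pdf", "/data/pdfs/invoice.pdf"]
for p in paths:
    print(p, key_for_file(p))
```

A path-derived key means a file that is moved gets a new key, so whether this fits depends on the data model the reply asks about.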

Re: RemoveDuplicatesTokenFilterFactory to avoid import duplicate values in multivalued field

2013-07-01 Thread tuedel
Hey, I have tried to make use of the UniqFieldsUpdateProcessorFactory in order to achieve distinct values in multivalued fields. Example below: title tag_type uniq_fields However, the data is being indexed one by one. This may h

Re: Unique key error while indexing pdf files

2013-07-01 Thread archit2112
I'm new to Solr. I'm just trying to understand and explore various features offered by Solr and their implementations. I would be very grateful if you could solve my problem with any example of your choice. I just want to learn how I can index PDF documents using the Data Import Handler. -- View this

Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
It all depends on your data model - tell us more about your data model. For example, how will users or applications query these documents and what will they expect to be able to do with the ID/key for the documents? How are you expecting to identify documents in your data model? -- Jack Krupa

Unique key error while indexing pdf files

2013-07-01 Thread archit2112
Hi, I'm trying to index PDF files in Solr 4.3.0 using the Data Import Handler. *My request handler - * data-config1.xml *My data-config1.xml * Now when I try to index the files I get the following error - org.apache.solr.common.SolrException: Do

Shard tolerant partial results

2013-07-01 Thread Phil Hoy
Hi, When doing distributed searches with shards.tolerant set while the hosts for a slice are down, so that the response is partial, how is that best inferred? We would like to not cache the results upstream and perhaps inform the end user in some way. I am aware that shards.info could be

Re: Multiple groups of boolean queries in a single query.

2013-07-01 Thread Erick Erickson
Have you tried the query you indicated? Because it should "just work" barring syntax errors. The only other thing you might want is to turn on grouping by field type. That'll return separate sections by type, say the top 3 (default 1) documents in each type. If you don't group, you have the possibi

Re: Stemming query in Solr

2013-07-01 Thread Erick Erickson
bq: But looks like it is executing the search for an exact text based match with the stem "burn". Right. You need to appreciate index time as opposed to query time stemming. Your field definition has both turned on. The admin/analysis page will help here .. At index time, the terms are stemmed,

Re: Index pdf files.

2013-07-01 Thread archit2112
I figured it out. It was a problem with the regular expression i used in data-config.xml . -- View this message in context: http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074304.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Index pdf files.

2013-07-01 Thread Erick Erickson
OK, have you done anything custom? You get this where? solr logs? Echoed back in the browser? In response to what command? You haven't provided enough info to help us help you. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Jul 1, 2013 at 6:08 AM, archit2112

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Erick Erickson
Daniel: Soft commits invalidate the "top level" caches, which include things like filterCache, queryResultCache etc. Various "segment-level" caches are NOT invalidated, but you really don't have a lot of control from the Solr level over those anyway. But yeah, the tension between caching a bunch

Re: Set spellcheck field on query time?

2013-07-01 Thread Jan Høydahl
Check out http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.dictionary - you can define multiple dictionaries in the same handler, each with its own source field. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 1. juli 2013 kl. 11:34 skrev Timo Schmidt : > H

Re: Index pdf files.

2013-07-01 Thread archit2112
Hi Thanks a lot. I did what you said. Now I'm getting the following error. Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0 -- View this message in context: http://lucene.472066.n3.nabb
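An error of this kind typically appears when a shell-style wildcard is used where a regular expression is expected. The Python sketch below uses Python's `re` module as a stand-in for Java's `java.util.regex`; the pattern `*.pdf` is a guess at the kind of value that triggers it, since the actual pattern from data-config.xml is not shown in the thread.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A glob-style wildcard starts with a quantifier that has nothing to repeat.
print(is_valid_regex("*.pdf"))
# The regex equivalent: any characters, then a literal dot and 'pdf'.
print(is_valid_regex(r".*\.pdf"))
# The valid pattern matches a PDF name but not other files.
print(bool(re.fullmatch(r".*\.pdf", "report.pdf")))
print(bool(re.fullmatch(r".*\.pdf", "report.txt")))
```

In a regex, `*` must follow something to repeat and `.` matches any character, which is why the wildcard form fails at index 0 and the dot before the extension is escaped.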

Re: Multiple groups of boolean queries in a single query.

2013-07-01 Thread samabhiK
My entire concern is to be able to make a single query to fetch all the types of records. If I had to create three different cores for these different types of data, I would have to make 3 calls to Solr to fetch the entire set of data. And I will have approx. 15 such types in reality. Also, at any

Multiple groups of boolean queries in a single query.

2013-07-01 Thread samabhiK
Hello friends, I have a schema which contains various types of records of three different categories for ease of management and for making a single query to fetch all the data. The fields are grouped into three different types of records. For example: fields type 1: fields type 2: field

Sum as a Projection for Facet Queries

2013-07-01 Thread samarth s
Hi, We have a need of finding the sum of a field for each facet.query. We have looked at StatsComponent but that supports only facet.field. Has anyone written a patch over StatsComponent that supports the same along with some performance measures? Is t

Set spellcheck field on query time?

2013-07-01 Thread Timo Schmidt
Hello all, we are currently working on a multilanguage single-core setup. During that I stumbled upon the question of whether it is possible to define different sources for the spellcheck. For now I only see the possibility to define different request handlers. Is it somehow possible to set the so

Re: Stemming query in Solr

2013-07-01 Thread snkar
Hi Erick, Thanks for the reply. Here is what the situation is: Relevant portion of Solr Schema:

Re: Index pdf files.

2013-07-01 Thread Shalin Shekhar Mangar
The tika jars are not in your classpath. You need to add all the jars inside contrib/extraction/lib directory to your classpath. On Mon, Jul 1, 2013 at 2:00 PM, archit2112 wrote: > Hi I'm new to Solr. I want to index pdf files usng the Data Import Handler. > Im using Solr-4.3.0. I followed the st

Index pdf files.

2013-07-01 Thread archit2112
Hi, I'm new to Solr. I want to index PDF files using the Data Import Handler. I'm using Solr-4.3.0. I followed the steps given in this post http://lucene.472066.n3.nabble.com/indexing-with-DIH-and-with-problems-td3731129.html However, I get the following error - Full Import failed:java.lang.NoClass

Re: dataconfig to index ZIP Files

2013-07-01 Thread Bernd Fehling
Try setting dataSource="null" for your toplevel entity and use filename="\.zip$" as filename selector. Am 28.06.2013 23:14, schrieb ericrs22: > unfortunately not. I had tried that before with the logs saying: > > Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: > java.

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Daniel Collins
We see similar results, again we softCommit every 1s (trying to get as NRT as we can), and we very rarely get any hits in our caches. As an unscheduled test last week, we did shutdown indexing and noticed about 80% hit rate in caches (and average query time dropped from ~1s to 100ms!) so I think w