Proper type(s) for adding a DatePointField value [was: problems with indexing documents]

2019-04-04 Thread Mark H. Wood
One difficulty is that the documentation of SolrInputDocument.addField(String, Object) is not at all specific. I'm aware of SOLR-2298 and I accept that the patch is an improvement, but still... @param value Value of the field, should be of same class type as defined by "type" attribute of the

Re: problems with indexing documents

2019-04-02 Thread Bill Tantzen
Right, as Mark said, this is how the dates were indexed previously. However, instead of passing in the actual String, we passed a java.util.Date object which was automagically converted to the correct string. Now (the code on our end has not changed), solr throws an exception because the string it

Re: problems with indexing documents

2019-04-02 Thread Mark H. Wood
I'm also working on this with Bill. On Tue, Apr 02, 2019 at 09:44:16AM +0800, Zheng Lin Edwin Yeo wrote: > Previously, did you index the date in the same format as you are using now, > or in the Solr format of "YYYY-MM-DDTHH:MM:SSZ"? As may be seen from the sample code: > > doc.addField ( "date"

Re: problems with indexing documents

2019-04-01 Thread Zheng Lin Edwin Yeo
Hi Bill, Previously, did you index the date in the same format as you are using now, or in the Solr format of "YYYY-MM-DDTHH:MM:SSZ"? Regards, Edwin On Tue, 2 Apr 2019 at 00:32, Bill Tantzen wrote: > In a legacy application using Solr 4.1 and solrj, I have always been > able to add documents

problems with indexing documents

2019-04-01 Thread Bill Tantzen
In a legacy application using Solr 4.1 and solrj, I have always been able to add documents with TrieDateField types using java.util.Date objects, for instance, doc.addField ( "date", new java.util.Date() ); having recently upgraded to Solr 7.7, and updating my schema to leverage DatePointField as
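The workaround that follows from this thread is to format the `java.util.Date` into Solr's canonical date string yourself before calling `addField`, so DatePointField receives a value it is documented to parse. A minimal sketch using only the JDK (the field name `"date"` comes from the thread; the helper class name is my own):

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class SolrDateFormat {
    // Solr's canonical date form is YYYY-MM-DDThh:mm:ssZ, always UTC.
    static final DateTimeFormatter SOLR_DATE =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // Convert a legacy java.util.Date into the canonical string.
    static String toSolrDate(Date d) {
        return SOLR_DATE.format(d.toInstant());
    }
}
```

With this, the legacy call becomes `doc.addField("date", SolrDateFormat.toSolrDate(new Date()))`, handing the field a string rather than relying on automatic Date conversion.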

Re: Indexing documents from S3 bucket

2018-10-08 Thread ☼ R Nair
avion On Mon, Oct 8, 2018, 11:26 AM marotosg wrote: > Hi, > > At the moment I have a SolrCloud Cluster with a documents collection being > populated indexing documents coming from a DFS server. Linux boxes are > mounting that DFS server using samba. > > There is a request to

Indexing documents from S3 bucket

2018-10-08 Thread marotosg
Hi, At the moment I have a SolrCloud Cluster with a documents collection being populated indexing documents coming from a DFS server. Linux boxes are mounting that DFS server using samba. There is a request to move that DFS server to an AWS S3 bucket. Does anyone have previous experience about

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Erick Erickson
Here's some numbers for batching improvements: https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/ And I totally agree with Shawn that for 40K documents anything more complex is probably overkill. Best, Erick On Fri, Nov 18, 2016 at 6:02 AM, Shawn Heisey wrote: > On 11/18/2016

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Shawn Heisey
On 11/18/2016 6:00 AM, Sebastian Riemer wrote: > I am looking to improve indexing speed when loading many documents as part of > an import. I am using the SolrJ-Client and currently I add the documents > one-by-one using HttpSolrClient and its method add(SolrInputDocument doc, > int commitWithi

SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Sebastian Riemer
Hi all, I am looking to improve indexing speed when loading many documents as part of an import. I am using the SolrJ-Client and currently I add the documents one-by-one using HttpSolrClient and its method add(SolrInputDocument doc, int commitWithinMs). My first step would be to change that t
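The advice in the replies above amounts to batching: call `add` with a collection of documents instead of one document per request. The chunking itself is plain Java; a sketch (the `Batcher` helper and the choice of batch size are my own illustration, not from the thread):

```java
import java.util.ArrayList;
import java.util.List;

class Batcher {
    // Split a large list into fixed-size chunks; in SolrJ each chunk
    // would then be sent with a single client.add(chunk) call.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }
}
```

Each returned chunk would go to one `add` call, with a single commit at the end of the import instead of a commitWithin per document.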

Indexing documents stored in HDFS

2016-07-15 Thread Rishabh Patel
Hello, I am trying to find a way to index some documents, all located in a directory in HDFS. Since HDFS has a REST API, I was trying to use the DataImportHandler(DIH) along with the datasource type as URLDataSource, to index the documents. Is this approach wrong? If so, then is there a canonica

Re: Indexing documents in Chinese

2015-06-10 Thread Zheng Lin Edwin Yeo
I've tried to use solr.HMMChineseTokenizerFactory with the following configuration: It is able to be indexed, but when I search for the words, it matches many other words and not just the words that I searched for. Why is this so? For example, the query ht

Re: Indexing documents in Chinese

2015-06-09 Thread Alexandre Rafalovitch
You may find the series of articles on CJK analysis/search helpful: http://discovery-grindstone.blogspot.com.au/ It's a little out of date, but should be a very solid intro. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 10 J

Indexing documents in Chinese

2015-06-09 Thread Zheng Lin Edwin Yeo
Hi, I'm trying to index rich-text documents that are in Chinese. Currently, there's no problem with indexing, but there's a problem with the searching. Does anyone know what is the best Tokenizer and Filter Factory to use? I'm now using the solr.StandardTokenizerFactory which I heard that it's not

Re: Indexing documents/files for production use

2014-10-30 Thread Olivier Austina
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me. Regards Olivier 2014-10-28 23:35 GMT+01:00 Erick Erickson : > And one other consideration in addition to the two excellent responses > so far > > In a SolrCloud environment, SolrJ via CloudSolrServer will automatica

Re: Indexing documents/files for production use

2014-10-28 Thread Erick Erickson
And one other consideration in addition to the two excellent responses so far In a SolrCloud environment, SolrJ via CloudSolrServer will automatically route the documents to the correct shard leader, saving some additional overhead. Post.jar and cURL send the docs to a node, which in turn forw

Re: Indexing documents/files for production use

2014-10-28 Thread Jürgen Wagner (DVT)
Hello Olivier, for real production use, you won't really want to use any toys like post.jar or curl. You want a decent connector to whatever data source there is, that fetches data, possibly massages it a bit, and then feeds it into Solr - by means of SolrJ or directly into the web service of Sol

Re: Indexing documents/files for production use

2014-10-28 Thread Alexandre Rafalovitch
What is your production use? You have to answer that for yourself. post.jar makes a couple of things easy. If your production use fits into those (e.g. no cluster) - great, use it. It is certainly not any worse than cURL. But if you are running a cluster and have specific requirements, then yes,

Indexing documents/files for production use

2014-10-28 Thread Olivier Austina
Hi All, I am reading the solr documentation. I have understood that post.jar is not meant for production use, cURL is not recommande

RE: Indexing documents with ContentStreamUpdateRequest (SolrJ) asynchronously

2014-08-30 Thread Tomer Levi
Thread.sleep(1000); } Hope it helps, Tomer -Original Message- From: Jorge Moreira [mailto:j.moreira...@gmail.com] Sent: Thursday, August 28, 2014 11:50 AM To: solr-user@lucene.apache.org Subject: Indexing documents with ContentStreamUpdateRequest (SolrJ) asynchronously I am using SolrJ API

Indexing documents with ContentStreamUpdateRequest (SolrJ) asynchronously

2014-08-28 Thread Jorge Moreira
I am using SolrJ API 4.8 to index rich documents to Solr, but I want to index these documents asynchronously. The function that I made sends documents synchronously, but I don't know how to change it to make it asynchronous. Any idea? Function: public Boolean indexDocument(HttpSolrServer server,
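One way to make an existing synchronous `indexDocument` call asynchronous without changing it is to run it on an `ExecutorService`, as Tomer's reply hints. A sketch using only the JDK (the pool size and class names are my own assumptions; the synchronous SolrJ call is passed in as a `Callable` so the sketch has no SolrJ dependency):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncIndexer {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Submit the existing synchronous call and return immediately;
    // the Future resolves when the indexing request completes.
    Future<Boolean> indexAsync(Callable<Boolean> syncIndexCall) {
        return pool.submit(syncIndexCall);
    }

    // Block for a result; false if the task failed or was interrupted.
    boolean await(Future<Boolean> f) {
        try {
            return f.get();
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The caller keeps a list of `Future<Boolean>` handles and checks them (or ignores them) after the import loop finishes.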

Re: indexing documents

2013-05-31 Thread Erick Erickson
Solr JSON isn't intended to index arbitrary JSON, and especially not intended to index nested documents. I suspect your issue is that "cat" has an array of name/value pairs that Solr doesn't understand. So no, I don't think you can index these docs without putting them into a form Solr understands

Re: Continue Indexing Documents when single doc does not match schema

2013-05-31 Thread Erick Erickson
Hmmm, not sure that would work for different values? But it does point the way to a different solution, write a custom update processor that removed multivalued entries. FWIW, Erick On Thu, May 30, 2013 at 1:54 PM, Alexandre Rafalovitch wrote: > On Thu, May 30, 2013 at 1:03 PM, Iain Lopata wr

Re: Continue Indexing Documents when single doc does not match schema

2013-05-31 Thread Erick Erickson
work on this with SOLR-445, but it died on the vine. Wish I had a better answer Erick On Thu, May 30, 2013 at 1:03 PM, Iain Lopata wrote: > I am using Nutch 1.6 and Solr 1.4.1 on Ubuntu in local mode and using > Nutch's solrindex to index documents into Solr. > > > > When

indexing documents

2013-05-30 Thread Igor Littig
Good day everyone. I recently faced another problem. I've got a bunch of documents to index. The problem is that they are, at the same time, the database for another application. These documents are stored in JSON format in the following scheme: { "id": 10, "name": "dad 177", "cat":[{ "id":254, "name
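Erick's reply above suggests putting such documents into a form Solr understands; one common way is flattening the nested `cat` array into parallel multivalued fields before indexing. A sketch (the `cat_id`/`cat_name` field names are my own illustration, not from the thread):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonFlattener {
    // Flatten the nested "cat" array of {id, name} objects into two
    // parallel multivalued fields that Solr can index directly.
    static Map<String, List<Object>> flattenCats(List<Map<String, Object>> cats) {
        Map<String, List<Object>> flat = new LinkedHashMap<>();
        flat.put("cat_id", new ArrayList<>());
        flat.put("cat_name", new ArrayList<>());
        for (Map<String, Object> cat : cats) {
            flat.get("cat_id").add(cat.get("id"));
            flat.get("cat_name").add(cat.get("name"));
        }
        return flat;
    }
}
```

Each flattened entry would then be added to the SolrInputDocument as an ordinary multivalued field, sidestepping the nested name/value pairs Solr's JSON handler does not understand.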

Re: Continue Indexing Documents when single doc does not match schema

2013-05-30 Thread Alexandre Rafalovitch
On Thu, May 30, 2013 at 1:03 PM, Iain Lopata wrote: > For example, a document which has two address fields when > my Solr schema.xml does not specify address as being multi-valued (and I do > not want it to be). No help on the core topic, but a workaround for the specific situation could be: ht

Re: Continue Indexing Documents when single doc does not match schema

2013-05-30 Thread Shawn Heisey
On 5/30/2013 11:03 AM, Iain Lopata wrote: When indexing documents, I hit an occasional document that does not match the Solr schema. For example, a document which has two address fields when my Solr schema.xml does not specify address as being multi-valued (and I do not want it to be). Ideally

Continue Indexing Documents when single doc does not match schema

2013-05-30 Thread Iain Lopata
I am using Nutch 1.6 and Solr 1.4.1 on Ubuntu in local mode and using Nutch's solrindex to index documents into Solr. When indexing documents, I hit an occasional document that does not match the Solr schema. For example, a document which has two address fields when my Solr schema.xml

Re: Looking for tips on indexing documents containing multi-valued tuple fields

2013-02-26 Thread Timothy Potter
Ok - I suspected grouping by company_name was too obvious here ;-) A couple of tricks to think about (not claiming any of these will help) are: 1) Document transformer - you can return any company fields you need in the response from a database lookup using a Document transformer. This lets you a

Re: Looking for tips on indexing documents containing multi-valued tuple fields

2013-02-26 Thread Clint Miller
Tim, thanks for the response. I definitely owe you a beer next time you're in Austin. I hadn't thought of your approach of turning things around. But, I don't think it will work because of some stuff I left out in my original email. First, the relationship between Company and Article is many-to-ma

Re: Looking for tips on indexing documents containing multi-valued tuple fields

2013-02-26 Thread Timothy Potter
Hi Clint, Nice to see you on this list! What about treating each article as the indexed unit (i.e. each article is a document) with structure: articleID publishDate source company_name company_desc contents Then you can do grouping by company_name field. I happen to know you're very familiar w

Re: Prevent indexing documents with some terms

2012-12-07 Thread Jack Krupansky
PM To: solr-user@lucene.apache.org Subject: Prevent indexing documents with some terms Hi: Is there any way that I can prevent a document from being indexed? I've a separated core only for query suggestions, this queries are stored right from the frontend app, so I'm trying to prevent s

Prevent indexing documents with some terms

2012-12-07 Thread Jorge Luis Betancourt Gonzalez
Hi: Is there any way that I can prevent a document from being indexed? I've a separated core only for query suggestions, this queries are stored right from the frontend app, so I'm trying to prevent some kind of bad intended queries to be stored in my query, but keeping the logic of what I cons

Re: indexing documents in Apache Solr using php-curl library

2012-07-02 Thread Sascha SZOTT
Hi, perhaps it's better to use a PHP Solr client library. I used https://code.google.com/p/solr-php-client/ in a project of mine and it worked just fine. -Sascha Asif wrote: > I am indexing the file using php curl library. I am stuck here with the code > echo "Stored in: " . "upload/" . $_F

indexing documents in Apache Solr using php-curl library

2012-07-02 Thread Asif
. curl_error($ch); } else { curl_close($ch); print "curl exited okay\n"; echo "Data returned...\n"; echo "\n"; echo $data; echo "----

indexing documents from a git repository

2012-05-25 Thread Welty, Richard
I have a need to incrementally index documents (probably MS Office/OpenOffice/PDF files) from a Git repository using Tika. I'm expecting to run periodic pulls against the repository to find new and updated docs. Does anyone have any experience and/or thoughts/suggestions that they'd like to sha

Re: Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
Answering my own question. ContentStreamUpdateRequest (csur) needs to be within the while loop not outside as I had it. Still not seeing any dramatic performance improvements over perl though (the point of this exercise). Indexing locks after about 30-45 minutes of activity, even a commit wo

Batch indexing documents using ContentStreamUpdateRequest

2011-11-04 Thread Tod
This is a code fragment of how I am doing a ContentStreamUpdateRequest using CommonHTTPSolrServer: ContentStreamBase.URLStream csbu = new ContentStreamBase.URLStream(url); InputStream is = csbu.getStream(); FastInputStream fis = new FastInputStream(is); csur.addContentStream(csbu); c

Re: Indexing documents with "complex multivalued fields"

2011-05-23 Thread anass talby
Thank you very much On Mon, May 23, 2011 at 4:27 PM, Stefan Matheis < matheis.ste...@googlemail.com> wrote: > Anass, > > what about combining them both into one? so to say: > 1|red > 2|green > > "synchronized" multivalued fields are not possible, afaik. > > Regards > Stefan > > On Mon, May 23, 20

Re: Indexing documents with "complex multivalued fields"

2011-05-23 Thread anass talby
Thank you Renaud. I appreciate your help On Mon, May 23, 2011 at 4:47 PM, Renaud Delbru wrote: > Hi, > > you could look at this recent thread [1], it is similar to your problem. > > [1] > http://search.lucidimagination.com/search/document/33ec1a98d3f93217/search_across_related_correlated_multiva

Re: Indexing documents with "complex multivalued fields"

2011-05-23 Thread Renaud Delbru
Hi, you could look at this recent thread [1], it is similar to your problem. [1] http://search.lucidimagination.com/search/document/33ec1a98d3f93217/search_across_related_correlated_multivalue_fields_in_solr#1f66876c782c78d5 -- Renaud Delbru On 23/05/11 14:40, anass talby wrote: Hi, I'm new

Re: Indexing documents with "complex multivalued fields"

2011-05-23 Thread Stefan Matheis
Anass, what about combining them both into one? so to say: 1|red 2|green "synchronized" multivalued fields are not possible, afaik. Regards Stefan On Mon, May 23, 2011 at 3:40 PM, anass talby wrote: > Hi, > > I'm new in solr and would like to index documents that have complex > multivalued fie
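Stefan's suggestion above, encoding each (id, color) pair as a single delimited token per multivalued entry, can be sketched like this (the `|` delimiter comes from his example; the helper class is my own):

```java
class TupleField {
    // Encode a pair as one multivalued-field token, e.g. "1|red",
    // keeping the two values "synchronized" in a single entry.
    static String encode(String id, String color) {
        return id + "|" + color;
    }

    // Split the stored token back into its two parts when reading.
    static String[] decode(String token) {
        return token.split("\\|", 2);
    }
}
```

This only works if the delimiter cannot occur inside the values themselves; otherwise a rarer separator or proper escaping is needed.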

Indexing documents with "complex multivalued fields"

2011-05-23 Thread anass talby
Hi, I'm new in solr and would like to index documents that have complex multivalued fields. I do want to do something like: 1 1 red 2 green ... ... How can I do this with Solr? Thanks in advance. -- Anass

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Erick Erickson
jel...@openindex.io] > Sent: Friday, March 25, 2011 1:23 PM > To: solr-user@lucene.apache.org > Cc: Upayavira > Subject: Re: Multiple Cores with Solr Cell for indexing documents > > You can only set properties for a lib dir that must be used in solrconfig.xml. > You can use sharedLi

RE: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Brandon Waterloo
__ From: Markus Jelsma [markus.jel...@openindex.io] Sent: Friday, March 25, 2011 1:23 PM To: solr-user@lucene.apache.org Cc: Upayavira Subject: Re: Multiple Cores with Solr Cell for indexing documents You can only set properties for a lib dir that must be used in solrconfig.xml. You can use shared

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Markus Jelsma
solr.xml file is > > sharedLib="lib">. That is housed in .../example/solr/. So, does it > > > look in .../example/lib or .../example/solr/lib? > > > > > > ~Brandon Waterloo > > > ____ > > > From: Markus Jelsma [markus.jel...@openindex.io] > > &g

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Upayavira
jel...@openindex.io] > > Sent: Thursday, March 24, 2011 11:29 AM > > To: solr-user@lucene.apache.org > > Cc: Brandon Waterloo > > Subject: Re: Multiple Cores with Solr Cell for indexing documents > > > > Sounds like the Tika jar is not on the class path. Add it to a

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-24 Thread Markus Jelsma
_ > From: Markus Jelsma [markus.jel...@openindex.io] > Sent: Thursday, March 24, 2011 11:29 AM > To: solr-user@lucene.apache.org > Cc: Brandon Waterloo > Subject: Re: Multiple Cores with Solr Cell for indexing documents > > Sounds like the Tika jar is not on the class pat

Multiple Cores with Solr Cell for indexing documents

2011-03-24 Thread Brandon Waterloo
Markus Jelsma [markus.jel...@openindex.io] Sent: Thursday, March 24, 2011 11:29 AM To: solr-user@lucene.apache.org Cc: Brandon Waterloo Subject: Re: Multiple Cores with Solr Cell for indexing documents Sounds like the Tika jar is not on the class path. Add it to a directory where Solr's looking f

RE: Multiple Cores with Solr Cell for indexing documents

2011-03-24 Thread Brandon Waterloo
Markus Jelsma [markus.jel...@openindex.io] Sent: Thursday, March 24, 2011 11:29 AM To: solr-user@lucene.apache.org Cc: Brandon Waterloo Subject: Re: Multiple Cores with Solr Cell for indexing documents Sounds like the Tika jar is not on the class path. Add it to a directory where Solr's looking f

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-24 Thread Markus Jelsma
Sounds like the Tika jar is not on the class path. Add it to a directory where Solr's looking for libs. On Thursday 24 March 2011 16:24:17 Brandon Waterloo wrote: > Hello everyone, > > I've been trying for several hours now to set up Solr with multiple cores > with Solr Cell working on each core

Multiple Cores with Solr Cell for indexing documents

2011-03-24 Thread Brandon Waterloo
Hello everyone, I've been trying for several hours now to set up Solr with multiple cores with Solr Cell working on each core. The only items being indexed are PDF, DOC, and TXT files (with the possibility of expanding this list, but for now, just assume the only things in the index should be d

Multiple Cores with Solr Cell for indexing documents

2011-03-22 Thread Brandon Waterloo
Hello everyone, I've been trying for several hours now to set up Solr with multiple cores with Solr Cell working on each core. The only items being indexed are PDF, DOC, and TXT files (with the possibility of expanding this list, but for now, just assume the only things in the index should be

Re: Indexing documents with SOLR

2010-12-11 Thread Adam Estrada
Pankaj, Check this article out on how to get going with Nutch. http://bit.ly/dbBdK4 This is a few months old so you will have to note that there is a new parameter called something like -SolrUrl that will allow you to update your solr index with the crawled data. For crawling your local file syste

Re: Indexing documents with SOLR

2010-12-10 Thread Adam Estrada
Nutch is also a great option if you want a crawler. I have found that you will need to use the latest version of PDFBox and its dependencies for better results. Also, make sure to set JAVA_OPT to something really large so that you won't exceed your heap size. Adam On Fri, Dec 10, 2010 at 6:27

Re: Indexing documents with SOLR

2010-12-10 Thread Tommaso Teofili
Hi Pankaj, you can find the needed documentation right here [1]. Hope this helps, Tommaso [1] : http://wiki.apache.org/solr/ExtractingRequestHandler 2010/12/10 pankaj bhatt > Hi All, > I am a newbie to SOLR and trying to integrate TIKA + SOLR. > Can anyone please guide me, how to achieve

Indexing documents with SOLR

2010-12-10 Thread pankaj bhatt
Hi All, I am a newbie to SOLR and trying to integrate TIKA + SOLR. Can anyone please guide me, how to achieve this. * My Req is:* I have a directory containing a lot of PDF and DOC files and I need to make a search within the documents. I am using the SOLR web application. I just need some
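Before handing each file in the directory to Solr Cell (the ExtractingRequestHandler that Tommaso points to above), the listing has to be narrowed to the supported types. A small sketch of that filtering step (the extension set mirrors the PDF/DOC requirement in the question; the helper class is my own):

```java
import java.util.Locale;
import java.util.Set;

class RichDocFilter {
    // Extensions from the question: only PDFs and DOCs should be
    // posted to the /update/extract handler.
    static final Set<String> EXTENSIONS = Set.of("pdf", "doc");

    static boolean isIndexable(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }
}
```

Each file passing the filter would then be sent to Solr Cell, which uses Tika internally to extract the text.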

Re: Indexing documents in multiple languages

2009-01-28 Thread Otis Gospodnetic
- Original Message > From: Alejandro Valdez > To: solr-user@lucene.apache.org > Sent: Tuesday, January 27, 2009 3:05:40 PM > Subject: Indexing documents in multiple languages > > Hi, I plan to use solr to index a large number of document

Re: Indexing documents in multiple languages

2009-01-27 Thread Erick Erickson
First, I'd search the mail archive for the topic of languages, it's been discussed often and there's a wealth of information that might be of benefit, far more information than I can remember. As to whether your approach will be "too big, too slow...", you really haven't given enough information t

Indexing documents in multiple languages

2009-01-27 Thread Alejandro Valdez
Hi, I plan to use solr to index a large number of documents extracted from emails bodies, such documents could be in different languages, and a single document could be in more than one language. In the same way, the query string could be words in different languages. I read that a common approac

Re: Indexing documents

2007-11-19 Thread zqzuk
nts, creating analyzers, and indexing. >> >> Where can I start from, or could your please point me to the main api >> packages for doing these please, many thanks! >> -- >> View this message in context: >> http://www.nabble.com/Indexing-documents-tf483073

Re: Indexing documents

2007-11-19 Thread Grant Ingersoll
. Where can I start from, or could your please point me to the main api packages for doing these please, many thanks! -- View this message in context: http://www.nabble.com/Indexing-documents-tf4830738.html#a13820487 Sent from the Solr - User mailing list archive at Nabble.com

RE: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Ard Schrijvers
Hello, > When I had those kind of problems (less complex) with lucene, > the only > idea was to filter from the front-end, according to the ACL policy. > Lucene docs and fields weren't protected, but tagged. Searching was > always applied with a field "audience", with hierarchical values like
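The tagging-plus-front-end-filtering approach quoted above can be sketched as a post-search step: keep only the documents whose "audience" tags intersect the requesting user's groups. The map-based document shape below is an illustration of the idea, not Solr's API (the "audience" field name is the one from the thread):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AclFilter {
    // Front-end post-filtering: a document is visible when its
    // "audience" tags share at least one value with the user's groups.
    static List<Map<String, Object>> filter(List<Map<String, Object>> docs,
                                            Set<String> userGroups) {
        List<Map<String, Object>> allowed = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            @SuppressWarnings("unchecked")
            Set<String> audience =
                (Set<String>) doc.getOrDefault("audience", Set.of());
            if (!Collections.disjoint(audience, userGroups)) {
                allowed.add(doc);
            }
        }
        return allowed;
    }
}
```

As the rest of the thread warns, this coarse tagging only works for simple, stable ACL policies; fine-grained per-field ACLs need a fronting service, not index-time tags.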

Re: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Frédéric Glorieux
Hello, > With all due respect, I really think the problem is largely underestimated here, and is far more complex than these suggestions...unless we are talking about 100.000 documents, a couple of users, and updating once a day. If you want millions of documents, facetted authorized navigatio

RE: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Ard Schrijvers
Hello, > Hi > > And about the fields, if they are/aren't going to be present on the > responses based on the user group, you can do it in many > different ways > (using XML transformation to remove the undesirable fields, > implementing > your own RequestHandler able to process your group > in

RE: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Ard Schrijvers
Hello, > Given the requirement to break down a document into separately > controlled pieces, I'd create a servlet that "fronts" the Solr > servlet and handles this conversion. I could think of ways to do it > using Solr, but they feel like unnatural acts. > > As a general comment on ACLs, one

Re: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Daniel Alheiros
Hi And about the fields, if they are/aren't going to be present on the responses based on the user group, you can do it in many different ways (using XML transformation to remove the undesirable fields, implementing your own RequestHandler able to process your group information, filtering the data

Re: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ken Krugler
Hi all, Can anyone give me some advice on breaking a document up and indexing it by access control lists. What we have are xml documents that are transformed based on the user viewing it. Some users might see all of the document, while other may see a few fields, and yet others see nothing at a

RE: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ard Schrijvers
Excuse me, I meant solr of course :-) > For these reasons, I do not think you can achieve with solar

RE: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ard Schrijvers
Hello Nate, IMHO, you will not be able to do this in solr unless you accept pretty hard constraints on your ACLs (I will get back to this in a moment). IMO, it is not possible to index documents along with ACLs. ACLs can be very fine grained, and the thing you describe, ACL specific parts of a

indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Nathaniel A. Johnson
Hi all, Can anyone give me some advice on breaking a document up and indexing it by access control lists. What we have are xml documents that are transformed based on the user viewing it. Some users might see all of the document, while other may see a few fields, and yet others see nothing at a