Re: Newbie in Solr

2017-03-26 Thread Ercan Karadeniz
Hi Alexandre, thanks for your response. I will check the provided URLs. I will probably bother you with questions. Cheers, Ercan From: Alexandre Rafalovitch Sent: Friday, March 24, 2017 01:00 To: solr-user Subject: Re: Newbie in Solr Glad to hear you

Re: to handle expired documents: collection alias or delete by id query

2017-03-26 Thread Derek Poh
Hi Tom, the moving alias design is interesting; I will explore it. Regarding the method of creating the collection on a node for indexing only and adding replicas of it to other nodes for querying upon completion of indexing: am I right to say this is used in conjunction with collection alias or th
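
A minimal sketch of the workflow Derek describes, assuming it is driven through the Solr Collections API (CREATE with createNodeSet, ADDREPLICA, CREATEALIAS). It only builds the request URLs in order and leaves out the indexing step itself. The base URL, collection, alias, configset and node names are hypothetical, and the shard name "shard1" assumes a single-shard collection with default shard naming. Re-issuing CREATEALIAS for an existing alias name re-points it, which is what lets clients keep querying one stable name while the collection behind it changes.

```python
from urllib.parse import urlencode

# Hypothetical base URL of any node in the cluster.
SOLR = "http://localhost:8983/solr"


def collections_api(action, **params):
    """Build a Collections API URL; 'action' comes first, then params in order."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


def rebuild_and_swap(alias, new_collection, index_node, query_nodes, config):
    """Return the requests, in order, for one 'index, replicate, swap alias' cycle."""
    steps = [
        # 1. Create the new collection with its only replica on the indexing node.
        collections_api("CREATE", name=new_collection, numShards=1,
                        replicationFactor=1, createNodeSet=index_node,
                        **{"collection.configName": config}),
    ]
    # 2. (Index into new_collection here.) Then add replicas on the query nodes.
    #    Assumes the single shard is named "shard1".
    for node in query_nodes:
        steps.append(collections_api("ADDREPLICA", collection=new_collection,
                                     shard="shard1", node=node))
    # 3. Point the alias at the new collection; queries against the alias follow it.
    steps.append(collections_api("CREATEALIAS", name=alias,
                                 collections=new_collection))
    return steps
```

For example, `rebuild_and_swap("products", "products_20170326", "indexer:8983_solr", ["q1:8983_solr", "q2:8983_solr"], "products_conf")` yields one CREATE, two ADDREPLICA and one CREATEALIAS request, which could then be sent with any HTTP client.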

Re: Classify document using bag of words

2017-03-26 Thread Koji Sekiguchi
Hi, I'm not sure that it can help you but I'd like to show you the link of an article which I wrote about document classification years ago: Comparing Document Classification Functions of Lucene and Mahout http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.

Re: Multi word synonyms

2017-03-26 Thread Doug Turnbull
You might have stumbled on all these articles, but you can probably read our org's progression with this problem as a play in 3 acts Act I Introducing the characters http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/ Act II Heroes Meet Despair http://open

Re: Multi word synonyms

2017-03-26 Thread John Blythe
Sure thing. Post back with what you find! Good luck, On Sun, Mar 26, 2017 at 3:36 PM Sanjana Sridhar wrote: > Hi John, > > Thanks for letting me know what works for you. I'm going to try that out. > Sounds like a suitable solution to my problem. > > Best, > Sanjana > > > > On Sun, Mar 26, 2017 at

Re: Classify document using bag of words

2017-03-26 Thread John Blythe
Glad to hear it! On Sun, Mar 26, 2017 at 3:49 PM Sergio García Maroto wrote: > Sorry it actually works. Thanks a lot. > > On 26 March 2017 at 21:45, Sergio García Maroto > wrote: > > > Hi John. > > thanks for that. > > > > That's actually a good option but I would need the category text on the

Re: Streaming expressions and result transformers

2017-03-26 Thread Erick Erickson
No, streaming expressions don't flow through the regular document output process and don't support DocTransformers. Best, Erick On Sun, Mar 26, 2017 at 9:03 AM, adfel70 wrote: > Hi > does streaming expressions support doc transformers? > To be more specific, I have a nested docs data model. > I

RE: Index scanned documents

2017-03-26 Thread Phil Scadden
While building directly into Solr might be appealing, I would argue that it is best to use OCR software first, outside of SOLR, to convert the PDF into "searchable" PDF format. That way when the document is retrieved, it is a lot more useful to the searcher - making it easy to find the text with

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.175.190.72:8999/solr/product: Rollback is currently not supported in SolrCloud mode. (SOLR-4895)

2017-03-26 Thread Mikhail Ibraheem
Hi, When I try to roll back in SolrCloud I get this exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.175.190.72:8999/solr/product: Rollback is currently not supported in SolrCloud mode. (SOLR-4895) Does that mean there is no rol

Re: Classify document using bag of words

2017-03-26 Thread Sergio García Maroto
Hi John, thanks for that. That's actually a good option, but I would need the category text in the field so I can facet on the field and get every category and the number. On 26 March 2017 at 18:27, John Blythe wrote: > You could use keepwords to filter out any other words besides your bag and >

Re: Classify document using bag of words

2017-03-26 Thread Sergio García Maroto
Sorry it actually works. Thanks a lot. On 26 March 2017 at 21:45, Sergio García Maroto wrote: > Hi John. > thanks for that. > > That's actually a good option but I would need the category text on the > field so I can facet on the field and get every category and the number. > > On 26 March 2017

Re: Multi word synonyms

2017-03-26 Thread Sanjana Sridhar
Hi John, Thanks for letting me know what works for you. I'm going to try that out. Sounds like a suitable solution to my problem. Best, Sanjana On Sun, Mar 26, 2017 at 12:30 PM, John Blythe wrote: > I use the keyword tokenizer and then pattern replace to transform multi > words into undersco

Re: Multi word synonyms

2017-03-26 Thread John Blythe
I use the keyword tokenizer and then pattern replace to transform multi words into underscore connected tokens. For instance, "Burger Joint" transforms to "burger_joint" which then looks in my synonym filter for underscored synonyms. When it matches I then replace underscores with spaces or just to
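
The analysis chain John describes can be illustrated in plain Python. This is not Solr code: each step only mimics what the corresponding Solr component would do (KeywordTokenizer, a pattern-replace step that joins words with underscores, the synonym filter, and a final pattern replace back to spaces). The lowercasing step and the synonym entries are assumptions added for the example.

```python
import re

# Hypothetical synonym map keyed on underscore-joined phrases, standing in for
# entries in the synonyms file used by the synonym filter.
SYNONYMS = {
    "burger_joint": "burger_restaurant",
    "hamburger_place": "burger_restaurant",
}


def analyze(value: str) -> str:
    # KeywordTokenizer: the whole field value is one token.
    # Lowercasing is an assumed extra step (like LowerCaseFilterFactory).
    token = value.strip().lower()
    # Pattern replace: runs of whitespace become "_", so a phrase is one token.
    token = re.sub(r"\s+", "_", token)
    # Synonym filter: a plain single-token lookup now covers multi-word phrases.
    token = SYNONYMS.get(token, token)
    # Final pattern replace: underscores back to spaces.
    return token.replace("_", " ")
```

Here `analyze("Burger Joint")` gives `"burger restaurant"`, while an unmapped value such as `"Taco Stand"` passes through as `"taco stand"`. Because the keyword tokenizer keeps the entire field value as a single token, this only matches when the whole value is the phrase, so it suits short fields such as names or categories better than running text.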

Re: Classify document using bag of words

2017-03-26 Thread John Blythe
You could use keepwords to filter out any other words besides your bag and then have a synonym filter that translates the remaining word(s) to a corresponding category/classification On Sun, Mar 26, 2017 at 12:05 PM marotosg wrote: > Hi, > > I have a very simple use case where I would need to cl
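
A plain-Python illustration of the keepwords-plus-synonyms idea above, not Solr code: tokenize and lowercase, keep only the words in the bag (what KeepWordFilterFactory does), then map each remaining word to its category (what a synonym filter with explicit mappings such as `football => sports` would do). The word list and categories are hypothetical.

```python
import re

# Hypothetical bag of words (keepwords file) and word-to-category mappings
# (synonyms file entries of the form "word => category").
KEEP_WORDS = {"football", "tennis", "election", "parliament"}
CATEGORY = {
    "football": "sports",
    "tennis": "sports",
    "election": "politics",
    "parliament": "politics",
}


def categories(text):
    tokens = re.findall(r"\w+", text.lower())      # tokenizer + lowercase
    kept = [t for t in tokens if t in KEEP_WORDS]   # keep-word filter
    return sorted({CATEGORY[t] for t in kept})      # distinct categories, sorted
```

For example, `categories("Tennis results after the election")` returns `["politics", "sports"]`, and text containing none of the bag words returns an empty list. In Solr the mapped category terms become the indexed terms of the field, and faceting on a tokenized field counts its indexed terms; per the follow-up in this thread, that did give per-category counts.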

Classify document using bag of words

2017-03-26 Thread marotosg
Hi, I have a very simple use case where I would need to classify a document using a bag of words. Basically, if a field within the document contains any of the words in my bag, then I use a new field to assign a category to the document. Is this something achievable in Solr? I was thinking of usin

Streaming expressions and result transformers

2017-03-26 Thread adfel70
Hi, do streaming expressions support doc transformers? To be more specific, I have a nested docs data model. I want to use streaming expressions and get the results with ChildDocTransformerFactory. Is it possible?

Multi word synonyms

2017-03-26 Thread Sanjana Sridhar
Hello, Does anyone have a good solution for working with multi word synonyms? I've been reading a lot about this online and haven't really found a great solution to it. I use the SynonymFilterFactory at index time, but words don't really get matched to the appropriate multi word synonyms, even tho

Re: Index scanned documents

2017-03-26 Thread Arian Pasquali
Hi Waleed, I've never done that with Solr, but you would probably need to use some OCR preprocessing before indexing. The most popular library I know for the job is tesseract-ocr. If you want to do that inside Solr, I've found that Tika has some support for that

Re: Index scanned documents

2017-03-26 Thread Zheng Lin Edwin Yeo
I'm also working on this issue right now, to extract the text in the scanned images in PDF files. From what I know, we can use Tesseract OCR to extract the text in the images through Apache Tika, which comes bundled with Solr. By the way, which Solr version are you using? Regards, Edwin
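
A minimal sketch of sending a PDF to Solr's extracting request handler (Solr Cell, which uses Tika), assuming the handler is enabled at `/update/extract` as in Solr's example configurations. The base URL, collection name, document id and file name are hypothetical. Whether images embedded in the PDF are actually OCR'd depends on Tesseract being installed on the Solr host and on the Tika configuration, which this sketch does not cover; the alternative suggested in this thread is to OCR the files into searchable PDFs before indexing.

```python
import urllib.request
from urllib.parse import urlencode


def extract_url(base, collection, doc_id, commit=True):
    """Build the /update/extract URL; literal.id sets the document's id field."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{base}/{collection}/update/extract?{urlencode(params)}"


def post_pdf(url, path):
    """POST the raw PDF bytes to the handler; Tika parses them on the server."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/pdf"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage against a running node would look like `post_pdf(extract_url("http://localhost:8983/solr", "docs", "scan-001"), "scan-001.pdf")`.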

Index scanned documents

2017-03-26 Thread Waleed Raza
Hello, I want to ask how we can extract text in Solr from images that are inside PDF and MS Office documents. I found many websites but did not get an answer to this. Please guide me.

Re: Index scanned documents

2017-03-26 Thread Waleed Raza
Hello, I want to ask how we can extract text in Solr from images that are inside PDF and MS Office documents. I found many websites but did not get an answer to this. Please guide me. On Sun, Mar 26, 2017 at 2:57 PM, Waleed Raza wrote: > Hello > I want to ask you that how can we extract in