Solr indexing PDF attachments not working on Ubuntu

2016-01-23 Thread Moncif Aidi
Hi,

I have a problem integrating Solr on an Ubuntu server. Before deploying
there, I tested it on my Mac, where it worked perfectly and indexed my PDF,
DOC, and DOCX documents. After installing Solr on the Ubuntu server with the
same configuration files and libraries, I found that Solr does not index PDF
documents, although I can still search .doc and .docx documents.
Here are some parts of my solrconfig.xml:


  



  true
  ignored_
  _text_
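
One way to narrow this down is to ask Solr to run extraction on a single PDF without indexing it. The sketch below assumes the extracting handler is registered at the conventional `/update/extract` path and that the surviving values above (`true`, `ignored_`, `_text_`) belong to its defaults; the XML markup is lost, so that is an assumption. The core name `mycore`, the host, and the file path are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url: str, core: str) -> str:
    """Build an /update/extract URL that runs extraction only (nothing is indexed)."""
    params = urlencode({"extractOnly": "true", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/update/extract?{params}"


def extract_only(pdf_path: str,
                 base_url: str = "http://localhost:8983/solr",  # assumed host and port
                 core: str = "mycore") -> str:                   # hypothetical core name
    """POST the raw PDF bytes to Solr and return the response body as text."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_url(base_url, core),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Prints the URL only; call extract_only("/path/to/sample.pdf") against a live server.
    print(build_extract_url("http://localhost:8983/solr", "mycore"))
```

If the response on the Ubuntu server contains an error or no extracted text while the same request on the Mac returns text, the difference is likely in the extraction step rather than in the schema or query side.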

  


-- 
M:+212 658541045
Linkedin | Facebook | *Skype :* moncif44


Indexing documents in Solr 5 using Tika: extraction error

2016-03-25 Thread Moncif Aidi
 
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
    ... 9 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:407)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:256)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:196)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:105)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    ... 12 more



Regards,

*Moncif AIDI*. Team Lead Engineer at TeslaTeam-Maroc
<http://www.teslateam.ma/>
M:+212 658 541 045 | T:+212 537 70 81 21
Linkedin
<https://www.linkedin.com/profile/view?id=131220035&trk=nav_responsive_tab_profile>
 | Facebook <https://www.facebook.com/M0ziNsof> | Twitter
<http://twitter.com/teslateam> | *Skype :* moncif44


Facet full-text

2018-03-06 Thread Moncif Aidi
Hello,

I am using Solr to power faceting features for our application.

I know that Solr supports free-text search, but what is the best practice
for faceting on common terms inside Solr text fields?

For example, we have a large blob of text (a description of a property)
which contains useful text to facet on like 'city', 'formation', 'year',
'school', 'skill', ... dozens more like these.

I would like to create a view which lets users see the number of properties
with each of these terms and allow the users to drill down to the relevant
properties.

One obvious solution is to pre-process the data, parse the text, and create
the facets for each of these key phrases with a boolean yes/no value.
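
A minimal sketch of that pre-processing idea, assuming the key phrases are known in advance. The phrase list, the `has_` field prefix, and the sample document are hypothetical; a real schema would need matching boolean fields.

```python
import re

# Hypothetical key phrases; the real list would come from the application.
KEY_PHRASES = ["city", "school", "skill", "year"]


def phrase_flags(text, phrases=KEY_PHRASES):
    """Return {"has_<phrase>": bool} for each phrase, matched on word boundaries."""
    lowered = text.lower()
    flags = {}
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        field = "has_" + phrase.replace(" ", "_")
        flags[field] = re.search(pattern, lowered) is not None
    return flags


# Enrich a document before sending it to the index.
doc = {"id": "1", "description": "Close to the city school, built last year."}
doc.update(phrase_flags(doc["description"]))
print(doc)
```

Each `has_*` field can then be faceted like any other boolean field. The drawback is that the phrase list must be maintained by hand and documents re-indexed when it changes.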

Ideally I would like to automate this, and I imagine the Solr free-text
search engine might allow it. For example, can I use the free-text search
engine to remove stop words and collect counts of common phrases that we can
then present to the user?
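
Outside Solr, the counting part of this can be sketched as follows: tokenize each description, drop stop words, and count how many documents contain each remaining term. The stop-word list and sample descriptions are illustrative assumptions, and the sketch counts single terms rather than multi-word phrases.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "with", "for", "on"}


def tokens(text):
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def document_frequencies(texts):
    """Count, for each term, the number of texts that contain it at least once."""
    counts = Counter()
    for text in texts:
        counts.update(set(tokens(text)))  # set(): one count per document
    return counts


descriptions = [
    "A school in the city",
    "The city and the year of the school",
    "Skill with a school",
]
df = document_frequencies(descriptions)
print(df.most_common(3))
```

Counting documents rather than raw occurrences matches what a facet count means: the number of matching records a user would see after drilling down.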

If pre-processing is the only way, is there a common/best practice approach
to this or any open source libraries which perform this function?

What is the best practice for counting and grouping common phrases from a
text field in Solr?
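
For comparison, Solr can return per-term document counts for an indexed field through its standard facet parameters, and stop words are excluded if the field's analyzer removes them. The sketch below only builds such a request URL; the host, the core name `properties`, and the field name `description` are hypothetical.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, core, field, limit=50, mincount=2):
    """Build a /select URL that returns facet counts for `field` and no documents."""
    params = urlencode({
        "q": "*:*",                  # match every document
        "rows": 0,                   # only the facet counts are needed
        "facet": "true",
        "facet.field": field,        # terms come from the field's indexed tokens
        "facet.limit": limit,        # top N terms
        "facet.mincount": mincount,  # skip terms found in fewer documents
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


print(build_facet_url("http://localhost:8983/solr", "properties", "description"))
```

Faceting on a large tokenized field can be memory-intensive, so this is a starting point to evaluate rather than a recommendation.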


Regards,