LucidWorks Solr

2010-04-18 Thread Andy
Just wanted to know if anyone has used LucidWorks Solr. 

- How does it compare to the standard Apache Solr?

- The non-blocking IO of LucidWorks Solr -- is that for network IO or disk 
IO? What are its effects?

- The LucidWorks website also mentions "significantly improved faceting 
performance" -- what are those improvements, and how large are they?

Would you recommend using it?

Thanks.


  


Re: LucidWorks Solr

2010-04-18 Thread Paolo Castagna

Thanks for asking, I am interested as well in reading the response to
your questions.

Paolo

Andy wrote:
Just wanted to know if anyone has used LucidWorks Solr. 


- How do you compare it to the standard Apache Solr?

- the non-blocking IO of LucidWorks Solr -- is that for networking IO or disk 
IO? what are its effects?

- LucidWorks website also talked about "significantly improved faceting 
performance" -- what improvements are they? How much improvements?

Would you recommend using it?

Thanks.


  


Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread Praveen Agrawal
Hi,
I need to submit thousands of online PDF/HTML files to Solr. I can submit
one file using SolrJ (StreamingUpdateSolrServer and
..solr.common.util.ContentStreamBase.URLStream), setting the literal.id
parameter to the URL. I can't do the same with a batch of multiple files, as
their 'id' values must be unique (set to their URLs).

I couldn't get this to work. Is there a way to have the 'id' field
set automatically to the URL of each file posted to Solr (something like
'stream_name')? How do I set this in solrconfig.xml or schema.xml, or is there
another way?

Thanks.
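
A minimal SolrJ sketch of the single-file case described above, assuming the
extracting handler is mapped at /update/extract; the Solr URL, the document
URL, and the class name PostOnePdf are placeholders, not part of the original
message.

import java.net.URL;

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class PostOnePdf {
    public static void main(String[] args) throws Exception {
        // Placeholder Solr URL and document URL.
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8080/solr", 10, 2);
        String fileUrl = "http://example.com/docs/sample.pdf";

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addContentStream(new ContentStreamBase.URLStream(new URL(fileUrl)));
        req.setParam("literal.id", fileUrl);              // unique id = the document's URL
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}

Looping over a list of URLs and sending one such request per file keeps
literal.id unique per document, at the cost of one request per file.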


Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread pk

Hi,
I need to submit thousands of online PDF/html files to Solr. I can submit
one file using SolrJ (StreamingUpdateSolrServer and
..solr.common.util.ContentStreamBase.URLStream), setting literal.id
parameter to the url. I can't do the same with a batch of multiple files, as
their 'id' should be unique (set to their urls).

I couldn't get this to work. Is there a way to have the 'id' field
set automatically to the URL of each file posted to Solr (something like
'stream_name')? How do I set this in solrconfig.xml or schema.xml, or is there
another way?

If their URL can be put in some other field (like 'url' itself), that will
also serve my purpose.

Thanks for your help.
-- 
View this message in context: 
http://n3.nabble.com/Autofill-id-field-with-the-URL-of-files-posted-to-Solr-tp727985p727985.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet count problem

2010-04-18 Thread Ranveer Kumar
I am using text for the type field, which is static. For example, type is a field
I am using for categorization: for news items the type is 'news' and for
blog entries it is 'blog'. The type field is a text field.

On Apr 17, 2010 8:38 PM, "Ahmet Arslan"  wrote:

> I am facing problem to get facet result count. I must be wrong
> somewhere. I am getting proper ...
Are you faceting on a tokenized field? What is the fieldType of your field?


Solr throws TikaException while parsing sample PDF

2010-04-18 Thread pk

Hi,
While posting a sample PDF (one that ships with the Solr distribution) to Solr, I'm
getting a TikaException.
I am using Solr 1.4 and SolrJ (StreamingUpdateSolrServer) to post PDFs to Solr.
Other sample PDFs can be parsed and indexed successfully. I'm getting the same
error with some other PDFs as well (but Adobe Reader can open them fine, so I
don't think they are malformed or corrupt)... Here is
the trace:


found uploaded file : C:\solr_1.4.0\docs\Installing Solr in Tomcat.pdf ::
size=286242
Apr 18, 2010 10:31:34 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {} 0 640
Apr 18, 2010 10:31:34 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Una
ble to extract PDF content
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu
mentLoader.java:211)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStrea
mHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.jav
a:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(Re
questHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241
)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFil
terChain.java:215)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain
.java:188)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:
213)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:
172)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:10
8)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConn
ection(Http11BaseProtocol.java:665)
at
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:5
28)
at
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorke
rThread.java:81)
at
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:6
89)
at java.lang.Thread.run(Thread.java:595)
Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF
content
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:58)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu
mentLoader.java:190)
... 20 more
Caused by: java.util.zip.ZipException: incorrect header check
at
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132)
at
org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202)
at
org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
at
org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
at
org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
at
org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
... 24 more

Apr 18, 2010 10:31:34 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract
params={wt=javabin&waitFlush=true&literal.index
Date=2010-04-18+&commit=true&waitSearcher=true&version=1&literal.id=C%253A%255Csolr_1.4.0%
255Cdocs%255CInstalling%2BSolr%2Bin%2BTomcat.pdf} status=500 QTime=640
Exception in handling an uplaoded file:C:\solr_1.4.0\docs\Installing Solr in
Tomcat.pdf :
Internal Server Error

Internal Server Error

request:
http://localhost:8080/solr/update/extract?literal.id=

Re: Solr Schema Question

2010-04-18 Thread Serdar Sahin
Thanks everyone, it works! I have successfully indexed them. Thanks again!

I have a couple more questions regarding Solr, if you don't mind.

1-) As I said before, the text files are quite large, between
100 KB and 10 MB, but I need to store them as well for highlighting,
along with their title, description, and tags (I concatenate the tags while
fetching from the DB and treat them as one row). For the search result page,
I also have to get:

username  (string)
lang (string)
cat (string)
view_count (int)
imgid (int)
thumbs_up (int)
thumbs_down (int)

these columns as well. These columns are not used for indexing, just
for storing. Do you think it is a better idea to store these columns in Solr
as well and not query the database, or should I just get the IDs and query
the database myself? Which approach is better from a memory-usage and
performance perspective? I was using Sphinx for full-text search on
my production websites, so I am not used to this model, as Sphinx only
returns document IDs. (See the sketch after these questions.)

2-) I was using Sphinx for other purposes as well, like the "browse"
section on the website (e.g. http://www.youtube.com/videos). It gives better
performance on large datasets (sorting, ordering, etc.). I know some
people also use Solr (Lucene) for this, but I have not seen any website
that uses Solr for its "browse" section without using facets. So, even
if I don't use facets, is it still useful to use Solr for that section?
I will be storing a large amount of data in Solr, and expect to have 1
TB of data after 6-8 months.

3-) I will be using the http://wiki.apache.org/solr/MoreLikeThis option
too. As I said, the text files are large. Do you have any suggestions
regarding this feature?
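
A minimal SolrJ sketch of the Solr-only approach from question 1-) above,
assuming the listed columns are stored fields in the index; the query string,
server URL, and class name are placeholders.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class StoredFieldsExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("text:example");
        // fl limits the response to the stored columns needed to render the result page.
        query.set("fl", "id,username,lang,cat,view_count,imgid,thumbs_up,thumbs_down");
        query.setRows(10);

        for (SolrDocument doc : server.query(query).getResults()) {
            System.out.println(doc.getFieldValue("username") + " / " + doc.getFieldValue("cat"));
        }
    }
}

Whether this beats fetching the same rows from MySQL by ID depends mostly on
index size and cache hit rates, so it is worth benchmarking both approaches.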

Thanks again,





On Sun, Apr 18, 2010 at 7:53 AM, Lance Norskog  wrote:
> Man you people are fast!
>
> There is a bug in Solr/Lucene. It keeps memory around from previous
> fields, so giant text files might run out of memory when they should
> not. This bug is fixed in the trunk.
>
> On 4/17/10, Lance Norskog  wrote:
>> The DataImportHandler can let you fetch the file name from the
>> database record, and then load the file as a field and process the
>> text with Tika.
>>
>> It will not be easy :) but it is possible.
>>
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> On 4/17/10, Serdar Sahin  wrote:
>>> Hi,
>>>
>>> I am rather new to Solr and have a question.
>>>
>>> We have around 200.000 txt files which are placed into the file cloud.
>>> The file path is something similar to this:
>>>
>>> file/97/8f/840/fa4-1.txt
>>> file/a6/9d/ab0/ca2-2.txt etc.
>>>
>>> and we also store the metadata (like title, description, tags etc)
>>> about these files in the mysql server. So, what I want to do is to
>>> index title, description, tags and other data from mysql, and also get
>>> the txt file from file server, and link them as one record for
>>> searching, but I could not figure out how to automatize this process.
>>> I can give the path from the sql query like, Select id, title,
>>> description, file_path, and then solr can use this path to retrieve
>>> txt file, but I don't know whether is it possible or not.
>>>
>>> What is the best way to index these files with their tag title and
>>> description without coding in Java (Perl is ok). These txt files are
>>> large, between 100kb-10mb, so the last option is to store them in the
>>> database.
>>>
>>> Thanks,
>>>
>>> Serdar
>>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Solr throws TikaException while parsing sample PDF

2010-04-18 Thread Grant Ingersoll
Can you extract content from this using Tika's standalone command-line tool?  
PDFs are notorious for extraction problems.  To me, it looks like a bug in 
PDFBox.  I would try to isolate it down to that level and then, if possible, send 
the sample document to the PDFBox project and see if they can come up with a fix.

-Grant
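
A minimal sketch of such an isolated check, using Tika's Java API directly
(rather than the command-line tool Grant mentions), with the Tika 0.x API that
ships with Solr 1.4 assumed; the file path and class name are placeholders.

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

public class TikaCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder path to the PDF that fails inside Solr.
        InputStream in = new FileInputStream("Installing Solr in Tomcat.pdf");
        try {
            ContentHandler text = new BodyContentHandler();
            // If this throws the same TikaException, the problem is in Tika/PDFBox, not Solr.
            new AutoDetectParser().parse(in, text, new Metadata());
            System.out.println(text.toString());
        } finally {
            in.close();
        }
    }
}

If the same ZipException/TikaException appears here, the failure is in
Tika/PDFBox rather than in Solr's ExtractingRequestHandler.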

On Apr 18, 2010, at 1:12 PM, pk wrote:

> 
> Hi,
> while posting a sample pdf (that comes with Solr dist'n) to solr, i'm
> getting a TikaException. 
> Using Solr-1.4, SolrJ (StreamingUpdateSolrServer) for posting pdf to solr.
> Other sample pdfs can be parsed and indexed successfully.. I;m getting same
> error with some other pdfs also (but adobe reader can open them fine, so i
> dont think they have an issue in formatting or are corrupt etc)... Here is
> the trace...
> 
> 
> found uploaded file : C:\solr_1.4.0\docs\Installing Solr in Tomcat.pdf ::
> size=286242
> Apr 18, 2010 10:31:34 PM org.apache.solr.update.processor.LogUpdateProcessor
> finish
> INFO: {} 0 640
> Apr 18, 2010 10:31:34 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Una
> ble to extract PDF content
>at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu
> mentLoader.java:211)
>at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStrea
> mHandlerBase.java:54)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.jav
> a:131)
>at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(Re
> questHandlers.java:233)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> 
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241
> )
>at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFil
> terChain.java:215)
>at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain
> .java:188)
>at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:
> 213)
>at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:
> 172)
>at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:10
> 8)
>at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
>at
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConn
> ection(Http11BaseProtocol.java:665)
>at
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:5
> 28)
>at
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorke
> rThread.java:81)
>at
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:6
> 89)
>at java.lang.Thread.run(Thread.java:595)
> Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF
> content
>at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:58)
>at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:51)
>at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocu
> mentLoader.java:190)
>... 20 more
> Caused by: java.util.zip.ZipException: incorrect header check
>at
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
>at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
>at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
>at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
>at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
>at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
>at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132)
>at
> org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202)
>at
> org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
>at
> org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
>at
> org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
>at
> org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
>at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
> 

Re: LucidWorks Solr

2010-04-18 Thread Grant Ingersoll

On Apr 18, 2010, at 3:53 AM, Andy wrote:

> Just wanted to know if anyone has used LucidWorks Solr. 
> 
> - How do you compare it to the standard Apache Solr?

We take a release of Solr.  We wrap it w/ an installer, tomcat/jetty, our 
reference guide, Luke, etc.  We also add in an optimized version of KStem.  
Finally, we apply certain patches that came in after that release and so didn't 
make it into it (we usually delay our release by a few weeks).  
Many of the things we package simply cannot be in an ASF release b/c of ASF 
policies; others are there for convenience, so that people don't have to go all 
over the web to get them.

> 
> - the non-blocking IO of LucidWorks Solr -- is that for networking IO or disk 
> IO? what are its effects?

I think this is a legacy from the 1.3 CD on our website.  I believe what this 
is referring to is in Solr 1.4, as it was a patch that was applied to trunk 
after 1.3 was released.   I'll let our web team know to update that.

> 
> - LucidWorks website also talked about "significantly improved faceting 
> performance" -- what improvements are they? How much improvements?

Same as the previous issue.   I'll let our web team know to update that.

> 
> Would you recommend using it?
> 

Sure, but I'm biased. ;-)  Hopefully, you will find it useful, but choose the 
one that best fits your needs (and let me know if you need help assessing that.)

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: geometric distance

2010-04-18 Thread Darren Govoni
AFAIK, there are no columns per se. In the past I've just stored UTM
values for each lat/lon pair and used basic numeric range operators
(>, <) to search within a bounding geographic region. Add them as
numeric fields, though. Easy.

There is new support for spatial searching; however, I'm not sure how it
compares to what I described, which works great.
It probably does some automatic conversions or something. Check the wiki.
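
A minimal SolrJ sketch of the bounding-box approach described above, assuming
the UTM easting/northing were indexed as numeric fields; the field names,
ranges, server URL, and class name are hypothetical.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoundingBoxSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        // Hypothetical numeric fields holding UTM coordinates; range filters act as the bounding box.
        query.addFilterQuery("utm_easting:[500000 TO 510000]");
        query.addFilterQuery("utm_northing:[4640000 TO 4650000]");

        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}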

On Sat, 2010-04-17 at 18:39 -0700, Dennis Gearon wrote:

> How does solr/lucene do geometric distances?
> 
> Does it use a GEOS point datum, or two columns one for latitude, one for 
> longitude?
> 
> 
> Dennis Gearon
> 
> Signature Warning
> 
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php




Re: Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread Lance Norskog
The DataImportHandler has a tool for doing PDF extraction. This allows
you to create new fields, process multiple files, and supply lists of
locations from which to fetch those files.

http://wiki.apache.org/solr/TikaEntityProcessor

On Sun, Apr 18, 2010 at 9:52 AM, pk  wrote:
>
> Hi,
> I need to submit thousands of online PDF/html files to Solr. I can submit
> one file using SolrJ (StreamingUpdateSolrServer and
> ..solr.common.util.ContentStreamBase.URLStream), setting literal.id
> parameter to the url. I can't do the same with a batch of multiple files, as
> their 'id' should be unique (set to their urls).
>
> I couldn't get this to work. Is there a way to somehow get the 'id' field
> set automatically to the url of the files posted to Solr (something like to
> 'stream_name')? How to set this in solrconfig.xml or schema.xml?  or any
> other way?
>
> If their url can be put in some other field (like 'url' iitself) that will
> also serve my purpose.
>
> Thanks for your help.
> --
> View this message in context: 
> http://n3.nabble.com/Autofill-id-field-with-the-URL-of-files-posted-to-Solr-tp727985p727985.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr Schema Question

2010-04-18 Thread Lance Norskog
Highlighting is a complex topic. A field has to be stored to be
highlighted. It does not have to be indexed, but if it is not,
highlighting analyzes it just as if it were indexed in order to
highlight it.

http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighting

http://www.lucidimagination.com/blog/2009/02/17/highlighting-highlighter-thoughts/
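
A minimal SolrJ sketch of requesting highlighting on a stored field, assuming a
stored field named 'content'; the field name, query, server URL, and class name
are placeholders.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("title:solr");
        query.setHighlight(true);                 // equivalent to hl=true
        query.addHighlightField("content");       // hypothetical stored (not necessarily indexed) field
        query.setHighlightSnippets(2);            // hl.snippets=2

        QueryResponse rsp = server.query(query);
        // Highlight fragments are keyed by document id, then by field name.
        System.out.println(rsp.getHighlighting());
    }
}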

On Sun, Apr 18, 2010 at 10:12 AM, Serdar Sahin  wrote:
> Thanks everyone, It works! I have successfully indexed them. Thanks again!
>
> I have couple of more questions regarding with solr, if you don't mind.
>
> 1-) As I said before, the text files are quite large, between
> 100kb-10mb, but I need to store them as well for highlighting,
> including with their title, description, tags (I concat tags while
> fetching from the db, and treat them as one row). For search result on
> the page, I have to get;
>
> username  (string)
> lang (string)
> cat (string)
> view_count (int)
> imgid (int)
> thumbs_up (int)
> thumbs_down (int)
>
> these columns as well. These columns are not used for indexing, just
> for storing. Do you think it is better idea to store these columns as
> well and not query the database? Or, I can just get the ids and query
> the database myself. Which approach is better from memory usage and
> performance perspective? I was using Sphinx for full text searching on
> my production websites, so I am not used to this format as Sphinx only
> returns document IDs.
>
> 2-) I was using Sphinx for other purposes as well, like "browse"
> section on the website. http://www.youtube.com/videos. It gives better
> performance on large datasets (sorting, ordering etc). I know some
> people also use solr(lucene) for this, but I have not seen any website
> that use solr on their "browse" section without using Facets. So, even
> if I don't use Facets, is it still useful to use solr on that section?
> I will be storing a large amount of data on solr, and expect to have 1
> TB data after 6-8 months.
>
> 3-) I will be using http://wiki.apache.org/solr/MoreLikeThis option
> too. As I said the text files are large. Do you have any suggestions
> regarding with this feature?
>
> Thanks again,
>
>
>
>
>
> On Sun, Apr 18, 2010 at 7:53 AM, Lance Norskog  wrote:
>> Man you people are fast!
>>
>> There is a bug in Solr/Lucene. It keeps memory around from previous
>> fields, so giant text files might run out of memory when they should
>> not. This bug is fixed in the trunk.
>>
>> On 4/17/10, Lance Norskog  wrote:
>>> The DataImportHandler can let you fetch the file name from the
>>> database record, and then load the file as a field and process the
>>> text with Tika.
>>>
>>> It will not be easy :) but it is possible.
>>>
>>> http://wiki.apache.org/solr/DataImportHandler
>>>
>>> On 4/17/10, Serdar Sahin  wrote:
 Hi,

 I am rather new to Solr and have a question.

 We have around 200.000 txt files which are placed into the file cloud.
 The file path is something similar to this:

 file/97/8f/840/fa4-1.txt
 file/a6/9d/ab0/ca2-2.txt etc.

 and we also store the metadata (like title, description, tags etc)
 about these files in the mysql server. So, what I want to do is to
 index title, description, tags and other data from mysql, and also get
 the txt file from file server, and link them as one record for
 searching, but I could not figure out how to automatize this process.
 I can give the path from the sql query like, Select id, title,
 description, file_path, and then solr can use this path to retrieve
 txt file, but I don't know whether is it possible or not.

 What is the best way to index these files with their tag title and
 description without coding in Java (Perl is ok). These txt files are
 large, between 100kb-10mb, so the last option is to store them in the
 database.

 Thanks,

 Serdar

>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Facet count problem

2010-04-18 Thread Erick Erickson
Can we see the actual field definitions from your schema file.
Ahmet's question is vital and is best answered if you'll
copy/paste the relevant configuration entries But based
on what you *have* posted, I'd guess you're trying to
facet on tokenized fields, which is not recommended.

You might take a look at:
http://wiki.apache.org/solr/UsingMailingLists, it'll help you
frame your questions in a manner that gets you your
answers as fast as possibld.

Best
Erick
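
A minimal SolrJ sketch of faceting on an untokenized field, along the lines
Erick suggests, assuming the 'type' values are (or are copied to) a
non-tokenized string field; the field name 'type_s', server URL, and class
name are hypothetical.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetCountExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.addFacetField("type_s");   // hypothetical string (non-tokenized) copy of the 'type' field
        query.setFacetMinCount(1);
        query.setRows(0);                // counts only, no documents

        QueryResponse rsp = server.query(query);
        for (FacetField.Count c : rsp.getFacetField("type_s").getValues()) {
            System.out.println(c.getName() + " -> " + c.getCount());
        }
    }
}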

On Sun, Apr 18, 2010 at 12:59 PM, Ranveer Kumar wrote:

> I am.using text for type, which is static. For example: type is a field and
> I am using type for categorization. For news type I am using news and for
> blog using blog.. type is a text field.
>
> On Apr 17, 2010 8:38 PM, "Ahmet Arslan"  wrote:
>
> > I am facing problem to get facet result count. I must be wrong
> > somewhere. I am getting proper ...
> Are you faceting on a tokenized field? What is the fieldType of your field?
>


Re: DIH dataimport.properties with

2010-04-18 Thread Michael Tibben
Because there is a lot of data, and for scalability reasons we want all 
non-write operations to happen from a slave - we don't want to be using 
the master unless necessary.



On 17/04/10 08:28, Otis Gospodnetic wrote:

Hm, why not just go to the MySQL master then?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 

From: Michael Tibben
To: solr-user@lucene.apache.org
Sent: Thu, April 15, 2010 10:15:14 PM
Subject: DIH dataimport.properties with

Hi,

I am using the DIH to import data from a mysql slave. However, the slave
sometimes runs behind the master. The delay is variable; most of the time it
is in sync, but sometimes it can run behind by a few minutes.

This is a problem, because DIH uses dataimport.properties to determine the
last_index_time for delta updates. This last_index_time does not correspond
to the position of the slave, and so documents are being missed.

What I need to be able to do is tell DIH what the last_index_time should be.
Or alternatively, be able to specify another property in dataimport.properties,
perhaps called datasource_version or similar.

Is this possible?

I have thought of a sneaky way to hack around the issue. Just before the delta
update is run, I will switch the system time to the mysql slave's replication
time. The system is used for nothing but the solr master, so I think this
should work OK. Any thoughts?

Regards,

Michael


Re: DIH dataimport.properties with

2010-04-18 Thread Michael Tibben

I don't really understand how this will help. Can you elaborate?

Do you mean that the last_index_time can be imported from somewhere 
outside Solr? But I need to be able to *set* what last_index_time is 
stored in dataimport.properties, not get properties from somewhere else.
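
One possible sketch of setting last_index_time directly, assuming DIH's default
dataimport.properties layout (a 'last_index_time' key in 'yyyy-MM-dd HH:mm:ss'
format); the file path, the epoch-millis argument, and the class name are
assumptions, and this is a workaround rather than a supported DIH feature.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

public class SetLastIndexTime {
    public static void main(String[] args) throws Exception {
        // Placeholder path to the core's conf directory.
        File file = new File("/opt/solr/core0/conf/dataimport.properties");

        Properties props = new Properties();
        if (file.exists()) {
            FileInputStream in = new FileInputStream(file);
            props.load(in);
            in.close();
        }

        // args[0]: the slave's replication position as epoch millis (an assumption of this sketch).
        String slaveTime = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
                .format(new Date(Long.parseLong(args[0])));
        props.setProperty("last_index_time", slaveTime);

        FileOutputStream out = new FileOutputStream(file);
        props.store(out, "overwritten with the slave's replication time before delta-import");
        out.close();
    }
}

Run just before triggering the delta-import so DIH reads the slave's position
instead of the wall-clock time of the previous import.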




On 18/04/10 10:02, Lance Norskog wrote:

The SolrEntityProcessor allows you to query a Solr instance and use
the results as DIH properties. You would have to create your own
regular query to do the delta-import instead of using the delta-import
feature.

https://issues.apache.org/jira/browse/SOLR-1499

On 4/16/10, Otis Gospodnetic  wrote:

Hm, why not just go to the MySQL master then?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/


- Original Message 

From: Michael Tibben
To: solr-user@lucene.apache.org
Sent: Thu, April 15, 2010 10:15:14 PM
Subject: DIH dataimport.properties with

Hi,

I am using the DIH to import data from a mysql slave. However, the slave
sometimes runs behind the master. The delay is variable; most of the time it
is in sync, but sometimes it can run behind by a few minutes.

This is a problem, because DIH uses dataimport.properties to determine the
last_index_time for delta updates. This last_index_time does not correspond
to the position of the slave, and so documents are being missed.

What I need to be able to do is tell DIH what the last_index_time should be.
Or alternatively, be able to specify another property in dataimport.properties,
perhaps called datasource_version or similar.

Is this possible?

I have thought of a sneaky way to hack around the issue. Just before the delta
update is run, I will switch the system time to the mysql slave's replication
time. The system is used for nothing but the solr master, so I think this
should work OK. Any thoughts?

Regards,

Michael


Re: Facet count problem

2010-04-18 Thread Ranveer Kumar
Hi Erick,

My schema configuration is as follows:

[schema.xml field definitions were stripped by the list archive and are not recoverable]

On Mon, Apr 19, 2010 at 6:22 AM, Erick Erickson wrote:

> Can we see the actual field definitions from your schema file.
> Ahmet's question is vital and is best answered if you'll
> copy/paste the relevant configuration entries But based
> on what you *have* posted, I'd guess you're trying to
> facet on tokenized fields, which is not recommended.
>
> You might take a look at:
> http://wiki.apache.org/solr/UsingMailingLists, it'll help you
> frame your questions in a manner that gets you your
> answers as fast as possibld.
>
> Best
> Erick
>
> On Sun, Apr 18, 2010 at 12:59 PM, Ranveer Kumar  >wrote:
>
> > I am.using text for type, which is static. For example: type is a field
> and
> > I am using type for categorization. For news type I am using news and for
> > blog using blog.. type is a text field.
> >
> > On Apr 17, 2010 8:38 PM, "Ahmet Arslan"  wrote:
> >
> > > I am facing problem to get facet result count. I must be > wrong
> > somewhere. > I am getting proper ...
> > Are you faceting on a tokenized field? What is the fieldType of your
> field?
> >
>


Re: LucidWorks Solr

2010-04-18 Thread Andy


--- On Sun, 4/18/10, Grant Ingersoll  wrote:
 
> 
> Sure, but I'm biased. ;-)  Hopefully, you will find it
> useful, but choose the one that best fits your needs (and
> let me know if you need help assessing that.)
> 

Thanks for the explanation Grant.

What is the advantage of KStem over the standard Solr stemmer?

On your website it was mentioned that KStem only works for English. What would 
happen if some of my documents are in other languages? What about the standard 
Solr stemmer -- does it work only on English as well?

Is there a stemmer that's sort of "universal" and works on multiple languages?





Re: Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread pk

Lance,
I can submit and extract PDF contents using Solr and SolrJ, as I indicated
earlier.
I've made 'id' a mandatory field, and I had to submit its value while
submitting (request.addParams("literal.id",url)).

If I put multiple files/streams in the request, then I can't set 'id' this
way, as the params are common to all files/streams, which is not what I want.

If somehow I can map the stream_name/url of the files to the 'id' field, that's all
I need.
Thanks.

-- 
View this message in context: 
http://n3.nabble.com/Autofill-id-field-with-the-URL-of-files-posted-to-Solr-tp727985p728932.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query regarding "copyField"

2010-04-18 Thread Sandhya Agarwal
Hello,

Is it a problem if I use *copyField* for some fields and not for others? In my 
query, I have both kinds of fields: the ones mentioned in copyField and ones that 
are not copied to a common destination. Will this cause an anomaly in my search 
results? I am seeing some weird behavior.

Thanks,
Sandhya