semantic Search in Farsi News for more relevant results returned from search engines

2012-08-25 Thread Alireza Kh


 
Best regards,

- Forwarded Message -
From: Alireza Kh 
To: "u...@uima.apache.org"  
Sent: Tuesday, August 21, 2012 4:14 PM
Subject: 
 

I am a graduate student; my name is Ali Raza Khodabakhshi. My thesis title is
"Semantic Search in Farsi News for more relevant results returned from search
engines". From the research I have done, I realized that the tools Solr, Nutch,
SIREn, and UIMA can help me with this, but I have doubts about some aspects.

Best regards,
1- Do these applications fully support the Persian language?
2- For semantic search engines, is there another tool to add to the above list?
Faithfully yours,
MSc, Computer Engineer (Software)
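On question 1: Lucene/Solr does ship Persian analysis components out of the box. As a sketch, a field type along these lines appears in Solr's example schema.xml (close to its "text_fa" type; verify the factory names and stopword path against your version before relying on them):

```xml
<!-- Persian text field type: a sketch to check against your Solr version. -->
<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Maps the zero-width non-joiner to a space so compounds tokenize -->
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Persian uses Arabic script, so both normalizers apply -->
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt"/>
  </analyzer>
</fieldType>
```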

Re: semantic Search in Farsi News for more relevant results returned from search engines

2012-08-25 Thread Jack Krupansky
Could you detail the specific requirements for "fully support Persian 
language"?


What are the qualities, aspects, and characteristics that need support, both 
for indexing of content and processing of queries?


-- Jack Krupansky

-Original Message- 
From: Alireza Kh

Sent: Saturday, August 25, 2012 6:20 AM
To: solr-user@lucene.apache.org
Subject: semantic Search in Farsi News for more relevant results returned 
from search engines





Best regards,

- Forwarded Message -
From: Alireza Kh 
To: "u...@uima.apache.org" 
Sent: Tuesday, August 21, 2012 4:14 PM
Subject:


I am a graduate student; my name is Ali Raza Khodabakhshi. My thesis title is
"Semantic Search in Farsi News for more relevant results returned from search
engines". From the research I have done, I realized that the tools Solr, Nutch,
SIREn, and UIMA can help me with this, but I have doubts about some aspects.


Best regards,
1- Do these applications fully support the Persian language?
2- For semantic search engines, is there another tool to add to the above list?
Faithfully yours
MSc, Computer Engineer (Software) 




RE: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

2012-08-25 Thread Fuad Efendi

This is a bug in the Solr 4.0.0-Beta Schema Browser: "Load Term Info" shows "9682
News", but a direct query shows 3577.

/solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="q">channel:News</str>
      <str name="facet.field">channel</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="3577" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="channel">
        <int name="News">3577</int>
      </lst>
    </lst>
  </lst>
</response>
 


-Original Message-
Sent: August-24-12 11:29 PM
To: solr-user@lucene.apache.org
Cc: sole-...@lucene.apache.org
Subject: RE: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser
Importance: High

Any news? 
CC: Dev


-Original Message-
Subject: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

Hi there,

"Load Term Info" shows 3650 for a specific term "MyTerm", but when I execute the
query "channel:MyTerm" it shows 650 documents found... possibly a bug? It
happens after I commit data too; nothing changes. The field is a single-valued,
non-tokenized string.

-Fuad

--
Fuad Efendi
416-993-2060
http://www.tokenizer.ca






Re: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

2012-08-25 Thread Ryan McKinley
If you optimize the index, are the results the same?

Maybe it is showing counts for deleted docs (I think it does, and this is
expected).

ryan


On Sat, Aug 25, 2012 at 9:57 AM, Fuad Efendi  wrote:
>
> This is a bug in the Solr 4.0.0-Beta Schema Browser: "Load Term Info" shows "9682
> News", but a direct query shows 3577.
>
> /solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="facet">true</str>
>       <str name="q">channel:News</str>
>       <str name="facet.field">channel</str>
>       <str name="rows">0</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="3577" start="0"/>
>   <lst name="facet_counts">
>     <lst name="facet_fields">
>       <lst name="channel">
>         <int name="News">3577</int>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
>
> -Original Message-
> Sent: August-24-12 11:29 PM
> To: solr-user@lucene.apache.org
> Cc: sole-...@lucene.apache.org
> Subject: RE: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser
> Importance: High
>
> Any news?
> CC: Dev
>
>
> -Original Message-
> Subject: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser
>
> Hi there,
>
> "Load Term Info" shows 3650 for a specific term "MyTerm", but when I execute the
> query "channel:MyTerm" it shows 650 documents found... possibly a bug? It
> happens after I commit data too; nothing changes. The field is a single-valued,
> non-tokenized string.
>
> -Fuad
>
> --
> Fuad Efendi
> 416-993-2060
> http://www.tokenizer.ca
>
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>


Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-25 Thread Ramzi Alqrainy
It will never return zero results, because the threshold is relative to the
score of the previous result:

If score < 0.25 * last_score, then stop.

Since score > 0, and last_score is 0 for the initial hit, the cutoff never
triggers on the first document.
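The stopping rule above can be sketched as a client-side post-filter over the returned scores. This is an illustrative sketch only: the `cut` helper and the 0.25 ratio are examples, not a Solr API.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeScoreCutoff {

    // Keep hits while each score stays within `ratio` of the previous
    // (higher) score; stop at the first sharp relative drop-off.
    static List<Double> cut(List<Double> scores, double ratio) {
        List<Double> kept = new ArrayList<>();
        double last = Double.NaN; // no previous score before the first hit
        for (double s : scores) {
            if (!Double.isNaN(last) && s < ratio * last) {
                break; // relative drop: stop returning results
            }
            kept.add(s);
            last = s;
        }
        return kept;
    }

    public static void main(String[] args) {
        // 0.8 passes (0.8 >= 0.25 * 0.9); 0.1 < 0.25 * 0.8 stops the scan
        System.out.println(cut(List.of(0.9, 0.8, 0.1, 0.05), 0.25)); // [0.9, 0.8]
    }
}
```

Note that the first hit is always kept, so the filter can never return an empty list for a non-empty result set, which is the point made above.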



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Score-threshold-reasonably-independent-of-results-returned-tp4002312p4003247.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-25 Thread Ramzi Alqrainy
You are right, Mr. Ravish: this depends on the ranking formula and on which
fields are searched. But please allow me to point out that the Solr score can
still help decide whether a document is relevant in some cases.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Score-threshold-reasonably-independent-of-results-returned-tp4002312p4003248.html
Sent from the Solr - User mailing list archive at Nabble.com.


RecursivePrefixTreeStrategy class not found

2012-08-25 Thread Jones, Dan
According to the document I was reading here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

First, you must register a spatial field type in the Solr schema.xml file. The
instructions in this whole document imply the RecursivePrefixTreeStrategy-based
field type used in a geospatial context.




I need to set the fieldType to RecursivePrefixTreeStrategy and, of course, I'm
getting a "class not found" error. I'm using the latest Solr 4.0.0-BETA.

I have a field that I would like to import into solr that is a MULTIPOLYGON

For Example:
TUVTuvalu  MULTIPOLYGON (((179.21322733454343 -8.561290924154292, 
179.20240933453334 -8.465417924064994, 179.2183813345482 -8.481890924080346, 
179.2251453345545 -8.492217924089957, 179.23109133456006 -8.50491792410179, 
179.23228133456115 -8.51841792411436, 179.23149133456042 -8.533499924128407, 
179.22831833455746 -8.543426924137648, 179.22236333455191 -8.554145924147633, 
179.21322733454343 -8.561290924154292)), ((177.2902543327525 
-6.114445921875486, 177.28137233274424 -6.109863921871224, 177.27804533274116 
-6.099445921861516, 177.28137233274424 -6.089445921852203, 177.3055273327667 
-6.10597292186759, 177.2958093327577 -6.113890921874969, 177.2902543327525 
-6.114445921875486)), ((176.30636333183617 -6.288335922037433, 
176.29871833182904 -6.285135922034456, 176.29525433182584 -6.274581922024623, 
176.30601833183584 -6.260135922011173, 176.31198133184142 -6.28215492203168, 
176.30636333183617 -6.288335922037433)), ((178.69580033406152 
-7.484163923151129, 178.68885433405507 -7.480835923148035, 178.68878133405497 
-7.467572923135677, 178.7017813340671 -7.475208923142787, 178.69580033406152 
-7.484163923151129)))


Since the LSP (Lucene Spatial Playground) was moved into Solr, would there be a
different name for the class?
(I'm not sure the factory class above can be found yet either.)

Any help would be much appreciated!
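For what it's worth, the schema registers a field type rather than the Lucene strategy class itself. A sketch of the kind of declaration the wiki page describes follows; class, package, and attribute names were still in flux around 4.0-BETA, so verify them against your build, and polygon support additionally requires the JTS jar on the classpath:

```xml
<!-- Sketch only: check names against SolrAdaptersForLuceneSpatial4 and your build. -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>

<field name="geo" type="location_rpt" indexed="true" stored="true" multiValued="true"/>
```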





This communication (including all attachments) is intended solely for
the use of the person(s) to whom it is addressed and should be treated
as a confidential AAA communication. If you are not the intended
recipient, any use, distribution, printing, or copying of this email is
strictly prohibited. If you received this email in error, please
immediately delete it from your system and notify the originator. Your
cooperation is appreciated.


RE: RecursivePrefixTreeStrategy class not found

2012-08-25 Thread Jones, Dan
SORRY!

RecursivePrefixTreeFieldType cannot be found!




Sent: Saturday, August 25, 2012 6:30 PM
To: solr-user@lucene.apache.org
Subject: RecursivePrefixTreeStrategy class not found

According to the document I was reading here:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

First, you must register a spatial field type in the Solr schema.xml file. The 
instructions in this whole document imply the 
RecursivePrefixTreeStrategy
 based field type used in a geospatial context.




I need to set the fieldType to 
RecursivePrefixTreeStrategy
 and of course, I'm getting class not found. I'm using the latest solr 
4.0.0-BETA

I have a field that I would like to import into solr that is a MULTIPOLYGON

For Example:
TUVTuvalu  MULTIPOLYGON (((179.21322733454343 -8.561290924154292, 
179.20240933453334 -8.465417924064994, 179.2183813345482 -8.481890924080346, 
179.2251453345545 -8.492217924089957, 179.23109133456006 -8.50491792410179, 
179.23228133456115 -8.51841792411436, 179.23149133456042 -8.533499924128407, 
179.22831833455746 -8.543426924137648, 179.22236333455191 -8.554145924147633, 
179.21322733454343 -8.561290924154292)), ((177.2902543327525 
-6.114445921875486, 177.28137233274424 -6.109863921871224, 177.27804533274116 
-6.099445921861516, 177.28137233274424 -6.089445921852203, 177.3055273327667 
-6.10597292186759, 177.2958093327577 -6.113890921874969, 177.2902543327525 
-6.114445921875486)), ((176.30636333183617 -6.288335922037433, 
176.29871833182904 -6.285135922034456, 176.29525433182584 -6.274581922024623, 
176.30601833183584 -6.260135922011173, 176.31198133184142 -6.28215492203168, 
176.30636333183617 -6.288335922037433)), ((178.69580033406152 
-7.484163923151129, 178.68885433405507 -7.480835923148035, 178.68878133405497 
-7.467572923135677, 178.7017813340671 -7.475208923142787, 178.69580033406152 
-7.484163923151129)))


Since the LSP was moved into Solr, would there be a different name for the 
class?
(I'm not sure the factory class above can be found yet either)

Any help would be much appreciated!








Re: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

2012-08-25 Thread Lance Norskog
The index directory will include files which list deleted documents.
(I do not remember the suffix.)

If you do not like this behavior, you can add expungeDeletes to your commit
requests.
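As a sketch, the flag goes on the XML commit message (or as an update-request parameter); it merges away segments' deleted documents without the full cost of an optimize. Check the exact semantics for your version:

```xml
<!-- XML update message: commit, and merge away deleted documents -->
<commit expungeDeletes="true"/>
```

The same flag can be passed on the update URL, e.g. /update?commit=true&expungeDeletes=true.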

On Sat, Aug 25, 2012 at 10:27 AM, Ryan McKinley  wrote:
> If you optimize the index, are the results the same?
>
> maybe it is showing counts for deleted docs (i think it does... and
> this is expected)
>
> ryan
>
>
> On Sat, Aug 25, 2012 at 9:57 AM, Fuad Efendi  wrote:
>>
>> This is a bug in the Solr 4.0.0-Beta Schema Browser: "Load Term Info" shows "9682
>> News", but a direct query shows 3577.
>>
>> /solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">1</int>
>>     <lst name="params">
>>       <str name="facet">true</str>
>>       <str name="q">channel:News</str>
>>       <str name="facet.field">channel</str>
>>       <str name="rows">0</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="3577" start="0"/>
>>   <lst name="facet_counts">
>>     <lst name="facet_fields">
>>       <lst name="channel">
>>         <int name="News">3577</int>
>>       </lst>
>>     </lst>
>>   </lst>
>> </response>
>>
>>
>> -Original Message-
>> Sent: August-24-12 11:29 PM
>> To: solr-user@lucene.apache.org
>> Cc: sole-...@lucene.apache.org
>> Subject: RE: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser
>> Importance: High
>>
>> Any news?
>> CC: Dev
>>
>>
>> -Original Message-
>> Subject: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser
>>
>> Hi there,
>>
>> "Load Term Info" shows 3650 for a specific term "MyTerm", but when I execute the
>> query "channel:MyTerm" it shows 650 documents found... possibly a bug? It
>> happens after I commit data too; nothing changes. The field is a single-valued,
>> non-tokenized string.
>>
>> -Fuad
>>
>> --
>> Fuad Efendi
>> 416-993-2060
>> http://www.tokenizer.ca
>>
>>
>>
>>
>>
>>



-- 
Lance Norskog
goks...@gmail.com


Re: How do I represent a group of customer key/value pairs

2012-08-25 Thread Lance Norskog
There are more advanced ways to embed hierarchy in records. This page describes
them:

http://wiki.apache.org/solr/HierarchicalFaceting

(This is a great page; I had never noticed it.)
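One technique from that page is path-based hierarchy via PathHierarchyTokenizerFactory; a minimal sketch of such a field type (the type name here is illustrative):

```xml
<!-- Index "a/b/c" as the tokens "a", "a/b", "a/b/c" so every ancestor
     path becomes a facetable term; the query side keeps the path whole. -->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```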

On Fri, Aug 24, 2012 at 8:12 PM, Sheldon P  wrote:
> Thanks for the prompt reply Jack.  Could you point me towards any code
> examples of that technique?
>
>
> On Fri, Aug 24, 2012 at 4:31 PM, Jack Krupansky  
> wrote:
>> The general rule in Solr is simple: denormalize your data.
>>
>> If you have some maps (or tables) and a set of keys (columns) for each map
>> (table), define fields with names of the form mapname_keyname, such as
>> "map1_name", "map2_name", "map1_field1", "map2_field1". Solr has dynamic
>> fields, so you can define a pattern like "map1_*" to have a desired type - if
>> all the keys have the same type.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Sheldon P
>> Sent: Friday, August 24, 2012 3:33 PM
>> To: solr-user@lucene.apache.org
>> Subject: How do I represent a group of customer key/value pairs
>>
>>
>> I've just started to learn Solr and I have a question about modeling data
>> in the schema.xml.
>>
>> I'm using SolrJ to interact with my Solr server.  It's easy for me to store
>> key/value pairs where the key is known.  For example, if I have:
>>
>> title="Some book title"
>> author="The authors name"
>>
>>
>> I can represent that data in the schema.xml file like this:
>>
>> <field name="title" type="text" indexed="true" stored="true"/>
>> <field name="author" type="text" indexed="true" stored="true"/>
>>
>> I also have data that is stored as a Java HashMap, where the keys are
>> unknown:
>>
>> Map<String, String> map = new HashMap<String, String>();
>> map.put("some unknown key", "some unknown data");
>> map.put("another unknown key", "more unknown data");
>>
>>
>> I would prefer to store that data in Solr without losing its hierarchy.
>> For example:
>>
>> 
>>
>> > stored="true"/>
>>
>> > stored="true"/>
>>
>> 
>>
>>
>> Then I could search for "some unknown key", and receive "some unknown data".
>>
>> Is this possible in Solr?  What is the best way to store this kind of data?



-- 
Lance Norskog
goks...@gmail.com


Re: More debugging DIH - URLDataSource

2012-08-25 Thread Lance Norskog
About XPaths: the XPath engine supports only a limited subset of XPath, but the
docs say that your paths are covered.

About logs: you only have the RegexTransformer listed. You need to add
LogTransformer to the transformer list:
http://wiki.apache.org/solr/DataImportHandler#LogTransformer

Having XML entity codes in the URL string seems right. Can you verify the URL
that goes to the remote site? Can you read the logs at the remote site? Can you
run this request through a proxy and watch the data?
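A sketch of what the change might look like on the entity in question (the entity and column names below are placeholders, since the quoted config elides them):

```xml
<!-- Chain LogTransformer after RegexTransformer; logTemplate may reference
     entity columns. Names here are placeholders. -->
<entity name="bypage"
        processor="XPathEntityProcessor"
        forEach="/ReportDataResponse/Data/Rows/Row"
        transformer="RegexTransformer,LogTransformer"
        logTemplate="row: ${bypage.page_views}"
        logLevel="info">
  ...
</entity>
```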

On Fri, Aug 24, 2012 at 1:34 PM, Carrie Coy  wrote:
> I'm trying to write a DIH to incorporate page view metrics from an XML feed
> into our index.   The DIH makes a single request, and updates 0 documents.
> I set log level to "finest" for the entire dataimport section, but I still
> can't tell what's wrong.  I suspect the XPath.
> http://localhost:8080/solr/core1/admin/dataimport.jsp?handler=/dataimport
> returns 404.  Any suggestions on how I can debug this?
>
>  solr-spec: 4.0.0.2012.08.06.22.50.47
>
>
> The XML data:
>
> <ReportDataResponse>
>   <Data>
>     <Rows>
>       <Row>
>         <Value columnId="PAGE_NAME">PRODUCT: BURLAP POTATO SACKS  (PACK OF 12) (W4537)</Value>
>         <Value columnId="PAGE_VIEWS">2388</Value>
>       </Row>
>       <Row>
>         <Value columnId="PAGE_NAME">PRODUCT: OPAQUE PONY BEADS 6X9MM  (BAG OF 850) (BE9000)</Value>
>         <Value columnId="PAGE_VIEWS">1313</Value>
>       </Row>
>     </Rows>
>   </Data>
> </ReportDataResponse>
>
> My DIH:
>
> <dataConfig>
>   <dataSource name="coremetrics"
>               type="URLDataSource"
>               encoding="UTF-8"
>               connectionTimeout="5000"
>               readTimeout="1"/>
>   <document>
>     <entity name="..."
>             dataSource="coremetrics"
>             pk="id"
>             url="https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=**&username=&format=XML&userAuthKey=&language=en_US&viewID=9475540&period_a=M20110930"
>             processor="XPathEntityProcessor"
>             stream="true"
>             forEach="/ReportDataResponse/Data/Rows/Row"
>             logLevel="fine"
>             transformer="RegexTransformer">
>       <field column="..."
>              xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_NAME']"
>              regex="/^PRODUCT:.*\((.*?)\)$/" replaceWith="$1"/>
>       <field column="..."
>              xpath="/ReportDataResponse/Data/Rows/Row/Value[@columnId='PAGE_VIEWS']"/>
>     </entity>
>   </document>
> </dataConfig>
>
> This little test perl script correctly extracts the data:
>
> use XML::XPath;
> use XML::XPath::XMLParser;
>
> my $xp = XML::XPath->new(filename => 'cm.xml');
> my $nodeset = $xp->find('/ReportDataResponse/Data/Rows/Row');
> foreach my $node ($nodeset->get_nodelist) {
>     my $page_name  = $node->findvalue('Value[@columnId="PAGE_NAME"]');
>     my $page_views = $node->findvalue('Value[@columnId="PAGE_VIEWS"]');
>     $page_name =~ s/^PRODUCT:.*\((.*?)\)$/$1/;
> }
>
> From logs:
>
> INFO: Loading DIH Configuration: data-config.xml
> Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
> loadDataConfig
> INFO: Data Configuration loaded successfully
> Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=full-import}
> status=0 QTime=2
> Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> INFO: Starting Full Import
> Aug 24, 2012 3:53:10 PM
> org.apache.solr.handler.dataimport.SimplePropertiesWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Aug 24, 2012 3:53:10 PM org.apache.solr.update.DirectUpdateHandler2
> deleteAll
> INFO: [ssww] REMOVING ALL DOCUMENTS FROM INDEX
> Aug 24, 2012 3:53:10 PM org.apache.solr.handler.dataimport.URLDataSource
> getData
> FINE: Accessing URL:
> https://welcome.coremetrics.com/analyticswebapp/api/1.0/report-data/contentcategory/bypage.ftl?clientId=*&username=***&format=XML&userAuthKey=**&language=en_US&viewID=9475540&period_a=M20110930
> Aug 24, 2012 3:53:10 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=0
> Aug 24, 2012 3:53:12 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=1
> Aug 24, 2012 3:53:14 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=1
> Aug 24, 2012 3:53:16 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=0
> Aug 24, 2012 3:53:18 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=0
> Aug 24, 2012 3:53:20 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=0
> Aug 24, 2012 3:53:22 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=0
> Aug 24, 2012 3:53:24 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/solr path=/dataimport params={command=status} status=0
> QTime=0
> Aug 24, 2012 3:53:27 PM org.apache.solr.core.SolrCore execute
> INFO: [ssww] webapp=/sol

Re: Solr - Index Concurrency - Is it possible to have multiple threads write to same index?

2012-08-25 Thread Lance Norskog
A few other things:
Support: many of the Solr committers do not like the Embedded server.
It does not get much attention, so if you find problems with it you
may have to fix them and get someone to review and commit the fixes.
I'm not saying they sabotage it, there just is not much interest in
making it first-class.

Replication: you can replicate from the Embedded server with the old
rsync-based replicator. The Java Replication tool requires servlets.
If you are Unix-savvy, the rsync tool is fine.

Indexing speed:
1) You can use shards to split the index into pieces. This divides the
indexing work among the shards.
2) Do not store the giant data. A lot of sites instead archive the
datafile and index a link to the file. Giant stored fields cause
indexing speed to drop dramatically because stored data is not saved
just once: it is copied repeatedly during merging as new documents are
added. Index data is also copied around, but this tends to increase
sub-linearly since documents share terms.
3) Do not store positions and offsets. These allow you to do phrase
queries because they store the position of each word. They take a lot
of memory, and have to be copied around during merging.
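Point 2 can be sketched in schema.xml terms (field names here are hypothetical): index the large body for search, but store only a pointer back to the archived file:

```xml
<!-- Index the giant body but do not store it; store only a file pointer. -->
<field name="body"      type="text_general" indexed="true"  stored="false"/>
<field name="body_path" type="string"       indexed="false" stored="true"/>
```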

On Thu, Aug 23, 2012 at 1:31 AM, Mikhail Khludnev
 wrote:
> I know the following drawbacks of EmbeddedSolrServer:
>
> - org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(),
>   which is called when handling an update request, produces a lot of garbage
>   in memory and bloats it with expensive XML.
> - org.apache.solr.response.BinaryResponseWriter.getParsedResponse(SolrQueryRequest,
>   SolrQueryResponse) does something similar on the response side - it just
>   bloats your heap.
>
> For me, your task is covered by multiple cores. Anyway, if you are OK with
> EmbeddedSolrServer, let it be. Just be aware of the stream-updates feature:
> http://wiki.apache.org/solr/ContentStream
>
> My average indexing-speed estimate assumes fairly small docs, less than 1 KB
> (the size always used for micro-benchmarking).
>
> Heavy analysis is the key argument for invoking updates in multiple threads.
> What's your CPU utilization during indexing?
>
>
>
>
> On Thu, Aug 23, 2012 at 7:52 AM, ksu wildcats wrote:
>
>> Thanks for the reply Mikhail.
>>
>> For our needs the speed is more important than flexibility and we have huge
>> text files (ex: blogs / articles ~2 MB size) that needs to be read from our
>> filesystem and then store into the index.
>>
>> We have our app creating separate core per client (dynamically) and there
>> is
>> one instance of EmbeddedSolrServer for each core thats used for adding
>> documents to the index.
>> Each document has about 10 fields and one of the field has ~2MB data stored
>> (stored = true, analyzed=true).
>> Also we have logic built into our webapp to dynamically create the solr
>> config files
>> (solrConfig & schema per core - filters/analyzers/handler values can be
>> different for each core)
>> for each core before creating an instance of EmbeddedSolrServer for that
>> core.
>> Another reason to go with EmbeddedSolrServer is to reduce overhead of
>> transporting large data (~2 MB) over http/xml.
>>
>> We use this setup for building our master index which then gets replicated
>> to slave servers
>> using replication scripts provided by solr.
>> We also have solr admin ui integrated into our webapp (using admin jsp &
>> handlers from solradmin ui)
>>
>> We have been using this MultiCore setup for more than a year now, and so far
>> we haven't run into any issues with EmbeddedSolrServer integrated into our
>> webapp.
>> However I am now trying to figure out the impact if we allow multiple
>> threads sending request to EmbeddedSolrServer (same core) for adding docs
>> to
>> index simultaneously.
>>
>> Our understanding was that EmbeddedSolrServer would give us better
>> performance over http solr for our needs.
>> Its quite possible that we might be wrong and http solr would have given us
>> similar/better performance.
>>
>> Also based on documentation from SolrWiki I am assuming that
>> EmbeddedSolrServer API is same as the one used by Http Solr.
>>
>> Said that, can you please tell if there is any specific downside to using
>> EmbeddedSolrServer that could cause issues for us down the line.
>>
>> I am also interested in your below comment about indexing 1 million docs in
>> few mins. Ideally we would like to get to that speed
>> I am assuming this depends on the size of the doc and type of
>> analyzer/tokenizer/filters being used. Correct?
>> Can you please share (or point me to documentation) on how to get this
>> speed
>> for 1 mil docs.
>> >>  - one million is a fairly small amount, in average it should be indexed
>> >> in few mins. I doubt that you really need to distribute indexing
>>
>> Thanks
>> -K
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-Index-Concurrency-Is-it-possible-to-have-multiple-threads-write-to-same-index-tp4002544p4002776.html
>> Sent from the Solr - User m

Re: How do I represent a group of customer key/value pairs

2012-08-25 Thread Sheldon P
Thanks Lance.  It looks like it's worth investigating.  I've already
started down the path of using a bean with "@Field(map_*)" on my
HashMap setter.  This defect tipped me off on this functionality:
https://issues.apache.org/jira/browse/SOLR-1357
This technique provides me with a mechanism to store the HashMap data,
but flattens the structure.  I'll play with the ideas provided on
"http://wiki.apache.org/solr/HierarchicalFaceting";.  If anyone has
some sample code (java + schema.xml) they can point me to that does
"Hierarchical Faceting", I would very much appreciate it.

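Along the lines of Jack's denormalization advice quoted below, here is a minimal client-side sketch of flattening a HashMap into dynamic-field names before indexing. The "map_" prefix and the key-sanitization rule are assumptions for illustration; they would pair with an assumed schema entry like `<dynamicField name="map_*" type="string" indexed="true" stored="true"/>`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DynamicFieldMapper {

    // Flatten arbitrary map keys into Solr dynamic-field names under a
    // common prefix, e.g. "some unknown key" -> "map_some_unknown_key".
    static Map<String, String> toDynamicFields(String prefix, Map<String, String> in) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            // Field names should avoid spaces and punctuation; sanitize the key.
            String key = e.getKey().trim().toLowerCase().replaceAll("[^a-z0-9]+", "_");
            out.put(prefix + "_" + key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("some unknown key", "some unknown data");
        map.put("another unknown key", "more unknown data");
        System.out.println(toDynamicFields("map", map));
    }
}
```

Searching for an original key then means searching its sanitized field name; the hierarchy itself is flattened, which is why the path-based approach on the HierarchicalFaceting wiki page may fit better when the structure must survive.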

On Sat, Aug 25, 2012 at 6:42 PM, Lance Norskog  wrote:
> There are more advanced ways to embed hierarchy in records. This describes 
> them:
>
> http://wiki.apache.org/solr/HierarchicalFaceting
>
> (This is a great page, never noticed it.)
>
> On Fri, Aug 24, 2012 at 8:12 PM, Sheldon P  wrote:
>> Thanks for the prompt reply Jack.  Could you point me towards any code
>> examples of that technique?
>>
>>
>> On Fri, Aug 24, 2012 at 4:31 PM, Jack Krupansky  
>> wrote:
>>> The general rule in Solr is simple: denormalize your data.
>>>
>>> If you have some maps (or tables) and a set of keys (columns) for each map
>>> (table), define fields with names of the form mapname_keyname, such as
>>> "map1_name", "map2_name", "map1_field1", "map2_field1". Solr has dynamic
>>> fields, so you can define a pattern like "map1_*" to have a desired type - if
>>> all the keys have the same type.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Sheldon P
>>> Sent: Friday, August 24, 2012 3:33 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: How do I represent a group of customer key/value pairs
>>>
>>>
>>> I've just started to learn Solr and I have a question about modeling data
>>> in the schema.xml.
>>>
>>> I'm using SolrJ to interact with my Solr server.  It's easy for me to store
>>> key/value pairs where the key is known.  For example, if I have:
>>>
>>> title="Some book title"
>>> author="The authors name"
>>>
>>>
>>> I can represent that data in the schema.xml file like this:
>>>
>>> <field name="title" type="text" indexed="true" stored="true"/>
>>> <field name="author" type="text" indexed="true" stored="true"/>
>>>
>>> I also have data that is stored as a Java HashMap, where the keys are
>>> unknown:
>>>
>>> Map<String, String> map = new HashMap<String, String>();
>>> map.put("some unknown key", "some unknown data");
>>> map.put("another unknown key", "more unknown data");
>>>
>>>
>>> I would prefer to store that data in Solr without losing its hierarchy.
>>> For example:
>>>
>>> 
>>>
>>> >> stored="true"/>
>>>
>>> >> stored="true"/>
>>>
>>> 
>>>
>>>
>>> Then I could search for "some unknown key", and receive "some unknown data".
>>>
>>> Is this possible in Solr?  What is the best way to store this kind of data?
>
>
>
> --
> Lance Norskog
> goks...@gmail.com


RE: Can't extract Outlook message files

2012-08-25 Thread Alexander Cougarman
This is an issue with "extractOnly=true" on Solr 3.6.1. We upgraded to 4.0 Beta 
2 and the problem went away. Just in case anyone runs into this.

Sincerely,
Alex 


-Original Message-
From: Alexander Cougarman [mailto:acoug...@bwc.org] 
Sent: 23 August 2012 12:27 PM
To: solr-user@lucene.apache.org
Subject: Can't extract Outlook message files

Hi. We're trying to use the following curl command to perform an "extract only"
of a *.MSG file, but it blows up:

   curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@92.msg"

If we do this, it works fine:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@92.msg"

We've tried a variety of MSG files and they all produce the same error; they 
all have content in them. What are we doing wrong?

Here's the exception the extractOnly=true command generates:



 
Error 500 null

org.apache.solr.common.SolrException
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(Extr
actingDocumentLoader.java:233)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Co
ntentStreamHandlerBase.java:58)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
erBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handle
Request(RequestHandlers.java:244)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter
.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
r.java:260)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet
Handler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
99)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
a:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
82)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
66)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)

at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHand
lerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
52)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:54
2)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnectio
n.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.j
ava:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException 
from org.apache.tika.parser.microsoft.OfficeParser@aaf063
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244
)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242
)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:1
20)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(Extr
actingDocumentLoader.java:227)
... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element
state is zero.
at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unkno
wn Source)
at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandler
Decorator.java:136)
at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHand
ler.java:256)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandler
Decorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandler
Decorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandler
Decorator.java:136)
at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.
java:273)
at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandl
er.java:213)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java
:178)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242
)
... 26 more


HTTP ERROR 500
Problem accessing /solr/update/extract. Reason:
null

org.apache.solr.common.SolrException
at org.apache.solr.handler.extraction.ExtractingDoc