Re: indexing issue

2012-09-23 Thread Erick Erickson
That's exactly how I would expect WordDelimiterFilterFactory to
split up that input.

You really need to look at the analysis chain to understand what
happens here; simply saying the field is "text" isn't enough. What I'm
looking for is the "..." definition.
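
For reference, the same analysis output can be pulled without the admin UI
through the field analysis handler. A minimal sketch, assuming the stock
example solrconfig.xml (which maps solr.FieldAnalysisRequestHandler to
/analysis/field), the default core URL, and a field really named "text";
adjust core name and port for your setup:

curl "http://localhost:8983/solr/analysis/field?analysis.fieldname=text&analysis.fieldvalue=8E0061123-8E1&wt=xml"

The response lists the tokens emitted by every stage of the chain, which shows
exactly where WordDelimiterFilterFactory splits the input.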

In solr 3.6, for example, there's no [...]

On [...] wrote:
> Thank you very much guys for your help.
> @Erick
> FieldType is Text and from analysis the following is the result.
> 
>
> From the image, you can see it's not tokenizing every possible segment of
> '8E0061123-8E1', just some of them.


Re: Help with new Join Functionality in Solr 4.0

2012-09-23 Thread Erick Erickson
The very first thing to try is to flatten your data so you don't have to use joins.
I know that goes against your database instincts, but Solr easily handles
millions and millions of documents. So if the cross-product of docs and modules
isn't prohibitive, that's what I'd do first. Then it's just a matter of
forming a search without joins.
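
As a rough sketch of what the flattened search looks like, assuming each
flattened document carries the module text plus a docid field copied from its
parent doc (the docid name, the value 42, the term "foo" and the single-core
example URL are all placeholders here):

curl "http://localhost:8983/solr/select?q=text:foo&fq=docid:42&wt=xml"

Once the data is flattened, a plain q/fq pair like this replaces the join.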

Joins run into performance issues when the join field has many unique
values; unfortunately, the field people often want to join on is something
like a unique key (or PK in RDBMS terms), so be aware of that.
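
For reference, a minimal sketch of what the join variant could look like with
the field names from the original mail; the doc id 42, the term "foo" and the
single-core example URL are placeholders (curl's -g keeps it from stripping
the braces):

curl -g "http://localhost:8983/solr/select?q=text:foo&fq={!join+from=modrefid+to=id}docrefid:42&wt=xml"

The fq runs docrefid:42 against the "docmodule" documents, collects their
modrefid values, and keeps only the documents whose id matches, i.e. the
modules linked to doc 42. Note that it joins on the id field, exactly the
unique-key case warned about above.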

Best
Erick

On Fri, Sep 21, 2012 at 5:46 AM,   wrote:
> Dear Solr community,
>
> I am rather new to Solr; however, I already find it kind of attractive. We are
> developing a research application that contains a Solr index with three
> different kinds of documents. Here is the basic idea:
>
>
> -  A document of type "doc" consisting of fields id, docid, doctitle 
> and some other metadata
>
> -  A document of type "module" consisting of fields id, modid and text
>
> -  A document of type "docmodule" consisting of fields id, docrefid, 
> modrefid and some metadata about the relation between a document and a 
> module; field docrefid refers to the id of a "doc" document, while field 
> modrefid contains the id of a "module" document
>
> In other words, in our model there are documents (type "doc") consisting of 
> several modules and there is some characterization of each link between a 
> document and a module.
>
> Almost all fields of a "doc" document are searchable, as well as the text of 
> a module and the metadata of the "docmodule" entries.
>
> We are looking for a fast way to retrieve all modules containing a certain 
> text and associated with a given document, preferably with a single query. 
> This means we want to query the text from a "module" document while we set a 
> restriction on the docrefid from a "docmodule" or the id from a "doc" 
> document. Is this possible by means of the new pseudo joins? Any ideas are 
> highly appreciated!
>
> Thanks in advance!
>
> Milen Tilev
> Master of Science
> Software Developer
> Business Unit Information
> 
>
> MATERNA GmbH
> Information & Communications
>
> Voßkuhle 37
> 44141 Dortmund
> Germany
>
> Phone: +49 231 5599-8257
> Fax: +49 231 5599-98257
> E-Mail: milen.ti...@materna.de
>
> www.materna.de | Newsletter | Twitter | XING | Facebook
> 
>
> Registered office of MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
> Managing Directors: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
> Amtsgericht Dortmund HRB 5839
>


Re: Solr Swap Function doesn't work when using Solr Cloud Beta

2012-09-23 Thread Mark Miller
FYI swap is def not supported in SolrCloud right now - even though it may work, 
it's not been thought about and there are no tests.

If you would like to see support, I'd open a JIRA issue with any pertinent
info from this thread about how the behavior needs to change.

- Mark

On Sep 21, 2012, at 6:49 PM, sam fang  wrote:

> Hi Chris,
> 
> Thanks for your help. Today I tried again and tried to figure out the reason.
> 
> 1. Set up an external zookeeper server.
> 
> 2. Change /opt/solr/apache-solr-4.0.0-BETA/example/solr/solr.xml persistent
> to true, and run the commands below to upload the config to zk. (I renamed
> multicore to solr, and needed to put the zkcli.sh-related jar packages in place.)
> /opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd upconfig
>   -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core0/conf/
>   -confname core0 -z localhost:2181
> /opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd upconfig
>   -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core1/conf/
>   -confname core1 -z localhost:2181
> 
> 3. Start jetty server
> cd /opt/solr/apache-solr-4.0.0-BETA/example
> java -DzkHost=localhost:2181 -jar start.jar
> 
> 4. Publish a message to core0
> /opt/solr/apache-solr-4.0.0-BETA/example/solr/exampledocs
> cp ../../exampledocs/post.jar ./
> java -Durl=http://localhost:8983/solr/core0/update -jar post.jar ipod_video.xml
> 
> 5. Queries to core0 and core1 are ok.
> 
> 6. Click "swap" on the admin page; the query results for core0 and core1
> change. Previously I sometimes saw 0 results and sometimes 1 result. Today
> core0 still returns 1 result and core1 returns 0 results.
> 
> 7. Then click "reload" on the admin page and query core0 and core1 again.
> Sometimes they return 1 result and sometimes nothing. I can also see that
> the zk configuration changed.
> 
> 8. Restart the jetty server. The queries behave the same as in step 7.
> 
> 9. Stop the jetty server, log into zkCli.sh and run the command "set
> /clusterstate.json {}", then start jetty again. Everything goes back to
> normal, i.e. to how swap behaved in solr 3.6 or solr 4.0 w/o cloud.
> 
> 
> From my observation, after the swap it seems shard information is put into
> actualShards; when a user issues a search, all of that shard information is
> used to do the search. But the user can't see the zk update until clicking
> the "reload" button in the admin page. When the web server is restarted,
> this shard information eventually goes to zk, and the search goes to all shards.
> 
> I found there is an option "distrib", and used a url like
> "http://host1:18000/solr/core0/select?distrib=false&q=*%3A*&wt=xml", which
> only returns the data on core0. I dug into the code (the handleRequestBody
> method in the SearchHandler class) and it seems to make sense.
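> 
> A minimal sketch of that check against both cores, reusing the host/port from
> above; with distrib=false each request only searches its own core, so the
> counts show which core actually holds the document after the swap:
> 
> curl "http://host1:18000/solr/core0/select?distrib=false&q=*%3A*&wt=xml"
> curl "http://host1:18000/solr/core1/select?distrib=false&q=*%3A*&wt=xml"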
> 
> I tried stopping the tomcat server, running the command "set /clusterstate.json {}"
> to clean all cluster state, then running "cloud-scripts/zkcli.sh -cmd
> upconfig" to upload the config to the zk server, and starting the tomcat
> server again. That rebuilds the right shard information in zk, and the search
> function goes back to normal, like what we saw in 3.6 or 4.0 w/o cloud.
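> 
> In concrete commands, that reset amounts to roughly the following (zkCli.sh
> here is ZooKeeper's own shell and the set command is typed at its prompt,
> while the zkcli.sh under cloud-scripts is Solr's zk tool; host and paths are
> the ones from this setup):
> 
> zkCli.sh -server localhost:2181
>   set /clusterstate.json {}
> cloud-scripts/zkcli.sh -cmd upconfig -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core0/conf/ -confname core0 -z localhost:2181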
> 
> It seems Solr always adds shard information to zk.
> 
> I tested cloud swap on a single machine. If each core has one shard in zk,
> then after the swap zk eventually has 2 slices (shards) for that core,
> because only adds are done, so the search goes to both shards.
> 
> I also tested cloud swap with 2 machines, where each core has 1 shard and 2
> slices. Below is the configuration in zk. After the swap, zk eventually has
> 4 for that core, and the search gets messed up.
> 
> "core0":{"shard1":{
>   "host1:18000_solr_core0":{
>     "shard":"shard1",
>     "roles":null,
>     "leader":"true",
>     "state":"active",
>     "core":"core0",
>     "collection":"core0",
>     "node_name":"host1:18000_solr",
>     "base_url":"http://host1:18000/solr"},
>   "host2:18000_solr_core0":{
>     "shard":"shard1",
>     "roles":null,
>     "state":"active",
>     "core":"core0",
>     "collection":"core0",
>     "node_name":"host2:18000_solr",
>     "base_url":"http://host2:18000/solr"}}},
> 
> For the previous 2 cases, if I stopped the tomcat/jetty server, manually
> uploaded the configuration to zk, and then started the tomcat server, zk and
> the search went back to normal.
> 
> On Fri, Sep 21, 2012 at 3:34 PM, Chris Hostetter wrote:
> 
>> 
>> : Below is my solr.xml configuration, and already set persistent to true.
>>...
>> : Then publish 1 record to test1, and query. it's ok now.
>> 
>> Ok, first off -- please provide more details on how exactly you are
>> running Solr.  Your initial email said...
>> 
>> > In Solr 3.6, core swap function works good. After switch to use Solr 4.0
>> > Beta, and found it doesn't work well.
>> 
>> ...but based on your solr.xml file and your logs, it appears you are now
>> trying to use some of the ZooKeeper/SolrCloud features that didn't even
>> exist in Solr 3.6, so it's kind of an apples 

AW: Help with new Join Functionality in Solr 4.0

2012-09-23 Thread Milen.Tilev
Hello Erick,

Thanks a lot for your reply! Your suggestion is actually exactly the
alternative solution we are thinking about, and with your clarification on
Solr's performance we are going to go for it! Many thanks again!

Milen


From: Erick Erickson [erickerick...@gmail.com]
Sent: Sunday, September 23, 2012 17:50
To: solr-user@lucene.apache.org
Subject: Re: Help with new Join Functionality in Solr 4.0

The very first thing to try is to flatten your data so you don't have to use joins.
I know that goes against your database instincts, but Solr easily handles
millions and millions of documents. So if the cross-product of docs and modules
isn't prohibitive, that's what I'd do first. Then it's just a matter of
forming a search without joins.

Joins run into performance issues when the join field has many unique
values; unfortunately, the field people often want to join on is something
like a unique key (or PK in RDBMS terms), so be aware of that.

Best
Erick

On Fri, Sep 21, 2012 at 5:46 AM,   wrote:
> Dear Solr community,
>
> I am rather new to Solr; however, I already find it kind of attractive. We are
> developing a research application that contains a Solr index with three
> different kinds of documents. Here is the basic idea:
>
>
> -  A document of type "doc" consisting of fields id, docid, doctitle 
> and some other metadata
>
> -  A document of type "module" consisting of fields id, modid and text
>
> -  A document of type "docmodule" consisting of fields id, docrefid, 
> modrefid and some metadata about the relation between a document and a 
> module; field docrefid refers to the id of a "doc" document, while field 
> modrefid contains the id of a "module" document
>
> In other words, in our model there are documents (type "doc") consisting of 
> several modules and there is some characterization of each link between a 
> document and a module.
>
> Almost all fields of a "doc" document are searchable, as well as the text of 
> a module and the metadata of the "docmodule" entries.
>
> We are looking for a fast way to retrieve all modules containing a certain 
> text and associated with a given document, preferably with a single query. 
> This means we want to query the text from a "module" document while we set a 
> restriction on the docrefid from a "docmodule" or the id from a "doc" 
> document. Is this possible by means of the new pseudo joins? Any ideas are 
> highly appreciated!
>
> Thanks in advance!
>
> Milen Tilev
> Master of Science
> Software Developer
> Business Unit Information
> 
>
> MATERNA GmbH
> Information & Communications
>
> Voßkuhle 37
> 44141 Dortmund
> Germany
>
> Phone: +49 231 5599-8257
> Fax: +49 231 5599-98257
> E-Mail: milen.ti...@materna.de
>
> www.materna.de | Newsletter | Twitter | XING | Facebook
> 
>
> Registered office of MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
> Managing Directors: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
> Amtsgericht Dortmund HRB 5839
>


Return only matched multiValued field

2012-09-23 Thread Dotan Cohen
Assume a multivalued, stored, and indexed field named "comment". When
performing a search, I would like to return only the values of "comment"
that contain the match. For example:

When searching for "gold", instead of getting this result:

<doc>
  <arr name="comment">
    <str>Theres a lady whos sure</str>
    <str>all that glitters is gold</str>
    <str>and shes buying a stairway to heaven</str>
  </arr>
</doc>

I would prefer to get this result:

<doc>
  <arr name="comment">
    <str>all that glitters is gold</str>
  </arr>
</doc>

(pseudo-XML from memory, may not be accurate but illustrates the point)

Is there any way to do this with a Solr 4 index? The client accessing
Solr is on a dial-up connection (no provision for DSL or other high-speed
internet), so I'd like to move as little data over the wire as possible.
In reality, the array will have tens of values, so returning only the
relevant ones may reduce the data transferred by an order of magnitude.

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: what happens with slave during replication?

2012-09-23 Thread Bernd Fehling
Hi Amanda,
we don't use SolrCloud yet, just 3 dedicated servers.
When it comes to distribution, the choice will be either SolrCloud or
ElasticSearch.
But currently we use unix shell scripts with ssh for switching.
Easy, simple, stable :-)

Regards,
Bernd


On 21.09.2012 16:03, yangqian_nj wrote:
> Hi Bernd,
> 
> You mentioned: "Only one slave is online the other is for backup. The backup
> gets replicated first.
> After that the servers will be switched and the online becomes backup. "
> 
> Could you please let us know how you do the switch? We use SWAP to switch
> in solr cloud. After SWAP, when we query, we can see from the tomcat log
> that the query actually goes to both cores for some reason.
> 
> Thanks,
> Amanda