Cut a string if length > n

Posted by Pål Bergström (palb)
on 19.08.2008 17:28
What's the best way to cut a string if the length is above n characters?
Is it slice, or is there any other convenient method? Trying to
understand the string class from the docs but not sure which one to use.
Posted by Ara Howard (ahoward)
on 19.08.2008 17:50
(Received via mailing list)
On Aug 19, 2008, at 9:25 AM, Pål Bergström wrote:

> What's the best way to cut a string if the length is above n  
> characters?
> Is it slice, or is there any other convenient method? Trying to
> understand the string class from the docs but not sure which one to  
> use.


this_always_works = string[ 0 .. n ]

regardless of whether the string is <, =, or > than n
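
One caveat about the range above: `0 .. n` is inclusive, so it returns up to n + 1 characters. A minimal sketch of the difference, where the sample string and n = 5 are only illustrative values:

```ruby
s = "abcdefghij"
n = 5

inclusive = s[0..n]    # indices 0 through n, i.e. n + 1 characters
exclusive = s[0...n]   # indices 0 through n - 1, i.e. n characters
by_length = s[0, n]    # start at index 0, take at most n characters

p inclusive  # "abcdef"
p exclusive  # "abcde"
p by_length  # "abcde"

# All three forms are safe when the string is shorter than n.
short = "abc"[0...n]
p short      # "abc"
```

If exactly n characters are wanted, the three-dot range or the start/length form is probably the better fit.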

a @ http://codeforpeople.com/
Posted by Robert Klemme (Guest)
on 19.08.2008 18:16
(Received via mailing list)
On 19.08.2008 17:45, ara.t.howard wrote:
> 
> regardless of whether the string is <, =, or > than n

I believe slice! also always works:

irb(main):011:0> s="abcdefghijklmnop"
=> "abcdefghijklmnop"
irb(main):012:0> s.length
=> 16
irb(main):013:0> s.slice! 50..-1
=> nil
irb(main):014:0> s
=> "abcdefghijklmnop"
irb(main):015:0> s.slice! 5..-1
=> "fghijklmnop"
irb(main):016:0> s
=> "abcde"

It looks uglier but it modifies the string in place which might come in
handy at times.

Kind regards

  robert
Posted by Pål Bergström (palb)
on 19.08.2008 19:05
Ara Howard wrote:
> On Aug 19, 2008, at 9:25 AM, Pål Bergström wrote:
> 
>> What's the best way to cut a string if the length is above n  
>> characters?
>> Is it slice, or is there any other convenient method? Trying to
>> understand the string class from the docs but not sure which one to  
>> use.
> 
> 
> this_always_works = string[ 0 .. n ]
> 
> regardless of whether the string is <, =, or > than n
> 
> a @ http://codeforpeople.com/

Found a problem with this. It messes up Swedish letters, which I guess
are cut in the middle, as if they were handled as Latin-1. What method
can handle UTF-8?
Posted by Pål Bergström (palb)
on 19.08.2008 19:46
Pål Bergström wrote:

Don't say I've run into a classic encoding issue. :-(

Does Ruby have a problem with utf8 support?
Posted by Rob Biedenharn (Guest)
on 19.08.2008 21:41
(Received via mailing list)
On Aug 19, 2008, at 1:42 PM, Pål Bergström wrote:

> Pål Bergström wrote:
>
> Don't say I've run into a classic encoding issue. :-(
>
> Does Ruby have a problem with utf8 support?


I did this for a project (way back in Rails 1.1.6)

$KCODE = 'UTF8'
require 'jcode'

class String
   def first(limit = 1)
     self.match(%r{^(.{0,#{limit}})})[1]
   end
end

You'd still have to assign this, but:

irb> name = "P\303\245l Bergstr\303\266m"
=> "P\303\245l Bergstr\303\266m"
irb> puts name
Pål Bergström
=> nil
irb> puts name.first(2)
På
=> nil
irb> puts name.first(3)
Pål
=> nil
irb> puts name.first(12)
Pål Bergströ
=> nil
irb> puts name[0...12]
Pål Bergstr
=> nil
irb> puts name[0...2]
P?
=> nil
irb> p name[0...2]
"P\303"
=> nil
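
For comparison, a sketch of the same experiment on Ruby 1.9.3 or later, where strings carry an encoding and indexing counts characters rather than bytes, so neither $KCODE nor jcode is needed. It assumes the string is tagged as UTF-8; `byteslice` appears only to reproduce the byte-level cut seen above.

```ruby
name = "P\u00e5l Bergstr\u00f6m"   # "Pål Bergström", encoded as UTF-8

p name.length     # 13 characters
p name.bytesize   # 15 bytes, since å and ö take two bytes each

first_two = name[0...2]            # character-based slice: "På"
cut_bytes = name.byteslice(0, 2)   # byte-based slice: "P" plus half of "å"

p first_two.valid_encoding?   # true
p cut_bytes.valid_encoding?   # false
```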

I hope that helps you.

-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
Posted by Pål Bergström (palb)
on 19.08.2008 22:11
This seems to do the trick. Will it always work?

lastspace = message.message[0..70].rindex(" ")
puts message.message[0..lastspace.to_i]
Posted by Mark Thomas (markthomas)
on 19.08.2008 22:20
(Received via mailing list)
On Aug 19, 11:25 am, Pål Bergström <p...@palbergstrom.com> wrote:
> What's the best way to cut a string if the length is above n characters?
> Is it slice, or is there any other convenient method? Trying to
> understand the string class from the docs but not sure which one to use.

Rails has a utf8-compatible helper called truncate, called like so:

truncate(text, length = 30, truncate_string = "...")
If text is longer than length, text will be truncated to the length of
length (defaults to 30) and the last characters will be replaced with
the truncate_string (defaults to "...").

And the implementation is:

  def truncate(text, length = 30, truncate_string = "...")
    if text
      l = length - truncate_string.chars.length
      chars = text.chars
      (chars.length > length ? chars[0...l] + truncate_string : text).to_s
    end
  end

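Here chars is the multibyte-aware proxy that Rails (ActiveSupport) adds
to String. Ruby 1.8.7 and later also define their own String#chars,
which behaves differently.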
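
Outside Rails, the same idea can be sketched in plain Ruby. This assumes Ruby 1.9 or later, where String#length and String#[] already work on characters. The names `truncate_text` and `omission` are hypothetical, and this is not the Rails source.

```ruby
# Hypothetical stand-alone helper modelled on the description above.
# Returns nil for nil input and the text itself when it already fits.
def truncate_text(text, length = 30, omission = "...")
  return nil if text.nil?
  return text if text.length <= length

  # Reserve room for the omission marker; never go below zero.
  keep = [length - omission.length, 0].max
  text[0, keep] + omission
end

p truncate_text("abcdefghij", 8)   # "abcde..."
p truncate_text("abc", 8)          # "abc"
```

Note that when `length` is smaller than the omission marker, the result is just the marker and can exceed `length`.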
Posted by Pål Bergström (palb)
on 19.08.2008 22:25
Mark Thomas wrote:

> Rails has a utf8-compatible helper called truncate, called like so:
> 
> truncate(text, length = 30, truncate_string = "...")
> If text is longer than length, text will be truncated to the length of
> length (defaults to 30) and the last characters will be replaced with
> the truncate_string (defaults to "...").
> 
> And the implementation is:
> 
>   def truncate(text, length = 30, truncate_string = "...")
>     if text
>       l = length - truncate_string.chars.length
>       chars = text.chars
>       (chars.length > length ? chars[0...l] + truncate_string : text).to_s
>     end
>   end
> 
> Where chars is a string method in Ruby 1.8.7 or greater.

Perfect :-)
Posted by Rob Biedenharn (Guest)
on 19.08.2008 22:37
(Received via mailing list)
On Aug 19, 2008, at 4:08 PM, Pål Bergström wrote:

> This seems to do the trick. Will it always work?
>
> lastspace = message.message[0..70].rindex(" ")
> puts message.message[0..lastspace.to_i]


You likely want an exclusive range unless you want the space at the end.

  message.message[0...lastspace.to_i]

What if there's no space?  Then perhaps:

  message.message[0..(lastspace || -1).to_i]

Of course, that puts you back in the position of getting the full
string rather than some truncated version with no more than your
desired number of characters.
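
One way to keep the length cap even when there is no space is to fall back to a hard cut. A minimal sketch, assuming Ruby 1.9 or later for character-based indexing; the helper name `cut_at_space` and its default limit are made up for illustration:

```ruby
# Hypothetical helper: cut at the last space that lets the result fit in
# `limit` characters, or cut hard at `limit` when no such space exists.
def cut_at_space(text, limit = 70)
  return text if text.length <= limit

  # Look one character past the limit so that a space sitting exactly
  # at index `limit` lets the whole head through unchanged.
  last_space = text[0, limit + 1].rindex(" ")
  last_space ? text[0...last_space].rstrip : text[0, limit]
end

p cut_at_space("aaaaa bbbbb ccccc", 12)   # "aaaaa bbbbb"
p cut_at_space("aaaaabbbbbccccc", 12)     # "aaaaabbbbbcc"
p cut_at_space("short text", 12)          # "short text"
```

The `rstrip` is there so that runs of several spaces do not leave trailing whitespace behind.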

-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
Posted by Robert Klemme (Guest)
on 20.08.2008 07:25
(Received via mailing list)
On 19.08.2008 22:33, Rob Biedenharn wrote:
>   message.message[0...lastspace.to_i]
> 
> What if there's no space?  Then perhaps:
> 
>   message.message[0..(lastspace || -1).to_i]
> 
> Of course, that puts you back in the position of getting the full  
> string rather than some truncated version with no more than your  
> desired number of characters.

Pal, it seems we haven't seen the complete specification of what you
want to do.  It seems that not only length is a condition but also
positions of spaces.

I get the feeling that a solution using regular expressions might be
more efficient and also easier in your case.  Maybe any of

str.sub! %r{\A(.{50}).*\z}, '\\1'
str.sub! %r{\A(.{1,50})(?: .*)?\z}, '\\1'
str.sub! %r{\s\S*\z}, ''

Depends on your string contents of course.
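
To make the differences between the three patterns concrete, here is a small sketch that wraps each one in a non-destructive `sub`. The method names are hypothetical, and the limit defaults to 12 instead of 50 only so that the sample strings stay short.

```ruby
# Pattern 1: hard cut after `limit` characters; shorter strings are untouched.
def hard_cut(str, limit = 12)
  str.sub(%r{\A(.{#{limit}}).*\z}, '\\1')
end

# Pattern 2: keep at most `limit` characters, ending just before a space.
def cut_before_space(str, limit = 12)
  str.sub(%r{\A(.{1,#{limit}})(?: .*)?\z}, '\\1')
end

# Pattern 3: drop the last word, whatever the length of the string.
def drop_last_word(str)
  str.sub(%r{\s\S*\z}, '')
end

long = "aaaaa bbbbb ccccc ddddd"
p hard_cut(long)           # "aaaaa bbbbb "
p cut_before_space(long)   # "aaaaa bbbbb"
p drop_last_word(long)     # "aaaaa bbbbb ccccc"

# A long string without any space is left alone by pattern 2.
p cut_before_space("aaaaabbbbbccccc")   # "aaaaabbbbbccccc"
```

Since `.` does not match a newline by default, multi-line strings would need the `m` flag for patterns 1 and 2 to match at all.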

Kind regards

  robert
Posted by Pål Bergström (palb)
on 20.08.2008 07:38
Robert Klemme wrote:

> Pal, it seems we haven't seen the complete specification of what you
> want to do.  It seems that not only length is a condition but also
> positions of spaces.
> 
> I get the feeling that a solution using regular expressions might be
> more efficient and also easier in your case.  Maybe any of
> 
> str.sub! %r{\A(.{50}).*\z}, '\\1'
> str.sub! %r{\A(.{1,50})(?: .*)?\z}, '\\1'
> str.sub! %r{\s\S*\z}, ''
> 
> Depends on your string contents of course.
> 
> Kind regards
> 
>   robert

Hi. My app shows data in a list from a table called messages, but only a
fragment of it if it's too long.

My solution seems to work fine. What do you think of it? Will it always 
work?

lastspace = message.message[0..70].rindex(" ")
puts message.message[0..lastspace.to_i]

It doesn't do anything for nonexistent messages or for text shorter
than 70 characters. And it finds the last space. We might have long words
in Swedish, but not that long :-) There will always be a space somewhere
near the end where it can cut, without splitting a Swedish character and
leaving strange question marks (as if it were Latin-1, I guess).
Posted by Robert Klemme (Guest)
on 20.08.2008 09:40
(Received via mailing list)
2008/8/20 Pål Bergström <pal@palbergstrom.com>:
> work?
>
> lastspace = message.message[0..70].rindex(" ")
> puts message.message[0..lastspace.to_i]
>
> It doesn't do anything for nonexistent messages or for text shorter
> than 70 characters. And it finds the last space. We might have long words
> in Swedish, but not that long :-) There will always be a space somewhere
> near the end where it can cut, without splitting a Swedish character and
> leaving strange question marks (as if it were Latin-1, I guess).

As has been mentioned you probably should use the ... range form
because otherwise you'll have a space at the end.  But this does not
work in light of multiple spaces either.  I'd probably choose a
solution with regular expressions:

09:36:10 RKlemme$ /c/Temp/truncate.rb
["aaaaa", 5, "a", "", "aaaaa", "aaaaa", "aaaaa"]
["aaaaa bbbbb", 11, "aaaaa ", "aaaaa", "aaaaa bbbbb", "aaaaa bbbbb", "aaaaa"]
["aaaaa bbbbb ccccc", 17, "aaaaa bbbbb ", "aaaaa bbbbb", "aaaaa bbbbb ", "aaaaa bbbbb", "aaaaa bbbbb"]
["aaaaa bbbbb ccccc ddddd", 23, "aaaaa bbbbb ", "aaaaa bbbbb", "aaaaa bbbbb ", "aaaaa bbbbb", "aaaaa bbbbb"]
["aaaaa", 5, "a", "", "aaaaa", "aaaaa", "aaaaa"]
["aaaaa  bbbbb", 12, "aaaaa  ", "aaaaa ", "aaaaa  bbbbb", "aaaaa  bbbbb", "aaaaa "]
["aaaaa  bbbbb  ccccc", 19, "aaaaa  ", "aaaaa ", "aaaaa  ", "aaaaa  bbbbb", "aaaaa  bbbbb"]
["aaaaa  bbbbb  ccccc  ddddd", 26, "aaaaa  ", "aaaaa ", "aaaaa  ", "aaaaa  bbbbb", "aaaaa  bbbbb"]

Here's the code that you can use to play around with various solutions:

09:37:02 RKlemme$ cat /c/Temp/truncate.rb
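#!/usr/bin/env ruby

WD = 5    # characters per word
LIM = 12  # maximum length

# Test strings of one to four words: "aaaaa", "aaaaa bbbbb", ...
strings = (0..3).map do |i|
  (0..i).map do |ii|
    (?a.ord + ii).chr * WD   # ?a is a String on Ruby 1.9+, hence .ord
  end.join(" ")
end

# The same strings again, with double spaces between the words
strings.concat(strings.map { |s| s.gsub(/\s/, '  ') })

strings.each do |str|
  p [
    str,
    str.length,
    str[0..str[0...LIM].rindex(' ').to_i],
    str[0...str[0...LIM].rindex(' ').to_i],
    str[%r{\A.{0,#{LIM}}\b(?!\s)}],
    str[%r{\A.{0,#{LIM - 1}}\S}], # <- my choice
    str[%r{\A(.{0,#{LIM}})\s}, 1] || str,
  ]
end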

Kind regards

robert
Posted by Mark Thomas (markthomas)
on 20.08.2008 16:00
(Received via mailing list)
How about:

s[0..70].gsub(/\s?\S*$/,'')
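
A caveat with this expression: it removes the last word even when the string is already short enough, and `0..70` keeps 71 characters. A hedged sketch of a guarded variant follows; the name `shorten` is hypothetical, and Ruby 1.9 or later is assumed for character-based indexing.

```ruby
# The expression as posted, wrapped in a lambda for experimenting.
as_posted = lambda { |s| s[0..70].gsub(/\s?\S*$/, '') }

p as_posted.call("hello world")   # "hello"
p as_posted.call("hello")         # ""

# Guarded variant: leave short strings alone and cap at `limit` characters.
def shorten(s, limit = 70)
  return s if s.length <= limit
  s[0...limit].sub(/\s?\S*\z/, '')
end

p shorten("hello world")             # "hello world"
p shorten("aaaaa bbbbb ccccc", 12)   # "aaaaa bbbbb"
```

A single unbroken word longer than the limit still comes back as an empty string with this variant, so a hard-cut fallback may be needed depending on the data.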
Posted by Pål Bergström (palb)
on 20.08.2008 16:11
Mark Thomas wrote:
> How about:
> 
> s[0..70].gsub(/\s?\S*$/,'')

I'll try it.